Im trying to make an operation of creating user network based on call detail records in my CDR table.
To make things simple lets say Ive got CDR table :
CDRid
UserAId
UserBId
there is more than 100 mln records so table is quite big.
I reated user2user table:
UserAId
UserBId
NumberOfConnections
then using curos I iterate through each row in the table, then I make select statement:
if in user2user table there is record which has UserAId = UserAId from CDR record and UserBId = UserBId from CDR record then increase NumberOfConnections.
otherwise insert such a row which NumebrOfConnections = 1.
Quite simple task and it works as I said using cursor but it is very bad in performance (estimated time at my computer ~60 h).
I heard about Sql Server Integration Services that it has got better performance when we are talking about such big tables.
Problem is that I have no idea how to customize SSIS package for creating such task.
If anyone has got any idea how to help me, any good resources etc I would be really thankful.
Maybe there is any other good solution to make it work faster. I used indexes and variable tables and so on and performance is still pure.
thanks for help,
P.S.
This is script which I wrote and execution of this takes sth like 40 – 50 h.
DECLARE CDR_cursor CURSOR FOR
SELECT CDRId, SubscriberAId, BNumber
FROM dbo.CDR
OPEN CDR_cursor;
FETCH NEXT FROM CDR_cursor
INTO @CdrId, @SubscriberAId, @BNumber;
WHILE @@FETCH_STATUS = 0
BEGIN
–here I check if there is a user with this number (Cause in CDR i only have SubscriberAId –and BNumber so that I need to check which one user is this (I only have users from
–network so that each time I cant find this user I add one which is outide network)
SELECT @UserBId = (Select UserID from dbo.Number where Number = @BNumber)
IF (@UserBId is NULL)
BEGIN
INSERT INTO dbo.[User] (ID, Marked, InNetwork)
VALUES (@OutUserId, 0, 0);
INSERT into dbo.[Number](Number, UserId) values (@BNumber, @OutUserId);
INSERT INTO dbo.User2User
VALUES (@SubscriberAId, @OutUserId, 1)
SET @OutUserId = @OutUserId - 1;
END
else
BEGIN
UPDATE dbo.User2User
SET NumberOfConnections = NumberOfConnections + 1
WHERE User1ID = @SubscriberAId AND User2ID = @UserBId
-- Insert the row if the UPDATE statement failed.
if(@@ROWCOUNT = 0)
BEGIN
INSERT INTO dbo.User2User
VALUES (@SubscriberAId, @UserBId, 1)
END
END
SET @Counter = @Counter + 1;
if((@Counter % 100000) = 0)
BEGIN
PRINT Cast (@Counter as NVarchar(12));
END
FETCH NEXT FROM CDR_cursor
INTO @CdrId, @SubscriberAId, @BNumber;
END
CLOSE CDR_cursor;
DEALLOCATE CDR_cursor;
The thing about SSIS is that it probably won’t be much faster than a cursor. It’s pretty much doing the same thing: reading the table record by record, processing the record and then moving to the next one. There are some advanced techniques in SSIS like sharding the data input that will help if you have heavy duty hardware, but without that it’s going to be pretty slow.
A better solution would be to write an INSERT and an UPDATE statement that will give you what you want. With that you’ll be better able to take advantage of indices on the database. They would look something like:
(NB: I don’t have time to test this code, you’ll have to debug it yourself)