As a preface, I am brand new to using SQL Server 2005; I know how to use the SELECT, UPDATE, DELETE, and INSERT commands and that’s about it. I am also using Express Edition on my local PC (E8400 processor, 8GB of DDR2-800, 2 x 640GB SATA-II HDD in RAID 1).
I have a table that I set up with 8 columns; all are NVARCHAR(Max) and allow NULL. I know in concept what a primary key is, but I don’t have one (nor do I know how to set one up).
The VB.NET program I’m working on downloads a historical stock price chart from Yahoo for every single ticker symbol in existence. The first 50,000 rows or so that I added were super fast. Then I went to bed, and when I woke up it was still running, but the rate of row additions had slowed waaaaaay down; I noticed this around row 300,000. I always expected the rate of row addition to be constant over time, but obviously this is not so!
From browsing other Stack Overflow questions, I suspect my slowdown is related to my piss-poor table setup. If this is the case, where should I begin to fix this, and are there any good resources I could read up on to get started? I’m hoping this is something simple I can fix 🙂
In case it matters, this is how I’m adding rows:
cmdtext = "IF NOT EXISTS(SELECT DateStamp FROM DailyPrice WHERE (DateStamp = '" + datestamp + "' AND Ticker = '" + ticker + "')) INSERT INTO DailyPrice (Ticker,OpenPrice,ClosePrice,HighPrice,LowPrice,AdjustedClose,Volume,DateStamp) VALUES('" + ticker + "','" + openprice + "','" + closeprice + "','" + highprice + "','" + lowprice + "','" + adjustedclose + "','" + volume + "','" + datestamp + "')"
cmd = New SqlCommand(cmdtext, conn)
howmanygotinserted = cmd.ExecuteNonQuery
I iterate through that for every stinking row of the CSV file, which is around 30,000 rows per CSV file (and I have over 5000 of them).
Making every column NVARCHAR(Max) is your first problem. Databases work best if you tell them what type of data you have and choose the smallest data type that works for your data. NVARCHAR(Max) is about the most inefficient choice you could have made.
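As a rough sketch of what typed columns could look like, here is a possible table definition. The table name DailyPriceTyped is hypothetical, and every type and length below is an assumption about the Yahoo data rather than something taken from the question, so adjust them to what the CSV files actually contain.

```sql
-- Hypothetical typed version of DailyPrice; name, types and lengths are assumptions.
CREATE TABLE DailyPriceTyped (
    Ticker        VARCHAR(10)   NOT NULL,  -- assumes symbols are short ASCII strings
    DateStamp     SMALLDATETIME NOT NULL,  -- SQL Server 2005 has no separate DATE type
    OpenPrice     DECIMAL(18,4) NULL,      -- assumes four decimal places are enough
    ClosePrice    DECIMAL(18,4) NULL,
    HighPrice     DECIMAL(18,4) NULL,
    LowPrice      DECIMAL(18,4) NULL,
    AdjustedClose DECIMAL(18,4) NULL,
    Volume        BIGINT        NULL       -- share counts can exceed the INT range
);
```

With fixed-size numeric and date columns, each row takes a few dozen bytes instead of a set of variable-length Unicode strings, and comparisons on Ticker and DateStamp no longer have to work on NVARCHAR(Max) values.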
The IF NOT EXISTS check is your second problem. On every insert you are checking whether you have already inserted a row with the same DateStamp and Ticker. Because you haven’t told the database to index those columns, it has to scan the entire table each time, so your query gets slower and slower as the table grows. A primary key on those columns gives the database an index to use for that check; you can add one to an existing table with an ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY statement.
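A minimal sketch of adding such a key to the existing DailyPrice table might look like the following. It assumes that Ticker and DateStamp together identify a row, that no existing row has a NULL or duplicate in that pair, and that every stored value fits or converts to the new types; if any of that is not true, the statements will fail and the data needs cleaning first. The constraint name PK_DailyPrice and the column lengths are made up for illustration.

```sql
-- Key columns cannot be NVARCHAR(MAX) or nullable, so give them
-- bounded NOT NULL types first (lengths and types are assumptions).
ALTER TABLE DailyPrice ALTER COLUMN Ticker NVARCHAR(10) NOT NULL;
ALTER TABLE DailyPrice ALTER COLUMN DateStamp DATETIME NOT NULL;

-- PK_DailyPrice is a hypothetical constraint name.
-- By default this also creates a clustered index on (Ticker, DateStamp).
ALTER TABLE DailyPrice
    ADD CONSTRAINT PK_DailyPrice PRIMARY KEY (Ticker, DateStamp);
```

Once the key exists, the IF NOT EXISTS lookup can seek on the index instead of scanning every row, and the database itself will reject a duplicate Ticker and DateStamp pair.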
See here for more info.