I have a project that crawls an e-commerce site with nearly 15 thousand products and, although I couldn't count them exactly, nearly 25 thousand pages. I wrote the crawler in C# using multithreading, with 20 threads in total. But performance is no better than when I used just 5 threads. Am I thinking about this wrong? Can't the crawl speed be increased?
As a test, I crawled 500 pages and saved their HTML to the database in ten minutes. Is that normal, or can I make it faster?
Moreover, SQL Server handles concurrent insert and update operations from 20 threads fine, but if I increase the thread count to 100, will I run into problems?
I calculate that processing the whole site will take 5 hours with 10 threads. I need help reducing this time, or is it normal? I don't want to use more computers.
My PC has 2 GB of RAM and a 1.87 GHz Intel T2130.
I checked: my CPU is at 90% and my RAM is at 75%. I crawl the site over the internet, and while the program is running I download about 70 kb per second. How can I increase the crawling speed?
Are you crawling the site over the internet? If so, how fast is your internet connection? Check your Task Manager. If your CPU is maxed out, you need to get a faster machine or make your page-parsing algorithms more efficient. If your CPU isn't doing much, you probably need a faster connection. Also, if you are crawling the site over the internet, I believe there's a limit on the number of concurrent requests to a single domain, which (I think) is set to 2 by default, but this can be changed. If this isn't your site, make sure you obey its robots.txt file so they don't block you. The site itself may also be throttling you because of the amount of traffic it detects.
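
To change the per-domain limit mentioned above, and assuming your crawler runs on the .NET Framework and makes its requests through HttpWebRequest or WebClient, something like this in your app.config should work. The value 20 is only an assumption chosen to match your thread count, not a recommendation:

```xml
<configuration>
  <system.net>
    <connectionManagement>
      <!-- Assumed value: allow up to 20 concurrent connections to any host.
           Use a specific host name instead of "*" to limit the change to one site. -->
      <add address="*" maxconnection="20" />
    </connectionManagement>
  </system.net>
</configuration>
```

The same thing can be set in code by assigning System.Net.ServicePointManager.DefaultConnectionLimit before the first request is made. Since your CPU is already at 90%, raising the limit may not help much on its own, and a higher number of simultaneous connections makes it more likely that the site will throttle or block you.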