I currently have a multi-threaded program that crawls websites and writes their text to a file. CPU-wise I could have tons of threads running at once, but I/O quickly becomes the bottleneck. I was thinking I could have each thread write to an ArrayBlockingQueue, but I know I am going to generate more than my available 32 GB of RAM. Is there a way to have the queue dumped to a text file after it reaches a certain size so that I can free up that space? Or is there another way around this I/O issue that I am missing?
Let’s imagine there is a SATA 2 controller, which allows writing at 300 MB per second. Now the question is what Internet bandwidth our imaginary computer has. As far as I know, the highest bandwidth supported by Ethernet adapters in production is 1 GB per second. But I think an Internet connection with that bandwidth is very expensive (I even doubt that some commercial hosts support it). I think a 300 MB per second Internet connection is enough. Let’s say our computer has one.
Result is
Summary: If you want to download the Internet, you must add more connections, disks, and disk controllers. Otherwise 300 MB/s looks pretty good. Threads won’t help you, and CPU and memory aren’t relevant to the problem either.
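On the memory worry from the question: a bounded queue never needs to be "dumped", because `put()` blocks the crawler threads once the queue is full, so memory use is capped by the capacity you pick. Below is a minimal sketch of that idea, not a tuned implementation. The class name `BoundedCrawlWriter`, the `POISON` sentinel, and the fake page strings are assumptions made up for illustration; real crawler threads would put the downloaded text instead.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedCrawlWriter {
    // Hypothetical sentinel object; compared by identity so no real page can match it.
    private static final String POISON = new String("__END__");

    public static void run(Path out, int producers, int pagesPerProducer, int capacity)
            throws InterruptedException {
        // Fixed capacity: at most `capacity` pages are held in memory at once.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);

        // Single writer thread: the disk is written sequentially by one consumer.
        Thread writer = new Thread(() -> {
            try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
                while (true) {
                    String page = queue.take();   // waits while the queue is empty
                    if (page == POISON) {
                        break;
                    }
                    w.write(page);
                    w.newLine();
                }
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        // Stand-ins for the crawler threads (assumption: each yields one line per page).
        Thread[] crawlers = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            final int id = i;
            crawlers[i] = new Thread(() -> {
                try {
                    for (int p = 0; p < pagesPerProducer; p++) {
                        queue.put("crawler " + id + " page " + p); // blocks while the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            crawlers[i].start();
        }

        for (Thread t : crawlers) {
            t.join();
        }
        queue.put(POISON);   // all crawlers are done, tell the writer to finish
        writer.join();
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempFile("crawl", ".txt");
        run(out, 4, 1000, 64);
        System.out.println(Files.readAllLines(out).size() + " lines written to " + out);
    }
}
```

Note that this only bounds memory. It does not make the disk any faster: once the queue is full, the crawlers simply run at the speed the writer can sustain, which is consistent with the point above that more threads don’t help.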