I have used arachnode.net crawler to crawl a website. The resulting crawl data has

Asked: June 18, 2026

I used the arachnode.net crawler to crawl a website. The resulting crawl data has produced a database of more than 100 GB!

I looked through the arachnode.net database and found the table “webpages” to be the culprit. When I crawl a website I do not download images, media, or anything like that; I only download the HTML. In this case, however, I can see that the HTML pages contain a huge amount of hidden viewdata and JavaScript.

So I need to run the crawl again, this time stripping out the hidden viewdata and JavaScript code before saving to the webpages table.

Does anyone have an idea how to achieve this?

Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-06-18T15:43:01+00:00

Yes, you can write a plugin which modifies the CrawlRequest.Data and CrawlRequest.DecodedHtml before the data is inserted into the database.

Create a PostRequest CrawlAction as shown here: http://arachnode.net/Content/CreatingPlugins.aspx
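The stripping step itself might look something like the sketch below. It is written in Python purely to illustrate the logic; an actual arachnode.net plugin would be written in C# and would apply the equivalent transformation to CrawlRequest.Data and CrawlRequest.DecodedHtml. The function name `strip_bloat` is hypothetical, and the sketch assumes that the "hidden viewdata" refers to hidden `<input>` fields such as ASP.NET's `__VIEWSTATE`.

```python
import re

# Matches <script ...>...</script> blocks, including inline JavaScript.
SCRIPT_RE = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)

# Matches <input ...> tags whose type attribute is "hidden".
# Assumption: the bulky "hidden viewdata" lives in such fields
# (for example ASP.NET's __VIEWSTATE).
HIDDEN_INPUT_RE = re.compile(
    r"<input\b(?=[^>]*\btype\s*=\s*[\"']?hidden\b)[^>]*>",
    re.IGNORECASE,
)


def strip_bloat(html: str) -> str:
    """Return html with script blocks and hidden input fields removed."""
    html = SCRIPT_RE.sub("", html)
    html = HIDDEN_INPUT_RE.sub("", html)
    return html
```

Regular expressions are a rough tool for HTML and can miss unusual markup, so a real HTML parser may be the safer choice if the pages are irregular. It is worth checking the result against a few stored pages before re-running the full crawl.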
