So I want to create a web crawler in C. There are hardly any

Question


Asked: June 17, 2026


So I want to create a web crawler in C. There are hardly any libraries to support this.
I can use libtidy to convert HTML to XHTML and get the HTML files using libcurl (which has decent documentation).

My problem is parsing the HTML files and getting all the links present in them. I know libxml2
is there, but it's extremely hard to understand because there is no good documentation for its API.

Should I even do this in C, or go with another language like Java?
Or are there any good alternatives to libxml2?


1 Answer

Editorial Team · Answer 1 · 2026-06-17T15:07:31+00:00

Parsing HTML requires basically just string manipulation.

But it's quite hard to do reliably without an HTML parser (or an XML parser, if the input is XHTML).
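To illustrate the "just string manipulation" route in C, here is a naive href scanner. It is a sketch, not a parser: `scan_hrefs` and `starts_with_ci` are hypothetical names, and it deliberately ignores comments, `<script>` blocks, and entity decoding, which is exactly where a real HTML parser earns its keep.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive check that s starts with the lowercase word w. */
static int starts_with_ci(const char *s, const char *w)
{
    for (; *w; s++, w++)
        if (tolower((unsigned char)*s) != *w)
            return 0;                 /* also stops safely at the end of s */
    return 1;
}

/* Naive link scanner: finds href="...", href='...' or href=value and
   passes each value (pointer + length, not NUL-terminated) to on_link().
   Caveats: it also matches "href=" inside comments, scripts or plain text,
   and it does not decode entities such as &amp;. Returns the link count. */
int scan_hrefs(const char *html,
               void (*on_link)(const char *start, size_t len, void *ctx),
               void *ctx)
{
    int count = 0;
    const char *p = html;
    while (*p) {
        if (!starts_with_ci(p, "href")) { p++; continue; }
        const char *q = p + 4;
        while (isspace((unsigned char)*q)) q++;
        if (*q != '=') { p += 4; continue; }    /* e.g. "hreflang" */
        q++;
        while (isspace((unsigned char)*q)) q++;

        const char *start, *end;
        if (*q == '"' || *q == '\'') {          /* quoted value */
            char quote = *q++;
            start = q;
            end = strchr(q, quote);
            if (end == NULL)
                break;                          /* unterminated value: give up */
            p = end + 1;
        } else {                                /* unquoted value */
            start = q;
            while (*q && !isspace((unsigned char)*q) && *q != '>') q++;
            end = q;
            p = q;
        }
        on_link(start, (size_t)(end - start), ctx);
        count++;
    }
    return count;
}
```

Even this small function needs manual pointer bookkeeping and case folding, which is the kind of effort that native string support in other languages avoids.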

As for the second part of the question, I wouldn't choose C for such a task, because even basic string operations are much more complex in C than in many other languages that support them natively.

I would go for a scripting language such as Python, JavaScript, or PHP.

With a scripting language, instead of using libcurl you could simply invoke curl as a command-line tool.

By the way, libcurl's documentation is very good (in my opinion).
