What I’m doing I am writing a web crawler in OCaml. Using the function

Question


Asked: May 23, 2026


What I’m doing

I am writing a web crawler in OCaml. Using the function string_of_uri (below) defined by nlucaroni in a previous answer to a question I posted, I can fetch the HTML text of a URL from the web.

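exception IO_ERROR of string

(* Fetch the body of [uri] as a string; raise IO_ERROR on any failure. *)
let string_of_uri uri =
  try
    let connection = Curl.init () and write_buff = Buffer.create 1763 in
    (* Append each received chunk to the buffer and report its size. *)
    Curl.set_writefunction connection
      (fun x -> Buffer.add_string write_buff x; String.length x);
    Curl.set_url connection uri;
    Curl.perform connection;
    (* Release this handle only; global_cleanup belongs at program exit. *)
    Curl.cleanup connection;
    Buffer.contents write_buff
  with _ -> raise (IO_ERROR uri)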

I’ve already written some code to extract a list of all the hyperlinks in the fetched HTML (i.e. all the [LINK] parts in anything like <A HREF="[LINK]">text</A>). This all works fine.

The Problem

The problem is that some pages redirect you and I don’t know how to follow the redirection. For example, my program reports 0 <A> tags for the page http://en.wikipedia.org, because Wikipedia actually redirects you to http://en.wikipedia.org/wiki/Main_Page. If I give this second URL to my program, it all works fine. But if I give the initial one, it just returns 0 <A> tags.

Unfortunately there’s no documentation at all for ocurl, except for the names of the functions in the interface. Does anyone have an idea of how I can improve the function string_of_uri above so that it follows any redirections and outputs the HTML of the last page it lands on?

I noticed that applying the function Curl.get_redirectcount to a connection on http://en.wikipedia.org returns 0, which is not what I was expecting, since the page is redirected to some other page…

Thanks for any help!

All the best,
Surikator.


1 Answer

Editorial Team · Answer 1 · 2026-05-23T15:50:17+00:00


This question has already been answered in the comments. libcurl does not follow redirects unless asked to, so the solution is to add Curl.set_followlocation connection true just above Curl.perform connection in string_of_uri.
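For reference, here is a minimal sketch of what string_of_uri might look like with that change applied. It assumes the ocurl library (findlib package curl) and OCaml 4.08 or later for Fun.protect. The redirect cap of 10 via Curl.set_maxredirs is my own assumption, not part of the original function, and can be dropped.

```ocaml
(* Sketch only: requires the ocurl library (findlib package "curl"). *)
exception IO_ERROR of string

let string_of_uri uri =
  let connection = Curl.init () in
  let write_buff = Buffer.create 1763 in
  (* Always release the handle, whether the transfer succeeds or fails. *)
  Fun.protect
    ~finally:(fun () -> Curl.cleanup connection)
    (fun () ->
      try
        (* Append each received chunk to the buffer and report its size. *)
        Curl.set_writefunction connection
          (fun chunk -> Buffer.add_string write_buff chunk; String.length chunk);
        Curl.set_url connection uri;
        (* Follow HTTP redirects instead of stopping at the first response. *)
        Curl.set_followlocation connection true;
        (* Assumption: cap the chain so a redirect loop cannot run forever. *)
        Curl.set_maxredirs connection 10;
        Curl.perform connection;
        Buffer.contents write_buff
      with _ -> raise (IO_ERROR uri))
```

With redirects enabled, calling string_of_uri "http://en.wikipedia.org" should return the HTML of the page the redirect chain ends on. It would also explain the 0 from Curl.get_redirectcount in the question: as far as I know, that counter only reports redirects that were actually followed, so it stays at 0 while following is switched off.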
