I’m writing a bot to automatically download pages from my WordPress blog. The bot

Question

Asked: June 9, 2026 (2026-06-09T15:46:47+00:00)

I’m writing a bot to automatically download pages from my WordPress blog. The bot gets most of the pages without a problem. For example, it can easily get the first page of the article listing of a given tag: http://example.com/myblog/index.php/archives/tag/mytag. However, for some reason it can’t get the subsequent pages, like http://example.com/myblog/index.php/archives/tag/mytag/page/2.

I tried to figure out what was going on, and here’s what I found: the server responds normally to most requests, but to these requests it responds with a 301 (Moved Permanently) redirect. Peculiarly, the Location header is set to the exact same URL as the request! Basically, the server tells me to redirect my request for the page http://example.com/myblog/index.php/archives/tag/mytag/page/2 to… the very same page 😛
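One way to look at such a response more closely is to send a single request without following redirects and compare the Location header with the requested URL character by character, since differences such as a trailing slash are easy to miss by eye. The following is a minimal sketch using only the Python standard library; the function names `fetch_no_follow` and `is_self_redirect` are made up for illustration and are not part of the bot described above.

```python
import http.client
from urllib.parse import urlsplit, urljoin


def fetch_no_follow(url, headers=None):
    """Send one GET request and return (status, Location header).

    http.client never follows redirects itself, so the 301 is visible as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Headers copied from the browser could be passed in here.
        conn.request("GET", path, headers=headers or {})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_self_redirect(request_url, status, location):
    """Return True if a redirect response points back at the requested URL."""
    if status not in (301, 302, 303, 307, 308) or not location:
        return False
    # Location may be relative, so resolve it against the request URL first.
    return urljoin(request_url, location) == request_url
```

Calling `fetch_no_follow` on the problematic URL and passing the result to `is_self_redirect` would confirm whether the two URLs really are identical; printing both with `repr()` also exposes any invisible differences.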

When trying to access the page from the browser I get the page without a problem. I thought maybe the browser sends some headers (including cookies) that my bot doesn’t send, so I copied the headers (including the cookies) from my browser’s web console, but the behaviour didn’t change.

I would appreciate any suggestions regarding what might be causing this strange behaviour, what I can do in order to understand what’s going on better, and of course what I can do in order to fetch those pages automatically, just like I fetch their brethren.

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-06-09T15:46:48+00:00

It seems this post hasn’t generated much public interest. However, in case somebody ever runs into the same problem and finds this post, here’s the solution I used. Important note: I still don’t understand the behaviour I witnessed, and would appreciate it if somebody could explain it.

The solution I found is to request http://example.com/myblog/archives/tag/mytag?paged=2 instead of http://example.com/myblog/index.php/archives/tag/mytag/page/2. Funnily enough, a browser that visits this URL gets redirected to the original one! But when the bot requested it, it got the page without any redirection. So I managed to do what I wanted, but I have no idea why there was a problem in the first place or why this solution worked: for one URL the bot gets an endless redirect and the browser just gets the page, while for the other the browser gets redirected (finitely) and the bot gets the page. I have yet to figure this one out…
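For anyone who wants to apply the same workaround to several pages, here is a rough sketch of how the `?paged=N` URLs could be generated. The helper name `paged_url` is hypothetical, and the listing URL is assumed to be the form without `index.php`, as in the working URL above.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode, parse_qsl


def paged_url(listing_url, page):
    """Return listing_url with the query parameter paged=<page> set.

    Page 1 is the plain listing URL, so no parameter is added for it.
    """
    parts = urlsplit(listing_url)
    # Drop any existing "paged" parameter and keep the rest of the query.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "paged"]
    if page > 1:
        query.append(("paged", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example: URLs for the first three pages of the tag listing.
listing = "http://example.com/myblog/archives/tag/mytag"
urls = [paged_url(listing, n) for n in range(1, 4)]
```

This only builds the URLs; whether the server answers them without a redirect still has to be checked per site.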
