Situation: Site with content protected by username/password (not all controlled since they can be

Question


Editorial Team

Asked: May 11, 2026 (2026-05-11T01:21:15+00:00)


Situation:

The site's content is protected by username/password (not all accounts are under our control, since some belong to trial/test users).
A normal search engine can’t get at the content because of the username/password restrictions.
A malicious user can still log in and pass the session cookie to ‘wget -r’ or a similar tool.

The question is: what is the best way to monitor such activity and respond to it, given that the site policy forbids crawling and scraping?

I can think of some options:

1. Set up some traffic monitoring solution to limit the number of requests for a given user/IP.
2. Related to the first point: automatically block some user-agents.
3. (Evil :)) Set up a hidden link that, when accessed, logs out the user and disables their account. (Presumably a normal user would never access it, since they wouldn’t see it to click it, but a bot will crawl all links.)

For point 1: do you know of a good existing solution? Any experience with it? One problem is that very active but human users might show up as false positives.

For point 3: do you think this is really evil? Or do you see any possible problems with it?

Also accepting other suggestions.


1 Answer

score 0 · Answer 1 · 2026-05-11T01:21:16+00:00

Point 1 has the problem you mentioned yourself. It also doesn’t help against a slower crawl of the site, and if you tighten the limit enough to catch one, it may be even worse for legitimate heavy users.
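To make that trade-off concrete, here is a minimal sketch of per-user request counting over a sliding window. The `RequestMonitor` class, its thresholds, and the user names are all hypothetical, chosen only for illustration; this is not an existing library.

```python
import time
from collections import defaultdict, deque


class RequestMonitor:
    """Counts requests per user inside a sliding time window (hypothetical helper)."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        # Both thresholds are illustrative assumptions, not recommended values.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def record(self, user_id, now=None):
        """Record one request; return True if the user is over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


monitor = RequestMonitor(max_requests=3, window_seconds=10.0)
# A burst of four requests within one second trips the limit on the fourth.
burst = [monitor.record("alice", now=t) for t in (0.0, 0.2, 0.4, 0.6)]
# A slow crawl, one request every 5 seconds, never trips it.
slow = [monitor.record("bob", now=t) for t in (0.0, 5.0, 10.0, 15.0)]
```

With these made-up numbers the burst is caught while the slow crawl passes unnoticed. Lowering `max_requests` far enough to catch the slow crawl would also start flagging busy human users.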

You could turn point 2 around and only allow the user-agents you trust. Of course this won’t help against a tool that fakes a standard user-agent.

A variation on point 3 would just be to send a notification to the site owners, then they can decide what to do with that user.
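A rough sketch of that notify-only variation, assuming a hypothetical trap URL and an in-memory alert list standing in for whatever notification channel the site actually uses:

```python
from dataclasses import dataclass, field

TRAP_PATH = "/internal/do-not-follow"  # hypothetical URL of the hidden link


@dataclass
class TrapLog:
    """Collects trap hits for the site owners to review (hypothetical helper)."""

    alerts: list = field(default_factory=list)

    def handle(self, user_id, path, ip):
        """Return True if the request hit the trap; the account is left active."""
        if path != TRAP_PATH:
            return False
        # Assumption: a real site would send an email or open a ticket here.
        self.alerts.append({"user": user_id, "ip": ip, "path": path})
        return True


log = TrapLog()
log.handle("trial42", "/articles/1", "203.0.113.7")  # normal page, ignored
log.handle("trial42", TRAP_PATH, "203.0.113.7")      # trap hit, recorded
```

Nothing is disabled automatically, so a false positive costs the owners a moment of review and the user nothing.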

Similarly, my variation on point 2 could be a softer action: just notify the site owners that somebody is accessing the site with an unusual user agent.
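As an illustration only, the soft allowlist could look something like this. The marker strings and sample requests are assumptions, and as noted above, a tool that fakes a browser user-agent would pass straight through.

```python
TRUSTED_AGENT_MARKERS = ("Firefox/", "Chrome/", "Safari/")  # assumed allowlist


def is_unrecognised_agent(user_agent):
    """Return True when the User-Agent header matches none of the trusted markers."""
    if not user_agent:
        return True
    return not any(marker in user_agent for marker in TRUSTED_AGENT_MARKERS)


def review_queue(requests):
    """Collect (user, agent) pairs worth a human look instead of blocking them."""
    return [(user, agent) for user, agent in requests if is_unrecognised_agent(agent)]


sample = [
    ("alice", "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"),
    ("trial42", "Wget/1.21.4"),
    ("bob", ""),
]
flagged = review_queue(sample)
```

Here `flagged` would hold the `Wget` request and the one with an empty header, leaving the decision about those users to a person.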

edit: Related, I once had a weird issue when I was accessing a URL of my own that was not public (I was just staging a site that I hadn’t announced or linked anywhere). Although nobody but me should have even known this URL, I suddenly noticed hits in the logs. When I tracked them down, I saw they came from a content filtering site. It turned out that my mobile ISP used a third party to block content, and it intercepted my own requests – since it didn’t know the site, it fetched the page I was trying to access and (I assume) did some keyword analysis to decide whether or not to block it. This kind of thing might be an edge case you need to watch out for.
