I debated posting this here or on Super User; please excuse me if you feel it does not belong here.
I am observing the behavior described here: Googlebot is requesting random URLs on my site, such as aecgeqfx.html or sutwjemebk.html. I am certain these URLs are not linked from anywhere on my site.
I suspect this may be Google probing how we handle nonexistent content. To cite from an answer to the linked question:
[google is requesting random urls to] see if your site correctly
handles non-existent files (by returning a 404 response header)
We have a custom page for nonexistent content: a styled page saying "Content not found; if you believe you got here by error, please contact us", with a few internal links. As things stand, it is served with a 200 OK status, directly at the requested URL (no redirect to a single error URL).
I am afraid this may hurt the site's standing with Google: they may not interpret the user-friendly page as a 404 Not Found, and may conclude we are faking something and serving duplicate content.
How should I proceed to ensure that Google does not think the site is bogus, while still showing a user-friendly message to visitors who follow dead links by accident?
The best practice is to return the user-friendly 404 page with a 404 response code, not a 200. Your web server can be configured to do this fairly easily; in Apache, for example, the `ErrorDocument 404 /notfound.html` directive serves a custom page while keeping the 404 status.