I’m trying to create a controller for my sitemap, but only allow search engines to view it.
If you look at https://stackoverflow.com/robots.txt you’ll see that their sitemap is https://stackoverflow.com/sitemap.xml. If you try to visit the sitemap, you’ll be redirected to the 404 page.
This meta question confirms this behavior (answered by Jeff himself).
Now I don’t want this question closed as “belongs on Meta”, as I’m just using StackOverflow as an example. What I really need answered is…
You can probably create a filter attribute that rejects the request based on the User-Agent header. The usefulness of this is questionable (and it is not a security feature), because the header is easy to fake, but it will stop people from viewing the sitemap in a stock browser.
This page contains a list of user agent strings that Googlebot uses.
Sample code to redirect non-googlebots to a 404 action on an error controller:
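As a rough illustration of the check itself, here is a minimal, framework-neutral sketch in Python rather than an MVC filter attribute. The names `CRAWLER_TOKENS`, `is_search_engine` and `sitemap_response` are hypothetical, the token list is an assumption you would extend yourself, and the sketch returns a 404 status directly instead of redirecting to an error controller.

```python
# Hypothetical crawler tokens (an assumption); extend with the bots you want to admit.
CRAWLER_TOKENS = ("googlebot", "bingbot")


def is_search_engine(user_agent):
    """Return True when the User-Agent header contains a known crawler token."""
    if not user_agent:
        # A missing or empty header is treated as a normal visitor.
        return False
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def sitemap_response(user_agent, sitemap_xml):
    """Return (status, body): the sitemap for crawlers, a 404 for everyone else."""
    if is_search_engine(user_agent):
        return 200, sitemap_xml
    return 404, "Not Found"
```

Because the decision rests only on a header the client controls, anyone who sends a crawler-like User-Agent gets the sitemap, which is why this should be treated as a courtesy filter and not as access control.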
EDIT: To respond to comments. If server load is an issue for your sitemap, restricting access to the bots might not be sufficient. Googlebot by itself can grind your server to a halt if it decides to scrape aggressively. You should probably cache the response as well. You can use the same FilterAttribute and Application.Cache for that.
Here is a very rough example, which might need tweaking with proper HTTP headers:
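The caching idea can be sketched in Python as well, as a stand-in for the FilterAttribute and Application.Cache combination. `SitemapCache`, its `build` callback and the one-hour default lifetime are all assumptions made for illustration: the sitemap is generated once and reused until it expires.

```python
import time


class SitemapCache:
    """Holds one generated sitemap and rebuilds it after ttl_seconds."""

    def __init__(self, build, ttl_seconds=3600.0, clock=time.monotonic):
        self._build = build          # hypothetical callable that renders the sitemap XML
        self._ttl = ttl_seconds      # assumed lifetime; tune to how often the site changes
        self._clock = clock          # injectable so the expiry logic can be tested
        self._value = None
        self._expires_at = None      # None means nothing has been cached yet

    def get(self):
        """Return the cached sitemap, regenerating it only when it has expired."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._build()
            self._expires_at = now + self._ttl
        return self._value
```

This version is not thread-safe; under concurrent requests two threads could rebuild the sitemap at the same moment, so a real implementation would probably guard `get` with a lock or rely on the framework's own cache.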