I have taken on an ASP.NET web site where the client is using the web server as a code repository, i.e. removing a page from the site just means no longer linking to it. There are a stupendous number of unused files, and I would like to archive these and arrive at a lean git repository containing only the files used by the active site.
How can I get usage or coverage data that will tell me, over an agreed-upon period, e.g. a month, which pages are being hit? I know there are many ways of doing this in ASP.NET, and even in plain IIS, but I’d like some suggestions on a convenient and simple way of doing this.
I would suggest the IIS logs, but that wouldn’t report linked pages that haven’t been accessed by users.
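If you do go the log route, here is a rough Python sketch of tallying hits per page. It assumes the logs are in the W3C extended format with the cs-uri-stem field enabled; the function names count_hits and unused_paths are my own, not part of any tool. Paths are lower-cased because IIS URLs are normally case-insensitive.

```python
from collections import Counter

def count_hits(log_lines):
    """Count requests per URL path in W3C extended log lines."""
    fields = []
    hits = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The header names the space-separated columns that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives (#Software, #Date, ...)
        values = line.split()
        if "cs-uri-stem" not in fields or len(values) != len(fields):
            continue  # skip rows that cannot be mapped to the header
        record = dict(zip(fields, values))
        hits[record["cs-uri-stem"].lower()] += 1
    return hits

def unused_paths(site_paths, hits):
    """Return the site paths that never appear in the hit counts."""
    return sorted(p for p in site_paths if p.lower() not in hits)
```

You would feed it the lines of each log file for the agreed period and compare the result with the list of files on disk. Note that this counts every request, including ones that returned an error, so you may want to filter on sc-status as well.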
You could try running a spider on the site to find every page that is still reachable by following links. Here’s a free tool: http://www.trellian.com/sitespider/download.htm
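If you would rather script it than install a tool, a spider is not much code. The following is only a minimal sketch using the Python standard library, and it assumes the pages are plain HTML with ordinary a href links (it will not see links built by JavaScript). The names LinkParser, fetch_page and crawl are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def fetch_page(url):
    """Download a page and return its text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def crawl(start_url, fetch=fetch_page, limit=1000):
    """Return the set of same-host URLs reachable by links from start_url."""
    host = urlparse(start_url).netloc
    seen = set()
    queue = [start_url]
    while queue and len(seen) < limit:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)  # recorded as linked even if the fetch below fails
        try:
            html = fetch(url)
        except Exception:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                queue.append(link)
    return seen
```

Calling crawl with the home page URL gives the set of linked URLs on the same host; anything on disk that is not in that set is a candidate for archiving.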
You should be careful which files you delete from the web server, since there may be cached links to those pages out there. A good strategy would be to use Google to see which pages it has indexed. Search for “site:example.com”, where example.com is the domain for your site, and check which pages are returned.