I have a web page with links pointing to downloadable files. For example:
http://www.mysite.com/download.php?FILE=downloads/programming/various/ebook.pdf
But it can also have navigation links as follows:
http://www.mysite.com/index.php
http://www.mysite.com/index.php?category=programming
http://www.mysite.com/index.php?section=programming&category=various
How can I determine whether a URL points to a file, as in the first link? Or, inversely, how can I filter out the URLs that don't fit?
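If all the links come from your own site and follow the pattern above, checking the URL's shape may work as a first-pass heuristic. Below is a minimal Python sketch. It assumes, hypothetically, that every download goes through download.php with a non-empty FILE query parameter. The names DOWNLOAD_SCRIPT, FILE_PARAM and looks_like_download are illustrative. Matching a URL pattern says nothing about what the server will actually send back.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical site convention: downloads are served by /download.php
# and name the file in a FILE query parameter.
DOWNLOAD_SCRIPT = "/download.php"
FILE_PARAM = "FILE"


def looks_like_download(url: str) -> bool:
    """Return True if the URL matches the assumed download pattern.

    Only the URL's shape is inspected; the server's response is not.
    """
    parts = urlsplit(url)
    params = parse_qs(parts.query)  # blank values are dropped by default
    return parts.path == DOWNLOAD_SCRIPT and bool(params.get(FILE_PARAM))


urls = [
    "http://www.mysite.com/download.php?FILE=downloads/programming/various/ebook.pdf",
    "http://www.mysite.com/index.php",
    "http://www.mysite.com/index.php?category=programming",
    "http://www.mysite.com/index.php?section=programming&category=various",
]
downloads = [u for u in urls if looks_like_download(u)]
print(downloads)  # only the download.php link
```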
Going with your edited question: if you want to filter out files, here is an informal list of common MIME types.
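Filtering by MIME type means deciding which media types count as "pages" and which count as "files." Here is a minimal Python sketch of one such policy. It is a hypothetical choice: text/html and application/xhtml+xml are treated as navigation pages, and everything else is treated as a file. The sketch also strips parameters such as "; charset=UTF-8" before comparing.

```python
# Hypothetical policy: HTML/XHTML are navigation pages; any other
# media type (PDF, images, archives, ...) is treated as a file.
PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type_header: str) -> str:
    """Drop parameters like '; charset=UTF-8' and lowercase the type."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_file_type(content_type_header: str) -> bool:
    """True if the declared type is not one of the page types."""
    return media_type(content_type_header) not in PAGE_TYPES


for header in ["text/html; charset=UTF-8", "application/pdf", "image/jpeg"]:
    print(header, "->", "file" if is_file_type(header) else "page")
```

To tighten the filter, you could replace the "everything else is a file" rule with an explicit allow-list drawn from the MIME-type list.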
You can inspect the response headers to determine whether the response will conform to a given type, e.g. application/pdf. But you cannot make this determination from the URL/URI alone. In fact, I could construct a web application that responds to the URL
http://myapp.com/test.pdf
with the header Content-Type: image/jpeg and the data of a JPEG. I could also really break things by sending the header Content-Type: image/jpeg along with the data of a PDF. Presuming the server isn't intentionally broken like that, you can rely on the response headers.
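The following Python sketch shows that the ".pdf" in a URL guarantees nothing. It uses only the standard library. A throwaway local server (a hypothetical stand-in for the myapp.com example) answers /test.pdf with Content-Type: image/jpeg. A HEAD request then reads the declared type without downloading the body. The helper fetch_content_type is an illustrative name. Some servers may reject HEAD; in that case a GET whose body you don't read is the usual fallback.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MislabelingHandler(BaseHTTPRequestHandler):
    """Toy server: answers every path, including /test.pdf, as image/jpeg."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):  # keep the demo quiet
        pass


def fetch_content_type(url: str, timeout: float = 5.0) -> str:
    """Send a HEAD request and return the declared media type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # get_content_type() drops parameters and lowercases the type
        return response.headers.get_content_type()


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), MislabelingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
try:
    declared = fetch_content_type(f"http://127.0.0.1:{port}/test.pdf")
    print(declared)  # image/jpeg, despite the .pdf in the URL
finally:
    server.shutdown()
    server.server_close()
```

Against a real site, you would call fetch_content_type on each link and pass the result to whatever MIME-type filter you use.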
Be aware that if the content itself deviates from the Content-Type header, that mismatch can be exploited. This is how the iPhone was jailbroken: by getting it to act on malformed PDF data.
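One defensive step is to check, before handing a download to a parser, that the body's leading "magic" bytes agree with the declared Content-Type. Below is a minimal Python sketch. The signature table is a small, hypothetical sample, not a complete file-type detector, and content_matches_type is an illustrative name. A match does not prove the file is well-formed, so this complements a robust parser rather than replacing it.

```python
from typing import Optional

# Leading signature bytes for a few common formats (illustrative subset).
MAGIC_NUMBERS = {
    "application/pdf": [b"%PDF-"],
    "image/jpeg": [b"\xff\xd8\xff"],
    "image/png": [b"\x89PNG\r\n\x1a\n"],
    "application/zip": [b"PK\x03\x04"],
}


def content_matches_type(data: bytes, declared_type: str) -> Optional[bool]:
    """Compare a body's first bytes against its declared media type.

    Returns True or False for types in MAGIC_NUMBERS, or None when the
    type is not in the table and no judgement can be made.
    """
    key = declared_type.split(";", 1)[0].strip().lower()
    signatures = MAGIC_NUMBERS.get(key)
    if signatures is None:
        return None
    return any(data.startswith(sig) for sig in signatures)


pdf_body = b"%PDF-1.7\n..."
print(content_matches_type(pdf_body, "application/pdf"))  # True
print(content_matches_type(pdf_body, "image/jpeg"))       # False
```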