The repository I am asking about is for Linux, but my problem concerns the client side, i.e. retrieving the data, and the client can be Linux, Windows, Mac OS X, etc. So I opted against asking this question on the Unix & Linux site; if the admins feel it belongs there, please move it.
Consider a repository such as http://download.opensuse.org/repositories/LCD/openSUSE_11.4/x86_64/. You can fetch the HTML for it, parse it, and get the list of files. However, I can hardly believe this is the correct way: since the HTML is created by the website engine (MirrorBrain in this case), there should be some web service API to get this list directly.
I googled, but didn’t find anything relevant.
So, how do I get the list of files directly: no parsing, just a call that returns the collection of file names?
MirrorBrain doesn’t have an API call to retrieve a list of files. (It only has API calls to retrieve a list of mirrors for a single file, by appending `.mirrorlist` or `.meta4` to a file’s URL.) It would be a worthwhile idea to add such an API call (patches welcome!).

So there’s only the standard HTTP server directory index to read a file list from. The format varies from server to server, and even Apache has different variants. With Apache, a little trick that can help is to append `?F=0` to the directory URL if you want to get only the filenames (it will simplify the index), or to append `?F=1` to switch to the fancier variant which includes more details.

Hope this helps.
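
For illustration, here is a minimal Python sketch of reading the simplified index. It is not a MirrorBrain API. It assumes the server is Apache with directory indexing enabled and that it honors the `?F=0` parameter. The names `LinkCollector`, `file_names_from_index` and `list_files` are made up for this example, and the filtering rules (skip sort links, absolute links, parent links and subdirectories) are a guess at what a typical index contains, so they may need adjusting for other servers.

```python
from html.parser import HTMLParser
from urllib.parse import unquote
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def file_names_from_index(html):
    """Return file names linked from an Apache-style index page (assumed layout)."""
    collector = LinkCollector()
    collector.feed(html)
    names = []
    for href in collector.hrefs:
        # Skip column-sort links ("?C=N;O=D"), absolute and parent links,
        # full URLs, and subdirectories (which end in "/").
        if href.startswith(("?", "/", "#", "..")):
            continue
        if "://" in href or href.endswith("/"):
            continue
        names.append(unquote(href))  # decode %xx escapes in file names
    return names


def list_files(directory_url):
    """Fetch the simplified index and return the file names.

    directory_url is expected to end with "/".
    """
    with urlopen(directory_url + "?F=0") as response:
        html = response.read().decode("utf-8", errors="replace")
    return file_names_from_index(html)
```

Calling `list_files("http://download.opensuse.org/repositories/LCD/openSUSE_11.4/x86_64/")` would then return a plain list of names, provided the server still serves that directory and the assumptions above hold. This is still HTML parsing, just against the simplest variant of the index.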