I have an XML document collection, an inverted file indexer, and a command-line tool for searching the index (or indices) produced by the indexer. Note that the search tool returns a list of document IDs and various statistics about each document (rankings according to various functions, term hits, etc.) rather than the actual document text. Both programs were written in straight C (by me).
- The collection is not huge (~1GB).
- The index is about 10-20% of the collection size.
- This is not, and never will be, intended for public use (using it will require logging in).
- It needs to run with client-side scripting totally disabled.
I’d like to whip up a simple web frontend that would allow me to query the index with a search term or terms and present the results appropriately, but it’s been a while since I touched any web stuff.
I want to see more or less the same info a query returns at the moment, but I’m not sure whether to write something (e.g. PHP, Ruby – alternative suggestions are welcome) that calls my command-line query program and processes the output, or whether re-implementing the query program would be more appropriate.
Are there any distinct advantages one has over the other? Security risks?
And can anyone recommend me a lightweight framework or library appropriate for any of this? (Like I said, haven’t touched web stuff in a while.)
Should I call the CLI query program? Why or why not?
(=/ I hope I’m not being too vague… do tell me if I should be asking this in a different manner.)
For something simple like this, I would use PHP and an Apache server. Why?
It doesn’t require a web framework to sit between Apache and your program; less complexity = less time for you to spend configuring. You could just install Apache and the PHP module, drop a small PHP script (indexer.php) into your web root, and point an HTML form at http://127.0.0.1/indexer.php with the textareas "name" and "author". (Note that this is just to show the simplicity; the POST values received need validation.)
The script would then run your application with the two values as arguments and print whatever your application sent to stdout. No more hassle or things to set up. It would take you a couple of minutes to get up and running.
So the main reason is that it is simple and fast to set up, for something as internal and simple as this.
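As a rough sketch of what such an indexer.php could look like: this is not the original script, just an illustration under some assumptions. The path in QUERY_BIN and the helper names build_command and clean_field are made up, the 200-character limit is arbitrary, and it assumes a Unix-like server where PHP is allowed to call shell_exec.

```php
<?php
// indexer.php: a minimal sketch, not a definitive implementation.
// QUERY_BIN is a hypothetical path; point it at your own query program.
const QUERY_BIN = '/usr/local/bin/query';

// Build a shell command in which the program path and every
// user-supplied value are each quoted as a single shell argument.
function build_command(string $bin, array $args): string
{
    $parts = [escapeshellarg($bin)];
    foreach ($args as $arg) {
        $parts[] = escapeshellarg($arg);
    }
    return implode(' ', $parts);
}

// Very basic validation: accept only non-empty strings of at most
// 200 bytes (an arbitrary limit), with surrounding whitespace trimmed.
function clean_field($value): ?string
{
    if (!is_string($value)) {
        return null;
    }
    $value = trim($value);
    if ($value === '' || strlen($value) > 200) {
        return null;
    }
    return $value;
}

if (($_SERVER['REQUEST_METHOD'] ?? '') === 'POST') {
    $name   = clean_field($_POST['name'] ?? null);
    $author = clean_field($_POST['author'] ?? null);

    if ($name === null || $author === null) {
        http_response_code(400);
        echo 'Both "name" and "author" are required.';
    } else {
        // Run the query program and capture whatever it wrote to stdout.
        $output = shell_exec(build_command(QUERY_BIN, [$name, $author]));
        // Escape the output so it is shown as text, not interpreted as HTML.
        echo '<pre>' . htmlspecialchars((string) $output) . '</pre>';
    }
}
```

The quoting matters for the security question: concatenating raw POST values into the command line would let anyone who can reach the form run arbitrary commands, whereas escapeshellarg passes each value to the program as one literal argument. No client-side scripting is involved; the form is plain HTML and all the work happens on the server.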