The program I am working on reads various ASCII text files and does some processing on them. To know how to handle each one, it needs to know whether the file is:
- IS_EMPTY // done
- IS_JSON // done via parsing, using gson
- IS_XML // done via parsing, using dom4j
- IS_PROPERTIES
- IS_SCRIPT
I wonder if there is an effective way to determine whether a file is a properties file without reading every line to check that it contains a Key=Value pair?
Additionally, is there an effective way to determine whether a file is a shell script?
Are there any parsers available to check this?
If a requirement of your program is that the input files be well-formatted and not of mixed type, then I would recommend replacing your JSON and XML implementations with cheap checks like the following:
JSON – simply look for an opening ‘{’ as the first non-whitespace character in the file; that is not a valid start for any of the other formats (except maybe SCRIPT, depending on its format). If ‘{’ comes first, it's JSON, and you avoid parsing the entire file with Gson. (If your JSON files can also hold a top-level array, check for ‘[’ as well.)
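XML – look for the XML declaration (<?xml ...?>); in a well-formed XML file nothing, not even whitespace, may come before it, so it must appear immediately. Note that the declaration is optional, so this only works if your files are required to include one. Again, there is no reason to ingest the entire file just to catch an exception.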
PROPERTIES – in the same vein, I would check the first line and make sure it is in key=value format. If it is, you are good to go.
SCRIPT – I am not sure of the format of your scripting language, but you get the idea.
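To make the checks above concrete, here is a minimal sketch that classifies a file from its first few hundred characters. The class `FileTypeSniffer`, the method `classify`, and the `UNKNOWN` fallback are hypothetical names; the enum reuses the categories from the question. Several details are assumptions beyond the answer: scripts are detected by a leading `#!` shebang line, a UTF-8 byte order mark is skipped, `[` is accepted as a JSON start, and the properties check follows the `java.util.Properties` line format loosely (`#`/`!` comment lines skipped, `=` or `:` as separator).

```java
import java.util.regex.Pattern;

public class FileTypeSniffer {

    // Hypothetical enum mirroring the categories in the question, plus UNKNOWN as a fallback.
    public enum FileType { IS_EMPTY, IS_JSON, IS_XML, IS_PROPERTIES, IS_SCRIPT, UNKNOWN }

    // Loose "key=value" (or "key:value") check; real .properties keys may contain escaped spaces.
    private static final Pattern KEY_VALUE = Pattern.compile("[^=:\\s]+\\s*[=:].*");

    /** Guesses the file type from the first few hundred characters of the file (the "head"). */
    public static FileType classify(String head) {
        // Ignore a UTF-8 byte order mark, which may legally precede the XML declaration.
        if (head.startsWith("\uFEFF")) {
            head = head.substring(1);
        }
        if (head.trim().isEmpty()) {
            return FileType.IS_EMPTY;
        }
        // Assumption: scripts start with a shebang line such as "#!/bin/sh".
        if (head.startsWith("#!")) {
            return FileType.IS_SCRIPT;
        }
        // When present, the XML declaration must be the very first thing in the document.
        if (head.startsWith("<?xml")) {
            return FileType.IS_XML;
        }
        // JSON: the first non-whitespace char opens an object (or, assumed here, an array).
        char first = head.trim().charAt(0);
        if (first == '{' || first == '[') {
            return FileType.IS_JSON;
        }
        // Properties: the first line that is neither blank nor a comment must look like key=value.
        for (String line : head.split("\\R")) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#") || t.startsWith("!")) {
                continue;
            }
            return KEY_VALUE.matcher(t).matches() ? FileType.IS_PROPERTIES : FileType.UNKNOWN;
        }
        return FileType.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify(""));                            // IS_EMPTY
        System.out.println(classify("  {\"a\": 1}"));                 // IS_JSON
        System.out.println(classify("<?xml version=\"1.0\"?><a/>")); // IS_XML
        System.out.println(classify("# settings\nport=8080\n"));     // IS_PROPERTIES
        System.out.println(classify("#!/bin/sh\necho hi\n"));        // IS_SCRIPT
        System.out.println(classify("just some text"));              // UNKNOWN
    }
}
```

The order of the checks matters: the shebang test runs before the properties test because `#!` would otherwise be skipped as a properties comment line. A positive result is still only a guess; the real parser remains the final authority.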
All told, cheap, quick checks are the way to go here, provided your requirements are well-defined. If you require a JSON file to be all JSON and the first character you read from the file is ‘{’, then I'd say it is a JSON file and not EMPTY, XML or PROPERTIES (again excluding SCRIPT, because I don't know its format).
Then you can rewind the input stream and hand it to your parsing library to read (this is where a PushbackInputStream comes in handy).
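A minimal sketch of the "peek, then rewind" idea with `java.io.PushbackInputStream`: read the first few bytes, push them back with `unread`, and the parser still sees the complete stream. The class `StreamPeek`, the helper `peek`, and the 8-byte limit in `main` are hypothetical choices for illustration; decoding as UTF-8 is an assumption that is safe for the ASCII files described in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class StreamPeek {

    /**
     * Reads up to {@code limit} bytes, pushes them back onto the stream, and returns them as UTF-8 text.
     * The PushbackInputStream must have been created with a pushback buffer of at least {@code limit} bytes.
     */
    public static String peek(PushbackInputStream in, int limit) throws IOException {
        byte[] buf = new byte[limit];
        int total = 0;
        // read() may return fewer bytes than requested, so keep reading until the buffer is full or EOF.
        while (total < limit) {
            int n = in.read(buf, total, limit - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        if (total > 0) {
            in.unread(buf, 0, total); // "rewind": these bytes will be returned again by the next read
        }
        // A multi-byte character cut off at the limit decodes to U+FFFD; harmless for sniffing.
        return new String(buf, 0, total, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "{\"name\": \"demo\"}".getBytes(StandardCharsets.UTF_8);
        InputStream raw = new ByteArrayInputStream(data);
        PushbackInputStream in = new PushbackInputStream(raw, 8);

        String head = peek(in, 8);
        System.out.println("head: " + head);

        // The stream is back at the start, so a parser sees the whole content, e.g. Gson via
        // new InputStreamReader(in, StandardCharsets.UTF_8), or dom4j's SAXReader.read(in).
        String all = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println("full: " + all);
    }
}
```

The returned head string can then be fed to whatever cheap checks you use. If you already wrap the file in a `BufferedInputStream`, calling `mark(limit)` before peeking and `reset()` afterwards achieves the same rewind.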