It’s been a few years since I’ve had to parse any files harder than CSV or XML, so I am out of practice. I’ve been given the task of parsing a file format called NeXus in a Delphi application.
The problem is that I just don’t know where to start. Do I use a tokenizer, regular expressions, or something else? Even a pointer to a good tutorial might be what I need at this point.
Have a look at GOLD Parser. It’s a meta-parsing system that lets you define a formal grammar for a language or file format. Its builder compiles the grammar into a parse-table file, which you load into a parsing engine (the part that tokenizes and parses) together with your input file, and the engine creates a syntax tree in memory.
There’s a Delphi implementation of the engine available on the website. It makes parsing a lot easier since the lexing and parsing are already taken care of for you, and all you have to worry about is defining the tokens and rules in a formal grammar and then interpreting the resulting tree once the input has been parsed.
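If it helps to see what "defining the tokens and then interpreting them" amounts to, here is a minimal hand-written sketch in Python (chosen only for brevity; the same structure ports to Delphi). It is not GOLD's API and not the NeXus specification. The token shapes (`BEGIN name; ... END;` blocks, `;`-terminated statements, `[...]` comments) and the names `Token`, `tokenize` and `parse_blocks` are assumptions made up for illustration, so check them against the real format before relying on any of it.

```python
import re
from typing import Dict, Iterator, List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for a simple block-structured text format.
# These patterns are illustrative assumptions, not taken from any spec.
TOKEN_SPEC = [
    ("COMMENT", r"\[[^\]]*\]"),            # [bracketed comment]
    ("WORD", r"[A-Za-z_][A-Za-z0-9_.]*"),  # keywords and labels
    ("NUMBER", r"\d+"),
    ("SEMI", r";"),                        # statement terminator
    ("EQUALS", r"="),
    ("SKIP", r"\s+"),                      # whitespace
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    """Lexing step: turn raw text into tokens, dropping whitespace and comments."""
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup not in ("SKIP", "COMMENT"):
            yield Token(match.lastgroup, match.group())


def parse_blocks(text: str) -> Dict[str, List[List[str]]]:
    """Parsing step: group ';'-terminated statements under their BEGIN/END block."""
    blocks: Dict[str, List[List[str]]] = {}
    current = None              # name of the block being read, if any
    statement: List[str] = []   # tokens of the statement being collected
    for tok in tokenize(text):
        if tok.kind != "SEMI":
            statement.append(tok.text)
            continue
        head = statement[0].upper() if statement else ""
        if head == "BEGIN" and len(statement) == 2:
            current = statement[1].upper()
            blocks[current] = []
        elif head == "END":
            current = None
        elif current is not None and statement:
            blocks[current].append(statement)
        statement = []
    return blocks


SAMPLE = """
BEGIN TAXA;
  [a comment]
  DIMENSIONS NTAX=3;
  TAXLABELS A B C;
END;
"""

if __name__ == "__main__":
    print(parse_blocks(SAMPLE))
```

Calling `parse_blocks(SAMPLE)` gives a dictionary keyed by block name, each holding its statements as lists of token strings. A generated parser such as GOLD replaces both functions with tables built from your grammar, which pays off once the format has nesting or many statement types.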