For academic purposes, our small team (a friend and I) is programming a tile-based game in Java. To play with those tiles, we have been given a configuration file containing a string representation of every tile in the deck, one per line. Here are some examples:
N=N S=C O=C E=N NS=0 NE=0 NO=0 OE=0 SE=0 SO=1
N=S S=S O=S E=S NS=0 NE=0 NO=0 OE=0 SE=0 SO=0
In the above representations, N, S, O and E map to the cardinal points north, south, west and east, whilst the right-hand side of the first four assignments maps to road (S), city (C) and field (N). The following six groups denote whether a link between two points exists. For example, SO=1 means that south and west are linked.
Our first idea was to parse those lines with regular expressions, using the Pattern class provided by the standard Java libraries. My teammate has written code that generates a Pattern for the entire string by assembling smaller patterns that list the possible values of some enumerations (namely Position, which contains the cardinal points, and AssetType, which contains structures like road or city). I won’t paste the generation code because it is quite long and not very elegant. However, I can assure you that it is correct.
Before going on, I would like to point out that the tile string is actually composed of two main parts: the specification of borders (i.e. the first four assignments) and that of links (the last six). We have, therefore, two parsers. The first is able to parse strings like “N=N S=C O=C E=N” and the second “NS=0 NE=0 NO=0 OE=0 SE=0 SO=1”. Their patterns are correct: we have tested them thoroughly and all tests pass.
Now comes the rub. Since the tile string always consists of the first part followed by the second part, we created the pattern for the whole string by simply appending the pattern for the second part to that for the first, separating them with a \s+ and surrounding each of them with parentheses. The resulting expression is the following:
(N\s*\=\s*(N|S|C)(,(R|B|V|G|N))?\s+S\s*\=\s*(N|S|C)(,(R|B|V|G|N))?\s+O\s*\=\s*(N|S|C)(,(R|B|V|G|N))?\s+E\s*\=\s*(N|S|C)(,(R|B|V|G|N))?)\s+(NS\s*\=\s*(0|1)\s+NE\s*\=\s*(0|1)\s+NO\s*\=\s*(0|1)\s+OE\s*\=\s*(0|1)\s+SE\s*\=\s*(0|1)\s+SO\s*\=\s*(0|1))
It looks awful, I know, but it is generated by code rather than written by hand. We tested it against some strings, like those I’ve posted above, only to discover that it does not match, although the individual patterns do.
We tried it in an online regex tester and it matches flawlessly there. We don’t know how to make it match in our code. Any ideas?
Here is the relevant code:
    public Tile from(String tileString) {
        Matcher matcher = pattern.matcher(tileString);
        return new InnerTile(
            tileBorderBuilder.from(matcher.group(1)),
            tileLinkageBuilder.from(matcher.group(14)));
    }
tileBorderBuilder.from parses the first part and returns a TileBorder object. tileLinkageBuilder.from does the same thing for the second part and returns a TileLinkage object. The method above throws an exception: “No match found”.
P.S.: we are using Java SE 1.6 and OpenJDK 6 (it fails on both of them).
To debug problems like this, start with a simpler regex and build from there, i.e. try to match only the first part of a line (and shorten the regex accordingly). This will help you identify the position in the regex where the problem occurs.
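A minimal sketch of that incremental approach, assuming the long pattern can be cut into fragments by hand (the class and method names here are made up for illustration and the fragments are a simplified version of the pattern from the question). It also demonstrates one thing worth checking in the posted snippet: as far as I can tell, group() is called there without a preceding matches(), find() or lookingAt(), and Matcher.group throws IllegalStateException ("No match found") in that situation even when the pattern itself would match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPrefixDebug {

    // Hypothetical helper: appends the fragments one by one and returns the
    // index of the first fragment that breaks the match, or -1 if none does.
    static int firstFailingFragment(String[] fragments, String input) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < fragments.length; i++) {
            regex.append(fragments[i]);
            Matcher m = Pattern.compile(regex.toString()).matcher(input);
            // lookingAt() anchors at the start of the input but does not
            // require the whole input to be consumed, so prefixes can match.
            if (!m.lookingAt()) {
                return i;
            }
        }
        return -1;
    }

    // Returns true if group() throws when no match operation was attempted.
    static boolean groupThrowsBeforeMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        try {
            m.group(); // no matches()/find()/lookingAt() call before this
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String line = "N=N S=C O=C E=N NS=0 NE=0 NO=0 OE=0 SE=0 SO=1";
        // Simplified fragments of the full pattern (assumption: cut by hand).
        String[] fragments = {
            "N\\s*=\\s*(N|S|C)",
            "\\s+S\\s*=\\s*(N|S|C)",
            "\\s+O\\s*=\\s*(N|S|C)",
            "\\s+E\\s*=\\s*(N|S|C)",
            "\\s+NS\\s*=\\s*(0|1)"
        };
        System.out.println(firstFailingFragment(fragments, line)); // prints -1
        System.out.println(groupThrowsBeforeMatch(Pattern.compile("N=(N|S|C)"), "N=N")); // prints true
    }
}
```

If the helper reports -1 for every fragment and the full pattern still "does not match", the pattern is probably fine and the way the Matcher is used is the next thing to look at.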
That said, I’d suggest parsing the config with a much smaller regex that matches a single “word”: split the input into lines and then apply the regex repeatedly to each line to read each “word”.
This is more effort on the Java side but it keeps the regex in check and makes your code easy to read, understand and extend, because in a few days you’re going to add another tile or a new option, and one month later the regex will have taken control of your life.
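The word-by-word reading described above could be sketched roughly as follows. The regex is an assumption about what a “word” looks like (KEY=VALUE with optional whitespace around the equals sign), and TileLineReader and readWords are hypothetical names. Checking that the keys and values are legal would then be done in plain Java rather than in the regex.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TileLineReader {

    // Assumed shape of one "word": KEY=VALUE, e.g. "N=N", "S=C" or "SO=1".
    private static final Pattern WORD = Pattern.compile("(\\w+)\\s*=\\s*(\\S+)");

    // Reads every KEY=VALUE word of one line, keeping the order of appearance.
    static Map<String, String> readWords(String line) {
        Map<String, String> words = new LinkedHashMap<String, String>();
        Matcher m = WORD.matcher(line);
        // find() advances through the line; group() is only valid after it returns true.
        while (m.find()) {
            words.put(m.group(1), m.group(2));
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "N=N S=C O=C E=N NS=0 NE=0 NO=0 OE=0 SE=0 SO=1";
        Map<String, String> words = readWords(line);
        System.out.println(words.get("SO")); // prints 1
    }
}
```

The map can then be handed to whatever builds the borders and the links, so adding a new key later means touching Java code and not the regex.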