Is there any Java open source library that supports multi-character (i.e., String with length > 1) separators (delimiters) for CSV?
By definition, CSV means Comma-Separated Values: data with a single character (‘,’) as the delimiter. However, many other single-character alternatives exist (e.g., tab), so CSV effectively stands for “Character-Separated Values” (essentially DSV: Delimiter-Separated Values).
The main Java open-source libraries for CSV (e.g., OpenCSV) support virtually any character as the delimiter, but not string (multi-character) delimiters. So, for data separated by strings like “|||”, the only option is to preprocess the input and replace the string with a single-character delimiter. From then on, the data can be parsed as single-character-separated values.
It would therefore be nice if there were a library that supported string separators natively, so that no preprocessing was necessary. CSV would then stand for “CharSequence-Separated Values”. 🙂
This is a good question. The problem was not obvious to me until I looked at the Javadocs and realised that opencsv only supports a character as a separator, not a string.
Here are a couple of suggested workarounds (the examples in Groovy can be converted to Java).
Ignore implicit intermediary fields
Continue to use OpenCSV, but ignore the empty fields. Obviously this is a cheat, but it will work fine for parsing well-behaved data.
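A minimal Java sketch of the filtering step. It assumes the line has already been parsed with ‘|’ as the single-character separator, so a “|||” delimiter shows up as two empty fields between real values. To keep the sketch dependency-free, String.split stands in for the parser here; with OpenCSV the String[] would come from the reader instead. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class DropEmptyFields {
    // Removes the empty fields that a single-character parse of a
    // multi-character delimiter leaves behind.
    static List<String> dropEmptyFields(String[] rawFields) {
        List<String> fields = new ArrayList<>();
        for (String field : rawFields) {
            if (!field.isEmpty()) {
                fields.add(field);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the parser output: splitting on the single character
        // '|' yields a, "", "", b, "", "", c.
        String[] raw = "a|||b|||c".split("\\|", -1);
        System.out.println(dropEmptyFields(raw)); // prints [a, b, c]
    }
}
```

Note that this also drops fields that are genuinely empty in the data, which is why it only suits well-behaved input.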
or
Roll your own
Use the Java String tokenizer method.
The disadvantage of this approach is that you lose the ability to ignore quote characters or escape separators.
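A rough Java sketch of the roll-your-own approach. It uses String.split with Pattern.quote rather than java.util.StringTokenizer, because StringTokenizer treats its delimiter argument as a set of individual characters and skips empty tokens, whereas this version matches the exact string and keeps empty fields. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MultiCharSplit {
    // Splits one line on the exact delimiter string, keeping empty fields.
    static List<String> splitLine(String line, String delimiter) {
        // Pattern.quote makes regex metacharacters such as '|' literal;
        // the -1 limit keeps trailing empty fields.
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        // The third field is empty and is preserved as an empty string.
        System.out.println(splitLine("a|||b||||||d", "|||"));
    }
}
```

As noted above, a delimiter that appears inside a quoted field is still treated as a separator.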
Update
Instead of pre-processing the data and altering its content, why not combine both of the above approaches in a two-step process?
Not very efficient, but possibly easier than writing your own CSV parser 🙂
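One possible reading of the two-step idea, sketched in Java: first split each line on the string delimiter, then hand each piece to a field-level parser that deals with quoting. The parseField helper below is a hypothetical stand-in for that second step (a real CSV parser such as OpenCSV could take its place); it only strips surrounding double quotes and unescapes doubled quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TwoStepParse {
    // Step 2 stand-in (hypothetical helper): strips surrounding double quotes
    // and turns doubled quotes ("") back into a single quote character.
    static String parseField(String raw) {
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            return raw.substring(1, raw.length() - 1).replace("\"\"", "\"");
        }
        return raw;
    }

    static List<String> parseLine(String line, String delimiter) {
        List<String> fields = new ArrayList<>();
        // Step 1: split on the exact multi-character delimiter.
        for (String raw : line.split(Pattern.quote(delimiter), -1)) {
            // Step 2: let the field-level parser handle the quoting.
            fields.add(parseField(raw));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"hello, world\"|||42|||\"say \"\"hi\"\"\"";
        System.out.println(parseLine(line, "|||"));
    }
}
```

This still splits a quoted field that happens to contain the delimiter string itself, so it is a compromise rather than a full parser.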