I am writing some simple code that tries to deduce whether or not a specific String is actually a Java date and, if yes, identify its format (pattern).
Because there are many possible date formats, establishing which one applies to a string requires successive pattern matching, which is time- and CPU-consuming, especially since the input string can hold non-date values, too.
So, what I have ended up doing, for a String variable called input, is something like
String datePattern = null;
if (isLikeDate(input))
{
    datePattern = matchAnyOfThePredefinedDatePatterns(input);
}
where the isLike... method rejects obvious non-date strings and the match... method goes over about 40-50 predefined patterns, trying to parse the input with a SimpleDateFormat object built for each one. The parse call throws a ParseException if the input string is not a valid date for the pattern examined each time.
The exception handling slows things down dramatically, but there seems to be no avoiding it. The Apache Commons Date packages exhibit similar performance.
Is there any faster way of implementing this date pattern matching?
Depending on the complexity of the patterns, you might want to match each potential pattern with a regex (or hand-written code) before trying to parse it properly as a date. For example, if the pattern is “yyyyMMddThh:mm:ss” you could check for the length, the position of the T, the position of the colons, and that everything else is a digit before passing it on to the date parsing code.
This level of pattern matching can be very liberal – it’s only trying to rule out definite infringements of the pattern. The important thing is that it doesn’t reject any values which are actually valid.
The downside is that for any pattern which does match, you’re doing work twice – but that may well still be easily balanced by significantly reducing the number of exceptions you throw.
EDIT: Just to clarify, you’re currently testing whether it looks like it could match any of the patterns, and then testing all of them. I’m suggesting that you have a regex for each pattern, and only try parsing against patterns which have already matched the corresponding regex.
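As a rough sketch of that per-pattern pre-check: the class name DatePatternGuesser, the guess method and the three candidate patterns below are made up for illustration, not taken from your code. Each pattern gets a deliberately loose regex, and SimpleDateFormat is only asked to parse when the regex matches. The sketch assumes the whole string has to be the date, and that the T in the example pattern is a literal, which SimpleDateFormat wants quoted as 'T'.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class DatePatternGuesser {
    // Hypothetical candidates: each date pattern is paired with a liberal
    // "shape" regex that only rules out strings that cannot possibly match.
    private static final Map<String, Pattern> CANDIDATES = new LinkedHashMap<>();
    static {
        CANDIDATES.put("yyyyMMdd'T'HH:mm:ss",
                Pattern.compile("\\d{8,}T\\d{1,2}:\\d{1,2}:\\d{1,2}"));
        CANDIDATES.put("yyyy-MM-dd", Pattern.compile("\\d+-\\d{1,2}-\\d{1,2}"));
        CANDIDATES.put("dd/MM/yyyy", Pattern.compile("\\d{1,2}/\\d{1,2}/\\d+"));
    }

    // Returns the first pattern that parses the input, or null if none does.
    static String guess(String input) {
        for (Map.Entry<String, Pattern> candidate : CANDIDATES.entrySet()) {
            // Cheap check first: skip the parse (and its exception) entirely
            // when the shape is wrong.
            if (!candidate.getValue().matcher(input).matches()) {
                continue;
            }
            // SimpleDateFormat is not thread-safe, so a fresh one is created here.
            SimpleDateFormat format = new SimpleDateFormat(candidate.getKey());
            format.setLenient(false); // reject impossible dates such as 30 February
            try {
                format.parse(input);
                return candidate.getKey();
            } catch (ParseException e) {
                // Right shape, but not a valid date for this pattern; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(guess("20240131T10:15:30"));
        System.out.println(guess("hello"));
    }
}
```

With these candidates, a string like "hello" is rejected by every regex, so no formatter is built and no exception is thrown for it. An exception is only paid for inputs such as "2024-02-30" that have the right shape but are not real dates.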
I’d also suggest trying Joda Time – not only is it a generally better API, but its patterns are thread-safe, so you can reuse them. Presumably you’re currently creating new SimpleDateFormat objects each time you have something to parse.
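To illustrate the reuse point without pulling in an external library, here is a minimal sketch that swaps in java.time.format.DateTimeFormatter from the JDK (Java 8 and later) in place of Joda Time. Its formatters are likewise immutable and thread-safe, so one instance can be built once and shared. The class name ReusableFormatters, the isIsoDay method and the single pattern are assumptions for illustration only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

class ReusableFormatters {
    // Built once and shared: DateTimeFormatter is immutable and thread-safe,
    // unlike SimpleDateFormat. STRICT resolution rejects dates like 30 February.
    static final DateTimeFormatter ISO_DAY =
            DateTimeFormatter.ofPattern("uuuu-MM-dd")
                    .withResolverStyle(ResolverStyle.STRICT);

    // Hypothetical helper: true if the whole input is a valid uuuu-MM-dd date.
    static boolean isIsoDay(String input) {
        try {
            LocalDate.parse(input, ISO_DAY);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isIsoDay("2024-01-31"));
        System.out.println(isIsoDay("2024-02-30"));
    }
}
```

Parsing still throws on a mismatch here, so this is meant to be combined with a cheap pre-check rather than to replace it. The saving is only in not rebuilding a formatter for every input.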