In my project I have to parse a set of dynamic strings which contain numbers, dates and other information. I tried writing a parser with regular expressions. It works, but not all the time. Can someone suggest a better solution for this? Below is a sample string:
“Thank you for using your HDFC Bank Debit/ATM Card ending 4444 for Rs. 125.25 towards ATM WDL in T NAGAR CAP at ATM on 2012-04-16:17:33:03.”
Here I want data like:
bank name = hdfc
card no = 4444
amount = 125.25
category = atm
date = 2012-04-16:17:33:03
Solving this with regular expressions alone won’t work very well, especially when the exact content of the string is dynamic. What you need is a tokenizer (lexical analyzer) and a parser driven by a grammar. I haven’t done anything like this in Java, but first of all you need to break your string down into tokens (keywords, values, expressions, phrases, etc.), like this:
“Thank you for using your HDFC Bank Debit/ATM Card ending 4444 for Rs. 125.25 towards ATM WDL in T NAGAR CAP at ATM on 2012-04-16:17:33:03.”
You do this by defining token types, giving them convenient names and defining a rule for each of them, e.g. with a regular expression. At this stage the focus is on what you have, not what it means.
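A minimal sketch of this tokenizing step in Java (16+, because it uses records), assuming the token types below (DATETIME, NUMBER, CURRENCY, WORD, WS, PUNCT), which are my own illustrative choices for this one message format and not anything prescribed by a library. It builds a single `java.util.regex` pattern with one named group per token type; at each position the alternatives are tried from left to right, so more specific types such as DATETIME have to come before NUMBER.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmsTokenizer {

    // Hypothetical token types for this kind of message. Their order matters:
    // the combined pattern tries them left to right at each position.
    enum Type { DATETIME, NUMBER, CURRENCY, WORD, WS, PUNCT }

    record Token(Type type, String text) {
        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    // One named group per token type. WS covers whitespace and PUNCT (\S) is a
    // catch-all, so every character of the input ends up in some token.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<DATETIME>\\d{4}-\\d{2}-\\d{2}:\\d{2}:\\d{2}:\\d{2})"
            + "|(?<NUMBER>\\d+(?:\\.\\d+)?)"
            + "|(?<CURRENCY>Rs\\.)"
            + "|(?<WORD>[A-Za-z]+(?:/[A-Za-z]+)?)"
            + "|(?<WS>\\s+)"
            + "|(?<PUNCT>\\S)");

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Find out which named group matched and record it as a token.
            for (Type type : Type.values()) {
                String text = m.group(type.name());
                if (text != null) {
                    if (type != Type.WS) { // whitespace only separates tokens
                        tokens.add(new Token(type, text));
                    }
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sms = "Thank you for using your HDFC Bank Debit/ATM Card ending 4444 for Rs. "
                + "125.25 towards ATM WDL in T NAGAR CAP at ATM on 2012-04-16:17:33:03.";
        tokenize(sms).forEach(System.out::println);
    }
}
```

For the sample message this prints one token per line, including lines such as `WORD(HDFC)`, `NUMBER(4444)`, `CURRENCY(Rs.)` and `DATETIME(2012-04-16:17:33:03)`. Note that the tokenizer only classifies the pieces; it still has no idea that `4444` is a card number.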
Afterwards you need a grammar, because regular expressions won’t help you understand the ‘what’. A grammar basically gives you a tree of rules.
By tokenizing your input string and applying your grammar, you should be able to analyze the string and find the parts that are of interest.
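One way to apply a grammar to a token stream like this is an "island grammar": the rules describe only the interesting parts (the islands), and every other token is skipped as "water". The sketch below uses that technique with a hand-written parser; the rule names, the field names and the assumption that every message looks like the sample are all hypothetical. It includes a small regex-based lexer so that it runs on its own (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmsParser {

    enum Type { DATETIME, NUMBER, CURRENCY, WORD }

    record Token(Type type, String text) {}

    // Small regex lexer; characters matching no group (spaces, the final '.') are skipped by find().
    private static final Pattern LEXER = Pattern.compile(
            "(?<DATETIME>\\d{4}-\\d{2}-\\d{2}:\\d{2}:\\d{2}:\\d{2})|(?<NUMBER>\\d+(?:\\.\\d+)?)"
            + "|(?<CURRENCY>Rs\\.)|(?<WORD>[A-Za-z]+(?:/[A-Za-z]+)?)");

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = LEXER.matcher(input);
        while (m.find()) {
            for (Type type : Type.values()) {
                if (m.group(type.name()) != null) {
                    tokens.add(new Token(type, m.group(type.name())));
                    break;
                }
            }
        }
        return tokens;
    }

    // Hypothetical island grammar for this message shape:
    //   message  := ( bank | card | amount | category | date | <any token> )*
    //   bank     := WORD "Bank"
    //   card     := "Card" "ending" NUMBER
    //   amount   := CURRENCY NUMBER
    //   category := "towards" WORD
    //   date     := "on" DATETIME
    static Map<String, String> parse(List<Token> tokens) {
        Map<String, String> facts = new LinkedHashMap<>();
        int i = 0;
        while (i < tokens.size()) {
            if (is(tokens, i, Type.WORD, null) && is(tokens, i + 1, Type.WORD, "Bank")) {
                facts.put("bank name", tokens.get(i).text().toLowerCase(Locale.ROOT));
                i += 2;
            } else if (is(tokens, i, Type.WORD, "Card") && is(tokens, i + 1, Type.WORD, "ending")
                    && is(tokens, i + 2, Type.NUMBER, null)) {
                facts.put("card no", tokens.get(i + 2).text());
                i += 3;
            } else if (is(tokens, i, Type.CURRENCY, null) && is(tokens, i + 1, Type.NUMBER, null)) {
                facts.put("amount", tokens.get(i + 1).text());
                i += 2;
            } else if (is(tokens, i, Type.WORD, "towards") && is(tokens, i + 1, Type.WORD, null)) {
                facts.put("category", tokens.get(i + 1).text().toLowerCase(Locale.ROOT));
                i += 2;
            } else if (is(tokens, i, Type.WORD, "on") && is(tokens, i + 1, Type.DATETIME, null)) {
                facts.put("date", tokens.get(i + 1).text());
                i += 2;
            } else {
                i++; // <any token>: "water" that belongs to no rule, so skip it
            }
        }
        return facts;
    }

    // True if token i exists, has the given type and, when text is non-null, that text (case-insensitive).
    private static boolean is(List<Token> tokens, int i, Type type, String text) {
        if (i >= tokens.size()) {
            return false;
        }
        Token t = tokens.get(i);
        return t.type() == type && (text == null || t.text().equalsIgnoreCase(text));
    }

    public static void main(String[] args) {
        String sms = "Thank you for using your HDFC Bank Debit/ATM Card ending 4444 for Rs. "
                + "125.25 towards ATM WDL in T NAGAR CAP at ATM on 2012-04-16:17:33:03.";
        parse(tokenize(sms)).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

For the sample message this prints exactly the five fields from the question (bank name = hdfc, card no = 4444, amount = 125.25, category = atm, date = 2012-04-16:17:33:03). The advantage over one big regular expression is that each rule is small and independent: a message that words the amount differently only needs a new `amount` alternative, not a rewrite of the whole pattern.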
Unfortunately, this is only a theoretical introduction to this quite complex but very interesting topic, and I cannot provide a complete implementation, but I hope it helps you get started.