I have a regular expression that I want to match a latitude/longitude pair in a variety of fashions, e.g.
123 34 42
-123* 34' 42"
123* 34' 42"
+123* 34' 42"
45* 12' 22"N
45 12' 22"S
90:00:00.0N
I want to be able to match these in a pair such that
90:00:00.0N 180:00:00.0E is a latitude/longitude pair.
or
45* 12' 22"N 46* 12' 22"E is a latitude/longitude pair (1 degree by 1 degree cell).
or
123* 34' 42" 124* 34' 42" is a latitude/longitude pair
etc
Using the regular expression below, when I type in 123, it matches. I suppose this is true since 123 00 00 is a valid coordinate. However, I want to use this regular expression to match pairs in the same formats as above:
"([-|\\+]?\\d{1,3}[d|D|\u00B0|\\s](\\s*\\d{1,2}['|\u2019|\\s])?"
+ "(\\s*\\d{1,2}[\"|\u201d|\\s])?\\s*([N|n|S|s|E|e|W|w])?\\s?)"
I am using Java.
* denotes a degree.
What am I doing wrong in my regular expression?
Well, for one thing, you’re filling your character sets with a bunch of unnecessary pipe characters: alternation is implied in a [] pair. Additional cleanup: + doesn’t need to be escaped in a character class. Your regular expression seems to be addressing a bigger problem statement than you gave us: you make no mention of d or D as matchable characters. And you’ve made pretty much the entire back half of your RegEx optional. Going off of what I think your original problem statement is, I built the following regular expression:
It’s a bit of a doozy, but I’ll break it down for you, or anyone who happens across this in the future (Hello, future!).
Start of string, simple.
Any amount of whitespace – even none.
Denotes the beginning of a group – we’ll get back to that.
An optional sign
One to three digits.
An optional asterisk. The escape here is key for an asterisk, but if you want to replace this with the Unicode code point for an actual degree sign, you won’t need it.
At least one character of whitespace
One or two digits.
Optional apostrophe
You’ve seen these before, but there’s a new curveball: there’s a plus after the {1,2} quantifier! This makes it a possessive quantifier, meaning that the matcher won’t give up its matches for this group to make another one possible. This is almost exclusively here to prevent 1 1 11 1 1 from matching, but can be used to increase speed anywhere you’re 100% sure you don’t need to be able to backtrack.
Optional double quote. You’ll have to escape this in Java.
An optional cardinal direction, designated by letter
OR – you can match everything in the group before this, or everything in the group after this.
Old news.
A colon, followed by two digits…
twice!
Decimal point, followed by a single digit.
Same as before, but this time it’s mandatory.
Some space, and finally the end of the group. Now, the first group has matched an entire longitude/latitude denotation, with an arbitrary amount of space at the end. Followed closely by:
Do that one or two times, to match a single coordinate or a pair, then finally:
The end of the string.
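Putting the pieces of the breakdown back together in Java might look something like the sketch below. Treat it as a reconstruction from the descriptions, not necessarily the exact expression: the class and method names are made up for illustration, the upper-case-only [NSEW] class is an assumption, and I have wrapped the two alternatives in their own inner group so that the trailing whitespace applies to both of them.

```java
import java.util.regex.Pattern;

public class CoordinatePairSketch {
    // Whitespace-separated degrees/minutes/seconds, e.g. -123* 34' 42"N.
    // The + after the last {1,2} is the possessive quantifier from the breakdown.
    private static final String DMS =
        "[+-]?\\d{1,3}\\*?\\s+\\d{1,2}'?\\s+\\d{1,2}+\"?[NSEW]?";

    // Colon-separated form with a mandatory direction letter, e.g. 90:00:00.0N.
    private static final String COLON = "\\d{1,3}(:\\d{2}){2}\\.\\d[NSEW]";

    // Assumption: the inner group around the alternation is my own addition,
    // so that \s* follows either form before the {1,2} repetition.
    public static final Pattern COORDINATES =
        Pattern.compile("^\\s*((" + DMS + "|" + COLON + ")\\s*){1,2}$");

    // True when the input is one coordinate or a pair of coordinates.
    public static boolean isCoordinateOrPair(String input) {
        return COORDINATES.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "123 34 42",
            "-123* 34' 42\"",
            "45 12' 22\"S",
            "90:00:00.0N 180:00:00.0E",
            "45* 12' 22\"N 46* 12' 22\"E",
            "123* 34' 42\" 124* 34' 42\"",
            "123",          // a bare number should no longer match
            "1 1 11 1 1"    // rejected thanks to the possessive quantifier
        };
        for (String s : samples) {
            System.out.printf("%-30s -> %b%n", s, isCoordinateOrPair(s));
        }
    }
}
```

With this version, the sample formats from the question match, while a bare 123 and the ambiguous 1 1 11 1 1 do not.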
This isn’t perfect, but it’s pretty close, and I think it answers the original problem statement. Plus, I feel my explanation has demystified it enough that you can edit it to further suit your needs. The one thing it doesn’t (and won’t) do is enforce that the first coordinate matches the second in style. That’s just too much to ask of regular expressions.
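If you do edit it, the possessive quantifier is the piece most worth checking in isolation. Here is a stripped-down sketch (three whitespace-separated numbers, repeated once or twice; the class name and both patterns are simplified stand-ins, not the full expression) that compares a greedy {1,2} with a possessive {1,2}+ on the 1 1 11 1 1 input from the breakdown.

```java
import java.util.regex.Pattern;

public class PossessiveQuantifierSketch {
    // Greedy: the last \d{1,2} may give a digit back when the match fails later.
    public static final Pattern GREEDY =
        Pattern.compile("^(\\d{1,3}\\s+\\d{1,2}\\s+\\d{1,2}\\s*){1,2}$");

    // Possessive: \d{1,2}+ keeps whatever it matched and never backtracks.
    public static final Pattern POSSESSIVE =
        Pattern.compile("^(\\d{1,3}\\s+\\d{1,2}\\s+\\d{1,2}+\\s*){1,2}$");

    public static void main(String[] args) {
        String ambiguous = "1 1 11 1 1";
        // Greedy backtracks from "11" to "1" and reads the rest as a second
        // coordinate, so it matches.
        System.out.println(GREEDY.matcher(ambiguous).find());      // true
        // Possessive holds on to "11", leaving only "1 1", so it fails.
        System.out.println(POSSESSIVE.matcher(ambiguous).find());  // false
        // A properly separated pair matches either way.
        System.out.println(POSSESSIVE.matcher("1 1 1 1 1 1").find()); // true
    }
}
```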
Doubters: Here it is in action. Please, enjoy.