I have a list of phone numbers in our PhoneNos table:
ID | PhoneNo
1 | +61 2 9666 8000
We want to search for this phone number in our Content table (i.e. the Desc field).
The challenge is actually:
The Desc field is free text and the input can be anything, such as:
ContentID | Desc
1 | bla bla ... +61 (02) 9666 8000 ... bla bla
2 | bla bla ... +61-2-9666-8000 bla bla
3 | bla bla ... +61 2 96668000 bla bla
4 | bla bla ... +61296668000 00116129668000 bla bla
or it could be anything, ranging from extra spacing such as
5 | bla bla ... +61 (02) 9666 8000 ... bla bla
6 | bla bla ... +61-2 9662 0382 ... bla bla
That’s an Australian phone number, BUT it could just as well be a USA number or one from any other country, SO it’s not tied to one particular country.
There is no pattern whatsoever before or after the phone number, so the surrounding text could be anything.
Is there any way to handle this sort of thing easily? I can probably construct a condition for each case above, BUT I’m just wondering if there is a better solution.
My (highly uneducated) thought would be to use a regular expression replace (see here). Essentially strip everything in the content except for numbers and plus signs (feeling clunky yet? 🙂 ), and then compare to your control string with the same processing (\\+\d+, basically). That makes the rather broad assumption that there will be no false positives created by another random string of numbers/characters matching your number (I imagine somewhat unlikely from a probability perspective, but always a possibility).
I was tinkering around with what I’m sure is a highly inefficient, inelegant and likely incorrect solution, and realized that it won’t handle the case with a leading 0 inside parentheses (since this doesn’t seem to be present in other patterns). You can find it here if you’re curious, but I think the regex solution may be the most efficient way to handle it.
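To make the strip-and-compare idea concrete, here is a minimal sketch in Python rather than in any particular database dialect, since the question doesn't say which one is in use. The function names normalize and contains_phone are made up for illustration. The handling of a parenthesised leading 0 is my own assumption (that "(02)" means area code 2 written with a national trunk prefix), and it is not something the linked attempt does.

```python
import re


def normalize(text: str) -> str:
    """Reduce text to digits and plus signs only."""
    # Assumption: a leading 0 inside parentheses, as in "(02)", is a
    # national trunk prefix that is dropped in the international form.
    text = re.sub(r"\(0([0-9]+)\)", r"\1", text)
    # Strip everything that is not a digit or a plus sign.
    return re.sub(r"[^0-9+]", "", text)


def contains_phone(desc: str, phone: str) -> bool:
    """True if the normalized phone number occurs in the normalized text."""
    return normalize(phone) in normalize(desc)


if __name__ == "__main__":
    phone = "+61 2 9666 8000"
    rows = [
        "bla bla ... +61 (02) 9666 8000 ... bla bla",
        "bla bla ... +61-2-9666-8000 bla bla",
        "bla bla ... +61 2 96668000 bla bla",
        "bla bla ... +61-2 9662 0382 ... bla bla",
    ]
    for row in rows:
        print(contains_phone(row, phone), "|", row)
```

Because every non-digit is removed, unrelated digits in the surrounding text get glued onto the number, so the false-positive caveat above still applies. A number written only with an international dialling prefix such as 0011 instead of the plus sign would also not match without extra handling.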