For an online calculator where users may enter an energy amount to calculate the corresponding fees, I need the PHP script to accept various input formats. The value “2 million and one quarter joules” may be entered as:
2000000.25 (default notation)
2,000,000.25 (with thousands separator)
2000000,25 (comma as decimal point)
2.000.000,25 (comma as decimal point, with thousands separator)
2’000’000.25 (apostrophe as thousands separator, as in Swiss notation)
2 000 000,25 (French notation)
How could I make the script aware of such differences?
My first try was to simply replace the alternative characters with the default ones using str_replace, but the period (.) may be either a decimal point or a thousands separator. I also tried sscanf, but how can I make sure that it reads the number correctly?
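The part of the str_replace idea that does work safely can be sketched as follows, assuming (as in the list of formats above) that apostrophes and spaces only ever act as thousands separators. The helper name stripUnambiguousSeparators is hypothetical. The period and comma are deliberately left untouched, which is exactly where the ambiguity remains.

```php
<?php
// Hypothetical helper: removes characters that, in the formats listed above,
// can only be thousands separators: ASCII apostrophe, typographic
// apostrophe (U+2019) and the ordinary space.
// Period and comma are left alone because either may be the decimal point.
function stripUnambiguousSeparators(string $input): string
{
    return str_replace(["'", "\u{2019}", ' '], '', trim($input));
}

echo stripUnambiguousSeparators("2'000'000.25"), PHP_EOL; // 2000000.25
echo stripUnambiguousSeparators("2 000 000,25"), PHP_EOL; // 2000000,25
echo stripUnambiguousSeparators("2.000.000,25"), PHP_EOL; // 2.000.000,25 (still ambiguous)
```

Note that this sketch does not handle the no-break spaces that some software inserts as a French-style thousands separator; those would have to be added to the search array.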
Most users will only enter two digits after the decimal point, but is there any way I can distinguish 1.234 (one point two three four, period as decimal separator) from 1.234 (one thousand two hundred thirty-four, period as thousands separator)?
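A string like 1.234 cannot be resolved on its own, but the ambiguity can at least be detected so the calculator can ask the user instead of guessing silently. The following is a minimal sketch built on one common heuristic (an assumption, not a standard): when both period and comma occur, the last one is the decimal point; a repeated separator is a thousands separator; a single separator followed by exactly three digits is ambiguous. The function name guessDecimalSeparator is hypothetical.

```php
<?php
// Hypothetical helper. Returns '.' or ',' when the decimal separator can be
// inferred, '' when the number has no decimal part, or null when the input
// is genuinely ambiguous (e.g. "1.234"). Assumes a trimmed input.
// Heuristic rules (assumptions, not a standard):
//  - if both '.' and ',' occur, the one occurring last is the decimal point;
//  - a separator occurring more than once is a thousands separator;
//  - a single separator followed by exactly three digits is ambiguous;
//  - a single separator followed by any other digit count is the decimal point.
function guessDecimalSeparator(string $number): ?string
{
    $lastDot   = strrpos($number, '.');
    $lastComma = strrpos($number, ',');

    if ($lastDot !== false && $lastComma !== false) {
        return $lastDot > $lastComma ? '.' : ',';
    }
    if ($lastDot === false && $lastComma === false) {
        return ''; // no period or comma at all: whole number
    }

    $sep = $lastDot !== false ? '.' : ',';
    if (substr_count($number, $sep) > 1) {
        return ''; // repeated separator: thousands grouping, no decimal part
    }

    // Count the characters after the single separator.
    $digitsAfter = strlen($number) - strrpos($number, $sep) - 1;
    return $digitsAfter === 3 ? null : $sep;
}

var_dump(guessDecimalSeparator('2.000.000,25')); // string(1) ","
var_dump(guessDecimalSeparator('2000000,25'));   // string(1) ","
var_dump(guessDecimalSeparator('1.234'));        // NULL
```

When the function returns null, one option is to ask the user to confirm the value, or to fall back to a locale chosen in the calculator's settings.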
Since I wasn’t able to find a simple solution using built-in PHP functions, I wrote two functions: one to (1) check whether the entered string may be a number at all and one to (2) check whether it is well-formed, given the separators used.
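Step (1), checking whether the string may be a number at all, could look like the minimal sketch below. The function name mayBeNumber and the exact character set are assumptions matching the separators discussed in this thread. Signs, exponents and leading or trailing separators are rejected.

```php
<?php
// Hypothetical sketch of step (1): does the input look like a number at all?
// Assumption: only ASCII digits plus the separators . , ' ’ and space are
// allowed, and the string must start and end with a digit.
function mayBeNumber(string $input): bool
{
    // The /u modifier makes \x{2019} match the typographic apostrophe;
    // invalid UTF-8 makes preg_match return false, which counts as "no".
    return preg_match('/^\d(?:[\d.,\'\x{2019} ]*\d)?\z/u', trim($input)) === 1;
}

var_dump(mayBeNumber('2 000 000,25')); // bool(true)
var_dump(mayBeNumber('12a'));          // bool(false)
var_dump(mayBeNumber('-5'));           // bool(false)
```

This check says nothing about whether the separators are used consistently; that is left to step (2).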
I restricted the possible thousands separators to period (.), comma (,), space ( ) and apostrophe ('). The decimal point may only be one of the first two. Both sets of separators can be edited to allow more characters or to restrict the ones in place.
What I am actually doing is looking for all digit groups and all separators with a couple of simple preg_match_all calls.
The complete code reads as follows and should be self-explanatory, as I added a comment wherever a function returns false. I'm sure this can be simplified somehow, but it works for now: it filters out many errors while still accepting some unusual combinations such as 2 000 000.25 or 2'000'000,25.
I am aware of one flaw in this set of functions: 1.234 or 1,234 will always be treated as the whole number 1234, because the function assumes the separator must be a thousands separator if there are fewer than 4 digits in front of a single separator.
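The separator analysis described above can be sketched with two preg_match_all calls, one collecting the digit groups and one collecting the separators. This is an independent reconstruction under the rules stated in the comments, not the original code, and normalizeNumber is a hypothetical name. It deliberately reproduces the known flaw: 1.234 becomes 1234.

```php
<?php
// Hypothetical sketch (not the original code). Assumed rules:
//  - thousands separators: . , space ' ’ ; decimal separators: . ,
//  - all thousands separators in one number must be identical;
//  - the first digit group has 1-3 digits, every further group exactly 3;
//  - the last separator is the decimal point if the group after it does not
//    have exactly 3 digits, if it differs from the separator before it, or if
//    it is the only separator and has more than 3 digits in front of it.
// Returns a normalized string such as "2000000.25", or null if malformed.
function normalizeNumber(string $input): ?string
{
    $input = trim($input);
    // Digit groups separated by exactly one allowed separator character.
    if (preg_match('/^\d+(?:[.,\' \x{2019}]\d+)*\z/u', $input) !== 1) {
        return null;
    }
    preg_match_all('/\d+/', $input, $groupMatches);
    preg_match_all('/\D/u', $input, $sepMatches);
    $groups = $groupMatches[0]; // always one more entry than $seps
    $seps   = $sepMatches[0];

    if ($seps === []) {
        return $input; // plain whole number
    }

    $n    = count($seps);
    $last = $seps[$n - 1];
    $hasDecimal = in_array($last, ['.', ','], true) && (
        strlen($groups[$n]) !== 3
        || ($n > 1 && $seps[$n - 2] !== $last)
        || ($n === 1 && strlen($groups[0]) > 3)
    );

    $decimals     = $hasDecimal ? array_pop($groups) : null;
    $thousandSeps = $hasDecimal ? array_slice($seps, 0, -1) : $seps;

    if ($thousandSeps !== []) {
        // Grouping separators must all match and differ from the decimal point.
        if (count(array_unique($thousandSeps)) !== 1
            || ($hasDecimal && $thousandSeps[0] === $last)) {
            return null;
        }
        if (strlen($groups[0]) > 3) {
            return null;
        }
        foreach (array_slice($groups, 1) as $group) {
            if (strlen($group) !== 3) {
                return null;
            }
        }
    }

    $integer = implode('', $groups);
    return $decimals === null ? $integer : $integer . '.' . $decimals;
}

echo normalizeNumber('2.000.000,25'), PHP_EOL; // 2000000.25
var_dump(normalizeNumber('1.234'));            // string(4) "1234" (the known flaw)
var_dump(normalizeNumber('12.34.56'));         // NULL
```

Returning a normalized string rather than a float keeps the value exact; it can then be cast with (float) or, if the bcmath extension is installed, passed to its functions for the fee calculation.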