I am trying to write a regular expression that matches a specific URL format: the API URLs for Stack Exchange. For example, I want both of these to match:
http://api.**stackoverflow**.com/1.**1**/questions/**1234**/answers
http://api.**physics.stackexchange**.com/1.**0**/questions/**5678**/answers
Where
- Everything not in bold must be identical.
- The first bold part can contain only the letters a to z and at most one full stop.
- It would also be good if, when there is a full stop, the word “stackexchange” had to follow it. However, this isn’t crucial.
- The second bold part can only be a 1 or a 0.
- The last bold part can contain only the digits 0 to 9 and can be any length.
- There can’t be anything at all before or after the URL, not even a trailing slash.
The `^` makes sure it starts at the start of input, and the `\\z` makes sure it ends at the end of input. All the dots are escaped so they are literal. The `(?i:...)` part makes the domain and scheme case-insensitive, as per the URL spec. The `[01]` only matches the characters 0 or 1. The `[0-9]+` matches one or more Arabic digits. The rest is self-explanatory.
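For illustration, here is a minimal Python sketch of a pattern assembled from the pieces described above. It is not necessarily the exact expression the explanation refers to. The names `API_URL` and `is_answers_url` are hypothetical. The sketch uses `\Z`, which in Python's `re` module matches only at the very end of the string, and it passes `re.ASCII` (my own assumption) so that case-insensitive `[a-z]` stays limited to ASCII letters.

```python
import re

# Hypothetical pattern, built from the requirements in the question:
#   ^                          start of input
#   (?i:...)                   scheme and host are case-insensitive
#   [a-z]+                     site name, letters only
#   (?:\.stackexchange)?       optional ".stackexchange" after the site name
#   1\.[01]                    API version 1.0 or 1.1
#   [0-9]+                     question id, one or more digits
#   \Z                         end of input (no trailing slash or newline)
API_URL = re.compile(
    r"^(?i:http://api\.[a-z]+(?:\.stackexchange)?\.com)"
    r"/1\.[01]/questions/[0-9]+/answers\Z",
    re.ASCII,  # assumption: restrict case folding to ASCII letters
)


def is_answers_url(url: str) -> bool:
    """Return True if url is exactly an API 'answers' URL in the expected format."""
    return API_URL.match(url) is not None


if __name__ == "__main__":
    for candidate in (
        "http://api.stackoverflow.com/1.1/questions/1234/answers",
        "http://api.physics.stackexchange.com/1.0/questions/5678/answers",
        "http://api.stackoverflow.com/1.1/questions/1234/answers/",
    ):
        print(candidate, is_answers_url(candidate))
```

Because the optional group only allows `.stackexchange`, a host such as `api.foo.bar.com` is rejected, which covers the "nice to have" requirement as well.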