I have a variable that holds the full source of a web page. It’s a large string with lots of words, strings, special characters, etc.
I want to go through this variable and extract the ticket number, which comes after tickets/ and before .json. In the following case, my list would have only 1 item, the value 15.
https://company.zendesk.com/api/v2/tickets/15.json
This web page will have many of these links in between lots of text. In the following case, my list would have 2 items, the values 19 and 20.
https://company.zendesk.com/api/v2/tickets/19.json blahblahblajlkdfjfaiofjd3289239lkdj
2398283j;lkjfe89j2pefj2efljefkj
https://company.zendesk.com/api/v2/tickets/20.json blah blhahblbahlhkaldk
How would I go about extracting JUST the ticket numbers from these links in this huge file and put them into a list?
Would I use Regex? I’m not really sure how I’d approach this.
By the way, there is no format to this page. It’s not like it’s an XML doc or anything.
Thanks!
Something like this should get you started:
@"https://company\.zendesk\.com/api/v2/tickets/\d+\.json";
Take note of three parts. The @ means that it’s a verbatim string literal, so you don’t have to double-escape the backslashes. The \d is a stand-in for any digit. The + means the previous element occurs 1 or more times; * would mean that it occurs 0 or more times. An unescaped . matches any character, so the literal dots are written as \. here.
Here’s a reference on how you can further customize the pattern: http://msdn.microsoft.com/en-us/library/az24scfc.aspx
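As a small illustration of the points above, here is a sketch that compares a verbatim literal with a regular one and shows the difference between + and *. The class name PatternDemo and the shortened /tickets/ pattern are made up for this example, not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Verbatim literal: each backslash is written once.
    public const string Verbatim = @"/tickets/\d+\.json";

    // Regular literal: each backslash must be doubled to reach the regex engine.
    public const string Regular = "/tickets/\\d+\\.json";

    public static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(Verbatim == Regular);

        // + requires at least one digit, so the second input does not match.
        Console.WriteLine(Regex.IsMatch("/tickets/15.json", Verbatim));
        Console.WriteLine(Regex.IsMatch("/tickets/.json", Verbatim));

        // * allows zero digits, so the same input matches here.
        Console.WriteLine(Regex.IsMatch("/tickets/.json", @"/tickets/\d*\.json"));
    }
}
```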
To get just the ticket numbers, you can put the \d+ in parentheses:
@"https://company\.zendesk\.com/api/v2/tickets/(\d+)\.json"
Then each match will have a property called Groups, and your ticket number will be one of those groups. Groups[0] is always the full match and Groups[1] is the first parenthesized group, so you can read the ticket number directly from Groups[1].Value. There is no need to tell the two apart by string length or with another regex.
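Putting it together, a minimal sketch might look like the following. The names TicketExtractor and ExtractTicketNumbers are invented for this example, and it assumes every ticket number fits in an int; keep the values as strings if that may not hold.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TicketExtractor
{
    // Returns the ticket numbers found in the page source, in order of appearance.
    public static List<int> ExtractTicketNumbers(string pageSource)
    {
        // Literal dots are escaped; (\d+) captures the ticket number.
        const string pattern = @"https://company\.zendesk\.com/api/v2/tickets/(\d+)\.json";

        var tickets = new List<int>();
        foreach (Match match in Regex.Matches(pageSource, pattern))
        {
            // Groups[0] is the whole URL; Groups[1] is the captured digits.
            tickets.Add(int.Parse(match.Groups[1].Value));
        }
        return tickets;
    }

    public static void Main()
    {
        // Sample input taken from the question.
        string page =
            "https://company.zendesk.com/api/v2/tickets/19.json blahblahblajlkdfjfaiofjd3289239lkdj\n" +
            "2398283j;lkjfe89j2pefj2efljefkj\n" +
            "https://company.zendesk.com/api/v2/tickets/20.json blah blhahblbahlhkaldk";

        Console.WriteLine(string.Join(", ", ExtractTicketNumbers(page))); // prints "19, 20"
    }
}
```

Regex.Matches scans the whole string, so it does not matter that the page has no overall format; only the shape of the URLs matters.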