An element on the page has content that I'm trying to pull.
Here's the `element.content` after parsing it with Nokogiri:
["\n \n \n \n itemId[0]=1234;\n \n \n \n \n \n \n \n My Project: First Edition\n \n ", "\n \n \n \n itemId[1]=2345;\n \n \n \n \n \n \n \n My Second Edition\n \n ", "\n \n \n \n itemId[2]=1234;\n \n \n \n \n \n \n \n Third\n \n \n"]
I was able to write a regex for the `itemId[0]=1234` part, `/itemId.\d+..\d{4}/`, but I'm completely stuck on how to grab the names that follow it. Any advice? Or should I just parse the HTML with Ruby instead?
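Given a single string, such as one element of that array, you can capture both the `itemId` assignment and the name that follows it with one match.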
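A minimal sketch of that match, assuming the input is the first element of the array from the question. The pattern is a tightened version of the question's `/itemId.\d+..\d{4}/` (literal brackets, any number of digits); the variable names `s`, `item`, and `name` are illustrative only.

```ruby
# First element of the array from the question (illustrative input)
s = "\n \n \n \n itemId[0]=1234;\n \n \n \n \n \n \n \n My Project: First Edition\n \n "

# Group 1: the itemId assignment; group 2: everything after the semicolon.
# The /m flag lets `.` match newlines, so (.*) runs to the end of the string.
if (m = s.match(/(itemId\[\d+\]=\d+);(.*)/m))
  item = m[1]        # "itemId[0]=1234"
  name = m[2].strip  # "My Project: First Edition"
  puts item
  puts name
end
```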
Basically, you pull out the
`itemId...` part using (more or less) your existing expression, grab the rest of the string with `(.*)` in multi-line mode (`/m`, so that `.` matches a newline), and then strip off the surrounding whitespace outside the regex using `strip`. You don't have to build one unreadable regex that does everything you need; post-processing a match result is allowed and sometimes even encouraged.
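The same approach extends to every element at once. This sketch assumes the strings from the question are held in an array named `contents` (a hypothetical name; in practice these would be the values collected from Nokogiri) and builds `[itemId, name]` pairs. It also shows why `/m` matters: without it, `.` stops at the newline right after the semicolon, so the captured name is empty.

```ruby
# `contents` is a hypothetical name for the array shown in the question
contents = [
  "\n \n \n \n itemId[0]=1234;\n \n \n \n \n \n \n \n My Project: First Edition\n \n ",
  "\n \n \n \n itemId[1]=2345;\n \n \n \n \n \n \n \n My Second Edition\n \n ",
  "\n \n \n \n itemId[2]=1234;\n \n \n \n \n \n \n \n Third\n \n \n"
]

PATTERN = /(itemId\[\d+\]=\d+);(.*)/m

# Match each string, strip the name outside the regex, and drop non-matches.
pairs = contents.map { |text|
  m = text.match(PATTERN)
  m && [m[1], m[2].strip]
}.compact

pairs.each { |item, name| puts "#{item} => #{name}" }

# Without /m, (.*) cannot cross the newline after ";" and captures "".
without_m = contents.first[/(itemId\[\d+\]=\d+);(.*)/, 2]
```

Keeping the whitespace cleanup in `strip` rather than in the pattern keeps the regex short and easy to adjust if the surrounding markup changes.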