I need a regular expression that matches this text:
894975||0||Lever 2000 Anti-Bacterial Bar Soap ||Health & Spa/Personal Care/Body Care/Soap
I want to search the text and, after the second set of pipes, match “Bar Soap”.
If the words are not in that order, it does not match.
My regex is:
/^(?:\d+\|\|).?\|\|[^|]*?(Bar[^|]*? Soap)/i
This is not matching when “soap” comes first and “bar” second.
The sample data looks like a standard pipe-delimited ('|') file you’d see from database extracts. It’s common to see fields with a null value show up as '||' in the output.

Rather than try to parse it using a regex, it’s usually handled by splitting on the pipes, or by treating it as a CSV record with a pipe instead of a comma as the column separator. Splitting on double pipes ('||') will fail if you get a record where the field actually has content.

Here are two different samples showing how I’d do it. The first is splitting on '|' into fields. For the sample line, the resulting fields list holds seven entries, with an empty string wherever the pipes are doubled, and grabbing the fifth field retrieves the product.
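The original code for this step is not preserved here, so the following is only a minimal sketch in Python of the split-on-pipe approach. The helper `has_all_words` is a hypothetical name added for illustration. It shows one way to check for “bar” and “soap” in either order once the product field has been isolated.

```python
import re

line = "894975||0||Lever 2000 Anti-Bacterial Bar Soap ||Health & Spa/Personal Care/Body Care/Soap"

# Split on single pipes; each doubled pipe leaves an empty string between fields.
fields = line.split("|")
# fields is now:
# ['894975', '', '0', '', 'Lever 2000 Anti-Bacterial Bar Soap ', '',
#  'Health & Spa/Personal Care/Body Care/Soap']

# The fifth field (index 4) is the product; strip the trailing space.
product = fields[4].strip()


def has_all_words(text, words):
    """Hypothetical helper: True if every word occurs in text as a whole word, in any order."""
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
        for word in words
    )


matches = has_all_words(product, ["bar", "soap"])
```

Checking each word separately against the isolated field sidesteps the ordering problem in the original pattern, since no single regex has to encode both “Bar … Soap” and “Soap … Bar”.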
A second way to do this is to treat the content as a CSV file with '|' delimiters. The advantage of doing it using CSV is that a '|' character can appear inside a field’s text, and the CSV parser will handle decoding embedded pipes correctly.

Because there’s only one sample input line, this solution can’t be more thorough.
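As a rough sketch of the CSV approach, Python’s standard `csv` module accepts a custom delimiter. The second record below is made up for illustration, and it assumes the exporter wraps fields containing a pipe in double quotes, which the single sample line cannot confirm.

```python
import csv

line = "894975||0||Lever 2000 Anti-Bacterial Bar Soap ||Health & Spa/Personal Care/Body Care/Soap"

# csv.reader takes any iterable of lines; use '|' instead of ',' as the separator.
row = next(csv.reader([line], delimiter="|"))
product = row[4]

# Hypothetical record: a quoted field that contains the delimiter itself.
quoted_line = '123||0||"Soap | Bar Gift Set"||Health & Spa'
quoted_row = next(csv.reader([quoted_line], delimiter="|"))
quoted_product = quoted_row[4]  # the embedded pipe stays inside the field
```

A plain split on '|' would break the quoted field in two, which is the case where the CSV route pays off.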