I have a CSV (comma-separated) file. I would like to know how to search for lines where the 7th and 8th fields are the same, using only grep (without using cut). I have tried something like this:
grep -E '[^,]*,{6,6}' input.csv | grep '\(.*\)\(,\)\(\1$\)' | less
Unfortunately, this does not print anything. How could I get the output I need?
Assuming there’s nothing awkward like fields with commas in them (if there are such fields among the first 8, you can’t process the file without a full CSV-cognizant tool), and that there is a 9th field (so the 7th and 8th fields are both followed by a comma), then a single grep using a back-reference can do the job.
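A minimal sketch of a command that fits this description, assuming plain fields with no embedded commas. The file sample.csv and its contents are made-up test data for illustration only:

```shell
# Hypothetical sample data (not from the question).
cat > sample.csv <<'EOF'
a,b,c,d,e,f,g,g,i
a,b,c,d,e,f,g,h,i
1,2,3,4,5,6,,,9
x,x,x,x,x,x,x,x,x
EOF

# BRE: six "field," groups anchored at the start of the line, then the
# 7th "field," captured as group 2, then the same text again (\2).
matches=$(grep '^\([^,]*,\)\{6\}\([^,]*,\)\2' sample.csv)
printf '%s\n' "$matches"
```

With this sample data, the three lines whose 7th and 8th fields are equal (including the line where both are empty) are printed, and the g,h,i line is not.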
The first part matches six sequences of zero or more non-commas, each followed by a comma. Then comes the 7th (possibly empty) field with its trailing comma, captured as a group; that is followed by the same thing again (the \2).

Note that the g,h,i line does not appear in the output (and it shouldn’t); the rest should and do appear.

All of this is done using POSIX Basic Regular Expressions, or BREs. If you use egrep or grep -E, you have Extended Regular Expressions, or EREs, at your disposal and you can forgo all the backslashes except the one in \2; you could also deal with a file that has some lines with 8 fields and other lines with 9 or more, but that isn’t a regular CSV file. The BRE version can also be modified to work with a CSV file that has precisely 8 columns, by anchoring the 8th field to the end of the line instead of requiring a comma after it.

Part of the art of using regular expressions is having a flexible mindset about different ways to achieve a given result; there is often more than one way to do it.
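The two variants mentioned above (exactly 8 columns with a BRE, and a mix of 8 and 9-or-more columns with an ERE) could be sketched as follows. The file mixed.csv and its contents are made-up test data. Back-references in EREs are not required by POSIX, so the second command assumes a grep such as GNU grep that supports them:

```shell
# Hypothetical sample: some lines with exactly 8 fields, others with 9.
cat > mixed.csv <<'EOF'
a,b,c,d,e,f,g,g
a,b,c,d,e,f,g,h
a,b,c,d,e,f,g,g,i
a,b,c,d,e,f,g,gg,i
EOF

# BRE, exactly 8 columns: capture the 7th field without its comma,
# then require the same text to run to the end of the line.
eight=$(grep '^\([^,]*,\)\{6\}\([^,]*\),\2$' mixed.csv)

# ERE, 8 or more columns: the 8th field must be followed by either
# a comma or the end of the line. Only the \2 keeps its backslash.
either=$(grep -E '^([^,]*,){6}([^,]*),\2(,|$)' mixed.csv)

printf 'exactly 8 columns:\n%s\n' "$eight"
printf '8 or more columns:\n%s\n' "$either"
```

Requiring a comma or end of line after the 8th field is what keeps a line such as g,gg from matching on a shared prefix.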