I have a bash function that runs Python, which prints every match of a regex found in stdin:
function find-all() {
python -c "import re
import sys
print('\n'.join(re.findall('$1', sys.stdin.read())))"
}
When I run find-all 'href="([^"]*)"' < index.html, it should return the first group of the regex for each match (the value of every href attribute in index.html).
How can I write this in sed or awk?
I suggest you use grep -o. E.g.:
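As a minimal sketch of what grep -o does (the sample input line is made up for illustration, and -o is assumed to be available, as it is in GNU and BSD grep, though it is not required by POSIX):

```shell
# -o prints only the matched part of each line, one match per output line;
# -E enables extended regular expressions.
printf '%s\n' '<a href="a.html">A</a> <a href="b.html">B</a>' |
  grep -oE 'href="[^"]*"'
# prints:
# href="a.html"
# href="b.html"
```

Note that grep -o prints the whole match, not just a capture group, so the href=" prefix and the closing quote are still part of the output.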
Update
If you are extracting href attributes from HTML files using a command like:
you could extract the values by using cut and sed like this:
But you’d be better off using HTML/XML parsers for reliability.
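For a quick one-off, the grep -o plus cut or sed approach could be sketched roughly as below. The function names find_hrefs and find_hrefs_sed are hypothetical, and the sketch assumes a grep that supports -o and attribute values that are double-quoted and contain no double quotes themselves:

```shell
# Hypothetical helper: print the value of every href="..." attribute read from stdin.
find_hrefs() {
  # grep -o prints each match (href="value") on its own line;
  # cut then keeps the second double-quote-delimited field, which is the value.
  grep -oE 'href="[^"]*"' | cut -d'"' -f2
}

# Same idea with sed in place of cut: strip the href=" prefix and the closing quote.
find_hrefs_sed() {
  grep -oE 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}
```

Usage would mirror the original function, e.g. find_hrefs < index.html. Single-quoted or unquoted attribute values are not matched by this pattern, which is one reason a real parser is more reliable.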