I have the following code snippet from an HTML file:
<div id="rwImages_hidden" style="display:none;">
<img src="http://example.com/images/I/520z3AjKzHL._SL500_AA300_.jpg" style="display:none;"/>
<img src="http://example.com/images/I/519z3AjKzHL._SL75_AA30_.jpg" style="display:none;"/>
<img src="http://example.com/images/I/31F-sI61AyL._SL75_AA30_.jpg" style="display:none;"/>
<img src="http://example.com/images/I/71k-DIrs-8L._AA30_.jpg" style="display:none;"/>
<img src="http://example.com/images/I/61CCOS0NGyL._AA30_.jpg" style="display:none;"/>
</div>
I want to extract these codes from the HTML:
520z3AjKzHL
519z3AjKzHL
31F-sI61AyL
71k-DIrs-8L
61CCOS0NGyL
Please note that the <img src="" style="display:none;"/> context must be part of the match. The HTML file contains other, similar URLs, and I only want the ones inside <img src="" style="display:none;"/> tags.
My code is:
cat HTML | grep -Po '(?<img src="http://example.com/images/I/).*?(?=.jpg" style="display:none;"/>)'
Something seems to be wrong.
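You can solve it by using a positive lookbehind (?<=...) and a positive lookahead (?=...). Your pattern has two problems. First, (?<img is not a lookbehind: in PCRE a lookbehind is written (?<=...), while (?< followed by a name starts a named group (?<name>...), so grep -P rejects the pattern. Second, a lookahead that begins at .jpg leaves the size suffix (for example ._SL500_AA300_) in the match. Starting the lookahead at the \._ that follows the code keeps only the code.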
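Both problems can be reproduced on a single line of the sample markup. This is a sketch assuming GNU grep built with PCRE support (-P); the variable name line is only for illustration, and the exact wording of the error message depends on the PCRE library grep was built with.

```shell
#!/usr/bin/env bash
# Assumes GNU grep built with PCRE support (-P).
# One line of the sample markup from the question:
line='<img src="http://example.com/images/I/520z3AjKzHL._SL500_AA300_.jpg" style="display:none;"/>'

# 1) The question's pattern, unchanged. "(?<img" is parsed as the start of a
#    named group "(?<name>...)"; the space after "img" makes the pattern invalid,
#    so grep prints a compile error on stderr and exits with status 2.
printf '%s\n' "$line" |
  grep -Po '(?<img src="http://example.com/images/I/).*?(?=.jpg" style="display:none;"/>)' ||
  echo "grep exit status: $?"

# 2) Only the lookbehind corrected to "(?<=". The pattern now compiles, but the
#    lookahead starts at ".jpg", so the "._SL500_AA300_" part stays in the match.
#    This prints: 520z3AjKzHL._SL500_AA300_
printf '%s\n' "$line" |
  grep -Po '(?<=<img src="http://example.com/images/I/).*?(?=.jpg" style="display:none;"/>)'
```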
Demonstration:
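A minimal, self-contained run on the sample markup from the question, assuming GNU grep built with PCRE support (-P). The helper name extract_codes and the last <img> line (a made-up decoy without style="display:none;") are hypothetical, added only to show that look-alike URLs are skipped. The exact lookahead tail is one reasonable choice, not necessarily the only one.

```shell
#!/usr/bin/env bash
# Minimal sketch; assumes GNU grep built with PCRE support (-P).

# Hypothetical helper: print every image code found between the hidden-<img>
# URL prefix and the "._<suffix>.jpg" + hidden-style tail.
#   (?<=...)  lookbehind: the literal URL prefix must directly precede the match
#   .*?       the code itself, matched reluctantly (as few characters as possible)
#   (?=...)   lookahead: "._", the rest of the file name, ".jpg" and the
#             style="display:none;" attribute must directly follow the match
extract_codes() {
  grep -Po '(?<=<img src="http://example\.com/images/I/).*?(?=\._[^"]*\.jpg" style="display:none;"/>)'
}

# Sample markup from the question, plus one made-up decoy line (no hidden style).
html='<div id="rwImages_hidden" style="display:none;">
<img src="http://example.com/images/I/520z3AjKzHL._SL500_AA300_.jpg" style="display:none;"/>
<img src="http://example.com/images/I/519z3AjKzHL._SL75_AA30_.jpg" style="display:none;"/>
<img src="http://example.com/images/I/31F-sI61AyL._SL75_AA30_.jpg" style="display:none;"/>
<img src="http://example.com/images/I/71k-DIrs-8L._AA30_.jpg" style="display:none;"/>
<img src="http://example.com/images/I/61CCOS0NGyL._AA30_.jpg" style="display:none;"/>
<img src="http://example.com/images/I/DECOY123._SL75_.jpg" class="visible"/>
</div>'

printf '%s\n' "$html" | extract_codes
# prints:
# 520z3AjKzHL
# 519z3AjKzHL
# 31F-sI61AyL
# 71k-DIrs-8L
# 61CCOS0NGyL
```

Against the real file, extract_codes < HTML does the same job as the cat HTML | grep ... pipeline, without the extra cat.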
Regexp breakdown:
- .*? : match any characters, reluctantly
- (?<=<img src=...ges/I/) : preceded by <img .../I/
- (?=\._...ne;\"/>) : followed by ._...ne;\"/>