How can I use regex to match everything except the data inside a div with a specific style? For example:
<div style="float:left;padding-left:10px; padding-right:10px">
<img src="../Style/BreadCrumbs/Divider.png">
</div>
<div style="float:left; padding-top:5px;">
Data to keep
</div>
<div style="float:left;padding-left:10px; padding-right:10px">
<img src="../Style/BreadCrumbs/Divider.png">
</div>
I want the regex to match everything except the data. The best approach I can see is to remove the HTML markup and then combine the files afterwards with VB (I already have the VB code).
I'm using regex because I need to extract the data from several hundred files.
Your suggested method is probably not a good way to do this. If:
- you have `grep` with PCRE support
- the `div` only wraps your data, not other elements
- the `div` does not contain other `div`s

Then you can use:
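
A minimal sketch of such a pattern, assuming the div to keep carries exactly the `style` attribute shown in the question (the variable name `pattern` is only for illustration):

```shell
# Hypothetical variable holding the PCRE pattern.
# The style string is copied verbatim from the question; adjust it to your files.
pattern='(?s)<div style="float:left; padding-top:5px;">.*?</div>'
printf '%s\n' "$pattern"
```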
The important parts of this are:
- `(?s)`, which activates `DOTALL`, meaning that `.` will match newlines
- `.*?`, which matches the contents of the div reluctantly, meaning it'll stop at the first `</div>` it finds

To use this, you'll need to activate a few grep options:
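
A sketch of what the full invocation could look like, assuming GNU grep built with PCRE support. The file name `sample.html` and the temporary directory are hypothetical, set up here only so the example can be run as-is:

```shell
# Build a small sample file from the question's HTML (sample.html is a hypothetical name).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.html" <<'EOF'
<div style="float:left;padding-left:10px; padding-right:10px">
<img src="../Style/BreadCrumbs/Divider.png">
</div>
<div style="float:left; padding-top:5px;">
Data to keep
</div>
<div style="float:left;padding-left:10px; padding-right:10px">
<img src="../Style/BreadCrumbs/Divider.png">
</div>
EOF

pattern='(?s)<div style="float:left; padding-top:5px;">.*?</div>'

# -P: use PCRE, -z: NUL-terminated records, -o: print only the matched text.
# With -z every match is followed by a NUL byte, so tr converts it to a newline.
matched=$(grep -Pzo "$pattern" "$tmpdir/sample.html" | tr '\0' '\n')
printf '%s\n' "$matched"

rm -r "$tmpdir"
```

The output still includes the opening and closing `div` tags around the data.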
For these:
- `-P` activates PCRE
- `-z` makes grep use `NUL` instead of `\n` as the line terminator, so grep will treat the entire file as a single line
- `-o` prints only the matching parts

After this you'll need to strip off the divs; `sed` is a good tool for this. If you put all of your files in one directory, you can do the joining at the same time:
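
One way this could look, as a sketch rather than a definitive command: it assumes GNU grep and sed, that the opening and closing `div` tags sit on their own lines as in the question, and that all files match `*.html` in one directory. The `page1.html` and `page2.html` files are hypothetical samples created only so the example runs on its own.

```shell
# Create two hypothetical sample files in a temporary directory.
tmpdir=$(mktemp -d)
for n in 1 2; do
  cat > "$tmpdir/page$n.html" <<EOF
<div style="float:left;padding-left:10px; padding-right:10px">
<img src="../Style/BreadCrumbs/Divider.png">
</div>
<div style="float:left; padding-top:5px;">
Data to keep $n
</div>
EOF
done

pattern='(?s)<div style="float:left; padding-top:5px;">.*?</div>'

# grep extracts the wanted div from every file (-h suppresses file name prefixes),
# tr turns the NUL separators into newlines,
# sed deletes the lines that hold the opening and closing div tags.
combined=$(grep -Pzoh "$pattern" "$tmpdir"/*.html \
  | tr '\0' '\n' \
  | sed -e '/^<div /d' -e '/^<\/div>$/d')
printf '%s\n' "$combined"

rm -r "$tmpdir"
```

Redirecting the final `printf` to a file would give a single combined file. If the tags share a line with the data, the `sed` expressions would need to substitute the tags away instead of deleting whole lines.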