((<(\\s*?)(object|OBJECT|EMBED|embed))+(.*?)+((object|OBJECT|EMBED|embed)(\\s*?)>))
I need to get object and embed tags from some HTML files stored locally on disk. I’ve come up with the above regex to match the tags in Java, and then
use matcher.group(1) to get the entire tag and its contents.
Can anyone perhaps improve this? Is there anything that stands out immediately that I should change?
It does work, by the way; I just want some input on whether it can be better, because I’m fairly new to regex myself.
Yes, here’s the improvement:
Download a full-fledged Java HTML parser like Jsoup and put it on the classpath.
Now you can select all <object> and <embed> elements with a single CSS selector such as "object, embed", instead of a regex.
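A minimal sketch of that selection, assuming Jsoup is on the classpath. The class name ObjectEmbedExtractor, the helper extractTags, and the sample markup are hypothetical and only there for illustration; the Jsoup calls used are Jsoup.parse, Document.select and Element.outerHtml.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ObjectEmbedExtractor {

    // Hypothetical helper: returns the outer HTML (the tag plus its contents)
    // of every <object> and <embed> element found in the given markup.
    static List<String> extractTags(String html) {
        Document doc = Jsoup.parse(html);
        List<String> tags = new ArrayList<>();
        // "object, embed" is a CSS selector group: it matches either element.
        // The HTML parser normalizes tag names, so <OBJECT> and <EMBED> match too.
        for (Element el : doc.select("object, embed")) {
            tags.add(el.outerHtml());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Made-up sample markup, not taken from the question.
        String html = "<html><body><p>intro</p>"
                + "<OBJECT data=\"movie.swf\"></OBJECT>"
                + "<embed src=\"movie.swf\">"
                + "</body></html>";
        for (String tag : extractTags(html)) {
            System.out.println(tag);
        }
    }
}
```

For files on disk, Jsoup.parse(File, String) takes the file and a charset name (for example "UTF-8") and throws IOException. Note that an <embed> nested inside an <object> is returned twice: once as part of the object's outer HTML and once on its own.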