I am trying to use a regex in Java to extract the content that sits between two boundary tags in a multiline string. For example, the content may look like this:
--_000_CAKETFEgg78oKKJPNySnxF4BgQoh9ifHP4XzXGeJddUvOtz5wmailgm_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
hello test
...
..
!@#!@%$#^%$&*^(*)*()
..
..
..
..
--_000_CAKETFEgg78oKKJPNySnxF4BgQoh9ifHP4XzXGeJddUvOtz5wmailgm_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
hello test<br><br>..<br>!@#!@%$#^%$&*^(*)*()<br>.<br><br>.<br>.<br>.<br><br><br><br>.<br><br>
--_000_CAKETFEgg78oKKJPNySnxF4BgQoh9ifHP4XzXGeJddUvOtz5wmailgm_
I want to extract just the contents between the --_000_CAKETFEgg78oKKJPNySnxF4BgQoh9ifHP4XzXGeJddUvOtz5wmailgm_ boundaries.
I used a regular expression that looks like this: --_000_CAKETFEgg78oKKJPNySnxF4BgQoh9ifHP4XzXGeJddUvOtz5wmailgm_\n?[.\n]+\n?--_000_CAKETFEgg78oKKJPNySnxF4BgQoh9ifHP4XzXGeJddUvOtz5wmailgm_
But it doesn’t work. What should my regular expression be to extract the content? Also, when extracting, would the boundary tags be included together with the content?
I would strongly recommend not using a regular expression for this kind of parsing; it is not well suited to the job. Instead, write a small parser that iterates over the input line by line: when it finds the boundary tag it sets a flag and records every following line, and when it finds the next boundary tag it stores what it recorded and resets the flag. That is easy to write and much more flexible than a regex.
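A minimal sketch of that line-by-line approach, under a few assumptions: the whole message is already in a String, a boundary always sits on a line of its own, and the class and method names (BoundaryParser, extractParts) are made up for illustration. The boundary lines themselves are not part of the returned parts, and anything after the last boundary is dropped because no closing tag follows it.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundaryParser {

    /**
     * Returns the text found between consecutive boundary lines.
     * The boundary lines themselves are excluded from the result.
     * (Hypothetical helper, not part of any library.)
     */
    public static List<String> extractParts(String input, String boundary) {
        List<String> parts = new ArrayList<>();
        List<String> current = null; // null means no boundary has been seen yet

        for (String line : input.split("\\r?\\n", -1)) {
            if (line.trim().equals(boundary)) {
                // A boundary closes the part being recorded (if any)
                // and opens a new one.
                if (current != null) {
                    parts.add(String.join("\n", current));
                }
                current = new ArrayList<>();
            } else if (current != null) {
                // Inside a part: record the line unchanged.
                current.add(line);
            }
            // Lines before the first boundary are ignored.
        }
        return parts;
    }

    public static void main(String[] args) {
        String boundary = "--_000_CAKETFEgg78oKKJPNySnxF4BgQoh9ifHP4XzXGeJddUvOtz5wmailgm_";
        String input = boundary + "\n"
                + "Content-Type: text/plain; charset=\"iso-8859-1\"\n"
                + "hello test\n"
                + boundary + "\n"
                + "Content-Type: text/html; charset=\"iso-8859-1\"\n"
                + "hello test<br>\n"
                + boundary + "\n";

        for (String part : extractParts(input, boundary)) {
            System.out.println("--- part ---");
            System.out.println(part);
        }
    }
}
```

Each returned part still contains its Content-Type and Content-Transfer-Encoding lines. If only the body is wanted, those header lines would have to be skipped in a further step.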