Going crazy trying to figure this out for the past 2 hours. I have this HTML returned as a string from an AJAX request:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Preview</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="author" content="Connected Ventures LLC. Copyright 1999-2010." />
<script type="text/javascript" src="js/jquery.js"></script>
<script type="text/javascript" src="js/jquery.ui.js"></script>
<script type="text/javascript" src="js/article.js"></script>
<link href="/css/global.css" rel="stylesheet" type="text/css" />
<link href="/css/article.css" rel="stylesheet" type="text/css" />
<style type="text/css">
html, body { background: #fff; color: #000; }
</style>
</head>
<body class="the_article">
<p>s</p></body>
</html>
I need to get the content between the body tags. I already tried this, which was suggested in another SO question on parsing HTML via jQuery:
$(ajax_response).find('body.the_article').html();
Didn’t work. Even after adding:
dataType: 'html'
as an ajax request parameter. Then I tried to parse it using regex:
ajax_response.match(/<body class="the_article">.*?<\/body>/);
It just alerts null. Any idea how I can get the body content?
Your regex is failing because the string is multi-line, and the . wildcard matches every character except line terminators, so the newline between the opening body tag and the body’s content breaks the pattern.
Use [\s\S] instead of . (literally, allow whitespace and non-whitespace characters, which together cover every character, newlines included).
[EDIT] In response to the comment: to capture the body content exclusive of its tags, capture the contents as a sub-group:
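A minimal sketch of the pattern described here, reconstructed from the description rather than copied from the original answer. The `ajaxResponse` string is a shortened, hypothetical stand-in for the real response, and the variable names are illustrative only:

```javascript
// Hypothetical stand-in for the AJAX response (shortened from the question).
const ajaxResponse = [
  '<html>',
  '<head><title>Preview</title></head>',
  '<body class="the_article">',
  '<p>s</p></body>',
  '</html>'
].join('\n');

// `.` stops at line breaks, so the original pattern finds nothing here.
const dotMatch = ajaxResponse.match(/<body class="the_article">.*?<\/body>/);

// [\s\S] matches any character, newlines included.
// Group 1 holds the content between the tags; the closing tag is only a look-ahead.
const bodyMatch = ajaxResponse.match(/<body class="the_article">([\s\S]*?)(?=<\/body>)/);
const bodyContent = bodyMatch ? bodyMatch[1] : null;

console.log(dotMatch);    // null
console.log(bodyContent); // a newline followed by <p>s</p>
```

The captured content keeps the newline that follows the opening tag, so call `.trim()` on it if you don't want the surrounding whitespace.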
Note also that we specify the closing body tag as a look-ahead, since we don’t need to match it at all, merely anchor to it. (JS engines that predate ES2018 don’t support look-behinds, short of simulations like the one I wrote, so there we have no choice but to match the opening body tag and pull the content out of the sub-group.)
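To see what the look-ahead changes, compare the full match with the sub-group. This is only a sketch: the pattern is an assumed form of the one described above, and `sample` is a made-up one-tag input:

```javascript
// Made-up input: just the body element from the question, on two lines.
const sample = '<body class="the_article">\n<p>s</p></body>';

// Assumed pattern: opening tag matched normally, closing tag as a look-ahead.
const m = sample.match(/<body class="the_article">([\s\S]*?)(?=<\/body>)/);

const fullMatch = m[0]; // includes the opening tag, stops before </body>
const inner = m[1];     // content only, no tags

console.log(fullMatch.endsWith('</body>'));                      // false
console.log(fullMatch.startsWith('<body class="the_article">')); // true
console.log(inner);                                              // a newline followed by <p>s</p>
```

The opening tag is still part of the full match because it is matched normally, which is why the content has to be read from the sub-group rather than from the match as a whole.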