I have a page containing many posts by different authors, and I want to extract only the posts written by User A.
How can I set up grep to check each post's HTML block for the author, then print the content of the matching posts to a file? The post structure is something like:
<!--Begin Msg Number #####-->
[useless junk i'm not interested in here]
<span class="author vcard"><a class="url fn" href='url here'>User A</a> </span>
[more junk]
<div class='post entry-content '>
<!--cached-some date string--> Here's the text I want to extract
</div>
[more junk]
<hr />
I think the command looks something like
grep /pattern/ output file
but do I need to explicitly tell it to hunt only between the
<!-- begin msg ... -->
and
<hr />
markers that bound the post, or is grep smart enough to do that automatically? I'm worried that when grep matches User A, it will print the contents of every post to the file instead of just that particular one.
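On the bounding question: grep matches line by line and has no notion of a post block, so the post boundaries have to be handled explicitly. As a rough sketch of that idea (in Python rather than grep, and assuming the markup looks exactly like the sample above), the page can be split at each "Begin Msg Number" comment and each chunk checked for the author. The function name `extract_posts` is made up for illustration, and the regular expressions assume there is no nested <div> inside the post body.

```python
import html
import re


def extract_posts(page, author):
    """Return the text of every post in `page` whose author is `author`.

    Hypothetical helper: it assumes each post starts with a
    "<!--Begin Msg Number ...-->" comment, as in the sample markup.
    """
    posts = []
    # Split at each begin-marker so that every chunk holds exactly one post.
    # The first piece is whatever precedes the first post, so it is skipped.
    chunks = re.split(r"<!--\s*Begin Msg Number", page, flags=re.I)[1:]
    for chunk in chunks:
        # Find the author name inside the "author vcard" span of this post.
        who = re.search(
            r'<span class="author vcard">.*?<a [^>]*>(.*?)</a>', chunk, re.S
        )
        if not who or who.group(1).strip() != author:
            continue
        # Grab the post body; assumes no nested <div> inside it.
        body = re.search(
            r"<div class='post entry-content\s*'>(.*?)</div>", chunk, re.S
        )
        if not body:
            continue
        # Drop HTML comments such as the "cached" marker, then tidy up.
        text = re.sub(r"<!--.*?-->", "", body.group(1), flags=re.S)
        posts.append(html.unescape(text).strip())
    return posts
```

Because the search for the author and the search for the body both run inside a single chunk, a match for User A can only ever pull in that post's own text. Writing the result out is then a matter of joining the returned list and saving it with an ordinary open(..., "w") call.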
If all the post text is on one line, then try