I am trying to write a regular expression that strips all HTML tags except links and the <p> and </p> tags.
Right now I can remove all HTML tags except links, but I have no idea how to keep both the link tags and the p tags at the same time.
By the way, can somebody recommend some books for learning regular expressions?
You must not parse HTML with regular expressions (as shown here and here), the reason being that HTML may or may not be well formed, and a regular expression cannot cope reliably with malformed or nested markup.
You will need to use a specialized HTML parsing library to do what you need. If you are using Java, you can try JSoup; for C# there is the HTML Agility Pack; and for PHP there is the Simple HTML DOM Parser.
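
To illustrate the parser-based approach, here is a minimal sketch. It does not use any of the libraries named above; it uses Python's standard-library html.parser instead, only because that keeps the example self-contained. The class name KeepLinksAndParagraphs and the function strip_html are made up for this example, and keeping only the href attribute on links is an assumption about what you want.

```python
from html import escape
from html.parser import HTMLParser


class KeepLinksAndParagraphs(HTMLParser):
    """Drop every tag except <a> and <p>, keeping the text in between.

    Hypothetical helper for illustration, not part of any library.
    """

    ALLOWED = {"a", "p"}
    # The text inside these elements is code or CSS, not content, so skip it.
    SKIP_CONTENT = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_CONTENT:
            self._skip_depth += 1
        elif tag == "a":
            # Assumption: only href is worth keeping; onclick, style etc. go.
            href = dict(attrs).get("href")
            if href is None:
                self.parts.append("<a>")
            else:
                self.parts.append('<a href="%s">' % escape(href, quote=True))
        elif tag == "p":
            self.parts.append("<p>")

    def handle_endtag(self, tag):
        if tag in self.SKIP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.ALLOWED:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Entities were decoded by the parser, so re-escape the text.
            self.parts.append(escape(data, quote=False))


def strip_html(html):
    """Return html with all tags removed except <a> and <p>."""
    parser = KeepLinksAndParagraphs()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<div><p>Hello <b>world</b>, see <a href="http://example.com" onclick="x()">this</a>.</p></div>'
    print(strip_html(sample))
```

Because the parser tracks tags for you, extending the whitelist is just a matter of adding names to ALLOWED and deciding how each start tag should be written out. Dedicated libraries such as the ones mentioned above offer the same kind of whitelist cleaning with far more robust handling of broken markup.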