I want to know whether an input HTML string is valid or not.
I researched various HTML parsers, but none of them has a method for validating HTML.
Jsoup is almost what I want, but it always produces valid parsed HTML rather than telling me whether the input was valid.
Basically, I want to check for a valid HTML structure like the one below.
<html>
<head>~</head>
<body>~</body>
</html>
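So I wrote this code in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlCheck {
    public static void main(String[] args) {
        String html = "<html><head><title>asdf</title></Head><body>asfd</body></html>";
        // (?i) makes the match case-insensitive, so </Head> is accepted
        String compile = "(?i)<html.*>.*<head>.*?</head>.*<body>.*</body>.*</html>";
        Pattern pattern = Pattern.compile(compile);
        Matcher matcher = pattern.matcher(html);
        // matches() requires the whole string to fit the pattern
        if (matcher.matches()) {
            System.out.println("Valid html");
        } else {
            System.out.println("Invalid html");
        }
    }
}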
But if the HTML contains two <head> elements, this code still reports it as valid.
How can I check for a valid HTML structure efficiently?
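How about using a library for this? I recommend Jsoup. Its parser can record parse errors if you enable tracking with Parser.setTrackErrors(int) and read them back with Parser.getErrors(), so you are not limited to the repaired document.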
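
If adding a dependency is not an option, the duplicate <head> problem from the question can be handled with plain java.util.regex by requiring each structural tag to appear exactly once and in the expected order. The following is only a minimal sketch of that idea, not a real HTML validator: the class name HtmlStructureCheck and both method names are made up for illustration, and it assumes the input has no tag-like text inside comments, scripts, or attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks only the outer html/head/body skeleton.
class HtmlStructureCheck {

    // Returns the start offset of the only match of regex in input,
    // or -1 if the pattern matches zero times or more than once.
    static int singleMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        if (!m.find()) {
            return -1;
        }
        int start = m.start();
        return m.find() ? -1 : start;
    }

    // True if each structural tag occurs exactly once and in this order:
    // <html> <head> </head> <body> </body> </html>
    static boolean hasBasicStructure(String html) {
        String[] tags = {
            "<html\\b[^>]*>", "<head\\b[^>]*>", "</head\\s*>",
            "<body\\b[^>]*>", "</body\\s*>", "</html\\s*>"
        };
        int previous = -1;
        for (String tag : tags) {
            int pos = singleMatchStart(tag, html);
            if (pos <= previous) {
                return false; // missing, duplicated, or out of order
            }
            previous = pos;
        }
        return true;
    }

    public static void main(String[] args) {
        String ok = "<html><head><title>asdf</title></Head><body>asfd</body></html>";
        String twoHeads = "<html><head></head><head></head><body>asfd</body></html>";
        System.out.println(hasBasicStructure(ok));       // true
        System.out.println(hasBasicStructure(twoHeads)); // false
    }
}
```

The \b after the tag name keeps <head> from matching <header>, and because every tag is searched for separately, line breaks between elements do not matter. Anything beyond this skeleton check (nesting of other elements, attributes, entities) still needs a real parser.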