I need to extract the doctype of a HTML page which may be XHTML,

Asked: June 4, 2026

I need to extract the doctype of a page, which may be XHTML, HTML, or WML, using C or C++.
The input will be given as an HTML file or as an array.

If the page doesn't contain a doctype header, the result should be determined from the page itself: if it is HTML, the result is html; if it is XHTML, the result is xhtml; and so on.

1 Answer

Editorial Team · Answer 1 · 2026-06-04T07:04:15+00:00

This seems like two distinct questions:

1) How to grab the doctype declaration from an HTML page, for which I was going to suggest something simple like:

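#include <stdio.h>

char doctype[1024];

/* Copy the text between "<!DOCTYPE " and ">" into doctype.
   Returns 1 on success, 0 if the page does not begin with "<!DOCTYPE"
   (the match is case-sensitive). */
int
get_doctype(const char *html_page)
{
  /* %1023[^>] leaves room for the NUL and, unlike %s, keeps spaces. */
  return sscanf(html_page, " <!DOCTYPE %1023[^>]", doctype) == 1;
}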

Then perhaps match against known doctype strings to get an enumerated value.
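As a rough sketch of that matching step, the function below maps the text of a doctype declaration to an enumerated value. The names `doc_kind`, `contains_ci`, and `classify_doctype` are made up for illustration, and the substring rules are an assumption about what commonly appears in XHTML, WML, and HTML doctypes, not an exhaustive list. It also assumes the whole declaration text was captured, not just its first word.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type; the names are illustrative only. */
enum doc_kind { DOC_UNKNOWN, DOC_HTML, DOC_XHTML, DOC_WML };

/* Case-insensitive substring test (portable stand-in for strcasestr). */
static int contains_ci(const char *hay, const char *needle)
{
    for (; *hay; hay++) {
        size_t i = 0;
        while (needle[i] && hay[i] &&
               tolower((unsigned char)hay[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (needle[i] == '\0')
            return 1;               /* whole needle matched here */
    }
    return needle[0] == '\0';       /* empty needle matches anything */
}

/* Classify the text found between "<!DOCTYPE " and ">". */
enum doc_kind classify_doctype(const char *doctype)
{
    /* XHTML public identifiers contain "DTD XHTML"; test this first
       because XHTML doctypes also start with the word "html". */
    if (contains_ci(doctype, "DTD XHTML"))
        return DOC_XHTML;
    if (contains_ci(doctype, "wml"))
        return DOC_WML;
    if (contains_ci(doctype, "html"))
        return DOC_HTML;
    return DOC_UNKNOWN;
}
```

With these rules, `classify_doctype("html")` gives `DOC_HTML`, and a string containing `-//W3C//DTD XHTML 1.0 Strict//EN` gives `DOC_XHTML`. Checking XHTML before HTML matters, since every XHTML doctype would otherwise match the plain "html" rule.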

But you’re also asking 2) how to detect the type of a page with no doctype declaration. That’s harder, and there may be multiple correct answers for each page. I would suggest outsourcing to a library like libxml. It has functions to validate input streams as certain types of documents.
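If pulling in a library is not an option, a much cruder fallback is to look for tell-tale markers in the page text. The sketch below is only that kind of heuristic, not validation and not what libxml does: it assumes a WML page contains a `<wml` root tag, an XHTML page declares the XHTML namespace `http://www.w3.org/1999/xhtml`, and anything else with an `<html` tag is plain HTML. The names `page_kind`, `has_ci`, and `guess_page_kind` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type; the names are illustrative only. */
enum page_kind { PAGE_UNKNOWN, PAGE_HTML, PAGE_XHTML, PAGE_WML };

/* Case-insensitive substring test. */
static int has_ci(const char *hay, const char *needle)
{
    for (; *hay; hay++) {
        size_t i = 0;
        while (needle[i] && hay[i] &&
               tolower((unsigned char)hay[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (needle[i] == '\0')
            return 1;
    }
    return needle[0] == '\0';
}

/* Guess the page type from markers in a NUL-terminated buffer.
   This is a heuristic: it can be fooled by markers that appear
   inside comments, scripts, or quoted text. */
enum page_kind guess_page_kind(const char *page)
{
    if (has_ci(page, "<wml"))
        return PAGE_WML;
    if (has_ci(page, "http://www.w3.org/1999/xhtml"))  /* XHTML namespace */
        return PAGE_XHTML;
    if (has_ci(page, "<html"))
        return PAGE_HTML;
    return PAGE_UNKNOWN;
}
```

Because more than one answer can be correct for a given page, the order of the checks is a design choice: here the more specific markers win over the generic `<html` tag.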
