I know that Twitter is API-centric, meaning that twitter.com, like all Twitter apps, retrieves its data via the API.
When I access https://api.twitter.com/1/legal/tos.json (or .xml if you prefer) I am given a JSON-formatted result that seems to contain only plaintext and no additional formatting. That being said, if we assume that https://twitter.com/tos retrieves its data from the same resource, how does it become formatted with hyperlinks, headings, etc.?
I’ve noticed that there are line breaks (\n) that split the JSON response into multiple segments. Is there some post-processing of this data that replaces certain numbers of line breaks with specific HTML tags?
For example:
\n\n\n => <h3>
\n\n => <p>
Tip => <p class="tip">
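To make the guess concrete, here is a rough Python sketch of the kind of post-processing I have in mind. Everything in it is an assumption on my part: the function name plaintext_to_html is made up, I'm guessing that a block preceded by three or more line breaks is a heading, and that a tip is any block starting with the word "Tip". It is not Twitter's actual code.

```python
import html
import re


def plaintext_to_html(text: str) -> str:
    """Apply the guessed rules to a plaintext document (hypothetical helper)."""
    # Split on runs of 2+ newlines, keeping the separators so that we can
    # tell how many line breaks preceded each block.
    pieces = re.split(r"(\n{2,})", text.strip())
    out = []
    for i in range(0, len(pieces), 2):  # even indices are text blocks
        block = html.escape(pieces[i].strip())
        if not block:
            continue
        separator = pieces[i - 1] if i > 0 else ""
        if len(separator) >= 3:
            # Assumed rule: \n\n\n => <h3>
            out.append(f"<h3>{block}</h3>")
        elif block.startswith("Tip"):
            # Assumed rule: Tip => <p class="tip">
            out.append(f'<p class="tip">{block}</p>')
        else:
            # Assumed rule: \n\n => <p>
            out.append(f"<p>{block}</p>")
    return "\n".join(out)
```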
However, the above formatting rules would not account for the hyperlinks strewn throughout the page. Any thoughts?
My guess is that there’s a master copy of the text formatted as HTML. The formatted page has access to the HTML version of the text through either a private API or a different request to the public API.
Separately, the JSON and XML versions are generated by a backend service that has access to the HTML-formatted text and converts it to JSON, XML, or whatever other format Twitter wants.
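As an illustration of that guess only, a conversion in that direction could look something like the Python sketch below. The class name, the newline counts per tag, and the "tos" key are all hypothetical; I have no knowledge of Twitter's real backend. The point is that stripping tags from an HTML master copy would naturally leave plain text with line breaks and without the hyperlinks.

```python
import json
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collect text content and turn block tags into line breaks."""

    # Hypothetical mapping: which tags become which runs of newlines.
    BREAKS = {"h3": "\n\n\n", "p": "\n\n"}

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Block-level tags become newlines; other tags (such as <a>) are
        # dropped, which is why link targets would vanish from the output.
        if tag in self.BREAKS:
            self.chunks.append(self.BREAKS[tag])

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_json(master_html: str) -> str:
    """Convert an HTML master copy into a JSON document of plain text."""
    parser = PlainTextExtractor()
    parser.feed(master_html)
    parser.close()  # flush any buffered text
    text = "".join(parser.chunks).strip()
    return json.dumps({"tos": text})  # "tos" is an assumed key name
```

Feeding it something like `<h3>Basic Terms</h3><p>See our <a href="...">Privacy Policy</a>.</p>` would yield text where the heading and paragraph are separated by line breaks and the link text survives but the URL does not.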
I think it’s extremely unlikely that Twitter is using either the JSON or XML version of the text you linked to build its pages.