My questions are simple:
Is the following valid? If it is, would it break in some browsers?
<div data-text="Blah blah blah
More blah
And just a little extra blah to finish"> ... </div>
Which characters “must” be encoded in attribute values? I know " should be &quot;, but are any others required to be encoded?
It’s a valid fragment of HTML5, yes.
It is unlikely to break in any browser.
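As a rough check outside a browser, the fragment from the question can be fed to Python's standard-library `html.parser` to see whether the line breaks survive as part of the attribute value. This is only a sketch: `html.parser` is not a browser engine, and the `AttrCollector` class is a name made up for this illustration.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep them as a dict per tag.
        self.seen.append((tag, dict(attrs)))


# The fragment from the question, with real line breaks inside the value.
fragment = (
    '<div data-text="Blah blah blah\n'
    'More blah\n'
    'And just a little extra blah to finish"> ... </div>'
)

collector = AttrCollector()
collector.feed(fragment)
collector.close()

tag, attributes = collector.seen[0]
value = attributes["data-text"]
print(repr(value))  # shows the value with its embedded line breaks
```

If the parser accepts the fragment, `value` holds the full three-line string, which matches the claim that a line break inside a quoted attribute value is simply part of the value.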
Which characters must be encoded depends on whether the attribute value is double quoted, single quoted or unquoted.
For the double quoted form
" must be replaced by its character reference, and & may need to be replaced by its character reference depending on the characters that follow it. See attribute-value-double-quoted-state.
For the single quoted form
' must be replaced by its character reference, and & may need to be replaced by its character reference depending on the characters that follow it. See attribute-value-single-quoted-state.
For the unquoted form
TAB, LINE FEED, FORM FEED, SPACE and > must be replaced by their character references, and & may need to be replaced by its character reference depending on the characters that follow it. The HTML syntax rules additionally forbid literal ", ', =, < and ` in unquoted values, so those should be replaced as well. See attribute-value-unquoted-state.
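The three cases above can be sketched as small escaping helpers. This is a minimal illustration, not a vetted library: the function names are hypothetical, `&` is always escaped (more conservative than the "may need to be replaced" rule), and the unquoted helper also escapes ", ', =, < and ` on the assumption that the stricter authoring rules should be followed.

```python
import html
import re


def escape_double_quoted(value: str) -> str:
    """Escape text for use inside attr="..."."""
    # Replace & first so the references added next are not escaped again.
    return value.replace("&", "&amp;").replace('"', "&quot;")


def escape_single_quoted(value: str) -> str:
    """Escape text for use inside attr='...'."""
    return value.replace("&", "&amp;").replace("'", "&#39;")


# Characters that end or are disallowed in an unquoted value, plus &.
_UNQUOTED_UNSAFE = re.compile(r"""[\t\n\f "'=<>`&]""")


def escape_unquoted(value: str) -> str:
    """Escape text for use as attr=value (no surrounding quotes)."""
    if value == "":
        raise ValueError("an unquoted attribute value cannot be empty")
    # Use decimal numeric character references for every unsafe character.
    return _UNQUOTED_UNSAFE.sub(lambda m: f"&#{ord(m.group())};", value)


sample = 'Tom & "Jerry"'
encoded = escape_double_quoted(sample)
print(encoded)                          # Tom &amp; &quot;Jerry&quot;
print(html.unescape(encoded) == sample)  # True: decoding restores the original
print(escape_single_quoted("it's"))     # it&#39;s
print(escape_unquoted("a b>c"))         # a&#32;b&#62;c
```

In practice the double quoted form is the easiest to get right, since only two characters need attention and line breaks can stay literal.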