To resolve the incompatibilities between XHTML and HTML (discussed earlier), I propose that browser developers adopt these rules:
Certain HTML is NOT lexically valid XML.
XHTML is the subset of HTML which is also valid XML.
HTML is a superset of XML.
- Any XML is lexically valid HTML.
- HTML readers MUST accept syntactically valid XML. For example, <script src="..." /> SHOULD be read as closing the script tag, and the rest of the web page SHOULD NOT be treated as JavaScript. Alternate parsing methods SHOULD only be attempted if the page fails to lex.
- An HTML reader MUST accept certain XML control sequences. For example, a reader reading <script><![CDATA[ ... ]]></script> MUST treat the CDATA section as character data and MUST NOT pass the <![CDATA[ and ]]> delimiters to the script reader.
- HTML tags and attribute names MAY be case-insensitive.
- Certain HTML tags (for example, <br> and <img>) MAY be self-closing without a '/' self-closing mark.
- Any fragment of HTML which is a complete element is itself a valid HTML document. For example, "<p>Hello World" is a valid HTML document.
- Use of XHTML by web developers is optional.
- Documents which claim to be XHTML SHOULD comply with all of the rules of XML.
- Readers SHOULD consider a document's declarations of its own file type, such as DOCTYPE and <?xml ?> control sequences, when deciding whether to interpret it as XHTML.
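
As a rough illustration of the "XML first, fall back only if the page fails to lex" rule, here is a minimal Python sketch. It is not how any browser works; it uses the standard library's xml.etree.ElementTree as a stand-in for a strict XML lexer. The function names choose_parse_mode and declares_xhtml, and the "html-fallback" label, are hypothetical names invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heuristic: look for an XML declaration or an XHTML DOCTYPE
# near the top of the document. A real reader would likely check more.
XHTML_DOCTYPE = re.compile(
    r"<!DOCTYPE\s+html\s+PUBLIC\s+[\"']-//W3C//DTD XHTML", re.IGNORECASE
)

def declares_xhtml(text):
    """Return True if the document claims to be XML/XHTML."""
    head = text.lstrip()[:1024]
    return head.startswith("<?xml") or XHTML_DOCTYPE.search(head) is not None

def choose_parse_mode(text):
    """Try strict XML first; fall back only when the page fails to lex.

    Returns ("xml", root_element) or ("html-fallback", None).
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Assumption: this is where a lenient HTML parser would take over.
        return "html-fallback", None
    return "xml", root

if __name__ == "__main__":
    # The self-closing script tag is closed; the body is not swallowed.
    page = '<html><head><script src="a.js" /></head><body><p>Hi</p></body></html>'
    mode, root = choose_parse_mode(page)
    print(mode, root.find("body/p").text)

    # The CDATA delimiters never reach the script text.
    mode, root = choose_parse_mode("<script><![CDATA[ if (a < b) run(); ]]></script>")
    print(mode, repr(root.text))

    # A bare fragment is not XML, so the lenient path is chosen.
    print(choose_parse_mode("<p>Hello World")[0])
```

Under this sketch, a document for which declares_xhtml returns True but which ends up in "html-fallback" is one that claims to be XHTML without complying with the rules of XML. Whether a reader should warn, reject, or silently recover in that case is left open here.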