tangaroa | Scrap dump - notes on HTML

Old notes from when I was trying to design my own next version of HTML:

1. Different tag types increase the complexity of writing a parser.

The language has more distinct syntactic forms than is elegant. We have <html> tags, <!-- --> comments, <![CDATA[ ]]> sections, <!DOCTYPE> declarations, and more. They all have different parsing and closing rules.

Possible resolutions, and why they are bad solutions:

Yeah, so?
- The complexity makes it harder for developers to write parsers.
Make all the special stuff use one of the <! > formats
- Breaks backwards compatibility.
Make everything use <html> tag format.
- Breaks backwards compatibility.
- All the special tag types can be seen as meta instructions, but this triggers the next problem.
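
To make the branching concrete, here is a minimal sketch in Python of the first decision a tokenizer has to make every time it sees a `<`. The function name `classify_markup` and the category labels are made up for illustration, and a real HTML tokenizer has many more states than this.

```python
def classify_markup(text: str, pos: int) -> tuple[str, str]:
    """Return (kind, terminator) for the markup construct starting at text[pos].

    Hypothetical helper: the labels are illustrative, not taken from any spec.
    """
    if not text.startswith("<", pos):
        raise ValueError("not at the start of a markup construct")
    # Order matters: the longer, more specific prefixes must be tested first.
    if text.startswith("<!--", pos):
        return ("comment", "-->")
    if text.startswith("<![CDATA[", pos):
        return ("cdata", "]]>")
    if text.startswith("<!", pos):
        return ("declaration", ">")  # e.g. <!DOCTYPE html>
    if text.startswith("</", pos):
        return ("end-tag", ">")
    return ("start-tag", ">")


samples = ["<html>", "</html>", "<!-- note -->", "<![CDATA[ x ]]>", "<!DOCTYPE html>"]
kinds = [classify_markup(s, 0)[0] for s in samples]
print(kinds)
```

Each branch has its own terminator, which is the extra complexity this section complains about: the code that finds the end of a construct depends on which branch was taken.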

2. Contents of future meta tags will be displayed by old browsers

Imagine the creation of a new tag <foobar/> which is an instruction to the browser to do something, but is not meant to be data. Old browsers will treat the tag as data and display its contents. This will annoy users and authors.

Consider the current practice of placing <script> in the body, and imagine a browser that does not know what to do with <script>.

Possible resolutions, and why they are bad solutions:

Use <! > for all metadata tags.
- Wrong context. These are currently instructions to the lexer, not the renderer.
- Breaks backwards compatibility.
Establish a tag type hierarchy and an inheritance language, and expect authors to define the next HTML version's meta tags as descendants of the meta tag.
- Every HTML7 page would be expected to include these redundant definitions for backwards compatibility, making for a waste of bandwidth.
Force all metadata tags to go in the <head> section.
- Ideally, metadata should be applicable to any scope and any tag should be able to be its own scope for metadata instructions.
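
The old-browser behavior described in this section can be imitated with Python's standard `html.parser` module. This is only a rough sketch: the class `OldBrowser`, its `KNOWN_TAGS` set, and the contents of the `<foobar>` tag are all invented for the demonstration, and a real browser does far more than collect text.

```python
from html.parser import HTMLParser


class OldBrowser(HTMLParser):
    """Toy renderer that knows only a few tags and shows all character data."""

    KNOWN_TAGS = {"html", "body", "p", "b"}  # assumption: an arbitrary "old" tag set

    def __init__(self):
        super().__init__()
        self.displayed = []     # text the user would see
        self.ignored_tags = []  # tags the browser did not recognize

    def handle_starttag(self, tag, attrs):
        if tag not in self.KNOWN_TAGS:
            self.ignored_tags.append(tag)  # unknown tag: skip the tag itself

    def handle_data(self, data):
        # Character data is displayed no matter which tag surrounds it.
        if data.strip():
            self.displayed.append(data.strip())


def render(markup):
    browser = OldBrowser()
    browser.feed(markup)
    browser.close()
    return browser.displayed, browser.ignored_tags


shown, ignored = render("<p>Hello</p><foobar>prefetch /next.html</foobar><p>World</p>")
print(shown)
print(ignored)
```

The unknown tag is dropped but its contents end up in the displayed text, which is the failure mode described above.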

3. Collision between anonymous self-closing tags and anonymous closing tags

If anonymous (nameless) tags are allowed in a future version of the language, the structure </> can be read in one of two ways:

A self-closing tag that has no name
The closing tag for an earlier opening tag, with the name omitted.

Consider this example:

< attr=value>
< /> 
< />

Possible resolutions, and why they are bad solutions:

Forbid anonymous opening tags (require names for all tags)
- I can't think of a reason this is bad, other than that it would scuttle one of my ideas for changing the language.
Forbid anonymous closing tags
- Anonymous tags now cannot be closed.
Forbid empty anonymous tags
- Adds some complexity to writers.
- Potential to cause unexpected behavior if a tag dynamically becomes anonymous and empty.
Establish <//> or <-> as the anonymous self-closing tag
- Inconsistent with rest of language.
- Authors will use </> regardless.
Change the self-closing tag character to '-'
- Inconsistent with rest of language.
- Breaks backwards compatibility.
Establish "_" as the name of an anonymous tag
- Does not address the problem

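HTML5 largely avoids this problem: in the HTML syntax, a trailing slash on an HTML element is either a no-op (on void elements such as <br>) or ignored (on every other HTML element), so it never closes a tag by itself.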
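
The two readings of </> in this section can be enumerated mechanically. The sketch below is a toy model, not a real parser: the token names `"open"` and `"</>"` and the functions `build` and `well_formed_readings` are invented for illustration. It tries every combination of "close" and "empty" for each </> and keeps the combinations that leave no tag open.

```python
from itertools import product


def build(tokens, readings):
    """Build a nested-list tree, reading each "</>" as "close" or "empty".

    Returns None if the chosen readings leave the document ill-formed.
    Any token other than "open" is treated as "</>".
    """
    readings = iter(readings)
    root = []
    stack = [root]
    for tok in tokens:
        if tok == "open":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif next(readings) == "close":
            if len(stack) == 1:
                return None  # nothing left to close
            stack.pop()
        else:
            stack[-1].append([])  # an empty anonymous child
    return root if len(stack) == 1 else None


def well_formed_readings(tokens):
    """Map each consistent assignment of readings to the tree it produces."""
    count = tokens.count("</>")
    results = {}
    for readings in product(("close", "empty"), repeat=count):
        tree = build(tokens, readings)
        if tree is not None:
            results[readings] = tree
    return results


# The three-line example: one anonymous opening tag followed by two "</>".
tokens = ["open", "</>", "</>"]
for readings, tree in well_formed_readings(tokens).items():
    print(readings, tree)
```

Two different assignments survive and they produce different trees (a tag followed by an empty sibling, or a tag containing an empty child), so the input alone does not say which one the author meant.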

4. CDATA ]]> versus JavaScript

CDATA sections end at ]]>, and the ]] part is a common character pair in JavaScript and in any other scripting language that uses [] for array indexing. An expression such as a[b[0]]>c therefore ends the section early.

Possible resolutions, and why they are bad solutions:

Point and laugh at all the JavaScript programmers.
- Does not fix the problem.
Use a different end-of-scope token.
- Breaks backwards compatibility.
- Arbitrary contents will trigger the same problem.
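
Here is a small Python sketch of the failure, along with the usual XML workaround of splitting the terminator across two adjacent CDATA sections. The helper names (`wrap_naive`, `wrap_split`, `read_cdata`) and the sample script are invented for the example, and `read_cdata` only handles CDATA sections sitting back to back at the start of the input.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def wrap_naive(payload):
    """Wrap the payload without checking for an embedded terminator."""
    return CDATA_OPEN + payload + CDATA_CLOSE


def wrap_split(payload):
    """Close the section after each "]]" and reopen it before the ">"."""
    safe = payload.replace(CDATA_CLOSE, "]]" + CDATA_CLOSE + CDATA_OPEN + ">")
    return CDATA_OPEN + safe + CDATA_CLOSE


def read_cdata(markup):
    """Join the contents of consecutive leading CDATA sections.

    Returns (content, rest), where rest is whatever follows the last section.
    """
    parts = []
    pos = 0
    while markup.startswith(CDATA_OPEN, pos):
        start = pos + len(CDATA_OPEN)
        end = markup.index(CDATA_CLOSE, start)  # the first terminator wins
        parts.append(markup[start:end])
        pos = end + len(CDATA_CLOSE)
    return "".join(parts), markup[pos:]


script = "if (limits[ids[0]]>max) alert('over');"

broken = read_cdata(wrap_naive(script))
intact = read_cdata(wrap_split(script))
print(broken)
print(intact)
```

The naive wrapper loses everything after the first ]]> and leaks the remainder into the document. The split version round-trips, but only because the writer rewrote the payload, which is a workaround rather than a fix to the language.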

Some of the changes that I would make to HTML:

Make comments nest.
Make line breaks a character &br; rather than a tag <br>.
Separate lexing and parsing to allow parsing to be done in parallel. In plain terms, the lexer should not need to know the name of the tag it is inside to recognize where that tag ends. In practice, this would mean using an XML-like strict syntax rather than HTML5's developer-friendly, error-tolerant approach. Browser developers could always choose to accept a more lenient syntax, and web servers could automatically tidy HTML code before sending it. In general, the language syntax should be completely separate from the concept of what the tags are meant to represent.
Any tag should be able to have a src attribute. This will end the web-breaking JavaScript circus act that "HTML5" sites perform to cache data locally and reduce server processing time. To optimize traffic flows, the HTTP spec can be modified to send timestamps for attached files in a HEAD response and to send multiple files in response to a single GET.
Allow tag inheritance through an "aka" attribute:
<p attr="value" aka="fred" />
<fred>This is a paragraph with predefined attributes</fred>
CSS should be considered a language for mass-assigning attributes to HTML tags. The styles should be considered official attribute sets. Some of the deprecated style tags should be brought back as tags that are guaranteed to have certain style attributes set to standards.
Canvas should be an optional plugin with its own standard scripting interface. Browsers should not be required by the HTML spec to support it.
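
The point above about separating lexing from parsing can be illustrated with a toy lexer in Python. This is a sketch under simplifying assumptions: the `TOKEN` pattern ignores comments, quoted attribute values, and everything else a real lexer handles, and the function names are made up. The first function never looks at tag names. The second has to remember that it is inside `<script>` to find where the script ends, which is the coupling the proposal wants to remove.

```python
import re

# A tag is "<", anything except angle brackets, then ">"; text is a run without "<".
TOKEN = re.compile(r"<[^<>]*>|[^<]+")


def lex_context_free(markup):
    """Split markup into tags and text without tracking which tag is open."""
    return TOKEN.findall(markup)


def lex_with_raw_text(markup, raw_text_tags=("script",)):
    """Same tokens, but the lexer must know tag names to skip raw-text content."""
    tokens = []
    pos = 0
    lowered = markup.lower()
    while pos < len(markup):
        match = TOKEN.match(markup, pos)
        if match is None:  # stray "<" that never closes: keep the rest as text
            tokens.append(markup[pos:])
            break
        token = match.group()
        tokens.append(token)
        pos = match.end()
        if token.startswith("<") and not token.startswith("</"):
            words = token[1:-1].split()
            name = words[0].lower() if words else ""
            if name in raw_text_tags:
                # Everything up to the matching end tag is opaque text.
                end = lowered.find("</" + name, pos)
                if end == -1:
                    end = len(markup)
                if end > pos:
                    tokens.append(markup[pos:end])
                pos = end
    return tokens


html_style = "<script>if (a<b && c>d) run();</script>"
strict_style = "<script>if (a&lt;b) run();</script>"

print(lex_context_free(html_style))
print(lex_with_raw_text(html_style))
print(lex_context_free(strict_style))
```

On the HTML-style input the context-free lexer mistakes `<b && c>` for a tag, and only the name-aware lexer gets it right. On the strict input, where the author escaped the `<`, the context-free lexer is already correct, so it needs no parser state and could in principle be run on separate chunks of a document.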
