Arc Forum
Anyone have strong feelings about the 'para function missing a closing tag?
4 points by zck 119 days ago | 5 comments
I'm doing some reworking of my static site generator, and I realized I have different opinions about the <p> html tag than pg does.

Basically, the closing </p> tag can sometimes be left out (see "tag omission" on https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p). But Anarki always leaves it out (https://github.com/arclanguage/anarki/blob/master/lib/html.arc#L275).

I'd like the behavior better if, when given a body (e.g., (para "hi there") ), it output a closing tag (<p>hi there</p>), and when not given a body ( (para) ), it always self-closed the tag (<p />).

Any objections to this? I know I'll have to rework some parts of html.arc to allow for self-closed tags (<p />), but it's something I'd like to have.



4 points by krapp 116 days ago | link

I support updating html.arc in general, particularly to bring it more in line with HTML 5. I think I added support for data attributes at some point, but some things like an async attribute for script tags are still difficult, because it expects all attributes to have a value.
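To make the valueless-attribute problem concrete, here is a minimal sketch in Python (not Arc, and not how html.arc is written). The names render_attrs and script_tag are hypothetical, and treating a value of True as "emit the attribute name alone" is just one possible convention.

```python
from html import escape


def render_attrs(attrs):
    """Render a dict of attributes as an HTML attribute string.

    Assumed convention: True means a valueless (boolean) attribute such as
    async; False or None means the attribute is left out entirely.
    """
    parts = []
    for name, value in attrs.items():
        if value is True:
            parts.append(name)  # bare attribute name, no ="..."
        elif value is False or value is None:
            continue  # omit the attribute
        else:
            parts.append(f'{name}="{escape(str(value), quote=True)}"')
    return " ".join(parts)


def script_tag(attrs):
    """Build a <script> element from a dict of attributes."""
    rendered = render_attrs(attrs)
    return f"<script {rendered}></script>" if rendered else "<script></script>"


print(script_tag({"src": "app.js", "async": True}))
# prints: <script src="app.js" async></script>
```

A dict is used instead of keyword arguments because async is a reserved word in Python.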

In a perfect world, though, it would use a structure like Racket's html parser[0], and could be used without namespace pollution.

In a more perfect world (to me), Arc would support XML natively. But I'm probably the only person here who would want to write HTML directly in an Arc script.

[0]https://docs.racket-lang.org/html-parsing/index.html

-----

2 points by akkartik 119 days ago | link

I think if your proposal is always strictly better than not closing, then I support it. Programming in Arc tries to take away the need to read raw html, so we can handle a little extra verbosity in the emitted code in some situations.

-----

2 points by zck 118 days ago | link

I think it is, according to the spec, but I don't do enough frontend work to really know.

And I find it actually easier to read, because it's properly nested with a closing tag. And a self-closed tag lets you know there's no body, which is also a plus.

-----

2 points by rocketnia 116 days ago | link

"And a self-closed tag lets you know there's no body, which is also a plus."

A <p> tag does have a body. HTML like this:

  <body>
    <p>First paragraph
    <p>Second paragraph
  </body>

Is treated basically like this:

  <body>
    <p>First paragraph
    </p><p>Second paragraph
  </p></body>

If instead you write:

  <body>
    <p />First paragraph
    <p />Second paragraph
  </body>

Then... Well, I should try to be precise....

It looks like the HTML specification defines this as a "non-void-html-element-start-tag-with-trailing-solidus parse error." The spec says that in this case, "The parser behaves as if the U+002F (/) is not present," but also that "[browsers] may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification."

I don't know of any browsers that abort the parsing altogether, so it's still reliable to write the HTML that way.

However, the similarity to XML is actively misleading in this case. When you process that document as HTML, you still get structure like this:

  <body>
    <p>First paragraph
    </p><p>Second paragraph
  </p></body>

But when you process it as XML, you get structure like this:

  <body>
    <p></p>First paragraph
    <p></p>Second paragraph
  </body>

So if you're trying to write a polyglot HTML/XML document, self-closing <p /> tags still probably aren't a great option. Closing the paragraphs explicitly, like so, makes it clearer how the structure will end up:

  <body>
    <p>First paragraph</p>
    <p>Second paragraph</p>
  </body>
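The XML half of this can be checked with Python's standard library. This is only a sketch: xml.etree.ElementTree is an XML parser, so it shows how an XML tool sees the two spellings, not how a browser's HTML parser does. The describe helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

self_closed = "<body><p />First paragraph<p />Second paragraph</body>"
explicit = "<body><p>First paragraph</p><p>Second paragraph</p></body>"


def describe(source):
    """Return (text inside <p>, text following <p>) for each paragraph."""
    root = ET.fromstring(source)
    return [(p.text, p.tail) for p in root.findall("p")]


print(describe(self_closed))
# prints: [(None, 'First paragraph'), (None, 'Second paragraph')]
print(describe(explicit))
# prints: [('First paragraph', None), ('Second paragraph', None)]
```

With <p />, the paragraphs are empty and the text sits after them as siblings. With explicit closing tags, the text is inside the paragraphs.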
---

I think modern HTML does have a reliable common subset with XML. Modern HTML treats <br></br> and <p /> as parse errors, but it treats <br /> and <p></p> as valid. To write HTML/XML polyglot content, you just need to pay attention to whether you're dealing with a void element like "br" or a non-void element like "p".
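As a rough sketch of that rule in Python: emit <name /> for void elements and an explicit open/close pair for everything else. The element function is a hypothetical helper, the void-element set is the list from the current HTML spec as I understand it, and escaping the body as plain text is a simplification (no nested markup).

```python
from html import escape

# Void elements per the HTML spec; these never have a closing tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def element(name, body=""):
    """Serialize one element in a form valid as both HTML and XML."""
    if name in VOID_ELEMENTS:
        if body:
            raise ValueError(f"<{name}> is a void element and cannot have a body")
        return f"<{name} />"  # valid in HTML, self-closed in XML
    # Non-void elements always get an explicit closing tag, even when empty.
    return f"<{name}>{escape(body, quote=False)}</{name}>"


print(element("br"))             # prints: <br />
print(element("p"))              # prints: <p></p>
print(element("p", "hi there"))  # prints: <p>hi there</p>
```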

Incidentally, why use an HTML/XML polyglot at all? There are at least a few situations where it can make sense:

- You're serving it as HTML, but (at least someday) you might want to use an XML-processing tool on it or serve it as XHTML.

- You're trying to serve it as XHTML, but you're worried you'll mess up your server configuration and serve it as HTML by mistake.

- You're confident you can serve it as XHTML today, but you have a backup plan to serve it as HTML if needed. In particular, you're afraid someday your XHTML will be invalid due to a bug in your code, a bug in a browser, an intentional spec violation in a browser (e.g. for security or user privacy), or a backwards-incompatible change in the spec. The XHTML spec dictates that an invalid page won't be displayed at all, so if you end up with invalid XHTML for any of those reasons, your site will be rather unusable until you can implement a fix. If that happens at a time you're not ready to drop everything and look at the bug in depth, then you can make a pretty quick switch to serving it as HTML, and most of the page will display again.

Because of the brittle handling of errors, XHTML still hasn't really gotten off the ground. So it seems like the primary value of the HTML/XML polyglot is to serve a document as HTML but use XML-processing tools on it behind the scenes.

---

A side note...

In the very early days of XML and XHTML, when people were trying to make their HTML pages as XML-like as possible, many browsers would interpret something like <br/> as an element with the tag name "br/". That's why people got into the habit of putting in a space like <br />. That way those browsers would instead interpret the / as an attribute named "/", which was mostly harmless. Nowadays, the space is pretty much vestigial and you can just write <br/> if you want to.

-----

3 points by zck 115 days ago | link

Yeah, when I dug more into the spec, I found out that <p> tags can't be self-closed. Only void and foreign elements can be self-closed (https://html.spec.whatwg.org/multipage/syntax.html#start-tag...).

The closing </p> can be omitted if the next tag is one of 25 different tags (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p, see "tag omission").

What I ended up coding was that the (para) call will always add a closing tag. This is more consistent with the spec -- as far as I can tell, the closing tag is never required to be omitted.
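The resulting behavior looks roughly like this Python stand-in. It is not the Arc code from the static site generator or from html.arc, and it returns a string, which is an assumption made for illustration.

```python
def para(*body):
    """Wrap the given strings in <p>...</p>, always emitting the closing tag."""
    return "<p>" + "".join(body) + "</p>"


print(para("hi there"))  # prints: <p>hi there</p>
print(para())            # prints: <p></p>
```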

-----