HTML Shorthand

HTML by itself is a very verbose document language, with lengthy opening and closing tags and detailed attribute descriptions. This makes writing HTML tedious and error-prone. Many web authors resort to markup languages like retext or Markdown. However, such markup languages cover only the most common subset of HTML functionality, limiting an author's ability to define complex document structures, class attributes, and even some HTML tags.

Enter Shorthand

Shorthand aims to be a simplified notation of HTML. It lets you enter arbitrary tags and attributes, without the unnecessary verbosity of plain HTML. It lets you build the full document structure including the HTML header, empty elements, wrappers, input forms, and so on.

Shorthand is written as a sed script. A sed interpreter is available on all POSIX-compliant Unices, including GNU/Linux and BusyBox-based systems.

Shorthand is developed as part of cgilite, but can be used standalone.

Download

You can either clone the entire cgilite repository, or just download the plain script file:

$ git clone https://git.plutz.net/git/cgilite

File Download

How it works

Here is an example

[!DOCTYPE HTML]
[html [head
  [title Example]
  [link rel="stylesheet" type="text/css" href="style.css"]
][body
  [h1 Headline]
  [p You can also define classes and IDs]
  [form #search method="POST" action="/search"
     [input name="q" placeholder="Search"]
  ]
  [pre .code \[p a Shorthand example\]]
]]

Overview

[tag .foo #id .bar  narf="noob"  Content also="content"]

is equivalent to

<tag class="foo bar" id="id" narf="noob">Content also="content"</tag>

meaning

  • [tag] becomes <tag> with everything between brackets as either attribute or content
  • the tag name is followed by optional attributes
  • words of the form key="value" become attributes
  • words starting with dot (.) become class names, but class="classname" is also fine
  • words starting with sharp (#) become the id, but id="id" is also fine
  • the first word that does not look like an attribute starts the content
  • a line break also starts the content (meaning all attributes must be on the same line as the tag name)
  • original HTML tags (<tag>) will still act as valid HTML
  • HTML entities (e.g. &auml;) will also retain their meaning
  • Characters can be escaped with a backslash (\). This applies to angle brackets (< >), square brackets ([ ]), dot, sharp, quotation marks, and ampersand (. # " ' &). Literal backslashes must also be escaped with a backslash (\\ becomes \, or actually &#92;)
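
When text from another source is embedded in shorthand, the escaping rule in the last item suggests a small filter. The following is a minimal sketch and not part of Shorthand itself: the function name escape_shorthand is hypothetical, and it assumes that a backslash in front of \, [, ], <, > and & yields the literal character in running text, as described in the list. Dot, sharp and quotes are left untouched, on the unverified assumption that they only matter in attribute position.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Shorthand.
# Prefixes each \ [ ] < > & on stdin with a backslash so that the
# character is taken literally instead of starting a tag or entity.
# Assumption: . # " ' need no escaping once the content has started.
escape_shorthand() {
  sed 's/[][\\<>&]/\\&/g'
}
```

For example, printf '%s\n' 'see [1] & <2>' | escape_shorthand prints see \[1\] \& \<2\>.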

There are a few simplifications, e.g.

[a "http://example.org" link]

will turn into a link, with the first item in quotes becoming href.

Here is the source code of this document.

That's it.

Usage

The shorthand script reads shorthand notation from stdin and prints HTML to stdout. Text is processed line by line, so it can be streamed from CGI-like programs.

$ html-sh.sed < input.short > output.html

or

$ printf '[html [head [title Title]][body \n %s \n ]]' "$muchtext" |html-sh.sed

Tags that are not closed by the end of the input will automatically be closed.
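
The second invocation above can be wrapped in a reusable shell function. This is only a sketch under stated assumptions: the names render_page and SHORTHAND are hypothetical, and html-sh.sed is assumed to be executable and on the PATH unless SHORTHAND points elsewhere.

```shell
#!/bin/sh
# Hypothetical wrapper; render_page and SHORTHAND are illustrative names.
# Assumes html-sh.sed is executable and on PATH unless SHORTHAND is set.
SHORTHAND="${SHORTHAND:-html-sh.sed}"

render_page() {
  # $1: page title (plain text), $2: page body in shorthand notation
  # The line break after "[body" ends the attribute list of the body tag.
  printf '[!DOCTYPE HTML]\n[html [head [title %s]][body\n%s\n]]\n' "$1" "$2" \
  | "$SHORTHAND"
}
```

A call such as render_page "Example" "[h1 Headline]" would then write the converted page to stdout. The title and body are passed as printf arguments rather than placed in the format string, so percent signs in the text are not interpreted.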