What are Markup Languages?

HTML is a Markup Language. Image courtesy J Kyrnin

If you are new to web design, you may have heard the term "markup" or "markup language" and wondered what it means. How is "markup" different from "code," and why do some web professionals seem to use these terms interchangeably? Let's take a look at exactly what a "markup language" is.

What Do the Abbreviations with “ML” in Them Mean?

Nearly every acronym on the web that has an “ML” in it is a “markup language.” Markup languages are the languages that build the Web.

 

There are many different markup languages, but three that you will likely run across if you are doing web design or development are HTML, XML, and XHTML.

What is a Markup Language?

A markup language is a language that annotates text so that the computer can manipulate the text. Most markup languages are human readable because the annotations are written in a way that distinguishes them from the text. For example, in HTML, XML, and XHTML, the markup tags are delimited by the < and > characters. Any text that appears between those characters is considered part of the markup language and not part of the annotated text. For example:

<p>
this is a paragraph of text written in HTML
</p>

When you format text to be printed (or displayed on a computer or other device screen), you need to distinguish between the text itself and the instructions for printing the text. The markup supplies those instructions for displaying or printing the text.
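
To make that distinction concrete, here is a minimal sketch in Python that pulls the paragraph example above apart into its markup and its text. It relies on the standard library's html.parser module; the MarkupSplitter class and the split_markup function are names made up for this illustration, not part of HTML or of any standard.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collects tags (markup) and character data (text) separately."""

    def __init__(self):
        super().__init__()
        self.markup = []  # tags, in the order they appear
        self.text = []    # non-empty runs of annotated text

    def handle_starttag(self, tag, attrs):
        self.markup.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.markup.append(f"</{tag}>")

    def handle_data(self, data):
        # Skip runs that are only whitespace (such as bare line breaks).
        if data.strip():
            self.text.append(data.strip())


def split_markup(document):
    """Return (markup, text) for an HTML fragment. Hypothetical helper."""
    splitter = MarkupSplitter()
    splitter.feed(document)
    splitter.close()
    return splitter.markup, splitter.text


markup, text = split_markup(
    "<p>\nthis is a paragraph of text written in HTML\n</p>"
)
print(markup)  # ['<p>', '</p>']
print(text)    # ['this is a paragraph of text written in HTML']
```

A browser does something similar on a much larger scale: it reads the tags as instructions and shows only the text to the reader.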

Markup doesn’t have to be computer readable. Annotations made in print or in a book are also markup. For example, many students in school will highlight certain phrases in their textbooks. This indicates that the highlighted text is more important than the surrounding text. The highlight color is considered markup.

Markup becomes a language when rules are codified around how to write and use the markup. That same student could have their own “note taking markup language” if they codified rules like “purple highlighter is for definitions, yellow highlighter is for exam details, and pencil notes in the margins are for additional resources.” But most markup languages are defined by an outside authority for use by many different people. This is how the markup languages for the Web work.

HTML—HyperText Markup Language

HTML, or HyperText Markup Language, is the primary language of the Web. All web pages are written in a flavor of HTML. HTML defines the way that images, multimedia, and text are displayed in web browsers. It includes elements to connect your documents (hypertext) and make your web documents interactive (such as with forms). Many people call HTML "website code," but it is really just a markup language. Neither term is strictly wrong and, as I mentioned previously, you will hear many web professionals use both of these terms.

HTML is a defined standard markup language. That standard was developed by the World Wide Web Consortium (W3C). It is based upon SGML (Standard Generalized Markup Language).

It is a language that uses tags to define the structure of your text. Tags are delimited by the < and > characters.

HTML is no longer the only standard for web development. As HTML was developed, it got more and more complicated, and style and content tags were combined into one language. Eventually, the W3C decided that there was a need for a separation between the style of a web page and the content. A tag that defines the content alone, such as <p>, would remain in HTML, while tags that define style, such as <font>, are deprecated in favor of style sheets.

The newest numbered version of HTML is HTML5. HTML5 adds more features into HTML and removes some of the strictness that was imposed by XHTML, but HTML5 is still a markup language.

The way that HTML is released has been altered with the rise of HTML5.

Today, new features and changes are added without there needing to be a new, numbered version released.

XML—eXtensible Markup Language

The eXtensible Markup Language is the language that XHTML, another version of HTML, is based on. Like HTML, XML is based on SGML. It is less strict than SGML and more strict than plain HTML, and it provides the extensibility to create many different languages.

XML is a language for writing markup languages. For example, if you are working on genealogy, you might use XML to create tags that define the father, mother, daughter, and son, like this: <father> <mother> <daughter> <son>. There are also several standardized languages already created with XML: MathML for defining mathematics, SMIL for working with multimedia, XHTML, and many others.
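
As a rough sketch of how such a home-made language could be used, the Python snippet below reads a small genealogy document with the standard library's xml.etree.ElementTree module. Only the four tag names come from the example above; the <family> wrapper element, the people's names, and the family_roles function are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical family document. The <family> wrapper and the names
# inside the tags are placeholders, not part of any standard.
FAMILY_XML = """
<family>
  <father>Robert</father>
  <mother>Ann</mother>
  <daughter>Clara</daughter>
  <son>Thomas</son>
</family>
"""


def family_roles(xml_text):
    """Map each custom tag name to the text it annotates."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}


roles = family_roles(FAMILY_XML)
print(roles["daughter"])  # Clara
print(sorted(roles))      # ['daughter', 'father', 'mother', 'son']
```

The parser knows nothing about genealogy. It only knows the XML rules, which is why the same tool can read MathML, SMIL, XHTML, or a language you invent yourself.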

XHTML—eXtensible HyperText Markup Language

XHTML 1.0 is HTML 4 reformulated to meet the XML standard. In modern web design, XHTML has been replaced by HTML5 and the changes that have come since. If you are working on a much older site, however, you may still encounter XHTML.

There aren't a lot of major differences between HTML and XHTML:

  • XHTML is written in lower case. While HTML tags can be written in UPPER case, MiXeD case, or lower case, to be correct, XHTML tags must be all lower case.
  • All XHTML elements must have an end tag. Elements with only one tag, such as <hr> and <img>, need a closing slash (/) at the end of the tag:
    <hr />
    <img />
  • All attributes must be quoted in XHTML. Some people remove the quotes around attributes to save space, but they are required for correct XHTML.
  • XHTML requires that tags are nested correctly. If you open a bold (<b>) element and then an italics (<i>) element, you must close the italics element (</i>) before you close the bold (</b>).
  • XHTML attributes must have a name and a value. Attributes that are stand-alone in HTML must be declared with values as well. For example, the noshade attribute of the HR element would be written noshade="noshade".
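
Because XHTML follows the XML rules, most of the points above can be checked with any XML parser. The Python sketch below uses the standard library's xml.etree.ElementTree module; the is_well_formed function, the sample fragments, and the file name photo.png are made up for illustration. Note that this only tests whether a fragment is well-formed XML, so it will not catch upper case tag names as long as the opening and closing tags match each other.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragment written the XHTML way.
xhtml_style = (
    '<div><b><i>text</i></b>'
    '<hr noshade="noshade" /><img src="photo.png" alt="" /></div>'
)

# Hypothetical fragments that each break one rule from the list above.
html_style = {
    "missing closing slash": "<div><hr></div>",
    "unquoted attribute": "<div><img src=photo.png /></div>",
    "wrong nesting": "<div><b><i>text</b></i></div>",
    "attribute without a value": "<div><hr noshade /></div>",
}

print(is_well_formed(xhtml_style))  # True
for rule, fragment in html_style.items():
    print(rule, is_well_formed(fragment))  # False for each fragment
```

Web browsers are far more forgiving than an XML parser when they read ordinary HTML, which is one reason these habits mattered more for XHTML than they do for HTML5.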
     

Original article by Jennifer Kyrnin. Edited by Jeremy Girard on 1/4/17.