What are Markup Languages?

HTML is a Markup Language. Image courtesy J Kyrnin

As you begin exploring the world of web design, you will undoubtedly be introduced to a number of words and phrases that are new to you. One of the terms that you will likely hear is "markup" or perhaps "markup language". How is "markup" different from "code", and why do some web professionals seem to use these terms interchangeably? Let's start by taking a look at exactly what a "markup language" is.

Let's Look at 3 Markup Languages

Nearly every acronym on the Web that has an “ML” in it is a “markup language” (big surprise: that is what the "ML" stands for). Markup languages are the building blocks used to create web pages of all shapes and sizes.

In reality, there are many different markup languages out there in the world. For web design and development, there are three specific markup languages that you will likely run across. These are HTML, XML, and XHTML.

What is a Markup Language?

To define the term properly: a markup language is a language that annotates text so that a computer can manipulate that text. Most markup languages are human readable because the annotations are written in a way that distinguishes them from the text itself. For example, in HTML, XML, and XHTML, markup tags are enclosed by the < and > characters. Any text that appears between those characters is considered part of the markup and not part of the annotated text.
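To make the idea concrete, here is a minimal Python sketch that splits a string into markup and text using only the rule described above: anything from a < to the next > counts as markup. The function name split_markup is hypothetical, and the approach is deliberately naive (it ignores comments and a > inside quoted attribute values), so treat it as an illustration rather than a real parser.

```python
import re

# Anything from "<" up to the next ">" is treated as markup. The capturing group
# makes re.split keep the tags, so the result alternates text, tag, text, ...
TAG = re.compile(r"(<[^>]*>)")


def split_markup(source: str) -> list[tuple[str, str]]:
    """Return (kind, piece) pairs, where kind is "markup" or "text"."""
    pieces = []
    for index, piece in enumerate(TAG.split(source)):
        if not piece:
            continue  # skip empty strings re.split leaves at the ends and between adjacent tags
        # Captured tags always land at odd positions in the split result.
        kind = "markup" if index % 2 == 1 else "text"
        pieces.append((kind, piece))
    return pieces


for kind, piece in split_markup("<p>This is a paragraph of text written in HTML</p>"):
    print(f"{kind:6} {piece!r}")
```

Run on the paragraph example, this reports the opening tag and closing tag as markup and the sentence between them as text.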

For example:

<p>
This is a paragraph of text written in HTML
</p>

This example is an HTML paragraph. It is made up of an opening tag (<p>), a closing tag (</p>), and the actual text that would be displayed on screen (the text contained between the two tags). Each tag begins with a "less than" symbol and ends with a "greater than" symbol to mark it as part of the markup.
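If you want to see how software reads this example, Python's standard-library html.parser module reports each piece of a document as a separate event. In the sketch below, the EventPrinter class is our own illustrative name, not part of the library; it simply records the opening tag, the text, and the closing tag of the paragraph above.

```python
from html.parser import HTMLParser


class EventPrinter(HTMLParser):
    """Records what Python's built-in HTML parser reports for each piece of a document."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start tag", tag))

    def handle_endtag(self, tag):
        self.events.append(("end tag", tag))

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only runs such as the line breaks around the text
            self.events.append(("text", text))


parser = EventPrinter()
parser.feed("<p>\nThis is a paragraph of text written in HTML\n</p>")
parser.close()
for kind, value in parser.events:
    print(f"{kind:9} {value}")
```

Notice that the parser hands back the tag name "p" without the < and > characters: those delimiters exist only to separate the markup from the text.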

When you format text to be displayed on a computer or other device screen, you need to distinguish between the text itself and the instructions for the text. The "markup" is the instructions for displaying or printing the text.

Markup doesn’t have to be computer readable. Annotations done in print or in a book are also considered markup. For example, many students highlight certain phrases in their textbooks. This indicates that the highlighted text is more important than the surrounding text. The highlighting is the markup.

Markup becomes a language when rules are codified around how to write and use that markup. That same student could have their own “note-taking markup language” if they codified rules like “purple highlighter is for definitions, yellow highlighter is for exam details, and pencil notes in the margins are for additional resources.”

Most markup languages are defined by an outside authority for use by many different people. This is how the markup languages for the Web work. They are defined by the W3C, or World Wide Web Consortium.

HTML—HyperText Markup Language

HTML or HyperText Markup Language is the primary language of the Web and the most common one you will work with as a web designer/developer.

In fact, it may be the only markup language you use in your work.

All web pages are written in a flavor of HTML. HTML defines the way that images, multimedia, and text are displayed in web browsers. This language includes elements to connect your documents (hypertext) and make your web documents interactive (such as with forms). Many people call HTML "website code", but in truth it is really just a markup language. Neither term is strictly wrong and you will hear people, including web professionals, use these two terms interchangeably.  
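To illustrate the "hypertext" part, this sketch uses Python's html.parser module to pull the link targets out of <a> elements, the tags that connect one document to another. The LinkCollector class and the file names markup.html and xml.html are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element: the hypertext links between documents."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for a bare attribute.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<p>Read <a href="markup.html">about markup</a> or <a href="xml.html">about XML</a>.</p>')
collector.close()
print(collector.links)
```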

HTML is a defined standard markup language. It was originally based on SGML (Standard Generalized Markup Language). It is a language that uses tags to define the structure of your text. Tags are delimited by the < and > characters.

While HTML is by far the most popular markup language used on the Web today, it is not the only choice for web development.

As HTML developed, it grew more and more complicated, and tags for style and tags for content were mixed together in one language. Eventually, the W3C decided that the style of a web page needed to be separated from its content. Tags that define content would remain in HTML, while tags that define style were deprecated in favor of CSS (Cascading Style Sheets).

The newest numbered version of HTML is HTML5. This version added more features into HTML and removed some of the strictness that was imposed by XHTML (more on that language shortly). 

The way that HTML is released has been altered with the rise of HTML5. Today, new features and changes are added without there needing to be a new, numbered version released. The latest version of the language is simply referred to as "HTML."

XML—eXtensible Markup Language

XML, the eXtensible Markup Language, is the language on which another version of HTML (XHTML) is based. Like HTML, XML is based on SGML. It is simpler than SGML and stricter than plain HTML. XML provides the extensibility to create many different languages.

XML is a language for writing markup languages. For example, if you are working on genealogy, you might use XML to create tags that define the father, mother, daughter, and son, like this: <father>, <mother>, <daughter>, <son>. There are also several standardized languages already created with XML: MathML for defining mathematics, SMIL for working with multimedia, XHTML, and many others.
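As a sketch of that genealogy idea, the Python snippet below parses a small document written with the invented tags, using the standard-library xml.etree.ElementTree module. The <family> root element and the names inside it are hypothetical; XML supplies only the rules, and the vocabulary is yours.

```python
import xml.etree.ElementTree as ET

# A hypothetical genealogy vocabulary: XML itself defines no <father> or <son>
# element, so these tag names are our own invention.
FAMILY_XML = """
<family>
  <father>John</father>
  <mother>Mary</mother>
  <daughter>Anne</daughter>
  <son>Peter</son>
</family>
"""

root = ET.fromstring(FAMILY_XML.strip())
# Iterating over an element yields its child elements in document order.
members = {child.tag: child.text for child in root}
print(members)
```

Because XML only checks that the tags are well formed, any program that reads this file has to agree on what <father> and <son> mean; that shared agreement is what turns a set of tags into a markup language.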

XHTML—eXtensible HyperText Markup Language

XHTML 1.0 is HTML 4 reformulated to meet the XML standard. In modern web design, XHTML has been replaced by HTML5 and the changes that have come since. You are unlikely to find newer sites using XHTML, but if you are working on a much older site, you may still encounter XHTML out there in the wild.

There aren't a lot of major differences between HTML and XHTML, but here is what you will notice:

  • XHTML is written in lowercase. While HTML tags can be written in UPPER case, MiXeD case, or lower case, XHTML tags must be all lowercase. (Note: many web professionals write HTML in all lowercase, even though it is not technically required.)
  • All XHTML elements must have an end tag. Elements with only one tag, such as <hr> and <img>, need a closing slash (/) at the end of the tag:
    <hr />
    <img />
  • All attribute values must be quoted in XHTML. Some people remove the quotes around attribute values in HTML to save space, but they are required for correct XHTML.
  • XHTML requires that tags are nested correctly. If you open a bold (<b>) element and then an italics (<i>) element, you must close the italics element (</i>) before you close the bold element (</b>). (Many authors prefer <strong> and <em> to these visual elements; HTML5 kept <b> and <i> but gave them new, non-presentational meanings.)
  • XHTML attributes must have a name and a value. Attributes that stand alone in HTML must also be given values; for example, the noshade attribute of the <hr> element would be written noshade="noshade".
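Because XHTML is XML, a plain XML parser will reject many of the mistakes listed above. This Python sketch uses xml.etree.ElementTree to test small fragments for well-formedness. It is an illustration under a stated assumption, not an XHTML validator: it does not check anything against the XHTML DTD, so it catches unclosed tags, mis-nested tags, unquoted or valueless attributes, and start and end tags whose case does not match, but it would accept an all-uppercase element such as <P></P>. The function name is_well_formed and the file name logo.png are placeholders.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML (not full XHTML validation)."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


FRAGMENTS = [
    '<p>Closed <hr /> empty element</p>',       # empty element closed with a slash
    '<p>Unclosed <hr> empty element</p>',       # <hr> never closed
    '<p><b><i>Correct nesting</i></b></p>',     # inner element closed first
    '<p><b><i>Overlapping tags</b></i></p>',    # </b> arrives while <i> is still open
    '<img src="logo.png" alt="Logo" />',        # all attribute values quoted
    '<img src=logo.png alt="Logo" />',          # unquoted attribute value
    '<hr noshade="noshade" />',                 # attribute given a value
    '<hr noshade />',                           # stand-alone attribute
    '<p>Case mismatch</P>',                     # XML names are case-sensitive
]

for fragment in FRAGMENTS:
    verdict = "well-formed" if is_well_formed(fragment) else "parse error"
    print(f"{verdict:12} {fragment}")
```

A browser reading an ordinary HTML page is far more forgiving and will quietly recover from most of these errors, which is one reason the stricter XHTML rules never caught on widely.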
     

Original article by Jennifer Kyrnin. Edited by Jeremy Girard on 7/5/17.