Meta Charset Tag in HTML5

Setting Character Encoding in HTML5

Prior to the introduction of HTML5, setting the character encoding on a document with a meta element required you to write the somewhat verbose line seen below. This is the meta charset declaration you would use in an HTML4 web page:

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">

What is important to notice in this code are the quotation marks you see around the content attribute: content="text/html; charset=iso-8859-1".

As with all HTML attributes, the quotation marks delimit the value of the attribute, indicating that the entire string text/html; charset=iso-8859-1 is the value of the content attribute. This is proper HTML and it is how this string was meant to be written. It is also unwieldy, long, and ugly! It's also not something you would likely remember off the top of your head! In most cases, web developers would have to copy and paste this code from one site into any new one they were developing because writing it from scratch was asking a lot.

HTML5 Cuts Out the Extra "Stuff"

HTML5 not only added a number of new elements to the language, but it also greatly simplified much of the syntax of HTML, including the meta charset element. With HTML5, you can declare your character encoding with the much easier-to-remember syntax for the meta element that you see below:

<meta charset="utf-8">

Compare that simplified syntax to what we wrote at the start of this article, the old syntax used for HTML4, and you will see how much easier to write and remember the HTML5 version really is.
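To make the comparison concrete, here is a minimal Python sketch that reads the declared encoding from either form of the tag using only the standard library. The class name CharsetFinder and the function declared_charset are hypothetical names chosen for this illustration, and this is a simplified reading of the tag, not the detection algorithm a browser actually runs.

```python
from email.message import Message
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the first character encoding declared by a meta element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if attributes.get("charset"):
            # HTML5 form: <meta charset="utf-8">
            self.charset = attributes["charset"].lower()
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # HTML4 form: the charset is a parameter inside the content value.
            content = attributes.get("content")
            if content:
                header = Message()
                header["Content-Type"] = content
                self.charset = header.get_content_charset()


def declared_charset(html_text):
    """Return the lowercased declared charset, or None if none is found."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset


html4 = '<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">'
html5 = '<meta charset="utf-8">'
print(declared_charset(html4))
print(declared_charset(html5))
```

Both tags carry the same kind of information. The HTML5 form simply states it directly instead of wrapping it inside a content type string.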

Instead of needing to copy and paste this from an existing site into any new one you are working on, this is absolutely something that, as a front-end web developer, you can remember. These time savings may not be much, but when you consider the other syntax areas that HTML5 simplified, the savings do add up!

Always Include the Character Encoding

You should always include a character encoding for your web pages, even if you never intend to use any special characters. If you do not include a character encoding, your site can become vulnerable to a cross-site scripting attack using UTF-7 in browsers that still support that encoding.

In this scenario, an attacker sees that your site has no character encoding defined and tricks the browser into thinking that the character encoding of the page is actually UTF-7. Next, the attacker injects UTF-7 encoded scripts into the web page, and your site is hacked. This is obviously problematic for everyone involved, from your company to your visitors. The good news is that it is a simple problem to avoid: just be sure to add a character encoding to all your webpages.
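The following Python sketch shows why the interpretation of the bytes matters. It decodes the same ASCII bytes twice, once as UTF-8 and once as UTF-7, using Python's built-in codecs. The payload is a made-up illustration, not taken from a real attack, and the sketch only demonstrates the decoding step, not how a browser would be misled.

```python
# Hypothetical payload: plain ASCII bytes that a filter looking for
# angle brackets would not flag.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Read as UTF-8, the bytes stay harmless literal text.
as_utf8 = payload.decode("utf-8")

# Read as UTF-7, the "+ADw-" and "+AD4-" sequences become "<" and ">".
as_utf7 = payload.decode("utf-7")

print(as_utf8)  # +ADw-script+AD4-alert(1)+ADw-/script+AD4-
print(as_utf7)  # <script>alert(1)</script>
```

Declaring the encoding explicitly removes the ambiguity, because the page no longer leaves it to guesswork how these bytes should be read.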

Where to Add Character Encoding

The character encoding for a webpage should be the first line of your HTML's <head> element. This ensures that the browser knows what the character encoding is before it does anything else on the page other than to determine the doctype and identify that it is an HTML page. Your HTML should read:

<!doctype html>
<meta charset="UTF-8">
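One practical reason to place the declaration first is that the HTML Standard requires the meta element that declares the encoding to fall entirely within the first 1024 bytes of the document. The Python sketch below is a rough check for that, assuming a simple regular expression is good enough to spot the tag. The function name charset_declared_early is hypothetical, and a real parser would be more thorough.

```python
import re

# Number of leading bytes in which the declaring meta element must fit.
PRESCAN_LIMIT = 1024

# Matches a complete meta tag that mentions charset=, in either the
# HTML5 form or the older http-equiv form.
META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=[^>]*>", re.IGNORECASE)


def charset_declared_early(document_bytes):
    """Return True if a charset declaration fits within the first 1024 bytes."""
    return META_CHARSET.search(document_bytes[:PRESCAN_LIMIT]) is not None


early = b'<!doctype html>\n<meta charset="UTF-8">\n<title>Example</title>'
late = b"<!doctype html>\n<!--" + b"x" * 2000 + b'-->\n<meta charset="UTF-8">'

print(charset_declared_early(early))  # True
print(charset_declared_early(late))   # False
```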

Using HTTP Headers for Extra Security

You can also specify the character encoding in the HTTP headers.

This is even more secure than adding it to the HTML page, but you would need to have access to the server configuration or .htaccess files, which means you may need to work with your website's hosting provider to gain this kind of access or have them make the changes for you. Access is really the challenge here. The change itself is simple, so any hosting provider should be able to make it for you with relative ease.

If you are using Apache, you can set the default character set for your entire site by adding AddDefaultCharset UTF-8 to your root .htaccess file. If the directive is set to On rather than to a named character set, Apache uses its internal default of ISO-8859-1.
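Once the server is configured, the encoding appears as a charset parameter in the Content-Type response header. As a minimal sketch, the Python function below extracts that parameter from a header value using the standard library's email.message.Message. The function name charset_from_content_type and the sample header strings are assumptions made for this illustration.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset from a Content-Type value, or None."""
    header = Message()
    header["Content-Type"] = header_value
    return header.get_content_charset()


# Sample header values as a server might send them.
print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

When fetching a real page with urllib.request, the response's headers object is a subclass of the same Message class, so calling get_content_charset() on it should give the same kind of result.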

Original article by Jennifer Kyrnin. Edited by Jeremy Girard on 8/17/16