Converting an XML File to be Well-Formed

Learn How to Write Well-Formed and Valid XML

Well-Formed

A well-formed XML document follows a few specific rules:

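  • The XML declaration, when present, must come first in the document, before any other content (including whitespace). XML 1.0 recommends a declaration but does not strictly require one.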
  • Comments are not valid within a tag. Comments may not contain two hyphens in a row, other than at the beginning and end of the comment.
  • Every start tag must have a matching end tag, or be closed within the tag itself as a singleton (empty-element) tag, for example <br />.
  • All attribute values must be quoted, preferably with double quotes unless the value itself contains a double quote.
  • Every XML document must contain one element that completely contains all the other elements.
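
The rules above can be checked mechanically. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree parser to test one small snippet per rule. The snippets and the helper name is_well_formed are made up for illustration; they are not taken from the AML document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, each breaking exactly one of the rules listed above.
broken = {
    "declaration not first": ' <?xml version="1.0"?><a/>',
    "comment inside a tag": '<a <!-- note --> />',
    "double hyphen in comment": '<a><!-- bad -- comment --></a>',
    "missing end tag": '<a><b></a>',
    "unquoted attribute": '<a id=1/>',
    "two top-level elements": '<a/><b/>',
}

# A snippet that follows every rule.
good = '<?xml version="1.0"?><a id="1"><b/><!-- fine --></a>'

results = {name: is_well_formed(text) for name, text in broken.items()}
```

Every entry in results should come back False, while is_well_formed(good) returns True. A non-validating parser like this one only checks well-formedness; it says nothing about validity.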

The document needs only two changes to meet these rules:

  1. The first thing that the AML document needs is an XML declaration statement, such as <?xml version="1.0"?>, at the very top.
  2. The other problem is that no single element completely encloses all the other elements. To fix this, I'll add an outer container element that wraps the rest of the document.

Making those two simple changes (and ensuring that the elements contain only character data, with any literal < or & escaped) will turn the document into a well-formed one.
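
As a rough illustration of those two changes, the sketch below prepends an XML declaration and wraps a fragment in a single container element, then parses the result to confirm it is well-formed. The element names (newsletter, title, section) and the helper wrap_in_root are hypothetical stand-ins, since the real AML tag names are not shown here.

```python
import xml.etree.ElementTree as ET

XML_DECLARATION = '<?xml version="1.0"?>'


def wrap_in_root(fragment: str, root: str = "newsletter") -> str:
    """Prepend an XML declaration and wrap the fragment in one root element."""
    return f"{XML_DECLARATION}\n<{root}>\n{fragment}\n</{root}>"


# Hypothetical fragment: two sibling elements with no common parent,
# so on its own it is not a well-formed document.
fragment = "<title>Web Writer</title>\n<section>Links</section>"

fixed = wrap_in_root(fragment)

# ET.fromstring raises ParseError if the text is still not well-formed.
root = ET.fromstring(fixed)
child_tags = [child.tag for child in root]
```

Parsing the bare fragment raises a ParseError, because it has two top-level elements. Parsing the wrapped text succeeds and yields one root element with the original elements as its children.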

Now Make it Valid

A valid XML document is validated against a Document Type Definition (DTD) or XML Schema. These are sets of rules, created by the developer or a standards organization, that define the allowed structure of the XML document: which elements and attributes may appear, and how they may be nested. A validating parser uses them to check the markup.

Because the About Markup Language (AML) is not a standard XML language like XHTML or SMIL, its DTD would be created by the developer.

That DTD would most likely be on the same server as the XML document, and referenced at the top of the document.

Before you start developing a DTD or Schema for your documents, you should realize that simply by being well-formed, an XML document is self-describing, and thus doesn't need a DTD.

For example, the tag names in our well-formed AML document describe the data they hold.

If you are familiar with the Web Writer newsletter, you may recognize the different sections of the newsletter in those tag names. This makes it very easy to create new XML documents using the same standard format. I know that I would always put the full long title in one tag, and the first section URL in another.
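
To illustrate the self-describing idea, here is a small sketch that lists the tag names found in a well-formed document without consulting any DTD. The document and its tag names (newsletter, title, section_url) are invented for this example; they only mimic the kind of names the AML document might use.

```python
import xml.etree.ElementTree as ET

# Hypothetical AML-like document; the tag names are assumptions.
doc = """<?xml version="1.0"?>
<newsletter>
  <title>Web Writer</title>
  <section_url>http://example.com/section1</section_url>
</newsletter>"""


def tag_names(text: str) -> list:
    """Return each distinct tag name once, in document order."""
    seen = []
    for element in ET.fromstring(text).iter():
        if element.tag not in seen:
            seen.append(element.tag)
    return seen


names = tag_names(doc)
```

The names alone already tell a reader (or a program) what each piece of data is, which is why a DTD is optional for many uses.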

DTDs

If you are required to write a valid XML document, either to use the data or to process it, you would reference the DTD in your document with the <!DOCTYPE> declaration. In this declaration, you name the root element of the document and give the location of the DTD (usually a Web URI).

For example, the general form is:

    <!DOCTYPE root-element SYSTEM "URI-of-the-DTD">

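One nice thing about DOCTYPE declarations is that the SYSTEM keyword lets you point to a DTD by URI, including one that is local to the system where the XML document lives. You can also point to a public DTD with the PUBLIC keyword, as an HTML 4.0 document does:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">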

When you use both, you are telling the parser which specific DTD to use (the public identifier) and where to find it (the system identifier).
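
To see how a parser reads these identifiers, the sketch below uses the StartDoctypeDeclHandler callback of Python's standard xml.parsers.expat module to pull the root name, system identifier, and public identifier out of a DOCTYPE declaration. The file name aml.dtd and the root element newsletter are assumptions made for illustration. The second sample uses the XHTML 1.0 Strict DOCTYPE.

```python
import xml.parsers.expat


def doctype_info(text: str) -> dict:
    """Parse text and return the DOCTYPE root name and identifiers."""
    info = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info.update(root=name, system=system_id, public=public_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)  # True marks this as the final chunk
    return info


# Hypothetical AML document that references a DTD by system identifier only.
system_only = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE newsletter SYSTEM "aml.dtd">\n'
    '<newsletter/>'
)

# XHTML 1.0 Strict: a public identifier followed by a system identifier.
public_and_system = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html/>'
)

aml = doctype_info(system_only)
xhtml = doctype_info(public_and_system)
```

Expat is a non-validating parser: it reports the identifiers but does not fetch the DTD or check the document against it.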

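Finally, you can include an internal DTD directly in the document, within the DOCTYPE declaration, between square brackets. For example (this is not a complete DTD for the AML document):

<!DOCTYPE root-element [
   (element and attribute declarations go here)
]>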
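
As a hedged sketch of what an internal DTD can look like, the example below embeds three element declarations in a document and uses the ElementDeclHandler callback of Python's standard xml.parsers.expat module to list the declared element names. The element names and content models are invented for illustration and are not the real AML DTD.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD; names are assumptions.
doc = """<?xml version="1.0"?>
<!DOCTYPE newsletter [
  <!ELEMENT newsletter (title, section+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT section (#PCDATA)>
]>
<newsletter>
  <title>Web Writer</title>
  <section>Links</section>
</newsletter>"""


def declared_elements(text: str) -> list:
    """Return the element names declared in the document's internal DTD."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once per <!ELEMENT ...> declaration; the content model is ignored.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    parser.Parse(text, True)
    return names


declared = declared_elements(doc)
```

Note that expat only reads the declarations. It does not enforce them, so checking that the document actually obeys the DTD still requires a validating parser.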

XML Schema

To create a valid XML document, you can also use an XML Schema document to define your XML. An XML Schema is itself an XML document that describes the structure of other XML documents.

Note

Just pointing to a DTD or XML Schema is not enough. The XML that is in the document must follow the rules in the DTD or Schema. Using a validating parser is a simple way to check that your XML follows those rules. You can find many such parsers online.
