MTI TEK
XML | Introduction
  1. Elements, tags, text
  2. XML Rules
  3. Tag Naming Rules
  4. Attributes
  5. Comments
  6. XML Declaration
  7. Processing Instructions
  8. Escape Characters
  9. Well-Formed vs Valid XML Documents

  1. Elements, tags, text
    Data in XML is organized hierarchically (parent–children).
    Each parent element in the XML document can contain multiple child sub-elements.
    Conversely, each child element has exactly one parent element.

    Every XML document has exactly one top-level element, called the root, which contains all other elements.

    Example:
    <book>
        <title>
            eXtensible
            <bold>Markup</bold>
            Language
        </title>
    </book>
    An element is defined by an opening tag (e.g., "<bold>"), content (e.g., "Markup"), and a closing tag (e.g., "</bold>"). The name of the opening tag (e.g., "<bold>") must match the corresponding closing tag name (e.g., "</bold>").

    The relationship between XML document elements is defined as follows:
    • The "book" element is the parent of the "title" element.

    • The "title" element is the child or sub-element of the "book" element.

    The content of an element can be simple text or composed of other elements (including text):
    • The (root) "book" element contains a single child element:
      ► The "title" element.

    • The "title" element contains three child nodes:
      ► A text node containing "eXtensible"
      ► The "bold" element.
      ► A text node containing "Language"

    • The "bold" element contains a single child node:
      ► A text node containing "Markup"
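    As a rough illustration of this parent/child structure, the sketch below parses the same example with Python's standard xml.etree.ElementTree module. The example is written on a single line so that indentation whitespace does not end up in the text nodes; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Same structure as the example, on one line to keep whitespace simple.
doc = "<book><title>eXtensible <bold>Markup</bold> Language</title></book>"

book = ET.fromstring(doc)   # root element
title = book[0]             # only child element of "book"
bold = title[0]             # only child element of "title"

# ElementTree stores the text before the first child in .text
# and the text that follows an element in that element's .tail.
title_nodes = [title.text, bold.tag, bold.tail]
print(book.tag, title_nodes, bold.text)
```

    Note that ElementTree does not expose text nodes as separate objects; the three child nodes of "title" appear here as title.text, the "bold" element, and bold.tail.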

    An empty element can be represented using either of the following syntaxes:
    <emptyTag />
    <anotherEmptyTag></anotherEmptyTag>
    Both forms are equivalent to an XML parser. The self-closing form (<emptyTag />) is often preferred for empty elements because it is more concise and makes it obvious that the element has no content.
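    A small sketch, assuming Python's standard xml.etree.ElementTree module, suggests that a parser sees no difference between the two syntaxes. The is_empty helper is a hypothetical name introduced for this example.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<emptyTag />")
long_form = ET.fromstring("<emptyTag></emptyTag>")

def is_empty(element):
    """True when the element has no child elements and no text."""
    return len(element) == 0 and not element.text

both_empty = is_empty(short_form) and is_empty(long_form)
print(both_empty)
```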
  2. XML Rules
    • An XML document can have only one root element.

    • Each opening tag must have a corresponding closing tag (an empty element may instead be written as a single self-closing tag).

    • XML is case-sensitive: the opening tag name must exactly match the closing tag name.

    • Tags must not overlap; elements must be properly nested.

    Here's an example of proper nesting:
    <book><title>XML</title></book>
    Here's an example of incorrect nesting (overlapping tags):
    <book><title>XML</book></title>
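    These rules can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; is_well_formed is a hypothetical helper name, and the test strings other than the two nesting examples are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

results = {
    "nested": is_well_formed("<book><title>XML</title></book>"),
    "overlapping": is_well_formed("<book><title>XML</book></title>"),
    "two roots": is_well_formed("<book></book><book></book>"),
    "case mismatch": is_well_formed("<book></Book>"),
    "unclosed": is_well_formed("<book><title>XML</title>"),
}
print(results)
```

    Only the properly nested document is accepted; each of the other strings breaks one of the rules listed above.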
  3. Tag Naming Rules
    • A tag name must start with a letter or an underscore (_); it cannot begin with a digit or with a punctuation character such as a hyphen or a period.

    • After the first character, digits, hyphens (-), and periods (.) are allowed.

    • A tag name should not contain the colon character ":" except to separate a namespace prefix from the local name (the colon is reserved for namespaces).

    • A tag name cannot contain spaces.

    • There cannot be a space between the opening "<" character and the tag name (e.g., < book> is invalid).
      However, spaces are allowed between the tag name and the closing ">" character (e.g., <book >).
      But note: added spaces are not part of the tag name (in the previous example, the tag name is "book").

    Other naming rules:
    • Tag names starting with "xml" (in any case combination) are reserved for XML specifications.
    • Underscore (_) characters are also allowed in tag names.
    • Tag names are case-sensitive, so <Book> and <book> are different elements.

    Examples of valid tag names:
    • The opening tag name can be followed by spaces:
      <character >A</character>

    • The closing tag name can be followed by spaces:
      <character>A</character >

    • Both opening and closing tags can be followed by spaces:
      <character >A</character       >

    • Tag names can be followed by spaces or line breaks:
      <character
      >A</character>

    • Tag names can contain digits, hyphens (-), and periods (.):
      <character.1-code />

    Examples of invalid tag names:
    • The opening tag name cannot start with a space:
      < character>A</character>

    • The closing tag name cannot start with a space:
      <character>A< /character>
      <character>A</ character>

    • A tag name cannot start with a digit:
      <1character />

    • A tag name cannot contain spaces:
      <character code />

    • The "=" character is not allowed in a tag name:
      <character=code />
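    The naming rules can be approximated with a regular expression. The sketch below is an ASCII-only simplification: the XML specification also permits many non-ASCII letters, which this pattern rejects, and it treats the colon as disallowed. The names NAME_PATTERN and is_acceptable_tag_name are hypothetical.

```python
import re

# ASCII-only simplification: first character is a letter or underscore,
# followed by letters, digits, underscores, periods, or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def is_acceptable_tag_name(name):
    """Check a name against the simplified rules (hypothetical helper)."""
    if name.lower().startswith("xml"):
        return False  # names starting with "xml" are reserved
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["character", "character.1-code", "1character",
                  "character code", "character=code"]:
    print(candidate, is_acceptable_tag_name(candidate))
```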
  4. Attributes
    An element can have attributes.
    Attributes are specified in the element's opening tag.
    An attribute has the following format: ATTRIBUTE_NAME="ATTRIBUTE_VALUE"

    An attribute must have a value, and that value must be enclosed in either double (") or single (') quotes.
    It is possible for an attribute's value to be an empty string.
    <file type="text" encoding="UTF-8" extension="" />
    Tag naming rules also apply to attribute names.
    Other attribute rules:
    • Attribute names must be unique within an element (no duplicate attributes).
    • Attribute values are always treated as strings, even if they contain numbers.
    • The order of attributes doesn't matter in XML.
    • Whitespace around the "=" sign is allowed (type = "text" is valid).

    Examples of invalid attributes:
    • The "type" attribute has no value:
      <file type>

    • The value of the "type" attribute is not enclosed in quotes:
      <file type=text>

    • The "type" attribute value must open and close with the same quote character; mixing a double quote with a single quote is not allowed:
      <file type="text' />
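    The attribute rules above can be tried out with a parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the parses helper and the duplicate-attribute test string are introduced only for this example.

```python
import xml.etree.ElementTree as ET

file_element = ET.fromstring('<file type="text" encoding="UTF-8" extension="" />')
attributes = file_element.attrib   # dict of attribute name -> string value

# Single quotes and whitespace around "=" are accepted as well.
alternate = ET.fromstring("<file type = 'text' />")

def parses(text):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

duplicate_ok = parses('<file type="text" type="binary" />')
unquoted_ok = parses("<file type=text />")
missing_value_ok = parses("<file type />")
print(attributes, duplicate_ok, unquoted_ok, missing_value_ok)
```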
  5. Comments
    Comments are placed between "<!--" and "-->"
    <!-- comment -->
    Comments can span multiple lines:
    <!-- This is a
        multi-line
        comment -->
    Examples of invalid comments:
    • A comment cannot be placed inside a tag:
      <file <!-- comment --> />

    • A comment cannot contain the sequence "--":
      <!-- comment -- more comment -->
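    As a quick check of these comment rules, the sketch below feeds a valid comment and the two invalid examples to Python's standard xml.etree.ElementTree parser. The "file" wrapper element around the comments and the parses helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

valid_comment = parses("<file><!-- comment --></file>")
inside_tag = parses("<file <!-- comment --> />")
double_hyphen = parses("<file><!-- comment -- more comment --></file>")
print(valid_comment, inside_tag, double_hyphen)
```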
  6. XML Declaration
    Example of an XML declaration:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    Notes:
    • The XML declaration is optional.

    • If present, the XML declaration must appear at the very beginning of the document: nothing, not even whitespace or a comment, may precede it.

    • The XML declaration begins with "<?xml" and ends with "?>".

    • The "version" attribute is required; the "encoding" and "standalone" attributes are optional.

    • The attributes "version", "encoding", and "standalone" must appear in that order.

    • The "version" attribute must have the value 1.0 or 1.1.

    Note that XML 1.1 is rarely used in practice and has limited support and adoption. XML 1.0 is the overwhelmingly dominant version.

    If the "standalone" attribute is included in the XML declaration, it must have the value "yes" or "no":
    • "yes": indicates the document does not depend on external markup declarations (such as an external DTD).

    • "no": indicates the document may rely on an external DTD.

    Common encoding values include:
    • UTF-8 - Most common, supports all Unicode characters
    • UTF-16 - Unicode encoding using 16-bit units
    • ISO-8859-1 - Latin-1 character set
    • US-ASCII - Basic ASCII characters
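    The effect of the declaration can be observed with a parser. The following sketch assumes Python's standard xml.etree.ElementTree module and passes the documents as bytes so that the declared encoding is actually used for decoding. The "word" element, its content, and the parses helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The declared encoding tells the parser how to decode the bytes that follow:
# byte 0xE9 is the letter "é" in ISO-8859-1.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\xe9</word>'
word = ET.fromstring(latin1_doc).text

def parses(data):
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

leading_space = parses(b' <?xml version="1.0"?><word />')
wrong_order = parses(b'<?xml encoding="UTF-8" version="1.0"?><word />')
bad_standalone = parses(b'<?xml version="1.0" standalone="maybe"?><word />')
print(word, leading_space, wrong_order, bad_standalone)
```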
  7. Processing Instructions
    An XML document may contain specific instructions used by applications that process the XML document.
    Processing instructions are not part of the document content and are ignored by applications that do not use them.
    Processing instructions start with "<?" and end with "?>".

    Note that the syntax of processing instructions is similar to the XML declaration.

    Examples of a processing instruction:
    <?xml-stylesheet type="text/xsl" href="style.xsl"?>
    <?php echo "Hello World"; ?>
    <?python import sys; print("XML processing") ?>
    Note: The target name (first word after <?) cannot be "xml" (in any case combination), as this is reserved.
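    An application that cares about a processing instruction reads its target and its data. The sketch below assumes Python's standard xml.dom.minidom module, which keeps processing instructions in the parsed tree; the "book" root element is added only to make the document complete.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?><book />'
)

instruction = doc.firstChild   # the processing instruction precedes the root
target = instruction.target    # first word after "<?"
data = instruction.data        # everything after the target, as one string
print(target, data)
```

    The data is handed over as a plain string: although it looks like a list of attributes here, the XML parser does not interpret it, and it is up to the application to do so.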
  8. Escape Characters
    Some characters are reserved in XML and must be escaped if they are to be used as element content (text):
    • &amp; represents the character &

    • &lt; represents the character <

    • &gt; represents the character >

    • &apos; represents the character '

    • &quot; represents the character "

    Escaping priority:
    • & and < must always be escaped in element content
    • > should be escaped for consistency, though it's only required in the sequence ]]>
    • ' and " need escaping only when they appear in attribute values using the same quote type
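    In practice, escaping is usually delegated to a library rather than done by hand. The sketch below uses the escape, unescape, and quoteattr functions from Python's standard xml.sax.saxutils module; the sample strings are invented for the example.

```python
from xml.sax.saxutils import escape, quoteattr, unescape

raw = "if a < b && b > c"
escaped = escape(raw)          # escapes &, < and >
restored = unescape(escaped)   # reverses the three replacements

# quoteattr() adds the surrounding quotes and picks a quote style
# that avoids escaping when the value contains only one kind of quote.
attribute = quoteattr('5" floppy')
print(escaped, restored, attribute)
```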

    You can also use a CDATA section to include such characters without escaping them.
    Everything between "<![CDATA[" and "]]>" is treated as literal character data: the parser does not interpret it as markup.
    <title><![CDATA[Extensible <Markup> Language]]></title>
    CDATA sections are particularly useful for:
    • Including code snippets with special characters
    • Embedding HTML within XML
    • Large blocks of text with many characters that would otherwise need escaping

    Note: CDATA sections cannot be nested, and they cannot contain the sequence ]]>.
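    A short sketch, assuming Python's standard xml.etree.ElementTree module, suggests that a CDATA section and the equivalent escaped text yield the same content. The last line shows a common workaround for the "]]>" restriction, which is to split the sequence across two adjacent CDATA sections; that workaround is not part of the rules above and is included only as an illustration.

```python
import xml.etree.ElementTree as ET

title = ET.fromstring("<title><![CDATA[Extensible <Markup> Language]]></title>")
cdata_text = title.text

# Same text written with escape sequences instead of a CDATA section.
escaped_text = ET.fromstring(
    "<title>Extensible &lt;Markup&gt; Language</title>"
).text

# "]]>" cannot appear inside one section, so it is split across two sections.
split_text = ET.fromstring("<t><![CDATA[a]]]]><![CDATA[>b]]></t>").text
print(cdata_text, escaped_text, split_text)
```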

    It is also possible to escape characters using the following two methods:
    • &#nnn; where "nnn" is the decimal Unicode code point of the character.
      Example: &#169; represents the character ©

    • &#xhhhh; where "hhhh" is the hexadecimal Unicode code point of the character.
      Example: &#x00A9; represents the character ©
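    Both notations resolve to the same character, as the sketch below suggests. It assumes Python's standard xml.etree.ElementTree module; the "c" element and the char_ref helper are hypothetical names used only here.

```python
import xml.etree.ElementTree as ET

decimal = ET.fromstring("<c>&#169;</c>").text
hexadecimal = ET.fromstring("<c>&#x00A9;</c>").text

def char_ref(character):
    """Build the decimal and hexadecimal references for one character."""
    code_point = ord(character)
    return f"&#{code_point};", f"&#x{code_point:04X};"

print(decimal, hexadecimal, char_ref("\u00a9"))
```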
  9. Well-Formed vs Valid XML Documents
    A well-formed XML document follows the basic XML syntax rules (as seen above):
    • Has a single root element.
    • All elements are properly nested.
    • All opening tags have corresponding closing tags.
    • Attribute values are enclosed in quotes.
    • XML declaration is properly formatted (if present).

    A valid XML document is not only well-formed but also conforms to the structure defined in a DTD or XML Schema. A valid XML document must:
    • Be well-formed.
    • Contain only elements and attributes declared in the DTD or XML Schema.
    • Follow the element structure and cardinality rules defined in the DTD or XML Schema.
    • Have all required attributes present.
    • Use only allowed attribute values.

    The following XML document is well-formed but not valid against the book DTD:
    <?xml version="1.0"?>
    <!DOCTYPE book [
        <!ELEMENT book (title, isbn)>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT isbn (#PCDATA)>
    ]>
    
    <book>
        <title>XML</title>
        <author>MTI TEK</author> <!-- ERROR: 'author' not declared in DTD -->
        <isbn>123</isbn>
    </book>
    Note: A validating XML parser can check a document against its DTD to confirm that it is valid as well as well-formed. XML processors require documents to be well-formed, while only some applications also require validity.
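    To make the distinction concrete, the sketch below parses the document above with Python's standard xml.etree.ElementTree module, which checks well-formedness only and does not validate against the DTD. The validity check is a hand-written stand-in for the content model "book (title, isbn)", not a real DTD validator; expected_children and is_valid are hypothetical names.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<!DOCTYPE book [
    <!ELEMENT book (title, isbn)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT isbn (#PCDATA)>
]>
<book>
    <title>XML</title>
    <author>MTI TEK</author>
    <isbn>123</isbn>
</book>"""

# Parsing succeeds: the document is well-formed, and this parser
# does not enforce the declarations in the DTD.
book = ET.fromstring(document)

# Hand-written stand-in for the content model "book (title, isbn)".
expected_children = ["title", "isbn"]
actual_children = [child.tag for child in book]
is_valid = actual_children == expected_children
print(actual_children, is_valid)
```

    A real application would normally rely on a validating parser rather than a hand-written comparison like this one.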
© 2025 mtitek