MTI TEK
XML | DTD (Document Type Definition)
  1. Introduction
  2. DTD Declaration: System Identifier
  3. DTD Declaration: Public Identifier
  4. DTD Structure: Element Declarations
    1. Declaring an empty element ("EMPTY")
    2. Declaring an element that contains only text
    3. Declaring an element that contains other elements
    4. Declaring a parent element that contains a combination of child elements and text
    5. Declaring a parent element that contains any combination of elements and text
  5. DTD Structure: Element Cardinality
  6. DTD Structure: Attribute Declarations

  1. Introduction
    A DTD defines the structure (the allowed elements, their attributes, and how they nest) that XML documents referencing the DTD must follow in order to be valid. It does not describe the meaning of the content.

    For example, the following XML document:
    <?xml version="1.0"?>
    <book>
        <title>XML</title>
        <isbn>123</isbn>
    </book>
    Can be validated against the following DTD:
    <!ELEMENT book (title, isbn)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT isbn (#PCDATA)>
    To declare that the XML document must conform to a DTD, the following syntax is used:
    <!DOCTYPE ... >
    The DTD declaration must begin with "<!DOCTYPE", followed by a space, then the name of the root element of the XML document (example: book).

    Note: No space is allowed within the keyword "<!DOCTYPE".

    There are several ways to declare the DTD to which the XML document must adhere. The simplest way is to specify the DTD directly within the declaration by placing the content between the characters "[" and "]".
    <?xml version="1.0"?>
    
    <!DOCTYPE book [
        <!ELEMENT book (title, isbn)>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT isbn (#PCDATA)>
    ]>
    
    <book>
        <title>XML</title>
        <isbn>123</isbn>
    </book>
    It is also possible to reference a DTD using a System identifier or a Public identifier.
  2. DTD Declaration: System Identifier
    A system identifier is used to specify an external file that contains the DTD content.

    The system identifier is defined using the keyword "SYSTEM" followed by the location of the external file.

    The location of the DTD can be relative to the XML document (example: "book.dtd"), or absolute by specifying a full path to a local resource (hard drive: "file:///C:/dtd/book.dtd") or remote resource (intranet or internet: "http://www.mtitek.com/dtd/book.dtd").

    Here are a few examples of DTD declarations pointing to an external file named "book.dtd":
    <!DOCTYPE book SYSTEM "book.dtd">
    <!DOCTYPE book SYSTEM "file:///C:/dtd/book.dtd">
    <!DOCTYPE book SYSTEM "http://www.mtitek.com/dtd/book.dtd">
    Here is the content of the "book.dtd" file:
    <!ELEMENT book (title, isbn)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT isbn (#PCDATA)>
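    As a minimal sketch, assuming the "book.dtd" file above is stored in the same directory as the XML document, the complete document could look like the following. The standalone="no" pseudo-attribute is optional; it only signals that the document relies on external declarations.

    ```xml
    <?xml version="1.0" standalone="no"?>

    <!-- The relative location "book.dtd" is resolved against the location of this document -->
    <!DOCTYPE book SYSTEM "book.dtd">

    <book>
        <title>XML</title>
        <isbn>123</isbn>
    </book>
    ```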
  3. DTD Declaration: Public Identifier
    A public identifier is also used to reference an external DTD, but by a well-known name rather than by a file location.

    The public identifier is defined using the keyword "PUBLIC" followed by an identifier that helps locate the DTD resource in a public catalog.

    In XML, a public identifier must always be followed by a system identifier, which is used in case the public identifier cannot be resolved. Here is an example of a DTD declaration using a public identifier:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    Note: In this case, the keyword "SYSTEM" must not be included; the system identifier is written as a quoted string directly after the public identifier. A public identifier without a system identifier is accepted in SGML and HTML, but not in XML.

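    You can also use internal declarations along with a public identifier (the system identifier is still required):
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [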
        <!-- ... -->
    ]>
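    The "public catalog" mentioned above is often an OASIS XML Catalog file, which maps a public identifier to a local copy of the DTD. The following is only a sketch: the local path "dtd/xhtml1-transitional.dtd" is a hypothetical location, and it assumes the parser has been configured to consult this catalog.

    ```xml
    <?xml version="1.0"?>

    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
        <!-- Hypothetical mapping: resolve the public identifier to a local file -->
        <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
                uri="dtd/xhtml1-transitional.dtd"/>
    </catalog>
    ```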
  4. DTD Structure: Element Declarations
    An XML document mainly consists of elements. These elements must be declared in the DTD for the XML document to be valid.
    <!ELEMENT book (title, isbn, price)>
    An element declaration must start with "<!ELEMENT", followed by a space, then the name of the element (e.g., book), and then the element's content specification: either a content model placed in parentheses, or one of the keywords "EMPTY" or "ANY".

    The content of an element defines what content is allowed inside the element.

    An element can be empty ("EMPTY"), contain text ("#PCDATA"), contain other elements, contain a mix of elements and text, or allow any content ("ANY").
    1. Declaring an Empty Element ("EMPTY")
      To declare an empty element, use the keyword "EMPTY".
      <!ELEMENT price EMPTY>
      In this example, the "price" element must be empty in the XML document.
      <price></price>
      <price />
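      For contrast, here is a short sketch of content that would fail validation against the "price" declaration above. The value "10" is only an illustrative assumption.

      ```xml
      <!-- Invalid: the element has text content -->
      <price>10</price>

      <!-- Invalid as well: even white space between the tags counts as content -->
      <price> </price>
      ```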
    2. Declaring an Element That Contains Only Text
      To declare an element that contains only text, use the keyword "#PCDATA" (Parsed Character Data) enclosed in parentheses.
      <!ELEMENT title (#PCDATA)>
      In this example, the "title" element can contain only text (this includes an empty string, whether written with start and end tags or as an empty-element tag).
      <title>XML</title>
      <title></title>
      <title />
    3. Declaring an Element That Contains Other Elements
      • To declare a parent element that contains child elements, list the names of those elements separated by commas. The element names must be enclosed in parentheses.
        <!ELEMENT book (title, isbn)>
        In this example, the "<book>" element must contain exactly the following child elements, and nothing else: "title", "isbn".

        An element declaration must be unique in a DTD, i.e., there cannot be multiple declarations for the same element name. However, the same child element can appear inside different parent elements; in this case, the same declaration for the child element can be reused across multiple parent declarations.
        <!ELEMENT book (title, author)>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT author (title, name)> <!-- OK: Both "author" and "book" use the same declaration for "title" -->
        <!ELEMENT title (#PCDATA)> <!-- ERROR: The "title" element must only be declared once -->
        <!ELEMENT name (#PCDATA)>
        The order of child elements in the XML document must match the order listed in the parent element declaration in the DTD (child elements are separated by commas). In the example above, the "<book>" element must contain two child elements ("<title>", "<author>") in that exact order in the XML document.

        The XML document will fail validation if:
        • A required child element is missing.
        • A parent element contains a child element not specified in its declaration.
        • Child elements appear in a different order than declared.
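        The three failure cases above can be sketched as follows, assuming the simpler declaration "<!ELEMENT book (title, isbn)>" and a hypothetical, undeclared "price" element. Each "<book>" fragment is a separate document body.

        ```xml
        <!-- Invalid: the required child "isbn" is missing -->
        <book>
            <title>XML</title>
        </book>

        <!-- Invalid: "price" is not listed in the declaration of "book" -->
        <book>
            <title>XML</title>
            <isbn>123</isbn>
            <price>10</price>
        </book>

        <!-- Invalid: the children appear in a different order than declared -->
        <book>
            <isbn>123</isbn>
            <title>XML</title>
        </book>
        ```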

      • You can use the "|" character (pipe symbol) to declare that a parent element may contain only one child from a list of possible child elements.
        <!ELEMENT book (isbn | nbn | issn)>
        In the example above, the "<book>" element must contain exactly one of the following: "<isbn>", "<nbn>", or "<issn>".

        Note: The pipe character "|" means "OR" - the element can contain one of the specified options, but not multiple options simultaneously.

      • It is also possible to combine a sequence of required elements with a choice among alternative elements.
        <?xml version="1.0"?>
        
        <!DOCTYPE book [
            <!ELEMENT book (title, author, (isbn | nbn | issn))>
            <!ELEMENT title (#PCDATA)>
            <!ELEMENT author (#PCDATA)>
            <!ELEMENT isbn (#PCDATA)>
            <!ELEMENT nbn (#PCDATA)>
            <!ELEMENT issn (#PCDATA)>
        ]>
        
        <book>
            <title>XML</title>
            <author>MTI TEK</author>
            <nbn>123</nbn>
        </book>
        In the example above, the "<book>" element must contain two child elements ("<title>", "<author>") in that order in the XML document.
        Additionally, it must contain one of the following elements: "<isbn>", "<nbn>", or "<issn>", which must appear after the "<author>" element.
    4. Declaring a Parent Element That Contains a Mix of Child Elements and Text
      To declare a parent element that can contain a mix of child elements and text, list the child element names along with "PCDATA", separating them with the "|" character.
      <!ELEMENT author (#PCDATA | firstName | lastName)*>
      Note: The "*" character indicates that any combination of these elements and text is allowed.

      When declaring a parent element that mixes text and child elements, follow these rules:
      • All elements (including text) must be separated using the "|" character.
      • The "#PCDATA" must appear first in the list.
      • The content must be enclosed in parentheses, followed by the "*" character.
      • The declaration must not contain nested element groupings (no inner parentheses).
      <?xml version="1.0"?>
      
      <!DOCTYPE book [
          <!ELEMENT book (title, author)>
          <!ELEMENT title (#PCDATA)>
          <!ELEMENT author (#PCDATA | firstName | lastName)*>
          <!ELEMENT firstName (#PCDATA)>
          <!ELEMENT lastName (#PCDATA)>
      ]>
      
      <book>
          <title>XML</title>
          <author>
              1
              <firstName>MTI</firstName>
              <lastName>TEK</lastName>
              2
              <lastName>XYZ</lastName>
              <firstName>ABC</firstName>
          </author>
      </book>
      Note: This syntax provides no control over the order or repetition of child elements inside the parent.
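      The rules for mixed content can be illustrated by comparing the valid declaration with a few variants that break one rule each. These are alternatives shown side by side for comparison, not declarations meant to appear in the same DTD.

      ```xml
      <!-- Valid: "#PCDATA" first, "|" separators, group followed by "*" -->
      <!ELEMENT author (#PCDATA | firstName | lastName)*>

      <!-- Invalid: "#PCDATA" is not first in the list -->
      <!ELEMENT author (firstName | #PCDATA | lastName)*>

      <!-- Invalid: "," cannot be used together with "#PCDATA" -->
      <!ELEMENT author (#PCDATA, firstName, lastName)*>

      <!-- Invalid: the trailing "*" is missing -->
      <!ELEMENT author (#PCDATA | firstName | lastName)>

      <!-- Invalid: nested grouping inside mixed content -->
      <!ELEMENT author (#PCDATA | (firstName, lastName))*>
      ```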
    5. Declaring a Parent Element That Can Contain Any Combination of Elements and Text
      To declare a parent element that can contain any combination of child elements and text, use the keyword "ANY".
      <!ELEMENT book ANY>
      In this example, the "book" element can contain any combination of child elements and text, regardless of their order or how many times they appear inside the "book" element.
      <?xml version="1.0"?>
      
      <!DOCTYPE book [
          <!ELEMENT book ANY>
          <!ELEMENT title (#PCDATA)>
          <!ELEMENT author (#PCDATA | firstName | lastName)*>
          <!ELEMENT firstName (#PCDATA)>
          <!ELEMENT lastName (#PCDATA)>
      ]>
      
      <book>
          Book information...
          <author>
              1
              <firstName>MTI</firstName>
              <lastName>TEK</lastName>
          </author>
          <title>XML</title>
      </book>
      Note: The "book" element can only contain elements that are declared in the DTD.
  5. DTD Structure: Element Cardinality
    Cardinality defines how many times a child element can appear within a parent element.

    There are four possible options to specify the cardinality of a child element:
    • "?": indicates that the child element is optional: it can appear zero times or once within the parent element, but not more than once.

    • "+": indicates that the child element must appear at least once within the parent element and can be repeated any number of times.

    • "*": indicates that the element can appear zero or more times within the parent element.

    • If no cardinality option is specified, then the child element must appear exactly once within the parent element.
    <?xml version="1.0"?>
    
    <!DOCTYPE book [
        <!ELEMENT book (title, author*, price?, info+)>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT author (#PCDATA)>
        <!ELEMENT price (#PCDATA)>
        <!ELEMENT info (#PCDATA)>
    ]>
    
    <book>
        <title>XML</title>
        <author>MTI TEK</author>
        <author>ABC XYZ</author>
        <info>XML</info>
    </book>
    Note: Cardinality does not affect the order of elements; a child element must appear exactly in the order defined in the DTD declaration (regardless of how many times it is repeated).
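    As a sketch of how these rules fail, the following fragments would each be rejected against the declaration "<!ELEMENT book (title, author*, price?, info+)>" used above. The price value "10" is only an illustrative assumption.

    ```xml
    <!-- Invalid: "info+" requires at least one <info> -->
    <book>
        <title>XML</title>
        <author>MTI TEK</author>
    </book>

    <!-- Invalid: <price> must come after every <author> -->
    <book>
        <title>XML</title>
        <price>10</price>
        <author>MTI TEK</author>
        <info>XML</info>
    </book>

    <!-- Invalid: "price?" allows at most one <price> -->
    <book>
        <title>XML</title>
        <price>10</price>
        <price>12</price>
        <info>XML</info>
    </book>
    ```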
  6. DTD Structure: Attribute Declarations
    An element can have attributes. These attributes must be declared in the DTD for the XML document to be valid.
    <!ATTLIST book isbn CDATA #IMPLIED>
    An attribute declaration must start with "<!ATTLIST", followed by a space, then the name of the element to which the attribute belongs (e.g., book), followed by a space, then the name of the attribute (e.g., isbn), followed by a space, then the attribute options (type, default value, etc.).

    Note: Multiple attributes can be declared in the same declaration.
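    A minimal sketch of a single declaration covering several attributes of the "book" element is shown here. The attribute name "language" is a hypothetical addition used only for illustration.

    ```xml
    <!-- One ATTLIST, three attributes: each line is name, type, default -->
    <!ATTLIST book
        isbn     CDATA           #IMPLIED
        language CDATA           #IMPLIED
        version  (1 | 2 | 3 | 4) "1">
    ```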

    Here is a list of attribute types:
    • CDATA
      Indicates that the attribute value is a character string.

    • ID
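      Indicates that the attribute value uniquely identifies the element it belongs to. The value must be a valid XML name (it cannot start with a digit) and must be unique within the document.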

    • IDREF
      Indicates that the attribute value is a reference to another element (identified by its ID).

    • IDREFS
      Indicates that the attribute value is a space-separated list of IDREF values.

    • NMTOKEN
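      Indicates that the attribute value is a valid name token: a string made up only of name characters (such as letters, digits, ".", "-", "_", and ":"), with no white space.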

    • NMTOKENS
      Indicates that the attribute value is a space-separated list of NMTOKEN values.

    • ENTITY
      Indicates that the attribute value is the name of an unparsed entity declared in the DTD, which refers to an external resource (image, text, etc.).

    • ENTITIES
      Indicates that the attribute value is a space-separated list of ENTITY values.

    • Enumeration: (value1 | value2 | value3)
      Indicates that the attribute value must be one of the listed values (separated by the "|" character).
      <!ATTLIST book version (1 | 2 | 3 | 4) "1">
      In the example above, the declaration for the "<book>" element specifies that it may have an attribute "version" that can take one of the values "1", "2", "3", or "4". The default value of the "version" attribute is "1".

      The attribute can be omitted, but it must not be assigned a value other than those listed in the declaration.

      Note: The pipe character "|" separates the enumeration values, just as it separates the alternatives in element choice declarations.
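    To illustrate how the ID and IDREF types work together, here is a short sketch. The element name "library" and the attribute names "id" and "writtenBy" are hypothetical and chosen only for this example.

    ```xml
    <?xml version="1.0"?>

    <!DOCTYPE library [
        <!ELEMENT library (author+, book+)>
        <!ELEMENT author (#PCDATA)>
        <!ELEMENT book (#PCDATA)>
        <!ATTLIST author id ID #REQUIRED>
        <!ATTLIST book writtenBy IDREF #REQUIRED>
    ]>

    <library>
        <!-- "a1" is a valid ID: it is an XML name and is unique in the document -->
        <author id="a1">MTI TEK</author>
        <!-- "writtenBy" must match the ID of some element in the document -->
        <book writtenBy="a1">XML</book>
    </library>
    ```

    A validating parser would reject this document if "writtenBy" referred to a value such as "a2" that no ID attribute in the document carries.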

    You can also add the following options to the attribute declaration:
    • #FIXED
      Indicates that the attribute has a fixed value.
      <!ATTLIST book version CDATA #FIXED "1">
      The attribute can be omitted, in which case the fixed value is used; it must not be given a value different from the fixed value.

    • #REQUIRED
      Indicates that the attribute is required.
      <!ATTLIST book version CDATA #REQUIRED>

    • #IMPLIED
      Indicates that the attribute is optional.
      <!ATTLIST book version CDATA #IMPLIED>
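    The three options can be combined in one document, as in the following sketch. The attribute name "language" and the simplified "(#PCDATA)" content of "book" are assumptions made to keep the example short.

    ```xml
    <?xml version="1.0"?>

    <!DOCTYPE book [
        <!ELEMENT book (#PCDATA)>
        <!ATTLIST book
            isbn     CDATA #REQUIRED
            language CDATA #IMPLIED
            version  CDATA #FIXED "1">
    ]>

    <!-- Valid: "isbn" is present, "language" is omitted, "version" matches the fixed value -->
    <book isbn="123" version="1">XML</book>
    ```

    Under these declarations, omitting "isbn" or writing version="2" would make the document invalid.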
© 2025 mtitek