Simple Types

Both element and attribute declarations can use simple types to describe the data content of the components. This article introduces simple types, and explains how to define your own atomic simple types for use in your schemas.

Simple Type Varieties

There are three varieties of simple type: atomic types, list types, and union types.

  • Atomic types have values that are indivisible, such as 10 and large.
  • List types have values that are whitespace-separated lists of atomic values, such as <availableSizes>10 large 2</availableSizes>.
  • Union types may have values that are either atomic values or list values. What differentiates them is that the set of valid values, or "value space," for the type is the union of the value spaces of two or more other simple types. For example, to represent a dress size, you may define a union type that allows a value to be either an integer from 2 through 18, or one of the string values small, medium, or large.
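The three varieties can be sketched in one small schema built around the dress size example. This is a minimal illustration, not a definitive design: the type names NumericSizeType, NamedSizeType, DressSizeType, and AvailableSizesType, the xsd prefix, and the choice of xsd:token as a base type are all assumptions made for the example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Atomic type (hypothetical name): an integer from 2 through 18 -->
  <xsd:simpleType name="NumericSizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Atomic type (hypothetical name): one of three string values -->
  <xsd:simpleType name="NamedSizeType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="small"/>
      <xsd:enumeration value="medium"/>
      <xsd:enumeration value="large"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Union type: its value space is the union of the two types above -->
  <xsd:simpleType name="DressSizeType">
    <xsd:union memberTypes="NumericSizeType NamedSizeType"/>
  </xsd:simpleType>

  <!-- List type: a whitespace-separated list of dress sizes -->
  <xsd:simpleType name="AvailableSizesType">
    <xsd:list itemType="DressSizeType"/>
  </xsd:simpleType>

  <xsd:element name="availableSizes" type="AvailableSizesType"/>

</xsd:schema>
```

Under these definitions, <availableSizes>10 large 2</availableSizes> is valid, because each item in the list is valid against one of the two member types of the union.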

Design Hint: How Much Should I Break Down My Data Values?

Data values should be broken down to the most atomic level possible. This allows them to be processed in a variety of ways for different uses, such as display, mathematical operations, and validation. It is much easier to concatenate two data values back together than it is to split them apart. In addition, more granular data is much easier to validate.

It is a fairly common practice to put a data value and its units in the same element, for example <length>3cm</length>. However, the preferred approach is to have a separate data value, preferably an attribute, for the units, for example <length units="cm">3</length>.

Using a single concatenated value is limiting because:

  • It is extremely cumbersome to validate. You have to apply a complicated pattern that would need to change every time a unit type is added.
  • You cannot perform comparisons, conversions, or mathematical operations on the data without splitting it apart.
  • If you want to display the data item differently (for example, as "3 centimeters" or "3 cm" or just "3"), you have to split it apart. This complicates the stylesheets and applications that process the instance document.
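The difference in validation effort can be sketched as follows. The element name lengthWithUnits, the type names, and the unit values in and mm are hypothetical, chosen only to contrast the two approaches; the complex type with simple content is one possible way to attach the units attribute, not the only one.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Concatenated approach: the pattern must be edited whenever a unit is added -->
  <xsd:simpleType name="ConcatenatedLengthType">
    <xsd:restriction base="xsd:token">
      <xsd:pattern value="\d+(\.\d+)?(cm|in|mm)"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="lengthWithUnits" type="ConcatenatedLengthType"/>

  <!-- Separated approach: the units are a plain enumeration -->
  <xsd:simpleType name="LengthUnitsType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="cm"/>
      <xsd:enumeration value="in"/>
      <xsd:enumeration value="mm"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- The number is validated as a decimal, independently of its units -->
  <xsd:element name="length">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:decimal">
          <xsd:attribute name="units" type="LengthUnitsType" use="required"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

With the separated form, <length units="cm">3</length> validates, and the numeric content can be compared or converted without any string parsing.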

It is possible to go too far, though. For example, you may break a date down as follows:

<orderDate>
  <year>2001</year>
  <month>06</month>
  <day>15</day>
</orderDate>

This is probably overkill unless you have a special need to process these items separately.
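As a point of comparison, the built-in date type already treats a whole date as a single atomic value in the form YYYY-MM-DD. A minimal sketch, assuming the xsd prefix and reusing the orderDate element name from the example above:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- The whole date is one atomic value of the built-in type xsd:date -->
  <!-- A valid instance would be: <orderDate>2001-06-15</orderDate> -->
  <xsd:element name="orderDate" type="xsd:date"/>

</xsd:schema>
```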

Simple Type Definitions

Named Simple Types

Simple types can be either named or anonymous. Named simple types are always defined globally (i.e., their parent is always schema or redefine) and are required to have a name that is unique among the data types (both simple and complex) in the schema. The XSDL syntax for a named simple type definition is shown in Table 1.

The name of a simple type must be an XML non-colonized name, which means that it must start with a letter or underscore, and may only contain letters, digits, underscores, hyphens, and periods. You cannot include a namespace prefix when defining the type; it takes its namespace from the target namespace of the schema document.
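The naming and namespace rules can be illustrated with a short schema document. The namespace http://example.org/prod, the prod prefix, and the names DressSizeType and size are hypothetical and serve only to show where a prefix is and is not allowed.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:prod="http://example.org/prod"
            targetNamespace="http://example.org/prod">

  <!-- The name has no prefix; the type belongs to the target namespace -->
  <xsd:simpleType name="DressSizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- A reference to the type is a qualified name, so it carries the prefix -->
  <xsd:element name="size" type="prod:DressSizeType"/>

</xsd:schema>
```

Writing name="prod:DressSizeType" in the definition would be an error, since the name attribute accepts only a non-colonized name.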

All of the examples of named types in this book have the word "Type" at the end of their names, to clearly distinguish them from element-type names and attribute names. However, this is not a requirement; you may in fact have a data type definition and an element declaration using the same name.