The document type declaration
Why a document type declaration?
Because there are many versions of HTML, the browser needs to be able to
determine which one it is dealing with. This ranges from seemingly trivial
matters, such as which HTML elements are known, to the question of how they are
to be displayed. Since elements and attributes have been added and removed over
time, the browser would otherwise have to guess which version of HTML it is
dealing with. If it guesses wrong, it will inevitably display the page
incorrectly.
On top of that, many older pages were created at a time when the various
browser vendors tried to establish their own concepts or implemented the
standard inaccurately. For such pages the browser has to switch back to the
faulty behavior in order to display them correctly, especially since it isn't
necessarily possible to adapt them to the new standards. Archived pages are
particularly problematic, because it is questionable whether the original
author is still able to revise them at all.
The other advantage is that a document tagged in this way can be validated, i.e. checked for whether its structure corresponds to the rules set by the document type declaration. This way mistakes can be detected and eliminated easily. This isn't possible when the document type declaration is missing, because there is nothing against which the document could be validated. In this case the document must be assumed to be defective, and the browser switches to Quirks mode.
Couldn't I just omit it?
No, because the browser wouldn't get any information on how to treat the
document. Since the document type declaration is missing mostly in older
documents, the browser has to assume that the document was built according to
an older standard. It therefore switches to the so-called Quirks mode, in which
it applies the incorrect box model implemented in older browsers, which differs
from that of Standards mode.
The easiest way to demonstrate this is to create two otherwise identical
documents that differ only in the presence or absence of the document type
declaration. What is presented acceptably in one mode can look like a motley
collection of text fragments in the other. Depending on the mode for which the
document was created, there can be considerable problems with rendering it in
the other mode.
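The box model difference mentioned above can be made concrete with a little arithmetic. The following is only a sketch: the `Box` class and the function names are made up for illustration, and it assumes the commonly described behavior in which the legacy box model counts padding and border as part of the CSS `width`, whereas the W3C box model adds them to it.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Horizontal CSS measurements of one element, in pixels (hypothetical helper)."""
    width: int    # value of the CSS `width` property
    padding: int  # padding applied on the left and on the right
    border: int   # border width applied on the left and on the right


def rendered_width_standards(box: Box) -> int:
    # Standards mode: `width` describes the content area only,
    # so padding and border are added on both sides.
    return box.width + 2 * box.padding + 2 * box.border


def rendered_width_quirks(box: Box) -> int:
    # Quirks mode (legacy box model): `width` already includes
    # padding and border, so the box is exactly `width` wide.
    return box.width


def content_width_quirks(box: Box) -> int:
    # Space that remains for the content in the legacy model; never negative.
    return max(0, box.width - 2 * box.padding - 2 * box.border)


box = Box(width=200, padding=10, border=5)
print(rendered_width_standards(box))  # 230
print(rendered_width_quirks(box))     # 200
print(content_width_quirks(box))      # 170
```

The same style rule therefore yields a box that is 30 pixels wider in Standards mode, which is enough to break a layout that was tuned for the other mode.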
Standards mode
In this mode the browser displays a document exactly as stipulated by the W3C
for browsers that support the CSS standard. Standards mode is supposed to
ensure that web pages are displayed identically in all browsers. Unfortunately
this still isn't entirely the case: some browsers, even though they switch to
Standards mode, still render pages with faults, although these are mostly
minor. Generally speaking, modern browsers perform well in this respect.
To remain compatible with future developments, it is recommended to use a
document type that enables Standards mode.
Almost Standards mode
With one single exception this mode works like Standards mode. Only inline
elements that have neither a border nor padding or margin, and that have no
content or contain only whitespace characters, are rendered as if the browser
had switched to Quirks mode: the height of such elements is simply ignored.
Problems arose here especially with sliced images whose parts were placed in
table cells, e.g. to turn certain parts of an image into links. Whereas such a
sliced image appears contiguous in Quirks mode although it is distributed
across many table cells, gaps would suddenly appear when switching to Standards
mode: the image that appeared as one piece is torn apart and looks ugly.
To protect older layouts from this kind of disaster, Almost Standards mode was
conceived, in which everything is rendered according to the W3C standard except
for empty inline elements without any margin information. However, this mode is
tied to particular document type declarations that switch the browser to it.
Any browser that doesn't support this mode switches to full Standards mode
instead.
Quirks mode
This mode comes into effect for old documents that were created while the so-called browser wars between Internet Explorer and Netscape Communicator were in full swing. At that time many new elements were introduced that either weren't part of the standards published by the W3C or, if they were, were rendered differently from what the standards required. This very soon led to many web sites that were displayed correctly by only one particular browser. This uncontrolled growth in turn caused a series of problems for later browsers that adhered more strictly to the standards, because the old pages weren't displayed correctly at all in Standards mode.
To deal with exactly this problem, Quirks mode was introduced, which allows old
documents to be displayed correctly. In this mode the browser interprets the
CSS declarations for the various elements differently from the standard and
thus ensures that even pages created with erroneous formatting information are
displayed reasonably well.
The structure of a document type declaration
The document type declaration is generally placed at the beginning of an HTML document and is constructed according to a fixed scheme. It normally includes the version of HTML being used and the associated definitions, as in this document type declaration for HTML 4.01 Strict:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
Looking at the document type declaration, several fields can be distinguished, each of which has a specific meaning.
1. The introducing marker
2. The name of the root element of the HTML document
3. The specification PUBLIC or SYSTEM
4. The FPI
5. The URI of the DTD
The introducing marker and the name of the root element are required so that
the browser can do anything with the declaration at all. If only these two
fields are present in the document type declaration, the browser switches to
Standards mode, because it assumes that the document was created according to
the latest specifications. This enables an older browser to display hitherto
unknown versions of HTML correctly, that is, with the means that have been
implemented in it. HTML5 is also marked this way, since no further information
is necessary to describe this variant of HTML.
The third field indicates what kind of information about the version of HTML in
use is provided.
If it is set to PUBLIC, the FPI (Formal Public Identifier) must be specified.
The FPI in turn consists of four fields separated by double slashes. The first
of these fields indicates whether the owner is registered (+) or not (-). The
second field specifies the owner (here the World Wide Web Consortium) and the
third names the DTD being used in plain text. Finally, the fourth field
specifies the language used in the DTD.
If, however, the third field is set to SYSTEM, no FPI is given and the URI of
the DTD must be provided so that the browser can retrieve the necessary
information.
Finally, the URI of the DTD enables the browser to download the DTD when
necessary in order to update its definitions in case of changes. The
definitions specified here are therefore available at the given address.
If necessary, you can download these definitions and make them available on
your own server. You then only have to adjust the URIs so that they point to
the copies that you provide.
This field may be omitted when a PUBLIC DTD is specified, except for XHTML 1.1,
which mandates providing the URI of the DTD.
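The field structure described above can be sketched in a few lines of Python. This is an illustrative parser only, not how any browser actually processes the declaration; the names `Doctype`, `Fpi`, `parse_doctype` and `parse_fpi` are hypothetical, and the regular expression covers just the double-quoted forms described in this section.

```python
import re
from typing import NamedTuple, Optional


class Doctype(NamedTuple):
    root: str            # field 2: name of the root element
    kind: Optional[str]  # field 3: "PUBLIC", "SYSTEM" or None
    fpi: Optional[str]   # field 4: only present with PUBLIC
    uri: Optional[str]   # field 5: URI of the DTD


class Fpi(NamedTuple):
    registered: bool  # "+" means registered, "-" means not registered
    owner: str
    text: str         # the DTD being used, in plain text
    language: str


DOCTYPE_RE = re.compile(
    r'''<!DOCTYPE\s+(?P<root>[^\s>]+)
        (?:\s+(?P<kind>PUBLIC|SYSTEM)
           (?:\s+"(?P<first>[^"]*)")?
           (?:\s+"(?P<second>[^"]*)")?
        )?\s*>''',
    re.IGNORECASE | re.VERBOSE,
)


def parse_doctype(declaration: str) -> Doctype:
    match = DOCTYPE_RE.fullmatch(declaration.strip())
    if match is None:
        raise ValueError(f"not a document type declaration: {declaration!r}")
    kind = match["kind"].upper() if match["kind"] else None
    first, second = match["first"], match["second"]
    if kind == "PUBLIC":
        if first is None:
            raise ValueError("PUBLIC requires an FPI")
        return Doctype(match["root"], kind, fpi=first, uri=second)
    # SYSTEM carries no FPI, only the URI; a bare declaration has neither.
    return Doctype(match["root"], kind, fpi=None, uri=first)


def parse_fpi(fpi: str) -> Fpi:
    parts = fpi.split("//")
    if len(parts) != 4:
        raise ValueError(f"expected four fields separated by //: {fpi!r}")
    registration, owner, text, language = parts
    return Fpi(registration == "+", owner, text, language)


doctype = parse_doctype(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)
print(doctype.root, doctype.kind, doctype.uri)
print(parse_fpi(doctype.fpi))
```

For the bare HTML5 declaration `<!DOCTYPE html>` the same function returns only the root element, with the remaining three fields set to `None`.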
Known document type declarations
Depending on what you want to achieve with your HTML document, various versions of HTML are at your disposal. To specify which version you are using, you have to put the appropriate document type declaration in your document. You may choose from the following document types:
Version of HTML | Document type declaration | Rendering mode |
---|---|---|
HTML5 / XHTML 5 ¹) | <!DOCTYPE html> | Standards mode |
XHTML 1.1 ²) | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> | Standards mode |
XHTML Basic 1.1 ²) | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd"> | Standards mode |
XHTML 1.0 Strict | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | Standards mode |
XHTML 1.0 Transitional | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> | Almost Standards mode |
XHTML 1.0 Frameset | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"> | Almost Standards mode |
HTML 4.01 Strict | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> | Standards mode |
HTML 4.01 Transitional | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> | Almost Standards mode |
HTML 4.01 Frameset | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd"> | Almost Standards mode |
Furthermore, there are other document types that are still valid but shouldn't be used any more because of various problems. They can be found, if at all, on old web pages.
Version of HTML | Document type declaration | Rendering mode |
---|---|---|
XHTML Basic 1.0 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd"> | Standards mode |
HTML 4.0 Strict | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd"> | Standards mode |
HTML 4.0 Transitional | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> | Almost Standards mode |
HTML 4.0 Frameset | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN" "http://www.w3.org/TR/REC-html40/frameset.dtd"> | Almost Standards mode |
HTML 3.2 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> | Quirks mode |
HTML 3.0 | <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 3.0//EN"> | Quirks mode |
HTML 2.0 | <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN"> | Quirks mode |
There is no document type declaration for HTML 1.0, since this version isn't explicitly supported any more and explicitly specifying the document type wasn't necessary at that time anyway. Should there, contrary to all expectations, still be a document around that was authored in HTML 1.0, the browser switches to Quirks mode because of the missing document type declaration in order to render it properly.
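The relationship between declaration and rendering mode shown in the tables above can be expressed as a simple lookup. The sketch below copies only a few rows from those tables and is not the algorithm real browsers use, which takes further details into account. The function name `rendering_mode` is hypothetical, and treating an unlisted FPI as Standards mode is an assumption made here for illustration.

```python
import re
from typing import Optional

STANDARDS = "Standards mode"
ALMOST_STANDARDS = "Almost Standards mode"
QUIRKS = "Quirks mode"

# FPI -> rendering mode, taken from a selection of rows in the tables above.
MODE_BY_FPI = {
    "-//W3C//DTD XHTML 1.1//EN": STANDARDS,
    "-//W3C//DTD XHTML 1.0 Strict//EN": STANDARDS,
    "-//W3C//DTD XHTML 1.0 Transitional//EN": ALMOST_STANDARDS,
    "-//W3C//DTD XHTML 1.0 Frameset//EN": ALMOST_STANDARDS,
    "-//W3C//DTD HTML 4.01//EN": STANDARDS,
    "-//W3C//DTD HTML 4.01 Transitional//EN": ALMOST_STANDARDS,
    "-//W3C//DTD HTML 4.01 Frameset//EN": ALMOST_STANDARDS,
    "-//W3C//DTD HTML 3.2 Final//EN": QUIRKS,
    "-//IETF//DTD HTML 2.0//EN": QUIRKS,
}


def rendering_mode(declaration: Optional[str]) -> str:
    # No declaration at all: the document is treated as old, hence Quirks mode.
    if declaration is None or not declaration.strip():
        return QUIRKS
    match = re.search(r'PUBLIC\s+"([^"]*)"', declaration, re.IGNORECASE)
    # Only marker and root element, e.g. <!DOCTYPE html>: Standards mode.
    if match is None:
        return STANDARDS
    # Assumption: an FPI that is not listed falls back to Standards mode.
    return MODE_BY_FPI.get(match.group(1), STANDARDS)


print(rendering_mode(None))
print(rendering_mode("<!DOCTYPE html>"))
print(rendering_mode('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">'))
```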
For HTML versions that provide more than one variant (HTML 4.0, HTML 4.01, and XHTML 1.0), the Strict variant is generally to be preferred, because it separates the logical structure of a document more stringently from its presentation, which isn't the case with Transitional and Frameset. These two exist only for compatibility reasons or were introduced to permit logical partitioning of documents. They shouldn't be used any more.
XHTML
XHTML is a derivative of HTML, but in contrast to HTML it isn't based on SGML
but on XML, which in turn may be considered a subset of SGML. XML didn't gain
importance just through XHTML, however, but as the basis of a variety of file
formats. Other derivatives of XML are, for example, SVG, which is used to
describe graphics, WML, which was used as a standard for the first browsers on
mobile phones, and SMIL, which is used to script presentations.
The bottom line is that XHTML looks almost exactly like HTML. However, due to
the peculiarities of XML there are some differences that you need to know if
you intend to write valid XHTML.
The main difference is the XML prologue, which has to introduce any document that is derived from XML and which typically looks like this:
<?xml version="1.0" encoding="UTF-8"?>
Furthermore, the document must be delivered with the correct MIME type so that
the browser switches to its XML parser instead of using the conventional
SGML-based HTML parser. This means that the web server must be configured to
announce application/xhtml+xml instead of text/html when transmitting XHTML
documents.
Older browsers in particular can be a problem, because they don't understand
this MIME type and therefore offer the transmitted file for download instead of
displaying it. The server must therefore be configured to check whether the
requester accepts application/xhtml+xml and, if not, to transmit the XHTML
document with a MIME type of text/html. In this way the document can be
displayed in older browsers as well.
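The server-side check described above might look roughly like the following sketch. It is not tied to any particular web server; the function names `accepts_xhtml` and `choose_content_type` are hypothetical, and the sketch assumes that the client announces its capabilities in the HTTP `Accept` request header. Wildcards such as `*/*` are deliberately not counted as support for XHTML here.

```python
from typing import Optional

XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def accepts_xhtml(accept_header: str) -> bool:
    """Return True if the header lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        if media_type.lower() != XHTML_TYPE:
            continue
        quality = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # an unparsable q value counts as "not accepted"
        return quality > 0
    return False


def choose_content_type(accept_header: Optional[str]) -> str:
    # Fall back to text/html whenever XHTML support is not announced.
    if accept_header and accepts_xhtml(accept_header):
        return XHTML_TYPE
    return HTML_TYPE


print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9"))
print(choose_content_type("text/html, */*"))
```

A server would then send the returned value in the `Content-Type` response header of the XHTML document.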