Summary of Core XHTML
Peter Coxhead
Structure
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<!-- head elements go here -->
</head>
<body>
<!-- body elements go here -->
</body>
</html>
head elements
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Other charset values include ISO-8859-1. Useful documentation, but the content type is also indicated by the HTTP header sent by the server and by the file extension; these usually take priority.
<meta name="Author" content="..." />
<meta name="Description" content="..." />
<meta name="KeyWords" content="..." />
Useful as documentation; deliberate mis-information in commercial pages in order to increase page rankings means that search engines now pay little if any attention to these meta elements.
title defines the text shown in the title bar of the window.
- <title>...</title>
link specifies the location of an external CSS stylesheet.
- <link rel="stylesheet" type="text/css" href="..." />
style defines styles within the document. Individual elements can be styled via a style attribute. See link above to specify an external stylesheet.
<style type="text/css">
  .content { margin-left: 12pt; }
  span.warning { color: red; }
</style>
script may be used as a head or a body element. Defines JavaScript, either within the document or as an external file.
<script type="text/javascript">
  function doPress() {
    alert('Button was pressed.');
  }
</script>
- <script type="text/javascript" src="..."></script>
For compatibility, use an explicit closing tag as above; src gives the URL of the source file.
noscript may be used as a head or a body element. Browsers which have JavaScript support disabled or do not support JavaScript process HTML placed in a noscript element; otherwise they ignore it.
body elements
Block vs. inline elements
An important distinction is between block elements and inline elements. By default, some HTML elements create separate 'blocks' when laid out in a page. For example p elements create distinct paragraphs, each starting on a new line. Other HTML elements are 'inline', which basically means that they are laid out in the same way as text. For example, a strong element inside a paragraph (p) doesn't affect the position of its contents; it just changes their appearance.
(Styles could in principle be used to over-ride such default behaviour, but this would be very confusing to anyone reading the resulting HTML.)
Lists and tables are effectively special kinds of blocks, where the 'basic' blocks are list items (li) and table cells (td). However, these have to be enclosed in outer elements (such as ul for unordered lists or tr for a table row) to be displayed properly.
Simple Blocks
h1 to h6 are predefined block elements which should be used to create paragraphs to serve as headings and subheadings. Start with a single h1; keep them in a sensible order in the document.
<h1>...</h1>
<h3>...</h3>
div defines a block in the document, that is a section which by default starts on a new line and forces a new line afterwards. It is often used so that a different style may be applied to a section of the page.
<div class="..."> ... </div>
p is a pre-defined block element, corresponding to a 'paragraph'. Most browsers will put extra space before and after a p element compared to a div. Unlike div elements, which can be nested, paragraphs should not contain other block elements in XHTML.
<p> ... </p>
pre is a pre-defined block element. Unlike other HTML elements, it retains white space from the source HTML file. It uses a fixed width font. It is thus useful for laying out code with the correct indentation.
<pre>
public int foo() {
    var a = 1;
    return a;
}
</pre>
br can be used within blocks of any kind to force a newline.
- <br />
As with all elements used without an explicit closing tag, for compatibility put a space before the /.
Lists
ul defines an unordered list with bullets.
<ul>
  <li>...</li>
  <li>...</li>
</ul>
ol defines an ordered list with numbers.
<ol>
  <li>...</li>
  <li>...</li>
</ol>
Tables
Simple tables have rows defined by tr, cells defined by td. The summary attribute should give a brief text description of the table for users who rely on spoken text access to web pages.
<table summary="...">
  <tr><td>...</td><td>...</td></tr>
  <tr><td>...</td><td>...</td></tr>
</table>
Note that only elements which define 'table components' can appear inside table or tr elements; 'general markup' can only appear inside table cells. Thus the following is not valid:
<table> <script type="text/javascript">...</script> ... <script> </table>
Text Styles
span defines an inline piece of text (i.e. a 'span'), usually so that different text styling can be applied.
- It is <span class="...">extremely important</span> to close tags in XHTML.
em, strong and code are three of a number of pre-defined text styles, respectively for emphasis, strong emphasis and for a fixed width font for code. These elements should probably be avoided now in favour of a styled span.
Other Inline Elements
a defines a hyperlink OR an anchor point in the document to which a link can be made.
- <a href="...">...</a> Defines a hypertext link.
- <a id="..." /> Defines an anchor point within the document; this format is badly supported by browsers at present.
- <a id="..." name="..."></a> This format for an anchor point is safer; repeat the id attribute as a name with the same value.
img defines the source file of an image to be shown on the page and also sets its size (in pixels). If width and height are omitted, browsers will normally use the pixel size of the source file; if they differ from the size of the source file, browsers will scale the image to the size given. The alt attribute should present an alternative brief text description of the image for users who rely on non-visual access to web pages.
- <img src="..." width="..." height="..." alt="..." />
User Interaction
input defines a displayed inline item which is used to interact with the user. An important use of input is in form elements, not covered here.
The type attribute can take a number of values to specify the kind of input item. Examples are button, text, checkbox, radio.
To respond to user interaction, either explicit JavaScript event handlers must be provided in attributes such as onclick, or the input element must be part of a form.
Adding the attribute disabled="disabled" will disable the input.
<input type="button" value="..." onclick="..." />
The value attribute sets the text which is displayed on the button.
<input type="image" src="..." onclick="..." />
The src attribute gives the URL of the image which is used as a button.
<input type="text" size="..." value="..." />
The size attribute defines the size of the text box (typically as the number of digit characters which will fit in the box). The value attribute defines the text which is initially displayed in the box.
The string entered in the box can be accessed in JavaScript as the value field of the corresponding JavaScript object. The onchange event is triggered when the user changes the text and then presses tab or clicks elsewhere in the page.
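As a rough sketch of how the value field might be used, the handler below takes the input object and reports whether its contents parse as a number. The function name describeEntry and the markup shown in the comment are assumptions made for illustration.

```javascript
// Hypothetical handler for a text box, wired up in the XHTML as e.g.
//   <input type="text" size="8" value="" onchange="alert(describeEntry(this));" />
// Inside the attribute, 'this' is the JavaScript object for the input element.
function describeEntry(input) {
  var text = input.value;      // the string currently in the box
  var n = parseFloat(text);    // NaN if the text does not start with a number
  if (isNaN(n)) {
    return '"' + text + '" is not a number.';
  }
  return 'You entered the number ' + n + '.';
}
```

Passing the object in as an argument, rather than looking it up by id inside the function, keeps the handler reusable for any text box.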
<input type="checkbox" checked="checked" />
The checked attribute defines whether the checkbox is initially checked or not; omit for unchecked.
Whether the box is checked or not can be accessed in JavaScript through the boolean checked field of the corresponding JavaScript object. The onchange event can be used to report the state of the box.
<input type="radio" name="..." checked="checked" />
The name attribute defines a set of radio buttons: at any one time, only one of those with the same name will be checked (on). The checked attribute defines whether the radio button is initially checked (on) or not; omit for unchecked (off).
Whether a particular radio button is checked or not (on or off) can be accessed in JavaScript through the boolean checked field of the corresponding JavaScript object.
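Because only one radio button in a set can be on, a common need is to find which one it is. The following is a minimal sketch under stated assumptions: the group name mood, the value attributes and the helper name checkedValue are all hypothetical, and the labels are purely illustrative.

```javascript
// Hypothetical helper: returns the value of whichever radio button in a
// group is checked, or null if none is. 'radios' is any array-like
// collection of radio objects, e.g. document.getElementsByName('mood')
// for buttons declared as
//   <input type="radio" name="mood" value="happy" /> Happy
//   <input type="radio" name="mood" value="neither" checked="checked" /> Neither
//   <input type="radio" name="mood" value="unhappy" /> Unhappy
function checkedValue(radios) {
  for (var i = 0; i < radios.length; i++) {
    if (radios[i].checked) {   // boolean field: true for the button that is on
      return radios[i].value;
    }
  }
  return null;                 // no button in the group is checked
}
```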
select defines an inline drop down menu list; it's a particular kind of input item. When an option is chosen, its value attribute becomes the string value of the select item as a whole, accessed in JavaScript through the value field of the select object. Typically the value of an option is set to a shortened version of the text shown in the menu.
<select id="sel1">
  <option value="...">...</option>
  <option value="...">...</option>
  <option value="..." selected="selected">...</option>
</select>
The onchange event can be used to report the selected item.
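Since option values are typically shortened versions of the menu text, a handler often has to translate the value back into something readable. The sketch below assumes a menu whose options have the hypothetical values r, g and b; the table COLOUR_NAMES and the function describeChoice are likewise invented for illustration.

```javascript
// Hypothetical lookup from shortened option values to the full menu text.
var COLOUR_NAMES = { r: 'Red', g: 'Green', b: 'Blue' };

// Hypothetical handler, wired up in the XHTML as e.g.
//   <select id="sel1" onchange="alert(describeChoice(this));"> ... </select>
function describeChoice(select) {
  var code = select.value;     // value attribute of the option currently chosen
  var known = Object.prototype.hasOwnProperty.call(COLOUR_NAMES, code);
  var name = known ? COLOUR_NAMES[code] : 'an unlisted option';
  return 'You chose ' + name + ' (value "' + code + '").';
}
```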
Validity
XHTML files can be validated via the W3C Markup Validation Service, by either inputting a URL or uploading a file. Do validate your XHTML!
However, there are at present difficulties with including JavaScript in valid XHTML web pages (external files are fine).
Well-formed and valid XHTML documents can be presented to a browser (user agent) in several modes, e.g. as HTML, as XHTML or as pure XML. The content type reported for the file, which is normally determined by its extension, is the main trigger; for example, files whose names end in ".html" will usually be treated as HTML and files whose names end in ".xml" or ".xhtml" as XHTML/XML.
In HTML as opposed to XHTML, the contents of the script element are treated as not containing markup; the characters &, < and > must NOT be represented as entities, since sequences such as &gt; will not be expanded to >. Thus the following is correct in HTML:
<script type="text/javascript">
  if (!flag1 && x > 3) div1.innerHTML = '<p>A paragraph.</p>';
</script>
For this to be acceptable, the file must be served to the browser as HTML, e.g. via the extension ".html".
The XHTML 1.0 standard specifies that the contents of the script element are treated as XML 'character data'. The characters &, < and > MUST be represented as entities or else they will be treated as markup; sequences such as &gt; will be expanded to >. So the following is correct XHTML:
<script type="text/javascript">
  if (!flag1 &amp;&amp; x &gt; 3) div1.innerHTML = '&lt;p&gt;A paragraph.&lt;/p&gt;';
</script>
Alternatively, in an XHTML document, JavaScript containing &, < or > can be enclosed in a 'CDATA section' which tells the browser to ignore any markup. So the following is also correct XHTML:
<script type="text/javascript">
<![CDATA[
  if (!flag1 && x > 3) div1.innerHTML = '<p>A paragraph.</p>';
]]>
</script>
For either of these to be acceptable, the file must be served to the browser as XHTML/XML, e.g. via the extension ".xml" or ".xhtml".
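When converting a script by hand to the entity form, the order of replacement matters. The helper below is only a sketch of that conversion; the name escapeForXhtml is hypothetical, and it assumes the script is given as a plain string.

```javascript
// Hypothetical helper: rewrites JavaScript source so that &, < and > are
// represented as entities, as needed inside a script element of a page
// served as XHTML/XML.
function escapeForXhtml(source) {
  return source
    .replace(/&/g, '&amp;')    // first, so the & of later entities is not escaped again
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}
```

For example, escapeForXhtml('if (!flag1 && x > 3)') yields the string 'if (!flag1 &amp;&amp; x &gt; 3)'.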
The problem at present is that some of the major browsers, particularly Internet Explorer, do not correctly process XHTML served as XHTML/XML. So for compatibility, it is best to use the extension ".html" or ".htm", which normally forces an XHTML page to be served as HTML. This means that JavaScript should be written as in the first example above, with &, < and > left unescaped.
However, if the DOCTYPE and XML namespace declaration identify the document as XHTML, the W3C Markup Validation Service will then rightly report validation errors if the JavaScript contains & or uses < or > in ways which look like tags.
The best solution is to put all JavaScript into external files, with at most function calls embedded into the XHTML. Alternatively, accept that some JavaScript will cause validation errors until browsers catch up.
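The external-file approach might look something like the sketch below. The file name page.js, the function names and the button markup in the comments are assumptions; flag1, x and div1 are taken from the examples above and are assumed to exist in the page.

```javascript
// Contents of a hypothetical external file, say "page.js", loaded with
//   <script type="text/javascript" src="page.js"></script>
// The file is not parsed as part of the XHTML, so &, < and > can be
// written directly.
function shouldAddParagraph(flag1, x) {
  return !flag1 && x > 3;
}

function addParagraph(target, flag1, x) {
  if (shouldAddParagraph(flag1, x)) {
    target.innerHTML = '<p>A paragraph.</p>';
  }
}

// The XHTML then contains only a function call with no special characters, e.g.
//   <input type="button" value="Add"
//          onclick="addParagraph(document.getElementById('div1'), flag1, x);" />
```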