Extensible Markup Language (XML) 1.0
REC-xml-&iso6.doc.date;
W3C Recommendation
&draft.day;&draft.month;&draft.year;
http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;
http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.xml
http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.html
http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.pdf
http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.ps
http://www.w3.org/TR/REC-xml
http://www.w3.org/TR/PR-xml-971208
Tim Bray
Textuality and Netscape
tbray@textuality.com
Jean Paoli
Microsoft
jeanpa@microsoft.com
C. M. Sperberg-McQueen
University of Illinois at Chicago
cmsmcq@uic.edu
The Extensible Markup Language (XML) is a subset of
SGML that is completely described in this document. Its goal is to
enable generic SGML to be served, received, and processed on the Web
in the way that is now possible with HTML. XML has been designed for
ease of implementation and for interoperability with both SGML and
HTML.
This document has been reviewed by W3C Members and
other interested parties and has been endorsed by the
Director as a W3C Recommendation. It is a stable
document and may be used as reference material or cited
as a normative reference from another document. W3C's
role in making the Recommendation is to draw attention
to the specification and to promote its widespread
deployment. This enhances the functionality and
interoperability of the Web.
This document specifies a syntax created by subsetting an existing,
widely used international text processing standard (Standard
Generalized Markup Language, ISO 8879:1986(E) as amended and
corrected) for use on the World Wide Web. It is a product of the W3C
XML Activity, details of which can be found at http://www.w3.org/XML. A list of
current W3C Recommendations and other technical documents can be found
at http://www.w3.org/TR.
This specification uses the term URI, which is defined by [Berners-Lee et al.], a work in progress expected to update [IETF RFC1738] and [IETF RFC1808].
The list of known errors in this specification is
available at
http://www.w3.org/XML/xml-19980210-errata.
Please report errors in this document to
xml-editor@w3.org.
Chicago, Vancouver, Mountain View, et al.:
World-Wide Web Consortium, XML Working Group, 1996, 1997.
Created in electronic form.
English
Extended Backus-Naur Form (formal grammar)
1997-12-03 : CMSMcQ : yet further changes
1997-12-02 : TB : further changes (see TB to XML WG,
2 December 1997)
1997-12-02 : CMSMcQ : deal with as many corrections and
comments from the proofreaders as possible:
entify hard-coded document date in pubdate element,
change expansion of entity WebSGML,
update status description as per Dan Connolly (am not sure
about reference to Berners-Lee et al.),
add 'The' to abstract as per WG decision,
move Relationship to Existing Standards to back matter and
combine with References,
re-order back matter so normative appendices come first,
re-tag back matter so informative appendices are tagged informdiv1,
remove XXX XXX from list of 'normative' specs in prose,
move some references from Other References to Normative References,
add RFC 1738, 1808, and 2141 to Other References (they are not
normative since we do not require the processor to enforce any
rules based on them),
add reference to 'Fielding draft' (Berners-Lee et al.),
move notation section to end of body,
drop URIchar non-terminal and use SkipLit instead,
lose stray reference to defunct nonterminal 'markupdecls',
move reference to Aho et al. into appendix (Tim's right),
add prose note saying that hash marks and fragment identifiers are
NOT part of the URI formally speaking, and are NOT legal in
system identifiers (processor 'may' signal an error).
Work through:
Tim Bray reacting to James Clark,
Tim Bray on his own,
Eve Maler,
NOT DONE YET:
change binary / text to unparsed / parsed.
handle James's suggestion about < in attribute values
uppercase hex characters,
namechar list,
1997-12-01 : JB : add some column-width parameters
1997-12-01 : CMSMcQ : begin round of changes to incorporate
recent WG decisions and other corrections:
binding sources of character encoding info (27 Aug / 3 Sept),
correct wording of Faust quotation (restore dropped line),
drop SDD from EncodingDecl,
change text at version number 1.0,
drop misleading (wrong!) sentence about ignorables and extenders,
modify definxamples with Byte Order Mark.
Add content model as a term and clarify that it applies to both
mixed and element content.
1997-06-30 : CMSMcQ : change date, some cosmetic changes,
changes to productions for choice, seq, Mixed, NotationType,
Enumeration. Follow James Clark's suggestion and prohibit
conditional sections in internal subset. TO DO: simplify
production for ignored sections as a result, since we don't
need to worry about parsers whi
1997-06-29 : TB : various edits
1997-06-29 : CMSMcQ : further changes:
Suppress old FINAL EDIT comments and some dead material.
Revise occurrences of % in grammar to exploit Henry Thompson's pun,
especially markupdecl and attdef.
Remove RMD requirement relating to element content (?).
1997-06-28 : CMSMcQ : Various changes for 1 July draft:
Add text for draconian error handling (introduce
the term Fatal Error).
RE deleta est (changing wording from
original announcement to restrict the requirement to validating
parsers).
Tag definition of valida[...] it meant 'may or may not'.
1997-03-21 : TB : massive changes on plane flight from Chicago
to Vancouver
1997-03-21 : CMSMcQ : correct as many reported errors as possible.
1997-03-20 : CMSMcQ : correct typos listed in CMSMcQ hand copy of spec.
1997 James Clark:
Define the set of characters from which [^abc] subtracts.
Charref should use just [0-9] not Digit.
Location info needs cleaner treatment: remove? (ERB
question).
One example of a PI has wrong pic.
Clarify discussion of encoding names.
Encoding failure should lead to unspecified results; don't
prescribe error recovery.
Don't require exposure of entity boundaries.
Ignore white space in element content.
Reserve entity names of the form u-NNNN.
Clarify relative URLs.
And some of my own:
Correct productions for content model: model cannot
consist of a name, so "elements ::= cp" is no good.
1996-11-11 : CMSMcQ : revise for style.
Add new rhs to entity declaration, for parameter entities.
1996-11-10 : CMSMcQ : revise for style.
Fix / complete section on names, characters.
Add sections on parameter entities, conditional sections.
Still to do: Add compatibility note on deterministic content models.
Finish stylistic revision.
1996-10-31 : TB : Add Entity Handling section
1996-10-30 : TB : Clean up term & termdef. Slip in
ERB decision re EMPTY.
1996-10-28 : TB : Change DTD. Implement some of Michael's
suggestions. Change comments back to //. Introduce language for
XML namespace reservation. Add section on white-space handling.
Lots more cleanup.
1996-10-24 : CMSMcQ : quick tweaks, implement some ERB
decisions. Characters are not integers. Comments are /* */ not //.
Add bibliographic refs to 10646, HyTime, Unicode.
Rename old Cdata as MsData since it's only seen
in marked sections. Call them attribute-value pairs not
name-value pairs, except once. Internal subset is optional, needs
'?'. Implied attributes should be signaled to the app, not
have values supplied by processor.
1996-10-16 : TB : track down & excise all DSD references;
introduce some EBNF for entity declarations.
1996-10-?? consistency check, fix up scraps so
they all parse, get formatter working, correct a few productions.
1996-10-10/11 : CMSMcQ : various maintenance, stylistic, and
organizational changes:
Replace a few literals with xmlpio and
pic entities, to make them consistent and ensure we can change pic
reliably when the ERB votes.
Drop paragraph on recognizers from notation section.
Add match, exact match to terminology.
Move old 2.2 XML Processors and Apps into intro.
Mention comments, PIs, and marked sections in discussion of
delimiter escaping.
Streamline discussion of doctype decl syntax.
Drop old section of 'PI syntax' for doctype decl, and add
section on partial-DTD summary PIs to end of Logical Structures
section.
Revise DSD syntax section to use Tim's subset-in-a-PI
mechanism.
1996-10-10 : TB : eliminate name recognizers (and more?)
1996-10-09 : CMSMcQ : revise for style, consistency through 2.3
(Characters)
1996-10-09 : CMSMcQ : re-unite everything for convenience,
at least temporarily, and revise quickly
1996-10-08 : TB : first major homogenization pass
1996-10-08 : TB : turn "current" attribute on div type into
CDATA
1996-10-02 : TB : remould into skeleton + entities
1996-09-30 : CMSMcQ : add a few more sections prior to exchange
with Tim.
1996-09-20 : CMSMcQ : finish transcribing notes.
1996-09-19 : CMSMcQ : begin transcribing notes for draft.
1996-09-13 : CMSMcQ : made outline from notes of 09-06,
do some housekeeping
is used to read XML documents
and provide access to their content and structure. It is assumed that an XML processor is
doing its work on behalf of another module, called the
application. This specification describes the
required behavior of an XML processor in terms of how it must read XML
data and the information it must provide to the application.
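The division of labor between processor and application can be illustrated with a minimal sketch. It assumes Python's standard xml.sax module stands in for the XML processor; that module, the ElementCollector class, and the read_document function are illustrative choices, not part of this specification.

```python
import xml.sax


class ElementCollector(xml.sax.ContentHandler):
    """Hypothetical 'application': it only sees what the processor reports."""

    def __init__(self):
        super().__init__()
        self.names = []  # element names, in document order
        self.text = []   # character data chunks

    def startElement(self, name, attrs):
        # The processor reports structure (an element has started).
        self.names.append(name)

    def characters(self, content):
        # The processor reports content (character data).
        self.text.append(content)


def read_document(xml_bytes):
    """Pass the bytes to the processor; return what the application was told."""
    handler = ElementCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.names, "".join(handler.text)


names, text = read_document(b"<greeting><to>World</to></greeting>")
print(names)
print(text)
```

Here the application never touches the raw bytes: everything it learns about the document arrives through the callbacks that the processor invokes.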
Origin and Goals
XML was developed by an XML Working Group (orisable over the
Internet.
XML shall support a wide variety of applications.
XML shall be compatible with SGML.
It shall be easy to write programs which process XML
documents.
The number of optional features in XML is to be kept to the
absolute minimum, ideally zero.
XML documents shou