[ acknowledgement1 ]
Between the complexity of SGML and
the rigidity of HTML lies the eXtensible Markup Language (XML).
XML lets you describe the structure of a document. In return,
you must have a Document Type Definition (DTD) before you can process
an XML document properly.
XML documents can be well_formed if they follow some
simple rules that allow them to be parsed. These rules
are outlined below.
A well_formed document can also be valid if it matches a Document
Type Definition (DTD). The DTD has to be declared at the start of
the document along with things like the XML version and the character
encoding. There are many different DTDs for different purposes.
XML documents can be processed if the processing is described
in XSL. To display an XML document you need to supply some kind
of mapping into a particular "style". Thus we now have style sheet
languages : XSL, PSL, P, ...
See the W3C information
[ Style ]
on style sheets and style sheet languages.
Examples of Document Types
- SimpleNovel::= See http://www.megginson.com/texts/darkness/novel.dtd.
(MathML): Structure of mathematical formulae.
[ REC-MathML ]
[ appendixE.html ]
+ XML dsssl stylesheets rtf tex jade
[ mml-files ]
[ http://www.openmath.org ]
(SVG): Scalable Vector Graphics --
[ http://www.w3.org/Graphics/SVG/ ]
(W3D): Replacement for the Virtual Reality Modeling Language.
[ Specifications ]
(HRMML): XML based Human Resource Management Markup Language:
[ main.html ]
(DocBook): Structure of documentation for software. DocBook
is (in 1999) actually an SGML based way to document software. See
[ DocBook in comp.text.SGML ]
for more information.
(XMI): XMI::="Presents meta-data for modeling objects", by CORBA and
the Object Management Group.
The following needs an Id and password
XMI is also integrated with the Unified Modeling Language 1.3
[ uml.html ]
(CBL): XML Common Business Library
[ cblfaq.html ]
There are many more sample DTDs at
[ resources.html?keys=*5266 ]
[ http://www.xmltree.com/ ]
[ http://xmlrepository.com/ ]
XML information by Dick Baldwin
[ http://xml.about.com ]
about.com is the new name for the organization that was previously known as
The Mining Company.
XML is like HTML; however, there are vital differences:
- None of the tags used in HTML are predefined in XML.
- You can add new tags to XML.
- XML is case sensitive.
- In XML, white space is significant.
- XML is not about layout and look-and-feel. It is about structure and meaning.
- Five predefined entities: gt(>), lt(<), quot("), amp(&), apos(').
- End tags are never omitted. <t....> ... </t>
- There is a special kind of tag which does not enclose any content <.../>
- Comments look like this <!-- ..... -->
- Processing can be embedded <?....?>
- Attributes always have a name and a value, and the value is between double quotes or apostrophes: name="value" or name='value'.
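Several of the differences above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Matching tags, a quoted attribute and an empty element: accepted.
matched = is_well_formed('<Note date="1999-06-22">hello<br/></Note>')
# Tags are case sensitive, so <Note> is not closed by </note>.
wrong_case = is_well_formed("<Note>hello</note>")
# HTML-style omitted end tags are not allowed.
no_end_tag = is_well_formed("<p>one<p>two")
# Attribute values must be quoted...
unquoted = is_well_formed("<note date=1999>hello</note>")
# ...but apostrophes work as well as double quotes.
single_quoted = is_well_formed("<note date='1999'>hello</note>")
```

Only the first and last strings are accepted; the other three are rejected by the parser.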
Syntax of Well Formed Documents
Here is a simple description of all documents that might be in XML -- ignoring
the context dependencies:
After a prolog comes a single
element called the root, and then some miscellaneous material
(comments, processing instructions and white space):
- document::= prolog root miscellaneous.
- prolog::=xml_type #comment dtd.
A well formed document must start with a prolog that
identifies the version of
XML it uses. For example
<?xml version="1.0"?>
declares version 1.0, the current version of XML.
It should also identify the character encoding, especially if you need to use
any non-"ASCII" characters. It can also identify some namespaces:
- root::= tagged_element | empty_element.
- miscellaneous::= #(comment | processing | WS).
- WS::=white space.
- tagged_element::= "<" tag #attribute ">" content "</" tag ">", -- the
tag at the start and end must be the same. To be valid the tag must be
defined in a DTD and have attributes and content that match
the rules in the DTD. A tagged element contains other data -- between the two tags.
<title>War and Peace</title>
- empty_element::="<" empty_tag #attribute "/>".
<timestamp date="1999/06/22" time="11:00"/>
- singleton::= empty_element.
- content::= #( parsed_data | element | comment ),
the valid sequences of pieces in a content are described by a regular
expression form in the DTD. An element is either a
tagged_element or an empty_element:
- (element) |- element==>tagged_element | empty_element.
- parsed_data::= #(char ~ ("<" | ">" | "&" | ";" | "'") | entity ).
- entity::= predefined_entity | defined_entity.
- predefined_entity::=gt | lt | quot | amp | apos,
- gt::="&gt;", stands for ">".
- lt::="&lt;", stands for "<".
- quot::="&quot;", stands for "\"".
- amp::="&amp;", stands for "&".
- apos::="&apos;", stands for "'".
- comment::= "<!--" ... "-->".
<!-- this is a comment -->
- attribute::= name "=" quoted_value.
- quoted_value::=quotes value quotes | apostrophe value apostrophe.
- defined_entity::=defined in prolog.
- parsed_data::=defined in prolog.
- tag::=defined in prolog or namespace,
- |-tag ==> O( namespace ":") name.
- name::=defined in prolog.
- value::=defined in prolog.
(End of Net XMLBNF)
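The grammar above can be exercised with a small document. This is only a sketch, assuming Python's standard xml.etree.ElementTree and xml.sax.saxutils modules. The title and timestamp elements are the examples used above; the enclosing book root is a hypothetical name added so that the document has a single root.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# prolog (XML declaration and a comment), then a single root element
doc = (
    '<?xml version="1.0"?>'
    "<!-- a comment in the prolog -->"
    "<book>"
    "<title>War &amp; Peace</title>"
    '<timestamp date="1999/06/22" time="11:00"/>'
    "</book>"
)

root = ET.fromstring(doc)
title = root.find("title")       # a tagged_element with parsed data
stamp = root.find("timestamp")   # an empty_element with two attributes

title_text = title.text          # the entity &amp; is resolved to "&"
stamp_attrs = dict(stamp.attrib)
stamp_is_empty = len(stamp) == 0 and stamp.text is None

# Going the other way: replace "&", "<" and ">" by predefined entities.
escaped = escape("a < b & c > d")
```

Here title_text is "War & Peace" and escaped is "a &lt; b &amp; c &gt; d".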
The actual rule for quoting is a little more complex in that the quote
character can not appear inside the value:
- quoted_value::= | [ q:quotes|apostrophe ] q #(char~q) q,
or the union with q equal to quotes or apostrophe of....
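The quoting rule can be seen in action with the quoteattr function from Python's standard xml.sax.saxutils module, which picks a quote character that does not occur in the value and falls back to the quot entity when both occur. This is a sketch with made-up sample values, not part of the XML specification itself.

```python
from xml.sax.saxutils import quoteattr

# No quote characters inside: double quotes are used.
plain = quoteattr("11:00")
# The value contains a double quote, so apostrophes delimit it.
has_double = quoteattr('say "hi"')
# The value contains an apostrophe, so double quotes delimit it.
has_single = quoteattr("it's")
# Both occur: double quotes delimit and the inner ones become &quot;.
has_both = quoteattr('it\'s "x"')
```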
To be valid, the entities, tags and their attributes must match a set of rules
given in a DTD.
Suppose that we specify a DTD that has a set of normal tag names T
and a set of content free (empty element) tag names C,
and for each tag t:T|C we have attribute names N(t),
and for each tag t:T|C, q:quotes|apostrophe, and attribute n:N(t), we have a set of valid
values V(t,n,q), and D is the raw data in our document. Then:
- a(t,q)::= ![n:N(t), q](n= q V(t,n,q) q), a sequence of names with valid quoted values,
- c(t, e)::=an expression describing the valid content of tag t in terms of elements e,
and then an element of type t, is defined by
- e(t)::= ("<" t a(t) ">" c(t, e) "</" t ">") | [t:C]("<" t a(t) "/>"),
and an element is the union over all tags
- element::= D | |[t:T](e(t)).
There is a trick above: the content expression c(t,e) depends on
all the elements, with e acting as a function associating tag names to elements of
that type. It's probably best to think of this as an array
or vector indexed by entity names. The resulting grammar
is context dependent but can be formalized using only a small variation of
context free grammars.
The "data" (D above) can include elements that indicate some processing to be
done to the data like this "<?.....?>".
- processing::= "<?" tag parameters "?>".
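A parser hands a processing instruction to the application as a target name plus an uninterpreted string of parameters. The sketch below assumes Python's standard xml.dom.minidom module; the xml-stylesheet target is a common real-world use, while the file name novel.xsl and the novel root element are hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="novel.xsl"?><novel/>'
)

# The processing instruction is the first child of the document node,
# ahead of the root element.
pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
target = pi.target   # the name right after "<?"
data = pi.data       # everything between the target and "?>"
root_name = doc.documentElement.tagName
```

The parameters look like attributes but are delivered as one string; it is up to the application to interpret them.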
It is possible to name things (like files of data or strings) and
use the names in place of the things -- but the rules are a little
Document Type Declarations
The dtd above is a document type declaration and has many forms.
Here are some simple ones:
- dtd::= "<!DOCTYPE " WS name O(WS externalId) OWS O( localdtd ) ">".
- externalId::= ("PUBLIC" | "SYSTEM") WS string_identifying_a_dtd_file.
- localdtd::= "[" #(markup_declaration| ... | WS) "]" OWS.
Local DTDs are interpreted before external ones so that they can define
terms used in the external ones. Unlike most other languages, the
first declaration of an entity or attribute takes precedence over later ones.
Thus local DTDs both override and inform the external ones!
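The first-declaration-wins rule can be observed with an entity declared twice in a local DTD. This is a small sketch, assuming Python's standard xml.etree.ElementTree (which relies on the expat parser); the note element and the author entity are made-up names.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE note [
  <!ENTITY author "first definition">
  <!ENTITY author "second definition">
]>
<note>&author;</note>"""

# The parser binds the entity to its first declaration and
# ignores the second one.
text = ET.fromstring(doc).text
```

After parsing, text holds "first definition".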
The DOCTYPE names the root element and defines the structure that
the document must have to be valid.
- markup_declaration::=element_declaration|entity_declaration|attribute_list_declaration | notation_declaration | process_indication | WS.
- element_declaration::="<!ELEMENT" element_name type_description ">".
- element_name::@name, the set of names occurring in element_declarations.
- attribute_list_declaration::="<!ATTLIST" element_name #attribute_declaration ">", attaches a set of attributes to the named element.
- attribute_declaration::=attribute_name attribute_type attribute_default.
- attribute_name::@name, the set of names appearing in attribute declarations.
- attribute_type::= "CDATA" | "ENTITY" | "ENTITIES" | "NMTOKEN" | "NMTOKENS" | "ID" | "IDREF" | "IDREFS".
- attribute_default::= required | implied | fixed | default_value.
- default_value::=literal data token.
- required::="#REQUIRED", implies that the element must specify a value and so no default is needed.
- implied::="#IMPLIED", no default is given and no value has to be given. Note however if the attribute name is mentioned it must be assigned a value.
- fixed::="#FIXED" default_value, meaning that the default is also the only value and so cannot be changed in any occurrence.
- entity_declaration::="<!ENTITY" O("%") entity_name entity_meaning ">".
- entity_name::@name, the set of names occurring in entity_declarations.
Entity declarations add new entities. An entity is an abbreviation. Some (with the '%')
are to be used in DTDs and are expanded there. They are written as
"%"entity_name";" and are replaced by the associated entity_meaning as the
DTD is elaborated. Others (with no "%") are ready to be
used in an actual XML document in the form "&"entity_name";".
- notation_declaration::="<!NOTATION" TBA ">".
- CDATA_section::= "<![CDATA[" TBA "]]>".
- pcdata::="#PCDATA", keyword indicating a block of parsed character data -- but no XML style marking up.
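To see how the markup declarations above reach an application, here is a sketch that uses the declaration handlers of Python's standard xml.parsers.expat module. The memo element, its attributes and the org entity are hypothetical names chosen to show one attribute of each default kind.

```python
import xml.parsers.expat

doc = """<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ATTLIST memo
      id     ID    #REQUIRED
      lang   CDATA #IMPLIED
      format CDATA #FIXED "text"
      status CDATA "draft">
  <!ENTITY org "Example Org">
]>
<memo id="m1">From &org;</memo>"""

attributes = {}   # attribute name -> (type, default value, required?)
entities = {}     # entity name -> replacement text

def on_attlist(element, name, attr_type, default, required):
    # Called once for each attribute_declaration in an ATTLIST.
    attributes[name] = (attr_type, default, bool(required))

def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
    # Called once for each entity_declaration.
    entities[name] = value

parser = xml.parsers.expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist
parser.EntityDeclHandler = on_entity
parser.Parse(doc, True)
```

In this parser a #REQUIRED attribute is reported with no default and the required flag set, #IMPLIED with neither, #FIXED with a default and the flag set, and a plain default_value with a default only.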
Standards on the WWW
[ REC-xml-19980210 ]
and Tim Bray's Annotated Specification
[ axml.html ]
- FOP::= See http://www.jtauber.com/fop/,
XSL to PDF converter.
- XT::= See http://www.jclark.com/xml/xt.html,
processes XSL transformations.
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCea89bdEb/ ]
- Apache XML Project's Xerces Java
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCfa89bdEb/ ]
- James Clark's XP
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCga89bdEb/ ]
- Microstar's Aefred
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCha89bdEb/ ]
- Sun's Java API for XML
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCja89bdEb/ ]
- Oracle's XML parser
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCka89bdEb/ ]
. . . . . . . . . ( end of section Parsers)
Lars Marius Garshol <email@example.com> wrote (comp.text.xml, 13 May 1999):
"The namespace URI does not point
to anything meaningful, it's just a globally unique identifier. So your application will have to understand the DTD to make use of its elements. It would need that even if the URI did refer to a
DTD. But at least they are now identified as being fitting elements,
and your application can make a decision as to whether it should just
ignore them or whether it should try to support them."
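The point of the quotation can be illustrated with a namespace-aware parser: the URI is never fetched, it only labels the elements. This is a sketch assuming Python's standard xml.etree.ElementTree, which writes a qualified name as "{uri}name"; the MathML namespace URI is real, while the unqualified note element is made up.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
doc = f'<m:math xmlns:m="{MATHML}"><m:mi>x</m:mi><note>plain</note></m:math>'

root = ET.fromstring(doc)   # no network access happens here
root_tag = root.tag

# An application decides what to support by comparing identifiers.
prefix = "{" + MATHML + "}"
known = [child.tag for child in root if child.tag.startswith(prefix)]
unknown = [child.tag for child in root if not child.tag.startswith("{")]
```

The application still has to know what math and mi mean; the namespace only tells it which vocabulary they belong to.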
- API::="Application Programming Interface".
- DOM::="Document Object Model".
- DTD::="Document Type Definition",
[ DTD in comp.text.SGML ]
- HTML::markup_language= HTML_glossary & HTML_syntax.
- HTML_glossary::= See http://cse.csusb.edu/dick/samples/comp.html.glossary.html.
- HTML_syntax::= See http://cse.csusb.edu/dick/samples/comp.html.syntax.html.
- language::="a set of syntactic and semantic rules defining the correct form, structure, and meaning of strings of characters", the chief product of computer science research.
- ML::="in an acronym often indicates a markup_language" | "a programming language".
- markup_language::language="a language that describes how to mark up text to give it added meaning, richness, or layout and style".
- For x, O(x)::= an optional x.
[ Thot ]
the Thot structured document language and the P stylesheet language.
- PSL::stylesheet_language, part of the Proteus library and style sheet library.
[ ~multimedia ]
- SGML::markup_language="Standard Generalized Markup Language",
[ comp.text.SGML.html] .
- SAX::="Simple API for XML".
- stylesheet::="A description in a special stylesheet_language of the way a user or client wants some data interpreted and/or displayed".
- stylesheet_language::="A computer language defining how to specify the style for displaying or processing a document".
- TBA::="To Be Announced".
- XML::markup_language="eXtensible Markup Language".
See the BNF syntax XMLBNF above
or the W3C specs
[ REC-xml-19980210 ]
or Tim Bray's Annotated Specification
[ axml.html ]
or the Italian translation
- XSL::stylesheet_language="eXtensible Stylesheet Language".
[ REC-xml ]
- element::=an identifiable(and so tagged) piece of data.
- entity::=a string that symbolizes a character | something that contains data.
The Annotated XML Spec at
[ axml.html ]
Mapping runtime objects into XML formatted data
[ XML_Serialization ]
[ xtal.html ]