CS320 Course Materials, R J Botting, Computer Science & Engineering, CSUSB
Tue Jun 15 14:35:18 PDT 2010



    Syntax of the HyperText Markup Language

      HTML is the language used on the World Wide Web to describe the logical structure of a large number of interlinked pages. It is a particular document type defined by the rules of SGML: [ comp.text.SGML.html ]

      For newer information, see [ sgml.html ]

      There are three ideas that make up HTML:

      1. SGML tags: <tag> .... </tag> [ Grammar ]
      2. URLs: how://where/what... [ Universal Resource Locators ]
      3. CGIs: [ Common Gateway Interface ]

      For help with the Three Letter Acronyms (TLAs), see [ comp.html.glossary.html ]

      Lexicon

      1. entity::= "&" identifier ";". These encode special characters in ASCII, using "&" before and ";" after:
         		&lt;	encodes <
         		&gt;	encodes >
         		&amp;	encodes &
         		&quot;	encodes "

      2. tag::="<" tag_identifier #attribute ">" | "<" "/" tag_identifier ">".
      3. attribute::= attribute_identifier O("=" attribute_value). Upper and lower case are ignored in tag and attribute identifiers but not in attribute values.

        Here is a useful shorthand for describing most tags:

      4. tagged::= "<" (_) #attribute ">".

        The "(_)" above is a place holder for the tag identifier. So

      5. tagged(b) = "<" b #attribute ">".

      6. comment::="<!--" text that will not affect the rendered page "-->".
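The lexical rules above are regular, so a fragment of HTML can be split into comments, tags, entities and plain text with one regular expression. The following Python sketch is an approximation under stated assumptions: the names `TOKEN`, `ATTR`, `ENTITIES` and `tokens` are hypothetical, attribute values are either double-quoted or bare words, only the four entities listed above are decoded, and a ">" inside a quoted value is not handled. Tag and attribute identifiers are folded to lower case; attribute values keep their case.

```python
import re

# Alternatives are tried in order: comment, tag, entity, then plain text.
TOKEN = re.compile(
    r"(?P<comment><!--.*?-->)"                            # "<!--" ... "-->"
    r"|(?P<tag><(?P<close>/?)(?P<name>[A-Za-z][A-Za-z0-9]*)(?P<attrs>[^>]*)>)"
    r"|(?P<entity>&(?P<ident>[A-Za-z]+);)"                # "&" identifier ";"
    r"|(?P<text>[^<&]+|[<&])",                            # anything else
    re.DOTALL,
)
# attribute_identifier O("=" attribute_value); value quoted or a bare word
ATTR = re.compile(r'([A-Za-z][A-Za-z0-9-]*)(?:\s*=\s*("[^"]*"|[^\s>]+))?')
# Assumption: only the four entities from rule 1 are known.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"'}


def tokens(html: str):
    """Yield (kind, value) pairs for a fragment of HTML."""
    for m in TOKEN.finditer(html):
        if m.group("comment"):
            continue                        # comments do not affect the page
        if m.group("tag"):
            attrs = {}
            for a in ATTR.finditer(m.group("attrs")):
                value = a.group(2)          # None when there is no "=" part
                if value and value.startswith('"'):
                    value = value[1:-1]     # strip the quotes, keep the case
                attrs[a.group(1).lower()] = value  # identifier: case ignored
            kind = "close" if m.group("close") else "open"
            yield kind, (m.group("name").lower(), attrs)
        elif m.group("entity"):
            ident = m.group("ident")
            # Unknown entities are passed through unchanged.
            yield "text", ENTITIES.get(ident, "&" + ident + ";")
        else:
            yield "text", m.group("text")
```

For example, `list(tokens('<A HREF="Page.HTML">x &lt; y</a><!-- hi -->'))` gives an "open" token for `a` with attributes `{'href': 'Page.HTML'}`, the text pieces "x ", "<" and " y", and a "close" token for `a`; the comment produces nothing.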

      Grammar

        Universal Resource Locators

          Universal Resource Locators (the standard name is Uniform Resource Locators) are attribute values that tell a browser where to find things on the Internet.

          The following XBNF is an approximation to the standard defined at [ 5_BNF.html ]

        1. URL::= protocol ":" O(site O(port)) #("/" directory) O("/" O(file) O("?" query) O("#" identifier)).

        2. protocol::="http" | "ftp" | "telnet" | "file" | "gopher" | "news" |... .
        3. site::= "//" internet_address.
        4. port::=":" decimal_number.
        5. directory::=file_name.
        6. file::=file_name O("."file_type). File names can include periods.
        7. file_type::="html" | "gif" | "xbm" | "au" | "jpeg" | "jpg" | "mpeg" | "aiff" | "mov" |...
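Python's standard `urllib.parse.urlsplit` breaks a URL into pieces that line up roughly with the grammar above. The mapping below is an illustrative sketch: `split_url` is a hypothetical name, the dictionary keys are simply the grammar's own names, and "site" here is the host name without the leading "//".

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Break a URL into the parts named in the URL grammar."""
    parts = urlsplit(url)
    # path "/a/b/file.html" -> directories ["a", "b"], file "file.html"
    *directories, file = parts.path.split("/") if parts.path else [""]
    return {
        "protocol": parts.scheme,
        "site": parts.hostname,            # internet_address, no "//"
        "port": parts.port,                # None when O(port) is absent
        "directories": [d for d in directories if d],
        "file": file,                      # "" when there is no file part
        "query": parts.query,              # the part after "?"
        "identifier": parts.fragment,      # the part after "#"
    }


print(split_url("http://cse.csusb.edu:80/dick/cs320/html.syntax.html#Lexicon"))
```

For that example the protocol is "http", the site "cse.csusb.edu", the port 80, the directories "dick" and "cs320", the file "html.syntax.html" and the identifier "Lexicon".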

        Documents

      1. document::= O("<HTML>") O(head) body O("</HTML>").
      2. head::= "<head>" #header_element "</head>".
      3. body::= tagged("body") untagged_body "</body>" .
      4. untagged_body::= #( element | named(element) | hypertext_refed(element) ) .

        Elements

      5. element::=special_text | header | list | image | paragraph #("<P>" paragraph) | "<br>" | "<hr>" | link | form | ...

      6. named::="<a name=" name ">" (_) "</a>".
      7. Note. In this syntax the above is used as named(header): the argument (header) replaces the (_) above. For example:
         		<a name = Contents>Contents List of Document</a>

      8. hypertext_refed::="<a href=" quotes URL quotes ">" (_) "</a>". Again, hypertext_refed(X) means "<a href=" quotes URL quotes ">" X "</a>". For example:
         		<a href="http://cse.csusb.edu"> Hello World</a>
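The (_) placeholder behaves like a function parameter. A Python sketch of `named` and `hypertext_refed` as string-building functions; the names come from the grammar, but quoting the name value and passing both values through `html.escape` are choices made here, not part of the rules above.

```python
from html import escape


def named(element: str, name: str) -> str:
    """named(element): wrap element in an anchor that links can target."""
    return f'<a name="{escape(name)}">{element}</a>'


def hypertext_refed(element: str, url: str) -> str:
    """hypertext_refed(element): make element a link to url."""
    return f'<a href="{escape(url)}">{element}</a>'


print(hypertext_refed(" Hello World", "http://cse.csusb.edu"))
# prints: <a href="http://cse.csusb.edu"> Hello World</a>
```

Because each function returns an element, calls can be nested, for instance `named(hypertext_refed(...), "Top")`, just as the grammar allows named(element) and hypertext_refed(element) around any element.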

        Text

        This grammar fails to express a complicated set of rules about which elements can, cannot, or should not appear nested inside other elements. These rules are in the Document Type Definition (DTD) for HTML documents, written in the Standard Generalized Markup Language (SGML) and held at CERN (the European Organization for Nuclear Research).

      9. special_text::= |[ x:special_text_type] ("<" x ">" simpler_text "</" x ">").
      10. special_text_type::= "pre"|"listing"|"blockquote".

        Note. The above summarizes three alternatives, each with a different x.

      11. header::= |[i:"1".."6"] ( "<H" i ">" text "</H" i ">" ).
      12. Note. The above describes the six levels of headers, with "H1" the most prominent and "H6" the least prominent. The actual and relative styles and sizes cannot be specified; they are chosen by the user and the browser.
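Both special_text and header use a variable (x or i) that must take the same value in the opening and the closing tag. A Python regular expression expresses that constraint with a backreference. A sketch; the pattern names `SPECIAL_TEXT` and `HEADER` are hypothetical, and nothing here checks what may appear between the tags.

```python
import re

# |[x:special_text_type] ("<" x ">" simpler_text "</" x ">"):
# the backreference \1 forces the closing tag to repeat the same x.
SPECIAL_TEXT = re.compile(
    r"<(pre|listing|blockquote)>(.*?)</\1>", re.IGNORECASE | re.DOTALL
)
# |[i:"1".."6"] ("<H" i ">" text "</H" i ">"): the same idea for headers.
HEADER = re.compile(r"<H([1-6])>(.*?)</H\1>", re.IGNORECASE | re.DOTALL)

print(SPECIAL_TEXT.search("<pre>x = 1</pre>").group(1))  # prints: pre
print(SPECIAL_TEXT.search("<pre>mismatched</listing>"))  # prints: None
print(HEADER.fullmatch("<H2>Lexicon</H2>").groups())     # prints: ('2', 'Lexicon')
print(HEADER.fullmatch("<H2>Lexicon</H3>"))              # prints: None
```

Writing one pattern per alternative would also work; the backreference is simply the regular-expression counterpart of the single bound variable in the |[x:...] notation.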

      13. paragraph::= #(piece | text ),
      14. piece::= |[s:styles]( tagged(s) text "</" s ">" ).
      15. styles::=logical_styles | physical_styles.

      16. logical_styles::= "em" | "strong" | "code" | "samp" | "kbd" | "var" | "dfn" | "cite" | "address",
      17. physical_styles::= "b" | "i" | "u" | "tt".
      18. Note. Physical styles are "deprecated".
      19. deprecated::=discouraged by the standard's authors, who may remove it one day. (HTML is not for word processing!)

        The browser and user determine the precise meaning of these styles with the following guidelines:

        Table of Styles

         Style	Meaning
         em	Emphasized - "notice me"
         strong	Emphasized even more
         code	This is a piece of computer code
         samp	This is sample output or a sequence of literal characters
         kbd	This is text the user types on the keyboard
         var	This is a syntactic variable
         dfn	This is a definition
         cite	This is a citation of a source
         address	This is an address (real or email)
         b	looks bold (deprecated)
         i	looks italic (deprecated)
         u	looks underlined (deprecated)
         tt	looks like a typewriter (deprecated)
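To see which of these styles a fragment uses, Python's standard `html.parser.HTMLParser` can be subclassed; it reports start tags with their names already in lower case, so `<B>` and `<b>` are treated alike. A sketch (the class name `StyleChecker` is hypothetical) that sorts style tags into logical and physical (deprecated) ones, using the two lists from rules 16 and 17:

```python
from html.parser import HTMLParser

LOGICAL_STYLES = {"em", "strong", "code", "samp", "kbd",
                  "var", "dfn", "cite", "address"}
PHYSICAL_STYLES = {"b", "i", "u", "tt"}  # deprecated


class StyleChecker(HTMLParser):
    """Collect the style tags used in a fragment, by kind."""

    def __init__(self):
        super().__init__()
        self.logical, self.physical = [], []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name in lower case.
        if tag in LOGICAL_STYLES:
            self.logical.append(tag)
        elif tag in PHYSICAL_STYLES:
            self.physical.append(tag)


checker = StyleChecker()
checker.feed("<P>This is <EM>important</EM> and <B>bold</B>.")
print(checker.physical)  # prints: ['b']
```

A non-empty `physical` list flags places where a logical style such as `em` or `strong` could replace a deprecated physical one.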

        The SGML DTD specifies rules about what is recommended, normal, and deprecated.

      20. text::=untagged_body & ( recommended_DTD_rules | DTD_rules | deprecated_DTD_rules),
      21. DTD::=Document Type Definition.

        Images

      22. image::="<img src=" quotes URL quotes O("alt=" string) O("align=" alignment) O("ismap") ">".
      23. alignment::="top" | "middle" | "bottom" | "left" | "right" .
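A hypothetical Python helper that emits an `img` tag, adding each optional part of rule 22 only when it is given (the file names below are made up). It does not check the alignment value; that is left to the caller.

```python
from html import escape


def image(url, alt=None, align=None, ismap=False):
    """Build an <img> tag; alt, align and ismap are the optional parts."""
    parts = [f'<img src="{escape(url)}"']
    if alt is not None:
        parts.append(f'alt="{escape(alt)}"')      # O("alt=" string)
    if align is not None:
        parts.append(f'align="{escape(align)}"')  # O("align=" alignment)
    if ismap:
        parts.append("ismap")                     # O("ismap")
    return " ".join(parts) + ">"


print(image("logo.gif", alt="CSUSB logo", align="left"))
# prints: <img src="logo.gif" alt="CSUSB logo" align="left">
```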

        Lists

      24. list::=ordered_list |unordered_list | definition_list | menu |directory|... .
      25. definition_list::= "<DL" O("compact")">" #( "<dt>" term #("<dd>" text ) ) "</DL>".
      26. term::=text.
      27. ordered_list::="<OL>" #("<li>" text ) "</OL>".
      28. unordered_list::="<UL>" #("<li>" text ) "</UL>".
      29. menu::="<menu>" #("<li>" text ) "</menu>".
      30. directory::="<dir>" #("<li>" text ) "</dir>".
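The four simple list types share one shape, "<" kind ">" #("<li>" text) "</" kind ">", so one function can build all of them; definition lists pair each term with its definitions. A Python sketch with hypothetical names; like the grammar, it writes no closing `</li>`, `</dt>` or `</dd>` tags.

```python
def html_list(kind: str, items) -> str:
    """Build an ol, ul, menu or dir list from a sequence of texts."""
    if kind not in {"ol", "ul", "menu", "dir"}:
        raise ValueError(f"not a simple list kind: {kind}")
    body = "".join(f"<li>{item}" for item in items)
    return f"<{kind}>{body}</{kind}>"


def definition_list(terms: dict, compact: bool = False) -> str:
    """Map each term to a list of texts: "<dt>" term #("<dd>" text)."""
    body = "".join(
        "<dt>" + term + "".join(f"<dd>{text}" for text in texts)
        for term, texts in terms.items()
    )
    opening = "<dl compact>" if compact else "<dl>"
    return f"{opening}{body}</dl>"


print(html_list("ol", ["one", "two"]))
# prints: <ol><li>one<li>two</ol>
```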

        Tables

        Tables were not part of the HTML 2.0 standard (they became standard in HTML 3.2), but the popularity of Netscape made them a part of many pages:
      31. table::= "<table>" #row "</table>",
      32. row::="<tr>" #table_item "</tr>",
      33. table_item::="<td>" #element "</td>" | "<th>" #element "</th>".
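The table rules nest three repetitions: a table is a sequence of rows and a row is a sequence of cells. A Python sketch (the function name `table` is hypothetical) that builds a table from a list of rows, each row a list of cell contents, writing every cell as a `td` item:

```python
def table(rows) -> str:
    """Build "<table>" #row "</table>" with one <tr> per row."""
    out = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{cell}</td>" for cell in row)
        out.append(f"<tr>{cells}</tr>")
    out.append("</table>")
    return "\n".join(out)


print(table([["Style", "Meaning"], ["em", "Emphasized"]]))
```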

      Forms

        TBA

      Common Gateway Interface

      1. (glossary) |- CGI::=Common Gateway Interface [ CGI in www ]

      Abbreviations

    1. For x, O(x)::= (x| ""), Optional x.
    2. For x, N(x)::= (x | x N(x)), one or more x.
    3. For x, #x::= ("" | N(x)), any number of x including none.
    4. quotes::= "\"".
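Each abbreviation has a direct counterpart among regular-expression operators: O(x) is `?`, N(x) is `+`, and #x is `*`. A Python sketch; the helper names are hypothetical (`many` stands in for #, which is not a legal Python identifier), and each helper wraps its argument in a non-capturing group so the operator applies to the whole of x.

```python
import re


def O(x: str) -> str:
    """O(x) = (x | ""): optional x."""
    return f"(?:{x})?"


def N(x: str) -> str:
    """N(x) = (x | x N(x)): one or more x."""
    return f"(?:{x})+"


def many(x: str) -> str:
    """#x = ("" | N(x)): any number of x, including none."""
    return f"(?:{x})*"


# #("/" directory), assuming (hypothetically) that a directory name
# is a run of letters, digits and underscores.
DIRECTORIES = re.compile(many(r"/[A-Za-z0-9_]+"))
print(bool(DIRECTORIES.fullmatch("/dick/cs320")))  # prints: True
print(bool(DIRECTORIES.fullmatch("")))             # prints: True
```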

End