[Skip Navigation] [CSUSB] / [Comp Sci ] / [R J Botting] >> [MATHS] >> intro_ebnf
[Contents] [Source Text] || [Notation] || [Copyright] || [Contact] [Search ]
Wed Jan 31 14:23:44 PST 2007

Contents


    Syntax (XBNF)

    1. XBNF::="Extreme BNF", a form of syntax description that extends Backus-Naur-Form as far as it can go.

      XBNF uses space, bars ('|') and the symbol '::=' as in BNF. It uses '()' as parentheses and '#' to mean 'any number of'. This choice follows more than 10 years of testing other forms in class and on computers. The EBNF symbols '<>{}[]' are used for other purposes

      Here is comparison of BNF, EBNF and XBNF.

       BNF	  <number>::=<digit>|<number><digit>
       Ada EBNF   number::= digit {digit}
       XBNF	   number::= N(digit).

      You can see which is shorter. XBNF is more powerful than EBNF as well. It is not as short as the theoretical notations for a grammar because it is designed to work in ASCII (no Greek or special symbols) and yet say more.

      A Mapping into English

       	"::=" is read "is defined to be"
       	"&" is read "and also"
       	"~" is read as "but not"
       	"|"  is  read as "or"
       	"#" is read as "any number of including none"
       	"(" is read as "a sequence of"
       	")" is read as ", end of sequence"
       	space between 2 parts in a sequences can be read as "and then"
       	"_" becomes a space.
       	"N(_)" is read as "one or more _" and short for "_ #(_)"
       	"O(_)" is read as "optional _" and is short for "(|_)"

      A Meta-Grammar

      Here is a set of definitions that defines a XBNF grammar.

    2. grammar::= #( definition | comment ), a grammar has any number of definitions and comments. A useful grammar has comments that describe why a syntax is needed and lots of examples of how to use it. The definition above is followed by this comment for example.

    3. definition::= defined_term "::=" selection,
    4. a definition has a "::=" between the defined term and an expression defining a set. This definition defines what a definition is: it is something that defines a defined_term (what ever that is) plus a special colon-colon-equals opearator and then an expression that defines a selection of alternative forms (see below).

    5. selection::= alternative #( "|" alternative ), a selection is made of alternatives. It means making a choice of forms. Mathematically is no more or less than the union of the alternative forms.

    6. alternative::= intersection | complementary_form. Each alternative is either a intersection or complementary_form.

    7. intersection::= sequence #("&" sequence ), the ampersand("&") character indicates that all the sequences must hold at once (set intersection).

    8. complementary_form::= sequence "~" sequence. In a complementary form ( A~B say) then any A which is not a B is in A~B. An example would be #(a|b) ~ (a b) : any sequenced of a's and b's except the sequence ab.

    9. sequence::= #phase ,

    10. phase::= O("#") item, The 'number sign'(#) indicates the occurrence of any number of the following item - including none. The above shows an optional item: O("#"). So the definition of phase above implies
    11. ()|-phase = "#" item | item.

    12. item::= element | defined_term | "("selection ")", an item can be a more complex sructure in parentheses of some simple name, or a string of characters in quotes.

    13. comment::= (#character )~definition. Any number of characters that are not definition form a comment. This sentence is therefore part of a comment in this grammar.

      Here is the definition of what an element or terminal looks like - its actually a C string.

    14. element::=quote #(char~(quote| backslash)|backslash char) quote.
    15. quote::="\"".
    16. backslash::="\\".

      Here is a more complex definition the defines what a defined term can be:

    17. defined_term::= ( (letter | digit) #identifier_character) & ( #identifier_character (letter|digit)) & correctly_spelled & defined. A defined_term does not start or end with an underscore - an identifier is made of letters, digits and underscore characters, and must always start and end with either a letter or digit.

      Notice that a defined_term's definition includes the item '...& defined' which indicates a special non-syntactic constraint - that a defined_term must be in the set of terms which are defined.

    18. identifier_character::=(letter | digit | underscore).

      A defined term is made of numbers, words and the underscore character:

    19. correctly_spelled::=#(word | number | underscore).

      Numbers can have one or more digits:

    20. number::= digit # digit.

      Words are at least two letters long and are correfctly spelled:

    21. word::=(letter letter #letter) & in some dictionary.

      Lexemes

      It helps if you define the basic elements of your language (the lexemes) in a special section -- a Lexicon. It also helps to indicate them:
       	name::lexeme="string", purpose.
       	double_plus::lexeme="++", used to add 1 to a variable.
      or
       	name::lexeme, purpose.
       	typedef::lexeme, use to introduce type definitions in C++.

      In the second form the string is equal to the characters in the name:

       	typedef::lexeme="typedef".

      Doing this lets the string that represents the lexeme be indexed and found by search engines. In practice many queries of language reference manuals on the web are trying to find the meaning a lexeme.

      Lexicon

      The following terms are defined here for completeness and so you can use them in your own documents. Normally you can take them as given merely by including a link to this section of this web page:
       		http://www.csci.csusb.edu/dick/maths/intro_ebnf.html#Lexicon

      1. character::lexeme=Any ASCII character.
      2. char::lexeme=any ASCII character.
      3. digit::lexeme= "0".."9".
      4. capital_letter::lexeme= "A".."Z".
      5. upper_case::lexeme=capital_letter.
      6. lower_case::lexeme= "a".."z".

        Note: MATHS/XBNF/EBNF is case sensitive, but is used to define case insensitive languages. The following define some maps that may help to define the syntax of case insensitive languages.

      7. to_upper::lower_case---upper_case. See table below...
      8. to_upper::char->char= to_upper|->Id.
      9. to_upper::#char->#char= ""+>"" |+> (first;to_upper|rest;to_upper).
        cto_upper(c)
        aA
        bB
        cC
        ......
      10. to_lower::upper_case---lower_case= /to_lower.
      11. to_upper::char->char= to_lower|+>Id.
      12. to_upper::#char->#char= ""->"" |+> (first;to_lower|rest;to_lower).

      13. ignore_case::char->@char=map[c:char]( to_lower(c) | to_upper(c) ).
      14. ignore_case::#char->@#char= ""+>{""} |+> (first;ignore_case|rest;ignore_case).
      15. letter::lexeme=capital_letter | "a".."z".
      16. underscore::lexeme= "_".
      17. sign::lexeme=plus|minus
      18. plus::lexeme= "+",
      19. minus::lexeme= "-".
      20. comma::lexeme= ",",
      21. semicolon::lexeme=";",
      22. left_bracket::lexeme= "[",
      23. right_bracket::lexeme= "]".
      24. quotes::lexeme= "\'".
      25. space::lexeme= " ".
      26. non_quote::lexeme=char ~ quotes.
      27. l_paren::lexeme= "(",
      28. r_paren::lexeme= ")".

      If you need to define a language that uses the ASCII code then [ comp.text.ASCII.html ] gives definitions of characters (including the control characters) by name and purpose.

      Precedence of Operators.

      The definitions of XBNF above [ A Meta-Grammar ] imply the following consequences and interpretations:

      Concatenation (space) has a higher precedence than a vertical bar ("|"). In an XBNF formula with both spaces and "|" symbols the bars separate the sequences. For example, If a, b, c and d are items, then
      a b | c d = (a b) | (c d) =either a sequence of a and then b, end of sequence or a sequence of c and then d, end of sequence.
      a b | c d is not equal to a (b|c) d.

      Indeed

    22. a (b|c) d = a b d | a c d = (a b d) | (a c d).

      The number sign("#") has lower precedence than a space and so always applies to the next item. Thus:

    23. file1::=header #data_record sentinel, indicates one header, then a number of data_record, and then one sentinel. It does not allow the header to re-occur. Neither can sentinel appear other than once in each file.

      However, if

    24. file2::=#(header data_record) sentinel, then there can be any number of header's but each header is followed by an data_record and there is still exactly one sentinel in an file.

      The next description

    25. file3::=#(header data_record sentinel). implies that headers and sentinels can occur many times - but always with an data_record in between them. Further in any file there will be the same number of sentinel's, data_record's and header's because the "#" does not permit a part of the repeated sequence to appear with out the rest of it also following.
    26. file4::=#(header #data_record sentinel). implies that headers and sentinel's can occur many times and with zero or more data_record's in between them.

      Similarly, If a and b are items then (a | #b ) indicates either an a or any number of b 's, as does (#b |a ). This, plus the rule relating spaces and the "|" symbol means that
      (a #b c | #b )=either a single a followed by many b's and one c or else many b 's alone.

      Parentheses (()) are put around selections within sequences, and iterations.
      (a | b) (c | d)= a sequence of a or b, end of sequence and then a sequence of c or d, end of sequence.

      Or informally
      the first item is either an a or a b followed by either a c or a d. Notice that this describes four alternative forms because:


    27. (set_theory)|-(a | b ) (c | d) = (a c | a d | b c | b d).


      #(a | b )=any number of a's and b's in some order.


    28. (similarly)|-#(a | b )= empty | a | b | a a | a b | b a | b b | a a a | a a b | ...

      So, #(a|b) is different from #a | #b because

    29. (above)|-#a | #b = empty | a | a a | a a a| ... | b | b b | b b b | ...

      So, #(a|b) includes alternatives like a b and a a b and b b a that are not permitted by #a | #b.

      The following common form indicates a series of pairs:
      #(a b )=any number of pairs. where each pair has one a followed by one b.

      The #(a b) form implies that each a must be followed by a b. #a #b implies that all the as are followed by all the bs. #(a|b) implies that there is a series of as and bs in some order... a kind of muddled list.

      Shorthand Idioms

      Certain patterns appear again and again in practice and XBNF defines shorthand "macro"s for these patterns:
      1. For x, O(x)::=(x |). O(_) means 0 or 1 occurrences of its argument. So for example:
      2. (O1)|-for all x, y, O(x) y = (x y| y).

      3. N(X)::= X | X N(X), N(_) indicates 1 or more occurrence.

      4. L(X)::=X | X comma L(X), L(_) indicates a comma separated list.
      5. comma::=",".

      Indeed the "#" notation is in essence another abbreviation because of the following rules:
    30. |- (hashon): #(X) = O( N(X)).
    31. |- (nhash): N(X) = X #(X) = #(X) X.

      Semantics

      Informally each defined term maps into a set of sequences. Thus it extends set_theory and a theory of sequences. The structure of these sequences is determined by the definitions of the defined terms. This is easy to see until one gets doubts about definitions like this:
    32. train:: = engine carriage | train carriage.

      The above defines a train in terms of the meaning of a train. The formal semantics tackles and resolves the problem of such recursive definitions see grammar_theory below.

      See Also

    33. grammar_theory::= See http://csci.csusb.edu/dick/maths/intro_grammar.html.

    34. set_theory::= See http://csci.csusb.edu/dick/maths/logic_30_Sets.html.

    35. similarly::=using the same reasoning as in the previous theorem.

      Acknowledgment

      Thanks to Larry Evans <jcampbell3@prodigy....> who pointed out the broken and bogus links in these notes on Mon May 21. Also thanks to claudiu <claudiu@romatsa.ro> on Tue, 23 Oct 2001 who spotted errors and made wise suggestions.

      Problems that remain are dick botting's.

    . . . . . . . . . ( end of section Syntax (XBNF)) <<Contents | End>>

    Notes on MATHS Notation

    Special characters are defined in [ intro_characters.html ] that also outlines the syntax of expressions and a document.

    Proofs follow a natural deduction style that start with assumptions ("Let") and continue to a consequence ("Close Let") and then discard the assumptions and deduce a conclusion. Look here [ Block Structure in logic_25_Proofs ] for more on the structure and rules.

    The notation also allows you to create a new network of variables and constraints, and give them a name. The schema, formal system, or an elementary piece of documentation starts with "Net" and finishes "End of Net". For more, see [ notn_13_Docn_Syntax.html ] for these ways of defining and reusing pieces of logic and algebra in your documents.

    For a complete listing of pages in this part of my site by topic see [ home.html ]

    Notes on the Underlying Logic of MATHS

    The notation used here is a formal language with syntax and a semantics described using traditional formal logic [ logic_0_Intro.html ] plus sets, functions, relations, and other mathematical extensions.

    For a more rigorous description of the standard notations see

  1. STANDARD::= See http://www.csci.csusb.edu/dick/maths/math_11_STANDARD.html

    Glossary

  2. above::reason="I'm too lazy to work out which of the above statements I need here", often the last 2 or three statements.
  3. given::reason="I've been told that...", used to describe a problem.
  4. given::variable="I'll be given a value or object like this...", used to describe a problem.
  5. goal::theorem="The result I'm trying to prove right now".
  6. goal::variable="The value or object I'm trying to find or construct".
  7. let::reason="For the sake of argument let...", intoduces a temporary hypothesis that survives until the end of the surrounding "Let...Close.Let" block or Case.
  8. hyp::reason="I assumed this in my last Let/Case/Po/...".
  9. QED::conclusion="Quite Easily Done" or "Quod Erat Demonstrandum", indicates that you have proved what you wanted to prove.
  10. QEF::conclusion="Quite Easily Faked", -- indicate that you have proved that the object you constructed fitte the goal you were given.
  11. RAA::conclusion="Reducto Ad Absurdum".

End