This is a first rough draft for a new language. The symbol TBD
indicates a known area of incompleteness: "To Be Done".
Richard J Botting
rbotting@csusb.edu
Small Awk (not smalltalk!)
is designed as a sample language to be studied and worked on in
computer science classes that study high-level computer languages.
It is a subset of the awk language of Aho, Weinberger, and Kernighan.
It is a subset created by deleting many useful and convenient options
and features from awk. Even so, it is a language that has control structures,
variables, input, and output. For simple, everyday programs it is
rather like a small and safe C or C++.
Smallawk programs can be tested by running them through any awk
interpreter on a UNIX system. Awk has also been ported to other
platforms, and those ports can also be used to test smallawk programs. Suppose your
smallawk program is in a file called
hello.awk
then the command
awk -f hello.awk
will test the program.
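As a minimal sketch of such a test run, the shell commands below write a one-piece program to hello.awk and pipe two lines of data into it. The input lines, the scratch directory, and the shell variable names are made up for illustration, and an awk on the PATH is assumed.

```shell
# Work in a scratch directory so no existing file is overwritten
# (an assumption of this sketch, not something awk requires).
dir=$(mktemp -d)

# A one-piece program: print a greeting once the input is exhausted.
cat > "$dir/hello.awk" <<'EOF'
END{ print "hello, world"; }
EOF

# awk -f reads the program from the file and the data from standard input.
out=$(printf 'first line\nsecond line\n' | awk -f "$dir/hello.awk")
echo "$out"

rm -r "$dir"
```

Run without a pipe or a file argument, awk waits for data typed at the keyboard until end-of-file (usually Control-D on UNIX).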
Like awk, smallawk is designed for processing data files. It works with
simpler files than awk, however. Smallawk assumes that it will read
a single file and that each line in the file is a list of "fields" of data
separated by one or more spaces or tabs. Smallawk reads the input file
and produces a
single stream of output by processing the input. Notice that a smallawk
program does not start until it is given some data, and doesn't stop
until it gets the usual end-of-file. Here is a traditional simple
program in smallawk:
END{ print "hello, world"; }
It reads the input and at the end of the input data it outputs the
"hello, world" message. The following program reads the input file
and numbers the lines:
{ print NR, $0; }
The next program prints out lines that contain the string "AWK":
/AWK/{ print ;}
The next program assumes that each line has one number, and at the
end of the file outputs the total of these numbers:
{sum = sum + $0;}
END{print sum;}
This one checks the input and only adds a line to the total if it is a valid
unsigned integer, with one or more decimal digits:
/^[0-9][0-9]*$/{sum = sum + $0;}
END{print sum;}
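To see the check at work, the sketch below feeds this program four made-up lines, two of which are not strings of decimal digits. The input data and the shell variable `out` are assumptions for illustration.

```shell
# Only "12" and "7" consist entirely of decimal digits,
# so only those two lines are added to sum.
out=$(printf '12\nabc\n7\n3.5\n' | awk '
/^[0-9][0-9]*$/{sum = sum + $0;}
END{print sum;}')
echo "$out"
```

The lines abc and 3.5 fail the pattern, so the total printed is 12 + 7.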
This reads in a file of names, student_ids, and scores
and calculates the mean score. It assumes input with data
separated by spaces like this:
ShortName 9999 3.2
AnotherName 1234 17
The program is
{sum = sum + $3; count=count+1;}
END{print sum/count;}
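As a quick check, the sketch below runs the program on the two sample lines shown above. Only the shell wrapping and the variable `out` are additions.

```shell
# The third field of each line is the score: 3.2 and 17.
out=$(printf 'ShortName 9999 3.2\nAnotherName 1234 17\n' | awk '
{sum = sum + $3; count=count+1;}
END{print sum/count;}')
echo "$out"
```

The sum of the scores is 20.2 over 2 lines, so the mean printed is 10.1.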
Smallawk has the normal lexical scan, which separates variables, constants,
and strings from some reserved words:
- reserved_words::= "END" | "NF" | "NR" | "BEGIN" | "print" | "if" | "else".
- END::lexeme, true after last line is read.
- NF::lexeme, Number of Fields.
- NR::lexeme, Number of this line/record.
- BEGIN::lexeme, true on first line/record only.
- print::lexeme, starts a command to output the value of an expression.
- if::lexeme, introduces a conditional statement.
- else::lexeme, alternative branch of a conditional statement.
- comma::lexeme= ",".
A program is a sequence of pieces. Each piece has two parts:
a pattern and an action. The pattern states when the action is to be applied.
When smallawk is running it takes each line, tries each piece of the program
in turn, and carries out the actions whose patterns match the line.
- program::= # piece.
BEGIN{product=1;}
{product = product * $0;}
END{print product;}
[ example.awk ]
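The sketch below illustrates the rule that every piece is tried on every line: one line can trigger more than one action, and the actions run in program order. The three-piece program and its two input lines are made up for illustration.

```shell
# Piece 1 applies only to lines containing AWK.
# Piece 2 has no pattern, so it applies to every line.
# Piece 3 runs once, after the last line.
out=$(printf 'apple\nAWK rules\n' | awk '
/AWK/{ print "has AWK:", $0; }
{ print NR, $0; }
END{ print "done"; }')
echo "$out"
```

The second input line matches both of the first two pieces, so it produces two output lines, the one from the /AWK/ piece first.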
- piece::= O( pattern ) "{" action "}".
The pattern is used to recognise the lines of data
that the action applies to.
/^[0-9][0-9]*$/{sum = sum + $0;}
END{print sum;}
- pattern::= disjunction.
- disjunction::= conjunction #( "||" conjunction).
/Botting/||/Dick/
- conjunction::= possible_complement #( "&&" possible_complement).
/Botting/ && /Richard/
- possible_complement::= "!" elementary_pattern | elementary_pattern.
!/Blotting/
- elementary_pattern::= "END" | "BEGIN" | "/" regular_expression "/".
/Banana/
/[bB]anana/
/^\.As_is/
/^[0-9]*$/
- regular_expression::= leftmost | rightmost | exact | contained.
- leftmost::= "^" contained.
^.As_is
- rightmost::= contained "$".
two part:$
- exact::= "^" contained "$".
^Dick.*Botting$
- contained::= #possibly_repeated_set_of_chars.
- possibly_repeated_set_of_chars::= possible_set "*" | possible_set.
pos*ibil*ity
The "*" means "zero or more of".
The above matches "posssibiity" for example.
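A small experiment, sketched below, shows which strings this pattern accepts. The four test words and the shell variable `out` are made up for illustration.

```shell
# s* and l* may each match zero or more characters, so the first three
# words match; "probability" has no "po" in it and is not printed.
out=$(printf 'posssibiity\npossibility\npoibiity\nprobability\n' |
  awk '/pos*ibil*ity/{ print ; }')
echo "$out"
```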
- possible_set::=normal_character | escaped_character | set_of_characters.
- normal_character::= letter | digit | symbol ~ special_character.
- escaped_character::= "\" special_character.
- special_character::= "[" | "]" | "\" | "." | "*".
- set_of_characters::= "[" chars "]" | "[^" chars "]".
- chars::= # character | character "-" character.
- character::= normal_character | escaped_character.
An action is a series of at least one statement. These are
executed one after another.
- action::= statement #statement.
- statement::= assignment | print | selection | do_nothing.
- assignment::= variable "=" expression ";".
sum = sum + $0;
- print::= "print" ";"| "print" expression ";".
print sum;
- selection::= "if" "(" condition ")" body "else" else.
if ( sum > 0 ) print "greater"; else ;
- body::=action | do_nothing.
- else::=action | do_nothing.
- Note: this avoids an ambiguity (dangling else) by always requiring an else for
every if. Real awk does not have this restriction...
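As a sketch of the two ways to satisfy this rule, the runs below use the same condition, once with a real alternative and once with the do-nothing statement as the else part. The input numbers, the messages, and the shell variables `out1` and `out2` are made up for illustration.

```shell
# An else part that does something: every line produces output.
out1=$(printf '5\n-3\n' |
  awk '{ if ( $0 > 0 ) print "greater"; else print "not greater"; }')
echo "$out1"

# An else part that is just ";": only the line with 5 produces output.
out2=$(printf '5\n-3\n' |
  awk '{ if ( $0 > 0 ) print "greater"; else ; }')
echo "$out2"
```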
- condition::=expression.
- do_nothing::= ";".
- operation::= "+" | "-" | "*" | "/" | "%" | "&&" | "||" | "!".
- function::= "sin" | "cos" | "log" | "exp" | "sqrt".
- expression::= concatenation.
- concatenation::= arithmetic_expression #arithmetic_expression.
- arithmetic_expression::= simple_expression #(#operation simple_expression).
- simple_expression::= function_call | variable | constant | parenthesized_expression.
- parenthesized_expression::= "(" expression ")".
- function_call::= function "(" expression #( comma expression ) ")".
TBD : Operations, variables, functions....
- variable::= whole_line | field | global_variable.
- whole_line::= dollar "0".
$0
- field::= dollar expression.
$5
- global_variable::=letter #(letter|digit).
sum
TBD
. . . . . . . . . ( end of section Syntax.)
Here is a diagram:
TBD
A smallawk program describes a transformation of a text file into another
text file. This is done one line at a time. Each line is read and matched
against the patterns. Where they match, the actions are executed. These
lead to outputting lines of data as well.
Here are the informal operational semantics of a smallawk program P.
P will consist of a sequence of n pieces p[1]..p[n].
Each piece p[i] has two parts: a pattern p[i].pattern
and an action p[i].action.
Here is a C++-like description of
what the program P does.
NR=0;
while( get next line until end of input )
{
    NR++;   // NR is the number of the line just read
    for(i = 1; i<=n; i++)
        if( line matches p[i].pattern )
            apply p[i].action to line;
}
//after end of file, NR is the number of lines read
for(i = 1; i<=n; i++)
    if( p[i].pattern is "END" )
        apply p[i].action;
A line matches a pattern according to the rules TBD ...
[ regular_expressions.html ]
Applying an action to the line starts by assigning the whole line to
variable $0. Then each field in the line (separated by one or more spaces or tabs)
is assigned to $1 through $NF, where NF is set to the number of fields.
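The splitting into fields can be observed with a sketch like the one below. The input line, with several spaces and a tab as separators, and the shell variable `out` are made up for illustration.

```shell
# Three fields: runs of spaces and a single tab both act as one separator.
out=$(printf 'ShortName   9999\t3.2\n' | awk '{ print NF; print $1; print $3; }')
echo "$out"
```

This prints the field count 3, then the first field ShortName, then the third field 3.2.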
An action is a sequence of one or more instructions and these are
executed in turn. If an instruction is an assignment then the expression on
the right hand side is evaluated and the resulting value is placed in the
variable on the left hand side of the '=' sign. This may change the whole
line or any field in the line if the variable is '$0' or '$i' for some other i.
If the instruction is a print command with an expression then the expression is
evaluated and output, followed by a new line. If it is a print with no expression then
the whole line (with any changes) is printed. If the instruction
is a selection with condition c, body b, and else part e, then the condition is
evaluated and if it is not zero the body b is executed. Otherwise, if c evaluates
to zero, then e is executed.
Expressions are evaluated in the usual way: constants become their values,
variables return their current value, and operations are applied in order
of precedence to give a value.
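The effect of assigning to a field can be sketched as follows: the changed field shows up in the whole line that a bare print outputs. The input line, the replacement value "X", and the shell variable `out` are made up for illustration, and the behavior shown is that of real awk, which smallawk is assumed to share here.

```shell
# Assigning to $2 changes the line itself, so the bare print shows the new
# value; the second print concatenates $0, a string, NF, and another string.
out=$(printf 'alpha beta gamma\n' |
  awk '{ $2 = "X"; print ; print $0 " has " NF " fields"; }')
echo "$out"
```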
- awk::= See http://cse.csusb.edu/dick/cs360/notes/awk.html.
- smalltalk::= See http://cse.csusb.edu/dick/samples/smalltalk.html
- TBA::="To Be Announced".
- TBD::="To Be Done".
- For X, O(X)::= (X | ), Optional X.