Introduction
The ASCII code was the first standard 7-bit code that let characters - letters, numbers, punctuation, and other symbols - be
represented by the same 7 bits (usually stored in an 8-bit byte) on many different kinds of computers.
It is limited to the alphabet popular at the time in the USA but was
adopted internationally (see ISO_Latin_1).
It is the code used on the Internet. In the
1990s a code, originally 16 bits wide, was developed to handle the alphabets of many
nations. Its first 128 codes are the ASCII sequence. The new code is called
UNICODE.
The ASCII code includes control characters that do not print a
character. Each has a short standard name and a standard function,
plus a large number of non-standard applications.
The original code was developed in the days of mechanical
terminals such as Teletypes. The meanings of the control codes are
defined in terms of typewriter-like actions - Tab, ring bell,
back-space, return, and line feed. These have been re-interpreted
as cursor movements for CRTs. Many computer systems use the
control codes for special purposes. A competent software
engineer will know about the control codes: what they were
designed to mean, and how they are used or mis-used in real
systems.
In many high level languages an ASCII character can be obtained from its code number by a
function or conversion (e.g. Pascal - chr(i), C - (char)i, or Ada - CHARACTER'VAL(i)), where
"i" is an integer. Ada 83 provides a standard package, ASCII, that defines
standard names for constants representing the coded characters.
In C a character can be written as a backslash (\) followed by
either a special letter or a hexadecimal or octal number. The
following dictionary defines each ASCII name, its position in the ASCII
code, its original meaning, its Ada 83 symbol and, where one exists, its C backslash letter.
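As a rough illustration in Python (a language the text above does not cover), the built-ins chr and ord play the role of Pascal's chr(i) and the C cast, and string literals accept the same backslash forms as C. The helper name character_nbr is borrowed from the notation of this page; the range check is an assumption of this sketch, not part of any standard.

```python
def character_nbr(i: int) -> str:
    """Return the i'th ASCII character; reject codes outside 0..127."""
    if not 0 <= i <= 127:
        raise ValueError(f"not an ASCII code: {i}")
    return chr(i)

# A backslash letter, a hexadecimal escape, and an octal escape
# can all name the same character (here LF, code 10).
assert "\n" == "\x0a" == "\012" == character_nbr(10)

# ord() is the inverse mapping, from a character to its code number.
assert character_nbr(65) == "A" and ord("A") == 65
```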
Control characters
- ASCII::=
Net{
There are 128 ASCII codes numbered from 0 to 127. I will use the notation of
character_nbr(i)
to indicate the i'th ASCII character:
- character_nbr::0..127---char, -- The "---" indicates that there are precisely 128 standard characters that correspond, one-for-one, to their code numbers.
I use the C/C++ abbreviation to indicate the set of all ASCII codes:
- char::=character_nbr(0..127), --all the ASCII codes.
- CTRL_CHARS::=
Net{
- NUL::=character_nbr(0)::= Fills in time* (ASCII'NUL).
- SOH::=character_nbr(1)::= Start Of Header (routing info)(ASCII'SOH).
- STX::=character_nbr(2)::= Start Of Text (end of header)(ASCII'STX).
- ETX::=character_nbr(3)::= End Of Text(ASCII'ETX).
- EOT::=character_nbr(4)::= End Of Transmission(ASCII'EOT).
- ENQ::=character_nbr(5)::= ENQuiry, asking who is there(ASCII'ENQ).
- ACK::=character_nbr(6)::= Receiver ACKnowledges positively(ASCII'ACK).
- BEL::=character_nbr(7)::= Rings BELl or beeps(ASCII'BEL)\a.
- BS::=character_nbr(8)::= Move print head Back one Space(ASCII'BS)\b.
- HT::=character_nbr(9)::= Move to next Tab-stop(ASCII'HT)\t.
- LF::=character_nbr(10)::= Line Feed (ASCII'LF)\n.
- VT::=character_nbr(11)::= Vertical Tabulation(ASCII'VT)\v.
- FF::=character_nbr(12)::= Form Feed - new page or form(ASCII'FF)\f.
- CR::=character_nbr(13)::= Carriage Return to left margin(ASCII'CR)\r.
- SO::=character_nbr(14)::= Shift Out of ASCII(ASCII'SO).
- SI::=character_nbr(15)::= Shift into ASCII(ASCII'SI).
- DLE::=character_nbr(16)::= Data Link Escape(ASCII'DLE).
- DC1::=character_nbr(17)::= Device control(ASCII'DC1).
- DC2::=character_nbr(18)::= Device control(ASCII'DC2).
- DC3::=character_nbr(19)::= Device control(ASCII'DC3).
- DC4::=character_nbr(20)::= Device control(ASCII'DC4).
- NAK::=character_nbr(21)::= Negative Acknowledgment(ASCII'NAK).
- SYN::=character_nbr(22)::= Sent in place of data to keep systems synchronized(ASCII'SYN).
- ETB::=character_nbr(23)::= End of transmission block(ASCII'ETB).
- CAN::=character_nbr(24)::= Cancel previous data(ASCII'CAN).
- EM::=character_nbr(25)::= End of Medium(ASCII'EM).
- SUB::=character_nbr(26)::= Substitute(ASCII'SUB).
- ESC::=character_nbr(27)::= Escape to extended character set(ASCII'ESC).
- FS::=character_nbr(28)::= File separator(ASCII'FS).
- GS::=character_nbr(29)::= Group separator(ASCII'GS).
- RS::=character_nbr(30)::= Record separator(ASCII'RS).
- US::=character_nbr(31)::= Unit separator(ASCII'US).
- SP::=character_nbr(32)::= Blank Space character(ASCII'SP).
- DEL::=character_nbr(127)::= Punch out all bits on paper tape(delete).
}=::
CTRL_CHARS.
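The CTRL_CHARS list can be turned into a small lookup table. The following Python sketch is only one way to do it; the names NAMES, CTRL, ESCAPES and name_of are hypothetical and chosen for this illustration. It also checks that the backslash letters shown in the list denote the codes given there.

```python
# Names of codes 0..31 in code order, copied from the CTRL_CHARS list.
NAMES = ("NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
         "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US").split()

# Map each name to its character; SP and DEL sit outside the 0..31 run.
CTRL = {name: chr(code) for code, name in enumerate(NAMES)}
CTRL["SP"] = chr(32)
CTRL["DEL"] = chr(127)

# The backslash letters from the list (Python accepts the same ones as C).
ESCAPES = {"BEL": "\a", "BS": "\b", "HT": "\t", "LF": "\n",
           "VT": "\v", "FF": "\f", "CR": "\r"}

def name_of(ch: str) -> str:
    """Return the ASCII name of a listed character, or the character itself."""
    for name, c in CTRL.items():
        if c == ch:
            return name
    return ch

# Every backslash letter agrees with the code number in the list.
assert all(CTRL[name] == esc for name, esc in ESCAPES.items())
```

For instance, name_of("\x1b") gives "ESC", while an ordinary character such as "A" is returned unchanged.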
Normal Characters
- OTHER_CHARS::=
Note
Notice that NO ASCII character sends a BREAK signal. This is not a
character. It is transmitted through an RS232 cable by holding the
transmit data line in the spacing state for longer than one character
frame, or through a modem by ceasing to send the
carrier frequency for a fixed length of time. NUL transmits a
character (with all bits=0), BREAK does not.
Whitespace
- whitespace::= whitespace_char #(whitespace_char).
- whitespace_char::= SP | CR |LF | HT | ... .
- EOLN::=End Of Line string -- depends on the system you are using.
- |-EOLN ==> (CR | LF) #(CR | LF | HT | VT | ...).
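The shape asserted for EOLN can be tried out with a regular expression. This Python sketch is an assumption-laden illustration: the names EOLN_SHAPE and split_lines are hypothetical, and only the four characters actually named in the rule (CR, LF, HT, VT) are used, since the "..." alternatives are not specified. Because the rule states what every EOLN looks like, not the converse, splitting on it also swallows blank lines and any tab that starts the next line.

```python
import re

# (CR | LF) #(CR | LF | HT | VT | ...), with the elided alternatives left out.
EOLN_SHAPE = re.compile(r"[\r\n][\r\n\t\v]*")

def split_lines(text: str) -> list:
    """Split text at anything shaped like an end-of-line string."""
    return [part for part in EOLN_SHAPE.split(text) if part]
```

With this sketch, text using CR LF, LF alone, or CR alone as its line ending splits into the same list of lines.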
In COBOL and MATHS the "." character is both a punctuator and a decimal point.
The following defines the cases when a "." is acting as a punctuator:
- period::="." whitespace,-- A dot followed by white space is treated as a period.
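A minimal Python sketch of this rule follows, assuming that whitespace_char is limited to the four characters named above (SP, CR, LF, HT). The names PERIOD and period_positions are hypothetical. A lookahead is used so that the whitespace itself is not consumed.

```python
import re

# A "." that is immediately followed by a whitespace_char.
PERIOD = re.compile(r"\.(?=[ \r\n\t])")

def period_positions(text: str) -> list:
    """Return the index of every "." that acts as a punctuator."""
    return [m.start() for m in PERIOD.finditer(text)]
```

Under this definition the dot inside 3.14 is a decimal point, and a dot at the very end of the text, with no white space after it, is not counted as a period.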
Standard Character Sets
char is the set of all ASCII characters.
- digit::="0".."9". See digits.
- letter::=upper_case_letter | lower_case_letter.
- upper_case_letter::="A".."Z". See upper_case_letters: characters 65..90.
- lower_case_letter::="a".."z". See lower_case_letters: characters 97..122.
}=::
ASCII.
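The standard character sets defined above are contiguous ranges of codes, so membership can be tested by comparison. The Python sketch below uses hypothetical function names modelled on the definitions. Plain comparisons are used instead of str.isdigit and str.isalpha because those built-ins also accept non-ASCII characters.

```python
def is_digit(ch: str) -> bool:
    """True for "0".."9", codes 48..57."""
    return len(ch) == 1 and "0" <= ch <= "9"

def is_upper_case_letter(ch: str) -> bool:
    """True for "A".."Z", codes 65..90."""
    return len(ch) == 1 and "A" <= ch <= "Z"

def is_lower_case_letter(ch: str) -> bool:
    """True for "a".."z", codes 97..122."""
    return len(ch) == 1 and "a" <= ch <= "z"

def is_letter(ch: str) -> bool:
    """letter::=upper_case_letter | lower_case_letter."""
    return is_upper_case_letter(ch) or is_lower_case_letter(ch)
```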
See Also
Tables of ISO Latin 1 codes:
- ISO_Latin_1::= See http://www.bbsinc.com/symbol.html
- UNICODE::= See http://cse.csusb.edu/dick/samples/glossary.html#UNICODE.
Notes on special uses of special characters
- The following have been used to mark the end of a string:
NUL, ESC, 2 ESCs, grave accent, apostrophe, quotation, EOLN, slash.
- The following have been used to indicate the end of input:
EOT, SUB, 2 CRs
- The following have been used to kill or delete the previous character:
DEL, BS, #.
- The following have been used to cancel the current line of input:
DEL, NAK(^U), hash(#)
- On a network the special characters take on yet more meanings. For
example, commonly RS232 communications use DC3(^S) and DC1(^Q) to
delay and restart data transmission (originally to allow data to be punched).
In an X.25 packet switched network SI(^O) forces the data through the
intervening machines and DLE(^P) allows you to send commands to your local
"Pad". Proprietary networks often have a special 'escape' character as well.
The following can allow a following control character to appear in
text:
SYN(^V),
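The ^ forms used in these notes, such as DC3(^S), follow the usual caret convention: the control character's code differs from the code of the key shown by the single bit 64. A minimal Python sketch of the conversion in both directions follows. The function names ctrl and caret are hypothetical, and treating "^?" as DEL is a common convention assumed here.

```python
def ctrl(key: str) -> str:
    """Return the control character written ^key, e.g. ctrl("S") for ^S."""
    code = ord(key.upper()) ^ 0x40      # flip bit 64
    if not (code < 32 or code == 127):
        raise ValueError(f"^{key} does not name a control character")
    return chr(code)

def caret(ch: str) -> str:
    """Return the ^ form of a control character, e.g. "^S" for DC3."""
    code = ord(ch)
    if not (code < 32 or code == 127):
        raise ValueError("not a control character")
    return "^" + chr(code ^ 0x40)

# The pairs mentioned in the notes above.
assert ord(ctrl("S")) == 19 and ord(ctrl("Q")) == 17   # DC3, DC1
assert ord(ctrl("U")) == 21 and ord(ctrl("V")) == 22   # NAK, SYN
assert ord(ctrl("O")) == 15 and ord(ctrl("P")) == 16   # SI, DLE
```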