Problem.
Given a set of defined terms and a text, search the text for
occurrences of the terms, and mark each one up as a link referring
to its definition.
Terms consist of a sequence of non-whitespace characters.
The markup consists of putting a fixed string in front of the term, then
a second fixed string, then the term again, and then a final string:
- <a href="#
- term
- ">
- term
- </A>
However the precise strings are not important.
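As a minimal sketch of the markup step, assuming Python and the three strings listed above (the function name mark_up is hypothetical, not part of the original design):

```python
# Hypothetical helper: wraps one term in the three fixed strings.
PREFIX = '<a href="#'
INFIX = '">'
POSTFIX = '</A>'

def mark_up(term):
    """Return prefix + term + infix + term + postfix."""
    return PREFIX + term + INFIX + term + POSTFIX

print(mark_up("abcd"))   # <a href="#abcd">abcd</A>
```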
First Dictionary
- Program::=(input=>i.text, output=>o.text, process=>p).
- i.text::= #(occurrence_of_term | non_occurrence_of_term | whitespace).
- o.text::= #(marked_up_term | non_occurrence_of_term | whitespace).
- marked_up_term::=prefix occurrence_of_term infix occurrence_of_term postfix.
- prefix::="<a href=\"#",
- infix::="\">",
- postfix::="</A>".
Difficulties
HTML does not permit nested "anchors", so marked-up terms must not overlap. For example, given
- Terms:={ "abcd", "ab", "bc" }
and text:="hello abcd hi ab ther bc"
we need as output
- "hello <a href="#abcd">abcd</A> hi <a href="#ab">ab</A> ther <a href="#bc">bc</A>"
but not:
- "hello <a href="#abcd"><a href="#ab">ab</A>cd</A> hi <a href="#ab">ab</A> ther <a href="#bc">bc</A>"
Further, we want to substitute the longest term in preference to the shorter
ones. This makes the following result incorrect:
- "hello <a href="#ab">ab</A>cd hi <a href="#ab">ab</A> ther <a href="#bc">bc</A>"
This means that the usual approach of searching for the first match and
replacing it will not work.
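The failure can be illustrated with a small sketch, assuming Python and a term-by-term use of str.replace as the "usual" approach (the names mark_up and naive_mark_up are hypothetical):

```python
def mark_up(term):
    # Hypothetical helper: prefix + term + infix + term + postfix.
    return '<a href="#' + term + '">' + term + '</A>'

def naive_mark_up(text, terms):
    """Replace every occurrence of each term in turn (the approach that fails)."""
    for term in terms:
        text = text.replace(term, mark_up(term))
    return text

text = "hello abcd hi ab ther bc"
wanted = ('hello <a href="#abcd">abcd</A> hi <a href="#ab">ab</A>'
          ' ther <a href="#bc">bc</A>')
result = naive_mark_up(text, ["abcd", "ab", "bc"])

print(result == wanted)          # False
print(result.count("<a ") > 3)   # True: more anchors than the three wanted
```

The second pass finds "ab" inside the markup produced by the first pass, including inside the href value, so the result contains nested and broken anchors.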
Idea
Scan the text with every term watching to see whether it is the one being
matched. While more than one term is still matching, continue the scan. Terms
drop out as characters fail to match, until either one term matches perfectly
or none are left. If one matches, mark it up; otherwise reset everything,
output the non-term, and continue.
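One possible reading of this idea, sketched in Python. The names link_terms and mark_up are hypothetical. The sketch also assumes that a term may start at any position in the text rather than only after whitespace, and that when nothing matches a single character is output as the non-term:

```python
def mark_up(term):
    # Hypothetical helper: prefix + term + infix + term + postfix.
    return '<a href="#' + term + '">' + term + '</A>'

def link_terms(text, terms):
    """Mark up the longest term found at each position, never overlapping."""
    out = []
    pos = 0
    while pos < len(text):
        candidates = {t for t in terms if t}   # every term starts out watching
        best = None                            # longest term fully matched so far
        offset = 0
        while candidates and pos + offset < len(text):
            ch = text[pos + offset]
            # Terms whose next character differs drop out of the scan.
            candidates = {t for t in candidates if t[offset] == ch}
            offset += 1
            # A term matched in full is remembered and stops watching.
            for t in [t for t in candidates if len(t) == offset]:
                best = t
                candidates.discard(t)
        if best is not None:
            out.append(mark_up(best))   # fix up the match
            pos += len(best)
        else:
            out.append(text[pos])       # no term here: copy one character
            pos += 1
    return "".join(out)

print(link_terms("hello abcd hi ab ther bc", {"abcd", "ab", "bc"}))
# hello <a href="#abcd">abcd</A> hi <a href="#ab">ab</A> ther <a href="#bc">bc</A>
```

Because the scan resumes after the end of each marked-up term, anchors cannot nest, and because the last complete match wins, the longest term is preferred.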
Design
Treat each term as a separate entity. Each is responsible for
recognizing whether or not it is a match.
- in.term::=matching | non_matching.
- matching::= #(equal_character) end_of_term_marker.
- non_matching::=#(equal_character) un_equal_character #(char) end_of_term_marker.
- ...
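The two rules for in.term given above could be sketched as a small recognizer per term. This is only an illustration in Python. The class TermWatcher and the function matching_terms are hypothetical names, and the sketch assumes that the end of term marker corresponds to the end of a whitespace-delimited word in the input:

```python
class TermWatcher:
    """Hypothetical per-term recognizer for one candidate word."""

    def __init__(self, term):
        self.term = term
        self.reset()

    def reset(self):
        self.seen = 0         # number of equal characters consumed so far
        self.failed = False   # set once an unequal character has been seen

    def feed(self, ch):
        if self.failed:
            return            # non_matching: skip the remaining characters
        if self.seen < len(self.term) and self.term[self.seen] == ch:
            self.seen += 1    # equal_character
        else:
            self.failed = True  # un_equal_character

    def at_end_marker(self):
        """Give the verdict at the end of term marker, then reset."""
        matched = not self.failed and self.seen == len(self.term)
        self.reset()
        return matched

def matching_terms(word, terms):
    """Feed one word to every watcher and list the terms that match it."""
    watchers = [TermWatcher(t) for t in terms]
    for ch in word:
        for w in watchers:
            w.feed(ch)
    return [w.term for w in watchers if w.at_end_marker()]

print(matching_terms("abcd", ["abcd", "ab", "bc"]))   # ['abcd']
```

Each watcher decides for itself, so adding a term only means adding another watcher to the list.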
Implementation
- ...