Quick Tour
Introduction to the Universaltext Interpreter
The Universaltext Language is a computer language for representing text, in particular for representing a text implementation called the universaltext.
Writing Prose
When writing prose, one sets tags such as ~t to indicate that what follows is a title, or ~article to begin a new article.
~article Title of the first article
~t Title of the first section
This is the first paragraph.
This is the second paragraph.
~t Title of the second section
...
~article Title of the second article
~t First section of it
First paragraph of 2nd article
The input text is read by the interpreter, which records the literal parts and the given structure. The recorded information can then be navigated and queried. For example, one can retrieve the titles of all articles, or whole articles, or generate web pages or a LaTeX book layout from them.
The text marks such as ~t and ~article above are not predefined by the interpreter but defined by the input files themselves. They are completely free: their names as well as their relationships can be defined at will. The above input lines would, for example, be preceded by lines such as these:
^article {
^p :string
^title :string
^t :string
}
This introduces the root symbol ”article“ and its member symbols ”title“ (for article title), ”p“ (for paragraph) and ”t“ (for section title). After defining some symbols one can mark prose with them and query it. The symbolic structure that one can define is not a shallow document-oriented structure, but a profound logical one.
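To make the idea of recording marked prose and then querying it more concrete, here is a minimal Python sketch. It is not the interpreter's parser: the function names parse_marked_text and article_titles are hypothetical, and the rule that every unmarked line becomes a paragraph unit is an assumption made only for this illustration.

```python
# Hypothetical illustration only; this is not how the Universaltext Interpreter is implemented.

def parse_marked_text(lines):
    """Turn input lines into (tag, text) units.

    Assumption: a line starting with "~" carries a tag name followed by its text,
    and any other non-empty line is recorded as a paragraph ("p").
    """
    units = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("~"):
            tag, _, text = line[1:].partition(" ")
            units.append((tag, text))
        else:
            units.append(("p", line))
    return units


def article_titles(units):
    """Query the recorded units for the titles of all articles."""
    return [text for tag, text in units if tag == "article"]


sample = [
    "~article Title of the first article",
    "~t Title of the first section",
    "This is the first paragraph.",
    "~article Title of the second article",
    "~t First section of it",
    "First paragraph of 2nd article",
]

units = parse_marked_text(sample)
print(article_titles(units))
```

Once the text is held as units rather than as raw lines, a query such as "all article titles" is a simple filter over the recorded structure.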
Structuring Text
Let's see now how the text structure can be defined. At the beginning there is always a symbol definition.
^ person
The above sentence creates the symbol person.
^ woman : person
After introducing a symbol, one can immediately instantiate it to create another symbol. The previous sentence introduces the symbol woman as a particular occurrence of person. The Universaltext calls this relationship a ”type“. We say: the symbol woman has the type person.
^ family {
^ parent : person
^ child : person
}
This code snippet defines a new symbol family having two subordinate symbols parent and child, both of which are of type person. This means: a family consists of persons, each of them playing in this family the role of a parent or of a child.
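The relationships described so far (a symbol may have a type, and a symbol may have subordinate symbols) can be pictured with a small Python data structure. This is only a sketch under assumptions: the class Symbol, its field names and the method is_a are hypothetical and do not come from the interpreter.

```python
# Hypothetical model of symbols, types and subordinate symbols.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Symbol:
    name: str
    type_: Optional["Symbol"] = None               # the symbol this one instantiates, if any
    members: Dict[str, "Symbol"] = field(default_factory=dict)  # subordinate symbols by name

    def is_a(self, other: "Symbol") -> bool:
        """Follow the chain of types upward, looking for `other`."""
        current = self.type_
        while current is not None:
            if current is other:
                return True
            current = current.type_
        return False


# ^ person
person = Symbol("person")
# ^ woman : person
woman = Symbol("woman", type_=person)
# ^ family { ^ parent : person   ^ child : person }
family = Symbol("family")
family.members["parent"] = Symbol("parent", type_=person)
family.members["child"] = Symbol("child", type_=person)

print(woman.is_a(person))
print(sorted(family.members))
```

The point of the sketch is that "type" and "membership" are two separate relationships: woman points to person through its type, while parent and child belong to family and each point to person through their own type.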
Let us now instantiate the type family:
=Smith ~family
The above introduces the symbol ”Smith“ as a particular family. But we said before: a family consists of parents and children. Therefore this family may consist of some particular parents and some particular children.
=Smith ~family {
~parent =Mary : woman
~parent =John : man
~child =Lena : woman
~child =Peter : man
}
The above means: The symbol Smith denotes a family that consists of two parents, a woman called Mary and a man called John, and two children, a woman called Lena and a man called Peter.
Note that this expression is only correct assuming the prior definition of family and of the types woman and man that it uses (man is not defined in this tour; it is assumed here to be introduced in the same way as woman). The Universaltext Interpreter checks the logical correctness of the expressions. If you try to define the Smith family as above without having previously defined a family, your text will not compile and the interpreter will abort execution with an error message.
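The kind of check described here can be sketched in Python as follows. The names check_instance and UndefinedSymbolError are hypothetical, the error texts are invented for the illustration, and the real interpreter may check more (or differently) than this sketch does.

```python
# Hypothetical sketch of a "defined before use" check; not the interpreter's own logic.

class UndefinedSymbolError(Exception):
    """Raised when an expression uses a symbol that was never defined."""


def check_instance(definitions, type_name, members):
    """Validate one instance against the known definitions.

    definitions: maps each defined type name to the set of its member names.
    members: list of (role, member_type) pairs used by the instance.
    """
    if type_name not in definitions:
        raise UndefinedSymbolError(f"type '{type_name}' is not defined")
    for role, member_type in members:
        if role not in definitions[type_name]:
            raise UndefinedSymbolError(f"'{type_name}' has no member '{role}'")
        if member_type not in definitions:
            raise UndefinedSymbolError(f"type '{member_type}' is not defined")


# The members of the Smith family from the example above.
smith_members = [
    ("parent", "woman"),
    ("parent", "man"),
    ("child", "woman"),
    ("child", "man"),
]

# With all definitions present, the check passes silently.
definitions = {
    "person": set(),
    "woman": set(),
    "man": set(),
    "family": {"parent", "child"},
}
check_instance(definitions, "family", smith_members)

# Without a definition of family, the check fails.
try:
    check_instance({"person": set()}, "family", smith_members)
except UndefinedSymbolError as error:
    print("error:", error)
```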
The interpreter ”understands“ the underlying logical structure. That is its key feature. The text is parsed and it is recorded as symbols with relationships between them. This makes it possible to navigate and query the text, as we will see soon.
Creating a Website
Suppose you make a genealogical website. You can begin this way:
^website {
^webpage {
^title : string
^content {
^p : string
^h1 : string
}
}
}
You do not define string because this is a symbol already defined by the interpreter. It represents a common string.
After defining the general structure, you can enter a particular web page:
=geneaweb ~website
=index ~webpage
~title The Genealogy Site
~content
~h1 The Genealogy Site
Welcome to the Genealogy site!
This Site reports the history of Family Smith.
What about having more families at the same site? We would define them as:
=Smith ~family {
~parent =Mary : woman
...
}
=Smithers ~family {
~parent =Jenny : woman
...
}
=Smithereen ~family {
~parent =Jeremy : man
...
}
=Clark ~family {
...
}
=Smith ~family {
~parent =Wendy : woman
...
}
If you try to enter the above text, it won't work as you might expect. When it reaches the second Smith family, the interpreter will parse it as referring to the first Smith family, thus making a single big Smith family with four parents and all the children. If these symbols refer to different units, they must have different identifiers. We resolve this conflict by rewriting the text as follows:
^ family {
^ name : string
^ parent : person
^ child : person
}
=Smith ~family {
~name Smith
~parent =Mary : woman
...
}
=Smithers ~family {
~name Smithers
~parent =Jenny : woman
...
}
=Smithereen ~family {
~name Smithereen
~parent =Jeremy : man
...
}
=Clark ~family {
...
}
=Smith2 ~family {
~name Smith
~parent =Wendy : woman
...
}
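Why equal identifiers end up as one unit, and why a distinct identifier such as Smith2 fixes it, can be pictured with a plain Python dictionary. This is only an analogy under assumptions: the function register_families is hypothetical and the interpreter's internal bookkeeping is not described in this tour.

```python
# Hypothetical analogy: identifiers behave like dictionary keys.

def register_families(entries):
    """Collect (identifier, parent) pairs; equal identifiers share one record."""
    families = {}
    for identifier, parent in entries:
        families.setdefault(identifier, []).append(parent)
    return families


# Both Smith families use the identifier "Smith": their parents are merged.
merged = register_families([
    ("Smith", "Mary"),
    ("Smith", "John"),
    ("Smithers", "Jenny"),
    ("Smith", "Wendy"),
])
print(merged["Smith"])

# The second family gets its own identifier "Smith2": the units stay separate.
distinct = register_families([
    ("Smith", "Mary"),
    ("Smith", "John"),
    ("Smithers", "Jenny"),
    ("Smith2", "Wendy"),
])
print(sorted(distinct))
```

The displayed family name then lives in the separate name member, so two units can share the name Smith while keeping different identifiers.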
You could add a list of families to your webpage this way:
^website {
^webpage {
^title : string
^content {
^p : string
^h1 : string
^ul {
^li : string
}
}
}
}
=geneaweb ~website
=index ~webpage
~title The Genealogy Site
~content
~h1 The Genealogy Site
Welcome to the Genealogy site!
This Site reports the history of families:
~ul {
Clark
Smith
Smith
Smithereen
Smithers
}
But why should we do the work of enumerating the family list? The system knows that we have collected information about these particular families. We can let the interpreter provide this list for us, and even sort it alphabetically. Let us see how.
The UString Tags
This can be achieved using ustrings. The predefined type ustring is a string containing embedded tags that can be evaluated by the interpreter. If we define a symbol of this type, say ^html : ustring, we can replace the above family list with something like this:
This Site reports the history of families:
~html [foreach/ family][v name] [/foreach]
When generating the website, the interpreter evaluates the given ustring this way: the foreach tag is repeated for each known symbol that plays the role "~family". In our case it is repeated five times, once per family defined above. Each time, the tag [v name] is evaluated and returns the name of the respective family. Thus the whole line is expanded as: "Smith Smithers Smithereen Clark Smith".
That is still imperfect, because all names end up on the same line. To get the names expanded in HTML as an unordered list, we enhance the line:
~html <ul>[foreach/ family]<li>[v name]</li>[/foreach]</ul>
Each family name is now enclosed in a <li> tag and the whole line is therefore expanded as:
<ul><li>Smith</li><li>Smithers</li><li>Smithereen</li><li>Clark</li><li>Smith</li></ul>
We want this list sorted by family name, which we get by adding < to the foreach clause:
~html <ul>[foreach/ <family]<li>[v name]</li>[/foreach]</ul>
Now we get the family list as expected. And when we add the data of a new family some day, the list on the web page is updated and sorted automatically.
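The effect of the foreach clause, with and without sorting, can be imitated in a few lines of Python. This sketch does not parse ustring tags. It only reproduces the expansion described above for the five families, and the function expand_family_list and the dictionary layout are hypothetical.

```python
# Hypothetical imitation of the expansion result; it does not interpret ustring tags.

families = [
    {"id": "Smith", "name": "Smith"},
    {"id": "Smithers", "name": "Smithers"},
    {"id": "Smithereen", "name": "Smithereen"},
    {"id": "Clark", "name": "Clark"},
    {"id": "Smith2", "name": "Smith"},
]


def expand_family_list(units, sort_by_name=False):
    """Mimic <ul>[foreach/ family]<li>[v name]</li>[/foreach]</ul>.

    With sort_by_name=True the units are ordered by their name first,
    which is the assumed effect of writing <family in the foreach clause.
    """
    if sort_by_name:
        units = sorted(units, key=lambda unit: unit["name"])
    items = "".join(f"<li>{unit['name']}</li>" for unit in units)
    return f"<ul>{items}</ul>"


print(expand_family_list(families))
print(expand_family_list(families, sort_by_name=True))
```

Because the list is computed from the recorded families each time, adding a sixth entry to families changes the output without touching the template, which is the benefit the text describes.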
Generating HTML files
To generate a website one can write a small UText script that lets the universaltext interpreter parse the text and then queries it to generate the html pages. The main body of such a script can look like this:
read geneaweb.utl
select website.webpage begin
[...]
end
First the interpreter is instructed to read the file geneaweb.utl which is a plain text file containing the text definition. After that the parsed text is traversed via a cursor that visits each text unit that plays the role webpage under the unit with role website and generates each page.
The page generation could look this way:
save [u].html begin
out <html><head><title>[v title]</title></head><body>
select content.? do case begin
when type h1 do out <h1>[v]</h1>
when type p do out <p>[v]</p>
when type html do v
end
out </body></html>
end
After creating the file and setting its header, a cursor steps over each unit under the role content and generates html code according to each paragraph's type.
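For readers more familiar with general-purpose languages, the page generation above corresponds roughly to the following Python sketch. It is not one of the Perl scripts shipped with the sample and not a translation the interpreter performs: the function render_page is hypothetical, the content is passed in as plain (type, value) pairs instead of being read through a cursor, and the escaping of plain text with html.escape is an addition that the UText script above does not show.

```python
# Hypothetical Python counterpart of the UText page-generation script.
from html import escape


def render_page(title, content):
    """Build one HTML page from a title and a list of (type, value) content units."""
    parts = [f"<html><head><title>{escape(title)}</title></head><body>"]
    for unit_type, value in content:
        if unit_type == "h1":
            parts.append(f"<h1>{escape(value)}</h1>")
        elif unit_type == "p":
            parts.append(f"<p>{escape(value)}</p>")
        elif unit_type == "html":
            # Units of type html already contain markup and are emitted unchanged.
            parts.append(value)
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_page(
    "The Genealogy Site",
    [
        ("h1", "The Genealogy Site"),
        ("p", "Welcome to the Genealogy site!"),
        ("html", "<ul><li>Clark</li></ul>"),
    ],
)
print(page)
```

Writing the result to a file named after the page unit, as save [u].html does, is left out of the sketch.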
The sample ”geneaweb“ can be found in the distribution files under the directory samples. There is a file geneaweb.utl with the text definition and a file make-geneaweb.utl with the above UText script to generate the web pages. There are also alternative Perl scripts geneaweb.pl, geneaweb1.pl and geneaweb2.pl that do exactly the same but use different ways to generate the pages with Perl.
Summary
In this quick tour we have seen the main idea of the universaltext interpreter. With the universaltext language one defines a text structure consisting of symbols with relationships between them and some literal data. The system ensures that the text structure is coherent and provides query capabilities to traverse the text and view it in different ways. With a simple script one can define transformations on the text that generate documents. We have seen a sample website, but the same procedure can be used to generate, say, a LaTeX document for a book to be published, or an electronic book, or even some Perl modules including their POD documentation.
The basic idea is the separation between the semantics and the additional data needed to generate a document for a particular medium; in the example above, the separation between the genealogical data on the one side and the web pages on the other side. The semantics are specified through a general-purpose symbolic language and can be revised at any time without having to make significant changes to other parts of the text or to the scripts that process it.

