Abstracts of Invited Talks
You may scroll through all abstracts of the invited talks, in the
program's chronological order.
Separate access to each abstract is also provided through its title
in the workshop program.
There, links under author names point to authors' home pages.
Algebraic Processing of Programming Languages
Teodor Rus (University of Iowa, Iowa City, USA)
Current methodology for compiler construction evolved from the need to
release programmers of the burden of writing machine-language
programs. These methodology do not assume a formal concept of a
programming language and is not based on mathematical algorithms that
model the behavior of a compiler. The side effect is that compiler
implementation is a difficult task and the correctness of a compiler
usually is not proven mathematically. Moreover, a compiler may be
based on assumptions on its source and target languages that are not
necessarily acceptable for another compiler that has the same source
and target languages. The consequence is that programs are not
portable between platforms of machines neither between generations of
languages. In addition, while a conventional compiler freezes the
notation that programmers can use to develop their programs the
problem domain evolves and requires extensions that are not supported
by the compiler. These problems are addressed by two directions of
research in the current language processing technology. One direction
enriches the programming environment provided by the conventional
compilers with tools that optimize programs according to the
architecture of the target machines. The other research direction
focuses on a new methodology for programming language design and
implementation that accommodates the existing programming tools and at
the same time allows programmers to manufacture their own languages
and compilers adapted to their own machines and problem
domains.
The first research direction further complicates the compiler,
which is anyhow too complicated, and do not provide for language
extensibility with the problem domain. The second research
direction is based on mathematical concepts of programming
language and programming language translation that are
independent of the computer and computer user and could easily be
mastered by programmers. The research reported in this talk fits
within this framework. We first introduce the formal concept of
a programming language which captures within the same framework
all three components of a programming language, the semantics,
the syntax, and their association into a programming language,
as mathematical constructs of universal algebra. This concept
reveals the compositional structure of programming languages and
allow us to look for a natural decomposition of programming
languages into simpler objects. Each language component is
however mathematically specified and implemented and the
specifications and implementations of these language components
are further mathematically integrated into the specification and
implementation of the programming language they define.
We illustrate this approach of programming language processing
with the algebraic compiler generation using as components
language independent lexical analyzers, parallel pattern matching
parsers, language independent type systems, machine independent
target code generation, and parallel program development
by the compiler under the direction of the programmer.
Polymorphic Syntax Definition
Eelco Visser (University of Amsterdam, Amsterdam, The Netherlands)
Context-free grammars can be used in algebraic specification instead
of first-order signatures to define the structure of algebras. The
rigidity of these first-order structures enforces a choice between
strongly typed structures with little genericity or generic operations
over untyped structures. Two-level signatures provide a better balance
between genericity and typing. Two-level grammars are the grammatical
counterpart of two-level signatures. The paper discusses generic
polymorphic syntax definition in context-free grammars and two-level
grammars and investigates the problems for the practical usage of
two-level grammars as signatures in algebraic specification
formalisms.
Algebraic Specification of Documents
Jose Carlos Ramalho, Jose Joao Almeida, Pedro Rangel Henriques
(Portugal)
Abstract: Purpose and tool oriented writing of documents reduces their
flexibility and reuse. It's becoming normal that a document should
serve several purposes. So, the writer will loose some time adapting
the same document to each different purpose, when this tasks could be
done automatically after the first version of the document with an
appropriate system. This communication highlights the guidelines to
build a system to solve the above problem. Such a system should be an
algebraic specification environment and provide:
- Document Type Definitions.
- Function Definitions over Document Types.
- Document definitions as algebraic terms.
This approach (model based algebraic specification), will allow us to
have an homogeneous environment to deal with operations as merging
documents, converting documents from one format to another, extracting
portions of documents, and some other unusual operations like mail
reply and literate programming. We intend to base this system on
CAMILA (Prototyping Environment which uses the specification language
CAMILA based on SETs), and build it as an extension of the actual
CAMILA.
Multi-layered Pipeline Parsing from Multi-axiom Grammars
James S. Jones (Graceland College and University of Iowa, USA)
Abstract: Multi-axiom grammars have been introduced
as an alternative to the single-axiom context free grammars
and the all-axiom algebraic grammars for programming
language specification. Multi-axiom grammars eliminate
some of the limitations of the context-free grammars used for
programming language specification and implementation. However,
prior algebraic notions regarding multi-axiom grammars such as
subgrammar, primitive subgrammar, quotient grammar, and
grammar/language hierarchy, were based on limited use of non-axioms.
In this talk, I will present some of these concepts and a naturally
parallel algorithm for parsing programming languages specified by
multi-axiom grammars. This algorithm is convenient for large
program development because it can recognize any phrase in the
language and thus facilitates incremental development of programs.
The algorithm is efficient and can use the computation power of the
parallel machines because it distributes the parsing task among a
collection of smaller parsers associated with individual language
layers. The algorithm is called a PHRASE parser, an acronym for
its actions: Pass, Halt, Reduce, Accept, Shift, and Error.
I will also demonstrate layered parsing on sample inputs using
software that I have developed to study such activity, entitled
MAGLAB (for Multi-Axiom Grammar LAB environment).
Parsing Schemata and Correctness of Parsing Algorithms
Klaas Sikkel (GMD, Sankt Augustin, Germany)
Abstract: The Parsing Schemata framework defines a formal
level of abstraction between context-free grammars and
parsers. A parsing schema abstracts from data structures,
control structures and (in case of parallel algorithms)
communication structures, thus allowing a clear and concise
specification of the essential traits of an algorithm.
One of the applications of this framework is in proving the
correctness of parsers. Based on a parsing schema, the
correctness proof of an algorithm is separated into two parts:
(i) proving that the schema is correct, and (ii) proving that the
algorithm implements this schema.
The talk will introduce parsing schemata, give some examples
of schemata and algorithms, and discuss a general method for
establishing the correctness of a parsing schema.
Systematic Tabular Simulation of Stack-Based Parsers
Francois Barthelemy (CNAM-Laboratoire de Micro-informatique, Paris,
France)
Abstract: The talk deals with the systematic creation of tabular
procedures to simulate arbitrary stack computations (such as
parsing). Tabulation consist in storing in a table data
representing elementary computation steps, in order to avoid
recomputations. We are mainly concerned with the question of
what form the items to be stored in the table must adopt. For
example, most tabular context-free parsers use items made
up of a stack symbol and two integers. However, this is not always
sound, and things become more difficult with context-sensitive
formalisms (LFGs, HPSGs, etc).
We are going to define a direct but powerful extension of
push-down automata and describe a tabular simulation procedure
for these extended automata. This procedure is based on two
operations: a cutting operation which breaks stacks into several
items and the opposite operation which brings several items
together. Our procedure is generic with respect to these two
operations. We provide some sufficient properties of the
operations to ensure the soundness of the simulation with
respect to the extended push-down automaton execution.
Our results do not lead to complete automation of the design
of tabular simulation which still requires two manual steps:
finding appropriate cutting and combining operations and then
proving that they fulfil the specified conditions. Nevertheless,
a significant part of the proof of soundness is established once
and for all on our generic frame, and the given conditions
provide guidelines to find these two operations.
Parsing algebraic power series using dynamic programming
Frederic Tendeau, INRIA, France
Abstract: The use of ambiguous grammars raises a complexity problem
for parsing: an exponential (or even infinite, in case of cyclic
grammars) number of parse-trees are to be considered. The dynamic
programming techniques reduce this complexity down to polynomial.
When looking for decorating trees with contextual informations, the
problem raises again. The formal power series formalism is very
interesting for its generic description of the parsing paradigm. We
show how to applicate dynamic programming techniques to computations
in the semiring of the power series, independently from its nature, ie
without considering it as describing booleans (in a recognition aim),
forests (parsing), or any decoration domain.
Categorial Grammar: A State of the Art Survey
Michael Moortgat (Universiteit Utrecht, Utrecht, The Netherlands)
Abstract: The line of investigation started by Lambek in the
late fifties presents formal grammar as a logic. As a system
for reasoning about linguistic resources, the grammar
logic accounts for the composition of linguistic form and
meaning in deductive terms.
In current elaborations of the Lambek framework
the grammar logic takes the form of a composite system,
in which different modes of grammatical composition
live together and interact. The composition modes
share a common residuation logic; they differ in the
structural rule packages that govern the management
of the linguistic resources. Grammatical inference,
in the mixed setting, has to be relativized to
the composition modes, thus becoming a context-dependent
process.
The generalization to multimodal systems has greatly
enhanced the linguistic expressivity of the categorial
framework. In the talk I will try to give an impression
of the empirical scope of the current systems, and I
will discuss the computational interpretation of the
multimodal architecture in terms of `parsing as deduction'.
On the Convergence Between `Minimalist' Syntax and Categorial Grammar
Professor Robert C. Berwick, MIT Department of Computer Science
Professor Samuel David Epstein, Harvard Department of Linguistics
In this paper we show that the so-called ``Minimalist Program'' of
Chomsky (1993, 1995) can be given a natural interpretation as a
categorial system in which there is exactly one syntactic (algebraic)
operation: namely, ``Hierarchically concatenate'' (what Chomsky calls
``Merge''), at the same time replacing the representations of ``D-structure,''
`S-structure'' and transformations with the derivation lines typical
of categorial systems---thus unifying two previouly disparate
approaches to the analysis of natural language. For example, the
general ``movement'' rule of transformational grammar is easily seen
to be a subcase of Hierarchical concatenation of (alpha, beta), where
alpha is a subtree of beta; this automatically derives the usual
c-command condition on so-called ``empty categories.'' The usual
semantic interpretation benefits of the categorial approach also
follow.
Further, we demonstrate that by positing a single syntactic
concatenation operation we can *derive*---rather than stipulate---the
observed grammatical relations in natural languages (viz.,
Subject-Verb, or Specifier-Head; the notion Head-of; Verb-Complement
or Head-Complement; and constituent command or c-command), as well as
the primacy of ``adajacency'' in syntactic constraints.
Finally, we show how the ``minimalist program'' can be extended to the
computational ground of parsing, in that the concatenative system can
be naturally interpreted as a generalized, canonical LR parser with a
corresponding minimal set of computational operations---suggesting
that the (abstract) human parsing system is, like the human language
faculty, ``perfect'' in the sense that the parser delivers to the
language faculty exactly those derivational sequences required for
the language faculty to ``interpret'' sentences.
A Simple Uniform Semantics for Concatenation-Based Grammar
Annius V. Groenink (CWI, Amsterdam, The Netherlands)
Abstract: Linear context-free rewriting systems (LCFRS;
Weir) are grammar systems based on tuples of terminal
strings; they provide a natural progression from the context-
free grammars, to the mutually weakly equivalent class of
head grammar (HG), tree adjoining grammar (TAG), linear
indexed grammar (LIG) and combinatory categorial
grammar} (CCG) (Pollard, Weir) and beyond. LCFRS are the
largest known well-defined class of mildly context-sensitive
grammars, that is grammars
1. with a limited capacity for describing crossed dependencies
2. recognisable in polynomial time
3. satisfying the constant growth property.
We propose a more elementary, definite clause-style notation
for LCFRS, and then show how this new notation allows for
an elegant further progression to the classes PTIME,
EXPTIME and finally those that are recursively enumerable.
Furthermore, we show how this framework, called
concatenative predicate grammar, provides a simple and
uniform semantical perspective for a number of other existing
grammar formalisms, such as the author's literal movement
grammars. We discuss both a derivational semantics and an
algebraic least fixed point version.
Theory of Texts
G. Rozenberg (Universiteit Leiden, the Netherlands)
Abstract: The theory of 2-structures provides a convenient framework
for the decomposition and transformation of graph-like structures. In
this talk we will discuss, in a tutorial fashion, a subclass of the
class of 2-structures, the so-called T-structures. The notion of
T-structure is a natural generalization of the notion of linear
order. The basic result of the theory of T-structures, which says
that a T-structure can be represented by two linear orders, leads to
the notion of a text. It generalizes the notion of a word as used in
formal language theory. A text may be seen as a word with an
additional linear order of its domain. This order determines a
"structure" spanned on the word - it may be a tree, but it may be also
more general than a tree. We consider context-free grammars for
different classes of texts.
A Graph Grammar Approach to Graphical Parsing
J. Rekers (Leiden University, the Netherlands)
A. Schuerr (RWTH Aachen, Germany)
Abstract: We present a new graph grammar based approach for defining
the syntax of visual languages and for generating visual language
parsers. Its main advantage in comparison to other visual language
parsing approaches is its ability to handle context-sensitive
productions which may replace more than one non-terminal at the same
time and which may contain very complex context requirements. Its
implementation will be part of a forthcoming parsing toolkit for
visual languages and the already existing graph grammar programming
environment PROGRES.
Strong Interchangeability and Nonlinearity of Primitive Words
S'andor Horv'ath (E"otv"os Lor'and Univ., Budapest, H)
Abstract: A primitive word is a nonempty word that cannot be
written as a proper power. Such words play an important role in
combinatorics on words and in the theory of codes. In this paper
we prove four new results (in Theorems 1-4) on the set Q of all
primitive words over a fixed, several-letter alphabet. The first
result is that Q satisfies a strong interchange property. The
second one is that the density of nonprimitive words up to length
n tends to 0 at an exponential speed. The third one is that Q is
not a bounded language. The fourth result says that Q is not a
linear context-free language. The conjecture that Q is not
context-free remains unproved.
Algebraic Methods in Categorial Grammar
Wojciech Buszkowski (Adam Mickiewicz University, Poznan, Poland)
Abstract: We discuss different algebraic structures which are natural
algebraic frames for categorial grammars. First, absolutely free
algebras of functor-argument structures and phrase structures together
with powerset algebras of types are used to characterize structure
languages of Basic Categorial Grammars and to provide algorithms for
equivalence problems and related questions. Second, unification
applied to the above frames is employed to develop learning procedures
for these grammars. Third, residuated algebras and implication
structures are used to model language hierarchies of Lambek Categorial
Grammars.
A Variant of a Universal Metagrammar of Conceptual
Structures. Algebraic Systems of Conceptual Syntax
Vladimir A. Fomichov (Moscow State University, Moscow, Russia)
Abstract: A new logic-algebraic theory is set forth; it is called the
theory of K-calculuses, algebraic systems of conceptual syntax, and
K-languages (the KCL-theory). The theory describes structured meanings
(SMs) of highly complicated (very likely, arbitrarily complicated)
real discourses pertaining to science, technology, medicine, business,
law, etc. The KCL-theory is a central constituent of Integral Formal
Semantics (IFS) - a powerful approach to mathematical studying
semantics and pragmatics of natural language (NL).
The Method of Rosetta, Natural Language Translation Using Algebras
Theo M.V. Janssen (Univ. of Amsterdam, Amsterdam, The Netherlands)
Abstract: Rosetta is a research project (of Philips) for translating
natural language. The underlying linguistic theory is an amalgation of
Montague Grammar and transformational grammar. Its basic idea about
translating is expressed by the principle of compositionality of
translation:
Two expressions are each other's translation if they are built
up from parts which are each other's translation, by means of
translation equivalent rules.
This principle will be formalized as follows: translating is a
homomorphism from the term algebra for the source language to the term
algebra for the target language.
In several other grammatical theories there are proposals for
automated translation; we will focus on: Tree Adjoining Grammar,
Lexical Functional Grammar, and Functional Grammar. These proposals
seem to differ much. However, if one considers the situation from an
algebraic perspective, the differences disappear. In all theories the
idea emerged that, although natural languages can be very different,
the underlying term algebras are much alike. The proposals can be seen
as instances of the same algebraic situation as Rosetta.
Given this situation, it can be explained that the proposals claim
the same successes and meet the same problems. Idioms are often
presented as a success of a proposal, and problems are quantifier
scope and long distance dependencies. This situation is however not
due to the used linguistic theory, but to the algebraic aspects of the
approach. The algebraic perspective gives a unifying approach on the
apparently different theories, and makes it possible to compare (and
transfer) their methods, problems and solutions.
Contextual Grammars with Depth-First Derivation
Carlos Martin Vide, J. Miquel-Verges
Gh. Paun (Rovira i Virgili University, Tarragona, Spain)
Abstract: We introduce and investigate contextual grammars
with the derivation performed in the depth-first mode known
from Chomsky grammars: the selector used at any step must
contain as a subword at least one of the members of the
context adjoined at the previous derivation step. Among the
interesting features of this type of grammars is the fact that
the empty context is very important from the generative point
of view (this is similar with erasing rules in Chomsky
grammars). We also shortly examine the linguistic relevance
of these grammars.
Subword Memership Problem for Linear Indexed Languages
Pál Dömösi (Kossuth University, Debrecen, Hungary)
Jürgen Duske (University of Hannover, Germany)
Abstract: A typical problem in formal language theory is to decide
for some word p whether or not it belongs to a given language L.
This is the so-called membership problem. The subword membership
problem is a natural extension of this question:
Given a language L and a word p, decide whether or not p belongs
to Sub(L)={p | qpr \in L for some pair q,r}.
We prove that for any linear indexed language L there exists a
constant c such that if p \in Sub(L) then there exist words q,r
with qpr \in L and |q|, |r| \leq c|p|.
This is a natural extension of an earlier result concerning
context-free languages. The decidability of the subword membership
problem for indexed languages remains open. (For context-sensitive
languages the problem is known to be undecidable.)
Algebraic Methods for Anaphora Resolution
Celia Rico Perez (Universidad Alfonso X El Sabio,
Villanueva de la Canada (Madrid), Espana)
Jose Simon Granda (Universidad de Alcala de Henares, Espana)
Abstract: One of the main problems in anaphora resolution is
finding antecedents. As shown in a previous study (Rico, 1993)
the type of knowledge required to identify antecedents springs
from different sources of linguistic information (morphology,
syntax, semantics and pragmatics) and it is only through the
coordination of all these sources that we will be able to
approach the task from an effective point of view.
The strategy for anaphora resolution we present here allows
the integration of the mentioned sources by means of a vector-
like model. We consider each anaphoric expression and
possible antecedents as vectors and we compare them in such
a way that closer vectors indicate anaphoric relations.This
strategy involves the following aspects: (a)defining a base for
representing the selected features; (b) implementing a
transducer for transducing qualitative features into
quantitative values; (c) defining an operation for comparing
vectors and interpreting the results.The work we present here
explores the solutions to the outlined tasks and adopts a
"democratic" approach to anaphora (Carter, 1992) as all sources of
information play the same role in the selection of the
antecedent.
A Logical Formalism for Interlingual Representation of Texts
Vincenzo Manca (University of Pisa, Italy)
Abstract: Logical theories of natural language (see [4,7,8])
allow a formal analysis of fundamental linguistic phenomena
and the definition of clear and rigorous settings for old
linguistic problems. Despite these important achievements,
formal tools remain in the specialistic realm of logicians,
linguists and philosophers as laboratory devices for very
specific purposes. The main objective of this paper is to
develop a wide range logical formalism that could provide a
basis for interlingual representation of texts. That formalism
has been conceived in such a manner that it could be
compared with First Order Language in the formal description
of mathematical theories or with programming languages in
the specification of algorithms. Of course, its formal
semantics should be an important issue, but, at a first stage,
the primary interest is in its design, on the basis of
experiments in specific contexts. The basic idea of such a
logical formalism is extending first order language following
an approach outlined by Reichenbach nearly fifty years ago
(cf. [7]) and recent advancements in logical studies on
language, e.g. situated logics and similar theories (cf. [2]). An
important aspect of the considered formalism is its intrinsic
connection to traditional grammar (cf. [6]), to language
universals and to linguistic typology (cf. [1,3,6]).
The basis for our objective is a symbolization of linguistic
constructs such as 'predication', 'determination',
'relativization', 'quantification', 'modification',
'complementation', 'factualization', 'relativization',
'coordination', 'enunciation', 'anaphora' and 'deixis', within a
sort of "situated" logic where the formal translation of texts
results to be adequate for the main functionalities of ordinary
discourse. We assign to any textual component, one of four
basic logical categories called 'categoremas': *substantive*,
*predicate*, *proposition*, *sentence*. Moreover, we
consider variables, constants (some of which are 'situations')
and formation rules that formalize about 20 fundamental
grammatical constructs present in almost all languages.
An important issue of the proposed theory is the specification
of a sort of 'Text Formalization Manual' as a guide for
adequate symbolizations in correspondence to different
linguistic forms. In a sense, this manual does play the role of a
traditional grammar for bilinguistic translation. At a first
stage, the information content of English texts is reduced into
formulas built by intergrammatical symbols and English
lexical items. Another important aspect related to the
developed formalism, is the definition of small lexicons for
some experimental purposes. Future research are focused on:
a) the use of the formalism for studying 'archetypal' lexicons,
where definitions of lemmas are given by formulas based on a
small lexical kernel; b) the design of algorithms for text
generation, driven by logical representations, in order to find a
sort of 'grammatical engine' able to produce the surface
syntactic form of a given representation; c) the development
of automatic tools which could help one to establish the most
appropriate translation between logical representations.
References
[1] Comrie B., Language Universals and Linguistic Typology,
Syntax and Morphology, Basil Blackwell, Oxford, 1981.
[2] Devlin K., Logic and Information, Cambridge University
Press, Cambridge (Mass.), 1991.
[3] Greenberg J. H., Universals of Language, The MIT
Press, Cambridge (Mass.), 1963.
[4] Kamp H.,Uwe R., From Discourse to Logic, Kluver
Academic, Dordrecht, 1993.
[5] Large A., The artificial Language Movement, Basil
Blackwell, Oxford, 1985.
[6] Manca V., Typology and logical structure of natural
languages, K. Sikkel, A. Njiholt (eds.), in Proc. of the TWLT
Workshop on Parsing Natural Language, University of
Twente, Enschede, NL, Dec., 16-17, 1993.
[7] Reichenbach H., Elements of Symbolic Logic, MacMillan
Limited, New York, 1947.
[8] Thomason R. H. (ed.), Formal Philosophy, Yale
University Press, London, 1974.
Last update: December 8, 1997, by G. Scollo (scollo@cs.utwente.nl)