Abstracts of Invited Talks

You may scroll through all abstracts of the invited talks, in the program's chronological order.

Separate access to each abstract is also provided through its title in the workshop program. There, links under author names point to authors' home pages.



Algebraic Processing of Programming Languages

Teodor Rus (University of Iowa, Iowa City, USA)

Current methodology for compiler construction evolved from the need to release programmers of the burden of writing machine-language programs. These methodology do not assume a formal concept of a programming language and is not based on mathematical algorithms that model the behavior of a compiler. The side effect is that compiler implementation is a difficult task and the correctness of a compiler usually is not proven mathematically. Moreover, a compiler may be based on assumptions on its source and target languages that are not necessarily acceptable for another compiler that has the same source and target languages. The consequence is that programs are not portable between platforms of machines neither between generations of languages. In addition, while a conventional compiler freezes the notation that programmers can use to develop their programs the problem domain evolves and requires extensions that are not supported by the compiler. These problems are addressed by two directions of research in the current language processing technology. One direction enriches the programming environment provided by the conventional compilers with tools that optimize programs according to the architecture of the target machines. The other research direction focuses on a new methodology for programming language design and implementation that accommodates the existing programming tools and at the same time allows programmers to manufacture their own languages and compilers adapted to their own machines and problem domains.

The first research direction further complicates the compiler, which is anyhow too complicated, and do not provide for language extensibility with the problem domain. The second research direction is based on mathematical concepts of programming language and programming language translation that are independent of the computer and computer user and could easily be mastered by programmers. The research reported in this talk fits within this framework. We first introduce the formal concept of a programming language which captures within the same framework all three components of a programming language, the semantics, the syntax, and their association into a programming language, as mathematical constructs of universal algebra. This concept reveals the compositional structure of programming languages and allow us to look for a natural decomposition of programming languages into simpler objects. Each language component is however mathematically specified and implemented and the specifications and implementations of these language components are further mathematically integrated into the specification and implementation of the programming language they define. We illustrate this approach of programming language processing with the algebraic compiler generation using as components language independent lexical analyzers, parallel pattern matching parsers, language independent type systems, machine independent target code generation, and parallel program development by the compiler under the direction of the programmer.



Polymorphic Syntax Definition

Eelco Visser (University of Amsterdam, Amsterdam, The Netherlands)

Context-free grammars can be used in algebraic specification instead of first-order signatures to define the structure of algebras. The rigidity of these first-order structures enforces a choice between strongly typed structures with little genericity or generic operations over untyped structures. Two-level signatures provide a better balance between genericity and typing. Two-level grammars are the grammatical counterpart of two-level signatures. The paper discusses generic polymorphic syntax definition in context-free grammars and two-level grammars and investigates the problems for the practical usage of two-level grammars as signatures in algebraic specification formalisms.



Algebraic Specification of Documents

Jose Carlos Ramalho, Jose Joao Almeida, Pedro Rangel Henriques (Portugal)

Abstract: Purpose and tool oriented writing of documents reduces their flexibility and reuse. It's becoming normal that a document should serve several purposes. So, the writer will loose some time adapting the same document to each different purpose, when this tasks could be done automatically after the first version of the document with an appropriate system. This communication highlights the guidelines to build a system to solve the above problem. Such a system should be an algebraic specification environment and provide:

- Document Type Definitions.
- Function Definitions over Document Types.
- Document definitions as algebraic terms.

This approach (model based algebraic specification), will allow us to have an homogeneous environment to deal with operations as merging documents, converting documents from one format to another, extracting portions of documents, and some other unusual operations like mail reply and literate programming. We intend to base this system on CAMILA (Prototyping Environment which uses the specification language CAMILA based on SETs), and build it as an extension of the actual CAMILA.



Multi-layered Pipeline Parsing from Multi-axiom Grammars

James S. Jones (Graceland College and University of Iowa, USA)

Abstract: Multi-axiom grammars have been introduced as an alternative to the single-axiom context free grammars and the all-axiom algebraic grammars for programming language specification. Multi-axiom grammars eliminate some of the limitations of the context-free grammars used for programming language specification and implementation. However, prior algebraic notions regarding multi-axiom grammars such as subgrammar, primitive subgrammar, quotient grammar, and grammar/language hierarchy, were based on limited use of non-axioms. In this talk, I will present some of these concepts and a naturally parallel algorithm for parsing programming languages specified by multi-axiom grammars. This algorithm is convenient for large program development because it can recognize any phrase in the language and thus facilitates incremental development of programs. The algorithm is efficient and can use the computation power of the parallel machines because it distributes the parsing task among a collection of smaller parsers associated with individual language layers. The algorithm is called a PHRASE parser, an acronym for its actions: Pass, Halt, Reduce, Accept, Shift, and Error.

I will also demonstrate layered parsing on sample inputs using software that I have developed to study such activity, entitled MAGLAB (for Multi-Axiom Grammar LAB environment).



Parsing Schemata and Correctness of Parsing Algorithms

Klaas Sikkel (GMD, Sankt Augustin, Germany)

Abstract: The Parsing Schemata framework defines a formal level of abstraction between context-free grammars and parsers. A parsing schema abstracts from data structures, control structures and (in case of parallel algorithms) communication structures, thus allowing a clear and concise specification of the essential traits of an algorithm. One of the applications of this framework is in proving the correctness of parsers. Based on a parsing schema, the correctness proof of an algorithm is separated into two parts: (i) proving that the schema is correct, and (ii) proving that the algorithm implements this schema. The talk will introduce parsing schemata, give some examples of schemata and algorithms, and discuss a general method for establishing the correctness of a parsing schema.



Systematic Tabular Simulation of Stack-Based Parsers

Francois Barthelemy (CNAM-Laboratoire de Micro-informatique, Paris, France)

Abstract: The talk deals with the systematic creation of tabular procedures to simulate arbitrary stack computations (such as parsing). Tabulation consist in storing in a table data representing elementary computation steps, in order to avoid recomputations. We are mainly concerned with the question of what form the items to be stored in the table must adopt. For example, most tabular context-free parsers use items made up of a stack symbol and two integers. However, this is not always sound, and things become more difficult with context-sensitive formalisms (LFGs, HPSGs, etc).

We are going to define a direct but powerful extension of push-down automata and describe a tabular simulation procedure for these extended automata. This procedure is based on two operations: a cutting operation which breaks stacks into several items and the opposite operation which brings several items together. Our procedure is generic with respect to these two operations. We provide some sufficient properties of the operations to ensure the soundness of the simulation with respect to the extended push-down automaton execution.

Our results do not lead to complete automation of the design of tabular simulation which still requires two manual steps: finding appropriate cutting and combining operations and then proving that they fulfil the specified conditions. Nevertheless, a significant part of the proof of soundness is established once and for all on our generic frame, and the given conditions provide guidelines to find these two operations.



Parsing algebraic power series using dynamic programming

Frederic Tendeau, INRIA, France

Abstract: The use of ambiguous grammars raises a complexity problem for parsing: an exponential (or even infinite, in case of cyclic grammars) number of parse-trees are to be considered. The dynamic programming techniques reduce this complexity down to polynomial. When looking for decorating trees with contextual informations, the problem raises again. The formal power series formalism is very interesting for its generic description of the parsing paradigm. We show how to applicate dynamic programming techniques to computations in the semiring of the power series, independently from its nature, ie without considering it as describing booleans (in a recognition aim), forests (parsing), or any decoration domain.



Categorial Grammar: A State of the Art Survey

Michael Moortgat (Universiteit Utrecht, Utrecht, The Netherlands)

Abstract: The line of investigation started by Lambek in the late fifties presents formal grammar as a logic. As a system for reasoning about linguistic resources, the grammar logic accounts for the composition of linguistic form and meaning in deductive terms.

In current elaborations of the Lambek framework the grammar logic takes the form of a composite system, in which different modes of grammatical composition live together and interact. The composition modes share a common residuation logic; they differ in the structural rule packages that govern the management of the linguistic resources. Grammatical inference, in the mixed setting, has to be relativized to the composition modes, thus becoming a context-dependent process.

The generalization to multimodal systems has greatly enhanced the linguistic expressivity of the categorial framework. In the talk I will try to give an impression of the empirical scope of the current systems, and I will discuss the computational interpretation of the multimodal architecture in terms of `parsing as deduction'.



On the Convergence Between `Minimalist' Syntax and Categorial Grammar

Professor Robert C. Berwick, MIT Department of Computer Science
Professor Samuel David Epstein, Harvard Department of Linguistics

In this paper we show that the so-called ``Minimalist Program'' of Chomsky (1993, 1995) can be given a natural interpretation as a categorial system in which there is exactly one syntactic (algebraic) operation: namely, ``Hierarchically concatenate'' (what Chomsky calls ``Merge''), at the same time replacing the representations of ``D-structure,'' `S-structure'' and transformations with the derivation lines typical of categorial systems---thus unifying two previouly disparate approaches to the analysis of natural language. For example, the general ``movement'' rule of transformational grammar is easily seen to be a subcase of Hierarchical concatenation of (alpha, beta), where alpha is a subtree of beta; this automatically derives the usual c-command condition on so-called ``empty categories.'' The usual semantic interpretation benefits of the categorial approach also follow. Further, we demonstrate that by positing a single syntactic concatenation operation we can *derive*---rather than stipulate---the observed grammatical relations in natural languages (viz., Subject-Verb, or Specifier-Head; the notion Head-of; Verb-Complement or Head-Complement; and constituent command or c-command), as well as the primacy of ``adajacency'' in syntactic constraints. Finally, we show how the ``minimalist program'' can be extended to the computational ground of parsing, in that the concatenative system can be naturally interpreted as a generalized, canonical LR parser with a corresponding minimal set of computational operations---suggesting that the (abstract) human parsing system is, like the human language faculty, ``perfect'' in the sense that the parser delivers to the language faculty exactly those derivational sequences required for the language faculty to ``interpret'' sentences.


A Simple Uniform Semantics for Concatenation-Based Grammar

Annius V. Groenink (CWI, Amsterdam, The Netherlands)

Abstract: Linear context-free rewriting systems (LCFRS; Weir) are grammar systems based on tuples of terminal strings; they provide a natural progression from the context- free grammars, to the mutually weakly equivalent class of head grammar (HG), tree adjoining grammar (TAG), linear indexed grammar (LIG) and combinatory categorial grammar} (CCG) (Pollard, Weir) and beyond. LCFRS are the largest known well-defined class of mildly context-sensitive grammars, that is grammars

1. with a limited capacity for describing crossed dependencies
2. recognisable in polynomial time
3. satisfying the constant growth property.

We propose a more elementary, definite clause-style notation for LCFRS, and then show how this new notation allows for an elegant further progression to the classes PTIME, EXPTIME and finally those that are recursively enumerable. Furthermore, we show how this framework, called concatenative predicate grammar, provides a simple and uniform semantical perspective for a number of other existing grammar formalisms, such as the author's literal movement grammars. We discuss both a derivational semantics and an algebraic least fixed point version.



Theory of Texts

G. Rozenberg (Universiteit Leiden, the Netherlands)

Abstract: The theory of 2-structures provides a convenient framework for the decomposition and transformation of graph-like structures. In this talk we will discuss, in a tutorial fashion, a subclass of the class of 2-structures, the so-called T-structures. The notion of T-structure is a natural generalization of the notion of linear order. The basic result of the theory of T-structures, which says that a T-structure can be represented by two linear orders, leads to the notion of a text. It generalizes the notion of a word as used in formal language theory. A text may be seen as a word with an additional linear order of its domain. This order determines a "structure" spanned on the word - it may be a tree, but it may be also more general than a tree. We consider context-free grammars for different classes of texts.



A Graph Grammar Approach to Graphical Parsing

J. Rekers (Leiden University, the Netherlands)
A. Schuerr (RWTH Aachen, Germany)

Abstract: We present a new graph grammar based approach for defining the syntax of visual languages and for generating visual language parsers. Its main advantage in comparison to other visual language parsing approaches is its ability to handle context-sensitive productions which may replace more than one non-terminal at the same time and which may contain very complex context requirements. Its implementation will be part of a forthcoming parsing toolkit for visual languages and the already existing graph grammar programming environment PROGRES.



Strong Interchangeability and Nonlinearity of Primitive Words

S'andor Horv'ath (E"otv"os Lor'and Univ., Budapest, H)

Abstract: A primitive word is a nonempty word that cannot be written as a proper power. Such words play an important role in combinatorics on words and in the theory of codes. In this paper we prove four new results (in Theorems 1-4) on the set Q of all primitive words over a fixed, several-letter alphabet. The first result is that Q satisfies a strong interchange property. The second one is that the density of nonprimitive words up to length n tends to 0 at an exponential speed. The third one is that Q is not a bounded language. The fourth result says that Q is not a linear context-free language. The conjecture that Q is not context-free remains unproved.


Algebraic Methods in Categorial Grammar

Wojciech Buszkowski (Adam Mickiewicz University, Poznan, Poland)

Abstract: We discuss different algebraic structures which are natural algebraic frames for categorial grammars. First, absolutely free algebras of functor-argument structures and phrase structures together with powerset algebras of types are used to characterize structure languages of Basic Categorial Grammars and to provide algorithms for equivalence problems and related questions. Second, unification applied to the above frames is employed to develop learning procedures for these grammars. Third, residuated algebras and implication structures are used to model language hierarchies of Lambek Categorial Grammars.



A Variant of a Universal Metagrammar of Conceptual Structures. Algebraic Systems of Conceptual Syntax

Vladimir A. Fomichov (Moscow State University, Moscow, Russia)

Abstract: A new logic-algebraic theory is set forth; it is called the theory of K-calculuses, algebraic systems of conceptual syntax, and K-languages (the KCL-theory). The theory describes structured meanings (SMs) of highly complicated (very likely, arbitrarily complicated) real discourses pertaining to science, technology, medicine, business, law, etc. The KCL-theory is a central constituent of Integral Formal Semantics (IFS) - a powerful approach to mathematical studying semantics and pragmatics of natural language (NL).



The Method of Rosetta, Natural Language Translation Using Algebras

Theo M.V. Janssen (Univ. of Amsterdam, Amsterdam, The Netherlands)

Abstract: Rosetta is a research project (of Philips) for translating natural language. The underlying linguistic theory is an amalgation of Montague Grammar and transformational grammar. Its basic idea about translating is expressed by the principle of compositionality of translation:

Two expressions are each other's translation if they are built up from parts which are each other's translation, by means of translation equivalent rules.

This principle will be formalized as follows: translating is a homomorphism from the term algebra for the source language to the term algebra for the target language.

In several other grammatical theories there are proposals for automated translation; we will focus on: Tree Adjoining Grammar, Lexical Functional Grammar, and Functional Grammar. These proposals seem to differ much. However, if one considers the situation from an algebraic perspective, the differences disappear. In all theories the idea emerged that, although natural languages can be very different, the underlying term algebras are much alike. The proposals can be seen as instances of the same algebraic situation as Rosetta.

Given this situation, it can be explained that the proposals claim the same successes and meet the same problems. Idioms are often presented as a success of a proposal, and problems are quantifier scope and long distance dependencies. This situation is however not due to the used linguistic theory, but to the algebraic aspects of the approach. The algebraic perspective gives a unifying approach on the apparently different theories, and makes it possible to compare (and transfer) their methods, problems and solutions.



Contextual Grammars with Depth-First Derivation

Carlos Martin Vide, J. Miquel-Verges
Gh. Paun (Rovira i Virgili University, Tarragona, Spain)

Abstract: We introduce and investigate contextual grammars with the derivation performed in the depth-first mode known from Chomsky grammars: the selector used at any step must contain as a subword at least one of the members of the context adjoined at the previous derivation step. Among the interesting features of this type of grammars is the fact that the empty context is very important from the generative point of view (this is similar with erasing rules in Chomsky grammars). We also shortly examine the linguistic relevance of these grammars.



Subword Memership Problem for Linear Indexed Languages

Pál Dömösi (Kossuth University, Debrecen, Hungary)
Jürgen Duske (University of Hannover, Germany)

Abstract: A typical problem in formal language theory is to decide for some word p whether or not it belongs to a given language L. This is the so-called membership problem. The subword membership problem is a natural extension of this question:

Given a language L and a word p, decide whether or not p belongs to Sub(L)={p | qpr \in L for some pair q,r}.

We prove that for any linear indexed language L there exists a constant c such that if p \in Sub(L) then there exist words q,r with qpr \in L and |q|, |r| \leq c|p|. This is a natural extension of an earlier result concerning context-free languages. The decidability of the subword membership problem for indexed languages remains open. (For context-sensitive languages the problem is known to be undecidable.)



Algebraic Methods for Anaphora Resolution

Celia Rico Perez (Universidad Alfonso X El Sabio, Villanueva de la Canada (Madrid), Espana)
Jose Simon Granda (Universidad de Alcala de Henares, Espana)

Abstract: One of the main problems in anaphora resolution is finding antecedents. As shown in a previous study (Rico, 1993) the type of knowledge required to identify antecedents springs from different sources of linguistic information (morphology, syntax, semantics and pragmatics) and it is only through the coordination of all these sources that we will be able to approach the task from an effective point of view. The strategy for anaphora resolution we present here allows the integration of the mentioned sources by means of a vector- like model. We consider each anaphoric expression and possible antecedents as vectors and we compare them in such a way that closer vectors indicate anaphoric relations.This strategy involves the following aspects: (a)defining a base for representing the selected features; (b) implementing a transducer for transducing qualitative features into quantitative values; (c) defining an operation for comparing vectors and interpreting the results.The work we present here explores the solutions to the outlined tasks and adopts a "democratic" approach to anaphora (Carter, 1992) as all sources of information play the same role in the selection of the antecedent.



A Logical Formalism for Interlingual Representation of Texts

Vincenzo Manca (University of Pisa, Italy)

Abstract: Logical theories of natural language (see [4,7,8]) allow a formal analysis of fundamental linguistic phenomena and the definition of clear and rigorous settings for old linguistic problems. Despite these important achievements, formal tools remain in the specialistic realm of logicians, linguists and philosophers as laboratory devices for very specific purposes. The main objective of this paper is to develop a wide range logical formalism that could provide a basis for interlingual representation of texts. That formalism has been conceived in such a manner that it could be compared with First Order Language in the formal description of mathematical theories or with programming languages in the specification of algorithms. Of course, its formal semantics should be an important issue, but, at a first stage, the primary interest is in its design, on the basis of experiments in specific contexts. The basic idea of such a logical formalism is extending first order language following an approach outlined by Reichenbach nearly fifty years ago (cf. [7]) and recent advancements in logical studies on language, e.g. situated logics and similar theories (cf. [2]). An important aspect of the considered formalism is its intrinsic connection to traditional grammar (cf. [6]), to language universals and to linguistic typology (cf. [1,3,6]).
The basis for our objective is a symbolization of linguistic constructs such as 'predication', 'determination', 'relativization', 'quantification', 'modification', 'complementation', 'factualization', 'relativization', 'coordination', 'enunciation', 'anaphora' and 'deixis', within a sort of "situated" logic where the formal translation of texts results to be adequate for the main functionalities of ordinary discourse. We assign to any textual component, one of four basic logical categories called 'categoremas': *substantive*, *predicate*, *proposition*, *sentence*. Moreover, we consider variables, constants (some of which are 'situations') and formation rules that formalize about 20 fundamental grammatical constructs present in almost all languages.
An important issue of the proposed theory is the specification of a sort of 'Text Formalization Manual' as a guide for adequate symbolizations in correspondence to different linguistic forms. In a sense, this manual does play the role of a traditional grammar for bilinguistic translation. At a first stage, the information content of English texts is reduced into formulas built by intergrammatical symbols and English lexical items. Another important aspect related to the developed formalism, is the definition of small lexicons for some experimental purposes. Future research are focused on: a) the use of the formalism for studying 'archetypal' lexicons, where definitions of lemmas are given by formulas based on a small lexical kernel; b) the design of algorithms for text generation, driven by logical representations, in order to find a sort of 'grammatical engine' able to produce the surface syntactic form of a given representation; c) the development of automatic tools which could help one to establish the most appropriate translation between logical representations.

References

[1] Comrie B., Language Universals and Linguistic Typology, Syntax and Morphology, Basil Blackwell, Oxford, 1981.
[2] Devlin K., Logic and Information, Cambridge University Press, Cambridge (Mass.), 1991.
[3] Greenberg J. H., Universals of Language, The MIT Press, Cambridge (Mass.), 1963.
[4] Kamp H.,Uwe R., From Discourse to Logic, Kluver Academic, Dordrecht, 1993.
[5] Large A., The artificial Language Movement, Basil Blackwell, Oxford, 1985.
[6] Manca V., Typology and logical structure of natural languages, K. Sikkel, A. Njiholt (eds.), in Proc. of the TWLT Workshop on Parsing Natural Language, University of Twente, Enschede, NL, Dec., 16-17, 1993.
[7] Reichenbach H., Elements of Symbolic Logic, MacMillan Limited, New York, 1947.
[8] Thomason R. H. (ed.), Formal Philosophy, Yale University Press, London, 1974.



Last update: December 8, 1997, by G. Scollo (scollo@cs.utwente.nl)