Algebraic Specification of Documents

José Carlos Ramalho, José João Almeida, Pedro Henriques
Departamento de Informática, Universidade do Minho
Braga -- Portugal
{jcr,jj,prh}@di.uminho.pt

Abstract

According to recent research, nearly 95 percent of a corporation's information is stored in documents. Further studies indicate that companies spend between 6 and 10 percent of their gross revenues on printing and distributing documents in several ways: web and CD-ROM publishing, database storage and retrieval, and printing. In this context documents exist in several different formats, from plain text files to internal database or text processor formats. It is clear that document reusability and low-cost maintenance will be two important issues in the near future.

The majority of available document processors are purpose-oriented, which reduces the flexibility and reusability that documents need. Adapting the same text to different purposes therefore wastes time. For example, you may want to have the same document as an article, as a set of slides, or as a poster; or you may have a dictionary document that produces both a book and a list of words for a spell-checker. This conversion could be done automatically from the first version of the document, provided it complies with some standard requirements.

The key idea is to keep a complete separation between syntax and semantics. In this way we produce an abstract description that separates conceptual issues of document structure from those concerned with document use.

This note proposes a few guidelines for building a system that solves the above problem. Such a system should be an _algebraic based environment_ in order to provide facilities for:
- Definition of document types;
- Specification of functions over document types;
- Definition and handling of documents as algebraic terms.
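
As an informal illustration of the three facilities listed above, the following is a minimal Python sketch, not the system proposed in this note. It assumes a toy document type with three constructors; every name in it (`Para`, `Section`, `Article`, `to_slides`, `word_list`, and the sample `doc`) is hypothetical and chosen only for this example. A document is built as a term over the constructors, and each use of the document is a separate function over that term.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Definition of a document type: three hypothetical constructors.
@dataclass(frozen=True)
class Para:
    """Leaf term: a paragraph of plain text."""
    text: str


@dataclass(frozen=True)
class Section:
    """A titled sequence of paragraphs and nested sections."""
    title: str
    body: Tuple["Node", ...]


Node = Union[Para, Section]


@dataclass(frozen=True)
class Article:
    """Root term of the toy document type."""
    title: str
    body: Tuple[Node, ...]


# Functions over the document type: each one is a different "use".
def to_slides(doc: Article) -> List[str]:
    """Keep only the titles, indented by nesting depth, as a slide outline."""
    def walk(node: Node, depth: int):
        if isinstance(node, Section):
            yield "  " * depth + "- " + node.title
            for child in node.body:
                yield from walk(child, depth + 1)
        # Paragraphs are deliberately dropped in this view.

    lines = [doc.title]
    for node in doc.body:
        lines.extend(walk(node, 0))
    return lines


def word_list(doc: Article) -> List[str]:
    """Collect the distinct lowercase words of all paragraphs, sorted."""
    words = set()

    def walk(node: Node) -> None:
        if isinstance(node, Para):
            words.update(w.strip(".,;:").lower() for w in node.text.split())
        else:
            for child in node.body:
                walk(child)

    for node in doc.body:
        walk(node)
    words.discard("")
    return sorted(words)


# A document handled as an algebraic term (sample content is made up).
doc = Article(
    "Algebraic Specification of Documents",
    (
        Section("Motivation", (
            Para("Documents are reused."),
            Section("Formats", (Para("Plain text, SGML."),)),
        )),
        Section("Approach", ()),
    ),
)
```

Here `to_slides(doc)` and `word_list(doc)` derive a slide outline and a spell-checker word list from one and the same term, which is the separation of structure from use described above; adding a new output only means adding a new function, without touching the document.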

Our approach (_rooted in the tradition of constructive algebraic specification_) allows for a homogeneous environment to deal with operations such as _merging_ documents, _converting_ formats, _translating_ documents, _extracting different kinds of information_ (to set up information repositories, databases, or semantic networks) or _portions of documents_ (as happens, for instance, in _literate programming_), and some other, less traditional actions, like _mail reply_ or _memo production_.

We intend to use CAMILA (a specification language and prototyping environment developed by the Computer Science group at Universidade do Minho) to develop the above-mentioned system.

Key words: Document processing, Algebraic Specification, SGML, CAMILA.