Goñi Menoyo, José Miguel and González Cristóbal, José Carlos and Moreno Sandoval, Antonio
ARIES: A Lexical Platform for Engineering Spanish Processing Tools.
"Natural Language Engineering", v. 3
We present a lexical platform that has been developed for the Spanish language. It achieves portability between different computer systems and efficiency, in terms of speed and lexical coverage. A model for the full treatment of Spanish inflectional morphology for verbs, nouns and adjectives is presented. This model permits word formation based solely on morpheme concatenation, driven by a feature-based unification grammar. The run-time lexicon is a collection of allomorphs for both stems and endings. Although not tested, it should be suitable also for other Romance and highly inflected languages. A formalism is also described for encoding a lemma-based lexical source, well suited for expressing linguistic generalizations: inheritance classes, lemma encoding, morpho-graphemic allomorphy rules and limited type-checking. From this source base, we can automatically generate an allomorph indexed dictionary adequate for efficient retrieval and processing. A set of software tools has been implemented around this formalism: lexical base augmenting aids, lexical compilers to build run-time dictionaries and access libraries for them, feature manipulation libraries, unification and pseudo-unification modules, morphological processors, a parsing system, etc. Software interfaces among the different modules and tools are cleanly defined to ease software integration and tool combination in a flexible way. Directions for accessing our e-mail and web demonstration prototypes are also provided. Some figures are given, showing the lexical coverage of our platform compared to some popular spelling checkers.