Matador v0.9
Spanish-English Generation-Heavy Hybrid Machine Translation
What is Matador?
Matador is a Spanish-English machine translation system implemented
following the Generation-Heavy Hybrid approach to Machine Translation
(GHMT). GHMT focuses on the lack of resource symmetry
between source and target languages. GHMT exploits symbolic and
statistical target language resources in source-poor/target-rich
language pairs. Expected source language resources include a syntactic
parser and a simple one-to-many translation dictionary. No transfer
rules or complex interlingual representations are used. Rich target
language symbolic resources such as word lexical semantics, categorial
variations and subcategorization frames are used to overgenerate
multiple structural variations from a target-glossed syntactic
dependency representation of source language sentences. This symbolic
overgeneration, which accounts for possible translation divergences,
is constrained by multiple statistical target language models
including surface n-grams and structural n-grams. The source-target
asymmetry of systems developed with this approach makes them more easily
retargetable ("re-source-able") to new source languages, provided a
source language parser and translation dictionary are available.
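The overgenerate-and-rank idea described above can be illustrated with a toy Python sketch. It is not Matador's implementation: the dictionary, the corpus, and the function names (train_bigrams, score, translate) are hypothetical. Word-order variation is modeled crudely as plain permutation of the glosses, and only a surface bigram model (add-one smoothed) is used for ranking. The actual system overgenerates structural variations from a target-glossed dependency representation using lexical-semantic, categorial-variation, and subcategorization resources, and also applies structural n-grams.

```python
from collections import Counter
from itertools import permutations, product
from math import log

# Hypothetical one-to-many Spanish-English dictionary (toy data, not Matador's lexicon).
DICTIONARY = {
    "el": ["the"],
    "gato": ["cat", "tomcat"],
    "negro": ["black", "dark"],
}

# Tiny toy English corpus standing in for a large target-language corpus.
CORPUS = [
    "the black cat sleeps",
    "the cat is black",
    "a black cat",
    "the dark night",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, adding sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(words, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a candidate word sequence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + list(words) + ["</s>"]
    return sum(
        log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def translate(source, dictionary, unigrams, bigrams):
    """Overgenerate every lexical choice and word order, then keep the best-scoring one."""
    glosses = [dictionary[word] for word in source.split()]
    candidates = {
        order
        for choice in product(*glosses)  # symbolic overgeneration: all translations
        for order in permutations(choice)  # crude stand-in for structural variation
    }
    return max(candidates, key=lambda cand: score(cand, unigrams, bigrams))


unigrams, bigrams = train_bigrams(CORPUS)
print(" ".join(translate("el gato negro", DICTIONARY, unigrams, bigrams)))  # prints: the black cat
```

The language model resolves both the lexical choice (cat vs. tomcat, black vs. dark) and the adjective-noun order. Enumerating every permutation grows factorially with sentence length, so this flat enumeration only works for toy input; a practical generator would need a more compact representation of the candidate space.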
The basic intuition of the GHMT approach parallels the experience of
most language learners, whose lack of symmetrical knowledge impairs
their ability to translate into their newly learned language but does
not hinder them as much when translating from the foreign language
into their native tongue (where they are assisted by rich resources).
For more information, see the Publications section.
Matador Demo
Spanish is parsed with Conexor (on-line demo).
Demo Options
Explicit Diacritics
This option lets users enter Spanish diacritized characters
(e.g., á or ñ) when no Spanish keyboard is available, by typing the
base letter followed by a marker character (', ~ or ").
The following table describes how these characters can be specified:
Diacritized Character | Explicit Form | Diacritized Character | Explicit Form
á                     | a'            | Á                     | A'
é                     | e'            | É                     | E'
í                     | i'            | Í                     | I'
ó                     | o'            | Ó                     | O'
ú                     | u'            | Ú                     | U'
ñ                     | n~            | Ñ                     | N~
ü                     | u"            | Ü                     | U"
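The mapping in the table above is simple enough to apply mechanically. The following Python sketch shows one way a client could convert explicit-diacritic input into Unicode text; the function name expand_diacritics is hypothetical, and this is not how the Matador demo itself is implemented.

```python
import re

# Explicit-diacritic sequences from the table, mapped to the characters they stand for.
EXPLICIT_DIACRITICS = {
    "a'": "á", "e'": "é", "i'": "í", "o'": "ó", "u'": "ú", "n~": "ñ", 'u"': "ü",
    "A'": "Á", "E'": "É", "I'": "Í", "O'": "Ó", "U'": "Ú", "N~": "Ñ", 'U"': "Ü",
}

# One alternation over all two-character sequences (escaped, since ' ~ " are literal).
_PATTERN = re.compile("|".join(re.escape(seq) for seq in EXPLICIT_DIACRITICS))


def expand_diacritics(text: str) -> str:
    """Replace explicit-diacritic sequences such as n~ or o' with Unicode characters."""
    return _PATTERN.sub(lambda match: EXPLICIT_DIACRITICS[match.group(0)], text)


print(expand_diacritics("El nin~o comio' pin~a."))  # prints: El niño comió piña.
```

A single compiled pattern scans the input once, left to right, instead of chaining fourteen str.replace calls. Note that any vowel followed by an apostrophe is converted, so text that uses apostrophes for other purposes would need an escape convention.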
Publications
- Habash, Nizar and Bonnie Dorr. A Categorial Variation Database for English. In Proceedings of NAACL/HLT-2003 (to appear). Edmonton, Canada. 2003.
- Habash, Nizar and Bonnie Dorr. Handling Translation Divergences: Combining Statistical and Symbolic Techniques in Generation-Heavy Machine Translation. AMTA-2002. Tiburon, California, USA.
- Dorr, Bonnie and Nizar Habash. Interlingua Approximation: A Generation-Heavy Approach. AMTA-2002 Interlingua Reliability Workshop. Tiburon, California, USA.
- Habash, Nizar. Generation-Heavy Hybrid Machine Translation. INLG-02. New York.
- Habash, Nizar. A Reference Manual to the Linearization Engine oxyGen. University of Maryland Technical Report: LAMP-TR-079/CS-TR-4295/UMIACS-TR-2001-73/MDA-904-96-C-1250.
- Habash, Nizar. oxyGen: A Language Independent Language Realization Engine. AMTA-2000. Cuernavaca, Mexico.
Credits