Qumqum v 0.1
Arabic-English Generation-Heavy Hybrid Machine Translation
What is Qumqum?
Qumqum is an Arabic-English machine translation system implemented
following the Generation-Heavy Hybrid approach to Machine Translation
(GHMT). The focus of GHMT is addressing the lack of resource symmetry
between source and target languages. GHMT exploits symbolic and
statistical target language resources in source-poor/target-rich
language pairs. Expected source language resources include a syntactic
parser and a simple one-to-many translation dictionary. No transfer
rules or complex interlingual representations are used. Rich target
language symbolic resources such as word lexical semantics, categorial
variations and subcategorization frames are used to overgenerate
multiple structural variations from a target-glossed syntactic
dependency representation of source language sentences. This symbolic
overgeneration, which accounts for possible translation divergences,
is constrained by multiple statistical target language models
including surface n-grams and structural n-grams. The source-target
asymmetry of systems developed in this approach makes them more easily
retargetable (re-source-able) to new source languages, provided that a
source language parser and a translation dictionary are available.
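The overgenerate-then-rank idea described above can be illustrated with a small sketch. This is not Qumqum's implementation: it assumes a toy one-to-many gloss list (GLOSSES), a three-sentence target corpus (TARGET_CORPUS), and an add-one smoothed surface bigram model, all of which are hypothetical. It also overgenerates only lexical choices over a fixed word order, whereas GHMT overgenerates structural variations from a dependency representation and additionally uses structural n-grams.

```python
import itertools
import math
from collections import Counter

# Hypothetical target-language corpus used to train the toy language model.
TARGET_CORPUS = [
    "the boy ate the apple",
    "the boy ate bread",
    "the child slept",
]

# Hypothetical one-to-many glosses, one list per source word (already in target order).
GLOSSES = [
    ["the"],
    ["boy", "child"],
    ["ate", "eaten"],
    ["the"],
    ["apple", "apples"],
]


def train_bigram_model(sentences):
    """Count bigrams and their left contexts over sentences padded with <s> and </s>."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for left, right in zip(tokens, tokens[1:]):
            bigrams[(left, right)] += 1
            contexts[left] += 1
    return bigrams, contexts, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a candidate sentence."""
    bigrams, contexts, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(left, right)] + 1) / (contexts[left] + vocab_size))
        for left, right in zip(tokens, tokens[1:])
    )


def overgenerate(glosses):
    """Symbolic step: enumerate every combination of gloss choices."""
    for choice in itertools.product(*glosses):
        yield " ".join(choice)


def best_translation(glosses, model):
    """Statistical step: keep the candidate the language model prefers."""
    return max(overgenerate(glosses), key=lambda cand: log_prob(cand, model))


MODEL = train_bigram_model(TARGET_CORPUS)
print(best_translation(GLOSSES, MODEL))
```

The point of the sketch is the division of labor: the source side contributes only a dictionary lookup, while all disambiguation is done by target language statistics.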
The basic intuition of the GHMT approach parallels the experience of
most language learners whose lack of symmetrical knowledge impairs
their ability to translate into their newly learned language but does
not hinder them as much when translating from the foreign language
into their native tongue (where they are assisted by rich resources).
For more information, see the Publications section.
Qumqum Demo
THE DEMO IS CURRENTLY DISABLED. PLEASE TRY LATER.
Arabic is parsed with Dan Bikel's parser.
Demo Options
Buckwalter Arabic Encoding

Tim Buckwalter's Arabic Transliteration
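For readers unfamiliar with the encoding, here is a minimal sketch of converting between Buckwalter transliteration and Arabic script. It follows the commonly published Buckwalter table, which maps each Arabic letter or diacritic to one ASCII character. Whether the Qumqum demo uses exactly this variant is an assumption, and the function names are hypothetical.

```python
# Each entry pairs a starting Unicode code point with the Buckwalter symbols
# for the consecutive Arabic characters beginning at that code point.
_BLOCKS = [
    (0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),  # hamza through ghain
    (0x0640, "_fqklmnhwYyFNKaui~o"),         # tatweel through sukun
    (0x0670, "`{"),                          # superscript alef, alef wasla
]

BUCKWALTER_TO_ARABIC = {
    latin: chr(start + offset)
    for start, symbols in _BLOCKS
    for offset, latin in enumerate(symbols)
}
ARABIC_TO_BUCKWALTER = {arabic: latin for latin, arabic in BUCKWALTER_TO_ARABIC.items()}


def buckwalter_to_arabic(text):
    """Replace each Buckwalter symbol with its Arabic character; pass others through."""
    return "".join(BUCKWALTER_TO_ARABIC.get(ch, ch) for ch in text)


def arabic_to_buckwalter(text):
    """Replace each Arabic character with its Buckwalter symbol; pass others through."""
    return "".join(ARABIC_TO_BUCKWALTER.get(ch, ch) for ch in text)


print(buckwalter_to_arabic("ktAb"))
```

Because the mapping is one-to-one, converting to Arabic script and back returns the original Buckwalter string.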
Publications
- Habash, Nizar. The Use of a Structural N-gram Language Model in Generation-Heavy Hybrid Machine Translation. In Proceedings of the Third International Conference of Natural Language Generation (INLG-04). Careys Manor, UK, July 2004.
- Habash, Nizar. Matador: A Large Scale Spanish-English GHMT System. In Proceedings of the MT Summit, New Orleans, LA, pp. 149-156, 2003.
- Habash, Nizar and Bonnie Dorr. A Categorial Variation Database for English. In Proceedings of NAACL/HLT-2003. Edmonton, Canada. 2003.
- Habash, Nizar and Bonnie Dorr. Handling Translation Divergences: Combining Statistical and Symbolic Techniques in Generation-Heavy Machine Translation. AMTA-2002. Tiburon, California, USA.
- Dorr, Bonnie and Nizar Habash. Interlingua Approximation: A Generation-Heavy Approach. AMTA-2002 Interlingua Reliability Workshop. Tiburon, California, USA.
- Habash, Nizar. Generation-Heavy Hybrid Machine Translation. INLG-02. New York.
- Habash, Nizar. A Reference Manual to the Linearization Engine oxyGen. University of Maryland Technical Report: LAMP-TR-079 / CS-TR-4295 / UMIACS-TR-2001-73 / MDA-904-96-C-1250.
- Habash, Nizar. oxyGen: A Language Independent Language Realization Engine. AMTA-2000. Cuernavaca, Mexico.
Credits