Research at the Statistical Language Processing and Learning Lab concentrates on statistical models for structured language processing, with applications to machine translation, paraphrasing, semantic and morpho-syntactic parsing, and statistical learning for NLP.
Machine Translation is currently a central application for our work. Researchers at the Lab target a range of phenomena in the quest for more adequate and more fluent MT systems learned from bilingual parallel data, including word reordering, morphological variation, domain adaptation, evaluation, and tuning.
The Lab's general approach is to induce the latent structure that captures salient regularities in natural language data (mono- and multilingual corpora) for improved language applications. The Lab's current work concentrates on:
- exploiting regularities in word aligned parallel data for learning hierarchical reordering models over permutations and word alignments, e.g., learning hierarchical preordering from bare word alignments;
- inducing models sensitive to domain variation in big parallel data, e.g., data selection, word alignment, model adaptation;
- devising better MT evaluation metrics, e.g., BEER;
- inducing novel semantic representations within meaning-preserving language processing models, as a surrogate for full semantic representations that relate a linguistic form to its referent;
- inducing morpho-syntactic generation models for translation into morphologically-rich languages.
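The reordering line of work above can be illustrated with a small sketch: given a word alignment, read off the permutation of source positions induced by target-side order, then factorize that permutation bottom-up into a permutation tree. The function names and the greedy shift-reduce factorization below are illustrative assumptions, not the Lab's actual implementation.

```python
def permutation_from_alignment(alignment, src_len):
    """Turn a word alignment (set of (src, tgt) index pairs) into the
    permutation of source positions induced by target-side order.
    Unaligned source words inherit the last seen target position."""
    key, prev = [], 0
    for s in range(src_len):
        tgts = [t for (i, t) in alignment if i == s]
        if tgts:
            prev = min(tgts)
        key.append((prev, s))  # break ties by source order
    ranked = sorted(range(src_len), key=lambda s: key[s])
    pi = [0] * src_len
    for rank, s in enumerate(ranked):
        pi[s] = rank
    return pi


def permutation_tree(pi):
    """Greedy shift-reduce factorization: merge adjacent spans whose united
    values form a contiguous range. Binarizable (ITG-style) permutations
    reduce to a single nested tuple; non-binarizable ones (e.g. 2-4-1-3)
    remain a forest. Stack items are (lo, hi, size, tree)."""
    stack = []
    for v in pi:
        stack.append((v, v, 1, v))
        while len(stack) >= 2:
            lo2, hi2, n2, t2 = stack[-1]
            lo1, hi1, n1, t1 = stack[-2]
            lo, hi, n = min(lo1, lo2), max(hi1, hi2), n1 + n2
            if hi - lo + 1 != n:
                break
            stack[-2:] = [(lo, hi, n, (t1, t2))]
    return [t for (_, _, _, t) in stack]
```

For example, the alignment {(0,2), (1,3), (2,0), (3,1)} yields the source permutation [2, 3, 0, 1], which factorizes into ((2, 3), (0, 1)): the two halves of the sentence swap as blocks.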
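For the domain-sensitive data selection mentioned above, a standard baseline (sketched here for illustration, not necessarily the Lab's method) is Moore-Lewis cross-entropy difference: score each candidate sentence by its cross-entropy under an in-domain language model minus its cross-entropy under a general-domain one, and keep the lowest-scoring sentences. A minimal sketch with add-one-smoothed unigram models:

```python
import math
from collections import Counter


def unigram_lm(corpus):
    """Train an add-one-smoothed unigram LM on whitespace-tokenized
    sentences; returns a log-probability function over words."""
    counts = Counter(w for sent in corpus for w in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    return lambda word: math.log((counts[word] + 1) / (total + vocab))


def cross_entropy(sentence, logprob):
    """Per-word negative log-likelihood of a sentence under the LM."""
    words = sentence.split()
    return -sum(logprob(w) for w in words) / len(words)


def moore_lewis_rank(candidates, in_domain, general):
    """Rank candidates by H_in(s) - H_gen(s), ascending: the best
    candidates look in-domain-like rather than merely frequent."""
    lp_in = unigram_lm(in_domain)
    lp_gen = unigram_lm(general)
    scored = [(cross_entropy(s, lp_in) - cross_entropy(s, lp_gen), s)
              for s in candidates]
    return [s for _, s in sorted(scored)]
```

In practice the two language models would be higher-order n-gram or neural models; the ranking criterion stays the same.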