Come to LitMT for the machine-translated novels—stay for the opportunity to help improve them
Think for a moment about what sets literature apart from other writing: imagination, idiom, style, subtext. These same qualities make it difficult for computers to translate literature intelligibly, let alone faithfully. A new machine translation platform called LitMT, just launched at litmt.org, invites the public to help UMass researchers tackle this challenge.
The site hosts nearly 100 books translated by artificial intelligence (AI) into over 20 languages, including English, Chinese, Hindi, Spanish, French, Modern Standard Arabic, Bengali, Russian, Portuguese, Urdu, Indonesian, German, Japanese, Turkish, Tamil, Vietnamese, Korean, Hausa, Swahili, Thai, Polish, Ukrainian, and Mongolian. Readers are encouraged to annotate text they find confusing or incoherent, from inconsistent gender pronouns to odd word choices and gaps in information. Informed by this human input, the researchers aim to build specialized machine translation (MT) technology that is capable of delivering the experience of literary works in languages in which they’ve never before appeared.
LitMT comes from the lab of Assistant Professor Mohit Iyyer, a member of the natural language processing group in the Manning College of Information and Computer Sciences. Iyyer has a particular interest in narrative and believes in the power of fiction to expand intercultural understanding. He admires human translators’ meticulous art—“those are always going to be the best translations”—but observes that there’s a scarcity of expertise and funding for such work, especially outside a handful of well-resourced languages. “I’m excited about speeding up this process and allowing more readers access to the content of the books,” he says.
UMass postdoctoral researcher Marzena Karpinska, a linguist, joined the LitMT project eager to explore how AI can assist human translators. “A good MT system can create the initial draft,” she says. It can “suggest rewording, or help the human translator find a cultural equivalent.” Graduate student Katherine Thai ’23MS, ’26PhD, rounds out the LitMT team. Moira Inghilleri ’79, professor of comparative literature and director of translation and interpreting studies, has also advised on aspects of the work.
All books on LitMT are in the public domain, yet global readership has eluded many of them for decades. These include My Life and After Fifty Years, the 1996 autobiography of Shaaban bin Robert, national poet of Tanzania, and The Mother Whale, Víctor Català’s collection of Catalan short stories.
According to Iyyer, one factor exacerbating the intrinsic challenges of translating literature is that existing AI models tend to optimize for English. In languages that have a lower volume of digitized text, and between pairs of languages with relatively few human translations to reference, there’s insufficient data to even analyze what kinds of mistakes the machines are making. That’s where LitMT’s commenting function comes in. While the average user can’t assess the accuracy of a translation, “based on the spans they highlight, we can tell quite a bit about errors in aggregate,” Iyyer says. His lab has hired freelance translators to provide more detailed evaluations and, in consultation with the UMass Translation Center, will seek UMass student translators to supplement crowdsourced feedback.
Iyyer’s lab is developing its own AI models, but for now, most translations hosted on LitMT have been created using OpenAI’s GPT-3.5 and GPT-4. “By identifying weaknesses in these state-of-the-art ‘large language models,’ we could build language-specific—or even language-pair-specific—open-source models that are trained not to make those errors,” Iyyer says. Because OpenAI’s models are commercial products, the team hopes to eventually provide an open-source alternative to these AI tools.
Increasing translations to and from languages with few speakers, as well as to and from widely spoken languages that are underrepresented in the global arenas of technology and literature, is another guiding principle the LitMT team takes to heart. Despite the flaws in translations currently on LitMT—in some ways because of those flaws—Iyyer is excited to offer the platform to the public, including fellow researchers.
“Our project is helping to bring more attention to the quality of these AIs in languages that have not been studied,” he says. “If the translation is bad, that should be a signal to the community that we should invest more energy into these languages. Otherwise, the technology becomes very powerful in English, and English-speaking citizens of the world benefit, whereas the rest of the world is left behind.”
Experiment with the beta version of LitMT and help researchers improve the tool.