Researchers use the Bible to translate text
Dartmouth computer science researchers studying text translators recently turned to an unlikely source to gather data: the Bible.
The team set out to train an algorithm that can read text written in one style and rewrite it in a different style while preserving its meaning. A different style could mean that the new text is simpler to understand, uses more flowery language or matches a particular author’s style, according to Keith Carlson, a Ph.D. student and lead author of the research paper.
“Style transfer as a task is very similar to machine translation,” Carlson said. “Where [machine translation] says take this English and make it Spanish, we’re saying take this English and [for example] make it old-timey English.”
For example, a verse from the King James Bible can be replicated in the style of the New International Version. However, Carlson said that “it doesn’t produce the exact same verse, but that’s not surprising; even a human could not look at [a verse] and do it exactly.”
The paper was co-authored by Daniel Rockmore, Carlson’s adviser, associate dean for the sciences and a professor of mathematics and computer science, and by Allen Riddell, a professor of information science at Indiana University.
Though some work has been done on style transfer, it has received far less research attention than machine translation. A major reason for this discrepancy is the lack of a text that is replicated in multiple styles with the same meaning. The idea to use the Bible came out of a need for a source with enough aligned data to train the style transfer algorithm, Carlson said.
“We used the Bible because there are so many versions of it already, that we have nice aligned text with the same verse written 33 different ways by people that have done these translations,” he said.
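To make the idea of aligned text concrete, here is a minimal sketch in Python of how verses that share an identifier across versions could be turned into source and target training pairs. It is an illustration under stated assumptions, not the researchers’ code: the `versions` dictionary, the `build_pairs` function and the placeholder verse texts are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy data: version name -> {verse id -> text}.
# The verse texts are placeholders, not real Bible quotations.
versions = {
    "KJV": {"GEN 1:1": "kjv text of verse one", "GEN 1:2": "kjv text of verse two"},
    "NIV": {"GEN 1:1": "niv text of verse one", "GEN 1:2": "niv text of verse two"},
    "ISV": {"GEN 1:1": "isv text of verse one"},
}


def build_pairs(versions):
    """Return (source_style, target_style, source_text, target_text) tuples
    for every verse id that two versions have in common."""
    pairs = []
    # Each ordered pair of versions gives one translation direction.
    for src, tgt in permutations(sorted(versions), 2):
        # Only verses present in both versions are aligned.
        shared = sorted(versions[src].keys() & versions[tgt].keys())
        for verse_id in shared:
            pairs.append((src, tgt, versions[src][verse_id], versions[tgt][verse_id]))
    return pairs


pairs = build_pairs(versions)
for src, tgt, src_text, tgt_text in pairs:
    print(f"{src} -> {tgt}: {src_text!r} => {tgt_text!r}")
```

Because every version is keyed by the same verse identifiers, no separate sentence-alignment step is needed in this sketch. A verse missing from one version is simply skipped for the pairs that involve that version.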
The researchers made minor changes to adapt existing machine translation algorithms for style transfer. They used two different models to transfer the text. The first, Moses, is a statistical machine translation system. The other, Seq2Seq, is a recurrent neural network.
The results of their research “prove that style translation is possible by machines,” Rockmore said.
“We look forward to pushing this idea ahead and trying to expand on it,” he said.
Currently, the algorithm can only produce text in the styles that it has been trained on. However, the research team recently tried starting with an input from a different style than anything the algorithm had seen before. In one example, they took an Ernest Hemingway quote and produced King James and International Standard versions of the quote.
The Hemingway quote was a line from A Farewell to Arms: “‘Hell,’ I said, ‘I love you enough now. What do you want to do? Ruin me?’”
Targeting the International Standard Version’s style, the computer rewrote the quote: “But I said, ‘Hell! I love you now. What do you want to do, destroy me?’”
When configured for the King James Version of the Bible, the model changed the style of the quote: “And I said, Hell, I love thee now: what wilt thou do? destroy me.”
The result of their work, the identified and available Bible data set, allows others to try newer style transfer models and algorithms on the data. Eventually, style transfer could be used to simplify texts, such as making complex writing accessible to children or non-native speakers of a language. It could also be used to help people understand legal or government documents, according to Carlson.
“You could also use it for curiosity’s sake,” Carlson said. “For example, looking at how Jane Austen’s works would look if Hemingway had written it.”
An additional application for text style transfer could be to create uniformity across an organization’s writing samples.
“Imagine there’s a company where 20 different people are writing product descriptions, but the company wants to have a singular voice in the work that goes out,” Carlson said. “They could have each person write the description, then run it through this algorithm to homogenize the voices.”
Dartmouth has a long-standing tradition of working with language and communication. Joel Goldfield GR ’76, a modern languages and literature professor at Fairfield University, works with Dartmouth’s Rassias Center to help improve global communication. An integral part of the Rassias method is the way it helps students become comfortable communicating in a foreign language, even if they need to use a simplified version of the target language, according to Goldfield.
Goldfield worked for a long time with languages, including computer-assisted language learning programs such as Rosetta Stone and Transparent Language in the 1980s and 1990s. These programs did not run on a trained algorithm like Carlson’s text style transfer model.
However, Goldfield said that simplifying the spoken word the way Carlson’s algorithm simplifies texts would be very difficult because it requires a great deal of artificial intelligence and computational linguistics programming.