Auto-translation tools are increasingly used for quick translations. But once thrown, will the boomerang return to its originator, or spin wildly off into an incomprehensible back-translation?
An abundance of free translation services is available online, but what do you get for free? Will a simple sentence auto-translate cleanly across multiple languages? Or will the end result bear little resemblance to the original? BY SARA GREENWALD
Most translators are all too familiar with the “free instant translation” services available online. I tried boomerang-translating, first using a translation website to translate from English to a target language, and then back-translating to English. Doing this shows what happens when the peculiarities of English are stripped away. If an English phrase or construction doesn’t have an exact counterpart in the target language, the translator has to make decisions based on meaning. When the translator and back-translator are making decisions based on words and phrases in their databases, the result can be pretty odd. → continue reading
A timely and entertaining introduction to the tools of our trade. BY NIELS NIELSEN
On Saturday, October 2, 2010, Jost Zetzsche, perhaps best known to most for his GeekSpeak column in the ATA Chronicle, presented a workshop on CAT tools from 1:00 pm to 5:00 pm at the downtown campus of San Francisco State University. In view of the ongoing changes in the translation industry brought about by technology, the importance of this topic was not lost on anyone. → continue reading
Is post-editing in your future? Will you soon be cleaning up after the machines? The AMTA Conference addressed this and more. BY MIKE DILLINGER, FOREWORD BY YVES AVÉROUS
What’s not to like in a conference taking place in Waikiki, Hawaii? Besides, the program put together last October by AMTA, the Association for Machine Translation in the Americas, was quite compelling. After a recent General Meeting presentation touting the merits of post-editing, that is, fixing the manageable translations some machines achieve today, it looked like the matter needed more examination. The MT trend also reached the halls of the ATA conference that followed a month later. Not coincidentally, Jiri Stejskal, President of ATA, attended the Waikiki conference to represent us. → continue reading
At the NCTA September meeting, Dr. Anthony Pym discussed his research findings and explained “what happens” when translators work under pressure. BY RAFFAELLA BUSCHIAZZO
The September General Meeting took place on Saturday the 13th in downtown San Francisco and was presented by NCTA President Tuomas Kostiainen. Vice President and Translorial Publisher Yves Avérous offered potential volunteers free training on layout and Translorial blog site management. Then he praised the excellent work of Translorial’s new editor, Nina Bogdan, on the September issue. He also showed everyone how to join the new NCTA group on LinkedIn, the professional networking website. All active NCTA members are welcome to join. At the December General Meeting we will present the most popular networking websites where you can promote your professional skills online. → continue reading
How close are we to the reality of a Star Trek-type of Universal Communicator? The U.S. Government is making a serious effort to get us there. BY HANY FARAG
In the December 2007 issue of Translorial, Paula Dieli, in her article about Machine Translation (MT), concluded that MT is no longer the funny substitution of words in one language for words in another and that MT, according to Google, is based on a data-driven approach. MT also combines linguistic typology, phrase recognition, translation of idioms, and isolation of anomalies. However, MT does not have to be limited to one form of implementation centered on documents and word files. If it is combined with a system that converts natural speech in one language into text, the output text is fed into MT for translation into another language, and the translated text is then converted back into speech, we will have an automated speech translation system. This is what I prefer to call an Interpreter Machine (IM).
Automated speech translation, as seen in films like Star Wars, is interesting but fictitious. So why do linguists need to know about the Interpreter Machine? → continue reading
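The three-stage pipeline the article describes can be sketched in a few lines of Python. The component functions below are stand-in stubs (real systems would call actual ASR, MT, and TTS engines, and none of these names come from any real product); only the chaining of the three stages is the point.

```python
def recognize_speech(audio: bytes, lang: str) -> str:
    """Stub ASR: convert spoken audio in `lang` to text."""
    # A real implementation would invoke a speech-recognition engine;
    # here we pretend the audio bytes already encode the transcript.
    return audio.decode("utf-8")

def translate(text: str, src: str, tgt: str) -> str:
    """Stub MT: translate `text` from `src` to `tgt` via a toy dictionary."""
    toy_dictionary = {("en", "es"): {"hello": "hola"}}
    table = toy_dictionary.get((src, tgt), {})
    return " ".join(table.get(word, word) for word in text.lower().split())

def synthesize_speech(text: str, lang: str) -> bytes:
    """Stub TTS: convert text back into spoken audio in `lang`."""
    return text.encode("utf-8")

def interpreter_machine(audio: bytes, src: str, tgt: str) -> bytes:
    """Chain ASR -> MT -> TTS: the Interpreter Machine described above."""
    text = recognize_speech(audio, src)
    translated = translate(text, src, tgt)
    return synthesize_speech(translated, tgt)
```

The design point is that each stage is independently replaceable: the IM is a composition of three existing technologies rather than a single new one.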
By Paula Dieli
Mention the words “machine translation,” and a translator’s thoughts will range from job security to the ridiculously funny translations we’re able to produce with so-called online translation tools. Should we be worried that machines will take over our jobs? Paula Dieli thinks not, and explains why in this report.
I recently attended a presentation on “Challenges in Machine Translation,” sponsored by the International Macintosh Users Group (IMUG), at which Dr. Franz Josef Och, Senior Staff Research Scientist at Google Research, presented some of the challenges Google is facing in its machine translation (MT) research, and how some of these challenges are being addressed. Excitement about machine translation research first peaked back in 1954, with press reports on the Georgetown University/IBM experiment, which used a computer to translate Russian into English. In the 50 years since, we have continued to read about the great advances that will be possible “in the next 20 years,” but these advances never came to pass. When the Internet came of age, online translation tools surfaced, and we translators amused ourselves by seeing what crazy translations we could come up with by entering seemingly simple phrases.
The linguistics of MT
So why did the research never produce anything really viable? It was based on a linguistic approach; that is, an analysis of the structure of a language followed by an attempt to map it into machine language such that one could input a source language text and out would come a wonderful translation in the target language, albeit with a few minor errors. As we all know, a language is filled with so many cultural, contextual, idiomatic, and exceptional uses that this task became virtually impossible, and no real progress has been made with this approach in the past 50 years.
Dr. Geoffrey Nunberg, Adjunct full professor at UC Berkeley, linguist, researcher, and consulting professor at Stanford University, had this to say at a recent NCTA presentation: “I asked a friend of mine, who is the dean of this [MT] field, once, ‘if you asked people working in machine translation how long it will be until we have perfect, idiomatic machine translation of text …?’, they would all say about 25 years. And that’s been a constant since 1969.”
The data-driven approach
In recent years, MT researchers have begun to take a different approach, which can be loosely compared to the work you do as a translator when you use a tool such as SDL Trados WinAlign or Translator’s Workbench. That is, you use a data-driven methodology. As you translate, you store your translations in a translation memory (TM), so that if that same or a similar translation appears again, the tool will notify you and let you use that translation as is, or modify it slightly to match the source text. The more you translate similar texts in a particular domain, the more likely it is that you will find similar translations already in your TM.
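The translation-memory workflow described above can be illustrated with a toy sketch: source segments are stored with their translations, and new segments are matched against the store by similarity, so near-identical sentences surface as fuzzy matches. This is not how any particular CAT tool computes matches; it simply uses Python's difflib to make the idea concrete.

```python
import difflib

class TranslationMemory:
    """A toy translation memory: source segments mapped to translations."""

    def __init__(self):
        self.entries = {}  # source segment -> target segment

    def store(self, source: str, target: str) -> None:
        self.entries[source] = target

    def lookup(self, source: str, threshold: float = 0.75):
        """Return (similarity, stored source, stored target) for the best
        fuzzy match at or above `threshold`, or None if nothing qualifies."""
        best = None
        for src, tgt in self.entries.items():
            score = difflib.SequenceMatcher(None, source, src).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, src, tgt)
        return best

tm = TranslationMemory()
tm.store("The house is for sale.", "La maison est à vendre.")
# A near-identical new segment still finds the stored translation:
match = tm.lookup("The house is for sale!")
```

The more you translate in one domain, the denser the store becomes and the more often a new segment lands above the match threshold, which is exactly the effect the paragraph above describes.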
Similarly, if before you began to translate a weekly online newsletter of real estate announcements, for example, you searched the Internet for already existing translations in your language pair and then aligned them and input them, via WinAlign, into your TM, you might find that much of the work had already been done for you. Imagine now if you were to input 47 billion words’ worth of these translations. Your chances of being able to “automatically” translate much of your source text would certainly increase. This is the approach that Google is taking.
Google’s goal, as stated by Dr. Och, is “to organize the world’s information and make it universally accessible and useful.” Now, before you go thinking you’re out of a job, their data-driven approach has proven successful only for certain language pairs, and only in certain specialized domains. They have achieved success in what they call “hard” languages, that is, from Chinese to English and from Arabic to English, in domains such as blogging, online FAQs, and interviews by journalists.
Dr. Och attributed their progress to “learning from examples rather than from a rule-based approach.” He admits that “more data is better data.” He went on to say that adding 2 trillion words to their data store would result in a 1 percent improvement for specific uses such as the ones described above; they see a year-to-year improvement of 4 percent from doubling the amount of data in their data store, or “corpus.” The progress reported by Dr. Och is supported by a study conducted by NIST (the National Institute of Standards and Technology) in 2005, in which Google received the highest BLEU (Bilingual Evaluation Understudy) scores using its MT technology to translate 100 news articles in the language pairs mentioned above. A BLEU score ranges from 0 (lowest) to 1 (highest) and is calculated by comparing the machine’s output against one or more human reference translations (a brevity penalty is applied to short outputs, which would otherwise score artificially high).
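To make the scoring concrete, here is a simplified, single-reference BLEU calculation: modified n-gram precision (only unigrams and bigrams here, where standard BLEU goes up to 4-grams and allows multiple references) combined geometrically and multiplied by the brevity penalty. This is a sketch of the general technique, not the exact formula used in the NIST evaluation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified single-reference BLEU over n-grams up to max_n."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # "Modified" precision: clip each n-gram count by the reference count.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: candidates shorter than the reference are discounted,
    # since truncated output would otherwise score artificially high.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean

score = simple_bleu("the cat sat on the mat", "the cat sat on the mat")
```

An exact match scores 1.0, while a fragment like “the cat” scores low despite perfect precision, because the brevity penalty punishes the missing words.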
Challenges and limitations
So what are the limitations of this data-driven approach? When asked by a member of the audience if Google’s technology could be used to translate a logo, Dr. Och instantly replied that such a translation would require a human translator. It’s clear that Google’s approach handles a very specific type of translation. Similar data-driven MT implementations can be used to translate highly specialized or technical documents with a limited vocabulary; the result wouldn’t be 100 percent correct, but it would be readable enough to determine whether the document is of interest. In that case, a human translator would be needed to “really” translate it.
The Google approach described above deals with a tremendous amount of data and a very targeted use. It works only for some languages—German, for example, has been problematic—and in order to improve in more than just small increments, human intervention is required to make corrections to errors generated by this approach. One example that Dr. Och provided—the number “1,173” was consistently incorrectly translated into the word “Swedes”—confirms that a machine can’t do it all.
And if you think for a minute about the amount of Internet-based data being generated on just an hourly basis, it’s great to have machines around to handle some of the repetitive (read: uninteresting) work, and let us translators handle the rest. That still leaves plenty of work for us humans.
There are other approaches to MT, including example-based technology, which relies on a combination of existing translations (such as you have in your translation memory) along with a linguistic approach that matches an unmatched segment against a set of heuristics, or rules, based on the grammar of the target language. Some proponents of this approach concede that large amounts of data would be needed to make it successful, and have all but abandoned their research. Once again, we can see that any approach that relies even partially on linguistics has not met with a reasonable level of success.
Other advances occurring in the MT arena include gisting and post-editing. MT can be used successfully in some settings where the gist of a document is all that is needed in order to determine if it is of enough interest to warrant a human translation. There are also MT systems on the market that produce translations that require post-editing by human translators who spend (often painful) time “fixing” these translations, correcting the linguistic errors that such a system invariably produces. While this may not be the translation work you’re looking for, I know of at least one large translation agency that provides specific training for this type of post-editing to linguists willing to do this kind of work. This is another example that shows that while machines play a part, there is still a role for human translators in the overall process.
Still other advancements include the licensing of machine translation technology based on a data-driven approach, which can be tailored to work with the existing translations and terminology databases at a specific company. As with the Google solution, such technologies typically work on a limited set of languages. But even if they take over some of the less interesting, repetitive material, have no fear: with information being produced at an ever-increasing rate, there will still be plenty of work for human translators to do!
The road ahead
Where does that leave us? From the typewriter to word processors to CAT (Computer-Assisted or Computer-Aided Translation) tools and the pervasiveness of the Internet, our livelihood has been transformed, in a positive way. We are more productive and able to work on more interesting translations than ever before.
I encourage you to embrace technology; understand how it is helping to make information accessible, and learn how technology can help translators do the work that only humans can do.
A calendar of upcoming International Macintosh Users Group (IMUG) presentations can be found at http://www.imug.org.
You can get the official results of the 2005 Machine Translation Evaluation from the National Institute of Standards and Technology (NIST) at http://www.nist.gov/speech/tests/mt/doc/mt05eval_official_results_release_20050801_v3.html.
By Anna Schlegel
Tiziana Perinotti is the founder of TGP Consulting and creator of the award-winning Silicon Valley Localization Forum website and services. She has over 15 years of successful software development and product marketing experience with companies such as Olivetti, Microsoft, PowerUp! (acquired by The Learning Company), Radius, Verity, and Palm Computing, to name a few.
In 1996, she founded TGP Consulting and helped the original founders of Palm Computing (also founders of Handspring) develop what is now the very successful Palm handheld. Tina has developed and offered training and courseware material for enterprises as well as for freelance translators and translation companies.
Where did you grow up, and when did you come to the U.S.?
TIZIANA PERINOTTI: I was born and raised in Turin, northwest of Milan, near the Italian Alps. I developed a desire to move to the U.S.—Silicon Valley, in particular—when I was in college. So, as a student in Turin, I decided to travel and take summer courses in communications and computing in the U.S. during my visits to an elderly uncle who lived in Pittsburgh.
How did you start in the localization field?
Right after my Computer Science degree and Master’s in Linguistics, I was recruited by Olivetti, the large computer conglomerate located in Ivrea, near Turin, Italy. At the time, Olivetti was very active in the research field of software office automation, not just for the stylish typewriters the company was manufacturing, but also for the first PC lines.
There was a need to localize Olivetti Italian hardware and software products into English-ready products for all English-speaking markets around the world. In 1987, when the joint venture/OEM project between Olivetti and Microsoft was established, I was sent to Microsoft headquarters in Redmond, Washington, to work on Windows 2.0 and the Windows version for the first 386 machines. We developed all the device drivers for Olivetti that were included in Windows and completed the first localized versions (Dutch, French, German, Italian, Portuguese, Spanish, and Swedish).
What localization challenges do corporations face today?
The mantra of “fast and cheap” localization has reached new levels, and the challenge is how to add “quality” to that equation in a process that has outsourced all skills, including engineering, testing, management, and customer support. Cultural and communication barriers among product team members located in very different locales are another big challenge, as is the lack of training that would allow IT, engineering, customer support, marketing/sales, and project management staff to operate at their best in a stressful, multicultural environment under strict deadlines.
What are the new trends you see in localization?
Because of the new challenge of introducing products less expensively, more and more localizers are relying on machine translation tools, online terminology tools, and project management tools to expedite the localization process and achieve consistency. Localization has also expanded beyond the traditional computer and electronics industry; for example, biotech, pharmaceutical, medical device companies, and the government are in need of more localization.
How does English influence other language localization?
In the U.S., in general, my experience has been that corporations tend to be biased towards the English language. Products still tend to be first architected and developed in an English context, before they go through some internationalization process. Part of the problem is that we ask engineering to make certain product development decisions that would be better made by professionals who have the training and experience of designing for a global audience. The outcome of this approach may be a poorly localized product and unsatisfied customers who are forced to use an English-based product with a translated user interface that is less than optimal for them. This is an obvious cost to the company in terms of missed sales revenues and market opportunities.
Have you experimented with machine translation?
Yes, since the very beginning of my career I have used and tested many tools and systems, from the most sophisticated to the very basic. I am very pleased with the progress and advancement in this field, as well as in other areas such as voice recognition and search-and-retrieval engines; all the signs are there that these tools will become better and better and will be employed in more aspects of our lives.
What would you like to see changed in localization?
The mentality. When corporations need to cut their budgets, one of the first things they drop off their priority list is internationalization and localization. That’s a symptom of not understanding the investment opportunity and the added value of the internationalization and localization product cycles.