Open Source Translations

At the May 2, 2011 Grand Rapids Python User Group (GRPUG) meeting, Dave Brondsema talked to us about his forays into the internationalization (i18n) and localization (L10n) of SourceForge’s new open source project hosting platform, Allura. He explained that open source already has mature, widely used standards for translations. In general, each project keeps its own set of files with translations of selected strings. There are libraries for reading these files in pretty much every programming language, so that part isn’t a particularly interesting or unique problem. The big unanswered question was where they would find translators for the project.

His talk has been simmering in the back of my mind for the last couple of weeks. One thing that really nagged at me was that each open source project is on its own for translators and translations. That seems like a huge amount of redundant work, since many projects have similar phrases to translate. Also, small projects might never get translated at all because they would have trouble recruiting translators. Had nobody thought of combining all of the disparate translation efforts into a unified translation project?

When I see a big, obvious problem that affects so many people, I feel sort of obliged to take a stab at solving it. Often, other people are already on top of it and I just haven’t come across them yet. So I Googled a few terms and finally found Transifex, a project that is centralizing translations for open source projects. According to this introductory blog post, it was started at Google Summer of Code back in 2007. This other informative article mentions that it’s the primary tool used in Fedora translations, and is analogous to the Rosetta features integrated into Canonical’s Launchpad system. So a few projects do exist! Excellent.

With that burden off my mind, I got to thinking about the huge number of translated phrases these projects must be building up, presumably under various open source licenses. Could those phrases be extracted and turned into an open source general translation program like Google Translate? How about crowd-sourcing audio pronunciations of the phrases by native speakers? I think this could generate a pretty nice, open platform for learning another language. Anyone interested?
