How using AI translation tools for minority languages can boost subscriptions

Sermitsiaq, Greenland’s largest news publisher, has always published its content in both Danish and Kalaallisut (Greenlandic), because all communication, including news, must by law be in both languages.

Translations used to be done by humans, because automatic translation systems for this minority language did not exist. That delayed the publication of translated articles, and it was expensive and time-consuming for the outlet. But all that was before AI.

In 2023, the Danish tech startup MediaCatch developed an AI translation tool for Sermitsiaq that can quickly translate news content into a minority language ignored by most big tech companies. The tool was trained on more than 15 years’ worth of newspaper content translated by professionals.

Since then, Sermitsiaq has developed a subscription strategy that lets readers buy a bundle that includes access to this AI translator. It is an effective strategy, because the entire society operates in two languages.

The Fix spoke with Lars Damgaard Nielsen, CEO and co-founder of MediaCatch.

How did it all begin?

Everything started with a conversation with Greenland's largest news publisher. The chairman of the board said they had this challenge of not being able to translate between Danish and Greenlandic in either direction. We heard about this challenge and understood that big tech companies couldn't solve it because it was a very small language, so their interest was not that high.

How did you develop this translation tool to meet their challenge?

They were sitting on a gold mine of training data. Because they publish content in two languages, they had already spent many hours over the last 10-15 years having human translators translate a lot of articles between Danish and Greenlandic.

So we took all that data from them and created an algorithm that could translate between the two languages. It was done in collaboration with them, so they actually own the algorithm.

During the first tests, did you make sure that the translations made by the AI were correct?

Yes, we had human evaluators. For each model we made, we evaluated the model against ChatGPT, the official Greenlandic translation tool, and another tool. We had human evaluators scoring the translations blindly, so they didn't know where they came from. They scored each sentence as bad, good or very good.

In the end, around 80% of our sentences were rated good, versus 20% for ChatGPT. Of course, as with every other type of translation, you need to have a human in the loop if you want to avoid mistakes.
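
For readers who want to see what such a blind comparison could look like in practice, here is a minimal Python sketch. It is not MediaCatch's actual evaluation code, which the interview does not describe; the function names, the system names and the exact rating labels are assumptions made for illustration. It shuffles sentences from several systems, hides which system produced each one, and then computes the share of sentences rated good or very good per system.

```python
import random
from collections import defaultdict


def build_blind_batch(outputs, seed=0):
    """Mix translations from all systems and hide their origin.

    outputs: dict mapping a (hypothetical) system name to a list of sentences.
    Returns (items, key): items go to the evaluators and carry no system name;
    key maps item_id back to the system and is kept away from the evaluators.
    """
    rng = random.Random(seed)
    pairs = [(system, text) for system, texts in outputs.items() for text in texts]
    rng.shuffle(pairs)
    items = [{"item_id": i, "text": text} for i, (_, text) in enumerate(pairs)]
    key = {i: system for i, (system, _) in enumerate(pairs)}
    return items, key


def share_rated_good(ratings, key, good_labels=("good", "very good")):
    """Return, per system, the fraction of its rated sentences labelled good.

    ratings: dict mapping item_id to the label an evaluator chose.
    The label names are assumptions based on the scale mentioned above.
    """
    totals = defaultdict(int)
    good = defaultdict(int)
    for item_id, label in ratings.items():
        system = key[item_id]
        totals[system] += 1
        if label in good_labels:
            good[system] += 1
    return {system: good[system] / totals[system] for system in totals}
```

The evaluators only ever see `items`; the `key` is used afterwards to attribute each rating to a system, which is what keeps the scoring blind.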

What are the advantages of using this AI tool?

They are now able to translate articles for themselves within minutes instead of hours, so they have a very quick turnaround. For the media, when you're doing breaking news, it's not nice to wait around three hours to get something translated. 

Before, it took many hours to translate a full article between the two languages, and now they can have a Greenlandic-speaking journalist just proofread the translation. Everything is already translated, so they only have to correct minor details.

In their case, they also developed a strategy based on subscriptions, including this translation tool?

I think that's the beauty of this story. It helps society, and it helps the media outlet make money to produce more independent journalism. So it's fuelling the democracy of Greenland while helping them build a more sustainable business in a small country.

The other good thing about this case is that instead of the media just giving their valuable training data away to, let's say, Microsoft or Google, they can now make money from it themselves. They have created a business case on top of that.

So, if you are a business in Greenland and you would like to have access to this tool, you have to have a business subscription to the newspaper. So, they are funding journalism with their own historic data.

For minority languages, is there also the challenge of converting written text into audio?

Absolutely. Speech-to-text is something we are used to: being able to speak to Google Home or an iPhone or something like that. But with all the small languages, you cannot speak in your own language. You have to choose English or another bigger language.

But we haven't been able to find the funding to create a project for speech-to-text in Greenlandic. It requires a lot of training material to do that. You cannot just scrape your way out of it with a lot of written text.

Could this model be replicated elsewhere?

Yes, you can do it for many languages. The only thing it needs is basically training data. And in this case, they already had the training data because they had translated between the two languages for many years. So it's basically only a question of training data, and a lot of it.

Some of it will be fixed by the big tech companies, but I also think there are a lot of very small niche languages like Greenlandic, where only 50,000 people speak, write or read it. And you cannot make big tech companies spend hundreds of thousands of euros just fine-tuning for such a small language.

Since then, have you received other requests of this kind for translating small languages?

We haven't created any more language models. It’s difficult to find a business proposition for making it work really well. 

Source of the cover photo: Suzy Hazelwood via Pexels

