Small languages vs. large language model churn

Out-of-office email replies from translation services have been copy-pasted as if they were the requested translation, both on Welsh street signs and in Scandinavian hospital signage. Rote, process-based misunderstanding pollutes public spaces.

One concern in the 2020 discussions on Wikipedia, Reddit and elsewhere about the problematic Scots Wikipedia was that its pseudo-synthetic Scots could come to be regarded as authoritative by others. Once its faulty rendition came to be used and reused, the language would be irredeemably polluted.

There were also suggestions that (a) its faulty place labelling, based on no prior usage but replicated into Wikidata, could already be polluting AI usage, and (b) more broadly, novelty without evidence of prior usage would result in a special language private to its adepts.

Specialist language furthers exclusion. In the case of a Wiki, which depends on many-eyes for control, that is likely fatal.

Since the Scots Wikipedia debacle, the arrival of large language models has complicated matters further. Their robots are greedy consumers of any and all text, which is then regurgitated to the unwary.

More recently, there is the problem of the Greenlandic-language Wikipedia, which lacks interest from native speakers and has become prey to slurry from low-quality machine-translation tools, further weakening a language with only 60,000 native speakers.

How then to behave prudently with regard to minority languages? The best way to learn a language is to use it. But using it to publish text when one's Babel proficiency is below level 2 is fraught. Optimally, there will be a native speaker who can promptly correct grammatical errors. That is the "many-eyes" virtue of a popular Wikipedia. But if there isn't, then the risk is that the poor text is vacuumed up by LLMs and turned into part of that language's corpus.

Is the only responsible behaviour to desist from editing in minority languages in which one is not fluent?

January 2026 edit: linking an excellent article on the damage done by relying on machine translation.


Author: admin

Mastodon account where these were first posted: link