Web Translator API

developer.mozilla.org

95 points by kozika a day ago

sfmz a day ago

https://developer.chrome.com/docs/ai/translator-api

const translator = await Translator.create({
  sourceLanguage: 'en',
  targetLanguage: 'fr',
});

await translator.translate('Where is the next bus stop, please?');

ks2048 a day ago

So, this is Google Translate running locally in Chrome? I wonder if it is a small/degraded model or limited to a few languages. Otherwise, how is it not a simple way around the paid Google API?
- ameliaquining a day ago
  
  The article explains that this feature uses a small (up to 22 GB) language model that runs on-device.
  That said, the "simple way around the paid API" problem is something Google has to deal with anyway, because there are a bunch of ways to use Google Translate without paying for it (e.g., the translate.google.com web UI, or the non-JavaScript-exposed "Translate to [language]" feature built into Chrome), and any action that can be taken by a human can in principle also be taken by a script. The only thing to do about it is use secret-sauce server-side abuse detection to block as much of this activity as they can; they can't get all of it but they can get enough to push enough people onto the paid API that the economics pencil out.
  - jannes a day ago
    
    So installing Chrome is going to require 22 GB of disk space now?
    
    dhx 2 hours ago
    
    This sounds off by an order of magnitude? Firefox's local translation models are only 20-70MB per language pair direction (e.g. en-to-fr or fr-to-en).[1] These models are also only released when they reach at least -5% of Google Translate's COMET score.[1] Currently Firefox ships with support for 32 xx-to-en language pairs and 29 en-to-xx language pairs.[1] As the number of language pairs increases, it probably isn't unreasonable for browsers to stop bundling every language pair and instead prompt users to download uncommon models the first time the user wants to use them.
    [1] https://mozilla.github.io/translations/firefox-models/
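A quick back-of-the-envelope check of the figures quoted above, as a sketch. It assumes, purely for illustration, that every one of the listed language pair directions is downloaded and that each falls somewhere in the 20-70MB range; the function name is made up for this example.

```javascript
// Figures taken from the comment above: 32 xx-to-en and 29 en-to-xx directions,
// each model between 20 MB and 70 MB.
const pairs = 32 + 29;
const mbPerPair = { low: 20, high: 70 };

// Hypothetical helper: total size in decimal gigabytes for a given per-pair size.
function totalGigabytes(pairCount, megabytesEach) {
  return (pairCount * megabytesEach) / 1000;
}

const lowGb = totalGigabytes(pairs, mbPerPair.low);
const highGb = totalGigabytes(pairs, mbPerPair.high);
console.log(`${pairs} pairs: ${lowGb.toFixed(2)} to ${highGb.toFixed(2)} GB`);
```

Under those assumptions the full set lands in the low single-digit gigabytes, which is well below 22 GB.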
    
    ks2048 a day ago
    
    It only installs models that are explicitly downloaded via this API, it seems.
    Also, it says to have 22 GB free, but below (under "Note"...), it says the model takes "around a couple of GB".
    
    cj a day ago
    
    Does the API trigger the download automatically, or does it ask for user permission?
    (Answered my own question): Doesn't look like it requires the user's permission. Upon first use, the model will start downloading. The user has to wait for the download to finish before the API will work. That could take hours for 22 GB.
    I presume this can't work on mobile?
    https://developer.mozilla.org/en-US/docs/Web/API/Translator_...
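A minimal sketch of how a page might surface that first-use download to the user, based on the `availability()` call and the `monitor` option described in the Chrome documentation. The wrapper name `createTranslatorWithProgress` and its `api` parameter are assumptions made for this example (the parameter stands in for the global `Translator` so the logic can be exercised outside a browser).

```javascript
// Sketch only: `api` is expected to look like Chrome's global `Translator`.
async function createTranslatorWithProgress(api, languages, onProgress) {
  const status = await api.availability(languages);
  if (status === 'unavailable') {
    throw new Error('This language pair is not supported on this device');
  }
  // For 'downloadable' or 'downloading', create() resolves once the model is ready.
  return api.create({
    ...languages,
    monitor(m) {
      // Chrome's docs describe e.loaded as a fraction between 0 and 1.
      m.addEventListener('downloadprogress', (e) => onProgress(e.loaded));
    },
  });
}

// In a supporting browser this could be called as:
// createTranslatorWithProgress(Translator,
//   { sourceLanguage: 'en', targetLanguage: 'fr' },
//   (fraction) => console.log(`Model download: ${Math.round(fraction * 100)}%`));
```

Checking availability first lets the page decide whether to show a progress indicator at all, or to fall back to something else when the pair is unsupported.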
    
    ameliaquining a day ago
    
    The article indicates that it will only download the model over an unmetered connection, e.g., while the phone is connected to wifi.
    
    djhn 9 hours ago
    
    Seems very backwards for markets where wired/wifi connections at home are nonexistent, 4G/5G is already unmetered, and phones are the wifi you connect your devices to.
- sfmz a day ago
  
  There are already ways to do translation locally in JavaScript with neural nets running in WASM; this is just more convenient.
  https://huggingface.co/Xenova/nllb-200-distilled-600M
  - vitonsky 11 hours ago
    
    I tried using this model in my translators kit package: https://github.com/translate-tools/core/pull/112
    It runs very slowly. A test case that translates a 3k-character text multiple times takes about 30 seconds with the Google translator, but more than 10 minutes with `nllb-200-distilled-600M`.
    Text sample: https://github.com/translate-tools/core/pull/112/files#diff-...
    My tests run on Node.js; in the browser, it looks like it has no chance of real-world use.
- akazantsev 21 hours ago
  
  Here is the information on how it works in Chrome. https://developer.chrome.com/docs/ai/translator-api
- dbbk 20 hours ago
  
  Could it get more degraded?

sandstrom a day ago

This would be very useful.

Basically, the 'translate this' button you see on Twitter or Instagram next to comments in foreign languages. This API would make it trivial for all developers to add that to their web apps.

greatgib a day ago

Except that it is the user who will pay with their own LLM tokens
- 8n4vidtmkvmk 11 hours ago
  
  The user pays with some disk space, not API tokens
- cAtte_ a day ago
  
  how do you know this?

vitonsky a day ago

As the maintainer of https://linguister.io/, should I start work on a polyfill for that API?

Even if this API is implemented over the next few years, some browsers will lag behind and hold up progress for everyone.

Linguist has enough users that we could expose this API to client-side code. Users whose browsers do not implement the Translation API yet could install Linguist, and sites that use the Translation API would work fine. Translation API calls would be proxied by Linguist, and the user's preferred translator module would be used.

Any thoughts about it?

RockRobotRock a day ago

https://github.com/mozilla/standards-positions/issues/1015

sandstrom a day ago

I honestly don't understand the arguments Mozilla have against it.
Safari/webkit is positive (though no official stance yet):
https://github.com/WebKit/standards-positions/issues/339#iss...
- yjftsjthsd-h a day ago
  
  I don't know enough to understand the DOM argument, but
  > The spec assumes a certain form of translation backend, exposing information about model availability, download progress, quotas, and usage prediction. We'd like to minimize the information exposure so that the implementation can be more flexible.
  reads to me as Chrome once again trying to export itself verbatim as a "standard" and Mozilla pointing out that that's not really applicable to others.
  Also the WebKit post seems to raise somewhat similar arguments but on the basis of fingerprinting/privacy problems.

pwdisswordfishz a day ago

Why does it need to be a JavaScript API?

Why not just use the lang= attribute as it was intended, then let the user select text to translate as they wish?

diggan a day ago

If it's an HTML attribute, then you can only use it with DOM elements, with no control over when it runs.
Instead, a JS API gives more flexibility and control.
Besides, I think the "lang" attribute is supposed to signal what the language of the text inside that element is, not what it could/should be. So even if going with attributes would be the way forward, a new one would need to be created.

tempodox a day ago

It's only implemented in Google Chrome, so go figure.
- Uehreka a day ago
  
  If Chrome had tried to pull this in like 2016, when Google Translate was the only-ish game in town, I’d call them out for it. But we now have multiple competing open-weights translation models that are really good, making this kind of service essentially a commodity. One vendor might give users free access to their services to entice them to use their browser, another might differentiate themselves by running the model locally and giving the user better privacy guarantees in exchange for performance.
  I get that this is one more brick in the wall that teams like LadyBird will have to maintain, but as a web developer I do think more Web API features is generally a good thing, as it makes it easier for smaller shops to implement richer functionality.

mediumsmart 14 hours ago

I am ok with Chronic translating the Italian version of a site back into the original German version living in the neighboring folder for good money.

seabass a day ago

With js being a garbage collected language, what is the benefit of the destroy method here and why is it necessary?

charcircuit a day ago

There is no guarantee when it will be garbage collected. Large local models that use a lot of resources should be unloaded as soon as possible so that other programs on the computer can use those resources.
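One way to apply this, sketched under assumptions: wrap each use in `try`/`finally` so that `destroy()` runs deterministically even if translation throws. The helper name `translateOnce` and the `api` parameter (standing in for the global `Translator`) are invented for this example.

```javascript
// Sketch only: `api` is expected to expose create() like Chrome's `Translator`.
async function translateOnce(api, languages, text) {
  const translator = await api.create(languages);
  try {
    return await translator.translate(text);
  } finally {
    // Release the model now instead of waiting for an unpredictable GC pass.
    translator.destroy();
  }
}

// In a supporting browser this could be called as:
// translateOnce(Translator, { sourceLanguage: 'en', targetLanguage: 'fr' }, 'Hello');
```

For a page that translates many strings in a row, keeping one translator alive and destroying it at the end would likely be cheaper than creating one per call.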

lynx97 a day ago

Can we please NOT autotranslate the web? I have yet to find a site where the quality of autotranslate does not make me stop using that site. I was already irritated when Google started to show me de.wikipedia.org articles despite me explicitly searching for the English article name. Then came Etsy, where the autotranslate quality was so bad I stopped using the site altogether.

diggan a day ago

The good news is that if the browsers offered this natively, websites wouldn't need their own implementation of this. And if it's in the client (the browser), you're most likely gonna be able to turn it off globally, just like how you like it.
Worst case scenario a user-script/extension could monkey patch it out, but probably clients will let you disable it.

sandstrom a day ago

This is not auto-translation.
Rather, it's an API developers can use to add inline translation to web apps.
For example, under a comment in your app, you can (a) detect the language, (b) if it differs from the current user's/browser's language, offer to translate it with a small link, and (c) if the user clicks the link, translate the content into their language.
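The (a)/(b)/(c) flow could look roughly like the sketch below. It assumes the `LanguageDetector` and `Translator` interfaces as documented for Chrome; the function `offerTranslation`, the `apis` parameter, and the 0.5 confidence threshold are choices made for this example, not part of any spec.

```javascript
// Compare only the primary subtag, so 'en-US' and 'en' count as the same language.
const primary = (tag) => tag.toLowerCase().split('-')[0];

// Sketch only: `apis` is expected to hold objects shaped like Chrome's
// `LanguageDetector` and `Translator` globals.
async function offerTranslation(apis, text, userLanguage, minConfidence = 0.5) {
  // (a) Detect the language; results are ranked, most likely first.
  const detector = await apis.LanguageDetector.create();
  const [best] = await detector.detect(text);
  if (!best || best.confidence < minConfidence) return null;

  // (b) Only offer a link when the comment is in a different language.
  const sourceLanguage = best.detectedLanguage;
  if (primary(sourceLanguage) === primary(userLanguage)) return null;

  return {
    sourceLanguage,
    // (c) Run this when the user clicks the "translate" link.
    async translate() {
      const translator = await apis.Translator.create({
        sourceLanguage,
        targetLanguage: primary(userLanguage),
      });
      try {
        return await translator.translate(text);
      } finally {
        translator.destroy();
      }
    },
  };
}

// In a supporting browser this could be called as:
// offerTranslation({ LanguageDetector, Translator }, commentText, navigator.language);
```

A `null` result means no link should be shown, so nothing is translated unless the user asks for it.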
lofaszvanitt a day ago

But Reddit already does it! It's a new form of cultural colonisation by a headless society.

minus7 a day ago

I was excited that Firefox finally exposed its local translations as an API, but it's Chrome-only (still?). It will be nice for userscripts, for example to replace Twitter's translation button that hardly ever works.

troupo 19 hours ago

> I was excited that Firefox finally exposed its local translations as API, but it's Chrome-only (still?).
Because it was, is, and will be Chrome-only for the foreseeable future: https://news.ycombinator.com/item?id=44375326

troupo a day ago

While this might be useful, be mindful:

- it's experimental

- the "specification" is nowhere near a standards track: https://webmachinelearning.github.io/translation-api/

Of course it's already shipped in Chrome, and now Chrome pretends that its own Chrome-only API is somehow standard. Expect people on HN to blame other browsers for not shipping this.

jazzypants a day ago

I've been pleasantly surprised by the last few conversations about this type of thing that I've seen. It seems like people are pretty sick of Chrome's IE proclivities.

moron4hire a day ago

This is the W3C standardization process.
The W3C is not a prescriptive standardization body. It doesn't have any regulatory power giving it any teeth to go after vendors acting in bad faith. So the W3C process is descriptive and encourages a period of competitive divergence in implementations. It is only after the early adopters have hammered on the features and figured out which parts they like best that a Web API can then start to get standardized.
- troupo a day ago
  
  > This is the W3C standardization process.
  Let me quote the site for you
  --- start quote ---
  This specification was published by the Web Machine Learning Community Group. It is not a W3C Standard nor is it on the W3C Standards Track.
  --- end quote ---
  > So the W3C process is descriptive and encourages a period of competitive divergence in implementations.
  That is the exact opposite of how the W3C standardization process works
  > It is only after the early adopters have hammered on the features and figured out which parts they like best that a Web API can then start to get standardized.
  Yes, and until then this work is not supposed to be enabled by default
  - moron4hire 7 hours ago
    
    You're quoting from their literal, W3C-format working draft, quoting the name of the W3C working group that has been formed to standardize this.
    Being "standards track" means the spec is out of draft and has been proposed. It does not mean "we intend to standardize this". It means, "we've put in all of the work to standardize this and are waiting on final acceptance".
    I don't know what you mean by "isn't supposed to be enabled by default". There is no mention of when browser vendors may or may not ship features in the standardization process.
    
    troupo 5 hours ago
    
    > You're quoting from their literal, W3C-format working draft, quoting the name of the W3C working group that has been formed to standardize this.
    The literal "Draft Community Group Report" (and not a working draft) is a literal link to w3c standardization process: https://www.w3.org/standards/types/#CG-DRAFT
    Since the words "not on the W3C Standards Track" from the document didn't persuade you, you could go to the actual w3c process and answer a few simple questions:
    - is "Draft Community Group Report" a document on a standards track?
    - what does it take to get on the standards track?
    - what does it take to "put in all of the work to standardize this and wait on final acceptance", and how many steps there are between "Draft Community Group Report" and this stage?
    > I don't know what you mean by "isn't supposed to be enabled by default".
    For a person who is so confidently talking about the w3c standards process, I'm surprised you don't.
    w3c doesn't explicitly state this. Except for the final few stages, all steps in the process contain the following: "Software MAY implement these specifications at their own risk but implementation feedback is encouraged."
    However.
    Since this is browsers we're talking about, it means that whatever browsers ship enabled by default will remain in the wild forever because people will immediately start depending on that implementation.
    Additionally, a standard cannot become a standard until there are at least two independent implementations of a proposed feature. This is to eliminate the possibility to ship purely internal APIs, or depend on a single library/implementor.
    So the way to do it, especially for APIs that are nowhere close to being "waiting for final acceptance" is: ship behind a flag, iron out issues and differences, perhaps change the API (and changes to API happen all the time), then ship.
    Of course, Chrome shits all over this process and just ships whatever it wants to ship.

indeyets a day ago

So, the browsers have to provide some means for choosing the desired translation engine (add-on API maybe?) and this is a standard API which all of the providers should implement.

right?

rhabarba a day ago

You had me at "Browser compatibility".

Raed667 a day ago

Chrome embeds a small LLM (never stops being a funny thing) in the browser allowing them to do local translations.
I assume every browser will do the same as on-device models start becoming more useful.
- Asraelite a day ago
  
  What's the easiest way to get this functionality outside of the browser, e.g. as a CLI tool?
  Last time I looked I wasn't able to find any easy to run models that supported more than a handful of languages.
  - JimDabell a day ago
    
    That depends on what counts as “a handful of languages” for you.
    You can use llm for this fairly easily:
    uv tool install llm
    # Set up your model however you like. For instance:
    llm install llm-ollama
    ollama pull mistral-small3.2
    llm --model mistral-small3.2 --system "Translate to English, no other output" --save english
    alias english="llm --template english"
    english "Bonjour"
    english "Hola"
    english "Γειά σου"
    english "你好"
    cat some_file.txt | english
    https://llm.datasette.io
    
    usagisushi a day ago
    
    Tip: You might want to use `uv tool install llm --with llm-ollama`.
    ref: https://github.com/simonw/llm/issues/575
    
    JimDabell a day ago
    
    Thanks!
    
    jan_Sate a day ago
    
    That's just the base/stock/instruct model for the general use case. There's gotta be a finetune specialized in translation, right? Any recommendations for that?
    Plus, mistral-small3.2 has too many parameters. Not all devices can run it fast. That probably isn't the exact translation model being used by Chrome.
    
    JimDabell a day ago
    
    I haven’t tried it myself, but NLLB-200 has various sizes going down to 600M params:
    https://github.com/facebookresearch/fairseq/tree/nllb/
    If running locally is too difficult, you can use llm to access hosted models too.
  - deivid a day ago
    
    You can use bergamot ( https://github.com/browsermt/bergamot-translator ) with Mozilla's models ( https://github.com/mozilla/firefox-translations-models ).
    Not the easiest, but easy enough (requires building).
    I used these two projects to build an on-device translator for Android.
  - wittjeff a day ago
    
    https://ai.meta.com/blog/nllb-200-high-quality-machine-trans...
    https://www.youtube.com/watch?v=AGgzRE3TlvU
  - ukuina a day ago
    
    ollama run gemma3:1b
    https://ollama.com/library/gemma3
    > support for over 140 languages
    
    diggan a day ago
    
    Try to translate a paragraph with 1b gemma and compare it to DeepL :) Still amazing it can understand anything at all at that scale, but can't really rely on it for much tbh
  - _1 a day ago
    
    If you need to support several languages, you're going to have to have a zoo of models. Small ones just can't handle that many; and they especially aren't good enough for distribution, we only use them for understanding.
  - mftrhu a day ago
    
    Setting aside general-purpose LLMs, there exist a handful of models geared towards translation between hundreds of language pairs: Meta's NLLB-200 [0] and M2M-100 [1] can be run using HuggingFace's transformers (plus numpy and sentencepiece), while Google's MADLAD-400 [2], in GGUF format [3], is also supported by llama.cpp.
    You could also look into Argos Translate, or just use the same models as Firefox through kotki [4].
    [0] https://huggingface.co/facebook/nllb-200-distilled-600M
    [1] https://huggingface.co/facebook/m2m100_418M
    [2] https://huggingface.co/google/madlad400-3b-mt
    [3] https://huggingface.co/models?other=base_model:quantized:goo...
    [4] https://github.com/kroketio/kotki
- rhabarba a day ago
  
  While I appreciate the on-device approach for a couple of reasons, it is rather ironic that Mozilla needs to document that for them.
  - its-summertime a day ago
    
    Firefox also has on-device translations, for what it's worth.

tempodox a day ago

What compatibility? It's Chrome-only.

nachomg a day ago

This gives strong IE vibes.

lofaszvanitt a day ago

Another useful feature that nobody could've replicated themselves.
