M4v3R a day ago

I think they've buried the lede with their image-editing capabilities, which seem to be very good! OpenAI's model will change the whole image while editing, messing up details in unrelated areas. This one seems to perfectly preserve the parts of the image unrelated to your query and selectively apply the edits, which is very impressive! The only downside is the output resolution (the resulting image is 1184px wide even though the input image was much larger).

For a quick test I've uploaded a photo of my home office and asked the following prompt: "Retouch this photo to fix the gray panels at the bottom that are slightly ripped, make them look brand new"

Input image (rescaled): https://i.imgur.com/t0WCKAu.jpeg

Output image: https://i.imgur.com/xb99lmC.png

I think it did a fantastic job. The output image quality is ever so slightly worse than the original but that's something they'll improve with time I'm sure.

  • ChuckMcM a day ago

    This is gonna kill Craigslist :-). You see these pictures of a really nice car and get there and find it has a crushed left fender, rust holes in the hood and a broken headlight.

    We had a realtor list a property in our area and they had used generative AI to "re-imagine" the property because the original owner who had bought it in 1950 and died in it in 2023 had done zero maintenance and upgrades. People who showed up to see it were really super pissed. The realtor argued this was just the next step after staging, but it certainly didn't work here. They took it off the market and a bunch of people showed up to fix it up (presumably from the family but one never knows).

    • Brybry a day ago

      Would lying really lead to more used-car sales and thus create pressure for others to attempt this kind of fraud? And wouldn't people get in trouble for fraud (or at least false advertising)?

      When I last bought a used car I found it in a classified newspaper ad: there was no picture.

      I looked at every car I considered in-person.

      When I found one I liked I paid for an independent pre-purchase inspection, discovered a crack in the radiator, and negotiated the price down to cover my post-sale expense fixing it.

      • ChuckMcM a day ago

        There is a lot of fraud on Craigslist. The fraudsters are very creative. There is this fallacy, the 'Sunk Cost' fallacy, where people accept a substandard result because they feel they have already invested in the result and don't want to 'throw away' that investment. So in a place where it can take 90 minutes to go across the Bay, if you drive clear across the Bay for what you believe to be a pristine item, you may buy anyway (at a reduced price) because you've already invested the time to get there. Whereas, had the seller posted actual pictures of the item in question, you would have said, "I'll wait for one in better condition to come along."

        The "success" of Craigslist is that it exposes your item to a wider pool of buyers, which increases the chance that the one person who really wants it will see it. And if they really want it, they are motivated to go out of their way to get to it. But if even the pictures lie and you don't know what you're getting until you get there, your willingness to take the risk and drive out is reduced, which means sellers will be left with items that might have sold if they were trusted.

        This happens on EBay too. Sellers list something and it isn't as described, and fraudulent sellers will say "but it is! This buyer is trying to scam me." and EBay usually sides with the seller.

        My prediction (and hey, it's just a guess) is that if people start using these tools to "enhance" the images they use to sell stuff and it becomes a regular practice, then the total population of people who use Craigslist will go down and overall prices will be reduced as that fraud gets priced in. Sellers won't get as much as they think they should and will stop selling there. If it drops below critical mass, then the service suffers.

        • xuki 17 hours ago

          > This happens on EBay too. Sellers list something and it isn't as described, and fraudulent sellers will say "but it is! This buyer is trying to scam me." and EBay usually sides with the seller.

          This is not my experience at all, and I've used eBay since 2008. eBay is pro-buyer to the point that I don't sell anything on eBay (though I buy everything on eBay if the price is the same).

          • nancyminusone 9 hours ago

            I sell on eBay and can confirm this. eBay will side with the buyer 95% of the time, even if we can prove it was their fault. Maybe they side with scam sellers more.

        • johnisgood 14 hours ago

          > There is this fallacy, the 'Sunk Cost' fallacy, where people accept a substandard result because they feel they have already invested in the result and don't want to 'throw away' that investment.

          I was going to say the same thing. The car in the picture may not have a broken headlight while the one in reality does, but if it takes the person more than 2 hours just to visit that car, they may still end up buying it anyway because they have already invested too much time (and possibly money) into it.

      • rendaw 20 hours ago

        People use transformative filters on their faces on dating apps all the time. If you show up and find someone with a completely different face, is there any chance of romance? I have no idea... the best I can guess is

        - No, but people do it anyway due to anxiety

        - People can be pressured, the trick is to meet them the first time

        - People say they care about faces, but don't actually care about faces

        • johnisgood 14 hours ago

          I am not attractive. Thankfully, once I am given the chance to have a conversation with people, they find me attractive regardless of my appearance; in fact, I become more attractive in their eyes due to the way "I am". Oftentimes all it takes is a deeper conversation.

          It happened to me, too. I did not find someone particularly attractive at first, but their experiences, their views on relationships, the world, and so forth somehow ended up making them look more attractive.

    • riffraff 18 hours ago

      TBF, "reimagining" has been done by real estate agents and sellers before LLMs, it just required more skills.

      • conradfr 17 hours ago

        As an additional photo it's a valid use case, not as the only one.

        • joshstrange 12 hours ago

          I’m currently house shopping and I think I’m pro or neutral on “AI staging” as long as it’s clearly labeled (which has been the case).

          Especially when the house is vacant/empty, it helps to see a proposed layout so you can imagine living there.

  • sync a day ago

    FYI, your Input and output URLs are the same (I thought I was crazy for a sec trying to spot the differences)

    • M4v3R a day ago

      whoops, sorry about that, fixed

  • bakkoting a day ago

    Kontext is probably better at this specific task, if that's what Mistral is using. Certainly faster and cheaper. But:

    OpenAI just yesterday added the ability to do higher fidelity image edits with their model [1], though I'm not sure if the functionality is only in the API or if their chat UI will make use of this feature too. Same prompt and input image: [2]

    [1] https://x.com/OpenAIDevs/status/1945538534884135132

    [2] https://i.imgur.com/w5Q0UQm.png

  • pablonaj a day ago

    They are using Flux Kontext from Black Forest Labs, fantastic model.

    • koakuma-chan a day ago

      So Mistral is just hosting a Flux model?

      • Squarex a day ago

        Yes, but it's great that they are both made by European companies.

    • littlestymaar 12 hours ago

      Ah! That was my intuition as well but do you have a source for that?

  • vunderba 8 hours ago

    That's because they're leveraging BFL models (almost assuredly Kontext); it's mentioned in the release notes.

    The input image is scaled down to the closest aspect ratio of approximately 1 megapixel.
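    The ~1 megapixel rescale can be sketched as a simple aspect-preserving downscale. A hedged example (the exact size bucketing Kontext uses may differ; the target pixel count here is an assumption):

```python
import math

def rescale_to_megapixel(width: int, height: int, target_px: int = 1_048_576) -> tuple[int, int]:
    """Scale (width, height) down so the area is roughly target_px, keeping aspect ratio."""
    scale = math.sqrt(target_px / (width * height))
    if scale >= 1:
        return width, height  # already at or below the target area
    return round(width * scale), round(height * scale)

# A 4000x3000 photo comes out around 1182x887, close to the ~1184px-wide
# output observed in the parent comment.
print(rescale_to_megapixel(4000, 3000))
```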

    I ran some experiments with Kontext and added a slider so you can see the before / after of the isolated changes it makes without affecting the entire image.

    https://specularrealms.com/ai-transcripts/experiments-with-f...

  • joshcartme a day ago

    Wow, that really is amazing!

    I couldn't help but notice that you can still see the shadows of the rips in the fixed version. I wonder how hard it would be to get those fixed as well.

  • shaky-carrousel a day ago

    It messed up the titles of the books.

    • Lerc a day ago

      That might be autoencoder loss rather than the image generation itself. It's hard to tell without doing a round-trip using just the autoencoder without any generation, but it kind-of has the look of that sort of loss.

  • dkga 15 hours ago

    That’s very interesting, thanks for sharing!

    Incidentally, and veering off topic, I find it extremely annoying that to open both pictures I need to click numerous times to avoid accepting unwanted cookies (even if some are “legitimate”, implying others are not). A further nuisance is that multiple websites have the same cookie-vendor pop-up, suggesting there is a “cookies-as-a-service” vendor of some sort.

  • davidwritesbugs 13 hours ago

    Interestingly, the shadows of the tears on the wall didn't get fixed, but it's a very convincing job otherwise.

  • totetsu 19 hours ago

    Can anyone point to a good explanation of how these multi-modal text-and-image models are set up architecturally? Is there a shared embedding space, or is it lots of integrations?

    • Zacharias030 16 hours ago

      Not my area of expertise, but here you go.

      I don’t know how much tool use there is these days, with the LLM “just” calling image generation models after a bunch of prompt reformulation for the text-to-image model, which is most likely a “steerable” diffusion model (really nice talks by Stefano Ermon on YouTube!).

      Actually-multimodal models usually have a vision encoder submodel that translates image patches into tokens; the pretrained LLM and vision encoder are then jointly finetuned. I think reading the reports about Gemma or Kimi-VL will give a good idea here.
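      A toy sketch of that patch-to-token step (all shapes and the single linear projection are illustrative assumptions; real vision encoders like a ViT stack many transformer layers on top of this):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64   # assumed LLM embedding width
patch = 16     # patch side length

image = rng.random((224, 224, 3))                     # H x W x C
h, w, c = image.shape
# Cut the image into non-overlapping 16x16 patches and flatten each one.
patches = (image.reshape(h // patch, patch, w // patch, patch, c)
                .transpose(0, 2, 1, 3, 4)
                .reshape(-1, patch * patch * c))      # (196, 768)

# Project each flattened patch into the LLM's embedding space:
# these vectors are the "image tokens".
W_proj = rng.normal(size=(patch * patch * c, d_model))
image_tokens = patches @ W_proj                       # (196, 64)

# Prepend them to (already embedded) text tokens and feed the whole
# sequence to the jointly finetuned LLM.
text_tokens = rng.normal(size=(10, d_model))
sequence = np.concatenate([image_tokens, text_tokens])
print(sequence.shape)  # (206, 64)
```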

  • sylware 13 hours ago

    celeste? CELESTE?

    3 thumbs up. Part of the community which _knows_ what is peak gaming (and natively on elf/linux...)

trilogic a day ago

Finally the EU is waking up. Proud of it. I am switching to Mistral as soon as my OpenAI contract finishes. We've got to support the EU. Vive la France!

  • jug a day ago

    Yeah, honestly I'm just waiting for Mistral Large 3. They've hinted at it. It's probably going to become the new language model in Le Chat, and it seems imminent:

    From the "One more thing" on the Mistral Medium 3 blog post in May:

    > With the launches of Mistral Small in March and Mistral Medium today, it’s no secret that we’re working on something ‘large’ over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we’re excited to ‘open’ up what’s to come :)

    That version ought to close the gap to today's large top models enough that it doesn't matter much anymore. And the Cerebras speed will/does make it feel awesome compared to ChatGPT.

    • trilogic 16 hours ago

      I just hope Mistral finds a really cold place (in the Alps) for their servers so the GPUs will not melt, because I can foresee 500 million users in the next 2 years. Coding plans of 20-30 euros/month will attract users worldwide and raise the EU's competitiveness and innovation to nr. 1 in the world. Now is the time to invest, not to be greedy. And the Dutch should step up chip production.

      • kergonath 13 hours ago

        > in the alps

        We don’t need another heat source there, the glaciers are already having a hard time.

  • GuB-42 10 hours ago

    Mistral has been there for quite a while (2 years is "a while" in this field).

    They are very good at making smaller models, not the smartest or the most knowledgeable, but you generally get pretty clean results quickly. Also, in my experience, they are less heavy handed than others when it comes to censorship.

  • okasaki 12 hours ago

    A company owned by US investors, running on US infra, in a "strategic partnership with Microsoft", is "the EU waking up"? What a joke.

    • fakepropaganda 11 hours ago

      ...while once again conflating what individuals are doing with the EU, a political entity.

    • maelito 10 hours ago

      Where can I see the ownership by country?

      What about their infra? Nothing's running in France?

tdhz77 a day ago

I’m struggling with MRF: Model Release Fatigue. It’s the syndrome of constantly context-switching between new large models: Claude 4, GPT, Llama, Gemini 2.5, pro/mini variants, Mistral.

I fire up the IDE, switch the model, and think: oh great, this is better. Then I switch back to something that worked before and, man, it sucks now.


  • reilly3000 a day ago

    Not to invalidate your feelings of fatigue, but I’m sure glad that there are a lot of choices in the marketplace, and that they are innovating at a decent clip. If you’re committed to always be using the best of all options you’re in for a wild ride, but it beats stagnation and monopoly.

    • ivape a day ago

      We’re also headed into a world where there will be very few open weight models coming out (Meta going closed source, not releasing Behemoth). This era of constant model releases may be over before it even started. Gratitude definitely needs to be echoed.

      • randomNumber7 a day ago

        I don't agree with that. I didn't expect we'd ever get open-weight models close to the current state of the art, yet China delivered some real burners.

      • echelon a day ago

        If China stays open, then the rest of the world will build on open. I'm frankly shocked that a domestic player isn't doing this.

        Fine tuning will work for niche business use cases better than promises of AGI.

        • seszett a day ago

          > If China stays open, then the rest of the world will build on open

          I was listening to a Taiwanese news channel earlier today and although I wasn't paying much attention, I remember hearing about how Chinese AIs are biased towards Chinese political ideas and that some programme to create a more Taiwanese-aligned AI was being put in place.

          I wouldn't be surprised if, just for this reason, at least a few different open models kept being released: even if they don't directly bring in money, several actors care more about spreading or defending their ideas, and AIs are perfect for that.

          • mark_l_watson 10 hours ago

            I don’t disagree, but I feel comfortable enough using Moonshot’s Kimi K2 API for engineering use cases. It is also good that the model can be used via USA based providers.

          • outworlder a day ago

            It makes sense that they would be trojan horses.

            • bsenftner 13 hours ago

              I'm expecting any day now (they are slow thinkers over there) xAI to start releasing Christian Fascist centric LLMs with a distorted angry Jesus and the entire Bible rewritten for Fascist authoritarianism. Any day now.

        • kakapo5672 a day ago

          It's curious that China is carrying the open banner nowadays. Why is that?

          One theory is that they believe the real endpoint value will be embodied AIs (i.e. robots), where they think they'll hold a long-term competitive advantage. The models themselves will become commoditized, under the pressure of the open-source models.

  • bee_rider a day ago

    A major reason I haven’t really tried any of these things (despite thinking they are vaguely neat). I think I will wait until… 2026, second half, most likely. At least I’ll check if we have local models and hardware that can run them nicely, by then.

    Hats off to the folks who have decided to deal with the nascent versions though.

    • Nezteb a day ago

      Depending on the definition of "nicely", FWIW I currently run an Ollama server [1] + Qwen Coder models [2] with decent success compared to the big hosted models. Granted, I don't utilize most "agentic" features and still mostly use chat-based interactions.

      The server is basically just my Windows gaming PC, and the client is my editor on a macOS laptop.

      Most of this effort is so that I can prepare for the arrival of that mythical second half of 2026!

      [1] https://github.com/ollama/ollama/blob/main/docs/faq.md#how-d...

      [2] https://huggingface.co/collections/Qwen/qwen25-coder-66eaa22...
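      For what it's worth, a remote Ollama server like that can be queried from the laptop over Ollama's HTTP API. A minimal sketch, where the host address and model tag are assumptions to adjust for your own machines:

```python
import json
import urllib.request

# Assumed address of the Windows gaming PC running `ollama serve`.
OLLAMA_HOST = "http://192.168.1.50:11434"

def build_payload(prompt: str, model: str = "qwen2.5-coder:7b") -> dict:
    # /api/generate takes the model tag, the prompt, and stream=False
    # to get one JSON response instead of a stream of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str, model: str = "qwen2.5-coder:7b") -> str:
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```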

      • mark_l_watson 10 hours ago

        +1 Ollama and Qwen coder is amazingly effective, even running on my modest M2 32G mac mini.

      • Kostic a day ago

        Agentic editing is really nice. If on VSCode, Cline works well with Ollama.

      • QRY a day ago

        Thanks for sharing your setup! I'm also very interested in running AI locally. In which contexts are you experiencing decent success? eg debugging, boilerplate, or some other task?

        • bogzz a day ago

          I'm running qwen via ollama on my M4 Max 14 inch with the OpenWebUI interface, it's silly easy to set up.

          Not useful though, I just like the idea of having so much compressed knowledge on my machine in just 20 GB. In fact I disabled all Siri features cause they're dogshit.

    • Uehreka a day ago

      When ChatGPT, then Llama, then Alpaca came out in rapid succession, I decided to hold off a year before diving in. This was definitely the right choice at the time, it’s becoming less-the-right-choice all the time.

      In particular it’s important to get past the whole need-to-self-host thing. Like, I used to be holding out for when this stuff would plateau, but that keeps not happening, and the things we’re starting to be able to build in 2025 now that we have fairly capable models like Claude 4 are super exciting.

      If you just want locally runnable commodity “boring technology that just works” stuff, sure, cool, keep waiting. If you’re interested in hacking on interesting new technology (glances at the title of the site) now is an excellent time to do so.

      • bee_rider 4 hours ago

        I wouldn’t want to become dependent on something like OpenAI, at least not until we see what the “profitable” version of the company is.

        If they have to enshittify, I don’t want that baked into my workflow. If they have to raise prices, that changes the local-vs-remote trade-off. If they manage to lower prices, then the cost of running locally will be reduced as well.

        I’m also not sure what the LLMs that I’d want to use look like. No real deal-maker applications have shown up so far; if the good application ends up being something like “integrate it into neovim and suggest completions as you type” obviously I won’t want to hit the network for that.

        Early days still.

      • ikt 21 hours ago

        I don’t quite see the point in waiting. If you’re using something like LM Studio, just download the latest and greatest and you’re on your way; where is the fatigue part?

        I can understand it maybe if you’re spending hours setting things up, but to me these are download-and-go.

        • Uehreka 12 hours ago

          Yeah, that’s why I said waiting a year after Llama v1 was good. By that point llama.cpp, LM Studio and Ollama were all pretty well established and a lot of low-hanging fruit around performance and memory mapping stuff was picked.

    • randomNumber7 a day ago

      It is completely unreasonable to buy the hardware to run a local model and only use it 1% of the time. It will be unreasonable in 2026 and probably for a very long time after that.

      Maybe something like a collective that buys the GPUs together and then uses them without leaking data could work.

      • ikt 21 hours ago

        Over time you would assume the models will get more efficient and the hardware will get better, to the point that buying a massive new GPU with boatloads of VRAM is just not necessary.

        Maybe 128 GB of VRAM becomes the new mid-tier and most LLMs fit into it nicely and do everything one wants in an LLM.

        Given how fast LLMs are progressing, it wouldn't surprise me if we reach this point by 2030.

    • nosianu a day ago

      I have a modified tiered approach, that I adopted without consciously thinking hard about it.

      I use AI mostly for problems on my fringes. Things like manipulating some Excel table somebody sent me with invoice data from one of our suppliers and some moderately complex question that they (pure business) don't know how to handle, where simple formulas would not be sufficient and I would have to start learning Power Query. I can tell the AI exactly what I want in human language and don't have to learn a system that I only use because people here use it to fill holes not yet served by "real" software (databases, automated EDI data exchange, and code that automates the business processes). It works great, and it saves me hours on fringe tasks that people outsource to me, but that I too don't really want to deal with too much.

      For example, I also don't check various vendors and models against one another. I still stick to whatever the default is from the first vendor I signed up with, and so far it worked well enough. If I were to spend time checking vendors and models, the knowledge would be outdated far too quickly for my taste.

      On the other hand, I don't use it for my core tasks yet. Too much movement in this space, I would have to invest many hours in how to integrate this new stuff when the "old" software approach is more than sufficient, still more reliable, and vastly more economical (once implemented).

      Same for coding. I ask AI on the fringes where I don't know enough, but in the core that I'm sufficiently proficient with I wait for a more stable AI world.

      I don't solve complex sciency problems, I move business data around. Many suppliers, many customers, different countries, various EDI formats, everybody has slightly different data and naming and procedures. For example, I have to deal with one vendor wanting some share of pre-payment early in the year, which I have to apply to thousands of invoices over the year and track when we have to pay a number of hundreds or thousands of invoices all with different payment conditions and timings. If I were to ask the AI I would have to be so super specific I may as well write the code.

      But I love AI on the not-yet-automated edges. I'm starting to show others how they can ask an AI, and many are surprised how easy it is, when you have the right task and know exactly what you have and what you want. My last colleague-convert was someone already past retirement age (still working on the business side). I think this is a good time to gradually teach regular employees some small use cases to get them interested, rather than some big top-down approach that mostly creates more work and leaves many people rightly questioning what the point is.

      About politically-touched questions like whether I should rather use an EU-made AI like the one this topic is about, or use one from the already much of the software-world dominating US vendor, I don't care at this point, because I'm not yet creating any significant dependencies. I am glad to see it happening though (as an EU country citizen).

      • bee_rider a day ago

        > About politically-touched questions like whether I should rather use an EU-made AI like the one this topic is about, or use one from the already much of the software-world dominating US vendor, I don't care at this point, because I'm not yet creating any significant dependencies. I am glad to see it happening though (as an EU country citizen).

        Another nice thing about waiting a bit—one can see how much (if any) the EU models get from paying the “do things somewhat ethically” price. I suspect it won’t be much of a penalty.

  • vouaobrasil a day ago

    An alternative: don't use LLMs. Focus on the enjoyment of coding, not on becoming more efficient. Because the lion's share of the gains from increased efficiency are mainly going to the CEOs.

    • freedomben a day ago

      This might be good short-term advice, but in the medium and long term I think devs who don't use any AI will start to be much slower at delivery than devs who do. I'm already seeing it IRL (and I'm not a fan of AI coding, so this sucks for me).

      • jdiff a day ago

        Good news for you then: this idea is less and less borne out by the data. The productivity and efficiency gains aren't there, so there's no reason to be compelled by the spectre of obsolescence. The models may be getting better, but that doesn't seem to be actually changing much for programming. Perhaps the illusion of busywork from constant context switching is swallowing up whatever mental bandwidth is freed.

      • ivape a day ago

        Slower in initial delivery maybe, but the maintenance and debugging of production applications requires intimate knowledge of the code base usually. The amount of code AI writes will require AI itself to manage it since no human would inundate themselves with that much code. Will it be faster even so? We simply won’t know because those vibe coded apps have just entered production. The horror stories can’t be written yet because the horror is ongoing.

        I’m big on AI, but vibe coding is such a fuck around and find out situation.

        • freedomben a day ago

          Oh yeah, I totally agree. Vibe coding is not (anytime soon at least) going to be a thing.

          But using AI tools for things like completing simple functions (co-pilot) or asking questions about a codebase can still be huge time savers. I've also had really good success with having AI generate me basic scripts that would have taken 45 minutes of work, but it gets me a working script in 3. It's not the revolution that's been promised, but it definitely makes me faster even though I don't like it

          • javawizard a day ago

            This. If there's one thing I've found AI to be a huge timesaver for, it's writing things that interact with libraries/frameworks/codebases that have an atrociously large surface area. AI can sift through the noise so much faster than I can and get me going down the right path in way less time.

            (Aside: Hi Ben! If you are who I think you are, we started at the same company on the same day back in August of 2014.)

        • jdiff a day ago

          Plenty of small FAFO stories circulate already. There will certainly be more. Lots of demonstration code out there in the training data meant only for illustrative purposes, and all too often vibe coding overlooks the rock bottom basics of security.

    • wahnfrieden a day ago

      This is HN, we are not all wage workers here

      For wage workers, not learning the latest productivity tools will result in job loss. By the time it is expected of your role, if you have not learned already, you won't be given the leniency to catch up on company time. There is no impactful resistance to this through individual protest, only by organizing your peers in industry

      • gitremote 21 hours ago

        What does wage versus salary have to do with anything?

        • wahnfrieden 18 hours ago

          Salary is a specific type of wage

      • wahnfrieden 21 hours ago

        I would like the downvoters to explain what’s wrong.

  • sva_ a day ago

    All the competition is great to me. I'm using premium models all the time and barely spent a few euro on them, as there's always some offers that are almost free if you look around.

  • emilsedgh a day ago

    Why do you even follow? Just stick to one that works well for you?

    • barbazoo a day ago

      Totally, though I feel like you do have to pay some attention. For example, in the context I'm working on, Gemini was our gold standard for code generation for the last while, whereas today Claude subjectively produces the better results. Sure, you can stick to what worked, but then you're missing the opportunity to be more productive or less busy, whichever one you choose.

      • exe34 a day ago

        I remember the days when I was looking for the perfect note-taking system/setup: I never achieved anything with it, I was too busy figuring out the best way to take notes.

        • barbazoo a day ago

          Once we find the best way though...

          • exe34 11 hours ago

            Yep, now I have a directory of org files.

    • tartoran a day ago

      FOMO may be one of the reasons amongst others.

  • didibear a day ago

    I believe the performance of previous versions gets worse because providers reallocate resources to newer versions, and also because their training data is cut off at previous years. This is what happened between Claude Sonnet 3.5 and 3.7.

    Personally I only use Claude/Anthropic and ignore other providers because I understand it better. It's smart enough; I rarely need the latest and greatest.

  • zamadatix a day ago

    Much like with new computer hardware, announcements are constant but they rarely entice me to drop one thing and switch to another. If an average user picked a top 3 option last year and stuck with them through now you didn't really miss out on all that much, even if your particular choice wasn't the absolute latest and greatest the entire time.

    • wahnfrieden a day ago

      Sticking with one-year-old models would mean no o3, which is a huge loss for dev work.

  • criemen a day ago

    I totally get it. Due to my work, I mostly keep up with new model releases, but the pace is not sustainable for individuals, or the industry. I'm hoping that model releases (and the entire development speed of the field) will slow down over time, as LLMs mature and most low-hanging fruits in model training have been picked. Are we there yet? Surely not.

  • mrcwinn a day ago

    What a luxury!

    One way to avoid this: stick with one LLM and bet on the company behind it (meaning, over time, they’ll always have the best offering). I’ve bet on OpenAI. Others can make different conclusions.

  • Ey7NFZ3P0nzAe 17 hours ago

    Using litellm and/or openrouter.ai really makes it painless

  • sunaookami a day ago

    You only need Claude and GPT. Everything else is not worth your time.

raphaelj 12 hours ago

I've been trying to use other LLM providers than OpenAI over the past few weeks: Claude, Deepseek, Mistral, local Ollama ...

While Mistral might not have the best LLM performance, their UX is IMO the best, or at least a tie with OpenAI's:

- I never had any UI bug, while these were common with Claude or OpenAI (e.g. a discussion disappearing, LLM crashing mid-answer, long context errors on Claude ...);

- They support most of the features I liked from OpenAI, such as libraries and projects;

- Their app is by far the fastest, thanks to their fast reply feature;

- They allow you to disable web-search.

  • mark_l_watson 10 hours ago

    It is painful, but I have done the same thing: dropping any paid use of OpenAI. For years, basically since I retired from managing a deep learning team at Capital One, I have spent a ton of time experimenting with all LLM options.

    Enough! I just paid for a year of Gemini Pro, I use gemini-cli for free for small sessions, turn on using my API key for longer sessions to avoid timeout, and most importantly: for API use I mostly just use Gemini 2.5-flash, sometimes -pro, and Moonshot’s Kimi K2. I also use local models on Ollama when they are sufficient (which is surprisingly often.)

    I simply decided that I no longer wanted the hobby of always trying everything. I did look again at Mistral a few weeks ago, a good option, but Google was a good option for me.

behnamoh a day ago

At this point, the entire AI industry seems to mostly just copy OpenAI. I cannot help but notice that we have the same services, just offered by different companies. The amount of innovation in this release is actually not that high.

  • klntsky a day ago

    They are not the same service. There is A LOT of difference between offerings if you actually use the models for daily tasks like coding.

    • lossolo a day ago

      It really depends on what you're working on and what was included in the training data of the model you used. From a model architecture point of view, they're basically all the same, the main difference lies in the training data.

      • klntsky 21 hours ago

        Also not true. Even the API surface differs.

        • lossolo 18 hours ago

          API is irrelevant. It's like saying that talking to John via Telegram or WhatsApp is like talking to a different person.

          • PxldLtd 11 hours ago

            I agree here a fair bit, not that I'm an expert or anything. I'd like to see some progress on neuronal modelling. It seems that since 'Attention Is All You Need' they've locked into this LLM stack, gluing up models as data pipelines rather than integrating different NNs at a deeper level.

  • mirekrusin a day ago

    The whole world is now building stuff on top of an `f(input: string): string` function - of course the offerings are going to be similar.

  • cubefox a day ago

    > At this point, the entire AI industry seems to just copy OpenAI for the most part

    Well, OpenAI copied the Deep Research feature from Google. They even used the same name (as does Mistral).

    • scoot a day ago

      Where does Perplexity sit in this race? I was aware of their "deep research" first, but that doesn't mean it actually was first.

      They've recently removed (limited) use of it from the free plan, so I guess it was costing more than they were making from paid subscribers.

    • cowpig a day ago

      Weird that you're being downvoted for stating a fact.

      All of the major labs are innovating and copying one another.

      Anthropic has all of the other labs trying to come up with an "agentic" protocol of their own. They also seem to be way ahead on interpretability research.

      Deepseek came up with multi-head latent attention and published an open-source model that's huge and SOTA.

      Deepmind's way ahead on world models

      ...

  • croes a day ago

    It’s basically the same technology everywhere; maybe a difference in training data and computing power.

  • scotty79 a day ago

    That's what healthy competition in a free market looks like. Things like Apple that "stay innovative" for decades are an aberration caused by monopolistic gatekeeping.

    • behnamoh a day ago

      > Things like Apple are aberration.

      This used to be a good example of innovation that is hard to copy. But it doesn't apply anymore for two reasons:

      1. Apple went from being an agile, pro-developer, creative company to an Oracle-style, old-guard cash-cow company; not much innovation is happening at Apple anymore.

      2. To their surprise, much of what they call "innovative" is actually pretty easy to replicate on other platforms. It took the Flutter folks 4 hours to re-create Liquid Glass...

      • overfeed a day ago

        > This used to be a good example of innovation that is hard to copy.

        Steve Jobs did say they "patented the hell out of [the iPhone]" and went about saber-rattling. Then came the patent wars, which proved that Apple also relies on innovation by others and that patent workarounds could still produce competitive products; things calmed down afterwards.

    • croes a day ago

      They often copied others, but because Apple is more popular, they got the fame for "their" innovations.

Aissen a day ago

The Voxtral release seemed interesting, because it brought back competitive open source audio transcription. I wonder if it was necessary to have an LLM backbone (vs a pure-function model) though, but the approach is interesting.

  • nomad_horse a day ago

    > brought back competitive open source audio transcription

    Bear in mind that there are a lot of very strong _open_ STT models that Mistral's press release didn't bother to compare to, giving the impression that they are the best new open thing since Whisper. Here is an open benchmark: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard . The strongest model Mistral compared to is Scribe, ranked 10th there.

    This benchmark is for English, but many of those models are multilingual (e.g. https://huggingface.co/nvidia/canary-1b-flash )

    • espadrine a day ago

      The best model there is 2.5B parameters. I can believe that a model 10x bigger is somewhat better.

      One element of comparison is OpenAI Whisper v3, which achieves 7.44 WER on the ASR leaderboard, and shows up as ~8.3 WER on FLEURS in the Voxtral announcement[0]. If FLEURS has +1 WER on average compared to ASR, it would imply that Voxtral does have a lead on ASR.

      [0]: https://mistral.ai/news/voxtral
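      The cross-dataset arithmetic above can be made explicit. The numbers come from the comment itself; the fixed offset between FLEURS and the ASR leaderboard is the commenter's assumption, not a measured property of the benchmarks.

```python
# Back-of-envelope from the comment above: Whisper v3 scores 7.44 WER on
# the open ASR leaderboard and ~8.3 WER on FLEURS, so FLEURS appears to
# run roughly 0.9 points harder. Shifting a FLEURS score by that offset
# gives a rough, assumption-laden estimate of a leaderboard-scale WER.

whisper_v3_asr = 7.44      # open ASR leaderboard
whisper_v3_fleurs = 8.3    # FLEURS, from the Voxtral announcement

fleurs_offset = whisper_v3_fleurs - whisper_v3_asr  # ~0.86 WER points

def implied_asr_wer(fleurs_wer: float, offset: float = fleurs_offset) -> float:
    """Shift a FLEURS WER onto the ASR-leaderboard scale."""
    return fleurs_wer - offset
```

      The single-anchor offset is exactly the kind of cross-dataset guesswork the reply below objects to; it only holds if the two benchmarks differ by a constant amount across models.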

      • nomad_horse a day ago

        There are larger models in there, an 8B and a 6B. By this logic they should rank above the 2B model, yet we don't see that. That's why we have open standard benchmarks: to measure this directly, not hypothesize from model sizes or do cross-dataset arithmetic.

        Also note that Voxtral's capacity is not necessarily all devoted to speech, since it "Retains the text understanding capabilities of its language model backbone".

    • jiehong a day ago

      I just can’t find dictation apps for Mac that use those models, other than Whisper-based ones.

      IBM’s Granite models seem multilingual and well ranked, but I can’t find any app using them.

      Anybody aware of a dictation app using one of those "better" models?

aquir 15 hours ago

This is quite good, and it's a bit cheaper than ChatGPT or Claude. I'll give it a try for a month.

monkeydust 14 hours ago

Does anyone have a good approach to high-stakes deep research across a number of models? I.e. send the same brief to OpenAI, Anthropic, and Gemini, then evaluate (perhaps with an LLM as judge)? Does that yield a performance uplift, or make things worse?
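The fan-out-then-judge pattern described above can be sketched in a few lines. Everything here is a stand-in: a real setup would wrap vendor APIs, and the judge would itself be an LLM scoring coverage and factuality rather than the trivial length heuristic used below.

```python
# Fan the same research prompt out to several models, then let a judge
# pick the best report. The callables are illustrative stand-ins only.

def judge_by_length(prompt: str, reports: dict) -> str:
    # Trivial stand-in judge: prefer the longest report. A real judge
    # would be another LLM comparing the reports for accuracy/coverage.
    return max(reports, key=lambda name: len(reports[name]))

def run_deep_research(prompt: str, models: dict, judge=judge_by_length):
    """Fan the prompt out to every model, return (winner, all reports)."""
    reports = {name: ask(prompt) for name, ask in models.items()}
    return judge(prompt, reports), reports
```

Whether this yields an uplift likely hinges on the judge: a weak judge can systematically prefer verbose-but-wrong reports, which is exactly the failure mode discussed elsewhere in this thread.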

rawgabbit a day ago

I have been a heavy user of ChatGPT. I guess I should try out LeChat. What can I expect? Are they basically the same tool with slight differences?

jddj a day ago

The examples aren't great. The personal-planning one, for example, answers the prompt better without deep research than with it (the deep-research version answers only the visas point).

erlend_sh a day ago

At this point I care far more about an open (and credibly ethically sourced) data model than about open code, open weights, or whatever. I want to use models that can tell me whether a resource I’m pointing to is in their training data.

bangaladore a day ago

If you haven't tried OpenAI's deep research feature, you are missing out. I'm not aware of any good alternatives; I've tried Google's, and I'm not impressed.

There is a lot of value in, say, engineers using these tools to get a huge head start on trade-off studies.

  • barrell 16 hours ago

    Every time I need a deep research (1-2x per month) I ask all the providers. OpenAI’s deep research has consistently performed the worst by a significant margin. If you’re just using OpenAI’s deep research, then you’re the one missing out ;-)

  • ripley12 a day ago

    Anthropic's Research is pretty good; I'd say on par with OpenAI.

    Agreed about Google, accuracy is a little better on the paid version but the reports are still frustrating to read through. They're incredibly verbose, like an undergrad padding a report to get to a certain word count.

    • the_duke a day ago

      That's Gemini Pro now in general. The initial preview was pretty good, but the newer iterations are incredibly verbose.

      "Be terse" is a mandatory part of the prompt now.

      Either it's to increase token counts so they can charge more, or to show better usage growth metrics internally or for shareholders, or just some odd effects of fine tuning / system prompt ... who knows.

  • criemen a day ago

    Perplexity's isn't bad? Though I lack the OpenAI subscription to compare.

  • crmd a day ago

    It’s been invaluable to me for market research related to starting a business. It’s like having a bright, early-career research assistant/product manager “on staff” to collaborate with.

  • ankit219 a day ago

    Try the one from Kimi K2 as well. I was surprised by how good it turned out to be.

  • freedomben a day ago

    I've gotten pretty different results from OpenAI and Gemini, though it's hard to say one is better/worse than the other. Just different

  • sunaookami 11 hours ago

    All of them create an overly verbose report that reads like typical AI slop. Especially Gemini: it's good that it reads something like 200 sources, but it produces a pages-long "report" when you e.g. only want to compare features or prices.

BrunoWinck a day ago

I needed that. Now I have it :)

maelito a day ago

Can we expect Voxtral in the Futo Android keyboard ?

  • Ey7NFZ3P0nzAe 17 hours ago

    Probably not: FUTO uses a slightly tweaked Whisper that does not pad all audio to 30s, as the original Whisper does. It's not the end of the world to retrain this on more recent versions, but they didn't do so when Whisper v3 came out, nor for Whisper v3 Turbo. And Voxtral has far more capabilities than STT, which would be wasted in an STT-only setting, IMHO.

htrp a day ago

Is anyone doing online reviews of model performance? (I know Artificial Analysis does some work on infrastructure and has an intelligence index.)

  • reckless a day ago

    The aggregate picture only tells you so much.

    Sites like simonwillison.net/2025/jul/ and channels like https://www.youtube.com/@aiexplained-official also cover new model releases pretty quickly for some "out of the box thinking/reasoning" evaluations.

    For me and my usage I can really only tell if I start using the new model for tasks I actually use them for.

    My personal benchmark andrew.ginns.uk/merbench has full code and data on GitHub if you want a starting point!

lostmsu a day ago

Is Voice available on the free tier? I signed up just to try it, but all I see is the dictation mode.

  • druskacik 15 hours ago

    It seems their "Voice mode" is just a dictation mode, not like the Voice Mode from e.g. OpenAI. Even their demo[0] just shows a dictation.

    I am a bit disappointed, the headline made me think they offer a voice mode similar to OpenAI.

    [0] https://www.youtube.com/watch?v=CEP-xIIfuhs

fvv 14 hours ago

image editing.. safety restrictions max level.. welcome to EU

chickenzzzzu a day ago

[flagged]

  • dust42 a day ago

    Actually, "Le Chat" is French for "the (male) cat". "Le Chat" is also a well-known laundry detergent (of German origin, from the Henkel company). The headline "Le Chat takes a deep dive" thus reads as "the cat takes a deep dive". As there is a cooperation with the (German) Black Forest Labs, this is all pretty funny for a French-speaking person. "La Chatte" is the female cat, and also colloquial for female private parts.

    • saratogacx a day ago

      Mistral plays this up: their M logo is a cat face, and most of their non-document pages have an animated pixel cat at the bottom.