Anthropic and OpenAI released flagship models 27 minutes apart — the AI pricing and capability gap is getting weird
Anthropic shipped Opus 4.6 and OpenAI shipped GPT-5.3-Codex on the same day, 27 minutes apart. Both claim benchmark leads. Both are right -- just on different benchmarks.

## Where each model leads

Opus 4.6 tops reasoning tasks: Humanity's Last Exam (53.1%), GDPval-AA (144 Elo ahead of GPT-5.2), and BrowseComp (84.0%). GPT-5.3-Codex takes coding: Terminal-Bench 2.0 at 75.1% vs Opus 4.6's 69.9%.

## The pricing spread is hard to ignore

| Model | Input ($/M) | Output ($/M) |
| --- | --- | --- |
| Gemini 3 Pro | $2.00 | $12.00 |
| GPT-5.2 | $1.75 | $14.00 |
| Opus 4.6 | $5.00 | $25.00 |
| MiMo V2 Flash | $0.10 | $0.30 |

Opus 4.6 costs 2.5x Gemini on input. Open-source alternatives cost 50x less. At some point the benchmark gap has to justify the price gap -- and for many tasks it doesn't.

## 1M context is becoming table stakes

Opus 4.6 adds 1M tokens (beta, 2x prici...
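To make the spread concrete, here is a minimal sketch of the per-job math. Prices come from the comparison table above; the token counts (100K in, 10K out) are an arbitrary illustration, not a published workload.

```python
# Per-million-token prices (USD) from the comparison table above.
PRICES = {
    "Gemini 3 Pro": {"input": 2.00, "output": 12.00},
    "GPT-5.2": {"input": 1.75, "output": 14.00},
    "Opus 4.6": {"input": 5.00, "output": 25.00},
    "MiMo V2 Flash": {"input": 0.10, "output": 0.30},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one job: tokens scaled to millions, times list price."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

if __name__ == "__main__":
    # A hypothetical 100K-in / 10K-out task on each model.
    for model in PRICES:
        print(f"{model}: ${job_cost(model, 100_000, 10_000):.4f}")

    # The ratios quoted in the text fall out directly:
    print("Opus vs Gemini, input:",
          PRICES["Opus 4.6"]["input"] / PRICES["Gemini 3 Pro"]["input"])   # 2.5x
    print("Opus vs MiMo, input:",
          PRICES["Opus 4.6"]["input"] / PRICES["MiMo V2 Flash"]["input"])  # ~50x
```

On this workload the same job runs roughly fifty times cheaper on the open-weight option, which is the gap the benchmark numbers have to justify.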








