David Pawlan
Co-Founder
Hey y’all,
From AI scientists to shady leaderboard tactics and mini models beating giants, this week’s AI news is full of shake-ups. Microsoft’s new reasoning models are surprisingly powerful (and phone-sized), Claude got a serious productivity upgrade, and decentralized models might just be the next big thing.
Let’s break it all down — fast.
🏆 LMArena’s leaderboard credibility questioned
A damning study from Cohere Labs, MIT, and Stanford suggests the popular LMArena leaderboard might be rigged in favor of tech giants like OpenAI and Google. Allegations include private testing, silent model removals, and biased sampling. LMArena denies wrongdoing, but the episode casts doubt on benchmark integrity — just as Llama 4 Maverick’s drama fades.
Why it matters: Leaderboards shape perception — and funding. If they’re gamed, the whole AI model race loses meaning.
🧠 Microsoft and Anthropic go small, go smart
Microsoft’s new Phi-4 reasoning models show that small can be mighty. The flagship 14B-parameter Phi-4-reasoning outpaces larger models like o1-mini and even holds up against DeepSeek's 671B-parameter titan, R1. Meanwhile, Anthropic’s new Claude Integrations hide much of the setup complexity of the Model Context Protocol (MCP), letting Claude plug into apps like Zapier or Square, while its upgraded Research mode can spend up to 45 minutes pulling in live data and web results.
Why it matters: Power is shifting from bloated models to nimble, task-specific ones that run on your laptop or smartphone — no data center required.
🔬 AI scientists enter the chat
FutureHouse launched “AI scientists” — agents that can review research, answer deep scientific questions, and in one case (hello, Phoenix), help you design new chemistry experiments from scratch. This push into public-facing research agents is backed by none other than Eric Schmidt.
Why it matters: It’s a glimpse into a future where AI doesn’t just summarize papers — it creates the next breakthroughs.
🌐 A new kind of AI model: decentralized and user-owned
Vana and Flower Labs are teaming up to build a “user-owned” large language model, Collective-1. The model is powered by volunteered compute and personal data, with the goal of reaching 100B parameters.
Why it matters: This decentralized approach could let smaller players compete with tech giants — and give users control over their data (finally).
New this week:
🎶 Suno v4.5 debuts 8-minute AI songs and better genre control
📻 Australian radio ran an AI host for 6 months — no one noticed
🕵️ Google’s AMIE now reads medical images during diagnosis
🛰️ AI is now uncannily good at guessing where a photo was taken
Conduct Recursive Research Iterations
Prompt: Act as a recursive research optimizer. Analyze results, identify gaps, refine searches, and repeat until you hit max-quality insights.
Use it when you're deep in research mode and want to push past surface-level summaries.
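For the builders in the room, here is a rough sketch of the loop that prompt describes: search, check for gaps, refine, repeat. Everything in it is an assumption for illustration. `search` and `find_gaps` are hypothetical callables you would back with your own search tool and model call, and the `max_rounds` cap is our stand-in for "max-quality insights," since the prompt itself doesn't define a stopping rule.

```python
from typing import Callable, List, Set


def recursive_research(
    question: str,
    search: Callable[[str], List[str]],           # hypothetical: query -> list of findings
    find_gaps: Callable[[str, List[str]], List[str]],  # hypothetical: returns follow-up queries
    max_rounds: int = 3,                          # assumed cap so the loop always ends
) -> List[str]:
    """Search, look for gaps, refine the queries, and repeat."""
    findings: List[str] = []
    seen: Set[str] = set()
    queries = [question]

    for _ in range(max_rounds):
        # Run every current query and keep only findings we have not seen yet.
        for query in queries:
            for result in search(query):
                if result not in seen:
                    seen.add(result)
                    findings.append(result)

        # Ask what is still missing; stop early once nothing is.
        gaps = find_gaps(question, findings)
        if not gaps:
            break
        queries = gaps  # the gaps become the next round's refined searches

    return findings


# Toy stand-ins so the sketch runs end to end (not real search or model calls).
toy_corpus = {
    "small models": ["Phi-4-reasoning has 14B parameters"],
    "benchmarks": ["LMArena sampling is disputed"],
}


def toy_search(query: str) -> List[str]:
    return toy_corpus.get(query, [])


def toy_gaps(question: str, findings: List[str]) -> List[str]:
    # Pretend we are satisfied once we have two findings.
    return [] if len(findings) >= 2 else ["benchmarks"]


if __name__ == "__main__":
    print(recursive_research("small models", toy_search, toy_gaps))
```

In practice the interesting part is `find_gaps`: that is where you would hand the findings back to the model and ask what is still unanswered.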
Benchmark trust is breaking down, Microsoft and Anthropic are proving small can be powerful, and AI scientists are moving from theory to hands-on discovery. Meanwhile, decentralized models and Claude’s new integrations offer a peek at AI’s more open, connected future.
Catch you on the next iteration,
—David