vld_chk 31 minutes ago [-]
Anthropic is legit building one of the strongest, if not the strongest, IC teams in the history of computational technology. They are insanely stacked on talent, and either we will witness a legendary run, or a new LTCM.
abraxas 8 minutes ago [-]
Something seems afoot at Google. The real tell will be if Demis makes a move. Jeff Dean seems more like a lifer to me.
CuriouslyC 2 hours ago [-]
Something spicy must have happened internally at Google. This rapid-fire, high-level attrition isn't just down to the bureaucratic quagmire.
whiplash451 2 minutes ago [-]
Maybe because they know where things are going with Gemini (more ads to your face) while Anthropic might, for once, have a different story.
When personal finance is not the bottleneck anymore, the new criteria become "vision" and "stacked talent".
kranke155 2 hours ago [-]
Is it possible they are just falling behind?
Their newest model wasn’t really SOTA. And honestly Fable 5 was the most human-like model I’d ever tried. It was an incredible jump.
And recently lots of Claude users at r/ClaudeAI are noticing Opus 4.8 has really increased in capability. Not new things but maybe redirected compute. It just feels like one of the best models ever, maybe because the compute that was previously assigned to Fable has been redirected? It feels incredible.
thewebguyd 1 minutes ago [-]
> noticing Opus 4.8 has really increased in capability
I've definitely noticed it, at least for doing backend C#/dotnet. It's insanely good; I haven't had to babysit much at all this week.
xnx 42 minutes ago [-]
They almost certainly wanted 3.5 Pro out for Google IO a few weeks ago. They're still crunching on it. No ETA given. Would be fascinating to read about the behind the scenes stories (failed training run?) if they ever get told.
basch 29 minutes ago [-]
from the looks of it, 3.5 Flash is still better than most models
The idea of "falling behind" when you can leapfrog each other every six months leads me to believe it has to be more than just "falling behind" for one cycle. It's a culture, process, red tape, focus, or mandate problem of some sort. Something not as easily correctable preparing for next launch.
joe_mamba 9 minutes ago [-]
>from the looks of it, 3.5 Flash is still better than most models
I guess it depends on what you're using it for. I think it's garbage for coding and reasoning.
I use it almost daily, but for questions related to coding, solving Arch Linux and Wine/Lutris issues, helping me with MX Linux issues, and fiddling with wifi issues on an old rooted Huawei tablet running LineageOS, it was consistently wrong, constantly giving out confident but outdated information or misinformation, or hallucinating stuff. Every time I pointed out it was wrong, it would re-check, apologize, and then give me wrong answers again, then apologize again, and so on.
Basic free tier ChatGPT would blow it out of the water on those topics.
3.5 Flash seems tuned to just eyeballing answers to general purpose questions that resemble Google searches like "give me a recipe" or "give me a workout plan", or "when did Yandex move to Netherlands", not to solving complex issues that require cognition and accuracy. That's what the 3.1 Pro is better for.
I think Google just doesn't care about being SOTA for coding, reasoning and accuracy, since they're in the ads and search business, not in the agentic coding business. So if the answers are hallucinations that are "good enough" for its search engine user base, but dirt cheap to run on their datacenter hardware, then it's more than enough for them.
AgentMasterRace 1 hours ago [-]
Gemini is super bad, grok is actually superior most of the time and that's saying something because grok also sucks.
michaelbuckbee 50 minutes ago [-]
Vesting schedule?
musicale 1 hours ago [-]
Name checks out.
freedomben 14 minutes ago [-]
Missed opportunity for headline: John Jumper jumps to Anthropic
So Mr. Jumper. You are committed? We need people longer term here. Your boss Mr. Settles is really excited about you joining.
hackerbeat 41 minutes ago [-]
Super Mario leaves Nintendo to focus on plumbing.
Iolaum 5 hours ago [-]
Two big names left GDM recently. Could be a coincidence, but where's the fun in that? :p
coderatlarge 5 minutes ago [-]
you mean shazeer?
SpyCoder77 19 minutes ago [-]
The guy who invented jumping is joining a major AI lab?!?
andrewstuart 2 hours ago [-]
John Jumper, what a great name. Sounds like a video game action hero.
SilverElfin 3 hours ago [-]
Who?
artninja1988 3 hours ago [-]
He led the development of AlphaFold, the AI system that predicts protein structures, for which he shared the 2024 Nobel Prize in Chemistry.
yuffffley 43 minutes ago [-]
I remember that.
That was when they realized the deep learning was largely unnecessary, and they could just use their massive compute resources to brute force the problem space.
Proving that we would greatly benefit from using our compute resources for science rather than showing ads, and then we just kept showing ads.
TeMPOraL 18 minutes ago [-]
You could argue that training SOTA LLMs is pre-bruteforcing every problem everywhere all at once.
dekhn 21 minutes ago [-]
AlphaFold is based on deep learning and it's not brute force.
https://artificialanalysis.ai/articles/glm-5-2-is-the-new-le...