
Enterprise hits and misses – Apple’s AI reasoning paper yields enterprise lessons, privacy breaches are (sometimes) on us, and the Spatial Web is here
Lead story – Enterprise takeaways from Apple’s paper on AI reasoning
The polarized reaction to Apple’s paper on AI reasoning disappointed me.
Increasingly, we see a division between those who defend the “intelligence” of today’s LLMs/LRMs with near-religious fervor, and skeptics with antipathy towards big AI who pound at the flaws of this tech while overlooking viable use cases. Enterprise leaders need something different altogether: less posturing around AGI investor candy dreamscapes, and more field lessons that will increase AI project success.
This week, Ian aimed to rectify this on diginomica. He explains the differences between Large Language Models and Large Reasoning Models (Apple tested them both):
LRMs are an attempt to take a different path. Instead of relying purely on ever-growing training data, they aim to improve performance by allocating more computing power at inference time — giving the model a chance to build and work through an internal ‘chain of thought’ intended to improve the quality of its output.
As for enterprise takeaways, Ian draws out an item I riffed on previously: the curious-but-useful note that LRMs didn’t fare well with either very simple or very complex reasoning problems (overthinking the former, and literally giving up on the latter before compute ran out) – but they functioned better at mid-level complexity. Big enterprise clue: find the right tool for the right use case, LLM or LRM or not, without religion. Ian concludes:
Taken together, the results suggest that what looks like reasoning may often be something else entirely — pattern matching dressed up as intelligence. The models didn’t fail because the puzzles were too hard — they failed because they had nothing to copy from. In that sense, the illusion of thinking collapses exactly where novelty begins.
How about Ian’s enterprise takeaways?
For enterprises, the takeaway seems clear — if your use case involves predictable structure, falls within the distribution of the model’s likely training data, and can be reliably controlled through prompting and guardrails, then today’s systems may work well. But if you’re hoping to deploy autonomous agents that solve novel problems or make complex decisions, the risks rise fast. Which is exactly why we need a map of the terrain — like the agent taxonomy I recently outlined.
I like that Ian didn’t dwell on whether these systems are “reasoning” or not. For AI project leaders, that’s not really the point. The point is the taxonomy, and the tool know-how. I’ll add a few more:
- Reasoning or not, the general principle of having an LLM review its own work via “chain of thought” can, at times, be fruitful – though it comes at a cost that someone must absorb.
- Vendors that are pursuing “test time scaling,” e.g. making the most of opportunities at inference, are worth looking at. I expect LLMs to continue to have limitations reviewing their own work, but third-party verifiers can also be part of this inference activity – e.g. factual knowledge graphs, code verification tools, and mathematics engines (this is not unlike how humans might deliberate on a tough problem, seeking outside experts). See the sketch after this list for the gist.
- Forget the religion and test the tools – not just via benchmark research but in your own environment, with your own data – with this constant caveat: successful testing does not necessarily result in successful scaling.
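To make that verifier point concrete, here’s a minimal, hypothetical sketch: a propose-then-verify loop in Python, where a stubbed-out `call_llm()` stands in for whatever model you actually use, and a small arithmetic checker plays the external-verifier role (a knowledge graph lookup, code runner, or math engine would slot into the same place). None of this comes from Ian’s piece or any particular vendor API – it just shows the shape of inference-time verification.

```python
import ast
import operator as op

# Operators permitted when checking a simple arithmetic claim
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def safe_eval(expr: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"Unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def call_llm(prompt: str) -> str:
    """Hypothetical stub standing in for a real model call."""
    return "(19 * 23) + 7 = 444"  # the model's draft claim, to be checked below

def verified_answer(prompt: str, max_retries: int = 2) -> str:
    """Propose-then-verify: accept the model's answer only if the external check agrees."""
    draft = ""
    for _ in range(max_retries + 1):
        draft = call_llm(prompt)
        expr, claimed = draft.split("=")
        if abs(safe_eval(expr) - float(claimed)) < 1e-9:
            return draft  # the verifier confirms the claim, so pass it along
        prompt += f"\nYour previous answer {draft!r} failed verification. Try again."
    return f"UNVERIFIED after {max_retries + 1} attempts: {draft}"

print(verified_answer("What is (19 * 23) + 7?"))
```

The design choice worth noting: the accept/reject decision lives in the verifier, not in the model’s self-assessment – and that verification cost is part of the inference bill someone must absorb.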
Finally, on the viral debate, readers know I land more on the skeptical side, but I don’t want to be polarized, or lose my curiosity. I happen to think there is some level of “reasoning” in some LRM examples – and the die-hard skeptics tend to miss the sometimes-impressive advancements.
However, the strident attacks against Apple’s paper (“lies!” as one counter-productive LinkedIn post asserted) overlook an undeniable truth: this is hardly the only study that calls attention to reasoning limitations (so much for trashing this paper because of Apple’s own AI missteps). Subbarao Kambhampati and colleagues have published numerous papers (and given countless talks) about these limitations (Kambhampati proposes an “LLM-Modulo” external tool verification framework to mitigate planning and reasoning limitations). Turing Award winner Yann LeCun, who knows a thing or two about LLMs, criticizes their planning and reasoning limitations constantly, and is in active pursuit of new approaches. This is hardly a complete list. ‘Shooting the messenger’ is never the most credible debate tactic, even more so when there are so many messengers…
Diginomica picks – my top stories on diginomica this week
Vendor analysis, diginomica style. The diginomica road show continues: we’ve been through the entire AI agent hype cycle at least three times by now, but there are still a few more keynotes to wade through, customers to interview, and insights to be gleaned. So, the diginomica team rolls on:
Derek’s “no sleep till Margate” event-finale-tour started with Pure Storage, where he notes: “Pure Storage is pivoting from storage vendor to enterprise data trust broker, positioning itself as the validation layer that determines which data is clean, governed, and ready for AI consumption.”
A few more vendor picks, without the quotables:
Jon’s grab bag – Madeline hits on crucial AI issues in: If we want more women in AI, we need to hide the tech, says AI for Good founder Kriti Sharma. Mark Chillingworth expands the talent conversation in How refugee tech talent is filling critical skills gaps through Blue Hope Network.
If we want “intelligent AI,” we need world models. Can the so-called Spatial Web make a difference here? George explores the possibilities in What is the Spatial Web, and why should the enterprise care? Here’s why. Could this someday fuel AI agents? George picks that up in How the Spatial Web could guide de-centralized and trustworthy agentic AI to like us more: “The Spatial Web standard also supports an agent framework component that allows an agent to look at a registry of other agents to determine which other agents have the capability to meet the needs of an activity.”
My quick take: George has done some thorough reporting on this topic, and the Spatial Web (supported by open standards) is a worthy pursuit. It’s notable that many of the “AI agents” involved here might not be LLM agents, which could be, in some cases, for the better (e.g. agents oriented around causality and real world context, not the probabilistic correlations of LLMs). Caveats: I’m wary of any future vision that involves blockchain, which has never resolved its performance and scale issues (not the best weaknesses to have in an AI context). Also: “metaverses” have their own sets of problems, not to mention a bruising hype cycle of their own… But ‘decentralizing’ AI can alleviate vexing privacy and real-time compute questions. Let’s track this.
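For a sense of what that agent registry component implies in practice, here is a toy, hypothetical sketch of capability-based agent discovery. It is not the Spatial Web specification itself – just an illustration of an agent querying a registry to find other agents that can cover an activity’s needs. The class and capability names are made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    agent_id: str
    capabilities: set = field(default_factory=set)  # capabilities this agent declares

class AgentRegistry:
    """Toy in-memory registry; a real deployment would be decentralized and standards-based."""

    def __init__(self):
        self._agents = []

    def register(self, record: AgentRecord) -> None:
        self._agents.append(record)

    def find(self, required: set) -> list:
        """Return agents whose declared capabilities cover everything the activity needs."""
        return [a for a in self._agents if required <= a.capabilities]

registry = AgentRegistry()
registry.register(AgentRecord("route-planner", {"geospatial", "optimization"}))
registry.register(AgentRecord("doc-summarizer", {"nlp", "summarization"}))

# An orchestrating agent asks: who can handle a delivery-routing activity?
matches = registry.find({"geospatial", "optimization"})
print([a.agent_id for a in matches])  # -> ['route-planner']
```

Note that nothing here requires the registered agents to be LLM-based – which echoes the point above about agents oriented around causality and real-world context.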
Best of the enterprise web
My top seven
- No, the 16 billion credentials leak is not a new data breach – No, it’s not a new breach, but the personal cybersecurity recommendations in this post are useful – especially for so many of us who have blurred lines between personal and work devices.
- Top five security principles driving open source cyber apps at scale – Louis Columbus keeps the security insights coming, via a slew of interviews.
- Root Causes of Digital Transformation Failure: It’s Not Just the Tech – Eric Kimberling isn’t done warning us about the perils of digital transformation gone wrong: “If your project is failing, blaming the software won’t save it. Successful digital transformation requires honest assessment, upfront alignment, and rigorous planning—not just a sleek UI or a fast go-live.”
- The hidden time bomb in the tax code that’s fueling mass tech layoffs – via reader Frank Scavo, this Quartz piece makes the case that it’s a tax code issue – not AI – that’s resulting in widespread tech layoffs: “A tax policy aimed at raising short-term revenue effectively hid a time bomb inside the growth engines of thousands of companies. And when it detonated, it kneecapped the incentive for hiring American engineers or investing in American-made tech and digital products.”
- Managing Supply Chain Planning in the World of Scarcity – Lora Cecere has not run out of supply chain cautions for us, including the proper use of agentic AI.
- Enterprise Month in Review – the Mary Meeker + Accenture trend predictions blowout – the audio edition of my latest show with diginomica contributor Brian Sommer. Our review of these vaunted trends reports got rather spicy. I also posted my podcast version of: Hot topics live at SAP Sapphire Orlando 2025 – the ASUG Hub “ask us anything” edition…
Whiffs
So, Meta is back….
Meta Invents New Way to Humiliate Users With Feed of People’s Chats With AI https://t.co/lZDiI9OE2e
-> who says you can’t be a self-anointed AI leader and keep things a bit saucy as well? LOL!
— Jon Reed (@jonerp) June 23, 2025
But Meta is not alone – and sometimes, we do the heavy lifting on our own privacy violations:
40,000 Cameras, From Bird Feeders to Baby Monitors, Exposed to the Internet https://t.co/y9CEFz0OJY
Includes pet cams, hospital cams, baby cams…
“it doesn’t take elite hacking to access these cameras; in most cases, a regular web browser and a curious mind are all it takes”…
— Jon Reed (@jonerp) June 23, 2025
I’m not sure this news was as reassuring as Anthropic intended:
Anthropic says most AI models, not just Claude, will resort to blackmail | TechCrunch https://t.co/WXs6Ar1xfq
“most leading AI models will engage in harmful behaviors when given sufficient autonomy and obstacles to their goals.”
-> b-but the “autonomous future” slide decks are…
— Jon Reed (@jonerp) June 23, 2025
If you find an #ensw piece that qualifies for hits and misses – in a good or bad way – let me know in the comments as Clive (almost) always does. Most Enterprise hits and misses articles are selected from my curated @jonerpnewsfeed.