The Real Problem with AI in Companies Is Not Tools

15 min read · 2,917 words

The conversations I have with leadership teams about AI all sound the same now. They have the budget, a Copilot license, an enterprise Claude contract, an internal RAG project, at least one agent platform under evaluation, a Head of AI, a steering committee, and a Slack channel called #ai-transformation that nobody reads. What they do not have is anything in production that moves a real number.

The instinctive response is to ask which tool is wrong. The answer is that it is rarely the tool. After enough of these stalled programs, I am confident the bottleneck is not the model, the vendor, or the platform. The bottleneck is how the company thinks about the work AI is supposed to do, who is supposed to do it, and what counts as success. Tools are easy to swap. Mindset is what determines whether the next tool fails the same way.

The data on AI failure is no longer ambiguous

The pattern is now well measured. S&P Global Market Intelligence reported in 2025 that 42 percent of enterprises had abandoned most of their AI initiatives, up from 17 percent the year before. MIT's research on enterprise GenAI placed the production-ROI failure rate at roughly 95 percent of pilots. BCG found 74 percent of companies had no tangible value from AI investments despite the $252 billion spent globally in 2024.

These numbers track adoption, not capability. The models got more capable across this period. Pricing dropped. Tooling matured. Failure rates rose.

📊 The Inversion Problem

MIT's analysis of successful enterprise AI deployments suggests roughly 70 percent of the work is people and process, 20 percent is data and infrastructure, and 10 percent is the algorithm itself. Most enterprise AI budgets invert that ratio entirely. The spend goes where the failure does not live.

The most useful framing I have read on this comes from RAND's review of failed AI projects, which traced root causes to "misunderstood problem definition" and a "technology-first mentality." That is mindset language. Not stack language.

Adoption is not usage, and the gap is where the money goes

The single most useful distinction I have learned to make when I review an AI program is between adoption and usage. Adoption is what shows up in the procurement report: seats provisioned, contracts signed, SSO configured, dashboards populated. Usage is what shows up in the work: the proposal that actually got drafted with AI, the PR that actually got reviewed with AI, the customer ticket that actually got triaged with AI. The two numbers diverge wildly in almost every enterprise I look at.

Four patterns drive that divergence, and they are the ones I see most consistently on the ground. None of them are about the model.

People keep using their free tools instead of the real workflow

The most reliable finding across the programs I review is that the people doing the work are already using AI heavily. They are just not using yours. They are pasting work into the free tier of ChatGPT on their personal phones. They are using Claude.ai signed in with a Gmail address. They are running Cursor on a personal laptop against a checked-out copy of the codebase.

The reason is straightforward. The free personal tool gives them an answer in three seconds. The sanctioned enterprise tool sits behind SSO, a policy banner, a context-window restriction, a model that someone in compliance picked twelve months ago, and a chat surface that has not been wired up to the data they actually work with. Of course they route around it. The path of least resistance always wins.

The mindset shift here is to stop asking 'why are people not using our AI tool' and start asking 'what does the sanctioned tool need to do to be ten seconds faster than ChatGPT for this specific task.' If the enterprise stack is not faster, easier, and at least as accurate as the free thing on the user's phone, you are paying a vendor to lose a competition you do not realize you are in.

Resistance to paid tools is rarely about money

Leadership tends to interpret resistance to corporate AI tools as a budget objection from the team. It is almost never about budget. It is about three things, in roughly this order: the personal tool already works, the corporate tool feels surveilled, and the corporate tool forces the user to learn a new pattern when they have already learned a working one.

The 'feels surveilled' piece is the most underrated. People will tolerate a lot of friction in a tool they trust to be private. They will tolerate almost none in a tool they suspect is reading their drafts and reporting them back to their manager. If the rollout messaging leans on monitoring, governance, and analytics, expect quiet resistance regardless of how good the tool actually is. If the messaging leans on 'we bought you a faster, better-integrated version of the thing you are already using,' expect adoption.

I have watched companies fix this in a single week by changing two things: the launch comms and the default privacy settings. Same tool, same vendor contract, completely different curve.

Training gaps are not about features

The training most enterprises offer for AI tools is feature training. Here is the login. Here is the chat box. Here is how to upload a file. Here is the prompt library nobody opens. This is the wrong training. The interface is not the hard part.

The hard part is judgement: when to trust an output, how to phrase a request so it returns something usable, when to verify against a source, when to push back on the model, what categories of work are worth handing to AI and what categories are not. This is closer to apprenticeship than software training, and almost no enterprise budgets for it. The result is a workforce that has the tool but does not know how to use it well, produces mediocre output, concludes the model is not very good, and quietly reverts to the old way.

The companies that close this gap do unglamorous things. They run weekly working sessions where a team brings real work and a strong internal AI user shows them what a faster path would have looked like. They post recorded examples internally. They keep a Slack channel of 'prompts that actually worked, with screenshots.' They put practice time on calendars. None of it is exciting. All of it is what actually moves the usage curve. I covered the team-level version of this discipline in Prompt Engineering at Scale.

Heavy users hit 5x. Average users hit nothing. Both are in the same room.

Even inside companies that have cleared the free-tools and training problems, a second pattern runs underneath. AI productivity inside a workforce is not normally distributed. A small minority become genuinely faster and better with the tools. The middle is roughly where they started. The bottom is, if anything, slightly slower.

Writer's 2026 enterprise survey reported that AI super-users hit 5x productivity gains while only 29 percent of organizations see meaningful overall ROI. Those two figures can only coexist if most users sit far below the super-user level, dragging the average toward flat. The BCG / Harvard / Wharton 'Jagged Frontier' experiment (Dell'Acqua et al., 2023, 758 BCG consultants) made the mechanism concrete: GPT-4 users completed 12 percent more tasks, 25 percent faster, with 40 percent higher quality, but only inside the model's capability boundary. Outside it, they were worse than the control group. Whether someone benefits depends almost entirely on whether they have learned where the boundary is.

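To see why those two figures can coexist, a minimal arithmetic sketch helps. The 10 percent super-user share below is an assumed illustration, not a number from the survey:

```python
# Assumed split for illustration: 10% of users are super-users at a 5x gain,
# the remaining 90% are roughly where they started at 1x.
share_super, gain_super = 0.10, 5.0
share_rest, gain_rest = 0.90, 1.0

# Blended average productivity gain across the whole workforce.
blended = share_super * gain_super + share_rest * gain_rest
print(blended)  # 1.4 -> a modest average that looks flat on a leadership dashboard
```
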
This is the dynamic that breaks the leadership conversation about AI. The deck shows average usage, the line looks flat, and the program gets called underperforming. Meanwhile the super-users are quietly doing the work of three people while the average user is doing the work of one. Splitting those two cohorts in the data is a one-meeting fix almost no enterprise has done, and the harder follow-up is turning what the super-users have learned into the training the average user is missing.

AI adoption is not AI usage, and most dashboards measure the wrong one

These four patterns add up to the distinction that I think every leadership team running an AI program needs to hold in mind. Adoption metrics measure procurement success. Usage metrics measure the change in the work itself.

A company can be 100 percent adopted and 5 percent used. That is most of what I see. The board deck shows full Copilot rollout, full enterprise Claude rollout, full coverage on whichever agent platform was chosen. The actual rate at which the work is being done with AI sits unmeasured because no one wants to know the answer. The moment a program starts measuring usage rather than adoption, the conversation shifts immediately onto the four levers that actually move it: friction in the sanctioned tool relative to the free alternative, depth of training, fit of the workflow integration, and incentives for the team to share what is working.

Until that switch happens, the AI line item on the budget is real and most of the AI work is imaginary.

💜 The Cleanest Test for Whether Your AI Program Is Working

Pick one workflow your team does every week. Ask, without warning, what fraction of the people doing it last week used the company-paid AI tool versus a personal free tool versus no AI at all. If you cannot answer, you are measuring adoption, not usage. If the personal-tool number is bigger than the sanctioned-tool number, the program is leaking value to vendors you do not pay.
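
Answering that question does not require new tooling. Below is a minimal sketch of the breakdown, assuming you can tag each completed item from last week's workflow with how it was produced; the field names and records are hypothetical:

```python
# Hypothetical tags for each completed item in one weekly workflow,
# gathered by asking the people who did the work (or from tool telemetry).
completed_items = [
    {"item": "proposal-1041", "ai": "sanctioned"},  # company-paid tool
    {"item": "proposal-1042", "ai": "personal"},    # free tool, personal account
    {"item": "proposal-1043", "ai": "none"},
    {"item": "proposal-1044", "ai": "personal"},
]

def usage_breakdown(items):
    """Fraction of last week's work done with each kind of tool."""
    total = len(items)
    return {kind: sum(i["ai"] == kind for i in items) / total
            for kind in ("sanctioned", "personal", "none")}

print(usage_breakdown(completed_items))
# {'sanctioned': 0.25, 'personal': 0.5, 'none': 0.25}
# personal > sanctioned: the program is leaking value to vendors you do not pay.
```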

Three more mindsets that show up in the same stalled programs

The dynamics above are the ones that move the usage number directly. The mindsets below are the underlying beliefs that produce them. None are about the model.

1. Treating AI as an IT project, not an operating-model change

The fastest way to kill an AI program is to assign it to the team that ships infrastructure tickets. They will pick a vendor, run a procurement cycle, deploy a chat box, and report success on adoption metrics that have nothing to do with the business outcome.

Real AI integration changes how a customer support agent triages cases, how a finance analyst prepares a forecast, how a sales rep researches an account, how an engineer reviews code. Those are operating-model changes. They require workflow design, role redefinition, and incentive realignment. None of that is in the IT project plan.

The companies I see succeeding treat AI as a redesign of the unit of work itself. The companies I see failing treat it as a tool deployment.

2. The pilot purgatory mindset

Pilots are not the problem. Pilots that exist to defer the decision are the problem.

In my experience, an AI pilot that has run longer than a quarter without a written commitment about what 'good' looks like has stopped being a pilot. It has become organizational permission to keep evaluating instead of acting. The tools rotate. The vendor list grows. Nothing ships. MIT's data here is brutal: midmarket firms scale AI in roughly 90 days, while enterprises take nine months or more, and most never escape the evaluation phase.

The mindset shift is a small one. Decide before you start: what would have to be true for us to roll this out, and what would have to be true for us to kill it. Put both on a single page. Re-read it monthly. The companies that do this stop running pilots and start running programs.

3. Fear of being the person blamed for an AI mistake

This one is rarely on the slide deck and almost always present. Decision-makers throughout the org are quietly reluctant to put AI in front of a customer or a regulator because if it goes wrong, the failure attaches to them personally in a way that a human-made mistake would not.

This is not irrational. It is the predictable response to a culture that has not yet decided how it will hold people accountable for AI-assisted decisions. Until that question is answered out loud, the most rational individual move is to keep AI in the demo lane and let someone else go first.

The fix is policy, not pep talks. Companies that get past this articulate, in writing, what kinds of decisions AI is allowed to make alone, what kinds require human review, what 'review' actually means, and who carries the accountability when something goes wrong. Once that is written down, people stop hedging and start shipping. I covered the implementation side of this in The Year of Agentic Development, where the same accountability question shows up at the agent level.

⚠️ Shadow AI Is the Symptom, Not the Disease

Writer's 2026 survey found that 29 percent of employees (44 percent of Gen Z) admit to actively undermining their company's AI strategy by routing around sanctioned tools. Eighty percent of Gen Z employees say they trust AI more than their managers. Shadow AI is what happens when the workforce is more confident in the technology than the leadership is in its own program. Banning the personal tools does not fix it. Closing the credibility gap does.

What the companies that are getting it right actually do differently

The good news in all this data is that the gap between high-performers and the rest is now legible. The successful programs share a small number of mindset shifts, and they are imitable.

They define the problem before they pick the tool

Every failed program I have reviewed started with 'we should use AI for X.' Every successful one started with 'X is broken in a specific way and here is the cost of it being broken.' Once the cost of the current state is on the table, the question of whether AI is the right intervention becomes a real question rather than a foregone conclusion. Sometimes the answer is no, and the program saves the budget. Sometimes the answer is yes, and the program ships because the success criterion was decided before the tool was bought.

They invest in the people who will use it, not the people who will procure it

In the successful programs, the budget line for training, prompt design, internal evangelists, and time-to-experiment is roughly comparable to the budget line for software licenses. In the failed programs, training is an afterthought funded by whatever is left after the platform contract is signed.

The asymmetry matters because the marginal value of a model is now low and the marginal value of a workforce that knows how to use it is very high. Models are commodities. Internal AI fluency is not.

They measure usage, not seats

A productivity metric that counts seats, logins, or queries is a vanity metric. The companies getting real returns measure cycle time on a specific workflow, error rate on a specific output, or a customer-facing quality metric the AI program is supposed to influence. They publish the numbers internally and kill the workstreams that do not move them. Most enterprises do neither. The high-performers reallocate fast and stop the rest from drifting.

They make AI accountability legible

Successful programs answer four questions in writing: which decisions AI can make alone, which require a human reviewer and what that reviewer is responsible for, how errors are surfaced and learned from, and who owns the outcome when AI gets it wrong. Once those four answers exist, individual contributors stop hedging. The hedging is what was killing the program. The clarity is what unblocks it. I covered the agent-level version of the same accountability question in Why AI-Built Applications Keep Shipping Broken.

💜 The One Question That Predicts Whether It Will Work

When I assess an AI program now, I ask one question: who in this organization can describe, in plain language, what 'better' will look like six months from now and how we will know we got there. If no one can answer, the program is going to stall regardless of which tools it buys. If two or three people can answer with the same words, the program will probably ship.

The honest reframe

I am not arguing the tools do not matter. They do. Capability differences between models are real, and the platform shapes what is possible. What I am arguing is that the difference between a successful AI program and a failed one is now almost never explained by the tool choice. Two companies running the same model, on the same data, with the same vendors will get radically different outcomes because one has done the work to redesign how the work happens and the other has not.

That redesign is the hard part. It does not show up in a vendor demo, the consultancy's slide template glosses over it, and the board is least patient with it. It is also the part that compounds: once an organization has rewritten one workflow well, it gets faster at rewriting the next.

The companies quietly winning right now are not the ones with the most expensive AI stack. They are the ones whose people have learned to think differently about their work, whose sanctioned tools beat the free ones for the workflow that matters, and who have stopped confusing the procurement number with the usage number. Everything else, including the choice of tools, follows from that.