The Teams Winning With AI Are The Ones Who Trust It Least


The best AI users I know are also the most suspicious of it

Walk into any engineering team that ships fast and ships clean, and you will find people who use AI constantly and trust it almost not at all. They prompt it dozens of times a day. They also assume, by default, that what comes back is wrong until proven otherwise. That contradiction is the whole game.

The losing pattern is the opposite. Someone reads a headline about AI writing software, fires up a tool, accepts the suggestions, and ships. It feels fast. It feels like leverage. Then the bug reports start, the rework piles up, and the speed turns out to have been a loan with brutal interest.

The data backing this up is no longer thin. Adoption and trust have decoupled, and the teams who understand why are the ones pulling ahead.

The professionals using AI most trust it least, and that is not a coincidence

Start with the numbers, because they are blunt. Adoption is now near-universal while confidence is collapsing. Stack Overflow’s 2025 Developer Survey, drawn from more than 49,000 developers, found that 84% use or plan to use AI tools in their development process, up from 76% in 2024. Over the same period, the share who say they don’t trust the accuracy of AI output rose from 31% to 46%. More developers now actively distrust the accuracy of AI tools (46%) than trust it (33%), and only 3% report that they “highly trust” the output.

Here is the part that matters most for anyone making decisions about how their team should work. The most experienced people are the most sceptical. Experienced developers are the most cautious, with the lowest “highly trust” rate (2.6%) and the highest “highly distrust” rate (20%), indicating a widespread need for human verification for those in roles with accountability.

Read that again. The people who carry accountability for what ships trust the machine the least. That is not technophobia. It is calibration. They have seen enough output to know exactly where it breaks.

“Almost right” is the expensive failure mode

The frustration is not that AI cannot code. It is that it produces something convincing and subtly broken. In the same survey, 66% of respondents selected “AI solutions that are almost right, but not quite” as a problem they encounter. And the cleanup is not free: 45.2% said debugging AI-generated code is more time-consuming.

This is the difference between a tool that fails loudly and one that fails quietly. A compiler error stops you. A plausible-looking function that mishandles an edge case sails through, gets merged, and surfaces three weeks later in production. The cost did not disappear. It moved downstream, where it is harder to trace and more expensive to fix.

The productivity feeling is not the productivity fact

The most uncomfortable study on this came from METR in July 2025, and it is worth sitting with because it punctures the thing everyone assumes. METR ran a randomised controlled trial, the gold-standard method, on seasoned open-source developers working on their own mature codebases. After completing the study, developers estimated that allowing AI had reduced their completion time by 20%. The researchers found the opposite: allowing AI increased completion time by 19%. AI tooling slowed developers down.

The expert forecasts were even further off than the participants’ own estimates. Experts in economics predicted completion times 39% shorter, and experts in ML predicted 38% shorter. And critically, developers went in expecting AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

Sit with that gap. People were slower, and they could not feel it. They walked away convinced of the opposite of what happened. This is the trap. If your team cannot tell whether AI is helping or hurting from how it feels, then “it feels faster” is worthless as a management signal.
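
To make the size of that gap concrete, here is a small worked calculation using the two METR percentages quoted above. The 100-minute baseline is an assumption chosen for illustration, not a figure from the study, and turning the percentages into minutes this way is a simplification.

```python
# Illustrative arithmetic only. The baseline is hypothetical; the two
# percentages are the ones quoted from the METR study above.
BASELINE_MINUTES = 100.0      # assumed time for a task done without AI

PERCEIVED_REDUCTION = 0.20    # developers believed AI cut their time by 20%
MEASURED_INCREASE = 0.19      # measured result: AI increased time by 19%


def perceived_minutes(baseline: float) -> float:
    """Time the developer believes the task took with AI."""
    return baseline * (1 - PERCEIVED_REDUCTION)


def measured_minutes(baseline: float) -> float:
    """Time the task took with AI, applying the measured slowdown."""
    return baseline * (1 + MEASURED_INCREASE)


def perception_gap(baseline: float) -> float:
    """Minutes separating what was felt from what was measured."""
    return measured_minutes(baseline) - perceived_minutes(baseline)


if __name__ == "__main__":
    print(f"believed: {perceived_minutes(BASELINE_MINUTES):.0f} min")
    print(f"measured: {measured_minutes(BASELINE_MINUTES):.0f} min")
    print(f"gap:      {perception_gap(BASELINE_MINUTES):.0f} min")
```

On that assumed baseline, the developer believes the task took about 80 minutes when it took about 119. A self-report would be off by roughly 39 minutes on a 100-minute task, and in the wrong direction.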

Two caveats before anyone overreads this. The study used a small group of experts on large, mature codebases with high quality bars, which is exactly the setting where AI struggles most and where a senior developer’s existing knowledge is hardest to beat. And METR itself noted the picture keeps moving; their earlier paper found AI caused a slowdown using data from February to June 2025, and they started a new experiment in August 2025 with a larger pool of developers using the latest tools. The point is not that AI always slows people down. The point is that your gut is a broken instrument here, and you need measurement instead.

Speed without review is a debt machine

The clearest mechanism behind the rework problem is what happens to a codebase when AI starts writing large chunks of it unsupervised. GitClear analysed a huge body of real code and found the warning signs. They examined approximately 153 million changed lines of code authored between January 2020 and December 2023. The headline finding should worry anyone responsible for a long-lived product: code churn, the percentage of lines that are reverted or updated less than two weeks after being authored, was projected to double in 2024 compared with its 2021, pre-AI baseline.

Churn is a polite word for code that was wrong almost immediately. On top of that, they found the composition of code shifting in an unhealthy direction. The percentage of added code and copy/pasted code is increasing in proportion to updated, deleted, and moved code. In this regard, AI-generated code resembles an itinerant contributor, prone to violate the DRY-ness of the repos visited.

That image is exactly right. AI defaults to writing more code, not better code. It adds. It rarely consolidates, refactors, or deletes. Left unchecked, you get a codebase that grows faster than anyone can hold in their head, full of near-duplicate logic that all has to be maintained forever. The lines went in fast. Someone still has to own the mess.

In our delivery, this is the line we hold hardest. AI drafts, a senior developer owns the result. The draft can come from anywhere. Accountability cannot be delegated to a model.

The security problem is the one that gets you sued

Quality debt is expensive. Security debt can be existential, and AI introduces it quietly. A study analysing GitHub Copilot’s output found that 40% of its suggestions in security-relevant scenarios contained security-related bugs, mapped to CWE classifications from MITRE. Independent benchmarking tells the same story from a different angle. Veracode, citing the BaxBench evaluation, reported that 41 to 62% of AI-generated code contains security vulnerabilities, and that even with extensive prompting, LLM-generated code is often either insecure or incorrect.

The mechanism is not mysterious. AI training data includes millions of public repositories where developers committed credentials, and the models learn to reproduce these patterns as “normal” code. The model learned from the average of public code, and the average of public code is not secure.
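
A minimal sketch of what checking for that specific pattern can look like, assuming you want a cheap first pass over generated code before it goes anywhere near review. The three patterns and the function name `find_secrets` are illustrative assumptions, not a complete rule set, and this is no substitute for a dedicated secret scanner.

```python
import re

# Illustrative patterns only; a real scanner covers far more cases.
SECRET_PATTERNS = {
    # AWS access key IDs follow a well-known "AKIA" + 16 characters shape.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # A credential-like name assigned directly to a quoted literal.
    "hardcoded_assignment": re.compile(
        r"(?i)(password|passwd|secret|api_key|token)\w*\s*=\s*"
        r"['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each suspicious line."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


if __name__ == "__main__":
    generated = "\n".join([
        "import os",
        'db_password = "hunter2hunter2"',       # literal credential: flagged
        'api_key = os.environ["API_KEY"]',      # read from env: not flagged
        'aws_id = "AKIAIOSFODNN7EXAMPLE"',      # AWS documentation example key
    ])
    for line_number, name in find_secrets(generated):
        print(f"line {line_number}: {name}")
```

A check like this only catches the obvious cases. Its value is that it runs on every piece of generated code without anyone having to remember to look.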

The most dangerous finding is psychological. A user study referenced across the security literature showed that participants using AI assistants wrote significantly less secure code and exhibited a false sense of security, often rating their insecure solutions as secure. The tool makes you more confident and less safe at the same time. That combination is how breaches happen.

This is why we draw a hard line by industry. For a marketing site or an internal tool, an AI-introduced flaw is an annoyance. For a fintech platform, a healthcare app handling patient data, or anything touching payments and regulated information, blind trust in generated code is not a productivity strategy. It is a liability you have not priced. The review step is not optional in those contexts; it is the whole job.

So why use AI at all?

Because the upside is real when it is aimed correctly. The same body of research that exposes the risks also shows where AI genuinely earns its place. GitHub’s own randomised controlled trial of 202 developers with at least five years of experience found that developers with Copilot access had a 53.2% greater likelihood of passing all 10 unit tests in the study, indicating it helped them write more functional code by a wide margin.

Take that with the appropriate salt, since it comes from the vendor. But it points at something true that we see daily. AI is excellent at the well-defined, low-stakes, high-volume work where being “mostly right” is a fine starting point and a human will finish the job anyway:

Scaffolding and boilerplate. Setting up the repetitive structure that you would otherwise type by hand.
First-draft test cases. Generating the obvious coverage so a developer can spend their attention on the edge cases the model will not think of.
Prototyping. Standing up a throwaway version to pressure-test an idea before committing real engineering to it.
Research and explanation. Summarising an unfamiliar library or API so you skip an hour of documentation diving, then verifying what it told you.
Documentation drafts. A large-scale repository analysis found widespread use of AI tools for documentation generation (39% of collected files), an understudied application with implications for software maintainability.

The common thread is that a human stays in the loop with the power and the obligation to reject the output. The product leadership view in the Stack Overflow coverage put the working pattern plainly: the most successful teams use AI to draft test cases or scaffold documentation while retaining human review for complex refactors or high-stakes deployments.

The senior developer is worth more now, not less

The popular narrative says AI makes developers replaceable. The evidence says the opposite, and it is not subtle. When developers were asked to imagine a future where AI handles most coding, the top reason they would still ask another person for help was “when I don’t trust AI’s answers” (75%).

Think about what a model actually cannot do. It cannot tell whether output is right without a person who already knows the answer. It cannot hold the full context of why your system is built the way it is. It cannot weigh a security tradeoff against a business deadline. It cannot decide that the elegant solution is wrong because of a constraint that lives in someone’s head and was never written down. Those are senior skills, and AI raises their value because it generates so much more material that needs that exact judgement applied to it.

An old comment on this captured it well: AI tends to make good engineers better and weak engineers worse, because the strong ones consider the suggestions and reject the bad ones while the weak ones take everything. The differentiator is no longer who can type code fastest. It is who can tell, in seconds, that the confident-looking function is quietly broken.

What to do this week

Stop asking your team whether AI is making them faster. They cannot answer that reliably, as the METR results showed. Instead, put real guardrails in place that assume the output is wrong until checked.

Three concrete moves you can make in the next few days:

Mandate human review on every AI-assisted commit, with a named owner. No generated code reaches main without a person who understands it putting their name to it. The reviewer owns the consequences, not the model.
Turn on the measurement you are currently flying without. Track code churn, the share of lines reverted or rewritten within two weeks. If that number is climbing, your AI usage is generating rework, not output, and you now have evidence instead of a feeling.
Run security scanning on generated code as a default, not an afterthought. Static analysis and dependency checking in the pipeline can catch a meaningful share of the vulnerabilities behind the 40-plus percent rates cited above before they ship. Treat AI output as untrusted input, because that is what it is.
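
The churn measurement in the second move can be sketched as follows, using the definition given earlier: the share of lines reverted or rewritten within two weeks of being authored. The `LineRecord` structure is hypothetical, and extracting those records from your version control history is left out. Treat this as one way to frame the calculation, not as how any particular tool computes it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

CHURN_WINDOW = timedelta(days=14)  # "within two weeks", per the definition


@dataclass(frozen=True)
class LineRecord:
    """Hypothetical record for one authored line; not a real tool's schema."""
    authored_at: datetime
    # When the line was next reverted or rewritten, or None if it never was.
    changed_at: Optional[datetime] = None


def churn_rate(records: list[LineRecord], as_of: datetime) -> float:
    """Share of lines reverted or rewritten within CHURN_WINDOW of authoring.

    Lines younger than the window at `as_of` are skipped, because they have
    not yet had a full two weeks in which to churn.
    """
    mature = [r for r in records if as_of - r.authored_at >= CHURN_WINDOW]
    if not mature:
        return 0.0
    churned = sum(
        1
        for r in mature
        if r.changed_at is not None
        and r.changed_at - r.authored_at < CHURN_WINDOW
    )
    return churned / len(mature)


if __name__ == "__main__":
    records = [
        LineRecord(datetime(2024, 1, 1), datetime(2024, 1, 5)),    # churned
        LineRecord(datetime(2024, 1, 1), datetime(2024, 2, 1)),    # changed late
        LineRecord(datetime(2024, 1, 1)),                          # never changed
        LineRecord(datetime(2024, 2, 25), datetime(2024, 2, 26)),  # too recent
    ]
    print(f"churn: {churn_rate(records, as_of=datetime(2024, 3, 1)):.0%}")
```

Excluding lines that are not yet two weeks old is a design choice: without it, a burst of fresh commits would pull the rate down before those lines have had time to be rewritten. The trend across several weeks matters more than any single reading.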

And draw your line by stakes. Let AI run loose on prototypes and throwaway work. Tighten the review noose hard around anything regulated, anything handling sensitive data, anything that integrates with systems you cannot afford to break. If you want a second opinion on where that line should sit for your particular product and risk profile, you can book your free discovery call and we will talk through it against your actual stack.
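
One way to make that line explicit is to write it down as a merge gate. The sketch below is a hypothetical policy, not a description of any real CI system. The three tiers, the `Change` fields, and the second-reviewer rule for regulated work are all assumptions for illustration; your own tiers and rules will differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stakes(Enum):
    THROWAWAY = "throwaway"   # prototypes and spikes
    STANDARD = "standard"     # ordinary product code
    REGULATED = "regulated"   # payments, patient data, regulated information


@dataclass(frozen=True)
class Change:
    """Hypothetical description of a proposed change; not a real schema."""
    stakes: Stakes
    ai_assisted: bool
    owner: Optional[str]              # named human who owns the result
    security_scan_passed: bool
    second_reviewer: Optional[str] = None


def blocking_reasons(change: Change) -> list[str]:
    """Return reasons the change may not merge; an empty list means it may."""
    if change.stakes is Stakes.THROWAWAY:
        return []  # assumed policy: throwaway work is not gated
    reasons = []
    if change.ai_assisted and not change.owner:
        reasons.append("AI-assisted change has no named owner")
    if not change.security_scan_passed:
        reasons.append("security scan has not passed")
    # Assumed rule: one possible way to tighten review for regulated work.
    if change.stakes is Stakes.REGULATED and not change.second_reviewer:
        reasons.append("regulated change needs a second reviewer")
    return reasons


if __name__ == "__main__":
    change = Change(
        stakes=Stakes.REGULATED,
        ai_assisted=True,
        owner=None,
        security_scan_passed=True,
    )
    for reason in blocking_reasons(change):
        print(f"blocked: {reason}")
```

The point of encoding it is not the specific rules. It is that the decision about how much scrutiny a change gets is made once, deliberately, rather than renegotiated on every pull request.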

The teams winning with AI are not the ones who believe in it. They are the ones who use it relentlessly and check it ruthlessly. Trust is the liability. Verification is the edge.
