The Synthetic Engineer: Measuring the Real Impact of AI on Software Delivery

https://miroslawstaron.github.io/hallucinations.html#/5

The shift from manual coding to AI-augmented orchestration is no longer a future prospect; it is a reality. Software engineers are adopting AI more often and more deeply.

However, as organizations pour investment into Generative AI tools, a critical question remains: How do we measure the true return on investment?

I asked Gemini to analyze the DORA report and search the internet for how people measure AI adoption. Its report, Evaluating the Synthetic Engineer, suggests that we must move beyond vanity metrics like “lines of code generated.” When code generation is cheap, we need to think about adoption and design instead.

I recently heard that one company paid Anthropic the equivalent of three software engineers’ salaries in tokens for a seven-person team. Effectively, 30% of the entire team (3 out of 3+7) was AI. This is really cool, and it shows that this reality is here. But how do we make sure that these tokens were not simply wasted?
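The 30% figure follows from simple arithmetic, sketched below. The function name and the idea of expressing token spend in "engineer equivalents" are assumptions made for illustration, not an established metric.

```python
def ai_share_of_team(ai_engineer_equivalents: float, human_engineers: int) -> float:
    """Fraction of total team capacity (humans plus AI spend) attributable to AI.

    ai_engineer_equivalents is a hypothetical unit: token spend divided by
    the cost of one engineer over the same period.
    """
    total = ai_engineer_equivalents + human_engineers
    if total <= 0:
        raise ValueError("team capacity must be positive")
    return ai_engineer_equivalents / total


# Three engineers' worth of tokens for a seven-person team.
share = ai_share_of_team(3, 7)
print(f"{share:.0%}")  # 30%
```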

The Velocity-Quality Tension

The most immediate effect of AI is a spike in velocity. Teams often see a 15–25% reduction in Cycle Time and significantly accelerated onboarding, with the “Time to 10th PR” dropping from 91 days to just 33.

However, this speed comes with a hidden cost: Comprehension Debt. The report highlights that AI-assisted code often results in higher defect density and a rework rate that can double the human baseline. To manage this, we must align AI metrics with the industry-standard DORA metrics to ensure that speed doesn’t break the system.

Integrated Metrics Framework

To truly evaluate the impact of AI, organizations should track a mix of telemetry-based system data and survey-based human sentiment.

| Category | Metric | Measurement Source / Context |
| --- | --- | --- |
| DORA (System) | Deployment Frequency | CI/CD Pipeline / Release logs |
| DORA (System) | Lead Time for Changes | Version Control / Deployment logs |
| DORA (System) | Change Failure Rate | Incident Management / CI/CD logs |
| DORA (System) | Recovery Time (MTTR) | Incident Management / Pager logs |
| AI Use | Acceptance Rate | IDE Plugin Telemetry |
| AI Use | AI Interaction Time | Tool Telemetry / Browser logs |
| AI Effect | Rework Rate | Jira / Commit history |
| Human | Trust & Reliance | Developer Surveys (Confidence in AI) |
| Human | Job Satisfaction | Developer Surveys (Burnout vs. Flow) |
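As a rough illustration, the two AI-specific ratios in this table could be computed as below. This is a minimal sketch: the function names and input counts are hypothetical, and real IDE plugins and ticket trackers expose this data in their own formats.

```python
def acceptance_rate(suggestions_shown: int, suggestions_kept: int) -> float:
    """Share of AI code suggestions that were kept in the file."""
    if suggestions_shown <= 0:
        raise ValueError("need at least one suggestion")
    if not 0 <= suggestions_kept <= suggestions_shown:
        raise ValueError("kept suggestions must be between 0 and shown")
    return suggestions_kept / suggestions_shown


def rework_rate(unplanned_fix_hours: float, total_work_hours: float) -> float:
    """Share of work time spent on unplanned fixes or bugs."""
    if total_work_hours <= 0:
        raise ValueError("total work hours must be positive")
    return unplanned_fix_hours / total_work_hours


# Hypothetical numbers for one sprint.
print(f"Acceptance rate: {acceptance_rate(200, 60):.0%}")
print(f"Rework rate: {rework_rate(12, 80):.0%}")
```

Tracking both together matters: a high acceptance rate paired with a rising rework rate suggests that suggestions are being accepted without enough review.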

Now we can compare that to the DORA metrics that are widely used in industry today. They come in two parts. First, the telemetry-based ones:

| Metric | Definition | Measurement Source |
| --- | --- | --- |
| Deployment Frequency | How often the team successfully releases to production. | CI/CD Pipeline / Release logs |
| Lead Time for Changes | Time from code commit to code successfully running in production. | Version Control / Deployment logs |
| Change Failure Rate | % of deployments causing a failure in production (requiring a fix/rollback). | Incident Management / CI/CD logs |
| Failed Deployment Recovery Time | How long it takes to restore service after a failure in production. | Incident Management / Pager logs |
| Rework Rate | The percentage of work time spent on unplanned fixes or bugs. | Ticket tracking (Jira) / Commit history |
| Acceptance Rate | The ratio of AI-generated code suggestions that are actually kept in the file. | IDE Plugin Telemetry |
| Commit/PR Volume | The raw count of code changes and pull requests submitted. | Version Control Systems (VCS) |
| AI Interaction Time | The actual duration of time spent interacting with an AI interface. | Tool Telemetry / Browser logs |
| Code Stability | The frequency of breaks or regressions in the automated test suite. | Testing Frameworks / Build logs |
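To make the first four definitions concrete, here is a minimal sketch of how they could be derived from deployment records. The `Deployment` record and its field names are assumptions for illustration; in practice the timestamps would come from your CI/CD and incident-management tooling, and the choice of median over mean is also mine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # code commit for the change
    deployed_at: datetime                   # change running in production
    failed: bool = False                    # caused a failure in production
    restored_at: Optional[datetime] = None  # service restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four telemetry-based DORA metrics for one period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_recovery_time": median(recoveries) if recoveries else None,
    }


# Hypothetical data: four deployments over two days, one of which failed.
start = datetime(2026, 1, 5, 9, 0)
deployments = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4)),
    Deployment(start, start + timedelta(hours=6), failed=True,
               restored_at=start + timedelta(hours=6, minutes=30)),
    Deployment(start, start + timedelta(hours=8)),
]
metrics = dora_metrics(deployments, period_days=2)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Comparing these values for periods before and after an AI rollout is one way to check whether higher velocity comes at the cost of stability.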

And then the ones that are measuring perceptions, based on surveys:

| Metric | Definition | Context for Use |
| --- | --- | --- |
| Trust | The degree of confidence a developer has in the accuracy and safety of AI output. | To identify if developers are “blindly” following AI or if skepticism is hindering adoption. |
| Reflexive Use | How instinctively a developer turns to AI when a new problem arises. | To measure the behavioral shift in problem-solving habits. |
| Reliance | The self-assessed level of dependency on AI tools to complete daily work. | To monitor for potential skill atrophy or high-dependency risks. |
| Individual Effectiveness | Perceived productivity, impact on the organization, and ability to stay “in flow.” | To assess the “value-add” from the developer’s own perspective. |
| Job Satisfaction | The level of fulfillment and contentment a developer feels in their role. | To ensure that AI automation is improving work life rather than creating “toil.” |
| Burnout | Physical or mental exhaustion caused by work-related stress. | To monitor if the increased “instability” caused by AI is taxing the team. |
| Personal Ownership | The psychological feeling of “owning” the code and its quality. | To prevent the dilution of accountability when AI generates a high volume of code. |
| User-Centric Focus | The extent to which the team prioritizes end-user needs in their workflow. | Used as a “multiplier” to see if AI speed is being directed at the right goals. |
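Survey-based metrics like these are typically collected as scale ratings and then averaged per construct. The sketch below assumes a 1 to 5 Likert scale and hypothetical construct keys such as "trust" and "burnout"; the source does not prescribe a particular scale or aggregation.

```python
from statistics import mean


def survey_scores(responses: list[dict[str, int]]) -> dict[str, float]:
    """Average each construct over all respondents who answered it.

    Each response maps a construct name to a rating from 1 to 5
    (an assumed scale).
    """
    by_construct: dict[str, list[int]] = {}
    for response in responses:
        for construct, score in response.items():
            if not 1 <= score <= 5:
                raise ValueError(f"score out of range for {construct}: {score}")
            by_construct.setdefault(construct, []).append(score)
    return {name: mean(scores) for name, scores in by_construct.items()}


# Hypothetical responses; the third developer skipped the burnout question.
responses = [
    {"trust": 4, "burnout": 2},
    {"trust": 2, "burnout": 3},
    {"trust": 3},
]
scores = survey_scores(responses)
print(scores)
```

A per-construct average is only a temperature reading; the spread of answers and the trend across survey rounds usually say more than a single number.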

I recommend picking out a few of these metrics and sticking to them. I personally prefer telemetry-based metrics because they provide more value than survey responses. Survey-based metrics should be used sparingly, as they provide more of a temperature reading for an organization.

From Gutenberg to Google and on to AI

Link to the book

I’m often asked what invention I think is the biggest in human history. I do not have one that is the biggest, but I have a short list:

1) Writing – once we learned how to codify knowledge, our progress accelerated tremendously

2) Computing – once we learned how to make complex calculations fast, we started to achieve the impossible – going to the Moon, communicating over the Internet, just to name a few.

3) AI – when we learned how to utilize advanced calculations to simulate intelligence, humanity achieved new heights

This book takes us through that kind of journey. It does add a few more steps, like the invention of binary calculations, the Internet, Google, etc., but in essence, it does follow the same pattern.

What the book does not cover, and what I often wonder about, is the invention of the compiler. Compilers, especially for higher-level programming languages like C, provided the abstraction needed to decouple the nitty-gritty details of computer architectures from the problems we want to solve.

We see a similar development today with LLMs and Agentic AI. They decouple the details of programs from the intents and requirements of the user. We do not need to know anything about programming to create software that does things for us. Product owners can create prototypes, requirements engineers can test their hypotheses, testers can ensure that they do not miss important corner cases – the examples can be multiplied, and that’s just software engineering.

This does not mean that software engineering is solved, as Nvidia’s CEO put it; it means that it has changed. It’s probably the most fun time to be a software engineer, as we can start solving really difficult problems without losing time on implementation details. We also need to know how to design systems based on AI and how to engineer them (BTW: if you are interested in this, here is my latest book that will help you: Link).

I recommend Tom Wheeler’s book to anyone interested in the story of how we invented AI in the first place.

VECS 2026 — The Era of the AI-Defined Vehicle

The VECS 2026 conference in Gothenburg has made one thing clear: the transition to Software-Defined Vehicles (SDVs) is no longer a future prediction; it is accelerating rapidly toward total market dominance. I attended both days, and it seems that the best time for software is NOW! For a nerdy software engineer like me, the conference provided a glimpse of a future where software defines everything: AI, yes, but complemented with a lot of good old-fashioned programming, guardrails, and similar.

My Key Takeaways from the Conference:

  • Rapid Market Evolution: While current volumes are relatively low, the global SDV share is projected to jump from 14% in 2025 to 46% by 2035. Similarly, Zonal Architectures are expected to grow from a 5% share today to 40% by 2035.
  • The Rise of Middleware: Middleware is emerging as a critical control point for OEMs. To shorten time-to-market and maintain control over software platforms, OEMs are now partnering to develop joint middleware solutions rather than relying on fragmented supplier systems.
  • China as a Catalyst: The fast pace of Chinese automakers is a primary driver for global change, pushing the industry toward “AI-defined mobility” and the integration of edge AI models. Notably, over 20 OEMs integrated DeepSeek within weeks of its release.
  • The “Software Factory”: Industry leaders like Alwin Bakkenes emphasized that profitability in the electric vehicle sector requires extreme process optimization. This is being achieved through “Software Factories”—modern development concepts where source code is integrated with digital twins for virtual testing and exploration.
  • Hardware Innovation: To control AI workloads, OEMs are increasingly designing their own chips and moving toward 2nd Generation Zonal Architectures, such as the one powering the upcoming Volvo EX60.

The message from VECS 2026 is certain: for the automotive industry to thrive, it must embrace a “machine that builds the machine” philosophy, prioritizing high-performance computing and seamless software integration.