{"id":1020,"date":"2026-05-18T14:04:48","date_gmt":"2026-05-18T13:04:48","guid":{"rendered":"https:\/\/metrics.blogg.gu.se\/?p=1020"},"modified":"2026-05-13T14:09:29","modified_gmt":"2026-05-13T13:09:29","slug":"the-synthetic-engineer-measuring-the-real-impact-of-ai-on-software-delivery","status":"publish","type":"post","link":"https:\/\/metrics.blogg.gu.se\/?p=1020","title":{"rendered":"The Synthetic Engineer: Measuring the Real Impact of AI on Software Delivery"},"content":{"rendered":"\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/metrics.blogg.gu.se\/files\/2025\/01\/women-697928_1920.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"614\" data-id=\"944\" src=\"https:\/\/metrics.blogg.gu.se\/files\/2025\/01\/women-697928_1920-1024x614.jpg\" alt=\"\" class=\"wp-image-944\" srcset=\"https:\/\/metrics.blogg.gu.se\/files\/2025\/01\/women-697928_1920-1024x614.jpg 1024w, https:\/\/metrics.blogg.gu.se\/files\/2025\/01\/women-697928_1920-300x180.jpg 300w, https:\/\/metrics.blogg.gu.se\/files\/2025\/01\/women-697928_1920-768x461.jpg 768w, https:\/\/metrics.blogg.gu.se\/files\/2025\/01\/women-697928_1920-1536x922.jpg 1536w, https:\/\/metrics.blogg.gu.se\/files\/2025\/01\/women-697928_1920-1200x720.jpg 1200w, https:\/\/metrics.blogg.gu.se\/files\/2025\/01\/women-697928_1920-1320x792.jpg 1320w, https:\/\/metrics.blogg.gu.se\/files\/2025\/01\/women-697928_1920.jpg 1920w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/a><\/figure>\n<\/figure>\n\n\n\n<p><a href=\"https:\/\/miroslawstaron.github.io\/hallucinations.html#\/5\">https:\/\/miroslawstaron.github.io\/hallucinations.html#\/5<\/a><\/p>\n\n\n\n<p>The shift from manual coding to AI-augmented orchestration is no longer a future &#8211; it is a reality. 
Software engineers are adopting AI more often and more deeply.<\/p>\n\n\n\n<p>However, as organizations pour investment into Generative AI tools, a critical question remains: <strong>How do we measure the true return on investment?<\/strong><\/p>\n\n\n\n<p>I asked Gemini to analyze the DORA report and search the internet for how people measure AI adoption. Its report, <em>Evaluating the Synthetic Engineer<\/em>, suggests that we must move beyond vanity metrics like &#8220;lines of code generated.&#8221; When code generation is cheap, we need to think about adoption and design instead.<\/p>\n\n\n\n<p>I&#8217;ve recently heard that one company paid Anthropic the equivalent of three software engineers&#8217; worth of tokens for a seven-person team. This means that, effectively, 30% of the entire team (3 out of 3+7) was AI. This is really cool, and it shows that this reality is already here. But how do we check that these tokens were not just wasted?<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Velocity-Quality Tension<\/h3>\n\n\n\n<p>The most immediate effect of AI is a spike in velocity. Teams often see a <strong>15\u201325% reduction in Cycle Time<\/strong> and significantly accelerated onboarding, with the &#8220;Time to 10th PR&#8221; dropping from 91 days to just 33.<\/p>\n\n\n\n<p>However, this speed comes with a hidden cost: <strong>Comprehension Debt<\/strong>. Gemini&#8217;s report highlights that AI-assisted code often results in higher defect density and a rework rate that can double the human baseline. 
To manage this, we must align AI metrics with the industry-standard <strong>DORA metrics<\/strong> to ensure that speed doesn&#8217;t break the system.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Integrated Metrics Framework<\/h3>\n\n\n\n<p>To truly evaluate the impact of AI, organizations should track a mix of telemetry-based system data and survey-based human sentiment.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><th>Category<\/th><th>Metric<\/th><th>Measurement Source \/ Context<\/th><\/tr><tr><td><strong>DORA (System)<\/strong><\/td><td>Deployment Frequency<\/td><td>CI\/CD Pipeline \/ Release logs<\/td><\/tr><tr><td><strong>DORA (System)<\/strong><\/td><td>Lead Time for Changes<\/td><td>Version Control \/ Deployment logs<\/td><\/tr><tr><td><strong>DORA (System)<\/strong><\/td><td>Change Failure Rate<\/td><td>Incident Management \/ CI\/CD logs<\/td><\/tr><tr><td><strong>DORA (System)<\/strong><\/td><td>Failed Deployment Recovery Time (MTTR)<\/td><td>Incident Management \/ Pager logs<\/td><\/tr><tr><td><strong>AI Use<\/strong><\/td><td>Acceptance Rate<\/td><td>IDE Plugin Telemetry<\/td><\/tr><tr><td><strong>AI Use<\/strong><\/td><td>AI Interaction Time<\/td><td>Tool Telemetry \/ Browser logs<\/td><\/tr><tr><td><strong>AI Effect<\/strong><\/td><td>Rework Rate<\/td><td>Jira \/ Commit history<\/td><\/tr><tr><td><strong>Human<\/strong><\/td><td>Trust &amp; Reliance<\/td><td>Developer Surveys (Confidence in AI)<\/td><\/tr><tr><td><strong>Human<\/strong><\/td><td>Job Satisfaction<\/td><td>Developer Surveys (Burnout vs. Flow)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Now, we can compare that to the DORA metrics that are widely used in industry today. 
There, we have two parts. First, the telemetry-based ones:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><th>Metric<\/th><th>Definition<\/th><th>Measurement Source<\/th><\/tr><tr><td>Deployment Frequency<\/td><td>How often the team successfully releases to production.<\/td><td>CI\/CD Pipeline \/ Release logs<\/td><\/tr><tr><td>Lead Time for Changes<\/td><td>Time from code commit to code successfully running in production.<\/td><td>Version Control \/ Deployment logs<\/td><\/tr><tr><td>Change Failure Rate<\/td><td>% of deployments causing a failure in production (requiring a fix\/rollback).<\/td><td>Incident Management \/ CI\/CD logs<\/td><\/tr><tr><td>Failed Deployment Recovery Time<\/td><td>How long it takes to restore service after a failure in production.<\/td><td>Incident Management \/ Pager logs<\/td><\/tr><tr><td>Rework Rate<\/td><td>The percentage of work time spent on unplanned fixes or bugs.<\/td><td>Ticket tracking (Jira) \/ Commit history<\/td><\/tr><tr><td>Acceptance Rate<\/td><td>The ratio of AI-generated code suggestions that are actually kept in the file.<\/td><td>IDE Plugin Telemetry<\/td><\/tr><tr><td>Commit\/PR Volume<\/td><td>The raw count of code changes and pull requests submitted.<\/td><td>Version Control Systems (VCS)<\/td><\/tr><tr><td>AI Interaction Time<\/td><td>The actual duration of time spent interacting with an AI interface.<\/td><td>Tool Telemetry \/ Browser logs<\/td><\/tr><tr><td>Code Stability<\/td><td>The frequency of breaks or regressions in the automated test suite.<\/td><td>Testing Frameworks \/ Build logs<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>And then the survey-based ones, which measure perceptions:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><th>Metric<\/th><th>Definition<\/th><th>Context for Use<\/th><\/tr><tr><td>Trust<\/td><td>The degree of confidence a developer has in the accuracy and safety of AI output.<\/td><td>To identify if developers are &#8220;blindly&#8221; 
following AI or if skepticism is hindering adoption.<\/td><\/tr><tr><td>Reflexive Use<\/td><td>How instinctively a developer turns to AI when a new problem arises.<\/td><td>To measure the behavioral shift in problem-solving habits.<\/td><\/tr><tr><td>Reliance<\/td><td>The self-assessed level of dependency on AI tools to complete daily work.<\/td><td>To monitor for potential skill atrophy or high-dependency risks.<\/td><\/tr><tr><td>Individual Effectiveness<\/td><td>Perceived productivity, impact on the organization, and ability to stay &#8220;in flow.&#8221;<\/td><td>To assess the &#8220;value-add&#8221; from the developer&#8217;s own perspective.<\/td><\/tr><tr><td>Job Satisfaction<\/td><td>The level of fulfillment and contentment a developer feels in their role.<\/td><td>To ensure that AI automation is improving work life rather than creating &#8220;toil.&#8221;<\/td><\/tr><tr><td>Burnout<\/td><td>Physical or mental exhaustion caused by work-related stress.<\/td><td>To monitor whether the increased &#8220;instability&#8221; caused by AI is taxing the team.<\/td><\/tr><tr><td>Personal Ownership<\/td><td>The psychological feeling of &#8220;owning&#8221; the code and its quality.<\/td><td>To prevent the dilution of accountability when AI generates a high volume of code.<\/td><\/tr><tr><td>User-Centric Focus<\/td><td>The extent to which the team prioritizes end-user needs in their workflow.<\/td><td>To check, as a &#8220;multiplier,&#8221; whether AI speed is being directed at the right goals.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>I recommend picking a few of these metrics and sticking to them. I personally prefer telemetry-based metrics because they provide more value than survey responses. 
Survey-based metrics should be used sparingly, as they provide more of a temperature reading for an organization.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/miroslawstaron.github.io\/hallucinations.html#\/5 The shift from manual coding to AI-augmented orchestration is no longer a future &#8211; it is a reality. Software engineers are adopting AI more often and more deeply. However, as organizations pour investment into Generative AI tools, a critical question remains: How do we measure the true return on investment? I asked Gemini to analyze &hellip; <a href=\"https:\/\/metrics.blogg.gu.se\/?p=1020\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;The Synthetic Engineer: Measuring the Real Impact of AI on Software Delivery&#8221;<\/span><\/a><\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"_links":{"self":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/1020"}],"collection":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1020"}],"version-history":[{"count":1,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/1020\/revisions"}],"predecessor-version":[{"id":1022,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/1020\/revisions\/1022"}],"wp:attachment":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1020"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/metrics.bl
ogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1020"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1020"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}