From Gutenberg to Google and on to AI

Link to the book

I’m often asked what invention I think is the biggest in human history. I don’t have a single biggest one, but I have a short list:

1) Writing – once we learned how to codify knowledge, our progress accelerated tremendously

2) Computing – once we learned how to make complex calculations fast, we started to achieve the impossible – going to the Moon and communicating over the Internet, to name just two.

3) AI – when we learned how to utilize advanced calculations to simulate intelligence, humanity achieved new heights

This book takes us on that kind of journey. It adds a few more steps, like the invention of binary calculation, the Internet, Google, etc., but in essence it follows the same pattern.

What the book does not cover, and what I often wonder about, is the invention of the compiler. Compilers, especially for higher-level programming languages like C, provided the abstraction needed to decouple the nitty-gritty details of computer architectures from the problems we want to solve.

We see a similar development today with LLMs and Agentic AI. They decouple the details of programs from the intents and requirements of the user. We no longer need to know anything about programming to create software that does things for us. Product owners can create prototypes, requirements engineers can test their hypotheses, testers can make sure they do not miss important corner cases – the examples can be multiplied, and that’s just software engineering.

This does not mean that software engineering is solved; as Nvidia’s CEO put it, it means that it has changed. It’s probably the most fun time to be a software engineer, as we can start solving really difficult problems without losing time on implementation details. We also need to know how to design systems based on AI – how to engineer them (BTW: if you are interested in this, here is my latest book that will help you: Link).

I recommend Tom Wheeler’s book to anyone interested in the story of how we invented AI in the first place.

VECS 2026 — The Era of the AI-Defined Vehicle

The VECS 2026 conference in Gothenburg made one thing clear: the transition to Software-Defined Vehicles (SDVs) is no longer a future prediction – it is accelerating rapidly toward total market dominance. I attended both days, and it seems that the best time for software is NOW! For a nerdy software engineer like me, the conference provided a glimpse of a future where software defines everything – AI, yes, but complemented with a lot of good old-fashioned programming, guardrails, and the like.

My Key Takeaways from the Conference:

  • Rapid Market Evolution: While current volumes are relatively low, the global SDV share is projected to jump from 14% in 2025 to 46% by 2035. Similarly, Zonal Architectures are expected to grow from a 5% share today to 40% by 2035.
  • The Rise of Middleware: Middleware is emerging as a critical control point for OEMs. To shorten time-to-market and maintain control over software platforms, OEMs are now partnering to develop joint middleware solutions rather than relying on fragmented supplier systems.
  • China as a Catalyst: The fast pace of Chinese automakers is a primary driver for global change, pushing the industry toward “AI-defined mobility” and the integration of edge AI models. Notably, over 20 OEMs integrated DeepSeek within weeks of its release.
  • The “Software Factory”: Industry leaders like Alwin Bakkenes emphasized that profitability in the electric vehicle sector requires extreme process optimization. This is being achieved through “Software Factories”—modern development concepts where source code is integrated with digital twins for virtual testing and exploration.
  • Hardware Innovation: To control AI workloads, OEMs are increasingly designing their own chips and moving toward 2nd Generation Zonal Architectures, such as the one powering the upcoming Volvo EX60.

The message from VECS 2026 is clear: for the automotive industry to thrive, it must embrace a “machine that builds the machine” philosophy, prioritizing high-performance computing and seamless software integration.

Can You Trust GPT with Your System Design? Testing AI’s Architectural IQ

Image by Vinson Tan (楊祖武) from Pixabay

https://ieeexplore.ieee.org/document/10978937

We’ve all seen Large Language Models (LLMs) write impressive snippets of code or debug a tricky function. But can an AI actually understand the soul of a system? Can it explain the “why” behind a complex architectural decision?

The paper, “Do Large Language Models Contain Software Architectural Knowledge? An Exploratory Case Study with GPT,” puts this to the test. Researchers did a study with 14 software engineers to see if GPT could navigate the Architectural Knowledge (AK) of a massive, real-world system: the Hadoop Distributed File System (HDFS).

The Experiment: AI vs. The Ground Truth
Engineers grilled GPT with questions ranging from basic component identification to deep design rationales. GPT’s answers were then compared against a verified “ground truth” of HDFS documentation.

The Results
The study revealed a fascinating dichotomy in GPT’s performance:

  • Recall was OK: GPT is surprisingly good at “remembering” things. It showed moderate recall, meaning it could often identify the correct architectural components and general concepts buried in its training data.
  • Precision was poor: GPT struggled with accuracy. The model often provided answers that sounded authoritative but were technically incorrect or “hallucinated.”
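The recall/precision split can be made concrete with a small sketch. The component names below are purely illustrative, not the paper’s actual data: the point is only how the two metrics pull apart when a model names plausible-sounding things that are not in the ground truth.

```python
def precision_recall(predicted, ground_truth):
    """Precision and recall of a set of model answers against a ground truth."""
    true_positives = len(predicted & ground_truth)   # items the model got right
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Illustrative only: the model names four components, two of which exist in the docs
predicted = {"NameNode", "DataNode", "BlockPool", "EditLogger"}
ground_truth = {"NameNode", "DataNode", "JournalNode", "ZKFC", "Balancer"}
p, r = precision_recall(predicted, ground_truth)
print(p, r)  # 0.5 0.4 – half the answers are correct, under half the truth is found
```

Moderate recall with low precision is exactly the dangerous combination: the answer contains enough real components to look credible.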

When asked about design rationales (why a specific solution was chosen) or quality attribute solutions, GPT’s performance dipped significantly. It can tell you what is there, but it struggles to explain the engineering trade-offs.

The Takeaway for Architects
The engineers in the study rated GPT’s trustworthiness as only moderate. The verdict is clear: GPT is a fantastic tool for initial discovery and brainstorming, but it cannot be used as a source of truth for critical system design.

The bottom line: treat LLMs as junior architects with a photographic memory but a shaky grasp of logic. They are great for a first draft, but expert human validation remains the most important step in the process.

GenAI: The Architect’s New Brainstorming Buddy, Not a Replacement

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=11015085&casa_token=5hSfww3AlwIAAAAA:eTn9d1W-p95CJtxwAcvPft_bWZB9R8i6P-d1IZBln6MmSF-En1Q4vKdgbejF8w2klKZYeX1VZx4&tag=1

For years, software architects have operated in an “automation gap.” While developers enjoy robust CI/CD pipelines and automated testing, architects have largely relied on manual whiteboarding and expert intuition. With the rise of Generative AI (GenAI), many wonder: Is the gap finally closing?

In this paper, researchers provide a reality check. Their verdict? GenAI is a powerful “tutor” and “brainstormer,” but it isn’t ready to take the captain’s chair.

Where GenAI Shines

The study identifies a high “GenAI Fit” for tasks that are traditionally “loud” and creative. It excels at:

  • Brainstorming: Identifying potential stakeholders or generating design alternatives.
  • Drafting: Creating well-formed Architecturally Significant Requirements (ASRs) from raw notes.
  • Summarization: Condensing complex documentation into digestible views.

Where GenAI Does Not Fit

However, the “gap” remains for high-fidelity tasks. GenAI struggles with objective analysis. It can’t reliably prioritize requirements, verify the correctness of architectural views, or resolve conflicting design decisions. These tasks require the subjective judgment and deep organizational context that only a human architect possesses.

The Future: Hybrid Workflows

The path forward isn’t replacing architects with bots; it’s about hybrid workflows. By pairing GenAI with traditional tools (like static analyzers) to fact-check its “hallucinations,” we can finally automate the tedious parts of architecting while leaving the critical, high-stakes decisions to the experts.

The Bottom Line: Use GenAI to widen your perspective and draft your docs, but keep your hands on the wheel when it comes to the “why” behind your system.

Is Your Microservice Architecture Causing Heartburn? The Cost of Static Chaos on Runtime Speed

Image by stux from Pixabay

https://cs.gssi.it/catia.trubiani/download/2025-ICSA-Correlation-Architecture-Performance-Antipatterns.pdf

In the world of microservices, we often chase the dream of independent deployment, rapid scaling, and resilient services. We focus on the dynamic—the Kubernetes pods autoscaling, the latency spikes, the load balancer metrics. We assume that if we have a robust runtime, our architecture is sound.

But this study suggests we have been ignoring a crucial connection. We are too often treating the symptoms, not the disease.

The research team, using the massive Train Ticket benchmark system, decided to prove something architects have suspected for years: The way you draw your boxes and arrows directly dictates your application’s carbon footprint and response time.

They didn’t just guess; they used advanced tooling to quantify the chaos. By combining service call dependency mapping with Design Structure Matrices (DSM) that also tracked subtle entity-sharing (services talking behind each other’s backs via a shared database), they revealed invisible architectural decay. They matched static Architecture Antipatterns (e.g., “Cliques”—tightly clustered groups that must change together) against dynamic Performance Antipatterns (e.g., “Blobs”—services that become bottlenecks).
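The static side of that analysis can be sketched in a few lines of Python. The dependency map below is a hypothetical, much-simplified stand-in for Train Ticket’s real services, and the brute-force search is only a toy version of what a DSM tool does; it shows the idea of a “Clique”: a group in which every pair of services mutually depends on the other.

```python
from itertools import combinations

# Hypothetical, simplified dependency map (service -> the services it depends on,
# via calls or shared database entities)
deps = {
    "order":   {"payment", "ticket"},
    "payment": {"order", "ticket"},
    "ticket":  {"order", "payment"},
    "route":   {"ticket"},
}

def mutually_coupled(a, b):
    """Two services are coupled when each depends on the other."""
    return b in deps.get(a, set()) and a in deps.get(b, set())

def find_cliques(size=3):
    """Brute-force: groups in which every pair of services is mutually coupled."""
    return [set(group) for group in combinations(deps, size)
            if all(mutually_coupled(a, b) for a, b in combinations(group, 2))]

print(find_cliques())  # one 3-clique: order, payment, ticket
```

A clique like this must change (and often be deployed) together, which is exactly the static decay that later shows up as runtime “Blobs” and latency spikes.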

The results are a wake-up call for any DevOps team trying to scale a legacy monolith that’s masquerading as microservices.

A Roadmap to Technical Debt Management
The impact on practice is clear. This study validates that we must merge static and dynamic analysis. We cannot separate the “Dev” and “Ops.”

Stop Guessing: You cannot optimize what you cannot measure. Utilize tooling that visualizes both runtime traffic and structural dependencies.

Prioritize Refactoring: Performance monitoring based on real operational profiles tells you where the bottleneck is. Combining this with architecture analysis tells you why it is there and which structural repair will deliver the greatest performance ROI.

Green Your Code: Every redundant service call, every unneeded database join, and every “Chatty Service” antipattern is wasted energy. Good architecture is sustainable architecture.

It’s time to stop thinking that Kubernetes will save your tangled architecture. The next time you see a latency spike, don’t just add more pods. Check your blueprints. The fastest system is one that doesn’t have to do unnecessary work.

AI improves productivity in the short term, without decreasing maintainability

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice – Codemanship’s Blog

https://arxiv.org/pdf/2507.00788

There is a lot of interest in Agentic AI and coding assistants, lots of hype, and lots of scare. This paper reports a large-scale experiment on how much coding assistants really help. The authors study 150 developers and find that AI improves short-term productivity without any measurable impact on maintainability.

In this video, they explain a lot of cool things and demystify the use of AI. They find that knowing what you want to do helps a lot when using AI agents – so, again, good programmers will be fantastic, while bad programmers will not have a chance.

Happy reading!

The close future of software engineering

https://www.arxiv.org/pdf/2601.10220

We’re witnessing a transformative shift in embedded software engineering as generative AI moves from a tool to an active participant in development pipelines. In our recent study, we explored how embedded software teams—especially in safety-critical and resource-constrained domains—are adapting to this change. Unlike conventional programming, embedded systems demand determinism, reliability, and traceability, attributes that stochastic, AI-generated artifacts can undermine.

Through qualitative interviews and structured brainstorming with senior engineers across four companies, we identified eleven emerging practices and fourteen challenges shaping generative AI adoption. Central to these practices is the concept of agentic pipelines—multi-agent continuous integration and delivery flows where generative agents collaborate across coding, compiling, testing, and validation. Key practices include designing AI-friendly artifacts, integrating compiler-in-the-loop feedback, and managing prompt repositories for auditability and consistency.
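One of those practices, the prompt repository, can be sketched as a minimal audit log. This is a hypothetical design of my own, not the one from the paper: each prompt/response pair is stored with a content hash, so generated artifacts stay traceable back to the exact prompt that produced them.

```python
import hashlib
import time

class PromptRepository:
    """Minimal audit log: every prompt/response pair is stored with a content hash."""

    def __init__(self):
        self.records = []

    def log(self, agent, prompt, response):
        # The hash ties a generated artifact to the exact prompt that produced it
        digest = hashlib.sha256((prompt + "\n" + response).encode()).hexdigest()
        self.records.append({
            "agent": agent,
            "prompt": prompt,
            "response": response,
            "sha256": digest,
            "timestamp": time.time(),
        })
        return digest

repo = PromptRepository()
digest = repo.log("coder-agent", "Implement CRC32 in C", "uint32_t crc32(...) { ... }")
print(len(digest))  # 64: a stable hex digest for identical prompt/response pairs
```

In a certification context, the same idea would extend to model versions and temperatures, so that an auditor can replay how any artifact in the pipeline came to be.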

Equally important are governance and sustainability concerns. Teams emphasize human-in-the-loop supervision, formal governance frameworks, traceability of models and outputs, and workforce upskilling to responsibly harness AI automation. Our findings reveal that while generative AI offers substantial productivity gains, sustainable adoption in embedded systems hinges on balancing autonomy with accountability—without compromising safety or certification requirements.

What happens if you give a compiler to an LLM…

https://www.arxiv.org/abs/2601.12146

Large Language Models (LLMs) are now central to code generation, but they often produce non-compiling or incorrect programs. We investigate how giving an LLM direct access to a real compiler (gcc) transforms it from a passive code writer into an active programming agent.

We conduct an extensive experiment on 699 real programming tasks in C, using models from 135 M to 70 B parameters. With compiler feedback integrated into the generation loop, the LLMs dramatically improve: compilation success jumps by 5.3 – 79.4 percentage points, syntax errors drop ~75 %, and undefined references drop ~87 %.
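The shape of that generation loop can be illustrated with a small, self-contained sketch. Python’s built-in compile() stands in for gcc, and a toy function stands in for the LLM; both are stubs of my own, chosen only so the example runs anywhere – the loop structure is the point.

```python
def refine_with_compiler(generate, max_rounds=3):
    """Generate code, try to compile it, and feed compiler errors back to the
    generator. `generate(feedback)` stands in for an LLM call; compile() stands
    in for gcc."""
    feedback = None
    source = ""
    for _ in range(max_rounds):
        source = generate(feedback)
        try:
            compile(source, "<llm>", "exec")
            return source, True          # compiled cleanly
        except SyntaxError as err:
            feedback = f"line {err.lineno}: {err.msg}"
    return source, False                 # gave up after max_rounds

# Toy "LLM": the first draft is broken, repaired once it sees the error message
def toy_llm(feedback):
    if feedback is None:
        return "def add(a, b) return a + b"       # missing colon
    return "def add(a, b):\n    return a + b"

source, ok = refine_with_compiler(toy_llm)
print(ok)  # True – fixed after one round of compiler feedback
```

With gcc the try/except would become a subprocess call whose stderr is appended to the next prompt, but the control flow is identical.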

Interestingly, smaller LLMs with compiler feedback can outperform larger models without this access, suggesting that tools like compilers can compensate for model size and reduce energy/compute costs in software applications.

Overall, the study highlights the role software engineering tools play in practical LLM deployment, pushing us toward more interactive, feedback-driven code agents rather than one-shot generators. It’s a promising step toward combining NLP models with existing development ecosystems for better accuracy and efficiency.

New year, new book!

Link to amazon

Throughout 2025, I’ve had a chance to get into the details of programming agents, LLMs, and what have you. Since my term as pro-dean ended, I’ve had a lot of time to do it.

My family has supported me a lot too. Without them, this would not be possible.

So, why did I even think about writing another book, one may wonder. Well, many students and colleagues have asked me how to design good AI software – you know, something beyond just hacking two lines of code together.

I’ve also organized several Hackathons where we learned how to create multi-agent systems and how to work with them. So, I decided it is time to document all my experiences and go deep on the software design. This book is the result of that. This is what the back cover says:

Engineering Generative-AI Based Software discusses both the process of developing this kind of AI-based software and its architectures, combining theory with practice. Sections review the most relevant models and technologies, detail software engineering practices for such systems, e.g., eliciting functional and non-functional requirements specific to generative AI, explore various architectural styles and tactics for such systems, including different programming platforms, and show how to create robust licensing models. Finally, readers learn how to manage data, both during training and when generating new data, and how to use generated data and user feedback to constantly evolve generative AI-based software. As generative AI software is gaining popularity thanks to such models as GPT-4 or Llama, this is a welcomed resource on the topics explored. With these systems becoming increasingly important, Software Engineering Professionals will need to know how to overcome challenges in incorporating GAI into the products and programs they develop.

Here is the link to the book repo: https://github.com/miroslawstaron/engineering_generative_ai_systems

If you want to play around with our agentic framework, here it is online too!

https://github.com/miroslawstaron/agenticAI

Quantum computing, or can we live on Mars?

Once in a while I get to read a book that has nothing to do with my field. It’s mostly for enjoyment. Not many know that I was an astronomy freak when I was a kid. Somewhere in the middle of my primary school, I read books about red dwarfs, black holes, distant galaxies, and even astrophysics. As I said, I was a freak. Then I discovered computers and my interests changed towards them.

In this book, the authors look at the possibility of creating life on Mars and on the Moon. They examine the physics, chemistry, and technology involved. They have read hundreds of published articles and pieces of grey literature, and even contacted scientists all over the globe.

They also speculate about how we would actually govern space exploration. They scrutinize the different laws and treaties that countries have agreed to follow, and look at what happens when there are no laws to follow. They draw parallels to how we govern uncharted territories on Earth and how we think about celestial bodies.

If you are looking for holiday reading, it is a cool book. A bit long, but nicely written, well-designed, and well-prepared.