SE metrics (Software Engineering) – Page 2 – Software engineering, metrics, functional safety …

Can You Trust GPT with Your System Design? Testing AI’s Architectural IQ

Image by Vinson Tan ( 楊祖武 ) from Pixabay

https://ieeexplore.ieee.org/document/10978937

We’ve all seen Large Language Models (LLMs) write impressive snippets of code or debug a tricky function. But can an AI actually understand the soul of a system? Can it explain the “why” behind a complex architectural decision?

The paper, “Do Large Language Models Contain Software Architectural Knowledge? An Exploratory Case Study with GPT,” puts this to the test. The researchers ran a study with 14 software engineers to see if GPT could navigate the Architectural Knowledge (AK) of a massive, real-world system: the Hadoop Distributed File System (HDFS).

The Experiment: AI vs. The Ground Truth
Engineers grilled GPT with questions ranging from basic component identification to deep design rationales. GPT’s answers were then compared against a verified “ground truth” of HDFS documentation.

The Results
The study revealed a fascinating dichotomy in GPT’s performance:

Recall was OK: GPT is surprisingly good at “remembering” things. It showed moderate recall, meaning it could often identify the correct architectural components and general concepts buried in its training data.
Precision was really bad (guessing is much better): It struggled with accuracy. The model often suffered from lower precision, frequently providing answers that sounded authoritative but were technically incorrect or “hallucinated.”
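To make the two measures concrete, here is a minimal sketch of how recall and precision can be computed when an answer is treated as a set of named components. The component lists are made up for illustration and are not taken from the paper, and the paper's own scoring procedure may well differ from this simple set comparison.

```python
def precision_recall(answer: set[str], ground_truth: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of an answer set against a ground-truth set."""
    true_positives = len(answer & ground_truth)
    # Precision: share of the given answers that are actually correct.
    precision = true_positives / len(answer) if answer else 0.0
    # Recall: share of the correct items that the answer managed to name.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Illustrative data only (an assumption, not the study's ground truth).
ground_truth = {"NameNode", "DataNode", "Secondary NameNode", "Block"}
model_answer = {"NameNode", "DataNode", "Block",
                "JobTracker", "TaskTracker", "RegionServer"}

precision, recall = precision_recall(model_answer, ground_truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.50 recall=0.75
```

In this toy case the answer finds three of the four real components (decent recall) but pads them with three items that do not belong to HDFS (poor precision), which is the pattern described above.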

When asked about design rationales (why a specific solution was chosen) or quality attribute solutions, GPT’s performance dipped significantly. It can tell you what is there, but it struggles to explain the engineering trade-offs.

The Takeaway for Architects
The engineers in the study rated GPT’s trustworthiness as only moderate. The verdict is clear: GPT is a fantastic tool for initial discovery and brainstorming, but it cannot be used as a source of truth for critical system design.

The Bottom Line: Treat LLMs as junior architects with a photographic memory but a shaky grasp of logic. They are great for a first draft, but expert human validation remains the most important step in the process.

GenAI: The Architect’s New Brainstorming Buddy, Not a Replacement

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=11015085&casa_token=5hSfww3AlwIAAAAA:eTn9d1W-p95CJtxwAcvPft_bWZB9R8i6P-d1IZBln6MmSF-En1Q4vKdgbejF8w2klKZYeX1VZx4&tag=1

For years, software architects have operated in an “automation gap.” While developers enjoy robust CI/CD pipelines and automated testing, architects have largely relied on manual whiteboarding and expert intuition. With the rise of Generative AI (GenAI), many wonder: Is the gap finally closing?

In this paper, researchers provide a reality check. Their verdict? GenAI is a powerful “tutor” and “brainstormer,” but it isn’t ready to take the captain’s chair.

Where GenAI Shines

The study identifies a high “GenAI Fit” for tasks that are traditionally “loud” and creative. It excels at:

Brainstorming: Identifying potential stakeholders or generating design alternatives.
Drafting: Creating well-formed Architecturally Significant Requirements (ASRs) from raw notes.
Summarization: Condensing complex documentation into digestible views.

Where it does not fit!

However, the “gap” remains for high-fidelity tasks. GenAI struggles with objective analysis. It can’t reliably prioritize requirements, verify the correctness of architectural views, or resolve conflicting design decisions. These tasks require the subjective judgment and deep organizational context that only a human architect possesses.

The Future: Hybrid Workflows

The path forward isn’t replacing architects with bots; it’s about hybrid workflows. By pairing GenAI with traditional tools (like static analyzers) to fact-check its “hallucinations,” we can finally automate the tedious parts of architecting while leaving the critical, high-stakes decisions to the experts.
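As a rough illustration of such a hybrid workflow, the sketch below checks a set of dependencies that a GenAI tool claims a module has against what a trivial static analysis finds. The "static analyzer" is just Python's standard `ast` module, and `check_claims`, the claimed set, and the sample source are all hypothetical. The paper does not prescribe this setup.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Statically collect the top-level module names imported by Python source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def check_claims(claimed: set[str], source: str) -> dict[str, set[str]]:
    """Compare dependencies claimed by a model with those found in the code."""
    actual = imported_modules(source)
    return {
        "confirmed": claimed & actual,    # claim backed by the code
        "unsupported": claimed - actual,  # claim with no evidence (possible hallucination)
        "missed": actual - claimed,       # real dependency the model did not mention
    }


# Hypothetical module under review.
SOURCE = """
import json
from urllib.request import urlopen
import os.path
"""

# Hypothetical model output: "this module depends on json, requests and urllib".
report = check_claims({"json", "requests", "urllib"}, SOURCE)
print(report)
```

The human architect then only needs to look at the "unsupported" and "missed" entries instead of re-checking every statement by hand.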

The Bottom Line: Use GenAI to widen your perspective and draft your docs, but keep your hands on the wheel when it comes to the “why” behind your system.

Is Your Microservice Architecture Causing Heartburn? The Cost of Static Chaos on Runtime Speed

Image by stux from Pixabay

https://cs.gssi.it/catia.trubiani/download/2025-ICSA-Correlation-Architecture-Performance-Antipatterns.pdf

In the world of microservices, we often chase the dream of independent deployment, rapid scaling, and resilient services. We focus on the dynamic—the Kubernetes pods autoscaling, the latency spikes, the load balancer metrics. We assume that if we have a robust runtime, our architecture is sound.

But this study suggests we have been ignoring a crucial connection. We are too often treating the symptoms, not the disease.

The research team, using the massive Train Ticket benchmark system, decided to prove something architects have suspected for years: The way you draw your boxes and arrows directly dictates your application’s carbon footprint and response time.

They didn’t just guess; they used advanced tooling to quantify the chaos. By combining service call dependency mapping with Design Structure Matrices (DSM) that also tracked subtle entity-sharing (services talking behind each other’s backs via a shared database), they revealed invisible architectural decay. They matched static Architecture Antipatterns (e.g., “Cliques”—tightly clustered groups that must change together) against dynamic Performance Antipatterns (e.g., “Blobs”—services that become bottlenecks).
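To give a feel for the idea, here is a small sketch that builds a DSM from service calls plus shared entities and then looks for groups of services that all depend on each other, directly or indirectly. The service names and data are invented, shared entities are assumed to couple services in both directions, and "mutually reachable group" is only an approximation of the Clique antipattern. The tooling used in the study is certainly more sophisticated.

```python
from itertools import permutations


def build_dsm(calls, shared_entities):
    """Build a DSM as dsm[source][target] = set of dependency marks."""
    dsm: dict[str, dict[str, set[str]]] = {}
    for caller, callee in calls:
        dsm.setdefault(caller, {}).setdefault(callee, set()).add("call")
        dsm.setdefault(callee, {})
    for entity, services in shared_entities.items():
        # Assumption: sharing an entity couples the services in both directions.
        for a, b in permutations(sorted(services), 2):
            dsm.setdefault(a, {}).setdefault(b, set()).add(f"shares:{entity}")
    return dsm


def mutually_dependent_groups(dsm):
    """Return groups (size > 1) of services that can all reach each other."""
    services = sorted(dsm)
    reach = {s: set(dsm[s]) for s in services}
    changed = True
    while changed:  # naive transitive closure, fine for small systems
        changed = False
        for s in services:
            new = set().union(*(reach[t] for t in reach[s])) - reach[s]
            if new:
                reach[s] |= new
                changed = True
    groups, seen = [], set()
    for s in services:
        if s in seen:
            continue
        group = {s} | {t for t in reach[s] if s in reach[t]}
        seen |= group
        if len(group) > 1:
            groups.append(sorted(group))
    return groups


# Hypothetical system: two services call each other, two share a database entity.
calls = [("order", "payment"), ("payment", "order"), ("order", "notify")]
shared_entities = {"Ticket": ["order", "user"]}

dsm = build_dsm(calls, shared_entities)
groups = mutually_dependent_groups(dsm)
print(groups)
# prints: [['order', 'payment', 'user']]
```

Note that `user` never calls anyone here. It lands in the group only because of the shared `Ticket` entity, which is the kind of hidden coupling that call graphs alone would miss.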

The results are a wake-up call for any DevOps team trying to scale a legacy monolith that’s masquerading as microservices.

A Roadmap to Technical Debt Management
The impact on practice is clear. This study validates that we must merge static and dynamic analysis. We cannot separate the “Dev” and “Ops.”

Stop Guessing: You cannot optimize what you cannot measure. Utilize tooling that visualizes both runtime traffic and structural dependencies.

Prioritize Refactoring: Performance monitoring based on real operational profiles tells you where the bottleneck is. Combining this with architecture analysis tells you why it is there and which structural repair will deliver the greatest performance ROI.

Green Your Code: Every redundant service call, every unneeded database join, and every “Chatty Service” antipattern is wasted energy. Good architecture is sustainable architecture.

It’s time to stop thinking that Kubernetes will save your tangled architecture. The next time you see a latency spike, don’t just add more pods. Check your blueprints. The fastest system is one that doesn’t have to do unnecessary work.

AI improves productivity in the short term, without decreasing maintainability

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice – Codemanship’s Blog

https://arxiv.org/pdf/2507.00788

There is a lot of interest in agentic AI and coding assistants, a lot of hype, and a lot of fear. This paper reports a large-scale experiment on how much coding assistants really help. The authors studied 150 developers and found that AI improves short-term productivity without any impact on maintainability.

In this video, they explain a lot of cool things and demystify the use of AI. They find that knowing what you want to do helps a lot when using AI agents – so, again, good programmers will be fantastic, while bad programmers will not have a chance.

Happy reading!

The near future of software engineering

https://www.arxiv.org/pdf/2601.10220

We’re witnessing a transformative shift in embedded software engineering as generative AI moves from a tool to an active participant in development pipelines. In our recent study, we explored how embedded software teams—especially in safety-critical and resource-constrained domains—are adapting to this change. Unlike conventional programming, embedded systems demand determinism, reliability, and traceability, attributes that stochastic, AI-generated artifacts can undermine.

Through qualitative interviews and structured brainstorming with senior engineers across four companies, we identified eleven emerging practices and fourteen challenges shaping generative AI adoption. Central to these practices is the concept of agentic pipelines—multi-agent continuous integration and delivery flows where generative agents collaborate across coding, compiling, testing, and validation. Key practices include designing AI-friendly artifacts, integrating compiler-in-the-loop feedback, and managing prompt repositories for auditability and consistency.

Equally important are governance and sustainability concerns. Teams emphasize human-in-the-loop supervision, formal governance frameworks, traceability of models and outputs, and workforce upskilling to responsibly harness AI automation. Our findings reveal that while generative AI offers substantial productivity gains, sustainable adoption in embedded systems hinges on balancing autonomy with accountability—without compromising safety or certification requirements.

What happens if you give a compiler to an LLM…

https://www.arxiv.org/abs/2601.12146

Large Language Models (LLMs) are now central to code generation, but they often produce non-compiling or incorrect programs. We investigate how giving an LLM direct access to a real compiler (gcc) transforms it from a passive code writer into an active programming agent.

We conduct an extensive experiment on 699 real programming tasks in C, using models from 135 M to 70 B parameters. With compiler feedback integrated into the generation loop, the LLMs dramatically improve: compilation success jumps by 5.3 – 79.4 percentage points, syntax errors drop ~75 %, and undefined references drop ~87 %.

Interestingly, smaller LLMs with compiler feedback can outperform larger models without this access, suggesting that tools like compilers can compensate for model size and reduce energy/compute costs in software applications.

Overall, the study highlights the role software engineering tools play in practical LLM deployment, pushing us toward more interactive, feedback-driven code agents rather than one-shot generators. It’s a promising step toward combining NLP models with existing development ecosystems for better accuracy and efficiency.
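For readers who want to picture the feedback loop, here is a minimal sketch, not the setup used in the paper. The `generate` callable is a hypothetical stand-in for an LLM call, and `gcc_diagnostics` assumes `gcc` is installed and on the PATH. It uses `-fsyntax-only`, so it reports compile errors but would not catch linker problems such as undefined references. Those would need a full build.

```python
import os
import subprocess
import tempfile
from typing import Callable, Optional


def gcc_diagnostics(source: str) -> str:
    """Check C source with gcc; return the error output, or '' if it compiles."""
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            ["gcc", "-fsyntax-only", path], capture_output=True, text=True
        )
        return "" if result.returncode == 0 else result.stderr
    finally:
        os.unlink(path)


def generate_with_feedback(
    generate: Callable[[str, Optional[str]], str],  # hypothetical LLM wrapper
    task: str,
    check: Callable[[str], str] = gcc_diagnostics,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for code, compile it, and feed the diagnostics back until it compiles."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        source = generate(task, feedback)  # feedback is None on the first round
        feedback = check(source)
        if not feedback:
            return source                  # compiles cleanly
    return None                            # gave up after max_rounds attempts
```

A call would look like `generate_with_feedback(my_llm, "write a function that reverses a string")`, where `my_llm(task, feedback)` is whatever function wraps your model and puts the compiler messages into the next prompt.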

New year, new book!

Link to Amazon

Throughout 2025, I had the chance to dig into the details of programming agents, LLMs, and what have you. Because my term as pro-dean ended, I’ve been given a lot of time to do it.

My family has supported me a lot too. Without them, this would not be possible.

So, why did I even think about writing another book, one may wonder. Well, many students and colleagues have asked me how to design good AI software. You know, something that goes beyond just hacking two lines of code together.

I’ve also organized several Hackathons where we learned how to create multi-agent systems and how to work with them. So, I decided it is time to document all my experiences and go deep on the software design. This book is the result of that. This is what the back cover says:

Engineering Generative-AI Based Software discusses both the process of developing this kind of AI-based software and its architectures, combining theory with practice. Sections review the most relevant models and technologies, detail software engineering practices for such systems, e.g., eliciting functional and non-functional requirements specific to generative AI, explore various architectural styles and tactics for such systems, including different programming platforms, and show how to create robust licensing models. Finally, readers learn how to manage data, both during training and when generating new data, and how to use generated data and user feedback to constantly evolve generative AI-based software. As generative AI software is gaining popularity thanks to such models as GPT-4 or Llama, this is a welcomed resource on the topics explored. With these systems becoming increasingly important, Software Engineering Professionals will need to know how to overcome challenges in incorporating GAI into the products and programs they develop.

Here is the link to the book repo: https://github.com/miroslawstaron/engineering_generative_ai_systems

If you want to play around with our agentic framework, here it is online too!

https://github.com/miroslawstaron/agenticAI

Quantum computing, or can we live on Mars?

Once in a while I get to read a book that has nothing to do with my field. It’s mostly for enjoyment. Not many know that I was an astronomy freak when I was a kid. Somewhere in the middle of my primary school, I read books about red dwarfs, black holes, distant galaxies, and even astrophysics. As I said, I was a freak. Then I discovered computers and my interests changed towards them.

In this book, the authors look at the possibility of creating life on Mars and on the Moon. They look at the physics, chemistry, and technology related to the planets. They’ve read hundreds of published articles and grey literature, and even contacted scientists all over the globe.

They also speculate about how we would actually govern space exploration. They scrutinize the laws and treaties that countries have agreed to follow, and they look at what happens if there are no laws to follow. They draw parallels to how we govern uncharted territories on Earth and how we think about celestial bodies.

If you are looking for a holiday read, it is a cool book. A bit long, but nicely written, well designed, and definitely well prepared.

The thinking machine, or machine that makes machines

A colleague of mine recommended this book to me. At first, I was a bit skeptical, because these books can either be very good or just praise for the person who solicited them. This book is a mix of both.

Artificial Intelligence, and in particular Generative AI, is probably the hottest technology in town. Everybody we know talks about it, everybody we know wants to use it, but almost no one gets it to work at industrial scale.

Well, that is true with a bit of modification. We have the OpenAIs and Anthropics of this world. They have built their entire businesses based on providing models to the public. We also have the Googles and the Microsofts who created tons of customer value from selling products built on top of these models.

This is a great inspirational book. It talks about the rise of NVIDIA, where we get to see how the founders were thinking when they created their products. We get to know that CUDA, the most powerful piece of software today, was created by a few individuals who were doubted by the rest of the company.

However, it is also a story about creating the most valuable company in the world, which creates a lot of GDP for one nation and which monopolized the hardware used for advanced mathematical calculations. It is also a story of the man who made it all happen.

When I read this book, I got inspired to work even harder, to explore even more technologies, even faster. I hope that this book will have the same effect on others.

The Singularity is nearer

Image generated by Claude, text written by myself 🙂

https://www.adlibris.com/sv/bok/the-singularity-is-nearer-9780399562761

I was on vacation this week, so I managed to read a few books. One of them was Ray Kurzweil’s book about AI. It’s a continuation of the classic “The Singularity is Near” by the same author. I like both of them.

Now, this book is very much like the 2027 report: it essentially says that the future is what we make of it. If we make it dark, AI will make it darker, but if we make it bright, AI will make it brighter.

Instead of fearing AI, we should use it for the betterment of humankind. We should use it to develop more software and make the software better. We should also use it to make us better: we can be better programmers thanks to it. However, if we just copy the software from AI to an editor, we’re not going to do great…

We can use software to cure cancer, create more medicine and make better products. We also should evolve as humanity: instead of taking jobs from one another, we should create more jobs for ourselves. We should also think about creating the UBI (Universal Basic Income), as there may be fewer jobs for us. It’s not a bad thing: machines will do more jobs for us, so we need to make sure that we can live off these new inventions.

Having read this book, I strongly recommend it to anyone who has doubts about a future with AI in it.