This is great that we have increasingly more tools and that AI engineering matures. We do not want to have precision and recall steer our development. We want to have real testing and real systems. We also need to work with data quality in order to ensure that the systems are reliable.
The alternative is that we use MCC, precision, recall, F1-score to tell us how good a system is, which is not entirely true. These measures do not provide any view on how the system reflects the requirements put on it. These measures allow us to compare different classifiers, but not systems.
I hope that we can focus more discussion on AI quality and not classification quality/accuracy.
Being a software engineer working with AI for a while, I noticed that the engineering of AI systems is different. Well, maybe not building the actual system, but the way in which the knowledge about quality, testing and maintenance differ.
In this article, IEEE Software’s Editor in Chief presents her view on the topic. The main point is that this engineering is both similar and different. This quote from the paper summarizes it nicely: “I argue that our existing design techniques will not only help us make progress in understanding how to design, deploy, and sustain the structure and behavior of AI-enabled systems, but they are also essential starting points. I suggest that what is different in AI engineering is, in essence, the quality attributes for which we need to design and analyze, not necessarily the design and engineering techniques we rely on. “
One of the differences is the process of development. It is not aligned with the non-ML systems, e.g. in terms of training, testing, maintenance. ML systems are data-centric and this needs to be reflected in the AI engineering processes.
Ipek Ozkaya discusses the following misconceptions about the differences:
We can specify systems – both AI and non-AI systems cannot really be fully specified,
System correctness can be verified – we can never fully verify systems, neither AI-based on non-AI based (e.g. due to complexity),
We can avoid hidden dependencies,
We can manage system change propagation,
Frameworks do it all,
We can build reliable systems from unreliable and unpredictable subcomponents
I recommend this article to get a quick overview of the gist of the differences and misconceptions.
New sub-areas or fields within software engineering are not that common, but they come up once in a while. The authors of this article (https://doi-org.ezproxy.ub.gu.se/10.1007/s10664-020-09808-9, Better software analytics via “DUO”: Data mining algorithms using/used-by optimizers) argue that this is the case now.
In this article, the authors provide a view that data mining and building optimization models are done in tandem and that this is the new field. They show that the data mined from repositories influences optimization models and that the development of models influences data mining.
The authors make the following claims (quoted from the paper, references removed):
Claim1:For software engineering tasks, optimization and data mining are very similar. Hence, it is natural and simple to combine the two methods.
Claim2:For software engineering tasks. optimizers can greatly improve data miners. A data miner’s default tuners can lead to sub-optimal performance. Automatic optimizers can find tunings that dramatically improve that performance.
Claim3:For software engineering tasks, data miners can greatly improve optimization. If a data miner groups together related items, an optimizer can explore and report conclusions that are general across a set of solutions. Further, optimization for SE problems can be very slow. But if that optimization executes over the groupings found by a data miner, that inference can terminate orders of magnitude faster.
Claim4:For software engineering tasks, data mining without optimization is not recommended. Conclusions reached from an unoptimized data miner can be changed, sometimes even dramatically improved, by running the same tuned learner on the same data. Researchers in data mining should, therefore, consider adding an optimization step to their analysis.
These claims make a lot of sense and they are aligned with my observations. I recommend this article for everyone who is working at or developing a metric team or a data analysis/data science team.
Agile software development is here to stay. Software engineering has been changed completely when it arrived and I cannot see how it will change back to waterfall.
Even the so-called post-agile methodologies are built on the same premises – collaboration within teams, empowerement, taking responsibility. Since we educate our future software engineers to be self-organizing, to take responsibility for their topics, projects and personal development, we will live with see a development in this direction.
At the same time, the tools that are needed to support these teams evolve. When I started working as a software engineer, Rational ClearCase was the go-to tool for version control. Now, my students don’t even recognize it. Well, Rational does not exist any more either – the last time I checked it was bought by IBM, no one knows what happens now.
In this paper, the authors take a look at how modern collaboration patterns look like and which tools are used there. They have observed that consolidation of tools (e.g. towards Slack) introduced very positive dynamics to the team. However, they also found that some automated notifications (e.g. from Jenkins) got lost in these new tools, which was not appreciated.
This case study is a longitudinal one, which is a great example of how a research team builds up a long-term collaboration, so important for modern academia and industry!
Creating recommendation systems is a tricky task. We need to add the temporal domain to the data. In particular, we need to make sure that we capture what was recommended before to the specific user and how the user reacted upon that. We also need to capture the evolution of the users and the data.
In this paper, the authors present a framework, RectoLibry, which helps to construct these kind of systems. The system captures both the parts of the development of the recommendations, but also their deployment.
Our research team has been working with code comments for a while now. We have done analyses of source code comments and the code that is commented. We have also worked with the needs for code comments.
The results showed that AI support for code reviewing process is very much needed. However, it has also shown that the current tools are not good enough yet. One of the tools is DeepCode.ai, which analyzes code repository and finds problems in the code. It is a great tool, but it has been trained on large data sets from open source projects, which makes it tricky to use on proprietary software. It does not know how to capture the project specific characteristics.
I’ve recently cam across this article (https://link-springer-com.ezproxy.ub.gu.se/content/pdf/10.1007/s10664-019-09730-9.pdf) which is about generating code comments. It can generate code comments (not review comments) for the Java programming language. It “understands” what the code does and generates a natural language description of the code. The tool is based on the latest work from the NLP domain and has been trained on over 44 million statements, which is quite a number😊
The tool is not perfect (yet), but shows improvement over existing approaches and is definitely opening up new alleys of using NLP in Software Engineering.