SE metrics (Software Engineering) – Page 17 – Software engineering, metrics, functional safety …

AI Christmas

Image by Oberholster Venita from Pixabay

The Christmas holidays are coming and it’s a busy time in academia. The old semester is coming to an end, deadlines pile up, papers need to be written, economy reported, new semester to start.

This year I though a bit about making a reflection on what has happened in 2019. Well, the biggest thing was the AI Competence project. It took a lot of time to prepare and a lot of persons to coordinate. It’s also been a super exciting time as I’ve learned a lot about AI.

We organized seminars and courses about AI in Law, AI in journalism, AI in schools. Basically, it turned out that AI is everywhere and influences all kinds of professions. The last seminar is about AI and ethics. I’m not very good with ethics so I will not talk about that.

What I would like to talk about AI and holidays. Yes, it is half-serious, but it’s holiday season coming up, so let’s see.

First, let’s look at the concept of trustworthiness (10.1109/MITP.2019.2913265). AI can be trustworthy or not, we can also trust it or not. If we look at the confusion matrix based on that, we could quickly see that the most problmatic challenge is when we trust AI and it cannot be trusted. Then we can be fooled and it can have disastrous consequence – we can get killed if we trust an AI that flies a plane and it is malicious. I do not believe this is very likely, but can happen – who knows whether the software we construct is actually wishing us well? In a 100 million LOC software we cannot really check that.

Second, holidays are often about forgetting sins and those that trespass against us. So, can we teach AI to forget (https://futurism.com/how-to-make-ai-forget). One big issue a few years ago was the so-called “right to be forgotten”, i.e. The right of an individual to ask to be removed from search histories, etc. Can we ask AI to forget us? And if it does now show results related to us, has it really forgotten? And if it forgets, does it only forget the “bad” things and not the “good” ones?

Third, holidays are often about being grateful for something. We’re often grateful for our families, health, life, friends. We can be grateful for basically anything. But, can AI be grateful? Does Ai have friends? Does it need a family? I don’t think it does, and what does it mean? I guess that we can still see what will happen, but I hope that AI systems will start to understand the need of these values. In cases of AlphaGo or AlphaStar, where different types of algorithms were linked together (reinforced learning and deep learning), did these two algorithms understood that they need each other to succeed? This is as close to friendship as I could find about AI, but I have not found any evidence about gratefulness.

I guess that the concept of forgiveness is also important. Most of us do not reflect about it, but let’s look at a simple case of kids using internet. They make mistakes – they are kids, they learn – is it fair that their mistakes can be remembered forever (http://www.kasparov.com/ai-never-forgets-so-we-must-teach-it-to-forgive-avast-blog-post-september-22nd-2019/). Wired took up this topic (https://www.wired.com/story/the-next-big-privacy-hurdle-teaching-ai-to-forget/) and raised a concern that we can never be forgotten. Once our data enters the super-complex machinery of AI and algorithms, they are trained, adjusted, customized, this data can never leave that system. Maybe it will not be linked to us, but it will never be forgotten.

Well, to sum up. I think that AI is not ready for the world. The world is maybe ready for the AI. We are happy that our holiday bookings are done through AI, planes are scheduled using AI and flown by them. However, we can compare an AI to a person who takes everything as true and never forgets. Kind of kaptain Kirk from Star Trek. Would this person like a holiday season?

So, let’s be happy and grateful that the AI, in a general sense, lives inside computers and does not walk around our world. Our holidays are, most certainly, better because of that.

Merry Christmas!

Goals – KPIs – Effects, or which door should I choose?

picture: Image by Arek Socha from Pixabay

Alice: Which way should I go? Cat: That depends on where you are going. Alice: I don’t know. Cat: Then it doesn’t matter which way you go.” Lewis Caroll, Alice in Wonderland

Many companies like to think that they need metrics to improve, which is often not true – they only improve when they show effects. This post is about my observations what kind of metrics lead to effects and how to think about the effects. So, choosing the right measure is mostly about choosing the right goal.

Recently I’ve been helping organizations to design measurement programs from scratch. Every time I encounter an organization which tries to establish the program, they start somewhere in the middle: not the end, not the beginning, but in the middle.

In order to illustrate this in a good way, I’ve looked at the Swedish Innovation Agency Vinnova’s effect-logic measurement: https://www.vinnova.se/globalassets/utlysningar/2018-02242/omgangar/mall-effektlogik.pdf916281.pdf (in Swedish).

In short, that kind of set-up of measurements requires two levels of measurements, but let’s start from the entire chain. The chain is presented in figure below. The figure is my own intepretation of Vinnova’s framework. In particular I add the goals, which are extremely important in measurement.

First, we need to define which goals we want to address. Then, we plan which activities we need to conduct to achieve these goals. Then we define the results from the project – what we deliver to address the goals. The results can be measured and quantified. The results also lead to some effects, which is often something that we can do thanks to these results. Finally, these new events and activities can lead to quantifiable effects.

So, how does this translate to the field of software engineering and software measurement? Let’s consider an example: we want to increase the quality of source code integrated to the main branch. We plan a project where we study the code review and testing practices. We deliver new methods which can speed up the code reviews. Our first measure, which we can turn into an indicator, is the duration of the review. The effect of this, as we anticipate, is the fact that the number of features delivered to the main branch will be higher (our effect measurement, or PI). Finally, the long term effect is that we can get more customers as we have more features.

We can easily identify measures and indicators here. This is all thanks to the fact that we put our story in a specific way – starting from the goal and ending in the effects.

So, instead of asking what to measure, first look into the goals and expected effects. Once you have these, it will be easy to identify the measures and indicators.

Good storage and traceability in ML4SE

Feature excavation and storage 🙂 image from pixabay

In the last post I’ve discussed the need to create good features from the data, and that this trumps the choice of the algorithm. Today, I’ve chosen to share my observations on the need of good data storage for machine learning. It’s a no brainer and everyone data scientist knows that this is important.

However, what the data scientists and machine learning specialists struggle with is which data to store and how?

Imagine a case when you develop a system that takes data from a Jenkins build system. It’s easy to collect the raw data from Jenkins using a REST API. You know how to do it, so you do not store the raw data – you just extract the features and dump the raw data. A week after you try to collect it again and the data is not there or is different, incomplete, manipulated. You just wanted to add one more feature to the data set, but you cannot, because the raw data is now available.

In our work with metrics and machine learning we realized that we need to store all data, raw data, featurized data, metrics and even decisions made on this data. Why, because of the traceability. All of that is caused by the constant evolution of software engineering.

First, we need to store the raw data as our feature extraction techniques evolve and thus we need to add new features. For example, a company adds a new field in Jenkins or uses a new tag when adding comments. We can use that information, but we probably need to change it for the entire data set.

Second, we need to store all intermediate metrics and decisions as we need to know whether the evolved data or evolved algorithms actually work better than the previous ones. Precision, recall and F1-scores are too coarse grained to actually understand if the improvement/deterioration is real or in the right/wrong direction.

Finally, we need to store the decisions as we need to know what we actually improve. We often store recommendations, but very seldom store decisions. We can use Online Experiment Systems (see publications by J. Bosch) in order to keep track of the results and of the decisions.

From my experience of working with companies, I see that keeping the raw data is not a problem, although it happens that it is neglected. I see that many companies neglect to store the decisions, so when an improvement is made, there is no real evidence that the improvement is real.

AI Ethics – a programmer’s perspective

Image by Tumisu from Pixabay

I’ve been working with machine learning for a while and observed the discussion about AI and ethics. From the philosophical perspective the discussion is very much problem-oriented; the discussion is about “paper cuts” from using AI.

I’ve recently looked at the article from SDTimes (sdtimes.org) about AI Ethics (https://sdtimes.com/ai/ai-ethics-early-but-formative-days/) and its early days. I’ve also looked at the book behind this article: The big nine by Amy Webb (https://www.amazon.com/Big-Nine-Thinking-Machines-Humanity/dp/1541773756). It seems that the discussion there misses an important point – that AI is based on machine learning algorithms, which are applied statistical methods.

This applied nature of AI means that it is algorithms using data to make decisions. For me, as a programmer, this poses an important threat – how can I know what is ethical and what is not ethical if it is not in the data? What does ethics mean in terms of programming – how can I evaluate the ethics?

 I can break it down into a few programming challenges:

Requirements on ethics – how can requirements on ethics be expressed?

Measurements of ethics – how can we measure that something is ethical or not?

Implementation and traceability of ethics – where does ethics get implemented? Should I look for it in the code? Where?

In the first part, the philosophers could help a great deal. They can point us to the direction of how to reason about ethics and what kind of data we should collect or use when training ML algorithms.

In the second part, we as software engineer researchers, can help. Once we know what ethics can be, we can quantify it. We can even use statistics to approximate ethics for a given algorithm. However, I’ve not seen any approach for that.

Finally, if we know how to measure ethics, we can try to link that to code and try to approximate some sort of traceability of ethics in the program code – at least to start with. Later we can even trace the ethics requirements in the code, just as we profile functions for resources and trace safety requirements.

Well, these are just some of my throughs on the topic. As I said in the title – they are from the perspective of a programmer and researcher applying ML.

For further reading, I recommend to read a great piece of work in ACM Communications: AI judges and juries – Communications of the ACM, Vol. 61 Issue 12.

How do software engineers work with ML? — an interesting paper from Microsoft

Machine learning is one of the current hot areas. As AI is believed to be the next big breakthrough, machine learning is the technology behind AI that makes it all possible.

However, ML is also a technology, it’s a software algorithm and product that needs to be developed. It’s true that the development of ML systems has become much easier in the last years, since TensorFlow, PyTorch and other frameworks are available for free. So, is the problem of developing ML system solved once we have these frameworks?

No, it’s actually far from that. We still need software engineers to design, implement, deploy and OPERATE these systems in a robust way.

In our research, we studied the adoption of ML in industry in the days before tensorflow, where ML was still perceived to be “advanced statistics” and when deep learning was still called “neural networks” – look at the PDF, and another one here.

Now, if we observe the exponential adoption of ML in industry, we can also catch the big companies to come with mature processes on how to use ML. An example of that is the paper from Microsoft. The paper describes some of the challenges, and, the most important, it describes the workflow of developing ML system. This workflow is focused a lot on data – which is metrics 🙂

What I would like to advocate in this post is that we need to have more statistics and data analysis methods in software engineering education. We should prepare our future software designers to work with data equally as to work with programming!

Software analytics in the large scale – article review from IEEE Software in the light of our research on software development speed

In the latest IEEE Software issue we can find an interesting article from our colleagues in Spain, working on software analytics (https://doi-org.ezproxy.ub.gu.se/10.1109/MS.2018.290101357).

Something that has caught my attention is the focus of the platform and visualizations on the code review process. The review speed and the review process are important for software development companies (see our work on this topic:
https://content.sciendo.com/abstract/journals/fcds/43/4/article-p281.xml). However, to get a good dashboard with these measures, which communicates the goal in the correct way is not as easy as it looks.

One of the problems is that the dashboard is too complex – too many measures related to speed can cause contradicting diagrams – e.g. review speed can increase but the integration speed can decrease, so what happened with the entire speed?

Another problem is that we focus only on speed, but never really discuss how this influences other aspect, e.g. code quality, product quality, maintainability, etc.

In the best of words this would be easy, but we live in a world which is not perfect. However, the article from IEEE Software shows that this can be achieved by providing more flexibility in the platform where the dashboard is created.

Software Analytics or Software Factfulness

I’ve recently read Hans Roslund’s book “Factfulness”, which is about the ability to recognize patterns and analyze data in the global scale. The book is about global trends like poverty, education, health, etc. Not much, if anything, about software engineering.

However, when reading I constantly thought about its relation to software analytics. In software analytics we look at software products and activities with the goal to find patterns and to understand what’s going on with our products. We produce diagrams and knowledge that stems from these diagrams. Although we provide a lot of information about the software, I’ve not really seen much work on the interpretation of software analytics.

The examples of software analytics, which I’ve stumbled on, usually are about designing diagrams and triggering decisions. They are rarely about putting this in context. How do we know that we need to take actions to reduce complexity? Is our product exceptionally complex? or is it just that all software products are getting complex?

Maybe we should dedicate more time to discuss consequences about software analytics — become more factful!

Link to the book: https://www.gapminder.org/factfulness-book/

Measuring Agile Software Development

It’s been a while since I blogged last, but does not mean that our team is not working:) Quite the contrary.

In the last few months we were busy with the investigation of how to measure agile software development and DevOps. We have looked at the companies that are about to make a transformation from waterfall and V to Agile. We also looked at the companies that did that recently and that did that kind of transformation a while back.

We found that the information needs evolve rapidly as companies evolve.

Companies willing to transform/in-transformation focus on measuring the improvement of their operations. They want to be faster, provide more features in shorter time frames, increase the quality. They also want to measure how much they transformed.

Companies that have just transformed focus on following agile practices despite that there is no such thing. They seek measurements that are “agile”, and often end up with measures of velocity, backlogs and customer reactiveness. They are happy to be agile and move on.

However, after a while they discover that these measures (i) do not have anything to do with their product, (ii) do not really care about long-term sustainability of their business, so they look at the mature agile companies.

Mature agile companies, however, focus on the products and customers. They look at the stability of their products and on the development of their business models. They focus on architectural stability and automation rather than on velocity and story points.

I hope that you enjoy the presentation on the topic that we soon give at VESC in Gothenburg.

Measuring Agile Software Development from Miroslaw Staron

How good is your measurement program?

One of our work – the MESRAM model for assessing the quality of measurement programs – has been used by our colleagues to evaluate measurement programs at two different companies: https://doi.org/10.1016/j.infsof.2018.06.006

The paper shows how easy it is to use the model and that it provides very nice results in terms of how well they reflect the real quality of the program.

If you are interested in these results from the Software Center metrics project, please also visit the original paper: https://doi.org/10.1016/j.jss.2015.10.051

And also a few papers that help you assess the quality of your KPIs and metrics: https://doi.org/10.1109/IWSM-Mensura.2016.033

Trailer about the metrics project

Dissemination of research results in the age of YouTube is not very easy. I would say it’s quite impossible. That’s why I’ve tried to make it a bit more interesting and made this trailer with the use of iMovie.

It’s my first edited video, so please be nice to it!

The link to the video at GU Play: https://play.gu.se/media/Metrics+theme+trailer/0_2f0dw0uz