Stronger features vs. stronger algorithms in ML

I’ve been working with machine learning a bit during the last couple of years. I’ve had great teachers who showed me how to use the algorithms and where to start learning. Thanks to them I understood the importance of different elements of the ML tool chain – data, storage, algorithms, hardware.

I’ve worked on the problem of extracting features from source code so that I can use them to predict whether a specific line of code has a defect or not, in particular whether the defect can be caught during code reviews. I’ve spent about a year on this problem and tested all kinds of combinations, from static code analysis to word embeddings, dictionaries and other NLP mechanisms for understanding the code. Nothing really worked great. I got predictions that were only a bit better than chance.

What was the problem? Well, the problem was the quality of the input data. Since I extracted data, and features from this data, automatically from large code bases (often over 3 MLOC), I often encountered the following problems:

Labeling – I could not pinpoint exactly where the problem was, which meant that I needed to approximate the label, which led to the next problem,

Consistency – when one line was considered good by one person, it could be considered problematic by another one; this meant that I needed to decide how to treat lines that are “suspicious”, and

Scales – when extracting features, some of them were on a scale from 1 to 100, whereas others were on a scale from 1 to 3; this meant that I needed a good scaler to get the features right.
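To illustrate the scales problem, here is a minimal sketch that brings two made-up features – one on a 1–100 range, one on a 1–3 range – to a common [0, 1] range with scikit-learn’s MinMaxScaler (the feature names are invented for illustration):

```python
# Two hypothetical features on very different ranges:
# column 0: a complexity-like metric (1-100), column 1: a severity-like rating (1-3)
from sklearn.preprocessing import MinMaxScaler

X = [[80, 1],
     [10, 3],
     [45, 2]]

# Rescale each column independently to [0, 1] so that neither
# feature dominates distance-based or weight-based learners.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```

Without this step, the 1–100 feature would dwarf the 1–3 feature in most classifiers.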

So, here I am, working on the next implementation of the feature discovery algorithm – an algorithm that extracts features in such a way that each object has distinct characteristics, yet the number of features needed to characterize each object is as small as possible. The algorithm helped me to boost the accuracy of the classification from ca. 50% to over 96%.

I’ve discovered that using simple ML algorithms on a good data set trumps everything else. I used AdaBoost with feature scaling on the good data set, and it was at least twice as good as using LSTM models with word embeddings (which were not bad in themselves) for the same purpose.
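A hedged sketch of this kind of setup – a simple classifier (AdaBoost) combined with feature scaling in a single pipeline. The data here is synthetic, so the numbers do not reproduce the study’s results; the point is only how little code a “simple algorithm plus good features” baseline takes:

```python
# Simple baseline: scale features, then fit AdaBoost, evaluated with 5-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a featurized code data set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

model = make_pipeline(StandardScaler(), AdaBoostClassifier(random_state=0))
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")
```

Getting a baseline like this first makes it much easier to judge whether a heavier model (e.g. an LSTM over embeddings) actually earns its complexity.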

My advice, therefore, is the following:

Start with a simple classification/ML algorithm and do not go into neural networks or other advanced methods,

Learn your data and look at it from several angles; use business intelligence and statistics to understand the dependencies between features (PCA, t-SNE) and chew on the data as long as you can, and

Focus on extracting features from your data, rather than expecting magic from ML; no algorithm can trump good input data and no filtering can trump a good “featurizer”
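For the second piece of advice – looking at the data from several angles – here is a minimal example that projects the same feature matrix to 2D with both PCA and t-SNE, using a standard toy data set as a stand-in for your own features:

```python
# Project a feature matrix to 2D two different ways before any modelling:
# PCA captures global linear structure, t-SNE local neighbourhood structure.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_pca.shape, X_tsne.shape)  # (150, 2) (150, 2)
```

Plotting both projections side by side (e.g. with matplotlib) often reveals clusters or label noise long before any classifier is trained.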


Grit to Great or what we can learn from perseverance

I picked up this book to learn a bit about perseverance and the power of pursuing goals. I hoped to find something I could use for my new year’s resolutions.

It turned out to be a great book about being humble and about getting rejected. Let me explain. The concept of grit means that one has the guts to do something; the resilience to handle rejection; the initiative to start working on the next steps regardless of the outcome; and, finally, the tenacity – the ability to stay focused on the goals.

The last one is important for new year’s resolutions, but resilience is an interesting quality too. One can go on autopilot for the mundane things, but still needs resilience when things go wrong. Sounds a bit like academic careers. We plan studies, conduct them, try to publish, get rejected, improve the papers, try to publish again, and so on.

We also need initiative to move our field of study forward. We need to come up with new project ideas and submit research proposals. Get rejected. Fine-tune the proposals, resubmit somewhere else, and so on.

Finally, guts matter. Researchers need the guts to take on big problems, to plan and conduct studies, and to speak in front of large audiences. Yes, speaking is not something that comes easily to most of us. We still need to prepare and figure out what we want to say and how. We need to adjust our talks to the audience, the message and the goal of the talk.

It’s a great book to get some motivation for the work after the vacations. Work hard, publish, apply for funding and work even harder. Amidst all of that, please remember that you need to have students with you and that they need your attention too!

Measurement instruments

In my current PhD class I teach my younger colleagues how to work with measurements. It may sound straightforward, but one of the challenges is the data collection. I’ve written about this in my previous posts.

So, here I propose to look at the video recording of the lecture about it.

In the recording, we discuss different types of measurement instruments and how they are related to the measurement concepts like measured entity, measured attribute, base measure and measurement method.
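As a toy illustration of these concepts, here is a hypothetical sketch in Python – the class and function names are my own invention, not a standard API. It models a measured entity (a source file), a measured attribute (size), a measurement method (counting non-blank lines) and the resulting base measure:

```python
from dataclasses import dataclass

@dataclass
class BaseMeasure:
    entity: str       # the measured entity (e.g. a source file)
    attribute: str    # the measured attribute (e.g. size)
    value: int        # the result of applying the measurement method

def measure_loc(entity_name: str, source: str) -> BaseMeasure:
    """Measurement method: count non-blank lines of code."""
    loc = sum(1 for line in source.splitlines() if line.strip())
    return BaseMeasure(entity=entity_name, attribute="size (LOC)", value=loc)

m = measure_loc("main.c", "int main() {\n\n    return 0;\n}\n")
print(m.value)  # 3
```

Separating the entity, the attribute and the method like this makes it explicit which part of the measurement instrument changes when, say, the counting rule is revised.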

Midnight in Chernobyl or how small mistakes can lead to great disasters

Once in a while I pick up a book about something outside of my expertise. As a kid I lived ca. 200 km from Chernobyl, where the biggest nuclear disaster happened in April 1986 (I actually remember that day and the few days after). The book got my interest because of its subtitle – the untold story of the nuclear disaster. I, admittedly, wanted to know what the disaster looked like from the side of the operators.

No one really knows what the long-term effects of the disaster really are (after all, 30+ years is not such a long term), but it’s interesting to see how the disaster happened and what we can learn from it in software engineering.

So, in short, the disaster happened because of a combination of factors.

First, the design of the reactor was flawed. The mix of substances used in the reactor had properties that raised the reaction when they should have lowered it, or raised it when it was not monitored constantly.

Second, the implementation of the design – the construction of the power plant – was not great either. Materials of lower specs were used due to shortages in the USSR. The workers did not care much about state property, and the 5-year plans trumped safety, security measures and even common sense.

Third, and no less important, the operations did not follow the instructions. The operators did not follow the instructions for the test they were about to commence. They reduced the power below the limit and then executed the test. Instead, they should have stopped the reactor and run the test during the next available window.

So, what does it have to do with software engineering? There was no software malfunction, but a set of human errors.

IMHO, this accident teaches us about the importance of safety mechanisms in software. I believe that many of us who design software do not think much about the potential implications of what we do. We get a set of requirements, which we implement. However, what we should do is look more broadly at how users can use our system and how we can prevent any potential disaster.

Take, for example, an app for a game. Should we allow people to play the game as much as they want? Should we serve them all kinds of commercials? Or should we help them by saying that they have played long enough and could consider a break? Or maybe we should filter the commercials if we know that the game is played by a child?

I think that this is something we need to consider a bit more. We should even discuss it when we design our curricula and decide how we implement them.

Is it the data or the company’s needs that come first?

When discussing data-driven development and the use of data to identify new features and products, it is always the needs of the organization that come first. Companies design the system, articulate the organization’s needs, and then design experiments that will provide the organization with the data needed to validate the hypothesis.

However, there is also another way. A while ago, we studied how large organizations work with their measurement programs. What the theories prescribed, back then, was that organizations should only look at their goals and needs. What we discovered was that it was a combination – what the company needs and what it can actually measure. The reality of the organizations we studied was that not all needs could be fulfilled by the data they had, or by the data they could possibly have.

I’ve read an interesting piece on Hackernoon, a website about new technologies.

The article did not have anything to do with measurement programs, but it had a lot to do with data. Its content was about global apps, but what caught my attention was the concept of providing the user with feedback on what they can do with the data, rather than on what data is needed for the task.

Sounds a bit crazy, but I think that it’s an important step towards real data-driven development. Imagine that instead of discussing what we should do and how to do it, we could take a look at the data and immediately know what we can do.

If we know directly what we can do with the data, then we can just do it (or not), rather than spend time discussing whether we can or cannot.

What it also means is that we can think more about the product than about the data. We can think about which features should be developed or dropped from the product. We do not even need to design experiments; we can just observe the products in the field.

Action research in software engineering

Software engineering is an applied scientific area. It includes working with industrial applications and solving challenges that modern organizations face today.

Thanks to many of my colleagues, I’ve had the opportunity to work with industry-embedded research since I arrived here in Gothenburg. I want to share these experiences with colleagues and students, which led me to writing a book about action research.


This book addresses action research (AR), one of the main research methodologies used for academia-industry research collaborations. It elaborates on how to find the right research activities and how to distinguish them from non-significant ones. Further, it details how to glean lessons from the research results, no matter whether they are positive or negative. Lastly, it shows how companies can evolve and build talents while expanding their product portfolio.

The book’s structure is based on that of AR projects; it sequentially covers and discusses each phase of the project. Each chapter shares new insights into AR and provides the reader with a better understanding of how to apply it. In addition, each chapter includes a number of practical use cases or examples. Taken together, the chapters cover the entire software lifecycle: from problem diagnosis to project (or action) planning and execution, to documenting and disseminating results, including validity assessments for AR studies.

The goal of this book is to help everyone interested in industry-academia collaborations to conduct joint research. It is for students of software engineering who need to learn about how to set up an evaluation, how to run a project, and how to document the results. It is for all academics who aren’t afraid to step out of their comfort zone and enter industry. It is for industrial researchers who know that they want to do more than just develop software blindly. And finally, it is for stakeholders who want to learn how to manage industrial research projects and how to set up guidelines for their own role and expectations.

AI Superpowers

On the eve of 2019, I got the time to read my copy of AI Superpowers. I must admit that I was sceptical towards it at the beginning. I’ve read a fair number of AI books and many of them were quite superficial – a lot of text, but not much novelty. However, this book seemed to be different.

First of all, the book is about the innovators and the transformations from low-tech to high-tech. The transformation is described as a process of learning: first copying the solutions of others, then making your own; first learning the market, then creating your own. Finally, the examples of building a software start-up ecosystem build on these small cases.

Second of all, the book discusses an issue that I’ve advocated for a while – the ability to utilise the data at hand. The European GDPR is a great tool for us, but it can stop innovation. China’s lack of a GDPR is a problem, but also a possibility. However, it needs to be tackled or it will never be fair. The description of the wars between companies shows that the scene in China is not like the one in Silicon Valley. It’s not great, but it was a mystery to me before; I had not really reflected upon it.

I guess that looking at the holistic picture of how AI will affect society is not very common. Well, maybe except for the doomsday prophecies about how AI will take our jobs. This book is a bit different in that respect. It looks at the need for basic income and how this could reshape society. It discusses how this can be done both on the technical and on the social level. For a preview, take a look at how Kai-Fu Lee predicts that AI will affect our work.

Finally, I’ve got a number of ideas from the book – ideas I can use in the upcoming course about start-ups. I strongly recommend the book to my students and to all entrepreneurs who want to understand the possibilities of this new technology. I also recommend it to people who believe in doomsday prophecies about AI – the revolution is near, but AI will not be like the Terminator. More like HAL 🙂

AI Christmas


The Christmas holidays are coming and it’s a busy time in academia. The old semester is coming to an end, deadlines pile up, papers need to be written, finances need to be reported, and a new semester is about to start.

This year I thought a bit about reflecting on what happened in 2019. Well, the biggest thing was the AI Competence project. It took a lot of time to prepare and a lot of people to coordinate. It’s also been a super exciting time, as I’ve learned a lot about AI.

We organized seminars and courses about AI in Law, AI in journalism, AI in schools. Basically, it turned out that AI is everywhere and influences all kinds of professions. The last seminar is about AI and ethics. I’m not very good with ethics so I will not talk about that.

What I would like to talk about is AI and the holidays. Yes, it is only half-serious, but the holiday season is coming up, so let’s see.

First, let’s look at the concept of trustworthiness (10.1109/MITP.2019.2913265). AI can be trustworthy or not, and we can also trust it or not. If we look at the confusion matrix based on these two dimensions, we can quickly see that the most problematic case is when we trust an AI that cannot be trusted. Then we can be fooled, with disastrous consequences – we can get killed if we trust an AI that flies a plane and it is malicious. I do not believe this is very likely, but it can happen – who knows whether the software we construct actually wishes us well? In 100 million LOC of software we cannot really check that.

Second, holidays are often about forgiving sins and those who trespass against us. So, can we teach AI to forget? One big issue a few years ago was the so-called “right to be forgotten”, i.e. the right of an individual to ask to be removed from search histories, etc. Can we ask AI to forget us? And if it does not show results related to us, has it really forgotten? And if it forgets, does it only forget the “bad” things and not the “good” ones?

Third, holidays are often about being grateful for something. We’re often grateful for our families, health, life, friends. We can be grateful for basically anything. But can AI be grateful? Does AI have friends? Does it need a family? I don’t think it does – and what does that mean? I guess we will see what happens, but I hope that AI systems will start to understand the need for these values. In the cases of AlphaGo or AlphaStar, where different types of algorithms were linked together (reinforcement learning and deep learning), did these two algorithms understand that they needed each other to succeed? This is as close to friendship as I could find in AI, but I have not found any evidence of gratefulness.

I guess that the concept of forgiveness is also important. Most of us do not reflect on it, but let’s look at a simple case of kids using the internet. They make mistakes – they are kids, they learn – is it fair that their mistakes can be remembered forever? Wired took up this topic and raised the concern that we can never be forgotten. Once our data enters the super-complex machinery of AI and algorithms – trained, adjusted, customized – this data can never leave that system. Maybe it will not be linked to us, but it will never be forgotten.

Well, to sum up: I think that AI is not ready for the world, though the world is maybe ready for AI. We are happy that our holiday bookings are done through AI, and that planes are scheduled using AI and flown by it. However, we can compare an AI to a person who takes everything as true and never forgets. Kind of like Captain Kirk from Star Trek. Would this person like the holiday season?

So, let’s be happy and grateful that the AI, in a general sense, lives inside computers and does not walk around our world. Our holidays are, most certainly, better because of that.

Merry Christmas!

Goals – KPIs – Effects, or which door should I choose?


“Alice: Which way should I go?
Cat: That depends on where you are going.
Alice: I don’t know.
Cat: Then it doesn’t matter which way you go.”
– Lewis Carroll, Alice in Wonderland

Many companies like to think that they need metrics to improve, which is often not true – they only improve when they show effects. This post is about my observations on what kinds of metrics lead to effects and how to think about the effects. So, choosing the right measure is mostly about choosing the right goal.

Recently I’ve been helping organizations design measurement programs from scratch. Every time I encounter an organization that tries to establish a program, they start somewhere in the middle: not the end, not the beginning, but in the middle.

In order to illustrate this in a good way, I’ve looked at the Swedish Innovation Agency Vinnova’s effect-logic measurement framework (described in Swedish).

In short, that kind of measurement set-up requires two levels of measurement, but let’s start with the entire chain, presented in the figure below. The figure is my own interpretation of Vinnova’s framework. In particular, I added the goals, which are extremely important in measurement.

First, we need to define which goals we want to address. Then, we plan which activities we need to conduct to achieve these goals. Then we define the results from the project – what we deliver to address the goals. The results can be measured and quantified. The results also lead to some effects, which is often something that we can do thanks to these results. Finally, these new events and activities can lead to quantifiable effects.

So, how does this translate to the field of software engineering and software measurement? Let’s consider an example: we want to increase the quality of source code integrated to the main branch. We plan a project where we study the code review and testing practices. We deliver new methods which can speed up the code reviews. Our first measure, which we can turn into an indicator, is the duration of the review. The effect of this, as we anticipate, is the fact that the number of features delivered to the main branch will be higher (our effect measurement, or PI). Finally, the long term effect is that we can get more customers as we have more features.
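The example above can be expressed as data, assuming an invented target threshold and invented review durations – a tiny sketch of how a base measure (review duration) becomes an indicator:

```python
# Base measure: duration of each code review, in hours (invented values).
review_durations_hours = [12, 30, 8, 50, 6]

# Indicator: mean review duration compared against a target.
mean_duration = sum(review_durations_hours) / len(review_durations_hours)
target_hours = 24  # hypothetical target, set by the goal owner

indicator = "green" if mean_duration <= target_hours else "red"
print(mean_duration, indicator)
```

The effect measure (features delivered to the main branch) would be tracked the same way, one level further down the chain.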

We can easily identify measures and indicators here. This is all thanks to the fact that we put our story in a specific way – starting from the goal and ending in the effects.

So, instead of asking what to measure, first look into the goals and expected effects. Once you have these, it will be easy to identify the measures and indicators.

Good storage and traceability in ML4SE

Feature excavation and storage 🙂 image from pixabay

In the last post I discussed the need to create good features from the data, and that this trumps the choice of the algorithm. Today, I’ve chosen to share my observations on the need for good data storage for machine learning. It’s a no-brainer, and every data scientist knows that this is important.

However, what data scientists and machine learning specialists struggle with is which data to store, and how.

Imagine a case when you develop a system that takes data from a Jenkins build system. It’s easy to collect the raw data from Jenkins using a REST API. You know how to do it, so you do not store the raw data – you just extract the features and dump the raw data. A week later you try to collect it again and the data is not there, or is different, incomplete, manipulated. You just wanted to add one more feature to the data set, but you cannot, because the raw data is no longer available.
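A minimal sketch of the lesson: persist the raw records before extracting features, so that new features can be derived later without re-querying the server. The build records below are invented stand-ins for what a Jenkins REST API might return:

```python
import json
from pathlib import Path

# Invented stand-ins for raw build records fetched from Jenkins.
raw_builds = [
    {"number": 101, "result": "SUCCESS", "duration": 320},
    {"number": 102, "result": "FAILURE", "duration": 150},
]

# 1. Store the raw data first -- it may not be retrievable next week.
Path("raw_builds.json").write_text(json.dumps(raw_builds, indent=2))

# 2. Only then extract features; a new feature can later be added by
#    re-reading raw_builds.json instead of re-querying the server.
features = [
    {"build": b["number"], "failed": b["result"] != "SUCCESS"}
    for b in json.loads(Path("raw_builds.json").read_text())
]
print(features)
```

The extra `duration` field is kept in the raw file even though no feature uses it yet – that is exactly the point.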

In our work with metrics and machine learning we realized that we need to store all data: raw data, featurized data, metrics, and even the decisions made on this data. Why? Because of traceability. All of this is driven by the constant evolution of software engineering.

First, we need to store the raw data because our feature extraction techniques evolve and we need to add new features. For example, a company adds a new field in Jenkins or uses a new tag when adding comments. We can use that information, but we probably need to recompute the features for the entire data set.

Second, we need to store all intermediate metrics and decisions, as we need to know whether the evolved data or evolved algorithms actually work better than the previous ones. Precision, recall and F1-scores are too coarse-grained to understand whether an improvement or deterioration is real, or whether it goes in the right or wrong direction.

Finally, we need to store the decisions as we need to know what we actually improve. We often store recommendations, but very seldom store decisions. We can use Online Experiment Systems (see publications by J. Bosch) in order to keep track of the results and of the decisions.
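As a sketch of that last point, here is a tiny append-only log that records the recommendation, the decision taken on it, and the metrics that were current at that point. The record structure is my own invention, not an established schema:

```python
import json

decision_log = []

def record(recommendation: str, decision: str, metrics: dict) -> None:
    """Append one traceability record: what was suggested, what was
    decided, and the model metrics at the time of the decision."""
    decision_log.append({
        "recommendation": recommendation,
        "decision": decision,
        "metrics": metrics,
    })

record("refactor module X", "accepted", {"precision": 0.91, "recall": 0.84})
record("drop feature Y", "rejected", {"precision": 0.91, "recall": 0.84})

print(json.dumps(decision_log, indent=2))
```

With a log like this, a later claim that “the improvement is real” can be traced back to the actual decisions and the metrics that motivated them.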

From my experience of working with companies, I see that keeping the raw data is not a problem, although it is sometimes neglected. However, many companies neglect to store the decisions, so when an improvement is made, there is no evidence that the improvement is real.