Understanding anomalies in software data

Image by Pixabay

Identifying and classifying anomalies in software engineering data is a well-established field. Using ML to identify intrusion attacks, credit card fraud, or defects in production systems are just a few examples of how broad the field is. Wherever we have data, we can have anomalies.

In our project together with Sahlgrenska University Hospital and Chalmers AI Research Center (Chalmers AI Research Centre – Chair | Chalmers), anomalies come in two shapes. One type of anomaly is a disturbance in a radio network, such as rain or wind. The other type is a specific kind of event during surgery, such as clamping of the carotid artery.

Both types of anomalies have similarities, but also differences, which gives us an opportunity to study which anomaly detection algorithms work best. We tried both generic ML algorithms and domain-specific ones. Well, spoiler alert – not much has actually worked.
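
To make the "generic ML algorithms" part concrete, here is a minimal sketch of the kind of off-the-shelf detector we mean, applied to a plain numeric time series. The windowing, the contamination rate, and the synthetic disturbance are illustrative assumptions, not the project's actual data or settings.

```python
# Sketch: an off-the-shelf anomaly detector on a numeric time series
# (think of a radio-link quality signal). All parameters are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
signal = rng.normal(0.0, 1.0, size=2_000)   # "normal" operation
signal[1_200:1_210] += 6.0                   # a short, strong disturbance

# Turn the series into overlapping windows so the detector sees local shape.
window = 20
X = np.lib.stride_tricks.sliding_window_view(signal, window)

detector = IsolationForest(contamination=0.01, random_state=0)
labels = detector.fit_predict(X)             # -1 marks anomalous windows

anomalous_windows = np.where(labels == -1)[0]
print("flagged windows start at indices:", anomalous_windows[:10])
```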

What works, on the other hand, is pivoting the problem. Instead of identifying anomalies in general, we search for anomalies of a specific type. Instead of defining an anomaly as something that deviates from normal operation, we say that we look for specific, though rare, events.
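
The sketch below shows what that pivot looks like in code: once the rare event has a label, detecting it becomes a (heavily imbalanced) supervised classification task rather than open-ended anomaly detection. The features, the class ratio, and the class_weight choice are my own illustrative assumptions.

```python
# Sketch of the pivot: train a supervised detector for one specific,
# rare event type using labeled windows. All data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 16))                  # windowed features
y = np.zeros(5_000, dtype=int)
rare = rng.choice(5_000, size=50, replace=False)
y[rare] = 1                                       # the specific event (~1%)
X[rare] += 2.0                                    # the event leaves a signature

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" compensates for the rarity of the positive class.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), digits=3))
```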

So far, we can identify anomalies pretty well, and we are working on classifying them automatically. So stay tuned if you would like to know more.

Testing of ML systems

Image by OpenClipart-Vectors from Pixabay

Smoke testing for machine learning: simple tests to discover severe bugs | SpringerLink

Machine learning systems are very popular today, at least when it comes to research applications. They are not as popular as one would wish (or like) in real applications. One of the reasons is that they are hard to test. We do not know how to check whether an algorithm will behave as expected in all similar situations – well, we do not even know which situations are similar for us and for the ML system.

This paper looks at the problem from a different angle. Its research question is: What are simple and generic software tests that are capable of finding bugs and improving the quality of machine learning algorithms?

The authors developed a set of smoke tests which they argue all ML algorithms should pass. The paper is quite thorough, and if you are interested, I recommend taking a look at this table:

Table 1 | Smoke testing for machine learning: simple tests to discover severe bugs | SpringerLink
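
To give a feeling for what such smoke tests look like in practice, here is a small sketch for a scikit-learn-style classifier. These are not the paper's exact tests – the concrete checks and the toy dataset are my own illustrative assumptions – but they show the flavour: cheap, generic checks that any reasonable ML algorithm should pass.

```python
# Sketch of generic smoke tests for a scikit-learn-style classifier.
# The specific checks are illustrative, not the paper's test suite.
import numpy as np
from sklearn.linear_model import LogisticRegression

def smoke_test_classifier(make_model):
    X = np.random.default_rng(1).normal(size=(40, 3))
    y = (X[:, 0] > 0).astype(int)

    # 1. Training on a tiny, well-formed dataset must not crash.
    model = make_model()
    model.fit(X, y)

    # 2. Predictions must have the right shape and contain no NaNs.
    pred = model.predict(X)
    assert pred.shape == (40,)
    assert not np.isnan(pred).any()

    # 3. The model must only predict labels it has actually seen.
    assert set(np.unique(pred)) <= set(np.unique(y))

    # 4. Fitting twice on the same data must give the same predictions
    #    (determinism, given a fixed random_state).
    model2 = make_model()
    model2.fit(X, y)
    assert (model2.predict(X) == pred).all()

smoke_test_classifier(lambda: LogisticRegression(random_state=0))
print("all smoke tests passed")
```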

I love the article. It is simple, to the point, and very applied. I’m going to use it in my tests of ML algorithms in the future.