How do software engineers work with ML? — an interesting paper from Microsoft

Machine learning is one of the current hot areas. AI is believed to be the next big breakthrough, and machine learning is the technology behind AI that makes it all possible.

However, ML is also a technology: it’s a software algorithm and a product that needs to be developed. It’s true that the development of ML systems has become much easier in recent years, since TensorFlow, PyTorch and other frameworks are available for free. So, is the problem of developing ML systems solved once we have these frameworks?

No, it’s actually far from that. We still need software engineers to design, implement, deploy and OPERATE these systems in a robust way.

In our research, we studied the adoption of ML in industry in the days before TensorFlow, when ML was still perceived to be “advanced statistics” and deep learning was still called “neural networks” – look at the PDF, and another one here.

Now, if we observe the exponential adoption of ML in industry, we can also see the big companies coming up with mature processes for using ML. An example of that is the paper from Microsoft. The paper describes some of the challenges and, most importantly, it describes the workflow of developing ML systems. This workflow focuses a lot on data – which is metrics 🙂

What I would like to advocate in this post is that we need more statistics and data analysis methods in software engineering education. We should prepare our future software designers to work with data just as much as with programming!

Software analytics at a large scale – article review from IEEE Software in the light of our research on software development speed

In the latest IEEE Software issue we can find an interesting article from our colleagues in Spain, working on software analytics (https://doi-org.ezproxy.ub.gu.se/10.1109/MS.2018.290101357).

Something that has caught my attention is the focus of the platform and visualizations on the code review process. The review speed and the review process are important for software development companies (see our work on this topic:
https://content.sciendo.com/abstract/journals/fcds/43/4/article-p281.xml). However, getting a good dashboard with these measures, one that communicates the goal in the correct way, is not as easy as it looks.

One of the problems is that the dashboard can become too complex – too many measures related to speed can produce contradicting diagrams – e.g. review speed can increase while the integration speed decreases, so what happened to the overall speed?
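To make the contradiction concrete, here is a minimal sketch in Python. The numbers are made up, and the names (`Change`, `summarize`, the two duration fields) are hypothetical, not taken from the article or from our paper. It assumes that speed is measured as elapsed hours per code change, so a lower number means a faster step, and that overall speed is simply the sum of the two steps.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Change:
    # Hypothetical measures for one code change (lower = faster).
    review_hours: float       # from review request to approval
    integration_hours: float  # from approval to integration


def summarize(changes):
    """Average duration per step, plus the end-to-end duration."""
    review = mean(c.review_hours for c in changes)
    integration = mean(c.integration_hours for c in changes)
    return {
        "review": review,
        "integration": integration,
        "end_to_end": review + integration,
    }


# Made-up data for two reporting periods.
before = [Change(10.0, 4.0), Change(14.0, 6.0)]
after = [Change(6.0, 9.0), Change(8.0, 13.0)]

s_before = summarize(before)
s_after = summarize(after)

print("before:", s_before)
print("after: ", s_after)
```

In this toy data the review step gets faster (12 to 7 hours on average) while integration gets slower (5 to 11 hours), and the end-to-end time goes from 17 to 18 hours. Two diagrams, one per step, would tell opposite stories; only a combined measure shows what happened to the overall speed.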

Another problem is that we focus only on speed, but never really discuss how this influences other aspects, e.g. code quality, product quality, maintainability, etc.

In the best of worlds this would be easy, but we live in a world which is not perfect. However, the article from IEEE Software shows that this can be achieved by providing more flexibility in the platform where the dashboard is created.