Using Deep Learning to Understand Code

One of our software center activities is focused on reducing the effort that the designers spend on code analysis and quality assurance. In this project we are looking at creating a model for high and low quality code – in general.

Now I’ve come across this nice paper about using deep learning for finding whether code is more readable or not: https://doi.org/10.1016/j.infsof.2018.07.006

The paper is written by a research team from City University of Hong Kong and Beijing University of Technology. It presents a method that has been evaluated against human reviewers and is based on techniques that require no feature engineering. The authors report that it outperforms the previous approaches, yet requires less effort to set up.

The paper also makes it possible to reuse the code. Great and very interesting reading.

In Software Center, we are creating a deep learning model that can learn the quality of code from code review tools and reduce the review effort by an order of magnitude. Please take a look at our presentation from the Software Center Metrics Day.

Stay tuned!

Software data fuels AI, ML and Software Analytics

I’ve talked about software analytics in the previous post, in particular the latest issue of IEEE Software. In this post, let me introduce an interesting book for software engineers and software engineering scientists interested in software analytics: Bird, C., Menzies, T., & Zimmermann, T. (Eds.). (2015). The Art and Science of Analyzing Software Data. Elsevier.

After reading a few chapters, one conclusion emerged – the fact that modern software analytics is not about algorithms, it’s about data and its collection. It’s about measurement, quantification and metrics. Even the analysis of qualitative data is often done using measurements in order to speed it up.

Harvard Business Review claimed that “Big Data is Not the New Oil”, as there are fundamental differences between the scarce fossil fuel and the abundant data from software projects (https://hbr.org/2012/11/data-humans-and-the-new-oil). However, even though data is not scarce, I believe that it will fuel the software industry for at least one more decade.

Therefore, we still need to teach our students how to work with data, how to collect and analyse it, and how to assess its value. We also need to understand how to monetise the data.

Software analytics, the next thing for software metrics in modern companies

The hot summer in Europe provided a lot of time for relaxation and contemplation:) I’ve spent some of the warm days reading articles for the upcoming SEAA session on software analytics, which is a follow-up to the special issue of IST: https://doi.org/10.1016/j.infsof.2018.03.001

Software analytics, simply put, is using data and its visualisation to make decisions about software development. The typical data sources, both in literature and observed in many companies, are:

  1. Source code measurements from Git
  2. Defect data from JIRA
  3. Requirements data
  4. Customer data, a.k.a. field data
  5. Performance/profiling data from running the system
  6. Process data from time reporting systems, Windows journals, etc.

These data sources allow us to find bottlenecks both in the performance of our software and in the progress of our development.

Software analytics has been at the heart of such paradigms as the MVP from The Lean Startup, where it provides the ability to steer which features are developed and which are abandoned.

Our experiences with software analytics are described in the book Software Development Measurement Programs, chapter 5: https://www.springer.com/us/book/9783319918358

 

KPI – what’s the major challenge in making them work in software organizations?

Our Software Center project has worked with a number of companies to increase the impact of KPIs in modern organizations. Although the concept of KPI has been around since the 90s, many organizations still struggle with making KPIs actionable.

In this post, I’ll show the results of one of the recent assessments of KPIs. To understand how the KPIs work in practice, I’ve asked about 20 managers to assess some of the KPIs used in their organizations. We used a simplified model of KPI quality, developed last spring. The results are presented in the figure below.

The figure shows what the gut feeling would tell us: the major quality problem with the KPIs is the lack of clear guidelines on how to react. The company has no problem with the mathematics, the quantification or even the presentation. The major challenge is the analysis model and the action model linked to it.

How to change this situation?

1. Create an action plan – what to check when the indicator shows red?

2. Find the stakeholder who has the right mandate to act.

3. Make sure that the stakeholder checks the status of the indicator regularly.

4. Make sure that the indicator stays updated and maintained.

If the above cannot be fulfilled, then it makes no sense to have the KPI: remove it, forget it and move on to another business goal.
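
The four checks above can be sketched as a small script. This is only a minimal illustration, not the quality model from our paper: the `Kpi` record, its field names, the "higher is worse" direction and the 14-day freshness limit are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Kpi:
    """A hypothetical KPI record; all field names are illustrative."""
    name: str
    value: float
    red_above: float                      # assumed: higher values are worse
    stakeholder: Optional[str] = None     # who has the mandate to act
    action_plan: List[str] = field(default_factory=list)  # what to check on red
    last_checked: Optional[date] = None   # when the stakeholder last looked
    last_updated: Optional[date] = None   # when the value was last refreshed

    def is_red(self) -> bool:
        return self.value > self.red_above


def kpi_problems(kpi: Kpi, today: date, max_age_days: int = 14) -> List[str]:
    """Return the checklist items (1-4 above) that the KPI fails."""
    max_age = timedelta(days=max_age_days)
    problems = []
    if not kpi.action_plan:
        problems.append("no action plan for a red indicator")
    if not kpi.stakeholder:
        problems.append("no stakeholder with a mandate to act")
    if kpi.last_checked is None or today - kpi.last_checked > max_age:
        problems.append("status is not checked regularly")
    if kpi.last_updated is None or today - kpi.last_updated > max_age:
        problems.append("indicator is not kept up to date")
    return problems


def should_keep(kpi: Kpi, today: date) -> bool:
    """Keep the KPI only if every checklist item is fulfilled."""
    return not kpi_problems(kpi, today)
```

Running `kpi_problems` over all indicators on a dashboard gives a quick list of KPIs that are candidates for removal.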

To read more about how we assess the quality of KPIs, take a look at this paper:

Staron, Miroslaw, Wilhelm Meding, Kent Niesel, and Alain Abran. “A Key performance indicator quality model and its industrial evaluation.” In Software Measurement and the International Conference on Software Process and Product Measurement (IWSM-MENSURA), 2016 Joint Conference of the International Workshop on, pp. 170-179. IEEE, 2016.

Link: https://ieeexplore.ieee.org/abstract/document/7809605/

Measuring readability of code…

Recently, I had an interesting discussion about code qualities that are seldom part of software research. An example of such a quality is readability, which is the degree to which we can read the code correctly.

Low readability does not have to lead to defects in the code right away, but in the long run it does. In the context of software engineering of products that evolve over a long time, readability is dangerously close to understandability and therefore also very close to modifiability and correctness.

I’ve come across the following paper recently:

Scalabrino, S., Linares-Vásquez, M., Oliveto, R. and Poshyvanyk, D., 2017. A Comprehensive Model for Code Readability, published in Software Evolution and Maintenance journal.

The paper designs a set of textual features that can help to quantify readability. Let me quote the abstract:

“…the models proposed to estimate code readability take into account only structural aspects and visual nuances of source code, such as line length and alignment of characters. In this paper, we extend our previous work in which we use textual features to improve code readability models. We introduce 2 new textual features, and we reassess the readability prediction power of readability models on more than 600 code snippets manually evaluated, in terms of readability, by 5K+ people. […] The results demonstrate that (1) textual features complement other features and (2) a model containing all the features achieves a significantly higher accuracy as compared with all the other state‐of‐the‐art models. Also, readability estimation resulting from a more accurate model, ie, the combined model, is able to predict more accurately FindBugs warnings.”
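
To make the idea of readability features more concrete, here is a minimal sketch of how a few simple measures could be computed for a snippet. These are not the features from the paper: the function name, the choice of measures and the regular expression are my own assumptions, and a real model would feed many more features into a trained classifier.

```python
import re
from typing import Dict

# Word-like tokens; this also matches language keywords, not only identifiers.
_WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def simple_readability_features(snippet: str) -> Dict[str, float]:
    """Compute a few illustrative structural and textual measures."""
    lines = [line for line in snippet.splitlines() if line.strip()]
    if not lines:
        return {"avg_line_length": 0.0, "max_line_length": 0.0,
                "avg_indent": 0.0, "avg_word_length": 0.0}

    lengths = [len(line) for line in lines]
    # Indentation is counted in leading spaces only (tabs are ignored here).
    indents = [len(line) - len(line.lstrip(" ")) for line in lines]
    words = _WORD.findall(snippet)

    return {
        "avg_line_length": sum(lengths) / len(lines),
        "max_line_length": float(max(lengths)),
        "avg_indent": sum(indents) / len(lines),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }
```

The resulting dictionary can be used as one row of a feature table, with the human readability ratings as the label.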

How to validate software measures – list of attributes from a systematic review

During the weekend I did some digging into the quality of measurement. In particular, I tried to answer a question from a colleague about the limits of measurement accuracy. Well, instead of digging into the accuracy, I ended up looking at the validation of measures in general.

I’ve been searching for how people evaluate software measures and I came across this nice paper from Laurie Williams and colleagues: https://dl.acm.org/citation.cfm?id=2377661

This systematic review lists 47 criteria used to evaluate software metrics, combining both the empirical and theoretical validation. Here is the list of what they found:

  • A priori validity
  • Actionability
  • Appropriate Continuity
  • Appropriate Granularity
  • Association
  • Attribute validity
  • Causal model validity
  • Causal relationship validity
  • Content validity
  • Construct validity
  • Constructiveness
  • Definition validity
  • Discriminative power
  • Dimensional consistency
  • Economic productivity
  • Empirical validity
  • External validity
  • Factor independence
  • Improvement validity
  • Instrument validity
  • Increasing growth validity
  • Interaction sensitivity
  • Internal consistency
  • Internal validity
  • Monotonicity
  • Metric Reliability
  • Non-collinearity
  • Non-exploitability
  • Non-uniformity
  • Notation validity
  • Permutation validity
  • Predictability
  • Prediction system validity
  • Process or Product Relevance
  • Protocol validity
  • Rank Consistency
  • Renaming insensitivity
  • Repeatability
  • Representation condition
  • Scale validity
  • Stability
  • Theoretical validity
  • Trackability
  • Transformation invariance
  • Underlying theory validity
  • Unit validity
  • Usability

The list is really impressive, but not all attributes apply to all types of metrics. So, one should always start from the intended use of a metric and then seek the right type of validation for it. I recommend this article as great reading for those who are thinking about creating their own metrics:)

Research agenda for Continuous * (article highlight)

I’ve come across an interesting article summarizing the recent developments in continuous software engineering and related fields. The research has been done by Brian Fitzgerald and his colleagues from LERO: https://doi.org/10.1016/j.jss.2015.06.063

I recommend reading this article. Here, I just list some of the points that caught my interest:

  • Feature analytics is still important and will remain important for both the development and operations
  • It’s more important to be continuous than to be fast – although I would argue that being slowly continuous is never a good thing, and
  • Discontinuous improvement seems to be more interesting than continuous improvement

The article even discusses what kind of recent *2017* developments could be observed in this area, and links them to well-known initiatives such as Lean and Agile.

How large companies innovate ….

Large software companies are really different from each other. That’s hardly a surprise, but do they work in different ways?

If we look at works like “Good to Great” or “Built to Last” by Jim Collins and his colleagues, we can see that such companies have similarities. They make the same mistakes and they have similar success factors.

In this paper: https://doi.org/10.1016/j.infsof.2017.12.007, the authors conducted a literature review of how innovation is done in large companies. They found only seven companies, but they recognized a few interesting initiatives (descriptions quoted from the text):

  • intrapreneurship: intrapreneurs have the vision for new products and act on their vision as if they had their own companies: build the development team and run the business,
  • bootlegging: bootlegging (or underground or skunkworks) refers to the innovation activity that is hidden from management until its introduction. The objectives of bootlegging are pre-research, product and process improvement, troubleshooting, new product and process development and purely scientific research
  • internal venture: internal venture refers to the introduction of new business within existing business to pursue product or market innovation. New business can be established as the instrument to pursue incremental innovation (new product in current market or new market for current product) or radical innovation (new product for new market).
  • spin-off, subsidiaries, joint-ventures, and
  • crowdsourcing: getting the participation of crowd and locking the crowd to create value to one company only. By taking the advantage of Web 2.0, companies look for the suitable solutions from Internet users.

These approaches vary in size, structure and scope. I recommend this article as Friday reading before heading home 🙂

Full reference:

Henry Edison, Xiaofeng Wang, Ronald Jabangwe, Pekka Abrahamsson,
Innovation Initiatives in Large Software Companies: A Systematic Mapping Study,
Information and Software Technology, Volume 95, 2018, Pages 1-14, ISSN 0950-5849,

Continuous and collaborative technology transfer: Software engineering research with real-time industry impact – interesting article

I’ve been browsing the latest issue of IST and this article caught my attention. The article is written by Tommi Mikkonen, Casper Lassenius, Tomi Männistö, Markku Oivo and Janne Järvinen. It is about technology transfer from academia to industry. It’s available at: https://doi.org/10.1016/j.infsof.2017.10.013

The main point of this article is an important one: technology is NOT created in academia and then transferred to industry; rather, it is created either in industry or in collaboration with academia. This observation invalidates many of the technology transfer models in which the authors assume that companies receive the results from academia.

But has this actually happened? How often does it really happen? I guess not very often.

The paper presents a model of collaboration, which is shown at the following link (and figure):

https://ars.els-cdn.com/content/image/1-s2.0-S0950584917304007-gr3.jpg

I’m happy to see more collaboration models for industry-academia co-creation of results!

Using thresholds (a la risk) to predict quality of software modules

I often tell my students that the absolute values of measures do not always say much. Take McCabe cyclomatic complexity as an example: a value of 100 (meaning 100 independent paths through a method) does not necessarily denote a problem. It could be a large switch statement which changes locale based on the IP address type. However, it is important to monitor thresholds of measures, based on the meaning of the function and the problem at hand.
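
As a small illustration, the sketch below approximates cyclomatic complexity for Python code by counting decision points with the standard `ast` module and compares the result with a threshold chosen by the caller. It is a simplification (for example, it ignores `match` statements and conditions inside comprehensions), and the function names and the idea of passing the threshold per function are my assumptions.

```python
import ast

# Node types that open an additional path through the code.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: number of decision points + 1."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit branches.
            complexity += len(node.values) - 1
    return complexity


def exceeds_threshold(source: str, threshold: int) -> bool:
    """Flag code only relative to a threshold that fits its purpose."""
    return cyclomatic_complexity(source) > threshold
```

The point of passing `threshold` explicitly is that a long dispatch function can be given a much higher limit than a function with nested business logic.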

In this article from IST, “Software metrics thresholds calculation techniques to predict fault-proneness: An empirical comparison” (https://doi.org/10.1016/j.infsof.2017.11.005), we can learn about three different techniques for finding thresholds for software measures: ROC curves, VARL, and Alves ranking (named after the author of the method). The article shows how well we can predict the fault-proneness of modules if we use thresholds rather than absolute values.
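
To give a feeling for the ROC-based idea, here is a minimal sketch that picks a threshold for one metric from historical fault data. It uses Youden's J (true positive rate minus false positive rate) as the selection criterion, which is one common choice and my assumption here; the exact procedure in the article may differ, and VARL and Alves ranking are not shown.

```python
from typing import List, Sequence, Tuple


def roc_threshold(values: Sequence[float], faulty: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, J) for the cut-off that maximises Youden's J.

    A module is predicted as fault-prone when its metric value >= threshold.
    """
    positives = sum(1 for f in faulty if f)
    negatives = len(faulty) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both faulty and fault-free modules")

    candidates: List[float] = sorted(set(values))
    best_threshold, best_j = candidates[0], -1.0
    for candidate in candidates:
        tp = sum(1 for v, f in zip(values, faulty) if v >= candidate and f)
        fp = sum(1 for v, f in zip(values, faulty) if v >= candidate and not f)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j
```

The inputs would be one metric value per module (for example its complexity) and a flag saying whether a fault was reported for that module.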

Happy reading!