visualization – SE metrics (Software Engineering)

The speed of light and laws of software engineering…

While on vacation, I managed to watch a number of sci-fi movies, which I wanted to watch but did not have the time during the academic year. This got me thinking about certain laws of physics and the laws of software engineering. I think there are many similarities, and let me start by considering the speed of light, as a starter.

First of all, what we know about the speed of light is that it’s the fastest we know and that Einstein’s theory of relativity says that we cannot travel faster than the speed of light. Even if we were, such a speed is tremendously difficult.

How would we know where we travel if we cannot see where we travel (we go as fast as we can see, literally). So, the travel would be very fast and, probably, very short.
If we, somehow, see where we go, or detect an obstacle (e.g. by knowing its predicted position based on prior observations), how can we steer? We’re going so fast, that for the majority of the travel time, we would be going in straight lines. These straight lines would be similar to either the hyperjumps from Star Wars or the clicks from the Guardians of the Galaxy.
If we’re literally the fastest objects, how can others see us and avoid is? Is it possible to avoid the light? No, it’s not possible. This means that we would be super-chaotic.

Therefore, I think that, even if we could, traveling at the speed of light is probably not the best idea. At least not conceptually, which may change as we change.

How does it affect the laws of software engineering then. Well, I would start with the laws of complexity.

For a while now, I’ve been broadcasting the opinion that the complexity of software cannot be reduced, it can only be hidden. For complex problems, we need complex software and complex software cannot be simple. If our algorithms have many conditions, we cannot take them away, we can hide them in functions, but never get rid of them. That’s the first parallel – we cannot travel faster than the speed of light.

We can hide complexity, and thus make the program/software easier to understand and maintain, but the better we hide it, the harder it is to avoid/predict complexity. Packaging complex algorithms in simple blocks will make it difficult to make modifications. Actually, not to make the modifications, but to overview the consequences of these modifications.

If we simplify the program/algorithm too much, we need to expect that it’s going to provide erroneous results for some cases – again, complex problems require complex programs. An example of such issue is approximating continuous functions – since our computers are discrete, there is always some degree of error in such an approximation.

Finally, interconnectivity and modularity as a means of handling complexity have their limits. I do not think we can develop increasingly complex software by increasing its size. I believe it’s going to be difficult in the long run. We need to make sure that we have the competence to handle complexity and we need to be able to make the complexity apparent.

Data analytics in SE

https://www.sciencedirect.com/science/article/abs/pii/S0950584920301981

A few years ago, data analytics and big data were super popular in software engineering. In fact, they were a bit too popular, as many authors quoted big data because they had a diagram in the paper.

Fast forward to today and the situation is a bit different. We are more mature in using data in software development. We know that Big data is about the 5 Vs and that we can reason about it. We also know what providing the diagrams is not the same as using them to direct software development.

I found this paper when looking for literature for our new work on communication in software metrics teams. My colleagues study the communication and found that there can be several sources of confusion. Now, this paper is NOT about the confusion, but about prevalence of data analytics in software engineering. The working definition of Big Data Analytics is as follows in the paper: “Big data analytics is the process of using analysis algorithms running on powerful supporting platforms to uncover potentials concealed in big data, such as hidden patterns or unknown correlations”.

The paper poses three main research questions about the studies conducted in Big Data Analytics, about the approaches used and when they are used. I’m mostly interested in the second – which approaches are used. There, the authors pose three sub-questions:

RQ2.1: What types of analytics have been used in the ASD domain?

RQ2.2: What sources of data have been used?

RQ2.3: What methods, models, or techniques have been utilized in the studies?

In particular, the second one is the most interesting one – sources of data. There, the authors found that there are plenty. The entire table (Table 7 in the paper) is actually too large to quote, but let me just quote one of the categories: Source code and data model:

Source code
Ruby programs & Ruby on Rails
Java programs
Function calls
Code metrics
Development repository
Test case
Code quality
Application data schema

I recommend this as a good reading into the current state-of-the-art in data analytics in software engineering. I think we’ve matured a lot in the last decade as a community and that brings a lot of benefit. Our software development gets better and thus our software gets better.

From the abstract: In total, 88 primary studies were selected and analyzed. Our results show that BDA is employed throughout the whole ASD lifecycle. The results reveal that data-driven software development is focused on the following areas: code repository analytics, defects/bug fixing, testing, project management analytics, and application usage analytics.

Building interactive dashboards

Review of the book Interactive Visual Data Analysis: https://www.taylorfrancis.com/books/9781315152707 by Christian Tominski, Heidrun Schumann

Designing a good dashboard is an art. We need to answer questions like – who will use the dashboard? for what? when? and how will the interaction happen. Our team has studied dashboards and developed a model for choosing which one is the most suitable one ( https://gupea.ub.gu.se/bitstream/2077/41120/1/gupea_2077_41120_1.pdf ).

However, we never studied what is important when constructing a dashboard, but the authors of this book did.

I’m super happy to have stumbled upon this book, because it has shown me how to think when constructing dashboards. It shows which elements are important and how to create actionable visual analytics, not how to use JavaScript to create a diagram.

I liked chapter 3 and 4 the most. They show how to move from simple visualizations to interactions and how to work with parameters of the visualizations. However, one needs to read chapter 2 as well to get some inspiration about which diagrams to use and how to use the visual attentive attributes like color, size or shape – both separately and in combination.

To sum up, if you want to construct a good dashboard, please read the book and I can promise you, it will not dissapoint you.