Christmas post from ChatGPT

Image by Manfred Steger from Pixabay

I think that almost everyone in the field of software engineering has heard about the new bot – ChatGPT from OpenAI. The bot is capable of writing programs, holding conversations and even simulating virtual machines.

So, this year I wanted to check what ChatGPT can do. Here is my conversation with it:

M: What are the most important machine learning papers from 2022?

C: I’m sorry, but I don’t have access to papers or other information from 2022 because my knowledge was cut off in 2021. I can tell you about some important machine learning papers from the past, or provide general information about machine learning and how it has evolved over time. Is there something specific you would like to know about machine learning?

M: Ok, how about the most important papers from 2021?

C: I’m sorry, but I am not able to browse the internet and do not have access to information about specific papers or other developments that have occurred since my knowledge cutoff in 2021. However, there are many resources available online that can help you learn about the latest developments in the field of machine learning. Some places you might look for information about the most important machine learning papers from 2021 include scientific journals and conference proceedings, online news sources, and blogs or websites that cover developments in the field of artificial intelligence and machine learning. Additionally, you can check out the proceedings of major machine learning conferences, such as the International Conference on Machine Learning (ICML) and the Conference on Neural Information Processing Systems (NeurIPS), which are typically held each year and showcase the latest research in the field.

M: What is your prediction for machine learning field for 2023?

C: One trend that has been especially prominent in the field of machine learning in recent years is the increasing use of deep learning techniques, which have led to significant advances in a wide range of applications, including image and speech recognition, natural language processing, and autonomous systems. It is likely that this trend will continue in the coming years, as researchers continue to develop and refine deep learning algorithms and apply them to new and challenging problems.

Another trend that is likely to continue in the field of machine learning is the increasing importance of interpretability and explainability. As machine learning models become more sophisticated and are applied to more complex and sensitive tasks, it will be important to understand how they make their predictions and decisions, and to be able to explain these to users and stakeholders. Researchers and practitioners will therefore continue to focus on developing techniques for interpreting and explaining the behavior of machine learning models.

Finally, the field of machine learning will continue to be driven by the availability of large amounts of data and the increasing power and efficiency of computing hardware. As data continues to be generated at an ever-increasing rate, and as hardware capabilities continue to improve, it is likely that machine learning techniques will continue to advance and be applied to a wide range of problems and domains.

M: Ok, then write a program that draws a Christmas tree.

C:

Here is a Python program that uses ASCII art to draw a simple Christmas tree:

def draw_tree():
  print("   /\\")
  print("  /  \\")
  print(" /    \\")
  print("/      \\")
  print("--------")

draw_tree()

This code will output the following tree to the console:

   /\
  /  \
 /    \
/      \
--------

You can customize the appearance of the tree by modifying the print statements. For example, you could add more branches to the tree by adding additional print statements, or you could use different characters to create a more detailed or decorative tree.

Here is an example of a more detailed tree that uses a combination of ASCII art and Unicode characters to create a more realistic-looking tree:

def draw_tree():
  print("    /\\")
  print("   /  \\")
  print("  /    \\")
  print(" /      \\")
  print("/        \\")
  print("----------")
  print("   |||")
  print("   |||")
  print("   |||")

draw_tree()

Summary: well, the conversation is not very deep, and not great either, which makes me sleep better at night, feeling that we (software engineers) are still needed in 2023. At least for the time being.

Have a wonderful holiday everyone!

How can AI see programming code… (article highlight)

Image by Willi Heidelbach from Pixabay

A systematic mapping study of source code representation for deep learning in software engineering – Samoaa – 2022 – IET Software – Wiley Online Library

Understanding programming languages is an important research topic in the area of programming language models. I’ve written before that there are ca. 50 programming language models that we can use in software engineering. Ok, not all of them are equivalent – they are specific to particular tasks – but they are available, so we can use and customize them.

Now, whether 50 models is a lot or not is debatable. Compared to the number of natural language models, it is small. Even compared to the number of programming languages, it is not impressive. However, how many languages are actually widely used – 10-15? Java, C, C++, Python, JavaScript, Rust, Go, and their derivatives are the most common ones.

This article is a study done by our colleagues from the department. It’s too long to quote in detail, but there are a few things that I like. First, it’s a good overview of the types of source code representations used by these models:

  1. Token-based representation: the program code is treated as a sequence of tokens/words; some tokens can have a special meaning, but they are essentially just words (I’ve written about this before, and even worked with it: GitHub – mochodek/py-ccflex: py-ccflex – Python Flexible Code Classifier); see the sketch after this list
  2. Tree-based representation: the program code is seen from the perspective of its abstract syntax tree (AST); an example is the code2vec model: code2vec
  3. Graph-based representation: the program code is seen as a directed graph, e.g., a control flow graph
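
To make the first two categories more concrete, here is a minimal sketch that uses only Python’s standard library – the tokenize module for the token view and the ast module for the tree view of the same small snippet (the snippet itself is just an illustrative example):

import ast
import io
import tokenize

# A small piece of source code to represent in two different ways.
source = "def add(a, b):\n    return a + b\n"

# 1. Token-based view: the code becomes a sequence of tokens/words.
tokens = [tok.string
          for tok in tokenize.generate_tokens(io.StringIO(source).readline)
          if tok.string.strip()]
print(tokens)
# e.g. ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b']

# 2. Tree-based view: the code becomes an abstract syntax tree (AST).
tree = ast.parse(source)
print([type(node).__name__ for node in ast.walk(tree)])
# e.g. ['Module', 'FunctionDef', 'arguments', 'Return', 'arg', 'arg', 'BinOp', ...]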

Although I like this classification, I see that it misses one of the most prominent and popular types – the NLP-based model. This is a type of model where the program code is treated as a set of sentences that carry some kind of meaning. It is a derivative of the token-based representation, but it goes much further than that. Codex from OpenAI is an example of such a model.
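
To show what I mean, here is a minimal sketch of how such an NLP-style model can embed a piece of code; it assumes the Hugging Face transformers library and the publicly available microsoft/codebert-base checkpoint (Codex itself is only served through OpenAI’s API, so CodeBERT stands in here):

# Assumes: pip install torch transformers (downloads the checkpoint on first run).
from transformers import AutoModel, AutoTokenizer

# CodeBERT treats source code much like natural-language sentences.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

code = "def add(a, b):\n    return a + b\n"

# Tokenize the code and run it through the model.
inputs = tokenizer(code, return_tensors="pt", truncation=True)
outputs = model(**inputs)

# Average the token embeddings into one vector representing the snippet.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # torch.Size([1, 768])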

Nevertheless, this study provides a very interesting set of examples of models and their applications. I sincerely suggest taking a look at this paper to understand how the models work and where they are used best.

Inline tests – do we really need more testing?

Image by Gordon Johnson from Pixabay

Inline Tests (pengyunie.github.io)

Some of you may not know this, but I started my career as a software tester, so I’ve done my share of defect tracking and fixing. Although it was a while ago (well, over 20 years ago, to be frank), I still remember a thing or two. I guess it is like riding a bike. One thing I remember is that we did not really need more tests, but smarter testing.

This paper, nevertheless, proposes a new type of testing – inline testing – which is meant to replace the use of printf(…) statements in code. Instead of printing variable values for debugging purposes, we can use the new framework to write small inline tests right next to the code under scrutiny and execute them. The idea is simple and contributes to the maturity of our discipline.
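
To give a feel for the idea – and this is only my own illustration with plain Python asserts, not the actual syntax of the framework from the paper – an inline test is a small check with its own concrete input, placed right next to the statement it targets:

import re

# Statement under scrutiny: pull an issue id such as "JIRA-123" out of a commit message.
def extract_issue_id(commit_message):
    match = re.search(r"([A-Z]+-\d+)", commit_message)
    return match.group(1) if match else None

# Old habit: print a value and eyeball it in the console.
print(extract_issue_id("Fix crash (JIRA-123)"))

# Inline-test style: small checks with concrete inputs and asserts, written
# right next to the code they target. In the framework from the paper such
# checks are declared with a dedicated construct; plain asserts stand in
# for that idea here.
assert extract_issue_id("Fix crash (JIRA-123)") == "JIRA-123"
assert extract_issue_id("Refactor logging") is None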

By using inline tests, we can track the progress of our software development and the evolution of its quality. Since inline tests use asserts and their results can be collected into reports, we could communicate our progress to quality management in a much better way.

I need to test this framework, especially since it works with Python, my new language of choice…