On Security Weaknesses and Vulnerabilities in Deep Learning Systems

Image: wordcloud based on the text of the article

In the rapidly evolving field of artificial intelligence (AI), deep learning (DL) has become a cornerstone, driving advances built on transformers and diffusion models. However, the security of AI-enabled systems, particularly those using deep learning techniques, is still in question.

The authors conducted an extensive study, analyzing 3,049 vulnerabilities from the Common Vulnerabilities and Exposures (CVE) database and other sources. They employed a two-stream data analysis framework to identify patterns and understand the nature of these vulnerabilities. Their findings reveal that the decentralized and fragmented nature of DL frameworks contributes significantly to the security challenges.

The empirical study uncovered several patterns in DL vulnerabilities. Many issues stem from improper input validation, insecure dependencies, and inadequate security configurations. Additionally, the complexity of DL models makes it harder to apply conventional security measures effectively. The decentralized development environment further exacerbates these issues, as it leads to inconsistent security practices and fragmented responsibility.

It makes sense, then, to put some effort into securing such systems. At the end of the day, input validation is not rocket science.
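To make that concrete, here is a minimal sketch of what input validation could look like in front of an image-classification model. The expected shape, dtype, and value range below are my own assumptions for illustration, not constraints taken from the paper:

```python
import numpy as np

# Hypothetical constraints for an image-classification model;
# the shape, dtype, and value range are assumptions for this sketch.
EXPECTED_SHAPE = (224, 224, 3)   # H x W x C
EXPECTED_DTYPE = np.float32

def validate_image(x: np.ndarray) -> np.ndarray:
    """Reject inputs the model was never meant to see."""
    if not isinstance(x, np.ndarray):
        raise TypeError(f"expected ndarray, got {type(x).__name__}")
    if x.shape != EXPECTED_SHAPE:
        raise ValueError(f"expected shape {EXPECTED_SHAPE}, got {x.shape}")
    if x.dtype != EXPECTED_DTYPE:
        raise ValueError(f"expected dtype {EXPECTED_DTYPE}, got {x.dtype}")
    if not np.isfinite(x).all():
        raise ValueError("input contains NaN or Inf")
    if x.min() < 0.0 or x.max() > 1.0:
        raise ValueError("pixel values must be normalized to [0, 1]")
    return x

# Usage: validate before the tensor ever reaches the model.
image = np.random.rand(224, 224, 3).astype(np.float32)
model_input = validate_image(image)
```

Nothing exotic here: a handful of cheap checks at the system boundary rules out whole classes of malformed or adversarially crafted inputs before they touch the model.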

Vulnerability detection, a new article (highlight)

sec23summer_449-mirsky-prepub.pdf (usenix.org)

Cybersecurity has been, and will always be, a challenge for software systems. Security analysis (and exploitation, for that matter) is often perceived as an art. There is no single tool, no single method, that will make our software secure.

This article is interesting because of how the proposed detector works. Usually, security analyzers are token-based: they see a program as a sequence of instructions. They are good at what they do, but they struggle to understand the context of the analyzed program.
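As a toy illustration of what token-based means, here is a deliberately naive scanner (my own sketch, not a tool from the paper) that flags SQL statements built by string interpolation. It looks at one line of tokens at a time and knows nothing about data flow:

```python
import re

# Naive, token-level check: flag lines that appear to build SQL via
# f-strings, %-formatting, or concatenation. It matches tokens on a
# single line and is blind to anything happening in the caller.
SQL_PATTERN = re.compile(
    r"""(execute|executemany)\s*\(\s*f?["'].*(SELECT|INSERT|UPDATE|DELETE)""",
    re.IGNORECASE,
)

def scan(source: str) -> list[tuple[int, str]]:
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SQL_PATTERN.search(line) and ("{" in line or "%" in line or "+" in line):
            findings.append((lineno, line.strip()))
    return findings

sample = 'cur.execute(f"SELECT * FROM users WHERE id = {uid}")'
print(scan(sample))  # flags line 1, regardless of any upstream validation
```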

Let me give you an example. Say we are analyzing a program for SQL injections, a very simple class of vulnerability. We can check whether the SQL statement in the code contains any parameters. If it does not, it is safe, since the query is entirely under our control, but parameter-free statements are rare (and rarely useful). So most statements will carry some parameters, and this is where it gets tricky. These parameters need to be validated, but the validation can happen in the same function (just before the actual SQL statement) or somewhere up the call chain, in a calling function or method. The check sitting in the calling function/method is where token-based security analyzers give up.
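Here is the kind of code I have in mind, a made-up example sketched in Python for brevity (VulChecker itself targets C/C++). The validation lives in the caller, so any analysis confined to `get_user` sees an unvalidated value flowing into the query:

```python
import sqlite3

def get_user(conn: sqlite3.Connection, user_id: str):
    # No validation in sight inside this function, so a purely local,
    # token-based analysis reports this query as injectable.
    # (The robust fix is a parameterized query; the interpolation is
    # kept deliberately to illustrate the analysis problem.)
    return conn.execute(f"SELECT * FROM users WHERE id = {user_id}").fetchone()

def handle_request(conn: sqlite3.Connection, raw_id: str):
    # ...but the validation lives here, in the caller. Only an analysis
    # that follows the call graph can connect the check to the sink.
    if not raw_id.isdigit():
        raise ValueError("user id must be numeric")
    return get_user(conn, raw_id)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
print(handle_request(conn, "1"))        # fine
# handle_request(conn, "1 OR 1=1")      # rejected by the caller's check
```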

Now, this paper presents an approach that works on a call graph, which makes this kind of interprocedural check possible. I still need to understand it fully myself, but I hope to do so quite soon. The full source code is available here: GitHub – ymirsky/VulChecker: A deep learning model for localizing bugs in C/C++ source code (USENIX’23)
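While I have not digested the paper's actual pipeline yet, the basic benefit of reasoning over a call graph can be sketched very simply. The graph below, and the sanitizer/sink labels attached to it, are my own toy construction, not VulChecker's program representation:

```python
# Toy call graph: each function records who it calls, whether it
# sanitizes its input, and whether it executes SQL (a sink). A sink
# is only dangerous if some path from an entry point reaches it
# without passing through a sanitizer.
CALL_GRAPH = {
    "handle_request": {"calls": ["get_user"], "sanitizes": True,  "sink": False},
    "debug_endpoint": {"calls": ["get_user"], "sanitizes": False, "sink": False},
    "get_user":       {"calls": [],           "sanitizes": False, "sink": True},
}

def unsafe_paths(entry, path=None, sanitized=False):
    """Yield call paths that reach a SQL sink without sanitization."""
    path = (path or []) + [entry]
    node = CALL_GRAPH[entry]
    sanitized = sanitized or node["sanitizes"]
    if node["sink"] and not sanitized:
        yield path
    for callee in node["calls"]:
        yield from unsafe_paths(callee, path, sanitized)

for entry in ("handle_request", "debug_endpoint"):
    for p in unsafe_paths(entry):
        print("unsanitized path:", " -> ".join(p))
# Only debug_endpoint -> get_user is reported; handle_request sanitizes
# before the call, which a per-function analysis would not see.
```

That is, of course, a drastic simplification: the paper trains a deep learning model on graph representations of real C/C++ programs rather than hand-labeling sanitizers, but the intuition of connecting checks to sinks across function boundaries is the same.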