{"id":801,"date":"2022-12-13T07:28:52","date_gmt":"2022-12-13T06:28:52","guid":{"rendered":"https:\/\/metrics.blogg.gu.se\/?p=801"},"modified":"2022-11-25T11:06:17","modified_gmt":"2022-11-25T10:06:17","slug":"how-can-ai-see-programming-code-article-highlight","status":"publish","type":"post","link":"https:\/\/metrics.blogg.gu.se\/?p=801","title":{"rendered":"How can AI see programming code&#8230; (article highlight)"},"content":{"rendered":"\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"681\" src=\"https:\/\/metrics.blogg.gu.se\/files\/2022\/11\/writing-705667_1920-1024x681.jpg\" alt=\"\" class=\"wp-image-802\" srcset=\"https:\/\/metrics.blogg.gu.se\/files\/2022\/11\/writing-705667_1920-1024x681.jpg 1024w, https:\/\/metrics.blogg.gu.se\/files\/2022\/11\/writing-705667_1920-300x199.jpg 300w, https:\/\/metrics.blogg.gu.se\/files\/2022\/11\/writing-705667_1920-768x510.jpg 768w, https:\/\/metrics.blogg.gu.se\/files\/2022\/11\/writing-705667_1920-1200x798.jpg 1200w, https:\/\/metrics.blogg.gu.se\/files\/2022\/11\/writing-705667_1920-1320x877.jpg 1320w, https:\/\/metrics.blogg.gu.se\/files\/2022\/11\/writing-705667_1920.jpg 1920w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><figcaption>Image by <a href=\"https:\/\/pixabay.com\/sv\/users\/wilhei-883152\/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=705667\">Willi Heidelbach<\/a> from <a href=\"https:\/\/pixabay.com\/sv\/\/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=705667\">Pixabay<\/a><br><\/figcaption><\/figure>\n\n\n\n<p><a href=\"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/sfw2.12064\">A systematic mapping study of source code representation for deep learning in software engineering &#8211; Samoaa &#8211; 2022 &#8211; IET Software &#8211; Wiley Online Library<\/a><\/p>\n\n\n\n<p 
class=\"has-drop-cap\">Understanding programming languages is an important research topic in the area of programming language models. I&#8217;ve written before that there are ca. 50 programming language models, which we can use in software engineering. OK, they are not all equivalent and each is specific to a task, but they are available, so we can use and customize them. <\/p>\n\n\n\n<p>Now, whether 50 models is a lot or not is debatable. Compared to natural language models, this is a small number. Even compared to the number of programming languages, this number is not impressive. However, how many languages are widely used &#8211; 10-15? Java, C, C++, Python, JavaScript, Rust, Go, and their derivatives are the most common ones. <\/p>\n\n\n\n<p>This article is a study by colleagues from our department. It&#8217;s too long to quote in detail, but there are a few things that I like. First, it&#8217;s a good overview of the types of language models:<\/p>\n\n\n\n<ol><li>Token-based representation: the program code is treated as a set of tokens\/words; some can have a special meaning, but they are just words (I&#8217;ve written about this before, and even worked with it: <a href=\"https:\/\/github.com\/mochodek\/py-ccflex\">GitHub &#8211; mochodek\/py-ccflex: py-ccflex &#8211; Python Flexible Code Classifier<\/a>)<\/li><li>Tree-based representation: the program code is seen from the perspective of its abstract syntax tree (AST); an example is the code2vec model: <a href=\"https:\/\/code2vec.org\/\">code2vec<\/a><\/li><li>Graph-based representation: the program code is seen as a directed graph, e.g., a control flow graph<\/li><\/ol>\n\n\n\n<p>Although I like this classification, I see that it misses one of the most prominent and popular types &#8211; the NLP-based model. It is a type of model where the program code is seen as a set of sentences that carry meaning of some sort. 
It is a derivative of the token-based representation, but it is much more than that. Codex from OpenAI is an example of such a model. <\/p>\n\n\n\n<p>Nevertheless, this study provides a very interesting set of examples of models and their applications. I sincerely recommend taking a look at this paper to understand how the models work and where they are best used. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>A systematic mapping study of source code representation for deep learning in software engineering &#8211; Samoaa &#8211; 2022 &#8211; IET Software &#8211; Wiley Online Library Understanding programming languages is an important research topic in the area of programming language models. I&#8217;ve written before that there are ca. 50 programming language models, which we can &hellip; <a href=\"https:\/\/metrics.blogg.gu.se\/?p=801\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;How can AI see programming code&#8230; (article highlight)&#8221;<\/span><\/a><\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,4,5],"tags":[],"_links":{"self":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/801"}],"collection":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=801"}],"version-history":[{"count":1,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/801\/revisions"}],"predecessor-version":[{"id":803,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/801\/revisio
ns\/803"}],"wp:attachment":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=801"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=801"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=801"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}