{"id":861,"date":"2023-12-15T11:36:05","date_gmt":"2023-12-15T10:36:05","guid":{"rendered":"https:\/\/metrics.blogg.gu.se\/?p=861"},"modified":"2023-11-05T11:39:12","modified_gmt":"2023-11-05T10:39:12","slug":"generating-documentation-from-notebooks","status":"publish","type":"post","link":"https:\/\/metrics.blogg.gu.se\/?p=861","title":{"rendered":"Generating documentation from notebooks"},"content":{"rendered":"\n<p><a href=\"https:\/\/github.com\/jyothivedurada\/jyothivedurada.github.io\/blob\/main\/papers\/Cell2Doc.pdf\">https:\/\/github.com\/jyothivedurada\/jyothivedurada.github.io\/blob\/main\/papers\/Cell2Doc.pdf<\/a><\/p>\n\n\n\n<p class=\"has-drop-cap\">Understanding code is the same regardless of whether it is in a Jupyter notebook or in another editor. Comments and documentation are the key. I try to teach that to my students, and some of them, at least, appreciate it. Here is a paper that can change this for the better without adding more effort. <\/p>\n\n\n\n<p>This paper introduces a machine learning pipeline that automatically generates documentation for Python code cells in data science notebooks. Here&#8217;s a more casual summary of what they did and found:<\/p>\n\n\n\n<ol><li><strong>The Solution &#8211; Cell2Doc<\/strong>: The team whipped up a new tool called Cell2Doc. It&#8217;s a smart pipeline that breaks down code cells into logical parts and documents each bit separately. This way, it gets more details and can explain complex code better than other tools.<\/li><li><strong>How It Works<\/strong>: Cell2Doc has two main parts. First, a Code Segmentation Model (CoSEG) chops up the code into chunks that make sense on their own. Then, a Code Documentation Model (CoDoc) writes up explanations for each chunk. In the end, you get a full set of docs that covers everything the code is doing.<\/li><li><strong>The Cool Part<\/strong>: This isn&#8217;t just about slapping together existing models. 
Cell2Doc actually makes them better at writing docs for code. It&#8217;s like giving a turbo boost to the models so they can catch more details and write clearer explanations.<\/li><li><strong>Testing It Out<\/strong>: They didn&#8217;t just build this and hope for the best. They tested it with real data from Kaggle, a place where data scientists hang out and compete. They even made a new dataset for this kind of task because the old ones weren&#8217;t cutting it.<\/li><li><strong>The Results<\/strong>: When they put Cell2Doc to the test, it did a bang-up job. It scored way higher on automated tests than other methods, and real humans liked it better too. It was better at being correct, informative, and easy to read.<\/li><li><strong>Sharing Is Caring<\/strong>: They&#8217;re not keeping this to themselves. They&#8217;ve shared Cell2Doc so anyone can use it to make their code easier to understand.<\/li><\/ol>\n\n\n\n<p>In a nutshell, Cell2Doc is like a super-smart assistant that takes the headache out of writing docs for your code. It understands the code deeply and explains it in a way that&#8217;s easy to get, which is pretty awesome for keeping things clear and making sure your work can be used by others.<\/p>\n\n\n\n<p>I am considering giving this tool to my students in the spring when they learn how to program embedded systems in C. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/github.com\/jyothivedurada\/jyothivedurada.github.io\/blob\/main\/papers\/Cell2Doc.pdf Understanding code is the same regardless of whether it is in a Jupyter notebook or in another editor. Comments and documentation are the key. I try to teach that to my students, and some of them, at least, appreciate it. 
Here is a paper that can change this for the better without &hellip; <a href=\"https:\/\/metrics.blogg.gu.se\/?p=861\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Generating documentation from notebooks&#8221;<\/span><\/a><\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,4,5],"tags":[],"_links":{"self":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/861"}],"collection":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=861"}],"version-history":[{"count":1,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/861\/revisions"}],"predecessor-version":[{"id":862,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/861\/revisions\/862"}],"wp:attachment":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=861"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=861"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=861"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}