{"id":682,"date":"2021-08-23T12:35:20","date_gmt":"2021-08-23T12:35:20","guid":{"rendered":"https:\/\/metrics.blogg.gu.se\/?p=682"},"modified":"2021-08-23T12:35:20","modified_gmt":"2021-08-23T12:35:20","slug":"automl-lets-talk-about-it","status":"publish","type":"post","link":"https:\/\/metrics.blogg.gu.se\/?p=682","title":{"rendered":"autoML &#8211; let&#8217;s talk about it&#8230;"},"content":{"rendered":"\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/small-3871893_1920-1024x683.jpg\" alt=\"\" class=\"wp-image-683\" srcset=\"https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/small-3871893_1920-1024x683.jpg 1024w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/small-3871893_1920-300x200.jpg 300w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/small-3871893_1920-768x512.jpg 768w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/small-3871893_1920-1200x800.jpg 1200w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/small-3871893_1920-1320x880.jpg 1320w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/small-3871893_1920.jpg 1920w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><figcaption>Image from Pixabay<\/figcaption><\/figure>\n\n\n\n<p class=\"has-drop-cap\">AutoML, a promise of green pastures, less work, optimal results. So, it is like that? In this post I share my view on this and experience from running the first test using that model. <\/p>\n\n\n\n<p>First of all, let&#8217;s be honest, there is not such thing as a free lunch. In case of autoML (auto-sklearn), the price tag comes first with the effort, skills and time to install it and make it work. The second is the performance&#8230;. It&#8217;s painfully slow compared to your own models, simply because it tests a lot of models here and there. It also take a lot of time to download and to make it work. 
<\/p>\n\n\n\n<p>But, first thing first, let me tell you where I start. So, I used the data from the MicroHRV project ( <a href=\"https:\/\/www.software-center.se\/research-themes\/technology-themes\/development-metrics\/microhrv-recognizing-rare-events-in-microwave-radio-links-and-intensive-care-units-using-machine-learning\/\">3. MicroHRV: Recognizing Rare Events in Microwave Radio Links and Intensive Care Units using Machine Learning \u2013 Software Center (software-center.se)<\/a>). The data is from patients being operated to remove clots of blood from the brain (although dangerous it may sound, the actual procedure is planned and calm). I wanted to check whether autoML can do better compared to what we have at the moment. <\/p>\n\n\n\n<p>What we have at the moment (for that particular dataset) is: Accuracy: 0.98, Precision: 0.98, Recall: 0.98 &#8211; using Random Forest classifier. So, this is actually already very good. For the medical domain, that&#8217;s actually in class of its own, given our previous studies ended up with ca. 0.7 in accuracy at best. <\/p>\n\n\n\n<p>When it comes to installing autoML &#8211; if you like stackoverflow, downgrading, upgrading, compiling, etc. and run Windows 10, then it&#8217;s your heaven. If you run Linux &#8211; no problems. Otherwise &#8211; stick to manual analyses:) <\/p>\n\n\n\n<p>After two days (and nights) of trying, the best configuration was:<\/p>\n\n\n\n<ul><li>WSL &#8211; Windows Subsystem for Linux <\/li><li>Ubuntu 20, and <\/li><li>countless of oss libraries<\/li><\/ul>\n\n\n\n<p>It takes a while to get it to work, the question is whether the results are good enough&#8230; <\/p>\n\n\n\n<p>After three hours of waiting, a lot of heat from my laptop, over 1,000 models tested resulted in Accuracy: 0.91, Precision: 0.94, Recall: 0.91<\/p>\n\n\n\n<p>So, worse than my manual selection of models. 
I include the confusion matrices.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1013\" src=\"https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/autoML_conf-1024x1013.png\" alt=\"\" class=\"wp-image-685\" srcset=\"https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/autoML_conf-1024x1013.png 1024w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/autoML_conf-300x297.png 300w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/autoML_conf-768x760.png 768w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/autoML_conf.png 1066w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><figcaption>AutoML<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1013\" src=\"https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/best_rf_conf-1024x1013.png\" alt=\"\" class=\"wp-image-686\" srcset=\"https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/best_rf_conf-1024x1013.png 1024w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/best_rf_conf-300x297.png 300w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/best_rf_conf-768x760.png 768w, https:\/\/metrics.blogg.gu.se\/files\/2021\/08\/best_rf_conf.png 1066w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><figcaption>Random forest<\/figcaption><\/figure>\n\n\n\n<p>The matrices are not that different, as the validation sets are not that large either. However, it seems that the RF is still better than the best model from autoML. <\/p>\n\n\n\n<p>I need to work more on this and see whether I am doing something wrong. However, I take this as a success &#8211; I&#8217;m better than autoML (there is still some use for an old professor) &#8211; instead of a let-down of not getting better results. <\/p>\n\n\n\n<p>At the end of the day, 0.98 in accuracy is still very good! 
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AutoML, a promise of green pastures, less work, optimal results. So, it is like that? In this post I share my view on this and experience from running the first test using that model. First of all, let&#8217;s be honest, there is not such thing as a free lunch. In case of autoML (auto-sklearn), the &hellip; <a href=\"https:\/\/metrics.blogg.gu.se\/?p=682\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;autoML &#8211; let&#8217;s talk about it&#8230;&#8221;<\/span><\/a><\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,5],"tags":[],"_links":{"self":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/682"}],"collection":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=682"}],"version-history":[{"count":2,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/682\/revisions"}],"predecessor-version":[{"id":687,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=\/wp\/v2\/posts\/682\/revisions\/687"}],"wp:attachment":[{"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=682"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=682"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/metrics.blogg.gu.se\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=682"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}"
,"templated":true}]}}