Stronger features vs. stronger algorithms in ML

I’ve been working with machine learning a bit during the last couple of years. I’ve had great teachers who showed me how to use the algorithms and where to start learning. Thanks to them I understood the importance of different elements of the ML tool chain – data, storage, algorithms, hardware.

I’ve worked on the problem of how to extract features of source code so that I can use them to predict whether a specific line of code has a defect, in particular whether the defect can be caught during code reviews. I’ve spent about a year on this problem and tested all kinds of combinations, from static code analysis to using word embeddings, dictionaries and other NLP mechanisms to understand the code. Nothing really worked well. I got predictions that were only a bit better than chance.

What was the problem? Well, the problem was the quality of the input data. Since I extracted data, and features from this data, automatically from large code bases (often over 3 MLOC), I often encountered the following problems:

Labeling – I could not pinpoint exactly where the problem was, which meant that I needed to approximate the label, which led to the next problem,

Consistency – when one line was considered good by one person, it could be considered problematic by another one; this meant that I needed to decide how to treat lines that are “suspicious”, and

Scales – when extracting features, some of them were on a scale of 1 to 100, whereas others were on a scale of 1 to 3; this meant that I needed a good scaler to get the features right.
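To make the scale problem concrete, here is a minimal sketch of min-max scaling in plain Python. The feature names (`line_length`, `nesting_depth`) and their values are made up for illustration; they are not the features I actually extracted, and min-max is only one of several possible scalers.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature cannot be rescaled; map every value to 0.0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical features for four lines of code (illustrative values only).
line_length = [12, 100, 45, 1]   # roughly on a scale of 1 to 100
nesting_depth = [1, 3, 2, 1]     # on a scale of 1 to 3

scaled_length = min_max_scale(line_length)
scaled_depth = min_max_scale(nesting_depth)

print(scaled_length)
print(scaled_depth)
```

After scaling, both features live in the same range, so a distance-based or gradient-based learner no longer treats the wider-ranged feature as more important just because its numbers are bigger.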

So, here I am, working on the next implementation of the feature discovery algorithm: an algorithm that extracts features in such a way that each object has distinct characteristics, yet the number of features needed to characterize each object is as small as possible. The algorithm helped me boost the accuracy of the classification from ca. 50% to over 96%.

I’ve discovered that using simple ML algorithms on a good data set trumps everything else. I used AdaBoost with scaling of features on the good data set, and that was at least twice as good as using LSTM models with word embeddings (which were not bad anyway) for the same purpose.
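A setup of this kind can be sketched with scikit-learn as below. This is not my actual code or data: the data set is a synthetic stand-in generated with `make_classification`, and the choice of `MinMaxScaler`, 50 estimators and the train/test split are assumptions made only to keep the example runnable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a feature table (one row per line of code).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit AdaBoost (decision stumps by default).
model = make_pipeline(
    MinMaxScaler(),
    AdaBoostClassifier(n_estimators=50, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking information from the test data. Tree-based learners such as the default stumps are largely insensitive to feature scale, so the scaler mostly matters if the classifier is later swapped for a scale-sensitive one.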

My advice, therefore, is the following:

Start with a simple classification/ML algorithm and do not go into neural networks or other advanced methods,

Learn your data and look at it from several angles; use business intelligence and statistics to understand the dependencies between features (PCA, t-SNE) and chew on the data as long as you can, and

Focus on extracting features from your data, rather than expecting magic from ML; no algorithm can trump good input data and no filtering can trump a good “featurizer”.
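One cheap way to start looking at dependencies between features, before reaching for PCA or t-SNE, is to compute pairwise correlations. The sketch below uses plain Python and Pearson correlation; the feature names and values are hypothetical and chosen only so that one pair is clearly redundant.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant feature has no defined correlation; report 0.0.
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical feature columns (illustrative values only).
features = {
    "line_length": [10, 20, 30, 40, 50],
    "token_count": [3, 6, 9, 12, 15],   # proportional to line_length
    "nesting_depth": [2, 1, 3, 1, 2],
}

correlations = {
    (a, b): pearson(features[a], features[b])
    for a, b in combinations(features, 2)
}
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: {r:+.2f}")
```

A pair with a correlation close to +1 or -1 carries nearly the same information twice, so one of the two features is a candidate for removal. Pearson correlation only captures linear dependencies, which is why projections such as PCA or t-SNE are still worth a look afterwards.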

Image source: pixabay

Author: Miroslaw Staron

I’m a professor in Software Engineering at the IT faculty. I usually blog about articles that I find interesting and my own reflections on the development of Software Engineering, AI, computer science and automotive software.
