My machine learning class has presented quite the learning curve this semester, between understanding the theory behind the algorithms and deciding which model to apply when. Not to mention the added task of writing the code to put the models to use on real datasets. The first half of the class focused on an introduction to supervised learning methods and the different types of classifiers and regressors available for building models and making predictions.
Among the classifiers were k-nearest neighbors, softmax regression (multinomial logistic regression, which generalizes the logit model to multiclass classification), and support vector machines (linear classifiers that maximize the margin between classes). The regressors included linear and polynomial regression as well as ridge and lasso regression. With each model came a discussion of splitting the data into train and test sets, validating model skill, hyperparameter selection, feature extraction, and model evaluation. Most of this was taught using the scikit-learn package in Python, along with visualization tools from matplotlib.
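For anyone who hasn't tried scikit-learn yet, here is a minimal sketch of that split, validate, and tune loop for the three classifiers above. It assumes the built-in iris dataset and parameter grids I picked arbitrarily for illustration. Softmax regression is stood in for by `LogisticRegression`, which fits a multinomial (softmax) model when the target has more than two classes, and the SVM by `LinearSVC`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumption: the iris dataset is just a convenient example
X, y = load_iris(return_X_y=True)

# Train/test split; the test set is only touched once, at the very end
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate models and (arbitrary) hyperparameter grids.
# make_pipeline names each step after its lowercased class name.
candidates = {
    "knn": (KNeighborsClassifier(), {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7]}),
    "softmax": (LogisticRegression(max_iter=1000), {"logisticregression__C": [0.1, 1, 10]}),
    "linear_svm": (LinearSVC(max_iter=10000), {"linearsvc__C": [0.1, 1, 10]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scale features, then fit the classifier
    pipe = make_pipeline(StandardScaler(), estimator)
    # Hyperparameter selection via 5-fold cross-validation on the training set
    search = GridSearchCV(pipe, grid, cv=5)
    search.fit(X_train, y_train)
    # Store best params, mean CV accuracy, and held-out test accuracy
    results[name] = (search.best_params_, search.best_score_, search.score(X_test, y_test))

for name, (params, cv_score, test_score) in results.items():
    print(f"{name}: best params={params}, CV accuracy={cv_score:.3f}, test accuracy={test_score:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so the validation scores aren't peeking at data the model shouldn't have seen.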
Making decisions at each step of the model-building process to optimize the results is challenging, especially when you don’t have a clear picture of the advantages and disadvantages of the dataset you’re working with. So imagine my excitement when I found out that Microsoft’s Power BI is offering a new feature called AutoML, which lets you build ML models by dragging and dropping instead of coding!
One of the reasons I find machine learning so daunting is not knowing where to begin. I don’t know if anyone else feels this way, but cleaning data and understanding its structure before manipulating it to fit my needs is such an abstract process with no clear direction. It’s part of what makes ML fun, but also incredibly frustrating. So AutoML is a great tool for someone who is new to the practice and wants to play around with model capabilities before making decisions from scratch.
Some things I love about this new offering:
- Power BI suggests which models will provide the most insight and are best suited to the dataset you’re working with
- Guided process for input selection
- Automated data science tasks of sampling, normalization, feature extraction, algorithm and hyperparameter selection, and validation
- AND MY FAVORITE! A summary report that breaks down how the model was built, for full transparency. Its explanations of how each input influenced the prediction, and how accurate that prediction is, are great for grasping the theory behind ML algorithms while seeing it in practice.
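I can't show what Power BI does under the hood, so here is a rough scikit-learn analogue of the steps the list above says AutoML automates. This is my own sketch, not Power BI's method. The diabetes dataset, the candidate models (plain linear, ridge, lasso), the grids, treating a train/test split as "sampling", and using permutation importance as a stand-in for the "how each input influenced the prediction" report are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumption: a small built-in regression dataset as a stand-in for real data
data = load_diabetes()
X, y = data.data, data.target
feature_names = data.feature_names

# "Sampling" (approximated here): hold out a test set for final evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature extraction (polynomial terms) and normalization as pipeline steps;
# "model" is a placeholder that the search below swaps out
pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])

# Algorithm + hyperparameter selection: each dict is one candidate model family
param_grid = [
    {"poly__degree": [1, 2], "model": [LinearRegression()]},
    {"poly__degree": [1, 2], "model": [Ridge()], "model__alpha": [0.1, 1.0, 10.0]},
    {"poly__degree": [1, 2], "model": [Lasso(max_iter=10000)], "model__alpha": [0.01, 0.1, 1.0]},
]

# Validation: 5-fold cross-validation on the training set, scored by R^2
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)
best = search.best_estimator_

# A tiny "summary report"
print("Chosen model:", best.named_steps["model"])
print("Chosen polynomial degree:", best.named_steps["poly"].degree)
print(f"Cross-validated R^2: {search.best_score_:.3f}")
print(f"Held-out test R^2: {best.score(X_test, y_test):.3f}")

# Input influence: how much test R^2 drops when each original feature is shuffled
importance = permutation_importance(best, X_test, y_test, n_repeats=10, random_state=0)
for i in np.argsort(importance.importances_mean)[::-1]:
    print(f"{feature_names[i]:>4}: {importance.importances_mean[i]:.3f}")
```

Because permutation importance shuffles the raw input columns, the report talks about the original features rather than the generated polynomial terms, which is closer to the kind of per-input explanation the summary report is described as giving.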
Be cautious when using these great features, because this tool shouldn’t replace independent decision-making. PBI’s suggestions should serve as a template for exploring ML capabilities and as a springboard for getting familiar with possible outputs. From a student’s perspective, it’s a great way to experiment and get some practice, but when you use it to build models that will shape strategy at a company, a deeper knowledge of ML is a must.
I haven’t been able to test out this feature because I don’t have access to PBI Premium, but if you have that access and want to give it a try, I’d love to hear your feedback! Since this is a fairly new offering, there isn’t a ton of insight yet into how effective the models are, but that leaves a lot of room for discussion and growth.