My machine learning class has presented quite the learning curve this semester with the combination of understanding the theory behind algorithms and deciding which model to apply when. Not to mention the added task of reproducing the code to put the models to use on real datasets. The first half of the class focused on an introduction to supervised learning methods and the different types of classifiers and regressors available for model creation and prediction.
Among the classifiers were k-nearest neighbors, softmax regression (multinomial logistic regression for multi-class classification), and support vector machines (linear classifiers that maximize the margin between classes). The regressors included linear and polynomial regression, plus ridge and lasso regularization. With each model came a discussion of splitting the data into train and test sets, validating model skill, hyperparameter selection, feature extraction, and model evaluation. Most of this was taught using the scikit-learn package in Python, along with visualization using matplotlib.
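To make that workflow concrete, here's a minimal sketch in scikit-learn of the steps named above — splitting into train and test sets, hyperparameter selection, and comparing the three classifier families. The dataset is synthetic and all the parameter choices are illustrative, not the ones from my class.

```python
# Sketch of the supervised-learning workflow: split, tune, compare.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic data stands in for a real dataset (3 classes, 10 features)
X, y = make_classification(n_samples=500, n_features=10, n_classes=3,
                           n_informative=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Hyperparameter selection for k-nearest neighbors via cross-validation
knn = GridSearchCV(KNeighborsClassifier(),
                   {"n_neighbors": [3, 5, 7, 9]}, cv=5)
knn.fit(X_train, y_train)

# Softmax (multinomial logistic) regression and a linear SVM for comparison
softmax = LogisticRegression(max_iter=1000).fit(X_train, y_train)
svm = LinearSVC(max_iter=5000).fit(X_train, y_train)

# Model evaluation on the held-out test set
for name, model in [("kNN", knn), ("softmax", softmax), ("SVM", svm)]:
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.2f}")
```

The same pattern extends to the regressors: swap in `Ridge` or `Lasso` and score with R² instead of accuracy.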
Making decisions at each step of the model building process to optimize the results is challenging, especially when you don’t have a clear picture about the advantages and disadvantages of the dataset you’re working with. So imagine my excitement when I found out Microsoft’s Power BI is offering a new feature called AutoML which allows you to build ML models by dragging and dropping instead of coding!
One of the reasons I find machine learning so daunting is not knowing where to begin. I don’t know if anyone else feels this way, but cleaning data and understanding its structure before manipulating it to fit my needs is such an abstract process with no clear direction. It’s part of what makes ML fun, but also incredibly frustrating. So AutoML is a great tool for someone who is new to the practice and wants to play around with model capabilities before making decisions from scratch.
Some things I love about this new offering are:
Power BI suggests what models will provide the most insight and are best suited for the dataset you’re working on
Guided process for input selection
Automated data science tasks of sampling, normalization, feature extraction, algorithm and hyperparameter selection, and validation
AND MY FAVORITE! A summary report that offers a breakdown of how the model was built, for full transparency. The explanations of how each input influenced the prediction, and of how accurate that prediction is, are great for grasping the theory behind ML algorithms while seeing it in practice.
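To appreciate what's being automated here, it helps to see the same steps done by hand. Below is a hypothetical scikit-learn pipeline chaining normalization, feature extraction, and algorithm choice, with hyperparameter selection and validation handled by cross-validated search — this is my own sketch of the generic workflow, not how Power BI's AutoML is actually implemented.

```python
# Hand-rolled version of the tasks AutoML automates:
# normalization -> feature extraction -> algorithm, tuned and validated.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),                 # normalization
    ("pca", PCA()),                              # feature extraction
    ("clf", LogisticRegression(max_iter=1000)),  # algorithm
])

# Hyperparameter selection with cross-validated search (validation)
search = GridSearchCV(
    pipe,
    {"pca__n_components": [5, 10], "clf__C": [0.1, 1, 10]},
    cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test accuracy:", round(search.score(X_test, y_test), 2))
```

Every decision in that grid — how many components, how much regularization — is one AutoML would make for you, which is exactly why the transparency report matters.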
Be cautious when using these great features, because this tool shouldn’t serve as a replacement for independent decision making. PBI’s suggestions should be treated as a template for exploring ML capabilities and a springboard for familiarizing yourself with possible outputs. From a student perspective, it’s a great way to experiment and gain some practice, but when applying it to create models that will impact strategy at a company, deeper knowledge of ML applications is a must.
I haven’t been able to test out this feature myself due to lack of access to PBI Premium, but if someone with that capability wants to try it and share their feedback, I’d love to hear it! Since this is a fairly new offering, there isn’t a ton of insight into how effective the models are, but that leaves a lot of room for discussion and growth.
In the constantly mobile world we live in, there isn’t a single facet of life that isn’t dominated by data nowadays. We are absorbing, digesting, interpreting, leveraging, and producing an infinite amount of data on a daily basis – consciously and unconsciously – and I am fascinated by the possibilities this exchange of information presents. I have always had an appreciation for the vastness of knowledge humans have accumulated over time, but have only recently understood the true potential of this information in advancing our capabilities as a species.
As a student in analytics with a focus on data science, I spend most of my time delving into this potential and discovering my role in expanding it. I’ve spent 2 semesters taking classes in this field and to say that it has been challenging would be an understatement. Before I started as a graduate student in business analytics, my undergraduate background was non-technical but still very focused on data: accounting and finance. The biggest roadblock when starting on this path was getting up to speed on the various tools for data storage, mining, cleaning, exploration, and visualization out there. I was genuinely interested in what these tools could do and their real life implications, but not being able to wrangle them to create meaningful work from the get-go made me doubt whether I was cut out for this field.
As ridiculous as it sounds to hold this expectation for oneself, I was used to understanding concepts and being able to apply them on first introduction, and data science offered a unique challenge by proving this wrong. I noticed I was more likely than not to fail on the first try at whatever I was working on. Fear of continued failure after a string of them made me averse to taking on challenges and to continuing to learn outside the classroom (which is SO important as an analytics student, but that can be its own post). After a lot of reflection and goal setting (as is the ~trend~ this time of year), one of the things I want to work on in 2019 is facing my fear of failure and truly understanding what it is that makes me passionate about data science.
I want to use this blog as a way to learn about and share new developments in data science. I’d like to see personal growth in the field, test and review various tools, share projects that I’m proud of and maybe some that I’m not so proud of, marvel at how data plays such a major role in our everyday lives, debate the ethics behind the use of data, share any valuable insights I gain in the classroom or at seminars, showcase and collaborate with like-minded thinkers, and maybe even get to know myself better as a person.
This is the first time in a while that I have focused on learning for the sake of learning and I am so excited to share this journey with you. In addition to posts about data science you can expect sneak peeks into my life and other things I am passionate about. One of the most striking things I’ve noticed about the data science community online is how welcoming and open it is to all individuals. Everyone is truly themselves and they strive to make data ~cool~ by using it to shine light on their personal areas of interest — like this post on Kanye, or this reflection on whether 2018 was a “happy” year. I want to build proficiency in this field while adding value to the community like these bloggers and hopefully emerge a better version of myself in the process.