How can I learn more about Machine Learning?

By Kathryn Hempstalk - https://www.linkedin.com/in/kathempstalk/

If you want to learn more about Machine Learning (ML), and potentially get a job in the area, here’s what I recommend:

  • Do at least one online/academic course so you have a basic understanding of the algorithms and how to apply them.  See below for a list of recommended courses.

  • Do a variety of different exercises - if you don’t have your own data to play with then there are plenty of datasets/problems available online (see below). 

  • Understand that most of the work in ML is actually preparing the data so the machine can understand it. If you’re only interested in the fun bit at the end then you’ll probably have a bad time.
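
To give a feel for what "preparing the data" can involve, here is a minimal sketch in plain Python. The records, column names and the choice of mean imputation, standardisation and one-hot encoding are all assumptions made up for illustration; real projects usually lean on libraries and need far more care than this.

```python
from statistics import mean, pstdev

# Hypothetical raw records: made-up values purely for illustration.
raw_rows = [
    {"age": "34", "city": "Auckland", "income": "72000"},
    {"age": "", "city": "Hamilton", "income": "58000"},
    {"age": "51", "city": "Auckland", "income": ""},
    {"age": "29", "city": "Wellington", "income": "64000"},
]

NUMERIC = ["age", "income"]   # assumed numeric columns
CATEGORICAL = ["city"]        # assumed categorical columns


def prepare(rows):
    """Turn raw string records into numeric feature vectors."""
    # 1. Parse numbers, treating empty strings as missing (None).
    parsed = [
        {**row, **{col: float(row[col]) if row[col] else None for col in NUMERIC}}
        for row in rows
    ]

    # 2. Compute the mean and standard deviation of the values that are present.
    stats = {}
    for col in NUMERIC:
        present = [r[col] for r in parsed if r[col] is not None]
        stats[col] = (mean(present), pstdev(present) or 1.0)

    # 3. Collect the observed values of each categorical column.
    categories = {col: sorted({r[col] for r in parsed}) for col in CATEGORICAL}

    features = []
    for r in parsed:
        vector = []
        for col in NUMERIC:
            mu, sigma = stats[col]
            value = mu if r[col] is None else r[col]  # mean imputation
            vector.append((value - mu) / sigma)       # standardisation
        for col in CATEGORICAL:
            # One-hot encoding: one 0/1 slot per observed category.
            vector.extend(1.0 if r[col] == c else 0.0 for c in categories[col])
        features.append(vector)
    return features


features = prepare(raw_rows)
for vector in features:
    print([round(x, 2) for x in vector])
```

Even on four rows this takes more code than fitting a simple model would, which is the point of the bullet above.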

Things I look for in a data scientist in an interview (with a view to applying ML/statistics to real world problems):

  • You can trade off between spending forever perfecting something that is never going to be perfect, and delivering something quickly.  That is, I want something reasonably quickly that is practically useful.

  • I want to have confidence that you understand basic statistics and ML algorithms, which I would expect you to demonstrate with some examples you’ve done yourself and can walk me through.  I will likely challenge you on (in no particular order):

    • Showing you understand the problem you are trying to solve, and why it is important to solve.

    • Did you draw some graphs/visualisations to understand the data? What did you find?

    • Details on the data you have to solve it - how much data do you have? What are the features? What transformations have you done?

    • The statistical significance of your results and whether any conclusions you draw are valid.

    • What limitations your data has, and what data you might want to add in order to address those limitations.

    • What algorithms you have used to solve it, how you optimised their parameters, and why you chose those algorithms (and parameters).

    • Your code - is it tidy? Did you document it? 

    • What are the real world implications of automating this problem? Ethics? Are there any scenarios where it shouldn’t be used?

    • How you could test this model in a production environment to show that the predictions of your model translate to something useful.

  • When given a problem, I want you to show interest in understanding what the problem really is, and potentially suggest non-ML solutions if it makes sense to do so. As an example, I once had two candidates that we felt would fit a role, and we were having trouble deciding between them. We posed the same problem to each: “given data from a meter which has some sensors to measure a fluid, what would you do with it?” The first candidate said they would check the calibration of the sensors first. The second candidate asked “what are you trying to do with the meter?” Although both candidates reasoned further about the problem, the second candidate got the job. It stuck with me that his attitude of wanting to understand the problem rather than blindly applying statistics/ML was what we were after.

  • Big data is nice, but most problems in practice are not big data problems. However, demonstrating that you at least understand when to use big data technologies (such as Spark) and are prepared to learn them will be viewed with favour.

  • Cloud services and publicly available libraries are awesome and it’s beneficial for you to know about at least one cloud provider’s services (AWS, Google, Azure) with respect to ML, as well as at least one ML “language” and associated libraries (e.g. R, Python).

  • There are lots of different terms in the market for roles that are associated with ML. It’s worth knowing how a data scientist differs from and works with the other roles, and I’d expect you to be clear on how you would work with them to get ML models to production. Some of the common ones are:

    • “Data scientists” are usually the people who do the model building, and work with product development teams to make sure their models work in practice;

    • “Data engineers” will usually help move/transform/store the data in a place where you can train your models;

    • “ML engineers” will usually help productionise a model but may also do either of the above;

    • “Data/Insights Analysts” typically have a similar skillset to data scientists but will tend to do more reporting, more one-off forecasting (e.g. expected revenue/demand);

    • Developers/Testers/Product Owners are typical in software development teams and are critical for getting ML models integrated into a larger system.
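
One of the interview challenges listed above is the statistical significance of your results. As a rough illustration of what that can mean in practice, the sketch below uses a permutation test to ask whether the gap between two models' scores could plausibly be chance. The scores are invented, the function name is my own, and treating the two sets of scores as independent samples is a simplifying assumption (a paired test is often more appropriate when both models are evaluated on the same folds).

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    Returns the observed difference and an estimated p-value: the share of
    random relabellings whose difference is at least as large as the
    observed one.
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point rounding in ties.
        if abs(diff) >= abs(observed) - 1e-12:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical accuracy scores for two models (made-up numbers).
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85]
model_b = [0.80, 0.78, 0.82, 0.81, 0.80, 0.79, 0.77, 0.83]

observed, p_value = permutation_test(model_a, model_b)
print(f"mean difference = {observed:.4f}, p = {p_value:.3f}")
```

A large p-value here would suggest the apparent improvement may not hold up, which is exactly the kind of thing I would ask a candidate to reason about.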

Online courses

Andrew Ng’s Stanford Machine Learning Course: https://www.coursera.org/learn/machine-learning/

Considered “the” course in machine learning, but it does involve a fair bit of math because it explains how the underlying algorithms work, and it uses Octave/MATLAB for all the exercises. Most data scientists would use R or Python and probably not touch the algorithms, but it does help to understand how they work (so you understand what to apply and how to apply it to get good results).

Deeplearning.ai (also Andrew Ng) 5 (short) course deep learning specialisation:

https://www.deeplearning.ai/

Deep learning is a hot topic, but it needs a lot of data to get good results - and most problems don’t have lots of data, so general machine learning (see above) is more useful. This is an incredibly well produced set of courses covering deep learning, and it uses Python (notebooks) for all assessments. Course #3 in the series - structuring your ML projects - is useful regardless of whether you are doing deep learning or not.

Data Mining with Weka:

https://www.cs.waikato.ac.nz/~ml/weka/courses.html

A more practical and less maths-orientated course using Weka. Great for less technical people, and it shows the practical side (like data cleaning) that is often required to get good results from the algorithms.

Other useful sorts of courses:

Big Data - Spark

https://www.edx.org/xseries/data-science-engineering-apacher-sparktm

No longer available but a nice practical course on Spark. Not sure what alternatives are around now.

Statistics 101 

Probably something like this: https://www.coursera.org/learn/probability-intro

Ethics 

Like this: https://www.coursera.org/learn/data-science-ethics
