In a previous blog post, we explored the importance of machine learning (ML) and delved into the five most important things that business leaders need to know about ML. First, recall that supervised learning is concerned with the prediction and classification of data. Now it’s time to dive deeper.
1. MODEL INTERPRETABILITY
We saw that accuracy (the percentage of your data that your model predicts/classifies correctly) is not always the best metric to measure the success of your model, such as when your classes are imbalanced (for example, when 99% of emails are spam and 1% non-spam). Another space where metrics such as accuracy may not be enough is when you need your model to be interpretable. Interpretability is essentially the characteristic of being able to say why your model makes the predictions it does, which is necessary for many models deployed in financial markets due to regulation. It is also essential for algorithms that impact stakeholders’ lives and liberty, such as Northpointe’s COMPAS recidivism risk model, the output of which is used by judges to make decisions during parole hearings. There’s an inherent trade-off between accuracy and interpretability, in that the most accurate models are generally black-box and not interpretable. Even if your particular industry isn’t concerned with interpretability at the moment, it’s an important part of the conversation to be aware of, and regulation will bring it to many other industries in the coming years.
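To make the imbalanced-class point concrete, here is a minimal sketch in plain Python. The data is a made-up stand-in that mirrors the 99%/1% split above, and the "model" is a deliberately useless one that always answers "spam". The helper names `accuracy` and `recall` are just for illustration.

```python
# Hypothetical toy data: 990 spam emails and 10 non-spam, mirroring a 99%/1% split.
labels = ["spam"] * 990 + ["non-spam"] * 10

# A "model" that ignores its input and always predicts the majority class.
predictions = ["spam"] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of all examples that were predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of the true `positive` examples that were predicted as `positive`."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for _, p in relevant) / len(relevant)


acc = accuracy(labels, predictions)
non_spam_recall = recall(labels, predictions, "non-spam")

print(f"accuracy: {acc:.2%}")
print(f"non-spam recall: {non_spam_recall:.2%}")
```

The accuracy looks excellent even though the model never identifies a single legitimate email, which is why a per-class metric such as recall is worth checking alongside it.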
2. MONITORING AND MAINTAINING YOUR ML MODELS IN THE WILD
It’s also essential to remember that the job of machine learning doesn’t end when you’ve got your model out there making predictions or performing classifications. Models that are deployed and doing work need to be monitored and maintained. If you have a model predicting credit card fraud based on transaction data, you get useful information every time your model makes a prediction and you act on it. On top of this, the activity you’re trying to monitor and predict—credit card fraud—may be dynamic and change over time. This process is called data drift, and it shows how essential it is to update your model, as the way data is generated is constantly in flux.
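One simple way to watch for this kind of change is to compare recent data against the data the model was trained on. The sketch below is only an illustration, not a production monitoring setup: the transaction amounts are invented, `drift_score` is a hypothetical helper, and the alert threshold of 2.0 is an assumption you would tune for your own data.

```python
from statistics import mean, stdev


def drift_score(reference, recent):
    """How far the recent mean has moved from the reference mean,
    measured in reference standard deviations."""
    return abs(mean(recent) - mean(reference)) / stdev(reference)


# Hypothetical transaction amounts seen when the model was trained.
reference_amounts = [20, 25, 22, 30, 27, 24, 26, 23, 28, 25]

# Two hypothetical recent windows: one similar to training, one clearly shifted.
stable_amounts = [24, 26, 23, 27, 25]
shifted_amounts = [60, 75, 58, 80, 66]

THRESHOLD = 2.0  # assumed alert threshold, chosen only for illustration

for name, window in [("stable", stable_amounts), ("shifted", shifted_amounts)]:
    score = drift_score(reference_amounts, window)
    status = "investigate / consider retraining" if score > THRESHOLD else "ok"
    print(f"{name}: drift score {score:.2f} -> {status}")
```

A check like this only looks at one feature's average, so in practice you would track several features and the model's own error rate as well.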
3. DEEP LEARNING
By this point, you might be wondering when we’re going to mention deep learning, one of the buzziest terms in the data science and AI space. Deep learning is a form of ML that uses models called neural networks, which are loosely inspired by biological neural networks in human brains. Note that this is the extent to which the analogy holds, and deep learning is not equivalent to human intelligence.
The majority of the applications of deep learning occur in the supervised learning world in the form of image classification (facial recognition, self-driving cars, drone footage used to estimate crop yield in AgTech) and natural language processing (Google Translate, document classification, sentiment analysis). There are other applications in time-series prediction, such as financial prediction problems. It’s important to emphasize that deep learning systems are rarely good at more than one task: an algorithm that is built for facial recognition will not be any good at classifying legal documents. Although you may like to call deep learning a form of AI, it is so only in the sense of narrow artificial intelligence, not artificial general intelligence, which is the realm of hypothesized computational systems that are as intelligent as humans across the board.
4. TRANSFER LEARNING
In a world where building competitive ML models relies on state-of-the-art, domain-specific data, you’d be forgiven for being concerned about not having enough data yourself or the ability to collect it. But never fear, since much of the future of ML will involve using pre-trained models, or models already trained on other data. This is the world of transfer learning. For example, you could buy a pre-trained image classification model that recognizes and classifies cars. The cool thing about transfer learning is that you can retrain pre-trained models for your particular question and domain. For example, if you want an algorithm that classifies trucks, you could take one that classifies cars and train it further on truck photo data.
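The idea of training a pre-trained model further can be sketched without any deep learning library. The toy below uses a tiny logistic-regression model in place of a real image classifier, and the two numeric features per example are invented stand-ins for whatever a real model would extract from photos. All names and data here are hypothetical.

```python
import math


def predict(weights, bias, x):
    """Logistic-regression probability that example x belongs to class 1."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(weights, bias, data, epochs, lr=0.5):
    """Run gradient descent starting from the given parameters and return new ones."""
    weights = list(weights)  # copy so the caller's parameters are left untouched
    for _ in range(epochs):
        for x, y in data:
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def log_loss(weights, bias, data):
    """Average cross-entropy loss on the data (lower is better)."""
    total = 0.0
    for x, y in data:
        p = predict(weights, bias, x)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(data)


# Hypothetical "car vs. not car" data that the original model was trained on.
car_data = [((1.0, 0.2), 1), ((0.9, 0.1), 1), ((0.1, 0.8), 0), ((0.2, 0.9), 0)]
# A much smaller hypothetical "truck vs. not truck" data set for the new task.
truck_data = [((1.0, 0.6), 1), ((0.3, 0.9), 0)]

# Step 1: the pre-trained model (in practice, you would obtain this rather than train it).
pretrained = train([0.0, 0.0], 0.0, car_data, epochs=200)

# Step 2: transfer learning, i.e. a short round of further training on the new data.
fine_tuned = train(*pretrained, truck_data, epochs=5)

# Baseline: the same short training budget, but starting from nothing.
from_scratch = train([0.0, 0.0], 0.0, truck_data, epochs=5)

print(f"fine-tuned loss on trucks:   {log_loss(*fine_tuned, truck_data):.3f}")
print(f"from-scratch loss on trucks: {log_loss(*from_scratch, truck_data):.3f}")
```

The only difference between the two truck models is the starting point, which is the core of transfer learning: the model that starts from related, already-learned parameters needs far less new data and training to perform well.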
We are currently seeing the emergence of algorithm marketplaces, such as Booz Allen’s Modzy, where you can buy and sell pre-trained models. There are key, legitimate concerns, however, such as how to think about data and algorithmic bias, along with model governance, when trading algorithms without having access to the datasets that they’re trained on. The space is ripe for growth, but it’s also ripe for abuse and regulation.
5. MACHINE LEARNING AND DECISION MAKING
The final point, which I cannot stress enough, is that ML, like all data work, needs to be directly embedded in the decision function. That is, ML doesn’t exist in a vacuum; it’s there to serve decision making. This can be automated (as with Google Search), embedded in a scientific process (as when an algorithm flags an MRI for a specialist to look at), or embedded in organizational processes (such as when decisions are made around what to do with customers who are predicted to churn).
You want to make sure that the data work always reflects your real-world concerns and that you avoid Type III errors, where you get the right answer but to the wrong question. This is why the data translation space is heating up and why it’s so important to establish a culture of data work in your organizations. This will require the workforce to understand what data science and ML can and can’t do, how to ask good questions of data professionals, and how to establish healthy, productive lines of communication between the data function and the rest of the company.
To recap, the five things you need to know about machine learning (in addition to the [five I mentioned previously]) are:
- There’s an inherent trade-off between accuracy and interpretability, but interpretability (the characteristic of being able to say why your model makes the predictions it does) is becoming more important, especially for models deployed in regulated markets like finance.
- ML models that have been deployed need to be monitored and maintained so that you can detect and respond to data drift.
- Deep learning can be very good at specific supervised learning tasks like image classification and natural language processing.
- If you’re concerned about collecting enough appropriate data for a specific business problem, the growing field of transfer learning allows you to retrain pre-trained models for your particular question and domain.
- ML, like all data work, needs to serve decision making: it should reflect real-world concerns, which in turn requires productive lines of communication between the data function and the rest of the company.
Since publication of this article, Modzy has reached out and claims to “require that all models in the marketplace include details about model build, such as model architecture, training and validation datasets, and performance metrics to ensure transparency into model build”. Modzy directs readers here for additional information on this.