The Hitchhiker’s Guide to Telco Transformation - Part 4
Start Learning Machine Learning
We hear about machine learning all the time, but what is it? If we rely on Wikipedia, it looks simple: machine learning is a subfield of computer science that allows machines (or computers) to learn behavior from data and repeat it without being explicitly programmed for each operation. Within this very simple description lie a number of terms that, while simple, have great implications: What is learning? How does one learn, or prove one has learned? And how does it actually work?
First, let us look at some use cases to better understand the value machine learning can provide:
- Fraud detection – used especially by the retail industry, it detects anomalies in purchasing histories by gathering and analyzing millions of data points to discover fraud patterns. This can raise alerts, detect stolen cards and prevent fraudulent product shipments.
- Customer service – interactive chat or pop-ups that assist visitors browsing a website or asking for help.
- Litigation and legal situations – models are used to translate complex legalese into layman’s terms so contracts can be better understood and negotiated. In court cases, machine learning can help review large amounts of material more quickly, so the legal team can spend more time strategizing and less time reading and digesting hundreds of documents looking for behavioral patterns.
- Security – identifying malware. Hundreds of thousands of malware variants are created all the time. As humans we are limited in our ability to detect, track and deal with the thousands of alerts generated by various tools. Machine learning helps detect, manage and identify suspicious activity.
- Healthcare – machine learning is used to learn a patient’s medical history for better diagnosis, treatment and adjustment.
- Sports – machine learning is used by a number of publications to better predict game results.
So machine learning is already being used in a wide variety of situations. There are a number of types of machine learning algorithms. The two most commonly discussed are:
- Unsupervised learning
- Supervised learning
Usually, unsupervised learning is used when you don’t really know what you are looking for, nor how the data is organized. Supervised learning is used when you know what you are looking for and would like a mathematical model to predict future outcomes. Let’s compare the two to better understand the differences:
Unsupervised learning is needed when you must figure out the data’s patterns and groupings yourself: as mentioned, the data is not labeled, there is no real knowledge of the output, and discovering that structure is the goal. A common way to run unsupervised ML is to deploy AWS EMR with Spark MLlib.
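On EMR, Spark MLlib ships a ready-made k-means implementation; to make the idea concrete without standing up a cluster, here is a minimal, dependency-free sketch of the same clustering step. The sample points and the choice of two clusters are illustrative assumptions, not anything from AWS:

```python
import random

def kmeans(points, k, iters=20, seed=42):
    """Tiny k-means: group unlabeled 2-D points into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k random points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            i = min(range(k),
                    key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            clusters[i].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

# Two obvious blobs the algorithm should discover without any labels:
points = [(1, 1), (1.5, 2), (2, 1.2), (8, 8), (8.5, 9), (9, 8.2)]
centroids, clusters = kmeans(points, k=2)
```

No label ever appears here: the two groups fall out of the geometry of the data alone, which is exactly what makes this "unsupervised".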
Let’s look at supervised learning. With supervised learning we have labeled data: the data carries tags or classifications, and in many cases we also know the desired output, so we can use that data for predictive analytics. We use this method when we have too much data to feasibly sift through by hand. With supervised learning, the labels help the machine discover correlations between the data points and derive hypotheses about the relationships between factors.
You can compare supervised and unsupervised learning in the visualization below. In the case of supervised learning, the machine can see the labeled data and the correlations between the elements, and use these to help make predictions about other data points. With unsupervised learning there are no labels; you cannot see the relationships, and you need to figure out what is in the data and what you are looking at.
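The supervised side of that comparison can be shown in a few lines: a toy 1-nearest-neighbor classifier copies the label of the closest labeled example. The points and the "churn"/"stay" labels below are made up purely for illustration:

```python
def predict(labeled, point):
    """1-nearest-neighbor: return the label of the closest labeled example."""
    nearest = min(labeled,
                  key=lambda ex: (ex[0] - point[0]) ** 2 + (ex[1] - point[1]) ** 2)
    return nearest[2]

# Labeled training data as (x, y, label) -- the labels are the supervision.
labeled = [(1, 1, "churn"), (2, 1, "churn"), (8, 9, "stay"), (9, 8, "stay")]

print(predict(labeled, (1.5, 1.2)))  # -> churn
print(predict(labeled, (8.5, 8.5)))  # -> stay
```

Unlike the clustering sketch, this model can only predict because someone first told it what "churn" and "stay" look like; that is the essential difference between the two approaches.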
The components of machine learning include:
- Data source – any database extract or CSV file pulled from a big data system.
- Split Data – the data is split into two groups: training data and evaluation data. Training data is used by the algorithm to learn predictive relationships between the various pieces of data. Evaluation data is then used to verify the accuracy of the model.
- Shuffle Data – shuffling is important because it helps improve the quality of the ML model. Since the data used in training influences the predictions, shuffling the data between the sets and comparing results can lead to a better, more accurate model. Shuffling places the data in a random order; AWS ML creates better models when the data is in random order.
- Data Transformation – transformations optimize the data for better learning. The transformed data can then be used to classify new information and can help with future learning.
- Training Model – once data transformation is concluded, it is time to create a model and train it by finding patterns and correlations in your data. This is where we teach the ML algorithm to hypothesize and make predictions based on the data.
- Model Parameters – customize your model; for example, you may want to increase the amount of data or add parameters to it.
- Evaluation Model – determine the accuracy or quality of the model in predicting instances not in the training data. There are several ways to validate your models, such as model insights, binary model insights, multiclass model insights, regression model insights, preventing overfitting, cross-validation and evaluation alerts.
- Feature Selection – adjust the learning/model size; you may even drop features that do not contribute to the learning process.
- Score Setting – set a score threshold for prediction accuracy.
- Use Model – get predictions in real time or in batch. Real-time predictions are used in web or mobile apps; batch predictions return the results as a CSV file.
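The steps above, from shuffling and splitting through training and evaluation, can be sketched end to end. The nearest-centroid model and the synthetic one-feature data below are illustrative stand-ins, not the AWS ML internals:

```python
import random

# Synthetic labeled data: one feature value and a binary label it correlates with.
rng = random.Random(0)
data = [(x, 1 if x > 5 else 0) for x in (rng.uniform(0, 10) for _ in range(200))]

# Shuffle, then split roughly 70/30 into training and evaluation sets.
rng.shuffle(data)
cut = int(len(data) * 0.7)
train, evaluation = data[:cut], data[cut:]

def fit(rows):
    """'Train' a nearest-centroid model: one mean feature value per class."""
    return {label: sum(x for x, y in rows if y == label) /
                   len([x for x, y in rows if y == label])
            for label in (0, 1)}

def predict(means, x):
    """Predict the class whose centroid is closest to the feature value."""
    return min(means, key=lambda label: abs(x - means[label]))

# Evaluate: fraction of held-out rows the model labels correctly.
model = fit(train)
accuracy = sum(predict(model, x) == y for x, y in evaluation) / len(evaluation)
print(f"held-out accuracy: {accuracy:.2f}")
```

The key discipline this sketch mirrors is that accuracy is measured only on the evaluation rows the model never saw during training; scoring on the training data would hide overfitting.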
This post is a short overview of what ML is, its use cases, and the process steps used from getting the data to getting insights out of it.
It is the first in a series of ML posts in which I will go deeper into the process and its components.