What is machine learning? - The data process blog

Machine learning is a branch of artificial intelligence, and as such it is a field that tries to create models that make better or faster decisions. Specifically, machine learning algorithms do this by analyzing data and adapting to it. As the model adapts, it ideally becomes more accurate in its predictions.

Ok, but how is this useful?

Machine learning is most useful in cases where it is very difficult to formulate an explicit algorithm to solve the problem. One example is the classification of objects in images. Imagine trying to write an algorithm that classifies coffee beans as rotten or healthy: the color is important, but so is how the beans are oriented in the picture, how far they are from the camera (as that affects their size in the image) and many other factors. This can turn out to be a near-impossible task.

But how can we humans do it without even thinking? We have all these rules and patterns in our minds that were learned. And that is exactly what machine learning models do. Essentially, the model uses every relevant aspect of the data (called a feature) and tries to “understand” its effect on the expected result.

As you can imagine, the first thing to figure out is the question being asked. Is this a problem of predicting a number (regression) or of classification? What am I interested in? Do I want to know a number, for example the temperature for tomorrow in my city, or do I want to know whether the coffee beans in an image are rotten or good for consumption?
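To make the two kinds of question concrete, here is a minimal sketch in plain Python. The temperature readings and the 20 °C "warm" cut-off are made up for illustration, and `fit_line` is just a hypothetical helper that fits a straight line by least squares, not a recommendation for real forecasting.

```python
# Hypothetical readings: day index -> temperature in degrees Celsius.
days = [1, 2, 3, 4, 5]
temperatures = [18.0, 19.0, 20.0, 21.0, 22.0]

def fit_line(xs, ys):
    """Least-squares fit of a straight line; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(days, temperatures)

# Question framed as a number: "what will the temperature be on day 6?"
predicted_temperature = slope * 6 + intercept

# Question framed as a class: "will day 6 be warm or cold?"
# The 20.0 cut-off is an arbitrary assumption for this sketch.
predicted_class = "warm" if predicted_temperature >= 20.0 else "cold"

print(predicted_temperature, predicted_class)
```

The data is the same in both cases; what changes is the kind of answer we expect, and that choice decides which family of models makes sense.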

Once the question is framed, the data at hand must be reliable, clean and sufficient for the accuracy the model needs. As the complexity of the problem increases, so does the required training sample size. Algorithms are also not as intelligent as humans (yet), so they usually need a LOT of data to make sense of it and generalize well, far more than a human would need.

The learning methods

Machine learning algorithms have 4 types of learning methods: supervised, unsupervised, semi-supervised and reinforcement learning.

Supervised learning

The model learns from labeled data. This means that something (usually a human) has to tell the model what is what in the examples of the training and test samples. This learning method usually gives more reliable results than the unsupervised method, but because it requires labeled data, the data gathering process can become infeasible. The targets of this type of model are either classes or numerical values. The use cases are very wide: in theory, these models can be used for anything that is divided into classes or measured as a number.
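As a rough illustration of learning from labeled examples, here is a small sketch of a nearest-centroid classifier for the coffee bean problem. The two features (`darkness` and `spot_fraction`) and all the numbers are hypothetical; a real image model would work with far richer features and far more data.

```python
from statistics import mean

# Hypothetical labeled training data: (darkness, spot_fraction) -> label.
training = [
    ((0.20, 0.05), "healthy"),
    ((0.25, 0.10), "healthy"),
    ((0.30, 0.08), "healthy"),
    ((0.70, 0.60), "rotten"),
    ((0.80, 0.55), "rotten"),
    ((0.75, 0.70), "rotten"),
]

def fit_centroids(samples):
    """Average the feature vectors of each class (the 'learning' step)."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))

centroids = fit_centroids(training)
print(predict(centroids, (0.22, 0.07)))  # a light bean with few spots
print(predict(centroids, (0.78, 0.65)))  # a dark, heavily spotted bean
```

The labels in `training` are the part a human had to provide. Without them, `fit_centroids` would have nothing to average per class.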

Unsupervised learning

In this learning method, the model learns from unlabeled data, which means that it tries to identify similarities within the given data. Although this method produces less reliable results than the supervised one, it leverages the fact that unlabeled data is much easier to get. The targets of these models are associations (between data points) or clusters. Example use cases are grouping customers with similar behaviour into marketing profiles (clustering) and finding related purchase items (association).
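To give a feel for clustering, here is a minimal k-means sketch on a single feature. The `monthly_spend` values and the two starting centers are invented for the example, and k-means is only one of many clustering algorithms, chosen here because it is short.

```python
def k_means_1d(values, initial_centers, iterations=10):
    """Group numbers around centers by repeating: assign, then re-average."""
    centers = list(initial_centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical monthly spend per customer. Note that there are no labels.
monthly_spend = [12, 15, 14, 90, 95, 88]
centers, clusters = k_means_1d(monthly_spend, initial_centers=[0, 100])
print(clusters)
```

Nobody told the algorithm which customers are "low spenders" or "high spenders". It only found two groups of similar values, and naming those groups is still up to us.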

Semi-supervised learning

This method is a mix of the two: part of the data is labeled but the majority is not. It can be useful if one has a lot of unlabeled data and enough labeled data to draw some conclusions about the unlabeled data. The labeled data is used to fit a model that predicts the labels of the unlabeled data. Then, a supervised model can learn from the combined data.
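The two steps described above (often called pseudo-labeling) can be sketched like this. The single `darkness` feature, the numbers and the simple midpoint-threshold model are all assumptions made to keep the example tiny.

```python
from statistics import mean

# Hypothetical data: one feature (bean darkness), very few labels.
labeled = [(0.30, "healthy"), (0.60, "rotten")]
unlabeled = [0.10, 0.15, 0.25, 0.30, 0.70, 0.75, 0.85, 0.90]

def fit_threshold(samples):
    """Place the decision threshold halfway between the two class means."""
    healthy = [x for x, label in samples if label == "healthy"]
    rotten = [x for x, label in samples if label == "rotten"]
    return (mean(healthy) + mean(rotten)) / 2

def predict(threshold, darkness):
    return "rotten" if darkness > threshold else "healthy"

# Step 1: fit a first model on the labeled data only.
first_threshold = fit_threshold(labeled)

# Step 2: use it to guess labels for the unlabeled data.
pseudo_labeled = [(x, predict(first_threshold, x)) for x in unlabeled]

# Step 3: fit the final model on the labeled and pseudo-labeled data together.
final_threshold = fit_threshold(labeled + pseudo_labeled)

print(first_threshold, final_threshold)
```

The guessed labels are only as good as the first model, so in practice it is common to keep only the guesses the model is confident about. This sketch skips that filtering.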

Reinforcement learning

Reinforcement learning takes a different approach to learning patterns: an agent takes different actions in an environment and afterwards evaluates whether the results of those actions were good or bad. This way, it reinforces the good patterns and penalizes the bad ones. Situations where one cannot determine up front whether an individual action is good or bad are use cases for this method. The best and most common example is a game, because the rules are clear and the target result is clear, but the quality of each play is not always clear. Other examples are walking (or flying) from point A to point B.
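As a toy version of the "walk from point A to point B" example, here is a sketch of tabular Q-learning, one common reinforcement learning algorithm. The five-position corridor, the reward of 1 for reaching B and the values of `alpha`, `gamma` and `epsilon` are assumptions picked for illustration.

```python
import random

N_STATES = 5            # positions 0..4 on a line; A is 0 and B is 4
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # step left or step right

def step(state, action):
    """Move the agent, staying inside the corridor; reward only at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[(state, action)] estimates how good an action is in a given state.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward = step(state, action)
            # Reinforce: nudge the estimate toward reward + discounted future value.
            future = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q_table = train()
# Greedy policy: the preferred action in each non-goal position.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)]
print(policy)  # → [1, 1, 1, 1], i.e. always step right
```

Notice that the agent is never told that a single step to the right is "good". It only gets a reward at the end, and the update rule spreads that information back to the earlier steps.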

I hope you liked this post, feel free to comment and reach out!