Machine learning is the process by which computer programs become more accurate at a task through exposure to data, called training instances. The task being performed and the type of data available for training are critical to deciding which techniques to use in the machine learning process. Because the presence or absence of labeled training data is so fundamental to the machine learning process, special terms are used to describe the two scenarios:

  • When labeled training instances are used for learning, the process is said to be supervised.
  • If no labeled instances are used, the process is said to be unsupervised.
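
The two scenarios can be contrasted on a toy data set. The sketch below is illustrative only: the values, the "small"/"large" labels, and the function names are all hypothetical, and the two techniques shown (a nearest-mean classifier and a two-cluster k-means) are simply convenient stand-ins for supervised and unsupervised learning, not methods named in this article.

```python
# Hypothetical toy data: one numeric feature per training instance.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
# Labels are consulted only on the supervised path.
labels = ["small", "small", "small", "large", "large", "large"]


def fit_supervised(xs, ys):
    """Supervised: learn one mean value per label from labeled instances."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: sum(g) / len(g) for y, g in groups.items()}


def predict(means, x):
    """Return the label whose learned mean is closest to x."""
    return min(means, key=lambda y: abs(means[y] - x))


def fit_unsupervised(xs, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means, k=2)."""
    centers = [min(xs), max(xs)]  # simple deterministic starting points
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


means = fit_supervised(values, labels)        # uses the labels
centers, clusters = fit_unsupervised(values)  # never sees the labels
```

The supervised path can name its predictions ("small" or "large") because the labels supplied those names. The unsupervised path only discovers that the values fall into two groups; deciding what each group means is left to a person.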

One way to think about supervised learning is that during the learning process, a “teacher” is available that will tell the algorithm when it is predicting labels correctly and when it is making mistakes. That teacher is the subject matter expert or experts who labeled all the training instances used by the machine learning algorithm. The machine uses the teacher's feedback to make its predictions more accurate. In the figure below, the teacher directs the machine to learn to predict whether a sample has a “hole” and label each sample with its prediction.
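
The teacher idea can be sketched as a loop in which the algorithm predicts, compares its prediction with the expert's label, and adjusts itself only when it is wrong. The example below is a minimal sketch that assumes a perceptron-style update rule, which is one possible choice and not necessarily the method behind the figure. The single feature (the fraction of empty area at the center of a sample), the data, and the names `train` and `has_hole` are hypothetical.

```python
# Hypothetical samples: (fraction of empty area at the center, teacher's label).
training = [
    (0.05, False), (0.10, False), (0.60, True),
    (0.75, True), (0.15, False), (0.55, True),
]


def train(samples, epochs=20, rate=1.0):
    """Mistake-driven learning: adjust the model only when the teacher disagrees."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in samples:
            predicted = weight * x + bias > 0
            if predicted != label:  # the teacher's label says the prediction was wrong
                direction = 1 if label else -1
                weight += rate * direction * x
                bias += rate * direction
                mistakes += 1
        if mistakes == 0:  # every training instance is now predicted correctly
            break
    return weight, bias


def has_hole(model, x):
    """Predict whether a sample with feature value x has a hole."""
    weight, bias = model
    return weight * x + bias > 0


model = train(training)
```

After training, `has_hole(model, x)` labels new samples without the teacher. The quality of those labels depends entirely on the quality of the expert labels used during training.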

