Machine learning incorporates several hundred statistics-based algorithms, and choosing the right algorithm for the job is a constant challenge of working in this field. Before examining specific algorithms, it is important to understand the three overarching categories of machine learning and how each one treats input and output variables.
What is Supervised Learning?
Supervised learning imitates our own ability to extract patterns from known examples and use that extracted insight to engineer a repeatable outcome. This is how the car company Toyota designed its first car prototype. Rather than speculating or inventing a unique manufacturing process, Toyota created its first vehicle prototype after taking apart a Chevrolet car in the corner of its family-run loom business. By observing the finished car (output) and then pulling apart its individual components (input), Toyota’s engineers unlocked the design process kept secret by Chevrolet in America.
This process of understanding a known input-output combination is replicated in machine learning using supervised learning. The model analyzes the relationship between input and output data to learn the underlying patterns. Input data is referred to as the independent variable (uppercase “X”), while output data is called the dependent variable (lowercase “y”). Examples of a dependent variable (y) include the coordinates of a rectangle around a person’s face in a digital photo (face detection), the price of a house, or the class of an item (e.g., sports car, family car, sedan). The corresponding independent variables, which are assumed to influence the dependent variable, could be the pixel colors, the size and location of the house, and the specifications of the car, respectively.
After analyzing many examples, the machine creates a model: an algorithmic equation for producing an output based on patterns from previous input-output examples. Using the model, the machine can then predict an output based solely on the input data. The market price of your used Lexus, for example, can be estimated using labeled examples of other cars recently sold on a used car website. With access to the selling prices of similar cars, the supervised learning model can work backward to determine the relationship between a car’s value (output) and its characteristics (input).
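The fit-then-predict workflow described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a single input feature (mileage) and a handful of made-up sales records; the data and the function names `fit_simple_linear_regression` and `predict` are hypothetical and do not come from a real used-car website. The model is an ordinary least squares line fitted to the labeled examples, which is then used to estimate a price from the input alone.

```python
# Hypothetical labeled examples (made-up numbers for illustration only).
mileage = [20, 40, 60, 80]  # independent variable X: mileage in thousands of km
price = [30, 26, 22, 18]    # dependent variable y: sale price in thousands


def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the line minimizing squared error (OLS)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of X and y divided by the variance of X gives the OLS slope.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Estimate y for a new input x using the fitted line."""
    intercept, slope = model
    return intercept + slope * x


# Build the model from labeled data, then predict for a car with only its input known.
model = fit_simple_linear_regression(mileage, price)
print(model)               # approximately (34.0, -0.2)
print(predict(model, 50))  # approximately 24.0
```

A real price model would use many characteristics (age, condition, trim, and so on) rather than mileage alone; the single-feature version simply keeps the learned relationship easy to inspect.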
The input features of your own car can then be entered into the model to generate a price prediction. While input data with an unknown output can be fed to the model to produce a prediction, unlabeled data cannot be used to build the model. When building a supervised learning model, each item (e.g., car, product, customer) must have labeled input and output values, known in data science as a “labeled dataset.” Examples of common algorithms used in supervised learning include regression analysis (e.g., linear regression, logistic regression, non-linear regression), decision trees, k-nearest neighbors, neural networks, and support vector machines, each of which is examined in later chapters.
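To illustrate the difference between building a model (which requires labels) and using it (which requires only inputs), here is a minimal k-nearest neighbors classifier in Python. It is a sketch under stated assumptions: the car specifications, class labels, the choice of k=3, and the helper name `knn_predict` are all hypothetical. The classifier stores the labeled dataset and assigns a new car the majority class among its k closest labeled examples.

```python
import math
from collections import Counter

# Hypothetical labeled dataset: (horsepower, seats) -> car class.
labeled_X = [(450, 2), (400, 2), (500, 2), (150, 7), (170, 7), (140, 5)]
labeled_y = ["sports car", "sports car", "sports car",
             "family car", "family car", "family car"]


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest labeled neighbors."""
    # Every training input must be paired with an output label.
    if len(X_train) != len(y_train):
        raise ValueError("every training input needs a matching label")
    # Euclidean distance from x_new to each labeled example, closest first.
    distances = sorted(
        (math.dist(x, x_new), label) for x, label in zip(X_train, y_train)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Return the most common label among the k nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Prediction needs only the input features of the new car.
print(knn_predict(labeled_X, labeled_y, (420, 2)))  # sports car
print(knn_predict(labeled_X, labeled_y, (160, 6)))  # family car
```

Because the features are not rescaled here, horsepower dominates the distance calculation; in practice, features are usually scaled to comparable ranges before applying k-nearest neighbors.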