Machine Learning: An Overview
What is machine learning?
Machine learning (ML) is one part of the larger field of artificial intelligence (A.I.). Artificial intelligence is the theory and development of computer systems that can perform tasks such as decision-making, picture recognition, and other “cognitive” tasks that normally require human intelligence. John McCarthy coined the term artificial intelligence in 1956, but the first work now generally recognized as A.I. dates to 1943: the Turing-complete “artificial neurons” of McCulloch and Pitts.
Machine learning is just one part of artificial intelligence, but it is a big part of what A.I. has become. ML takes data and learns from it, which lets a system pick up new things from the data it is given, whereas A.I. is the broader decision maker that acts on what was learned. One common way to frame the difference is that A.I. tries to increase the chance of success, while ML tries to increase accuracy: A.I. aims for the optimal solution, where ML settles on the solution it has learned from the data, whether or not it is optimal. Machine learning is usually broken down into two main sub-categories: supervised and unsupervised learning. There are others, but these are the main ones, and I will break them down for you separately.
Supervised learning is simply where you give the machine the answers along with the data and let it learn from them (e.g. give it a picture of a dog, tell it that it is a dog, and then give it tons of pictures labeled “dog” until it can correctly identify a dog). Supervised learning is considered the faster of the two, but it can pick up biases caused by humans. Say you give it a ton of pictures of birds. Most people will pick the standard birds you see every day, like finches and cardinals, maybe even some eagles. But what if it then sees a picture of a penguin? It probably would not classify it as a bird, since a penguin does not look like the birds it was shown. Human biases like this can show up in the data that is provided, so try not to be human when working with machine learning. This brings me to the types of things that supervised learning is useful for. The first set of uses I would like to bring attention to is classification. Some of the things that fall under the classification tree are fraud detection, spam filtering, and image classification.
Fraud detection is done by the machine taking in all the data from a user and analyzing it to see if it fits the pattern of normal usage. It monitors the user’s activity and builds a profile of what is normal; if something falls outside the normal range, it flags it as an anomaly that may be a fraudulent purchase. An example would be a gas purchase in some random small town two states away, when you have never purchased anything there before and you have not made the gas purchases along the way that you would normally need in order to get there.
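The “profile of normal” idea can be sketched in a few lines of Python. This is only a toy illustration, not how a real fraud system works: it assumes a single feature (the purchase amount rather than the location from the example above), a made-up purchase history, and a hypothetical helper I am calling `is_anomaly` that flags anything more than three standard deviations away from the user’s average.

```python
from statistics import mean, stdev

def is_anomaly(history, amount, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` standard
    deviations from the mean of the user's past purchases."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past purchase was identical, so any other amount stands out
        return amount != mu
    return abs(amount - mu) / sigma > threshold

# Made-up purchase amounts for one user (assumption for illustration)
history = [42.0, 38.5, 45.0, 40.0, 39.5, 44.0, 41.0]

print(is_anomaly(history, 43.0))   # a typical purchase
print(is_anomaly(history, 400.0))  # far outside the usual range
```

A real system would build the profile from many features at once (location, time of day, merchant, and so on), but the flagging step follows the same idea of measuring how far a new event sits from what is normal.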
Spam filtering monitors your emails for things that are known to show up in spam. Let’s say you have an email that says “you won” or “inheritance” or “lottery”. It would probably get flagged as spam, since many spam emails contain some variant of these words. The filter keeps learning what a spam message is every time you click the “this is spam” button. Clicking the button is a version of supervised learning because you are labeling the data.
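Here is a rough sketch of how labeled messages can train a spam filter. I am using a tiny Naive Bayes classifier with add-one smoothing as the learning method, which is my choice for illustration and not necessarily what any real mail provider runs. The class name `TinySpamFilter`, its `learn` and `predict` methods, and the four training messages are all made up. Calling `learn` plays the role of clicking the “this is spam” button.

```python
import math
from collections import Counter

class TinySpamFilter:
    """Toy Naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Record one labeled message (label is "spam" or "ham")."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def predict(self, text):
        """Return the label with the higher log-probability score."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior: how common this label is overall
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Smoothed log likelihood of seeing this word under the label
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)

# Made-up labeled examples (assumption for illustration)
spam_filter = TinySpamFilter()
spam_filter.learn("you won the lottery claim your prize", "spam")
spam_filter.learn("claim your inheritance now", "spam")
spam_filter.learn("meeting moved to monday morning", "ham")
spam_filter.learn("lunch on friday with the team", "ham")

print(spam_filter.predict("you won an inheritance"))
print(spam_filter.predict("team meeting on monday"))
```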
Image classification is another supervised learning use that falls under classification. Just like before, you label data showing what a bird is, and you include all forms of birds in the labeled set: penguins, eagles, and even dodos. Then, when you ask the machine to classify which pictures are birds and which are not, it is able to. A real-world case of this is those image captchas that ask you to pick which images contain street lights. When you click on the pictures, you are actually labeling pictures for more accurate machine learning… free labor, yay!
Other uses for supervised learning fall into another category: regression. Regression models the statistical relationship between variables and predicts an outcome based on that relationship. Uses that fall under this category include risk assessment and score prediction. Risk assessment applies regression to variables provided to the system, such as when someone applies for a loan. Banks have tons of data on which types of loans were defaulted on, which were paid on time, and the other variables associated with both outcomes. So when someone applies for a loan, the bank can plug the applicant’s information into the machine, and it may ask some weird questions that humans would not think are relevant, like “Does the applicant own a car?”, to build the most accurate risk assessment profile. Maybe the system has learned that an applicant who makes over 100k a year and owns a home is a low-risk loan, but that if the same applicant does not own a car, the risk actually goes up 15%. So, as weird as it seems to us, the machine has learned that owning a car plays a bigger role in determining risk than renting or owning a home. This is a totally made-up scenario, just to illustrate how risk assessment might work with machine learning.
Supervised learning algorithms
There are a great many algorithms that can be used in machine learning. These are some of the most common ones used in supervised learning, in no particular order: linear regression, neural networks, Naive Bayes, decision trees, random forests, and support vector machines (SVMs). I am not going to go into detail on every single one, but just know that each has its own uses and is better suited to different situations. I will cover linear regression because it is used so widely; it is so popular that Microsoft Excel includes it in the software. Simple linear regression is a relationship between two variables where the variable y can be predicted from the input variable x. The most comprehensive algorithms come in the form of neural networks. They are loosely based on how the human brain works: some information is taken in, passed to a “neuron” for processing, and then passed on to another stage to be processed, many times over. When you stack many layers on top of each other in this type of network, you form a Deep Neural Network (DNN). There are other kinds too, like Convolutional Neural Networks. For more information on the different types of algorithms, go ahead and experiment with a framework and use Google.
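To make the linear regression idea concrete, here is a minimal sketch that fits a straight line y = slope * x + intercept using ordinary least squares. The function name `fit_line` and the data points are made up for illustration; the data is chosen to lie exactly on y = 2x + 1 so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that follows y = 2x + 1 exactly
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Predict y for an x the model has not seen
prediction = slope * 6 + intercept
print(prediction)
```

With noisy real-world data the points would not sit exactly on a line, and the fitted line would be the one that minimizes the squared vertical distances to the points.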
Now this brings us to the other type of machine learning. Unsupervised learning is when you do not label the input data; you put no restrictions on the classifications and let the machine sort the data according to similarities and differences. It is like being a kid who groups all animals with four legs into one category and calls them all dogs. Even though some may be horses, they all have four legs, and you group them together because that is all you know: no one has told you that one is a dog and one is a horse. This type of learning may lead to insights that you had not seen in the data before. Say you have a bunch of people who bought hand lotion, and some of those people also bought scented candles. If you were to categorize the customers yourself by what they bought, you might not put them in the same category, but with unsupervised learning you may find out that the people who bought both were actually pregnant, and now you have a new insight into how to market to those customers.
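Letting the machine group unlabeled data by similarity can be sketched with k-means, one widely used grouping algorithm. I am picking it purely as an illustration. The customer data (age, monthly spend), the function name `kmeans`, and the choice of starting centroids are all assumptions. Notice that no labels are passed in anywhere.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: put each point with its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid)
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Made-up customers as (age, monthly spend); there are no labels
customers = [(22, 30), (25, 35), (24, 28), (52, 90), (55, 95), (49, 88)]

# Start from the first two customers as the initial guesses
centroids, clusters = kmeans(customers, centroids=customers[:2])
for cluster in clusters:
    print(cluster)
```

Even though both starting centroids come from the younger customers, the repeated assign-and-update loop pulls one of them over to the older, higher-spending group.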
Dimensionality reduction and clustering are two segments of unsupervised learning, and I can offer some uses for each. Dimensionality reduction shows up in face recognition, text mining, and image recognition. Clustering shows up in biology, planning, and marketing. In marketing, which I gave an example of earlier, it would cluster customers into groups based on age, sex, lifestyle, and situation (e.g. pregnant, married, single). You can also turn the input the other way, sort the purchases, and see who falls into which categories. I am not completely sure, but I would assume criminal profilers would also benefit from this type of learning, by putting all the data of a crime into it and letting it sort out which details are linked together and which types of people would be likely to commit these types of crimes. Now that we have learned about the main types of machine learning, let us look at some other forms of learning and then at a common machine learning implementation.
Other forms of learning
There are also two other forms of learning that stand out: semi-supervised and reinforcement learning. The first is where you label some of the data to give the algorithm a base and let it figure out the rest of the unlabeled data, which is faster than unsupervised learning by itself but can create biases. Reinforcement learning lets the A.I. learn in an environment by making decisions, measuring the outcome, and repeating until the right outcome is achieved. This is commonly used in gaming. The three components of reinforcement learning are the agent (the A.I., which is the learner or decision maker), the environment, and the actions.
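The agent, environment, and actions can be seen working together in a very small sketch. I am using Q-learning here, one well-known reinforcement learning method, as my own choice for illustration. The “game” is a made-up corridor of five cells where the agent starts in cell 0 and is rewarded only when it reaches cell 4. The names `step` and `train` and all the parameter values are assumptions.

```python
import random

N_STATES = 5          # corridor cells 0..4; the goal is cell 4
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right

def step(state, action_index):
    """Environment: move the agent; reward 1.0 only on reaching the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore at random some of the time, and whenever the estimates tie
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward = step(state, action)
            # Q-learning update: nudge the estimate toward
            # reward + discounted best value of the next state
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
# Best learned action in each non-goal cell
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)
```

The agent is never told that “right” is the correct move. It makes decisions, measures the outcome through the reward, and repeats until stepping right looks best from every cell.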
TensorFlow is an open-source software library for machine learning that was created by the Google Brain team. It works well with the Python programming language, as well as others, and runs C++ on the back end. It works really well for implementing neural networks, which I noted earlier as being one of the algorithms. With TensorFlow it is possible to teach ML systems handwritten character recognition, voice recognition, and even language translation. Keras has also been integrated into TensorFlow, which makes the library easier to work with.
Hopefully you now understand a little more about the workings of machine learning and come away with an understanding of how computers have come to understand humans.