Learning Machine Learning and Artificial Intelligence with Blast
Article One - Deep Learning and the History of Artificial Intelligence
Good day everyone! Thanks for joining me on this journey to becoming a Machine Learning Engineer or an Artificial Intelligence Researcher.
This is the second article in the series and as a software engineer I’m sure you know why it’s subtitled article one.
In the previous article, the main highlight was the prerequisites for becoming an ML Engineer or an AI Researcher. If you’re reading this article, it’s because you possess the required foundational knowledge and you’re ready to build on it.
I congratulate you, but before we start building models and changing the world, it’s important we do a quick catch-up on the field of Artificial Intelligence. I believe that to build the future, you need to understand the past, and the goal of this article is to take a deep dive into the history of Artificial Intelligence and get introduced to Deep Learning.
This article will separate us from the average AI enthusiast. By the end of it, you’ll start to look at Machine Learning and Artificial Intelligence not as some miracle tech but as the actual science and engineering they are. Without further ado, hold on to your history helmets and let’s goooo.
Like I said in the previous article, most of the knowledge I’m sharing here comes, amongst other sources, from the book Deep Learning with Python by Francois Chollet. Francois Chollet is the creator of Keras, one of the most widely used Deep Learning frameworks, and is currently a software engineer at Google, so it’s safe to say we’re in great hands.
The hype around Artificial Intelligence in recent years, with the advent of ChatGPT and other models, has been intense, especially in mainstream media. AI is almost everywhere now and everybody knows about it. AGI (Artificial General Intelligence), ASI (Artificial Super Intelligence), AI is going to take all jobs, AI is going to change the world; every week there’s something new for AI enthusiasts to obsess over.
As aspiring Machine Learning Engineers and Artificial Intelligence Researchers, it’s important for us to be able to separate what’s real from what’s just media hype.
Artificial Intelligence was born way back in the 1950s, when some pioneers from the budding field of Computer Science started asking whether computers could be made to think. Artificial Intelligence finally crystallized as a field of research in 1956 thanks to the efforts of John McCarthy. Briefly but comprehensively, Artificial Intelligence can be described as the effort to automate intellectual tasks normally performed by humans.
Artificial Intelligence is a general field that encompasses both Machine Learning and Deep Learning. Other approaches that don’t involve any Learning dominated the field until the late 1980s. These were known as symbolic AI: programmers handcrafted a sufficiently large set of explicit rules for manipulating knowledge stored in explicit databases. This was good enough to solve well-defined logical problems, such as playing chess, but unreliable for more complex problems like image classification or natural language translation.
Machine Learning emerged as the answer to the limitations of the symbolic AI approach.
Machine Learning started to flourish in the 1990s and has quickly become the most popular and most successful subfield of AI, thanks to the availability of faster hardware and larger datasets.
Normally, a human programmer writes rules (a computer program) to turn input data into correct answers. There's an input, the program is the process, and there's an expected output, similar to making a dish from a set of ingredients. The ingredients are the inputs, the recipe is the process and the dish is the output.
Machine Learning changes this approach. In Machine Learning, the machine examines the input data and the corresponding answers, then determines what the rules (the process) should be. It’s like having a set of ingredients and finished dishes, and letting the machine figure out the recipes that turn the ingredients into the dishes.
The point of the Machine Learning approach is that we can then give the machine a new set of ingredients and it can come up with appropriate dishes using its own recipes, Learned from the training set. This differs from classical programming, where you always have to provide the ingredients and the recipe and always expect the same dish.
In classical programming, you tell the computer exactly how to do something; with Machine Learning, you show the computer examples of what you want it to do and it figures out how to do it.
Classical Programming: Input + Process = Output
Machine Learning: Input + Output = Process
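To make the contrast concrete, here’s a minimal sketch in Python. The toy task of labelling numbers as “big” or “small” and the use of scikit-learn are my own choices, purely for illustration; in the classical version we write the recipe ourselves, in the Machine Learning version we hand over ingredients and dishes and let the machine work out the recipe.

```python
# Classical programming: we write the rule (the recipe) by hand.
def classify_by_hand(x):
    return "big" if x >= 5 else "small"

print(classify_by_hand(5.5))  # -> big

# Machine Learning: we supply inputs and expected outputs,
# and a learning algorithm works out the rule for us.
from sklearn.tree import DecisionTreeClassifier

inputs = [[1], [2], [3], [4], [6], [7], [8], [9]]       # input data points
outputs = ["small", "small", "small", "small",
           "big", "big", "big", "big"]                  # expected outputs

model = DecisionTreeClassifier().fit(inputs, outputs)   # the machine Learns the recipe
print(model.predict([[5.5]]))                           # -> ['big'], a rule it figured out itself
```

Notice that we never told the second version where the boundary between “small” and “big” lies; it inferred that from the examples alone.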
You can see how this differs from the symbolic AI approach. With Machine Learning, a system is trained rather than explicitly programmed. It is presented with many examples relevant to a task, e.g. differentiating between crocodiles and alligators, and it finds statistical structure in these examples that eventually allows it to come up with processes for automating the task.
To do Machine Learning we need three things:
Input data points: For our task of differentiating between alligators and crocodiles, the input data points will be pictures of crocodiles and alligators; the more, the better.
Examples of the expected output: These expected outputs could be tags on the pictures, labeling one image as a crocodile and another as an alligator. (Typically, we have two sets of data: the first is the input data, which could be 500 images of alligators and 500 images of crocodiles; the second is the expected-output data, which is the same 500 images of crocodiles and 500 images of alligators, but each one clearly labelled alligator or crocodile.)
A way to measure whether the algorithm is doing a good job: This is necessary in order to determine the distance between the model’s current output and expected output. This measure is used as a feedback signal to adjust the way the algorithm works and this is the step we call Learning.
Short break? Okay let’s go on.
A Machine Learning model transforms its input data into meaningful outputs, a process that is Learned from exposure to known examples of inputs and outputs.
We feed a model inputs of hundreds of alligator and crocodile images, along with outputs labelling each of those images as either alligator or crocodile. The model works out a process to identify both animals. We can then test it on a new image of either an alligator or a crocodile, one the model hasn’t seen before, and it should be able to tell which of the two it is. Sweet!
It all sounds simple, but the central problem in Machine Learning is to meaningfully transform data, that is, to learn useful representations of the input data that get us closer to the expected output.
To explain better, let’s say we want to develop a model that can take the coordinates (x, y) of a point and output whether the point is likely to be black or white.
In this case,
The inputs are the coordinates of the points, e.g. (5, 2) or (1, 9).
The expected outputs are the colors of our points, e.g. coordinate (5, 2) is white and coordinate (1, 9) is black.
A way to measure whether our model is doing a good job could be, for instance, the percentage of points that are correctly classified, i.e. if we give it 10 new points it hasn’t seen before, like (1, 3) or (5, 5), how many will it correctly identify as black or white? Will it be 10%? 50%?
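As a rough sketch of what that feedback signal looks like in code (the points and colors below are made up purely for illustration, and the “model” here just guesses):

```python
import numpy as np

# 1. Input data points: coordinates of points we want to classify.
new_points = np.array([[1, 3], [5, 5], [2, 8], [7, 1], [4, 4],
                       [9, 2], [3, 3], [6, 7], [8, 8], [2, 2]])

# 2. Expected outputs: the true color of each point (0 = black, 1 = white).
true_colors = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])

# 3. A way to measure whether the model is doing a good job:
#    compare the model's guesses with the true colors and report
#    the percentage it got right.
def accuracy(predicted_colors, true_colors):
    return (predicted_colors == true_colors).mean()

# A stand-in "model" that simply guesses white for every point.
guesses = np.ones(10, dtype=int)
print(f"correctly classified: {accuracy(guesses, true_colors):.0%}")  # -> 30%
```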
Now suppose all the points we give the model are crowded into the positive x, positive y quadrant, with the white and black points mixed among each other in that space. Our model will not be very good at determining whether a point is black or white from its raw coordinates alone. This is because it’s difficult for the model to form useful representations of that data: there’s no simple, meaningful way to tell black points from white points, so it ends up close to guessing and isn’t very accurate.
We need a better representation of the data, one that cleanly separates the white points from the black points. One solution would be to change the coordinate system: re-express each point in a new representation, for example its position relative to a dividing line, so that all the white points end up with positive values and all the black points end up with negative values.
With this new representation, it’ll be much easier for our model to understand what makes a point black and what makes it white; a rule as simple as “positive means white, negative means black” will correctly classify every point given just the transformed coordinates.
This gives us a better understanding of how Machine Learning works: a better representation of the data enables us to build a better-functioning Machine Learning model.
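Here’s a minimal sketch of that idea, assuming (purely for this toy example) that the white points happen to be the ones lying above the diagonal y = x; the transformation I use, the signed distance from that diagonal, is just one possible choice of better representation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Raw representation: points scattered across the positive quadrant,
# with white points above the diagonal y = x and black points below it.
points = rng.uniform(0, 10, size=(1000, 2))
is_white = points[:, 1] > points[:, 0]

# New representation: each point's signed distance from the diagonal.
# White points end up with positive values, black points with negative values.
new_coordinate = (points[:, 1] - points[:, 0]) / np.sqrt(2)

# With this representation, a one-line rule classifies every point correctly.
predicted_white = new_coordinate > 0
print((predicted_white == is_white).mean())  # -> 1.0
```

The data hasn’t changed and no clever model is involved; we’ve simply handed the task a representation in which the answer is easy to read off.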
This process of transforming or simplifying the data by hand, so that the model can form useful representations and perform its task more accurately, is called Feature Engineering. You must understand that Machine Learning and Artificial Intelligence are still computer engineering; there’s no magic involved, and your model is only as good as the data you’ve trained it on.
If the input and expected-output data are clean and clear enough for the model to form useful representations, you’ll have a very accurate model that can produce the expected output on data it has never seen before in the real world.
Now, we can’t always do Feature Engineering by hand, and there are far more complex problems than identifying black and white points from coordinates, e.g. image classification or text-to-speech conversion.
So that leads us to Deep Learning! Deep Learning is a specific subfield of Machine Learning. It involves Learning successive layers of increasingly meaningful data representations. Modern Deep Learning often involves tens or even hundreds of successive layers of representations, and they’re all Learned automatically from exposure to training data.
In Deep Learning, these layered representations are Learned via models called neural networks, structured in layers stacked on top of each other. You can think of Deep Learning as a multistage information distillation process, where information goes through successive filters and comes out increasingly purified (that is, useful with regard to some task).
Take, for example, a model that identifies handwritten digits from 0 to 9. When we feed images of digits to a Deep Learning model built from a neural network, each layer extracts a bit more information from the data. Say there are four layers: once the data has passed through all four, the model has built up representations useful enough for it to identify handwritten digits accurately.
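Here’s roughly what such a stack of layers looks like in Keras (the number and size of the layers are arbitrary choices on my part, and the sketch assumes the MNIST handwritten-digit dataset that ships with Keras):

```python
from tensorflow import keras
from tensorflow.keras import layers

# MNIST: 28x28 grayscale images of handwritten digits 0-9.
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28)).astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28)).astype("float32") / 255

# Four successive layers, each producing a more useful representation
# of the input than the one before it.
model = keras.Sequential([
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # one score per digit, 0-9
])

model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(train_images, train_labels, epochs=5, batch_size=128)

# The feedback signal in action: how often does the model get unseen digits right?
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"accuracy on digits the model has never seen: {test_acc:.1%}")
```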
So that’s what Deep Learning is, technically: a multistage way to Learn data representations. It’s a simple idea—but, as it turns out, very simple mechanisms, sufficiently scaled, can end up looking like magic.
Machine Learning is about mapping inputs (such as images) to outputs (such as the label ‘crocodile’), which is done by observing many examples of inputs and outputs. Deep neural networks do this input-to-target mapping via a deep sequence of simple data transformations (layers).
I’ve given you a high-level overview of these concepts; you’d do well to research them further on your own. It’s difficult to distill all the necessary information into this article, and the reference book I’m using is an excellent resource for learning all things Deep Learning and Machine Learning in depth.
Deep Learning, although a fairly old subfield of Machine Learning, only rose to prominence in the early 2010s. Some technological breakthroughs that have been achieved thanks to Deep Learning include:
Near-human-level image classification, speech transcription, handwriting transcription, autonomous driving
Digital assistants such as Google Assistant and Amazon Alexa
Ability to answer natural language questions
In conclusion, I’ve given an introduction to Deep Learning and the history of Artificial Intelligence. There’s still so much more I couldn’t cover in this article, so I urge you to read the reference book and do further research on the topic.
I’ll be ending this article here; in the next article, we’ll be building a linear classifier in pure TensorFlow.