Learning Machine Learning and Artificial Intelligence with Blast
Article six - Models and Architectures

I am a self taught software developer from Nigeria. I enjoy writing technical documents and creating solutions with software.
Hello world!
Thanks for joining me on this journey so far. It’s been beyond rewarding and fulfilling to learn and share my learnings with these articles.
It’s now been about eight months since I wrote the first article, and a whole lot has transpired between then and now that I can only be grateful for. This seventh article is going to be the final one in the series.
From article zero to article five, I introduced Machine Learning, Deep Learning and Artificial Intelligence, and we were able to work on some projects.
If you’ve been following from the start, I congratulate and appreciate you. Even though I still have a lot to learn in the space, I’m definitely better off now than I was eight months ago when I started and by going through the series, I’m confident you’ll be closer to your Machine Learning Engineer/Artificial Intelligence Researcher goals.
In this article, we’re going to be discussing models and architectures in the ML/AI space. I already introduced some models and architectures in previous articles, so we’ll recap the ones I’ve introduced and then I’ll introduce some more, all from our source book, “Deep Learning with Python” by Francois Chollet. The book inspired this series and I’m happy to announce that I’ve now completed the book, which is why I’m able to complete this series as well.
Alright, without further ado, let’s dive into article six — models and architectures.
We’ll start by answering the question, what exactly are models and architectures within the ML/AI space?
Models are simply mathematical functions trained on data to make predictions or decisions. Models map inputs to outputs and “learn” by minimizing a loss function, i.e. reducing the error between their predictions and the true values. Throughout this series we’ve worked with different models: for Binary Classification, for Multiclass Classification and for Regression. These are introductory or beginner models; advanced models would be for Image Processing, Natural Language Processing etc.
Architectures on the other hand are simply the design or blueprint for the models. The architecture defines how the components (layers, nodes, activation functions) are organized and connected.
So the architecture is the blueprint and the model is the trained instantiation of that blueprint. The architecture tells you how, while the model tells you what. While I’m trying to establish models and architectures as two different entities for better understanding, you should know that they always go hand in hand: every model, from simple to advanced, has its architecture, and the point of an architecture is to create powerful, useful models.
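To make the blueprint vs trained model distinction concrete, here is a minimal sketch in plain Python. It is not from the book: the tiny one-neuron "architecture" (`predict`), the toy data and the learning rate are all assumptions I picked for illustration. The function's form is the architecture, and the values of `w` and `b` found by minimizing the loss are the model.

```python
# The "architecture": a fixed functional form with two unknown parameters.
def predict(x, w, b):
    return w * x + b


def mse(xs, ys, w, b):
    """Loss function: mean squared error between predictions and true values."""
    return sum((predict(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, epochs=2000, lr=0.05):
    """Plain gradient descent: nudge w and b in the direction that lowers the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (predict(x, w, b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(x, w, b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# The "model": the same architecture, now with learned parameter values.
w, b = train(xs, ys)
final_loss = mse(xs, ys, w, b)
```

The Keras models in the rest of this article follow the same idea, just with many more parameters and with the gradient descent handled by the optimizer.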
Now we’re clear on that, let’s recap on some of the architectures and models already introduced in previous articles in this series. We’ll go from simple to more advanced architectures and models from our source book and my goal is that by the end of this article you’ll be much more familiar with various architectures and models and you’ll be able to further your learning easily and grow as a Machine Learning Engineer or Artificial Intelligence Researcher.
Are you ready? Let’s gooo
- Model and architecture for Binary Classification:
We treated this in article three. Binary Classification refers to a Machine Learning task where the machine learns to categorize input data into two exclusive categories. The specific problem we tackled involved classifying movie reviews as either positive or negative. Some other applications of a Binary Classification model would be to classify emails as spam or not spam, or to predict whether a patient has a disease or not in the medical field.
The architecture we used for the Binary Classification model is a simple feedforward neural network (Multi-Layer Perceptron). It consists of an input layer, two fully connected (Dense) layers with 16 neurons each and ReLU activation function, followed by an output layer with one neuron and sigmoid activation function which outputs a probability between 0 and 1 to make a binary decision (close to 0 is negative, close to 1 is positive). In addition we used a binary cross-entropy loss function, an rmsprop optimizer and we tracked accuracy. We initially trained for 20 epochs with a batch size of 512, then revised to 4 epochs to avoid overfitting.
The eventual Binary Classification model resulting from training on IMDb dataset using this architecture was able to achieve an accuracy of 88%, effectively learning to distinguish between positive and negative movie reviews.
# Code Summary
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
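To see what the sigmoid output neuron and the binary cross-entropy loss are doing under the hood, here is a small plain-Python sketch. It is an illustration, not Keras' implementation; the raw scores (2.0 and -2.0) and the `eps` clipping value are assumptions I chose for the example.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for one example: small when p agrees with the 0/1 label, large otherwise."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# Hypothetical raw outputs of the last Dense layer before the sigmoid.
p_pos = sigmoid(2.0)   # leans towards "positive review"
p_neg = sigmoid(-2.0)  # leans towards "negative review"

# A confident, correct prediction is cheap; the same prediction with the
# opposite label is penalized heavily.
loss_if_label_positive = binary_cross_entropy(1, p_pos)
loss_if_label_negative = binary_cross_entropy(0, p_pos)

# Turning the probability into a decision with a 0.5 threshold.
predicted_label = 1 if p_pos >= 0.5 else 0
```

Training simply pushes the weights so that this loss, averaged over all the reviews in a batch, goes down.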
- Model and architecture for Multiclass Classification:
We treated this in article four. Multiclass Classification builds on Binary Classification for when you want to categorize input data into more than two categories. The specific problem we tackled involved classifying Reuters newswires into categories like sports, technology, politics etc. Other use cases of Multiclass Classification include classifying movies/songs/books into genres, classifying types of flowers and even recognizing handwritten digits (0-9) from images.
The architecture for Multiclass Classification is similar to that of Binary Classification, but the difference lies in the number of neurons. In the Multiclass Classification problem we tackled, we had 46 categories, so we used 64 neurons with ReLU activation function for the Dense layers, and 46 neurons with softmax activation function for the output layer. We used categorical_crossentropy for the loss function and rmsprop for the optimizer. We also tracked accuracy for this model’s architecture. We trained for 20 epochs initially, then revised it to 9 epochs before evaluating the model on the test data set.
The model was able to achieve 80% accuracy. For each newswire it outputs 46 probabilities, one per category, and the index of the highest probability (0 to 45) is the predicted category. Note that because the output size is large (46 categories) we could have gone for more neurons in our Dense layers, e.g. 124 neurons, which could perhaps have given us a higher accuracy.
# Code Summary
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(46, activation="softmax")
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
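Here is a rough sketch of how a softmax output layer turns raw scores into a predicted category, and how categorical cross-entropy scores that prediction. To keep it readable I assume only 4 categories instead of 46, and the raw scores are made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    """Loss for one example given a one-hot encoded label."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


# Hypothetical raw outputs of the last Dense layer for one newswire.
logits = [1.0, 3.0, 0.5, 0.2]
probs = softmax(logits)

# The predicted category is the index with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)

# The loss is low when the true category matches the prediction, high otherwise.
loss_correct = categorical_cross_entropy([0, 1, 0, 0], probs)
loss_wrong = categorical_cross_entropy([0, 0, 1, 0], probs)
```

With 46 categories the idea is identical: 46 probabilities come out, and the largest one wins.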
- Model and architecture for Regression:
It starts to get more technical here, but we’re still in beginner territory. We tackled this problem in article five. Regression is one of the core Machine Learning problems alongside Classification. Regression involves predicting a continuous value, as opposed to the discrete classes which Classification handles. The goal is to output a number that’s as close as possible to the true value.
The specific problem we tackled involved predicting the median price of homes in Boston suburbs during the mid-1970s using data such as crime rate, average number of rooms, accessibility to highways and more. Other use cases for Regression models include forecasting weather conditions, estimating sales numbers, and determining the duration of projects or events.
Again, the architecture for a Regression model is quite similar to that of a Classification model. We have the input layer and two Dense layers with 64 neurons and ReLU activation function. The difference lies in the output layer, where we have a single neuron with no activation function, which is what you want when predicting a single continuous value.
For the loss function we used MSE (mean squared error) which is standard for regression tasks, for the optimizer, you guessed it, rmsprop, and for the metric we tracked MAE (mean absolute error) which gives us an interpretable value of how far off our predictions are on average.
We trained the model initially for 500 epochs, then for 130 epochs to avoid overfitting, and we achieved an MAE of 2.2, meaning the model’s predictions were off by about $2,200 on average when predicting home prices.
# Code Summary
from tensorflow.keras import layers
from tensorflow import keras
def build_model():
    model = keras.Sequential([
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(1)  # No activation for linear output
    ])
    model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
    return model
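If you're wondering how MSE and MAE differ in practice, this small sketch computes both on a handful of made-up house prices (in thousands of dollars, like the Boston dataset). The numbers are invented for the example and are not results from the article.

```python
def mse(y_true, y_pred):
    """Mean squared error: punishes large misses much more than small ones."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: the average miss, in the same unit as the target."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted median prices, in thousands of dollars.
y_true = [24.0, 21.5, 34.5, 33.0]
y_pred = [26.0, 20.5, 31.5, 35.0]

loss = mse(y_true, y_pred)
metric = mae(y_true, y_pred)

# MAE is in thousands of dollars, so multiply by 1000 to read it as dollars.
average_miss_in_dollars = metric * 1000
```

This is why we train on MSE but report MAE: the first is convenient for optimization, the second is easy to read as "we're off by this many dollars on average".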
- Model and architecture for Image Processing:
Now we’re getting advanced! Image Processing in ML/AI involves training models to learn patterns in images. Some of the problems Image Processing solves include Classification, i.e. teaching the model to identify categories, e.g. dogs vs cats or biking vs running. Another problem is Segmentation, i.e. teaching the model to separate an image into different areas, e.g. separating a human from the background in an image. The final problem is Object Detection, i.e. teaching the model to draw bounding boxes around objects of interest in an image, e.g. how a self driving car would identify cars, pedestrians and signs in view of its camera.

The architecture for Image Processing is more complex than the feedforward neural networks we used for Binary Classification, Multiclass Classification, and Regression. The architecture for Image Processing typically involves Convolutional Neural Networks (CNNs), which are specifically designed to handle image data by exploiting its spatial structure.
In this article, we’ll be looking at the architecture specifically for image classification, other architectures with regards to Image Processing can be found in the source book.
The CNN architecture for Image Processing typically consists of an Input Layer which takes raw image data, some Convolutional Layers which apply convolution operations to extract features like edges, corners, or textures. Each layer uses filters (small matrices) that slide over the image to produce feature maps.
In the cats vs dogs example, Chollet uses multiple Convolutional Layers with ReLU activation to capture increasingly complex patterns, he also uses Pooling Layers which reduce spatial dimensions (e.g height and width) while preserving important features, making the model computationally efficient and less prone to overfitting. MaxPooling is commonly used where the maximum value in a region of the feature map is retained.
Fully Connected (Dense) Layers are used after several Convolutional and Pooling Layers. The feature maps are flattened into a 1D vector and passed through Dense Layers for Classification. In the cats vs dogs example, the final Layer has a single neuron with a sigmoid activation function for Binary Classification (i.e cats vs dogs). Dropout Layers are also added to prevent overfitting by randomly deactivating a fraction of neurons during training.
In the book, Chollet trains the CNN on a dataset of 2,000 images (1,000 cats, 1,000 dogs) using a binary cross-entropy loss function, as it’s a Binary Classification task, and an rmsprop optimizer, while tracking the accuracy metric to evaluate performance. The model is trained for 30–100 epochs, with early stopping to prevent overfitting and a batch size of 32 or 64, balancing memory usage and training stability. Trained from scratch on so few images, the CNN for the cats vs. dogs task reaches roughly 70% accuracy on the test set without data augmentation. With data augmentation, accuracy rises to around 80–85%, and with a pre-trained model like VGG16 (feature extraction and fine-tuning) it can go well above 90%.
# Code Summary
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
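To get a feel for what a Convolutional Layer and a MaxPooling Layer actually compute, here is a plain-Python sketch on a tiny 6x6 "image". The image, the vertical edge filter and the helper names are all assumptions for illustration (a real Conv2D layer learns its filters and handles many channels at once). The last helper also traces how the 150x150 input in the summary above shrinks through the three Conv2D + MaxPooling2D blocks.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    """Keep positive responses, zero out negative ones."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size region."""
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


# A 6x6 image that is dark (0) on the left and bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# A hand-written filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = relu(conv2d_valid(image, kernel))  # 4x4, strongest where the edge is
pooled = max_pool2d(feature_map)                 # 2x2, half the height and width


def spatial_size_after_blocks(size, blocks=3, kernel_size=3, pool=2):
    """Height/width after repeated (3x3 conv without padding, 2x2 max pool) blocks."""
    for _ in range(blocks):
        size = (size - kernel_size + 1) // pool
    return size


final_size = spatial_size_after_blocks(150)
flattened = final_size * final_size * 128  # inputs reaching the first Dense layer
```

The shape trace shows why the Flatten layer matters: everything the Convolutional Layers have extracted ends up as one long vector feeding the Dense layers.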
- Model and architecture for Timeseries:
Moving on to Timeseries, another classic Machine Learning problem. It’s quite similar to Regression in the sense that both are used to predict continuous values, i.e. real numbers on a scale, e.g. $12,000.75 or 28.1°C. The difference is that Regression deals with independent data points, e.g. predicting house prices based on size, location, and number of rooms, or predicting a person’s weight given their height and age, while Timeseries deals with dependent data points collected periodically, e.g. weather data (temperature each hour) or sales data (monthly revenue). Models for Timeseries can be used for predicting stock prices, weather or sensor readings.
The architecture for Timeseries often involves Recurrent Neural Networks (RNNs) or their variants, like Long Short-Term Memory (LSTM) networks, which are designed to handle sequential data by maintaining a "memory" of previous inputs.
In Deep Learning with Python, Chollet demonstrates Timeseries prediction using an LSTM model for forecasting temperature based on weather data.
The Architecture includes an Input Layer which takes sequences of data (e.g temperature, humidity over time). Each sequence is a 2D array of shape (timesteps, features), where timesteps is the number of time points, and features is the number of variables (e.g temperature, pressure). Next are LSTM layers which process sequential data, capturing temporal dependencies. Each LSTM unit has a memory cell to retain information over long periods, addressing the vanishing gradient problem of basic RNNs.
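Following Chollet’s approach of a single recurrent layer, the summary below uses one LSTM Layer with 32 units and ReLU activation (note that Keras LSTM layers default to tanh when no activation is given). Next, a fully connected Dense Layer maps the processed sequence to the output. For Timeseries, a single Neuron with no activation function outputs a continuous value.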
For the loss function we use Mean Squared Error (MSE), which is standard for Regression tasks, along with an rmsprop optimizer, and we track the Mean Absolute Error (MAE) metric just like with the Regression model. We typically train for 20–50 epochs, with a batch size of 16–32 and early stopping to prevent overfitting.
In Chollet’s example, the LSTM model predicts temperature with an MAE of around 2.5°C after training on a weather dataset.
# Code Summary
from tensorflow import keras
from tensorflow.keras import layers
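timesteps = 120  # placeholder value: number of time points in each input sequence
features = 14  # placeholder value: number of variables recorded at each time point
model = keras.Sequential([
    layers.LSTM(32, activation="relu", input_shape=(timesteps, features)),
    layers.Dense(1)  # No activation for linear output
])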
model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
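Before an LSTM can train on a Timeseries, the raw readings have to be cut into sequences of shape (timesteps, features) with a target value for each. Here is a minimal sketch of that windowing step in plain Python. The helper name `make_windows`, the `horizon` parameter and the temperature readings are my own assumptions for illustration, not the book's code.

```python
def make_windows(series, timesteps, horizon=1):
    """Cut a 1-variable series into (samples, targets).

    Each sample holds `timesteps` consecutive readings, shaped (timesteps, 1),
    and its target is the reading `horizon` steps after the window ends.
    """
    samples, targets = [], []
    for start in range(len(series) - timesteps - horizon + 1):
        window = series[start:start + timesteps]
        samples.append([[value] for value in window])  # one feature per timestep
        targets.append(series[start + timesteps + horizon - 1])
    return samples, targets


# Hypothetical hourly temperature readings in °C.
temperatures = [20.0, 21.0, 23.0, 22.0, 19.0, 18.0]

X, y = make_windows(temperatures, timesteps=3)
# X[0] is the first three readings, and y[0] is the reading that follows them.
```

Notice how neighbouring samples overlap. That overlap is exactly the "dependent data points" property that separates Timeseries from plain Regression.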
- Model and architecture for Natural Language Processing:
I hope you’re enjoying this article and learning about these models and their architectures. Of course I can’t exhaust all the different configurations or different ways you can architect your model to solve your particular problem but I hope I’ve given you enough of a starting point where you can go on and develop well performing models. I truly believe reading all the articles in this series and having the reference book by your side and making some further research on your own, you’ll be well on your way to becoming at least a junior Machine Learning Engineer or AI researcher which was the goal when we started this series. I’m beyond grateful I’ve been able to be a part of your journey and I’m grateful for my growth in the space as well.
Alright, on to the next problem that has been solved by Machine Learning and Deep Learning: Natural Language Processing. This involves teaching machines tasks like question answering, text generation, or translation, basically training a model to understand and generate human language, which is unstructured and context-dependent.
NLP architectures are designed to handle sequential and contextual data, such as sentences or paragraphs. Unlike the feedforward Neural Networks used for Binary Classification, Multiclass Classification, or Regression, NLP architectures often involve specialized Layers to process text.
Here’s a breakdown of the key components:
Embedding Layer: Text data consists of words or tokens, which machines don’t naturally understand. The Embedding Layer converts each word into a dense vector (e.g., a 100-dimensional vector) that captures its semantic meaning. For example, the words "king" and "queen" might have similar vectors because they share related meanings. Chollet uses embeddings to represent words in the IMDb sentiment analysis task.
Recurrent Layers (LSTM/GRU): For tasks where word order matters, Recurrent Neural Networks (RNNs) like Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) Layers process sequences of words one at a time, maintaining a "memory" of previous words to capture context. These are effective for moderate-sized datasets and tasks like sentiment analysis.
Transformer Layers: For advanced NLP tasks, Transformers have become the gold standard. Unlike RNNs, Transformers use a mechanism called “self-attention” to process all words in a sequence simultaneously, capturing long-range dependencies (e.g understanding that "it" in a sentence refers to something mentioned much earlier). Transformers come in three flavors:
Encoder-only (e.g BERT): Ideal for understanding tasks like text classification or sentiment analysis.
Decoder-only (e.g GPT): Suited for generative tasks like text generation or chatbots.
Encoder-decoder (e.g T5, BART): Best for tasks like translation or summarization, where input and output sequences are involved.
Output Layer: The Output Layer depends on the task. For classification (e.g. sentiment analysis), a Dense Layer with softmax activation outputs probabilities for each class; for a two-class task like positive vs negative, a single Neuron with sigmoid activation does the same job. For regression tasks (e.g. predicting a score), a single Neuron with no activation is used. For generative tasks, the output might involve predicting the next word in a sequence.
Loss Function: For classification tasks, categorical cross-entropy is common (or binary cross-entropy for binary tasks). For generative tasks like translation, custom loss functions or cross-entropy over token probabilities are used.
Optimizer: Adam or rmsprop are standard, with AdamW (a variant of Adam) being popular for Transformer-based models.
Evaluation Metrics: Accuracy is used for classification tasks, while metrics like BLEU (for translation) or F1 score (for tasks like named entity recognition) are used for more complex NLP tasks.
In Chollet’s example, a simple NLP model for IMDb sentiment analysis uses an embedding layer followed by an LSTM layer and a Dense output layer. For advanced tasks, he introduces Transformer-based architectures, which are more powerful and scalable for large datasets.
The trained NLP model processes text by first converting words into embeddings, then modeling their relationships using Recurrent Layers (e.g., LSTM) or Transformer layers. For example, in sentiment analysis, the model learns to predict whether a movie review is positive or negative by analyzing word patterns and their context. A model trained on the IMDb dataset with an Embedding Layer and LSTM can achieve around 85–88% accuracy, similar to our Binary Classification model from article three. However, with Transformer-based models like BERT, fine-tuned on the same dataset, accuracy can exceed 90%.
For generative tasks, a Transformer-based model like GPT can generate coherent text by predicting the next word in a sequence, while encoder-decoder models like T5 can translate text or summarize documents. These models are often pretrained on massive datasets (e.g., Wikipedia, internet text) and fine-tuned for specific tasks, making them highly effective.
# Code Summary
from tensorflow import keras
from tensorflow.keras import layers
# Define vocabulary size and maximum sequence length
vocab_size = 10000 # Consider the top 10,000 most common words
max_len = 100 # Limit each review to 100 words
model = keras.Sequential([
    # Embedding layer: 10,000 words to 128-dimensional vectors
    # (input_length is deprecated in recent Keras versions, so it is left out;
    # max_len is still used when padding the input sequences)
    layers.Embedding(vocab_size, 128),
    layers.LSTM(64),  # LSTM layer with 64 units to capture sequential dependencies
    layers.Dense(1, activation="sigmoid")  # Output layer for binary classification
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
# Note: Before training, preprocess the text data (e.g. with layers.TextVectorization, or
# keras.preprocessing.text.Tokenizer in older Keras versions) to convert words to indices
# and pad sequences to a fixed length (max_len).
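To show what "convert words to indices and pad sequences" means, here is a hand-rolled sketch in plain Python. It is a simplified stand-in for what the Keras preprocessing utilities do, not their actual implementation. The helper names, the two toy reviews, and the choice of index 0 for padding and 1 for unknown words are assumptions for illustration.

```python
from collections import Counter


def build_vocab(texts, vocab_size):
    """Map the most common words to integer ids, reserving 0 (padding) and 1 (unknown)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = [word for word, _ in counts.most_common(vocab_size - 2)]
    return {word: index + 2 for index, word in enumerate(most_common)}


def encode(text, vocab, max_len):
    """Turn a sentence into exactly max_len ids: truncate if too long, pad with 0s if short."""
    ids = [vocab.get(word, 1) for word in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))


# Two made-up reviews standing in for the training set.
reviews = ["a great great movie", "a boring movie"]
vocab = build_vocab(reviews, vocab_size=10)

# "unknown" and "film" never appeared in training, so they map to id 1.
encoded = encode("a great unknown film", vocab, max_len=6)
```

The resulting list of integers is what the Embedding Layer receives: each id is looked up and replaced with its dense vector.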
# Code Summary two
from tensorflow import keras
from tensorflow.keras import layers
from transformers import TFAutoModel, AutoTokenizer
# Load a pretrained Transformer model (e.g BERT)
model_name = "bert-base-uncased"
transformer_model = TFAutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
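# Define a simple Keras model with a Transformer (reuses max_len from the first summary).
# Note: this wrapping pattern assumes the Keras 2 style tf.keras API that the
# TensorFlow classes in transformers are built on.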
inputs = keras.Input(shape=(max_len,), dtype="int32") # Tokenized input
transformer_output = transformer_model(inputs)[0] # Get the last hidden state
pooled_output = layers.GlobalAveragePooling1D()(transformer_output) # Pool the sequence
output = layers.Dense(1, activation="sigmoid")(pooled_output) # Binary classification
model = keras.Model(inputs, output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
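The self-attention mechanism inside Transformer Layers can feel abstract, so here is a stripped-down sketch of scaled dot-product attention in plain Python. One big simplification to flag: real Transformers first pass each token through learned query, key and value projections, while this sketch assumes the raw token vectors play all three roles. The three 2-dimensional "token vectors" are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: every token looks at every token at once."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # How strongly does this token match each token in the sequence?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # New representation: a weighted average of all value vectors.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return outputs, weights


# Three hypothetical 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Because every token is compared with every other token in one go, nothing has to be carried step by step through the sequence the way an LSTM does, which is where the ability to capture long-range dependencies comes from.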
- Model and architecture for Generative Deep Learning:
This is the final model and architecture we’ll be treating. Generative Deep Learning involves training machines on existing data to create new data such as generating realistic faces, writing coherent stories, generating scientific formulas or music. Generative Deep Learning pushes Machine Learning into creative domains.
The challenge lies in training models to produce data that is both realistic and diverse, capturing the underlying distribution of the training dataset without merely memorizing it.
Generative models use specialized architectures to create new data. The two most prominent architectures are Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs).
Variational Autoencoders (VAEs)
Encoder: Compresses input data (e.g an image) into a latent space, represented by a mean and variance, creating a probabilistic distribution.
Decoder: Reconstructs data from samples drawn from this latent space, generating new data similar to the input.
Architecture: Typically uses Dense or Convolutional Layers for both encoder and decoder. The latent space is regularized to ensure smooth interpolation and meaningful sampling.
Loss Function: Combines reconstruction loss (e.g mean squared error to ensure the output resembles the input) with KL-divergence (to regularize the latent space, encouraging a normal distribution).
Use Case: VAEs are great for tasks like image generation or latent space exploration, where you want to interpolate between data points (e.g morphing one digit into another).
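Two pieces of the VAE are easier to grasp with numbers: how a latent vector is sampled from the mean and (log) variance, and how the KL-divergence term is computed. Here is a minimal sketch in plain Python, using the standard closed-form KL between a diagonal Gaussian and a standard normal. The example means and log-variances are made up, and the helper names are my own.

```python
import math
import random


def sample_latent(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mean + std * epsilon, with epsilon ~ N(0, 1)."""
    return [m + math.exp(0.5 * log_var) * rng.gauss(0.0, 1.0)
            for m, log_var in zip(z_mean, z_log_var)]


def kl_divergence(z_mean, z_log_var):
    """KL divergence between N(mean, exp(log_var)) and the standard normal N(0, 1)."""
    return -0.5 * sum(1 + log_var - m ** 2 - math.exp(log_var)
                      for m, log_var in zip(z_mean, z_log_var))


rng = random.Random(0)  # seeded so the example is repeatable

# Hypothetical encoder outputs for one image, with a 2-dimensional latent space.
z = sample_latent([0.5, -1.0], [0.0, -2.0], rng)

# The penalty is zero when the encoder outputs exactly a standard normal...
kl_standard = kl_divergence([0.0, 0.0], [0.0, 0.0])
# ...and grows as the mean drifts away from 0 (or the variance away from 1).
kl_shifted = kl_divergence([1.0, 0.0], [0.0, 0.0])
```

This penalty is what keeps the latent space smooth: every image is nudged to occupy a region near the origin, so points in between two images still decode to something sensible.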
Generative Adversarial Networks (GANs)
Generator: Takes random noise as input and generates fake data (e.g an image).
Discriminator: Classifies data as real (from the training set) or fake (from the generator).
Architecture: Both generator and discriminator are typically convolutional neural networks (CNNs) for image tasks or Dense networks for other data types. They are trained adversarially in a min-max game, where the generator improves by trying to “fool” the discriminator, and the discriminator improves by better distinguishing real from fake.
Loss Function: Adversarial loss, balancing the generator’s goal (producing realistic data) and the discriminator’s goal (accurate classification). Binary cross-entropy is commonly used.
Use Case: GANs excel at generating high-quality images, such as realistic faces or artwork, but require careful tuning to avoid issues like mode collapse (where the generator produces limited varieties of outputs).
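The adversarial game is easier to follow with the two losses written out. This is a plain-Python sketch using binary cross-entropy, where `d_real` and `d_fake` stand for the discriminator's predicted probability that a real or a generated sample is real. The generator loss shown is the commonly used "non-saturating" form (the generator wants its fakes labelled as real). The function names and the probabilities are assumptions for illustration.

```python
import math


def bce(label, p, eps=1e-7):
    """Binary cross-entropy for a single prediction p against a 0/1 label."""
    p = min(max(p, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants real samples scored 1 and generated samples scored 0."""
    return bce(1.0, d_real) + bce(0.0, d_fake)


def generator_loss(d_fake):
    """The generator wants the discriminator to score its samples as real (1)."""
    return bce(1.0, d_fake)


# A discriminator that sees through the fakes: low loss for it, high for the generator.
sharp_d = discriminator_loss(d_real=0.9, d_fake=0.1)
struggling_g = generator_loss(d_fake=0.1)

# A fooled discriminator: its loss goes up while the generator's goes down.
fooled_d = discriminator_loss(d_real=0.5, d_fake=0.5)
improving_g = generator_loss(d_fake=0.9)
```

Whatever lowers one network's loss tends to raise the other's, which is why GAN training is a balancing act and why it needs careful tuning.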
Both GANs and VAEs use:
Optimizer: Adam or RMSprop are standard for both VAEs and GANs due to their stability in training deep networks.
Evaluation Metrics: Generative models are tricky to evaluate. Common metrics include Inception Score (for image quality and diversity) and Fréchet Inception Distance (FID, measuring similarity to real data). For text, metrics like perplexity or human evaluation are used.
In Chollet’s book (Chapter 12), he focuses on VAEs and GANs for generating MNIST digits and explores text generation with LSTMs. He also covers creative applications like DeepDream (amplifying patterns in images) and neural style transfer (applying the style of one image to another).
The trained generative model produces new data by sampling from a latent space (VAEs) or random noise (GANs). For example, a VAE trained on MNIST digits can generate new digit-like images by sampling from its latent space, while a GAN can produce realistic faces by transforming random noise.
In Chollet’s MNIST VAE example, the model reconstructs digits with reasonable quality, allowing interpolation between digits (e.g smoothly transitioning from a “3” to an “8”). A GAN trained on the same dataset can generate sharper, more realistic digits but may suffer from training instability. For text generation, an LSTM-based model can produce coherent sequences, though it may lack the sophistication of modern Transformer-based models like GPT.
Generative models are powerful but challenging to train. VAEs tend to produce blurrier outputs but are more stable, while GANs can generate sharper results but require tricks like label smoothing or feature matching to avoid issues like mode collapse. In practice, Chollet trains these models for 20–100 epochs (text generation models may need longer training, 50–200 epochs depending on the dataset size and complexity) with batch sizes of 32–128, often using GPUs to handle the computational load.
Performance is typically evaluated qualitatively (e.g., visual inspection of generated images) or with metrics like FID.
# Code Summary (MNIST VAE example)
from tensorflow import keras
from tensorflow.keras import layers
# Define input shape (MNIST images: 28x28 pixels, 1 channel)
input_shape = (28, 28, 1)
latent_dim = 2 # Latent space dimension for simplicity
# Encoder
inputs = keras.Input(shape=input_shape)
x = layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(inputs)
x = layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
# Sampling function to create latent vectors
def sampling(args):
    z_mean, z_log_var = args
    epsilon = keras.backend.random_normal(shape=(keras.backend.shape(z_mean)[0], latent_dim))
    return z_mean + keras.backend.exp(0.5 * z_log_var) * epsilon
z = layers.Lambda(sampling)([z_mean, z_log_var])
# Decoder
decoder_inputs = layers.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 64, activation="relu")(decoder_inputs)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
outputs = layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")(x)
# Define models
encoder = keras.Model(inputs, [z_mean, z_log_var, z], name="encoder")
decoder = keras.Model(decoder_inputs, outputs, name="decoder")
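vae_outputs = decoder(z)  # reconstruction produced from the sampled latent vector
vae = keras.Model(inputs, vae_outputs, name="vae")
# Loss: reconstruction loss + KL-divergence
# (add_loss on symbolic tensors is the older tf.keras 2.x pattern; newer Keras
# versions implement this loss in a custom train_step instead)
reconstruction_loss = keras.losses.binary_crossentropy(inputs, vae_outputs)  # per pixel, shape (batch, 28, 28)
reconstruction_loss = keras.backend.sum(reconstruction_loss, axis=(1, 2))  # sum over the 28 * 28 pixels
kl_loss = 1 + z_log_var - keras.backend.square(z_mean) - keras.backend.exp(z_log_var)
kl_loss = keras.backend.sum(kl_loss, axis=-1) * -0.5
vae_loss = keras.backend.mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)
vae.compile(optimizer="adam")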
And there you have it! Congratulations on making it to the end of this article.
This is the final article in our journey through Machine Learning, Deep Learning, and Artificial Intelligence! From article zero to this seventh piece, we’ve explored the fundamentals of ML/AI and built models for Binary Classification, Multiclass Classification, Regression, Image Processing, Timeseries, Natural Language Processing, and Generative Deep Learning. Each step has been a building block, transforming complex concepts into practical, hands-on projects inspired by “Deep Learning with Python” by Francois Chollet.
I’m incredibly grateful for your companionship on this journey. Whether you’ve been with me since the first article or joined along the way, our curiosity and dedication have made this series meaningful. Over these articles, we’ve gone from simple feedforward neural networks to advanced architectures like CNNs, LSTMs, and Transformers, and even ventured into the creative world of generative models. My goal was to provide a clear path for aspiring Machine Learning Engineers and AI researchers, and I hope this series has equipped you with the knowledge and confidence to take your next steps.
As you move forward, keep experimenting with the models and architectures we’ve covered. Dive deeper into “Deep Learning with Python” for detailed explanations and additional techniques. Explore libraries like TensorFlow, Keras, or Hugging Face’s transformers for cutting-edge tools, and practice on datasets from Kaggle or TensorFlow Datasets to build your portfolio. The ML/AI field is vast and ever evolving, but with the foundation you’ve built here, you’re well on your way to becoming a junior Machine Learning Engineer or AI researcher.
What’s next for me? I intended to create a group chat for ML/AI enthusiasts to connect and explore the field together. I already have some great individuals I know personally that can join, but I’m not totally convinced I should do it yet. I’ll leave the option open and if there’s a demand for it then I’ll create it.
Of course I’ll keep learning. I have more books to learn from next (Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig) and I intend to explore Kaggle competitions as well.
Thank you for joining me on this adventure. Keep learning, stay curious, and let’s continue pushing the boundaries of what Machines can do. Until our paths cross again, happy coding!
Hit me up on any of the platforms I’ll be sharing this article on or leave a comment and I’ll reply as soon as I can, till we meet, sayonara!



