3.3 Neural Networks and Deep Learning
3.3.1 What is a Neural Network?
Definition & Origin: A neural network (also artificial neural network, ANN) is a computational model inspired by the brain’s neural structure. It consists of simple processing units (neurons) arranged in layers with weighted connections. Each neuron computes a weighted sum of its inputs plus a bias and applies a non‐linear activation function to produce an output. For example, a neuron’s pre-activation is
z = ∑_i w_i x_i + b
and its output is y = f(z), where f() might be a sigmoid, ReLU, or other nonlinearity. Modern neural nets can have millions of such units and hundreds of layers, and when they contain at least two hidden layers they are often called deep neural networks.
Early work on neural networks dates to McCulloch and Pitts (1943), who described simple binary “artificial neurons.” Their ideas, together with Hebb’s learning rule from the late 1940s, laid the foundation. After decades of fits and starts, neural networks returned to prominence in the 1980s with the popularization of backpropagation, and again in the 2010s as affordable GPUs made very large models practical to train.
Mathematical Formulation
Neuron Model: Formally, a neuron outputs
y = f(z), where z = w^T x + b
x = input vector (x_1, ..., x_n)
w = weight vector
b = bias term
f = activation function
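The neuron model above can be sketched in a few lines of plain Python. This is a minimal illustration, not a framework implementation: the function name `neuron`, the choice of ReLU, and the numeric values are assumptions made for the example.

```python
def relu(z):
    """ReLU activation: zero for negative inputs, identity otherwise."""
    return max(0.0, z)

def neuron(x, w, b, f):
    """Return y = f(z) with z = w^T x + b, for plain Python lists x and w."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b  # pre-activation
    return f(z)

# Illustrative values (assumptions, not taken from the text)
x = [1.0, 2.0, 3.0]      # input vector
w = [0.5, -0.25, 0.25]   # weight vector
b = 0.1                  # bias term

y = neuron(x, w, b, relu)
print(y)
```

Passing the activation `f` as an argument keeps the weighted sum separate from the nonlinearity, which mirrors the split between z and f(z) in the formula.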
Common activations include:
-
Sigmoid (logistic): f(z) = 1 / (1 + e^{-z}), squashing outputs to (0, 1).
-
Tanh: f(z) = tanh(z), output in (-1, 1).
-
ReLU: f(z) = max(0, z), zero for negative inputs, linear otherwise.
-
Softmax: Used in multi-class output layers to produce probabilities: for a vector z, f(z)_i = e^{z_i} / ∑_j e^{z_j}.
Each connection weight w_i scales its input and is learned during training. The activation function introduces nonlinearity, enabling the network to approximate complex functions.
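The activation functions listed above could be written as follows. This is a sketch using only the standard library; the branch inside `sigmoid` and the max-subtraction inside `softmax` are common numerical-stability tricks added here as assumptions, and they do not change the mathematical result.

```python
import math

def sigmoid(z):
    """Logistic function, written so math.exp never receives a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Zero for negative inputs, linear otherwise."""
    return max(0.0, z)

def softmax(zs):
    """Map a list of scores to probabilities that sum to 1."""
    m = max(zs)                           # shift by the max to avoid overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # illustrative scores
print(probs, sum(probs))
```

Larger scores receive larger probabilities, and the outputs always sum to 1 (up to floating-point rounding), which is why softmax is a natural fit for a multi-class output layer.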
Loss Functions: To train the network, we need a loss (cost) function measuring prediction error. For regression tasks, a common choice is mean squared error (MSE):
L_MSE = (1/N) ∑_k (y_k - y_hat_k)^2
averaging squared differences between true (y_k) and predicted (y_hat_k) outputs over the N training examples. For classification, a typical loss is cross-entropy.
For binary classification: L = -(1/N) ∑_k [y_k log(y_hat_k) + (1 - y_k) log(1 - y_hat_k)]
For multi-class: L_CE = -(1/N) ∑_k ∑_i y_{k,i} log(y_hat_{k,i})
where k indexes examples and i indexes classes.
The network’s training objective is to minimize this loss over the training data. In practice, the loss is chosen to be differentiable so that gradient-based optimizers can be used.
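The three losses above can be sketched directly from their formulas. The function names and the small clipping constant `eps` (which guards against log(0)) are assumptions made for this example and are not part of the definitions.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over N examples."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy; y_pred holds predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite (assumption)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Multi-class cross-entropy; each row of y_true is one-hot, each row of y_pred sums to 1."""
    total = 0.0
    for row_true, row_pred in zip(y_true, y_pred):
        total += sum(t * math.log(max(p, eps)) for t, p in zip(row_true, row_pred))
    return -total / len(y_true)

# Illustrative values (assumptions)
print(mse([1.0, 2.0], [1.0, 4.0]))
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
print(categorical_cross_entropy([[1, 0, 0]], [[0.5, 0.25, 0.25]]))
```

In both cross-entropy examples the model assigns probability 0.5 to the correct class, so both losses come out to log 2.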
Network Architectures
Neural networks come in many architectures, specialized for different data types and tasks. Common examples include:
-
Feedforward (Fully Connected) Networks: Data flows in one direction from input to output, passing through one or more dense layers. Each layer’s neurons connect to all neurons in the next layer. A typical model might have an input layer, two hidden layers, and an output layer. Training such multilayer perceptrons (MLPs) uses standard backpropagation.
-
Convolutional Neural Networks (CNNs): Designed for grid-like data (e.g. images). Convolutional layers apply learned filters (kernels) that scan across the input, detecting local patterns (edges, textures, shapes). Layers of convolutions are often interleaved with pooling (downsampling) layers and ultimately connected to fully connected layers for classification. CNNs (e.g. AlexNet, VGG, ResNet) have been hugely successful in image and video analysis.
-
Recurrent Neural Networks (RNNs): Suitable for sequential data (time series, text). RNNs maintain a hidden state that evolves over time: at each step, the network takes the current input and the previous state as input. This allows information to persist. Simple RNNs have trouble with long-range dependencies, so variants like LSTM (Long Short-Term Memory) and GRU incorporate gating mechanisms to remember information longer. RNNs (and their sequence-to-sequence extensions) power applications like language translation and speech recognition.
-
Transformers (Self-Attention Models): Modern NLP models (BERT, GPT, etc.) use the transformer architecture. Rather than sequential recurrence, transformers apply attention mechanisms to relate all positions in the input sequence, enabling parallelism and capturing long-range context. Pretrained transformers have set a new state of the art in language tasks (question answering, summarization, etc.).
The above image illustrates a typical feedforward neural network with layered structure. Each layer’s neurons (yellow/orange) pass activations forward to the next layer. In contrast, a CNN would arrange neurons in filter maps over the input, and an RNN would loop its activations over time. Architectures can also be combined (e.g. CNN+RNN for video, or adding skip connections like in ResNet to allow training very deep nets).
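To make the recurrent case concrete, here is a minimal sketch of how a simple RNN could carry a hidden state across time steps. The update rule h_t = tanh(W_x x_t + W_h h_{t-1} + b) is the usual textbook form and is an assumption here, since the text does not spell out a formula; the weights and inputs are hypothetical.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: combine the current input with the previous hidden state."""
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))        # input contribution
        z += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))  # state contribution
        h_t.append(math.tanh(z))
    return h_t

# Hypothetical weights: 2 hidden units, 1 input feature
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]

h = [0.0, 0.0]  # initial hidden state
for x_t in [[1.0], [0.5], [-1.0]]:
    h = rnn_step(x_t, h, W_x, W_h, b)  # the same weights are reused at every step

print(h)
```

The same weights are applied at every step, and information from earlier inputs reaches later steps only through `h`, which is why long-range dependencies are hard for simple RNNs.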
Training Algorithms
Neural networks are trained by optimization algorithms that adjust weights to reduce the loss. The backbone of training is gradient descent. We compute the gradient of the loss with respect to each weight and take a step in the negative gradient direction:
w ← w - η * ∂L/∂w
where η is the learning rate, which controls the step size.
This update rule is applied iteratively over many epochs of the data. Variants of gradient descent include batch GD (using all data), stochastic GD (updating per example), and mini-batch GD (using small random batches each step). In practice, more sophisticated optimizers like Adam and RMSProp adapt the learning rate per-parameter, often speeding up convergence.
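The update rule can be tried on a one-parameter problem. The sketch below assumes a toy loss L(w) = (w - 3)^2, chosen only because its minimum (w = 3) and gradient are known in closed form; the learning rate and step count are likewise illustrative.

```python
def loss(w):
    """Toy loss with its minimum at w = 3 (an assumption for illustration)."""
    return (w - 3.0) ** 2

def grad(w):
    """Analytic gradient dL/dw of the toy loss."""
    return 2.0 * (w - 3.0)

eta = 0.1   # learning rate
w = 0.0     # initial weight
history = [loss(w)]

for step in range(50):
    w = w - eta * grad(w)   # w <- w - eta * dL/dw
    history.append(loss(w))

print(w, history[0], history[-1])
```

With this learning rate each step shrinks the distance to the minimum by a constant factor of 0.8, so the loss decreases steadily. A learning rate above 1.0 would overshoot and diverge on this particular loss.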
Backpropagation: Computing gradients efficiently is achieved through the backpropagation algorithm. Backpropagation systematically applies the chain rule of calculus to propagate the error gradients from the output layer backward through the network to the input layer. Given a scalar loss function L, we compute the gradient of the loss with respect to each weight as follows:
∂L/∂w = ∂L/∂y * ∂y/∂z * ∂z/∂w
where:
z = w^T x + b is the pre-activation value (weighted input),
y = f(z) is the activated output,
∂L/∂y is the gradient of the loss with respect to the output,
∂y/∂z = f'(z) is the derivative of the activation function,
∂z/∂w = x, because z is linear in w.
Backpropagation thus efficiently computes the gradient of the loss with respect to every parameter in the network by reusing intermediate derivatives at each layer (a dynamic programming strategy). Once the gradients are calculated, the model parameters (weights and biases) are updated using an optimization algorithm such as gradient descent:
w ← w - η * ∂L/∂w, where η is the learning rate.
Through many iterations of this process (forward pass, loss computation, backward pass, and weight update), the network progressively improves its performance by minimizing the loss function on the training data.
This technique lies at the heart of training deep neural networks and is a cornerstone of modern machine learning.
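The chain-rule factorization can be checked numerically on a single neuron. The sketch below assumes a sigmoid activation and a squared-error loss L = 0.5 (y - t)^2 with scalar input; these choices, and the numeric values, are illustrative. It compares the backpropagated gradient with a central finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x, t):
    """Forward pass for one neuron; returns pre-activation, output, and loss."""
    z = w * x + b
    y = sigmoid(z)
    L = 0.5 * (y - t) ** 2
    return z, y, L

def backward(w, b, x, t):
    """Chain rule: dL/dw = dL/dy * dy/dz * dz/dw (and similarly for b)."""
    z, y, L = forward(w, b, x, t)
    dL_dy = y - t            # derivative of 0.5 * (y - t)^2
    dy_dz = y * (1.0 - y)    # derivative of the sigmoid
    dz_dw = x                # z is linear in w
    dz_db = 1.0
    return dL_dy * dy_dz * dz_dw, dL_dy * dy_dz * dz_db

# Illustrative values (assumptions)
w, b, x, t = 0.5, -0.2, 1.5, 1.0
dL_dw, dL_db = backward(w, b, x, t)

# Central finite difference as an independent check
h = 1e-6
numeric_dw = (forward(w + h, b, x, t)[2] - forward(w - h, b, x, t)[2]) / (2 * h)

print(dL_dw, numeric_dw)
```

The two values should agree to many decimal places. The shared factor `dL_dy * dy_dz` is computed once and reused for both parameters, which is the reuse of intermediate derivatives that makes backpropagation efficient in larger networks.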
# Pseudocode for training a neural network
initialize weights W
for epoch = 1 to N_epochs:
    for each batch of (inputs X, targets Y):
        Y_pred = forward_pass(X, W)          # Compute outputs
        loss = compute_loss(Y_pred, Y)       # e.g. MSE or cross-entropy
        gradients = backpropagate(loss, W)   # Gradients of loss w.r.t. W
        W = W - learning_rate * gradients    # Gradient descent step
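A runnable version of this loop might look like the sketch below. To stay short it swaps the neural network for a one-feature linear model, y_pred = w * x + b, trained with mini-batch gradient descent on synthetic data from y = 2x + 1. The data, model, and hyperparameters are all assumptions made for illustration; the helper names simply mirror the pseudocode.

```python
import random

random.seed(0)  # make the shuffling reproducible

# Synthetic, noise-free data from y = 2x + 1 (an assumption for illustration)
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-20, 21)]

w, b = 0.0, 0.0        # initialize weights
learning_rate = 0.1
batch_size = 8

def forward_pass(x, w, b):
    return w * x + b

def compute_loss(preds, targets):
    """Mean squared error over the batch."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def backpropagate(batch, preds):
    """Gradients of the batch MSE with respect to w and b."""
    n = len(batch)
    grad_w = sum(2.0 * (p - t) * x for (x, t), p in zip(batch, preds)) / n
    grad_b = sum(2.0 * (p - t) for (x, t), p in zip(batch, preds)) / n
    return grad_w, grad_b

for epoch in range(200):
    random.shuffle(data)                          # new random batches each epoch
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        preds = [forward_pass(x, w, b) for x, _ in batch]
        loss = compute_loss(preds, [t for _, t in batch])
        grad_w, grad_b = backpropagate(batch, preds)
        w -= learning_rate * grad_w               # gradient descent step
        b -= learning_rate * grad_b

print(w, b, loss)
```

For a real network, `forward_pass` and `backpropagate` would run through every layer, but the outer structure (epochs, batches, forward pass, loss, gradients, update) stays the same.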
Implementation Examples
To make this concrete, here are simple code snippets showing how one might implement a neural network in popular frameworks:
TensorFlow (Keras) Example: A simple dense network for image classification (e.g. MNIST digits) could be built as follows:
import tensorflow as tf
from tensorflow import keras

# Load MNIST and scale pixel values to [0, 1]
# (assumes the MNIST digits example mentioned above)
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train / 255.0

# Define a sequential model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),    # Flatten 2D input to 1D
    keras.layers.Dense(128, activation='relu'),    # Hidden layer
    keras.layers.Dropout(0.2),                     # Dropout for regularization
    keras.layers.Dense(10, activation='softmax')   # Output layer for 10 classes
])

# Compile with optimizer and loss function
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train on data (x_train, y_train) for 5 epochs
model.fit(x_train, y_train, epochs=5)
This Keras example defines a feedforward network with one hidden layer and uses the Adam optimizer with sparse categorical cross-entropy loss, which expects integer class labels rather than one-hot vectors. TensorFlow (with Keras) handles the gradient computations and weight updates internally.
PyTorch Example: Below is a minimal example of a neural network module in PyTorch:
import torch
from torch import nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        # One fully connected layer: 784 inputs -> 10 outputs
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)   # flatten each image to a 784-vector
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself,
        # so adding ReLU or softmax here would distort the loss.
        return self.fc(x)

model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Placeholder batch: random tensors standing in for real images and labels
train_images = torch.randn(64, 1, 28, 28)
train_labels = torch.randint(0, 10, (64,))

# Training loop (one batch example, repeated for 5 steps)
for epoch in range(5):
    optimizer.zero_grad()                    # clear old gradients
    outputs = model(train_images)            # forward pass
    loss = criterion(outputs, train_labels)  # compute loss
    loss.backward()                          # compute gradients
    optimizer.step()                         # update weights
In PyTorch, we manually define the network (by subclassing nn.Module), compute a forward pass, compute the loss, call backward() for gradients, and then step the optimizer to update weights. This flexibility allows easy customization of architectures.
Applications of Neural Networks
Neural networks have become ubiquitous across many fields. Some representative applications include:
-
Healthcare: CNNs analyze medical images (X-rays, MRIs, CT scans) for diagnostics, e.g. detecting tumors or retinal disease. Deep learning models can also predict patient outcomes from EHR data or assist in medical image reconstruction. For example, convolutional models pretrained on ImageNet have been successfully adapted (via transfer learning) to specialized medical tasks.
-
Finance: Networks help in credit scoring (predicting loan default), fraud detection (flagging anomalous transactions), and algorithmic trading. For instance, deep nets can learn complex patterns in market data or customer behavior that classical methods might miss. (Credit: Chang et al., 2023).
-
Autonomous Vehicles: Modern self-driving cars rely heavily on neural networks for perception. Cameras and LIDAR feed data into CNN-based vision systems for object detection, lane following, and scene understanding. As one report notes, fleets of sensor-rich vehicles generate “a rich seam of training data for neural networks” to improve driving skills. Tesla’s Autopilot and other driver-assist systems use deep learning for real-time image recognition and decision-making.
-
Natural Language Processing (NLP): Sequence models and transformers power translation, speech recognition, and virtual assistants. Systems like Google Translate, Apple’s Siri, and Amazon’s Alexa use deep neural nets to understand and generate language. As SAP observes, voice commands and language translation in smartphones are classic NLP use-cases powered by AI. Large transformer models (e.g. GPT-4) can generate coherent text and answer questions, enabling chatbots and content creation.
-
Other domains: Recommendation engines (e.g. Netflix, YouTube), real-time translation, robotics control, climate modeling, game playing (DeepMind’s AlphaZero), and even creative arts (deepfake generation, style transfer) all leverage neural network techniques.
Recent Commercial Examples
In recent years many high-profile products demonstrate neural networks in action. Examples include:
-
GPT-3/GPT-4 (OpenAI): State-of-the-art transformer models for language, deployed in the ChatGPT chatbot (2022–23). These generate human-like text and power automated question-answering and content tools.
-
BERT (Google): A transformer-based model used in Google Search and other NLP tasks since 2018, enabling better understanding of queries and context.
-
AlphaFold (DeepMind/Google): Uses deep learning to predict protein 3D structures, revolutionizing biology and drug discovery.
-
Tesla Autopilot (Tesla, 2020s): Uses convolutional and recurrent neural nets to process camera and sensor data for autonomous driving. Tesla’s published materials emphasize training its neural networks on billions of miles of driving data.
-
Medical Imaging AI products: Commercial tools (e.g. aidoc, Zebra Medical Vision) use CNNs to assist radiologists in detecting pathologies.
-
Virtual Assistants: Siri (Apple), Alexa (Amazon), and Google Assistant all run deep learning pipelines (speech recognition, language understanding) to interact with users in real time.
-
Fraud Detection Systems: Banks and credit card companies deploy neural net models (often graph neural networks) to spot fraudulent transaction patterns in real time.
These examples underscore how neural networks have moved from research into widespread deployment across industries.
Summary: Neural networks are multilayer, parametric models inspired by the brain. They are defined mathematically by neuron activations (nonlinear functions of weighted sums plus biases) and trained by optimizing a loss via gradient descent and backpropagation. Architectures vary (feedforward, convolutional, recurrent, attention-based) to suit different data modalities. Modern frameworks like TensorFlow and PyTorch make it easy to implement these models in code. The impact of neural networks is vast, with applications ranging from healthcare and finance to autonomous vehicles and natural language understanding. As hardware and algorithms continue to advance, neural networks are becoming ever more powerful tools in AI.
References:
[1] L. Hardesty, “Explained: Neural networks,” MIT News, 14 Apr. 2017.
[2] “Neural network (machine learning),” Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Neural_network_(machine_learning) [Accessed: May 18, 2025].
[3] A. Oppermann, “How Loss Functions Work in Neural Networks and Deep Learning,” Built In, 14 Dec. 2022.
[4] “Backpropagation,” Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Backpropagation [Accessed: May 18, 2025].
[5] G. Coroller et al., “Artificial intelligence and machine learning in cancer imaging,” Commun. Med., vol. 2, art. 133, 2022.
[6] SAP SE, “What is Natural Language Processing? (NLP): A guide,” [Online]. Available: https://www.sap.com/portugal/resources/what-is-natural-language-processing.html [Accessed: May 18, 2025].
[7] M. Harris, “Tesla’s Autopilot Depends on a Deluge of Data,” IEEE Spectrum, 4 Aug. 2022.