3.2.1 Types of Machine Learning
Machine learning (ML) is a field of artificial intelligence that enables computers to learn from data and improve over time without being explicitly programmed. ML algorithms are typically categorized into four main types – supervised, unsupervised, semi-supervised, and reinforcement learning. Each type represents a different learning paradigm, with distinct approaches to how models are trained and the kinds of problems they can solve. Below, we explore each type in detail, discussing their definitions, core theoretical principles, common algorithms, and real-world applications in domains such as finance, healthcare, and natural language processing (NLP).
Supervised Learning
Definition: Supervised learning is the most widely used ML approach. In supervised learning, the model is trained on a labeled dataset – which means each training example comes with an input (features) and an expected output (label). The goal is for the algorithm to learn a general mapping from inputs to outputs (a function) that can then be used to predict outputs for new, unseen inputs. This is analogous to a student learning from a teacher: the “right answers” are provided during training, and the model gradually adjusts to minimize its errors.
Core Principles: Formally, supervised learning algorithms build a mathematical model that maps input features x to an output y, based on example input-output pairs. Learning is typically achieved via iterative optimization of a loss (objective) function that measures the error between the model’s predictions and the true labels. Over time and with enough examples, the model parameters are adjusted (e.g. via gradient descent in neural networks) such that the learned function f(x) accurately predicts outputs for new inputs. Two major subclasses of supervised learning are classification (predicting discrete categories) and regression (predicting continuous values). For example, an email spam filter is a classification model that learns to label emails as “spam” or “not spam,” while a house price predictor is a regression model that outputs a price based on features like size and location.
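To make the loss-minimization view concrete, here is a minimal sketch of a one-feature linear regression fit by gradient descent on the mean squared error; the synthetic data and hyperparameters are illustrative only.

```python
# Minimal sketch: supervised learning as loss minimization.
# Fit y ≈ w*x + b by gradient descent on the mean squared error.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)            # input feature
y = 3.0 * x + 5.0 + rng.normal(0, 1, 100)   # true relation plus noise

w, b = 0.0, 0.0   # model parameters, initialized arbitrarily
lr = 0.01         # learning rate

for _ in range(1000):
    y_pred = w * x + b
    error = y_pred - y                  # prediction error per example
    grad_w = 2 * np.mean(error * x)     # d(MSE)/dw
    grad_b = 2 * np.mean(error)         # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should approach w≈3, b≈5
```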
Common Algorithms: Supervised learning includes a vast array of algorithms. Some popular ones are:
- Linear models: Linear Regression for regression tasks and Logistic Regression for binary classification.
- Support Vector Machines (SVM): Effective for classification by finding an optimal separating hyperplane between classes.
- Decision Trees and Ensembles: Trees that split data by features; often combined into Random Forests or boosted trees (e.g. XGBoost) for improved accuracy.
- Neural Networks: Including deep learning models (multi-layer perceptrons, CNNs, RNNs) which learn complex patterns and have driven recent breakthroughs in image and speech recognition.
- Nearest Neighbors: k-Nearest Neighbors (k-NN), a simple method that classifies a data point based on the majority label of its closest neighbors in the feature space.
Figure: An example of a supervised learning model (support vector machine) finding a decision boundary (solid line) that separates two classes of data points. Dashed lines indicate the margin; the model is trained to maximize this margin while correctly classifying the labeled points.
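As a concrete counterpart to that figure, the sketch below fits a linear SVM on a synthetic two-class dataset (using scikit-learn) and reports the width of the learned margin; the dataset and settings are purely illustrative.

```python
# Minimal sketch: a linear SVM on a toy two-class dataset.
# For a linear SVM, the margin width is 2 / ||w|| for the learned weights w.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=42)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]
print(f"margin width: {2.0 / np.linalg.norm(w):.3f}")
print(f"number of support vectors: {len(clf.support_vectors_)}")
```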
Supervised learning depends on a substantial number of labeled examples. Preparing high-quality labeled data often requires human annotation and domain knowledge, which can be time-consuming and costly. Models are evaluated on their ability to generalize – i.e. perform well on a held-out test set that was not seen during training. Techniques like cross-validation are used to ensure the model isn’t overfitting (memorizing) the training data.
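For instance, a 5-fold cross-validation check might look like the following sketch, which uses scikit-learn’s bundled Iris dataset purely for convenience.

```python
# Minimal sketch: estimating generalization with k-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: each fold is held out once while the model trains on the rest.
scores = cross_val_score(model, X, y, cv=5)
print(f"fold accuracies: {scores}")
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```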
Applications: Supervised learning is prevalent across industries due to its ability to make accurate predictions with sufficient training data. Some notable applications include:
- Finance: Credit scoring and risk assessment models are trained on historical loan data to predict the likelihood of default. Stock price prediction and algorithmic trading also leverage supervised time-series models. For example, banks use supervised ML to classify transactions as fraudulent or legitimate based on past labeled examples of fraud.
- Healthcare: Medical diagnosis often uses supervised learning on labeled clinical data (e.g. images or patient records). For instance, image classification models can be trained on X-ray or MRI scans labeled by doctors to detect diseases like tumors or pneumonia. Many conditions, including Alzheimer’s, heart failure, breast cancer, and pneumonia, can be identified from medical data using supervised ML models. Predictive models for patient outcomes or readmissions are also common.
- Natural Language Processing: Most NLP tasks traditionally use supervised learning. Examples include sentiment analysis (trained on text labeled as positive/negative), language translation (trained on sentence pairs in different languages), and speech recognition (trained on audio transcripts). With large labeled datasets, models like GPT or BERT are fine-tuned in a supervised manner for tasks such as question answering or named entity recognition. An everyday example is an email spam filter, which is trained on emails labeled by users as “spam” or “not spam”.
- Web and Tech: Recommender systems (e.g. movie or product recommendations) use supervised or closely related techniques to predict user preferences based on past behavior. Similarly, search engines train ranking algorithms to order results based on relevance (using clicks as implicit labels).
Supervised learning’s strength lies in its predictive accuracy when abundant labeled data is available and the assumption that training and future data follow a similar distribution holds. Its limitation is the reliance on labeled data – if labeling is scarce or expensive (as in many real-world settings), performance can suffer. This motivates other learning paradigms like unsupervised and semi-supervised learning when labels are missing or limited.
Unsupervised Learning
Definition: Unsupervised learning deals with unlabeled data. Here, the algorithm is given only input data without any explicit correct output, and it must discover structure or patterns in the data on its own. In other words, the system tries to learn the inherent characteristics of the data (such as groupings or correlations) without a teacher. This is akin to discovering hidden patterns in a dataset by exploration, rather than being told what to look for.
Core Principles: Since there are no labels or reward signals, unsupervised learning objectives are often defined by the data itself. Common goals include clustering (partitioning data into meaningful groups), density estimation (modeling the data distribution p(x)), dimensionality reduction (compressing data while preserving important information), and feature learning. The learning process often involves maximizing or minimizing some inherent property: for example, clustering algorithms seek groupings that maximize intra-cluster similarity and inter-cluster differences, whereas dimensionality reduction methods preserve variance or information content in fewer dimensions. There is no simple error metric as in supervised learning; instead, unsupervised methods may use measures like reconstruction error (in autoencoders) or silhouette score (for clustering quality) to evaluate performance. In practice, unsupervised learning can serve as a preprocessing step to uncover structures that make subsequent supervised learning more effective (for instance, using clustering to create labels, or reducing noise via dimensionality reduction).
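The snippet below sketches this evaluation idea: k-means is run on unlabeled synthetic data and scored with the silhouette coefficient, with no ground-truth labels used anywhere (dataset and settings are illustrative).

```python
# Minimal sketch: clustering without labels, evaluated by silhouette score.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels discarded

for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")
# The k with the highest silhouette score is a reasonable guess for the
# number of natural groupings in the data.
```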
Common Techniques and Algorithms: Unsupervised learning encompasses several important techniques:
- Clustering: Algorithms that automatically group data points into clusters based on similarity. Examples include k-means clustering, hierarchical clustering, DBSCAN, and Gaussian mixture models (which use expectation-maximization). Clustering finds latent groupings; for example, it might separate customers into distinct segments based on purchasing behavior without prior labels. Different clustering algorithms make different assumptions about cluster shape and structure (e.g., k-means finds spherical clusters around centroids).
- Dimensionality Reduction: Methods that reduce the number of features while retaining significant variance or information. Classical techniques include Principal Component Analysis (PCA), which finds orthogonal directions of maximum variance, and Singular Value Decomposition (SVD). There are also nonlinear methods like t-SNE and UMAP for visualization of high-dimensional data in 2D/3D, and autoencoders (neural networks that compress and decompress data). By reducing dimensionality, these techniques help in visualizing data structure and can improve efficiency and reduce overfitting in predictive models (a PCA sketch follows this list).
- Anomaly Detection: Often treated as an unsupervised (or one-class) learning problem, where the goal is to identify outliers or unusual data points that don’t fit the norm. Techniques include statistical methods or using clustering/density estimates to flag low-probability points. This is crucial in domains like fraud detection or cybersecurity where anomalous events need to be detected without explicit examples of every possible anomaly.
- Association Rules: Methods like the Apriori algorithm fall under unsupervised learning, used to discover interesting relationships (rules) between variables in large databases (e.g., market basket analysis finds that people who buy item A also often buy item B).
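As a small dimensionality-reduction example, the sketch below compresses scikit-learn’s 64-pixel digits images to ten principal components and reports how much variance is retained; the component count is an arbitrary illustration.

```python
# Minimal sketch: PCA compresses 64 features to 10 while keeping most variance.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 1797 samples x 64 pixel features

pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)      # project onto the top-10 variance directions

print(X_reduced.shape)                          # (1797, 10)
print(pca.explained_variance_ratio_.sum())      # fraction of variance retained
```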
Figure: Example of clustering (unsupervised learning). The plot shows data points grouped into three clusters without any labels provided. The algorithm has discovered natural groupings in the data based on feature similarity.
Unsupervised learning typically requires sufficient data diversity to reveal meaningful patterns. Because there is no human-provided ground truth, interpreting the results can be subjective – one must assess whether the discovered patterns make sense for the domain. Oftentimes, domain experts are needed to attach meaning to clusters or latent factors (for example, determining that clusters of patients correspond to different disease subtypes).
Applications: Despite the lack of labels, unsupervised learning has powerful real-world uses, especially in making sense of large, unstructured datasets:
- Customer Segmentation (Marketing/Finance): Businesses use clustering on customer data (purchase history, demographics, behaviors) to segment customers into groups for targeted marketing. This helps identify profiles like “budget shoppers” vs “premium customers” without any pre-assigned categories. Financial institutions might cluster clients by transaction patterns to tailor services.
- Anomaly and Fraud Detection: In banking and cybersecurity, unsupervised anomaly detection is critical. By modeling what “normal” behavior looks like (e.g., typical credit card transaction patterns), the system can flag outliers that might indicate fraudulent transactions or network intrusions. Since fraud examples are rare, unsupervised or one-class models can catch novel fraud patterns that were not explicitly labeled in training.
- Healthcare and Biology: Unsupervised learning is used to discover patient subgroups and disease phenotypes. For example, clustering patient health records or genetic data can reveal latent groups of patients who respond differently to treatments or who have similar symptom patterns. In genomics, techniques like PCA and clustering help in identifying gene expression patterns and classifying cell types without prior labels.
- Natural Language Processing: Many early NLP breakthroughs involved unsupervised learning on large text corpora. Topic modeling algorithms (like LDA – Latent Dirichlet Allocation) automatically discover topics in a collection of documents. Word embedding models (Word2Vec, GloVe) learn vector representations of words by unsupervised training (predicting context words), which capture semantic similarity and have become fundamental in NLP. Clustering can also be applied to text – for instance, grouping news articles by topic or clustering search queries by intent.
- Data Compression and Preprocessing: Unsupervised learning methods are often used to compress data; clustering itself can be seen as a form of data compression, grouping similar data points under a common representative. Techniques like PCA are used to reduce feature dimensions in image or signal processing (e.g., compressing images, denoising). In real-time applications, reducing data dimensionality can significantly speed up further processing while retaining essential information.
Unsupervised learning provides insight and initial structure in scenarios where little is known about the data. It is frequently the first step in data analysis – helping to summarize and visualize data, detect patterns, and even generate hypotheses that can later be tested with supervised learning or domain knowledge. However, because unsupervised methods can sometimes find patterns that are coincidental or not meaningful, validation and expert interpretation are important in this paradigm.
Semi-Supervised Learning
Definition: Semi-supervised learning is an approach that bridges the gap between supervised and unsupervised learning. In a semi-supervised setting, the algorithm is trained on a combination of a small amount of labeled data and a larger amount of unlabeled data. The basic idea is to leverage the abundant unlabeled data (which is easy to collect) to improve learning accuracy, while using the limited labeled data to guide the learning process. This reflects many real-world scenarios where obtaining a few labeled examples is feasible, but labeling the entire dataset is too costly or time-consuming.
Core Principles: Semi-supervised learning works under certain assumptions about the data structure that allow unlabeled data to be informative:
- Smoothness assumption: Data points close to each other are likely to have the same label.
- Cluster assumption: The data tends to form clusters, and points in the same cluster are likely to share a label. If true, an algorithm can infer that all points in a cluster probably belong to the same class, even if only a few are labeled.
- Manifold/Low-density separation: The decision boundary should lie in regions of low data density, not cutting through clusters of unlabeled points.
Using these principles, semi-supervised methods try to propagate label information from the labeled subset to the unlabeled data. In practice, a common approach is to first train a preliminary model on the small labeled set, then use that model to predict labels for the unlabeled data, and finally retrain or refine the model with this larger effectively labeled set (this is known as self-training or generating pseudo-labels). Another approach is graph-based label propagation, where each data point (labeled or not) is a node in a graph and edges connect similar points; the known labels spread through the graph to unlabeled nodes over iterations.
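A minimal self-training loop might look like the sketch below, which simulates label scarcity by hiding about 95% of the labels in scikit-learn’s digits dataset; the confidence threshold and number of rounds are illustrative choices.

```python
# Minimal sketch: self-training (pseudo-labeling).
# Train on the small labeled set, pseudo-label confident unlabeled points, retrain.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
labeled = rng.random(len(y)) < 0.05          # pretend only ~5% of labels are known

X_lab, y_lab = X[labeled], y[labeled]
X_unlab = X[~labeled]

for _ in range(3):                           # a few self-training rounds
    model = LogisticRegression(max_iter=2000).fit(X_lab, y_lab)
    proba = model.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.95     # keep only confident pseudo-labels
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]

print(f"final training-set size: {len(y_lab)}")
```

scikit-learn also provides a SelfTrainingClassifier (in sklearn.semi_supervised) that wraps essentially this loop around any probabilistic classifier.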
Typical Algorithms: Semi-supervised learning does not refer to one specific algorithm but rather a variety of techniques that can be applied on top of base models:
- Self-Training (Pseudo-Labeling): Train a supervised model on the initial labeled data, use it to predict labels on unlabeled data, add the most confident predictions to the labeled set, and repeat. This effectively increases the training set size gradually.
- Co-Training: Two (or more) models are trained on different views of the data (e.g., different feature subsets). Each model labels unlabeled examples that it is confident about, and those labels are used to train the other model. This works if features can be split into sufficiently independent sets.
- Graph-Based Methods: Build a similarity graph of all data points. Initially label the known labeled nodes, then use algorithms like label propagation or label spreading to diffuse those labels across the graph to unlabeled points (a sketch follows this list).
- Semi-Supervised SVM (Transductive SVM): An extension of support vector machines that tries to find a decision boundary with a large margin not only around labeled data but considering all data, effectively pushing the boundary into low-density regions of the feature space.
- Generative Models: Another strategy is to model the joint distribution p(x, y) of features and labels. One can use unlabeled data to better estimate p(x) (the distribution of inputs) and combine it with the labeled data to model p(y | x). Techniques like semi-supervised variational autoencoders or generative adversarial networks (GANs) have been explored for this purpose.
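As a sketch of the graph-based idea, the example below uses scikit-learn’s LabelSpreading with unlabeled points marked as -1; the fraction of hidden labels and the k-NN kernel settings are illustrative.

```python
# Minimal sketch: graph-based label spreading; -1 marks unlabeled points.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import LabelSpreading
from sklearn.metrics import accuracy_score

X, y_true = load_digits(return_X_y=True)
rng = np.random.default_rng(0)

y_train = y_true.copy()
unlabeled = rng.random(len(y_true)) > 0.05   # hide ~95% of the labels
y_train[unlabeled] = -1                      # -1 means "unlabeled"

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_train)
pred = model.transduction_[unlabeled]        # labels inferred for unlabeled points
print(f"accuracy on originally unlabeled points: "
      f"{accuracy_score(y_true[unlabeled], pred):.3f}")
```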
Because semi-supervised learning uses unlabeled data to bias the model, it can significantly improve performance when the assumptions hold. Researchers have found that even a small amount of unlabeled data, when used in conjunction with a few labeled examples, can yield considerable improvement in accuracy over using the labeled data alone.
Applications: Semi-supervised learning is valuable in any domain where labeled data is scarce but unlabeled data is plentiful:
- Fraud Detection: In finance, there might be a limited number of confirmed fraud cases (labels) amid a vast sea of transactions. Semi-supervised models can learn the patterns of normal transactions from the unlabeled data and better detect which new transactions look anomalous or similar to the few fraudulent examples. By combining a handful of labeled fraud cases with a large unlabeled dataset of legitimate transactions, banks can improve detection of fraudulent behavior that was not explicitly labeled.
- Medical Imaging and Diagnostics: Labeling medical data (like MRI or CT scans) requires expert radiologists and is very costly. Semi-supervised learning allows AI models to make use of many unlabeled scans alongside a smaller labeled set. For example, a model might train on a few hundred scans labeled for tumors and thousands of unlabeled scans, using the structures in the unlabeled data to learn better feature representations. This way, it can identify diseases or abnormalities more accurately by incorporating the characteristics of the unlabeled examples.
- Speech and Audio Recognition: Annotating audio (transcribing speech or labeling sounds) is labor-intensive. Semi-supervised techniques are used in speech recognition by taking a small set of transcribed audio and a large collection of raw audio. Modern speech models often first learn acoustic representations from unlabeled audio (unsupervised or self-supervised learning) and then fine-tune on the labeled portion – effectively a semi-supervised approach that dramatically improves accuracy for low-resource languages or specialized vocabulary.
- Natural Language Processing: Semi-supervised learning is common in NLP for tasks like text classification, where one might have a few hundred labeled documents but thousands of unlabeled ones. For instance, a sentiment classifier might be improved by leveraging a large corpus of unlabeled reviews to understand language structure and word usage, combined with a small labeled set of positive/negative reviews. In one example, a sentiment analysis model can be bootstrapped with a few labeled examples and a large pool of unlabeled tweets, allowing it to generalize better to slang or new expressions.
- Web Page Classification: Classifying web pages into topics (sports, news, tech, etc.) can use semi-supervised learning: one might label a few pages for each category and then use the vast unlabeled web to help cluster and label other pages that share features with the labeled ones.
Semi-supervised learning effectively reduces the reliance on large labeled datasets. It has become increasingly important with the rise of big data, because while data is generated at high volume, labeling capabilities lag behind. Modern deep learning has even more advanced forms of leveraging unlabeled data (like self-supervised learning, where models create surrogate tasks on unlabeled data to pre-train representations), but classical semi-supervised learning remains a key strategy in scenarios where a mix of labeled and unlabeled data is available.
Reinforcement Learning
Definition: Reinforcement learning (RL) is a learning paradigm inspired by behavioral psychology, where an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. Unlike supervised learning, there are no correct input-output pairs given; instead, the agent autonomously explores and must figure out what actions yield the most reward over time. The problem setup is often formalized as a Markov Decision Process (MDP): at each time step, the agent observes a state of the environment, chooses an action, and the environment returns a new state and a reward signal. The agent’s objective is to learn a policy (a strategy mapping states to actions) that maximizes the cumulative reward it receives in the long run.
Core Principles: RL is fundamentally about trial-and-error learning and delayed gratification. The agent explores actions and learns from consequences: actions leading to high rewards should be repeated, while those leading to penalties or low rewards should be avoided. Key concepts in RL include:
- State: A representation of the current situation of the agent (e.g., positions of pieces on a chess board, readings from robot sensors).
- Action: A choice the agent can make at a given state (move a pawn, accelerate, turn left, etc.).
- Reward: A numerical feedback signal from the environment for an action (e.g., +1 for winning a game, -1 for losing, or a continuous score). Rewards drive the learning; the agent’s goal is to maximize the total reward.
- Policy: The agent’s strategy, a function π(s) that outputs an action given state s. This is what the agent is trying to learn – the optimal policy that yields the most reward.
- Value Function: While a policy tells what action to take, a value function V(s) predicts how good a state s is in terms of expected future rewards. Similarly, a Q-value Q(s, a) estimates the expected reward of taking action a in state s. Many RL algorithms (like Q-learning) focus on learning value functions as an intermediate step to deriving a policy.
- Exploration vs. Exploitation: The dilemma of trying new actions to discover their rewards (exploration) versus using the current best-known action to maximize reward (exploitation). A successful agent must balance the two.
The learning process in RL often uses iterative algorithms based on the Bellman equation, which relates the value of a state to the values of subsequent states. Temporal-Difference learning, Monte Carlo simulation, and Dynamic Programming are foundational techniques for evaluating policies and improving them. Modern RL frequently involves function approximation (like deep neural networks) to generalize value functions or policies across large or continuous state spaces – this combination is known as deep reinforcement learning. One famous example is DeepMind’s Deep Q-Network (DQN), which used a neural network to approximate Q-values for playing Atari games, achieving human-level performance.
Key Algorithms: Some representative RL algorithms and approaches include:
- Q-Learning: A value-based method where the agent learns an action-value function Q(s, a) that gives the expected reward of taking action a in state s and following the optimal policy thereafter. The Q-learning update is Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], adjusting the Q-value towards the observed reward plus the estimated future rewards. Q-learning is off-policy (it learns the optimal policy regardless of the agent’s current behavior policy) and does not require a model of the environment (a tabular sketch follows this list).
- Policy Gradient Methods: Instead of learning value functions, these directly adjust the policy. The REINFORCE algorithm is a basic policy gradient that tweaks policy parameters in the direction of actions that yielded higher return. More advanced variants like Actor-Critic methods maintain both a policy (actor) and a value function (critic) to reduce variance in learning.
- Deep Reinforcement Learning: Combining neural networks with RL. For example, Deep Q-Network (DQN) uses a neural net to approximate Q-values for each state-action pair, enabling RL in high-dimensional state spaces (like raw pixel inputs in games). Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) are popular algorithms for continuous action spaces (like controlling robot joints or steering angles). These methods require vast amounts of experience data and computation, but have achieved impressive results in complex tasks.
- Model-Based RL: Approaches where the agent tries to learn a model of the environment’s dynamics (state transition probabilities and reward function). With a model, the agent can plan by simulating outcomes (e.g., using dynamic programming or tree search). AlphaGo, which mastered the game of Go, is a famous example combining model-based search (Monte Carlo Tree Search) with deep RL for policy/value estimation.
- Hierarchical RL: Algorithms that learn at multiple levels of abstraction (options or skills) to tackle long-horizon problems by breaking them into sub-tasks.
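The sketch below illustrates tabular Q-learning on a toy one-dimensional corridor environment; the environment, reward scheme, and hyperparameters are invented purely for illustration.

```python
# Minimal sketch: tabular Q-learning on a toy 1-D corridor.
# The agent starts at cell 0 and earns a reward only upon reaching the goal cell.
import numpy as np

n_states, goal = 6, 5              # cells 0..5, goal at the right end
actions = (-1, +1)                 # move left or move right
alpha, gamma, eps = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate
Q = np.zeros((n_states, len(actions)))
rng = np.random.default_rng(0)

for _ in range(500):                           # training episodes
    s = 0
    while s != goal:
        # epsilon-greedy: explore sometimes, otherwise exploit the current Q-table
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(max(s + actions[a], 0), n_states - 1)
        r = 1.0 if s_next == goal else 0.0
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(np.round(Q, 2))  # the "move right" column should dominate in every cell
```

Even in this tiny example, the learned Q-table ends up preferring the move-right action in every cell, which is the optimal policy for reaching the goal.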
Learning in RL is often more challenging than in supervised learning because feedback (rewards) can be sparse and delayed. The agent might not know which action was critical to eventually getting a reward. Methods like reward shaping and curriculum learning are used to guide training. Moreover, unlike supervised learning where data is static, an RL agent generates its own training data through interaction; this introduces non-i.i.d. (correlated) data and stability issues, often addressed by techniques like experience replay and target networks in DQN.
Applications: Reinforcement learning has gained fame through high-profile successes and is increasingly being applied in real-world decision-making systems:
- Game Playing: RL is well-known for mastering games. From early successes in backgammon and chess, to DeepMind’s AlphaGo defeating a world champion in Go, and OpenAI’s bots winning against human champions in Dota 2, these accomplishments show how RL agents can learn complex strategies. These systems often use simulation environments to let agents train for millions of episodes. In these cases, the reward is usually winning or losing the game, and the agent refines its play style to maximize the win rate.
- Trading and Finance: In algorithmic trading, RL agents can learn to make buy/sell/hold decisions to maximize profit or minimize risk. Unlike supervised models that predict prices, an RL trading agent explicitly decides actions and is evaluated by returns on investment. For example, an RL-based trading system might receive a reward proportional to daily portfolio gain and learn policies that outperform standard market strategies. IBM, for instance, has developed an RL-based trading platform that learns to execute trades with the reward signal tied to financial profit and loss. Such agents can adapt to changing market conditions by continuous learning.
- Robotics and Control: Robotics is a natural field for RL, as many robotics problems are sequential decision tasks (locomotion, manipulation, navigation). RL has been used to train robots to perform tasks like grasping objects, walking, or flying drones by learning from trial and error in either simulations or real environments. In these scenarios, the reward might be defined as forward progress, task completion, or energy efficiency. Notably, RL has enabled robots to learn skills that are hard to hand-engineer, such as complex balancing or coordination tasks.
- Autonomous Vehicles: Self-driving cars use reinforcement learning to make decisions like lane merging, speed control, or route planning. An agent can be rewarded for safe and efficient driving behavior. While much of self-driving car development is currently supervised (using human-labeled data), RL is used in simulation to fine-tune driving policies and in scenarios like autonomous racing or optimizing fuel efficiency with hybrid engine management.
- Healthcare: Reinforcement learning is emerging in healthcare for treatment recommendations and personalized medicine. For example, RL algorithms have been applied to optimize dosing regimens in chemotherapy or to suggest treatments in critical care (ICU) by learning policies that maximize patient health outcomes. Recent work has applied RL to fine-tune radiotherapy dosing schedules, adapting to patient responses to maximize treatment efficacy while minimizing side effects. This approach treats medical decision-making as an MDP where states include patient vitals and disease markers, actions are treatment choices, and rewards are proxies for patient well-being. Research in this area has shown promise for precision healthcare optimization, although safety and ethical considerations are paramount before deployment in real clinics.
- Natural Language Processing: RL is used in NLP for tasks where the objective is not easily captured by supervised loss functions. For example, in text summarization, instead of training only on reference summaries (which is supervised), researchers have used RL to optimize for metrics like ROUGE or for human preference – the model is rewarded when it produces a high-quality summary. In dialogue systems and chatbots, RL helps an agent learn to converse by giving rewards for maintaining coherent and helpful conversations. A notable case is training language models (like GPT-based models) with human feedback: Reinforcement Learning from Human Feedback (RLHF) fine-tunes a chatbot by rewarding it for outputs that align with human preferences (making it more helpful or less toxic). This was crucial in training conversational AI like ChatGPT. In machine translation, RL has been explored for simultaneous translation, where an agent learns when to output a translation versus wait for more words, optimizing a trade-off between timeliness and accuracy.
- Recommendation and Advertising: Online recommendation systems (for news, videos, products) have formulated the problem as an RL task, where each recommendation yields a reward if the user clicks or engages. Over time, the system learns to recommend items that maximize long-term user engagement. Similarly, in online advertising, real-time bidding for displaying an ad can be handled by an RL agent that learns bidding strategies to maximize click-through or conversion, treating user interactions as rewards.
- Operations Research and Resource Management: RL is increasingly applied to logistics and operations – e.g., optimizing supply chain decisions, traffic signal control in smart cities, or energy grid management. A notable case was Google using DeepMind’s RL to reduce data center cooling energy: the RL agent learned to adjust cooling systems and achieved significant energy savings, outperforming manual control. RL has also been used to build more resilient supply chain networks that adapt dynamically to disruptions, demonstrating its value in complex resource allocation problems.
Reinforcement learning is powerful because it learns from experience and can in theory attain superhuman performance in well-defined environments. Its trial-and-error nature, however, means it often requires a lot of data (experimentation), and safety is a concern in real-world applications (an agent exploring wrong actions can be costly or dangerous in domains like healthcare or finance). Nevertheless, with advances in simulation, transfer learning, and safer exploration methods, RL continues to push into real-world use. It is a lively area of research and industrial application, bridging AI with control theory and operational decision-making. As computing capabilities grow and more simulation environments become available, we can expect reinforcement learning to play an increasingly central role in training autonomous systems that interact with the world.
Conclusion
In summary, the four major types of machine learning each offer different frameworks for learning from data:
- Supervised learning uses labeled examples to learn predictive models for tasks like classification and regression, and is prevalent in applications where historical data with outcomes is available (from predicting stock prices to diagnosing diseases).
- Unsupervised learning discovers hidden structure in unlabeled data, enabling tasks such as clustering, anomaly detection, and dimensionality reduction, which are essential for making sense of large datasets and preprocessing for other algorithms.
- Semi-supervised learning combines a small amount of supervision with a large pool of unlabeled data, often achieving performance boosts when labels are scarce – leveraging the best of both worlds to detect fraud, classify content, or assist in medical diagnoses with limited annotations.
- Reinforcement learning teaches agents through rewards and penalties in an interactive environment, making it suitable for sequential decision problems – from game AI and robotics to personalized treatment and recommendation systems.
Understanding these four paradigms provides a foundation for tackling a wide variety of real-world problems with AI. In practice, many advanced systems integrate multiple learning types. For example, an autonomous vehicle might use supervised learning to interpret camera images, unsupervised learning to cluster driving scenarios, and reinforcement learning to make driving decisions. Likewise, modern NLP models often use self-supervised (unsupervised) pretraining, supervised fine-tuning, and reinforcement learning for alignment. Each type of learning has its strengths and ideal use cases, and often the choice of approach (or combination of approaches) is driven by the nature of the data available and the problem to be solved.
Machine learning continues to evolve rapidly, but these four core categories remain as the conceptual pillars. By applying the appropriate learning paradigm, or blending them, AI practitioners can design solutions that learn efficiently and effectively from the data at hand – whether it’s a handful of labeled examples or a stream of interactive feedback – and create intelligent systems that perform tasks that were once the exclusive domain of human expertise.