Types of Machine Learning Explained: Supervised, Unsupervised, and Reinforcement Learning

Alright, you’ve got the basics down from Part 1. Now comes the question I get asked constantly: “What type of ML should I use for [X]?” The answer is almost always “it depends,” but let me break down the actual decision framework I use.

There are three main flavors of ML, and picking the wrong one can waste months of work. I’ve made that mistake. Let’s make sure you don’t.

Quick Navigation: Part 1: Foundations → Part 2: Types of ML (You are here) → Part 3: Python Frameworks → Part 4: MLOps → Part 5: Enterprise Apps

Figure 1: The three main paradigms of machine learning

The Three Amigos: A Quick Overview

1. Supervised Learning

  • You have: Labeled examples (X → Y)
  • You want: To predict Y for new X
  • Example: Email → Spam or Not Spam

2. Unsupervised Learning

  • You have: Data without labels
  • You want: To find hidden patterns
  • Example: Group customers by behavior

3. Reinforcement Learning

  • You have: An environment to interact with
  • You want: To learn optimal actions
  • Example: Game AI, robotics

Supervised Learning: The Workhorse

This is where you’ll spend 80% of your ML career. Seriously. If someone asks you to “add ML to our product,” they probably mean supervised learning.

The setup is simple: you have historical data where you know the answer (labels), and you want to predict answers for new data.

Classification: Yes/No and Multiple Choice

When your output is a category:

  • Binary: Spam or not spam. Fraud or legit. Will churn or won’t churn.
  • Multi-class: What type of customer is this? Which product category? What disease?
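
The scikit-learn API is the same for both; the model just learns a probability per class instead of a single yes/no score. Here's a minimal multi-class sketch on made-up support-ticket data (the features, labels, and routing scheme are invented for illustration):

# Multi-class sketch: routing support tickets to a team (toy data)
from sklearn.linear_model import LogisticRegression
import numpy as np

# Features: ticket_length_words, num_error_codes, mentions_billing (0/1)
X = np.array([
    [120, 3, 0], [200, 5, 0], [150, 4, 0],
    [80, 0, 1], [60, 0, 1], [70, 0, 1],
    [40, 0, 0], [30, 1, 0],
])
# Labels: 0 = engineering, 1 = billing, 2 = general support
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# predict_proba gives one probability per class - handy for routing rules
new_ticket = np.array([[90, 2, 0]])
print(clf.predict(new_ticket))        # most likely team
print(clf.predict_proba(new_ticket))  # probability for each team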

Real story: At a financial services client, we built a credit approval classifier. The business wanted a magic “approve/deny” button. What we actually shipped was a probability score (0-100) with recommended thresholds they could adjust. This gave them control over the risk/reward tradeoff. Sometimes the ML is just the starting point for a human decision.

# Fraud Detection Classifier - Real Production Pattern
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score
from sklearn.preprocessing import StandardScaler
import pandas as pd

# In reality, you'd load from your data warehouse
df = pd.DataFrame({
    'amount': [100, 5000, 50, 10000, 200, 15000, 75, 8000],
    'hour': [14, 3, 10, 2, 16, 4, 11, 1],
    'is_international': [0, 1, 0, 1, 0, 1, 0, 1],
    'is_fraud': [0, 1, 0, 1, 0, 1, 0, 1]
})

X = df.drop('is_fraud', axis=1)
y = df['is_fraud']

# Tree-based models like gradient boosting don't strictly need scaling, but keeping it makes it painless to swap in a linear or distance-based model later
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=42, stratify=y
)

# GradientBoosting is my default for tabular fraud detection
model = GradientBoostingClassifier(n_estimators=100, max_depth=4, random_state=42)
model.fit(X_train, y_train)

# For fraud, precision/recall matter more than accuracy
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average='binary')
auc = roc_auc_score(y_test, y_prob)

print(f"Precision: {precision:.2%} (of flagged, how many are fraud)")
print(f"Recall: {recall:.2%} (of actual fraud, how many we caught)")
print(f"AUC-ROC: {auc:.3f}")

Regression: Predicting Numbers

When your output is continuous:

  • Predict house prices
  • Estimate customer lifetime value
  • Forecast sales next quarter
  • Predict patient length of stay

# Hospital Length of Stay Prediction
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
import numpy as np

# Features: age, severity_score, num_prior_admissions, is_emergency
X_train = np.array([
    [65, 7.2, 3, 1], [45, 3.5, 1, 0], [72, 8.1, 4, 1],
    [38, 2.1, 0, 0], [58, 5.5, 2, 0], [81, 9.0, 5, 1]
])
# Labels: actual length of stay in days
y_train = np.array([12, 3, 15, 1, 7, 18])

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# New patient: 55yo, severity 6.0, 2 prior admissions, emergency
new_patient = np.array([[55, 6.0, 2, 1]])
predicted_days = model.predict(new_patient)[0]

print(f"Predicted length of stay: {predicted_days:.1f} days")
# Use this for bed management, staffing, insurance pre-auth

Unsupervised Learning: Finding Structure in Chaos

This is where things get philosophically interesting. You don’t have labels. You’re not predicting anything specific. You’re asking: “What patterns exist in this data that I haven’t noticed?”

Clustering: Grouping Similar Things

The classic use case: customer segmentation. You have customer data but don’t know how to categorize them. Let the algorithm find natural groupings.

# Customer Segmentation - The Right Way
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import numpy as np

# Customer features: annual_spend, visit_frequency, avg_basket
customers = np.array([
    [50000, 52, 960],   # High value, weekly shopper
    [45000, 48, 937],
    [5000, 12, 416],    # Low value, monthly
    [6000, 15, 400],
    [25000, 100, 250],  # Medium spend, very frequent, small baskets
    [28000, 95, 294],
    [40000, 6, 6666],   # Bulk buyers - few visits, huge baskets
    [35000, 4, 8750],
])

# CRUCIAL: scale your features. K-means uses distance.
scaler = StandardScaler()
customers_scaled = scaler.fit_transform(customers)

# Start with a reasonable k, then validate
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
labels = kmeans.fit_predict(customers_scaled)

# The real work: interpreting clusters
for cluster_id in range(4):
    mask = labels == cluster_id
    cluster_data = customers[mask]
    print(f"\nCluster {cluster_id}: {mask.sum()} customers")
    print(f"  Avg spend: ${cluster_data[:, 0].mean():,.0f}")
    print(f"  Avg visits: {cluster_data[:, 1].mean():.0f}")
    print(f"  Avg basket: ${cluster_data[:, 2].mean():,.0f}")

# Now give them business names based on behavior
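
"Start with a reasonable k, then validate" deserves one more step. A quick, standard check is the silhouette score: fit K-means for a few values of k and prefer the one that scores higher (closer to 1 means tighter, better-separated clusters). A rough sketch, reusing customers_scaled from above:

# Choosing k with the silhouette score - continues from the snippet above
from sklearn.metrics import silhouette_score

for k in range(2, 6):
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    candidate_labels = km.fit_predict(customers_scaled)
    score = silhouette_score(customers_scaled, candidate_labels)
    print(f"k={k}: silhouette = {score:.2f}")
# Pick the best-scoring k, then sanity-check the clusters by hand before naming them.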

Anomaly Detection: Finding the Weird Stuff

This is unsupervised learning’s secret weapon in production systems. You don’t know what fraud looks like tomorrow—but you know it’ll be different from normal.

# Anomaly Detection with Isolation Forest
from sklearn.ensemble import IsolationForest
import numpy as np

# Simulated server metrics: cpu%, memory%, request_latency_ms
normal_data = np.random.normal(
    loc=[45, 60, 120], scale=[10, 12, 30], size=(500, 3)
)

# Some anomalies: high CPU, memory leak, latency spike
anomalies = np.array([
    [95, 55, 110],    # CPU spike
    [50, 98, 150],    # Memory leak
    [60, 70, 800],    # Latency through the roof
])

all_data = np.vstack([normal_data, anomalies])

# Contamination = expected % of outliers. Tune this.
detector = IsolationForest(contamination=0.01, random_state=42)
predictions = detector.fit_predict(all_data)

# -1 = anomaly, 1 = normal
anomaly_indices = np.where(predictions == -1)[0]
print(f"Found {len(anomaly_indices)} anomalies")
print(f"Indices: {anomaly_indices}")
# Last 3 indices should be our planted anomalies
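
One production note: you usually want a ranked score rather than the hard -1/1 label, so whoever is on call sees the worst offenders first. IsolationForest exposes this through decision_function (lower means more anomalous). A quick sketch, continuing from the detector fit above:

# Ranking anomalies by score instead of a hard label - continues from above
scores = detector.decision_function(all_data)  # lower = more anomalous

# Five most anomalous points, worst first
for idx in np.argsort(scores)[:5]:
    cpu, mem, latency = all_data[idx]
    print(f"Index {idx}: cpu={cpu:.0f}% mem={mem:.0f}% "
          f"latency={latency:.0f}ms (score={scores[idx]:.3f})")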

Reinforcement Learning: The Cool Kid You Probably Don’t Need

RL is fascinating. It’s also probably not what you need.

In RL, an agent learns by taking actions in an environment and receiving rewards or penalties. Think game AI, robots, or algorithmic trading.

When RL makes sense:

  • Sequential decision problems (what you do now affects future options)
  • You have a simulator or cheap way to try things
  • The optimal strategy isn’t obvious and changes over time

Honest take: In 15 years, I’ve deployed exactly 2 RL systems to production. Both were for dynamic pricing optimization where we could A/B test safely. Everything else that looked like RL was better solved with simpler approaches.

# Simple Q-Learning Example - Educational
import numpy as np

# Scenario: Auto-scaling decisions
# States: 0=low_load, 1=medium, 2=high
# Actions: 0=scale_down, 1=hold, 2=scale_up

n_states, n_actions = 3, 3
Q = np.zeros((n_states, n_actions))

alpha = 0.1    # learning rate
gamma = 0.95   # discount factor
epsilon = 0.1  # exploration rate

def get_reward(state, action):
    if state == 2 and action == 0:  # Scale down during high load
        return -10  # Bad idea
    elif state == 0 and action == 2:  # Scale up during low load
        return -5  # Wasting money
    elif state == action:
        return 10  # Match action to load
    return 0

# Training loop
for episode in range(1000):
    state = np.random.randint(n_states)
    
    if np.random.random() < epsilon:
        action = np.random.randint(n_actions)
    else:
        action = np.argmax(Q[state])
    
    reward = get_reward(state, action)
    next_state = np.random.randint(n_states)
    
    Q[state, action] += alpha * (
        reward + gamma * np.max(Q[next_state]) - Q[state, action]
    )

print("Learned Q-table:")
print("Actions: scale_down | hold | scale_up")
for s, name in enumerate(['Low load', 'Med load', 'High load']):
    print(f"{name}: {Q[s].round(1)}")

Decision Framework: Which Type Do I Use?

Q: Do you have labeled data (examples with known answers)?

YES: Use Supervised Learning

NO: Continue...

Q: Do you need to make sequential decisions with feedback?

YES: Consider Reinforcement Learning (but think twice)

NO: Continue...

Q: Looking for groups or anomalies?

→ Use Unsupervised Learning

Key Takeaways

  • Supervised learning = you have answers, train a predictor. This is 80% of enterprise ML.
  • Unsupervised learning = find patterns without labels. Great for exploration and anomaly detection.
  • Reinforcement learning = sequential decisions with feedback. Rarely needed, often overhyped.
  • Start with the simplest approach that could work.

What's Next

In Part 3, we'll get into the Python framework wars: Scikit-learn vs. TensorFlow vs. PyTorch. I'll tell you when I use each, share some hard-learned lessons about production deployment, and help you avoid the "let's use deep learning for everything" trap.


References & Further Reading

  • Pattern Recognition and Machine Learning by Christopher Bishop - The theoretical foundation
  • An Introduction to Statistical Learning - statlearning.com - Free, excellent for beginners
  • Scikit-learn Clustering Documentation - scikit-learn.org
  • Sutton & Barto: Reinforcement Learning - The RL Bible (free online)
  • Google Cloud ML Best Practices - cloud.google.com
  • Isolation Forest Paper - Liu, Ting, Zhou (2008) - Original anomaly detection research

Questions? Strongly-held opinions about clustering algorithms? Find me on GitHub or drop a comment.

