Autoencoders: Concepts, Mathematics, and Applications

This document is a theory-first companion to the NumPy Autoencoder project.

While the main README.md focuses on code and structure, this file explains:

  • What an autoencoder is
  • Why it works
  • The mathematics behind it
  • Reconstruction vs denoising
  • Practical applications

This is written for learners encountering autoencoders for the first time who want more depth than surface-level explanations.


1. What Is an Autoencoder?

An autoencoder is a type of neural network that learns to reconstruct its input.

Instead of predicting labels, it tries to learn:

input ≈ output

At first glance, this may look pointless, but the key idea lies in the latent space.


2. Encoder → Latent → Decoder

An autoencoder is composed of three parts:

Input → Encoder → Latent Representation → Decoder → Reconstruction

Encoder

  • Compresses the input
  • Maps high-dimensional data into a lower-dimensional space

Latent Space

  • Compact representation of the data
  • Forces the model to learn meaningful structures in the data

Decoder

  • Expands latent vectors back to the original input space

If the latent space is smaller than the input, the network cannot simply copy the input; it must learn the underlying patterns instead.
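
As a quick shape check, here is a minimal NumPy sketch of this pipeline. The dimensions are made up and the weights are untrained random matrices; activations and biases are omitted until the next section:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 784, 16                        # input and latent dimensions, k < n
x = rng.random(n)                     # stand-in for one flattened input

W_e = rng.normal(size=(k, n))         # encoder: compresses n -> k
W_d = rng.normal(size=(n, k))         # decoder: expands k -> n

z = W_e @ x                           # latent representation, shape (16,)
x_hat = W_d @ z                       # reconstruction, shape (784,)

print(x.shape, z.shape, x_hat.shape)  # (784,) (16,) (784,)
```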


3. Mathematical Formulation

Let:

  • $x \in \mathbb{R}^n$ be the input
  • $z \in \mathbb{R}^k$ be the latent vector, where $k < n$

Encoder

$z = f_\theta(x)$

Typically: $z = \sigma(W_e x + b_e)$

Decoder

$\hat{x} = g_\phi(z)$

Typically: $\hat{x} = \sigma(W_d z + b_d)$

Reconstruction Loss

The network is trained to minimize the reconstruction error:

$\mathcal{L}(x, \hat{x}) = \| x - \hat{x} \|^2$

(the squared reconstruction error; averaged over samples, this is the familiar MSE loss)

Training is done using backpropagation just like any other neural network.
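
These formulas translate almost line for line into NumPy. Below is a minimal sketch of the forward pass and loss, using the sigmoid activation from the formulas above; the layer sizes (784 → 16) and the small random initialization are illustrative choices, not the project's actual configuration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n, k = 784, 16                              # k < n enforces the bottleneck

# Encoder parameters theta = (W_e, b_e), decoder parameters phi = (W_d, b_d)
W_e, b_e = rng.normal(0, 0.1, (k, n)), np.zeros(k)
W_d, b_d = rng.normal(0, 0.1, (n, k)), np.zeros(n)

def encoder(x):
    return sigmoid(W_e @ x + b_e)           # z = sigma(W_e x + b_e)

def decoder(z):
    return sigmoid(W_d @ z + b_d)           # x_hat = sigma(W_d z + b_d)

x = rng.random(n)                           # dummy input in [0, 1]
x_hat = decoder(encoder(x))
loss = np.sum((x - x_hat) ** 2)             # L(x, x_hat) = ||x - x_hat||^2
print(loss)
```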


4. Why Autoencoders Work

The bottleneck (latent space) forces the model to:

  • Capture correlations
  • Learn manifolds
  • Discard noise and redundancy

In the case of images, this often means learning:

  • Edges
  • Strokes
  • Shapes

5. Reconstruction Autoencoder

Goal

Learn to reconstruct the same clean input:

x → Encoder → Decoder → x̂

MNIST Example

  • Input: 28×28 grayscale digit
  • Flattened to 784 values
  • Latent space: e.g. 16 dimensions

The model learns a compressed representation of handwritten digits (a training sketch follows at the end of this section).

What the Model Learns

  • Digit structure
  • Stroke thickness
  • Common patterns (loops, lines)
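
To make the training mechanics concrete, here is a self-contained sketch of a 784 → 16 → 784 autoencoder trained with plain batch gradient descent and hand-derived backpropagation. Synthetic low-dimensional data stands in for real MNIST digits so the snippet runs on its own; the layer sizes, learning rate, and step count are illustrative, not the project's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

n, k, lr = 784, 16, 0.5                  # input dim, latent dim, learning rate

# Small random initialization for encoder and decoder
W_e = rng.normal(0, 0.1, (n, k)); b_e = np.zeros(k)
W_d = rng.normal(0, 0.1, (k, n)); b_d = np.zeros(n)

# Stand-in data with low-dimensional structure; replace with real MNIST
# digits flattened to shape (num_images, 784) and scaled to [0, 1].
X = sigmoid(rng.normal(size=(256, 8)) @ rng.normal(size=(8, n)))

for step in range(200):
    # --- forward pass ---
    Z = sigmoid(X @ W_e + b_e)           # (batch, 16) latent codes
    X_hat = sigmoid(Z @ W_d + b_d)       # (batch, 784) reconstructions
    loss = np.mean(np.sum((X - X_hat) ** 2, axis=1))

    # --- backward pass (chain rule through both sigmoid layers) ---
    B = X.shape[0]
    G_out = 2 * (X_hat - X) * X_hat * (1 - X_hat)
    dW_d, db_d = Z.T @ G_out / B, G_out.mean(axis=0)
    G_hid = (G_out @ W_d.T) * Z * (1 - Z)
    dW_e, db_e = X.T @ G_hid / B, G_hid.mean(axis=0)

    # --- gradient descent update ---
    W_e -= lr * dW_e; b_e -= lr * db_e
    W_d -= lr * dW_d; b_d -= lr * db_d

    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss:.3f}")  # loss should fall steadily
```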

6. Denoising Autoencoder

Key Idea

Train the model to remove noise.

Instead of:

x → x̂

We train:

(x + noise) → x̂ ≈ x

The input is corrupted, but the target remains clean.


Mathematical View

Let: $\tilde{x} = x + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)$

Loss: $\mathcal{L}(x, g_\phi(f_\theta(\tilde{x})))$

The model learns to project noisy inputs back onto the data manifold.
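
Here is a minimal sketch of this setup, assuming Gaussian pixel noise with an arbitrary $\sigma = 0.3$ and the same untrained toy encoder/decoder shapes as in the earlier sketches:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

n, k = 784, 16
W_e = rng.normal(0, 0.1, (k, n)); b_e = np.zeros(k)   # untrained, for illustration
W_d = rng.normal(0, 0.1, (n, k)); b_d = np.zeros(n)

def corrupt(x, sigma=0.3):
    """x_tilde = x + eps, with eps ~ N(0, sigma^2)."""
    return x + rng.normal(0.0, sigma, size=x.shape)

x = rng.random(n)                        # clean input (stand-in for a digit)
x_tilde = corrupt(x)                     # noisy input fed to the network

x_hat = sigmoid(W_d @ sigmoid(W_e @ x_tilde + b_e) + b_d)

# Key difference from a plain autoencoder: the loss compares the
# reconstruction against the CLEAN x, not the corrupted x_tilde.
loss = np.sum((x - x_hat) ** 2)
print(loss)
```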


7. Denoising MNIST (Intuition)

For MNIST digits:

  • Noise corrupts pixels randomly
  • Digits still lie on a low-dimensional manifold

The autoencoder learns:

  • Which pixels are important
  • Which variations are noise

This makes denoising autoencoders useful as:

  • Pretraining models
  • Feature extractors
  • Robust encoders

8. Applications of Autoencoders

- Dimensionality Reduction

  • Non-linear alternative to PCA

- Denoising

  • Images
  • Audio
  • Sensor signals

- Anomaly Detection

  • Train on normal data
  • High reconstruction error ⇒ anomaly (see the sketch after this list)

- Representation Learning

  • Pretraining for downstream tasks

- Generative Models (Extensions)

  • Variational Autoencoders (VAEs)
  • β-VAEs
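
For the anomaly-detection use above, here is a sketch of the usual recipe: score samples by reconstruction error and flag those above a threshold picked from errors on normal training data. The mean-image "model" below is a deliberately trivial stand-in so the snippet runs on its own; a trained autoencoder's decoder(encoder(X)) would take its place, and the 99th-percentile threshold is one common convention, not the only option:

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_error(model_reconstruct, X):
    """Per-sample squared reconstruction error for a batch X of shape (B, n)."""
    X_hat = model_reconstruct(X)
    return np.sum((X - X_hat) ** 2, axis=1)

# Stand-in "model": reconstructs every input as the training mean.
X_train = rng.normal(0.5, 0.05, (1000, 784))          # "normal" data
mean_image = X_train.mean(axis=0)
reconstruct = lambda X: np.broadcast_to(mean_image, X.shape)

# Threshold: e.g. the 99th percentile of errors on normal training data.
errors_train = reconstruction_error(reconstruct, X_train)
threshold = np.percentile(errors_train, 99)

X_new = np.vstack([rng.normal(0.5, 0.05, (5, 784)),   # more normal samples
                   rng.random((5, 784))])             # off-manifold "anomalies"
is_anomaly = reconstruction_error(reconstruct, X_new) > threshold
print(is_anomaly)   # expect roughly: first five False, last five True
```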

9. Limitations

  • Plain autoencoders are not truly generative
  • Can learn identity mapping if not constrained
  • Sensitive to architecture choices

This is why constraints like:

  • Bottlenecks
  • Noise
  • Regularization

are important.


References & Further Reading

  1. Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the Dimensionality of Data with Neural Networks.

  2. Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008). Extracting and Composing Robust Features with Denoising Autoencoders.

  3. Goodfellow, I., Bengio, Y., & Courville, A. Deep Learning, Chapter 14 (Autoencoders).

  4. Stanford CS231n notes on autoencoders and representation learning.


How This Connects to the Code

  • encoder implements $f_\theta$
  • decoder implements $g_\phi$
  • latent controls constraints
  • loss defines reconstruction error

Understanding this document will make the code feel obvious, not magical.


Final Thought

Autoencoders are simple.

Their power comes from constraints, not complexity.

Once you truly understand them, many modern models start to make sense.