A complete, step-by-step explanation of neural networks: what neurons are, what weights are, how the calculations work, and why weights matter, with mathematical derivations and solved exercises.
- What is a Neural Network?
- What is a Neuron?
- What are Weights?
- How Neurons Calculate
- Why Weights are Important
- Complete Mathematical Formulation
- Multi-Layer Neural Networks
- Exercise 1: Single Neuron Calculation
- Exercise 2: Multi-Layer Network
- Exercise 3: Learning Weights
- Key Takeaways
A neural network is a computational model inspired by biological neurons that processes information through interconnected nodes (neurons) to make predictions or decisions.
Think of a neural network like a factory:
Input → Worker 1 → Worker 2 → Worker 3 → Output
Neural Network:
Input → Neuron 1 → Neuron 2 → Neuron 3 → Output
Each worker (neuron) does a specific job, and they work together to produce the final result.
Input Layer    Hidden Layer    Output Layer
     ●              ●               ●
     ●              ●               ●
     ●              ●               ●
     ●              ●
Key Components:
- Input Layer: Receives data
- Hidden Layers: Process information
- Output Layer: Produces predictions
- Connections: Weights between neurons
A neuron (also called a node or unit) is the basic processing unit of a neural network. It receives inputs, performs calculations, and produces an output.
Biological Neuron:
Dendrites → Cell Body → Axon     → Synapses
(inputs)    (process)   (output)   (connections)
Artificial Neuron:
Inputs → Weighted Sum → Activation → Output
Input 1 (x₁) ────┐
                 │
Input 2 (x₂) ────┼──→ [Σ] ─→ [f] ─→ Output (y)
                 │
Input 3 (x₃) ────┘
Components:
- Inputs: Values fed into the neuron
- Weights: Strength of connections
- Weighted Sum: Sum of inputs × weights
- Bias: Added constant
- Activation Function: Applies nonlinearity
- Output: Final result
Neuron:
┌─────────────────────┐
│ Inputs:  x₁, x₂, x₃ │
│ Weights: w₁, w₂, w₃ │
│                     │
│ z = Σ(xᵢ × wᵢ) + b  │
│ y = f(z)            │
│                     │
│ Output: y           │
└─────────────────────┘
Where:
- $z$ = weighted sum (before activation)
- $f$ = activation function
- $y$ = output (after activation)
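This diagram translates directly into a few lines of code. Here is a minimal sketch in plain Python; the function name `neuron_output` and the example values are illustrative assumptions, not from any particular library:

```python
def neuron_output(inputs, weights, bias):
    """Single neuron: weighted sum of inputs plus bias, then ReLU."""
    # z = Σ(xᵢ × wᵢ) + b
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # y = f(z), here with f = ReLU
    return max(0.0, z)

# Two inputs with weights 0.5 and -0.25, bias 0.1:
# z = 1.0*0.5 + 2.0*(-0.25) + 0.1 = 0.1, and ReLU(0.1) = 0.1
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.1))
```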
Weights are numerical values that determine the strength of connections between neurons. They control how much each input contributes to the output.
Think of weights like volume controls:
Music Source 1 ──[Volume: 0.8]──→ Speakers
Music Source 2 ──[Volume: 0.3]──→ Speakers
Music Source 3 ──[Volume: 0.5]──→ Speakers
Higher weight = Louder contribution
Neural Network:
Input 1 ──[Weight: 0.8]──→ Neuron
Input 2 ──[Weight: 0.3]──→ Neuron
Input 3 ──[Weight: 0.5]──→ Neuron
Higher weight = Stronger influence
Weights determine:
- How much each input matters
- The relationship between inputs and outputs
- What patterns the neuron learns
Example:
Weight = 0.1:
- Input has small influence
- Weak connection
Weight = 5.0:
- Input has large influence
- Strong connection
Weight = -2.0:
- Input has negative influence
- Inverts the relationship
Weight = 0.0:
- Input has no influence
- Connection is cut
In a layer with multiple neurons:
Input Layer         Weight Matrix        Output Layer
x₁ ───────────────────┐
                      │  w₁₁ w₁₂             y₁
x₂ ───────────────────┼─ w₂₁ w₂₂  ─────────  y₂
                      │  w₃₁ w₃₂
x₃ ───────────────────┘
Weight Matrix:
W = [w₁₁ w₁₂]
    [w₂₁ w₂₂]
    [w₃₁ w₃₂]
Each row: Connections from one input
Each column: Connections to one output
Multiply each input by its weight:

$$z = \sum_{i=1}^{n} x_i w_i + b$$

Or in vector form:

$$z = \mathbf{w} \cdot \mathbf{x} + b$$

Where:
- $x_i$ = input value
- $w_i$ = weight for input $i$
- $b$ = bias (constant)
- $n$ = number of inputs
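In code, the vector form collapses to a single dot product. A short NumPy sketch with placeholder values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])    # inputs
w = np.array([0.2, -0.1, 0.4])   # weights
b = 0.5                          # bias

# The summation Σ xᵢwᵢ becomes one dot product:
z = np.dot(w, x) + b
print(z)  # 0.2 - 0.2 + 1.2 + 0.5 = 1.7
```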
Bias shifts the activation:

$$z = \mathbf{w} \cdot \mathbf{x} + b$$
Bias allows the neuron to:
- Shift activation threshold
- Learn patterns independent of inputs
- Adjust baseline output
Apply a nonlinear function:

$$y = f(z)$$

Common activation functions:

ReLU (Rectified Linear Unit):

$$f(z) = \max(0, z)$$

Sigmoid:

$$f(z) = \frac{1}{1 + e^{-z}}$$

Tanh:

$$f(z) = \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$$

GELU (used in transformers):

$$f(z) = z \cdot \Phi(z)$$

Where $\Phi(z)$ is the cumulative distribution function of the standard normal distribution.
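All four functions are easy to sketch in NumPy. Note that the GELU below uses the widely used tanh approximation rather than the exact $z \cdot \Phi(z)$ form:

```python
import numpy as np

def relu(z):
    # Zero out negative values, keep positive values unchanged
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squash any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squash any real number into the range (-1, 1)
    return np.tanh(z)

def gelu(z):
    # Tanh approximation of GELU (z times an approximate normal CDF)
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

z = np.array([-2.0, 0.0, 2.0])
for f in (relu, sigmoid, tanh, gelu):
    print(f.__name__, f(z))
```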
Given:
- Inputs: $x_1 = 0.5, x_2 = 0.3, x_3 = 0.8$
- Weights: $w_1 = 0.6, w_2 = 0.4, w_3 = 0.2$
- Bias: $b = 0.1$
- Activation: ReLU
Step 1: Weighted Sum
z = (0.5 × 0.6) + (0.3 × 0.4) + (0.8 × 0.2) + 0.1
= 0.3 + 0.12 + 0.16 + 0.1
= 0.68
Step 2: Apply Activation
y = ReLU(0.68)
= max(0, 0.68)
= 0.68
Result: Output = 0.68
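The same two steps, reproduced in NumPy as a quick check:

```python
import numpy as np

x = np.array([0.5, 0.3, 0.8])
w = np.array([0.6, 0.4, 0.2])
b = 0.1

z = np.dot(x, w) + b      # Step 1: weighted sum -> 0.68
y = np.maximum(0.0, z)    # Step 2: ReLU -> 0.68
print(z, y)
```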
Different weights = Different patterns:
Pattern 1: Emphasis on Input 1
w₁ = 5.0, w₂ = 0.1, w₃ = 0.1
→ Neuron cares mostly about input 1
Pattern 2: Balanced Weights
w₁ = 0.5, w₂ = 0.5, w₃ = 0.5
→ Neuron treats all inputs equally
Pattern 3: Inverted Relationship
w₁ = -2.0, w₂ = 1.0, w₃ = 1.0
→ Neuron inverts input 1's effect
Training adjusts weights:
Before Training:
Weights: Random values
→ Random predictions
After Training:
Weights: Learned values
→ Accurate predictions
Weights are what the model learns!
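To make "learning" concrete, here is a toy sketch that adjusts a single weight by gradient descent on one data point. The target, starting weight, and learning rate are arbitrary assumptions; real training uses backpropagation over many weights and many examples:

```python
# Toy example: learn w so that w * x matches the target y for one sample.
x, y_target = 2.0, 6.0   # we want w * 2.0 == 6.0, i.e. w -> 3.0
w = 0.5                  # arbitrary starting weight ("random value")
lr = 0.1                 # learning rate

for step in range(20):
    y_pred = w * x                 # forward pass
    error = y_pred - y_target      # prediction error
    grad = 2 * error * x           # d/dw of the squared error (y_pred - y)^2
    w -= lr * grad                 # gradient-descent update

print(w)  # close to 3.0 after training ("learned value")
```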
High weights: Information flows easily
Low weights: Information flows weakly
Zero weights: Information blocked
Negative weights: Information inverted
Multiple neurons with different weights:
Neuron 1: w₁ = 1.0, w₂ = 0.0 → Detects pattern A
Neuron 2: w₁ = 0.0, w₂ = 1.0 → Detects pattern B
Neuron 3: w₁ = 0.5, w₂ = 0.5 → Detects pattern C
Together: Model learns complex relationships!
Complete neuron calculation:

$$z = \mathbf{w} \cdot \mathbf{x} + b$$
$$y = f(z)$$

Where:
- $\mathbf{x} = [x_1, x_2, ..., x_n]$ = input vector
- $\mathbf{w} = [w_1, w_2, ..., w_n]$ = weight vector
- $b$ = bias (scalar)
- $f$ = activation function
- $z$ = weighted sum (before activation)
- $y$ = output (after activation)
For multiple neurons:

$$\mathbf{z} = \mathbf{X} \mathbf{W} + \mathbf{b}$$
$$\mathbf{Y} = f(\mathbf{z})$$

Where:
- $\mathbf{X} \in \mathbb{R}^{B \times n}$ = input matrix ($B$ samples, $n$ features)
- $\mathbf{W} \in \mathbb{R}^{n \times m}$ = weight matrix ($n$ inputs, $m$ neurons)
- $\mathbf{b} \in \mathbb{R}^{1 \times m}$ = bias vector
- $\mathbf{z} \in \mathbb{R}^{B \times m}$ = weighted sums
- $\mathbf{Y} \in \mathbb{R}^{B \times m}$ = outputs
Example:
Input Matrix:
X = [x₁₁ x₁₂]   (2 samples, 2 features)
    [x₂₁ x₂₂]
Weight Matrix:
W = [w₁₁ w₁₂]   (2 inputs, 2 neurons)
    [w₂₁ w₂₂]
Bias Vector:
b = [b₁ b₂] (2 neurons)
Calculation:
z = X × W + b
z₁₁ = x₁₁×w₁₁ + x₁₂×w₂₁ + b₁
z₁₂ = x₁₁×w₁₂ + x₁₂×w₂₂ + b₂
z₂₁ = x₂₁×w₁₁ + x₂₂×w₂₁ + b₁
z₂₂ = x₂₁×w₁₂ + x₂₂×w₂₂ + b₂
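A NumPy sketch of this batched calculation, with placeholder numbers chosen so that ReLU visibly zeroes out one neuron:

```python
import numpy as np

X = np.array([[1.0, 2.0],      # sample 1
              [3.0, 4.0]])     # sample 2
W = np.array([[0.5, -0.5],     # row 1: weights from input 1
              [0.25, -0.75]])  # row 2: weights from input 2
b = np.array([0.1, 0.2])       # one bias per neuron

Z = X @ W + b              # all four weighted sums at once
Y = np.maximum(0.0, Z)     # ReLU applied elementwise
print(Z)  # [[ 1.1 -1.8]
          #  [ 2.6 -4.3]]
print(Y)  # [[ 1.1  0. ]
          #  [ 2.6  0. ]]
```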
Input Layer → Hidden Layer 1 → Hidden Layer 2 → Output Layer
    x₁             h₁₁              h₂₁             y₁
    x₂             h₁₂              h₂₂             y₂
    x₃             h₁₃              h₂₃
Layer 1:

$$\mathbf{h}_1 = f_1(\mathbf{x} \mathbf{W}_1 + \mathbf{b}_1)$$

Layer 2:

$$\mathbf{h}_2 = f_2(\mathbf{h}_1 \mathbf{W}_2 + \mathbf{b}_2)$$

Output Layer:

$$\mathbf{y} = f_3(\mathbf{h}_2 \mathbf{W}_3 + \mathbf{b}_3)$$

Chained together:

$$\mathbf{y} = f_3(f_2(f_1(\mathbf{x} \mathbf{W}_1 + \mathbf{b}_1) \mathbf{W}_2 + \mathbf{b}_2) \mathbf{W}_3 + \mathbf{b}_3)$$
Each layer transforms the input!
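A sketch of the full forward pass for this architecture. The random weight initialization and the linear output layer are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

# Layer sizes follow the diagram: 3 inputs -> 3 hidden -> 3 hidden -> 2 outputs.
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)
W3, b3 = rng.normal(size=(3, 2)), np.zeros(2)

x = np.array([1.0, 0.5, -0.5])
h1 = relu(x @ W1 + b1)    # Layer 1 transforms the raw input
h2 = relu(h1 @ W2 + b2)   # Layer 2 transforms Layer 1's output
y = h2 @ W3 + b3          # output layer (left linear in this sketch)
print(y)
```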
Given a single neuron with:
- Inputs: $x_1 = 2.0, x_2 = -1.0, x_3 = 0.5$
- Weights: $w_1 = 0.5, w_2 = -0.3, w_3 = 0.8$
- Bias: $b = 0.2$
- Activation function: ReLU $f(z) = \max(0, z)$
Calculate the output of this neuron.
Compute:

$$z = x_1 w_1 + x_2 w_2 + x_3 w_3 + b$$

Substitute values:

$$z = (2.0)(0.5) + (-1.0)(-0.3) + (0.5)(0.8) + 0.2$$

Calculate each term:

$$z = 1.0 + 0.3 + 0.4 + 0.2$$

Sum:

$$z = 1.9$$

Apply ReLU:

$$y = \max(0, 1.9) = 1.9$$

The output of the neuron is $y = 1.9$.
Check calculation:
- Input contribution 1: $2.0 \times 0.5 = 1.0$
- Input contribution 2: $-1.0 \times -0.3 = 0.3$
- Input contribution 3: $0.5 \times 0.8 = 0.4$
- Bias: $0.2$
- Total: $1.0 + 0.3 + 0.4 + 0.2 = 1.9$ ✓
- ReLU(1.9) = 1.9 ✓
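The whole exercise can be double-checked in a few lines of NumPy:

```python
import numpy as np

x = np.array([2.0, -1.0, 0.5])
w = np.array([0.5, -0.3, 0.8])
b = 0.2

z = np.dot(x, w) + b     # 1.0 + 0.3 + 0.4 + 0.2 = 1.9
y = np.maximum(0.0, z)   # ReLU(1.9) = 1.9
print(z, y)
```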
Given a neural network with 2 layers:
Layer 1:
- Inputs: $x_1 = 1.0, x_2 = 0.5$
- Weights: $W_1 = \begin{bmatrix} 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix}$
- Bias: $b_1 = [0.1, -0.1]$
- Activation: ReLU

Layer 2:
- Inputs: Outputs from Layer 1
- Weights: $W_2 = \begin{bmatrix} 0.5 \\ 0.7 \end{bmatrix}$
- Bias: $b_2 = 0.2$
- Activation: ReLU
Calculate the final output.
Layer 1:

Input vector:

$$\mathbf{x} = [1.0, 0.5]$$

Weight matrix:

$$W_1 = \begin{bmatrix} 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix}$$

Bias vector:

$$\mathbf{b}_1 = [0.1, -0.1]$$

Calculate:

$$\mathbf{z}_1 = \mathbf{x} W_1 + \mathbf{b}_1$$

Matrix multiplication:

$$z_{1,1} = (1.0)(0.6) + (0.5)(0.2) + 0.1 = 0.6 + 0.1 + 0.1$$
$$z_{1,2} = (1.0)(0.4) + (0.5)(0.8) - 0.1 = 0.4 + 0.4 - 0.1$$

Compute:

$$\mathbf{z}_1 = [0.8, 0.7]$$

Apply ReLU:

$$\mathbf{h}_1 = [\max(0, 0.8), \max(0, 0.7)] = [0.8, 0.7]$$

Layer 2:

Input (from Layer 1):

$$\mathbf{h}_1 = [0.8, 0.7]$$

Weight matrix:

$$W_2 = \begin{bmatrix} 0.5 \\ 0.7 \end{bmatrix}$$

Bias:

$$b_2 = 0.2$$

Calculate:

$$z_2 = \mathbf{h}_1 W_2 + b_2$$

Matrix multiplication:

$$z_2 = (0.8)(0.5) + (0.7)(0.7) + 0.2$$

Compute:

$$z_2 = 0.4 + 0.49 + 0.2 = 1.09$$

Apply ReLU:

$$y = \max(0, 1.09) = 1.09$$

The final output is $y = 1.09$.
| Layer | Input | Weights | Bias | Weighted Sum | Activation | Output |
|---|---|---|---|---|---|---|
| 1 | [1.0, 0.5] | $$\begin{bmatrix} 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix}$$ | [0.1, -0.1] | [0.8, 0.7] | ReLU | [0.8, 0.7] |
| 2 | [0.8, 0.7] | $$\begin{bmatrix} 0.5 \\ 0.7 \end{bmatrix}$$ | 0.2 | 1.09 | ReLU | 1.09 |
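The two-layer calculation, verified in NumPy:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([1.0, 0.5])
W1 = np.array([[0.6, 0.4],
               [0.2, 0.8]])
b1 = np.array([0.1, -0.1])

W2 = np.array([[0.5],
               [0.7]])
b2 = 0.2

h1 = relu(x @ W1 + b1)   # Layer 1 -> [0.8, 0.7]
y = relu(h1 @ W2 + b2)   # Layer 2 -> [1.09]
print(h1, y)
```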
Given a neuron that should output 1.0 when inputs are [1.0, 1.0] and output 0.0 when inputs are [0.0, 0.0], find appropriate weights and bias.
Use:
- Activation: Sigmoid $f(z) = \frac{1}{1 + e^{-z}}$
- Desired behavior: AND gate (output 1 only when both inputs are 1)
For input [1.0, 1.0], desired output ≈ 1.0:

$$f(w_1 + w_2 + b) \approx 1.0 \quad (1)$$

For input [0.0, 0.0], desired output ≈ 0.0:

$$f(b) \approx 0.0 \quad (2)$$

Note: Sigmoid outputs range from 0 to 1, so:
- $f(z) \approx 1.0$ when $z \gg 0$ (e.g., $z > 5$)
- $f(z) \approx 0.0$ when $z \ll 0$ (e.g., $z < -5$)

From equation 2:

$$f(b) \approx 0.0$$

For sigmoid to output ≈ 0:

$$b < -5$$

Let's use:

$$b = -10.0$$

From equation 1:

$$f(w_1 + w_2 - 10.0) \approx 1.0$$

For sigmoid to output ≈ 1:

$$w_1 + w_2 - 10.0 > 5 \quad \Rightarrow \quad w_1 + w_2 > 15$$

Let's use equal weights:

$$w_1 = w_2 = 8.0 \quad (8.0 + 8.0 = 16 > 15)$$

Check:

Test Case 1: Input [1.0, 1.0]
$z = 8.0 + 8.0 - 10.0 = 6.0$, $f(6.0) \approx 0.998 \approx 1.0$ ✓

Test Case 2: Input [0.0, 0.0]
$z = -10.0$, $f(-10.0) \approx 0.00005 \approx 0.0$ ✓

Test Case 3: Input [1.0, 0.0]
$z = 8.0 - 10.0 = -2.0$, $f(-2.0) \approx 0.12 \approx 0.0$ ✓

Test Case 4: Input [0.0, 1.0]
$z = 8.0 - 10.0 = -2.0$, $f(-2.0) \approx 0.12 \approx 0.0$ ✓

(The outputs of about 0.12 fall well below the 0.5 decision threshold, so they count as 0.)

Appropriate weights and bias:

$$w_1 = 8.0, \quad w_2 = 8.0, \quad b = -10.0$$
The neuron implements an AND gate correctly!
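A quick NumPy check of all four truth-table rows:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([8.0, 8.0])
b = -10.0

for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    y = sigmoid(np.dot(w, np.array(x)) + b)
    print(x, round(float(y), 4))  # ≈ 0 for the first three rows, ≈ 1 for [1, 1]
```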
This demonstrates learning:
- Training finds weights that produce desired behavior
- Different weights = Different logic functions
- Learning algorithms (like backpropagation) automatically find these weights from data!
✅ Neurons are the basic processing units
✅ Receive inputs, compute weighted sum, apply activation
✅ Output is the result of activation function
✅ Weights control connection strength
✅ Determine what patterns neurons learn
✅ Are what the model learns during training
✅ Enable complex pattern recognition
✅ Weighted sum: $z = \sum_{i=1}^{n} x_i w_i + b$
✅ Activation: $y = f(z)$
✅ Matrix form enables efficient computation
✅ Weights enable learning
✅ Control information flow
✅ Enable complex pattern recognition
✅ Are adjusted during training to minimize error
✅ Multiple neurons form layers
✅ Multiple layers form networks
✅ Each layer transforms the input
✅ Deep networks learn hierarchical features
This document provides a comprehensive explanation of neural networks, neurons, weights, and calculations with mathematical derivations and solved exercises.