A Transformer-based machine translation project comparing:
- a custom Transformer implemented in PyTorch (with hyperparameter experiments), and
- a pre-trained T5 translation pipeline (Hugging Face) as a strong baseline.
Evaluation uses BERTScore and METEOR to measure translation quality and semantic similarity.
- Built a custom Transformer and analyzed how hyperparameters affect learning + translation quality.
- Used the XLM-RoBERTa tokenizer for tokenization + padding in the custom model pipeline.
- Benchmarked against pre-trained T5 using the task prefix `translate English to French:` and beam search decoding.
- Reported performance using Precision / Recall / F1 (BERTScore) + METEOR.
Goal: learn the Transformer architecture deeply and evaluate the effect of changing hyperparameters.
Pipeline:
- Create Train/Val/Test splits
- Clean + tokenize source/target text with XLM-RoBERTa tokenizer
- Train on training set and validate using BERTScore + METEOR
- Tune hyperparameters (embedding size, number of heads, batch size, dropout)
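The hyperparameters tuned above (embedding size, number of heads, dropout) map directly onto `torch.nn.Transformer` arguments. A minimal sketch, assuming a standard PyTorch seq2seq setup; the vocabulary size, layer count, and defaults here are illustrative, not the report's exact configuration:

```python
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Toy encoder-decoder Transformer exposing the tuned hyperparameters."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8,
                 num_layers=3, dropout=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)
        tgt = self.embed(tgt_ids)
        # Causal mask: each target position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(hidden)  # (batch, tgt_len, vocab) logits

model = Seq2SeqTransformer(d_model=128, nhead=4)
src = torch.randint(0, 32000, (2, 10))
tgt = torch.randint(0, 32000, (2, 8))
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 8, 32000])
```

Changing `d_model`, `nhead`, or `dropout` here corresponds to the embedding-size, head-count, and dropout studies below.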
Default run behavior & example translation
- Training loss dropped from 5.536 → 2.991 across 10 epochs, and validation loss from 5.388 → 3.829.
- Example translation (test):
English: “Hello how are you today”
French: “comment êtes vous aujourd’hui”
with BERTScore P=0.8581, R=0.8435, F1=0.8506 and METEOR=0.2978.
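The loss trajectory above comes from standard teacher-forced cross-entropy training. A minimal sketch of that loop, with a tiny embedding+linear stand-in and random toy data in place of the full Transformer and real corpus:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab = 50
# Toy stand-in model; the real run trained the custom Transformer.
model = nn.Sequential(nn.Embedding(vocab, 32), nn.Linear(32, vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

src = torch.randint(0, vocab, (8, 12))   # (batch, seq)
tgt = torch.randint(0, vocab, (8, 12))   # target token ids

losses = []
for epoch in range(10):
    opt.zero_grad()
    logits = model(src)                  # (batch, seq, vocab)
    # Flatten so each position contributes one classification loss term.
    loss = loss_fn(logits.reshape(-1, vocab), tgt.reshape(-1))
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(f"epoch 1 loss {losses[0]:.3f} -> epoch 10 loss {losses[-1]:.3f}")
```

As in the reported run, the training loss falls across the 10 epochs; validation loss (not shown in this toy loop) is computed the same way on held-out pairs.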
Goal: use a strong pretrained Transformer for higher quality translations and compare against the custom model.
Method:
- Load T5 + tokenizer
- Encode inputs with the prefix `translate English to French:`
- Decode using beam search for better translation quality
- Evaluate on test batches with BERTScore + METEOR
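Beam search keeps the `beam_size` highest-scoring partial hypotheses at each step instead of committing greedily to the top token. A generic sketch independent of T5 (the `step_fn` interface, token ids, and toy scorer are hypothetical; with T5 this is handled internally by `generate(num_beams=...)`):

```python
import torch

def beam_search(step_fn, bos_id, eos_id, beam_size=4, max_len=10):
    """step_fn maps a prefix (list of ids) to log-probs over the next token."""
    beams = [([bos_id], 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos_id:          # finished hypotheses carry over
                candidates.append((seq, score))
                continue
            logp = step_fn(seq)            # shape: (vocab,)
            topv, topi = torch.topk(logp, beam_size)
            for v, i in zip(topv.tolist(), topi.tolist()):
                candidates.append((seq + [i], score + v))
        # Keep only the beam_size best-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == eos_id for seq, _ in beams):
            break
    return beams[0][0]

# Toy scorer: strongly prefers token 2 until length 3, then EOS (id 1).
def toy_step(seq):
    logits = torch.full((5,), -5.0)
    logits[2 if len(seq) < 3 else 1] = 0.0
    return torch.log_softmax(logits, dim=0)

print(beam_search(toy_step, bos_id=0, eos_id=1))  # [0, 2, 2, 1]
```

This is also the decoding upgrade suggested later for the custom model, which used greedy decoding.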
| Model | Precision | Recall | F1 | METEOR |
|---|---|---|---|---|
| Custom Transformer | 0.8581 | 0.8435 | 0.8506 | 0.2978 |
| T5 Pre-trained | 0.8960 | 0.8987 | 0.8972 | 0.5160 |
T5 outperforms the custom Transformer on every metric in the report's comparison.
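As a quick sanity check on the table: BERTScore's F1 is the harmonic mean of precision and recall. It is computed per example and then averaged, so applying the formula to the averaged P/R only agrees with the reported F1 up to rounding:

```python
def bertscore_f1(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

# Applied to the table's averaged P/R figures:
print(round(bertscore_f1(0.8581, 0.8435), 4))  # 0.8507 (table reports 0.8506)
print(round(bertscore_f1(0.8960, 0.8987), 4))  # 0.8973 (table reports 0.8972)
```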
Experiments below were run, as reported, with epochs=10 and a dataset size of 7000 for the tuning studies.
Metrics vs embedding size
- 512: Precision 0.8048 / Recall 0.8070 / F1 0.8057 / METEOR 0.1636
- 256: Precision 0.8079 / Recall 0.7998 / F1 0.8036 / METEOR 0.1327
- 128: Precision 0.7997 / Recall 0.7907 / F1 0.7950 / METEOR 0.0913
Training time + loss vs embedding size
- 512: Train 4.204 / Val 5.682 / Time 1:26:29
- 256: Train 5.042 / Val 6.181 / Time 41:08
- 128: Train 5.646 / Val 6.399 / Time 32:10

Observation: Larger embeddings improved learning and translation quality but required more compute.
| Heads | Training Loss | Validation Loss |
|---|---|---|
| 8 | 4.204 | 5.682 |
| 4 | 4.208 | 5.661 |
| 2 | 4.264 | 5.695 |
| 1 | 4.332 | 5.512 |
More heads reduced training loss slightly, but validation loss did not consistently improve, suggesting limited generalization gains.
| Batch | Training Loss | Validation Loss |
|---|---|---|
| 16 | 3.808 | 5.509 |
| 32 | 4.204 | 5.682 |
| 64 | 4.687 | 5.830 |
The smallest batch size (16) produced the best losses; larger batches increased both training and validation loss.
| Dropout | Training Loss | Validation Loss |
|---|---|---|
| 0.1 | 4.204 | 5.682 |
| 0.01 | 3.721 | 5.620 |
| 0.001 | 3.647 | 5.637 |
| 0.0001 | 3.670 | 5.650 |
Report conclusion: dropout=0.01 offered the best balance (0.1 underfit; 0.0001 showed an overfitting tendency).
- Missed parts of the input (e.g., did not capture “Hello” in translation in one observation).
- Good BERTScore, but the relatively low METEOR indicates gaps in fluency and word choice.
- Larger embeddings/heads help but increase compute requirements.
- Sometimes produces empty translations, which affects metrics (BERTScore can become 0 for those cases).
- Testing was slow: ~1 hour 43 minutes for 269 samples.
- Improve custom Transformer decoding by using beam search instead of greedy.
- Try alternative tokenizers (e.g., T5 tokenizer) to improve translation quality.
- Speed up evaluation for large datasets; reduce empty outputs in T5 testing.