
naveen777-github/Machine-Translation-with-Transformers


Machine Translation with a Transformer (English → French)

A Transformer-based machine translation project comparing:

  1. a custom Transformer implemented in PyTorch (with hyperparameter experiments), and
  2. a pre-trained T5 translation pipeline (Hugging Face) as a strong baseline.

Evaluation uses BERTScore and METEOR to measure translation quality and semantic similarity.
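To give a flavor of what METEOR measures, here is a minimal pure-Python sketch of its recall-weighted harmonic-mean core. This is an illustration, not the project's code: the real metric (e.g. `nltk.translate.meteor_score`) adds stemming, synonym matching, and a fragmentation penalty, all omitted here.

```python
from collections import Counter

def simple_meteor(reference: str, hypothesis: str, alpha: float = 0.9) -> float:
    """Recall-weighted harmonic mean of unigram precision and recall,
    the core of METEOR (stemming/synonyms/fragmentation penalty omitted)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # clipped unigram overlap between reference and hypothesis
    matches = sum((Counter(ref) & Counter(hyp)).values())
    if matches == 0:
        return 0.0
    p = matches / len(hyp)
    r = matches / len(ref)
    # alpha=0.9 weights recall more heavily, as METEOR does
    return p * r / (alpha * p + (1 - alpha) * r)
```

An exact match scores 1.0, no overlap scores 0.0, and partial overlap lands in between, which is why a fluent-but-reworded translation can score well on BERTScore yet poorly on METEOR.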


Project Highlights

  • Built a custom Transformer and analyzed how hyperparameters affect learning + translation quality.
  • Used the XLM-RoBERTa tokenizer for subword tokenization and padding in the custom model pipeline.
  • Benchmarked against pre-trained T5 with the task prefix `translate English to French:` and beam search decoding.
  • Reported performance using Precision / Recall / F1 (BERTScore) + METEOR.

Approaches

1) Custom Transformer (PyTorch)

Goal: learn the Transformer architecture deeply and evaluate the effect of changing hyperparameters.

Pipeline:

  • Create Train/Val/Test splits
  • Clean + tokenize source/target text with XLM-RoBERTa tokenizer
  • Train on training set and validate using BERTScore + METEOR
  • Tune hyperparameters (embedding size, number of heads, batch size, dropout)
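The custom model itself is not reproduced in this README. As an illustration of the pipeline above, a minimal encoder-decoder built on `torch.nn.Transformer`, with the tuned hyperparameters exposed as constructor arguments, might look like the following sketch (class and argument names are hypothetical; positional encodings are omitted for brevity):

```python
import torch
import torch.nn as nn

class TranslationTransformer(nn.Module):
    """Minimal encoder-decoder; d_model, nhead, and dropout are the tuned knobs."""
    def __init__(self, vocab_size, d_model=512, nhead=8,
                 num_layers=3, dropout=0.1, pad_id=1):
        super().__init__()
        # pad_id=1 matches the XLM-RoBERTa tokenizer's <pad> token id
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=pad_id)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # causal mask so decoder positions cannot attend to future tokens
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        h = self.transformer(self.embed(src_ids), self.embed(tgt_ids),
                             tgt_mask=tgt_mask)
        return self.out(h)  # (batch, tgt_len, vocab_size) logits
```

Training would minimize cross-entropy between these logits and the target tokens shifted by one position.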

Default run behavior & example translation

  • Training loss dropped from 5.536 → 2.991 across 10 epochs, and validation loss from 5.388 → 3.829.
  • Example translation (test):
    English: “Hello how are you today”
    French: “comment êtes vous aujourd’hui”
    with BERTScore P=0.8581, R=0.8435, F1=0.8506 and METEOR=0.2978.

2) Pre-trained T5 (Hugging Face)

Goal: use a strong pretrained Transformer for higher quality translations and compare against the custom model.

Method:

  • Load T5 + tokenizer
  • Encode with the prefix `translate English to French:`
  • Decode using beam search for better translation quality
  • Evaluate on test batches with BERTScore + METEOR
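Concretely, the method above can be reproduced with a few lines of `transformers`. This is a sketch: the checkpoint name `t5-small` and the generation settings are illustrative, not necessarily what the report used.

```python
PREFIX = "translate English to French: "

def translate(texts, model_name="t5-small", num_beams=4, max_new_tokens=64):
    # imports kept inside the function so the prefix constant is usable
    # even without transformers installed
    from transformers import T5ForConditionalGeneration, T5Tokenizer
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    batch = tokenizer([PREFIX + t for t in texts], return_tensors="pt",
                      padding=True, truncation=True)
    # beam search decoding, as in the project's T5 baseline
    out_ids = model.generate(**batch, num_beams=num_beams,
                             max_new_tokens=max_new_tokens, early_stopping=True)
    return tokenizer.batch_decode(out_ids, skip_special_tokens=True)
```

The task prefix matters: T5 was multi-task pre-trained, and the prefix selects the translation behavior.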

Results (Main Comparison)

| Model | Precision | Recall | F1 | METEOR |
|---|---|---|---|---|
| Custom Transformer | 0.8581 | 0.8435 | 0.8506 | 0.2978 |
| T5 Pre-trained | 0.8960 | 0.8987 | 0.8972 | 0.5160 |

T5 outperforms the custom Transformer on every reported metric.


Hyperparameter Experiments (Custom Transformer)

All tuning experiments below were run with 10 epochs on a 7,000-sample dataset.
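The sweeps below vary one hyperparameter at a time around a base configuration (note that the base run — embedding size 512, 8 heads, batch 32, dropout 0.1 — reappears as train 4.204 / val 5.682 in every table). A driver for that experiment design might look like this sketch, where `train_and_eval` is a hypothetical callback that trains a model from a config and returns its metrics:

```python
def sweep_one_at_a_time(base, sweeps, train_and_eval):
    """Run train_and_eval on configs differing from `base` in exactly one knob."""
    results = {}
    for name, values in sweeps.items():
        for value in values:
            cfg = {**base, name: value}   # override a single hyperparameter
            results[(name, value)] = train_and_eval(cfg)
    return results

# base config and sweep values as reported in the tables below
base = {"d_model": 512, "nhead": 8, "batch_size": 32, "dropout": 0.1}
sweeps = {"d_model": [128, 256, 512], "nhead": [1, 2, 4, 8],
          "batch_size": [16, 32, 64], "dropout": [0.1, 0.01, 0.001, 0.0001]}
```

One-at-a-time sweeps keep the run count linear in the number of settings (14 runs here), unlike a full grid, at the cost of missing interactions between hyperparameters.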


Embedding Size

Metrics vs embedding size:

| Embedding Size | Precision | Recall | F1 | METEOR |
|---|---|---|---|---|
| 512 | 0.8048 | 0.8070 | 0.8057 | 0.1636 |
| 256 | 0.8079 | 0.7998 | 0.8036 | 0.1327 |
| 128 | 0.7997 | 0.7907 | 0.7950 | 0.0913 |

Training time + loss vs embedding size:

| Embedding Size | Training Loss | Validation Loss | Time |
|---|---|---|---|
| 512 | 4.204 | 5.682 | 1:26:29 |
| 256 | 5.042 | 6.181 | 41:08 |
| 128 | 5.646 | 6.399 | 32:10 |

Observation: larger embeddings improved learning and translation quality but required more compute.

Attention Heads

| Heads | Training Loss | Validation Loss |
|---|---|---|
| 8 | 4.204 | 5.682 |
| 4 | 4.208 | 5.661 |
| 2 | 4.264 | 5.695 |
| 1 | 4.332 | 5.512 |

More heads reduced training loss slightly, but validation loss did not improve consistently, suggesting limited generalization gains.


Batch Size

| Batch Size | Training Loss | Validation Loss |
|---|---|---|
| 16 | 3.808 | 5.509 |
| 32 | 4.204 | 5.682 |
| 64 | 4.687 | 5.830 |

Smaller batch size (16) produced better losses; larger batches increased both train/val loss.


Dropout

| Dropout | Training Loss | Validation Loss |
|---|---|---|
| 0.1 | 4.204 | 5.682 |
| 0.01 | 3.721 | 5.620 |
| 0.001 | 3.647 | 5.637 |
| 0.0001 | 3.670 | 5.650 |

Report conclusion: a dropout of 0.01 offered the best balance; 0.1 tended to underfit, while 0.0001 showed an overfitting tendency.


Known Issues / Limitations

Custom Transformer

  • Missed parts of the input (e.g., did not capture “Hello” in translation in one observation).
  • Good BERTScore, but relatively low METEOR indicates fluency/wording gaps.
  • Larger embeddings/heads help but increase compute requirements.

T5 Pre-trained

  • Sometimes produces empty translations, which affects metrics (BERTScore can become 0 for those cases).
  • Testing was slow: ~1 hour 43 minutes for 269 samples.

Future Work

  • Improve custom Transformer decoding by using beam search instead of greedy decoding.
  • Try alternative tokenizers (e.g., T5 tokenizer) to improve translation quality.
  • Speed up evaluation for large datasets; reduce empty outputs in T5 testing.

About

The goal of this project is to understand the Transformer architecture in depth, analyze the impact of key hyperparameters on translation quality, and compare a custom implementation with a T5 pre-trained model.
