Process "Killed" (OOM) due to RAM limits with custom dataset – How to implement lazy loading? #931
tllmmaster started this conversation in General
I am trying to train a GPT model from scratch on a custom dataset (approximately 100 MB to 1.5 GB of raw Turkmen-language text). My machine has 16 GB of RAM.

However, when I run the training script, the process gets "Killed" by the OS, presumably due to an out-of-memory (OOM) condition. It appears that loading and tokenizing the entire text file into memory at once (as done in the GPTDatasetV1 class) consumes all available RAM.

Could you please advise on how to modify the Dataset class or the data loading pipeline to handle larger datasets efficiently? Specifically, I am looking for a way to load and tokenize the text lazily, in smaller pieces, rather than reading and encoding the whole file up front.

Any code snippets or guidance on implementing a memory-efficient Dataset class for this project would be greatly appreciated.

Environment:
OS: Ubuntu
RAM: 16 GB
Python version: 3.10 (for example)
Dataset size: >100 MB

Thanks in advance!

Replies: 1 comment

I think the first step here would be to find out which exact step is causing the issue. 1.5 GB still sounds small, and I don't think it is necessarily the cause of the problem. I recommend running just the code from chapter 2, i.e., only loading the data and iterating over the dataloader, to see whether the dataset preparation (not the LLM training) is what causes the crash, and also to find out how much memory this step uses.
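To act on that suggestion, here is a minimal sketch (not from the book) for isolating the chapter 2 data-loading step and printing the process's peak memory after each stage. It assumes a GPTDatasetV1(txt, tokenizer, max_length, stride) constructor as in the chapter 2 code; the import path, the file path, and the max_length/stride values are placeholders to adjust.

```python
# Minimal sketch: run only the data-loading step and watch peak memory (Linux).
import resource

import tiktoken
from torch.utils.data import DataLoader

from chapter02 import GPTDatasetV1  # placeholder import; point this at your chapter 2 code


def peak_rss_mb():
    # Peak resident set size of this process; ru_maxrss is reported in KB on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


print(f"start: {peak_rss_mb():.0f} MB")

with open("turkmen_corpus.txt", "r", encoding="utf-8") as f:  # placeholder path
    raw_text = f.read()
print(f"after reading file: {peak_rss_mb():.0f} MB")

tokenizer = tiktoken.get_encoding("gpt2")
# (txt, tokenizer, max_length, stride) as in chapter 2; adjust if your version differs.
dataset = GPTDatasetV1(raw_text, tokenizer, 256, 128)
print(f"after building dataset: {peak_rss_mb():.0f} MB")

loader = DataLoader(dataset, batch_size=4, shuffle=False)
for x, y in loader:
    pass  # iterate once to confirm the dataloader alone fits in RAM
print(f"after one pass over dataloader: {peak_rss_mb():.0f} MB")
```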
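If those measurements confirm that building the dataset is what exhausts memory, one possible direction (a sketch, not the book's implementation) is a streaming dataset that reads the file in pieces and tokenizes as it goes, so the full token list never sits in RAM at once. The class name LazyGPTDataset, the file path, and the piece/window sizes below are made up for illustration.

```python
import tiktoken
import torch
from torch.utils.data import DataLoader, IterableDataset


class LazyGPTDataset(IterableDataset):  # hypothetical name, not part of the book's code
    def __init__(self, file_path, tokenizer, max_length=256, stride=128,
                 chars_per_read=1_000_000):
        self.file_path = file_path
        self.tokenizer = tokenizer
        self.max_length = max_length
        self.stride = stride
        self.chars_per_read = chars_per_read

    def __iter__(self):
        # The token buffer only ever holds roughly one text piece plus a partial
        # window, instead of the tokens of the entire corpus.
        buffer = []
        with open(self.file_path, "r", encoding="utf-8") as f:
            while True:
                text = f.read(self.chars_per_read)
                if not text:
                    break
                # Note: cutting the text at an arbitrary character position can split
                # a word at the boundary and slightly change its tokenization there.
                buffer.extend(self.tokenizer.encode(text))
                # Emit sliding windows, then drop consumed tokens from the front.
                while len(buffer) >= self.max_length + 1:
                    window = buffer[: self.max_length + 1]
                    yield torch.tensor(window[:-1]), torch.tensor(window[1:])
                    buffer = buffer[self.stride:]


tokenizer = tiktoken.get_encoding("gpt2")
dataset = LazyGPTDataset("turkmen_corpus.txt", tokenizer)  # placeholder file path
# An IterableDataset cannot be shuffled by the DataLoader, and num_workers > 0 would
# need per-worker sharding, so keep num_workers=0 for a first test.
loader = DataLoader(dataset, batch_size=4, num_workers=0)
```

One trade-off: an IterableDataset gives up the DataLoader's shuffling and needs extra care with multiple workers. A common alternative is to pre-tokenize the corpus once into a binary file and read training windows from it via numpy.memmap, which keeps random access while staying memory-efficient.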