Why Compress When You Have an Oracle? LLM-Powered Text Compression
```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 datasets_download.py
```
```shell
python3 main.py \
--mode compress \
--input_path data/text8 \
--model_name Qwen/Qwen3-0.6B \
--first_n_tokens 500000 \
--batch_size 128
```
Arguments:
- `--mode` Either `compress` or `decompress`. Use `compress` to encode a text file.
- `--input_path` Path to the input text file to compress.
- `--model_name` Name of the LLM used for compression, e.g. `Qwen/Qwen3-0.6B` or `Qwen/Qwen3-8B`.
- `--first_n_tokens` Compress only the first N tokens of the input; useful for limiting experiment size.
- `--batch_size` Batch size for LLM inference during compression.
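The core idea behind LLM-powered compression is that the model's next-token predictions turn text into a stream of mostly small, highly repetitive numbers: each token is replaced by its rank in the model's predicted ordering, and well-predicted tokens get rank 0 or 1, which a standard entropy coder then squeezes down. A minimal, self-contained sketch of this rank-coding idea, using a deliberately crude stand-in model instead of a real LLM (the `ranked_vocab` function, the toy `VOCAB`, and the use of `zlib` as the back-end coder are all illustrative assumptions, not this repository's actual method):

```python
import zlib

# Toy "token" vocabulary; the real pipeline uses the LLM's tokenizer instead.
VOCAB = list("abcdefgh _")

def ranked_vocab(prev):
    """Stand-in for an LLM: return the vocabulary ordered from most to least
    likely given the previous character. Here the crude rule is "repeat the
    previous character", with the rest in fixed order."""
    rest = [c for c in VOCAB if c != prev]
    return ([prev] + rest) if prev in VOCAB else list(VOCAB)

def compress(text):
    ranks = []
    prev = VOCAB[0]
    for ch in text:
        order = ranked_vocab(prev)
        ranks.append(order.index(ch))  # well-predicted chars get small ranks
        prev = ch
    # Small, repetitive ranks compress well with any standard coder.
    return zlib.compress(bytes(ranks))

def decompress(blob):
    out = []
    prev = VOCAB[0]
    for r in zlib.decompress(blob):
        order = ranked_vocab(prev)  # same deterministic model on both sides
        ch = order[r]
        out.append(ch)
        prev = ch
    return "".join(out)
```

Because the same deterministic model runs on both sides, the decoder can invert every rank back to a token; the better the model predicts the text, the smaller the compressed output.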
```shell
python3 main.py --mode decompress --input_path compression_data.bin
```
In decompression mode you only need to supply the compressed file; all other information is stored in its header.
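Storing the metadata in a header is what makes single-argument decompression possible: the decoder reads the model name and token count from the file itself. A hypothetical sketch of such a header layout (the actual format of `compression_data.bin` may differ; the field order and widths here are assumptions):

```python
import struct

def write_header(model_name: str, n_tokens: int, payload: bytes) -> bytes:
    """Pack a hypothetical header: [u16 name length][name bytes][u32 token count],
    followed by the compressed payload."""
    name = model_name.encode("utf-8")
    return struct.pack("<H", len(name)) + name + struct.pack("<I", n_tokens) + payload

def read_header(blob: bytes):
    """Unpack the header written above, returning (model_name, n_tokens, payload)."""
    (name_len,) = struct.unpack_from("<H", blob, 0)
    name = blob[2:2 + name_len].decode("utf-8")
    (n_tokens,) = struct.unpack_from("<I", blob, 2 + name_len)
    return name, n_tokens, blob[2 + name_len + 4:]
```

With this layout the decoder can reload the exact model and token budget used at compression time without any extra command-line flags.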