theaashaychahande/datacleaner

cleanframe — simple pandas-based data cleanup

cleanframe detects common column types (email, phone, date, name) and automatically normalizes them with simple rule-based fixers and validators.

Quick start

  1. Install dependencies
python -m pip install pandas
  2. Run unit tests
pip install pytest
pytest -q
  3. Use the library from a script
python - <<'PY'
import sys
sys.path.insert(0, 'datacleaner')
import pandas as pd
from cleanframe import fix

df = pd.DataFrame({
    'email': ['AASHAY@GMAIL.COM', 'vansh@EXAMPLE.com', 'bad'],
    'phone': ['800-123-4567', '(800) 333-4444', 'nope'],
    'date': ['2023-13-01', '2020-02-29', 'invalid'],
    'name': ['aashay', 'VANSH', '3rd street'],
})
cleaned = fix(df)
print(cleaned)
PY

Usage note

  • To clean a CSV file, call cleanframe.fix from a Python script and write the resulting DataFrame to disk.
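
The CSV workflow described above could look roughly like the following sketch. The helper name clean_csv and its parameters are hypothetical and not part of cleanframe. It assumes only that the cleaning function accepts a DataFrame and returns one, as fix does in the Quick start example. Reading every column as text (dtype=str) is also an assumption, made so that phone numbers and dates reach the cleaner unparsed.

```python
from pathlib import Path

import pandas as pd


def clean_csv(src, dst, fix_fn):
    """Read a CSV, clean it with fix_fn, and write the result to dst.

    Hypothetical helper: fix_fn is any callable that takes a DataFrame
    and returns a cleaned DataFrame (for example cleanframe.fix).
    """
    # Keep every column as text so values such as "0800" are not
    # converted to integers before the cleaner sees them (assumption).
    df = pd.read_csv(Path(src), dtype=str)
    cleaned = fix_fn(df)
    # index=False avoids writing the row index as an extra column.
    cleaned.to_csv(Path(dst), index=False)
    return cleaned
```

With cleanframe importable, a call such as clean_csv('input.csv', 'output.csv', fix) would be the expected usage; the file names here are placeholders.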

Removed files and tools

  • Several helper and experimental scripts have been removed from this repo. Run the tests with pytest, and import cleanframe.fix in your own Python code to perform cleaning.

Design notes

  • Detection: heuristics based on validators and pattern matching
  • Fixing: deterministic normalization, no external APIs or ML
  • Invalid values become NaN after cleaning, so later pandas steps can treat them as missing data
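
To make the three design notes concrete, here is a minimal sketch of rule-based detection and fixing using only the standard library. It is not cleanframe's actual implementation: the regular expression, the 0.5 threshold, the single date format, and the function names are all assumptions chosen for illustration.

```python
import math
import re
from datetime import datetime

# Deliberately loose pattern (assumption): something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email_column(values, threshold=0.5):
    """Detection: treat a column as email if enough values match."""
    if not values:
        return False
    hits = sum(
        1 for v in values
        if isinstance(v, str) and EMAIL_RE.match(v.strip())
    )
    return hits / len(values) >= threshold


def fix_email(value):
    """Fixing: trim and lowercase valid emails; invalid ones become NaN."""
    if isinstance(value, str):
        candidate = value.strip().lower()
        if EMAIL_RE.match(candidate):
            return candidate
    return math.nan


def fix_date(value, fmt="%Y-%m-%d"):
    """Fixing: keep real calendar dates in ISO form; the rest become NaN."""
    try:
        return datetime.strptime(value.strip(), fmt).date().isoformat()
    except (AttributeError, ValueError):
        # AttributeError: value is not a string; ValueError: not a valid date.
        return math.nan


emails = ['AASHAY@GMAIL.COM', 'vansh@EXAMPLE.com', 'bad']
dates = ['2023-13-01', '2020-02-29', 'invalid']

is_email = looks_like_email_column(emails)
cleaned_emails = [fix_email(v) for v in emails] if is_email else list(emails)
cleaned_dates = [fix_date(v) for v in dates]
```

Because the same input always produces the same output and nothing is looked up remotely, this kind of fixer is deterministic in the sense the notes describe. Month 13 in '2023-13-01' fails parsing and is replaced by NaN, while the leap day '2020-02-29' is kept.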
