
Bump chardet from 5.2.0 to 7.1.0 #1649

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/chardet-7.1.0



@dependabot dependabot bot commented on behalf of github Mar 12, 2026

Bumps chardet from 5.2.0 to 7.1.0.

Release notes

Sourced from chardet's releases.

chardet 7.1.0

Features

  • Added PEP 263 encoding declaration detection — # -*- coding: ... -*- and # coding=... declarations on lines 1–2 of Python source files are now recognized with confidence 0.95 (#249)
  • Added chardet.universaldetector backward-compatibility stub so that from chardet.universaldetector import UniversalDetector works with a deprecation warning (#341)
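The PEP 263 rule above can be illustrated with a small, stdlib-only sketch. It is not chardet's implementation: the function name `sniff_pep263` is hypothetical, the two regexes mirror `cookie_re` and `blank_re` in CPython's `tokenize` module, and the sketch returns only the declared name (chardet additionally attaches its 0.95 confidence). Line 2 is consulted only when line 1 is blank or a comment, which is how the CPython tokenizer treats it.

```python
import io
import re
from typing import Optional

# Same patterns as cookie_re / blank_re in CPython's tokenize module (bytes form).
_CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")
_BLANK_OR_COMMENT_RE = re.compile(rb"^[ \t\f]*(?:[#\r\n]|$)")
_UTF8_BOM = b"\xef\xbb\xbf"


def sniff_pep263(data: bytes) -> Optional[str]:
    """Return the encoding declared on line 1 or 2, or None if there is none."""
    if data.startswith(_UTF8_BOM):
        data = data[len(_UTF8_BOM):]
    stream = io.BytesIO(data)  # read at most two lines; never scan the whole input
    for lineno in (1, 2):
        line = stream.readline()
        match = _CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
        # A declaration on line 2 only counts if line 1 is blank or a comment.
        if lineno == 1 and not _BLANK_OR_COMMENT_RE.match(line):
            return None
    return None
```

For real Python files, the standard library's `tokenize.detect_encoding()` applies the same two-line rule and also verifies that the declared codec exists.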

Fixes

  • Fixed false UTF-7 detection of ASCII text containing ++ or +word patterns (#332)
  • Fixed 0.5s startup cost on first detect() call — model norms are now computed during loading instead of lazily iterating 21M entries (#333)
  • Fixed undocumented encoding name changes between chardet 5.x and 7.0 — detect() now returns chardet 5.x-compatible names by default (#338)
  • Improved ISO-2022-JP family detection — recognizes ESC sequences for ISO-2022-JP-2004 (JIS X 0213) and ISO-2022-JP-EXT (JIS X 0201 Kana)
  • Fixed silent truncation of corrupt model data (iter_unpack yielded fewer tuples instead of raising)
  • Fixed incorrect date in LICENSE
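Why ASCII such as `C++` or `+word` can look like UTF-7: in UTF-7 (RFC 2152) a `+` opens a modified-base64 shift sequence, and `+` is itself a base64 character. The sketch below shows one plausible guard, not chardet's actual fix: a shift is accepted only if its payload packs into whole 16-bit UTF-16 units with fewer than 6 zero padding bits, and the input counts as UTF-7 only if some shift yields a non-ASCII code unit. The helper names are hypothetical.

```python
import re
from typing import List, Optional

_B64 = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# '+' opens a shift, the payload is modified base64, an optional '-' closes it.
_SHIFT_RE = re.compile(rb"\+([A-Za-z0-9+/]*)-?")


def _decode_shift(payload: bytes) -> Optional[List[int]]:
    """Decode a base64 payload into UTF-16 code units, or None if the
    trailing bits are not the short zero padding an encoder would emit."""
    acc = nbits = 0
    units: List[int] = []
    for byte in payload:
        acc = (acc << 6) | _B64.index(byte)
        nbits += 6
        if nbits >= 16:
            nbits -= 16
            units.append(acc >> nbits)  # top 16 bits form one code unit
            acc &= (1 << nbits) - 1     # keep only the unconsumed bits
    if nbits >= 6 or acc:
        return None
    return units


def is_probably_utf7(data: bytes) -> bool:
    """Hypothetical check: every shift must decode cleanly and at least one
    must produce a non-ASCII code unit (otherwise plain ASCII explains it)."""
    if not data.isascii():
        return False  # UTF-7 is a 7-bit encoding
    saw_non_ascii = False
    for match in _SHIFT_RE.finditer(data):
        units = _decode_shift(match.group(1))
        if units is None:
            return False  # e.g. "C++" or "+word": not a well-formed shift
        if any(unit > 0x7F for unit in units):
            saw_non_ascii = True
    return saw_non_ascii
```

The padding test alone is not sufficient: a payload such as `+abc` still decodes cleanly to a CJK code unit, so a real detector needs further evidence before committing to UTF-7.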

Performance

  • 5.5x faster first-detect time (~0.42s → ~0.075s) by computing model norms as a side-product of load_models()
  • ~40% faster model parsing via struct.iter_unpack for bulk entry extraction (eliminates ~305K individual unpack calls)

New API parameters

  • Added compat_names parameter (default True) to detect(), detect_all(), and UniversalDetector — set to False to get raw Python codec names instead of chardet 5.x/6.x compatible display names
  • Added prefer_superset parameter (default False) — remaps legacy ISO/subset encodings to their modern Windows/CP superset equivalents (e.g., ASCII → Windows-1252, ISO-8859-1 → Windows-1252). This will default to True in the next major version (8.0).
  • Deprecated should_rename_legacy in favor of prefer_superset — a deprecation warning is emitted when used
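How the two new flags might compose, as a sketch: `prefer_superset` first remaps a subset encoding to its superset, then `compat_names` chooses between the raw Python codec name and a 5.x-style display name. Both lookup tables and the function `present` are hypothetical and cover only the examples quoted above.

```python
from typing import Dict

# Hypothetical tables covering only the examples quoted in the release notes;
# chardet's real mappings are larger and may differ.
_SUPERSET_OF: Dict[str, str] = {"ascii": "cp1252", "iso8859-1": "cp1252"}
_COMPAT_DISPLAY: Dict[str, str] = {
    "ascii": "ascii",
    "utf-8": "utf-8",
    "iso8859-1": "ISO-8859-1",
    "cp1252": "Windows-1252",
}


def present(codec_name: str, *, compat_names: bool = True,
            prefer_superset: bool = False) -> str:
    """Map an internal Python codec name to the name a caller would see."""
    name = codec_name
    if prefer_superset:
        # Step 1: widen a legacy/subset encoding to its Windows/CP superset.
        name = _SUPERSET_OF.get(name, name)
    if compat_names:
        # Step 2: translate the codec name into a 5.x-style display name.
        name = _COMPAT_DISPLAY.get(name, name)
    return name
```

Applying the superset step before the display step lets one display table serve both settings. The ASCII remap is lossless, since every 7-bit byte decodes to the same character under cp1252; ISO-8859-1 is the less exact case, because bytes 0x80-0x9F are C1 control codes there but mostly printable characters in cp1252.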

Improvements

  • Switched internal canonical encoding names to Python codec names (e.g., "utf-8" instead of "UTF-8"), with compat_names controlling the public output format
  • Added lookup_encoding() to registry for case-insensitive resolution of arbitrary encoding name input to canonical names
  • Achieved 100% line coverage across all source modules (+31 tests)
  • Updated benchmark numbers: 98.2% encoding accuracy, 95.2% language accuracy on 2,510 test files
  • Pinned test-data cloning to chardet release version tags for reproducible builds
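Python's own `codecs.lookup()` already performs case-insensitive alias resolution (for example `UTF-8`, `Latin_1` and `Windows-1252` all resolve), which makes it a convenient stand-in for the idea behind `lookup_encoding()`. The wrapper below is a hypothetical sketch, not chardet's registry function; note that Python reports ISO-8859-1 under the codec name `iso8859-1`.

```python
import codecs
from typing import Optional


def lookup_encoding(name: str) -> Optional[str]:
    """Resolve any spelling or alias of an encoding to Python's codec name.

    Hypothetical stand-in for chardet's registry helper; returns None for
    names Python does not know.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None
```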

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

7.0.1

Fixes

  • Fixed false UTF-7 detection of SHA-1 git hashes (#324, fixing #323) — requirements files with VCS pins (e.g., +4bafdea3...) were misdetected as UTF-7, breaking tools like tox
  • Fixed _SINGLE_LANG_MAP missing aliases for single-language encoding lookup (e.g., big5, big5hkscs)
  • Fixed PyPy TypeError in UTF-7 codec handling
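The SHA-1 case is easy to reproduce with Python's built-in `utf-7` codec: hex digits are all legal base64 characters, and 8 base64 characters carry 48 bits, exactly three UTF-16 code units, so the 8-character prefix quoted above decodes without error into three non-ASCII characters. The `leftover_bits` helper is illustrative only.

```python
# Every hex digit is also a base64 character, so "+<hex>-" is a well-formed
# UTF-7 shift sequence whenever the bit count works out.
fragment = b"+4bafdea3-"  # 8-character prefix quoted in the 7.0.1 notes
decoded = fragment.decode("utf-7")
print(len(decoded), [hex(ord(c)) for c in decoded])


def leftover_bits(payload_len: int) -> int:
    """Bits left over after packing base64 chars (6 bits each) into 16-bit units."""
    return (payload_len * 6) % 16


# A full 40-digit SHA-1 packs exactly (240 bits = 15 units), while a 7-digit
# abbreviation leaves 10 bits, which a strict decoder rejects as a partial character.
print(leftover_bits(40), leftover_bits(7))
```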

Improvements

  • Retrained bigram models — 24 previously failing test cases now pass
  • Updated language equivalences for mutual intelligibility (Slovak/Czech, East Slavic + Bulgarian, Malay/Indonesian, Scandinavian languages)
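One plausible way such equivalences feed into language-accuracy figures is to count a prediction as correct when it falls in the same mutual-intelligibility group as the expected language. The groups below are assumptions drawn from the pairings named in the note (Russian, Ukrainian and Belarusian standing in for East Slavic; Danish, Norwegian and Swedish for Scandinavian), and `languages_match` is a hypothetical name; chardet's actual table may differ.

```python
from typing import FrozenSet, List

# Hypothetical groups built from the pairings named in the 7.0.1 notes.
_EQUIVALENT_GROUPS: List[FrozenSet[str]] = [
    frozenset({"slovak", "czech"}),
    frozenset({"russian", "ukrainian", "belarusian", "bulgarian"}),  # East Slavic + Bulgarian
    frozenset({"malay", "indonesian"}),
    frozenset({"danish", "norwegian", "swedish"}),  # Scandinavian
]


def languages_match(predicted: str, expected: str) -> bool:
    """Treat an exact match or a mutually intelligible language as correct."""
    p, e = predicted.lower(), expected.lower()
    return p == e or any(p in group and e in group for group in _EQUIVALENT_GROUPS)
```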

New Contributors

... (truncated)

Changelog

Sourced from chardet's changelog.

7.1.0 (2026-03-11)

Features:

  • Added PEP 263 encoding declaration detection — # -*- coding: ... -*- and # coding=... declarations on lines 1–2 of Python source files are now recognized with confidence 0.95 (Dan Blanchard <https://github.com/dan-blanchard>, #249 <https://github.com/chardet/chardet/issues/249>)
  • Added chardet.universaldetector backward-compatibility stub so that from chardet.universaldetector import UniversalDetector works with a deprecation warning (Dan Blanchard <https://github.com/dan-blanchard>, #341 <https://github.com/chardet/chardet/issues/341>)
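The backward-compatibility stub follows a common pattern: a module whose attributes still resolve but emit a `DeprecationWarning`, using the module-level `__getattr__` hook from PEP 562. The sketch builds such a module at runtime so it is self-contained; `make_legacy_module` and the placeholder `UniversalDetector` class are hypothetical, and in a real package the hook would simply be a `def __getattr__(name)` defined in the stub file.

```python
import types
import warnings


class UniversalDetector:
    """Placeholder for the class that now lives at the package root."""


def make_legacy_module(name: str, **exports: object) -> types.ModuleType:
    """Build a module whose listed attributes resolve but warn on access."""
    module = types.ModuleType(name)

    def __getattr__(attr: str) -> object:
        # Called only when normal attribute lookup on the module fails (PEP 562).
        if attr in exports:
            warnings.warn(
                f"{name}.{attr} is deprecated; import it from the package root",
                DeprecationWarning,
                stacklevel=2,
            )
            return exports[attr]
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    module.__getattr__ = __getattr__  # stored in the module's __dict__
    return module
```

Usage: `legacy = make_legacy_module("legacy_detector", UniversalDetector=UniversalDetector)`; reading `legacy.UniversalDetector` returns the real class and emits one warning, while unknown names still raise `AttributeError`.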

Fixes:

  • Fixed false UTF-7 detection of ASCII text containing ++ or +word patterns (Dan Blanchard <https://github.com/dan-blanchard>, #332 <https://github.com/chardet/chardet/issues/332>, #335 <https://github.com/chardet/chardet/pull/335>)
  • Fixed 0.5s startup cost on first detect() call — model norms are now computed during loading instead of lazily iterating 21M entries (Dan Blanchard <https://github.com/dan-blanchard>, #333 <https://github.com/chardet/chardet/issues/333>, #336 <https://github.com/chardet/chardet/pull/336>)
  • Fixed undocumented encoding name changes between chardet 5.x and 7.0 — detect() now returns chardet 5.x-compatible names by default (Dan Blanchard <https://github.com/dan-blanchard>, #338 <https://github.com/chardet/chardet/pull/338>)
  • Improved ISO-2022-JP family detection — recognizes ESC sequences for ISO-2022-JP-2004 (JIS X 0213) and ISO-2022-JP-EXT (JIS X 0201 Kana) (Dan Blanchard <https://github.com/dan-blanchard>)
  • Fixed silent truncation of corrupt model data (iter_unpack yielded fewer tuples instead of raising) (Dan Blanchard <https://github.com/dan-blanchard>)
  • Fixed incorrect date in LICENSE (Dan Blanchard <https://github.com/dan-blanchard>)

Performance:

  • 5.5x faster first-detect time (~0.42s → ~0.075s) by computing model norms as a side-product of load_models() (Dan Blanchard <https://github.com/dan-blanchard>)
  • ~40% faster model parsing via struct.iter_unpack for bulk entry extraction (eliminates ~305K individual unpack calls) (Dan Blanchard <https://github.com/dan-blanchard>)

... (truncated)

Commits
  • f170eb4 perf: add early-exit check in PEP 263 detection for non-Python data
  • 81dd662 refactor: use pathlib.Path instead of str for filesystem paths in scripts
  • bf3ea5b test: achieve 100% test coverage
  • ce5e991 fix: adjust benchmark speedup threshold for pure Python vs mypyc
  • bfc8659 docs: update thread scaling table with GIL vs free-threaded benchmarks
  • feff427 Remove plans that got thrown in other directory
  • f854da5 fix: add --threads validation and docstring updates in compare_detectors.py
  • 8029f87 fix: only include threads in timing cache keys, not memory cache keys
  • cb3c71d feat: add --threads passthrough to compare_detectors.py
  • d168ef0 feat: add --threads option to benchmark_time.py for concurrent detection
  • Additional commits viewable in compare view


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.1.0.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.1.0)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels Mar 12, 2026

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.72%. Comparing base (b98e44b) to head (579fcc6).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1649      +/-   ##
==========================================
- Coverage   54.74%   54.72%   -0.02%     
==========================================
  Files         335      335              
  Lines       27400    27400              
==========================================
- Hits        15000    14995       -5     
- Misses      12400    12405       +5     

Flag             Coverage Δ
functionaltests  0.00% <ø> (ø)
unittests        54.72% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.
