Skip to content

chore(deps): bump chardet from 7.2.0 to 7.4.0.post2#5371

Closed
dependabot[bot] wants to merge 1 commit intomasterfrom
dependabot/pip/chardet-7.4.0.post2
Closed

chore(deps): bump chardet from 7.2.0 to 7.4.0.post2#5371
dependabot[bot] wants to merge 1 commit intomasterfrom
dependabot/pip/chardet-7.4.0.post2

Conversation

@dependabot
Copy link
Copy Markdown
Contributor

@dependabot dependabot Bot commented on behalf of github Mar 30, 2026

Bumps chardet from 7.2.0 to 7.4.0.post2.

Release notes

Sourced from chardet's releases.

chardet 7.4.0 brings accuracy up to 99.3% (from 98.6% in 7.3.0) and significantly faster cold start thanks to a new dense model format.

What's New

Performance:

  • New dense zlib-compressed model format (v2) drops cold start (import + first detect) from ~75ms to ~13ms with mypyc

Accuracy (98.6% → 99.3%):

  • Eliminated train/test data overlap via content fingerprinting
  • Added MADLAD-400 and Wikipedia as supplemental training sources
  • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training and weighted by per-bigram IDF
  • Encoding-aware substitution filtering (substitutions only apply for characters the target encoding can't represent)
  • Increased training samples from 15K to 25K per language/encoding pair

Bug fixes:

  • Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS (these were previously sharing their base encoding's byte-range analyzer, missing extended ranges)

Metrics

chardet 7.4.0 (mypyc) chardet 6.0.0 charset-normalizer 3.4.6
Accuracy (2,517 files) 99.3% 88.2% 85.4%
Speed 551 files/s 12 files/s 376 files/s
Language detection 95.7% 40.0% 59.2%

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

7.3.0

License

  • 0BSD license — the project license has been changed from MIT to 0BSD, a maximally permissive license with no attribution requirement. All prior 7.x releases should also be considered 0BSD licensed as of this release.

Features

  • Added mime_type field to detection results — identifies file types for both binary (via magic number matching) and text content. Returned in all detect(), detect_all(), and UniversalDetector results. (#350)
  • New pipeline/magic.py module detects 40+ binary file formats including images, audio/video, archives, documents, executables, and fonts. ZIP-based formats (XLSX, DOCX, JAR, APK, EPUB, wheel, OpenDocument) are distinguished by entry filenames. (#350)

Bug Fixes

  • Fixed incorrect equivalence between UTF-16-LE and UTF-16-BE in accuracy testing — these are distinct encodings with different byte order, not interchangeable

Performance

  • Added 4 new modules to mypyc compilation (orchestrator, confusion, magic, ascii), bringing the total to 11 compiled modules
  • Capped statistical scoring at 16 KB — bigram models converge quickly, so large files no longer score the full 200 KB. Worst-case detection time dropped from 62ms to 26ms with no accuracy loss.
  • Replaced dataclasses.replace() with direct DetectionResult construction on hot paths, eliminating ~354k function calls per full test suite run

Build

... (truncated)

Changelog

Sourced from chardet's changelog.

Changelog

.. note::

Entries marked "via Claude" were developed with Claude Code <https://claude.ai/code>_. Dan directed the design, reviewed all output, and takes responsibility for the result. Unmarked entries by Dan were written without AI assistance.

7.4.0 (2026-03-26)

Performance:

  • Switched to dense zlib-compressed model format (v2): models are now stored as contiguous memoryview slices of a single decompressed blob, eliminating per-model struct.unpack overhead. Cold start (import + first detect) dropped from ~75ms to ~13ms with mypyc. (Dan Blanchard <https://github.com/dan-blanchard>_ via Claude, [#354](https://github.com/chardet/chardet/issues/354) <https://github.com/chardet/chardet/pull/354>_)

Accuracy:

  • Accuracy improved from 98.6% to 99.3% (2499/2517 files) through a combination of training and scoring improvements:

    • Eliminated train/test data overlap by content-fingerprinting test suite articles and excluding them from training data ([#351](https://github.com/chardet/chardet/issues/351) <https://github.com/chardet/chardet/pull/351>_)
    • Added MADLAD-400 and Wikipedia as supplemental training sources to fill gaps left by exclusion filtering ([#351](https://github.com/chardet/chardet/issues/351) <https://github.com/chardet/chardet/pull/351>_)
    • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training (instead of being crushed by global normalization), and weighted by per-bigram IDF so encoding-specific byte patterns contribute proportionally to how discriminative they are ([#352](https://github.com/chardet/chardet/issues/352) <https://github.com/chardet/chardet/pull/352>_)
    • Added encoding-aware substitution filtering: character substitutions during training now only apply for characters the target encoding cannot represent
    • Increased training samples from 15K to 25K per language/encoding pair (Dan Blanchard <https://github.com/dan-blanchard>_ via Claude)

Bug Fixes:

  • Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS: these superset encodings previously shared their base encoding's byte-range analyzer, missing extended ranges unique to each superset

... (truncated)

Commits
  • e37cf3c fix: prevent dirty-tree version in Windows mypyc wheel builds
  • f9f5af2 Fix a couple errors in the changelog
  • 53755de chore: add .superpowers/ to .gitignore
  • 3a20df6 docs: update README examples with correct outputs
  • 17ec933 docs: note train/test separation in performance.rst
  • 7296d93 docs: add footnote explaining 0 B import memory (lazy loading)
  • 4221536 docs: add chardet 7.0.1-7.3.0 to historical performance table
  • 9dfb65b docs: remove historical performance table spec/plan (preserved in git history)
  • 72413b8 docs: add historical performance table to performance.rst
  • cbcb80d fix: handle None detect() results and missing venvs in benchmarks
  • Additional commits viewable in compare view

@dependabot dependabot Bot added dependencies Pull requests that update a dependency file python Pull requests that update Python code labels Mar 30, 2026
@cypress
Copy link
Copy Markdown

cypress Bot commented Mar 30, 2026

Geotrek-admin    Run #16202

Run Properties:  status check passed Passed #16202  •  git commit ee3eb857fb ℹ️: Merge f3b6816c323e89a0b295fd0e06b43a815fd5f8d3 into 753afd7e8c8dcb0d76def459025f...
Project Geotrek-admin
Branch Review refs/pull/5371/merge
Run status status check passed Passed #16202
Run duration 02m 06s
Commit git commit ee3eb857fb ℹ️: Merge f3b6816c323e89a0b295fd0e06b43a815fd5f8d3 into 753afd7e8c8dcb0d76def459025f...
Committer dependabot[bot]
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 0
Tests that did not run due to a developer annotating a test with .skip  Pending 0
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 22
View all changes introduced in this branch ↗︎

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.44%. Comparing base (790d46a) to head (f3b6816).
⚠️ Report is 12 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5371   +/-   ##
=======================================
  Coverage   98.44%   98.44%           
=======================================
  Files         272      272           
  Lines       22395    22395           
=======================================
  Hits        22046    22046           
  Misses        349      349           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Bumps [chardet](https://github.com/chardet/chardet) from 7.2.0 to 7.4.0.post2.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@7.2.0...7.4.0.post2)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.4.0.post2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot Bot force-pushed the dependabot/pip/chardet-7.4.0.post2 branch from 3514eb0 to f3b6816 Compare April 7, 2026 07:33
@dependabot @github
Copy link
Copy Markdown
Contributor Author

dependabot Bot commented on behalf of github Apr 13, 2026

Superseded by #5398.

@dependabot dependabot Bot closed this Apr 13, 2026
@dependabot dependabot Bot deleted the dependabot/pip/chardet-7.4.0.post2 branch April 13, 2026 04:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants