Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.1.1] - 2026-04-17

Maintenance release focused on aligning the abstract-retrieval semantics across code, templates, docs, tests, and metadata. No breaking public-API changes; the one renamed kwarg keeps its old name as a deprecated alias for this release cycle.

Added

Abstract retrieval now falls back through a DOI-only cascade when CrossRef does not return an abstract: Semantic Scholar (/paper/DOI:{doi}?fields=abstract) → PubMed (ESearch DOI→PMID, then EFetch PMID→abstract). The cascade is only invoked when the user's original raw input carried a DOI; DOIs inferred by fuzzy search do not trigger it, so that a possibly-wrong candidate does not cost extra roundtrips. In particular, a local BibTeX entry with no DOI field — regardless of whether other stages would later resolve one — does not trigger the abstract cascade.
Semantic Scholar search results now carry the abstract field, which propagates through _convert_search_metadata into the final BibTeX output whenever the identification stage already resolved the entry through SS.
EnricherModule._get_semantic_scholar_abstract(doi) helper for DOI-based Semantic Scholar abstract retrieval. Handles 404 / 429 gracefully by returning None.
_complete_fields gained an allow_abstract_fallback kwarg (default False) that gates the new cascade. _enrich_single_entry passes True only when the raw entry contributed a DOI.
Default journal_article_full template now lists abstract as an optional field, so the template declaration matches what the enricher emits. The older journal_article_with_abstract template is retained as a compatibility alias and will stay available for at least one release cycle.
Regression test test_enrich_single_entry_no_doi_in_raw_skips_abstract_fallback pinning the "no DOI in raw input ⇒ no Semantic Scholar / PubMed network call" guarantee at the _enrich_single_entry layer, so a future refactor of the raw_has_doi gate cannot silently start leaking network calls for local-only inputs.
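
The gating and ordering described in the entries above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `abstract_with_fallback`, the `Fetcher` type, and the injected callables are hypothetical names. Only the order (CrossRef first, then Semantic Scholar, then PubMed) and the raw-DOI gate are taken from the notes above.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a fetcher takes a DOI and returns an abstract,
# or None when the source has none (or answered 404 / 429).
Fetcher = Callable[[str], Optional[str]]


def abstract_with_fallback(
    doi: Optional[str],
    crossref_abstract: Optional[str],
    raw_has_doi: bool,
    fallbacks: Sequence[Fetcher],
) -> Optional[str]:
    """Return the CrossRef abstract, or walk the DOI-only fallbacks in order."""
    if crossref_abstract:
        return crossref_abstract
    # Gate: only a DOI present in the user's raw input may trigger the cascade.
    # A DOI inferred later by fuzzy search leaves raw_has_doi False.
    if not (raw_has_doi and doi):
        return None
    for fetch in fallbacks:
        abstract = fetch(doi)
        if abstract:
            return abstract
    return None
```

Passing the fetchers in as arguments keeps the sketch free of network calls, so the "no DOI in raw input means no lookup" property can be checked by counting how often each stand-in fetcher was invoked.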

Changed

_get_pubmed_abstract now requires a DOI and no longer falls back to PubMed title search. The removed title-based path empirically returned the abstract of an unrelated paper (e.g., the Zhang 2020 AI Review DOI 10.1007/s10462-019-09792-7 pulled the abstract of a different RSI segmentation paper), which is strictly worse than returning None for downstream semantic cross-checks such as the sci skill.
Abstract coverage on an internal 10-DOI cross-publisher spot-check (Nature, Science, PLOS, Cell, IEEE CVPR, Frontiers, arXiv, Springer, ACM, plus one deliberately invalid DOI) rose from 4/9 to 8/9. This number is a local indicator, not a release gate: reproducing it requires a live network and the probe scripts are no longer in the repository.

Deprecated

_complete_fields(..., allow_pubmed_fallback=...) is deprecated in favour of allow_abstract_fallback. The old name still works for one release cycle and emits DeprecationWarning. It was renamed because the flag actually gates the entire Semantic-Scholar + PubMed cascade, not PubMed alone.
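
For readers unfamiliar with the pattern, a deprecated keyword alias of this kind is commonly written along the following lines. This is a sketch under assumptions: `complete_fields_sketch` is a hypothetical stand-in for the real method, and the rule for combining the two flags (either one enables the cascade) is a guess, not something the entry above specifies.

```python
import warnings
from typing import Optional


def complete_fields_sketch(
    entry: dict,
    allow_abstract_fallback: bool = False,
    allow_pubmed_fallback: Optional[bool] = None,  # deprecated alias
) -> bool:
    """Resolve the effective flag; a real method would go on to enrich `entry`."""
    if allow_pubmed_fallback is not None:
        warnings.warn(
            "allow_pubmed_fallback is deprecated; use allow_abstract_fallback",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this function
        )
        # Assumption: when the old name is passed, either flag enables the cascade.
        allow_abstract_fallback = allow_abstract_fallback or allow_pubmed_fallback
    return allow_abstract_fallback
```

Using `None` as the alias default lets the function tell "not passed" apart from an explicit `False`, so callers on the new name never see the warning.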

Removed

IdentifierModule._check_doi_content_consistency and the consistency_score / low_consistency warning path. A fuzzy string-similarity score on bibliographic fields is not a reliable signal for detecting fabricated references, and it was only emitted as a logger.warning that downstream tools could not act on. Citation-authenticity verification belongs at the abstract-vs-claim semantic layer in the consuming tool, not at the bibliographic-string layer here.

[0.1.0] - 2026-04-17

First formal PyPI release since 0.0.12.

Added

RST documentation using Sphinx
Full API reference documentation
FAQ section with common questions
Contributing guidelines
Pre-commit hooks configuration
Google-style docstrings with Args/Returns for all public API functions
Auto-deploy documentation to GitHub Pages via CI

Changed

Split monolithic pipeline.py (~3000 lines) into a proper onecite/pipeline/ package with one module per stage (parser.py / identifier.py / enricher.py / formatter.py) plus a _utils.py for shared helpers. Public imports (from onecite.pipeline import IdentifierModule) and mocking targets (patch("onecite.pipeline.requests.get", ...)) continue to work unchanged because __init__.py re-exports every public symbol and keeps requests at the package level.
Unify CrossRef request and parsing methods; all CrossRef calls now go through a single helper with a proper User-Agent header and mailto query-string parameter.
Rewrite fuzzy-search scoring as a weighted title / author / year / venue model with three confidence tiers (auto-adopt / interactive / cautious) and a unified low-confidence threshold.
Simplify identifier routing; CrossRef and Semantic Scholar are always consulted for text queries, with signal-based additional queries to PubMed / Google Books / OpenAIRE / BASE.
Use bibtexparser.dumps() for BibTeX rendering.
Expose use_google_scholar as a real CLI flag and API parameter instead of a hard-coded False.
Clarify that templates define metadata-field requirements and a fallback BibTeX entry type, not output formatting.
Refactored exception hierarchy
Added type hints to Python API
Updated README examples
Bumped minimum Python version declaration in docs to 3.10
Updated CI actions to latest versions (checkout v4, setup-python v5)
Updated copyright year to 2024-2025
Fixed Documentation URL in pyproject.toml to point to GitHub Pages
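
As an illustration of the unified CrossRef helper mentioned above, the sketch below builds a DOI lookup with both a User-Agent header and a mailto query-string parameter. The function name, the `tool` string, and the return shape are hypothetical. The `https://api.crossref.org/works/{doi}` endpoint and the `mailto` parameter are part of the public CrossRef REST API, but how the project's own helper assembles its requests is not shown in this changelog.

```python
from urllib.parse import quote, urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_request(doi: str, mailto: str, tool: str = "example-tool/0.0"):
    """Return (url, headers) for a DOI lookup; nothing is sent here."""
    # The DOI goes in the URL path, so percent-encode everything except "/".
    path_doi = quote(doi, safe="/")
    query = urlencode({"mailto": mailto})
    url = f"{CROSSREF_WORKS}/{path_doi}?{query}"
    # Hypothetical User-Agent format: tool name plus a contact address.
    headers = {"User-Agent": f"{tool} (mailto:{mailto})"}
    return url, headers
```

The returned pair can be handed to whichever HTTP client is in use, for example `requests.get(url, headers=headers, timeout=10)`.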

Removed

APA and MLA output renderers; they produced inconsistent output and the CLI now rejects anything other than --output-format bibtex. Users wanting APA/MLA should post-process the BibTeX through pandoc or citeproc-py.
Hard-coded "well-known paper" shortcut that masked failures on the main example input.
MCP integration page and all related references
.readthedocs.yml (docs now hosted on GitHub Pages)
docs/_build/ build artifacts from repository

Fixed

README / docs/index.rst / docs/faq.rst no longer advertise OpenAlex or dblp as data sources — they were never wired into the code.
README quick-start example now shows booktitle (NeurIPS) instead of journal = "arXiv preprint" for the @inproceedings sample.
docs/api/pipeline.rst rewritten to match the actual module structure; removed references to classes and methods that never existed (Validator / Identifier / Completer / Formatter, set_source_priority, set_timeout, add_template_path).
docs/output_formats.rst, docs/faq.rst, docs/quick_start.rst, docs/python_api.rst, docs/templates.rst, docs/index.rst and docstrings in core.py / formatter.py no longer advertise APA / MLA output.
CrossRef author names are now parsed as given name followed by family name instead of mangled concatenations.
Semantic Scholar HTTP 429 responses return an empty candidate list cleanly instead of bubbling up.
Previously-unused exception classes (ParseError, ValidationError, FormatError) are now actually raised in the right places.
CONTRIBUTING.md no longer tells developers to use a requirements.txt that does not exist; the documented install is pip install -e .[dev].
black formatting is enforced via pyproject.toml [tool.black] plus a pre-commit hook.
URL-bearing entries are no longer queried twice.
Fallback paths mark entries as identification_failed rather than fabricating plausible-looking but invented metadata.
CrossRef and Semantic Scholar response parsing edge cases
API documentation using incorrect return value fields (output_content -> results)
Version number inconsistencies across metadata files
Python version requirement inconsistencies in docs (3.7 -> 3.10)
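
The author-name fix listed above can be pictured with a small sketch. CrossRef returns each author as a JSON object with `given` and `family` keys (organisational authors carry a single `name` key instead). The function name and the choice to join authors with " and ", as BibTeX expects, are assumptions made for illustration.

```python
from typing import Iterable, Mapping


def format_crossref_authors(authors: Iterable[Mapping[str, str]]) -> str:
    """Join CrossRef author records as 'Given Family and Given Family'."""
    names = []
    for author in authors:
        given = author.get("given", "").strip()
        family = author.get("family", "").strip()
        if given or family:
            # Keep whichever parts exist, given name first.
            names.append(" ".join(part for part in (given, family) if part))
        elif author.get("name"):
            # Organisational authors have no given/family split.
            names.append(author["name"].strip())
    return " and ".join(names)
```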

[0.0.11] - 2024-10-19

Added

Custom YAML-based template system
Support for multiple output formats (BibTeX, APA, MLA)
Interactive mode for ambiguous reference selection
Support for DOI, arXiv, PMID, ISBN, and GitHub identifiers
Integration with 9 major academic data sources
Test suite

Changed

Refactored core processing pipeline
Reordered data source priority (CrossRef first for DOI queries)
Clearer error messages on failed lookups

Fixed

Encoding issues with non-ASCII characters in author names
DOI parsing for URLs with trailing query strings
Python 3.10 compatibility issues
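
To make the DOI-parsing fix above concrete, one way to ignore a trailing query string is to split the URL first and search only its path. This is a hedged sketch: `doi_from_url` and the loose DOI pattern are hypothetical and may differ from what the project actually does.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# Loose DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
_DOI_RE = re.compile(r"10\.\d{4,9}/\S+")


def doi_from_url(url: str) -> Optional[str]:
    """Extract a DOI from a URL path, ignoring any query string or fragment."""
    # urlsplit separates "?utm_source=..." and "#..." from the path for us.
    path = unquote(urlsplit(url).path)
    match = _DOI_RE.search(path)
    # Trailing punctuation is usually sentence residue, not part of the DOI.
    return match.group(0).rstrip(".,;") if match else None
```

For example, `doi_from_url("https://doi.org/10.1007/s10462-019-09792-7?utm_source=feed")` yields the bare DOI without the tracking parameter.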

[0.0.10] - 2024-10-01

Added

Initial Python API
Basic citation processing
Support for journal articles and conference papers

Changed

Better title matching for fuzzy searches

Fixed

PubMed API response handling
Semantic Scholar rate limit handling

[0.0.9] and Earlier

See GitHub Releases for details on older versions.
