All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Maintenance release focused on aligning the abstract-retrieval semantics across code, templates, docs, tests, and metadata. No breaking public-API changes; the one renamed kwarg keeps its old name as a deprecated alias for this release cycle.
- Abstract retrieval now falls back through a DOI-only cascade when CrossRef does not return an abstract: Semantic Scholar (`/paper/DOI:{doi}?fields=abstract`) → PubMed (ESearch DOI→PMID, then EFetch PMID→abstract). The cascade is only invoked when the user's original raw input carried a DOI; DOIs inferred by fuzzy search do not trigger it, so that a possibly-wrong candidate does not cost extra round trips. In particular, a local BibTeX entry with no DOI field does not trigger the abstract cascade, regardless of whether other stages would later resolve one.
- Semantic Scholar search results now carry the `abstract` field, which propagates through `_convert_search_metadata` into the final BibTeX output whenever the identification stage already resolved the entry through Semantic Scholar.
- `EnricherModule._get_semantic_scholar_abstract(doi)` helper for DOI-based Semantic Scholar abstract retrieval. Handles `404` / `429` gracefully by returning `None`.
- `_complete_fields` gained an `allow_abstract_fallback` kwarg (default `False`) that gates the new cascade. `_enrich_single_entry` passes `True` only when the raw entry contributed a DOI.
- Default `journal_article_full` template now lists `abstract` as an optional field, so the template declaration matches what the enricher emits. The older `journal_article_with_abstract` template is retained as a compatibility alias and will stay available for at least one release cycle.
- Regression test `test_enrich_single_entry_no_doi_in_raw_skips_abstract_fallback` pinning the "no-DOI-in-raw ⇒ no Semantic Scholar / PubMed network call" guarantee at the `_enrich_single_entry` layer, so a future refactor of the `raw_has_doi` gate cannot silently start leaking network calls for local-only inputs.
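The gating rule described above can be illustrated with a minimal sketch. It is not the project's implementation: `resolve_abstract`, the provider callables, and the example DOI are hypothetical names introduced here, and the providers are stubs so that no network access is needed.

```python
from typing import Callable, Optional

# Each provider takes a DOI and returns an abstract, or None on a miss.
AbstractProvider = Callable[[str], Optional[str]]


def resolve_abstract(
    crossref_abstract: Optional[str],
    doi: Optional[str],
    raw_has_doi: bool,
    providers: list[AbstractProvider],
) -> Optional[str]:
    """Return the CrossRef abstract, else the first provider hit, else None."""
    if crossref_abstract:
        return crossref_abstract
    # Gate: only a DOI present in the raw input may trigger the cascade.
    if not (raw_has_doi and doi):
        return None
    for provider in providers:
        abstract = provider(doi)
        if abstract:
            return abstract
    return None


calls: list[str] = []  # records which stub providers were consulted


def fake_semantic_scholar(doi: str) -> Optional[str]:
    calls.append("semantic_scholar")
    return None  # simulate a miss so the cascade moves on


def fake_pubmed(doi: str) -> Optional[str]:
    calls.append("pubmed")
    return "Abstract from PubMed"


providers = [fake_semantic_scholar, fake_pubmed]
with_doi = resolve_abstract(None, "10.1000/example", True, providers)
without_doi = resolve_abstract(None, "10.1000/example", False, providers)
```

In this sketch the second call returns `None` without touching either stub, which is the behaviour the regression test is meant to pin.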
- `_get_pubmed_abstract` now requires a DOI and no longer falls back to PubMed title search. The removed title-based path empirically returned the abstract of an unrelated paper (e.g., the Zhang 2020 AI Review DOI `10.1007/s10462-019-09792-7` pulled the abstract of a different RSI segmentation paper), which is strictly worse than returning `None` for downstream semantic cross-checks such as the `sci` skill.
- Abstract coverage on an internal 10-DOI cross-publisher spot-check (Nature, Science, PLOS, Cell, IEEE CVPR, Frontiers, arXiv, Springer, ACM, plus one deliberately invalid DOI) rose from 4/9 to 8/9. This number is a local indicator, not a release gate: reproducing it requires a live network and the probe scripts are no longer in the repository.
- `_complete_fields(..., allow_pubmed_fallback=...)` is deprecated in favour of `allow_abstract_fallback`. The old name still works for one release cycle and emits `DeprecationWarning`. It was renamed because the flag actually gates the entire Semantic Scholar + PubMed cascade, not PubMed alone.
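A deprecated keyword alias of this kind is commonly implemented along the following lines. This is a sketch under assumptions: `complete_fields` is a hypothetical stand-in for `_complete_fields`, whose real signature is not shown here, and letting the old name override the new one when both are passed is an assumption as well.

```python
import warnings
from typing import Optional


def complete_fields(
    entry: dict,
    allow_abstract_fallback: bool = False,
    allow_pubmed_fallback: Optional[bool] = None,  # deprecated alias
) -> dict:
    """Hypothetical stand-in that only reports the effective flag value."""
    if allow_pubmed_fallback is not None:
        warnings.warn(
            "allow_pubmed_fallback is deprecated; use allow_abstract_fallback",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        # Assumption: the old name, when given, sets the new flag.
        allow_abstract_fallback = allow_pubmed_fallback
    return {**entry, "abstract_fallback_enabled": allow_abstract_fallback}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    old_style = complete_fields({"doi": "10.1000/example"}, allow_pubmed_fallback=True)
    new_style = complete_fields({"doi": "10.1000/example"}, allow_abstract_fallback=True)
```

Both calls produce the same result; only the old spelling records a `DeprecationWarning`.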
- Removed `IdentifierModule._check_doi_content_consistency` and the `consistency_score` / `low_consistency` warning path. A fuzzy string-similarity score on bibliographic fields is not a reliable signal for detecting fabricated references, and it was only emitted as a `logger.warning` that downstream tools could not act on. Citation-authenticity verification belongs at the abstract-vs-claim semantic layer in the consuming tool, not at the bibliographic-string layer here.
First formal PyPI release since 0.0.12.
- RST documentation using Sphinx
- Full API reference documentation
- FAQ section with common questions
- Contributing guidelines
- Pre-commit hooks configuration
- Google-style docstrings with Args/Returns for all public API functions
- Auto-deploy documentation to GitHub Pages via CI
- Split monolithic `pipeline.py` (~3000 lines) into a proper `onecite/pipeline/` package with one module per stage (`parser.py` / `identifier.py` / `enricher.py` / `formatter.py`) plus a `_utils.py` for shared helpers. Public imports (`from onecite.pipeline import IdentifierModule`) and mocking targets (`patch("onecite.pipeline.requests.get", ...)`) continue to work unchanged because `__init__.py` re-exports every public symbol and keeps `requests` at the package level.
- Unify CrossRef request and parsing methods; all CrossRef calls now go through a single helper with a proper `User-Agent` header and `mailto` query-string parameter.
- Rewrite fuzzy-search scoring as a weighted title / author / year / venue model with three confidence tiers (auto-adopt / interactive / cautious) and a unified low-confidence threshold.
- Simplify identifier routing; CrossRef and Semantic Scholar are always consulted for text queries, with signal-based additional queries to PubMed / Google Books / OpenAIRE / BASE.
- Use `bibtexparser.dumps()` for BibTeX rendering.
- Expose `use_google_scholar` as a real CLI flag and API parameter instead of a hard-coded `False`.
- Clarify that templates define metadata-field requirements and a fallback BibTeX entry type, not output formatting.
- Refactored exception hierarchy
- Added type hints to Python API
- Updated README examples
- Bumped minimum Python version declaration in docs to 3.10
- Updated CI actions to latest versions (checkout v4, setup-python v5)
- Updated copyright year to 2024-2025
- Fixed Documentation URL in pyproject.toml to point to GitHub Pages
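The weighted title / author / year / venue scoring with three confidence tiers mentioned in this release can be pictured roughly as below. Everything numeric here is an assumption: the weights, the two thresholds, the use of `difflib` for string similarity, and the way scores map to tiers are illustrative only and are not taken from the project.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds, chosen only for illustration.
WEIGHTS = {"title": 0.6, "author": 0.2, "year": 0.1, "venue": 0.1}
AUTO_ADOPT = 0.85      # at or above: accept without asking
LOW_CONFIDENCE = 0.60  # below: treat the candidate cautiously


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; empty input scores 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(query: dict, candidate: dict) -> float:
    """Weighted sum of per-field similarities."""
    year_match = bool(query.get("year")) and query.get("year") == candidate.get("year")
    parts = {
        "title": similarity(query.get("title", ""), candidate.get("title", "")),
        "author": similarity(query.get("author", ""), candidate.get("author", "")),
        "year": 1.0 if year_match else 0.0,
        "venue": similarity(query.get("venue", ""), candidate.get("venue", "")),
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


def tier(value: float) -> str:
    """Map a score to one of the three tiers named in the changelog."""
    if value >= AUTO_ADOPT:
        return "auto-adopt"
    if value >= LOW_CONFIDENCE:
        return "interactive"
    return "cautious"


query = {"title": "Deep residual learning for image recognition",
         "author": "He", "year": 2016, "venue": "CVPR"}
exact = dict(query)  # a candidate identical to the query
unrelated = {"title": "Soil moisture dynamics in wheat fields",
             "author": "Zhang", "year": 2020, "venue": "Agronomy Journal"}

exact_tier = tier(score(query, exact))
unrelated_tier = tier(score(query, unrelated))
```

Weighting the title most heavily reflects the fact that free-text queries usually carry a reliable title and less reliable author or venue strings; that choice is part of the sketch, not a documented property of the scorer.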
- Removed the APA and MLA output renderers; they produced inconsistent output, and the CLI now rejects anything other than `--output-format bibtex`. Users wanting APA/MLA should post-process the BibTeX through pandoc or citeproc-py.
- Removed the hard-coded "well-known paper" shortcut that masked failures on the main example input.
- Removed the MCP integration page and all related references.
- Removed `.readthedocs.yml` (docs now hosted on GitHub Pages).
- Removed `docs/_build/` build artifacts from the repository.
- README / `docs/index.rst` / `docs/faq.rst` no longer advertise OpenAlex or dblp as data sources; they were never wired into the code.
- README quick-start example now shows `booktitle` (NeurIPS) instead of `journal = "arXiv preprint"` for the `@inproceedings` sample.
- `docs/api/pipeline.rst` rewritten to match the actual module structure; removed references to classes and methods that never existed (`Validator` / `Identifier` / `Completer` / `Formatter`, `set_source_priority`, `set_timeout`, `add_template_path`).
- `docs/output_formats.rst`, `docs/faq.rst`, `docs/quick_start.rst`, `docs/python_api.rst`, `docs/templates.rst`, `docs/index.rst` and docstrings in `core.py` / `formatter.py` no longer advertise APA / MLA output.
- CrossRef author names parsed as `given family` instead of mangled concatenations.
- Semantic Scholar HTTP 429 responses return an empty candidate list cleanly instead of bubbling up.
- Previously-unused exception classes (`ParseError`, `ValidationError`, `FormatError`) are now actually raised in the right places.
- `CONTRIBUTING.md` no longer tells developers to use a `requirements.txt` that does not exist; the documented install is `pip install -e .[dev]`.
- `black` formatting is enforced via `pyproject.toml` `[tool.black]` plus a pre-commit hook.
- URL-bearing entries are no longer queried twice.
- Fallback paths mark entries as `identification_failed` rather than fabricating plausible-looking but invented metadata.
- CrossRef and Semantic Scholar response parsing edge cases
- API documentation using incorrect return value fields (`output_content` -> `results`)
- Version number inconsistencies across metadata files
- Python version requirement inconsistencies in docs (3.7 -> 3.10)
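For the author-name fix listed above, a minimal sketch of `given family` parsing might look like this. It relies on the `given`, `family`, and `name` keys that the CrossRef REST API uses in author objects; the function name, the sample names, and joining authors with BibTeX's ` and ` separator are assumptions made for illustration.

```python
def format_crossref_authors(authors: list[dict]) -> str:
    """Join CrossRef author objects as 'given family' names, BibTeX style."""
    names = []
    for author in authors:
        given = (author.get("given") or "").strip()
        family = (author.get("family") or "").strip()
        # Keep a space between the parts instead of concatenating them.
        full = " ".join(part for part in (given, family) if part)
        if not full:
            # Organisational authors carry a single "name" key instead.
            full = (author.get("name") or "").strip()
        if full:
            names.append(full)
    return " and ".join(names)


sample = [
    {"given": "Ada", "family": "Lovelace"},
    {"family": "Turing"},            # given name missing
    {"name": "Example Consortium"},  # organisational author
    {},                              # empty record is skipped
]
formatted = format_crossref_authors(sample)
```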
- Custom YAML-based template system
- Support for multiple output formats (BibTeX, APA, MLA)
- Interactive mode for ambiguous reference selection
- Support for DOI, arXiv, PMID, ISBN, and GitHub identifiers
- Integration with 9 major academic data sources
- Test suite
- Refactored core processing pipeline
- Reordered data source priority (CrossRef first for DOI queries)
- Clearer error messages on failed lookups
- Encoding issues with non-ASCII characters in author names
- DOI parsing for URLs with trailing query strings
- Python 3.10 compatibility issues
- Initial Python API
- Basic citation processing
- Support for journal articles and conference papers
- Better title matching for fuzzy searches
- PubMed API response handling
- Semantic Scholar rate limit handling
See GitHub Releases for details on older versions.