Fix/chinadoi fallback support by wenh06 · Pull Request #23 · DeepPSP/bib_lookup

wenh06 · 2026-01-29T09:06:24Z

This PR addresses the issue where DOIs registered via ChinaDOI (e.g., CNKI publications like 10.13748/j.cnki...) failed to resolve. These sources often do not support application/x-bibtex content negotiation and return 403 Forbidden or redirect to an intermediate HTML page.

The streamlit app is also updated along with this change.

codecov · 2026-01-29T09:08:29Z

Codecov Report

❌ Patch coverage is 88.00000% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (master@b45ba29). Learn more about missing BASE report.
⚠️ Report is 7 commits behind head on master.

Files with missing lines	Patch %	Lines
bib_lookup/bib_lookup.py	88.00%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##             master      #23   +/-   ##
=========================================
  Coverage          ?   97.04%           
=========================================
  Files             ?        8           
  Lines             ?     1254           
  Branches          ?        0           
=========================================
  Hits              ?     1217           
  Misses            ?       37           
  Partials          ?        0

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copilot

Pull request overview

Adds a fallback path for DOIs that don’t support application/x-bibtex content negotiation (notably ChinaDOI/CNKI), and updates the Streamlit UI to surface a manual-visit link when automatic BibTeX lookup isn’t possible.

Changes:

Switch DOI resolution to use GET + content validation for BibTeX responses, with a browser-header redirect probe for ChinaDOI/CNKI targets.
Add Streamlit handling for “manual check required” lookup results (warn + link to resolved article page).
Add an integration test case for a ChinaDOI-style DOI and document the fix in the changelog.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File	Description
`bib_lookup/bib_lookup.py`	Implements GET-based DOI lookup and a redirect-fallback message path for ChinaDOI/CNKI.
`streamlit_app.py`	Detects redirect-flag results and shows a warning + link button instead of treating it as BibTeX.
`test/test_doi_bib_lookup.py`	Adds coverage for the new ChinaDOI fallback behavior.
`CHANGELOG.rst`	Documents the ChinaDOI/CNKI fallback fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-29T09:15:36Z

+        # Standard Attempt: Use API (Crossref/DataCite)
        try:
-            r = self.session.post(**feed_content)
-            res = r.content.decode("utf-8")
-        except requests.Timeout:
-            res = self.timeout_err
-        except requests.RequestException:
-            res = self.network_err
-        if self.verbose > 3:
-            print_func(f"via `_handle_doi`, fetched content = {res}")
-        return res
+            r = self.session.get(**feed_content)
+
+            if r.status_code == 200:
+                res = r.content.decode("utf-8")
+
+                if is_requesting_bibtex:
+                    # If we asked for BibTeX, strictly check if it looks like BibTeX (starts with @)
+                    # This filters out ChinaDOI's HTML pages when they return status 200.
+                    if res.strip().startswith("@"):
+                        if self.verbose > 3:
+                            print_func(f"via `_handle_doi`, fetched content = {res}")
+                        return res
+                else:
+                    # If we asked for Text/XML/RIS, trust the 200 OK response.
+                    # This fixes the bug where format="text" was returning Network Error.
+                    if self.verbose > 3:
+                        print_func(f"via `_handle_doi`, fetched content = {res}")
+                    return res
+
+        except (requests.Timeout, requests.RequestException):
+            pass
+


_handle_doi currently swallows requests.Timeout / RequestException (just pass) and later returns self.network_err, which loses the timeout vs network distinction and changes previous behavior. Also, by only decoding/returning the body on status_code == 200, responses like 404 that include "DOI Not Found" won’t reach _handle_network_error and may be misreported as Network Error. Consider: (1) returning self.timeout_err on Timeout and self.network_err on other RequestException; and (2) decoding the response body (or handling specific status codes) even when status_code != 200 so error mapping still works.

Copilot · 2026-01-29T09:15:36Z

+            try:
+                r_probe = requests.get(
+                    url, headers=browser_headers, timeout=feed_content.get("timeout", 10), allow_redirects=True
+                )
+
+                final_url = r_probe.url
+
+                # Check for ChinaDOI / CNKI
+                if "chndoi.org" in final_url or "cnki" in final_url:
+                    msg = f"[Manual Check Required] Automatic lookup failed. Please visit: {final_url}"
+                    if self.verbose >= 0:
+                        print("-" * 60)
+                        print(f"[!] {msg}")
+                        print("-" * 60)
+                    return msg


The fallback probe uses requests.get(...) instead of self.session.get(...), which bypasses the session configuration this class sets up (connection pooling/retries/proxies, and any user customization of self.session). It also prints unconditionally for the manual-check case because self.verbose >= 0 is always true at default verbosity, which will produce unexpected stdout in normal library usage. Use self.session.get(...) and gate printing behind the same verbosity convention used elsewhere (e.g., self.verbose >= 1).

Copilot · 2026-01-29T09:15:37Z

+                        print_func(f"via `_handle_doi`, fetched content = {res}")
+                    return res
+
+        except (requests.Timeout, requests.RequestException):


'except' clause does nothing but pass and there is no explanatory comment.

Copilot · 2026-01-29T09:15:37Z

+                        print("-" * 60)
+                    return msg
+
+            except Exception:


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change

except Exception:

except Exception:

# Intentionally ignore all errors from this best-effort probe.

# If this browser-style request fails, we simply fall back to `self.network_err` below.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

wenh06 added 2 commits January 28, 2026 21:58

Fix DOI lookup for ChinaDOI sources

3377b0c

fix error handling formats other than bibtex

61f02b6

Copilot AI review requested due to automatic review settings January 29, 2026 09:06

Copilot started reviewing on behalf of wenh06 January 29, 2026 09:06 View session

wenh06 enabled auto-merge January 29, 2026 09:06

wenh06 self-assigned this Jan 29, 2026

wenh06 added bug Something isn't working enhancement New feature or request labels Jan 29, 2026

Copilot AI reviewed Jan 29, 2026

View reviewed changes

wenh06 and others added 4 commits February 22, 2026 09:42

Update bib_lookup/bib_lookup.py

83fdbfc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update bib_lookup/bib_lookup.py

00b9496

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix bugs

27d5f4b

Fix DOI fallback logic and suppress BeautifulSoup warnings

d35ed3f

wenh06 disabled auto-merge February 22, 2026 07:21

wenh06 merged commit 99d3c16 into master Feb 22, 2026
10 checks passed

wenh06 deleted the fix/chinadoi-fallback-support branch February 22, 2026 07:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/chinadoi fallback support#23

Fix/chinadoi fallback support#23
wenh06 merged 6 commits intomasterfrom
fix/chinadoi-fallback-support

wenh06 commented Jan 29, 2026

Uh oh!

codecov Bot commented Jan 29, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI Jan 29, 2026

Uh oh!

Copilot AI Jan 29, 2026

Uh oh!

Uh oh!

Copilot AI Jan 29, 2026

Uh oh!

Copilot AI Jan 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

wenh06 commented Jan 29, 2026

Uh oh!

codecov Bot commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Jan 29, 2026 •

edited

Loading