Commit bfdc83f

Re-enable proxy with version constraint for fix
Based on scholarly PR #579, which fixes the FreeProxies compatibility issue, re-enable proxy support with a version constraint (>=1.7.12) so that the fix is included. This should resolve the "Client.__init__() got an unexpected keyword argument 'proxies'" error while keeping proxy support for better reliability. Ref: scholarly-python-package/scholarly#579
1 parent b4107ca commit bfdc83f

2 files changed

Lines changed: 20 additions & 7 deletions

_scripts/environment.yml

Lines changed: 1 addition & 1 deletion
@@ -9,4 +9,4 @@ dependencies:
   - markdown
   # Google Scholar crawler (gscrawler.py)
   - pandas
-  - scholarly
+  - scholarly>=1.7.12 # Requires version with proxy fix (PR #579)
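
The constraint above only takes effect when the environment is rebuilt, so an existing environment may still hold an older scholarly. A minimal sketch of a runtime guard follows. The helper names `parse_version`, `meets_minimum`, and `scholarly_is_new_enough` are hypothetical and not part of this commit, and the parser assumes plain dotted numeric versions (any pre-release suffix is ignored).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum taken from the constraint scholarly>=1.7.12 in environment.yml.
MIN_SCHOLARLY = (1, 7, 12)


def parse_version(text):
    """Turn '1.7.12' into (1, 7, 12), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first part with no leading digits, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_SCHOLARLY):
    """Compare a version string against the minimum tuple."""
    return parse_version(installed) >= minimum


def scholarly_is_new_enough():
    """Return False when scholarly is missing or older than the minimum."""
    try:
        return meets_minimum(version("scholarly"))
    except PackageNotFoundError:
        return False
```

A caller could check `scholarly_is_new_enough()` before enabling the proxy and log a hint to rebuild the environment when it returns False.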

_scripts/gscrawler.py

Lines changed: 19 additions & 6 deletions
@@ -9,6 +9,7 @@

 import pandas as pd
 from bs4 import BeautifulSoup
+from scholarly import ProxyGenerator

 # Configure logging
 logging.basicConfig(
@@ -101,14 +102,26 @@ def clean_journal_name(journal):


 def setup_proxy():
-    """Setup proxy to avoid Google Scholar blocking.
+    """Setup free proxy to avoid Google Scholar blocking.

-    Note: Free proxy setup is currently disabled due to compatibility issues.
-    The scholarly package's built-in session management and user-agent
-    handling should be sufficient to avoid most blocking.
+    Uses FreeProxies from the scholarly package to rotate through
+    free proxy servers and avoid 403 errors.
+
+    Requires scholarly>=1.7.12 which includes the proxy fix from PR #579.
     """
-    # Proxy setup disabled - scholarly's session handling is sufficient
-    logger.info("Using scholarly's default session (proxy disabled due to compatibility issues)")
+    try:
+        logger.info("Setting up free proxy to avoid blocking...")
+        from scholarly import scholarly
+        pg = ProxyGenerator()
+        success = pg.FreeProxies()
+        if success:
+            scholarly.use_proxy(pg)
+            logger.info("Proxy setup successful")
+        else:
+            logger.warning("Free proxy setup returned False, continuing without proxy...")
+    except Exception as e:
+        logger.warning(f"Proxy setup failed: {e}. Continuing without proxy...")
+        # Continue without proxy - scholarly will use direct connection


 def get_author_publications_html(user_id):
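
The fallback behaviour of `setup_proxy()` can be exercised without network access. The sketch below restates the same control flow with the proxy generator and the client passed in as parameters. `try_enable_proxy`, `StubGenerator`, and `StubClient` are hypothetical names used only for illustration; they are not part of this commit or of scholarly. Only `FreeProxies()` and `use_proxy()` are taken from the diff above.

```python
import logging

logger = logging.getLogger("gscrawler_sketch")


def try_enable_proxy(generator, client):
    """Return True if a proxy was enabled, False when falling back to a direct connection."""
    try:
        if generator.FreeProxies():
            client.use_proxy(generator)
            return True
        logger.warning("Free proxy setup returned False, continuing without proxy...")
    except Exception as exc:
        # Any failure during setup is logged and treated as "no proxy".
        logger.warning("Proxy setup failed: %s. Continuing without proxy...", exc)
    return False


class StubGenerator:
    """Stand-in for a proxy generator; returns a fixed result or raises a fixed error."""

    def __init__(self, result=True, error=None):
        self.result = result
        self.error = error

    def FreeProxies(self):
        if self.error is not None:
            raise self.error
        return self.result


class StubClient:
    """Stand-in for the client object; records the generator it was given."""

    def __init__(self):
        self.proxy = None

    def use_proxy(self, generator):
        self.proxy = generator
```

With the real objects, the equivalent call would presumably be `try_enable_proxy(ProxyGenerator(), scholarly)`. Returning a boolean, rather than only logging, lets the caller decide whether to slow down requests when no proxy is active.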
