Literature Search Pipeline — Neurosymbolic Creative AI

Code for the systematic literature search underpinning the review paper "Imbuing Creativity into Human-AI Systems: A Neurosymbolic Approach" (submitted to Nature Machine Intelligence).

The search followed PRISMA guidelines and the PRISMA-ScR extension for scoping reviews. Two databases were queried across five thematic clusters: neurosymbolic AI & creativity, cognitive architecture & dual-process theory, uncertainty & exploration in LLMs, ensemble & compositional methods, and psychedelic neuroscience & creativity.

Repository structure

code/
├── google-scholar/
│   ├── 1-keyword-search/          # SerpAPI keyword search + 18 query CSVs
│   │   └── queries/               # One CSV per search query
│   ├── 2-merge-deduplicate/       # Merge query results and remove duplicates
│   ├── 3-automated-prescreening/  # Automated inclusion/exclusion criteria
│   └── 4-chatgpt-screening/       # ChatGPT third-rater screening (GPT-5.2)
│
└── pubmed/
    ├── 1-semantic-search/         # BigQuery SQL — semantic vector search on PubMed Central
    ├── 2-deduplicate/             # Deduplication by PMC ID
    ├── 3-automated-prescreening/  # Automated inclusion/exclusion criteria
    ├── 4-chatgpt-screening/       # ChatGPT third-rater screening (GPT-5.2)
    └── 5-gemini-crossvalidation/  # Gemini cross-validation (gemini-2.0-flash)

Pipeline overview

Google Scholar (`google-scholar/`)

Step	Folder	Description
1	`1-keyword-search/`	Fetches results for 18 keyword queries via SerpAPI
2	`2-merge-deduplicate/`	Merges all query CSVs and removes duplicates
3	`3-automated-prescreening/`	Applies automated inclusion/exclusion (year, keywords, venue, preprint check)
4	`4-chatgpt-screening/`	Sends borderline records to ChatGPT for third-rater screening
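Step 2 merges the per-query CSVs and drops duplicates. The exact matching rule used in `2-merge-deduplicate/` is not described here. The sketch below is one plausible approach, not the repository's code. It assumes records are dicts with hypothetical `title` and `year` keys and treats two records as duplicates when their case- and punctuation-normalized titles and years match. It uses only the standard library, so it runs without pandas.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def merge_and_deduplicate(query_results: list[list[dict]]) -> list[dict]:
    """Merge per-query result lists, keeping the first occurrence of each paper.

    Hypothetical duplicate key: (normalized title, year).
    """
    seen: set[tuple[str, str]] = set()
    merged: list[dict] = []
    for results in query_results:
        for record in results:
            key = (normalize_title(record.get("title", "")), str(record.get("year", "")))
            if key in seen:
                continue  # already kept from an earlier query
            seen.add(key)
            merged.append(record)
    return merged


# Hypothetical example records from two queries.
q1 = [{"title": "A Neurosymbolic Example: Part One", "year": 2023}]
q2 = [{"title": "a neurosymbolic example -- part one", "year": 2023},
      {"title": "Another Example Paper", "year": 2021}]
print(len(merge_and_deduplicate([q1, q2])))  # 2
```

Keeping the first occurrence means the surviving metadata comes from whichever query CSV is read first. A fuzzy match on titles would catch more near-duplicates, at the cost of occasional false merges.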

PubMed Central (`pubmed/`)

Step	Folder	Description
1	`1-semantic-search/`	BigQuery SQL using `text-embedding-005` semantic search across 5 cluster queries
2	`2-deduplicate/`	Cleans and deduplicates by PMC ID
3	`3-automated-prescreening/`	Automated screening with semantic distance thresholds (include if < 0.82, exclude if > 0.88; records in between are borderline)
4	`4-chatgpt-screening/`	ChatGPT screening of borderline records
5	`5-gemini-crossvalidation/`	Gemini cross-validation for PubMed candidates
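Step 3 of the PubMed pipeline sorts records into three bands using the distance thresholds above. This is a minimal sketch. It assumes each record carries a hypothetical `distance` field holding the value returned by the semantic search, with lower meaning closer to the cluster query. It also assumes that records between the two thresholds are the borderline ones passed to step 4. The handling of values exactly at 0.82 or 0.88 is an assumption as well.

```python
INCLUDE_BELOW = 0.82  # distance < 0.82 -> auto-include
EXCLUDE_ABOVE = 0.88  # distance > 0.88 -> auto-exclude


def triage(distance: float) -> str:
    """Map a semantic distance to 'include', 'exclude' or 'borderline'."""
    if distance < INCLUDE_BELOW:
        return "include"
    if distance > EXCLUDE_ABOVE:
        return "exclude"
    return "borderline"  # 0.82 <= distance <= 0.88: sent to ChatGPT screening


def split_by_band(records: list[dict]) -> dict[str, list[dict]]:
    """Group records by triage band."""
    bands: dict[str, list[dict]] = {"include": [], "borderline": [], "exclude": []}
    for record in records:
        bands[triage(record["distance"])].append(record)
    return bands


# Hypothetical PMC IDs and distances.
records = [
    {"pmcid": "PMC0000001", "distance": 0.79},
    {"pmcid": "PMC0000002", "distance": 0.85},
    {"pmcid": "PMC0000003", "distance": 0.91},
]
for band, items in split_by_band(records).items():
    print(band, [r["pmcid"] for r in items])
```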

Search results summary

Source	Initial	After dedup	After auto-filter	Manual review	Included
Google Scholar	1,240	1,113	—	33	24
PubMed Central	232	232	—	46	19
Combined	1,472	1,345	79	79	43
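As an arithmetic cross-check, the table's rows give the per-stage exclusion counts that a PRISMA flow diagram reports. This sketch only takes differences of the numbers above. The per-source auto-filter cells are blank, so it assumes each source's manual-review count equals the number of records that passed its automated filter. That assumption is consistent with the combined row (79 = 33 + 46).

```python
# Counts copied from the search results summary table above.
INITIAL = {"Google Scholar": 1240, "PubMed Central": 232}
AFTER_DEDUP = {"Google Scholar": 1113, "PubMed Central": 232}
MANUAL_REVIEW = {"Google Scholar": 33, "PubMed Central": 46}
INCLUDED = {"Google Scholar": 24, "PubMed Central": 19}


def stage_exclusions(source: str) -> tuple[int, int, int]:
    """Return (duplicates removed, auto-excluded, excluded at manual review)."""
    return (
        INITIAL[source] - AFTER_DEDUP[source],
        AFTER_DEDUP[source] - MANUAL_REVIEW[source],
        MANUAL_REVIEW[source] - INCLUDED[source],
    )


# The Combined row should equal the column sums.
assert sum(INITIAL.values()) == 1472
assert sum(AFTER_DEDUP.values()) == 1345
assert sum(MANUAL_REVIEW.values()) == 79
assert sum(INCLUDED.values()) == 43

for source in INITIAL:
    dup, auto, manual = stage_exclusions(source)
    print(f"{source}: {dup} duplicates, {auto} auto-excluded, {manual} excluded at manual review")
```

Run as-is, it reports 127, 1,080 and 9 for Google Scholar, and 0, 186 and 27 for PubMed Central. The combined figures (127 duplicates, 1,266 auto-excluded, 36 excluded at manual review) follow by addition.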

Screening protocol

Manual screening used a 2-of-3 majority vote across three independent raters: two reviewers and one AI reviewer (ChatGPT, GPT-5.2; OpenAI). Disagreements were resolved by a designated tiebreaker reviewer. See the paper's Supplementary Methods for the full protocol and prompt template.
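The 2-of-3 rule can be expressed as a small function. This is a sketch, not the protocol's code, and the Supplementary Methods remain the authoritative description. It assumes each rater returns `"include"`, `"exclude"`, or `None` (undecided). It also assumes the tiebreaker is consulted only when no decision reaches two votes. The tiebreaker reviewer is modeled as a hypothetical zero-argument callable.

```python
from collections import Counter
from typing import Callable, Optional

Vote = Optional[str]  # "include", "exclude", or None (no decision)


def majority_decision(votes: list[Vote], tiebreaker: Callable[[], str]) -> str:
    """Return the decision backed by at least 2 of 3 raters, else ask the tiebreaker."""
    if len(votes) != 3:
        raise ValueError("expected exactly three votes")
    counts = Counter(v for v in votes if v is not None)
    for decision, n in counts.items():
        if n >= 2:
            return decision
    return tiebreaker()  # no 2-of-3 majority


# Votes ordered as: reviewer 1, reviewer 2, AI rater (ChatGPT).
print(majority_decision(["include", "exclude", "include"], lambda: "exclude"))  # include
print(majority_decision(["include", None, "exclude"], lambda: "exclude"))       # exclude
```

With strictly binary votes, three raters always produce a majority. The tiebreaker path only matters if a rater can abstain, which is why the sketch allows `None`.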

Requirements

# Google Scholar pipeline
pip install serpapi pandas
pip install openai                # step 4: ChatGPT screening

# PubMed pipeline
pip install google-cloud-bigquery pandas
pip install openai                # step 4: ChatGPT screening
pip install google-generativeai   # step 5: Gemini cross-validation

API keys required: SERPAPI_KEY, OPENAI_API_KEY, GOOGLE_API_KEY, and a configured BigQuery project with access to the PubMed Central dataset.
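Before running either pipeline, it can help to fail fast when a credential is missing. This is a minimal sketch. It assumes the keys are read from environment variables with the names listed above. The mapping of keys to pipelines is inferred from the folder layout and is itself an assumption. The BigQuery project is not checked, since it is assumed to be configured through the Google Cloud client's own credentials.

```python
import os
from typing import Mapping

# Assumed mapping: SerpAPI + ChatGPT for Google Scholar; ChatGPT + Gemini for PubMed.
REQUIRED_KEYS = {
    "google-scholar": ["SERPAPI_KEY", "OPENAI_API_KEY"],
    "pubmed": ["OPENAI_API_KEY", "GOOGLE_API_KEY"],
}


def missing_keys(pipeline: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required API keys that are unset or empty for a pipeline."""
    return [key for key in REQUIRED_KEYS[pipeline] if not env.get(key)]


if __name__ == "__main__":
    for name in REQUIRED_KEYS:
        missing = missing_keys(name)
        print(f"{name}: {'OK' if not missing else 'missing ' + ', '.join(missing)}")
```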

Inclusion / exclusion criteria

Include:

Peer-reviewed journal articles or conference proceedings
Published 2020–2025 (2015+ for foundational neuroscience)
≥ 3 keywords matching the five thematic clusters
Top-tier or well-regarded academic venue

Exclude:

Editorials, commentaries, preprints, grey literature, dissertations
Semantic distance > 0.88 (PubMed auto-exclusion)
< 2 thematic keywords
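The automated part of these criteria can be sketched as a rule function. Everything concrete here is an assumption for illustration. `CLUSTER_KEYWORDS` is a hypothetical stand-in for the real keyword lists, and the record fields (`type`, `year`, `text`, `foundational_neuro`) are invented. Records with exactly two keyword hits meet neither the ≥ 3 inclusion rule nor the < 2 exclusion rule, so they are treated as borderline. The venue criterion is not modeled.

```python
import re

# Hypothetical keyword list; the real per-cluster lists are not reproduced here.
CLUSTER_KEYWORDS = {
    "neurosymbolic", "creativity", "dual-process", "cognitive architecture",
    "uncertainty", "exploration", "ensemble", "compositional", "psychedelic",
}
EXCLUDED_TYPES = {"editorial", "commentary", "preprint", "grey literature", "dissertation"}


def keyword_hits(text: str) -> int:
    """Count distinct cluster keywords appearing as whole words or phrases."""
    lowered = text.lower()
    return sum(1 for kw in CLUSTER_KEYWORDS
               if re.search(r"\b" + re.escape(kw) + r"\b", lowered))


def prescreen(record: dict) -> str:
    """Return 'include', 'exclude', or 'borderline' for one bibliographic record."""
    if record["type"] in EXCLUDED_TYPES:
        return "exclude"
    # 2015+ allowed only for foundational neuroscience (hypothetical flag).
    earliest = 2015 if record.get("foundational_neuro") else 2020
    if not earliest <= record["year"] <= 2025:
        return "exclude"
    hits = keyword_hits(record["text"])
    if hits >= 3:
        return "include"
    if hits < 2:
        return "exclude"
    return "borderline"  # exactly 2 hits: neither rule applies


print(prescreen({"type": "article", "year": 2023,
                 "text": "Neurosymbolic models of creativity and exploration"}))  # include
```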

License

MIT License. See LICENSE for details.
