criticaldata/creativity-survey

Literature Search Pipeline — Neurosymbolic Creative AI

Code for the systematic literature search underpinning the review paper "Imbuing Creativity into Human-AI Systems: A Neurosymbolic Approach" (submitted to Nature Machine Intelligence).

The search followed PRISMA guidelines and the PRISMA-ScR extension for scoping reviews. Two databases were queried across five thematic clusters: neurosymbolic AI & creativity, cognitive architecture & dual-process theory, uncertainty & exploration in LLMs, ensemble & compositional methods, and psychedelic neuroscience & creativity.


Repository structure

code/
├── google-scholar/
│   ├── 1-keyword-search/          # SerpAPI keyword search + 18 query CSVs
│   │   └── queries/               # One CSV per search query
│   ├── 2-merge-deduplicate/       # Merge query results and remove duplicates
│   ├── 3-automated-prescreening/  # Automated inclusion/exclusion criteria
│   └── 4-chatgpt-screening/       # ChatGPT third-rater screening (GPT-5.2)
│
└── pubmed/
    ├── 1-semantic-search/         # BigQuery SQL — semantic vector search on PubMed Central
    ├── 2-deduplicate/             # Deduplication by PMC ID
    ├── 3-automated-prescreening/  # Automated inclusion/exclusion criteria
    ├── 4-chatgpt-screening/       # ChatGPT third-rater screening (GPT-5.2)
    └── 5-gemini-crossvalidation/  # Gemini cross-validation (gemini-2.0-flash)

Pipeline overview

Google Scholar (google-scholar/)

| Step | Folder | Description |
|------|--------|-------------|
| 1 | 1-keyword-search/ | Fetches results for 18 keyword queries via SerpAPI |
| 2 | 2-merge-deduplicate/ | Merges all query CSVs and removes duplicates |
| 3 | 3-automated-prescreening/ | Applies automated inclusion/exclusion (year, keywords, venue, preprint check) |
| 4 | 4-chatgpt-screening/ | Sends borderline records to ChatGPT for third-rater screening |
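
As a rough illustration of step 2, the sketch below merges several query result sets and drops duplicates by normalized title. It is not the repository's code: the `title` column name, the normalization rule, and the helper names `normalize_title` and `merge_and_deduplicate` are assumptions, and it uses the standard-library `csv` module rather than pandas to stay dependency-free.

```python
import csv
import io
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_and_deduplicate(csv_texts):
    """Merge rows from several CSV documents, keeping the first row seen per normalized title.

    Assumes every CSV has a 'title' column (hypothetical schema).
    """
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = normalize_title(row["title"])
            if key and key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Two made-up query results that share one paper (differing only in case and punctuation).
query_a = "title,year\nNeurosymbolic AI for Creativity,2023\nDual-Process Models,2021\n"
query_b = "title,year\nneurosymbolic AI for creativity.,2023\nEnsemble Methods in LLMs,2024\n"

records = merge_and_deduplicate([query_a, query_b])
print(len(records))
```

Keeping the first occurrence means the order in which query CSVs are read decides which copy of a duplicated record survives.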

PubMed Central (pubmed/)

| Step | Folder | Description |
|------|--------|-------------|
| 1 | 1-semantic-search/ | BigQuery SQL using text-embedding-005 semantic search across 5 cluster queries |
| 2 | 2-deduplicate/ | Cleans and deduplicates by PMC ID |
| 3 | 3-automated-prescreening/ | Automated screening with semantic distance thresholds (include < 0.82, exclude > 0.88) |
| 4 | 4-chatgpt-screening/ | ChatGPT screening of borderline records |
| 5 | 5-gemini-crossvalidation/ | Gemini cross-validation for PubMed candidates |
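
The step 3 thresholds can be read as a three-way triage. The sketch below assumes that records falling between the two thresholds (including exactly 0.82 or 0.88) are the "borderline" ones passed on to step 4. That reading follows from the strict inequalities but is not confirmed by the repository, and the function name `triage` is hypothetical.

```python
INCLUDE_BELOW = 0.82  # semantic distance below this: auto-include
EXCLUDE_ABOVE = 0.88  # semantic distance above this: auto-exclude


def triage(distance: float) -> str:
    """Classify a record by its semantic distance to the cluster query."""
    if distance < INCLUDE_BELOW:
        return "include"
    if distance > EXCLUDE_ABOVE:
        return "exclude"
    # Assumption: everything in [0.82, 0.88] is treated as borderline.
    return "borderline"


for d in (0.80, 0.85, 0.90):
    print(d, triage(d))
```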

Search results summary

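| Source | Initial | After dedup | After auto-filter | Manual review | Included |
|--------|---------|-------------|-------------------|---------------|----------|
| Google Scholar | 1,240 | 1,113 | 33 | 33 | 24 |
| PubMed Central | 232 | 232 | 46 | 46 | 19 |
| Combined | 1,472 | 1,345 | 79 | 79 | 43 |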

Screening protocol

Manual screening used a 2-of-3 majority vote across three independent raters: two human reviewers and one AI rater (ChatGPT, GPT-5.2; OpenAI). Disagreements were resolved by a designated tiebreaker reviewer. See the paper's Supplementary Methods for the full protocol and prompt template.
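
A minimal sketch of the 2-of-3 vote, for illustration only. The rater names, the label strings, and the idea that a rater may return a third label such as "unsure" (leaving no majority, so the record goes to the tiebreaker) are assumptions rather than details taken from the protocol.

```python
from collections import Counter


def majority_decision(votes):
    """Return the label chosen by at least two of three raters, or None if there is no majority.

    `votes` maps a rater name to a label such as "include" or "exclude".
    """
    if len(votes) != 3:
        raise ValueError("expected exactly three raters")
    label, count = Counter(votes.values()).most_common(1)[0]
    return label if count >= 2 else None


votes = {"reviewer_1": "include", "reviewer_2": "exclude", "chatgpt": "include"}
decision = majority_decision(votes)
print(decision if decision is not None else "send to tiebreaker")
```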


Requirements

# Google Scholar pipeline
pip install serpapi pandas

# PubMed pipeline
pip install google-cloud-bigquery pandas
pip install openai               # step 4: ChatGPT screening
pip install google-generativeai  # step 5: Gemini cross-validation

API keys required: SERPAPI_KEY, OPENAI_API_KEY, GOOGLE_API_KEY, and a configured BigQuery project with access to the PubMed Central dataset.
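
Before running the pipeline it can help to check that the keys are set. The sketch below assumes the scripts read them from environment variables with exactly these names, which the repository does not state. The helper `missing_keys` is hypothetical, and the BigQuery project configuration is not checked here.

```python
import os

REQUIRED_KEYS = ("SERPAPI_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY")


def missing_keys(environ=os.environ):
    """Return the required key names that are unset or empty in `environ`."""
    return [name for name in REQUIRED_KEYS if not environ.get(name)]


missing = missing_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys are set.")
```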


Inclusion / exclusion criteria

Include:

  • Peer-reviewed journal articles or conference proceedings
  • Published 2020–2025 (2015+ for foundational neuroscience)
  • ≥ 3 keywords matching the five thematic clusters
  • Top-tier or well-regarded academic venue

Exclude:

  • Editorials, commentaries, preprints, grey literature, dissertations
  • Semantic distance > 0.88 (PubMed auto-exclusion)
  • < 2 thematic keywords
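
The year and keyword-count rules above can be sketched as a small pre-screening function. This is an illustration, not the repository's implementation: the function name `prescreen`, its parameters, and the handling of records with exactly two keyword matches (returned as "borderline", since they meet neither the include nor the exclude rule) are assumptions. Venue, document-type, and semantic-distance checks are left out.

```python
def prescreen(year: int, keyword_hits: int, foundational_neuroscience: bool = False) -> str:
    """Apply the publication-year and keyword-count criteria to one record."""
    earliest = 2015 if foundational_neuroscience else 2020
    if not (earliest <= year <= 2025):
        return "exclude"
    if keyword_hits < 2:
        return "exclude"
    if keyword_hits >= 3:
        return "include"
    # Assumption: exactly two matching keywords is neither rule, so defer to manual screening.
    return "borderline"


print(prescreen(2023, 4))
print(prescreen(2017, 3, foundational_neuroscience=True))
print(prescreen(2022, 2))
```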

License

MIT License. See LICENSE for details.
