PROJECT:RS · Singularity-Student-Lab/project-rs

The trusted, curated professional network for researchers. LinkedIn meets arXiv, Mendeley, and a trusted academic commons — built around credibility, discovery, and real research progress, not vanity metrics.

📖 Documentation · 🔌 API Reference · 🗺️ Roadmap · 🤝 Contributing · 🏛️ Organization

Warning

PROJECT:RS is currently under active development and has not yet been officially named. The codebase is in pre-alpha / incubation stage. APIs, schema, and architecture are subject to change. Not recommended for production use.

🏛️ Provenance


Project Codename	`PROJECT:RS` (official name TBD)
Incubating Organization	Singularity Student Lab
Lead Developer	@jayanthoffl
Development Stage	Pre-Alpha / Incubation
License	MIT

🔭 Vision

"A place where a PhD student, professor, industry scientist, policy researcher, or independent scholar can build a verified identity, discover meaningful work, meet the right collaborators, and follow research conversations — without the noise of generic social media."

PROJECT:RS addresses the fragmentation problem every researcher faces today. Your identity is on ORCID. Your papers are on ResearchGate. Your discussions are on Twitter. Your jobs are on LinkedIn. Your literature is in Mendeley.

We are building the layer that unifies all of this. One platform. Four core questions answered clearly:

Question	How we solve it
🪪 Who is this person?	Verified researcher identity via ORCID, with institution-aware trust scores
📚 What are they working on?	Living research portfolios: papers, datasets, current projects, open questions
📰 What work deserves my attention?	A curated, AI-powered semantic feed — not an engagement-optimized timeline
🤝 Who should I collaborate with?	An active collaboration marketplace matched by expertise, methods, and goals

🏗️ Architecture

research-commons/
├── 📦 backend/                  # FastAPI — Knowledge Graph Engine
│   ├── app/
│   │   ├── main.py              # API routes: auth, feed, semantic search
│   │   ├── db/
│   │   │   └── database.py      # SQLAlchemy + PostgreSQL connection
│   │   ├── models/
│   │   │   └── models.py        # ORM: Users, Works, Authorships, Opportunities
│   │   └── services/
│   │       └── arxiv_bot.py     # arXiv ingestion + PDF extraction + embedding
│   └── schema.sql               # PostgreSQL schema with pgvector extension
│
├── 🌐 frontend/                 # Next.js 16 — Research Interface
│   └── app/
│       ├── page.tsx             # Landing page & ORCID login
│       ├── layout.tsx           # Root layout
│       └── feed/
│           └── page.tsx         # Curated discovery feed + semantic search UI
│
└── 🐳 docker-compose.yml        # PostgreSQL + pgvector database service

System Diagram

           ┌──────────────────────────┐
           │   ORCID Identity Layer   │
           │  (OAuth2 / Verified ID)  │
           └────────────┬─────────────┘
                        │
        ┌───────────────▼───────────────┐
        │       Next.js Frontend        │
        │             :3000             │
        │  • Landing & ORCID Login      │
        │  • Curated Discovery Feed     │
        │  • Semantic Search UI         │
        └───────────────┬───────────────┘
                        │ REST API
        ┌───────────────▼───────────────┐
        │        FastAPI Backend        │
        │             :8000             │
        │  • /auth/orcid  OAuth flow    │
        │  • /api/feed    Latest papers │
        │  • /api/search  Vector search │
        └───────────────┬───────────────┘
                        │
┌───────────────────────▼──────────────────────┐
│           PostgreSQL 16 + pgvector           │
│                    :5432                     │
│                                              │
│ users         works           opportunities  │
│ ┌──────────┐  ┌────────────┐  ┌────────────┐ │
│ │ orcid_id │  │ doi        │  │ type       │ │
│ │ trust_   │  │ abstract   │  │ vector(384)│ │
│ │  score   │  │ embedding  │  └────────────┘ │
│ │ career_  │  │ vector(384)│                 │
│ │  stage   │  └────────────┘                 │
│ └──────────┘                                 │
│                                              │
│  ▸ 384-d embeddings (all-MiniLM-L6-v2)       │
│  ▸ Cosine-distance vector search             │
└──────────────────────────────────────────────┘
                        ▲
┌───────────────────────┴──────────────────────┐
│             arXiv Ingestion Bot              │
│  1. Fetch latest cs.LG + quant-ph papers     │
│  2. Download & parse full PDFs               │
│  3. Encode abstracts → 384d vector           │
│  4. Store in Knowledge Graph                 │
└──────────────────────────────────────────────┘

✨ Features

Implemented ✅

Feature	Technology	Endpoint / Component
ORCID OAuth2 Login	FastAPI + ORCID	`GET /auth/orcid`
Auth Callback & User Creation	FastAPI + SQLAlchemy	`GET /auth/callback`
Verified User Profiles	PostgreSQL `users` table	Trust score, ORCID ID, career stage
arXiv Paper Ingestion	`arxiv` + `pypdf`	`services/arxiv_bot.py`
PDF Full-Text Extraction	`pypdf`	Multi-page text extraction
Semantic Embeddings	`all-MiniLM-L6-v2` (384d)	Stored as `vector(384)`
Curated Feed API	FastAPI	`GET /api/feed`
Semantic Search API	pgvector cosine distance	`GET /api/search?q=...`
Discovery Feed UI	Next.js 16	`/feed` page
Health Check	FastAPI	`GET /health`

Roadmap 🚧

🚀 Getting Started

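Prerequisites

Git, Docker with the Compose plugin (`docker compose`), Python 3 with `venv` and `pip`, and Node.js with `npm`; the steps below use all of them.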

1. Clone the Repository

git clone https://github.com/Singularity-Student-Lab/research-commons.git
cd research-commons

2. Start the Database

# Spins up PostgreSQL 16 with the pgvector extension
docker compose up -d

Verify:

docker ps
# Expected: commons_db  Up  0.0.0.0:5432->5432/tcp

3. Start the Backend

cd backend

# Create and activate virtual environment
python -m venv venv
.\venv\Scripts\activate        # Windows
# source venv/bin/activate     # macOS / Linux

# Install dependencies
# One line, so it works in PowerShell, cmd, and POSIX shells alike
pip install fastapi uvicorn sqlalchemy psycopg2-binary pgvector sentence-transformers arxiv pypdf

# Start the API server
uvicorn app.main:app --reload --port 8000

Note: On first run, the all-MiniLM-L6-v2 model (~80 MB) downloads automatically. This only happens once.

The API is now live at http://localhost:8000
Auto-generated docs at http://localhost:8000/docs

4. Seed the Knowledge Graph

# In a new terminal (with venv activated)
cd backend
python -m app.services.arxiv_bot

This will:

Fetch the 5 latest cs.LG and quant-ph papers from arXiv
Download and parse their full PDFs
Generate 384-dimensional semantic embeddings for each abstract
Store everything in the local PostgreSQL Knowledge Graph
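The four steps above can be sketched as a small pipeline whose stages are passed in as functions, which keeps the orchestration testable without network access or a model download. This is a minimal sketch under stated assumptions, not the contents of `services/arxiv_bot.py`: the `Paper` record, `join_pages`, and `ingest` are hypothetical names. In a real run, `fetch` would presumably wrap the `arxiv` library, `extract_pages` would yield `page.extract_text()` for each page of a `pypdf.PdfReader`, `embed` would call `SentenceTransformer("all-MiniLM-L6-v2").encode`, and `store` would insert a row into `works`.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Sequence

EMBEDDING_DIM = 384  # must match works.abstract_embedding vector(384)


@dataclass
class Paper:
    # Hypothetical record shape; the real bot may use different fields.
    arxiv_id: str
    title: str
    abstract: str
    full_text: str = ""
    embedding: List[float] = field(default_factory=list)


def join_pages(pages: Iterable[Optional[str]]) -> str:
    """Concatenate per-page text, skipping pages that yielded no text."""
    return "\n".join(p.strip() for p in pages if p and p.strip())


def ingest(
    fetch: Callable[[], Iterable[Paper]],                       # step 1
    extract_pages: Callable[[Paper], Iterable[Optional[str]]],  # step 2
    embed: Callable[[str], Sequence[float]],                    # step 3
    store: Callable[[Paper], None],                             # step 4
) -> int:
    """Run the four ingestion steps and return how many papers were stored."""
    stored = 0
    for paper in fetch():
        paper.full_text = join_pages(extract_pages(paper))
        vector = [float(x) for x in embed(paper.abstract)]
        if len(vector) != EMBEDDING_DIM:
            # A wrong-sized vector would be rejected by a vector(384) column.
            raise ValueError(f"expected {EMBEDDING_DIM} dims, got {len(vector)}")
        paper.embedding = vector
        store(paper)
        stored += 1
    return stored
```

Injecting the stages also makes it easy to swap the source later (the feed section mentions Semantic Scholar and PubMed as planned sources) without touching the embedding and storage steps.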

5. Start the Frontend

cd frontend
npm install
npm run dev

The app is now live at http://localhost:3000

Important

Make sure the backend is running at :8000 before starting the frontend. The feed page calls http://127.0.0.1:8000/api/feed at render time via server-side fetch.
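Because the feed page fetches from the backend at render time, a start-up script may want to block until the API answers. A minimal sketch using only the Python standard library, assuming that `GET /health` returns HTTP 200 once the backend and database are up (the README documents the endpoint but not its status codes); `wait_for_backend` is a hypothetical helper, not part of the repository:

```python
import time
import urllib.request

# Ignore HTTP(S)_PROXY environment variables: this check targets a local server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_backend(
    url: str = "http://127.0.0.1:8000/health",
    timeout: float = 30.0,
    interval: float = 0.5,
) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts, and HTTP errors (HTTPError is an
            # OSError subclass) all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wrapper script could call `wait_for_backend()` and run `npm run dev` only after it returns `True`.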

🔌 API Reference

Base URL: http://localhost:8000

Method	Endpoint	Description
`GET`	`/`	Health ping
`GET`	`/health`	Database connectivity check
`GET`	`/auth/orcid`	Initiate ORCID OAuth2 login
`GET`	`/auth/callback?code=`	Handle OAuth callback, provision user
`GET`	`/api/feed`	Retrieve the 10 most recent indexed papers
`GET`	`/api/search?q={query}`	Semantic vector search over Knowledge Graph
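The same endpoints can be called from Python. A minimal standard-library sketch; `search_url` and `get_json` are hypothetical helper names, and because the README does not document the response bodies, `get_json` returns the decoded JSON without assuming any field names. `urlencode` escapes spaces as `+`, producing the same query strings as the curl examples in the next section.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"  # default from this README; adjust if needed


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build a /api/search URL; urlencode escapes spaces as '+'."""
    return f"{base_url}/api/search?{urllib.parse.urlencode({'q': query})}"


def get_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode the JSON body without assuming its shape."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the URL only; pass it to get_json(...) while the backend is running.
    print(search_url("quantum error correction fault tolerant"))
```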

Semantic Search

# Search by concept — not just keywords
curl "http://localhost:8000/api/search?q=quantum+error+correction+fault+tolerant"

curl "http://localhost:8000/api/search?q=LLM+hallucination+and+factuality"

curl "http://localhost:8000/api/search?q=transformer+attention+mechanism+efficiency"

The search uses cosine distance over 384-dimensional embeddings — meaning it finds papers semantically close to your intent, not just keyword matches.
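Cosine distance is 1 minus the cosine similarity of two vectors, d(a, b) = 1 − (a · b) / (‖a‖ ‖b‖): 0 means the same direction, 1 orthogonal, 2 opposite. pgvector exposes it as the `<=>` operator. The sketch below reproduces that ordering in plain Python, with hypothetical 3-dimensional toy vectors standing in for the 384-dimensional all-MiniLM-L6-v2 embeddings; it illustrates the ranking rule, not the backend's actual query.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank(
    query: Sequence[float], docs: Dict[str, Sequence[float]], k: int = 10
) -> List[Tuple[str, float]]:
    """Return up to k (doc_id, distance) pairs, nearest first."""
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-d vectors; real abstract embeddings have 384 dimensions.
    docs = {"qec": [0.9, 0.1, 0.0], "llm": [0.0, 1.0, 0.0], "attn": [0.5, 0.5, 0.0]}
    for doc_id, dist in rank([1.0, 0.0, 0.0], docs, k=2):
        print(f"{doc_id}: {dist:.3f}")
```

Because only the angle matters, two abstracts of very different length land close together when they point in the same semantic direction.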

Interactive API Docs

Interface	URL
Swagger UI	http://localhost:8000/docs
ReDoc	http://localhost:8000/redoc

🗄️ Database Schema

The schema is organized into conceptual zones that mirror the four pillars of the platform; this excerpt shows Zones A, B, and D:

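-- pgvector must be enabled before any vector(384) column is created
CREATE EXTENSION IF NOT EXISTS vector;

-- ── Zone A: Verified Identity & Reputation ──────────────────────────────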
CREATE TABLE users (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    orcid_id     VARCHAR(50) UNIQUE NOT NULL,
    full_name    VARCHAR(255) NOT NULL,
    affiliation  VARCHAR(255),
    career_stage VARCHAR(100),
    trust_score  FLOAT DEFAULT 1.0,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE expertise_tags  (id SERIAL PRIMARY KEY, tag_name VARCHAR(100) UNIQUE);
CREATE TABLE user_expertise  (user_id UUID, tag_id INT, endorsements INT DEFAULT 0);

-- ── Zone B: The Living Portfolio ─────────────────────────────────────────
CREATE TABLE works (
    id                 UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    doi                VARCHAR(100) UNIQUE,
    title              TEXT NOT NULL,
    abstract           TEXT,
    abstract_embedding vector(384),   -- all-MiniLM-L6-v2 dimensions
    work_type          VARCHAR(50),   -- 'paper', 'dataset', 'preprint'
    published_date     DATE
);

CREATE TABLE authorships (user_id UUID, work_id UUID, is_corresponding BOOLEAN);

-- ── Zone D: Collaboration Marketplace ────────────────────────────────────
CREATE TABLE opportunities (
    id                    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    author_id             UUID REFERENCES users(id),
    title                 VARCHAR(255) NOT NULL,
    description           TEXT NOT NULL,
    opportunity_type      VARCHAR(50),  -- 'co-author', 'grant', 'mentorship'
    requirement_embedding vector(384),  -- matched against user portfolio vectors
    is_active             BOOLEAN DEFAULT TRUE
);

Why pgvector? Both works.abstract_embedding and opportunities.requirement_embedding are stored as 384-dimensional vectors. This enables future matching of researchers to opportunities by the mathematical similarity of their research portfolio to what a collaboration needs — not by keyword.
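One way such matching could work, sketched under explicit assumptions: represent a researcher's portfolio as the element-wise mean of their works' `abstract_embedding` vectors, then rank active opportunities by cosine similarity between that portfolio vector and each `requirement_embedding`. Mean pooling is an illustrative choice, not something the schema prescribes, and `mean_vector` and `match_opportunities` are hypothetical names; toy 3-d vectors stand in for 384-d embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean: a hypothetical 'portfolio vector' for one researcher."""
    if not vectors:
        raise ValueError("need at least one work embedding")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must share one dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norms


def match_opportunities(
    work_embeddings: Sequence[Sequence[float]],
    opportunities: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank opportunities (id -> requirement_embedding) against the portfolio."""
    portfolio = mean_vector(work_embeddings)
    scored = [(opp_id, cosine_similarity(portfolio, vec)) for opp_id, vec in opportunities.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]
```

In the database, the same ranking could be pushed into SQL with pgvector's `<=>` cosine-distance operator (smallest distance first) instead of looping in Python.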

🧩 Platform Pillars

🪪 1. Verified Identity

A researcher profile anchored on ORCID-authenticated identity. Methods, open questions, career stage, affiliations, and contributions beyond publications — reviewing, mentoring, replication, open resources. Trust is foundational, not optional.
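ORCID iDs carry their own checksum: the 16th character is an ISO 7064 MOD 11-2 check character (0 to 9, or X for 10) computed over the first 15 digits, as ORCID documents for its identifier structure. A format check like the sketch below could reject mistyped iDs before any lookup; it proves nothing about ownership, which is what the OAuth login establishes. `orcid_check_digit` and `is_valid_orcid` are hypothetical helpers that accept only the bare `0000-0000-0000-0000` form, not the `https://orcid.org/` URI.

```python
import re

# Bare iD form: four hyphen-separated groups of four; only the last char may be "X".
_ORCID_RE = re.compile(r"([0-9]{4})-([0-9]{4})-([0-9]{4})-([0-9]{3}[0-9X])")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Format + checksum test only; it does not prove who owns the iD."""
    match = _ORCID_RE.fullmatch(orcid)
    if match is None:
        return False
    digits = "".join(match.groups())  # 16 characters
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # ORCID's sample iD; prints True
```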

📰 2. Curated Discovery Feed

Not a social timeline. A research-grade feed filtered by topics, methods, trusted collaborators, and quality signals such as open-data badges, code availability, replication flags, and peer commentary. The arXiv bot seeds this today; Semantic Scholar, PubMed, and more are planned.

🤝 3. Collaboration Marketplace

Active, intent-based matching — not passive browsing. Researchers post specific needs and the platform matches against verified expertise portfolios using vector similarity. "I need a Bayesian statistician for a clinical trial" → matched to researchers whose published work embeds nearest to that description.

🎯 4. Career & Opportunity Hub

Postdocs, faculty openings, grants, fellowships, reviewer invitations, conference calls, and industry consulting — surfaced based on actual expertise, not generic job boards.

👥 Who This Is For

User	Core Need
PhD Students & Early-Career Researchers	Visibility, mentorship, collaboration, curated literature
Principal Investigators	Lab recruitment, grant collaborators, reputation management
Industry R&D Scientists	Expert discovery, applied collaboration, talent pipeline
Independent Scholars	Credibility without institutional affiliation
Institutions & Publishers	Verified researcher records, community showcasing

🧠 Competitive Landscape

Platform	Strengths	Gap
ORCID	Trusted identity infrastructure	Not a discovery or networking tool
ResearchGate	Professional network, Q&A, paper sharing	Noisy feed, no semantic search, weak curation
Semantic Scholar	AI-powered paper discovery	No professional identity or collaboration layer
LinkedIn	Professional network, job discovery	Not research-aware, zero academic trust signals
Mendeley	Reference management, groups	Passive; no active collaboration or reputation
PROJECT:RS	All four pillars, with curation as the defining principle	—

🛠️ Tech Stack

Layer	Technology	Rationale
Frontend	Next.js 16, TypeScript, Tailwind CSS	SSR for SEO, type-safe, fast
Backend	FastAPI (Python)	Async, auto-documented, ML-adjacent
Database	PostgreSQL 16 + pgvector	Relational integrity + native vector search
Embeddings	`all-MiniLM-L6-v2` (SentenceTransformers)	384d, fast, fully local, zero API cost
Auth	ORCID OAuth2	The gold standard for verified academic identity
Ingestion	`arxiv` Python library + `pypdf`	Fetches and parses full PDFs, not just abstracts
Infra	Docker Compose	One-command database bootstrapping

🤝 Contributing

This project is in early incubation under Singularity Student Lab. Contributions, ideas, and feedback are welcome.

Fork the repository
Branch off main: git checkout -b feature/your-feature
Commit with conventional commits: git commit -m 'feat: add collaboration matching'
Push and open a Pull Request against main
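The Conventional Commits header in the example above has the shape `type(scope)!: description`, where the scope and the `!` breaking-change marker are optional. If you want to check a header locally, for example from a `commit-msg` hook (Git passes it the path of the message file), a minimal regex sketch follows. Whether the project enforces this is an assumption, `parse_commit_header` is a hypothetical name, and it checks only the first line, not bodies or footers.

```python
import re
from typing import Dict, Optional

# type(scope)!: description  (scope and "!" are optional; types are case-insensitive)
_HEADER_RE = re.compile(
    r"(?P<type>[A-Za-z]+)"
    r"(?:\((?P<scope>[^()\r\n]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)"
)


def parse_commit_header(header: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the header's parts, or None if it is not a Conventional Commits header."""
    match = _HEADER_RE.fullmatch(header)
    return match.groupdict() if match else None
```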

Please open an issue before submitting large changes so we can discuss the approach first.

📄 License

Distributed under the MIT License. See LICENSE for details.

Built with the belief that the research community deserves better infrastructure.

Not another profile page. A trusted, curated commons.
