Singularity-Student-Lab/project-rs


PROJECT:RS

The trusted, curated professional network for researchers. LinkedIn meets arXiv, Mendeley, and a trusted academic commons: built around credibility, discovery, and real research progress, not vanity metrics.





📖 Documentation · 🔌 API Reference · 🗺️ Roadmap · 🤝 Contributing · 🏛️ Organization


Warning

PROJECT:RS is under active development and has not yet been officially named. The codebase is at the pre-alpha / incubation stage; APIs, schema, and architecture are subject to change. Not recommended for production use.


🏛️ Provenance

| | |
| --- | --- |
| Project Codename | PROJECT:RS (official name TBD) |
| Incubating Organization | Singularity Student Lab |
| Lead Developer | @jayanthoffl |
| Development Stage | Pre-Alpha / Incubation |
| License | MIT |

🔭 Vision

"A place where a PhD student, professor, industry scientist, policy researcher, or independent scholar can build a verified identity, discover meaningful work, meet the right collaborators, and follow research conversations, without the noise of generic social media."

PROJECT:RS addresses the fragmentation problem every researcher faces today. Your identity is on ORCID. Your papers are on ResearchGate. Your discussions are on Twitter. Your jobs are on LinkedIn. Your literature is in Mendeley.

We are building the layer that unifies all of this. One platform. Four core questions answered clearly:

| Question | How we solve it |
| --- | --- |
| 🪪 Who is this person? | Verified researcher identity via ORCID, with institution-aware trust scores |
| 📚 What are they working on? | Living research portfolios: papers, datasets, current projects, open questions |
| 📰 What work deserves my attention? | A curated, AI-powered semantic feed, not an engagement-optimized timeline |
| 🤝 Who should I collaborate with? | An active collaboration marketplace matched by expertise, methods, and goals |

🏗️ Architecture

research-commons/
├── 📦 backend/                  # FastAPI: Knowledge Graph Engine
│   ├── app/
│   │   ├── main.py              # API routes: auth, feed, semantic search
│   │   ├── db/
│   │   │   └── database.py      # SQLAlchemy + PostgreSQL connection
│   │   ├── models/
│   │   │   └── models.py        # ORM: Users, Works, Authorships, Opportunities
│   │   └── services/
│   │       └── arxiv_bot.py     # ArXiv ingestion + PDF extraction + embedding
│   └── schema.sql               # PostgreSQL schema with pgvector extension
│
├── 🌐 frontend/                 # Next.js 16: Research Interface
│   └── app/
│       ├── page.tsx             # Landing page & ORCID login
│       ├── layout.tsx           # Root layout
│       └── feed/
│           └── page.tsx         # Curated discovery feed + semantic search UI
│
└── 🐳 docker-compose.yml        # PostgreSQL + pgvector database service

System Diagram

                        ┌─────────────────────────┐
                        │  ORCID Identity Layer   │
                        │ (OAuth2 / Verified ID)  │
                        └────────────┬────────────┘
                                     │
                     ┌───────────────▼───────────────┐
                     │       Next.js Frontend        │
                     │             :3000             │
                     │  • Landing & ORCID Login      │
                     │  • Curated Discovery Feed     │
                     │  • Semantic Search UI         │
                     └───────────────┬───────────────┘
                                     │ REST API
                     ┌───────────────▼───────────────┐
                     │        FastAPI Backend        │
                     │             :8000             │
                     │  • /auth/orcid  OAuth flow    │
                     │  • /api/feed    Latest papers │
                     │  • /api/search  Vector search │
                     └───────────────┬───────────────┘
                                     │
              ┌──────────────────────▼──────────────────────┐
              │          PostgreSQL 16 + pgvector           │
              │                    :5432                    │
              │                                             │
              │  users         works           opportunities│
              │  ┌──────────┐  ┌────────────┐  ┌──────────┐ │
              │  │ orcid_id │  │ doi        │  │ type     │ │
              │  │ trust_   │  │ abstract   │  │ vector   │ │
              │  │  score   │  │ embedding  │  │ (384d)   │ │
              │  │ career_  │  │ vector(384)│  │          │ │
              │  │  stage   │  └────────────┘  └──────────┘ │
              │  └──────────┘                               │
              │   ▸ 384-dimensional semantic embeddings     │
              │   ▸ Cosine distance search (MiniLM-L6-v2)   │
              └─────────────────────────────────────────────┘
                                     ▲
              ┌──────────────────────┴──────────────────────┐
              │             ArXiv Ingestion Bot             │
              │  1. Fetch latest cs.LG + quant-ph papers    │
              │  2. Download & parse full PDFs              │
              │  3. Encode abstracts → 384d vector          │
              │  4. Store in Knowledge Graph                │
              └─────────────────────────────────────────────┘

✨ Features

Implemented ✅

| Feature | Technology | Endpoint / Component |
| --- | --- | --- |
| ORCID OAuth2 Login | FastAPI + ORCID | GET /auth/orcid |
| Auth Callback & User Creation | FastAPI + SQLAlchemy | GET /auth/callback |
| Verified User Profiles | PostgreSQL users table | Trust score, ORCID ID, career stage |
| ArXiv Paper Ingestion | arxiv + pypdf | services/arxiv_bot.py |
| PDF Full-Text Extraction | pypdf | Multi-page text extraction |
| Semantic Embeddings | all-MiniLM-L6-v2 (384d) | Stored as vector(384) |
| Curated Feed API | FastAPI | GET /api/feed |
| Semantic Search API | pgvector cosine distance | GET /api/search?q=... |
| Discovery Feed UI | Next.js 16 | /feed page |
| Health Check | FastAPI | GET /health |

Roadmap 🚧

  • Research Communities: topic groups, method circles, private lab spaces
  • Collaboration Marketplace: vector-matched active collaboration requests
  • Living Portfolio: narrative view of a researcher's evolving body of work
  • Scholarly Discussion Layer: threaded comments, paper annotations, mini-reviews
  • Career & Opportunity Hub: postdocs, grants, speaking invitations, fellowships
  • Trust & Reputation Engine: earned from peer endorsements, reviewing, replication
  • Cross-disciplinary Feed: adjacent-field recommendations via vector similarity
  • Institutional Integration: verified university/lab pages with member lists
  • JWT Session Auth: replace redirect-based flow with secure cookie sessions
  • Real ORCID OAuth: swap mock flow for production ORCID credentials
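
For the last roadmap item, a starting point might be the sketch below, which builds the authorization URL for ORCID's standard authorization-code flow. The helper name, the client ID, and the state token are placeholders introduced for illustration; the project's own /auth/orcid handler may be organized differently.

```python
from urllib.parse import urlencode

# ORCID's public OAuth2 authorization endpoint (a sandbox exists at sandbox.orcid.org).
ORCID_AUTHORIZE_URL = "https://orcid.org/oauth/authorize"


def build_orcid_authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Return the URL a browser is sent to in order to start the ORCID login."""
    params = {
        "client_id": client_id,
        "response_type": "code",     # authorization-code flow
        "scope": "/authenticate",    # only confirms the user's ORCID iD
        "redirect_uri": redirect_uri,
        "state": state,              # opaque anti-CSRF token echoed back on the callback
    }
    return f"{ORCID_AUTHORIZE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values: real ones come from registering the app with ORCID.
    print(
        build_orcid_authorize_url(
            client_id="APP-XXXXXXXXXXXXXXXX",
            redirect_uri="http://localhost:8000/auth/callback",
            state="random-state-token",
        )
    )
```

The redirect URI here points at the GET /auth/callback route listed under implemented features, which receives the `code` query parameter once the user approves the login.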

🚀 Getting Started

Prerequisites

Docker, Python, and Node.js

1. Clone the Repository

git clone https://github.com/Singularity-Student-Lab/research-commons.git
cd research-commons

2. Start the Database

# Spins up PostgreSQL 16 with the pgvector extension
docker compose up -d

Verify:

docker ps
# Expected: commons_db  Up  0.0.0.0:5432->5432/tcp

3. Start the Backend

cd backend

# Create and activate virtual environment
python -m venv venv
.\venv\Scripts\activate        # Windows
# source venv/bin/activate     # macOS / Linux

# Install dependencies
pip install fastapi uvicorn sqlalchemy psycopg2-binary pgvector \
            sentence-transformers arxiv pypdf

# Start the API server
uvicorn app.main:app --reload --port 8000

Note: On first run, the all-MiniLM-L6-v2 model (~80 MB) downloads automatically. This only happens once.

The API is now live at http://localhost:8000
Auto-generated docs at http://localhost:8000/docs

4. Seed the Knowledge Graph

# In a new terminal (with venv activated)
cd backend
python -m app.services.arxiv_bot

This will:

  1. Fetch the 5 latest cs.LG and quant-ph papers from ArXiv
  2. Download and parse their full PDFs
  3. Generate 384-dimensional semantic embeddings for each abstract
  4. Store everything in the local PostgreSQL Knowledge Graph
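
The steps above can be pictured as a small pipeline. The sketch below is not the code in arxiv_bot.py: it skips the arXiv download entirely, uses an in-memory dict in place of PostgreSQL, and swaps in a toy `fake_embed` function for all-MiniLM-L6-v2 so that it runs without any dependencies. All names, and the choice of DOI as the de-duplication key, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

EMBEDDING_DIM = 384  # dimension of all-MiniLM-L6-v2 vectors, per the schema


@dataclass
class Work:
    doi: str
    title: str
    abstract: str
    full_text: str = ""
    abstract_embedding: List[float] = field(default_factory=list)


def ingest(
    papers: Iterable[Work],
    extract_text: Callable[[Work], str],
    embed: Callable[[str], List[float]],
    store: Dict[str, Work],
) -> int:
    """Run steps 2-4 on already-fetched papers; return how many were newly stored."""
    added = 0
    for paper in papers:
        if paper.doi in store:  # already indexed, skip it
            continue
        paper.full_text = extract_text(paper)
        vector = embed(paper.abstract)
        if len(vector) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
        paper.abstract_embedding = vector
        store[paper.doi] = paper
        added += 1
    return added


def fake_embed(text: str) -> List[float]:
    """Toy stand-in for the real model: NOT semantically meaningful."""
    return [float(len(text) % 7)] * EMBEDDING_DIM


if __name__ == "__main__":
    db: Dict[str, Work] = {}
    # The DOI below is a made-up placeholder.
    batch = [Work("10.0000/example.1", "A title", "An abstract")]
    print(ingest(batch, lambda w: "parsed pdf text", fake_embed, db))  # first run
    print(ingest(batch, lambda w: "parsed pdf text", fake_embed, db))  # repeat run
```

Passing the text extractor and the embedder in as callables keeps the loop testable without network access or a model download.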

5. Start the Frontend

cd frontend
npm install
npm run dev

The app is now live at http://localhost:3000

Important

Make sure the backend is running at :8000 before starting the frontend. The feed page calls http://127.0.0.1:8000/api/feed at render time via server-side fetch.
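
Because the feed page fetches from the backend while rendering, a quick readiness check before `npm run dev` can save a confusing error. The helper below is a hypothetical convenience script, not part of the repository, and it only verifies that something accepts TCP connections on the given port.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 0.5) -> bool:
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # 5432 is the database and 8000 the API, as used elsewhere in this README.
    for name, port in (("PostgreSQL", 5432), ("FastAPI backend", 8000)):
        status = "up" if wait_for_port("127.0.0.1", port, attempts=3) else "not reachable"
        print(f"{name} on :{port} is {status}")
```

An open port does not prove the service is healthy; GET /health remains the authoritative check for database connectivity.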


🔌 API Reference

Base URL: http://localhost:8000

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | Health ping |
| GET | /health | Database connectivity check |
| GET | /auth/orcid | Initiate ORCID OAuth2 login |
| GET | /auth/callback?code= | Handle OAuth callback, provision user |
| GET | /api/feed | Retrieve 10 most recent indexed papers |
| GET | /api/search?q={query} | Semantic vector search over Knowledge Graph |
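
For scripted access, the feed and search endpoints can be reached with a few lines of standard-library Python. This is a minimal sketch assuming the default base URL; the function names are illustrative, and since the response shape is not documented here, `get_json` simply returns whatever JSON the server sends.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"


def feed_url(base_url: str = BASE_URL) -> str:
    """URL of the curated feed endpoint."""
    return f"{base_url}/api/feed"


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """URL of the semantic search endpoint, with the query safely encoded."""
    return f"{base_url}/api/search?{urlencode({'q': query})}"


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its JSON body (requires the backend to be running)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(feed_url())
    print(search_url("quantum error correction fault tolerant"))
    # With the backend up, for example:
    # results = get_json(search_url("LLM hallucination and factuality"))
```

`urlencode` turns spaces into `+` and escapes characters such as `&`, so free-text queries do not need manual quoting.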

Semantic Search

# Search by concept, not just keywords
curl "http://localhost:8000/api/search?q=quantum+error+correction+fault+tolerant"

curl "http://localhost:8000/api/search?q=LLM+hallucination+and+factuality"

curl "http://localhost:8000/api/search?q=transformer+attention+mechanism+efficiency"

The search uses cosine distance over 384-dimensional embeddings, so it finds papers semantically close to your intent, not just keyword matches.
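
To make the ranking concrete, here is cosine distance written out in plain Python, defined as 1 minus cosine similarity. The three-dimensional vectors and paper titles are invented for illustration; real embeddings have 384 dimensions and come from the model, and in the running system the database performs this computation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal, 2 for opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(
    query: Sequence[float], corpus: Dict[str, Sequence[float]], limit: int = 10
) -> List[Tuple[str, float]]:
    """Return up to `limit` (title, distance) pairs, closest first."""
    scored = [(title, cosine_distance(query, vec)) for title, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1])[:limit]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for 384-d embeddings.
    corpus = {
        "Surface codes": [0.9, 0.1, 0.0],
        "Efficient attention": [0.0, 0.2, 0.9],
    }
    for title, dist in rank_by_distance([1.0, 0.0, 0.0], corpus):
        print(f"{dist:.3f}  {title}")
```

Cosine distance ignores vector length and compares direction only, which is why a short abstract and a long one on the same topic can still land close together.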

Interactive API Docs

| Interface | URL |
| --- | --- |
| Swagger UI | http://localhost:8000/docs |
| ReDoc | http://localhost:8000/redoc |

🗄️ Database Schema

The schema is organized into four conceptual zones, mirroring the four pillars of the platform. The excerpt below covers Zones A, B, and D:

-- ── Zone A: Verified Identity & Reputation ──────────────────────────────
CREATE TABLE users (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    orcid_id     VARCHAR(50) UNIQUE NOT NULL,
    full_name    VARCHAR(255) NOT NULL,
    affiliation  VARCHAR(255),
    career_stage VARCHAR(100),
    trust_score  FLOAT DEFAULT 1.0,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE expertise_tags  (id SERIAL PRIMARY KEY, tag_name VARCHAR(100) UNIQUE);
CREATE TABLE user_expertise  (user_id UUID, tag_id INT, endorsements INT DEFAULT 0);

-- ── Zone B: The Living Portfolio ─────────────────────────────────────────
CREATE TABLE works (
    id                 UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    doi                VARCHAR(100) UNIQUE,
    title              TEXT NOT NULL,
    abstract           TEXT,
    abstract_embedding vector(384),   -- all-MiniLM-L6-v2 dimensions
    work_type          VARCHAR(50),   -- 'paper', 'dataset', 'preprint'
    published_date     DATE
);

CREATE TABLE authorships (user_id UUID, work_id UUID, is_corresponding BOOLEAN);

-- ── Zone D: Collaboration Marketplace ────────────────────────────────────
CREATE TABLE opportunities (
    id                    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    author_id             UUID REFERENCES users(id),
    title                 VARCHAR(255) NOT NULL,
    description           TEXT NOT NULL,
    opportunity_type      VARCHAR(50),  -- 'co-author', 'grant', 'mentorship'
    requirement_embedding vector(384),  -- matched against user portfolio vectors
    is_active             BOOLEAN DEFAULT TRUE
);

Why pgvector? Both works.abstract_embedding and opportunities.requirement_embedding are stored as 384-dimensional vectors. This enables future matching of researchers to opportunities by the mathematical similarity of their research portfolio to what a collaboration needs, not by keyword.
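
A nearest-neighbour lookup against the works table might look like the sketch below. It relies on pgvector's `<=>` cosine-distance operator and its bracketed text format for vectors; the helper names and the selected columns are assumptions, and the query is returned as a string plus parameters rather than executed, so the example runs without a database.

```python
from typing import Dict, Sequence, Tuple

EMBEDDING_DIM = 384

# %(name)s placeholders follow the psycopg2 named-parameter style.
SEARCH_SQL = """
SELECT id, title, abstract_embedding <=> %(query_vec)s::vector AS distance
FROM works
WHERE abstract_embedding IS NOT NULL
ORDER BY distance
LIMIT %(limit)s
""".strip()


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def build_search_query(vec: Sequence[float], limit: int = 10) -> Tuple[str, Dict[str, object]]:
    """Return (sql, params) ready to pass to a DB-API cursor's execute()."""
    if len(vec) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vec)}")
    return SEARCH_SQL, {"query_vec": to_pgvector_literal(vec), "limit": limit}


if __name__ == "__main__":
    sql, params = build_search_query([0.0] * EMBEDDING_DIM, limit=5)
    print(sql)
    print(params["limit"], params["query_vec"][:20] + "...")
```

With a psycopg2 cursor this would be run as `cursor.execute(*build_search_query(vec))`. The same pattern would apply to opportunities.requirement_embedding once matching is built.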


🧩 Platform Pillars

🪪 1. Verified Identity

A researcher profile anchored on ORCID-authenticated identity. It covers methods, open questions, career stage, affiliations, and contributions beyond publications: reviewing, mentoring, replication, and open resources. Trust is foundational, not optional.
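
Before any OAuth round-trip, an ORCID iD can be sanity-checked locally, because its last character is an ISO 7064 MOD 11-2 check digit computed over the first fifteen digits. The function names below are hypothetical. Passing the check only shows the iD is well-formed, not that the account exists or belongs to the person presenting it.

```python
import re

_ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """True if the iD has the 0000-0000-0000-000X shape and a correct check digit."""
    if not _ORCID_PATTERN.fullmatch(orcid_id):
        return False
    digits = orcid_id.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
    print(is_valid_orcid("0000-0002-1825-0097"))
    print(is_valid_orcid("0000-0002-1825-0098"))
```

A check like this fits naturally in front of the users.orcid_id column, rejecting typos before they reach the database.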

📰 2. Curated Discovery Feed

Not a social timeline. A research-grade feed filtered by topics, methods, trusted collaborators, and quality signals: open data badges, code availability, replication flags, peer commentary. The ArXiv bot seeds this today; Semantic Scholar, PubMed, and more are planned.

🤝 3. Collaboration Marketplace

Active, intent-based matching, not passive browsing. Researchers post specific needs, and the platform is designed to match them against verified expertise portfolios using vector similarity. "I need a Bayesian statistician for a clinical trial" → matched to researchers whose published work embeds nearest to that description.
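
One way such matching could work is sketched below: each researcher's portfolio vector is taken as the mean of their works' embeddings, and candidates are ranked by cosine similarity to the opportunity's requirement embedding. Averaging is an assumption made purely for illustration, since this README does not say how portfolio vectors are derived, and the names and two-dimensional vectors are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors (assumed portfolio representation)."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def match_researchers(
    requirement: Vector, works_by_user: Dict[str, Sequence[Vector]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank researchers by similarity of their portfolio vector to the requirement."""
    scores = [
        (user, cosine_similarity(requirement, mean_vector(works)))
        for user, works in works_by_user.items()
        if works  # researchers without embedded works cannot be matched
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    portfolios = {
        "statistician": [[0.9, 0.1], [0.8, 0.3]],
        "roboticist": [[0.1, 0.9]],
    }
    print(match_researchers([1.0, 0.0], portfolios))
```

A plain mean treats every work equally; weighting recent or corresponding-author works more heavily would be an obvious variation to explore.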

🎯 4. Career & Opportunity Hub

Postdocs, faculty openings, grants, fellowships, reviewer invitations, conference calls, and industry consulting, all surfaced based on actual expertise, not generic job boards.


👥 Who This Is For

| User | Core Need |
| --- | --- |
| PhD Students & Early-Career Researchers | Visibility, mentorship, collaboration, curated literature |
| Principal Investigators | Lab recruitment, grant collaborators, reputation management |
| Industry R&D Scientists | Expert discovery, applied collaboration, talent pipeline |
| Independent Scholars | Credibility without institutional affiliation |
| Institutions & Publishers | Verified researcher records, community showcasing |

🧠 Competitive Landscape

| Platform | Strengths | Gap |
| --- | --- | --- |
| ORCID | Trusted identity infrastructure | Not a discovery or networking tool |
| ResearchGate | Professional network, Q&A, paper sharing | Noisy feed, no semantic search, weak curation |
| Semantic Scholar | AI-powered paper discovery | No professional identity or collaboration layer |
| LinkedIn | Professional network, job discovery | Not research-aware, zero academic trust signals |
| Mendeley | Reference management, groups | Passive; no active collaboration or reputation |
| PROJECT:RS | All four pillars, with curation as the defining principle | - |

🛠️ Tech Stack

| Layer | Technology | Rationale |
| --- | --- | --- |
| Frontend | Next.js 16, TypeScript, Tailwind CSS | SSR for SEO, type-safe, fast |
| Backend | FastAPI (Python) | Async, auto-documented, ML-adjacent |
| Database | PostgreSQL 16 + pgvector | Relational integrity + native vector search |
| Embeddings | all-MiniLM-L6-v2 (SentenceTransformers) | 384d, fast, fully local, zero API cost |
| Auth | ORCID OAuth2 | The gold standard for verified academic identity |
| Ingestion | arxiv Python library + pypdf | Fetches and parses full PDFs, not just abstracts |
| Infra | Docker Compose | One-command database bootstrapping |

🤝 Contributing

This project is in early incubation under Singularity Student Lab. Contributions, ideas, and feedback are welcome.

  1. Fork the repository
  2. Branch off main: git checkout -b feature/your-feature
  3. Commit with conventional commits: git commit -m 'feat: add collaboration matching'
  4. Push and open a Pull Request against main

Please open an issue before submitting large changes so we can discuss the approach first.


📄 License

Distributed under the MIT License. See LICENSE for details.



Singularity Student Lab · jayanthoffl


Built with the belief that the research community deserves better infrastructure.

Not another profile page. A trusted, curated commons.

