
feat: self-service data upload for PDP, AR files, student and course data #86

@William-Hill

Overview

Allow authorized users (admin, ir) to upload institutional data files directly from the dashboard — PDP cohort files, AR files, student-level CSVs, and course enrollment CSVs — without needing direct database or server access.


Supported File Types

| File Type | Format | Target Table |
| --- | --- | --- |
| PDP cohort / student data | CSV | student_level_with_predictions |
| PDP AR files | CSV or Excel (.xlsx) | student_level_with_predictions |
| Course enrollment data | CSV | course_enrollments |
| ML predictions (pre-computed) | CSV | student_level_with_predictions, course_predictions |

UI: /admin/upload page

Access: admin and ir roles only

Flow

  1. User selects file type from a dropdown (PDP Cohort, AR File, Course Data, Custom CSV)
  2. Drag-and-drop or file picker (accepts .csv, .xlsx)
  3. Preview step — shows first 10 rows in a table with column mapping confirmation
  4. Column mapping UI — auto-detect known columns, allow user to remap unknowns
  5. Validation summary — row count, detected schema, any warnings (missing required columns, unexpected values)
  6. Confirm & Upload — streams file to server, inserts in batches
  7. Progress bar + completion summary (rows inserted, rows skipped, errors)
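The auto-detect in step 4 could be a small helper like the sketch below. The KNOWN_COLUMNS aliases are illustrative placeholders, not the actual PDP/AR column names:

```typescript
// Illustrative alias table — real PDP/AR column names would go here.
const KNOWN_COLUMNS: Record<string, string[]> = {
  student_id: ["student_id", "studentid", "study_id"],
  cohort: ["cohort", "cohort_term"],
  enrollment_type: ["enrollment_type", "enrollmenttype"],
};

// Normalize a raw header: trim, lowercase, collapse whitespace to underscores.
function normalizeHeader(raw: string): string {
  return raw.trim().toLowerCase().replace(/\s+/g, "_");
}

// Map each uploaded header to a known target column, or null if unknown
// (the UI would then ask the user to remap it manually).
function autoDetect(headers: string[]): Record<string, string | null> {
  const mapping: Record<string, string | null> = {};
  for (const h of headers) {
    const norm = normalizeHeader(h);
    const match = Object.entries(KNOWN_COLUMNS).find(([, aliases]) =>
      aliases.includes(norm)
    );
    mapping[h] = match ? match[0] : null;
  }
  return mapping;
}
```

Unmapped headers surface as `null` so the mapping UI can prompt the user rather than silently dropping a column.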

API Routes

POST /api/admin/upload/preview

  • Accepts multipart form upload
  • Parses first 50 rows
  • Returns: detected columns, sample rows, schema match confidence, warnings
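A rough shape for the preview response; note this naive comma split is only for illustration — the real route would use csv-parse and handle quoted fields:

```typescript
interface PreviewResult {
  columns: string[];
  rows: string[][];
  warnings: string[];
}

// Parse the header plus the first `limit` data rows of a CSV string.
// Naive split on commas — a sketch only, not quote-aware.
function previewCsv(text: string, limit = 50): PreviewResult {
  const lines = text.split(/\r?\n/).filter((l) => l.length > 0);
  const columns = lines[0]?.split(",").map((c) => c.trim()) ?? [];
  const rows = lines.slice(1, 1 + limit).map((l) => l.split(","));
  // Flag rows whose field count doesn't match the header.
  const warnings = rows
    .filter((r) => r.length !== columns.length)
    .map((_, i) => "row has unexpected column count");
  return { columns, rows, warnings };
}
```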

POST /api/admin/upload/commit

  • Accepts multipart form upload + column mapping JSON
  • Parses CSV/Excel as a stream server-side (avoids loading the entire file into memory)
  • Batch-inserts in chunks of 500 rows
  • Returns: { inserted, skipped, errors[] }
  • Uses upsert to be idempotent on re-upload
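The 500-row batching could be sketched as follows; insertBatch here is a hypothetical stand-in for the actual DB upsert call, not a real API:

```typescript
// Yield fixed-size slices of an array (last slice may be smaller).
function* chunk<T>(items: T[], size = 500): Generator<T[]> {
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size);
  }
}

// Insert rows batch by batch, collecting per-batch errors instead of
// aborting the whole upload on the first failure.
async function commitRows(
  rows: object[],
  insertBatch: (batch: object[]) => Promise<number>
): Promise<{ inserted: number; errors: string[] }> {
  let inserted = 0;
  const errors: string[] = [];
  for (const batch of chunk(rows, 500)) {
    try {
      inserted += await insertBatch(batch);
    } catch (e) {
      errors.push(String(e));
    }
  }
  return { inserted, errors };
}
```

Collecting errors per batch lets the route return the `{ inserted, skipped, errors[] }` summary even for a partial failure.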

GET /api/admin/upload/history

  • Returns log of past uploads: filename, type, rows inserted, timestamp, uploader user_id

Backend Considerations

  • Excel parsing: use xlsx or exceljs npm package for .xlsx support
  • Large file handling: stream parse with csv-parse in async iterator mode; never buffer full file
  • Column normalization: trim whitespace, normalize header casing before mapping
  • Schema validation: check required columns present for each file type; surface clear errors (not stack traces) to the UI
  • Upload size limit: configure Next.js bodyParser limit (suggest 50 MB)
  • Role guard: enforce admin/ir via x-user-role header on all upload routes
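The role guard could be a small shared helper used by all three routes; the header name follows the issue, everything else is an assumed sketch:

```typescript
// Roles allowed to hit the upload routes, per the issue.
const UPLOAD_ROLES = new Set(["admin", "ir"]);

// Check the x-user-role header (assumed to be set by a trusted auth layer).
// Returns false for missing or unrecognized roles → route responds 403.
function canUpload(headers: Record<string, string | undefined>): boolean {
  const role = headers["x-user-role"]?.trim().toLowerCase();
  return role !== undefined && UPLOAD_ROLES.has(role);
}
```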

Upload History Table (optional migration)

CREATE TABLE public.upload_history (
  id          BIGSERIAL PRIMARY KEY,
  user_id     UUID REFERENCES auth.users(id),
  filename    TEXT NOT NULL,
  file_type   TEXT NOT NULL,
  rows_inserted INT,
  rows_skipped  INT,
  error_count   INT,
  status      TEXT CHECK (status IN ('success', 'partial', 'failed')),
  uploaded_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
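For the idempotent commit, one option is a Postgres ON CONFLICT upsert; the conflict key and columns below are assumptions for illustration, since the real unique key of student_level_with_predictions isn't specified here:

```sql
-- Hypothetical: assumes a UNIQUE constraint on (student_id, cohort_term)
INSERT INTO public.student_level_with_predictions (student_id, cohort_term, retention_prediction)
VALUES ($1, $2, $3)
ON CONFLICT (student_id, cohort_term)
DO UPDATE SET retention_prediction = EXCLUDED.retention_prediction;
```

Re-uploading the same file then updates existing rows in place instead of duplicating them.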

Acceptance Criteria

  • Admin/IR users can reach /admin/upload (leadership/advisor/faculty get 403)
  • CSV and Excel uploads both work for all supported file types
  • Preview step shows first 10 rows before committing
  • Column mapping is auto-detected for known PDP/AR schemas
  • Upload is idempotent — re-uploading the same file doesn't duplicate rows
  • Files > 50 MB are rejected with a clear error message
  • Upload history log persisted and visible in the UI
  • Progress feedback shown during large file ingestion
