ALERT!! Whatever you do, please read the conclusion, even if you don't read the entire content on this page.
- Swagger UI: https://ai-detector-1089089108369.us-central1.run.app/api/v1/docs
- Hugging Face Space: https://huggingface.co/spaces/muyiiwaa/ai_text_human_modernbert
- API Doc (ReDoc): https://ai-detector-1089089108369.us-central1.run.app/api/v1/redoc
This project grew out of a discussion on Twitter-NG about the effectiveness of AI text detection. A certain group on the TL firmly believed that carefully fine-tuning a transformer-based model to classify text as either AI-generated or human-written simply would not work.
While I agreed that there is no perfect model out there, I was of the opinion that a ModernBERT model, if fine-tuned on a sufficiently large and relevant dataset by experts, could indeed perform decently, i.e. it could identify a good portion of AI-generated text without excessively flagging human writing as AI-generated (false positives).

This stance (to my surprise, really) was met with considerable skepticism and pushback. Rather than continue the debate in theory, I decided a practical demonstration would be more constructive. So I took on the challenge myself and spent the next two days building this.
I curated a specific dataset of 10,000 examples balanced between:
- 5,000 human-written texts: Sourced from Medium articles published before the recent generative AI boom, aiming for authentic human writing from that era.
- 5,000 AI-generated texts: To ensure variety and relevance, this included 1,000 examples generated by Google Gemini, alongside AI-generated text datasets from Kaggle and other sources.
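The dataset curation above boils down to labeling and shuffling two balanced pools of text. Here is a minimal sketch in plain Python; the placeholder strings stand in for the real Medium, Gemini, and Kaggle sources, and the label convention (0 = Human, 1 = AI) matches the classes described later in the prediction flow:

```python
import random

# Placeholder rows standing in for the real curated sources.
human_texts = [f"human article {i}" for i in range(5000)]
ai_texts = [f"ai sample {i}" for i in range(5000)]

# Label 0 = Human, 1 = AI, matching the model's output classes.
dataset = [{"text": t, "label": 0} for t in human_texts] + \
          [{"text": t, "label": 1} for t in ai_texts]

random.seed(42)        # reproducible shuffle before any train/eval split
random.shuffle(dataset)

print(len(dataset))                          # 10000 examples total
print(sum(row["label"] for row in dataset))  # 5000 AI rows: perfectly balanced
```

Shuffling before splitting matters here: without it, a naive train/eval split would put all human texts on one side and all AI texts on the other.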
I then fine-tuned the `answerdotai/modernbert` model on this specific 10k dataset, and wrapped it in a FastAPI endpoint that serves as a direct way to access and evaluate the performance of that custom-trained model on your own texts.
Training metrics after three epochs: (metrics table not reproduced here)
The final project provides:
- A clean, reliable API interface written in FastAPI for the custom-trained `muyiiwaa/ai_detect_modernbert` model.
- An easy way for others to test and evaluate this specific model's performance, especially in light of the original online debate and the dataset it was trained on.
I tried to make the API as robust as I could (fairly easy to do in the age of AI and vibecoding).
- Core Model: Features the custom-trained `muyiiwaa/ai_detect_modernbert` model.
- FastAPI Backend: Offers a high-performance API with automatic interactive documentation for straightforward testing.
- Pydantic Validation: Ensures reliable data handling.
- Structured Logging: Provides operational transparency (JSON format).
- Configuration Management: Simple setup using a `.env` file.
- Dedicated Service Layer: Organizes model loading and inference logic.
- Singleton Model Service: Efficiently manages the model resource (loaded once).
- Robust Error Handling: Manages potential runtime issues gracefully.
- Model Pre-loading: Initializes the model on application startup for responsiveness.
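The singleton model service mentioned above can be sketched in plain Python. The class and attribute names here are illustrative, not the project's actual code; the point is that the expensive model load happens exactly once, no matter how many times the service is "constructed":

```python
class TextDetectionService:
    """Illustrative singleton: the expensive model is loaded exactly once."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._load_model()  # runs only on first construction
        return cls._instance

    def _load_model(self):
        # Stand-in for loading the fine-tuned ModernBERT checkpoint.
        self.model = object()
        self.load_count = getattr(self, "load_count", 0) + 1

# Every "construction" returns the same pre-loaded instance.
a = TextDetectionService()
b = TextDetectionService()
print(a is b)        # True: same object
print(a.load_count)  # 1: model loaded once, not per request
```

Combined with loading at application startup, this keeps per-request latency down to tokenization plus inference, with no repeated model initialization.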
1. A `POST` request containing text is sent to `/api/v1/predict`.
2. FastAPI validates the input using the `TextInput` schema.
3. The request is handled by the `detect_text` endpoint.
4. It uses the `TextDetectionService`, which loaded the `muyiiwaa/ai_detect_modernbert` model at startup.
5. The service preprocesses the text (removes punctuation).
6. The text is tokenized and passed to the fine-tuned model for inference.
7. The model returns logits, which are converted to probabilities (softmax scores for Class 0: Human, Class 1: AI).
8. The results (scores, predicted class, label) are formatted by the service.
9. FastAPI validates the response via the `PredictionOutput` schema and returns the JSON result.
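The tail of that flow (preprocess, softmax over logits, formatted result) can be sketched without the model itself. The logits below are hypothetical, and the result field names are assumptions based on the flow described, not the exact `PredictionOutput` schema; only the softmax math and the punctuation stripping are exact:

```python
import math
import string

def preprocess(text: str) -> str:
    # Step 5: strip punctuation before tokenization.
    return text.translate(str.maketrans("", "", string.punctuation))

def softmax(logits):
    # Step 7: convert raw logits to probabilities (numerically stable form).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (Class 0: Human, Class 1: AI).
logits = [0.3, 2.1]
scores = softmax(logits)
predicted = max(range(2), key=lambda i: scores[i])

# Step 8: format the scores, predicted class, and label (field names assumed).
result = {
    "human_score": round(scores[0], 4),
    "ai_score": round(scores[1], 4),
    "predicted_class": predicted,
    "label": ["Human", "AI"][predicted],
}
print(result["label"])                # AI
print(abs(sum(scores) - 1.0) < 1e-9)  # True: probabilities sum to 1
```

Because softmax only normalizes the two logits against each other, the scores always sum to 1; a high "AI" score is therefore also a statement about how un-human-like the model found the text, which is worth keeping in mind when reading borderline results.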
While I am of the opinion that a carefully fine-tuned, state-of-the-art transformer-based model can do a decent job, I also do not agree that using only AI or ML models to discredit anyone's work is fair. In production there are going to be false positives, and these false positives are not just numbers: they are humans who have put blood and sweat into their writing and who would be unfairly put down because a detector said so.