ALERT!! Whatever you do, please read the conclusion, even if you don't read the entire content on this page.
- Swagger UI: https://ai-detector-1089089108369.us-central1.run.app/api/v1/docs
- Hugging Face Space: https://huggingface.co/spaces/muyiiwaa/ai_text_human_modernbert
- API Doc (ReDoc): https://ai-detector-1089089108369.us-central1.run.app/api/v1/redoc
This project grew out of a discussion on Twitter-NG about the effectiveness of AI text detection. A certain group on the TL firmly believed that carefully fine-tuning a transformer-based model to classify text as either AI-generated or human-written simply would not work.
While I agreed that there is no perfect model out there, I was of the opinion that a ModernBERT model, if fine-tuned on a sufficiently large and relevant dataset by experts, could indeed perform decently, i.e. it could identify a good portion of AI-generated text without excessively flagging human writing as AI-generated (false positives).

This stance (to my surprise, really) was met with considerable skepticism and pushback. Rather than continue the debate in theory, I decided a practical demonstration would be more constructive. So I took on the challenge myself and spent the next two days building this.
I curated a specific dataset of 10,000 examples balanced between:
- 5,000 human-written texts: Sourced from Medium articles published before the recent generative AI boom, aiming for authentic human writing from that era.
- 5,000 AI-generated texts: To ensure variety and relevance, this included 1,000 examples generated by Google Gemini, alongside AI-generated text datasets from Kaggle and other sources.
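The dataset curation above boils down to labeling and shuffling two balanced pools of text. Here is a minimal sketch in plain Python; the placeholder strings stand in for the real Medium, Gemini, and Kaggle sources, and the label convention (0 = Human, 1 = AI) matches the classes described later in the prediction flow:

```python
import random

# Placeholder rows standing in for the real curated sources.
human_texts = [f"human article {i}" for i in range(5000)]
ai_texts = [f"ai sample {i}" for i in range(5000)]

# Label 0 = Human, 1 = AI, matching the model's output classes.
dataset = [{"text": t, "label": 0} for t in human_texts] + \
          [{"text": t, "label": 1} for t in ai_texts]

random.seed(42)        # reproducible shuffle before any train/eval split
random.shuffle(dataset)

print(len(dataset))                          # 10000 examples total
print(sum(row["label"] for row in dataset))  # 5000 AI rows: perfectly balanced
```

Shuffling before splitting matters here: without it, a naive train/eval split would put all human texts on one side and all AI texts on the other.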
I then fine-tuned the `answerdotai/modernbert` model on this specific 10k dataset, and wrapped it in a FastAPI endpoint that serves as a direct way to access and evaluate the performance of that custom-trained model on your own texts.
Training metrics after three epochs: (metrics table not reproduced here)
The final project provides:
- A clean, reliable API interface written in FastAPI for the custom-trained `muyiiwaa/ai_detect_modernbert` model.
- An easy way for others to test and evaluate this specific model's performance, especially in light of the original online debate and the dataset it was trained on.
I tried to make the API as robust as I could (fairly easy to do in the age of AI and vibecoding).
- Core Model: Features the custom-trained `muyiiwaa/ai_detect_modernbert` model.
- FastAPI Backend: Offers a high-performance API with automatic interactive documentation for straightforward testing.
- Pydantic Validation: Ensures reliable data handling.
- Structured Logging: Provides operational transparency (JSON format).
- Configuration Management: Simple setup using a `.env` file.
- Dedicated Service Layer: Organizes model loading and inference logic.
- Singleton Model Service: Efficiently manages the model resource (loaded once).
- Robust Error Handling: Manages potential runtime issues gracefully.
- Model Pre-loading: Initializes the model on application startup for responsiveness.
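The singleton model service mentioned above can be sketched in plain Python. The class and attribute names here are illustrative, not the project's actual code; the point is that the expensive model load happens exactly once, no matter how many times the service is "constructed":

```python
class TextDetectionService:
    """Illustrative singleton: the expensive model is loaded exactly once."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._load_model()  # runs only on first construction
        return cls._instance

    def _load_model(self):
        # Stand-in for loading the fine-tuned ModernBERT checkpoint.
        self.model = object()
        self.load_count = getattr(self, "load_count", 0) + 1

# Every "construction" returns the same pre-loaded instance.
a = TextDetectionService()
b = TextDetectionService()
print(a is b)        # True: same object
print(a.load_count)  # 1: model loaded once, not per request
```

Combined with loading at application startup, this keeps per-request latency down to tokenization plus inference, with no repeated model initialization.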
1. A `POST` request containing text is sent to `/api/v1/predict`.
2. FastAPI validates the input using the `TextInput` schema.
3. The request is handled by the `detect_text` endpoint.
4. It uses the `TextDetectionService`, which loaded the `muyiiwaa/ai_detect_modernbert` model at startup.
5. The service preprocesses the text (removes punctuation).
6. The text is tokenized and passed to the fine-tuned model for inference.
7. The model returns logits, which are converted to probabilities (softmax scores for Class 0: Human, Class 1: AI).
8. The results (scores, predicted class, label) are formatted by the service.
9. FastAPI validates the response via the `PredictionOutput` schema and returns the JSON result.
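The tail of that flow (preprocess, softmax over logits, formatted result) can be sketched without the model itself. The logits below are hypothetical, and the result field names are assumptions based on the flow described, not the exact `PredictionOutput` schema; only the softmax math and the punctuation stripping are exact:

```python
import math
import string

def preprocess(text: str) -> str:
    # Step 5: strip punctuation before tokenization.
    return text.translate(str.maketrans("", "", string.punctuation))

def softmax(logits):
    # Step 7: convert raw logits to probabilities (numerically stable form).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (Class 0: Human, Class 1: AI).
logits = [0.3, 2.1]
scores = softmax(logits)
predicted = max(range(2), key=lambda i: scores[i])

# Step 8: format the scores, predicted class, and label (field names assumed).
result = {
    "human_score": round(scores[0], 4),
    "ai_score": round(scores[1], 4),
    "predicted_class": predicted,
    "label": ["Human", "AI"][predicted],
}
print(result["label"])                # AI
print(abs(sum(scores) - 1.0) < 1e-9)  # True: probabilities sum to 1
```

Because softmax only normalizes the two logits against each other, the scores always sum to 1; a high "AI" score is therefore also a statement about how un-human-like the model found the text, which is worth keeping in mind when reading borderline results.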
While I am of the opinion that a carefully fine-tuned, state-of-the-art transformer-based model can do a decent job, I also do not agree that using only AI or ML models to discredit anyone's work is fair. In production there are going to be false positives, and these false positives are not just numbers: they are humans who have put blood and sweat into their writing and who would be unfairly put down because a detector said so.