
Latin NLP Tools Comparison

This project evaluates the speed, accuracy, and usability of four prominent Natural Language Processing (NLP) tools for Latin texts:

Two samples are used for testing:

Project Goal

To provide a reproducible, comparative analysis of Latin NLP tools, measuring their accuracy in tokenisation, lemmatisation, and part-of-speech (POS) tagging, as well as their processing speed.

Project Structure

  • data/: Sample Latin texts (raw and preprocessed)
  • notebooks/: Jupyter notebooks for experiments
  • scripts/: Python scripts for preprocessing and tool execution
  • results/: Accuracy and speed metrics, and visualisations

Installation

  1. Clone the repo:
     ```
     git clone https://github.com/YOUR_USERNAME/latin-nlp-comparison.git
     cd latin-nlp-comparison
     ```
  2. Create and activate a virtual environment:
     ```
     python3 -m venv env
     source env/bin/activate
     ```
  3. Install dependencies:
     ```
     pip install -r requirements.txt
     ```

Metrics Evaluated

  • Accuracy
    • Tokenisation
    • Lemmatisation
    • Part-of-speech (POS) tagging
  • Speed: time taken to process the sample texts
  • Usability: observational assessment of set-up complexity, required packages, interface, and export options
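The accuracy and speed metrics above could be computed roughly as follows. This is a minimal Python sketch, not the project's actual scripts: it assumes that both the gold annotation and each tool's output are lists of `(token, lemma, POS)` tuples, and the names `align`, `evaluate` and `timed` are hypothetical. Tokens are mapped to character spans in the raw text. Tokenisation is then scored as span-level F1, while lemma and POS accuracy are computed only over tokens that both sides segment identically.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple

# One annotated token: (surface form, lemma, POS tag). Hypothetical format.
Token = Tuple[str, str, str]
Span = Tuple[int, int]


def align(tokens: List[Token], text: str) -> Dict[Span, Tuple[str, str]]:
    """Map each token to its (start, end) character span in the raw text.

    Assumes tokens appear verbatim and in order; tokens that cannot be
    found (e.g. because a tool normalised the form) are skipped.
    """
    spans, cursor = {}, 0
    for surface, lemma, pos in tokens:
        start = text.find(surface, cursor)
        if start == -1:
            continue
        end = start + len(surface)
        spans[(start, end)] = (lemma, pos)
        cursor = end
    return spans


def evaluate(gold: List[Token], pred: List[Token], text: str) -> Dict[str, float]:
    """Span-level tokenisation F1, plus lemma/POS accuracy on shared spans."""
    g, p = align(gold, text), align(pred, text)
    shared = g.keys() & p.keys()
    precision = len(shared) / len(p) if p else 0.0
    recall = len(shared) / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    n = len(shared) or 1  # avoid division by zero when nothing aligns
    return {
        "token_f1": f1,
        "lemma_acc": sum(g[s][0] == p[s][0] for s in shared) / n,
        "pos_acc": sum(g[s][1] == p[s][1] for s in shared) / n,
    }


def timed(tool: Callable[[str], List[Token]], text: str,
          repeats: int = 3) -> Tuple[List[Token], float]:
    """Run an already-loaded tool several times; return output and median seconds."""
    times: List[float] = []
    output: List[Token] = []
    for _ in range(repeats):
        start = time.perf_counter()
        output = tool(text)
        times.append(time.perf_counter() - start)
    return output, statistics.median(times)
```

Aligning on character spans rather than on token positions means that a tool which, for example, keeps an enclitic such as *-que* attached to its host word is penalised only on that word, instead of shifting every later comparison out of step. Timing assumes each tool's model is loaded before `timed` is called, so start-up cost is excluded; the median over several runs dampens one-off noise.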

Wiki

See the GitHub Wiki for documentation, tool setup guides, and detailed findings.

Acknowledgments

Supervised by Bernhard Bauer at the University of Graz.