diff --git a/README.md b/README.md
index 517cad15e..86d25da2b 100644
--- a/README.md
+++ b/README.md
@@ -1,1220 +1,646 @@
-# Awesome Python
-
-An opinionated list of awesome Python frameworks, libraries, tools, software and resources.
-
-> The **#10 most-starred repo on GitHub**. Put your product where Python developers discover tools. [Become a sponsor](SPONSORSHIP.md).
-
-# Categories
-
-- [Admin Panels](#admin-panels)
-- [AI and Agents](#ai-and-agents)
-- [Algorithms and Design Patterns](#algorithms-and-design-patterns)
-- [ASGI Servers](#asgi-servers)
-- [Asynchronous Programming](#asynchronous-programming)
-- [Audio](#audio)
-- [Authentication](#authentication)
-- [Build Tools](#build-tools)
-- [Built-in Classes Enhancement](#built-in-classes-enhancement)
-- [Caching](#caching)
-- [CMS](#cms)
-- [Code Analysis](#code-analysis)
-- [Command-line Interface Development](#command-line-interface-development)
-- [Command-line Tools](#command-line-tools)
-- [Computer Vision](#computer-vision)
-- [Configuration Files](#configuration-files)
-- [Cryptography](#cryptography)
-- [Data Analysis](#data-analysis)
-- [Data Validation](#data-validation)
-- [Data Visualization](#data-visualization)
-- [Database Drivers](#database-drivers)
-- [Database](#database)
-- [Date and Time](#date-and-time)
-- [Debugging Tools](#debugging-tools)
-- [Deep Learning](#deep-learning)
-- [DevOps Tools](#devops-tools)
-- [Distributed Computing](#distributed-computing)
-- [Distribution](#distribution)
-- [Documentation](#documentation)
-- [Downloader](#downloader)
-- [Editor Plugins and IDEs](#editor-plugins-and-ides)
-- [Email](#email)
-- [Environment Management](#environment-management)
-- [File Manipulation](#file-manipulation)
-- [Functional Programming](#functional-programming)
-- [Game Development](#game-development)
-- [Geolocation](#geolocation)
-- [GUI Development](#gui-development)
-- [Hardware](#hardware)
-- [HTML Manipulation](#html-manipulation)
-- [HTTP Clients](#http-clients)
-- [Image Processing](#image-processing)
-- [Implementations](#implementations)
-- [Interactive Interpreter](#interactive-interpreter)
-- [Internationalization](#internationalization)
-- [Job Scheduler](#job-scheduler)
-- [Logging](#logging)
-- [Machine Learning](#machine-learning)
-- [Miscellaneous](#miscellaneous)
-- [Natural Language Processing](#natural-language-processing)
-- [Network Virtualization](#network-virtualization)
-- [ORM](#orm)
-- [Package Management](#package-management)
-- [Package Repositories](#package-repositories)
-- [Penetration testing](#penetration-testing)
-- [Permissions](#permissions)
-- [Processes](#processes)
-- [Quantum Computing](#quantum-computing)
-- [Recommender Systems](#recommender-systems)
-- [Refactoring](#refactoring)
-- [RESTful API](#restful-api)
-- [Robotics](#robotics)
-- [RPC Servers](#rpc-servers)
-- [Science](#science)
-- [Search](#search)
-- [Serialization](#serialization)
-- [Serverless Frameworks](#serverless-frameworks)
-- [Shell](#shell)
-- [Specific Formats Processing](#specific-formats-processing)
-- [Static Site Generator](#static-site-generator)
-- [Task Queues](#task-queues)
-- [Template Engine](#template-engine)
-- [Testing](#testing)
-- [Text Processing](#text-processing)
-- [URL Manipulation](#url-manipulation)
-- [Video](#video)
-- [Web Asset Management](#web-asset-management)
-- [Web Content Extracting](#web-content-extracting)
-- [Web Crawling](#web-crawling)
-- [Web Frameworks](#web-frameworks)
-- [WebSocket](#websocket)
-- [WSGI Servers](#wsgi-servers)
-
----
-
-## Admin Panels
-
-_Libraries for administrative interfaces._
-
-- [ajenti](https://github.com/ajenti/ajenti) - The admin panel your servers deserve.
-- [django-grappelli](https://github.com/sehmaschine/django-grappelli) - A jazzy skin for the Django Admin-Interface.
-- [django-unfold](https://github.com/unfoldadmin/django-unfold) - Elevate your Django admin with a stunning modern interface, powerful features, and seamless user experience.
-- [flask-admin](https://github.com/flask-admin/flask-admin) - Simple and extensible administrative interface framework for Flask.
-- [flower](https://github.com/mher/flower) - Real-time monitor and web admin for Celery.
-- [func-to-web](https://github.com/offerrall/FuncToWeb) - Instantly create web UIs from Python functions using type hints. Zero frontend code required.
-- [jet-bridge](https://github.com/jet-admin/jet-bridge) - Admin panel framework for any application with nice UI (ex Jet Django).
-- [wooey](https://github.com/wooey/wooey) - A Django app which creates automatic web UIs for Python scripts.
-
-## AI and Agents
-
-_Libraries for building AI applications, LLM integrations, and autonomous agents._
-
-- [agno](https://github.com/agno-agi/agno) - Open-source Python library for building AI agents and agentic systems.
-- [autoresearch](https://github.com/karpathy/autoresearch) - AI agents running autonomous research experiments on single-GPU LLM training.
-- [instructor](https://github.com/567-labs/instructor) - A library for extracting structured data from LLMs, powered by Pydantic.
-- [langchain](https://github.com/langchain-ai/langchain) - Building applications with LLMs through composability.
-- [llama_index](https://github.com/run-llama/llama_index) - A data framework for your LLM application.
-- [praisonai](https://github.com/MervinPraison/PraisonAI) - Production-ready Multi-AI Agents framework with self-reflection, 100+ LLM support, MCP integration, and agentic workflows.
-- [pydantic-ai](https://github.com/pydantic/pydantic-ai) - A Python agent framework for building generative AI applications with structured schemas.
-- [ragflow](https://github.com/infiniflow/ragflow) - An open-source RAG engine for document understanding and question answering with LLMs.
-
-## Algorithms and Design Patterns
-
-_Python implementations of data structures, algorithms and design patterns. Also see [awesome-algorithms](https://github.com/tayllan/awesome-algorithms)._
-
-- Algorithms
- - [algorithms](https://github.com/keon/algorithms) - Minimal examples of data structures and algorithms.
- - [python-ds](https://github.com/prabhupant/python-ds) - A collection of data structures and algorithms for coding interviews.
- - [sortedcontainers](https://github.com/grantjenks/python-sortedcontainers) - Fast and pure-Python implementation of sorted collections.
- - [thealgorithms](https://github.com/TheAlgorithms/Python) - All Algorithms implemented in Python.
-- Design Patterns
- - [python-cqrs](https://github.com/vadikko2/python-cqrs) - Event-Driven Architecture Framework with CQRS/CQS, Transaction Outbox, Saga orchestration.
- - [python-patterns](https://github.com/faif/python-patterns) - A collection of design patterns in Python.
- - [transitions](https://github.com/pytransitions/transitions) - A lightweight, object-oriented finite state machine implementation.
-
-## ASGI Servers
-
-_[ASGI](https://asgi.readthedocs.io/en/latest/)-compatible web servers._
-
-- [daphne](https://github.com/django/daphne) - An HTTP, HTTP2 and WebSocket protocol server for ASGI and ASGI-HTTP.
-- [granian](https://github.com/emmett-framework/granian) - A Rust HTTP server for Python applications built on top of Hyper and Tokio, supporting WSGI/ASGI/RSGI.
-- [hypercorn](https://github.com/pgjones/hypercorn) - An ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn.
-- [uvicorn](https://github.com/encode/uvicorn) - A lightning-fast ASGI server implementation, using uvloop and httptools.
-
-## Asynchronous Programming
-
-_Libraries for asynchronous, concurrent and parallel execution. Also see [awesome-asyncio](https://github.com/timofurrer/awesome-asyncio)._
-
-- [asyncio](https://docs.python.org/3/library/asyncio.html) - (Python standard library) Asynchronous I/O, event loop, coroutines and tasks.
- - [awesome-asyncio](https://github.com/timofurrer/awesome-asyncio)
-- [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html) - (Python standard library) A high-level interface for asynchronously executing callables.
-- [gevent](https://github.com/gevent/gevent) - A coroutine-based Python networking library that uses [greenlet](https://github.com/python-greenlet/greenlet).
-- [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) - (Python standard library) Process-based parallelism.
-- [trio](https://github.com/python-trio/trio) - A friendly library for async concurrency and I/O.
-- [twisted](https://github.com/twisted/twisted) - An event-driven networking engine.
-- [uvloop](https://github.com/MagicStack/uvloop) - Ultra fast asyncio event loop.
-
-## Audio
-
-_Libraries for manipulating audio and its metadata._
-
-- Audio
- - [audioflux](https://github.com/libAudioFlux/audioFlux) - A library for audio and music analysis, feature extraction.
- - [audioread](https://github.com/beetbox/audioread) - Cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding.
- - [dejavu](https://github.com/worldveil/dejavu) - Audio fingerprinting and recognition.
- - [gtts](https://github.com/pndurette/gTTS) - Python library and CLI tool for converting text to speech using Google Translate TTS.
- - [kapre](https://github.com/keunwoochoi/kapre) - Keras Audio Preprocessors.
- - [librosa](https://github.com/librosa/librosa) - Python library for audio and music analysis.
- - [matchering](https://github.com/sergree/matchering) - A library for automated reference audio mastering.
- - [mingus](https://github.com/bspaans/python-mingus) - An advanced music theory and notation package with MIDI file and playback support.
- - [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) - Audio feature extraction, classification, segmentation and applications.
- - [pydub](https://github.com/jiaaro/pydub) - Manipulate audio with a simple and easy high level interface.
- - [TimeSide](https://github.com/Parisson/TimeSide) - Open web audio processing framework.
-- Metadata
- - [beets](https://github.com/beetbox/beets) - A music library manager and [MusicBrainz](https://musicbrainz.org/) tagger.
- - [eyed3](https://github.com/nicfit/eyeD3) - A tool for working with audio files, specifically MP3 files containing ID3 metadata.
- - [mutagen](https://github.com/quodlibet/mutagen) - A Python module to handle audio metadata.
- - [tinytag](https://github.com/devsnd/tinytag) - A library for reading music metadata of MP3, OGG, FLAC and Wave files.
-
-## Authentication
-
-_Libraries for implementing authentication schemes._
-
-- OAuth
- - [authlib](https://github.com/lepture/authlib) - JavaScript Object Signing and Encryption draft implementation.
- - [django-allauth](https://github.com/pennersr/django-allauth) - Authentication app for Django that "just works."
- - [django-oauth-toolkit](https://github.com/jazzband/django-oauth-toolkit) - OAuth 2 goodies for Django.
- - [oauthlib](https://github.com/oauthlib/oauthlib) - A generic and thorough implementation of the OAuth request-signing logic.
-- JWT
- - [pyjwt](https://github.com/jpadilla/pyjwt) - JSON Web Token implementation in Python.
- - [python-jose](https://github.com/mpdavis/python-jose/) - A JOSE implementation in Python.
-
-## Build Tools
-
-_Compile software from source code._
-
-- [bitbake](https://github.com/openembedded/bitbake) - A make-like build tool for embedded Linux.
-- [buildout](https://github.com/buildout/buildout) - A build system for creating, assembling and deploying applications from multiple parts.
-- [platformio](https://github.com/platformio/platformio-core) - A console tool to build code with different development platforms.
-- [pybuilder](https://github.com/pybuilder/pybuilder) - A continuous build tool written in pure Python.
-- [scons](https://github.com/SCons/scons) - A software construction tool.
-
-## Built-in Classes Enhancement
-
-_Libraries for enhancing Python built-in classes._
-
-- [attrs](https://github.com/python-attrs/attrs) - Replacement for `__init__`, `__eq__`, `__repr__`, etc. boilerplate in class definitions.
-- [bidict](https://github.com/jab/bidict) - Efficient, Pythonic bidirectional map data structures and related functionality.
-- [box](https://github.com/cdgriffith/Box) - Python dictionaries with advanced dot notation access.
-
-## CMS
-
-_Content Management Systems._
-
-- [feincms](https://github.com/feincms/feincms) - One of the most advanced Content Management Systems built on Django.
-- [indico](https://github.com/indico/indico) - A feature-rich event management system, made @ [CERN](https://en.wikipedia.org/wiki/CERN).
-- [wagtail](https://github.com/wagtail/wagtail) - A Django content management system.
-
-## Caching
-
-_Libraries for caching data._
-
-- [beaker](https://github.com/bbangert/beaker) - A WSGI middleware for sessions and caching.
-- [django-cache-machine](https://github.com/django-cache-machine/django-cache-machine) - Automatic caching and invalidation for Django models.
-- [django-cacheops](https://github.com/Suor/django-cacheops) - A slick ORM cache with automatic granular event-driven invalidation.
-- [dogpile.cache](https://github.com/sqlalchemy/dogpile.cache) - dogpile.cache is a next generation replacement for Beaker made by the same authors.
-- [hermescache](https://pypi.org/project/HermesCache/) - Python caching library with tag-based invalidation and dogpile effect prevention.
-- [pylibmc](https://github.com/lericson/pylibmc) - A Python wrapper around the [libmemcached](https://libmemcached.org/libMemcached.html) interface.
-- [python-diskcache](https://github.com/grantjenks/python-diskcache) - SQLite and file backed cache backend with faster lookups than memcached and redis.
-
-## Code Analysis
-
-_Tools for static analysis, linters and code quality checkers. Also see [awesome-static-analysis](https://github.com/mre/awesome-static-analysis)._
-
-- Code Analysis
- - [code-graph-rag](https://github.com/vitali87/code-graph-rag) - Builds knowledge graphs from multi-language codebases using Tree-sitter and Memgraph, enabling natural language querying of code structure.
- - [code2flow](https://github.com/scottrogowski/code2flow) - Turn your Python and JavaScript code into DOT flowcharts.
- - [prospector](https://github.com/PyCQA/prospector) - A tool to analyze Python code.
- - [vulture](https://github.com/jendrikseipp/vulture) - A tool for finding and analyzing dead Python code.
-- Code Linters
- - [flake8](https://github.com/PyCQA/flake8) - A wrapper around `pycodestyle`, `pyflakes` and McCabe.
- - [awesome-flake8-extensions](https://github.com/DmytroLitvinov/awesome-flake8-extensions)
- - [pylint](https://github.com/pylint-dev/pylint) - A fully customizable source code analyzer.
- - [ruff](https://github.com/astral-sh/ruff) - An extremely fast Python linter and code formatter.
-- Code Formatters
- - [black](https://github.com/psf/black) - The uncompromising Python code formatter.
- - [isort](https://github.com/timothycrosley/isort) - A Python utility / library to sort imports.
- - [yapf](https://github.com/google/yapf) - Yet another Python code formatter from Google.
-- Static Type Checkers, also see [awesome-python-typing](https://github.com/typeddjango/awesome-python-typing)
- - [mypy](https://github.com/python/mypy) - Optional static type checker that checks variable types without running the code.
- - [pyre-check](https://github.com/facebook/pyre-check) - Performant type checking.
- - [ty](https://github.com/astral-sh/ty) - An extremely fast Python type checker and language server.
- - [typeshed](https://github.com/python/typeshed) - Collection of library stubs for Python, with static types.
-- Static Type Annotations Generators
- - [monkeytype](https://github.com/Instagram/MonkeyType) - A system for Python that generates static type annotations by collecting runtime types.
- - [pytype](https://github.com/google/pytype) - Pytype checks and infers types for Python code - without requiring type annotations.
-
-## Command-line Interface Development
-
-_Libraries for building command-line applications._
-
-- Command-line Application Development
- - [argparse](https://docs.python.org/3/library/argparse.html) - (Python standard library) Command-line option and argument parsing.
- - [cement](https://github.com/datafolklabs/cement) - CLI Application Framework for Python.
- - [click](https://github.com/pallets/click/) - A package for creating beautiful command line interfaces in a composable way.
- - [cliff](https://github.com/openstack/cliff) - A framework for creating command-line programs with multi-level commands.
- - [python-fire](https://github.com/google/python-fire) - A library for creating command line interfaces from absolutely any Python object.
- - [python-prompt-toolkit](https://github.com/prompt-toolkit/python-prompt-toolkit) - A library for building powerful interactive command lines.
- - [typer](https://github.com/tiangolo/typer) - Modern CLI framework that uses Python type hints. Built on Click and Pydantic.
-- Terminal Rendering
- - [alive-progress](https://github.com/rsalmei/alive-progress) - A new kind of progress bar, with real-time throughput, ETA and very cool animations.
- - [asciimatics](https://github.com/peterbrittain/asciimatics) - A package to create full-screen text UIs (from interactive forms to ASCII animations).
- - [bashplotlib](https://github.com/glamp/bashplotlib) - Making basic plots in the terminal.
- - [colorama](https://github.com/tartley/colorama) - Cross-platform colored terminal text.
- - [rich](https://github.com/Textualize/rich) - Python library for rich text and beautiful formatting in the terminal. Also provides a great `RichHandler` log handler.
- - [textual](https://github.com/Textualize/textual) - A framework for building interactive user interfaces that run in the terminal and the browser.
- - [tqdm](https://github.com/tqdm/tqdm) - Fast, extensible progress bar for loops and CLI.
-
-## Command-line Tools
-
-_Useful CLI-based tools for productivity._
-
-- Productivity Tools
- - [ccb](https://github.com/bfly123/claude_code_bridge) - A CLI tool to orchestrate multiple LLMs (Claude, Gemini, etc.) in Tmux panes with cross-agent interaction.
- - [cookiecutter](https://github.com/cookiecutter/cookiecutter) - A command-line utility that creates projects from cookiecutters (project templates).
- - [copier](https://github.com/copier-org/copier) - A library and command-line utility for rendering project templates.
- - [doitlive](https://github.com/sloria/doitlive) - A tool for live presentations in the terminal.
- - [howdoi](https://github.com/gleitz/howdoi) - Instant coding answers via the command line.
- - [invoke](https://github.com/pyinvoke/invoke) - A tool for managing shell-oriented subprocesses and organizing executable Python code into CLI-invokable tasks.
- - [fpp](https://github.com/facebook/PathPicker) - Select files out of bash output.
- - [thefuck](https://github.com/nvbn/thefuck) - Correcting your previous console command.
- - [tmuxp](https://github.com/tmux-python/tmuxp) - A [tmux](https://github.com/tmux/tmux) session manager.
- - [try](https://github.com/timofurrer/try) - A dead simple CLI to try out Python packages - it's never been easier.
-- CLI Enhancements
- - [httpie](https://github.com/httpie/cli) - A command line HTTP client, a user-friendly cURL replacement.
- - [iredis](https://github.com/laixintao/iredis) - Redis CLI with autocompletion and syntax highlighting.
- - [litecli](https://github.com/dbcli/litecli) - SQLite CLI with autocompletion and syntax highlighting.
- - [mycli](https://github.com/dbcli/mycli) - MySQL CLI with autocompletion and syntax highlighting.
- - [pgcli](https://github.com/dbcli/pgcli) - PostgreSQL CLI with autocompletion and syntax highlighting.
-
-## Computer Vision
-
-_Libraries for Computer Vision._
-
-- [easyocr](https://github.com/JaidedAI/EasyOCR) - Ready-to-use OCR with 40+ languages supported.
-- [kornia](https://github.com/kornia/kornia/) - Open Source Differentiable Computer Vision Library for PyTorch.
-- [opencv](https://github.com/opencv/opencv-python) - Open Source Computer Vision Library.
-- [pytesseract](https://github.com/madmaze/pytesseract) - A wrapper for [Google Tesseract OCR](https://github.com/tesseract-ocr).
-- [tesserocr](https://github.com/sirfz/tesserocr) - Another simple, Pillow-friendly, wrapper around the `tesseract-ocr` API for OCR.
-
-## Configuration Files
-
-_Libraries for storing and parsing configuration options._
-
-- [configobj](https://github.com/DiffSK/configobj) - INI file parser with validation.
-- [configparser](https://docs.python.org/3/library/configparser.html) - (Python standard library) INI file parser.
-- [dynaconf](https://github.com/dynaconf/dynaconf) - Dynaconf is a configuration manager with plugins for Django, Flask and FastAPI.
-- [hydra](https://github.com/facebookresearch/hydra) - Hydra is a framework for elegantly configuring complex applications.
-- [python-decouple](https://github.com/HBNetwork/python-decouple) - Strict separation of settings from code.
-
-## Cryptography
-
-- [cryptography](https://github.com/pyca/cryptography) - A package designed to expose cryptographic primitives and recipes to Python developers.
-- [paramiko](https://github.com/paramiko/paramiko) - The leading native Python SSHv2 protocol library.
-- [pynacl](https://github.com/pyca/pynacl) - Python binding to the Networking and Cryptography (NaCl) library.
-
-## Data Analysis
-
-_Libraries for data analysis._
-
-- [aws-sdk-pandas](https://github.com/aws/aws-sdk-pandas) - Pandas on AWS.
-- [datasette](https://github.com/simonw/datasette) - An open source multi-tool for exploring and publishing data.
-- [desbordante](https://github.com/desbordante/desbordante-core/) - An open source data profiler for complex pattern discovery.
-- [optimus](https://github.com/hi-primus/optimus) - Agile Data Science Workflows made easy with PySpark.
-- [pandas](https://github.com/pandas-dev/pandas) - A library providing high-performance, easy-to-use data structures and data analysis tools.
-- [pathway](https://github.com/pathwaycom/pathway) - Real-time data processing framework for Python with reactive dataflows.
-- [polars](https://github.com/pola-rs/polars) - A fast DataFrame library implemented in Rust with a Python API.
-
-## Data Validation
-
-_Libraries for validating data. Used for forms in many cases._
-
-- [cerberus](https://github.com/pyeve/cerberus) - A lightweight and extensible data validation library.
-- [colander](https://github.com/Pylons/colander) - Validating and deserializing data obtained via XML, JSON, or an HTML form post.
-- [jsonschema](https://github.com/python-jsonschema/jsonschema) - An implementation of [JSON Schema](http://json-schema.org/) for Python.
-- [pydantic](https://github.com/pydantic/pydantic) - Data validation using Python type hints.
-- [schema](https://github.com/keleshev/schema) - A library for validating Python data structures.
-- [schematics](https://github.com/schematics/schematics) - Data Structure Validation.
-- [voluptuous](https://github.com/alecthomas/voluptuous) - A Python data validation library.
-
-## Data Visualization
-
-_Libraries for visualizing data. Also see [awesome-javascript](https://github.com/sorrycc/awesome-javascript#data-visualization)._
-
-- [altair](https://github.com/altair-viz/altair) - Declarative statistical visualization library for Python.
-- [bokeh](https://github.com/bokeh/bokeh) - Interactive Web Plotting for Python.
-- [bqplot](https://github.com/bloomberg/bqplot) - Interactive Plotting Library for the Jupyter Notebook.
-- [cartopy](https://github.com/SciTools/cartopy) - A cartographic Python library with Matplotlib support.
-- [diagrams](https://github.com/mingrammer/diagrams) - Diagram as Code.
-- [matplotlib](https://github.com/matplotlib/matplotlib) - A Python 2D plotting library.
-- [plotly](https://github.com/plotly/plotly.py) - Interactive graphing library for Python.
-- [plotnine](https://github.com/has2k1/plotnine) - A grammar of graphics for Python based on ggplot2.
-- [pygal](https://github.com/Kozea/pygal) - A Python SVG Charts Creator.
-- [pygraphviz](https://github.com/pygraphviz/pygraphviz/) - Python interface to [Graphviz](http://www.graphviz.org/).
-- [pyqtgraph](https://github.com/pyqtgraph/pyqtgraph) - Interactive and realtime 2D/3D/Image plotting and science/engineering widgets.
-- [seaborn](https://github.com/mwaskom/seaborn) - Statistical data visualization using Matplotlib.
-- [streamlit](https://github.com/streamlit/streamlit) - A framework which lets you build dashboards, generate reports, or create chat apps in minutes.
-- [ultraplot](https://github.com/ultraplot/UltraPlot) - Matplotlib wrapper for publication-ready scientific figures with minimal code. Includes advanced subplot management, panel layouts, and batteries-included geoscience plotting.
-- [vispy](https://github.com/vispy/vispy) - High-performance scientific visualization based on OpenGL.
-
-## Database
-
-_Databases implemented in Python._
-
-- [duckdb](https://github.com/duckdb/duckdb) - An in-process SQL OLAP database management system; optimized for analytics and fast queries, similar to SQLite but for analytical workloads.
-- [pickledb](https://github.com/patx/pickledb) - A simple and lightweight key-value store for Python.
-- [tinydb](https://github.com/msiemens/tinydb) - A tiny, document-oriented database.
-- [ZODB](https://github.com/zopefoundation/ZODB) - A native object database for Python. A key-value and object graph database.
-
-## Database Drivers
-
-_Libraries for connecting and operating databases._
-
-- MySQL - [awesome-mysql](https://github.com/shlomi-noach/awesome-mysql)
- - [mysqlclient](https://github.com/PyMySQL/mysqlclient) - MySQL connector with Python 3 support ([mysql-python](https://sourceforge.net/projects/mysql-python/) fork).
- - [pymysql](https://github.com/PyMySQL/PyMySQL) - A pure Python MySQL driver compatible with mysql-python.
-- PostgreSQL - [awesome-postgres](https://github.com/dhamaniasad/awesome-postgres)
- - [psycopg](https://github.com/psycopg/psycopg) - The most popular PostgreSQL adapter for Python.
-- SQLite - [awesome-sqlite](https://github.com/planetopendata/awesome-sqlite)
- - [sqlite-utils](https://github.com/simonw/sqlite-utils) - Python CLI utility and library for manipulating SQLite databases.
- - [sqlite3](https://docs.python.org/3/library/sqlite3.html) - (Python standard library) SQLite interface compliant with DB-API 2.0.
-- Other Relational Databases
- - [clickhouse-driver](https://github.com/mymarilyn/clickhouse-driver) - Python driver with native interface for ClickHouse.
- - [pymssql](https://github.com/pymssql/pymssql) - A simple database interface to Microsoft SQL Server.
-- NoSQL Databases
- - [cassandra-driver](https://github.com/datastax/python-driver) - The Python Driver for Apache Cassandra.
- - [django-mongodb-backend](https://github.com/mongodb/django-mongodb-backend) - Official MongoDB database backend for Django.
- - [kafka-python](https://github.com/dpkp/kafka-python) - The Python client for Apache Kafka.
- - [pymongo](https://github.com/mongodb/mongo-python-driver) - The official Python client for MongoDB.
- - [redis-py](https://github.com/redis/redis-py) - The Python client for Redis.
-
-## Date and Time
-
-_Libraries for working with dates and times._
-
-- [arrow](https://github.com/arrow-py/arrow) - A Python library that offers a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps.
-- [dateutil](https://github.com/dateutil/dateutil) - Extensions to the standard Python [datetime](https://docs.python.org/3/library/datetime.html) module.
-- [pendulum](https://github.com/sdispater/pendulum) - Python datetimes made easy.
-- [zoneinfo](https://docs.python.org/3/library/zoneinfo.html) - (Python standard library) IANA time zone support. Brings the [tz database](https://en.wikipedia.org/wiki/Tz_database) into Python.
-
-## Debugging Tools
-
-_Libraries for debugging code._
-
-- pdb-like Debugger
- - [ipdb](https://github.com/gotcha/ipdb) - IPython-enabled [pdb](https://docs.python.org/3/library/pdb.html).
- - [pudb](https://github.com/inducer/pudb) - A full-screen, console-based Python debugger.
-- Tracing
- - [manhole](https://github.com/ionelmc/python-manhole) - An in-process debugging service that accepts UNIX socket connections and presents the stack traces for all threads and an interactive prompt.
- - [python-hunter](https://github.com/ionelmc/python-hunter) - A flexible code tracing toolkit.
-- Profiler
- - [py-spy](https://github.com/benfred/py-spy) - A sampling profiler for Python programs. Written in Rust.
- - [vprof](https://github.com/nvdv/vprof) - Visual Python profiler.
-- Others
- - [django-debug-toolbar](https://github.com/jazzband/django-debug-toolbar) - Display various debug information for Django.
- - [flask-debugtoolbar](https://github.com/pallets-eco/flask-debugtoolbar) - A port of the django-debug-toolbar to Flask.
- - [icecream](https://github.com/gruns/icecream) - Inspect variables, expressions, and program execution with a single, simple function call.
- - [memory_graph](https://github.com/bterwijn/memory_graph) - Visualize Python data at runtime to debug references, mutability, and aliasing.
- - [pyelftools](https://github.com/eliben/pyelftools) - Parsing and analyzing ELF files and DWARF debugging information.
-
-## Deep Learning
-
-_Frameworks for Neural Networks and Deep Learning. Also see [awesome-deep-learning](https://github.com/ChristosChristofidis/awesome-deep-learning)._
-
-- [jax](https://github.com/google/jax) - A library for high-performance numerical computing with automatic differentiation and JIT compilation.
-- [keras](https://github.com/keras-team/keras) - A high-level neural networks library capable of running on top of either TensorFlow or Theano.
-- [pytorch-lightning](https://github.com/Lightning-AI/pytorch-lightning) - Deep learning framework to train, deploy, and ship AI products Lightning fast.
-- [pytorch](https://github.com/pytorch/pytorch) - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
-- [stable-baselines3](https://github.com/DLR-RM/stable-baselines3) - PyTorch implementations of Stable Baselines (deep) reinforcement learning algorithms.
-- [tensorflow](https://github.com/tensorflow/tensorflow) - The most popular Deep Learning framework created by Google.
-- [vllm](https://github.com/vllm-project/vllm) - A high-throughput and memory-efficient inference and serving engine for LLMs.
-
-## DevOps Tools
-
-_Software and libraries for DevOps._
-
-- Cloud Providers
- - [boto3](https://github.com/boto/boto3) - Python interface to Amazon Web Services.
-- Configuration Management
- - [ansible](https://github.com/ansible/ansible) - A radically simple IT automation platform.
- - [cloudinit](https://github.com/canonical/cloud-init) - A multi-distribution package that handles early initialization of a cloud instance.
- - [openstack](https://www.openstack.org/) - Open source software for building private and public clouds.
- - [pyinfra](https://github.com/pyinfra-dev/pyinfra) - Versatile CLI tools and Python libraries to automate infrastructure.
- - [saltstack](https://github.com/saltstack/salt) - Infrastructure automation and management system.
-- SSH-style Deployment
- - [cuisine](https://github.com/sebastien/cuisine) - Chef-like functionality for Fabric.
- - [fabric](https://github.com/fabric/fabric) - A simple, Pythonic tool for remote execution and deployment.
-- Process Management
- - [supervisor](https://github.com/Supervisor/supervisor) - Supervisor process control system for UNIX.
-- Monitoring
- - [psutil](https://github.com/giampaolo/psutil) - A cross-platform process and system utilities module.
-- Backup
- - [borg](https://github.com/borgbackup/borg) - A deduplicating archiver with compression and encryption.
-- Chaos Engineering
- - [chaostoolkit](https://github.com/chaostoolkit/chaostoolkit) - A Chaos Engineering toolkit & Orchestration for Developers.
-
-## Distributed Computing
-
-_Frameworks and libraries for Distributed Computing._
-
-- Batch Processing
- - [dask](https://github.com/dask/dask) - A flexible parallel computing library for analytic computing.
- - [luigi](https://github.com/spotify/luigi) - A module that helps you build complex pipelines of batch jobs.
- - [mpi4py](https://github.com/mpi4py/mpi4py) - Python bindings for MPI.
- - [pyspark](https://github.com/apache/spark) - [Apache Spark](https://spark.apache.org/) Python API.
- - [ray](https://github.com/ray-project/ray/) - A system for parallel and distributed Python that unifies the machine learning ecosystem.
-- Stream Processing
- - [streamparse](https://github.com/Parsely/streamparse) - Run Python code against real-time streams of data via [Apache Storm](http://storm.apache.org/).
-
-## Distribution
-
-_Libraries to create packaged executables for release distribution._
-
-- [cx-Freeze](https://github.com/marcelotduarte/cx_Freeze) - It is a Python tool that converts Python scripts into standalone executables and installers for Windows, macOS, and Linux.
-- [Nuitka](https://github.com/Nuitka/Nuitka) - Compiles Python programs into high-performance standalone executables (cross-platform, supports all Python versions).
-- [py2app](https://github.com/ronaldoussoren/py2app) - Freezes Python scripts (Mac OS X).
-- [py2exe](https://github.com/py2exe/py2exe) - Freezes Python scripts (Windows).
-- [pyarmor](https://github.com/dashingsoft/pyarmor) - A tool used to obfuscate python scripts, bind obfuscated scripts to fixed machine or expire obfuscated scripts.
-- [pyinstaller](https://github.com/pyinstaller/pyinstaller) - Converts Python programs into stand-alone executables (cross-platform).
-- [shiv](https://github.com/linkedin/shiv) - A command line utility for building fully self-contained zipapps (PEP 441), but with all their dependencies included.
-
-## Documentation
-
-_Libraries for generating project documentation._
-
-- [sphinx](https://github.com/sphinx-doc/sphinx/) - Python Documentation generator.
- - [awesome-sphinxdoc](https://github.com/yoloseem/awesome-sphinxdoc)
-- [pdoc](https://github.com/mitmproxy/pdoc) - Epydoc replacement to auto generate API documentation for Python libraries.
-
-## Downloader
-
-_Libraries for downloading._
-
-- General
- - [s3cmd](https://github.com/s3tools/s3cmd) - A command line tool for managing Amazon S3 and CloudFront.
- - [youtube-dl](https://github.com/ytdl-org/youtube-dl/) - A command-line program to download videos from YouTube and other video sites.
-- Financial Data
- - [akshare](https://github.com/jindaxiang/akshare) - A financial data interface library, built for human beings!
- - [edgartools](https://github.com/dgunning/edgartools) - Library for downloading structured data from SEC EDGAR filings and XBRL financial statements.
- - [openbb](https://github.com/OpenBB-finance/OpenBB) - A financial data platform for analysts, quants and AI agents.
- - [yfinance](https://github.com/ranaroussi/yfinance) - Easy Pythonic way to download market and financial data from Yahoo Finance.
-
-## Editor Plugins and IDEs
-
-- Emacs
- - [elpy](https://github.com/jorgenschaefer/elpy) - Emacs Python Development Environment.
-- Vim
- - [jedi-vim](https://github.com/davidhalter/jedi-vim) - Vim bindings for the Jedi auto-completion library for Python.
- - [python-mode](https://github.com/python-mode/python-mode) - An all in one plugin for turning Vim into a Python IDE.
- - [YouCompleteMe](https://github.com/Valloric/YouCompleteMe) - Includes [Jedi](https://github.com/davidhalter/jedi)-based completion engine for Python.
-- Visual Studio
- - [PTVS](https://github.com/Microsoft/PTVS) - Python Tools for Visual Studio.
-- Visual Studio Code
- - [Python for VSCode](https://marketplace.visualstudio.com/items?itemName=ms-python.python) - The official VSCode extension with rich support for Python.
-- IDE
- - [PyCharm](https://www.jetbrains.com/pycharm/) - Commercial Python IDE by JetBrains. Has free community edition available.
- - [spyder](https://github.com/spyder-ide/spyder) - Open Source Python IDE.
-
-## Email
-
-_Libraries for sending and parsing email._
-
-- Mail Servers
- - [modoboa](https://github.com/modoboa/modoboa) - A mail hosting and management platform including a modern Web UI.
-- Clients
- - [imbox](https://github.com/martinrusev/imbox) - Python IMAP for Humans.
- - [yagmail](https://github.com/kootenpv/yagmail) - Yet another Gmail/SMTP client.
-- Others
- - [flanker](https://github.com/mailgun/flanker) - An email address and Mime parsing library.
- - [mailer](https://github.com/marrow/mailer) - High-performance extensible mail delivery framework.
-
-## Environment Management
-
-_Libraries for Python version and virtual environment management._
-
-- [pyenv-win](https://github.com/pyenv-win/pyenv-win) - Pyenv for Windows, Simple Python version management.
-- [pyenv](https://github.com/pyenv/pyenv) - Simple Python version management.
-- [virtualenv](https://github.com/pypa/virtualenv) - A tool to create isolated Python environments.
-
-## File Manipulation
-
-_Libraries for file manipulation._
-
-- [mimetypes](https://docs.python.org/3/library/mimetypes.html) - (Python standard library) Map filenames to MIME types.
-- [path.py](https://github.com/jaraco/path.py) - A module wrapper for [os.path](https://docs.python.org/3/library/os.path.html).
-- [pathlib](https://docs.python.org/3/library/pathlib.html) - (Python standard library) A cross-platform, object-oriented path library.
-- [python-magic](https://github.com/ahupp/python-magic) - A Python interface to the libmagic file type identification library.
-- [watchdog](https://github.com/gorakhargosh/watchdog) - API and shell utilities to monitor file system events.
-- [watchfiles](https://github.com/samuelcolvin/watchfiles) - Simple, modern and fast file watching and code reload in python.
-
-## Functional Programming
-
-_Functional Programming with Python._
+# awesome-nlp
+
+A curated list of resources dedicated to Natural Language Processing
+
+
+
+Read this in [English](./README.md), [Traditional Chinese](./README-ZH-TW.md)
+
+_Please read the [contribution guidelines](contributing.md) before contributing. Please add your favourite NLP resource by raising a [pull request](https://github.com/keonkim/awesome-nlp/pulls)_
+
+## Contents
+
+* [Research Summaries and Trends](#research-summaries-and-trends)
+* [Prominent NLP Research Labs](#prominent-nlp-research-labs)
+* [Tutorials](#tutorials)
+ * [Reading Content](#reading-content)
+ * [Videos and Online Courses](#videos-and-online-courses)
+ * [Books](#books)
+* [Libraries](#libraries)
+ * [Node.js](#node-js)
+ * [Python](#python)
+ * [C++](#c++)
+ * [Java](#java)
+ * [Kotlin](#kotlin)
+ * [Scala](#scala)
+ * [R](#R)
+ * [Clojure](#clojure)
+ * [Ruby](#ruby)
+ * [Rust](#rust)
+ * [NLP++](#NLP++)
+ * [Julia](#julia)
+* [Services](#services)
+* [Annotation Tools](#annotation-tools)
+* [Datasets](#datasets)
+* [NLP in Korean](#nlp-in-korean)
+* [NLP in Arabic](#nlp-in-arabic)
+* [NLP in Chinese](#nlp-in-chinese)
+* [NLP in German](#nlp-in-german)
+* [NLP in Polish](#nlp-in-polish)
+* [NLP in Spanish](#nlp-in-spanish)
+* [NLP in Indic Languages](#nlp-in-indic-languages)
+* [NLP in Thai](#nlp-in-thai)
+* [NLP in Danish](#nlp-in-danish)
+* [NLP in Vietnamese](#nlp-in-vietnamese)
+* [NLP for Dutch](#nlp-for-dutch)
+* [NLP in Indonesian](#nlp-in-indonesian)
+* [NLP in Urdu](#nlp-in-urdu)
+* [NLP in Persian](#nlp-in-persian)
+* [NLP in Ukrainian](#nlp-in-ukrainian)
+* [NLP in Hungarian](#nlp-in-hungarian)
+* [NLP in Portuguese](#nlp-in-portuguese)
+* [Other Languages](#other-languages)
+* [Citation](#citation)
+* [Credits](#credits)
+
+## Research Summaries and Trends
+
+* [NLP-Overview](https://nlpoverview.com/) is an up-to-date overview of deep learning techniques applied to NLP, including theory, implementations, applications, and state-of-the-art results. This is a great Deep NLP Introduction for researchers.
+* [NLP-Progress](https://nlpprogress.com/) tracks the progress in Natural Language Processing, including the datasets and the current state-of-the-art for the most common NLP tasks
+* [NLP's ImageNet moment has arrived](https://thegradient.pub/nlp-imagenet/)
+* [ACL 2018 Highlights: Understanding Representation and Evaluation in More Challenging Settings](http://ruder.io/acl-2018-highlights/)
+* [Four deep learning trends from ACL 2017. Part One: Linguistic Structure and Word Embeddings](https://www.abigailsee.com/2017/08/30/four-deep-learning-trends-from-acl-2017-part-1.html)
+* [Four deep learning trends from ACL 2017. Part Two: Interpretability and Attention](https://www.abigailsee.com/2017/08/30/four-deep-learning-trends-from-acl-2017-part-2.html)
+* [Highlights of EMNLP 2017: Exciting Datasets, Return of the Clusters, and More!](http://blog.aylien.com/highlights-emnlp-2017-exciting-datasets-return-clusters/)
+* [Deep Learning for Natural Language Processing (NLP): Advancements & Trends](https://tryolabs.com/blog/2017/12/12/deep-learning-for-nlp-advancements-and-trends-in-2017/?utm_campaign=Revue%20newsletter&utm_medium=Newsletter&utm_source=The%20Wild%20Week%20in%20AI)
+* [Survey of the State of the Art in Natural Language Generation](https://arxiv.org/abs/1703.09902)
+
+## Prominent NLP Research Labs
+[Back to Top](#contents)
+
+* [The Berkeley NLP Group](http://nlp.cs.berkeley.edu/index.shtml) - Notable contributions include a tool to reconstruct long-dead languages, described [here](https://www.bbc.com/news/science-environment-21427896), which takes corpora from 637 languages currently spoken in Asia and the Pacific and recreates the ancestral language they descend from.
+* [Language Technologies Institute, Carnegie Mellon University](http://www.cs.cmu.edu/~nasmith/nlp-cl.html) - Notable projects include the [Avenue Project](http://www.cs.cmu.edu/~avenue/), a syntax-driven machine translation system for endangered languages like Quechua and Aymara, and previously [Noah's Ark](http://www.cs.cmu.edu/~ark/), which created [AQMAR](http://www.cs.cmu.edu/~ark/AQMAR/) to improve NLP tools for Arabic.
+* [NLP research group, Columbia University](http://www1.cs.columbia.edu/nlp/index.cgi) - Responsible for creating BOLT (interactive error handling for speech translation systems) and an unnamed project to characterize laughter in dialogue.
+* [The Center for Language and Speech Processing, Johns Hopkins University](http://clsp.jhu.edu/) - Recently in the news for developing speech recognition software to create a diagnostic test for Parkinson's disease, described [here](https://www.clsp.jhu.edu/2019/03/27/speech-recognition-software-and-machine-learning-tools-are-being-used-to-create-diagnostic-test-for-parkinsons-disease/#.XNFqrIkzYdU).
+* [Computational Linguistics and Information Processing Group, University of Maryland](https://wiki.umiacs.umd.edu/clip/index.php/Main_Page) - Notable contributions include [Human-Computer Cooperation for Word-by-Word Question Answering](http://www.umiacs.umd.edu/~jbg/projects/IIS-1652666) and modeling the development of phonetic representations.
+* [Penn Natural Language Processing, University of Pennsylvania](https://nlp.cis.upenn.edu/) - Famous for creating the [Penn Treebank](https://www.seas.upenn.edu/~pdtb/).
+* [The Stanford Natural Language Processing Group](https://nlp.stanford.edu/) - One of the top NLP research labs in the world, notable for creating [Stanford CoreNLP](https://nlp.stanford.edu/software/corenlp.shtml) and their [coreference resolution system](https://nlp.stanford.edu/software/dcoref.shtml).
+
+
+## Tutorials
+[Back to Top](#contents)
+
+### Reading Content
+
+General Machine Learning
+
+* [Machine Learning 101](https://docs.google.com/presentation/d/1kSuQyW5DTnkVaZEjGYCkfOxvzCqGEFzWBy4e9Uedd9k/edit?usp=sharing) from Google's Senior Creative Engineer explains Machine Learning for engineers and executives alike
+* [AI Playbook](https://aiplaybook.a16z.com/) - a16z AI playbook is a great link to forward to your managers or content for your presentations
+* [Ruder's Blog](http://ruder.io/#open) by [Sebastian Ruder](https://twitter.com/seb_ruder) for commentary on the best of NLP Research
+* [How To Label Data](https://www.lighttag.io/how-to-label-data/) guide to managing larger linguistic annotation projects
+* [Depends on the Definition](https://www.depends-on-the-definition.com/) collection of blog posts covering a wide array of NLP topics with detailed implementation
+
+Introductions and Guides to NLP
+
+* [Understand & Implement Natural Language Processing](https://www.analyticsvidhya.com/blog/2017/01/ultimate-guide-to-understand-implement-natural-language-processing-codes-in-python/)
+* [NLP in Python](http://github.com/NirantK/nlp-python-deep-learning) - Collection of GitHub notebooks
+* [Natural Language Processing: An Introduction](https://academic.oup.com/jamia/article/18/5/544/829676) - Oxford
+* [Deep Learning for NLP with Pytorch](https://pytorch.org/tutorials/beginner/deep_learning_nlp_tutorial.html)
+* [Hands-On NLTK Tutorial](https://github.com/hb20007/hands-on-nltk-tutorial) - NLTK Tutorials, Jupyter notebooks
+* [Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit](https://www.nltk.org/book/) - An online and print book introducing NLP concepts using NLTK. The book's authors also wrote the NLTK library.
+* [Train a new language model from scratch](https://huggingface.co/blog/how-to-train) - Hugging Face 🤗
+* [The Super Duper NLP Repo (SDNLPR)](https://notebooks.quantumstat.com/): Collection of Colab notebooks covering a wide array of NLP task implementations.
+* [Advanced NLP with spaCy](https://course.spacy.io/en/) - Free online course covering text processing, large-scale data analysis, processing pipelines, and training neural network models for custom NLP tasks.
+* [Kaggle NLP Learning Guide](https://www.kaggle.com/learn-guide/natural-language-processing) - Beginner-friendly tutorials including getting started guides, deep learning for NLP, and visual explanations of techniques like BERT, GloVe, and TF-IDF.
+
+Blogs and Newsletters
+
+* [Deep Learning, NLP, and Representations](https://colah.github.io/posts/2014-07-NLP-RNNs-Representations/)
+* [The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)](https://jalammar.github.io/illustrated-bert/) and [The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/)
+* [Natural Language Processing](https://nlpers.blogspot.com/) by Hal Daumé III
+* [arXiv: Natural Language Processing (Almost) from Scratch](https://arxiv.org/pdf/1103.0398.pdf)
+* [Karpathy's The Unreasonable Effectiveness of Recurrent Neural Networks](https://karpathy.github.io/2015/05/21/rnn-effectiveness)
+* [Machine Learning Mastery: Deep Learning for Natural Language Processing](https://machinelearningmastery.com/category/natural-language-processing)
+* [Visual NLP Paper Summaries](https://amitness.com/categories/#nlp)
+
+### Videos and Online Courses
+[Back to Top](#contents)
+
+* [Advanced Natural Language Processing](https://people.cs.umass.edu/~miyyer/cs685_f20/) - CS 685, UMass Amherst CS
+* [Deep Natural Language Processing](https://github.com/oxford-cs-deepnlp-2017/lectures) - Lectures series from Oxford
+* [Deep Learning for Natural Language Processing (CS224n)](https://web.stanford.edu/class/cs224n/) - Richard Socher and Christopher Manning's Stanford Course
+* [Neural Networks for NLP](http://phontron.com/class/nn4nlp2017/) - Carnegie Mellon Language Technologies Institute
+* [Deep NLP Course](https://github.com/yandexdataschool/nlp_course) by Yandex Data School, covering important ideas from text embedding to machine translation including sequence modeling, language models and so on.
+* [fast.ai Code-First Intro to Natural Language Processing](https://www.fast.ai/2019/07/08/fastai-nlp/) - This covers a blend of traditional NLP topics (including regex, SVD, naive Bayes, tokenization) and recent neural network approaches (including RNNs, seq2seq, GRUs, and the Transformer), as well as addressing urgent ethical issues, such as bias and disinformation. Find the Jupyter Notebooks [here](https://github.com/fastai/course-nlp).
+* [Machine Learning University - Accelerated Natural Language Processing](https://www.youtube.com/playlist?list=PL8P_Z6C4GcuWfAq8Pt6PBYlck4OprHXsw) - Lectures go from introduction to NLP and text processing to Recurrent Neural Networks and Transformers. Material can be found [here](https://github.com/aws-samples/aws-machine-learning-university-accelerated-nlp).
+* [Applied Natural Language Processing](https://www.youtube.com/playlist?list=PLH-xYrxjfO2WyR3pOAB006CYMhNt4wTqp) - Lecture series from IIT Madras that goes from the basics all the way to autoencoders. The GitHub notebooks for this course are also available [here](https://github.com/Ramaseshanr/anlp).
+* [DeepLearning.AI Natural Language Processing Specialization](https://www.deeplearning.ai/courses/natural-language-processing-specialization/) - 4-course program covering sentiment analysis, word embeddings, RNNs, LSTMs, attention mechanisms, and Transformer models like BERT and T5 for tasks including machine translation and summarization.
+
+
+### Books
+
+* [Speech and Language Processing](https://web.stanford.edu/~jurafsky/slp3/) - free, by Prof. Dan Jurafsky
+* [Natural Language Processing](https://github.com/jacobeisenstein/gt-nlp-class) - free, NLP notes by Dr. Jacob Eisenstein at Georgia Tech
+* [NLP with PyTorch](https://github.com/joosthub/PyTorchNLPBook) - by Delip Rao and Brian McMahan
+* [Text Mining in R](https://www.tidytextmining.com)
+* [Natural Language Processing with Python](https://www.nltk.org/book/)
+* [Practical Natural Language Processing](https://www.oreilly.com/library/view/practical-natural-language/9781492054047/)
+* [Natural Language Processing with Spark NLP](https://www.oreilly.com/library/view/natural-language-processing/9781492047759/)
+* [Deep Learning for Natural Language Processing](https://www.manning.com/books/deep-learning-for-natural-language-processing) by Stephan Raaijmakers
+* [Real-World Natural Language Processing](https://www.manning.com/books/real-world-natural-language-processing) - by Masato Hagiwara
+* [Natural Language Processing in Action, Second Edition](https://www.manning.com/books/natural-language-processing-in-action-second-edition) - by Hobson Lane and Maria Dyshel
+* [Transformers in Action](https://www.manning.com/books/transformers-in-action) - by Nicole Koenigstein
+* [The Math Behind Artificial Intelligence](https://www.freecodecamp.org/news/the-math-behind-artificial-intelligence-book) - by Tiago Monteiro | A free FreeCodeCamp book teaching the math behind AI in plain English from an engineering point of view. It covers linear algebra, calculus, probability & statistics, and optimization theory with analogies, real-life applications, and Python code examples.
+
+## Libraries
+
+[Back to Top](#contents)
+
+* **Node.js and JavaScript** - Node.js Libraries for NLP | [Back to Top](#contents)
+ * [Twitter-text](https://github.com/twitter/twitter-text) - A JavaScript implementation of Twitter's text processing library
+ * [Knwl.js](https://github.com/benhmoore/Knwl.js) - A Natural Language Processor in JS
+ * [Retext](https://github.com/retextjs/retext) - Extensible system for analyzing and manipulating natural language
+ * [NLP Compromise](https://github.com/spencermountain/compromise) - Natural Language processing in the browser
+ * [Natural](https://github.com/NaturalNode/natural) - general natural language facilities for node
+ * [Poplar](https://github.com/synyi/poplar) - A web-based annotation tool for natural language processing (NLP)
+ * [NLP.js](https://github.com/axa-group/nlp.js) - An NLP library for building bots
+ * [node-question-answering](https://github.com/huggingface/node-question-answering) - Fast and production-ready question answering w/ DistilBERT in Node.js
+
+* **Python** - Python NLP Libraries | [Back to Top](#contents)
+ - [sentimental-onix](https://github.com/sloev/sentimental-onix) - Sentiment models for spaCy using ONNX
+ - [TextAttack](https://github.com/QData/TextAttack) - Adversarial attacks, adversarial training, and data augmentation in NLP
+ - [TextBlob](http://textblob.readthedocs.org/) - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of [Natural Language Toolkit (NLTK)](https://www.nltk.org/) and [Pattern](https://github.com/clips/pattern), and plays nicely with both :+1:
+ - [spaCy](https://github.com/explosion/spaCy) - Industrial strength NLP with Python and Cython :+1:
+ - [StackedCV](https://github.com/mvonombogho/stackedcv) - Resume parsing API that extracts structured JSON from PDF/DOCX files using Claude Haiku.
+ - [Speedster](https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster) - Automatically apply SOTA optimization techniques to achieve the maximum inference speed-up on your hardware
+ - [textacy](https://github.com/chartbeat-labs/textacy) - Higher level NLP built on spaCy
+ - [gensim](https://radimrehurek.com/gensim/index.html) - Python library to conduct unsupervised semantic modelling from plain text :+1:
+ - [scattertext](https://github.com/JasonKessler/scattertext) - Python library to produce d3 visualizations of how language differs between corpora
+ - [GluonNLP](https://github.com/dmlc/gluon-nlp) - A deep learning toolkit for NLP, built on MXNet/Gluon, for research prototyping and industrial deployment of state-of-the-art models on a wide range of NLP tasks.
+ - [AllenNLP](https://github.com/allenai/allennlp) - An NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
+ - [PyTorch-NLP](https://github.com/PetrochukM/PyTorch-NLP) - NLP research toolkit designed to support rapid prototyping with better data loaders, word vector loaders, neural network layer representations, common NLP metrics such as BLEU
+ - [Rosetta](https://github.com/columbia-applied-data-science/rosetta) - Text processing tools and wrappers (e.g. Vowpal Wabbit)
+ - [PyNLPl](https://github.com/proycon/pynlpl) - Python Natural Language Processing Library. General purpose NLP library for Python, handles some specific formats like ARPA language models, Moses phrasetables, GIZA++ alignments.
+ - [foliapy](https://github.com/proycon/foliapy) - Python library for working with [FoLiA](https://proycon.github.io/folia/), an XML format for linguistic annotation.
+ - [PySS3](https://github.com/sergioburdisso/pyss3) - Python package that implements a novel white-box machine learning model for text classification, called SS3. Since SS3 has the ability to visually explain its rationale, this package also comes with easy-to-use interactive visualizations tools ([online demos](http://tworld.io/ss3/)).
+ - [jPTDP](https://github.com/datquocnguyen/jPTDP) - A toolkit for joint part-of-speech (POS) tagging and dependency parsing. jPTDP provides pre-trained models for 40+ languages.
+ - [BigARTM](https://github.com/bigartm/bigartm) - a fast library for topic modelling
+ - [Snips NLU](https://github.com/snipsco/snips-nlu) - A production ready library for intent parsing
+ - [Chazutsu](https://github.com/chakki-works/chazutsu) - A library for downloading and parsing standard NLP research datasets
+ - [Word Forms](https://github.com/gutfeeling/word_forms) - Word forms can accurately generate all possible forms of an English word
+ - [Multilingual Latent Dirichlet Allocation (LDA)](https://github.com/ArtificiAI/Multilingual-Latent-Dirichlet-Allocation-LDA) - A multilingual and extensible document clustering pipeline
+ - [Natural Language Toolkit (NLTK)](https://www.nltk.org/) - A library containing a wide variety of NLP functionality, supporting over 50 corpora.
+ - [NLP Architect](https://github.com/NervanaSystems/nlp-architect) - A library for exploring the state-of-the-art deep learning topologies and techniques for NLP and NLU
+ - [Flair](https://github.com/zalandoresearch/flair) - A very simple framework for state-of-the-art multilingual NLP built on PyTorch. Includes BERT, ELMo and Flair embeddings.
+ - [Kashgari](https://github.com/BrikerMan/Kashgari) - Simple, Keras-powered multilingual NLP framework, allows you to build your models in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS) and text classification tasks. Includes BERT and word2vec embedding.
+ - [FARM](https://github.com/deepset-ai/FARM) - Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
+ - [Haystack](https://github.com/deepset-ai/haystack) - End-to-end Python framework for building natural language search interfaces to data. Leverages Transformers and the State-of-the-Art of NLP. Supports DPR, Elasticsearch, HuggingFace's Modelhub, and much more!
+ - [PraisonAI](https://github.com/MervinPraison/PraisonAI) - Multi-AI Agents framework with 100+ LLM support via LiteLLM, MCP integration, agentic workflows, and built-in memory for NLP tasks.
+ - [Rita DSL](https://github.com/zaibacu/rita-dsl) - A DSL loosely based on [RUTA on Apache UIMA](https://uima.apache.org/ruta.html). Lets you define language patterns (rule-based NLP), which are then translated into [spaCy](https://spacy.io/) patterns or, if you prefer a lighter-weight option with fewer features, into regex patterns.
+ - [Transformers](https://github.com/huggingface/transformers) - Natural Language Processing for TensorFlow 2.0 and PyTorch.
+ - [Tokenizers](https://github.com/huggingface/tokenizers) - Tokenizers optimized for Research and Production.
+ - [fairSeq](https://github.com/pytorch/fairseq) - Facebook AI Research implementations of SOTA seq2seq models in PyTorch.
+ - [corex_topic](https://github.com/gregversteeg/corex_topic) - Hierarchical Topic Modeling with Minimal Domain Knowledge
+ - [Sockeye](https://github.com/awslabs/sockeye) - Neural Machine Translation (NMT) toolkit that powers Amazon Translate.
+ - [DL Translate](https://github.com/xhlulu/dl-translate) - A deep learning-based translation library for 50 languages, built on `transformers` and Facebook's mBART Large.
+ - [Jury](https://github.com/obss/jury) - Evaluation of NLP model outputs offering various automated metrics.
+ - [python-ucto](https://github.com/proycon/python-ucto) - Unicode-aware regular-expression based tokenizer for various languages. Python binding to C++ library, supports [FoLiA format](https://proycon.github.io/folia).
+ - [Pearmut](https://github.com/zouharvi/pearmut) - Human annotation tool for multilingual NLP tasks, such as machine translation.
+
+- **C++** - C++ Libraries | [Back to Top](#contents)
+ - [InsNet](https://github.com/chncwang/InsNet) - A neural network library for building instance-dependent NLP models with padding-free dynamic batching.
+ - [MIT Information Extraction Toolkit](https://github.com/mit-nlp/MITIE) - C, C++, and Python tools for named entity recognition and relation extraction
+ - [CRF++](https://taku910.github.io/crfpp/) - Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data & other Natural Language Processing tasks.
+ - [CRFsuite](http://www.chokkan.org/software/crfsuite/) - CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data.
+ - [BLLIP Parser](https://github.com/BLLIP/bllip-parser) - BLLIP Natural Language Parser (also known as the Charniak-Johnson parser)
+ - [colibri-core](https://github.com/proycon/colibri-core) - C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
+ - [ucto](https://github.com/LanguageMachines/ucto) - Unicode-aware regular-expression based tokenizer for various languages. Tool and C++ library. Supports FoLiA format.
+ - [libfolia](https://github.com/LanguageMachines/libfolia) - C++ library for the [FoLiA format](https://proycon.github.io/folia/)
+ - [frog](https://github.com/LanguageMachines/frog) - Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyzer.
+ - [MeTA](https://github.com/meta-toolkit/meta) - [MeTA : ModErn Text Analysis](https://meta-toolkit.org/) is a C++ Data Sciences Toolkit that facilitates mining big text data.
+ - [Mecab (Japanese)](https://taku910.github.io/mecab/)
+ - [Moses](http://statmt.org/moses/)
+ - [StarSpace](https://github.com/facebookresearch/StarSpace) - a library from Facebook for creating embeddings of word-level, paragraph-level, document-level and for text classification
+ - [QSMM](http://qsmm.org) - adaptive probabilistic top-down and bottom-up parsers
+
+- **Java** - Java NLP Libraries | [Back to Top](#contents)
+ - [Stanford NLP](https://nlp.stanford.edu/software/index.shtml)
+ - [OpenNLP](https://opennlp.apache.org/)
+ - [NLP4J](https://emorynlp.github.io/nlp4j/)
+ - [Word2vec in Java](https://deeplearning4j.org/docs/latest/deeplearning4j-nlp-word2vec)
+ - [ReVerb](https://github.com/knowitall/reverb/) - Web-Scale Open Information Extraction
+ - [OpenRegex](https://github.com/knowitall/openregex) - An efficient and flexible token-based regular expression language and engine.
+ - [CogcompNLP](https://github.com/CogComp/cogcomp-nlp) - Core libraries developed in the U of Illinois' Cognitive Computation Group.
+ - [MALLET](http://mallet.cs.umass.edu/) - MAchine Learning for LanguagE Toolkit - package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
+ - [RDRPOSTagger](https://github.com/datquocnguyen/RDRPOSTagger) - A robust POS tagging toolkit available (in both Java & Python) together with pre-trained models for 40+ languages.
+
+- **Kotlin** - Kotlin NLP Libraries | [Back to Top](#contents)
+ - [Lingua](https://github.com/pemistahl/lingua/) - A language detection library for Kotlin and Java, suitable for long and short text alike
+ - [Kotidgy](https://github.com/meiblorn/kotidgy) - An index-based text data generator written in Kotlin
+
+- **Scala** - Scala NLP Libraries | [Back to Top](#contents)
+ - [Saul](https://github.com/CogComp/saul) - Library for developing NLP systems, including built in modules like SRL, POS, etc.
+ - [ATR4S](https://github.com/ispras/atr4s) - Toolkit with state-of-the-art [automatic term recognition](https://en.wikipedia.org/wiki/Terminology_extraction) methods.
+ - [tm](https://github.com/ispras/tm) - Implementation of topic modeling based on regularized multilingual [PLSA](https://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis).
+ - [word2vec-scala](https://github.com/Refefer/word2vec-scala) - Scala interface to word2vec model; includes operations on vectors like word-distance and word-analogy.
+ - [Epic](https://github.com/dlwh/epic) - Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.
+ - [Spark NLP](https://github.com/JohnSnowLabs/spark-nlp) - Spark NLP is a natural language processing library built on top of Apache Spark ML that provides simple, performant & accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment.
+
+- **R** - R NLP Libraries | [Back to Top](#contents)
+ - [text2vec](https://github.com/dselivanov/text2vec) - Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
+ - [wordVectors](https://github.com/bmschmidt/wordVectors) - An R package for creating and exploring word2vec and other word embedding models
+ - [RMallet](https://github.com/mimno/RMallet) - R package to interface with the Java machine learning tool MALLET
+ - [dfr-browser](https://github.com/agoldst/dfr-browser) - Creates d3 visualizations for browsing topic models of text in a web browser.
+ - [dfrtopics](https://github.com/agoldst/dfrtopics) - R package for exploring topic models of text.
+ - [sentiment_classifier](https://github.com/kevincobain2000/sentiment_classifier) - Sentiment Classification using Word Sense Disambiguation and WordNet Reader
+ - [jProcessing](https://github.com/kevincobain2000/jProcessing) - Japanese Natural Language Processing Libraries, with Japanese sentiment classification
+ - [corporaexplorer](https://kgjerde.github.io/corporaexplorer/) - An R package for dynamic exploration of text collections
+ - [tidytext](https://github.com/juliasilge/tidytext) - Text mining using tidy tools
+ - [spacyr](https://github.com/quanteda/spacyr) - R wrapper to spaCy NLP
+ - [CRAN Task View: Natural Language Processing](https://github.com/cran-task-views/NaturalLanguageProcessing/)
+
+- **Clojure** | [Back to Top](#contents)
+ - [Clojure-openNLP](https://github.com/dakrone/clojure-opennlp) - Natural Language Processing in Clojure (opennlp)
+ - [Inflections-clj](https://github.com/r0man/inflections-clj) - Rails-like inflection library for Clojure and ClojureScript
+ - [postagga](https://github.com/fekr/postagga) - A library to parse natural language in Clojure and ClojureScript
+
+- **Ruby** | [Back to Top](#contents)
+ - Kevin Dias's [A collection of Natural Language Processing (NLP) Ruby libraries, tools and software](https://github.com/diasks2/ruby-nlp)
+ - [Practical Natural Language Processing done in Ruby](https://github.com/arbox/nlp-with-ruby)
+
+- **Rust** | [Back to Top](#contents)
+ - [adk-rust](https://github.com/zavora-ai/adk-rust) - Production-ready AI agent development kit with model-agnostic design (Gemini, OpenAI, Anthropic), multiple agent types, and MCP support
+ - [whatlang](https://github.com/greyblake/whatlang-rs) - Natural language recognition library based on trigrams
+ - [snips-nlu-rs](https://github.com/snipsco/snips-nlu-rs) - A production ready library for intent parsing
+ - [rust-bert](https://github.com/guillaume-be/rust-bert) - Ready-to-use NLP pipelines and Transformer-based models
+
+- **NLP++** - NLP++ Language | [Back to Top](#contents)
+ - [VSCode Language Extension](https://marketplace.visualstudio.com/items?itemName=dehilster.nlp) - NLP++ Language Extension for VSCode
+ - [nlp-engine](https://github.com/VisualText/nlp-engine) - NLP++ engine to run NLP++ code on Linux including a full English parser
+ - [VisualText](http://visualtext.org) - Homepage for the NLP++ Language
+ - [NLP++ Wiki](http://wiki.naturalphilosophy.org/index.php?title=NLP%2B%2B) - Wiki entry for the NLP++ language
+
+- **Julia** | [Back to Top](#contents)
+ - [CorpusLoaders](https://github.com/JuliaText/CorpusLoaders.jl) - A variety of loaders for various NLP corpora
+ - [Languages](https://github.com/JuliaText/Languages.jl) - A package for working with human languages
+ - [TextAnalysis](https://github.com/JuliaText/TextAnalysis.jl) - Julia package for text analysis
+ - [TextModels](https://github.com/JuliaText/TextModels.jl) - Neural Network based models for Natural Language Processing
+ - [WordTokenizers](https://github.com/JuliaText/WordTokenizers.jl) - High performance tokenizers for natural language processing and other related tasks
+ - [Word2Vec](https://github.com/JuliaText/Word2Vec.jl) - Julia interface to word2vec
+
+### Services
+
+NLP as an API with higher-level functionality such as NER, topic tagging and so on | [Back to Top](#contents)
+
+- [Vedika API](https://vedika.io) - AI-powered Vedic astrology API with multi-agent swarm intelligence
+- [Wit-ai](https://github.com/wit-ai/wit) - Natural Language Interface for apps and devices
+- [IBM Watson's Natural Language Understanding](https://github.com/watson-developer-cloud/natural-language-understanding-nodejs) - API and GitHub demo
+- [Amazon Comprehend](https://aws.amazon.com/comprehend/) - NLP and ML suite that covers most common tasks like NER, tagging, and sentiment analysis
+- [Google Cloud Natural Language API](https://cloud.google.com/natural-language/) - Syntax Analysis, NER, Sentiment Analysis, and Content tagging in at least 9 languages, including English and Chinese (Simplified and Traditional).
+- [ParallelDots](https://www.paralleldots.com/text-analysis-apis) - High level Text Analysis API Service ranging from Sentiment Analysis to Intent Analysis
+- [Microsoft Cognitive Service](https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/)
+- [TextRazor](https://www.textrazor.com/)
+- [Rosette](https://www.rosette.com/)
+- [StackedCV](https://github.com/mvonombogho/stackedcv) - LLM-powered resume parser API that extracts structured information (experience, education, skills) from PDF/DOCX using Claude Haiku.
+- [Textalytic](https://www.textalytic.com) - Natural Language Processing in the Browser with sentiment analysis, named entity extraction, POS tagging, word frequencies, topic modeling, word clouds, and more
+- [NLP Cloud](https://nlpcloud.io) - SpaCy NLP models (custom and pre-trained ones) served through a RESTful API for named entity recognition (NER), POS tagging, and more.
+- [Cloudmersive](https://cloudmersive.com/nlp-api) - Unified and free NLP APIs that perform actions such as speech tagging, text rephrasing, language translation/detection, and sentence parsing
+
+### Annotation Tools
+
+- [GATE](https://gate.ac.uk/overview.html) - General Architecture for Text Engineering, 15+ years old, free and open source
+- [Anafora](https://github.com/weitechen/anafora) - A free and open source, web-based raw text annotation tool
+- [brat](https://brat.nlplab.org/) - brat rapid annotation tool is an online environment for collaborative text annotation
+- [doccano](https://github.com/chakki-works/doccano) - doccano is free, open-source, and provides annotation features for text classification, sequence labeling and sequence to sequence
+- [INCEpTION](https://inception-project.github.io) - A semantic annotation platform offering intelligent assistance and knowledge management
+- [tagtog](https://www.tagtog.net/) - Team-first web tool to find, create, maintain, and share datasets, costs $
+- [prodigy](https://prodi.gy/) - An annotation tool powered by active learning, costs $
+- [LightTag](https://lighttag.io) - Hosted and managed text annotation tool for teams, costs $
+- [rstWeb](https://corpling.uis.georgetown.edu/rstweb/info/) - open source local or online tool for discourse tree annotations
+- [GitDox](https://corpling.uis.georgetown.edu/gitdox/) - open source server annotation tool with GitHub version control and validation for XML data and collaborative spreadsheet grids
+- [Label Studio](https://www.heartex.ai/) - Hosted and managed text annotation tool for teams, freemium based, costs $
+- [Datasaur](https://datasaur.ai/) - Supports various NLP tasks for individuals or teams, freemium based
+- [Konfuzio](https://konfuzio.com/en/) - team-first hosted and on-prem text, image and PDF annotation tool powered by active learning, freemium based, costs $
+- [UBIAI](https://ubiai.tools/) - Easy-to-use text annotation tool for teams with most comprehensive auto-annotation features. Supports NER, relations and document classification as well as OCR annotation for invoice labeling, costs $
+- [Shoonya](https://github.com/AI4Bharat/Shoonya-Backend) - Shoonya is a free and open source data annotation platform with a wide variety of organization- and workspace-level management features. Shoonya is data agnostic and can be used by teams to annotate data at scale with multiple levels of verification.
+- [Annotation Lab](https://www.johnsnowlabs.com/annotation-lab/) - Free End-to-End No-Code platform for text annotation and DL model training/tuning. Out-of-the-box support for Named Entity Recognition, Classification, Relation extraction and Assertion Status Spark NLP models. Unlimited support for users, teams, projects, documents. Not FOSS.
+- [FLAT](https://github.com/proycon/flat) - FLAT is a web-based linguistic annotation environment based around the [FoLiA format](http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Free and open source.
+
+
+## Techniques
+
+### Text Embeddings
+
+#### Word Embeddings
-- [coconut](https://github.com/evhub/coconut) - A variant of Python built for simple, elegant, Pythonic functional programming.
-- [cytoolz](https://github.com/pytoolz/cytoolz/) - Cython implementation of `Toolz`: High performance functional utilities.
-- [functools](https://docs.python.org/3/library/functools.html) - (Python standard library) Higher-order functions and operations on callable objects.
-- [funcy](https://github.com/Suor/funcy) - A fancy and practical functional tools.
-- [more-itertools](https://github.com/erikrose/more-itertools) - More routines for operating on iterables, beyond `itertools`.
-- [returns](https://github.com/dry-python/returns) - A set of type-safe monads, transformers, and composition utilities.
-- [toolz](https://github.com/pytoolz/toolz) - A collection of functional utilities for iterators, functions, and dictionaries.
-
-## GUI Development
-
-_Libraries for working with graphical user interface applications._
-
-- [curses](https://docs.python.org/3/library/curses.html) - Built-in wrapper for [ncurses](http://www.gnu.org/software/ncurses/) used to create terminal GUI applications.
-- [customtkinter](https://github.com/tomschimansky/customtkinter) - A modern and customizable python UI-library based on Tkinter.
-- [dearpygui](https://github.com/RaylockLLC/DearPyGui/) - A Simple GPU accelerated Python GUI framework
-- [enaml](https://github.com/nucleic/enaml) - Creating beautiful user-interfaces with Declarative Syntax like QML.
-- [flet](https://github.com/flet-dev/flet) - Cross-platform GUI framework for building modern apps in pure Python.
-- [flexx](https://github.com/zoofIO/flexx) - Flexx is a pure Python toolkit for creating GUI's, that uses web technology for its rendering.
-- [gooey](https://github.com/chriskiehl/Gooey) - Turn command line programs into a full GUI application with one line.
-- [kivy](https://github.com/kivy/kivy) - A library for creating NUI applications, running on Windows, Linux, Mac OS X, Android and iOS.
-- [nicegui](https://github.com/zauberzeug/nicegui) - An easy-to-use, Python-based UI framework, which shows up in your web browser.
-- [pyglet](https://github.com/pyglet/pyglet) - A cross-platform windowing and multimedia library for Python.
-- [pygobject](https://github.com/GNOME/pygobject) - Python Bindings for GLib/GObject/GIO/GTK+ (GTK+3).
-- [PyQt](https://www.riverbankcomputing.com/static/Docs/PyQt6/) - Python bindings for the [Qt](https://www.qt.io/) cross-platform application and UI framework.
-- [pyside](https://github.com/pyside/pyside-setup) - Qt for Python offers the official Python bindings for [Qt](https://www.qt.io/), this is same as PyQt but it's the official binding with different licensing.
-- [pywebview](https://github.com/r0x0r/pywebview/) - A lightweight cross-platform native wrapper around a webview component.
-- [tkinter](https://docs.python.org/3/library/tkinter.html) - (Python standard library) The standard Python interface to the Tcl/Tk GUI toolkit.
-- [toga](https://github.com/pybee/toga) - A Python native, OS native GUI toolkit.
-- [urwid](https://github.com/urwid/urwid) - A library for creating terminal GUI applications with strong support for widgets, events, rich colors, etc.
-- [wxPython](https://github.com/wxWidgets/Phoenix) - A blending of the wxWidgets C++ class library with the Python.
-
-## GraphQL
-
-_Libraries for working with GraphQL._
-
-- [graphene](https://github.com/graphql-python/graphene/) - GraphQL framework for Python.
-- [strawberry-django](https://github.com/strawberry-graphql/strawberry-django) - Strawberry GraphQL integration with Django.
-
-## Game Development
-
-_Awesome game development libraries._
-
-- [arcade](https://github.com/pythonarcade/arcade) - Arcade is a modern Python framework for crafting games with compelling graphics and sound.
-- [cocos](https://github.com/los-cocos/cocos) - A framework for building 2D games, demos, and other graphical/interactive applications.
-- [harfang3d](https://github.com/harfang3d/harfang3d) - Python framework for 3D, VR and game development.
-- [panda3d](https://github.com/panda3d/panda3d) - 3D game engine developed by Disney.
-- [pygame](https://github.com/pygame/pygame) - Pygame is a set of Python modules designed for writing games.
-- [pyopengl](https://github.com/mcfletch/pyopengl) - Python ctypes bindings for OpenGL and it's related APIs.
-- [py-sdl2](https://github.com/py-sdl/py-sdl2) - A ctypes based wrapper for the SDL2 library.
-- [renpy](https://github.com/renpy/renpy) - A Visual Novel engine.
-
-## Geolocation
-
-_Libraries for geocoding addresses and working with latitudes and longitudes._
-
-- [django-countries](https://github.com/SmileyChris/django-countries) - A Django app that provides a country field for models and forms.
-- [geodjango](https://docs.djangoproject.com/en/dev/ref/contrib/gis/) - A world-class geographic web framework.
-- [geojson](https://github.com/jazzband/geojson) - Python bindings and utilities for GeoJSON.
-- [geopandas](https://github.com/geopandas/geopandas) - Python tools for geographic data (GeoSeries/GeoDataFrame) built on pandas.
-- [geopy](https://github.com/geopy/geopy) - Python Geocoding Toolbox.
+- Rule of thumb: **fastText >> GloVe > word2vec**
-## HTML Manipulation
+- [word2vec](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) - [implementation](https://code.google.com/archive/p/word2vec/) - [explainer blog](http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/)
+- [glove](https://nlp.stanford.edu/pubs/glove.pdf) - [explainer blog](https://blog.acolyer.org/2016/04/22/glove-global-vectors-for-word-representation/)
+- fasttext - [implementation](https://github.com/facebookresearch/fastText) - [paper](https://arxiv.org/abs/1607.04606) - [explainer blog](https://towardsdatascience.com/fasttext-under-the-hood-11efc57b2b3)
-_Libraries for working with HTML and XML._
+#### Sentence and Language Model Based Word Embeddings
-- [beautifulsoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) - Providing Pythonic idioms for iterating, searching, and modifying HTML or XML.
-- [cssutils](https://github.com/jaraco/cssutils) - A CSS library for Python.
-- [html5lib](https://github.com/html5lib/html5lib-python) - A standards-compliant library for parsing and serializing HTML documents and fragments.
-- [justhtml](https://github.com/EmilStenstrom/justhtml/) - A pure Python HTML5 parser that just works.
-- [lxml](https://github.com/lxml/lxml) - A very fast, easy-to-use and versatile library for handling HTML and XML.
-- [markupsafe](https://github.com/pallets/markupsafe) - Implements a XML/HTML/XHTML Markup safe string for Python.
-- [pyquery](https://github.com/gawel/pyquery) - A jQuery-like library for parsing HTML.
-- [untangle](https://github.com/stchris/untangle) - Converts XML documents to Python objects for easy access.
-- [weasyprint](https://github.com/Kozea/WeasyPrint) - A visual rendering engine for HTML and CSS that can export to PDF.
-- [xmldataset](https://github.com/spurin/xmldataset) - Simple XML Parsing.
-- [xmltodict](https://github.com/martinblech/xmltodict) - Working with XML feel like you are working with JSON.
+[Back to Top](#contents)
-## HTTP Clients
+- ELMo - [Deep Contextualized Word Representations](https://arxiv.org/abs/1802.05365) - [PyTorch implementation](https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md) - [TF Implementation](https://github.com/allenai/bilm-tf)
+- ULMFiT - [Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/abs/1801.06146) by Jeremy Howard and Sebastian Ruder
+- InferSent - [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://arxiv.org/abs/1705.02364) by Facebook
+- CoVe - [Learned in Translation: Contextualized Word Vectors](https://arxiv.org/abs/1708.00107)
+- Paragraph vectors - from [Distributed Representations of Sentences and Documents](https://cs.stanford.edu/~quocle/paragraph_vector.pdf). See [doc2vec tutorial at gensim](https://rare-technologies.com/doc2vec-tutorial/)
+- [sense2vec](https://arxiv.org/abs/1511.06388) - on word sense disambiguation
+- [Skip Thought Vectors](https://arxiv.org/abs/1506.06726) - word representation method
+- [Adaptive skip-gram](https://arxiv.org/abs/1502.07257) - similar approach, with adaptive properties
+- [Sequence to Sequence Learning](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf) - word vectors for machine translation
-_Libraries for working with HTTP._
+### Question Answering and Knowledge Extraction
-- [httpx](https://github.com/encode/httpx) - A next generation HTTP client for Python.
-- [requests](https://github.com/psf/requests) - HTTP Requests for Humans.
-- [treq](https://github.com/twisted/treq) - Python requests like API built on top of Twisted's HTTP client.
-- [urllib3](https://github.com/urllib3/urllib3) - A HTTP library with thread-safe connection pooling, file post support, sanity friendly.
+[Back to Top](#contents)
-## Hardware
-
-_Libraries for programming with hardware._
-
-- [bleak](https://github.com/hbldh/bleak) - A cross platform Bluetooth Low Energy Client for Python using asyncio.
-- [mouse](https://github.com/boppreh/mouse) - Hook and simulate global mouse events on Windows and Linux.
-- [pynput](https://github.com/moses-palmer/pynput) - A library to control and monitor input devices.
-- [scapy](https://github.com/secdev/scapy) - A brilliant packet manipulation library.
+- [DrQA](https://github.com/facebookresearch/DrQA) - Open Domain Question Answering work by Facebook Research on Wikipedia data
+- [Document-QA](https://github.com/allenai/document-qa) - Simple and Effective Multi-Paragraph Reading Comprehension by AllenAI
+- [Template-Based Information Extraction without the Templates](https://www.usna.edu/Users/cs/nchamber/pubs/acl2011-chambers-templates.pdf)
+- [Privee: An Architecture for Automatically Analyzing Web Privacy Policies](https://www.sebastianzimmeck.de/zimmeckAndBellovin2014Privee.pdf)
-## Image Processing
+## Datasets
-_Libraries for manipulating images._
+[Back to Top](#contents)
-- [pillow](https://github.com/python-pillow/Pillow) - Pillow is the friendly [PIL](http://www.pythonware.com/products/pil/) fork.
-- [pymatting](https://github.com/pymatting/pymatting) - A library for alpha matting.
-- [python-barcode](https://github.com/WhyNotHugo/python-barcode) - Create barcodes in Python with no extra dependencies.
-- [python-qrcode](https://github.com/lincolnloop/python-qrcode) - A pure Python QR Code generator.
-- [pyvips](https://github.com/libvips/pyvips) - A fast image processing library with low memory needs.
-- [quads](https://github.com/fogleman/Quads) - Computer art based on quadtrees.
-- [scikit-image](https://github.com/scikit-image/scikit-image) - A Python library for (scientific) image processing.
-- [thumbor](https://github.com/thumbor/thumbor) - A smart imaging service. It enables on-demand crop, re-sizing and flipping of images.
-- [wand](https://github.com/emcconville/wand) - Python bindings for [MagickWand](http://www.imagemagick.org/script/magick-wand.php), C API for ImageMagick.
+- [nlp-datasets](https://github.com/niderhoff/nlp-datasets) - Great collection of NLP datasets
+- [gensim-data](https://github.com/RaRe-Technologies/gensim-data) - Data repository for pretrained NLP models and NLP corpora.
+- [tiny_qa_benchmark_pp](https://github.com/vincentkoc/tiny_qa_benchmark_pp/) - Repository of tiny NLP multi-lingual QA datasets and library to generate your own synthetic copies.
-## Implementations
-
-_Implementations of Python._
-
-- [clpython](https://github.com/metawilm/cl-python) - Implementation of the Python programming language written in Common Lisp.
-- [cpython](https://github.com/python/cpython) - Default, most widely used implementation of the Python programming language written in C.
-- [cython](https://github.com/cython/cython) - Optimizing Static Compiler for Python.
-- [ironpython](https://github.com/IronLanguages/ironpython3) - Implementation of the Python programming language written in C#.
-- [micropython](https://github.com/micropython/micropython) - A lean and efficient Python programming language implementation.
-- [numba](https://github.com/numba/numba) - Python JIT compiler to LLVM aimed at scientific Python.
-- [PeachPy](https://github.com/Maratyszcza/PeachPy) - x86-64 assembler embedded in Python.
-- [pypy](https://github.com/pypy/pypy) - A very fast and compliant implementation of the Python language.
-- [pyston](https://github.com/pyston/pyston/) - A Python implementation using JIT techniques.
+## Multilingual NLP Frameworks
-## Interactive Interpreter
+[Back to Top](#contents)
-_Interactive Python interpreters (REPL)._
+- [UDPipe](https://github.com/ufal/udpipe) is a trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files. Primarily written in C++, it offers a fast and reliable solution for multilingual NLP.
+- [NLP-Cube](https://github.com/adobe/NLP-Cube): Natural Language Processing pipeline for sentence splitting, tokenization, lemmatization, part-of-speech tagging and dependency parsing. A new platform written in Python with DyNet 2.0. Offers standalone (CLI/Python bindings) and server functionality (REST API).
+- [UralicNLP](https://github.com/mikahama/uralicNLP) is an NLP library aimed mainly at endangered Uralic languages such as the Sami, Mordvin, Mari and Komi languages. It also supports some non-endangered languages such as Finnish, along with non-Uralic languages such as Swedish and Arabic. UralicNLP can do morphological analysis, generation, lemmatization and disambiguation.
-- [bpython](https://github.com/bpython/bpython) - A fancy interface to the Python interpreter.
-- [jupyter](https://github.com/jupyter/notebook) - A rich toolkit to help you make the most out of using Python interactively.
- - [awesome-jupyter](https://github.com/markusschanta/awesome-jupyter)
-- [marimo](https://github.com/marimo-team/marimo) - Transform data and train models, feels like a next-gen notebook, stored as Git-friendly Python.
-- [ptpython](https://github.com/jonathanslenders/ptpython) - Advanced Python REPL built on top of the [python-prompt-toolkit](https://github.com/jonathanslenders/python-prompt-toolkit).
+## NLP in Korean
-## Internationalization
+[Back to Top](#contents)
-_Libraries for working with i18n._
+### Libraries
-- [babel](https://github.com/python-babel/babel) - An internationalization library for Python.
+- [KoNLPy](http://konlpy.org) - Python package for Korean natural language processing.
+- [Mecab (Korean)](https://eunjeon.blogspot.com/) - C++ library for Korean NLP
+- [KoalaNLP](https://koalanlp.github.io/koalanlp/) - Scala library for Korean Natural Language Processing.
+- [KoNLP](https://cran.r-project.org/package=KoNLP) - R package for Korean Natural language processing
-## Job Scheduler
-
-_Libraries for scheduling jobs._
-
-- [airflow](https://github.com/apache/airflow) - Airflow is a platform to programmatically author, schedule and monitor workflows.
-- [apscheduler](https://github.com/agronholm/apscheduler) - A light but powerful in-process task scheduler that lets you schedule functions.
-- [django-schedule](https://github.com/thauber/django-schedule) - A calendaring app for Django.
-- [doit](https://github.com/pydoit/doit) - A task runner and build tool.
-- [joblib](https://github.com/joblib/joblib) - A set of tools to provide lightweight pipelining in Python.
-- [plan](https://github.com/fengsp/plan) - Writing crontab file in Python like a charm.
-- [prefect](https://github.com/PrefectHQ/prefect) - A modern workflow orchestration framework that makes it easy to build, schedule and monitor robust data pipelines.
-- [schedule](https://github.com/dbader/schedule) - Python job scheduling for humans.
-- [SpiffWorkflow](https://github.com/knipknap/SpiffWorkflow) - A powerful workflow engine implemented in pure Python.
-- [taskflow](https://docs.openstack.org/developer/taskflow/) - A Python library that helps to make task execution easy, consistent and reliable.
-
-## Logging
-
-_Libraries for generating and working with logs._
+### Blogs and Tutorials
-- [logbook](https://github.com/getlogbook/logbook) - Logging replacement for Python.
-- [logging](https://docs.python.org/3/library/logging.html) - (Python standard library) Logging facility for Python.
-- [loguru](https://github.com/Delgan/loguru) - Library which aims to bring enjoyable logging in Python.
-- [sentry-python](https://github.com/getsentry/sentry-python) - Sentry SDK for Python.
-- [structlog](https://github.com/hynek/structlog) - Structured logging made easy.
+- [dsindex's blog](https://dsindex.github.io/)
+- [Kangwon University's NLP course in Korean](http://cs.kangwon.ac.kr/~leeck/NLP/)
-## Machine Learning
+### Datasets
-_Libraries for Machine Learning. Also see [awesome-machine-learning](https://github.com/josephmisiti/awesome-machine-learning#python)._
+- [KAIST Corpus](http://semanticweb.kaist.ac.kr/home/index.php/KAIST_Corpus) - A corpus from the Korea Advanced Institute of Science and Technology in Korean.
+- [Naver Sentiment Movie Corpus in Korean](https://github.com/e9t/nsmc/)
+- [Chosun Ilbo archive](http://srchdb1.chosun.com/pdf/i_archive/) - dataset in Korean from one of the major newspapers in South Korea, the Chosun Ilbo.
+- [Chat data](https://github.com/songys/Chatbot_data) - Chatbot data in Korean
+- [Petitions](https://github.com/akngs/petitions) - Collect expired petition data from the Blue House National Petition Site.
+- [Korean Parallel corpora](https://github.com/j-min/korean-parallel-corpora) - Neural Machine Translation (NMT) dataset for **Korean to French** & **Korean to English**
+- [KorQuAD](https://korquad.github.io/) - Korean SQuAD dataset with Wiki HTML source. Listed both v1.0 and v2.1 at the time it was added to Awesome NLP
-- [diffusers](https://github.com/huggingface/diffusers) - A library that provides pretrained diffusion models for generating and editing images, audio, and video.
-- [feature_engine](https://github.com/feature-engine/feature_engine) - sklearn compatible API with the widest toolset for feature engineering and selection.
-- [gym](https://github.com/openai/gym) - A toolkit for developing and comparing reinforcement learning algorithms.
-- [h2o](https://github.com/h2oai/h2o-3) - Open Source Fast Scalable Machine Learning Platform.
-- [metrics](https://github.com/benhamner/Metrics) - Machine learning evaluation metrics.
-- [mindsdb](https://github.com/mindsdb/mindsdb) - MindsDB is an open source AI layer for existing databases that allows you to effortlessly develop, train and deploy state-of-the-art machine learning models using standard queries.
-- [pgmpy](https://github.com/pgmpy/pgmpy) - A Python library for probabilistic graphical models and Bayesian networks.
-- [rasa](https://github.com/RasaHQ/rasa) - An open-source machine learning framework for automated text and voice-based conversations.
-- [scikit-learn](https://github.com/scikit-learn/scikit-learn) - The most popular Python library for Machine Learning with extensive documentation and community support.
-- [spark.ml](http://spark.apache.org/docs/latest/ml-guide.html) - [Apache Spark](http://spark.apache.org/)'s scalable Machine Learning library for distributed computing.
-- [transformers](https://github.com/huggingface/transformers) - A framework that lets you easily use pretrained transformer models for NLP, vision, and audio tasks.
-- [xgboost](https://github.com/dmlc/xgboost) - A scalable, portable, and distributed gradient boosting library.
+## NLP in Arabic
-## Microsoft Windows
+[Back to Top](#contents)
-_Python programming on Microsoft Windows._
+### Libraries
-- [pythonnet](https://github.com/pythonnet/pythonnet) - Python Integration with the .NET Common Language Runtime (CLR).
-- [pywin32](https://github.com/mhammond/pywin32) - Python Extensions for Windows.
-- [winpython](https://github.com/winpython/winpython) - Portable development environment for Windows 10/11.
+- [goarabic](https://github.com/01walid/goarabic) - Go package for Arabic text processing
+- [jsastem](https://github.com/ejtaal/jsastem) - JavaScript for Arabic stemming
+- [PyArabic](https://pypi.org/project/PyArabic/) - Python libraries for Arabic
+- [RFTokenizer](https://github.com/amir-zeldes/RFTokenizer) - trainable Python segmenter for Arabic, Hebrew and Coptic
-## Miscellaneous
+### Datasets
-_Useful libraries or tools that don't fit in the categories above._
+- [Multidomain Datasets](https://github.com/hadyelsahar/large-arabic-sentiment-analysis-resouces) - Largest Available Multi-Domain Resources for Arabic Sentiment Analysis
+- [LABR](https://github.com/mohamedadaly/labr) - LArge Arabic Book Reviews dataset
+- [Arabic Stopwords](https://github.com/mohataher/arabic-stop-words) - A list of Arabic stopwords from various resources
-- [blinker](https://github.com/jek/blinker) - A fast Python in-process signal/event dispatching system.
-- [boltons](https://github.com/mahmoud/boltons) - A set of pure-Python utilities.
-- [itsdangerous](https://github.com/pallets/itsdangerous) - Various helpers to pass trusted data to untrusted environments.
-- [pluginbase](https://github.com/mitsuhiko/pluginbase) - A simple but flexible plugin system for Python.
-- [tryton](https://github.com/tryton/tryton) - A general-purpose business framework.
+## NLP in Chinese
-## Natural Language Processing
+[Back to Top](#contents)
-_Libraries for working with human languages._
+### Libraries
-- General
- - [gensim](https://github.com/RaRe-Technologies/gensim) - Topic Modeling for Humans.
- - [langid.py](https://github.com/saffsd/langid.py) - Stand-alone language identification system.
- - [nltk](https://github.com/nltk/nltk) - A leading platform for building Python programs to work with human language data.
- - [pattern](https://github.com/clips/pattern) - A web mining module.
- - [polyglot](https://github.com/aboSamoor/polyglot) - Natural language pipeline supporting hundreds of languages.
- - [spacy](https://github.com/explosion/spaCy) - A library for industrial-strength natural language processing in Python and Cython.
- - [stanza](https://github.com/stanfordnlp/stanza) - The Stanford NLP Group's official Python library, supporting 60+ languages.
-- Chinese
- - [funnlp](https://github.com/fighting41love/funNLP) - A collection of tools and datasets for Chinese NLP.
- - [jieba](https://github.com/fxsjy/jieba) - The most popular Chinese text segmentation library.
- - [pkuseg-python](https://github.com/lancopku/pkuseg-python) - A toolkit for Chinese word segmentation in various domains.
- - [snownlp](https://github.com/isnowfy/snownlp) - A library for processing Chinese text.
+- [jieba](https://github.com/fxsjy/jieba#jieba-1) - Python package for Chinese word segmentation
+- [SnowNLP](https://github.com/isnowfy/snownlp) - Python package for Chinese NLP
+- [FudanNLP](https://github.com/FudanNLP/fnlp) - Java library for Chinese text processing
+- [HanLP](https://github.com/hankcs/HanLP) - The multilingual NLP library
-## Network Virtualization
+### Anthology
+- [funNLP](https://github.com/fighting41love/funNLP) - Collection of NLP tools and resources mainly for Chinese
-_Tools and libraries for Virtual Networking and SDN (Software Defined Networking)._
+## NLP in German
-- [mininet](https://github.com/mininet/mininet) - A popular network emulator and API written in Python.
-- [napalm](https://github.com/napalm-automation/napalm) - Cross-vendor API to manipulate network devices.
-- [pox](https://github.com/noxrepo/pox) - A Python-based SDN control applications, such as OpenFlow SDN controllers.
+- [German-NLP](https://github.com/adbar/German-NLP) - Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
-## ORM
+## NLP in Polish
-_Libraries that implement Object-Relational Mapping or data mapping techniques._
+- [Polish-NLP](https://github.com/ksopyla/awesome-nlp-polish) - A curated list of resources dedicated to Natural Language Processing (NLP) in Polish: models, tools, and datasets.
-- Relational Databases
- - [django.db.models](https://docs.djangoproject.com/en/dev/topics/db/models/) - The Django ORM.
- - [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) - The Python SQL Toolkit and Object Relational Mapper.
- - [awesome-sqlalchemy](https://github.com/dahlia/awesome-sqlalchemy)
- - [dataset](https://github.com/pudo/dataset) - Store Python dicts in a database - works with SQLite, MySQL, and PostgreSQL.
- - [peewee](https://github.com/coleifer/peewee) - A small, expressive ORM.
- - [pony](https://github.com/ponyorm/pony/) - ORM that provides a generator-oriented interface to SQL.
- - [pydal](https://github.com/web2py/pydal/) - A pure Python Database Abstraction Layer.
- - [sqlmodel](https://github.com/fastapi/sqlmodel) - SQLModel is based on Python type annotations, and powered by Pydantic and SQLAlchemy.
-- NoSQL Databases
- - [beanie](https://github.com/BeanieODM/beanie) - An asynchronous Python object-document mapper (ODM) for MongoDB.
- - [mongoengine](https://github.com/MongoEngine/mongoengine) - A Python Object-Document-Mapper for working with MongoDB.
- - [odmantic](https://github.com/art049/odmantic) - Sync and Async ODM for MongoDB built on top of Pydantic for model definition and validation.
- - [pynamodb](https://github.com/pynamodb/PynamoDB) - A Pythonic interface for [Amazon DynamoDB](https://aws.amazon.com/dynamodb/).
+## NLP in Spanish
-## Package Management
+[Back to Top](#contents)
-_Libraries for package and dependency management._
+### Libraries
-- [pip](https://github.com/pypa/pip) - The package installer for Python.
- - [pip-tools](https://github.com/jazzband/pip-tools) - A set of tools to keep your pinned Python dependencies fresh.
-- [conda](https://github.com/conda/conda/) - Cross-platform, Python-agnostic binary package manager.
-- [hatch](https://github.com/pypa/hatch) - Modern, extensible Python project management.
-- [poetry](https://github.com/sdispater/poetry) - Python dependency management and packaging made easy.
-- [uv](https://github.com/astral-sh/uv) - An extremely fast Python package and project manager, written in Rust.
+- [spanlp](https://github.com/jfreddypuentes/spanlp) - Python library to detect, censor and clean profanity, vulgarities, hateful words, racism, xenophobia and bullying in texts written in Spanish. It contains data of 21 Spanish-speaking countries.
-## Package Repositories
+### Data
-_Local PyPI repository server and proxies._
+- [Colombian Political Speeches](https://github.com/dav009/LatinamericanTextResources)
+- [Copenhagen Treebank](https://mbkromann.github.io/copenhagen-dependency-treebank/)
+- [Spanish Billion words corpus with Word2Vec embeddings](https://github.com/crscardellino/sbwce)
+- [Compilation of Spanish Unannotated Corpora](https://github.com/josecannete/spanish-unannotated-corpora)
-- [bandersnatch](https://github.com/pypa/bandersnatch/) - PyPI mirroring tool provided by Python Packaging Authority (PyPA).
-- [devpi](https://github.com/devpi/devpi) - PyPI server and packaging/testing/release tool.
-- [warehouse](https://github.com/pypa/warehouse) - Next generation Python Package Repository (PyPI).
+### Word and Sentence Embeddings
+- [Spanish Word Embeddings Computed with Different Methods and from Different Corpora](https://github.com/dccuchile/spanish-word-embeddings)
+- [Spanish Word Embeddings Computed from Large Corpora and Different Sizes Using fastText](https://github.com/BotCenter/spanishWordEmbeddings)
+- [Spanish Sentence Embeddings Computed from Large Corpora Using sent2vec](https://github.com/BotCenter/spanishSent2Vec)
+- [Beto - BERT for Spanish](https://github.com/dccuchile/beto)
-## Penetration Testing
-_Frameworks and tools for penetration testing._
+## NLP in Indic languages
-- [fsociety](https://github.com/Manisso/fsociety) - A Penetration testing framework.
-- [setoolkit](https://github.com/trustedsec/social-engineer-toolkit) - A toolkit for social engineering.
-- [sherlock](https://github.com/sherlock-project/sherlock) - Hunt down social media accounts by username across social networks.
-- [sqlmap](https://github.com/sqlmapproject/sqlmap) - Automatic SQL injection and database takeover tool.
+[Back to Top](#contents)
-## Permissions
+### Data, Corpora and Treebanks
-_Libraries that allow or deny users access to data or functionality._
+- [Hindi Dependency Treebank](https://ltrc.iiit.ac.in/treebank_H2014/) - A multi-representational multi-layered treebank for Hindi and Urdu
+- [Universal Dependencies Treebank in Hindi](https://universaldependencies.org/treebanks/hi_hdtb/index.html)
+ - [Parallel Universal Dependencies Treebank in Hindi](http://universaldependencies.org/treebanks/hi_pud/index.html) - A smaller part of the above-mentioned treebank.
+- [ISI FIRE Stopwords List (Hindi and Bangla)](https://www.isical.ac.in/~fire/data/)
+- [Peter Graham's Stopwords List](https://github.com/6/stopwords-json)
+- [NLTK Corpus](https://www.nltk.org/book/ch02.html) 60k Words POS Tagged, Bangla, Hindi, Marathi, Telugu
+- [Hindi Movie Reviews Dataset](https://github.com/goru001/nlp-for-hindi) ~1k Samples, 3 polarity classes
+- [BBC News Hindi Dataset](https://github.com/NirantK/hindi2vec/releases/tag/bbc-hindi-v0.1) 4.3k Samples, 14 classes
+- [IIT Patna Hindi ABSA Dataset](https://github.com/pnisarg/ABSA) 5.4k Samples, 12 Domains, 4k aspect terms, aspect and sentence level polarity in 4 classes
+- [Bangla ABSA](https://github.com/AtikRahman/Bangla_Datasets_ABSA) 5.5k Samples, 2 Domains, 10 aspect terms
+- [IIT Patna Movie Review Sentiment Dataset](https://www.iitp.ac.in/~ai-nlp-ml/resources.html) 2k Samples, 3 polarity labels
-- [django-guardian](https://github.com/django-guardian/django-guardian) - Implementation of per object permissions for Django 1.2+
-- [django-rules](https://github.com/dfunckt/django-rules) - A tiny but powerful app providing object-level permissions to Django, without requiring a database.
+#### Corpora/Datasets that require a login or an access request by email
-## Processes
+- [SAIL 2015](http://amitavadas.com/SAIL/) Twitter and Facebook labelled sentiment samples in Hindi, Bengali, Tamil, Telugu.
+- [IIT Bombay NLP Resources](http://www.cfilt.iitb.ac.in/Sentiment_Analysis_Resources.html) Sentiwordnet, Movie and Tourism parallel labelled corpora, polarity labelled sense annotated corpus, Marathi polarity labelled corpus.
+- [TDIL-IC aggregates a lot of useful resources and provides access to otherwise gated datasets](https://tdil-dc.in/index.php?option=com_catalogue&task=viewTools&id=83&lang=en)
-_Libraries for starting and communicating with OS processes._
+### Language Models and Word Embeddings
-- [delegator.py](https://github.com/amitt001/delegator.py) - [Subprocesses](https://docs.python.org/3/library/subprocess.html) for Humans 2.0.
-- [sarge](https://github.com/vsajip/sarge) - Yet another wrapper for subprocess.
-- [sh](https://github.com/amoffat/sh) - A full-fledged subprocess replacement for Python.
+- [Hindi2Vec](https://nirantk.com/hindi2vec/) and [nlp-for-hindi](https://github.com/goru001/nlp-for-hindi) ULMFiT-style language models
+- [IIT Patna Bilingual Word Embeddings Hi-En](https://www.iitp.ac.in/~ai-nlp-ml/resources.html)
+- [Fasttext word embeddings in a whole bunch of languages, trained on Common Crawl](https://fasttext.cc/docs/en/crawl-vectors.html)
+- [Hindi and Bengali Word2Vec](https://github.com/Kyubyong/wordvectors)
+- [Hindi and Urdu ELMo Model](https://github.com/HIT-SCIR/ELMoForManyLangs)
+- [Sanskrit ALBERT](https://huggingface.co/surajp/albert-base-sanskrit) Trained on Sanskrit Wikipedia and OSCAR corpus
-## Quantum Computing
+### Libraries and Tooling
-_Libraries for quantum computing._
+- [Multi-Task Deep Morphological Analyzer](https://github.com/Saurav0074/mt-dma) Deep Network based Morphological Parser for Hindi and Urdu
+- [Anoop Kunchukuttan](https://github.com/anoopkunchukuttan/indic_nlp_library) 18 Languages, whole host of features from tokenization to translation
+- [SivaReddy's Dependency Parser](http://sivareddy.in/downloads) Dependency Parser and POS Tagger for Kannada, Hindi and Telugu. [Python3 Port](https://github.com/CalmDownKarm/sivareddydependencyparser)
+- [iNLTK](https://github.com/goru001/inltk) - A Natural Language Toolkit for Indic Languages (Indian subcontinent languages) built on top of PyTorch/fastai, which aims to provide out-of-the-box support for common NLP tasks.
-- [Cirq](https://github.com/quantumlib/Cirq) — A Google-developed framework focused on hardware-aware quantum circuit design for NISQ devices.
-- [pennylane](https://github.com/PennyLaneAI/pennylane) — A hybrid quantum-classical machine learning library with automatic differentiation support.
-- [qiskit](https://github.com/Qiskit/qiskit) — An IBM-backed quantum SDK for building, simulating, and running circuits on real quantum hardware.
-- [qutip](https://github.com/qutip/qutip) - Quantum Toolbox in Python.
+## NLP in Thai
-## Recommender Systems
+[Back to Top](#contents)
-_Libraries for building recommender systems._
+### Libraries
-- [annoy](https://github.com/spotify/annoy) - Approximate Nearest Neighbors in C++/Python optimized for memory usage.
-- [fastFM](https://github.com/ibayer/fastFM) - A library for Factorization Machines.
-- [implicit](https://github.com/benfred/implicit) - A fast Python implementation of collaborative filtering for implicit datasets.
-- [lightfm](https://github.com/lyst/lightfm) - A Python implementation of a number of popular recommendation algorithms.
-- [spotlight](https://github.com/maciejkula/spotlight) - Deep recommender models using PyTorch.
-- [scikit-surprise](https://github.com/NicolasHug/Surprise) - A scikit for building and analyzing recommender systems.
-- [tensorrec](https://github.com/jfkirk/tensorrec) - A Recommendation Engine Framework in TensorFlow.
+- [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp) - Thai NLP package in Python
+- [JTCC](https://github.com/wittawatj/jtcc) - A character cluster library in Java
+- [CutKum](https://github.com/pucktada/cutkum) - Word segmentation with deep learning in TensorFlow
+- [Thai Language Toolkit](https://pypi.python.org/pypi/tltk/) - Based on a paper by Wirote Aroonmanakun in 2002 with included dataset
+- [SynThai](https://github.com/KenjiroAI/SynThai) - Word segmentation and POS tagging using deep learning in Python
-## Refactoring
+### Data
-_Refactoring tools and libraries for Python._
+- [Inter-BEST](https://www.nectec.or.th/corpus/index.php?league=pm) - A text corpus with 5 million words with word segmentation
+- [Prime Minister 29](https://github.com/PyThaiNLP/lexicon-thai/tree/master/thai-corpus/Prime%20Minister%2029) - Dataset containing speeches of the current Prime Minister of Thailand
-- [bowler](https://github.com/facebookincubator/Bowler) - Safe code refactoring for modern Python.
-- [rope](https://github.com/python-rope/rope) - Rope is a python refactoring library.
+## NLP in Danish
-## RESTful API
+- [Named Entity Recognition for Danish](https://github.com/ITUnlp/daner)
+- [DaNLP](https://github.com/alexandrainst/danlp) - NLP resources in Danish
+- [Awesome Danish](https://github.com/fnielsen/awesome-danish) - A curated list of awesome resources for Danish language technology
-_Libraries for building RESTful APIs._
+## NLP in Vietnamese
-- Django
- - [django-rest-framework](https://github.com/encode/django-rest-framework) - A powerful and flexible toolkit to build web APIs.
- - [django-tastypie](https://github.com/django-tastypie/django-tastypie) - Creating delicious APIs for Django apps.
-- Flask
- - [eve](https://github.com/pyeve/eve) - REST API framework powered by Flask, MongoDB and good intentions.
- - [flask-api](https://github.com/flask-api/flask-api) - Browsable Web APIs for Flask.
- - [flask-restful](https://github.com/flask-restful/flask-restful) - Quickly building REST APIs for Flask.
-- Pyramid
- - [cornice](https://github.com/Cornices/cornice) - A RESTful framework for Pyramid.
-- Framework agnostic
- - [falcon](https://github.com/falconry/falcon) - A high-performance framework for building cloud APIs and web app backends.
- - [fastapi](https://github.com/tiangolo/fastapi) - A modern, fast, web framework for building APIs with Python 3.6+ based on standard Python type hints.
- - [hug](https://github.com/hugapi/hug) - A Python 3 framework for cleanly exposing APIs.
- - [sandman2](https://github.com/jeffknupp/sandman2) - Automated REST APIs for existing database-driven systems.
- - [sanic](https://github.com/sanic-org/sanic) - A Python 3.6+ web server and web framework that's written to go fast.
+### Libraries
-## Robotics
+- [underthesea](https://github.com/undertheseanlp/underthesea) - Vietnamese NLP Toolkit
+- [vn.vitk](https://github.com/phuonglh/vn.vitk) - A Vietnamese Text Processing Toolkit
+- [VnCoreNLP](https://github.com/vncorenlp/VnCoreNLP) - A Vietnamese natural language processing toolkit
+- [PhoBERT](https://github.com/VinAIResearch/PhoBERT) - Pre-trained language models for Vietnamese
+- [pyvi](https://github.com/trungtv/pyvi) - Python Vietnamese Core NLP Toolkit
+- [VieNeu-TTS](https://github.com/pnnbao97/VieNeu-TTS) - An Advanced On-Device Vietnamese Text-to-Speech System With Instant Voice Cloning.
-_Libraries for robotics._
+### Data
-- [PythonRobotics](https://github.com/AtsushiSakai/PythonRobotics) - This is a compilation of various robotics algorithms with visualizations.
+- [Vietnamese treebank](https://vlsp.hpda.vn/demo/?page=resources&lang=en) - 10,000 sentences for the constituency parsing task
+- [BKTreeBank](https://arxiv.org/pdf/1710.05519.pdf) - a Vietnamese Dependency Treebank
+- [UD_Vietnamese](https://github.com/UniversalDependencies/UD_Vietnamese-VTB) - Vietnamese Universal Dependency Treebank
+- [VIVOS](https://ailab.hcmus.edu.vn/vivos/) - a free Vietnamese speech corpus consisting of 15 hours of recorded speech by AILab
+- [VNTQcorpus(big).txt](http://viet.jnlp.org/download-du-lieu-tu-vung-corpus) - 1.75 million sentences in news
+- [ViText2SQL](https://github.com/VinAIResearch/ViText2SQL) - A dataset for Vietnamese Text-to-SQL semantic parsing (EMNLP-2020 Findings)
+- [EVB Corpus](https://github.com/qhungngo/EVBCorpus) - 20 million words from 15 bilingual books, 100 parallel English-Vietnamese / Vietnamese-English texts, 250 parallel law and ordinance texts, 5,000 news articles, and 2,000 film subtitles.
-## RPC Servers
-_RPC-compatible servers._
+## NLP for Dutch
-- [rpyc](https://github.com/tomerfiliba/rpyc) (Remote Python Call) - A transparent and symmetric RPC library for Python
-- [zerorpc](https://github.com/0rpc/zerorpc-python) - zerorpc is a flexible RPC implementation based on [ZeroMQ](http://zeromq.org/) and [MessagePack](http://msgpack.org/).
+[Back to Top](#contents)
-## Science
+- [python-frog](https://github.com/proycon/python-frog) - Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER)
+- [SimpleNLG_NL](https://github.com/rfdj/SimpleNLG-NL) - Dutch surface realiser used for Natural Language Generation in Dutch, based on the SimpleNLG implementation for English and French.
+- [Alpino](https://github.com/rug-compling/alpino) - Dependency parser for Dutch (also does PoS tagging and Lemmatisation).
+- [Kaldi NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL) - Dutch Speech Recognition models based on [Kaldi](http://kaldi-asr.org/).
+- [spaCy](https://spacy.io/) - Industrial-strength NLP with Python and Cython; a [Dutch model](https://spacy.io/models/nl) is available.
-_Libraries for scientific computing. Also see [Python-for-Scientists](https://github.com/TomNicholas/Python-for-Scientists)._
-- [astropy](https://github.com/astropy/astropy) - A community Python library for Astronomy.
-- [bcbb](https://github.com/chapmanb/bcbb) - Collection of useful code related to biological analysis.
-- [bcbio-nextgen](https://github.com/chapmanb/bcbio-nextgen) - Providing best-practice pipelines for fully automated high throughput sequencing analysis.
-- [biopython](https://github.com/biopython/biopython) - Biopython is a set of freely available tools for biological computation.
-- [cclib](https://github.com/cclib/cclib) - A library for parsing and interpreting the results of computational chemistry packages.
-- [colour](https://github.com/colour-science/colour) - Implementing a comprehensive number of colour theory transformations and algorithms.
-- [karateclub](https://github.com/benedekrozemberczki/karateclub) - Unsupervised machine learning toolbox for graph structured data.
-- [manim](https://github.com/ManimCommunity/manim) - An animation engine for explanatory math videos.
-- [networkx](https://github.com/networkx/networkx) - A high-productivity software for complex networks.
-- [nipy](https://github.com/nipy/nipy) - A collection of neuroimaging toolkits.
-- [numpy](https://github.com/numpy/numpy) - A fundamental package for scientific computing with Python.
-- [obspy](https://github.com/obspy/obspy) - A Python toolbox for seismology.
-- [openbabel](https://github.com/openbabel/openbabel) - A chemical toolbox designed to speak the many languages of chemical data.
-- [pathsim](https://github.com/pathsim/pathsim) - A block-based system modeling and simulation framework with a browser-based visual editor.
-- [pydy](https://github.com/pydy/pydy) - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion.
-- [pymc](https://github.com/pymc-devs/pymc3) - Markov Chain Monte Carlo sampling toolkit.
-- [rdkit](https://github.com/rdkit/rdkit) - Cheminformatics and Machine Learning Software.
-- [scipy](https://github.com/scipy/scipy) - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
-- [shapely](https://github.com/shapely/shapely) - Manipulation and analysis of geometric objects in the Cartesian plane.
-- [simpy](https://gitlab.com/team-simpy/simpy) - A process-based discrete-event simulation framework.
-- [statsmodels](https://github.com/statsmodels/statsmodels) - Statistical modeling and econometrics in Python.
-- [sympy](https://github.com/sympy/sympy) - A Python library for symbolic mathematics.
+## NLP in Indonesian
-## Search
+### Datasets
+- Kompas and Tempo collections at [ILPS](http://ilps.science.uva.nl/resources/bahasa/)
+- [PANL10N for PoS tagging](http://www.panl10n.net/english/outputs/Indonesia/UI/0802/UI-1M-tagged.zip): 39K sentences and 900K word tokens
+- [IDN for PoS tagging](https://github.com/famrashel/idn-tagged-corpus): This corpus contains 10K sentences and 250K word tokens
+- [Indonesian Treebank](https://github.com/famrashel/idn-treebank) and [Universal Dependencies-Indonesian](https://github.com/UniversalDependencies/UD_Indonesian-GSD)
+- [IndoSum](https://github.com/kata-ai/indosum) for both text summarization and classification
+- [Wordnet-Bahasa](http://wn-msa.sourceforge.net/) - large, free, semantic dictionary
+- IndoBenchmark [IndoNLU](https://github.com/indobenchmark/indonlu) includes pre-trained language model (IndoBERT), FastText model, Indo4B corpus, and several NLU benchmark datasets
-_Libraries and software for indexing and performing search queries on data._
+### Libraries & Embedding
+- Natural language toolkit [bahasa](https://github.com/kangfend/bahasa)
+- [Indonesian Word Embedding](https://github.com/galuhsahid/indonesian-word-embedding)
+- Pretrained [Indonesian fastText Text Embedding](https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.id.zip) trained on Wikipedia
+- IndoBenchmark [IndoNLU](https://github.com/indobenchmark/indonlu) includes pretrained language model (IndoBERT), FastText model, Indo4B corpus, and several NLU benchmark datasets
-- [django-haystack](https://github.com/django-haystack/django-haystack) - Modular search for Django.
-- [elasticsearch-dsl-py](https://github.com/elastic/elasticsearch-dsl-py) - The official high-level Python client for Elasticsearch.
-- [elasticsearch-py](https://github.com/elastic/elasticsearch-py) - The official low-level Python client for [Elasticsearch](https://www.elastic.co/products/elasticsearch).
-- [pysolr](https://github.com/django-haystack/pysolr) - A lightweight Python wrapper for [Apache Solr](https://lucene.apache.org/solr/).
-- [whoosh](https://github.com/mchaput/whoosh) - A fast, pure Python search engine library.
-
-## Serialization
-
-_Libraries for serializing complex data types._
-
-- [marshmallow](https://github.com/marshmallow-code/marshmallow) - A lightweight library for converting complex objects to and from simple Python datatypes.
-- [orjson](https://github.com/ijl/orjson) - Fast, correct JSON library.
-- [pysimdjson](https://github.com/TkTech/pysimdjson) - A Python bindings for [simdjson](https://github.com/lemire/simdjson).
-- [python-rapidjson](https://github.com/python-rapidjson/python-rapidjson) - A Python wrapper around [RapidJSON](https://github.com/Tencent/rapidjson).
-- [toonify](https://github.com/ScrapeGraphAI/toonify) - A compact, human-readable serialization format that reduces LLM token usage by 30-60% compared to JSON.
-- [ultrajson](https://github.com/esnme/ultrajson) - A fast JSON decoder and encoder written in C with Python bindings.
-
-## Serverless Frameworks
-
-_Frameworks for developing serverless Python code._
-
-- [python-lambda](https://github.com/nficano/python-lambda) - A toolkit for developing and deploying Python code in AWS Lambda.
-- [zappa](https://github.com/zappa/Zappa) - A tool for deploying WSGI applications on AWS Lambda and API Gateway.
-
-## Shell
-
-_Shells built with Python._
-
-- [xonsh](https://github.com/xonsh/xonsh/) - A Python-powered shell. Full-featured and cross-platform.
-
-## Specific Formats Processing
-
-_Libraries for parsing and manipulating specific text formats._
-
-- General
- - [docling](https://github.com/docling-project/docling) - Library for converting documents into structured data.
- - [kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) - High-performance document extraction library with a Rust core, supporting 62+ formats including PDF, Office, images with OCR, HTML, email, and archives.
- - [tablib](https://github.com/jazzband/tablib) - A module for Tabular Datasets in XLS, CSV, JSON, YAML.
- - [textract](https://github.com/deanmalmgren/textract) - Extract text from any document, Word, PowerPoint, PDFs, etc.
-- Office
- - [docxtpl](https://github.com/elapouya/python-docx-template) - Editing a docx document by jinja2 template
- - [openpyxl](https://openpyxl.readthedocs.io/en/stable/) - A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
- - [pyexcel](https://github.com/pyexcel/pyexcel) - Providing one API for reading, manipulating and writing csv, ods, xls, xlsx and xlsm files.
- - [python-docx](https://github.com/python-openxml/python-docx) - Reads, queries and modifies Microsoft Word 2007/2008 docx files.
- - [python-pptx](https://github.com/scanny/python-pptx) - Python library for creating and updating PowerPoint (.pptx) files.
- - [xlsxwriter](https://github.com/jmcnamara/XlsxWriter) - A Python module for creating Excel .xlsx files.
- - [xlwings](https://github.com/ZoomerAnalytics/xlwings) - A BSD-licensed library that makes it easy to call Python from Excel and vice versa.
-- PDF
- - [pdf_oxide](https://github.com/yfedoseev/pdf_oxide) - A fast PDF library for text extraction, image extraction, and markdown conversion, powered by Rust.
- - [pdfminer.six](https://github.com/pdfminer/pdfminer.six) - Pdfminer.six is a community maintained fork of the original PDFMiner.
- - [pikepdf](https://github.com/pikepdf/pikepdf) - A powerful library for reading and editing PDF files, based on qpdf.
- - [PyPDF2](https://github.com/mstamy2/PyPDF2) - A library capable of splitting, merging and transforming PDF pages.
- - [reportlab](https://www.reportlab.com/opensource/) - Allowing Rapid creation of rich PDF documents.
-- Markdown
- - [markdown-it-py](https://github.com/executablebooks/markdown-it-py) - Markdown parser with 100% CommonMark support, extensions, and syntax plugins.
- - [markdown](https://github.com/waylan/Python-Markdown) - A Python implementation of John Gruber’s Markdown.
- - [markitdown](https://github.com/microsoft/markitdown) - Python tool for converting files and office documents to Markdown.
- - [mistune](https://github.com/lepture/mistune) - Fastest and full featured pure Python parsers of Markdown.
-- YAML
- - [pyyaml](https://github.com/yaml/pyyaml) - YAML implementations for Python.
-- TOML
- - [tomllib](https://docs.python.org/3/library/tomllib.html) - (Python standard library) Parse TOML files.
-- CSV
- - [csvkit](https://github.com/wireservice/csvkit) - Utilities for converting to and working with CSV.
-- Archive
- - [unp](https://github.com/mitsuhiko/unp) - A command line tool that can unpack archives easily.
-
-## Static Site Generator
-
-_Static site generator is a software that takes some text + templates as input and produces HTML files on the output._
-
-- [lektor](https://github.com/lektor/lektor) - An easy to use static CMS and blog engine.
-- [makesite](https://github.com/sunainapai/makesite) - Simple, lightweight, and magic-free static site/blog generator (< 130 lines).
-- [mkdocs](https://github.com/mkdocs/mkdocs/) - Markdown friendly documentation generator.
-- [nikola](https://github.com/getnikola/nikola) - A static website and blog generator.
-- [pelican](https://github.com/getpelican/pelican) - Static site generator that supports Markdown and reST syntax.
-
-## Task Queues
-
-_Libraries for working with task queues._
-
-- [celery](https://github.com/celery/celery) - An asynchronous task queue/job queue based on distributed message passing.
-- [dramatiq](https://github.com/Bogdanp/dramatiq) - A fast and reliable background task processing library for Python 3.
-- [huey](https://github.com/coleifer/huey) - Little multi-threaded task queue.
-- [mrq](https://github.com/pricingassistant/mrq) - A distributed worker task queue in Python using Redis & gevent.
-- [rq](https://github.com/rq/rq) - Simple job queues for Python.
-
-## Template Engine
-
-_Libraries and tools for templating and lexing._
-
-- [genshi](https://github.com/edgewall/genshi) - Python templating toolkit for generation of web-aware output.
-- [jinja](https://github.com/pallets/jinja) - A modern and designer friendly templating language.
-- [mako](https://github.com/sqlalchemy/mako) - Hyperfast and lightweight templating for the Python platform.
-
-## Testing
-
-_Libraries for testing codebases and generating test data._
-
-- Testing Frameworks
- - [hypothesis](https://github.com/HypothesisWorks/hypothesis) - Hypothesis is an advanced Quickcheck style property based testing library.
- - [nose2](https://github.com/nose-devs/nose2) - The successor to `nose`, based on `unittest2`.
- - [pytest](https://github.com/pytest-dev/pytest) - A mature full-featured Python testing tool.
- - [robotframework](https://github.com/robotframework/robotframework) - A generic test automation framework.
- - [scanapi](https://github.com/scanapi/scanapi) - Automated Testing and Documentation for your REST API.
- - [unittest](https://docs.python.org/3/library/unittest.html) - (Python standard library) Unit testing framework.
-- Test Runners
- - [green](https://github.com/CleanCut/green) - A clean, colorful test runner.
- - [mamba](https://github.com/nestorsalceda/mamba) - The definitive testing tool for Python. Born under the banner of BDD.
- - [tox](https://github.com/tox-dev/tox) - Auto builds and tests distributions in multiple Python versions
-- GUI / Web Testing
- - [locust](https://github.com/locustio/locust) - Scalable user load testing tool written in Python.
- - [pyautogui](https://github.com/asweigart/pyautogui) - PyAutoGUI is a cross-platform GUI automation Python module for human beings.
- - [schemathesis](https://github.com/kiwicom/schemathesis) - A tool for automatic property-based testing of web applications built with Open API / Swagger specifications.
- - [selenium](https://github.com/SeleniumHQ/selenium) - Python bindings for [Selenium](https://selenium.dev/) [WebDriver](https://selenium.dev/documentation/webdriver/).
- - [splinter](https://github.com/cobrateam/splinter) - Open source tool for testing web applications.
-- Mock
- - [doublex](https://pypi.org/project/doublex/) - Powerful test doubles framework for Python.
- - [freezegun](https://github.com/spulec/freezegun) - Travel through time by mocking the datetime module.
- - [httmock](https://github.com/patrys/httmock) - A mocking library for requests for Python 2.6+ and 3.2+.
- - [httpretty](https://github.com/gabrielfalcao/HTTPretty) - HTTP request mock tool for Python.
- - [mock](https://docs.python.org/3/library/unittest.mock.html) - (Python standard library) A mocking and patching library.
- - [mocket](https://github.com/mindflayer/python-mocket) - A socket mock framework with gevent/asyncio/SSL support.
- - [responses](https://github.com/getsentry/responses) - A utility library for mocking out the requests Python library.
- - [vcrpy](https://github.com/kevin1024/vcrpy) - Record and replay HTTP interactions on your tests.
-- Object Factories
- - [factory_boy](https://github.com/FactoryBoy/factory_boy) - A test fixtures replacement for Python.
- - [mixer](https://github.com/klen/mixer) - Another fixtures replacement. Supports Django, Flask, SQLAlchemy, Peewee and etc.
- - [polyfactory](https://github.com/litestar-org/polyfactory) - mock data generation library with support to classes (continuation of `pydantic-factories`)
-- Code Coverage
- - [coverage](https://github.com/coveragepy/coveragepy) - Code coverage measurement.
-- Fake Data
- - [faker](https://github.com/joke2k/faker) - A Python package that generates fake data.
- - [mimesis](https://github.com/lk-geimfari/mimesis) - is a Python library that help you generate fake data.
-
-## Text Processing
-
-_Libraries for parsing and manipulating plain texts._
-
-- General
- - [chardet](https://github.com/chardet/chardet) - Python 2/3 compatible character encoding detector.
- - [difflib](https://docs.python.org/3/library/difflib.html) - (Python standard library) Helpers for computing deltas.
- - [ftfy](https://github.com/LuminosoInsight/python-ftfy) - Makes Unicode text less broken and more consistent automagically.
- - [Levenshtein](https://github.com/ztane/python-Levenshtein/) - Fast computation of Levenshtein distance and string similarity.
- - [pangu.py](https://github.com/vinta/pangu.py) - Paranoid text spacing.
- - [pyfiglet](https://github.com/pwaller/pyfiglet) - An implementation of figlet written in Python.
- - [pypinyin](https://github.com/mozillazg/python-pinyin) - Convert Chinese hanzi (漢字) to pinyin (拼音).
- - [textdistance](https://github.com/orsinium/textdistance) - Compute distance between sequences with 30+ algorithms.
- - [unidecode](https://github.com/avian2/unidecode) - ASCII transliterations of Unicode text.
-- Slugify
- - [awesome-slugify](https://github.com/dimka665/awesome-slugify) - A Python slugify library that can preserve unicode.
- - [python-slugify](https://github.com/un33k/python-slugify) - A Python slugify library that translates unicode to ASCII.
- - [unicode-slugify](https://github.com/mozilla/unicode-slugify) - A slugifier that generates unicode slugs with Django as a dependency.
-- Unique identifiers
- - [hashids](https://github.com/davidaurelio/hashids-python) - Implementation of [hashids](http://hashids.org) in Python.
- - [shortuuid](https://github.com/skorokithakis/shortuuid) - A generator library for concise, unambiguous and URL-safe UUIDs.
-- Parser
- - [pygments](https://github.com/pygments/pygments) - A generic syntax highlighter.
- - [pyparsing](https://github.com/pyparsing/pyparsing) - A general purpose framework for generating parsers.
- - [python-nameparser](https://github.com/derek73/python-nameparser) - Parsing human names into their individual components.
- - [python-phonenumbers](https://github.com/daviddrysdale/python-phonenumbers) - Parsing, formatting, storing and validating international phone numbers.
- - [python-user-agents](https://github.com/selwin/python-user-agents) - Browser user agent parser.
- - [sqlparse](https://github.com/andialbrecht/sqlparse) - A non-validating SQL parser.
-
-## URL Manipulation
-
-_Libraries for parsing URLs._
-
-- [furl](https://github.com/gruns/furl) - A small Python library that makes parsing and manipulating URLs easy.
-- [purl](https://github.com/codeinthehole/purl) - A simple, immutable URL class with a clean API for interrogation and manipulation.
-- [webargs](https://github.com/marshmallow-code/webargs) - A friendly library for parsing HTTP request arguments with built-in support for popular web frameworks.
-
-## Video
-
-_Libraries for manipulating video and GIFs._
-
-- [moviepy](https://github.com/Zulko/moviepy) - A module for script-based movie editing with many formats, including animated GIFs.
-- [scikit-video](https://github.com/aizvorski/scikit-video) - Video processing routines for SciPy.
-- [vidgear](https://github.com/abhiTronix/vidgear) - Most Powerful multi-threaded Video Processing framework.
-
-## Web Asset Management
-
-_Tools for managing, compressing and minifying website assets._
-
-- [django-compressor](https://github.com/django-compressor/django-compressor) - Compresses linked and inline JavaScript or CSS into a single cached file.
-- [django-pipeline](https://github.com/jazzband/django-pipeline) - An asset packaging library for Django.
-- [django-storages](https://github.com/jschneier/django-storages) - A collection of custom storage back ends for Django.
-- [fanstatic](https://github.com/zopefoundation/fanstatic) - Packages, optimizes, and serves static file dependencies as Python packages.
-- [flask-assets](https://github.com/miracle2k/flask-assets) - Helps you integrate webassets into your Flask app.
-- [webassets](https://github.com/miracle2k/webassets) - Bundles, optimizes, and manages unique cache-busting URLs for static resources.
-
-## Web Content Extracting
-
-_Libraries for extracting web contents._
-
-- [html2text](https://github.com/Alir3z4/html2text) - Convert HTML to Markdown-formatted text.
-- [lassie](https://github.com/michaelhelmick/lassie) - Web Content Retrieval for Humans.
-- [micawber](https://github.com/coleifer/micawber) - A small library for extracting rich content from URLs.
-- [newspaper](https://github.com/codelucas/newspaper) - News extraction, article extraction and content curation in Python.
-- [python-readability](https://github.com/buriy/python-readability) - Fast Python port of arc90's readability tool.
-- [requests-html](https://github.com/psf/requests-html) - Pythonic HTML Parsing for Humans.
-- [sumy](https://github.com/miso-belica/sumy) - A module for automatic summarization of text documents and HTML pages.
-- [toapi](https://github.com/gaojiuli/toapi) - Every web site provides APIs.
-
-## Web Crawling
-
-_Libraries to automate web scraping._
-
-- [browser-use](https://github.com/browser-use/browser-use) - Make websites accessible for AI agents with easy browser automation.
-- [crawl4ai](https://github.com/unclecode/crawl4ai) - An open-source, LLM-friendly web crawler that provides lightning-fast, structured data extraction specifically designed for AI agents.
-- [feedparser](https://github.com/kurtmckee/feedparser) - Universal feed parser.
-- [grab](https://github.com/lorien/grab) - Site scraping framework.
-- [mechanicalsoup](https://github.com/MechanicalSoup/MechanicalSoup) - A Python library for automating interaction with websites.
-- [scrapy](https://github.com/scrapy/scrapy) - A fast high-level screen scraping and web crawling framework.
-
-## Web Frameworks
-
-_Traditional full stack web frameworks. Also see [RESTful API](https://github.com/vinta/awesome-python#restful-api)._
-
-- Synchronous
- - [django](https://github.com/django/django) - The most popular web framework in Python.
- - [awesome-django](https://github.com/shahraizali/awesome-django)
- - [flask](https://github.com/pallets/flask) - A microframework for Python.
- - [awesome-flask](https://github.com/humiaozuzu/awesome-flask)
- - [pyramid](https://github.com/Pylons/pyramid) - A small, fast, down-to-earth, open source Python web framework.
- - [awesome-pyramid](https://github.com/uralbash/awesome-pyramid)
- - [fasthtml](https://github.com/AnswerDotAI/fasthtml) - The fastest way to create an HTML app.
- - [awesome-fasthtml](https://github.com/amosgyamfi/awesome-fasthtml)
- - [masonite](https://github.com/MasoniteFramework/masonite) - The modern and developer centric Python web framework.
-- Asynchronous
- - [litestar](https://github.com/litestar-org/litestar) - Production-ready, capable and extensible ASGI Web framework.
- - [microdot](https://github.com/miguelgrinberg/microdot) - The impossibly small web framework for Python and MicroPython.
- - [reflex](https://github.com/reflex-dev/reflex) – A framework for building reactive, full-stack web applications entirely with python .
- - [tornado](https://github.com/tornadoweb/tornado) - A web framework and asynchronous networking library.
+## NLP in Urdu
-## WebSocket
+### Datasets
+- [Collection of Urdu datasets](https://github.com/mirfan899/Urdu) for POS, NER and NLP tasks
-_Libraries for working with WebSocket._
+### Libraries
+- [Natural Language Processing library](https://github.com/urduhack/urduhack) for the Urdu language (🇵🇰)
-- [autobahn-python](https://github.com/crossbario/autobahn-python) - WebSocket & WAMP for Python on Twisted and [asyncio](https://docs.python.org/3/library/asyncio.html).
-- [channels](https://github.com/django/channels) - Developer-friendly asynchrony for Django.
-- [websockets](https://github.com/aaugustin/websockets) - A library for building WebSocket servers and clients with a focus on correctness and simplicity.
+## NLP in Persian
-## WSGI Servers
+[Back to Top](#contents)
-_WSGI-compatible web servers._
+### Libraries
+- [Hazm](https://github.com/roshan-research/hazm) - Persian NLP Toolkit.
+- [Parsivar](https://github.com/ICTRC/Parsivar): A Language Processing Toolkit for Persian
+- [Perke](https://github.com/AlirezaTheH/perke): Perke is a Python keyphrase extraction package for the Persian language. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new models.
+- [Perstem](https://github.com/jonsafari/perstem): Persian stemmer, morphological analyzer, transliterator, and partial part-of-speech tagger
+- [ParsiAnalyzer](https://github.com/NarimanN2/ParsiAnalyzer): Persian Analyzer For Elasticsearch
+- [virastar](https://github.com/aziz/virastar): Cleaning up Persian text!
-- [gunicorn](https://github.com/benoitc/gunicorn) - Pre-forked, ported from Ruby's Unicorn project.
-- [uwsgi](https://github.com/unbit/uwsgi) - A project aims at developing a full stack for building hosting services, written in C.
-- [waitress](https://github.com/Pylons/waitress) - Multi-threaded, powers Pyramid.
-- [werkzeug](https://github.com/pallets/werkzeug) - A WSGI utility library for Python that powers Flask and can easily be embedded into your own projects.
+### Datasets
+- [Bijankhan Corpus](https://dbrg.ut.ac.ir/بیژن%E2%80%8Cخان/): The Bijankhan corpus is a tagged corpus suitable for natural language processing research on the Persian (Farsi) language. The collection is gathered from daily news and common texts, and all documents are categorized into subjects such as political, cultural and so on. In total, there are 4300 different subjects. The collection contains about 2.6 million manually tagged words with a tag set of 40 Persian POS tags.
+- [Uppsala Persian Corpus (UPC)](https://sites.google.com/site/mojganserajicom/home/upc): Uppsala Persian Corpus (UPC) is a large, freely available Persian corpus. The corpus is a modified version of the Bijankhan corpus with additional sentence segmentation and consistent tokenization containing 2,704,028 tokens and annotated with 31 part-of-speech tags. The part-of-speech tags are listed with explanations in [this table](https://sites.google.com/site/mojganserajicom/home/upc/Table_tag.pdf).
+- [Large-Scale Colloquial Persian](http://hdl.handle.net/11234/1-3195): The Large Scale Colloquial Persian Dataset (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets with their dependency relations in syntactic annotation, part-of-speech tags, sentiment polarity and automatic translations of the original Persian sentences into English (EN), German (DE), Czech (CS), Italian (IT) and Hindi (HI). Learn more about this project at the [LSCP webpage](https://iasbs.ac.ir/~ansari/lscp/).
+- [ArmanPersoNERCorpus](https://github.com/HaniehP/PersianNER): The dataset includes 250,015 tokens and 7,682 Persian sentences in total. It is available in 3 folds to be used in turn as training and test sets. Each file contains one token, along with its manually annotated named-entity tag, per line. Each sentence is separated with a newline. The NER tags are in IOB format.
+- [FarsiYar PersianNER](https://github.com/Text-Mining/Persian-NER): The dataset includes about 25,000,000 tokens and about 1,000,000 Persian sentences in total, based on the [Persian Wikipedia Corpus](https://github.com/Text-Mining/Persian-Wikipedia-Corpus). The NER tags are in IOB format. More than 1000 volunteers contributed tag improvements to this dataset via a web panel or Android app. Updated tags are released every two weeks.
+- [PERLEX](http://farsbase.net/PERLEX.html): The first Persian dataset for relation extraction, an expert-translated version of the "Semeval-2010-Task-8" dataset.
+- [Persian Syntactic Dependency Treebank](http://dadegan.ir/catalog/perdt): This treebank is supplied free for noncommercial use; for commercial use, contact its maintainers. It contains 29,982 annotated sentences, including samples from almost all verbs of the Persian valency lexicon.
+- [Uppsala Persian Dependency Treebank (UPDT)](http://stp.lingfil.uu.se/~mojgan/UPDT.html): Dependency-based syntactically annotated corpus.
+- [Hamshahri](https://dbrg.ut.ac.ir/hamshahri/): The Hamshahri collection is a standard, reliable Persian text collection that was used at the Cross Language Evaluation Forum (CLEF) in 2008 and 2009 for the evaluation of Persian information retrieval systems.
-# Resources
-Where to discover learning resources or new Python libraries.
+## NLP in Ukrainian
-## Newsletters
+[Back to Top](#contents)
-- [Awesome Python Newsletter](http://python.libhunt.com/newsletter)
-- [Pycoder's Weekly](https://pycoders.com/)
-- [Python Tricks](https://realpython.com/python-tricks/)
-- [Python Weekly](https://www.pythonweekly.com/)
+- [awesome-ukrainian-nlp](https://github.com/asivokon/awesome-ukrainian-nlp) - a curated list of Ukrainian NLP datasets, models, etc.
+- [UkrainianLT](https://github.com/Helsinki-NLP/UkrainianLT) - another curated list with a focus on machine translation and speech processing
-## Podcasts
-- [Django Chat](https://djangochat.com/)
-- [PyPodcats](https://pypodcats.live)
-- [Python Bytes](https://pythonbytes.fm)
-- [Python Test](https://podcast.pythontest.com/)
-- [Talk Python To Me](https://talkpython.fm/)
-- [The Real Python Podcast](https://realpython.com/podcasts/rpp/)
+## NLP in Hungarian
-# Contributing
+[Back to Top](#contents)
-Your contributions are always welcome! Please take a look at the [contribution guidelines](https://github.com/vinta/awesome-python/blob/master/CONTRIBUTING.md) first.
+- [awesome-hungarian-nlp](https://github.com/oroszgy/awesome-hungarian-nlp): A curated list of free resources dedicated to Hungarian Natural Language Processing.
----
+## NLP in Portuguese
-If you have any question about this opinionated list, do not hesitate to contact [@VintaChen](https://twitter.com/VintaChen) on Twitter.
+[Back to Top](#contents)
+
+- [Portuguese-nlp](https://github.com/ajdavidl/Portuguese-NLP) - a list of resources and tools developed with a focus on Portuguese.
+
+## Other Languages
+
+- Russian: [pymorphy2](https://github.com/kmike/pymorphy2) - a good POS tagger for Russian
+- Asian Languages: Thai, Lao, Chinese, Japanese, and Korean [ICU Tokenizer](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-tokenizer.html) implementation in Elasticsearch
+- Ancient Languages: [CLTK](https://github.com/cltk/cltk): The Classical Language Toolkit is a Python library and collection of texts for doing NLP in ancient languages
+- Hebrew: [NLPH_Resources](https://github.com/NLPH/NLPH_Resources) - A collection of papers, corpora and linguistic resources for NLP in Hebrew
+
+[Back to Top](#contents)
+
+## Citation
+
+If you find this repository useful, please consider citing this list:
+
+```bibtex
+@misc{awesome-nlp,
+ title = {Awesome NLP},
+ author = {Kim, Keon and Chelikavada, Krish},
+ year = {2018},
+ url = {https://github.com/keon/awesome-nlp},
+ note = {GitHub repository}
+}
+```