
embedsim 0.1.0


@tcdent released this 04 Oct 01:22

Release Notes - embedsim v0.1.0

A Python library for measuring semantic similarity and detecting outliers in text collections using embeddings.

Features

Core Functionality:

  • pairsim() - Compare two texts using cosine similarity of their embeddings
  • groupsim() - Analyze text collections and identify outliers using centroid-based coherence scoring
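The two core ideas can be illustrated on plain embedding vectors. The sketch below is not embedsim's API and does not reproduce its exact scoring. It assumes embeddings are already computed as lists of floats, and the helper names `cosine_similarity`, `coherence_scores` and `find_outliers` and the 0.5 threshold are hypothetical. Pairwise similarity is the cosine of the angle between two vectors. Centroid-based coherence averages all vectors into a centroid, scores each item by its cosine similarity to that centroid, and flags low scorers as outliers.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty collection of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def coherence_scores(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Score each vector by its cosine similarity to the group centroid."""
    c = centroid(vectors)
    return [cosine_similarity(v, c) for v in vectors]


def find_outliers(vectors: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Indices whose coherence score falls below a (hypothetical) threshold."""
    return [i for i, s in enumerate(coherence_scores(vectors)) if s < threshold]


if __name__ == "__main__":
    # Three vectors point roughly the same way; the last one is off-topic.
    group = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]]
    print(find_outliers(group))  # prints [3]
```

Note that an outlier still contributes to the centroid it is measured against; with very small groups, a leave-one-out centroid or a relative cutoff (for example, a z-score over the coherence scores) may separate outliers more reliably than a fixed threshold.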

Embedding Model Support:

  • OpenAI embedding models (openai-3-small, openai-3-large) via the OpenAI API
  • Local sentence-transformer models (Jina v2, MiniLM, etc.) for privacy and offline use
  • Configurable via function parameters or environment variables
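A common way to support both configuration routes is to let an explicit function argument override an environment variable, which in turn overrides a built-in default. The sketch below assumes that precedence order; the variable name `EMBEDSIM_MODEL`, the default `openai-3-small` and the helper `resolve_model` are hypothetical and may not match what embedsim actually reads.

```python
import os
from typing import Optional

# Hypothetical names for illustration only; embedsim's real
# environment variable and default model may differ.
ENV_VAR = "EMBEDSIM_MODEL"
DEFAULT_MODEL = "openai-3-small"


def resolve_model(model: Optional[str] = None) -> str:
    """Choose the embedding model: explicit argument, then environment, then default."""
    if model:
        return model
    return os.environ.get(ENV_VAR) or DEFAULT_MODEL
```

With this ordering, a deployment can set the model once in its environment while individual calls can still override it.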

Use Cases:

  • Content moderation and off-topic detection
  • Document clustering and outlier identification
  • Quality assurance for generated content
  • Search relevance scoring
  • Duplicate detection
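Duplicate detection, for example, reduces to pairwise similarity with a cutoff. The sketch below works on precomputed embedding vectors rather than calling embedsim; the helper `near_duplicates` and the 0.95 threshold are hypothetical choices for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def near_duplicates(
    vectors: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[Tuple[int, int]]:
    """Index pairs (i, j), i < j, whose embeddings are at least `threshold` similar."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(vectors), 2)
        if cosine_similarity(a, b) >= threshold
    ]


if __name__ == "__main__":
    embeddings = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
    print(near_duplicates(embeddings))  # prints [(0, 1)]
```

Comparing every pair is quadratic in the number of texts, which is fine for small collections; larger corpora usually need an approximate nearest-neighbour index instead.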