
AllMeans

Modern topic modeling with minimal user input. AllMeans v2.0 provides automatic topic discovery using state-of-the-art clustering algorithms, real evaluation metrics, and intelligent keyword extraction with part-of-speech filtering and lemmatization.

Available on PyPI. Requires Python 3.10+.

Features

  • Multiple Clustering Algorithms: K-Means, NMF, LDA, HDBSCAN+UMAP
  • Automatic K Selection: Multi-objective optimization finds the optimal number of topics
  • Smart Keyword Extraction: POS-based filtering removes ordinals, numbers, and uninformative words
  • Lemmatization: Normalizes word forms (singular/plural, verb tenses) for better topic quality
  • Real Evaluation Metrics: C_V Coherence, Topic Diversity, Silhouette, Davies-Bouldin
  • Scikit-learn-style API: Familiar fit()/transform() pattern
  • Verbosity Controls: Rich progress bars and detailed output options
  • CLI Interface: Command-line tool for quick topic modeling

Installation

pip install allmeans

Or with uv:

uv add allmeans

Optional Dependencies

# Sentiment analysis
pip install allmeans[sentiment]

# Embeddings support (sentence-transformers, gensim)
pip install allmeans[embeddings]

# Visualization tools
pip install allmeans[viz]

# All extras
pip install allmeans[all]

Quick Start

Basic Usage

from AllMeans import TopicModel

# Your text
text = """
Machine learning is a subset of artificial intelligence.
Deep learning uses neural networks with multiple layers.
Natural language processing helps computers understand human language.
Computer vision enables machines to interpret visual information.
"""

# Create and fit model
model = TopicModel(
    method="kmeans",           # or "nmf", "lda", "hdbscan"
    feature_method="tfidf",    # or "bow", "sif"
    auto_k=True,               # automatically find optimal K
    k_range=(2, 10),          # range to search
    verbose=True               # show progress
)

model.fit(text)

# Get results
results = model.get_results()

# Print discovered topics
for topic in results.topics:
    print(f"\n📌 {topic.label}")
    print(f"   Keywords: {', '.join(topic.keywords[:5])}")
    print(f"   Size: {topic.size} sentences")
    print(f"   Coherence: {topic.coherence:.3f}")

Working with Documents

# List of documents instead of single text
documents = [
    "Python is a high-level programming language.",
    "JavaScript is popular for web development.",
    "Machine learning models require training data.",
    "Data science combines statistics and programming.",
]

model = TopicModel(n_clusters=2, auto_k=False)
model.fit(documents)

# Transform new documents
new_docs = ["Deep learning is a subset of machine learning."]
assignments = model.transform(new_docs)
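
Assuming transform() returns one integer cluster index per input document (a sketch; transform() and the assignments attribute are described in the API reference below), those assignments can be mapped back to the discovered topic labels:

# Sketch: map each assignment back to a topic label.
# Assumes assignments are integer cluster indices aligned with results.topics.
results = model.get_results()
labels = {topic.id: topic.label for topic in results.topics}

for doc, cluster_id in zip(new_docs, assignments):
    print(f"{labels.get(cluster_id, 'unknown')}: {doc}")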

Command-Line Interface

# Fit model on text file
allmeans fit --input article.txt --method kmeans --verbose

# With custom parameters
allmeans fit \
    --input data.txt \
    --output results.json \
    --method hdbscan \
    --features tfidf \
    --clusters 5

# View topics from saved results
allmeans topics --results results.json

Advanced Examples

Wikipedia Article Analysis

import requests
from bs4 import BeautifulSoup
from AllMeans import TopicModel

# Fetch Wikipedia article
url = "https://en.wikipedia.org/wiki/Roman_Empire"
response = requests.get(url, headers={
    "User-Agent": "AllMeans/2.0"
})
soup = BeautifulSoup(response.content, 'html.parser')
content = soup.find('div', {'id': 'mw-content-text'})
paragraphs = content.find_all('p')
text = ' '.join([p.get_text() for p in paragraphs if p.get_text().strip()])

# Model topics with auto-K selection
model = TopicModel(
    method="kmeans",
    feature_method="tfidf",
    auto_k=True,
    k_range=(3, 8),
    early_stop=2,
    random_state=42,
    verbose=True
)

model.fit(text)
results = model.get_results()

# Display results
print(f"\nDiscovered {len(results.topics)} topics:")
for topic in results.topics:
    print(f"\n{topic.label} ({topic.size} sentences)")
    print(f"Keywords: {', '.join(topic.keywords)}")
    print(f"Example: {topic.exemplar_sentences[0][:100]}...")

Custom Exclusions

# Exclude specific words from keywords
model = TopicModel(
    exclusions=["said", "also", "however"],
    excl_sim=0.9,  # Jaro-Winkler similarity threshold
    filter_pos=True  # Enable POS filtering
)

model.fit(text)

Evaluation Metrics

results = model.get_results()

print("Evaluation Metrics:")
print(f"Coherence (C_V): {results.scores['coherence']:.3f}")
print(f"Diversity: {results.scores['diversity']:.3f}")
print(f"Silhouette: {results.scores['silhouette']:.3f}")
print(f"Davies-Bouldin: {results.scores['davies_bouldin']:.3f}")

API Reference

TopicModel

TopicModel(
    method="kmeans",              # Clustering: "kmeans", "nmf", "lda", "hdbscan"
    feature_method="tfidf",       # Features: "tfidf", "bow", "sif"
    n_clusters=None,              # Fixed K (None for auto)
    auto_k=True,                  # Auto-select K
    k_range=(2, 10),             # K range to search
    early_stop=2,                 # Early stopping patience
    exclusions=None,              # Words to exclude
    excl_sim=0.9,                # Exclusion similarity threshold
    filter_pos=True,              # POS-based filtering
    random_state=42,              # Random seed
    verbose=False                 # Show progress
)

Methods:

  • fit(text) - Fit model on text or documents
  • transform(text) - Predict topics for new text
  • fit_transform(text) - Fit and transform in one step (see the example below)
  • get_results() - Get TopicModelResults object
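
A minimal sketch of this pattern, assuming fit_transform() returns the same per-sentence assignments that a separate fit() followed by transform() would:

from AllMeans import TopicModel

# Sketch of the fit/transform pattern; assumes fit_transform() returns
# per-sentence cluster assignments, mirroring fit() + transform().
text = "Cats purr. Dogs bark. Stocks rise. Markets fall."
model = TopicModel(method="kmeans", auto_k=True, k_range=(2, 3), verbose=False)

assignments = model.fit_transform(text)   # fit the model and assign each sentence
results = model.get_results()             # TopicModelResults: topics, scores, config
print(f"{len(results.topics)} topics; assignments: {assignments}")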

TopicModelResults

Attributes:

  • topics - List of Topic objects
  • assignments - Cluster assignments for each sentence (see the example below)
  • scores - Dictionary of evaluation metrics
  • config - Model configuration
  • feature_matrix - TF-IDF or other features
  • sentences - Original sentences
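
For example, assuming sentences and assignments are parallel lists (one cluster index per sentence), per-sentence topic membership can be reconstructed from a fitted model:

# Sketch: pair each original sentence with its cluster assignment.
# Assumes results.sentences and results.assignments are index-aligned.
results = model.get_results()

for sentence, cluster_id in zip(results.sentences, results.assignments):
    print(f"[topic {cluster_id}] {sentence[:60]}")

print("Config:", results.config)
print("Scores:", results.scores)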

Topic

Attributes:

  • id - Topic ID
  • label - Topic label (top keyword)
  • keywords - List of keywords
  • size - Number of sentences
  • coherence - Topic coherence score
  • diversity - Topic diversity score
  • exemplar_sentences - Example sentences

Migration from v1.x

v2.0 is a complete rewrite with breaking changes. See MIGRATION.md for a detailed migration guide.

Quick migration:

# v1.x (deprecated)
from AllMeans import AllMeans
allmeans = AllMeans(text=text)
clusters = allmeans.model_topics(early_stop=2, verbose=False)

# v2.0 (current)
from AllMeans import TopicModel
model = TopicModel(early_stop=2, verbose=False)
model.fit(text)
results = model.get_results()

Performance

AllMeans v2.0 has been tested on texts ranging from 1,000 to 100,000+ characters. Performance scales with text size:

  • Small texts (1K-10K chars): < 1 second
  • Medium texts (10K-50K chars): 1-5 seconds
  • Large texts (50K-100K chars): 5-30 seconds

Auto-K selection adds overhead proportional to the size of k_range. Use early_stop to cut the search short; see the sketch below.
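
For example, a narrow k_range combined with early_stop keeps the auto-K search cheap. This sketch uses the parameters listed in the API reference above; article.txt stands in for any long document:

from AllMeans import TopicModel

# Keep auto-K overhead low: narrow search range plus early stopping.
# early_stop is the patience parameter from the API reference; the exact
# stopping criterion is internal to the library.
long_text = open("article.txt", encoding="utf-8").read()  # placeholder input

model = TopicModel(
    method="kmeans",
    auto_k=True,
    k_range=(2, 6),   # fewer candidate K values to evaluate
    early_stop=2,     # stop after 2 candidates without improvement
    verbose=True,
)
model.fit(long_text)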

How It Works

  1. Preprocessing: Text is split into sentences and lemmatized
  2. Feature Extraction: Sentences are vectorized with TF-IDF, BoW, or SIF embeddings
  3. Clustering: K-Means, NMF, LDA, or HDBSCAN groups similar sentences
  4. Auto-K Selection (optional): Multiple K values are tested and the best is chosen via evaluation metrics (sketched after this list)
  5. Keyword Extraction: TF-IDF scores are filtered by POS tags and lemmatization
  6. Label Selection: The most diverse keywords are chosen as topic labels
  7. Evaluation: Coherence, diversity, and clustering metrics are computed
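
The core of steps 2-4 can be approximated with scikit-learn alone. The sketch below (TF-IDF features, K-Means over a range of K, silhouette-based selection) is a conceptual illustration of that pipeline, not AllMeans' actual implementation:

# Conceptual sketch of steps 2-4 using scikit-learn directly; AllMeans'
# internal implementation and scoring differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

sentences = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Stock markets react to interest rate changes.",
    "Inflation affects consumer purchasing power.",
]

# Step 2: feature extraction
X = TfidfVectorizer(stop_words="english").fit_transform(sentences)

# Steps 3-4: cluster for each candidate K, keep the best silhouette score
best_k, best_score, best_labels = None, -1.0, None
for k in range(2, min(4, len(sentences))):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

print(f"Selected K={best_k} (silhouette={best_score:.3f}), labels={best_labels}")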

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

License

MIT License - see LICENSE for details.

Citation

If you use AllMeans in research, please cite:

@software{allmeans2024,
  author = {Maurin-Jones, Kai},
  title = {AllMeans: Modern Topic Modeling with Minimal User Input},
  year = {2024},
  url = {https://github.com/kmaurinjones/AllMeans},
  version = {2.0.0}
}

Changelog

See CHANGELOG.md for version history and updates.
