Select

Select is a Python CLI tool that enriches lead data from CSV files with company information, AI-powered ICP scoring, and export capabilities. Given a simple list of company names and domains, it adds a description, contact email, ideal customer profile (ICP) score, and recommended outreach approach to each lead.

The pipeline supports multiple enrichment sources (Clearbit API, Hunter.io, web scraping) with automatic fallback, concurrent processing for speed, and both CLI and Streamlit dashboard interfaces. It uses Pydantic models for data validation, httpx for async HTTP, and integrates with OpenAI or local Ollama models for intelligent lead scoring.
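The automatic fallback between enrichment sources could look roughly like the sketch below. The function name `enrich_with_fallback`, the stub sources, and the dict result shape are hypothetical and not taken from the Select codebase; real sources would make HTTP calls to Clearbit or Hunter.io instead of returning canned data.

```python
from typing import Callable

# Hypothetical source signature: takes a domain, returns enrichment data or raises.
Source = Callable[[str], dict]


def enrich_with_fallback(domain: str, sources: list[tuple[str, Source]]) -> dict:
    """Try each source in order and return the first non-empty result."""
    errors = []
    for name, fetch in sources:
        try:
            data = fetch(domain)
            if data:  # treat an empty result as a miss and keep going
                return {**data, "enrichment_source": name, "error": None}
            errors.append(f"{name}: empty result")
        except Exception as exc:  # a real pipeline would catch narrower errors
            errors.append(f"{name}: {exc}")
    # Every source failed: record why, so the row can still be exported.
    return {"enrichment_source": None, "error": "; ".join(errors)}


# Stub sources standing in for an API client and a web scraper (no network calls).
def failing_api(domain: str) -> dict:
    raise TimeoutError("request timed out")


def scraper(domain: str) -> dict:
    return {"description": f"Homepage text for {domain}"}


result = enrich_with_fallback(
    "example.com", [("clearbit", failing_api), ("scrape", scraper)]
)
print(result["enrichment_source"])
```

Recording the winning source and any error text per lead matches the `enrichment_source` and `error` columns of the output format.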

Architecture

┌─────────┐     ┌────────────┐     ┌────────────┐     ┌─────────┐     ┌────────┐
│   CLI   │────▶│ Ingestion  │────▶│ Enrichment │────▶│ Scoring │────▶│ Export │
│ (Typer) │     │  (CSV)     │     │ (API/LLM)  │     │ (ICP)   │     │ (CSV)  │
└─────────┘     └────────────┘     └────────────┘     └─────────┘     └────────┘
                                       │
                                       ▼
                                  ┌───────────┐
                                  │ Dashboard │
                                  │(Streamlit)│
                                  └───────────┘

Pipeline stages:

  1. Ingestion — Reads CSV input, validates rows via Pydantic, skips malformed data
  2. Enrichment — Calls Clearbit/Hunter.io APIs, falls back to web scraping on failure
  3. Scoring — Uses OpenAI (gpt-4o-mini) or Ollama (llama3.2) to classify ICP score (1-5) and recommend outreach approach
  4. Export — Writes enriched leads to CSV (and optionally Notion)
  5. Dashboard (optional) — Streamlit UI for upload, view, filter, and download
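As an illustration of stage 1, here is a minimal ingestion sketch. Select validates rows with Pydantic; this version swaps in a standard-library dataclass so it runs without dependencies. The `Lead` class and `read_leads` function are hypothetical names, and lowercasing the domain is an assumption rather than documented behavior.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Lead:
    company_name: str
    domain: str = ""
    industry: str = ""


def read_leads(handle) -> tuple[list[Lead], int]:
    """Parse leads from an open CSV file; return (valid leads, skipped count)."""
    leads, skipped = [], 0
    for row in csv.DictReader(handle):
        name = (row.get("company_name") or "").strip()
        if not name:  # company_name is the only required column
            skipped += 1
            continue
        leads.append(
            Lead(
                company_name=name,
                domain=(row.get("domain") or "").strip().lower(),
                industry=(row.get("industry") or "").strip(),
            )
        )
    return leads, skipped


sample = io.StringIO(
    "company_name,domain,industry\n"
    "TechNova Solutions,technovatech.com,Technology\n"
    ",missing-name.com,Finance\n"
)
leads, skipped = read_leads(sample)
print(len(leads), skipped)
```

Counting skipped rows instead of raising keeps one malformed line from aborting the whole run.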

Setup

Prerequisites

  • Python 3.11+
  • pip

Installation

# Clone the repository
git clone https://github.com/keyarr/Select.git
cd Select

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

Environment Variables

Create a .env file in the project root:

# Required for OpenAI scoring (optional if using Ollama)
OPENAI_API_KEY=sk-your-openai-key

# Optional: Clearbit API for company enrichment
CLEARBIT_API_KEY=your-clearbit-key

# Optional: Hunter.io API for contact email lookup
HUNTER_API_KEY=your-hunter-key

# Optional: Notion integration for export
NOTION_TOKEN=your-notion-token
NOTION_DATABASE_ID=your-database-id
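The sketch below shows one way these variables might be read in Python. It assumes the values are already present in the process environment (for example, exported by a .env loader); how Select itself loads the file is not described here. The `Settings` class, `load_settings` function, and the rule that Notion export needs both Notion variables are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    clearbit_api_key: Optional[str]
    hunter_api_key: Optional[str]
    notion_token: Optional[str]
    notion_database_id: Optional[str]

    @property
    def notion_enabled(self) -> bool:
        # Assumption: Notion export needs both the token and the database ID.
        return bool(self.notion_token and self.notion_database_id)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables; blank values count as unset."""

    def get(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None

    return Settings(
        openai_api_key=get("OPENAI_API_KEY"),
        clearbit_api_key=get("CLEARBIT_API_KEY"),
        hunter_api_key=get("HUNTER_API_KEY"),
        notion_token=get("NOTION_TOKEN"),
        notion_database_id=get("NOTION_DATABASE_ID"),
    )


# Passing a plain dict makes the loader easy to test without touching os.environ.
settings = load_settings({"NOTION_TOKEN": "secret", "HUNTER_API_KEY": " "})
print(settings.notion_enabled, settings.hunter_api_key)
```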

Usage

CLI

# Basic usage — enrich leads from CSV
python sel.py --input leads.csv --output enriched.csv

# Specify number of concurrent workers
python sel.py --input leads.csv --output enriched.csv --workers 8

# Skip LLM scoring (enrichment only, no ICP classification)
python sel.py --input leads.csv --output enriched.csv --skip-scoring

# Use OpenAI for scoring (requires OPENAI_API_KEY)
python sel.py --input leads.csv --output enriched.csv --provider openai --model gpt-4o-mini

# Use local Ollama for scoring (default)
python sel.py --input leads.csv --output enriched.csv --provider ollama --model llama3.2

# Use the included sample data
python sel.py --input sample_leads.csv --output enriched.csv

CLI Flags and Options

| Flag | Type | Default | Description |
|---|---|---|---|
| --input / -i | str | (required) | Path to input CSV file containing leads |
| --output / -o | str | (required) | Path to output CSV file for enriched leads |
| --workers / -w | int | 5 | Number of concurrent workers (1-16) |
| --skip-scoring | flag | False | Skip LLM-based ICP scoring step |
| --provider / -p | str | ollama | LLM provider: openai or ollama |
| --model / -m | str | llama3.2 | Model name for the LLM provider |
| --max-retries | int | 3 | Maximum retry attempts per lead on transient errors |
| --help / -h | flag | | Show help message and exit |
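To make the interaction between --workers and --max-retries concrete, here is a rough sketch of a bounded worker pool with per-lead retries. It uses threads from the standard library as a stand-in for the async httpx approach Select uses. The function names, the set of errors treated as transient, the exponential backoff, and the reading of --max-retries as "retries after the first attempt" are all assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

TRANSIENT_ERRORS = (TimeoutError, ConnectionError)  # assumed set of retryable errors


def call_with_retries(func: Callable[[T], R], item: T,
                      max_retries: int = 3, base_delay: float = 0.5) -> R:
    """Call func(item), retrying transient errors with exponential backoff."""
    for attempt in range(max_retries + 1):  # first try plus up to max_retries retries
        try:
            return func(item)
        except TRANSIENT_ERRORS:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


def process_all(items: Iterable[T], func: Callable[[T], R], workers: int = 5,
                max_retries: int = 3, base_delay: float = 0.5) -> list[R]:
    """Process items concurrently; results come back in input order."""
    workers = max(1, min(16, workers))  # clamp to the documented 1-16 range
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda item: call_with_retries(func, item, max_retries, base_delay),
            items,
        ))


# Demo: a fake enrichment call that times out twice per domain, then succeeds.
attempts: dict[str, int] = {}


def flaky(domain: str) -> str:
    attempts[domain] = attempts.get(domain, 0) + 1
    if attempts[domain] < 3:
        raise TimeoutError("simulated timeout")
    return domain.upper()


results = process_all(["a.com", "b.com"], flaky, workers=8, base_delay=0.0)
print(results)
```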

Dashboard

# Launch the Streamlit dashboard
streamlit run dashboard/app.py

The dashboard provides:

  • CSV Upload — Drag-and-drop or file picker to upload lead CSVs
  • Enrichment Trigger — Start the pipeline directly from the browser
  • Results Table — Sortable, filterable view of enriched leads
  • ICP Filtering — Filter by score range and industry
  • CSV Download — Export enriched results as a downloadable CSV
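The ICP filtering step can be pictured as a plain function over enriched rows, independent of Streamlit. This is a sketch only: `filter_leads` is a hypothetical name, the case-insensitive industry match is an assumption, and the scores in the sample rows are made up for illustration.

```python
from typing import Iterable, Optional


def filter_leads(rows: Iterable[dict], min_score: int = 1, max_score: int = 5,
                 industry: Optional[str] = None) -> list[dict]:
    """Keep rows with icp_score in [min_score, max_score] and a matching industry."""
    kept = []
    for row in rows:
        try:
            score = int(row.get("icp_score") or "")
        except (TypeError, ValueError):
            continue  # unscored rows (e.g. from --skip-scoring) are left out
        if not min_score <= score <= max_score:
            continue
        if industry and (row.get("industry") or "").lower() != industry.lower():
            continue
        kept.append(row)
    return kept


# Sample rows with invented scores; CSV values arrive as strings.
rows = [
    {"company_name": "TechNova Solutions", "industry": "Technology", "icp_score": "5"},
    {"company_name": "GreenLeaf Capital", "industry": "Finance", "icp_score": "2"},
    {"company_name": "Unscored Co", "industry": "Technology", "icp_score": ""},
]
high_fit = filter_leads(rows, min_score=4)
print([r["company_name"] for r in high_fit])
```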

API Key Setup

Clearbit (Optional)

Used for company description and industry enrichment.

  1. Sign up at clearbit.com
  2. Get your API key from the dashboard
  3. Set CLEARBIT_API_KEY in your .env

Hunter.io (Optional)

Used for contact email discovery.

  1. Sign up at hunter.io
  2. Get your API key from the API settings page
  3. Set HUNTER_API_KEY in your .env

OpenAI (Optional)

Used for AI-powered ICP scoring and outreach recommendation.

  1. Sign up at platform.openai.com
  2. Create an API key
  3. Set OPENAI_API_KEY in your .env

Note: If no OpenAI key is provided, the pipeline falls back to Ollama (local) for scoring. To use the local fallback, install Ollama and pull a model with ollama pull llama3.2.
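Whichever provider does the scoring, its reply has to be turned into an ICP score between 1 and 5 and a recommended approach. The sketch below assumes the model is asked to answer with a small JSON object. That reply format and the `parse_score_reply` helper are hypothetical, not taken from Select's prompts or code.

```python
import json
import re


def parse_score_reply(reply: str) -> dict:
    """Extract icp_score (1-5) and recommended_approach from a model reply."""
    # Models often wrap JSON in extra prose, so grab the outermost {...} span.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if not match:
        raise ValueError("no JSON object in reply")
    data = json.loads(match.group(0))
    score = int(data["icp_score"])
    if not 1 <= score <= 5:
        raise ValueError(f"icp_score out of range: {score}")
    approach = str(data.get("recommended_approach", "")).strip()
    return {"icp_score": score, "recommended_approach": approach}


# Made-up reply text for illustration.
reply = 'Sure! {"icp_score": 4, "recommended_approach": "Email the CTO"}'
parsed = parse_score_reply(reply)
print(parsed)
```

Rejecting out-of-range scores at parse time lets the caller retry or record the failure in the `error` column instead of exporting a bad value.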

Input Format

The input CSV must contain at least a company_name column; domain and industry are optional:

company_name,domain,industry
TechNova Solutions,technovatech.com,Technology
GreenLeaf Capital,greenleafcapital.com,Finance

Output Format

The enriched CSV includes all input fields plus enrichment results:

company_name,domain,industry,description,contact_email,icp_score,recommended_approach,timestamp,enrichment_source,error
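A minimal sketch of writing rows in this column order with the standard-library csv module follows. The `write_enriched` helper is a hypothetical name, and the ISO 8601 UTC timestamp is an assumption, since the exact timestamp format is not specified here.

```python
import csv
import io
from datetime import datetime, timezone

FIELDS = [
    "company_name", "domain", "industry", "description", "contact_email",
    "icp_score", "recommended_approach", "timestamp", "enrichment_source", "error",
]


def write_enriched(rows, handle) -> None:
    """Write enriched leads as CSV with the documented column order."""
    # restval fills missing columns with ""; extrasaction drops unknown keys.
    writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        record = dict(row)
        # Assumed format: ISO 8601 timestamp in UTC.
        record.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
        writer.writerow(record)


buffer = io.StringIO()
write_enriched(
    [{
        "company_name": "TechNova Solutions",
        "domain": "technovatech.com",
        "industry": "Technology",
        "enrichment_source": "scrape",
        "internal_note": "dropped on export",
    }],
    buffer,
)
print(buffer.getvalue())
```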

License

MIT License. See LICENSE for details.
