Medhatt21/label_parser

Label Parser Monitor Service

A containerized service that monitors a directory for new product images, processes them through OpenAI's vision API, and saves structured data to CSV. Now includes a web interface for interactive image processing.

Features

  • Directory Monitoring: Automatically processes images placed in the data/d_warehouse/ directory
  • Web Interface: Upload and process images through a modern web UI
  • AI Processing: Uses an OpenAI vision-capable model (gpt-4o by default) to extract product information
  • Complete Text Extraction: Captures ALL visible text from product labels
  • CSV Export: Download results as CSV files
  • Retry Logic: Exponential backoff for API failures
  • Containerized: Docker deployment with volume persistence
  • Henkel Branding: Professional UI with company branding

Setup

  1. Copy the example environment file:
cp .env.example .env
  2. Add your OpenAI API key to .env:
OPENAI_API_KEY=your_actual_api_key_here
OPENAI_TIMEOUT=60
  3. Build and run with Docker Compose:
docker-compose up --build

Usage

Web Interface

  1. Open your browser to http://localhost:8000
  2. Upload a product image using drag-and-drop or file browser
  3. Click "Process Image" and wait for AI analysis
  4. View results and download CSV file

Directory Monitoring

  1. Place product images in data/d_warehouse/
  2. The monitor service automatically processes new images
  3. Results are saved to data/d_mart/processed_images.csv
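
The README does not say how the monitor detects new files, so the following is only a minimal sketch of one possible approach: scanning the watch directory and skipping filenames that were already handled. The function name `find_new_images`, the idea of tracking processed files by name, and the exact extension set are assumptions for illustration, not the service's actual code.

```python
from pathlib import Path

# Extensions derived from the "Supported Image Formats" list in this README;
# the exact set the service checks is an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def find_new_images(watch_dir, already_processed):
    """Return supported image files in watch_dir whose names are not yet processed.

    already_processed is a set of filenames (hypothetical bookkeeping; it could
    be built from the image_name column of the output CSV).
    """
    watch_path = Path(watch_dir)
    if not watch_path.is_dir():
        return []
    return sorted(
        p for p in watch_path.iterdir()
        if p.is_file()
        and p.suffix.lower() in SUPPORTED_EXTENSIONS
        and p.name not in already_processed
    )
```

A monitor loop could call `find_new_images("data/d_warehouse", seen)` at a fixed interval and add each returned filename to `seen` after it has been processed.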

Local Development

Run the web interface locally:

python run_web.py

Run the monitor service locally:

python run_local.py

Testing

Run the test suite to verify functionality:

# Test without OpenAI (uses mock data)
python test_monitor.py

# Test with real OpenAI API (requires API key)
python test_monitor.py
# Choose 'y' when prompted

Configuration

Environment variables:

| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | - | Your OpenAI API key |
| OPENAI_MODEL | gpt-4o | OpenAI model to use |
| OPENAI_TEMPERATURE | 0.1 | Response randomness |
| OPENAI_MAX_TOKENS | 100 | Max response length |
| OPENAI_TIMEOUT | 60 | API timeout in seconds |
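
To illustrate how these variables and their defaults fit together, here is a hedged sketch of reading them into a typed settings object. The `OpenAISettings` class and `load_settings` function are hypothetical names introduced for this example. The project's real configuration code may look different, and the choice to fail when the key is missing is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenAISettings:
    api_key: str
    model: str
    temperature: float
    max_tokens: int
    timeout: float


def load_settings(env=None):
    """Build settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Assumption: a missing key is treated as a hard error.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return OpenAISettings(
        api_key=api_key,
        model=env.get("OPENAI_MODEL", "gpt-4o"),
        temperature=float(env.get("OPENAI_TEMPERATURE", "0.1")),
        max_tokens=int(env.get("OPENAI_MAX_TOKENS", "100")),
        timeout=float(env.get("OPENAI_TIMEOUT", "60")),
    )
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment, for example `load_settings({"OPENAI_API_KEY": "test"})`.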

CSV Output Format

| Column | Description |
|---|---|
| image_name | Original image filename |
| item | Complete product description with ALL label text |
| price | Product price |
| brand | Brand name |
| size | Product size |
| product_type | Product category |
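
As a rough illustration of this layout, the sketch below appends one result row to a CSV file with these six columns, writing the header only when the file is new or empty. The helper name `append_result` and the append-per-image strategy are assumptions. The service's own writer may behave differently.

```python
import csv
from pathlib import Path

# Column order as listed in the table above.
CSV_COLUMNS = ["image_name", "item", "price", "brand", "size", "product_type"]


def append_result(csv_path, row):
    """Append one result dict to csv_path, creating the file and header if needed."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        # Keys outside CSV_COLUMNS are ignored; missing keys become empty strings.
        writer = csv.DictWriter(f, fieldnames=CSV_COLUMNS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

For example, `append_result("data/d_mart/processed_images.csv", {"image_name": "label.jpg", "brand": "ExampleBrand"})` would add a row with the remaining fields left blank (the values here are placeholders, not real output).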

Supported Image Formats

  • JPG/JPEG
  • PNG
  • BMP
  • TIFF

Services

The application runs two services:

  1. Monitor Service (label-parser-monitor): Watches the data/d_warehouse/ directory for new images
  2. Web Service (label-parser-web): Provides web interface on port 8000

Troubleshooting

API Timeout Errors:

  • Increase OPENAI_TIMEOUT value
  • Check network connectivity
  • Verify API key is valid

Processing Failures:

  • Service retries failed requests 3 times with exponential backoff
  • Check logs in logs/image_parser.log
  • Ensure images are valid and not corrupted
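
The retry behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the service's actual code: the helper name `call_with_retries`, the one-second base delay, the doubling factor, and the reading of "3 times" as three retries after the initial attempt are all guesses.

```python
import time


def call_with_retries(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure, retry up to max_retries times with doubling delays.

    With the defaults, the waits between attempts are 1s, 2s, and 4s. The last
    exception is re-raised once the retries are exhausted.
    """
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

The `sleep` parameter exists only so the waiting can be replaced in tests. In real use, catching a narrower exception type (such as timeout or rate-limit errors) would usually be preferable to catching every `Exception`.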

Web Interface Issues:

  • Ensure port 8000 is not in use
  • Check browser console for JavaScript errors
  • Verify file upload size limits
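
To check the first item quickly, a small script can test whether something is already listening on the port. This is a generic standard-library sketch. The helper name `port_in_use` is hypothetical and not part of this project.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Running `port_in_use(8000)` before starting the web service indicates whether another process already occupies the port.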
