Refactor: Generalize API endpoints to support OpenAI-compatible models via .env (Resubmission of #157 [Wrong account previously used].) #158

Open
funnamer wants to merge 1 commit into VectifyAI:main from funnamer:feat/openai-compatible-api
@funnamer

Overview

This PR refactors the API calling logic to be model-agnostic. It replaces the ChatGPT-specific names with generic OpenAI names and moves the key, model, and base URL into .env, so the project can work with any OpenAI-compatible API service (such as DeepSeek).

Detailed Changes

1. Configuration (.env)

  • Renamed CHATGPT_API_KEY to OPENAI_API_KEY.
  • Added support for OPENAI_MODEL (defaults to gpt-4o-2024-11-20) and OPENAI_BASE_URL (defaults to https://api.openai.com/v1).
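As a rough illustration of the configuration described above, the three variables could be resolved as follows. This is a minimal sketch, not the PR's actual code: `ApiSettings` and `load_api_settings` are hypothetical names, and treating an empty value as unset is an assumption. Only the variable names and the two defaults come from the description.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults as stated in the PR description.
DEFAULT_MODEL = "gpt-4o-2024-11-20"
DEFAULT_BASE_URL = "https://api.openai.com/v1"


@dataclass(frozen=True)
class ApiSettings:
    """Hypothetical container for the three .env values."""
    api_key: Optional[str]
    model: str
    base_url: str


def load_api_settings(env: Mapping[str, str] = os.environ) -> ApiSettings:
    """Read the settings from a mapping (os.environ by default).

    Assumption: an empty string is treated the same as a missing variable.
    """
    return ApiSettings(
        api_key=env.get("OPENAI_API_KEY") or None,
        model=env.get("OPENAI_MODEL") or DEFAULT_MODEL,
        base_url=env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL,
    )


if __name__ == "__main__":
    settings = load_api_settings()
    # With the openai package installed, the values would typically be passed as
    # OpenAI(api_key=settings.api_key, base_url=settings.base_url).
    print(settings.model, settings.base_url)
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real process environment.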

2. Core Utilities (pageindex/utils.py)

  • Renamed the API wrapper functions to reflect their broader scope:
    • ChatGPT_API_with_finish_reason → OpenAI_API_with_finish_reason
    • ChatGPT_API → OpenAI_API
    • ChatGPT_API_async → OpenAI_API_async
  • Updated the OpenAI client initialization to explicitly use the base_url parameter fetched from the .env file.
  • Updated count_tokens logic: Changed tiktoken.encoding_for_model(model) to explicitly use tiktoken.get_encoding("cl100k_base"). This prevents token-counting crashes when using third-party models (like DeepSeek) that are not natively recognized by the tiktoken library, ensuring robust cross-provider compatibility.
  • Synchronized all documentation generation and node summary generation calls with the new function names.
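The count_tokens change in the list above can be sketched roughly as below. This is an illustrative version under stated assumptions, not the code in pageindex/utils.py: the `encoder` parameter and the `Encoder` protocol are hypothetical additions that let the function be exercised without tiktoken installed. The only call taken from the description is `tiktoken.get_encoding("cl100k_base")`.

```python
from typing import Optional, Protocol, Sequence


class Encoder(Protocol):
    """Anything with an encode() method that returns a sequence of token ids."""

    def encode(self, text: str) -> Sequence[int]: ...


def count_tokens(text: str, encoder: Optional[Encoder] = None) -> int:
    """Count tokens with a fixed encoding instead of a per-model lookup.

    tiktoken.encoding_for_model(model) raises KeyError for model names it
    does not know, which is the crash the PR description refers to.
    """
    if not text:
        return 0
    if encoder is None:
        # Imported lazily so the sketch can be tested with a stand-in encoder.
        import tiktoken  # third-party dependency

        encoder = tiktoken.get_encoding("cl100k_base")
    return len(encoder.encode(text))
```

One trade-off worth keeping in mind: a fixed encoding makes the count an approximation for any model whose tokenizer is not cl100k_base, including third-party models, so limits derived from it are best treated as estimates.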

3. PDF Main Workflow (page_index.py)

  • Replaced all instances of ChatGPT_API* calls with OpenAI_API* across the entire PDF parsing and TOC generation workflow (e.g., check_title_appearance, toc_detector_single_page, extract_toc_content, toc_transformer, etc.).
  • The parameter signatures remain strictly unchanged to ensure stability.

4. Markdown Workflow & Entry Scripts (run_pageindex.py, pageindex/page_index_md.py)

  • Added load_dotenv() at the top of the entry scripts to ensure environment variables are correctly loaded.
  • The --model argument default value now correctly falls back to os.getenv('OPENAI_MODEL').
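The --model fallback described above might look like the following sketch. `build_parser` is a hypothetical helper, and the hard-coded second fallback is an assumption borrowed from the default listed in the configuration section. In the real entry scripts, load_dotenv() would need to run before the parser is built, because the default is read once, at construction time.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --model default comes from the environment."""
    parser = argparse.ArgumentParser(description="Illustrative entry-script parser")
    parser.add_argument(
        "--model",
        # Evaluated when the parser is built, so .env must already be loaded.
        default=os.getenv("OPENAI_MODEL", "gpt-4o-2024-11-20"),
        help="Model name; defaults to OPENAI_MODEL when it is set.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.model)
```

An explicit --model on the command line still takes precedence over the environment value.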

Compatibility & Migration Guide

Behavioral Change: Keys, models, and base URLs are no longer hardcoded.
Migration: To use a third-party compatible API (like DeepSeek), simply update the .env file:

OPENAI_API_KEY=your_api_key_here
OPENAI_MODEL=your_model_name
OPENAI_BASE_URL=https://api.deepseek.com
