Add POST /graph/extract REST API for programmatic graph extraction. #351

Open
Nishieee wants to merge 7 commits into apache:main from Nishieee:feat/graph-extract-rest-api
Conversation

@Nishieee
Summary

Closes #348.

HugeGraph-LLM already supports graph extraction through the Gradio demo, but there was no public REST endpoint for it. This PR adds POST /graph/extract to the existing FastAPI app, routing requests through SchedulerSingleton and FlowName.GRAPH_EXTRACT — the same path the demo uses.

Key changes

  • Add GraphExtractRequest with validation for texts, schema, split_type, and related options
  • Add graph_http_api and register it on the existing auth router
  • Make split_type configurable in GraphExtractFlow (default "document", so demo behavior is unchanged)
  • Return structured JSON (vertices / edges as arrays), with optional warning and meta

Example request

{
  "texts": "Sarah is 30 and works as an attorney.",
  "schema": { "vertexlabels": [], "edgelabels": [], "propertykeys": [] },
  "split_type": "document",
  "include_meta": true
}

Invalid or empty input returns 422; scheduler failures return 500.
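
As a rough illustration of the example request above, the following sketch builds (but does not send) the POST request using only the standard library. The base URL `http://localhost:8001` and the helper name `build_extract_request` are assumptions made for this example, and any authentication the existing auth router may require is left out.

```python
import json
import urllib.request


def build_extract_request(base_url, texts, schema, split_type="document", include_meta=False):
    """Build a POST request for /graph/extract without sending it (hypothetical helper)."""
    payload = {
        "texts": texts,
        "schema": schema,
        "split_type": split_type,
        "include_meta": include_meta,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/graph/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_extract_request(
    "http://localhost:8001",  # assumed address of the running app
    texts="Sarah is 30 and works as an attorney.",
    schema={"vertexlabels": [], "edgelabels": [], "propertykeys": []},
    include_meta=True,
)
# To send it: urllib.request.urlopen(request). Per the description above,
# a 422 indicates invalid input and a 500 indicates a scheduler failure.
```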

Test plan

  • cd hugegraph-llm && SKIP_EXTERNAL_SERVICES=true uv run pytest src/tests/api/test_graph_api.py -v --tb=short
  • Regression check: /rag, /text2gremlin, /config/graph, and /graph/extract all register
  • Ruff format and lint pass
  • Manual curl against running app with extract LLM configured
  • Compare API output with Gradio graph extraction on the same input

dosubot (bot) added the size:L (this PR changes 100-499 lines, ignoring generated files) and enhancement (new feature or request) labels on May 31, 2026
github-actions (bot) added the llm label on May 31, 2026
imbajin requested a review from Copilot on May 31, 2026 13:38
Contributor

Copilot AI left a comment

Pull request overview

Adds a public FastAPI endpoint for graph extraction so clients can programmatically invoke the existing GRAPH_EXTRACT scheduler flow instead of using only the Gradio demo.

Changes:

  • Adds /graph/extract API wiring and request/response handling.
  • Adds GraphExtractRequest validation and tests for routing, validation, and scheduler errors.
  • Makes split_type configurable in GraphExtractFlow while preserving the default.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| hugegraph-llm/src/hugegraph_llm/api/graph_api.py | Adds the new graph extraction REST endpoint. |
| hugegraph-llm/src/hugegraph_llm/api/models/rag_requests.py | Adds request model and input normalization for graph extraction. |
| hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py | Threads configurable split_type into graph extraction flow preparation. |
| hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py | Registers the graph extraction API router with the existing app. |
| hugegraph-llm/src/tests/api/test_graph_api.py | Adds API tests for successful extraction, validation failures, errors, and route registration. |

Comment thread hugegraph-llm/src/hugegraph_llm/api/models/rag_requests.py Outdated
Comment thread hugegraph-llm/src/hugegraph_llm/api/models/rag_requests.py
Nishieee and others added 2 commits May 31, 2026 18:31
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Member

@imbajin imbajin left a comment

Blocking: yes. Summary: The new graph-name extraction path needs request-scoped graph configuration before it is safe for programmatic use. Evidence: static review; targeted graph API tests passed.

Comment thread hugegraph-llm/src/hugegraph_llm/api/models/rag_requests.py
Member

@imbajin imbajin left a comment

Blocking: yes. Summary: request-scoped graph config still mutates shared HugeGraph settings. Evidence: static review of hugegraph-llm/src/hugegraph_llm/api/graph_api.py lines 29-35.
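
One way to picture the concern is the difference between writing request values into a shared settings object and deriving a per-request copy. The sketch below is only illustrative: `GraphSettings`, `GLOBAL_SETTINGS`, and `settings_for_request` are hypothetical stand-ins, not the project's actual classes or the code under review.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GraphSettings:
    """Hypothetical stand-in for process-wide HugeGraph connection settings."""
    url: str
    graph: str
    user: str


# Shared by every request handled by the process (illustrative values).
GLOBAL_SETTINGS = GraphSettings(url="127.0.0.1:8080", graph="hugegraph", user="admin")


def settings_for_request(overrides: dict) -> GraphSettings:
    """Return a per-request copy; the shared object is never modified."""
    return replace(GLOBAL_SETTINGS, **overrides)


scoped = settings_for_request({"graph": "tenant_a"})
```

Because the dataclass is frozen and `replace` returns a new instance, a later request that sends no overrides still sees the original shared values.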

Comment thread hugegraph-llm/src/hugegraph_llm/api/graph_api.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment on lines +220 to +231
    @model_validator(mode="after")
    def require_client_config_for_named_schema(self):
        # A named-graph schema needs request-scoped connection settings; inline JSON
        # schemas (starting with "{") are self-contained and never hit HugeGraph.
        schema = self.graph_schema
        is_named_schema = isinstance(schema, str) and not schema.strip().startswith("{")
        if is_named_schema and self.client_config is None:
            raise ValueError(
                "client_config is required when 'schema' refers to an existing graph name; "
                "provide inline schema JSON instead to extract without a HugeGraph connection."
            )
        return self

Reject client_config when 'schema' is inline JSON (it never connects to
HugeGraph, so it was silently ignored), and require client_config.graph
to match a named-graph schema. Also fix GraphConfigRequest.gs to be
Optional[str]. Adds tests for both rejection paths and the triples
extract_type forwarding.

Co-authored-by: Cursor <cursoragent@cursor.com>
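
The validation rules described in this commit message can be sketched as a plain function, independent of Pydantic. This is an approximation under stated assumptions: `check_schema_and_client_config` is a hypothetical name, `client_config` is modeled as a dict with a `graph` key, and the exact comparison and error messages in the PR may differ.

```python
from typing import Optional, Union


def is_named_schema(schema: Union[str, dict]) -> bool:
    """A string that does not start with '{' is treated as an existing graph name."""
    return isinstance(schema, str) and not schema.strip().startswith("{")


def check_schema_and_client_config(schema: Union[str, dict], client_config: Optional[dict]) -> None:
    """Raise ValueError for the combinations the commit message says are rejected."""
    if is_named_schema(schema):
        if client_config is None:
            raise ValueError("client_config is required when 'schema' is a graph name")
        # Assumption: the graph name is compared after stripping whitespace.
        if client_config.get("graph") != schema.strip():
            raise ValueError("client_config.graph must match the named-graph schema")
    elif client_config is not None:
        # Inline JSON never connects to HugeGraph, so a connection config is rejected.
        raise ValueError("client_config is not allowed with an inline JSON schema")


# Accepted: named graph with a matching connection, and inline JSON without one.
check_schema_and_client_config("hugegraph", {"graph": "hugegraph"})
check_schema_and_client_config('{"vertexlabels": []}', None)
```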
Member

@imbajin imbajin left a comment

Blocking: yes. Summary: The named-schema request path can still inherit a stale global graphspace when client_config.gs is omitted. Evidence: static review of SchemaManager graphspace fallback plus targeted graph API/schema-manager tests passing.

graphspace=huge_settings.graph_space,
user=graph_user if graph_user is not None else huge_settings.graph_user,
pwd=graph_pwd if graph_pwd is not None else huge_settings.graph_pwd,
graphspace=graph_space if graph_space is not None else huge_settings.graph_space,
Member

⚠️ Avoid leaking the global graphspace into request-scoped schema fetches

Evidence: GraphConfigRequest.gs is optional, so a /graph/extract request can provide a named graph and omit gs; GraphExtractFlow.prepare() then forwards graph_space=None, and this constructor falls back to huge_settings.graph_space. If the process global graphspace was set by an earlier config path, this request can fetch the schema from that graphspace even though the caller did not select it. Please distinguish an omitted request-scoped field from the no-request-config path, and cover the gs-omitted case with a non-empty global graph_space.
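
The fallback pattern described in this comment can be reproduced in a few lines. The names below (`GLOBAL_GRAPH_SPACE`, `resolve_graph_space_per_field`) are hypothetical and only mirror the `x if x is not None else global` shape of the reviewed constructor; they are not the project's code.

```python
# Hypothetical process-wide value, set by some earlier configuration call.
GLOBAL_GRAPH_SPACE = "space_from_earlier_config"


def resolve_graph_space_per_field(graph_space):
    """Per-field fallback: None is read as 'use the global value'."""
    return graph_space if graph_space is not None else GLOBAL_GRAPH_SPACE


# A request that names a graph but omits gs forwards None here, so it
# silently picks up a graphspace the caller never selected.
leaked = resolve_graph_space_per_field(None)
```

With this shape, "the request omitted gs" and "there is no request config at all" are indistinguishable, which is the case the reviewer asks to separate and test.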

Omitting client_config.gs no longer inherits the global huge_settings
graphspace. WkFlowInput now carries the whole connection as one dict
(None = use globals), and SchemaManager applies it wholesale instead of
per-field None fallback. Adds tests for the gs-omitted case with a
non-empty global graphspace and the no-connection fallback path.

Co-authored-by: Cursor <cursoragent@cursor.com>
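
A minimal sketch of the "one dict, None means globals" idea from this commit message follows. The dictionary keys, `GLOBAL_CONNECTION`, and `resolve_connection` are assumptions chosen for illustration; the real `WkFlowInput` and `SchemaManager` fields may be named and structured differently.

```python
from typing import Optional

# Hypothetical process-wide connection settings.
GLOBAL_CONNECTION = {
    "url": "127.0.0.1:8080",
    "graph": "hugegraph",
    "graphspace": "global_space",
}


def resolve_connection(request_connection: Optional[dict]) -> dict:
    """Apply the request connection wholesale, or fall back to globals as a whole."""
    if request_connection is None:
        # No request-scoped config at all: use the global settings.
        return dict(GLOBAL_CONNECTION)
    # Request-scoped config: omitted fields stay None and are never
    # filled in from the global settings.
    return {key: request_connection.get(key) for key in GLOBAL_CONNECTION}


scoped = resolve_connection({"url": "10.0.0.5:8080", "graph": "tenant_a"})
fallback = resolve_connection(None)
```

Here `scoped["graphspace"]` stays `None` even though the global graphspace is non-empty, while `fallback` keeps the global value.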
Labels

enhancement New feature or request llm size:L This PR changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Feature] Add a public REST API for graph extraction

3 participants