Add POST /graph/extract REST API for programmatic graph extraction. #351
Nishieee wants to merge 7 commits into
Pull request overview
Adds a public FastAPI endpoint for graph extraction so clients can programmatically invoke the existing GRAPH_EXTRACT scheduler flow instead of using only the Gradio demo.
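Conceptually, the endpoint validates the body, hands it to the scheduler under the graph-extraction flow, and maps failures to HTTP status codes (422 for bad input, 500 for scheduler errors, per the PR description). The standard-library sketch below models only that control flow. `FakeScheduler`, `schedule_flow`, `handle_graph_extract`, and the `"graph_extract"` key are hypothetical stand-ins, not the real `SchedulerSingleton` or FastAPI code.

```python
from typing import Any, Callable, Dict, Tuple


class FakeScheduler:
    """Hypothetical stand-in for SchedulerSingleton; the real API may differ."""

    def __init__(self, flows: Dict[str, Callable[..., Any]]):
        self._flows = flows

    def schedule_flow(self, flow_name: str, **kwargs: Any) -> Any:
        # Raises KeyError for an unknown flow, which the handler reports as a 500.
        return self._flows[flow_name](**kwargs)


def handle_graph_extract(scheduler: FakeScheduler, body: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Return (status_code, json_body) using the PR's 200/422/500 split."""
    texts = body.get("texts")
    if not texts or (isinstance(texts, str) and not texts.strip()):
        # Invalid or empty input is a client error.
        return 422, {"detail": "texts must be a non-empty string or list"}
    try:
        result = scheduler.schedule_flow(
            "graph_extract",  # hypothetical key standing in for FlowName.GRAPH_EXTRACT
            texts=texts,
            schema=body.get("schema"),
            split_type=body.get("split_type", "document"),
        )
    except Exception as exc:  # any scheduler failure becomes a server error
        return 500, {"detail": str(exc)}
    return 200, result
```

In the real app, FastAPI returns 422 automatically when the Pydantic request model rejects the body; the explicit check here only makes the mapping visible.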
Changes:

- Adds `/graph/extract` API wiring and request/response handling.
- Adds `GraphExtractRequest` validation and tests for routing, validation, and scheduler errors.
- Makes `split_type` configurable in `GraphExtractFlow` while preserving the default.
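The request-side normalization can be pictured roughly as follows, assuming `texts` may arrive as either a single string or a list: blank entries are dropped, and `split_type` defaults to `"document"` so demo behavior is unchanged. This is a sketch under assumptions. `ALLOWED_SPLIT_TYPES` is a placeholder set, and `normalize_texts` / `normalize_split_type` are hypothetical helpers, not code taken from `GraphExtractRequest`.

```python
from typing import List, Optional, Sequence, Union

# Placeholder set for illustration; check the project's chunk splitter for the
# split types it actually accepts.
ALLOWED_SPLIT_TYPES = {"document", "paragraph", "sentence"}


def normalize_texts(texts: Union[str, Sequence[str]]) -> List[str]:
    """Accept a single string or a list of strings and drop blank entries."""
    items = [texts] if isinstance(texts, str) else list(texts)
    cleaned = [t.strip() for t in items if isinstance(t, str) and t.strip()]
    if not cleaned:
        # In a Pydantic validator, this ValueError would surface as a 422.
        raise ValueError("texts must contain at least one non-empty string")
    return cleaned


def normalize_split_type(split_type: Optional[str]) -> str:
    """Default to 'document' and reject values outside the allowed set."""
    value = (split_type or "document").strip().lower()
    if value not in ALLOWED_SPLIT_TYPES:
        raise ValueError(f"unsupported split_type: {split_type!r}")
    return value
```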
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `hugegraph-llm/src/hugegraph_llm/api/graph_api.py` | Adds the new graph extraction REST endpoint. |
| `hugegraph-llm/src/hugegraph_llm/api/models/rag_requests.py` | Adds request model and input normalization for graph extraction. |
| `hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py` | Threads configurable `split_type` into graph extraction flow preparation. |
| `hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py` | Registers the graph extraction API router with the existing app. |
| `hugegraph-llm/src/tests/api/test_graph_api.py` | Adds API tests for successful extraction, validation failures, errors, and route registration. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
imbajin left a comment
Blocking: yes. Summary: The new graph-name extraction path needs request-scoped graph configuration before it is safe for programmatic use. Evidence: static review; targeted graph API tests passed.
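One general pattern for request-scoped settings that never writes to process-wide state is a `contextvars.ContextVar`: each request sets its own value, and the shared defaults are only read. This illustrates the idea and is not what the PR does (a later commit threads the connection through `WkFlowInput` instead). `GLOBAL_GRAPH_SETTINGS`, `current_graph_config`, and `run_with_graph_config` are hypothetical names.

```python
import contextvars
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")
GraphConfig = Dict[str, Optional[str]]

# Hypothetical process-wide defaults standing in for the shared HugeGraph settings.
GLOBAL_GRAPH_SETTINGS: GraphConfig = {"graph": "hugegraph", "gs": "DEFAULT"}

# Per-request override; None means "no request-scoped config, read the defaults".
_request_graph_config: contextvars.ContextVar = contextvars.ContextVar(
    "request_graph_config", default=None
)


def current_graph_config() -> GraphConfig:
    """Return the request-scoped config if one is set, otherwise the shared defaults."""
    override = _request_graph_config.get()
    return override if override is not None else GLOBAL_GRAPH_SETTINGS


def run_with_graph_config(config: GraphConfig, fn: Callable[[], T]) -> T:
    """Run fn with config visible via current_graph_config(), then restore the previous value."""
    token = _request_graph_config.set(dict(config))  # copy so the caller's dict isn't aliased
    try:
        return fn()
    finally:
        _request_graph_config.reset(token)
```

The shared dictionary is never assigned to, so two concurrent requests with different graphs cannot observe each other's settings through it.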
imbajin left a comment
Blocking: yes. Summary: request-scoped graph config still mutates shared HugeGraph settings. Evidence: static review of hugegraph-llm/src/hugegraph_llm/api/graph_api.py lines 29-35.
```python
@model_validator(mode="after")
def require_client_config_for_named_schema(self):
    # A named-graph schema needs request-scoped connection settings; inline JSON
    # schemas (starting with "{") are self-contained and never hit HugeGraph.
    schema = self.graph_schema
    is_named_schema = isinstance(schema, str) and not schema.strip().startswith("{")
    if is_named_schema and self.client_config is None:
        raise ValueError(
            "client_config is required when 'schema' refers to an existing graph name; "
            "provide inline schema JSON instead to extract without a HugeGraph connection."
        )
    return self
```
Reject client_config when 'schema' is inline JSON (it never connects to HugeGraph, so it was silently ignored), and require client_config.graph to match a named-graph schema. Also fix GraphConfigRequest.gs to be Optional[str]. Adds tests for both rejection paths and the triples extract_type forwarding.

Co-authored-by: Cursor <cursoragent@cursor.com>
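Taken together with the validator above, this commit describes three rules: a named-graph schema needs a `client_config`, that config's `graph` must match the name, and an inline JSON schema must not carry a `client_config`. A plain-Python sketch of those rules follows. `schema_config_error` is a hypothetical helper that returns a message instead of raising, so a Pydantic `model_validator(mode="after")` could wrap it and raise `ValueError`; comparing names after stripping whitespace is an assumption.

```python
from typing import Any, Dict, Optional


def is_named_schema(schema: Any) -> bool:
    """Same heuristic as the validator: a string not starting with '{' names a graph."""
    return isinstance(schema, str) and not schema.strip().startswith("{")


def schema_config_error(schema: Any, client_config: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return an error message for an invalid schema/client_config pair, else None."""
    if is_named_schema(schema):
        if client_config is None:
            return "client_config is required when 'schema' names an existing graph"
        # Assumption: names are compared after stripping surrounding whitespace.
        if client_config.get("graph") != schema.strip():
            return "client_config.graph must match the named-graph schema"
        return None
    if client_config is not None:
        # Inline schemas never connect to HugeGraph, so client_config would be ignored.
        return "client_config is not allowed with an inline JSON schema"
    return None
```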
imbajin left a comment
Blocking: yes. Summary: The named-schema request path can still inherit a stale global graphspace when client_config.gs is omitted. Evidence: static review of SchemaManager graphspace fallback plus targeted graph API/schema-manager tests passing.
```
graphspace=huge_settings.graph_space,
user=graph_user if graph_user is not None else huge_settings.graph_user,
pwd=graph_pwd if graph_pwd is not None else huge_settings.graph_pwd,
graphspace=graph_space if graph_space is not None else huge_settings.graph_space,
```
Evidence: GraphConfigRequest.gs is optional, so a /graph/extract request can provide a named graph and omit gs; GraphExtractFlow.prepare() then forwards graph_space=None, and this constructor falls back to huge_settings.graph_space. If the process global graphspace was set by an earlier config path, this request can fetch the schema from that graphspace even though the caller did not select it. Please distinguish an omitted request-scoped field from the no-request-config path, and cover the gs-omitted case with a non-empty global graph_space.
Omitting client_config.gs no longer inherits the global huge_settings graphspace. WkFlowInput now carries the whole connection as one dict (None = use globals), and SchemaManager applies it wholesale instead of per-field None fallback. Adds tests for the gs-omitted case with a non-empty global graphspace and the no-connection fallback path.

Co-authored-by: Cursor <cursoragent@cursor.com>
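The difference between per-field fallback and wholesale application is easiest to see with a non-empty global graphspace, the scenario the reviewer asked to cover. In this sketch, the key names and `GLOBALS` are hypothetical; only the convention that `None` means "use globals" comes from the commit message.

```python
from typing import Dict, Optional

Conn = Dict[str, Optional[str]]

# Hypothetical globals; the non-empty graph_space reproduces the reviewer's scenario.
GLOBALS: Conn = {"graph": "hugegraph", "user": "admin", "pwd": "secret", "graph_space": "stale_gs"}


def per_field_fallback(request_conn: Conn) -> Conn:
    """Old behavior (sketch): every field left as None silently takes the global value."""
    return {k: request_conn.get(k) if request_conn.get(k) is not None else v for k, v in GLOBALS.items()}


def wholesale(request_conn: Optional[Conn]) -> Conn:
    """Revised behavior (sketch): None means no request config, so use the globals;
    otherwise the request dict is used as-is and omitted fields stay None."""
    if request_conn is None:
        return dict(GLOBALS)
    return {k: request_conn.get(k) for k in GLOBALS}
```

With `graph_space` omitted from the request, the per-field version quietly resolves to `"stale_gs"`, while the wholesale version leaves it as `None` and never reads the global value.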
Summary
Closes #348.
HugeGraph-LLM already supports graph extraction through the Gradio demo, but there was no public REST endpoint for it. This PR adds `POST /graph/extract` to the existing FastAPI app, routing requests through `SchedulerSingleton` and `FlowName.GRAPH_EXTRACT`, the same path the demo uses.

Key changes
- Add `GraphExtractRequest` with validation for `texts`, `schema`, `split_type`, and related options
- … `graph_http_api` and register it on the existing auth router
- Make `split_type` configurable in `GraphExtractFlow` (default `"document"`, so demo behavior is unchanged)
- … `vertices`/`edges` as arrays), with optional `warning` and `meta`

Example request
```json
{
  "texts": "Sarah is 30 and works as an attorney.",
  "schema": {
    "vertexlabels": [],
    "edgelabels": [],
    "propertykeys": []
  },
  "split_type": "document",
  "include_meta": true
}
```

Invalid or empty input returns `422`; scheduler failures return `500`.

Test plan
- `cd hugegraph-llm && SKIP_EXTERNAL_SERVICES=true uv run pytest src/tests/api/test_graph_api.py -v --tb=short`
- `/rag`, `/text2gremlin`, `/config/graph`, and `/graph/extract` all register
- `curl` against a running app with the extract LLM configured
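For the manual smoke test, a standard-library client equivalent to the `curl` call might look like this. The base URL and port are assumptions. Because the route sits on the existing auth router, a real call may also need whatever credentials that router expects; this sketch does not add any.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed base URL; point this at wherever the FastAPI app is actually served.
BASE_URL = "http://127.0.0.1:8001"


def build_extract_request(payload: Dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a POST /graph/extract request with a JSON body (nothing is sent yet)."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/graph/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(payload: Dict[str, Any], base_url: str = BASE_URL) -> Dict[str, Any]:
    """Send the request and parse the JSON response."""
    # urlopen raises urllib.error.HTTPError for 422/500 responses.
    with urllib.request.urlopen(build_extract_request(payload, base_url), timeout=120) as resp:
        return json.load(resp)
```

Calling `extract(...)` with the example request body returns the parsed JSON on success; a `422` or `500` surfaces as `urllib.error.HTTPError`, whose `code` attribute holds the status.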