
Commit 13d868a

GML-2049 chunker updates (#29)
### **PR Type**
Enhancement, Bug fix, Tests

___

### **Description**
- Enable apiToken for TigerGraph connections
- Skip getToken when token provided
- Add unit tests for apiToken
- Bound HTML/Markdown chunks recursively (4096)
- Default fallback size and overlap support
- Fix PDF image paths, form artifacts
- Handle spaces; deduplicate table rows
- Route graph stats to function calls
- Update provider prompts for counts

___

### Diagram Walkthrough

```mermaid
flowchart LR
  CHUNK["Chunkers updated (defaults, recursive)"]
  HTML["HTML chunker\nfallback+recursive"]
  MD["Markdown chunker\nfallback+recursive"]
  CHAR["Character chunker\n4096 fallback"]
  RECUR["Recursive chunker\n4096 fallback"]
  CONN["DB connections\napiToken support"]
  CFG["Config init\napiToken passthrough"]
  PDF["PDF extractor\nimage+markdown fixes"]
  PROMPT["Routing prompts\nGraph stats -> functions"]
  LOAD["Loader\nconfigurable batch/delay"]
  DOCKER["Compose\nTG service optional"]
  CHUNK -- "applies to" --> HTML
  CHUNK -- "applies to" --> MD
  CHUNK -- "applies to" --> CHAR
  CHUNK -- "applies to" --> RECUR
  CFG -- "used by" --> CONN
  CONN -- "unit tests" --> PROMPT
  PDF -- "clean images/markdown" --> CHUNK
  PROMPT -- "provider prompts updated" --> CHUNK
  LOAD -- "tunable throughput" --> CFG
  DOCKER -- "external TG supported" --> CONN
```

### File Walkthrough

| Category | File | Summary | Changes |
| --- | --- | --- | --- |
| Enhancement | **character_chunker.py** | Default to 4096 and validate overlaps | [+6/-6](https://github.com/tigergraph/graphrag/pull/29/files#diff-086cb1310ad96c42ae62b4fde5d4878bb5553f711b6d051005450b85a17492cb) |
| Enhancement | **html_chunker.py** | Recursive split for oversized header sections | [+31/-3](https://github.com/tigergraph/graphrag/pull/29/files#diff-d99da1157b1f0eea3c23bf54cfe0d42cba987287b112c15f4b35b16e2e498ac1) |
| Enhancement | **markdown_chunker.py** | Fallback size and recursive markdown splitting | [+20/-13](https://github.com/tigergraph/graphrag/pull/29/files#diff-c42407e62189ab854f9c10b7ea1b0b16701f2188daca41e36d3ce569e756984a) |
| Enhancement | **recursive_chunker.py** | Default recursive chunk size set to 4096 | [+4/-2](https://github.com/tigergraph/graphrag/pull/29/files#diff-6afa5baf1bf76a0ba886adfd895408a1c772cbc24235a9fcbdbb1bae8cac69b5) |
| Enhancement | **config.py** | Support static apiToken and conditional getToken | [+2/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-1bacff878451e5aa9c6d164150c7b2daad028d5e7acba90bb720cb73ffdd827b) |
| Enhancement | **connections.py** | Use apiToken directly; skip getToken; async support | [+29/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-2c15601c7002076bf82559499eeb3f746145bb1433c7883f1c77e61b24a50d20) |
| Enhancement | **base_llm.py** | Route graph statistics questions to function calls | [+8/-2](https://github.com/tigergraph/graphrag/pull/29/files#diff-9eb1335890737196b08ce1158f1de3ff08db71d02a613fbc9b967c347a0aa36d) |
| Enhancement | **supportai_ingest.py** | Pass chunk size/overlap to HTML chunker | [+3/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-b4a80de039d3bcbf47c3bb7705354de123426efc4abd3d1f0e93570721c0f820) |
| Bug fix | **text_extractors.py** | Fix image paths; clean PDF markdown artifacts | [+99/-6](https://github.com/tigergraph/graphrag/pull/29/files#diff-c749a2c8ea1a8bc0a734203aef7fa5aa9300d705006afbb4cac26985c2ac257d) |
| Configuration changes | **ecc_util.py** | Update chunker defaults and pass new parameters | [+5/-3](https://github.com/tigergraph/graphrag/pull/29/files#diff-890bb6f3c6fbe84bfda83faf66d59a1f8058f9760e9e2ee4cac1c388a90f276f) |
| Configuration changes | **graph_rag.py** | Configurable batch size and optional upsert delay | [+7/-5](https://github.com/tigergraph/graphrag/pull/29/files#diff-55a1e5b20c75c4a71f03a1541658d1a4de6567501d51550a1464f49901cb626a) |
| Configuration changes | **docker-compose.yml** | Comment out TigerGraph service; externalize dependency | [+12/-12](https://github.com/tigergraph/graphrag/pull/29/files#diff-e45e45baeda1c1e73482975a664062aa56f20c03dd9d64a827aba57775bed0d3) |
| Tests | **test_connections.py** | Add unit tests for apiToken connection handling | [+117/-0](https://github.com/tigergraph/graphrag/pull/29/files#diff-8a4af273a44b613bebaa3e29ef95b20f56d221885b6a3332223d5b4d2203880e) |
| Documentation | **generate_function.txt** | Clarify count queries route to Count functions | [+1/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-2d8cd2c2831fbe9bf617715e1f3283566c7e35b6cb8c76337fbe1fca93d234dc) |
| Documentation | **generate_function.txt** | Clarify count queries route to Count functions | [+1/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-7a05cd838d0fbc74cd095d6300d38aa945902e8571917f14545fa433b1ba2f6f) |
| Documentation | **generate_function.txt** | Clarify count queries route to Count functions | [+1/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-ebb5cd586870142fc16974ce4dec23ec858ec49cec6a599a301df699d4a91cf5) |
| Documentation | **generate_function.txt** | Clarify count queries route to Count functions | [+1/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-c2b1f009ab574e291d9fb2f989cc75b7eb0ea287ce87ca72f36ee863d36e3786) |
| Documentation | **generate_function.txt** | Clarify count queries route to Count functions | [+1/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-cdc5a67e8b6a4d3e4810329f036c2f08b062094f26d73c0133f6cfc504a45efb) |
| Documentation | **generate_function.txt** | Clarify count queries route to Count functions | [+1/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-ed98f91d4d2bb1d67ac44004e04e536d3e38b5c0c3c844887d95824c8aa13c0b) |
| Documentation | **generate_function.txt** | Clarify count queries route to Count functions | [+1/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-6bc117f2b29afcf62aa14d094d48537e9ffeacd6ab2b4a8b1cd9251be736f7f1) |
| Additional files | **generate_function.txt** | | [+1/-1](https://github.com/tigergraph/graphrag/pull/29/files#diff-85d6281f16d53975597ec7e01ddfc9895652f050e631c62392b70d4f8defd794) |

___
2 parents c8f425f + 6f1d6e3 commit 13d868a

21 files changed

Lines changed: 351 additions & 63 deletions

common/chunkers/character_chunker.py

Lines changed: 6 additions & 6 deletions
@@ -1,16 +1,16 @@
 from common.chunkers.base_chunker import BaseChunker

+_DEFAULT_FALLBACK_SIZE = 4096
+

 class CharacterChunker(BaseChunker):
-    def __init__(self, chunk_size=1024, overlap_size=0):
-        if chunk_size <= overlap_size:
-            raise ValueError("Chunk size must be larger than overlap size")
-        self.chunk_size = chunk_size
+    def __init__(self, chunk_size=0, overlap_size=0):
+        self.chunk_size = chunk_size if chunk_size > 0 else _DEFAULT_FALLBACK_SIZE
         self.overlap_size = overlap_size

     def chunk(self, input_string):
-        if self.chunk_size <= 0:
-            return []
+        if self.chunk_size <= self.overlap_size:
+            raise ValueError("Chunk size must be larger than overlap size")

         chunks = []
         i = 0
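
The effect of this hunk on construction and validation can be sketched in isolation. This is a minimal illustration, not the repository's class: `SketchCharacterChunker` is a hypothetical name, and the sliding-window loop inside `chunk` is an assumption, since the diff shows only the first two lines of the method body.

```python
_DEFAULT_FALLBACK_SIZE = 4096


class SketchCharacterChunker:
    """Hypothetical stand-in for CharacterChunker, for illustration only."""

    def __init__(self, chunk_size=0, overlap_size=0):
        # A non-positive chunk_size now resolves to the 4096 fallback
        # instead of later producing an empty result.
        self.chunk_size = chunk_size if chunk_size > 0 else _DEFAULT_FALLBACK_SIZE
        self.overlap_size = overlap_size

    def chunk(self, input_string):
        # Validation moved from __init__ to chunk(), so it checks the
        # resolved size rather than the raw constructor argument.
        if self.chunk_size <= self.overlap_size:
            raise ValueError("Chunk size must be larger than overlap size")

        # Assumed sliding-window loop (not shown in the diff).
        step = self.chunk_size - self.overlap_size
        return [
            input_string[i : i + self.chunk_size]
            for i in range(0, len(input_string), step)
        ]


default_chunker = SketchCharacterChunker()
small_chunker = SketchCharacterChunker(chunk_size=4, overlap_size=1)
pieces = small_chunker.chunk("abcdefgh")
```

One consequence worth noting for reviewers: an invalid size/overlap pair is no longer rejected at construction time and only surfaces on the first `chunk` call.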

common/chunkers/html_chunker.py

Lines changed: 31 additions & 3 deletions
@@ -15,7 +15,12 @@
 from typing import Optional, List, Tuple
 import re
 from common.chunkers.base_chunker import BaseChunker
+from common.chunkers.separators import TEXT_SEPARATORS
 from langchain_text_splitters import HTMLSectionSplitter
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+
+
+_DEFAULT_FALLBACK_SIZE = 4096


 class HTMLChunker(BaseChunker):
@@ -25,12 +30,20 @@ class HTMLChunker(BaseChunker):
     - Automatically detects which headers (h1-h6) are present in the HTML
     - Uses only the headers that exist in the document for optimal chunking
     - If custom headers are provided, uses those instead of auto-detection
+    - Supports chunk_size / chunk_overlap: when chunk_size > 0, oversized
+      header-based chunks are further split with RecursiveCharacterTextSplitter
+    - When chunk_size is 0 (default), a fallback of 4096 is used so that
+      headerless HTML documents are still split into reasonable chunks
     """

     def __init__(
         self,
-        headers: Optional[List[Tuple[str, str]]] = None # e.g. [("h1", "Header 1"), ("h2", "Header 2")]
+        chunk_size: int = 0,
+        chunk_overlap: int = 0,
+        headers: Optional[List[Tuple[str, str]]] = None,
     ):
+        self.chunk_size = chunk_size if chunk_size > 0 else _DEFAULT_FALLBACK_SIZE
+        self.chunk_overlap = chunk_overlap
         self.headers = headers

     def _detect_headers(self, html_content: str) -> List[Tuple[str, str]]:
@@ -77,8 +90,23 @@ def chunk(self, input_string: str) -> List[str]:
         splitter = HTMLSectionSplitter(headers_to_split_on=headers_to_use)
         docs = splitter.split_text(input_string)

-        # Extract text content from Document objects
-        return [doc.page_content for doc in docs]
+        initial_chunks = [doc.page_content for doc in docs]
+
+        if any(len(chunk) > self.chunk_size for chunk in initial_chunks):
+            recursive_splitter = RecursiveCharacterTextSplitter(
+                separators=TEXT_SEPARATORS,
+                chunk_size=self.chunk_size,
+                chunk_overlap=self.chunk_overlap,
+            )
+            final_chunks = []
+            for chunk in initial_chunks:
+                if len(chunk) > self.chunk_size:
+                    final_chunks.extend(recursive_splitter.split_text(chunk))
+                else:
+                    final_chunks.append(chunk)
+            return final_chunks
+
+        return initial_chunks

     def __call__(self, input_string: str) -> List[str]:
         return self.chunk(input_string)
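
The second-pass logic added to `chunk` can be read as "pass small sections through, re-split only the oversized ones". The sketch below isolates that step without LangChain. `split_oversized` and `fixed_width_split` are hypothetical names; the fixed-width splitter is only a stand-in for `RecursiveCharacterTextSplitter.split_text`, which prefers separator boundaries over fixed offsets.

```python
from typing import Callable, List


def split_oversized(
    sections: List[str],
    chunk_size: int,
    split_text: Callable[[str], List[str]],
) -> List[str]:
    """Keep sections within chunk_size as-is; re-split only the larger ones."""
    final_chunks: List[str] = []
    for section in sections:
        if len(section) > chunk_size:
            final_chunks.extend(split_text(section))
        else:
            final_chunks.append(section)
    return final_chunks


def fixed_width_split(text: str, width: int = 10) -> List[str]:
    # Hypothetical stand-in for the recursive splitter, used here only
    # so the example runs without third-party dependencies.
    return [text[i : i + width] for i in range(0, len(text), width)]


sections = ["short", "x" * 25]
result = split_oversized(sections, chunk_size=10, split_text=fixed_width_split)
```

Section order is preserved, and sections already within the limit are never touched, so header-aligned chunks survive wherever they fit.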

common/chunkers/markdown_chunker.py

Lines changed: 20 additions & 13 deletions
@@ -17,6 +17,11 @@
 from langchain_text_splitters.markdown import ExperimentalMarkdownSyntaxTextSplitter
 from langchain.text_splitter import RecursiveCharacterTextSplitter

+# When chunk_size is not configured, cap any heading-section that exceeds this
+# so that form-based PDFs (tables/bold but no # headings) are not left as a
+# single multi-thousand-character chunk.
+_DEFAULT_FALLBACK_SIZE = 4096
+

 class MarkdownChunker(BaseChunker):

@@ -25,31 +30,33 @@ def __init__(
         chunk_size: int = 0,
         chunk_overlap: int = 0
     ):
-        self.chunk_size = chunk_size
+        self.chunk_size = chunk_size if chunk_size > 0 else _DEFAULT_FALLBACK_SIZE
         self.chunk_overlap = chunk_overlap

     def chunk(self, input_string):
         md_splitter = ExperimentalMarkdownSyntaxTextSplitter()

+        # ExperimentalMarkdownSyntaxTextSplitter splits on # headings only.
+        # Documents without headings (e.g. form PDFs with tables/bold but no #)
+        # are returned as a single section, so a recursive fallback is always
+        # applied when any section exceeds the configured (or default) limit.
         initial_chunks = [x.page_content for x in md_splitter.split_text(input_string)]
-        md_chunks = []

-        if self.chunk_size > 0:
+        if any(len(chunk) > self.chunk_size for chunk in initial_chunks):
             recursive_splitter = RecursiveCharacterTextSplitter(
                 separators=TEXT_SEPARATORS,
                 chunk_size=self.chunk_size,
                 chunk_overlap=self.chunk_overlap,
             )
-
-            if any(len(chunk) > self.chunk_size for chunk in initial_chunks):
-                for chunk in initial_chunks:
-                    if len(chunk) > self.chunk_size:
-                        # Split oversized chunks further
-                        md_chunks.extend(recursive_splitter.split_text(chunk))
-                    else:
-                        md_chunks.append(chunk)
-
-        return md_chunks if md_chunks else initial_chunks
+            md_chunks = []
+            for chunk in initial_chunks:
+                if len(chunk) > self.chunk_size:
+                    md_chunks.extend(recursive_splitter.split_text(chunk))
+                else:
+                    md_chunks.append(chunk)
+            return md_chunks
+
+        return initial_chunks

     def __call__(self, input_string):
         return self.chunk(input_string)
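
The behavioral difference for an unconfigured `chunk_size` is easiest to see side by side. The sketch below models only the decision of whether the recursive pass runs; `old_limit`, `new_limit`, and `needs_resplit` are hypothetical helpers, and the sample document is made up for illustration.

```python
_DEFAULT_FALLBACK_SIZE = 4096


def old_limit(chunk_size: int):
    # Before this commit: chunk_size == 0 disabled the recursive pass entirely.
    return chunk_size if chunk_size > 0 else None


def new_limit(chunk_size: int) -> int:
    # After this commit: chunk_size == 0 resolves to the 4096-character fallback.
    return chunk_size if chunk_size > 0 else _DEFAULT_FALLBACK_SIZE


def needs_resplit(sections, limit) -> bool:
    # A limit of None models the old "never re-split" path.
    return limit is not None and any(len(s) > limit for s in sections)


# A heading-less document comes back from the heading splitter as one section.
headingless = ["row | value\n" * 1000]  # 12,000 characters in a single section

before = needs_resplit(headingless, old_limit(0))
after = needs_resplit(headingless, new_limit(0))
```

Under the old path the 12,000-character section would have been emitted as a single chunk; under the new path it is handed to the recursive splitter.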

common/chunkers/recursive_chunker.py

Lines changed: 4 additions & 2 deletions
@@ -16,10 +16,12 @@
 from common.chunkers.separators import TEXT_SEPARATORS
 from langchain.text_splitter import RecursiveCharacterTextSplitter

+_DEFAULT_FALLBACK_SIZE = 4096
+

 class RecursiveChunker(BaseChunker):
-    def __init__(self, chunk_size=1024, overlap_size=0):
-        self.chunk_size = chunk_size
+    def __init__(self, chunk_size=0, overlap_size=0):
+        self.chunk_size = chunk_size if chunk_size > 0 else _DEFAULT_FALLBACK_SIZE
         self.overlap_size = overlap_size

     def chunk(self, input_string):

common/config.py

Lines changed: 2 additions & 1 deletion
@@ -259,8 +259,9 @@ def get_multimodal_service() -> LLM_Model:
     gsPort=db_config.get("gsPort", "14240"),
     restppPort=db_config.get("restppPort", "9000"),
     graphname=db_config.get("graphname", ""),
+    apiToken=db_config.get("apiToken", ""),
 )
-if db_config.get("getToken"):
+if not db_config.get("apiToken") and db_config.get("getToken"):
     conn.getToken()

 embedding_store = TigerGraphEmbeddingStore(
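
The new guard gives a static `apiToken` precedence over `getToken`. A small truth-table sketch makes the four cases explicit. `should_call_get_token` is a hypothetical helper that mirrors the condition in the hunk above, and `"example-token"` is a placeholder value, not a real credential.

```python
def should_call_get_token(db_config: dict) -> bool:
    """Mirror the guard in the hunk above: a configured apiToken always wins."""
    return not db_config.get("apiToken") and bool(db_config.get("getToken"))


cases = {
    "static token, getToken on": {"apiToken": "example-token", "getToken": True},
    "static token, getToken off": {"apiToken": "example-token", "getToken": False},
    "no token, getToken on": {"apiToken": "", "getToken": True},
    "no token, getToken off": {},
}

decisions = {name: should_call_get_token(cfg) for name, cfg in cases.items()}
```

Only the "no token, getToken on" case requests a fresh token; an empty or missing `apiToken` is treated the same way because both are falsy.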

common/db/connections.py

Lines changed: 29 additions & 1 deletion
@@ -120,6 +120,34 @@ def get_db_connection_pwd_manual(
     return conn

 def elevate_db_connection_to_token(host, username, password, graphname, async_conn: bool = False) -> TigerGraphConnectionProxy:
+    # If a pre-existing apiToken is provided in config, use it directly
+    # and skip the getToken() call to avoid conflicts.
+    static_token = db_config.get("apiToken", "")
+
+    if static_token:
+        LogWriter.info("Using pre-configured apiToken from db_config")
+        if async_conn:
+            conn = AsyncTigerGraphConnection(
+                host=host,
+                username=username,
+                password=password,
+                graphname=graphname,
+                apiToken=static_token,
+                restppPort=db_config.get("restppPort", "9000"),
+                gsPort=db_config.get("gsPort", "14240"),
+            )
+        else:
+            conn = TigerGraphConnection(
+                host=host,
+                username=username,
+                password=password,
+                graphname=graphname,
+                apiToken=static_token,
+                restppPort=db_config.get("restppPort", "9000"),
+                gsPort=db_config.get("gsPort", "14240"),
+            )
+        return conn
+
     conn = TigerGraphConnection(
         host=host,
         username=username,
@@ -129,7 +157,7 @@ def elevate_db_connection_to_token(host, username, password, graphname, async_co
         gsPort=db_config.get("gsPort", "14240")
     )

-    if db_config["getToken"]:
+    if db_config.get("getToken"):
         try:
             apiToken = conn.getToken()[0]
         except HTTPError:
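
The PR also adds `test_connections.py`, whose contents are not shown on this page. As a rough idea of how the early-return branch could be verified with mocks, here is a sketch under stated assumptions: `connect` is a hypothetical, heavily condensed version of the token-handling logic (it omits the async variant, logging, and the `HTTPError` handling), and the connection class is replaced by a `MagicMock` factory.

```python
from unittest.mock import MagicMock


def connect(db_config, connection_factory):
    """Hypothetical condensed version of the token-handling branch."""
    static_token = db_config.get("apiToken", "")
    if static_token:
        # Static token: pass it straight through and never request a new one.
        return connection_factory(apiToken=static_token)

    conn = connection_factory()
    if db_config.get("getToken"):
        conn.getToken()
    return conn


# Case 1: a static token is configured, so getToken must not be called.
factory = MagicMock()
connect({"apiToken": "example-token", "getToken": True}, factory)
factory.assert_called_once_with(apiToken="example-token")
factory.return_value.getToken.assert_not_called()

# Case 2: no static token and getToken enabled, so a token is requested.
factory_no_token = MagicMock()
connect({"getToken": True}, factory_no_token)
factory_no_token.return_value.getToken.assert_called_once()
```

Passing the factory in as a parameter is a choice made for this sketch so that no patching of module globals is needed; the actual tests may well patch `db_config` and the connection classes instead.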

common/llm_services/base_llm.py

Lines changed: 8 additions & 2 deletions
@@ -109,13 +109,19 @@ def route_response_prompt(self):
         prompt = """\
 You are an expert at routing a user question to a vectorstore, function calls, or conversation history.
 Use the conversation history for questions that are similar to previous ones or that reference earlier answers or responses.
-Use the vectorstore for questions on that would be best suited by text documents.
+Use the vectorstore for questions that would be best suited by text documents.
 Use the function calls for questions that ask about structured data, or operations on structured data.
 Questions referring to same entities in a previous, earlier, or above answer or response should be routed to the conversation history.
 Keep in mind that some questions about documents such as "how many documents are there?" can be answered by function calls.
 The function calls can be used to answer questions about these entities: {v_types} and relationships: {e_types}.
+IMPORTANT: Questions about graph database statistics or metadata MUST be routed to function calls. This includes:
+- Counting vertices/nodes/edges (e.g. "how many vertices are there", "how many edges in the graph")
+- Listing or describing vertex/edge types, schema, or graph structure
+- Aggregations, totals, or summaries of data stored in the graph database
+- Any question mentioning "graph", "graph db", "graph database", "vertices", "nodes", or "edges" in the context of statistics or counts
+These are database queries, NOT document lookups — always route them to function calls.
 Otherwise, use vectorstore. Choose one of 'functions', 'vectorstore', or 'history' based on the question and conversation history.
-Return the a JSON with a single key 'datasource' and no premable or explaination.
+Return a JSON with a single key 'datasource' and no preamble or explanation.
 Question to route: {question}
 Conversation history: {conversation}
 Format: {format_instructions}\
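
The prompt asks the model for a JSON object with a single `datasource` key whose value is one of `'functions'`, `'vectorstore'`, or `'history'`. A consumer of that reply might validate it along the following lines. This is a hand-rolled sketch, not the repository's parsing code (which presumably relies on whatever produces `{format_instructions}`); `parse_route` and `VALID_DATASOURCES` are hypothetical names.

```python
import json

# The three routes named in the prompt text.
VALID_DATASOURCES = {"functions", "vectorstore", "history"}


def parse_route(raw_response: str) -> str:
    """Extract and validate the 'datasource' key from the router's JSON reply."""
    payload = json.loads(raw_response)
    datasource = payload.get("datasource")
    if datasource not in VALID_DATASOURCES:
        raise ValueError(f"Unexpected datasource: {datasource!r}")
    return datasource


# With the new instructions, a question such as "how many vertices are there
# in the graph" is expected to come back routed to function calls.
route = parse_route('{"datasource": "functions"}')
```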

common/prompts/aws_bedrock_claude3haiku/generate_function.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 Use the vertex types, edge types, and their attributes and IDs below to write the pyTigerGraph function call to answer the question using a pyTigerGraph connection.
-When the question asks for "How many", make sure to always select a function that contains "Count" in the description/function call. Make sure never to generate a function that is not listed below.
+When the question asks for "How many", counts, totals, or statistics about vertices/nodes/edges in the graph or graph database, make sure to always select a function that contains "Count" in the description/function call. For example, questions like "how many vertices are there in the graph" or "how many vertices are there in the graph db" should use getVertexCount or getEdgeCount. Make sure never to generate a function that is not listed below.
 When certain entities are mapped to vertex attributes, may consider to generate a WHERE clause.
 If a WHERE clause is generated, please follow the instruction with proper quoting. To construct a WHERE clause string. Ensure that string attribute values are properly quoted.
 For example, if the generated function contains "('Person', where='name=William Torres')", Expected Output: "('Person', where='name="William Torres"')", This rule applies to all types of attributes. e.g., name, email, address and so on.

common/prompts/aws_bedrock_titan/generate_function.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 Use the vertex types, edge types, and their attributes and IDs to write the pyTigerGraph function call to answer the question using a pyTigerGraph connection.
-When the question asks for "How many", make sure to always select a function that contains "Count" in the description/function call. Make sure never to generate a function that is not listed below.
+When the question asks for "How many", counts, totals, or statistics about vertices/nodes/edges in the graph or graph database, make sure to always select a function that contains "Count" in the description/function call. For example, questions like "how many vertices are there in the graph" or "how many vertices are there in the graph db" should use getVertexCount or getEdgeCount. Make sure never to generate a function that is not listed below.
 When certain entities are mapped to vertex attributes, may consider to generate a WHERE clause.
 If a WHERE clause is generated, please follow the instruction with proper quoting. To construct a WHERE clause string. Ensure that string attribute values are properly quoted.
 For example, if the generated function contains "('Person', where='name=William Torres')", Expected Output: "('Person', where='name="William Torres"')", This rule applies to all types of attributes. e.g., name, email, address and so on.

common/prompts/azure_open_ai_gpt35_turbo_instruct/generate_function.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 Use the vertex types, edge types, and their attributes and IDs below to write the pyTigerGraph function call to answer the question using a pyTigerGraph connection.
-When the question asks for "How many", make sure to always select a function that contains "Count" in the description/function call. Make sure never to generate a function that is not listed below.
+When the question asks for "How many", counts, totals, or statistics about vertices/nodes/edges in the graph or graph database, make sure to always select a function that contains "Count" in the description/function call. For example, questions like "how many vertices are there in the graph" or "how many vertices are there in the graph db" should use getVertexCount or getEdgeCount. Make sure never to generate a function that is not listed below.
 When certain entities are mapped to vertex attributes, may consider to generate a WHERE clause.
 If a WHERE clause is generated, please follow the instruction with proper quoting. To construct a WHERE clause string. Ensure that string attribute values are properly quoted.
 For example, if the generated function contains "('Person', where='name=William Torres')", Expected Output: "('Person', where='name="William Torres"')", This rule applies to all types of attributes. e.g., name, email, address and so on.
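
The quoting rule in these prompt files (turn `name=William Torres` into `name="William Torres"`) can be illustrated with a small helper. This is only a sketch of the string shape the prompt asks the model to emit: `quote_where_value` is a hypothetical name, it assumes that only string values need double quotes, and it does not escape quotes embedded in the value.

```python
def quote_where_value(attribute: str, value) -> str:
    """Build an 'attr=value' condition, double-quoting string values."""
    if isinstance(value, str):
        return f'{attribute}="{value}"'
    # Assumption: non-string values (for example integers) are left unquoted.
    return f"{attribute}={value}"


condition = quote_where_value("name", "William Torres")

# Text of a count-style call in the form the prompt describes; nothing is
# executed against a database here.
call_text = f"getVertexCount('Person', where='{condition}')"
```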
