Open
72 commits
52f2e01
feat: add configuration management module with dictionary paths and g…
LRriver Sep 30, 2025
fadaaf7
feat: add Gremlin parsing base classes with Step, Traversal core data…
LRriver Sep 30, 2025
b775d29
feat: add Gremlin expression processing module with predicates and co…
LRriver Sep 30, 2025
f0588a1
feat: add graph database schema management with vertex/edge labels an…
LRriver Sep 30, 2025
5f3b039
feat: add Gremlin base component library with synonym replacement and…
LRriver Sep 30, 2025
822272f
feat: add ANTLR syntax tree visitor with Gremlin query to Recipe pars…
LRriver Sep 30, 2025
441b32c
feat: add recursive backtracking traversal generator for diverse quer…
LRriver Sep 30, 2025
2de2096
feat: add main corpus generator with batch processing, global dedupli…
LRriver Sep 30, 2025
c92f09a
config: add global configuration file with generation parameters and …
LRriver Sep 30, 2025
25ca990
data: add cypher2gremlin dataset with 3514 real query templates
LRriver Sep 30, 2025
25a2876
docs: add project README with quick start guide and usage instructions
LRriver Sep 30, 2025
541aa20
feat: add ANTLR-generated Gremlin grammar package with lexer, parser …
LRriver Sep 30, 2025
eb7eb01
data: add schema and graph data
LRriver Sep 30, 2025
f0579e8
feat: add template directory with schema dictionary and synonym files
LRriver Sep 30, 2025
9c13457
test: add gremlin statement generalization generation test module
LRriver Sep 30, 2025
b14ffb3
test: add generator unit tests for corpus generation validation
LRriver Sep 30, 2025
7cd8427
Add graph2gremlin.py: Initial template-based Gremlin data generation …
LRriver Sep 30, 2025
4da021c
Add gremlin_checker.py: Syntax checking using Antlr4
LRriver Sep 30, 2025
bc10fe2
Add llm_handler.py: LLM interaction model for query generalization an…
LRriver Sep 30, 2025
6ea48d5
Add qa_generalize.py: Seed data generalization using gremlin_checker …
LRriver Sep 30, 2025
78f8c2a
Add instruct_convert.py: Instruction format conversion and train/test…
LRriver Sep 30, 2025
b7f3f4a
Add da_data: Schema and graph data
LRriver Sep 30, 2025
332b879
Add data/seed_data: Seed data directory
LRriver Sep 30, 2025
8a94bad
Add data/vertical_training_sets: Vertical domain scenario generalized…
LRriver Sep 30, 2025
676d28c
Add books on Gremlin syntax knowledge to process data.
LRriver Sep 30, 2025
90f346f
Add a dataset of Gremlin QA pairs synthesized based on LLM.
LRriver Sep 30, 2025
4120356
Add README.md
LRriver Sep 30, 2025
67b523a
Compatible with OpenAI format
LRriver Oct 5, 2025
bccc147
Increase Gremlin syntax vocabulary that supports generalization, and …
LRriver Oct 29, 2025
44592b4
modify README.md
LRriver Oct 29, 2025
a1d614c
Add Apache-2.0 license, fix review comments
LRriver Oct 30, 2025
471e141
Modify the .licenserc.yaml file to ignore license checks for .interp,…
LRriver Oct 30, 2025
c1a7834
feat(text2gremlin): add generalize_llm.py to translate Gremlin querie…
LRriver Mar 7, 2026
2788dcf
chore(text2gremlin): replace config.json with config_example.json and…
LRriver Mar 7, 2026
cbe7abb
chore(text2gremlin): stop tracking config.json
LRriver Mar 7, 2026
28aa83c
fix: translate nested traversals in sideEffect/where/not/choose/union…
LRriver Mar 7, 2026
9eafaa2
feat: add CRUD templates for better write operation coverage
LRriver Mar 7, 2026
dade70c
feat: add syntax distribution analysis script and regenerate corpus
LRriver Mar 7, 2026
21c1bb2
chore: remove outdated corpus file
LRriver Mar 7, 2026
fceae69
Merge branch 'main' into text2gremlin
imbajin Mar 8, 2026
a1df93e
feat: add multi-domain schema reference data for scenario migration
LRriver Mar 9, 2026
faff1d9
chore: add timeout field to LLM config example
LRriver Mar 9, 2026
77dd85d
refactor: move generalize_llm into llm_augment package and remove GPT…
LRriver Mar 9, 2026
696b3bb
feat: add scenario migration script with multi-domain CRUD generation
LRriver Mar 9, 2026
6bac37d
feat: add dataset merge script with CRUD distribution stats
LRriver Mar 9, 2026
897dcfb
feat: add DPO preference data generation with three task types
LRriver Mar 9, 2026
8ba59be
feat: add unified LLM pipeline entry point with stage control
LRriver Mar 9, 2026
d7c261f
data: add LLM multi-style translated corpus output
LRriver Mar 9, 2026
f85c6e4
data: add extracted text2gremlin pairs for migration input
LRriver Mar 9, 2026
8ac9671
data: add scenario-migrated corpus across 20 domains
LRriver Mar 9, 2026
13764f9
data: add merged text2gremlin dataset with CRUD stats
LRriver Mar 9, 2026
ae38958
data: add DPO preference training data (Groovy vs Gremlin)
LRriver Mar 9, 2026
0ebdb15
data: add Gremlin syntax distribution analysis report
LRriver Mar 9, 2026
a6f7740
docs: update README with LLM augmentation pipeline and project structure
LRriver Mar 9, 2026
180719e
fix: slim down requirements.txt to direct dependencies only, removing…
LRriver Mar 9, 2026
1c967f3
style: apply ruff format to all text2gremlin Python files
LRriver Mar 9, 2026
eb3f51c
data:Unified Diverse Translation Tone Variable Names
LRriver Mar 17, 2026
7e28c00
feat(dpo): add multi-domain DPO generation with Groovy syntax skip fo…
LRriver Mar 19, 2026
d6bcfaf
docs: update README with final DPO data stats and add English version
LRriver Mar 19, 2026
2e10379
data(dpo): add merged preference data with 8920 samples across 21 dom…
LRriver Mar 19, 2026
d9da0b8
chore: remove outdated DPO data file
LRriver Mar 19, 2026
8019f8a
docs:modify README
LRriver Mar 19, 2026
1ff125a
Merge branch 'apache:main' into text2gremlin
LRriver May 27, 2026
b534331
Fix text2gremlin ruff checks
LRriver May 29, 2026
5b2bf1f
Improve scenario migration operation modes
LRriver May 30, 2026
118aaa4
Address text2gremlin review feedback
LRriver May 30, 2026
0887abd
Address remaining text2gremlin review issues
LRriver May 30, 2026
519ef99
Clean up text2gremlin dictionaries
LRriver May 30, 2026
e9bb59c
Document Text2Gremlin Hugging Face dataset
LRriver May 31, 2026
6bbf946
Address text2gremlin CodeQL and review feedback
LRriver May 31, 2026
95b5adb
Document text2gremlin pipeline subprocess output
LRriver May 31, 2026
ccb0b97
Use mtime for text2gremlin latest file lookup
LRriver Jun 1, 2026
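The last commit above mentions using mtime to look up the latest file. The PR's actual implementation is not shown here, so the following is only a minimal sketch of that general technique, assuming a hypothetical helper named `latest_file` and a hypothetical default pattern of `*.json`:

```python
from pathlib import Path
from typing import Optional


def latest_file(directory: Path, pattern: str = "*.json") -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None.

    Hypothetical helper: the name and default pattern are illustrative,
    not taken from the PR.
    """
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # st_mtime is the last-modification time in seconds since the epoch,
    # so the maximum corresponds to the newest file.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Sorting by modification time rather than by file name avoids depending on a timestamp embedded in the name, at the cost of being sensitive to anything that touches the files after they are written.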
3 changes: 3 additions & 0 deletions .licenserc.yaml
@@ -82,6 +82,9 @@ header: # `header` section is configurations for source codes license header.
- '**/poetry.lock'
- '.github/**/*'
- 'docker/**/*'
- '**/*.interp'
- '**/*.tokens'
- '**/*.csv'

comment: on-failure
# on what condition license-eye will comment on the pull request, `on-failure`, `always`, `never`.
15 changes: 15 additions & 0 deletions pyproject.toml
@@ -148,6 +148,9 @@ constraint-dependencies = [
[tool.ruff]
line-length = 120
target-version = "py310"
extend-exclude = [
"text2gremlin/AST_Text2Gremlin/base/gremlin/*.py",
]

[tool.ruff.lint]
# Select a broad set of rules for comprehensive checks.
@@ -173,6 +176,18 @@ ignore = [
"tests/**/*.py" = ["T20"]
"hugegraph-ml/src/hugegraph_ml/examples/**/*.py" = ["T20"]
"hugegraph-python-client/src/pyhugegraph/structure/*.py" = ["N802"]
"text2gremlin/**/*.py" = [
"E501", # Long Chinese prompts and generated Gremlin visitor fragments are kept readable.
"E741", # Gremlin examples use compact variable names in local comprehensions.
"F403", # ANTLR visitor code relies on antlr4's compatibility import style.
"F811", # Visitor methods mirror grammar alternatives and may intentionally repeat names.
"N999", # Keep existing module names used by standalone scripts.
"RUF002", # Chinese docstrings intentionally use full-width punctuation.
"RUF012", # Class-level step configuration dictionaries are constants in practice.
"SIM102", # Keep complex traversal-generation branching explicit.
"SIM108", # Keep complex traversal-generation branching explicit.
"T20", # Standalone CLI/data-generation scripts write progress to stdout.
]

[tool.ruff.lint.isort]
known-first-party = ["hugegraph_llm", "hugegraph_python_client", "hugegraph_ml", "vermeer_python_client"]
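
For readers unfamiliar with the rule codes ignored for `text2gremlin/**/*.py`, here is a small illustrative sketch of the kind of code that several of them would typically flag. The class and function names (`StepConfig`, `classify`, `short_labels`) are hypothetical and do not come from the PR:

```python
class StepConfig:
    # RUF012 would typically flag this mutable class attribute,
    # because it has no typing.ClassVar annotation.
    ARITY = {"out": 1, "has": 2}


def classify(step: str) -> str:
    # SIM108 would typically suggest a ternary expression
    # in place of this if/else assignment.
    if StepConfig.ARITY.get(step, 0) > 1:
        kind = "multi-arg"
    else:
        kind = "single-arg"
    return kind


def short_labels(labels: list[str]) -> list[str]:
    # E741 would typically flag the ambiguous single-letter name `l`.
    return [l[:3] for l in labels]


if __name__ == "__main__":
    # T20 (flake8-print) would flag this print call.
    print(classify("has"), short_labels(["person", "software"]))
```

With the per-file ignores in place, code of this shape under `text2gremlin/` passes `ruff check` for these codes, while the rest of the repository stays subject to them.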
13 changes: 13 additions & 0 deletions text2gremlin/AST_Text2Gremlin/.gitignore
@@ -0,0 +1,13 @@
# Configuration files
config.json
.env
output/

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
.eggs/
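
Since `config.json` is ignored here and the commit list mentions a tracked `config_example.json`, a script needs some way to cope with a missing local config. How the PR handles this is not shown, so the following is only a sketch of one possible approach, assuming a hypothetical `load_config` helper and that both files sit in the same directory:

```python
import json
from pathlib import Path


def load_config(base_dir: Path) -> dict:
    """Load config.json if present, else fall back to config_example.json.

    Hypothetical helper: the lookup order and error message are assumptions,
    not the PR's actual behavior.
    """
    for name in ("config.json", "config_example.json"):
        path = base_dir / name
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                return json.load(fh)
    raise FileNotFoundError(f"no config.json or config_example.json in {base_dir}")
```

Preferring the untracked `config.json` keeps credentials such as API keys out of version control, while the example file documents the expected fields.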