ATLAS-5218: Add command-line utility to run Gremlin queries #531
pinal-shah wants to merge 2 commits into master
Pull request overview
This PR adds a command-line utility for running Gremlin queries against Atlas' embedded JanusGraph backend. The tool provides direct access to the graph database for debugging and administrative queries, supporting both inline queries and script files with optional transaction commit.
Changes:
- Introduces a new Maven module atlas-gremlin-cli-tool with core CLI functionality
- Adds a shell script wrapper to configure and launch the Java tool
- Provides logback configuration for logging and a README with usage instructions
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tools/atlas-gremlin-cli/src/main/java/org/apache/atlas/tools/GremlinCli.java | Main Java implementation handling query execution, transaction management, and command-line argument parsing |
| tools/atlas-gremlin-cli/scripts/atlas-gremlin-cli.sh | Shell script wrapper that sets up classpath, validates environment, and launches the Java tool |
| tools/atlas-gremlin-cli/src/main/resources/atlas-logback.xml | Logback configuration for logging to file with rolling policy |
| tools/atlas-gremlin-cli/pom.xml | Maven module definition with dependencies scoped as "provided" |
| tools/atlas-gremlin-cli/README | Documentation explaining setup and usage |
| distro/src/main/assemblies/atlas-gremlin-cli.xml | Assembly configuration to package the tool as a tar.gz distribution |
| distro/pom.xml | Adds the new assembly descriptor to the distribution build |
| pom.xml | Adds the new tool module to the parent POM |
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.atlas</groupId>
            <artifactId>atlas-graphdb-janus</artifactId>
            <version>${project.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <scope>provided</scope>

The tool relies on ATLAS_CLASSPATH being set by the user to provide all dependencies (commons-cli, atlas-graphdb-janus, slf4j-api, and their transitive dependencies). This differs from other tools like notification-analyzer which bundle all dependencies in their distribution. If ATLAS_CLASSPATH is not properly set or is missing required JARs, the tool will fail with ClassNotFoundException at runtime. Consider documenting the exact JAR requirements in the README or following the notification-analyzer pattern of bundling dependencies in the distribution for better user experience.
Suggested change (replaces the dependency entries shown above):

        <dependency>
            <groupId>org.apache.atlas</groupId>
            <artifactId>atlas-graphdb-janus</artifactId>
            <version>${project.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>

    public static void main(String[] args) throws Exception {
        CommandLine cmd = parseArgs(args);

        if (cmd.hasOption("h")) {
            printHelp();
            return;
        }

        final String query = getQuery(cmd);
        final boolean commit = cmd.hasOption("commit");

        new AtlasJanusGraphDatabase();

        JanusGraph graph = AtlasJanusGraphDatabase.getGraphInstance();

        try {
            GraphTraversalSource g = graph.traversal();
            GremlinGroovyScriptEngine engine = new GremlinGroovyScriptEngine();

            Bindings bindings = engine.createBindings();
            bindings.put("graph", graph);
            bindings.put("g", g);
            bindings.put("__", __.class);
            bindings.put("P", P.class);

            Object result = eval(engine, bindings, query);

            if (result instanceof Traversal) {
                result = ((Traversal<?, ?>) result).toList();
            }

            System.out.println(String.valueOf(result));

            finishTx(graph, commit);
        } catch (ScriptException se) {
            safeRollback(graph);
            throw se;
        } catch (Throwable t) {
            safeRollback(graph);
            throw t;
        } finally {
            try {
                graph.close();
            } catch (Exception e) {
                LOG.warn("Failed to close graph", e);
            }

The main method lacks proper exit code handling. Similar Atlas tools like RepairIndex and BulkFetchAndUpdate use System.exit with explicit exit codes (EXIT_CODE_SUCCESS=0, EXIT_CODE_FAILED=1) to signal success or failure to the calling process. Without this, the CLI will always return exit code 0 even when exceptions occur, which prevents shell scripts from detecting failures. Consider adding exit codes and System.exit calls similar to other tools in the repository.
Suggested change (replaces the main method shown above):

    private static final int EXIT_CODE_SUCCESS = 0;
    private static final int EXIT_CODE_FAILED = 1;

    public static void main(String[] args) throws Exception {
        int exitCode = EXIT_CODE_SUCCESS;
        JanusGraph graph = null;
        try {
            CommandLine cmd = parseArgs(args);
            if (cmd.hasOption("h")) {
                printHelp();
                return;
            }
            final String query = getQuery(cmd);
            final boolean commit = cmd.hasOption("commit");
            new AtlasJanusGraphDatabase();
            graph = AtlasJanusGraphDatabase.getGraphInstance();
            try {
                GraphTraversalSource g = graph.traversal();
                GremlinGroovyScriptEngine engine = new GremlinGroovyScriptEngine();
                Bindings bindings = engine.createBindings();
                bindings.put("graph", graph);
                bindings.put("g", g);
                bindings.put("__", __.class);
                bindings.put("P", P.class);
                Object result = eval(engine, bindings, query);
                if (result instanceof Traversal) {
                    result = ((Traversal<?, ?>) result).toList();
                }
                System.out.println(String.valueOf(result));
                finishTx(graph, commit);
            } catch (ScriptException se) {
                safeRollback(graph);
                LOG.error("Error executing Gremlin script", se);
                exitCode = EXIT_CODE_FAILED;
            } catch (Throwable t) {
                safeRollback(graph);
                LOG.error("Unexpected error executing Gremlin CLI", t);
                exitCode = EXIT_CODE_FAILED;
            }
        } catch (Exception e) {
            LOG.error("Error running Gremlin CLI", e);
            exitCode = EXIT_CODE_FAILED;
        } finally {
            if (graph != null) {
                try {
                    graph.close();
                } catch (Exception e) {
                    LOG.warn("Failed to close graph", e);
                }
            }
            System.exit(exitCode);

The GremlinGroovyScriptEngine should be cleaned up after use by calling reset() to prevent memory leaks from script compilation. The existing codebase pattern in AtlasJanusGraph shows that the script engine should be released after execution. Consider adding a finally block to release the engine, similar to how it's done in AtlasJanusGraph.releaseGremlinScriptEngine().
    private static String getQuery(CommandLine cmd) throws IOException {
        String q = cmd.getOptionValue("q");
        String f = cmd.getOptionValue("f");

        if ((q == null || q.trim().isEmpty()) && (f == null || f.trim().isEmpty())) {
            throw new IllegalArgumentException("Missing query. Provide -q/--query or -f/--file. Use -h for help.");
        }

        if (q != null && f != null) {
            throw new IllegalArgumentException("Provide only one of -q/--query or -f/--file.");
        }

        if (q != null) {
            return q;
        }

        return new String(Files.readAllBytes(Paths.get(f)), StandardCharsets.UTF_8);

No validation is performed on the file path before reading. This could potentially allow reading arbitrary files that the Atlas process has access to. Consider validating that the file path doesn't contain directory traversal patterns (..) or restrict it to a specific directory. Additionally, verify that the file is readable and exists before attempting to read it to provide better error messages.
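The checks described in this comment could look roughly like the sketch below. It is only an illustration using java.nio: the class name ScriptPathCheck, the method resolveScript, and the optional allowedDir parameter are hypothetical and not part of the PR. Note that normalize() collapses ".." segments but does not resolve symbolic links.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ScriptPathCheck {
    // Hypothetical helper (not in the PR): normalizes the -f argument and fails
    // early with a specific message instead of a raw NoSuchFileException.
    // allowedDir may be null, in which case no directory restriction applies.
    static Path resolveScript(String file, Path allowedDir) throws IOException {
        Path path = Paths.get(file).toAbsolutePath().normalize();

        if (allowedDir != null && !path.startsWith(allowedDir.toAbsolutePath().normalize())) {
            throw new IOException("Script file is outside " + allowedDir + ": " + path);
        }
        if (!Files.exists(path)) {
            throw new IOException("Script file not found: " + path);
        }
        if (!Files.isRegularFile(path)) {
            throw new IOException("Not a regular file: " + path);
        }
        if (!Files.isReadable(path)) {
            throw new IOException("Script file is not readable: " + path);
        }
        return path;
    }

    public static void main(String[] args) throws IOException {
        // Demonstrates the check against a temporary script file.
        Path tmp = Files.createTempFile("query", ".groovy");
        try {
            System.out.println("Resolved: " + resolveScript(tmp.toString(), tmp.getParent()));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

In getQuery, such a helper would sit between reading the -f option and the Files.readAllBytes call. Whether a directory restriction makes sense for a developer-only tool is a judgment call for the PR author.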
    new AtlasJanusGraphDatabase();

The AtlasJanusGraphDatabase is instantiated but the return value is not used. Looking at similar tools like RepairIndex, they directly call AtlasJanusGraphDatabase.getGraphInstance() without instantiating the database object first. This unnecessary instantiation could be removed as the static initialization happens inside getGraphInstance().
Suggested change (delete this line):

    new AtlasJanusGraphDatabase();

    } catch (ScriptException se) {
        safeRollback(graph);
        throw se;
    } catch (Throwable t) {

Exception handling is inconsistent with other tools in the repository. RepairIndex and BulkFetchAndUpdate both use LOG.error() with the message and exception together in a single call. The catch blocks here rethrow exceptions without logging error messages, which makes debugging harder for users. Consider logging errors before rethrowing, similar to the pattern: LOG.error("Failed!", e).
Suggested change (replaces the catch clauses shown above):

    } catch (ScriptException se) {
        LOG.error("Gremlin script execution failed", se);
        safeRollback(graph);
        throw se;
    } catch (Throwable t) {
        LOG.error("Gremlin CLI execution failed", t);

The GraphTraversalSource 'g' should be closed when done, as it may hold resources. Consider adding g.close() in the finally block, similar to how graph.close() is handled. This ensures proper resource cleanup even when exceptions occur.
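One way to close several resources in a finally block without letting one failure mask another is a small helper, sketched below. The class CloseQuietly and its method are hypothetical, and the lambdas stand in for g and graph, so this shows only the cleanup pattern and assumes the real objects expose close() as the comment implies.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseQuietly {
    // Hypothetical helper (not in the PR): closes a resource, reports a failure
    // on stderr, and returns whether the close succeeded.
    static boolean closeQuietly(String name, AutoCloseable resource) {
        if (resource == null) {
            return true; // nothing to close
        }
        try {
            resource.close();
            return true;
        } catch (Exception e) {
            System.err.println("Failed to close " + name + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> closed = new ArrayList<>();
        // Stand-ins for the traversal source and the graph.
        AutoCloseable g = () -> closed.add("g");
        AutoCloseable graph = () -> closed.add("graph");

        try {
            // ... evaluate the query here ...
        } finally {
            // Close the traversal source first, then the graph it was created from.
            closeQuietly("g", g);
            closeQuietly("graph", graph);
        }
        System.out.println("Closed in order: " + closed);
    }
}
```

In the CLI itself, LOG.warn would replace the System.err call, matching the existing handling of graph.close().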
            return q;
        }

        return new String(Files.readAllBytes(Paths.get(f)), StandardCharsets.UTF_8);

Reading entire file contents into memory with Files.readAllBytes could cause OutOfMemoryError for large Groovy script files. Consider adding a file size check or using a streaming approach. For example, check the file size before reading and reject files larger than a reasonable threshold (e.g., 10MB) to prevent potential DoS scenarios.
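A size check before the read could be sketched as follows. The class BoundedScriptReader, the method readScript, and the constant MAX_SCRIPT_BYTES are hypothetical; the 10MB value is simply the example threshold from the comment above. The file could still grow between the size check and the read, so this guards against accidents rather than a determined attacker.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BoundedScriptReader {
    // Assumed limit, taken from the 10MB example in the review comment.
    static final long MAX_SCRIPT_BYTES = 10L * 1024 * 1024;

    // Hypothetical helper (not in the PR): rejects oversized files before
    // loading the whole script into memory.
    static String readScript(Path path, long maxBytes) throws IOException {
        long size = Files.size(path);
        if (size > maxBytes) {
            throw new IOException("Script file " + path + " is " + size
                    + " bytes; the limit is " + maxBytes + " bytes");
        }
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("query", ".groovy");
        try {
            Files.write(tmp, "g.V().count()".getBytes(StandardCharsets.UTF_8));
            System.out.println(readScript(tmp, MAX_SCRIPT_BYTES));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```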
    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <param name="Target" value="System.out"/>
        <encoder>
            <pattern>%date [%thread] %level{5} [%file:%line] %msg%n</pattern>
        </encoder>
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>INFO</level>
        </filter>
    </appender>

The console appender is defined but never used in the root logger configuration. Lines 44-45 reference only the FILE appender. If console output is intended for debugging, add the console appender to the root logger; if it is not needed, remove the definition to avoid confusion.
    if [ -z "${ATLAS_CONF:-}" ]; then
      echo "ATLAS_CONF is not set. Example: export ATLAS_CONF=/etc/atlas/conf" >&2
      echo "This script will set: -Datlas.conf=\$ATLAS_CONF" >&2
      exit 2

- I suggest introducing ATLAS_HOME, defaulting to /opt/atlas (the location where Atlas is installed in the Docker container).
- Instead of failing here, I suggest defaulting ATLAS_CONF to ${ATLAS_HOME}/conf.
- Similarly, when ATLAS_CLASSPATH is not specified, include a default set of libraries from ${ATLAS_HOME}/server/webapp/atlas/WEB-INF/lib/.
What changes were proposed in this pull request?
A command-line utility to run Gremlin queries against Atlas' embedded JanusGraph backend (just for developers)
How was this patch tested?