Restructure documentation using Diataxis framework #379

Merged
robfrank merged 27 commits into main from docs/diataxis-restructuring
Mar 20, 2026

Conversation

@robfrank
Contributor

@robfrank robfrank commented Mar 9, 2026

Summary

Restructures the ArcadeDB documentation from 10 flat sections into a Diataxis-based taxonomy for improved findability and broader audience reach (Python, JavaScript, AI/ML developers).

New structure

  • Tutorials — Learning-oriented: quickstarts for Python, JavaScript, vector search, time series
  • Use Cases — 9 complete use case pages (recommendation engine, knowledge graph, Graph RAG, fraud detection, realtime analytics, social network, supply chain, IAM, customer 360) + 5 planned stubs
  • Concepts — Understanding-oriented: multi-model architecture, vector search, high availability, plus existing concepts (graphs, time series, schema, etc.)
  • How-To — Task-oriented: connectivity (Python, Node.js), data modeling (vector embeddings), operations (monitoring, performance tuning), migration
  • Reference — Information-oriented: SQL, Cypher, Gremlin, GraphQL, MongoDB QL, Redis, Java API, HTTP API, gRPC API, vector functions, graph algorithms (68 algorithms), time series, MCP
  • Tools — Studio, console, Swagger UI
  • Appendix — Community, known issues, OrientDB differences

Key changes

  • All 860+ anchors and 560+ cross-references preserved
  • New Python quickstart tutorial (psycopg via PostgreSQL protocol)
  • New JavaScript quickstart tutorial (pg via PostgreSQL protocol)
  • New vector search documentation (concepts, tutorial, 40+ function reference, embeddings how-to)
  • Graph algorithms moved from appendix to reference with categorized navigation
  • New gRPC API reference (2 services, 22 RPCs, message types)
  • New monitoring and performance tuning how-to guides
  • New multi-model architecture and high availability concepts pages
  • Cross-references added linking tutorials, use cases, concepts, and reference sections

Validation

  • docs-validator.py: all filenames, anchors, and cross-references valid
  • mvn generate-resources: BUILD SUCCESS (HTML + PDF)
  • 202 AsciiDoc files, 861 anchors, 563 cross-references
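
The source of docs-validator.py is not shown in this conversation, so the following is only a minimal sketch of the kind of check the validation step describes: collect `[[anchor]]` definitions, collect `<<xref>>` targets, and report duplicates and dangling references. The function name `check_docs`, the regular expressions, and the sample file contents are all assumptions made for illustration, not the project's actual implementation.

```python
import re
from collections import Counter

# Assumed patterns: [[anchor-id]] or [[anchor-id,label]] and <<target>> or <<target,label>>.
ANCHOR_RE = re.compile(r"\[\[([A-Za-z0-9_.:-]+)(?:,[^\]]*)?\]\]")
XREF_RE = re.compile(r"<<([A-Za-z0-9_.:-]+)(?:,[^>]*)?>>")


def check_docs(files):
    """files maps a file name to its AsciiDoc text.

    Returns (duplicate anchor ids, broken cross-references as (file, target)).
    """
    anchors = Counter()
    xrefs = []
    for name, text in files.items():
        anchors.update(ANCHOR_RE.findall(text))
        xrefs.extend((name, target) for target in XREF_RE.findall(text))
    duplicates = sorted(a for a, count in anchors.items() if count > 1)
    broken = sorted((name, t) for name, t in xrefs if t not in anchors)
    return duplicates, broken


# Hypothetical sample content, not taken from the real documentation tree.
docs = {
    "concepts/indexes.adoc": "[[hash-index]]\n==== Hash Index algorithm\nSee <<lsm-tree,LSM Tree>>.",
    "concepts/lsm.adoc": "[[lsm-tree]]\n=== LSM Tree\nSee <<hash-index>> and <<missing-anchor>>.",
}
duplicates, broken = check_docs(docs)
print("duplicate anchors:", duplicates)
print("broken cross-references:", broken)
```

A real validator would also need to skip code listings, since `<<` can appear in source code samples and would be misread as a cross-reference by this sketch.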

Test plan

  • Run python3 docs-validator.py — all checks pass
  • Run mvn generate-resources — BUILD SUCCESS
  • Run mvn jetty:run and verify navigation at http://localhost:8080
  • Spot-check cross-references in new pages (tutorials, use cases, concepts)
  • Verify PDF generation includes new sections

🤖 Generated with Claude Code

@robfrank robfrank force-pushed the docs/diataxis-restructuring branch from 7647e30 to b8d76a5 March 9, 2026 21:30
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the ArcadeDB documentation by adopting the Diataxis framework, which categorizes content into Tutorials, How-To Guides, Explanations (Concepts), and Reference. This strategic reorganization aims to enhance the user experience by making information more accessible and relevant to different user needs, from quickstarts for new developers to detailed API references for experienced users. The change improves the overall structure and navigability of the documentation, ensuring that users can efficiently find the information they need to learn, use, and operate ArcadeDB.

Highlights

  • Documentation Restructuring: The entire ArcadeDB documentation has been reorganized from 10 flat sections into a Diataxis-based taxonomy, improving content findability and targeting a broader developer audience.
  • New Tutorials: New quickstart tutorials were added for Python and JavaScript, both utilizing the PostgreSQL protocol, alongside new tutorials for vector search and time series.
  • Enhanced Vector Search Documentation: Comprehensive new documentation for vector search was introduced, covering concepts, a tutorial, over 40 function references, and how-to guides for embeddings.
  • Expanded API References: A new gRPC API reference was added, detailing 2 services, 22 RPCs, and message types, and existing API references were reorganized.
  • New Operational Guides: New how-to guides for monitoring and performance tuning were added to assist with operational aspects of ArcadeDB.
  • Updated Core Concepts: New concepts pages were introduced for multi-model architecture and high availability, enriching the understanding-oriented section.
  • Cross-Referencing Improvements: All 860+ anchors and 560+ cross-references were preserved, and new cross-references were added to link related sections across tutorials, use cases, concepts, and reference materials.

Changelog
  • Content Reorganization
    • Moved graph algorithms from the appendix to the reference section with categorized navigation.
    • Restructured administration and developer guides into the new 'How-To' section.
    • Reorganized API and query language references into the new 'Reference' section.
    • Moved community, issues, and OrientDB differences to a new 'Appendix' section.
  • Documentation Structure
    • Implemented the Diataxis framework for documentation organization.
    • Introduced new top-level sections: Tutorials, Use Cases, Concepts, How-To, Reference, Tools, and Appendix.
    • Ensured all existing anchors and cross-references (860+ and 560+ respectively) were preserved and updated.
  • Minor Enhancements
    • Corrected 'extendible hashing' to 'extendable hashing' in index descriptions.
    • Updated image paths across numerous renamed and moved files.
    • Added 'See Also' links in various concept pages to improve navigation.
  • New Content & Features
    • Added Python and JavaScript quickstart tutorials using the PostgreSQL protocol.
    • Introduced comprehensive vector search documentation, including concepts, a tutorial, and function references.
    • Included a new gRPC API reference with detailed services and message types.
    • Developed new how-to guides for monitoring and performance tuning.
    • Created new concepts pages for multi-model architecture and high availability.
    • Added 9 complete use case pages and 5 planned stubs, each with runnable examples.
Activity
  • The documentation validator script (docs-validator.py) was run, and all filenames, anchors, and cross-references were confirmed as valid.
  • The Maven generate-resources command executed successfully, producing both HTML and PDF outputs.
  • The local navigation at http://localhost:8080 was verified using mvn jetty:run.
  • Spot-checks were performed on cross-references within new pages (tutorials, use cases, concepts).
  • PDF generation was verified to include all new sections.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The code review identified several issues. The variable name 'for' in java-select-api.adoc is a reserved keyword in Java and should be renamed. The btoa() function in http-nodejs.adoc is a browser-oriented API and should be replaced with the Buffer class for Node.js compatibility. The term "extendable hashing" in concepts/indexes.adoc and java-schema.adoc is a typo and should be corrected to "extendible hashing". The syntax for defining vector properties in how-to/data-modeling/vector-embeddings.adoc is inconsistent: ARRAY_OF_FLOATS should be changed to LIST OF FLOAT to match the other examples. The comment in cypher-clauses.adoc contains a typo: // Multiple UNIONNs should be // Multiple UNIONs.

Note: Security Review did not run due to the size of the PR.


// Multi-hop: friends of friends
Set<String> fof = database.select().fromType("User")
Set<String> for = database.select().fromType("User")

critical

The variable name for is a reserved keyword in Java and cannot be used as an identifier. This will cause a compilation error in the example code. Please rename it to something else, like fof (as it was previously) or another non-keyword name.

Set<String> fof = database.select().fromType("User")

[source,javascript]
----
const base = 'http://localhost:2480/api/v1';
const auth = 'Basic ' + btoa('root:arcadedb');

high

The btoa() function is a browser-oriented API: older Node.js releases do not provide it, and in releases where it exists as a global it is marked legacy. For server-side Node.js code, use the Buffer class to perform Base64 encoding. This keeps the code example runnable across Node.js environments.

const auth = 'Basic ' + Buffer.from('root:arcadedb').toString('base64');

This same issue is present on line 97.

const auth = 'Basic ' + Buffer.from('root:arcadedb').toString('base64');
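
For readers following the Python connectivity material in the same PR, the header value being built here is simply `Basic ` followed by the Base64 encoding of `user:password`. The snippet below is a small standard-library sketch of that encoding; the helper name `basic_auth_header` is hypothetical, and the credentials are the ones from the reviewed snippet.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value for user:password."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Same credentials as the JavaScript snippet under review.
auth = basic_auth_header("root", "arcadedb")
print(auth)
```

The result is the same string that `Buffer.from('root:arcadedb').toString('base64')` yields with the `'Basic '` prefix, so either form can be passed as the `Authorization` header.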


- **<<lsm-tree,LSM Tree>>** (default) — Optimized for range scans, ordered iteration, and write-heavy workloads.
- **Hash Index** — O(1) equality lookups using https://en.wikipedia.org/wiki/Extendible_hashing[extendible hashing]. Best for primary key access, JOINs, and edge traversal where ordering is not needed.
- **Hash Index** — O(1) equality lookups using https://en.wikipedia.org/wiki/Extendible_hashing[extendable hashing]. Best for primary key access, JOINs, and edge traversal where ordering is not needed.

medium

The term for the hashing algorithm is "Extendible Hashing". The change to "extendable" is a typo. Please revert it to "extendible" to maintain technical accuracy. The linked Wikipedia article also uses "Extendible".

- **Hash Index** — O(1) equality lookups using https://en.wikipedia.org/wiki/Extendible_hashing[extendible hashing]. Best for primary key access, JOINs, and edge traversal where ordering is not needed.

==== Hash Index algorithm

ArcadeDB's Hash Index uses https://en.wikipedia.org/wiki/Extendible_hashing[extendible hashing], a disk-oriented algorithm that provides O(1) equality lookups with typically 1-2 page reads.
ArcadeDB's Hash Index uses https://en.wikipedia.org/wiki/Extendible_hashing[extendable hashing], a disk-oriented algorithm that provides O(1) equality lookups with typically 1-2 page reads.

medium

The term for the hashing algorithm is "Extendible Hashing". The change to "extendable" is a typo. Please revert it to "extendible" to maintain technical accuracy.

ArcadeDB's Hash Index uses https://en.wikipedia.org/wiki/Extendible_hashing[extendible hashing], a disk-oriented algorithm that provides O(1) equality lookups with typically 1-2 page reads.
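
To make the term concrete, here is a small in-memory sketch of extendible hashing as described in the linked Wikipedia article: a directory indexed by the low bits of the hash, buckets with a local depth, and directory doubling on overflow. It is not ArcadeDB's implementation; the class names, the bucket capacity, and the use of Python's built-in `hash` are assumptions chosen for illustration, with each bucket standing in for a disk page.

```python
class Bucket:
    """A fixed-capacity bucket; stands in for one disk page."""

    def __init__(self, depth, capacity):
        self.depth = depth      # local depth: number of hash bits this bucket owns
        self.capacity = capacity
        self.items = {}


class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.global_depth = 1
        self.capacity = bucket_capacity
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _slot(self, key):
        # Use the lowest `global_depth` bits of the hash as the directory index.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        # One directory lookup plus one bucket lookup, independent of table size.
        return self.directory[self._slot(key)].items.get(key, default)

    def put(self, key, value):
        # Note: more than `capacity` keys with identical hashes would never
        # separate; a real implementation needs overflow handling for that case.
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # No spare bit left to tell the two halves apart: double the directory.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth, self.capacity)
        new_bit = 1 << (bucket.depth - 1)
        # Slots that pointed at the old bucket and have the new bit set move to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute the old bucket's entries between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v


table = ExtendibleHash(bucket_capacity=2)
for n in range(100):
    table.put(n, f"record-{n}")
print(table.get(42), table.global_depth, len(table.directory))
```

The `get` path touches the directory once and one bucket once, which is the in-memory analogue of the "1-2 page reads" mentioned in the documentation text. Because entries are placed by hash bits rather than key order, the structure offers no ordered iteration or range scan.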

Comment on lines +117 to +118
CREATE PROPERTY Product.imageEmbedding ARRAY_OF_FLOATS
CREATE PROPERTY Product.textEmbedding ARRAY_OF_FLOATS

medium

There's an inconsistency in the syntax for defining vector properties. This file uses ARRAY_OF_FLOATS, while other parts of the documentation use LIST OF FLOAT (e.g., line 32 in this file) or ARRAY OF FLOATS. For consistency and clarity, it would be best to use a single, correct syntax throughout. I'd suggest using LIST OF FLOAT as it appears in the more basic examples.

CREATE PROPERTY Product.imageEmbedding LIST OF FLOAT
CREATE PROPERTY Product.textEmbedding  LIST OF FLOAT
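
An inconsistency like this can be located mechanically before deciding which spelling to standardize on. The sketch below scans documentation text for the three spellings named in the comment and reports where each occurs. The function name and the sample file content are hypothetical, and the sketch deliberately makes no claim about which spelling the database accepts.

```python
import re
from collections import defaultdict

# The three spellings mentioned in the review comment.
SPELLINGS = re.compile(r"\b(ARRAY_OF_FLOATS|ARRAY OF FLOATS|LIST OF FLOAT)\b")


def find_type_spellings(files):
    """Return {spelling: [(file name, line number), ...]} for each spelling found."""
    hits = defaultdict(list)
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in SPELLINGS.finditer(line):
                hits[match.group(1)].append((name, lineno))
    return dict(hits)


# Hypothetical excerpt combining the two forms quoted in the review.
sample = {
    "how-to/data-modeling/vector-embeddings.adoc": (
        "CREATE PROPERTY Product.textEmbedding LIST OF FLOAT\n"
        "CREATE PROPERTY Product.imageEmbedding ARRAY_OF_FLOATS\n"
    )
}
hits = find_type_spellings(sample)
for spelling, places in hits.items():
    print(spelling, places)
```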

MATCH (c:Company) RETURN c.name AS name

// Multiple UNIONs
// Multiple UNIONNs

medium

There is a typo in the comment. It should be // Multiple UNIONs.

// Multiple UNIONs

* `indexType` can be:
** `LSM_TREE`, implemented as a https://en.wikipedia.org/wiki/Log-structured_merge-tree[Log Structured Merge tree]. Best for range scans and ordered iteration.
** `HASH`, implemented using https://en.wikipedia.org/wiki/Extendible_hashing[extendible hashing]. Provides O(1) equality lookups — best for primary keys, JOINs, and edge traversal where ordering is not needed. Does not support range queries.
** `HASH`, implemented using https://en.wikipedia.org/wiki/Extendible_hashing[extendable hashing]. Provides O(1) equality lookups — best for primary keys, JOINs, and edge traversal where ordering is not needed. Does not support range queries.

medium

The term for the hashing algorithm is "Extendible Hashing". The change to "extendable" is a typo. Please revert it to "extendible" to maintain technical accuracy.

** `HASH`, implemented using https://en.wikipedia.org/wiki/Extendible_hashing[extendible hashing]. Provides O(1) equality lookups — best for primary keys, JOINs, and edge traversal where ordering is not needed. Does not support range queries.

----

NOTE: Hash indexes use <<hash-index,extendible hashing>> and provide O(1) equality lookups — significantly faster than LSM Tree for point queries (`WHERE key = value`). However, they do not support range scans (`<`, `>`, `BETWEEN`) or ordered iteration (`ORDER BY`). Use `UNIQUE`/`NOTUNIQUE` (LSM Tree) when you need range queries or ordering.
NOTE: Hash indexes use <<hash-index,extendable hashing>> and provide O(1) equality lookups — significantly faster than LSM Tree for point queries (`WHERE key = value`). However, they do not support range scans (`<`, `>`, `BETWEEN`) or ordered iteration (`ORDER BY`). Use `UNIQUE`/`NOTUNIQUE` (LSM Tree) when you need range queries or ordering.

medium

The term for the hashing algorithm is "Extendible Hashing". The change to "extendable" is a typo. Please revert it to "extendible" to maintain technical accuracy.

NOTE: Hash indexes use <<hash-index,extendible hashing>> and provide O(1) equality lookups — significantly faster than LSM Tree for point queries (`WHERE key = value`). However, they do not support range scans (`<`, `>`, `BETWEEN`) or ordered iteration (`ORDER BY`). Use `UNIQUE`/`NOTUNIQUE` (LSM Tree) when you need range queries or ordering.

@mergify
Contributor

mergify bot commented Mar 9, 2026

🧪 CI Insights

Here's what we observed from your CI run for f41d7b7.

🟢 All jobs passed!

But CI Insights is watching 👀

robfrank and others added 26 commits March 18, 2026 22:38
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update relative image paths from ../images/ to ../../images/ for files
that moved from depth 1 to depth 2 in the new directory structure.
Update all GitHub edit links to point to the new file locations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each page documents a complete use case from the arcadedb-usecases repo
with architecture overview, key queries, and try-it-yourself instructions:
recommendation-engine, knowledge-graph, graph-rag, fraud-detection,
realtime-analytics, social-network-analytics, supply-chain, iam, customer-360.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Stub pages for semantic-search, geospatial-analytics, content-management,
network-monitoring, and data-lineage with planned features and target
scenarios. Replaces plain-text list in chapter.adoc with included pages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Python tutorial uses psycopg via PostgreSQL protocol (sourced from IAM
use case patterns). JavaScript tutorial uses pg client via PostgreSQL
protocol (sourced from supply-chain use case patterns). Both cover
schema creation, SQL/Cypher queries, and vector search.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Python guide now covers PostgreSQL protocol (psycopg), HTTP/JSON API
(requests), and embedded bindings with code examples. JavaScript guide
now covers PostgreSQL protocol (pg), HTTP/JSON API (fetch + axios),
and TypeScript usage with code examples. Both recommend PostgreSQL
protocol as the primary approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… how-to)

Phase 3 - Vector & AI/ML documentation:
- concepts/vector-search.adoc: Architecture, HNSW/DiskANN, quantization,
  similarity functions, parameter tuning guide
- tutorials/vector-search-tutorial.adoc: Step-by-step from schema to
  hybrid vector+graph queries
- how-to/data-modeling/vector-embeddings.adoc: Embedding model selection,
  index creation, quantization trade-offs, hybrid search, batch ingestion
- reference/vector-functions/: Categorized index of 40+ vector SQL
  functions (distance, manipulation, quantization, scoring, sparse)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…refs

Move 68 graph algorithms (5131 lines) from appendix to reference/
graph-algorithms/ with a chapter overview linking to all categories.
Appendix now contains a redirect. Add algorithm cross-references to
fraud-detection, social-network-analytics, and iam use cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…uning

Phase 5 - Time Series, gRPC, Monitoring:
- tutorials/time-series-tutorial.adoc: 10-step tutorial from schema
  creation through PromQL and continuous aggregates
- reference/time-series/chapter.adoc: Quick reference linking to the
  comprehensive timeseries concepts page
- reference/grpc-api/: 3 files documenting 2 services, 22 RPCs, message
  types, enums, and Python/Node.js connection examples
- how-to/operations/monitoring.adoc: Prometheus integration, metrics,
  Grafana dashboard setup
- how-to/operations/performance-tuning.adoc: Memory, indexes, EXPLAIN,
  buckets, page size, connection pools

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add concepts/multi-model.adoc explaining ArcadeDB's unified multi-model
architecture and concepts/high-availability.adoc covering the
leader-replica replication model. Add See Also cross-reference sections
to graphs.adoc and timeseries.adoc linking to related use cases and
tutorials.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SQL and Cypher used level-2 headings (== ) making them top-level sections
instead of subsections under Reference. Connectivity, Operations,
Data Modeling, and Migration children used the same heading level as
their parents, appearing as siblings instead of nested children.

Bumped all heading levels so the hierarchy renders correctly:
- Reference (==) > SQL, Cypher, Gremlin, etc. (===)
- How-To (==) > Connectivity (===) > JDBC, Postgres, etc. (====)
- How-To (==) > Operations (===) > Server, Backup, etc. (====)
- How-To (==) > Data Modeling (===) > Full-Text, Geospatial, etc. (====)
- How-To (==) > Migration (===) > Importer (====)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
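
The heading-level bump described in the commit above can be pictured with a short script. This is a sketch under assumptions, not the change that was actually applied: the function name `bump_headings` and the sample text are hypothetical, and it only recognizes `==`-style section titles outside `----` and `....` delimited blocks.

```python
import re

# Section titles from level 1 (==) to level 4 (=====); the document title (=) is left alone.
HEADING_RE = re.compile(r"^(={2,5}) (\S.*)$")


def bump_headings(text, levels=1):
    """Add `levels` '=' characters to every section title outside listing/literal blocks."""
    out = []
    open_delim = None  # delimiter of the block we are currently inside, if any
    for line in text.splitlines():
        stripped = line.rstrip()
        if open_delim is None and stripped in ("----", "...."):
            open_delim = stripped            # entering a listing or literal block
        elif stripped == open_delim:
            open_delim = None                # leaving the block
        elif open_delim is None and HEADING_RE.match(line):
            line = "=" * levels + line
        out.append(line)
    return "\n".join(out)


sample = "== SQL\n\n[source,sql]\n----\n== not a heading\n----\n\n=== SELECT"
bumped = bump_headings(sample)
print(bumped)
```

Where files are pulled in with `include::`, Asciidoctor's `leveloffset` attribute gives a similar effect without editing the included files, which is the approach a later commit in this PR mentions for grouping reference sub-sections.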
Create a centralized stylesheet matching the arcadedb.com website design
language. Replaces Asciidoctor defaults with Inter/Space Grotesk/JetBrains
Mono fonts, website color palette, dark code blocks with remapped Rouge
syntax highlighting, styled TOC sidebar, tables, and admonition blocks.
Consolidates inline CSS from index.adoc, content.adoc, and web-footer.adoc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename the tutorial anchor from [[multi-model]] to [[multi-model-tutorial]]
to avoid conflict with the concepts page anchor. Update cross-references
in content.adoc accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…bles

- Rename 5 duplicate anchor IDs (sql-insert, agg-median, map-submap,
  monitoring, high-availability) to unique names, update cross-references
- Fix section heading levels in 5 files (concepts/high-availability,
  concepts/graphs, cypher-compatibility, studio/server, studio/database)
- Switch 10 D2 diagrams from sketch:true to sketch:false to eliminate
  SVG <pattern> tag warnings in PDF generation
- Fix incomplete table rows in java-ref-database and algorithms
- Fix explicit list numbering in java-vectors to use auto-numbering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…concepts

- Add Diataxis navigation TOC (Tutorials, How-Tos, Explanations, Reference)
- Group reference into Query Languages and APIs sub-sections via leveloffset
- Move Data Types and Binary Types to top of reference section
- Make Binary Types a sub-section of Data Types
- Rename Concepts to Explanations per Diataxis terminology
- Move Storage Internals and LSM-Tree from Reference to Explanations
- Integrate "Features Used" bullet lists into use-case intro paragraphs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@robfrank robfrank force-pushed the docs/diataxis-restructuring branch from 4b4fe89 to ffae66d March 18, 2026 21:41
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@robfrank robfrank merged commit 3be3dc8 into main Mar 20, 2026
3 checks passed