CASSANDRA-21129: Offline TCM dump tool #4581
Conversation
When a Cassandra node fails to start due to Transactional Cluster Metadata (TCM/CEP-21) corruption or issues, operators need a way to inspect the cluster metadata state offline without starting the node. The existing tools (nodetool, cqlsh) require a running node, leaving operators blind when debugging startup failures.

With CEP-21 (Transactional Cluster Metadata), cluster metadata is stored in system tables:

* system.local_metadata_log - contains transformation entries (epoch -> transformation)
* system.metadata_snapshots - contains periodic snapshots of ClusterMetadata

When a node fails to start due to TCM corruption or inconsistencies, operators have no way to inspect the metadata state without a running node. This tool fills that gap by reading directly from SSTables.
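The description above implies a simple reconstruction model: take the latest snapshot, then replay every log transformation with a later epoch, in epoch order. The sketch below illustrates only that idea; `replay`, the string-based state, and the epochs are all invented for illustration, while the real tool works with Cassandra's ClusterMetadata and Transformation types.

```java
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Illustrative only: metadata state is rebuilt from the latest snapshot plus
// all log entries whose epoch is strictly greater than the snapshot's epoch.
public class ReplaySketch
{
    public static String replay(String snapshotState, long snapshotEpoch,
                                TreeMap<Long, UnaryOperator<String>> log)
    {
        String state = snapshotState;
        // tailMap(.., false) skips entries already folded into the snapshot.
        for (var e : log.tailMap(snapshotEpoch, false).entrySet())
            state = e.getValue().apply(state);
        return state;
    }

    public static void main(String[] args)
    {
        TreeMap<Long, UnaryOperator<String>> log = new TreeMap<>();
        log.put(1L, s -> s + "+join(n1)");
        log.put(2L, s -> s + "+join(n2)");
        log.put(3L, s -> s + "+move(n1)");
        // Snapshot taken at epoch 2 already contains the first two entries.
        System.out.println(replay("snap@2", 2L, log)); // prints snap@2+move(n1)
    }
}
```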
krummas left a comment:
So this is an emergency recovery tool, hopefully used extremely rarely by an operator. I think we can slim it down a lot; these are the features I think we need here:
- dump metadata at the current (or a user-provided) epoch
- serialized binary format
- metadata.toString, to avoid locking us in to any formats
- dump log (with start/end epoch), just toString each entry
- maybe add an option to dump system_clustermetadata.distributed_metadata_log if this is run on a CMS node
Issues:
- the shell script should live in the tools/bin/ directory
- tool name - this does not dump sstable metadata, it dumps cluster metadata from sstables; sstable metadata is something different (see tools/bin/sstablemetadata)
- it copies the sstables to $CASSANDRA_HOME/data (or, if that is unset, into the current directory). We should create a temporary directory for the import and clean that directory up after dumping the metadata; we need something like
Path p = Files.createTempDirectory("dumptcmlog");
DatabaseDescriptor.getRawConfig().data_file_directories = new String[] {p.resolve("data").toString()};
DatabaseDescriptor.getRawConfig().commitlog_directory = p.resolve("commitlog").toString();
DatabaseDescriptor.getRawConfig().accord.journal_directory = p.resolve("accord_journal").toString();
DatabaseDescriptor.getRawConfig().hints_directory = p.resolve("hints").toString();
DatabaseDescriptor.getRawConfig().saved_caches_directory = p.resolve("saved_caches").toString();
to make sure we only touch the tmp directory
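The suggested lifecycle (create a throwaway directory tree, redirect every data directory into it, delete it when done) can be sketched with JDK-only code. The class and method names below are invented for this sketch; the DatabaseDescriptor wiring from the snippet above would go between creation and cleanup in the real tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch: isolate the SSTable import in a temp directory and remove it
// afterwards, so the tool never writes into $CASSANDRA_HOME/data.
public class TempDirSketch
{
    public static Path createLayout() throws IOException
    {
        Path root = Files.createTempDirectory("dumptcmlog");
        for (String sub : new String[]{ "data", "commitlog", "accord_journal", "hints", "saved_caches" })
            Files.createDirectories(root.resolve(sub));
        return root;
    }

    public static void deleteRecursively(Path root) throws IOException
    {
        try (Stream<Path> walk = Files.walk(root))
        {
            // Reverse order deletes children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException
    {
        Path root = createLayout();
        try
        {
            System.out.println(Files.isDirectory(root.resolve("data"))); // prints true
        }
        finally
        {
            deleteRecursively(root);
        }
        System.out.println(Files.exists(root)); // prints false
    }
}
```

Wrapping the work in try/finally (or a shutdown hook) keeps the cleanup guarantee even if the dump fails partway through.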
krummas left a comment:
Looks good in general, just a few minor things inline.
import static com.google.common.base.Throwables.getStackTraceAsString;

/**
 * Standalone tool to dump Transactional Cluster Metadata (TCM) from local SSTables.
"Standalone tool to dump cluster metadata" - I don't think we need to mention "TCM" or "transactional" at all in this file; what we're dumping is the cluster metadata, nothing transactional about it.
// Create temporary directory for SSTable import
setupTempDirectory();

DatabaseDescriptor.setPartitioner(partitioner);
I don't think we need this - all the tables we read use MetaStrategy.partitioner
mixinStandardHelpOptions = true,
description = "Dump Transactional Cluster Metadata from local SSTables",
subcommands = { TCMDump.DumpMetadata.class })
public class TCMDump implements Runnable
Maybe add "offline" to the name here somehow, to indicate that this shouldn't be used on running clusters.
Oh, and I'm not sure why we have the dump subcommand here. Is it to be able to add more features later? If so, we should remove "dump" from the tool name; otherwise we can remove the subcommand. Or maybe have more subcommands (for dumping log/metadata/dist log) instead of the boolean switches.
CommandLine.usage(this, output.out);
}

@Command(name = "dump", description = "Dump cluster metadata from SSTables")
Mention in the description that this is only for offline use.
DatabaseDescriptor.getRawConfig().data_file_directories = new String[]{ tempDir.resolve("data").toString() };
DatabaseDescriptor.getRawConfig().commitlog_directory = tempDir.resolve("commitlog").toString();
DatabaseDescriptor.getRawConfig().hints_directory = tempDir.resolve("hints").toString();
DatabaseDescriptor.getRawConfig().saved_caches_directory = tempDir.resolve("saved_caches").toString();
I think we also need to set accord.journal_directory here (see DatabaseDescriptor.createAllDirectories()).
Path outputPath = outputFile != null ? Path.of(outputFile) : Files.createTempFile("clustermetadata", ".dump");
try (FileOutputStreamPlus out = new FileOutputStreamPlus(outputPath))
{
    VerboseMetadataSerializer.serialize(ClusterMetadata.serializer, metadata, out, NodeVersion.CURRENT.serializationVersion());
We might want to add a parameter to specify which serialization version to use here; a use case could be that we need to downgrade and therefore need to rewrite the dump in an older version.
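The version-selection logic the comment asks for could be sketched as below. The ordered version names (`V0`..`V3`) and the `resolve` helper are invented for this sketch; Cassandra's real serialization versions are enum constants, and the point is only the shape of the check: default to the node's current version, and reject anything unknown or newer than it, since the downgrade use case only ever needs older versions.

```java
import java.util.List;

// Sketch of a --serialization-version option: fall back to the current
// version, otherwise validate that the requested one exists and is not newer.
public class VersionOptionSketch
{
    // Hypothetical ordered version list, oldest first.
    static final List<String> ORDERED = List.of("V0", "V1", "V2", "V3");

    static String resolve(String requested, String current)
    {
        if (requested == null)
            return current; // default: the node's current serialization version
        int req = ORDERED.indexOf(requested);
        if (req < 0 || req > ORDERED.indexOf(current))
            throw new IllegalArgumentException("unknown or too-new version: " + requested);
        return requested;
    }

    public static void main(String[] args)
    {
        System.out.println(resolve(null, "V3")); // prints V3
        System.out.println(resolve("V1", "V3")); // prints V1 (downgrade rewrite)
    }
}
```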
SystemKeyspaceStorage storage = new SystemKeyspaceStorage(() -> snapshotManager);
LogState logState = storage.getPersistedLogState();

for (Entry entry : logState.entries)
Looks like we could use dumpLogEntries(...) here.