CASSANDRA-21157 add table_id to system_distributed.compression_dictionaries table #4601

smiklosovic · 2026-02-04T04:55:33Z


yifan-c · 2026-02-10T03:39:16Z

src/java/org/apache/cassandra/schema/SystemDistributedKeyspace.java

+                                                              "PRIMARY KEY ((keyspace_name, table_name), table_id, dict_id)) " +
+                                                              "WITH CLUSTERING ORDER BY (table_id DESC, dict_id DESC)"; // in order to retrieve the latest dictionary; the contract is the newer the dictionary the larger the dict_id


Should table_id be part of the partition key (instead of being a clustering key)?

dict_id is used to order the dictionaries for the same table. table_id is not useful for that purpose.

But that means we would need to specify it every single time. If I just want to see what is there for a keyspace / table, I can't, because I would need to know the table id too. We also do not need to worry about fetching "too much" if the partition key is just keyspace and table, because these are retrieved in a lightweight manner anyway, and we also have a way to clear orphaned dictionaries.
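To make the trade-off above concrete, here is a minimal in-memory sketch of how the proposed key behaves. It is a toy model, not Cassandra code: the class, the `Row` record, and the column types (a `String` table_id, a `long` dict_id) are assumptions for illustration only. The partition is addressed by (keyspace_name, table_name) alone, and rows inside it are kept in the order given by CLUSTERING ORDER BY (table_id DESC, dict_id DESC).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the proposed primary key; not Cassandra code.
final class DictionaryPartitionSketch
{
    // Hypothetical row shape: only the primary key columns are modelled.
    // The column types used here are assumptions.
    record Row(String keyspaceName, String tableName, String tableId, long dictId) {}

    // Partition key (keyspace_name, table_name) -> rows kept in clustering order.
    private final Map<String, List<Row>> partitions = new HashMap<>();

    // Mirrors CLUSTERING ORDER BY (table_id DESC, dict_id DESC).
    private static int clusteringOrder(Row a, Row b)
    {
        int byTableId = b.tableId().compareTo(a.tableId());
        return byTableId != 0 ? byTableId : Long.compare(b.dictId(), a.dictId());
    }

    private static String partitionKey(String keyspaceName, String tableName)
    {
        return keyspaceName + '/' + tableName;
    }

    void insert(Row row)
    {
        List<Row> rows = partitions.computeIfAbsent(partitionKey(row.keyspaceName(), row.tableName()),
                                                    k -> new ArrayList<>());
        rows.add(row);
        rows.sort(DictionaryPartitionSketch::clusteringOrder);
    }

    // Like querying by partition key only: no table_id is required.
    List<Row> selectByTable(String keyspaceName, String tableName)
    {
        return List.copyOf(partitions.getOrDefault(partitionKey(keyspaceName, tableName), List.of()));
    }

    // Like additionally restricting the first clustering column.
    List<Row> selectByTableId(String keyspaceName, String tableName, String tableId)
    {
        List<Row> result = new ArrayList<>();
        for (Row row : selectByTable(keyspaceName, tableName))
            if (row.tableId().equals(tableId))
                result.add(row);
        return result;
    }

    public static void main(String[] args)
    {
        DictionaryPartitionSketch store = new DictionaryPartitionSketch();
        store.insert(new Row("ks", "tbl", "id-a", 1L));
        store.insert(new Row("ks", "tbl", "id-a", 2L));
        store.insert(new Row("ks", "tbl", "id-b", 1L)); // same name, recreated table
        System.out.println(store.selectByTable("ks", "tbl"));
        System.out.println(store.selectByTableId("ks", "tbl", "id-a"));
    }
}
```

If table_id were moved into the partition key instead, the equivalent of `selectByTable` would no longer be possible without already knowing the table id, which is the concern raised in the reply.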

yifan-c · 2026-02-10T03:45:05Z

src/java/org/apache/cassandra/schema/SystemDistributedKeyspace.java

     */
    @Nullable
-    public static List<LightweightCompressionDictionary> retrieveLightweightCompressionDictionaries(String keyspaceName, String tableName)
+    public static List<LightweightCompressionDictionary> retrieveLightweightCompressionDictionaries()


Is it only called by retrieveOrphanedLightweightCompressionDictionaries? If so, should they be consolidated?

It is possible, but I would keep it as it is. Having a way to fetch everything, given that the fetch is lightweight, does not hurt, and it may be useful for future use cases we cannot foresee yet.
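For context, the relationship between the two methods can be sketched as "fetch every lightweight row, then keep those whose table no longer exists". The snippet below is a hedged illustration only: `DictionaryRef`, `findOrphaned`, and the set of live table ids are hypothetical names, not the actual signatures in SystemDistributedKeyspace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch of orphan detection; not the PR's implementation.
final class OrphanedDictionariesSketch
{
    // Hypothetical stand-in for a lightweight dictionary row (no dictionary bytes).
    record DictionaryRef(String keyspaceName, String tableName, String tableId, long dictId) {}

    // A dictionary is treated as orphaned when the table it was trained for
    // is no longer present, i.e. its table_id is not among the live table ids.
    static List<DictionaryRef> findOrphaned(List<DictionaryRef> allDictionaries, Set<String> liveTableIds)
    {
        List<DictionaryRef> orphaned = new ArrayList<>();
        for (DictionaryRef ref : allDictionaries)
            if (!liveTableIds.contains(ref.tableId()))
                orphaned.add(ref);
        return orphaned;
    }

    public static void main(String[] args)
    {
        List<DictionaryRef> all = List.of(new DictionaryRef("ks", "tbl", "id-old", 1L),
                                          new DictionaryRef("ks", "tbl", "id-new", 1L));
        // "id-old" belongs to a dropped table in this made-up scenario.
        System.out.println(findOrphaned(all, Set.of("id-new")));
    }
}
```

Keeping the unfiltered fetch as its own method, as argued in the reply, leaves the filtering step as the only orphan-specific logic.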

yifan-c · 2026-02-10T03:46:07Z

src/java/org/apache/cassandra/tools/nodetool/CompressionDictionaryCommandGroup.java

+    @Command(name = "cleanup", description = "Clean up orphaned dictionaries by deleting them from " + SystemDistributedKeyspace.NAME
+                                             + '.' + SystemDistributedKeyspace.COMPRESSION_DICTIONARIES +
+                                             " table, these are ones for which a table they were trained for was dropped.")
+    public static class CleanupDictionaries extends AbstractCommand


How about CleanupOrphanedDictionaries? Just to be super clear.

smiklosovic force-pushed the CASSANDRA-21157 branch from e283bc1 to 5e48792 Compare February 4, 2026 07:32

add table_id to system_distributed.compression_dictionaries table

967aaa0

smiklosovic force-pushed the CASSANDRA-21157 branch from 5e48792 to 967aaa0 Compare February 4, 2026 07:49

clear orphaned

773db9a

smiklosovic force-pushed the CASSANDRA-21157 branch from 5642c85 to 773db9a Compare February 5, 2026 03:17

smiklosovic requested a review from yifan-c February 5, 2026 11:05

yifan-c reviewed Feb 10, 2026


