🦋 Changeset detected. Latest commit: 6fed8ca. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Code Review

Critical Issues
Important Issues
Minor Issues

Recommendation: Fix the SQL injection and feature flag logic before merge.
E2E Test Results: ✅ All tests passed • 46 passed • 3 skipped • 816s. Tests ran across 4 shards in parallel.
- exports[`renderChartConfig HAVING clause should render HAVING clause with SQL language 1`] = `"SELECT count(),severity FROM default.logs WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY severity HAVING count(*) > 100"`;
+ exports[`renderChartConfig HAVING clause should render HAVING clause with SQL language 1`] = `"SELECT count(),severity FROM default.logs WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY severity HAVING COUNT(*) > 100"`;

- exports[`renderChartConfig HAVING clause should render HAVING clause with granularity and groupBy 1`] = `"SELECT count(),event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` FROM default.events WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` HAVING count(*) > 50 ORDER BY toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\`"`;
+ exports[`renderChartConfig HAVING clause should render HAVING clause with granularity and groupBy 1`] = `"SELECT count(),event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` FROM default.events WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` HAVING COUNT(*) > 50 ORDER BY toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\`"`;
These cases never ran through the SQL parser previously, so the parser now just capitalizes a few things. Otherwise the tests didn't change.
expect(actual.toLowerCase()).toContain(
  'avg(response_time) > 500 and count(*) > 10',
);
const optimizeMapAccessWhere = ({
Builds the AST, traverses it extracting each ident, adds mapContains to the AST if the proper conditions are met, and builds it back into SQL.
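The pass described above can be sketched roughly like this. The AST node shapes, `MapIdent`, and the bracket-syntax recognizer below are hypothetical stand-ins for the PR's actual types; the real code parses and rebuilds SQL with a SQL parser rather than hand-built nodes.

```typescript
// Minimal stand-in for a SQL parser's expression AST (hypothetical shapes).
type AstNode =
  | { type: 'binary_expr'; operator: string; left: AstNode; right: AstNode }
  | { type: 'column_ref'; table: string | null; column: string }
  | { type: 'string'; value: string };

// A map access like ResourceAttributes['service.name'] split into parts.
interface MapIdent {
  map: string;
  key: string;
}

// Recognize map accesses written as map['key'] inside a column_ref's text.
function extractMapIdent(column: string): MapIdent | null {
  const m = column.match(/^(\w+)\['([^']+)'\]$/);
  return m ? { map: m[1], key: m[2] } : null;
}

// Traverse the WHERE AST, collecting every map-style column_ref.
function collectMapIdents(node: AstNode, out: MapIdent[]): void {
  if (node.type === 'binary_expr') {
    collectMapIdents(node.left, out);
    collectMapIdents(node.right, out);
  } else if (node.type === 'column_ref') {
    const ident = extractMapIdent(node.column);
    if (ident) out.push(ident);
  }
}

// Example WHERE: ResourceAttributes['service.name'] = 'api'
const where: AstNode = {
  type: 'binary_expr',
  operator: '=',
  left: {
    type: 'column_ref',
    table: null,
    column: "ResourceAttributes['service.name']",
  },
  right: { type: 'string', value: 'api' },
};
const found: MapIdent[] = [];
collectMapIdents(where, found);
// In the real pass, each found ident becomes a mapContains(map, 'key')
// AND-ed into the WHERE AST before the tree is serialized back to SQL.
```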
case 'column_ref': {
  const ident = extractIdent(node as ColumnRef);
  if (ident) {
    idents.push({ doesContain: true, ident });
Either a column or a map; we want both, since a materialized column could be a map-key optimization in disguise.
// replace materialized idents with map ident
for (const curIdent of idents) {
  if (curIdent.ident.type === 'column') {
    const materializedMapIdent = materializedColumnToMapIdent.get(
      curIdent.ident.name,
    );
    if (materializedMapIdent) {
      curIdent.ident = materializedMapIdent;
    }
  }
}
This allows us to add mapContains even if that map entry is materialized, so we can still use the map-key index.
I'm not sure the materialized field would benefit from this optimization, but good to know.
import { CustomSchemaSQLSerializerV2, SearchQueryBuilder } from '@/queryParser';
const MAP_CONTAINS_OPTIMIZATION_ENABLED =
  process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED ||
style: better to follow the pattern (process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED ?? 'false') === 'true', and move it to config.ts.
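A sketch of that suggestion (the helper name is illustrative): `||` treats any non-empty string as truthy, so even setting the variable to the string 'false' would enable the flag, whereas the explicit comparison enables it only for the exact string 'true'.

```typescript
// Hypothetical config.ts helper: only the exact string 'true' enables a flag.
// Missing/undefined env vars default safely to disabled.
const isFlagEnabled = (raw: string | undefined): boolean =>
  (raw ?? 'false') === 'true';

const MAP_CONTAINS_OPTIMIZATION_ENABLED = isFlagEnabled(
  process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED,
);
```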
const maps = new Set(
  columns.filter(v => v.type.startsWith('Map')).map(v => v.name),
);
perf: We can move this out of the method, and we don't need to traverse the AST if the set is empty. Also, maps is a bit ambiguous; maybe mapFieldNames?
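A sketch of that hoisting suggestion (the Column shape and function names are illustrative; the actual rewrite step is elided):

```typescript
// Hypothetical column metadata shape.
interface Column {
  name: string;
  type: string; // ClickHouse type string, e.g. "Map(String, String)"
}

// Computed once per table schema instead of on every optimization call.
const getMapFieldNames = (columns: Column[]): Set<string> =>
  new Set(columns.filter(c => c.type.startsWith('Map')).map(c => c.name));

function optimizeWhere(sql: string, mapFieldNames: Set<string>): string {
  // Fast path: no Map columns means nothing to rewrite, so skip
  // parsing and traversing the AST entirely.
  if (mapFieldNames.size === 0) return sql;
  // ... parse, add mapContains guards, serialize back to SQL (elided) ...
  return sql;
}
```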
  return parser.sqlify(ast);
} catch {
  // ignore
Should we log errors during development for debugging purposes?
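One way to act on that suggestion, sketched below (the function and its `parseAndOptimize` parameter are hypothetical stand-ins for the PR's parse-and-sqlify path): log in development, but always fall back to the original SQL.

```typescript
// Sketch: surface parse failures during development instead of silently
// swallowing them, while never letting the optimization break the query.
function optimizeOrFallback(
  sql: string,
  parseAndOptimize: (sql: string) => string,
): string {
  try {
    return parseAndOptimize(sql);
  } catch (e) {
    if (process.env.NODE_ENV !== 'production') {
      // Dev-only diagnostics; stripped of noise in production.
      console.warn('map access optimization failed, using original SQL', e);
    }
    return sql; // fall back to the unoptimized query
  }
}
```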
// add map idents to AST
const addIdentToAst = (ident: SQLMapValueIdent, doesContain: boolean) => {
style: We can probably move this function out instead of manipulating the AST directly
const MAP_CONTAINS_OPTIMIZATION_ENABLED =
  process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED ||
  process.env.NODE_ENV !== 'production';
I'd suggest removing this (in case somehow NODE_ENV is not set properly in prod) and add NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED to your .env.local instead
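For example, a hypothetical developer-local entry (not committed) would look like:

```
# .env.local
NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED=true
```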
|
Closing this. I don't think it'll be easy to cover all possible SQL cases; I'm going to just focus on modifying the Lucene queries for now.
Closes HDX-3070
Adding mapContains allows a bloom filter index to be used to skip searching a granule when a key for a given map is not present in that granule. In some testing I've done, it yielded 40% fewer rows scanned.
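The effect can be illustrated with a hedged ClickHouse sketch. The table, column, and index names are hypothetical; mapContains and bloom_filter skip indexes over mapKeys are standard ClickHouse features.

```sql
-- Hypothetical table with a bloom filter skip index over the map's keys.
CREATE TABLE default.logs_example
(
    Timestamp DateTime,
    LogAttributes Map(String, String),
    INDEX idx_attr_keys mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY Timestamp;

-- With the mapContains guard, granules whose bloom filter proves the key
-- is absent can be skipped instead of being scanned for the value match.
SELECT count()
FROM default.logs_example
WHERE mapContains(LogAttributes, 'service.name')
  AND LogAttributes['service.name'] = 'api';
```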