Perf: optimize Tablet write with columnar string storage and lazy DeviceID construction (~10x throughput) #748

Open
ColinLeeo wants to merge 4 commits into develop from tablet_refine

Contributor

ColinLeeo commented Mar 19, 2026

Before:
Write 200 x 50000 rows: 15039 ms
Throughput: 664938 rows/s

Screenshot 2026-03-20 00 47 02

After:
Write 200 x 50000 rows: 1578 ms
Throughput: 6337140 rows/s

Screenshot 2026-03-20 00 51 04

[collapsed.zip](https://github.com/user-attachments/files/26122101/collapsed.zip)

codecov-commenter commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 78.65169% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.87%. Comparing base (ce16cb2) to head (dbf86df).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cpp/src/writer/tsfile_writer.cc | 57.69% | 8 Missing and 3 partials ⚠️ |
| cpp/src/common/tablet.h | 79.31% | 2 Missing and 4 partials ⚠️ |
| cpp/src/common/tablet.cc | 94.11% | 0 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #748      +/-   ##
===========================================
+ Coverage    61.85%   61.87%   +0.02%     
===========================================
  Files          704      704              
  Lines        41276    41347      +71     
  Branches      5929     5948      +19     
===========================================
+ Hits         25531    25585      +54     
- Misses       14905    14915      +10     
- Partials       840      847       +7     


@ColinLeeo ColinLeeo changed the title refine tabet writing. Perf: optimize Tablet write with columnar string storage and lazy DeviceID construction (~10x throughput) Mar 20, 2026
@hongzhi-gao
Contributor

LGTM

@ColinLeeo ColinLeeo requested a review from jt2594838 March 24, 2026 09:04
Comment on lines +101 to +103
    auto* sc = new StringColumn();
    sc->init(max_row_num_, max_row_num_ * 32);
    value_matrix_[c].string_col = sc;
Contributor

Should this use mem_alloc instead of new?
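
A minimal sketch of what this suggestion could look like: take raw memory from the project allocator and construct the object in it with placement new. The real signature of `common::mem_alloc` is not shown in this thread, so `tracked_alloc` and `tracked_free` below are hypothetical stand-ins, and `StringColumn` is a reduced mock rather than the class from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-ins for the project's allocator (NOT the real API).
// They only count live allocations so leaks are easy to spot.
static int g_live_allocs = 0;

void* tracked_alloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) ++g_live_allocs;
    return p;
}

void tracked_free(void* p) {
    if (p == nullptr) return;
    --g_live_allocs;
    std::free(p);
}

// Reduced mock of the column type; field names follow the quoted diff.
struct StringColumn {
    uint32_t* offsets = nullptr;
    char* buffer = nullptr;
    uint32_t buf_capacity = 0;
    uint32_t buf_used = 0;

    bool init(uint32_t max_rows, uint32_t init_buf_capacity) {
        offsets = static_cast<uint32_t*>(
            tracked_alloc(sizeof(uint32_t) * (max_rows + 1)));
        buffer = static_cast<char*>(tracked_alloc(init_buf_capacity));
        if (offsets == nullptr || buffer == nullptr) return false;
        offsets[0] = 0;
        buf_capacity = init_buf_capacity;
        return true;
    }

    void destroy() {
        tracked_free(offsets);
        tracked_free(buffer);
        offsets = nullptr;
        buffer = nullptr;
    }
};

void destroy_string_column(StringColumn* sc) {
    if (sc == nullptr) return;
    sc->destroy();
    sc->~StringColumn();  // explicit destructor call pairs with placement new
    tracked_free(sc);
}

StringColumn* create_string_column(uint32_t max_rows) {
    void* raw = tracked_alloc(sizeof(StringColumn));
    if (raw == nullptr) return nullptr;
    StringColumn* sc = new (raw) StringColumn();  // construct in place
    if (!sc->init(max_rows, max_rows * 32)) {
        destroy_string_column(sc);
        return nullptr;
    }
    return sc;
}
```

The trade-off is that the object must then be released through the matching free path (explicit destructor call plus the allocator's free), not through `delete`.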

Comment on lines +466 to +480
    for (auto col_idx : id_column_indexes_) {
        const StringColumn& sc = *value_matrix_[col_idx].string_col;
        const uint32_t* off = sc.offsets;
        const char* buf = sc.buffer;
        for (uint32_t i = 1; i < row_count; i++) {
            if (boundary[i >> 6] & (1ULL << (i & 63))) continue;
            uint32_t len_a = off[i] - off[i - 1];
            uint32_t len_b = off[i + 1] - off[i];
            if (len_a != len_b ||
                (len_a > 0 &&
                 memcmp(buf + off[i - 1], buf + off[i], len_a) != 0)) {
                boundary[i >> 6] |= (1ULL << (i & 63));
            }
        }
    }
Contributor

Consider traversing the tag columns in reverse order, because we tend to organize tags from coarse (like country) to fine (like street).
Within the same TsFile or write batch, you are more likely to find differences in the finer tags.
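
A minimal sketch of the reversed traversal, assuming the tag columns are stored coarse to fine. `StringColumn` here is a reduced mock built on `std::vector` and `std::string`, and `mark_boundaries_reversed` and the `compared_rows` counter are hypothetical names added only to make the effect observable.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Reduced mock of a columnar string column: row i occupies
// buffer[offsets[i], offsets[i + 1]).
struct StringColumn {
    std::vector<uint32_t> offsets{0};
    std::string buffer;

    void append(const std::string& v) {
        buffer += v;
        offsets.push_back(static_cast<uint32_t>(buffer.size()));
    }
};

// Sets bit i when any tag column differs between row i - 1 and row i.
// Columns are visited from last (finest) to first (coarsest), so rows that
// differ in a fine tag are marked early and skipped for the remaining columns.
// Bit 0 is left untouched, as in the quoted loop.
std::vector<uint64_t> mark_boundaries_reversed(
    const std::vector<StringColumn>& tag_cols, uint32_t row_count,
    uint64_t* compared_rows) {
    std::vector<uint64_t> boundary((row_count + 63) / 64, 0);
    for (auto it = tag_cols.rbegin(); it != tag_cols.rend(); ++it) {
        const uint32_t* off = it->offsets.data();
        const char* buf = it->buffer.data();
        for (uint32_t i = 1; i < row_count; i++) {
            if (boundary[i >> 6] & (1ULL << (i & 63))) continue;
            if (compared_rows != nullptr) ++*compared_rows;
            uint32_t len_a = off[i] - off[i - 1];
            uint32_t len_b = off[i + 1] - off[i];
            if (len_a != len_b ||
                (len_a > 0 &&
                 memcmp(buf + off[i - 1], buf + off[i], len_a) != 0)) {
                boundary[i >> 6] |= (1ULL << (i & 63));
            }
        }
    }
    return boundary;
}
```

With a constant coarse column and a fine column that changes on every row, the reversed order compares each row once in the fine column and then skips the coarse column entirely; the forward order would compare every row in both columns.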

            if (len_a != len_b ||
                (len_a > 0 &&
                 memcmp(buf + off[i - 1], buf + off[i], len_a) != 0)) {
                boundary[i >> 6] |= (1ULL << (i & 63));
Contributor

Once the number of boundaries reaches the number of rows, the loop can break early.
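
A minimal sketch of that early exit, assuming row 0 always counts as a boundary (the quoted loop starts at row 1, so how bit 0 is handled in the PR is not visible here). `StringColumn`, `BoundaryResult`, and `mark_boundaries_early_exit` are hypothetical names for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Reduced mock of a columnar string column: row i occupies
// buffer[offsets[i], offsets[i + 1]).
struct StringColumn {
    std::vector<uint32_t> offsets{0};
    std::string buffer;

    void append(const std::string& v) {
        buffer += v;
        offsets.push_back(static_cast<uint32_t>(buffer.size()));
    }
};

struct BoundaryResult {
    std::vector<uint64_t> bits;
    uint32_t marked = 0;           // number of rows flagged as boundaries
    uint32_t columns_scanned = 0;  // columns actually visited
};

BoundaryResult mark_boundaries_early_exit(
    const std::vector<StringColumn>& tag_cols, uint32_t row_count) {
    BoundaryResult r;
    r.bits.assign((row_count + 63) / 64, 0);
    if (row_count == 0) return r;
    r.bits[0] |= 1ULL;  // assumption: the first row always starts a run
    r.marked = 1;
    for (const StringColumn& sc : tag_cols) {
        // Every row is already a boundary: later columns cannot add anything.
        if (r.marked == row_count) break;
        ++r.columns_scanned;
        const uint32_t* off = sc.offsets.data();
        const char* buf = sc.buffer.data();
        for (uint32_t i = 1; i < row_count; i++) {
            if (r.bits[i >> 6] & (1ULL << (i & 63))) continue;
            uint32_t len_a = off[i] - off[i - 1];
            uint32_t len_b = off[i + 1] - off[i];
            if (len_a != len_b ||
                (len_a > 0 &&
                 memcmp(buf + off[i - 1], buf + off[i], len_a) != 0)) {
                r.bits[i >> 6] |= (1ULL << (i & 63));
                if (++r.marked == row_count) break;
            }
        }
    }
    return r;
}
```

The counter costs one increment per newly marked row, and it saves a full pass over every remaining tag column in the case where each row belongs to a different device.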

Comment on lines +83 to +93
    void append(uint32_t row, const char* data, uint32_t len) {
        // Grow buffer if needed
        if (buf_used + len > buf_capacity) {
            buf_capacity = buf_capacity * 2 + len;
            buffer = (char*)common::mem_realloc(buffer, buf_capacity);
        }
        memcpy(buffer + buf_used, data, len);
        offsets[row] = buf_used;
        offsets[row + 1] = buf_used + len;
        buf_used += len;
    }
Contributor

If data equals the value of the previous row, the new row could simply reuse the same offsets and avoid a memory copy.
However, if the comparisons happen often but rarely avoid a copy, we should stop comparing.
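
A minimal sketch of this idea under stated assumptions. With the contiguous layout in the quoted `append` (row i ends where row i + 1 starts), two rows cannot point at the same bytes, so this sketch stores a separate start and length per row instead. `DedupStringColumn`, the probe window of 64 comparisons, and the 1-in-8 hit threshold are all hypothetical choices for illustration, not values from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

class DedupStringColumn {
public:
    void append(const char* data, uint32_t len) {
        if (dedup_enabled_ && !starts_.empty()) {
            ++attempts_;
            uint32_t prev_start = starts_.back();
            uint32_t prev_len = lens_.back();
            if (prev_len == len &&
                (len == 0 ||
                 memcmp(buffer_.data() + prev_start, data, len) == 0)) {
                // Same value as the previous row: share its bytes, no copy.
                ++hits_;
                starts_.push_back(prev_start);
                lens_.push_back(prev_len);
                return;
            }
            // Comparisons are not paying off: stop comparing from now on.
            if (attempts_ >= kProbeWindow &&
                hits_ * kMinHitDenominator < attempts_) {
                dedup_enabled_ = false;
            }
        }
        starts_.push_back(static_cast<uint32_t>(buffer_.size()));
        lens_.push_back(len);
        buffer_.append(data, len);
    }

    std::string get(uint32_t row) const {
        return buffer_.substr(starts_[row], lens_[row]);
    }
    std::size_t row_count() const { return starts_.size(); }
    std::size_t buffer_bytes() const { return buffer_.size(); }
    bool dedup_enabled() const { return dedup_enabled_; }

private:
    static constexpr uint32_t kProbeWindow = 64;       // illustrative value
    static constexpr uint32_t kMinHitDenominator = 8;  // need >= 1 hit in 8

    std::string buffer_;
    std::vector<uint32_t> starts_;
    std::vector<uint32_t> lens_;
    uint32_t attempts_ = 0;
    uint32_t hits_ = 0;
    bool dedup_enabled_ = true;
};
```

The length check runs before `memcmp`, so rows with different lengths cost only one integer comparison. Whether the extra per-row array is worth it depends on how often consecutive rows repeat, which is something only a benchmark on real tablets can answer.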

4 participants