Add tsblock tsfile writer #753

Merged
JackieTien97 merged 3 commits into apache:develop from shuwenwei:addTsBlockTsFileWriter
Mar 27, 2026

@shuwenwei
Member

No description provided.

Contributor

Copilot AI left a comment

Pull request overview

Adds a V4 writer to convert table-model TsBlock data into a TsFile, with accompanying tests, plus some refactoring to enable subclass access in aligned chunk-group writers.

Changes:

  • Introduce TableTsBlock2TsFileWriter to split a table-model TsBlock by device and write per-device aligned chunks (with optional synthetic time generation).
  • Add unit tests covering multiple mappings (existing time column vs generated time; tag-driven device vs single-device).
  • Extend chunk writer implementations (new TableChunkGroupWriterImpl.write(...) and make some AlignedChunkGroupWriterImpl members protected) to support the new writer flow.
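To make the partitioning step in the first bullet concrete, here is a minimal sketch of grouping rows by device and generating per-device timestamps. It is not the PR's code: `DevicePartitionSketch`, `partitionByDevice`, and `generateTimes` are hypothetical names, plain `String` arrays stand in for TsBlock tag columns, the dot-joined key is an assumed stand-in for a real device ID, and the `startTime` parameter is an assumption about how synthetic time might be seeded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DevicePartitionSketch {

  /**
   * Groups row indices by a device key built from the table name and the row's tag values.
   * tagColumns[c][row] is the value of tag column c at the given row (hypothetical layout).
   */
  static Map<String, List<Integer>> partitionByDevice(
      String tableName, String[][] tagColumns, int rowCount) {
    // LinkedHashMap keeps devices in first-seen order, so output order is deterministic.
    Map<String, List<Integer>> rowsByDevice = new LinkedHashMap<>();
    for (int row = 0; row < rowCount; row++) {
      StringBuilder key = new StringBuilder(tableName);
      for (String[] tagColumn : tagColumns) {
        // A null tag is rendered as the text "null" in this sketch.
        key.append('.').append(tagColumn[row]);
      }
      rowsByDevice.computeIfAbsent(key.toString(), k -> new ArrayList<>()).add(row);
    }
    return rowsByDevice;
  }

  /** Generates increasing synthetic timestamps for one device, starting at startTime (assumed). */
  static long[] generateTimes(long startTime, int rowCountForDevice) {
    long[] times = new long[rowCountForDevice];
    for (int i = 0; i < rowCountForDevice; i++) {
      times[i] = startTime + i;
    }
    return times;
  }
}
```

With no tag columns, every row maps to the same key, which corresponds to the single-device case mentioned in the tests bullet.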

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| java/tsfile/src/test/java/org/apache/tsfile/write/writer/TableTsBlock2TsFileWriterTest.java | Adds tests validating the new TsBlock→TsFile writing behavior across different column mappings. |
| java/tsfile/src/main/java/org/apache/tsfile/write/v4/TableTsBlock2TsFileWriter.java | New writer that partitions TsBlock rows by device and writes aligned chunk groups, optionally generating per-device timestamps. |
| java/tsfile/src/main/java/org/apache/tsfile/write/chunk/TableChunkGroupWriterImpl.java | Adds a convenience write(...) API for table-style writes (currently contains correctness issues). |
| java/tsfile/src/main/java/org/apache/tsfile/write/chunk/AlignedChunkGroupWriterImpl.java | Changes access modifiers to protected to enable reuse/extension by table-model writers. |

Comment on lines +163 to +167
segments[i + 1] =
    tsBlock
        .getValueColumns()[tagColumnIndexInTsBlock[i]]
        .getBinary(rowIdx)
        .getStringValue(TSFileConfig.STRING_CHARSET);
Copilot AI Mar 27, 2026

getDeviceId assumes every tag column is a non-null STRING/BINARY and unconditionally calls getBinary(rowIdx).getStringValue(...). If a tag value is null (or the tag column type isn’t binary-compatible), this will throw at runtime. Consider following the Tablet#getDeviceID approach: check isNull(rowIdx) and use getObject(rowIdx)/toString() (or otherwise handle non-string tag types) so device ID generation is robust to nulls and supported tag types.

Suggested change
- segments[i + 1] =
-     tsBlock
-         .getValueColumns()[tagColumnIndexInTsBlock[i]]
-         .getBinary(rowIdx)
-         .getStringValue(TSFileConfig.STRING_CHARSET);
+ Column tagColumn = tsBlock.getValueColumns()[tagColumnIndexInTsBlock[i]];
+ if (tagColumn.isNull(rowIdx)) {
+   segments[i + 1] = null;
+ } else {
+   Object value = tagColumn.getObject(rowIdx);
+   segments[i + 1] = value == null ? null : value.toString();
+ }

}

@Test
public void test1() throws IOException, WriteProcessException {
Copilot AI Mar 27, 2026

The test method names test1, test2, test3 don’t communicate what scenarios are being validated (e.g., existing time column vs generated time column, with/without tag columns). Renaming them to reflect the specific behavior under test will make failures much easier to triage.

Suggested change
- public void test1() throws IOException, WriteProcessException {
+ public void testWriteWithExistingTimeColumnAndTagColumns() throws IOException, WriteProcessException {

Comment on lines +51 to +61
ValueChunkWriter[] valueChunkWriters = new ValueChunkWriter[valueColumns.length];
for (int i = 0; i < measurementSchemas.size(); i++) {
  valueChunkWriters[i] = tryToAddSeriesWriterInternal(measurementSchemas.get(i));
}
for (int rowIndex = startRowIndex; rowIndex < endRowIndex; rowIndex++) {
  long time = timeColumn.getLong(rowIndex);
  for (int valueColumnIndex = 0; valueColumnIndex < valueColumns.length; valueColumnIndex++) {
    Column valueColumn = valueColumns[valueColumnIndex];
    IMeasurementSchema measurementSchema = measurementSchemas.get(valueColumnIndex);
    ValueChunkWriter valueChunkWriter = valueChunkWriters[rowIndex];
    boolean isNull = valueColumn.isNull(rowIndex);
Copilot AI Mar 27, 2026

In write(...), the ValueChunkWriter is indexed by rowIndex (valueChunkWriters[rowIndex]) instead of by valueColumnIndex. This will throw ArrayIndexOutOfBoundsException as soon as endRowIndex exceeds the number of columns, and it also writes data into the wrong series. Use the value-column index when selecting the writer, and consider validating measurementSchemas.size() matches valueColumns.length before initializing/iterating to avoid AIOOB or null writers. Also, this method currently skips the out-of-order check (checkIsHistoryData(time)) that other chunk writers perform, so out-of-order writes may silently corrupt state; it should apply the same check (and adjust the throws signature accordingly).
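A minimal, self-contained sketch of the loop shape this comment asks for, assuming simplified stand-ins rather than the real TsFile classes: `ColumnIndexedWriteSketch` and `SeriesBuffer` are hypothetical, plain arrays replace `Column`, and the `time <= lastTime` test is only an assumed approximation of what `checkIsHistoryData(time)` does.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnIndexedWriteSketch {

  /** Hypothetical stand-in for a per-series writer; it only records what it receives. */
  static final class SeriesBuffer {
    final List<String> points = new ArrayList<>();

    void write(long time, Object value, boolean isNull) {
      points.add(time + "=" + (isNull ? "null" : value));
    }
  }

  /** Writes rows [startRowIndex, endRowIndex) and returns the number of rows written. */
  static int write(
      long[] times,
      Object[][] valueColumns, // valueColumns[column][row]
      SeriesBuffer[] writers,
      int startRowIndex,
      int endRowIndex) {
    // Validate up front so a size mismatch fails clearly instead of mid-write.
    if (writers.length != valueColumns.length) {
      throw new IllegalArgumentException(
          "writer count " + writers.length + " != column count " + valueColumns.length);
    }
    long lastTime = Long.MIN_VALUE;
    int pointCount = 0;
    for (int rowIndex = startRowIndex; rowIndex < endRowIndex; rowIndex++) {
      long time = times[rowIndex];
      // Simplified out-of-order guard (assumption: timestamps must strictly increase).
      if (pointCount > 0 && time <= lastTime) {
        throw new IllegalStateException("out-of-order time " + time + " at row " + rowIndex);
      }
      for (int columnIndex = 0; columnIndex < valueColumns.length; columnIndex++) {
        Object value = valueColumns[columnIndex][rowIndex];
        // The writer is selected by column index, never by row index.
        writers[columnIndex].write(time, value, value == null);
      }
      lastTime = time;
      pointCount++;
    }
    return pointCount;
  }
}
```

With two columns and three rows, indexing the writers by row would fail on the third row; indexing by column keeps each series' values in its own writer.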

    throws WriteProcessException {
  int pointCount = 0;
  for (int rowIndex = startRowIndex; rowIndex < endRowIndex; rowIndex++) {
    long time = timeColumn.getLong(rowIndex);
Contributor

It may be better to check timeColumn.isNull(rowIndex) here and throw an exception if the time value is null.
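One possible shape for that guard, as a hedged sketch: `TimeNullCheckSketch` and `requireTime` are hypothetical names, a `Long[]` stands in for the time column, and `IllegalArgumentException` is used only to keep the example self-contained (the surrounding method declares `WriteProcessException`).

```java
final class TimeNullCheckSketch {

  /** Returns the timestamp at rowIndex, or throws if the time column is null there. */
  static long requireTime(Long[] timeColumn, int rowIndex) {
    Long time = timeColumn[rowIndex];
    if (time == null) {
      // Fail fast: a row without a timestamp cannot be written to an aligned chunk.
      throw new IllegalArgumentException("time column is null at row " + rowIndex);
    }
    return time;
  }
}
```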

@JackieTien97 JackieTien97 merged commit 27af406 into apache:develop Mar 27, 2026
14 checks passed

3 participants