Pipe: merge batched aligned chunks in scan parser#18010
Open
Caideyipi wants to merge 2 commits into



Description
This PR improves the pipe TsFile scan parser for legal aligned TsFiles whose value chunks are physically written in column batches, such as files produced by batched aligned compaction.
The current scan parser emits an aligned tablet when the value chunk occurrence index changes. For batched aligned compaction output, value chunks can be laid out as:
This layout is valid, but the previous parser behavior makes the emitted tablets inherit the physical compaction batch width, commonly 10 columns from compaction_max_aligned_series_num_in_one_batch, even when pipe reader memory allows a wider aligned tablet. That increases the number of tablets and hurts pipe performance.
This PR changes the scan parser to cache pending aligned value chunk groups by time chunk index and emit them only when memory limits or chunk group boundaries require it. With enough memory, consecutive physical value column batches for the same aligned chunks are merged into wider aligned tablets instead of being split at the compaction batch boundary.
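The caching behavior described above can be illustrated with a minimal sketch. It is not the code in TsFileInsertionEventScanParser: the class AlignedBatchMergeSketch, its methods, the per-batch byte estimate, and the flush-everything policy are all assumptions made for illustration. It only shows the idea of keying pending value columns by time chunk index and flushing on a memory limit or a chunk group boundary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: class, method, and field names are illustrative and
// are not taken from TsFileInsertionEventScanParser.
class AlignedBatchMergeSketch {

  /** One emitted aligned tablet: the columns collected for a single time chunk. */
  static final class Tablet {
    final int timeChunkIndex;
    final List<String> columns;

    Tablet(int timeChunkIndex, List<String> columns) {
      this.timeChunkIndex = timeChunkIndex;
      this.columns = columns;
    }
  }

  private final long memoryBudgetBytes;
  // Pending value columns keyed by time chunk index, in first-seen order.
  private final Map<Integer, List<String>> pending = new LinkedHashMap<>();
  private long pendingBytes = 0;
  private final List<Tablet> emitted = new ArrayList<>();

  AlignedBatchMergeSketch(long memoryBudgetBytes) {
    this.memoryBudgetBytes = memoryBudgetBytes;
  }

  /** Accepts one physical batch of value columns for the given time chunk. */
  void accept(int timeChunkIndex, List<String> columns, long estimatedBytes) {
    // Flush first if caching this batch would exceed the assumed memory budget.
    if (!pending.isEmpty() && pendingBytes + estimatedBytes > memoryBudgetBytes) {
      flush();
    }
    pending.computeIfAbsent(timeChunkIndex, k -> new ArrayList<>()).addAll(columns);
    pendingBytes += estimatedBytes;
  }

  /** A chunk group boundary always forces the pending columns out. */
  void endChunkGroup() {
    flush();
  }

  List<Tablet> emitted() {
    return emitted;
  }

  private void flush() {
    for (Map.Entry<Integer, List<String>> e : pending.entrySet()) {
      emitted.add(new Tablet(e.getKey(), new ArrayList<>(e.getValue())));
    }
    pending.clear();
    pendingBytes = 0;
  }
}
```

In this sketch, feeding two column batches (s1, s2 and s3, s4) for two time chunks with a budget that holds all four batches yields two tablets of four columns each. With a budget that holds only two batches, the same input yields four tablets of two columns each, which corresponds to the split-at-batch-boundary behavior.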
It also defines pipeDataStructureTabletRowSize <= 0 as disabling the row-count cap for pipe tablets. In that mode, tablet row count is calculated only from pipe_data_structure_tablet_size_in_bytes, so users can rely on the memory-size limit instead of the fixed row-count limit.
Changes
- Merge batched aligned value chunk groups in TsFileInsertionEventScanParser.
- Treat non-positive pipeDataStructureTabletRowSize as no row-count cap in PipeMemoryWeightUtil.
- Add tests for the merged aligned tablets and for 0/negative values.
Validation
- mvn spotless:apply -pl iotdb-core/datanode
- git diff --check
I also tried:
- mvn -Ddevelocity.off=true -pl iotdb-core/datanode -DskipTests compile
- mvn -Ddevelocity.off=true -Dmaven.main.skip=true -pl iotdb-core/datanode -Dtest=TsFileInsertionEventParserTest#testScanParserMergesBatchedAlignedValueChunkGroups+testPipeTabletRowSizeCanBeDisabledByNonPositiveValue test
- mvn -pl iotdb-core/datanode -Dtest=TsFileInsertionEventParserTest#testScanParserMergesBatchedAlignedValueChunkGroups+testScanParserFlushesBatchedAlignedValueChunkGroupsByMemoryLimit+testPipeTabletRowSizeCanBeDisabledByNonPositiveValue test
These Maven compile/test attempts are blocked in this workspace by existing datanode-wide compile issues outside this PR, including generated query fill/aggregation classes and unresolved IOUtils.readFully symbols in unrelated files. The focused tests did not get executed because compilation fails before Surefire runs.
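For reviewers, the non-positive pipeDataStructureTabletRowSize semantics from the Description can be sketched as below. This is an assumption-laden illustration, not the code in PipeMemoryWeightUtil: the class TabletRowCountSketch, the rowCount method, the per-row byte estimate, and the minimum of one row are all hypothetical. Only the rule "a cap <= 0 means the byte budget alone decides" comes from this PR.

```java
// Hypothetical sketch: names and the exact formula are illustrative and are
// not taken from PipeMemoryWeightUtil.
class TabletRowCountSketch {

  /**
   * @param tabletSizeInBytes byte budget, standing in for pipe_data_structure_tablet_size_in_bytes
   * @param estimatedBytesPerRow assumed per-row size estimate
   * @param rowSizeCap stands in for pipeDataStructureTabletRowSize; <= 0 disables the cap
   */
  static int rowCount(long tabletSizeInBytes, long estimatedBytesPerRow, int rowSizeCap) {
    // Rows that fit in the byte budget; assume at least one row is always allowed.
    long byBytes = Math.max(1L, tabletSizeInBytes / Math.max(1L, estimatedBytesPerRow));
    // A positive cap bounds the row count; a non-positive cap is ignored.
    long rows = rowSizeCap > 0 ? Math.min(byBytes, (long) rowSizeCap) : byBytes;
    return (int) Math.min(rows, (long) Integer.MAX_VALUE);
  }

  public static void main(String[] args) {
    System.out.println(rowCount(2_097_152L, 64L, 2048)); // cap applies
    System.out.println(rowCount(2_097_152L, 64L, 0));    // cap disabled
  }
}
```

Under these assumed numbers, a 2 MiB budget with 64-byte rows gives 2048 rows when the cap is 2048 and 32768 rows when the cap is 0 or negative.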