Skip to content

HDDS-14661. Handle split table writes for Multipart Upload#10588

Open
spacemonkd wants to merge 1 commit into
apache:masterfrom
spacemonkd:HDDS-14661
Open

HDDS-14661. Handle split table writes for Multipart Upload#10588
spacemonkd wants to merge 1 commit into
apache:masterfrom
spacemonkd:HDDS-14661

Conversation

@spacemonkd

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

HDDS-14661. Handle split table writes for Multipart Upload

Please describe your PR in detail:

Implements the runtime support for schemaVersion 1 multipart uploads using the split multipartPartsTable.

This patch covers the MPU lifecycle after an upload is already marked as schemaVersion 1:

  • CommitPart writes part metadata into multipartPartsTable instead of growing MultipartInfoTable.
  • Complete MPU reads parts from multipartPartsTable, assembles the final key, and deletes all consumed part rows.
  • Abort MPU scans split-table parts, releases quota, moves part blocks to the deleted table, and removes part rows.
  • Expired MPU cleanup follows the same split-table cleanup path for abandoned uploads.
  • ListParts reads ordered part metadata from multipartPartsTable.
  • Legacy schemaVersion 0 behavior remains unchanged.

This patch intentionally does not decide when new uploads become schemaVersion 1.
PR #10062 owns the upgrade/finalization and initiate-MPU schema selection. Once that branch starts creating schemaVersion 1 MPUs, this patch provides the runtime read/write/cleanup behavior needed to handle them end to end.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14661

How was this patch tested?

Patch was tested using unit tests

@spacemonkd spacemonkd self-assigned this Jun 23, 2026
@spacemonkd spacemonkd added the s3 S3 Gateway label Jun 23, 2026
@spacemonkd spacemonkd requested a review from ivandika3 June 23, 2026 06:21
@spacemonkd spacemonkd assigned errose28 and rakeshadr and unassigned rakeshadr and errose28 Jun 23, 2026
@spacemonkd spacemonkd marked this pull request as ready for review June 25, 2026 19:03
@spacemonkd

Copy link
Copy Markdown
Contributor Author

@errose28 @ivandika3 @kerneltime could you please take a look at this PR?
I plan on merging this PR first to put the wiring in place and then simultaneously merge the upgrade PR as they are co-dependent.

Having both of these ready to merge so we can do it in quick succession would be great!

@ivandika3 ivandika3 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @spacemonkd for the patch. Overall looks good. Left some comments.

Let's get this in asap, this feature has been outstanding quite a while.

/**
* Cache-aware scanner for multipart parts table rows.
*/
public final class MultipartPartScanUtil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd suggest to move this to OMMultipartUploadUtils so we only need a single util class for MPU.

Comment on lines +51 to +88
try (TableIterator<OmMultipartPartKey,
? extends Table.KeyValue<OmMultipartPartKey, OmMultipartPartInfo>>
iterator = omMetadataManager.getMultipartPartsTable().iterator(prefix)) {
while (iterator.hasNext()) {
Table.KeyValue<OmMultipartPartKey, OmMultipartPartInfo> kv =
iterator.next();
if (kv == null) {
continue;
}
OmMultipartPartKey key = kv.getKey();
if (!uploadId.equals(key.getUploadId())) {
break;
}
if (key.hasPartNumber()) {
parts.put(key.getPartNumber(), kv.getValue());
}
}
}

Iterator<Map.Entry<CacheKey<OmMultipartPartKey>,
CacheValue<OmMultipartPartInfo>>> cacheIterator =
omMetadataManager.getMultipartPartsTable().cacheIterator();
while (cacheIterator.hasNext()) {
Map.Entry<CacheKey<OmMultipartPartKey>, CacheValue<OmMultipartPartInfo>>
cacheEntry = cacheIterator.next();
OmMultipartPartKey key = cacheEntry.getKey().getCacheKey();
if (!uploadId.equals(key.getUploadId()) || !key.hasPartNumber()) {
continue;
}
OmMultipartPartInfo value = cacheEntry.getValue().getCacheValue();
if (value == null) {
parts.remove(key.getPartNumber());
} else {
parts.put(key.getPartNumber(), value);
}
}

return parts;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, in OmMetadataReader listKeys and getMultipartUploadKeys, we will iterate the cache first before the DB. Any reason here is reversed? Not saying it's wrong, just want to standardize.

Comment on lines +268 to +270
if (multipartKeyInfo.getSchemaVersion()
== OmMultipartKeyInfo.LEGACY_SCHEMA_VERSION
&& null != oldPartKeyInfo) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: A lot of unnecessary line wraps here which AI agent usually does. We can try to fix unnecessary wraps introduced in this patch as much as possible (within reason).

Comment on lines +423 to +426
private OmMultipartPartKey getMultipartPartKey(String uploadId,
int partNumber) {
return OmMultipartPartKey.of(uploadId, partNumber);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Let's make this inline instead of introducing another method.

Comment on lines +428 to +440
private void validateSplitPartInfo(OmKeyInfo omKeyInfo, int partNumber)
throws OMException {
if (StringUtils.isBlank(omKeyInfo.getMetadata().get(OzoneConsts.ETAG))) {
throw new OMException("Missing ETag for multipart upload part "
+ partNumber, OMException.ResultCodes.INVALID_REQUEST);
}
if (omKeyInfo.getKeyLocationVersions() == null
|| omKeyInfo.getKeyLocationVersions().isEmpty()
|| omKeyInfo.getLatestVersionLocations().getLocationList().isEmpty()) {
throw new OMException("Missing block locations for multipart upload part "
+ partNumber, OMException.ResultCodes.INVALID_REQUEST);
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have this logic in the schema version 0? OM might allow uploading an empty part (although S3G might fail it early).

Comment on lines +666 to +674
if (eTagBasedValidationAvailable) {
requestPartId = part.getETag();
omPartId = multipartPartInfo.getETag();
} else {
requestPartId = part.getPartName();
omPartId = multipartPartInfo.getPartName();
}
requestPart = new MultipartCommitRequestPart(
requestPartId, omPartId, StringUtils.equals(requestPartId, omPartId));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to call the ETag validation here?

Comment on lines +94 to +105
for (OmKeyInfo currentKeyPartInfo :
abortInfo.getPartsKeyInfoToDelete()) {
addPartToDeletedTable(omMetadataManager, batchOperation,
omBucketInfo, abortInfo, currentKeyPartInfo,
omMultipartKeyInfo.getUpdateID());
}

// multi-part key format is volumeName/bucketName/keyName/uploadId
String deleteKey = omMetadataManager.getOzoneDeletePathKey(
currentKeyPartInfo.getObjectID(), abortInfo.getMultipartKey());

omMetadataManager.getDeletedTable().putWithBatch(batchOperation,
deleteKey, repeatedOmKeyInfo);
for (OmMultipartPartKey partKey :
abortInfo.getPartsTableKeysToDelete()) {
omMetadataManager.getMultipartPartsTable().deleteWithBatch(
batchOperation, partKey);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, but if it's possible we can use deleteRange in the future to reduce the tombstones. However, deleteRange need to be done carefully since it can invalidate valid DB entries.

Comment on lines +83 to +86
this.multipartPartKey = multipartPartKey;
this.openKey = openKey;
this.omMultipartKeyInfo = omMultipartKeyInfo;
this.omMultipartPartInfo = omMultipartPartInfo;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to add a precondition here that if mutlipartPartKey and omMultipartPartInfo need to either be both null or both not null.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

s3 S3 Gateway

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants