When including a ; in a query string, we unexpectedly receive a 403 Forbidden error. See: https://cn.dataone.org/cn/v2/object?formatId=science-on-schema.org/Dataset;ld%2Bjson. If that semicolon is escaped, we get the expected result: https://cn.dataone.org/cn/v2/object?formatId=science-on-schema.org/Dataset%3Bld%2Bjson.
The documentation gives these sets of characters for path segments and query segments:
[pchar] - [‘+’] as the set of unescaped characters for identifiers in path segments, and [pchar] - [‘+’, ‘&’, ‘=’] + [‘/’, ‘?’] for identifiers in query segments
However, the code indicates that ; is in the list of unescaped characters for query segments. Line 141 of that same file shows awareness of the Apache 403 issue, so it's unclear why the ; was added back to the unescaped list.
I originally found this issue in metadig, with this stack trace:
at org.dataone.service.util.ExceptionHandler.deserializeHtmlAndThrowException(ExceptionHandler.java:423)
at org.dataone.service.util.ExceptionHandler.deserializeAndThrowException(ExceptionHandler.java:372)
at org.dataone.service.util.ExceptionHandler.deserializeAndThrowException(ExceptionHandler.java:313)
at org.dataone.service.util.ExceptionHandler.filterErrors(ExceptionHandler.java:107)
at org.dataone.client.rest.HttpMultipartRestClient.doGetRequest(HttpMultipartRestClient.java:343)
at org.dataone.client.rest.HttpMultipartRestClient.doGetRequest(HttpMultipartRestClient.java:328)
at org.dataone.client.v2.impl.MultipartCNode.listObjects(MultipartCNode.java:340)
at edu.ucsb.nceas.mdqengine.scheduler.RequestReportJob.getPidsToProcess(RequestReportJob.java:545)
From the logs, this was the actual request:
2026-06-24T17:53:33.966+0000|GET|403|9|128.111.85.19|/cn/v2/object|"?fromDate=2025-09-01T00:00:00.001%2B00:00&toDate=2026-06-24T17:52:13.996%2B00:00&formatId=science-on-schema.org/Dataset;ld%2Bjson&nodeId=urn:node:DRYAD&start=0&count=1000"|-|"Apache-HttpClient/4.5.14 (Java/17.0.4.1)"
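To illustrate the encoding I would expect for the formatId value in the request above, here is a minimal sketch. It assumes the unescaped set for query values is the documented one with ; also removed. The class name QueryValueEncoder, the method encodeQueryValue, and the exact contents of SAFE_PUNCTUATION are hypothetical and are not taken from the DataONE library code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the DataONE libraries.
final class QueryValueEncoder {

    // Assumed set of punctuation left unescaped in a query value:
    // pchar minus '+', '&', '=' and ';', plus '/' and '?'.
    private static final String SAFE_PUNCTUATION = "-._~!$'()*,:@/?";

    private static boolean isUnescaped(char c) {
        return (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || SAFE_PUNCTUATION.indexOf(c) >= 0;
    }

    // Percent-encodes every UTF-8 byte that is not in the assumed safe set.
    static String encodeQueryValue(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int unsigned = b & 0xFF;
            char c = (char) unsigned;
            if (unsigned < 128 && isUnescaped(c)) {
                out.append(c);
            } else {
                out.append(String.format("%%%02X", unsigned));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String formatId = "science-on-schema.org/Dataset;ld+json";
        System.out.println("formatId=" + encodeQueryValue(formatId));
    }
}
```

With this set, the ; in the formatId is sent as %3B and the + as %2B, which matches the URL form that returns the expected result.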