Add unified parse_json runtime type checker (#3285) [draft / proof-of-direction]#4063
Add unified parse_json runtime type checker (#3285) [draft / proof-of-direction]#4063vup903 wants to merge 8 commits into
Conversation
Introduce zarr.core.json_parse.parse_json, a single type-annotation-driven validator that consolidates the scattered per-field parse_* helpers. Handles primitives, Literal, unions/Optional, fixed and variadic tuples, Sequence/list (coerced to tuple), Mapping/dict, and TypedDict, with a bool-vs-int safe primitive check. Adds tests/test_json_parse.py (94 tests) and a changelog fragment. No existing call sites migrated yet; this is the proof-of-direction module. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r-developers#3285) Pilot migration delegating three representative helpers to the unified parse_json validator. Public signatures and return types are unchanged. parse_zarr_format re-wraps parse_json's ValueError/TypeError as MetadataValidationError with the original message to preserve observable behavior. parse_json is imported function-locally to avoid circular imports. Focused suites green: test_common/test_config/test_metadata/test_json_parse = 430 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ns (zarr-developers#3285) Apply ruff check/format to satisfy the Lint CI hook. Keep typing.Union/Optional spellings in tests (with noqa) to cover that origin path alongside X | Y. Rework _parse_typeddict to derive required/optional from get_type_hints(include_extras=True) + __total__ instead of __required_keys__, so class-syntax NotRequired is detected even when 'from __future__ import annotations' stringizes the hints (as in zarr's metadata modules). test_json_parse + test_common + test_metadata = 393 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lopers#3285) get_origin(hint) is Required/NotRequired tripped mypy's comparison-overlap and unreachable checks; typing origin as Any keeps the runtime check while satisfying mypy. Full 'uv run --frozen mypy' is clean (190 files); ruff check/format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Delegate the literal/type check of parse_indexing_order, parse_node_type, parse_node_type_array, parse_separator, two parse_zarr_format variants (group, v2), and parse_name to the unified parse_json. Public signatures and return types unchanged; each preserves its original exception type and message (wrapping parse_json's ValueError/TypeError into MetadataValidationError / NodeTypeValidationError / the original ValueError/TypeError where tests assert on them). parse_json imported function-locally to avoid circular imports. Focused suites + full mypy green (1196 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s#3285) Delegate the int/bool type check of parse_checksum, parse_clevel, parse_blocksize, parse_typesize, parse_gzip_level, and parse_zstd_level to parse_json, keeping each helper's range/bound check and exact error messages. Original exception types are preserved by wrapping parse_json's ValueError/TypeError. Note: parse_json rejects bool where the old isinstance(data, int) accepted it; verified no caller/test passes a bool to these, so this is a deliberate, more-correct strictening. parse_json imported function-locally. Codec suite + full mypy green (794 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4063 +/- ##
==========================================
- Coverage 93.50% 93.49% -0.01%
==========================================
Files 90 91 +1
Lines 11979 12132 +153
==========================================
+ Hits 11201 11343 +142
- Misses 778 789 +11
🚀 New features to boost your workflow:
|
|
Hi @d-v-b, I put up a draft so you can see where I'm heading before I take it further. So far it adds On the de-dup side, I've moved 16 of the scattered Before I migrate the rest, a few things I'd like your read on:
I'm holding off on the One heads up on CI: the failing Test jobs are a pre-existing collection error in Let me know what you think on the four questions and I'll keep going. |
|
@vup903, if we are willing to add |
|
nearly all of the types we need are now defined in zarr-metadata, so we should investigate if msgspec can handle all of those types correctly. zarr-metadata itself uses pydantic for its test suite, and I think i ended up choosing that after tryng msgspec and running into some issues, but I don't recall what they were. Maybe something to do with tuples? If its helpful, we could widen the JSON collection types in zarr-metadata to |
|
Thanks @chuckwondo, that's a good idea. I spent some time poking at The good news is that most of the categories work out of the box, including the one you were worried about, @d-v-b:
The two things that don't work are both load-bearing for the real
So my read is that msgspec handles the leaf/primitive/tuple cases really well, but it can't currently validate the two zarr-specific constructs (
Happy to prototype whichever direction you prefer. What's your instinct here? |
these are both pretty important features! it's too bad that msgspec doesn't support them. but i don't think we need to reach 100% with just 1 tool. I would be interested in seeing how far we could get with msgspec, even if it doesn't get us all the way. The metric here should be increasing code simplicity (removing code) + catching bugs. |
Summary
This is a draft / proof-of-direction PR for #3285, opened to show the intended approach and get early feedback before migrating all call sites — per @d-v-b's request in the issue thread.
It introduces a single unified runtime type checker,
parse_json(value, type_annotation), that consolidates the scattered per-fieldparse_*helpers (there are currently ~42 of them acrosssrc/zarr). The function validates a JSON-decoded value against a type annotation and returns data assignable to that annotation (coercing where sensible, e.g.parse_json([1, 2, 3], tuple[int, int, int]) -> (1, 2, 3)), or raises a clear error.What's in this draft
src/zarr/core/json_parse.py—parse_jsonplus sub-parsers, scoped to the JSON-relevant categories named in the issue:None,str,int,float,bool(with a bool-vs-int-safe check:Trueis not accepted forint, and1is not accepted forbool)Literal[...]OptionaltupleSequence[T]/list[T]→ returned as a tupleMapping[str, T]/dict[str, T]TypedDicttests/test_json_parse.py— 94 tests covering every category, incl. the bool/int edge cases, coercion, unions, nested TypedDict.parse_json, with public signatures and behavior unchanged:core/common.py::parse_order→parse_json(data, Literal["C", "F"])core/common.py::parse_bool→parse_json(data, bool)core/metadata/v3.py::parse_zarr_format→parse_json(data, Literal[3]), re-wrapping the error asMetadataValidationErrorto preserve the original exception type and message.Focused suites pass locally (Python 3.12,
test.py3.12-optionalenv):test_common,test_config,test_metadata, andtest_json_parse→ 430 passed.Open questions for maintainers (direction feedback)
src/zarr/core/json_parse.pyis a placeholder; happy to move it (e.g. into a typing/metadata utility module).parse_jsonraises plainValueError/TypeErrorsince it's field-agnostic; field-specific callers (likeparse_zarr_format) re-wrap into their domain error. Does that split match what you'd want?NotRequired— the current_parse_typeddictreads__required_keys__/__optional_keys__directly. Underfrom __future__ import annotationswith class-syntax TypedDicts,typing_extensionscan't see theNotRequiredwrapper at class-creation time, so an optional key can be mis-classified as required. The robust fix is to resolve hints viatyping_extensions.get_type_hints(..., include_extras=True)and inspect forRequired/NotRequired. I've left this for the follow-up once the overall direction is confirmed, since none of the three pilot helpers are TypedDicts.If the direction looks right, I'll migrate the remaining
parse_*call sites incrementally in follow-up commits.Closes #3285 once fully migrated (draft for now).