gh-151722: Fix data race on frozendict reads during construction in free-threading build#151724
Open
tonghuaroot wants to merge 3 commits into
Open
gh-151722: Fix data race on frozendict reads during construction in free-threading build#151724tonghuaroot wants to merge 3 commits into
tonghuaroot wants to merge 3 commits into
Conversation
…n in free-threading build len(), repr() and hash() on a frozendict read internal state without synchronization. That is safe for a finished frozendict but not while one is being constructed from a user mapping, where the half-built object is already observable from other threads. Read ma_used through GET_USED() and run the repr/hash entry-table walk inside a critical section, matching dict_repr / set_repr and set_len; the cached hash short-circuit stays lock-free like frozenset_hash.
Member
|
Let me take a look at better solution if possible, cc @vstinner |
Member
|
I prefer #151740 rather than this PR. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In the free-threading build,
len(fd),repr(fd)andhash(fd)on afrozendict(PEP 814) read the object's internal state withoutsynchronization. That is safe for a finished
frozendict, but not while one isbeing constructed from a user mapping: the half-built object is already
GC-tracked and reachable from other threads (the generic
frozendict_new→dict_mergepath runs userkeys()/__getitem__, which canleak it via
gc.get_objects()), so a reader races the construction-time insertsand resizes. ThreadSanitizer reports a data race, and a reader can observe the
frozendict's length and contents change, which contradicts its immutability.What
frozendict_length, thefrozendict_repr/copy_lock_heldempty-guards, andthe
frozendict_hashactive-entry factor now readma_usedthroughGET_USED()(FT_ATOMIC_LOAD_SSIZE_RELAXED), matchingdict_lengthandset_len.frozendict_reprnow runs the_PyDict_Nextentry-table walk(
anydict_repr_impl) insidePy_BEGIN_CRITICAL_SECTION, exactly likedict_reprand likeset_repr(thetp_reprofset/frozenset).frozendict_hashis split into afrozendict_hash_lock_heldhelper (theentry-table walk) wrapped in
Py_BEGIN_CRITICAL_SECTION, with the cachedma_hashshort-circuit left lock-free (atomic load), mirroringfrozenset_hash/frozenset_hash_impl.Why this matches the existing conventions
These are not new design choices; they mirror what
dictobject.candsetobject.calready do:dict_reprwrapsanydict_repr_implinPy_BEGIN_CRITICAL_SECTION(self).set_reprwrapsset_repr_lock_heldthe same way, and it is thetp_reprfor both
PySet_TypeandPyFrozenSet_Type— so a frozen type already uses acritical-sectioned repr.
holds
Py_BEGIN_CRITICAL_SECTION(a)on the target in the genericdict_mergebranch, and inserts run under it (
setitem_lock_held). A reader that takesthe object's critical section therefore serializes against construction.
frozendict_hashwas copied fromfrozenset_hash; keeping the cached-hashshort-circuit lock-free with an atomic load keeps that parity.
ma_usedreads going throughGET_USED()matchesdict_lengthandset_len.The construction (write) side already uses
STORE_USED(
FT_ATOMIC_STORE_SSIZE_RELAXED) and a critical section, and is unchanged.Tests / verification
On a free-threading + ThreadSanitizer build
(
./configure --disable-gil --with-thread-sanitizer), HEAD9688d252d330b0b586760a121ee8c8f7215176e8:len/repr/hashon afrozendictthat is stillbeing constructed (leaked via
gc.get_objects()from__getitem__) reportszero ThreadSanitizer warnings with this change, where unpatched
mainreports data races on all three.
__getitem__,get,in, iteration,__eq__) were already free-threading-safe and remain clean.FrozenDictTests/FrozenDictMappingTests(21 tests),test_setandtest_free_threadingall pass on the same build;test_dictpasses.Note on
frozendictcopycopy_lock_heldwas deliberately made lock-free forfrozendictsources in#145920. It reads source state the same way and so has the same latent
construction-time race, but re-adding a critical section there would revert that
performance decision, so this PR does not touch the copy table walk (only the
scalar
ma_usedguard, which is a pure atomicity fix). The construction-windowquestion for copy is discussed in the linked issue.