IGNITE-28806 Fix dynamic cache group affinity init during first local join #13263

Open
oleg-vlsk wants to merge 1 commit into apache:master from oleg-vlsk:IGNITE-28806

Conversation

@oleg-vlsk

Contributor

… join

Thank you for submitting a pull request to Apache Ignite.

In order to streamline the review of the contribution
we ask you to ensure the following steps have been taken:

The Contribution Checklist

  • There is a single JIRA ticket related to the pull request.
  • The web-link to the pull request is attached to the JIRA ticket.
  • The JIRA ticket has the Patch Available state.
  • The pull request body describes changes that have been made.
    The description explains WHAT was changed and WHY, rather than HOW.
  • The pull request title is treated as the final commit message.
    The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
  • A reviewer has been mentioned through the JIRA comments
    (see the Maintainers list)
  • The pull request has been checked by the Teamcity Bot and
    the green visa attached to the JIRA ticket (see tab PR Check at TC.Bot - Instance 1 or TC.Bot - Instance 2)

Notes

If you need any help, please email dev@ignite.apache.org or ask for advice in the #ignite channel on http://asf.slack.com.

Comment on lines +2094 to +2115
private boolean skipNotStartedDynamicGroupOnFirstLocalJoin(
    GridDhtPartitionsExchangeFuture fut,
    CacheGroupDescriptor desc,
    @Nullable CacheGroupContext grp,
    boolean newAff
) {
    if (grp != null)
        return false;

    if (newAff)
        return false;

    if (!firstLocalJoinExchange(fut))
        return false;

    AffinityTopologyVersion grpStartTopVer = desc.startTopologyVersion();

    if (grpStartTopVer == null)
        return false;

    return grpStartTopVer.after(fut.initialVersion());
}

Contributor

Can we do something like

/** */
private boolean skipNotStartedDynamicGroup(
    GridDhtPartitionsExchangeFuture fut,
    CacheGroupDescriptor desc,
    @Nullable CacheGroupContext grp,
    boolean newAff
) {
    if (grp != null || newAff)
        return false;

    AffinityTopologyVersion grpStartTopVer = desc.startTopologyVersion();

    return grpStartTopVer != null && grpStartTopVer.after(fut.initialVersion());
}

so that this check is based on the cache group start version instead of the first local join?
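
To make the effect of the suggested simplification easier to see, here is a minimal, self-contained sketch of the proposed predicate. It does not use the real Ignite classes: `TopVer` is a hypothetical stand-in for `AffinityTopologyVersion` (assumed to compare by major, then minor version, with `after` meaning strictly later), and the `grpStarted` flag stands in for `grp != null`. Compared with the version in the diff, the `firstLocalJoinExchange(fut)` condition is gone, so the result depends only on the group state and the two versions.

```java
/** Stand-in model of the suggested predicate; these are not the real Ignite classes. */
final class SkipCheckSketch {
    /** Hypothetical stand-in for AffinityTopologyVersion: a major and a minor version. */
    static final class TopVer implements Comparable<TopVer> {
        final long major;
        final int minor;

        TopVer(long major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        @Override public int compareTo(TopVer o) {
            int cmp = Long.compare(major, o.major);

            return cmp != 0 ? cmp : Integer.compare(minor, o.minor);
        }

        /** Assumed semantics: strictly later than the other version. */
        boolean after(TopVer o) {
            return compareTo(o) > 0;
        }
    }

    /**
     * @param grpStarted Stands in for {@code grp != null}.
     * @param newAff Same flag as in the suggestion.
     * @param grpStartTopVer Stands in for {@code desc.startTopologyVersion()}, may be null.
     * @param exchInitVer Stands in for {@code fut.initialVersion()}.
     */
    static boolean skipNotStartedDynamicGroup(
        boolean grpStarted,
        boolean newAff,
        TopVer grpStartTopVer,
        TopVer exchInitVer
    ) {
        if (grpStarted || newAff)
            return false;

        return grpStartTopVer != null && grpStartTopVer.after(exchInitVer);
    }

    public static void main(String[] args) {
        TopVer exch = new TopVer(5, 0);

        // Group starts later than the exchange's initial version: skipped.
        System.out.println(skipNotStartedDynamicGroup(false, false, new TopVer(5, 1), exch)); // true

        // Group starts at or before the exchange's initial version: not skipped.
        System.out.println(skipNotStartedDynamicGroup(false, false, new TopVer(5, 0), exch)); // false
        System.out.println(skipNotStartedDynamicGroup(false, false, new TopVer(4, 3), exch)); // false

        // Unknown start version, already started group, or new affinity: not skipped.
        System.out.println(skipNotStartedDynamicGroup(false, false, null, exch)); // false
        System.out.println(skipNotStartedDynamicGroup(true, false, new TopVer(5, 1), exch)); // false
        System.out.println(skipNotStartedDynamicGroup(false, true, new TopVer(5, 1), exch)); // false
    }
}
```

Whether dropping the first-local-join condition is safe for exchanges other than the first local join is exactly the open question here; the sketch only shows what the predicate itself evaluates to.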

Comment on lines +46 to +85
private static final String ERR_MSG = "Invalid exchange futures state";

/** */
private static final int SRV_NODES = 3;

/** */
private static final int CLIENT_THREADS = 32;

/** */
private static final int RESTART_CNT = 10;

/** */
private static final int FIRST_CLIENT_PORT = 10800;

/** */
private static final int CACHE_CFG_CNT = 30;

/** */
private final AtomicBoolean stopClients = new AtomicBoolean();

/** */
private final AtomicInteger clientCreateSuccesses = new AtomicInteger();

/** */
private final AtomicReference<Throwable> caughtErr = new AtomicReference<>();

/** */
private final CountDownLatch caughtLatch = new CountDownLatch(1);

/** */
private final Map<Integer, ClientCacheConfiguration> cacheConfs = new HashMap<>();

/** */
private Thread.UncaughtExceptionHandler oldUncaughtHnd;

/** */
private IgniteInternalFuture<?> startFut;

/** */
private IgniteInternalFuture<?> clientFut;

Contributor

I think this test is too heavy for this fix. It looks more like a stress or probabilistic test than a deterministic regression test: even if the bug reappears, the test can still pass if the race does not occur during the run.

I suggest reverting this test in a separate commit, in case another reviewer asks to restore it later. Instead, we can add a smaller happy-path test to check that we do not skip extra cache groups by mistake.
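
The concern about a probabilistic test passing while the bug is present can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that each of the `RESTART_CNT = 10` restarts is an independent chance to hit the race with some probability `p`; both the independence and the sample values of `p` are assumptions, not measurements from this test.

```java
/** Rough model of how often a race-based test passes even though the bug is present. */
final class MissProbabilitySketch {
    /**
     * @param p Assumed probability that a single attempt triggers the race.
     * @param attempts Number of independent attempts in one test run.
     * @return Probability that no attempt triggers the race, so the test passes.
     */
    static double missProbability(double p, int attempts) {
        if (p < 0.0 || p > 1.0 || attempts < 0)
            throw new IllegalArgumentException("p must be in [0, 1] and attempts must be >= 0");

        return Math.pow(1.0 - p, attempts);
    }

    public static void main(String[] args) {
        int restartCnt = 10; // RESTART_CNT from the test under review.

        // Hypothetical per-restart race probabilities.
        for (double p : new double[] {0.01, 0.05, 0.2}) {
            System.out.printf("p=%.2f -> run passes despite the bug with probability %.3f%n",
                p, missProbability(p, restartCnt));
        }
    }
}
```

Under these assumptions, a race that fires on 5% of restarts would still let about 60% of runs pass with the bug in place, which is why a deterministic check is the more reliable regression guard.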


2 participants