fix: detect ScyllaDB via SUPPORTED protocol extensions, not shard count#910
fix: detect ScyllaDB via SUPPORTED protocol extensions, not shard count#910mykaul wants to merge 1 commit into
Conversation
|
Important Review skippedNo new commits to review since the last review. ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthrough
Sequence Diagram(s)sequenceDiagram
participant ProtocolFeatures
participant ControlConnection
participant SchemaParserV3
ProtocolFeatures->>ProtocolFeatures: parse_from_supported(supported)
ProtocolFeatures->>ProtocolFeatures: detect_scylla(supported, sharding_info)
ProtocolFeatures-->>ControlConnection: is_scylla
ControlConnection->>ControlConnection: set _uses_peers_v2 and metadata timeout
ControlConnection->>SchemaParserV3: _is_not_scylla(features)
SchemaParserV3-->>ControlConnection: not is_scylla
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
The driver previously identified ScyllaDB by the presence of shard-related fields (SCYLLA_NR_SHARDS, SCYLLA_SHARD, etc.) in the SUPPORTED response. When shard-awareness is disabled on the server side (allow_shard_aware_drivers: false) those fields are absent, causing the driver to misidentify a ScyllaDB cluster as Cassandra. The immediate consequence is that the driver enables peers_v2 (which ScyllaDB does not support) and fails to connect -- the same regression described in scylladb/gocql#902. Fix (mirrors scylladb/gocql#903): * Add ProtocolFeatures.is_scylla: set to True whenever ANY known Scylla-specific extension key (SCYLLA_LWT_ADD_METADATA_MARK, SCYLLA_RATE_LIMIT_ERROR, TABLETS_ROUTING_V1) is present in the SUPPORTED response, or whenever sharding_info is populated. This flag stays True even when sharding is disabled. * ProtocolFeatures.sharding_info remains None when sharding is disabled, so shard-aware connection pooling (pool.py) is correctly left inactive. * cluster.py: use is_scylla (not sharding_info is not None) to gate the peers_v2 disable and the USING TIMEOUT metadata request timeout. * metadata.py _is_not_scylla(): use is_scylla instead of the previous check (shard_id is None), which was always False after the OPTIONS exchange and therefore silently broke Cassandra trigger-metadata queries. * protocol_features.py parse_sharding_info(): fix a latent crash -- if SCYLLA_PARTITIONER or SCYLLA_SHARDING_ALGORITHM were present without SCYLLA_SHARD, int(None) would raise TypeError. Default shard_id to 0. Fixes: scylladb/python-driver#<TBD> See also: scylladb/gocql#902, scylladb/gocql#903 Fixes: scylladb#909
a64d05e to
bea45ac
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes ScyllaDB detection by using Scylla-specific SUPPORTED protocol extension keys (and sharding info when present) instead of relying on shard-related fields alone, preventing misclassification as Cassandra when shard-awareness is disabled and avoiding peers_v2-related connection failures.
Changes:
- Add
ProtocolFeatures.is_scyllaand implement Scylla detection via known Scylla-specificSUPPORTEDextension keys. - Harden sharding parsing/initialization against missing shard fields to prevent
int(None)crashes. - Gate peers_v2 usage,
USING TIMEOUT, and trigger-metadata behavior onis_scyllainstead of shard-awareness heuristics; add unit tests for detection/regressions.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
cassandra/protocol_features.py |
Introduces is_scylla detection and adjusts sharding parsing to avoid crashes. |
cassandra/shard_info.py |
Guards integer conversion for missing shard-related fields. |
cassandra/c_shard_info.pyx |
Mirrors shard-info conversion guards in the Cython implementation. |
cassandra/cluster.py |
Switches peers_v2 disabling and metadata timeout gating to is_scylla. |
cassandra/metadata.py |
Updates Scylla-vs-not logic to use features.is_scylla. |
tests/unit/test_protocol_features.py |
Adds unit tests for Scylla detection and crash regressions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| resolved_shard_id = int(shard_id) if shard_id is not None else 0 | ||
| return resolved_shard_id, _ShardingInfo(shard_id, shards_count, partitioner, sharding_algorithm, sharding_ignore_msb, | ||
| shard_aware_port, shard_aware_port_ssl) |
| # SCYLLA_PARTITIONER passes the sharding guard, so sharding_info is populated | ||
| # with zero defaults rather than crashing. | ||
| assert pf.sharding_info is not None | ||
| assert pf.sharding_info.shards_count == 0 | ||
| assert pf.sharding_info.sharding_ignore_msb == 0 |
| assert pf.is_scylla is True | ||
| assert pf.sharding_info is not None | ||
| assert pf.sharding_info.shards_count == 0 |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/test_protocol_features.py (1)
79-108: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert the normalized
pf.shard_idin these regressions.These cases exercise the missing-
SCYLLA_SHARDpath, but they never check the new top-level normalization to0. Addingassert pf.shard_id == 0would pin the actual fix and catch a regression back toNone.Suggested assertions
def test_scylla_without_sharding_no_crash(self): ... assert pf.is_scylla is True + assert pf.shard_id == 0 assert pf.sharding_info is not None assert pf.sharding_info.shards_count == 0 assert pf.sharding_info.sharding_ignore_msb == 0 def test_scylla_sharding_algorithm_only_no_crash(self): ... assert pf.is_scylla is True + assert pf.shard_id == 0 assert pf.sharding_info is not None assert pf.sharding_info.shards_count == 0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_protocol_features.py` around lines 79 - 108, The regression tests in test_scylla_without_sharding_no_crash and test_scylla_sharding_algorithm_only_no_crash only verify sharding_info, but they do not assert the new normalized top-level shard_id value. Update both ProtocolFeatures.parse_from_supported regression cases to also check that pf.shard_id is 0, so the tests pin the missing-SCYLLA_SHARD normalization behavior and catch any return to None.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit/test_protocol_features.py`:
- Around line 79-108: The regression tests in
test_scylla_without_sharding_no_crash and
test_scylla_sharding_algorithm_only_no_crash only verify sharding_info, but they
do not assert the new normalized top-level shard_id value. Update both
ProtocolFeatures.parse_from_supported regression cases to also check that
pf.shard_id is 0, so the tests pin the missing-SCYLLA_SHARD normalization
behavior and catch any return to None.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 6e195da2-f03f-4a94-a124-f7657f65a989
📒 Files selected for processing (6)
cassandra/c_shard_info.pyxcassandra/cluster.pycassandra/metadata.pycassandra/protocol_features.pycassandra/shard_info.pytests/unit/test_protocol_features.py
Summary
The driver previously identified ScyllaDB solely by the presence of shard-related fields (
SCYLLA_NR_SHARDS,SCYLLA_SHARD, etc.) in theSUPPORTEDresponse.When shard-awareness is disabled on the server side (
allow_shard_aware_drivers: false), those fields are absent, causing the driver to misidentify a ScyllaDB cluster as Cassandra.The immediate consequence is that the driver leaves
peers_v2enabled (which ScyllaDB does not support) and fails to connect — the same regression described in scylladb/gocql#902.This fix mirrors scylladb/gocql#903.
Changes
cassandra/protocol_features.pyProtocolFeatures.is_scyllabool (defaultFalse), set by the newdetect_scylla()static method.detect_scylla()returnsTruewhen any known Scylla-specific extension key (SCYLLA_LWT_ADD_METADATA_MARK,SCYLLA_RATE_LIMIT_ERROR,TABLETS_ROUTING_V1) is present in theSUPPORTEDresponse, or whensharding_infois populated.These keys are advertised by ScyllaDB regardless of whether shard-awareness is enabled, so
is_scyllastaysTrueeven with sharding disabled.parse_sharding_info:int(shard_id)raisedTypeErrorwhenSCYLLA_PARTITIONER/SCYLLA_SHARDING_ALGORITHMwas present withoutSCYLLA_SHARD. Default to0.cassandra/shard_info.py/cassandra/c_shard_info.pyx_ShardingInfo.__init__/ShardingInfo.__init__: guardint(shards_count)andint(sharding_ignore_msb)againstNone(default to0), matching the existing pattern used forshard_aware_port. Fixes the remaining crash path where a shard-defining field was present withoutSCYLLA_NR_SHARDS.cassandra/cluster.py_uses_peers_v2disable and_metadata_request_timeoutnow gate onis_scyllainstead ofsharding_info is not None, so both work correctly when sharding is disabled.cassandra/metadata.py_is_not_scylla()now usesnot is_scyllainstead ofshard_id is None.The old check was always
Falseafter the OPTIONS exchange (Cassandra getsshard_id=0, and0 is NoneisFalse), meaning trigger-metadata queries were silently skipped even for Cassandra/DSE connections.tests/unit/test_protocol_features.pyint(None)paths.Shard-aware pooling is unaffected
pool.pycontinues to gate per-shard connection opening onsharding_info is not None.A ScyllaDB node with sharding disabled correctly gets
is_scylla=True(so peers_v2 is disabled and USING TIMEOUT works) butsharding_info=None(so no per-shard connections are opened). This matches the gocql#903 fix'sisScyllaConn() && nrShards != 0guard.Note on Cassandra behavioral change
_is_not_scylla()was previously always returningFalsepost-OPTIONS for Cassandra (shard_id=0,0 is None = False), sosystem_schema.triggersqueries were silently skipped even for Cassandra/DSE. With this fix, Cassandra/DSE connectionswill correctly issue the trigger-metadata query and populate
TableMetadata.triggers.See also