Zero-downtime migration tooling; widen File.file_size to bigint (expand stage)#5986

Open
rtibbles wants to merge 3 commits into learningequality:hotfixes from rtibbles:widen_file_size_bigint

Conversation

rtibbles (Member) commented Jun 24, 2026

Summary

Adds tooling and a playbook for zero-downtime migrations that change a field's type (e.g. int to bigint, or char to uuid), and implements the first (expand) stage of that migration for file_size on the File model.
The tooling is reusable for file_size and for other fields (such as converting our CharField UUIDs to proper UUIDFields). Widening file_size is what will allow uploads of files larger than about 2.1 GB, the maximum value of a 32-bit int.
The tooling is built around guardrails and clear, repeatable steps.
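
For context on the 2.1 GB figure: a PostgreSQL `integer` column is a signed 32-bit value and `bigint` is a signed 64-bit value. A minimal sketch of the arithmetic follows; the helper name `fits_in_int32` is hypothetical and not part of this PR.

```python
# Signed 32-bit and 64-bit maxima, matching PostgreSQL's integer and bigint ranges.
INT32_MAX = 2**31 - 1  # 2_147_483_647
INT64_MAX = 2**63 - 1  # 9_223_372_036_854_775_807


def fits_in_int32(size_bytes: int) -> bool:
    """Hypothetical helper: True if a file size can be stored in an int column."""
    return 0 <= size_bytes <= INT32_MAX


three_gb = 3 * 10**9
print(fits_in_int32(three_gb))  # False: too large for the current int column
print(three_gb <= INT64_MAX)    # True: fits once the column is bigint
```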

References

Fixes #5973
Fixes #5974

Reviewer guidance

  • DB engine swap (settings.py, production_settings.py): DB backend now routes through django-pg-zero-downtime-migrations.
  • Migration 0167: nullable shadow column + concurrent index + mirror trigger — all safe DDL (the backend forces atomic=False, so the in-migration CREATE INDEX CONCURRENTLY is fine).
  • deploy-migrate runs the backfill in this release. New writes are dual-written by the trigger.
  • New CI step lints PR migrations against the merge-base.
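
The expand stage described above (nullable shadow column, index, mirror trigger) can be sketched in a standalone form. This is an illustration only: it uses SQLite from the Python standard library so that it runs anywhere, whereas the PR targets PostgreSQL, builds the index with CREATE INDEX CONCURRENTLY, and generates the trigger through its `mirror_field` decorator. The table layout and the trigger and index names below are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (id INTEGER PRIMARY KEY, file_size INTEGER);
    INSERT INTO file (file_size) VALUES (100), (200);

    -- Expand: a nullable shadow column, so existing rows need no rewrite.
    ALTER TABLE file ADD COLUMN file_size_bigint INTEGER;
    CREATE INDEX file_size_bigint_idx ON file (file_size_bigint);

    -- Dual-write: mirror file_size into the shadow column on insert.
    CREATE TRIGGER file_mirror_insert AFTER INSERT ON file
    BEGIN
        UPDATE file SET file_size_bigint = NEW.file_size WHERE id = NEW.id;
    END;

    -- Change-guarded: only mirror on update when file_size actually changed.
    CREATE TRIGGER file_mirror_update AFTER UPDATE OF file_size ON file
    WHEN NEW.file_size IS NOT OLD.file_size
    BEGIN
        UPDATE file SET file_size_bigint = NEW.file_size WHERE id = NEW.id;
    END;
""")

conn.execute("INSERT INTO file (file_size) VALUES (300)")   # new write
conn.execute("UPDATE file SET file_size = 250 WHERE id = 2")  # changed write
rows = conn.execute(
    "SELECT id, file_size, file_size_bigint FROM file ORDER BY id"
).fetchall()
print(rows)
```

Row 1 predates the trigger and is never written afterwards, so its shadow value stays NULL until the backfill reaches it; rows 2 and 3 are mirrored by the trigger.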

AI usage

Used Claude Code to implement the tooling (safe DB backend wiring, the mirror_field dual-write trigger, the backfill_column command) and draft the runbook. I reviewed the output across several local review rounds for correctness, tightened it to Studio's conventions, and ran the test suite.

rtibbles marked this pull request as ready for review June 24, 2026 02:43
bjester self-assigned this Jun 26, 2026

bjester (Member) left a comment

Overall, the code looks good and makes sense. I have a lot of comments and a few questions for deliberation. After that, I'll look at testing it.

Comment thread contentcuration/contentcuration/management/commands/backfill_column.py Outdated
Comment thread contentcuration/contentcuration/management/commands/backfill_column.py Outdated
Comment thread contentcuration/contentcuration/management/commands/backfill_column.py Outdated
Comment thread contentcuration/contentcuration/models.py
Comment thread docs/zero_downtime_migrations.md Outdated
Comment thread docs/zero_downtime_migrations.md Outdated
Comment thread docs/zero_downtime_migrations.md Outdated
Comment thread docs/zero_downtime_migrations.md Outdated
Comment thread contentcuration/contentcuration/management/commands/backfill_column.py Outdated
Comment thread contentcuration/contentcuration/models.py Outdated
rtibbles and others added 3 commits June 30, 2026 14:35
- Safe-DDL backend: lock timeout, fail-fast, runtime unsafe-op guard
- CI linting of new migrations on pull requests
- Declarative dual-write trigger decorator (mirror_field)
- Reusable batched-backfill command (idempotent, resumable, throttled)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LfZvkigk8hdsKdEif3hzBi
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LfZvkigk8hdsKdEif3hzBi
Expand stage of the zero-downtime int->bigint widening:
- Add nullable file_size_bigint shadow column and its index (built CONCURRENTLY).
- Mirror file_size into it via the change-guarded @mirror_field trigger.
- Wire the online backfill as a commented deploy-migrate step.
- Stage the cutover and contract steps as comments on the File model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LfZvkigk8hdsKdEif3hzBi
rtibbles force-pushed the widen_file_size_bigint branch from 4f79103 to 84f402d Compare June 30, 2026 21:37

bjester (Member) commented Jul 2, 2026

I tried to test this with a slightly older SQL dump that contains a lot of data. This won't occur on rollout, because these migrations have already run there, but the zero-downtime package does appear to aggressively block any migration it deems not zero-downtime. That likely explains why the AI included the clearly documented note about not doing column renames:

  Applying contentcuration.0159_update_community_library_submission_date_updated...Traceback (most recent call last):
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/contentcuration/manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/core/management/base.py", line 89, in wrapped
    res = handle_func(*args, **kwargs)
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/core/management/commands/migrate.py", line 244, in handle
    post_migrate_state = executor.migrate(
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/db/migrations/executor.py", line 117, in migrate
    state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards
    state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/db/migrations/executor.py", line 227, in apply_migration
    state = migration.apply(state, schema_editor)
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/db/migrations/migration.py", line 126, in apply
    operation.database_forwards(self.app_label, schema_editor, old_state, project_state)
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/db/migrations/operations/fields.py", line 350, in database_forwards
    schema_editor.alter_field(
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django_zero_downtime_migrations/backends/postgres/schema.py", line 374, in alter_field
    super().alter_field(model, old_field, new_field, strict)
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/db/backends/base/schema.py", line 608, in alter_field
    self._alter_field(model, old_field, new_field, old_type, new_type,
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/pgtrigger/migrations.py", line 336, in _alter_field
    return super()._alter_field(
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/db/backends/postgresql/schema.py", line 196, in _alter_field
    super()._alter_field(
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django/db/backends/base/schema.py", line 705, in _alter_field
    self.execute(self._rename_field_sql(model._meta.db_table, old_field, new_field, new_type))
  File "/home/bjester/Projects/learningequality/studio/widen_file_size_bigint/.venv/lib/python3.10/site-packages/django_zero_downtime_migrations/backends/postgres/schema.py", line 401, in _rename_field_sql
    raise UnsafeOperationException(Unsafe.ALTER_TABLE_RENAME_COLUMN)
django_zero_downtime_migrations.backends.postgres.schema.UnsafeOperationException: ALTER TABLE RENAME COLUMN is unsafe operation
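
One possible way to replay historical migrations against an old dump is to relax the guard for local runs only. The sketch below assumes the guard is controlled by the package's `ZERO_DOWNTIME_MIGRATIONS_RAISE_FOR_UNSAFE` setting; the `ALLOW_UNSAFE_MIGRATIONS` environment variable, the `raise_for_unsafe` helper, and the database name are hypothetical and not part of this PR.

```python
import os


def raise_for_unsafe(environ=os.environ) -> bool:
    """Hypothetical helper: keep the guard on unless explicitly relaxed."""
    # ALLOW_UNSAFE_MIGRATIONS is an assumed, local-only escape hatch.
    return environ.get("ALLOW_UNSAFE_MIGRATIONS") != "1"


# Sketch of the relevant part of a Django settings module.
DATABASES = {
    "default": {
        # Engine path as it appears in the traceback above.
        "ENGINE": "django_zero_downtime_migrations.backends.postgres",
        "NAME": "studio_local",  # placeholder database name
    }
}

# When True, unsafe operations such as RENAME COLUMN raise instead of running.
ZERO_DOWNTIME_MIGRATIONS_RAISE_FOR_UNSAFE = raise_for_unsafe()
```

Keeping the default at "raise" means production and CI stay guarded, and only a developer who sets the variable on purpose can replay older, unsafe migrations.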

bjester (Member) commented Jul 2, 2026

Besides the potential issue mentioned above, testing this looks good. The migration ran 10k chunks in about 1 second each. I killed it halfway through and re-ran it with --start-id. The trigger also works.
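
The interrupt-and-resume behaviour tested here can be illustrated with a minimal batched backfill. This is a sketch under assumptions, not the PR's `backfill_column` command: it uses SQLite so it runs standalone, the `backfill` function and its parameters are hypothetical, `start_id` is treated as exclusive, and throttling between batches is omitted.

```python
import sqlite3


def backfill(conn, start_id=0, batch_size=2, max_batches=None):
    """Copy file_size into file_size_bigint in id-ordered batches.

    Returns the last id processed so that a later run can resume from it.
    """
    last_id, batches = start_id, 0
    while max_batches is None or batches < max_batches:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM file WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            break
        # Idempotent: only rows whose shadow value is still missing are touched.
        conn.execute(
            "UPDATE file SET file_size_bigint = file_size "
            "WHERE id BETWEEN ? AND ? AND file_size_bigint IS NULL",
            (ids[0], ids[-1]),
        )
        conn.commit()  # one short transaction per batch
        last_id, batches = ids[-1], batches + 1
    return last_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file (id INTEGER PRIMARY KEY, file_size INTEGER, file_size_bigint INTEGER)"
)
conn.executemany("INSERT INTO file (file_size) VALUES (?)", [(n * 10,) for n in range(1, 6)])
conn.commit()

stopped_at = backfill(conn, max_batches=1)         # simulate a killed run
finished_at = backfill(conn, start_id=stopped_at)  # resume, like --start-id
remaining = conn.execute(
    "SELECT COUNT(*) FROM file WHERE file_size_bigint IS NULL"
).fetchone()[0]
print(stopped_at, finished_at, remaining)
```

Because each batch commits on its own and the update skips rows that already have a shadow value, re-running from any earlier id is harmless; the start id only saves time.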
