From 4c58ac66fadaf158450eece033a2aafdc798524b Mon Sep 17 00:00:00 2001
From: Maciej Olko
Date: Sat, 20 Jun 2026 00:47:46 +0200
Subject: [PATCH 1/2] Update language tag guidelines in PEP 545

Cover usage of script subtags in IETF language tags.
---
 peps/pep-0545.rst | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/peps/pep-0545.rst b/peps/pep-0545.rst
index 633d497b44a..3cb5fb825f8 100644
--- a/peps/pep-0545.rst
+++ b/peps/pep-0545.rst
@@ -187,15 +187,25 @@ Language Tag
 ''''''''''''
 
 A common notation for language tags is the :rfc:`IETF Language Tag <5646>`
-[4]_ based on ISO 639, although gettext uses ISO 639 tags with
-underscores (ex: ``pt_BR``) instead of dashes to join tags [5]_
-(ex: ``pt-BR``). Examples of IETF Language Tags: ``fr`` (French),
-``ja`` (Japanese), ``pt-BR`` (Orthographic formulation of 1943 -
-Official in Brazil).
+[4]_ (BCP 47, RFC 5646), which is based on ISO 639 for language codes,
+ISO 15924 for script codes, and ISO 3166 for region codes. Gettext joins
+subtags with underscores [5]_ (e.g. ``pt_BR``), whereas IETF language
+tags use hyphens (e.g. ``pt-BR``).
 
-It is more common to see dashes instead of underscores in URLs [6]_,
-so we should use IETF language tags, even if sphinx uses gettext
-internally: URLs are not meant to leak the underlying implementation.
+Examples of IETF Language Tags:
+
+* ``fr`` (French)
+* ``ja`` (Japanese)
+* ``pt-br`` (Portuguese as spoken in Brazil)
+* ``pa-guru`` (Punjabi written in Gurmukhi script)
+
+The `script` subtag is used when a language can be written in multiple
+writing systems. For example, Punjabi can be written in Gurmukhi (``pa-guru``)
+or Shahmukhi (``pa-arab``).
+
+It is more common to see hyphens instead of underscores in URLs [6]_,
+so we should use IETF language tags in URL paths, even though Sphinx
+uses gettext internally: URLs should not leak implementation details.
 
 It's uncommon to see capitalized letters in URLs, and docs.python.org
 doesn't use any, so it may hurt readability by attracting the eye on it,
@@ -206,10 +216,10 @@ states that tags are not case sensitive.
 As the RFC allows lower case, and it enhances readability, we should
 use lowercased tags like ``pt-br``.
 
-We may drop the region subtag when it does not add distinguishing
+We may drop a region or script subtag when it does not add distinguishing
 information, for example: "de-DE" or "fr-FR". (Although it might
 make sense, respectively meaning "German as spoken in Germany"
-and "French as spoken in France"). But when the region subtag
+and "French as spoken in France"). But when such a subtag
 actually adds information, for example "pt-BR" or "Portuguese as
 spoken in Brazil", it should be kept.
 

From 49141e07ed9b8c11faee8439cb63b2e0369c33b1 Mon Sep 17 00:00:00 2001
From: Maciej Olko
Date: Sat, 20 Jun 2026 01:08:16 +0200
Subject: [PATCH 2/2] Fix backticks in PEP 545 language subtag description

---
 peps/pep-0545.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0545.rst b/peps/pep-0545.rst
index 3cb5fb825f8..066bd5d9714 100644
--- a/peps/pep-0545.rst
+++ b/peps/pep-0545.rst
@@ -199,7 +199,7 @@ Examples of IETF Language Tags:
 * ``pt-br`` (Portuguese as spoken in Brazil)
 * ``pa-guru`` (Punjabi written in Gurmukhi script)
 
-The `script` subtag is used when a language can be written in multiple
+The ``script`` subtag is used when a language can be written in multiple
 writing systems. For example, Punjabi can be written in Gurmukhi (``pa-guru``)
 or Shahmukhi (``pa-arab``).
 
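For reviewers, the mapping these guidelines describe (a gettext-style name such as "pt_BR" becoming the URL tag "pt-br", with a non-distinguishing region dropped) can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the patch: the function name gettext_to_url_tag and the REDUNDANT_TAGS table are hypothetical, not Sphinx or gettext APIs, and inputs are assumed to follow the gettext form language[_region][.codeset].

```python
# Hypothetical table, following the "de-DE" / "fr-FR" examples in the PEP
# text: tags whose region subtag adds no distinguishing information.
REDUNDANT_TAGS = {"de-de": "de", "fr-fr": "fr"}


def gettext_to_url_tag(locale_name: str) -> str:
    """Turn a gettext-style locale name such as "pt_BR" into the lowercase,
    hyphen-separated tag suggested for URL paths, such as "pt-br".

    Hypothetical helper for illustration only.
    """
    # Drop an optional codeset suffix, e.g. "pt_BR.UTF-8" -> "pt_BR".
    base = locale_name.split(".", 1)[0]
    if "@" in base:
        # gettext modifiers (e.g. "sr_RS@latin") would need an explicit
        # mapping to a script subtag; this sketch does not attempt that.
        raise ValueError(f"modifiers are not handled: {locale_name!r}")
    # IETF tags join subtags with hyphens and are case-insensitive,
    # so the lowercase form is used for readability.
    tag = base.replace("_", "-").lower()
    # Drop a region subtag that does not distinguish anything.
    return REDUNDANT_TAGS.get(tag, tag)


for name in ["fr", "ja", "pt_BR", "de_DE"]:
    print(name, "->", gettext_to_url_tag(name))
```

Raising on modifiers, rather than silently stripping them, is a deliberate choice in this sketch: a modifier such as "@latin" carries script information, which is exactly what the new script-subtag guidance says must not be lost.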