Unicode representation of the units and prefixes #91
fmeynadier started this conversation in General
I would like to propose considering a case for the symbol of degree in the table:
Note that the Unicode point U+00BA is not that common and is sometimes underscored (e.g. File formats like PDF that use visual glyphs instead of codepoints can produce either of those two upon copy-pasting.
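To illustrate the copy-paste issue, here is a minimal Python sketch that tells the two look-alike code points apart. It only uses the standard `unicodedata` module; the helper names `describe` and `has_ordinal_lookalike` are made up for this example and are not part of any proposal.

```python
import unicodedata

DEGREE_SIGN = "\u00b0"        # the symbol used for the degree
MASCULINE_ORDINAL = "\u00ba"  # look-alike that copy-pasting may produce


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def has_ordinal_lookalike(text: str) -> bool:
    """Hypothetical check: True if text contains U+00BA instead of U+00B0."""
    return MASCULINE_ORDINAL in text


for ch in (DEGREE_SIGN, MASCULINE_ORDINAL):
    print(describe(ch))

# Compatibility normalization does not rescue the look-alike:
# U+00BA folds to the plain letter "o", not to the degree sign.
print(unicodedata.normalize("NFKC", MASCULINE_ORDINAL))
```

Since normalization maps U+00BA to the letter "o", a parser would probably have to treat this case explicitly if it wants to accept it as a degree sign.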
This is a draft proposal to guide the choice of Unicode points to be used as symbols for units and prefixes defined in the SI Brochure.
It follows a discussion on TheBIPM/SI_Digital_Framework#15 and various exchanges since.
Symbols in the Brochure were introduced when it was primarily distributed as a printed booklet, so it was enough to simply print a symbol. For example, the 12th resolution of the 11th CGPM in 1960 (https://www.bipm.org/en/-/resolution-cgpm-11-12) introduced the symbol for the prefix "micro" by simply printing a table in which the symbol µ appeared. It did not specify that this was "the Greek letter mu in lower case", nor that the other symbols were Latin letters, since this was implicit from the initials of the corresponding units and prefixes.
When it became necessary to encode characters in computers, early standardized sets (like ASCII) were limited to the Latin alphabet, digits, and a small number of additional symbols. Various extensions to this character set were developed to allow the transcription of languages using different alphabets or writing systems.
The Unicode standard, first published in 1991, was a proposal to unify the various character sets that had been developed around the world. It is now the de facto standard for character encoding.
Because it aggregated the earlier character sets, it also contains some duplicate symbols, kept for compatibility with those earlier choices.
For example, µ (U+00B5 MICRO SIGN) is a legacy of ISO 8859-1 (Latin-1), and its representation is identical to μ (U+03BC GREEK SMALL LETTER MU). Similarly, Ω (U+2126 OHM SIGN) is considered (by Unicode) a backward-compatibility character for Ω (U+03A9 GREEK CAPITAL LETTER OMEGA).
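The duplication can be checked with Python's standard `unicodedata` module. The sketch below is only an illustration: it prints the official names of the four code points, shows that the look-alike strings do not compare equal, and shows which Unicode normalization form folds each legacy character into the Greek letter.

```python
import unicodedata

MICRO_SIGN, GREEK_MU = "\u00b5", "\u03bc"
OHM_SIGN, GREEK_OMEGA = "\u2126", "\u03a9"

for ch in (MICRO_SIGN, GREEK_MU, OHM_SIGN, GREEK_OMEGA):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Visually identical, but different code points: a naive comparison fails.
print(MICRO_SIGN == GREEK_MU)  # False

# OHM SIGN has a canonical mapping, so NFC already folds it to OMEGA.
nfc_ohm = unicodedata.normalize("NFC", OHM_SIGN)

# MICRO SIGN only has a compatibility mapping: NFC keeps it, NFKC folds it.
nfc_micro = unicodedata.normalize("NFC", MICRO_SIGN)
nfkc_micro = unicodedata.normalize("NFKC", MICRO_SIGN)

print(nfc_ohm == GREEK_OMEGA, nfc_micro == MICRO_SIGN, nfkc_micro == GREEK_MU)
```

The two pairs behave differently under normalization, so software that only applies NFC will still see two distinct "micro" symbols.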
Moreover, combined characters like "℃ (U+2103 DEGREE CELSIUS)" exist but are now discouraged in favor of the two-character sequence "°C" (U+00B0 DEGREE SIGN + U+0043 LATIN CAPITAL LETTER C). The existence of "K (U+212A KELVIN SIGN)" may be a consequence of the (now abolished) use of the "degree" symbol in front of K, as it is otherwise indistinguishable from "K (U+004B LATIN CAPITAL LETTER K)".
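As a small illustration (not a recommendation of any particular normalization policy), Python's standard `unicodedata` module shows how Unicode itself relates these characters: the combined Celsius character decomposes into the two-character sequence under compatibility normalization, and the Kelvin sign maps to the ordinary capital K under canonical normalization.

```python
import unicodedata

DEGREE_CELSIUS = "\u2103"  # single combined character
KELVIN_SIGN = "\u212a"     # looks like a capital K

# NFKC splits the combined character into DEGREE SIGN + LATIN CAPITAL LETTER C.
celsius = unicodedata.normalize("NFKC", DEGREE_CELSIUS)
print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in celsius])

# NFC is enough to turn KELVIN SIGN into the plain Latin letter.
kelvin = unicodedata.normalize("NFC", KELVIN_SIGN)
print(f"U+{ord(kelvin):04X} {unicodedata.name(kelvin)}")

# Without normalization the two "K" characters do not compare equal.
print(KELVIN_SIGN == "K")  # False
```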
The Unicode Consortium has already identified some of these cases and recommends using the regular alphabet counterpart when available (https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-22/#G20445).
So the multiplicity of code points for certain symbols does not appear to be the result of a deliberate distinction, but rather of historical accidents during the development of character encoding systems. For units and prefixes that could be represented in the Latin alphabet, i.e. in the initial ASCII character set, no specific code point has been created (with the notable exception of the kelvin, symbol K, as mentioned above).
It seems reasonable to assume that the symbols designated in the SI Brochure are indeed the corresponding letters of the Latin and, sometimes, Greek alphabets, even if this is only implicit.
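If one accepts that assumption, software reading unit strings could map the legacy code points to the plain letters before comparing symbols. The following Python sketch is one possible way to do it; the table `PREFERRED` and the function `prefer_letters` are hypothetical names, and the table only covers the four characters discussed above.

```python
# Hypothetical mapping from legacy code points to alphabet letters.
# Only the characters mentioned in this discussion are listed.
PREFERRED = str.maketrans({
    "\u00b5": "\u03bc",   # MICRO SIGN     -> GREEK SMALL LETTER MU
    "\u2126": "\u03a9",   # OHM SIGN       -> GREEK CAPITAL LETTER OMEGA
    "\u212a": "K",        # KELVIN SIGN    -> LATIN CAPITAL LETTER K
    "\u2103": "\u00b0C",  # DEGREE CELSIUS -> DEGREE SIGN + C
})


def prefer_letters(text: str) -> str:
    """Replace legacy unit characters by their alphabet counterparts."""
    return text.translate(PREFERRED)


print(prefer_letters("10 \u00b5m"))    # micro sign becomes Greek mu
print(prefer_letters("4.7 k\u2126"))   # ohm sign becomes Greek omega
print(prefer_letters("m\u00b2"))       # superscript two is left untouched
```

An explicit table is used here instead of blanket NFKC normalization because NFKC would also rewrite unrelated characters, for example turning the superscript in "m²" into a plain digit 2.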
Other symbols, like °, ′ and ″ for degree, minute and second, do not belong to alphabets but are punctuation marks that also exist in different forms. For minute and second, the symbol in the Brochure is clearly slanted, so it seems unambiguous that the PRIME and DOUBLE PRIME Unicode points are a better match than the ' and " symbols, which are common in programming and therefore usually easier to type.
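The difference between the typewriter quotes and the prime characters can be made visible with a short Python sketch. The function `format_dms` is a hypothetical helper written for this example; it simply shows what an angle looks like when written with the DEGREE SIGN, PRIME and DOUBLE PRIME code points.

```python
import unicodedata

DEGREE, PRIME, DOUBLE_PRIME = "\u00b0", "\u2032", "\u2033"

# The easy-to-type ASCII characters next to the slanted prime characters.
for ch in ("'", PRIME, '"', DOUBLE_PRIME):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")


def format_dms(degrees: int, minutes: int, seconds: int) -> str:
    """Format a plane angle using DEGREE SIGN, PRIME and DOUBLE PRIME."""
    return f"{degrees}{DEGREE}{minutes}{PRIME}{seconds}{DOUBLE_PRIME}"


print(format_dms(12, 34, 56))
```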
It could therefore be useful to clarify, e.g. as follows: