
Commit 52ebbf5

strings: add French "café" to the first comparison
The first cell compared "hello" (pure ASCII, 1 byte/char) with "สวัสดี" (Thai, 3 bytes per code point) but skipped the middle ground. Adding "café" between them gives a three-row reading:

    English  hello   5 code points   5 bytes  (ASCII)
    French   café    4 code points   5 bytes  (é = 2 UTF-8 bytes)
    Thai     สวัสดี  6 code points  18 bytes  (each code point = 3 bytes)

The example already used "café" in the third cell to demonstrate strip/upper/encode. Lifting it into the opening comparison links the three cells through one progression of byte costs and gives a stepping stone between "ASCII" and "completely non-Latin" instead of jumping straight from one to the other.

Both the :::program block (runnable code) and the matching :::cell (walkthrough) are updated together. The cell's expected output now shows the three-row table, and example_loader verifies that the cell output matches what the program prints.
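The byte counts in the table follow from UTF-8's variable-width encoding: code points below U+0080 take one byte, U+0080 to U+07FF take two (é is U+00E9), and U+0800 to U+FFFF take three (the Thai block is U+0E00 to U+0E7F). The illustrative sketch below, which is not part of this commit, prints each code point with its encoded length. It also makes explicit an assumption behind the "4 code points" figure: café is stored with the precomposed é (NFC). In decomposed form (NFD) the same word is five code points and six bytes.

```python
import unicodedata

# The three words from the comparison; é is written as the escape \u00e9 so it is
# guaranteed to be the single precomposed code point (NFC form).
words = [("English", "hello"), ("French", "caf\u00e9"), ("Thai", "สวัสดี")]

for label, word in words:
    # Pair every code point with the number of bytes UTF-8 needs for it.
    parts = [f"U+{ord(ch):04X}={len(ch.encode('utf-8'))}" for ch in word]
    print(label, len(word), len(word.encode("utf-8")), " ".join(parts))

# The same word in decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
nfc = unicodedata.normalize("NFC", "caf\u00e9")
nfd = unicodedata.normalize("NFD", "caf\u00e9")
print(len(nfc), len(nfc.encode("utf-8")))  # 4 5
print(len(nfd), len(nfd.encode("utf-8")))  # 5 6
```

If example text is ever pasted from a source that produces NFD, the table's "4 code points, 5 bytes" row would no longer hold, so normalizing to NFC before counting is the safe default.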
1 parent 5958402 commit 52ebbf5

2 files changed

Lines changed: 7 additions & 4 deletions


src/asset_manifest.py

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 # Generated by scripts/fingerprint_assets.py. Do not edit by hand.
 ASSET_PATHS = {'SITE_CSS': '/site.1452cc5609f2.css', 'SYNTAX_JS': '/syntax-highlight.3b6c7f730d46.js', 'EDITOR_JS': '/editor.dd81f5171b14.js'}
-HTML_CACHE_VERSION = '54735cba7a40'
+HTML_CACHE_VERSION = '0405776651fd'
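The manifest is generated, and this commit does not show how scripts/fingerprint_assets.py derives its values. The 12-hex-digit tags are consistent with a truncated content hash, a common way to cache-bust static files. Only HTML_CACHE_VERSION changed while ASSET_PATHS did not, which suggests the version tracks page sources such as strings.md rather than the CSS/JS bundles. The sketch below assumes SHA-256 truncated to 12 characters; `fingerprint` and `fingerprinted_path` are hypothetical helpers, not the script's actual code.

```python
import hashlib


def fingerprint(data: bytes, length: int = 12) -> str:
    """Hypothetical: a truncated SHA-256 hex digest used as a cache-busting tag."""
    return hashlib.sha256(data).hexdigest()[:length]


def fingerprinted_path(name: str, data: bytes) -> str:
    """Hypothetical: splice the tag in before the extension, e.g. site.css -> /site.<tag>.css."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        # No extension: append the tag instead.
        return f"/{name}.{fingerprint(data)}"
    return f"/{stem}.{fingerprint(data)}.{ext}"


css = b"body { margin: 0; }"
print(fingerprinted_path("site.css", css))
# Any change to the input bytes gives a different tag, so caches keyed on the
# old value stop matching and the new content is fetched.
print(fingerprint(b"old page source") != fingerprint(b"new page source"))
```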

src/example_sources/strings.md

Lines changed: 6 additions & 3 deletions
@@ -15,9 +15,10 @@ Use `str` when you mean text, and encode to `bytes` only at boundaries such as f
 :::program
 ```python
 english = "hello"
+french = "café"
 thai = "สวัสดี"
 
-for label, word in [("English", english), ("Thai", thai)]:
+for label, word in [("English", english), ("French", french), ("Thai", thai)]:
     print(label, word, len(word), len(word.encode("utf-8")))
 
 print(thai[0])
@@ -32,18 +33,20 @@ print(clean.encode("utf-8"))
 :::
 
 :::cell
-Compare an English greeting with a Thai greeting. Both are Python `str` values, but UTF-8 uses one byte for each ASCII code point and multiple bytes for many non-ASCII code points.
+Compare three words by code-point count and UTF-8 byte count. ASCII characters take one byte each (`hello` → 5 bytes); the `é` in `café` is one code point but two UTF-8 bytes; each Thai character takes three. The `str` type abstracts over all three.
 
 ```python
 english = "hello"
+french = "café"
 thai = "สวัสดี"
 
-for label, word in [("English", english), ("Thai", thai)]:
+for label, word in [("English", english), ("French", french), ("Thai", thai)]:
     print(label, word, len(word), len(word.encode("utf-8")))
 ```
 
 ```output
 English hello 5 5
+French café 4 5
 Thai สวัสดี 6 18
 ```
 :::
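The commit message says example_loader checks that each cell's expected output matches what the program prints. Its implementation is not shown here; what follows is a minimal sketch of such a check, assuming a cell body made of a python fence followed by an output fence, as in the diff above. `split_cell` and `check_cell` are hypothetical names. The sketch runs the cell's code with `exec`, captures stdout with `contextlib.redirect_stdout`, and compares the text exactly.

```python
import contextlib
import io

# Three backticks, built at runtime so the sample cell can sit inside this snippet.
FENCE = "`" * 3


def split_cell(cell: str) -> tuple[str, str]:
    """Hypothetical parser: return (python source, expected output) from a cell body."""
    code = cell.split(FENCE + "python\n", 1)[1].split(FENCE, 1)[0]
    expected = cell.split(FENCE + "output\n", 1)[1].split(FENCE, 1)[0]
    return code, expected


def check_cell(cell: str) -> bool:
    """Run the cell's code and compare everything it prints with the output block."""
    code, expected = split_cell(cell)
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(code, {})  # fresh globals dict for each call, so runs do not share state
    return captured.getvalue() == expected


# The updated :::cell from this commit, minus its prose line and ::: markers.
cell = "\n".join([
    FENCE + "python",
    'english = "hello"',
    'french = "caf\u00e9"',
    'thai = "สวัสดี"',
    "",
    'for label, word in [("English", english), ("French", french), ("Thai", thai)]:',
    '    print(label, word, len(word), len(word.encode("utf-8")))',
    FENCE,
    "",
    FENCE + "output",
    "English hello 5 5",
    "French caf\u00e9 4 5",
    "Thai สวัสดี 6 18",
    FENCE,
])

print(check_cell(cell))
```

Because the comparison is exact, a stale row (for example, leaving the old two-row output after adding `french`) or an output block whose é is in a different normalization form than the code's would fail the check.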
