
lang attribute report #26

Merged
cpeel merged 1 commit into DistributedProofreaders:master from tangledhelix:lang-report
Apr 5, 2026

@tangledhelix

Reports on 'lang' attributes in two modes.

In regular mode, it counts the elements that carry a lang attribute, grouped by language, and reports only the number of such elements for each language.
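To illustrate regular-mode counting, here is a minimal sketch that assumes a single regular expression capturing quoted `lang` values in opening tags. The `LANG_ATTR` pattern and the `count_langs` helper are hypothetical; the PR's actual regexes are not shown in this conversation.

```python
import re
from collections import Counter

# Hypothetical pattern (not the PR's): an opening tag containing lang="..."
# or lang='...'. Unquoted values and xml:lang are deliberately not matched.
LANG_ATTR = re.compile(r"""<[A-Za-z][^>]*?\slang\s*=\s*["']([^"']+)["']""")


def count_langs(html: str) -> Counter:
    """Return how many opening tags carry a lang attribute, keyed by language."""
    return Counter(match.group(1) for match in LANG_ATTR.finditer(html))


sample = ('<p><i lang="fr">en route</i>, <i lang="la">et al.</i>, '
          '<span class="x" lang="fr">à la</span></p>')
print(count_langs(sample))  # Counter({'fr': 2, 'la': 1})
```

A pattern like this is simple but brittle: unquoted values such as `lang=fr` slip through, which is the kind of layout problem the review comment further down is concerned with.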

In verbose mode, it prints the report three ways:

  1. sort by tag, then language
  2. sort by tag content
  3. sort by language, then content
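The three orderings can be expressed as sort keys. This is a sketch under assumptions: `LangElement` and the key functions are hypothetical names, and the case-insensitive comparison of content is inferred from the sample output (e.g. `Carramba!` sorts between `calles` and `centavos`), not taken from the PR's code. The sample's tag/language ordering also orders content within each group, so that key includes content too.

```python
from typing import NamedTuple


class LangElement(NamedTuple):
    """Hypothetical record for one element that carries a lang attribute."""
    tag: str
    lang: str
    content: str


# Content is compared case-insensitively, as the sample output suggests.
def by_tag_then_lang(e: LangElement):
    return (e.tag, e.lang, e.content.lower())


def by_content(e: LangElement):
    return e.content.lower()


def by_lang_then_content(e: LangElement):
    return (e.lang, e.content.lower())


elements = [
    LangElement("i", "fr", "en route"),
    LangElement("i", "es", "Hola!"),
    LangElement("i", "la", "et al."),
    LangElement("i", "es", "amigo"),
]

for title, key in [("sorted by tag, then language", by_tag_then_lang),
                   ("sorted by content", by_content),
                   ("sorted by language, then content", by_lang_then_content)]:
    print(title)
    for e in sorted(elements, key=key):
        print(f"  {e.tag}  {e.lang}  {e.content}")
```

Because Python's sort is stable, entries whose keys compare equal (e.g. identical content in two languages under "sorted by content") keep their original relative order.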

This ports a feature from the legacy 'pptools' software.

Testing notes

A file with no lang= attributes should produce:

[info] lang attribute report
       0 unique elements with 'lang' attribute

Make a file with some number of lang= attributes (with or without differing languages). Run with verbose mode OFF and you should get something like:

[info] lang attribute report
       232 unique elements with 'lang' attribute
         lang=de seen 26 times
         lang=fr seen 102 times
         lang=la seen 104 times
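A sketch of rendering this non-verbose report from a `Counter`, with the indentation and `[info]` prefix copied from the sample output rather than from the PR's code. `format_lang_report` is a hypothetical helper. The total is the sum of the per-language counts (26 + 102 + 104 = 232 here), and with no `lang` attributes it yields just the two lines from the first test case.

```python
from collections import Counter


def format_lang_report(counts: Counter) -> list[str]:
    """Render the non-verbose report in the layout of the sample output."""
    lines = ["[info] lang attribute report"]
    # Total number of lang-bearing elements across all languages.
    lines.append(f"       {sum(counts.values())} unique elements with 'lang' attribute")
    # One line per language, in alphabetical order of the language code.
    for lang in sorted(counts):
        lines.append(f"         lang={lang} seen {counts[lang]} times")
    return lines


print("\n".join(format_lang_report(Counter({"fr": 102, "la": 104, "de": 26}))))
```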

Lastly, try verbose mode. This produces the report three ways, as noted above. My sample output is a bit long:

[info] lang attribute report
       59 unique elements with 'lang' attribute

       sorted by tag, then language:
         i  es  amigo
         i  es  amigo?
         i  es  azotea
         i  es  bolero
         i  es  buenos noches
         i  es  calle
         i  es  calles
         i  es  Carramba!
         i  es  centavos
         i  es  contrabandista
         i  es  Cospita!
         i  es  Cospita, hombre
         i  es  cuchillo
         i  es  dulce
         i  es  gitano
         i  es  hija de Puerto Rico
         i  es  hola
         i  es  Hola!
         i  es  Madre de Dios
         i  es  Madre de Dios!
         i  es  manana
         i  es  muchacho
         i  es  peseta
         i  es  Por Dios!
         i  es  quien vive
         i  es  siesta
         i  es  toreador
         i  es  toreadors
         i  es  volante
         i  fr  bete noire
         i  fr  bizarre
         i  fr  cafés
         i  fr  chateaux d'Espagne
         i  fr  compagnon de voyage
         i  fr  contretemps
         i  fr  coup d'etat
         i  fr  debonair
         i  fr  debris
         i  fr  dernier ressort
         i  fr  diablerie
         i  fr  en route
         i  fr  entree
         i  fr  hors de combat
         i  fr  preux chevalier
         i  fr  qui vive
         i  fr  rendezvous
         i  fr  repertoire
         i  fr  ruse de guerre
         i  fr  tapis
         i  fr  tete a tete
         i  fr  vis-à-vis
         i  fr  à la
         i  it  fata morgana
         i  la  dies irae
         i  la  et al.
         i  la  modus operandi
         i  la  peccavi
         i  la  sanctissima!
         i  la  sine qua non

       sorted by content:
         i  es  amigo
         i  es  amigo?
         i  es  azotea
         i  fr  bete noire
         i  fr  bizarre
         i  es  bolero
         i  es  buenos noches
         i  fr  cafés
         i  es  calle
         i  es  calles
         i  es  Carramba!
         i  es  centavos
         i  fr  chateaux d'Espagne
         i  fr  compagnon de voyage
         i  es  contrabandista
         i  fr  contretemps
         i  es  Cospita!
         i  es  Cospita, hombre
         i  fr  coup d'etat
         i  es  cuchillo
         i  fr  debonair
         i  fr  debris
         i  fr  dernier ressort
         i  fr  diablerie
         i  la  dies irae
         i  es  dulce
         i  fr  en route
         i  fr  entree
         i  la  et al.
         i  it  fata morgana
         i  es  gitano
         i  es  hija de Puerto Rico
         i  es  hola
         i  es  Hola!
         i  fr  hors de combat
         i  es  Madre de Dios
         i  es  Madre de Dios!
         i  es  manana
         i  la  modus operandi
         i  es  muchacho
         i  la  peccavi
         i  es  peseta
         i  es  Por Dios!
         i  fr  preux chevalier
         i  fr  qui vive
         i  es  quien vive
         i  fr  rendezvous
         i  fr  repertoire
         i  fr  ruse de guerre
         i  la  sanctissima!
         i  es  siesta
         i  la  sine qua non
         i  fr  tapis
         i  fr  tete a tete
         i  es  toreador
         i  es  toreadors
         i  fr  vis-à-vis
         i  es  volante
         i  fr  à la

       sorted by language, then content:
         i  es  amigo
         i  es  amigo?
         i  es  azotea
         i  es  bolero
         i  es  buenos noches
         i  es  calle
         i  es  calles
         i  es  Carramba!
         i  es  centavos
         i  es  contrabandista
         i  es  Cospita!
         i  es  Cospita, hombre
         i  es  cuchillo
         i  es  dulce
         i  es  gitano
         i  es  hija de Puerto Rico
         i  es  hola
         i  es  Hola!
         i  es  Madre de Dios
         i  es  Madre de Dios!
         i  es  manana
         i  es  muchacho
         i  es  peseta
         i  es  Por Dios!
         i  es  quien vive
         i  es  siesta
         i  es  toreador
         i  es  toreadors
         i  es  volante
         i  fr  bete noire
         i  fr  bizarre
         i  fr  cafés
         i  fr  chateaux d'Espagne
         i  fr  compagnon de voyage
         i  fr  contretemps
         i  fr  coup d'etat
         i  fr  debonair
         i  fr  debris
         i  fr  dernier ressort
         i  fr  diablerie
         i  fr  en route
         i  fr  entree
         i  fr  hors de combat
         i  fr  preux chevalier
         i  fr  qui vive
         i  fr  rendezvous
         i  fr  repertoire
         i  fr  ruse de guerre
         i  fr  tapis
         i  fr  tete a tete
         i  fr  vis-à-vis
         i  fr  à la
         i  it  fata morgana
         i  la  dies irae
         i  la  et al.
         i  la  modus operandi
         i  la  peccavi
         i  la  sanctissima!
         i  la  sine qua non
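Each verbose section uses the same layout: a title line, then one row per element with tag, language, and content separated by two spaces. A hypothetical `format_section` helper is sketched below. Padding the tag and language columns to their widest value is an assumption, since every element in this sample is an `<i>` with a two-letter language code, so the real column handling can't be seen from it.

```python
def format_section(title: str, rows: list[tuple[str, str, str]]) -> list[str]:
    """Render one verbose-mode section: a title line, then one row per element."""
    # Assumption: pad tag and language columns so mixed tags stay aligned.
    tag_w = max((len(tag) for tag, _, _ in rows), default=0)
    lang_w = max((len(lang) for _, lang, _ in rows), default=0)
    lines = [f"       {title}:"]
    for tag, lang, content in rows:
        lines.append(f"         {tag:<{tag_w}}  {lang:<{lang_w}}  {content}")
    return lines


print("\n".join(format_section("sorted by content",
                               [("i", "es", "amigo"), ("i", "fr", "à la")])))
```

In the sample, consecutive sections are separated by a single blank line.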

@windymilla left a comment

Code looks OK to me.
If you end up having trouble with the regexes not working for someone's weird HTML layout, I can thoroughly recommend HTMLParser from html.parser for things like finding all occurrences of a particular tag (it's used several times in GG2's PPhtml).
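Following that suggestion, here is a rough sketch of collecting `(tag, lang, text)` records with the standard-library `html.parser.HTMLParser` instead of regexes. `LangCollector` and `VOID_TAGS` are hypothetical names, not the code merged in this PR. The parser keeps a stack of open elements, so markup nested inside a `lang` element (e.g. `<b>` inside `<span lang=...>`) still contributes to that element's text, and it skips HTML void elements, which never get an end tag.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LangCollector(HTMLParser):
    """Collect (tag, lang, text) for every element that carries a lang attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = []  # finished (tag, lang, text) records, in end-tag order
        self._open = []  # stack of [tag, lang or None, list of text chunks]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._open.append([tag, dict(attrs).get("lang"), []])

    def handle_data(self, data):
        # Text counts toward every open element, so an outer lang element
        # also sees text that sits inside nested tags such as <b>.
        for entry in self._open:
            entry[2].append(data)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; any still-open elements
        # nested inside it are dropped along with it.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                open_tag, lang, chunks = self._open[i]
                del self._open[i:]
                if lang is not None:
                    self.found.append((open_tag, lang, "".join(chunks).strip()))
                return


parser = LangCollector()
parser.feed('<p>He said <i lang="fr">en route</i>, then '
            '<span lang="la">et <b>al.</b></span><br></p>')
parser.close()
print(parser.found)  # [('i', 'fr', 'en route'), ('span', 'la', 'et al.')]
```

The resulting tuples can feed the same counting and sorting steps as a regex-based extractor, while tolerating attribute order, quoting style, and line breaks inside tags.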

@cpeel cpeel merged commit e1f0326 into DistributedProofreaders:master Apr 5, 2026