Skip to content

Commit c21bdd8

Browse files
committed
Add test suite, CI workflow, and technical documentation
This commit introduces a comprehensive testing infrastructure, including: - Unit tests for parsers and integration tests for CLI extraction. - A GitHub Actions workflow for automated CI on Linux. - ARCHITECTURE.md and AGENTS.md for system design and agent guidance. - Build fixes for Linux (link flags and libgomp paths).
1 parent c37ba48 commit c21bdd8

14 files changed

Lines changed: 380 additions & 2 deletions

File tree

.github/workflows/test.yml

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
name: Tests
2+
3+
on:
4+
push:
5+
branches: [ main, test-suite ]
6+
pull_request:
7+
branches: [ main ]
8+
9+
jobs:
10+
test:
11+
runs-on: ubuntu-latest
12+
steps:
13+
- name: Checkout code
14+
uses: actions/checkout@v4
15+
16+
- name: Set up OCaml
17+
uses: ocaml/setup-ocaml@v3
18+
with:
19+
ocaml-compiler: 5.1.1
20+
dune-cache: true
21+
opam-repositories: |
22+
default: https://github.com/ocaml/opam-repository.git
23+
24+
- name: Install dependencies
25+
run: |
26+
sudo apt-get update
27+
sudo apt-get install -y npm xz-utils libomp-dev llvm-dev
28+
opam install . --deps-only
29+
npm install --no-save typescript browserify pug-lexer pug-parser pug-walk
30+
31+
- name: Install QuickJS
32+
run: |
33+
curl https://bellard.org/quickjs/quickjs-2021-03-27.tar.xz > quickjs.tar.xz
34+
tar xvf quickjs.tar.xz && rm quickjs.tar.xz
35+
mv quickjs-2021-03-27 quickjs
36+
cd quickjs && make
37+
38+
- name: Install Flow
39+
run: |
40+
git clone --branch v0.183.1 --depth 1 https://github.com/facebook/flow.git flow
41+
ln -s "$(pwd)/flow/src/parser" src/flow_parser
42+
ln -s "$(pwd)/flow/src/third-party/sedlex" src/sedlex
43+
ln -s "$(pwd)/flow/src/hack_forked/utils/collections" src/collections
44+
45+
- name: Run tests
46+
run: |
47+
mkdir -p strings
48+
opam exec -- dune runtest tests/

.gitignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,3 +23,5 @@ bad/
2323
src/flow_parser
2424
src/sedlex
2525
src/collections
26+
27+
tests/integration_test_run/

AGENTS.md

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
# Agent Information - String Extractor
2+
3+
This repository contains an OCaml-based internationalization (i18n) string extraction tool. It parses source files (JS, TS, Vue, Pug, HTML) and extracts strings for translation management.
4+
5+
## Documentation
6+
7+
- **[ARCHITECTURE.md](ARCHITECTURE.md)**: Contains a deep-dive into the codebase layout, directory structure, and a comprehensive API reference. **Read this file first** when:
8+
- Starting a new task to understand which files are relevant.
9+
- Investigating the impact of changes across the system.
10+
- Looking for specific functionality or function definitions before searching.
11+
12+
- **[DEVELOPMENT.md](DEVELOPMENT.md)**: Contains instructions for environment setup, build processes for various platforms, and release workflows. **Read this file first** when:
13+
- Setting up the development environment or installing dependencies (OCaml, JS, QuickJS).
14+
- Building the project for development or release.
15+
- Executing the tool for manual verification or testing.
16+
- Managing version numbers or release artifacts.
17+
18+
## Project Overview
19+
20+
- **Language**: OCaml (5.1.1) with some C++ (QuickJS bridge) and JavaScript (parsers via Browserify).
21+
- **Architecture**:
22+
- `src/cli/`: Main entry point, command-line interface, and output generation logic.
23+
- `src/parsing/`: OCaml parsers using `Angstrom` for custom formats and `Flow_parser` for JS.
24+
- `src/quickjs/`: Bridge to QuickJS to run JavaScript-based parsers (TypeScript/Pug) from OCaml.
25+
- `src/utils/`: Common utilities for collection, timing, and I/O.
26+
- **Key Libraries**: `Core`, `Eio` (concurrency), `Angstrom` (parsing), `Yojson`, `Ppx_jane`.
27+
28+
## Essential Commands
29+
30+
### Build
31+
- **Development build**: `dune build src/cli/strings.exe`
32+
- **Watch mode**: `dune build src/cli/strings.exe -w`
33+
- **Release build (MacOS)**: `DUNE_PROFILE=release dune build src/cli/strings.exe`
34+
- **Full release cycle**: See `DEVELOPMENT.md` for `cp`, `strip`, and Docker commands.
35+
36+
### Run
37+
- After building: `./_build/default/src/cli/strings.exe [directory-to-extract-from]`
38+
- The CLI expects to be run from the root of a project containing a `strings/` directory (or it will create one if a `.git` folder is present).
39+
40+
### Installation (Dev Setup)
41+
Refer to `DEVELOPMENT.md` for specific `opam` and `npm` setup steps, as the project has several external dependencies (Flow, QuickJS, pug-lexer, etc.).
42+
43+
## Code Conventions & Patterns
44+
45+
### Parsing Strategy
46+
1. **Direct Parsers**: Simple formats like `.strings`, `HTML`, and basic `Vue` tags are parsed using `Angstrom` in `src/parsing/`.
47+
2. **JS/TS Parsing**:
48+
- Javascript uses `Flow_parser` and a custom AST walker in `src/parsing/js_ast.ml`.
49+
- TypeScript uses the official TS parser running inside QuickJS (`src/quickjs/`).
50+
3. **Pug Parsing**: Has a "fast" OCaml implementation (`src/parsing/pug.ml`) and a "slow" official Pug implementation via QuickJS (`src/quickjs/`).
51+
52+
### Extraction Pattern
53+
- Content is extracted into a `Utils.Collector.t`.
54+
- The collector tracks found strings, potential scripts (to be further parsed), and file errors.
55+
- **Convention**: Strings found inside `L("...")` calls are treated as translations in JS/TS.
56+
57+
### Concurrency
58+
- Uses OCaml 5's `Eio` for multicore processing.
59+
- Parallel traversal of directories is handled in `src/cli/strings.ml` via `Fiber.List.iter` and `Eio.Executor_pool`.
60+
- JS workers (QuickJS) are managed via a pool in `src/quickjs/quickjs.ml`.
61+
62+
## Important Gotchas
63+
64+
- **QuickJS Dependency**: Requires a compiled `quickjs` directory at the project root for building. `dune` rules in `src/quickjs/dune` copy headers and libraries from there.
65+
- **Generated Headers**: `src/quickjs/runtime.h` is generated from `src/quickjs/parsers.js` using `browserify` and `qjsc`.
66+
- **Linking**: MacOS builds use specific link flags (e.g., `ld64.lld`) defined in `src/cli/link_flags.*`.
67+
- **OCamlFormat**: `.ocamlformat` is present; ensure you format OCaml code before submitting.
68+
- **Memory Safety**: Be cautious with C++ FFI code in `src/quickjs/quickjs.cpp`, particularly regarding OCaml's GC interaction (`CAMLparam`, `CAMLreturn`, `caml_release_runtime_system`).
69+
70+
## Testing Approach
71+
72+
- **Inline Tests**: The project uses `ppx_inline_test`. Parsers in `src/parsing/` can be tested directly within the OCaml files or in the `tests/` directory.
73+
- **Test Suite**: A standard test suite is located in `tests/test_runner.ml`. It covers JS, HTML, Pug, and `.strings` file parsing.
74+
- **Integration Tests**: Verification can be performed by running the built binary against fixtures in `tests/fixtures/` and checking the generated output in the `strings/` directory.
75+
- **Debug Flags**: Use `--show-debugging` or `--debug-pug` / `--debug-html` flags in the CLI to inspect internal parsing results.
76+
77+
## Troubleshooting
78+
79+
### "File modified since last read"
80+
If you receive an error stating that a file has been **"modified since it was last read"**, it usually indicates a discrepancy between the file's filesystem timestamp and the internal state tracking.
81+
82+
**Example Error:**
83+
> `Edit failed: The file '/path/to/file' was modified since it was last read. Please read the file again before trying to edit it.`
84+
85+
**Recommended Fix:**
86+
1. Execute `touch filename` to reset the file's modification time to the current system time.
87+
2. Re-read the file using the `view` tool.
88+
3. Attempt the edit again.
89+

ARCHITECTURE.md

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
# Architecture Documentation - String Extractor
2+
3+
This document provides a high-level overview of the String Extractor's architecture, directory structure, and internal APIs.
4+
5+
## Project Entry Point
6+
7+
The main entry point of the application is **`src/cli/strings.ml`**. It handles command-line argument parsing using `Core.Command`, sets up the `Eio` runtime, and initiates the file traversal process.
8+
9+
## Directory Structure
10+
11+
```text
12+
/
13+
├── src/
14+
│ ├── cli/ # Main CLI application logic
15+
│ │ ├── strings.ml # CLI entry point, traversal coordination
16+
│ │ ├── vue.ml # Vue-specific parsing and extraction logic
17+
│ │ └── generate.ml # Localization file generation (.strings, .json)
18+
│ ├── parsing/ # Core parsers using Angstrom and Flow
19+
│ │ ├── basic.ml # Common parsing utilities and combinators
20+
│ │ ├── js_ast.ml # Flow AST walker for string extraction
21+
│ │ ├── js.ml # JavaScript string extraction entry point
22+
│ │ ├── pug.ml # Native Pug template parsing
23+
│ │ ├── html.ml # HTML template parsing
24+
│ │ ├── strings.ml # .strings file parsing logic
25+
│ │ └── ... # Other specialized parsers (vue blocks, styles)
26+
│ ├── quickjs/ # Interface to QuickJS for JS/TS/Pug parsing
27+
│ │ ├── quickjs.ml # OCaml FFI to QuickJS
28+
│ │ ├── quickjs.cpp # C++ implementation of the bridge
29+
│ │ └── parsers.js # JS-based parsers running in QuickJS
30+
│ └── utils/ # Shared utility modules
31+
│ ├── collector.ml # State container for collected strings/errors
32+
│ ├── io.ml # I/O helpers
33+
│ ├── timing.ml # Performance measurement
34+
│ └── exception.ml # Exception handling
35+
├── strings/ # Directory where .strings files are managed
36+
├── dune-project # Dune build system configuration
37+
└── README.md # Project overview and usage instructions
38+
```
39+
40+
## Core API Reference
41+
42+
### `src/cli/`
43+
- **`Strings.main`**: Coordinates the entire run, including directory traversal and result generation.
44+
- **`Vue.parse`**: Splits a `.vue` file into its constituent parts (template, script, style).
45+
- **`Generate.write_english`**: Creates `english.strings` and `english.json` from the collected strings.
46+
- **`Generate.write_other`**: Updates existing translations for other languages.
47+
48+
### `src/parsing/`
49+
- **`Parsing.Basic`**: Provides foundational Angstrom parsers for whitespace, strings, and standard error handling.
50+
- **`Parsing.Js.extract_to_collector`**: Entry point for scanning JavaScript source code.
51+
- **`Parsing.Js_ast.extract`**: A comprehensive walker for the Flow AST that identifies and extracts strings from `L("...")` calls.
52+
- **`Parsing.Pug.collect`**: Traverses the native Pug AST to extract strings.
53+
- **`Parsing.Strings.parse`**: Parses existing `.strings` files into a lookup table.
54+
55+
### `src/quickjs/`
56+
- **`Quickjs.extract_to_collector`**: Offloads extraction to QuickJS for TypeScript and advanced Pug templates.
57+
58+
### `src/utils/`
59+
- **`Utils.Collector.create`**: Initializes a new string collection state for a specific file. (type `t = { path: string; strings: string Queue.t; ... }`)
60+
- **`Utils.Collector.blit_transfer`**: Merges results from one collector into another.
61+
62+
## Control Flow
63+
1. **Initiation**: `strings.exe` starts, parses CLI flags, and identifies the target directory.
64+
2. **Traversal**: Uses `Eio` to recursively walk the directory tree.
65+
3. **Dispatch**: For each supported file extension, the corresponding parser in `src/parsing` is invoked.
66+
4. **Collection**: Parsers find strings (usually inside `L()`) and add them to a `Collector.t`.
67+
5. **Generation**: `Generate.ml` aggregates strings from all collectors and updates the `strings/` directory.
68+
69+
## Testing Setup
70+
71+
The project implements a multi-layered testing strategy:
72+
73+
1. **Inline Tests**: Using `ppx_inline_test`, logic can be tested directly within the source files. This is primarily used for parser validation in `src/parsing/`.
74+
2. **Standard Test Suite**: Located in `tests/test_runner.ml`, this suite uses `ppx_expect` and `ppx_assert` to verify:
75+
- JavaScript string extraction via `Flow_parser`.
76+
- HTML and Pug extraction via `SZXX` and `Angstrom`.
77+
- Apple-style `.strings` file parsing.
78+
3. **Integration Testing**: The `tests/fixtures/` directory contains sample files of all supported types. The CLI can be run against these fixtures to verify end-to-end extraction and output generation (`.strings` and `.json` files).
79+
80+
The `tests/dune` file configures the test library and enables inline tests for the module.

DEVELOPMENT.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,14 @@
11
## Local development
22

33
### Setup
4+
45
From the root of the repo:
6+
57
```sh
68
brew install opam libomp llvm
79

810
opam switch create . ocaml-variants.5.1.1+options --no-install
11+
eval $(opam env)
912
opam install . --deps-only -t
1013

1114
# Remove old Flow version
@@ -28,6 +31,7 @@ cd quickjs && make && cd -
2831
```
2932

3033
### MacOS - Development
34+
3135
```sh
3236
# Build
3337
dune build src/cli/strings.exe -w
@@ -37,6 +41,7 @@ cp _build/default/src/cli/strings.exe strings.mac && ./strings.mac
3741
```
3842

3943
### MacOS - Build & Run
44+
4045
```sh
4146
# Don't forget to update the version number in [strings.ml]
4247

@@ -46,6 +51,7 @@ rm -f strings.mac && dune clean && DUNE_PROFILE=release dune build src/cli/strin
4651
```
4752

4853
### Docker (Linux) - Build & Run
54+
4955
```sh
5056
# Don't forget to update the version number in [strings.ml]
5157

@@ -68,3 +74,14 @@ docker run -it --rm \
6874
## apt-get update && apt-get install musl
6975
## /app/strings.linux frontend
7076
```
77+
78+
### Testing
79+
80+
To run the automated tests and generate the translation files, first create the `strings/` directory at the project root, then run the tests. Ensure your opam environment is initialized:
81+
82+
```sh
83+
eval $(opam env)
84+
mkdir -p strings && opam exec -- dune runtest tests/
85+
```
86+
87+
This command builds the project, executes the test suite, and populates the `strings/` directory with `english.strings` (extracted from fixtures) and merged `french.strings`.

dune

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
(data_only_dirs node_modules quickjs)
1+
(data_only_dirs node_modules quickjs flow)
22
(env
33
(dev
44
(flags (:standard -warn-error -A))

src/cli/link_flags.linux.dev.dune

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
()

src/quickjs/dune

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,8 @@
5454
(targets libomp.a)
5555
(action (bash "
5656
cp /usr/local/Cellar/libomp/17.0.6/lib/libomp.a . &> /dev/null \
57-
|| cp /usr/lib/libgomp.a libomp.a
57+
|| cp /usr/lib/libgomp.a libomp.a &> /dev/null \
58+
|| cp /usr/lib/gcc/aarch64-redhat-linux/15/libgomp.a libomp.a
5859
"))
5960
(mode standard)
6061
)

tests/dune

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,54 @@
1+
(library
2+
(name parsing_tests)
3+
(inline_tests)
4+
(libraries parsing utils core eio_main)
5+
(preprocess (pps ppx_jane ppx_inline_test))
6+
)
7+
8+
(rule
9+
(alias runtest)
10+
(deps
11+
../src/cli/strings.exe
12+
(source_tree fixtures))
13+
(action
14+
(bash "
15+
TMP_DIR=\"integration_test_run\"
16+
rm -rf $TMP_DIR
17+
mkdir -p $TMP_DIR/strings
18+
mkdir -p $TMP_DIR/.git
19+
printf '\"Hello from HTML\" = \"Bonjour de HTML\";\n' > $TMP_DIR/strings/french.strings
20+
cp -r fixtures $TMP_DIR/
21+
cd $TMP_DIR
22+
../../src/cli/strings.exe fixtures --output strings &> /dev/null
23+
cd ..
24+
25+
if ! grep -q \"Bonjour de HTML\" $TMP_DIR/strings/french.strings; then
26+
echo \"Error: French translation lost in .strings\"
27+
exit 1
28+
fi
29+
if ! grep -q \"Bonjour de HTML\" $TMP_DIR/strings/french.json; then
30+
echo \"Error: French translation lost in .json\"
31+
exit 1
32+
fi
33+
if ! grep -q \"MISSING TRANSLATION - demo.pug\" $TMP_DIR/strings/french.strings; then
34+
echo \"Error: Missing translation marker not found in .strings\"
35+
exit 1
36+
fi
37+
38+
echo \"✅ French integration test passed\"
39+
rm -rf $TMP_DIR
40+
41+
# Help user populate root strings/ if it exists
42+
# We use absolute paths to ensure we hit the real source directory
43+
# even if sandboxed in _build/default/tests
44+
# The dune-project is used as a landmark for the root
45+
# Traverse up to find the root of the source tree
46+
# In dune, we are at _build/default/tests
47+
ROOT_SRC=\"$(cd ../../.. && pwd)\"
48+
if [ -d \"$ROOT_SRC/strings\" ]; then
49+
# Extraction generates 5 strings.
50+
# We pre-populate 3 translations.
51+
printf '\"Hello from HTML\" = \"Bonjour de HTML\";\n\"Hello from JS\" = \"Bonjour de JS\";\n\"Hello from Pug\" = \"Bonjour de Pug\";\n' > \"$ROOT_SRC/strings/french.strings\"
52+
./../src/cli/strings.exe \"$ROOT_SRC/tests/fixtures\" --output \"$ROOT_SRC/strings\"
53+
fi
54+
")))

tests/fixtures/demo.html

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
<i18n>Hello from HTML</i18n>

0 commit comments

Comments
 (0)