Commit 2373fa3

taoeffect and claude authored
Add test suite, CI workflow, and technical documentation (#7)
* Add test suite, CI workflow, and technical documentation

This commit introduces a comprehensive testing infrastructure for the Lwt branch, including:

- Unit tests for parsers (JS, HTML, Pug, .strings) using ppx_inline_test
- Integration tests for CLI extraction against fixture files
- A GitHub Actions workflow for automated CI on Linux
- AGENTS.md and ARCHITECTURE.md for system design and agent guidance
- Build fixes for Linux (link flags and libgomp/libomp path search)
- Testing section added to DEVELOPMENT.md

Co-authored-by: Claude Opus 4.6 (via Crush) <noreply@anthropic.com>

* Address Copilot review comments on PR #7

- Rename PAWTHS to PATHS in src/quickjs/dune for readability
- Use curl -fsSL in CI workflow for fail-fast downloads
- Remove side-effecting block from tests/dune that clobbered strings/french.strings in the repo root on every test run
- Fix DEVELOPMENT.md to accurately describe hermetic test behavior
- Fix ARCHITECTURE.md test tooling description (ppx_inline_test, not ppx_expect; separate SZXX/Angstrom for HTML vs Pug)

Co-authored-by: Claude Opus 4.6 (via Crush) <noreply@anthropic.com>

* Remove .agents from tracking and add .agents/, ignored/ to .gitignore

Co-authored-by: Claude Opus 4.6 (via Crush) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (via Crush) <noreply@anthropic.com>
1 parent 82a1663 commit 2373fa3

14 files changed

Lines changed: 380 additions & 3 deletions

.github/workflows/test.yml

Lines changed: 48 additions & 0 deletions
```diff
@@ -0,0 +1,48 @@
+name: Tests
+
+on:
+  push:
+    branches: [ lwt, test-suite ]
+  pull_request:
+    branches: [ lwt ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up OCaml
+        uses: ocaml/setup-ocaml@v3
+        with:
+          ocaml-compiler: 5.1.1
+          dune-cache: true
+          opam-repositories: |
+            default: https://github.com/ocaml/opam-repository.git
+
+      - name: Install dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y npm xz-utils libomp-dev llvm-dev
+          opam install . --deps-only --update-invariant
+          npm install --no-save typescript browserify pug-lexer pug-parser pug-walk
+
+      - name: Install QuickJS
+        run: |
+          curl -fsSL https://bellard.org/quickjs/quickjs-2021-03-27.tar.xz -o quickjs.tar.xz
+          tar xvf quickjs.tar.xz && rm quickjs.tar.xz
+          mv quickjs-2021-03-27 quickjs
+          cd quickjs && make
+
+      - name: Install Flow
+        run: |
+          git clone --branch v0.183.1 --depth 1 https://github.com/facebook/flow.git flow
+          ln -s "$(pwd)/flow/src/parser" src/flow_parser
+          ln -s "$(pwd)/flow/src/third-party/sedlex" src/sedlex
+          ln -s "$(pwd)/flow/src/hack_forked/utils/collections" src/collections
+
+      - name: Run tests
+        run: |
+          mkdir -p strings
+          opam exec -- dune runtest tests/
```

.gitignore

Lines changed: 4 additions & 0 deletions
```diff
@@ -22,3 +22,7 @@ bad/
 src/flow_parser
 src/sedlex
 src/collections
+
+tests/integration_test_run/
+.agents/
+ignored/
```

AGENTS.md

Lines changed: 88 additions & 0 deletions
```diff
@@ -0,0 +1,88 @@
+# Agent Information - String Extractor
+
+This repository contains an OCaml-based internationalization (i18n) string extraction tool. It parses source files (JS, TS, Vue, Pug, HTML) and extracts strings for translation management.
+
+## Documentation
+
+- **[ARCHITECTURE.md](ARCHITECTURE.md)**: Contains a deep-dive into the codebase layout, directory structure, and a comprehensive API reference. **Read this file first** when:
+  - Starting a new task to understand which files are relevant.
+  - Investigating the impact of changes across the system.
+  - Looking for specific functionality or function definitions before searching.
+
+- **[DEVELOPMENT.md](DEVELOPMENT.md)**: Contains instructions for environment setup, build processes for various platforms, and release workflows. **Read this file first** when:
+  - Setting up the development environment or installing dependencies (OCaml, JS, QuickJS).
+  - Building the project for development or release.
+  - Executing the tool for manual verification or testing.
+  - Managing version numbers or release artifacts.
+
+## Project Overview
+
+- **Language**: OCaml (5.1.1) with some C++ (QuickJS bridge) and JavaScript (parsers via Browserify).
+- **Architecture**:
+  - `src/cli/`: Main entry point, command-line interface, and output generation logic.
+  - `src/parsing/`: OCaml parsers using `Angstrom` for custom formats and `Flow_parser` for JS.
+  - `src/quickjs/`: Bridge to QuickJS to run JavaScript-based parsers (TypeScript/Pug) from OCaml.
+  - `src/utils/`: Common utilities for collection, timing, and I/O.
+- **Key Libraries**: `Core`, `Lwt` (concurrency), `Angstrom` (parsing), `Yojson`, `Ppx_jane`.
+
+## Essential Commands
+
+### Build
+- **Development build**: `dune build src/cli/strings.exe`
+- **Watch mode**: `dune build src/cli/strings.exe -w`
+- **Release build (macOS)**: `DUNE_PROFILE=release dune build src/cli/strings.exe`
+- **Full release cycle**: See `DEVELOPMENT.md` for `cp`, `strip`, and Docker commands.
+
+### Run
+- After building: `./_build/default/src/cli/strings.exe [directory-to-extract-from]`
+- The CLI expects to be run from the root of a project containing a `strings/` directory (or it will create one if a `.git` folder is present).
+
+### Installation (Dev Setup)
+Refer to `DEVELOPMENT.md` for specific `opam` and `npm` setup steps, as the project has several external dependencies (Flow, QuickJS, pug-lexer, etc.).
+
+## Code Conventions & Patterns
+
+### Parsing Strategy
+1. **Direct Parsers**: Simple formats like `.strings`, `HTML`, and basic `Vue` tags are parsed using `Angstrom` in `src/parsing/`.
+2. **JS/TS Parsing**:
+   - JavaScript uses `Flow_parser` and a custom AST walker in `src/parsing/js_ast.ml`.
+   - TypeScript uses the official TS parser running inside QuickJS (`src/quickjs/`).
+3. **Pug Parsing**: Has a "fast" OCaml implementation (`src/parsing/pug.ml`) and a "slow" official Pug implementation via QuickJS (`src/quickjs/`).
+
+### Extraction Pattern
+- Content is extracted into a `Utils.Collector.t`.
+- The collector tracks found strings, potential scripts (to be further parsed), and file errors.
+- **Convention**: Strings found inside `L("...")` calls are treated as translations in JS/TS.
+
+### Concurrency
+- Uses `Lwt` for cooperative concurrency.
+- Parallel traversal of directories is handled in `src/cli/strings.ml` via `Lwt_list` and `Lwt_pool`.
+- JS workers (QuickJS) are managed via `Lwt_pool` and `Lwt_preemptive` in `src/quickjs/quickjs.ml`.
+
+## Important Gotchas
+
+- **QuickJS Dependency**: Requires a compiled `quickjs` directory at the project root for building. `dune` rules in `src/quickjs/dune` copy headers and libraries from there.
+- **Generated Headers**: `src/quickjs/runtime.h` is generated from `src/quickjs/parsers.js` using `browserify` and `qjsc`.
+- **Linking**: macOS builds use specific link flags (e.g., `ld64.lld`) defined in `src/cli/link_flags.*`.
+- **OCamlFormat**: `.ocamlformat` is present; ensure you format OCaml code before submitting.
+- **Memory Safety**: Be cautious with C++ FFI code in `src/quickjs/quickjs.cpp`, particularly regarding OCaml's GC interaction (`CAMLparam`, `CAMLreturn`, `caml_release_runtime_system`).
+
+## Testing Approach
+
+- **Inline Tests**: The project uses `ppx_inline_test`. Parsers in `src/parsing/` can be tested directly within the OCaml files or in the `tests/` directory.
+- **Test Suite**: A standard test suite is located in `tests/test_runner.ml`. It covers JS, HTML, Pug, and `.strings` file parsing.
+- **Integration Tests**: Verification can be performed by running the built binary against fixtures in `tests/fixtures/` and checking the generated output in the `strings/` directory.
+- **Debug Flags**: Use `--show-debugging` or `--debug-pug` / `--debug-html` flags in the CLI to inspect internal parsing results.
+
+## Troubleshooting
+
+### "File modified since last read"
+If you receive an error stating that a file has been **"modified since it was last read"**, it usually indicates a discrepancy between the file's filesystem timestamp and the internal state tracking.
+
+**Example Error:**
+> `Edit failed: The file '/path/to/file' was modified since it was last read. Please read the file again before trying to edit it.`
+
+**Recommended Fix:**
+1. Execute `touch filename` to reset the file's modification time to the current system time.
+2. Re-read the file using the `view` tool.
+3. Attempt the edit again.
```
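AGENTS.md states the extraction convention (strings passed to `L("...")` are treated as translations) and notes that the real JS path walks a Flow AST in `src/parsing/js_ast.ml`. To make the convention concrete, here is a deliberately naive, stdlib-only OCaml sketch that scans raw source text for `L("...")` calls. The names `find_l_calls`, `read_literal`, and `call_starts_at` are hypothetical and not part of the project; the scanner only understands double-quoted literals, reduces every backslash escape to the escaped character, and ignores comments, single quotes, and template literals, which are exactly the gaps an AST walker avoids.

```ocaml
(* Hypothetical, naive illustration of the L("...") convention. This is NOT
   the project's Flow-based walker: it ignores comments, single-quoted and
   template literals, and turns every backslash escape into the escaped
   character (so \n becomes n). *)

(* ASCII characters that may appear in a JS identifier. *)
let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '$' -> true
  | _ -> false

(* Read a double-quoted literal whose body starts at index [i].
   Returns the body and the index just past the closing quote. *)
let read_literal src i =
  let n = String.length src in
  let buf = Buffer.create 32 in
  let rec go j =
    if j >= n then None (* unterminated literal *)
    else
      match src.[j] with
      | '"' -> Some (Buffer.contents buf, j + 1)
      | '\\' when j + 1 < n ->
          Buffer.add_char buf src.[j + 1];
          go (j + 2)
      | c ->
          Buffer.add_char buf c;
          go (j + 1)
  in
  go i

(* True when an L(" call starts at [i] and is not the tail of a longer
   identifier such as URL(" or myL(". *)
let call_starts_at src i =
  i + 2 < String.length src
  && src.[i] = 'L' && src.[i + 1] = '(' && src.[i + 2] = '"'
  && (i = 0 || not (is_ident_char src.[i - 1]))

(* Every literal passed directly to L(...), in source order. *)
let find_l_calls src =
  let n = String.length src in
  let rec scan i acc =
    if i >= n then List.rev acc
    else if call_starts_at src i then (
      match read_literal src (i + 3) with
      | Some (s, next) -> scan next (s :: acc)
      | None -> List.rev acc)
    else scan (i + 1) acc
  in
  scan 0 []

let () =
  let js = {|const a = L("Hello"); const u = URL("skip"); L("Say \"hi\"");|} in
  List.iter print_endline (find_l_calls js)
```

Requiring that the character before `L` is not an identifier character keeps calls such as `URL("...")` from matching; an AST walker gets this distinction directly from the parse tree.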

ARCHITECTURE.md

Lines changed: 80 additions & 0 deletions
```diff
@@ -0,0 +1,80 @@
+# Architecture Documentation - String Extractor
+
+This document provides a high-level overview of the String Extractor's architecture, directory structure, and internal APIs.
+
+## Project Entry Point
+
+The main entry point of the application is **`src/cli/strings.ml`**. It handles command-line argument parsing using `Core.Command`, sets up the `Lwt` runtime, and initiates the file traversal process.
+
+## Directory Structure
+
+```text
+/
+├── src/
+│   ├── cli/              # Main CLI application logic
+│   │   ├── strings.ml    # CLI entry point, traversal coordination
+│   │   ├── vue.ml        # Vue-specific parsing and extraction logic
+│   │   └── generate.ml   # Localization file generation (.strings, .json)
+│   ├── parsing/          # Core parsers using Angstrom and Flow
+│   │   ├── basic.ml      # Common parsing utilities and combinators
+│   │   ├── js_ast.ml     # Flow AST walker for string extraction
+│   │   ├── js.ml         # JavaScript string extraction entry point
+│   │   ├── pug.ml        # Native Pug template parsing
+│   │   ├── html.ml       # HTML template parsing
+│   │   ├── strings.ml    # .strings file parsing logic
+│   │   └── ...           # Other specialized parsers (vue blocks, styles)
+│   ├── quickjs/          # Interface to QuickJS for JS/TS/Pug parsing
+│   │   ├── quickjs.ml    # OCaml FFI to QuickJS
+│   │   ├── quickjs.cpp   # C++ implementation of the bridge
+│   │   └── parsers.js    # JS-based parsers running in QuickJS
+│   └── utils/            # Shared utility modules
+│       ├── collector.ml  # State container for collected strings/errors
+│       ├── io.ml         # I/O helpers
+│       ├── timing.ml     # Performance measurement
+│       └── exception.ml  # Exception handling
+├── strings/              # Directory where .strings files are managed
+├── dune-project          # Dune build system configuration
+└── README.md             # Project overview and usage instructions
+```
+
+## Core API Reference
+
+### `src/cli/`
+- **`Strings.main`**: Coordinates the entire run, including directory traversal and result generation.
+- **`Vue.parse`**: Splits a `.vue` file into its constituent parts (template, script, style).
+- **`Generate.write_english`**: Creates `english.strings` and `english.json` from the collected strings.
+- **`Generate.write_other`**: Updates existing translations for other languages.
+
+### `src/parsing/`
+- **`Parsing.Basic`**: Provides foundational Angstrom parsers for whitespace, strings, and standard error handling.
+- **`Parsing.Js.extract_to_collector`**: Entry point for scanning JavaScript source code.
+- **`Parsing.Js_ast.extract`**: A comprehensive walker for the Flow AST that identifies and extracts strings from `L("...")` calls.
+- **`Parsing.Pug.collect`**: Traverses the native Pug AST to extract strings.
+- **`Parsing.Strings.parse`**: Parses existing `.strings` files into a lookup table. Takes an `Lwt_io.input_channel` and returns a `string Core.String.Table.t Lwt.t`.
+
+### `src/quickjs/`
+- **`Quickjs.extract_to_collector`**: Offloads extraction to QuickJS for TypeScript and advanced Pug templates.
+
+### `src/utils/`
+- **`Utils.Collector.create`**: Initializes a new string collection state for a specific file. (type `t = { path: string; strings: string Queue.t; ... }`)
+- **`Utils.Collector.blit_transfer`**: Merges results from one collector into another.
+
+## Control Flow
+1. **Initiation**: `strings.exe` starts, parses CLI flags, and identifies the target directory.
+2. **Traversal**: Uses `Lwt` to cooperatively walk the directory tree via `Lwt_list` and `Lwt_pool`.
+3. **Dispatch**: For each supported file extension, the corresponding parser in `src/parsing` is invoked.
+4. **Collection**: Parsers find strings (usually inside `L()`) and add them to a `Collector.t`.
+5. **Generation**: `Generate.ml` aggregates strings from all collectors and updates the `strings/` directory.
+
+## Testing Setup
+
+The project implements a multi-layered testing strategy:
+
+1. **Inline Tests**: Using `ppx_inline_test` (e.g. `let%test_unit`) together with `ppx_assert` (e.g. `[%test_eq]`), logic can be tested directly within the source files. This is primarily used for parser validation in `src/parsing/`.
+2. **Standard Test Suite**: Located in `tests/test_runner.ml`, this suite runs the inline tests via `ppx_inline_test` and uses `ppx_assert` to verify:
+   - JavaScript string extraction via `Flow_parser`.
+   - HTML extraction via `SZXX` and Pug extraction via `Angstrom`.
+   - Apple-style `.strings` file parsing (via `Lwt_main.run` and `Lwt_io`).
+3. **Integration Testing**: The `tests/fixtures/` directory contains sample files of all supported types. The CLI can be run against these fixtures to verify end-to-end extraction and output generation (`.strings` and `.json` files).
+
+The `tests/dune` file defines the `parsing_tests` library with inline tests enabled, plus the integration-test rule attached to the `runtest` alias.
```
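ARCHITECTURE.md gives only a partial type for `Utils.Collector.t` (a `path` and a `string Queue.t` of strings) and describes `blit_transfer` as merging one collector into another; AGENTS.md adds that collectors also track potential scripts and file errors. The sketch below models control-flow steps 3 and 4 under those descriptions: it picks a parser by file extension and merges a per-file collector into a global one. The record name `collector`, the `scripts`/`errors` fields, the `parser_kind` variant, the labelled `~src`/`~dst` arguments, and the extension mapping are all assumptions for illustration, not the project's actual definitions.

```ocaml
(* Hypothetical model of Utils.Collector. Only [path] and [strings] are
   documented in ARCHITECTURE.md; [scripts] and [errors] are assumed. *)
type collector = {
  path : string;
  strings : string Queue.t; (* translatable strings found in [path] *)
  scripts : string Queue.t; (* embedded code to parse in a later pass *)
  errors : string Queue.t; (* parse errors reported for [path] *)
}

let create ~path =
  {
    path;
    strings = Queue.create ();
    scripts = Queue.create ();
    errors = Queue.create ();
  }

(* Append everything in [src] to [dst] and leave [src] empty.
   Queue.transfer does this in constant time and keeps insertion order. *)
let blit_transfer ~src ~dst =
  Queue.transfer src.strings dst.strings;
  Queue.transfer src.scripts dst.scripts;
  Queue.transfer src.errors dst.errors

(* Step 3 (dispatch): choose a parser family from the file extension.
   This mapping is illustrative; the real dispatch lives in src/cli/. *)
type parser_kind = Js | Ts | Vue | Pug | Html | Unsupported

let parser_for_file filename =
  match Filename.extension filename with
  | ".js" -> Js
  | ".ts" -> Ts
  | ".vue" -> Vue
  | ".pug" -> Pug
  | ".html" -> Html
  | _ -> Unsupported

let () =
  (* Step 4 (collection): fill a per-file collector, then merge it. *)
  let global = create ~path:"<all files>" in
  let file = create ~path:"fixtures/demo.html" in
  if parser_for_file file.path = Html then
    Queue.add "Hello from HTML" file.strings;
  blit_transfer ~src:file ~dst:global;
  (* prints: global=1 file=0 *)
  Printf.printf "global=%d file=%d\n"
    (Queue.length global.strings) (Queue.length file.strings)
```

Moving rather than copying keeps each string in exactly one collector at a time, which fits the "merge" wording in the API reference; whether the real `blit_transfer` empties its source is not stated in this commit.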

DEVELOPMENT.md

Lines changed: 11 additions & 0 deletions
````diff
@@ -68,3 +68,14 @@ docker run -it --rm \
 ## apt-get update && apt-get install musl
 ## /app/strings.linux frontend
 ```
+
+### Testing
+
+To run the automated tests, ensure your opam environment is initialized:
+
+```sh
+eval $(opam env)
+dune runtest tests/
+```
+
+This builds the project, runs the inline unit tests, and executes the integration test (which verifies extraction and translation preservation in an isolated temporary directory).
````
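The integration test run by `dune runtest tests/` (the `runtest` rule in `tests/dune`) seeds `french.strings` with a translation for `Hello from HTML`, regenerates the output, and then expects that translation to survive and a `MISSING TRANSLATION - demo.pug` marker to appear for untranslated strings. The stdlib-only OCaml sketch below expresses that preservation rule. `merge_translations`, `entry`, and `escape` are hypothetical names, and because this commit does not show how `Generate.write_other` lays out its output, placing the marker in a `/* ... */` comment with the English text as a fallback value is an assumption.

```ocaml
(* Hypothetical sketch of translation preservation during regeneration.
   The output layout (marker comment + English fallback) is assumed. *)

(* Escape backslashes and double quotes for a .strings literal. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      if c = '"' || c = '\\' then Buffer.add_char b '\\';
      Buffer.add_char b c)
    s;
  Buffer.contents b

let entry key value =
  Printf.sprintf "\"%s\" = \"%s\";" (escape key) (escape value)

(* [existing]: translations read from the previous french.strings.
   [extracted]: (english_string, source_file) pairs found in this run. *)
let merge_translations ~(existing : (string, string) Hashtbl.t) extracted =
  List.concat_map
    (fun (key, file) ->
      match Hashtbl.find_opt existing key with
      | Some translated -> [ entry key translated ] (* keep prior work *)
      | None ->
          [ Printf.sprintf "/* MISSING TRANSLATION - %s */" file; entry key key ])
    extracted

let () =
  let existing = Hashtbl.create 16 in
  Hashtbl.replace existing "Hello from HTML" "Bonjour de HTML";
  (* "An untranslated string" is a made-up example, not a real fixture. *)
  merge_translations ~existing
    [ ("Hello from HTML", "demo.html"); ("An untranslated string", "demo.pug") ]
  |> List.iter print_endline
```

Keying the lookup on the English source string is what lets a translation survive when the string moves between files; only strings with no prior entry get the per-file marker.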

dune

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-(data_only_dirs node_modules quickjs)
+(data_only_dirs node_modules quickjs flow)
 (env
  (dev
   (flags (:standard -warn-error -A))
```

src/cli/link_flags.linux.dev.dune

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+()
```

src/quickjs/dune

Lines changed: 20 additions & 2 deletions
```diff
@@ -55,8 +55,26 @@
 (rule
  (targets libomp.a)
  (action (bash "
-cp /opt/homebrew/Cellar/libomp/20.1.6/lib/libomp.a . &> /dev/null \
-|| cp /usr/lib/libgomp.a libomp.a
+OUT=\"libomp.a\"
+PATHS=\"
+/usr/local/Cellar/libomp/17.0.6/lib/libomp.a
+/opt/homebrew/Cellar/libomp/20.1.6/lib/libomp.a
+/usr/lib/x86_64-linux-gnu/libomp.a
+/usr/lib/x86_64-linux-gnu/libgomp.a
+/usr/lib/libgomp.a
+/usr/lib/gcc/x86_64-linux-gnu/*/libgomp.a
+/usr/lib/gcc/aarch64-redhat-linux/*/libgomp.a
+\"
+for p in $PATHS; do
+  for matched_path in $p; do
+    if [ -f \"$matched_path\" ]; then
+      cp \"$matched_path\" \"$OUT\"
+      exit 0
+    fi
+  done
+done
+echo \"Error: Could not find libomp.a or libgomp.a\" >&2
+exit 1
 "))
  (mode standard)
 )
```

tests/dune

Lines changed: 40 additions & 0 deletions
```diff
@@ -0,0 +1,40 @@
+(library
+ (name parsing_tests)
+ (inline_tests)
+ (libraries parsing utils core lwt lwt.unix angstrom-lwt-unix)
+ (preprocess (pps ppx_jane ppx_inline_test))
+)
+
+(rule
+ (alias runtest)
+ (deps
+  ../src/cli/strings.exe
+  (source_tree fixtures))
+ (action
+  (bash "
+TMP_DIR=\"integration_test_run\"
+rm -rf $TMP_DIR
+mkdir -p $TMP_DIR/strings
+mkdir -p $TMP_DIR/.git
+printf '\"Hello from HTML\" = \"Bonjour de HTML\";\n' > $TMP_DIR/strings/french.strings
+cp -r fixtures $TMP_DIR/
+cd $TMP_DIR
+../../src/cli/strings.exe fixtures --output strings &> /dev/null
+cd ..
+
+if ! grep -q \"Bonjour de HTML\" $TMP_DIR/strings/french.strings; then
+  echo \"Error: French translation lost in .strings\"
+  exit 1
+fi
+if ! grep -q \"Bonjour de HTML\" $TMP_DIR/strings/french.json; then
+  echo \"Error: French translation lost in .json\"
+  exit 1
+fi
+if ! grep -q \"MISSING TRANSLATION - demo.pug\" $TMP_DIR/strings/french.strings; then
+  echo \"Error: Missing translation marker not found in .strings\"
+  exit 1
+fi
+
+echo \"✅ French integration test passed\"
+rm -rf $TMP_DIR
+")))
```
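The integration rule in `tests/dune` seeds `french.strings` with a single `"key" = "value";` entry and verifies results with `grep`. For readers unfamiliar with that format, here is a simplified, synchronous stand-in for `Parsing.Strings.parse` that loads such lines into a `Hashtbl`. It is an assumption-laden sketch: per ARCHITECTURE.md the real parser reads an `Lwt_io.input_channel` with Angstrom and returns a `Core.String.Table.t`, whereas this version assumes one entry per line and relies on `Scanf`'s `%S`, whose OCaml escape rules only approximate those of Apple `.strings` files. `parse_line` and `parse_strings` are hypothetical names.

```ocaml
(* Simplified, stdlib-only stand-in for Parsing.Strings.parse.
   Assumes one "key" = "value"; entry per line. *)

(* Parse one `"key" = "value";` line; anything else yields None. *)
let parse_line line =
  try Some (Scanf.sscanf line " %S = %S ;" (fun key value -> (key, value)))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* Build a key -> translation table, skipping blank, comment,
   and malformed lines. Later duplicates overwrite earlier ones. *)
let parse_strings text =
  let table : (string, string) Hashtbl.t = Hashtbl.create 64 in
  String.split_on_char '\n' text
  |> List.iter (fun line ->
         match parse_line line with
         | Some (key, value) -> Hashtbl.replace table key value
         | None -> ());
  table

let () =
  let table =
    parse_strings
      "/* a comment line */\n\
       \"Hello from HTML\" = \"Bonjour de HTML\";\n"
  in
  match Hashtbl.find_opt table "Hello from HTML" with
  | Some fr -> print_endline fr
  | None -> print_endline "missing"
```

Skipping unparseable lines keeps the sketch short; a production parser would more likely report them, in the spirit of the file errors that collectors track.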

tests/fixtures/demo.html

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+<i18n>Hello from HTML</i18n>
```
