| 1 | +# Agent Information - String Extractor |
| 2 | + |
| 3 | +This repository contains an OCaml-based internationalization (i18n) string extraction tool. It parses source files (JS, TS, Vue, Pug, HTML) and extracts strings for translation management. |
| 4 | + |
| 5 | +## Documentation |
| 6 | + |
| 7 | +- **[ARCHITECTURE.md](ARCHITECTURE.md)**: Contains a deep-dive into the codebase layout, directory structure, and a comprehensive API reference. **Read this file first** when: |
| 8 | + - Starting a new task to understand which files are relevant. |
| 9 | + - Investigating the impact of changes across the system. |
| 10 | + - Looking for specific functionality or function definitions before searching. |
| 11 | + |
| 12 | +- **[DEVELOPMENT.md](DEVELOPMENT.md)**: Contains instructions for environment setup, build processes for various platforms, and release workflows. **Read this file first** when: |
| 13 | + - Setting up the development environment or installing dependencies (OCaml, JS, QuickJS). |
| 14 | + - Building the project for development or release. |
| 15 | + - Executing the tool for manual verification or testing. |
| 16 | + - Managing version numbers or release artifacts. |
| 17 | + |
| 18 | +## Project Overview |
| 19 | + |
| 20 | +- **Language**: OCaml (5.1.1) with some C++ (QuickJS bridge) and JavaScript (parsers via Browserify). |
| 21 | +- **Architecture**: |
| 22 | + - `src/cli/`: Main entry point, command-line interface, and output generation logic. |
| 23 | + - `src/parsing/`: OCaml parsers using `Angstrom` for custom formats and `Flow_parser` for JS. |
| 24 | + - `src/quickjs/`: Bridge to QuickJS to run JavaScript-based parsers (TypeScript/Pug) from OCaml. |
| 25 | + - `src/utils/`: Common utilities for collection, timing, and I/O. |
| 26 | +- **Key Libraries**: `Core`, `Lwt` (concurrency), `Angstrom` (parsing), `Yojson`, `Ppx_jane`. |
| 27 | + |
| 28 | +## Essential Commands |
| 29 | + |
| 30 | +### Build |
| 31 | +- **Development build**: `dune build src/cli/strings.exe` |
| 32 | +- **Watch mode**: `dune build src/cli/strings.exe -w` |
| 33 | +- **Release build (macOS)**: `DUNE_PROFILE=release dune build src/cli/strings.exe` |
| 34 | +- **Full release cycle**: See `DEVELOPMENT.md` for `cp`, `strip`, and Docker commands. |
| 35 | + |
| 36 | +### Run |
| 37 | +- After building: `./_build/default/src/cli/strings.exe [directory-to-extract-from]` |
| 38 | +- The CLI expects to be run from the root of a project containing a `strings/` directory (or it will create one if a `.git` folder is present). |
| 39 | + |
| 40 | +### Installation (Dev Setup) |
| 41 | +Refer to `DEVELOPMENT.md` for specific `opam` and `npm` setup steps, as the project has several external dependencies (Flow, QuickJS, pug-lexer, etc.). |
| 42 | + |
| 43 | +## Code Conventions & Patterns |
| 44 | + |
| 45 | +### Parsing Strategy |
| 46 | +1. **Direct Parsers**: Simple formats like `.strings`, `HTML`, and basic `Vue` tags are parsed using `Angstrom` in `src/parsing/`. |
| 47 | +2. **JS/TS Parsing**: |
| 48 | + - JavaScript uses `Flow_parser` and a custom AST walker in `src/parsing/js_ast.ml`. |
| 49 | + - TypeScript uses the official TS parser running inside QuickJS (`src/quickjs/`). |
| 50 | +3. **Pug Parsing**: Has a "fast" OCaml implementation (`src/parsing/pug.ml`) and a "slow" official Pug implementation via QuickJS (`src/quickjs/`). |
| 51 | + |
| 52 | +### Extraction Pattern |
| 53 | +- Content is extracted into a `Utils.Collector.t`. |
| 54 | +- The collector tracks found strings, potential scripts (to be further parsed), and file errors. |
| 55 | +- **Convention**: Strings found inside `L("...")` calls are treated as translations in JS/TS. |
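As a rough illustration of the extraction pattern above, the sketch below models a collector and a naive scan for `L("...")` calls. The `collector` record, its field names, and the helper functions are hypothetical stand-ins for `Utils.Collector.t`, not the project's actual definitions. The text scan is a deliberate simplification: the real tool walks a JS AST, so this version ignores escaped quotes and would also match identifiers that merely end in `L`.

```ocaml
(* Hypothetical stand-in for Utils.Collector.t; field names are assumptions. *)
type collector = {
  path : string;                      (* file being processed *)
  strings : string Queue.t;           (* translatable strings found *)
  possible_scripts : string Queue.t;  (* embedded scripts to parse later *)
  file_errors : string Queue.t;       (* errors encountered for this file *)
}

let create ~path =
  {
    path;
    strings = Queue.create ();
    possible_scripts = Queue.create ();
    file_errors = Queue.create ();
  }

let add_string c s = Queue.add s c.strings
let add_error c msg = Queue.add msg c.file_errors

(* Naive scan for L("...") calls: record the text between the opening
   quote and the next double quote. Escaped quotes are not handled. *)
let extract_l_calls c source =
  let n = String.length source in
  let rec scan i =
    if i + 3 <= n then
      if String.sub source i 3 = "L(\"" then begin
        match String.index_from_opt source (i + 3) '"' with
        | Some j ->
          add_string c (String.sub source (i + 3) (j - i - 3));
          scan (j + 1)
        | None -> add_error c "unterminated L(\" call"
      end
      else scan (i + 1)
  in
  scan 0
```

Keeping strings, deferred scripts, and errors in one value per file lets each parser report everything it saw without raising, which suits the "collect first, write output later" flow described above.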
| 56 | + |
| 57 | +### Concurrency |
| 58 | +- Uses `Lwt` for cooperative concurrency. |
| 59 | +- Parallel traversal of directories is handled in `src/cli/strings.ml` via `Lwt_list` and `Lwt_pool`. |
| 60 | +- JS workers (QuickJS) are managed via `Lwt_pool` and `Lwt_preemptive` in `src/quickjs/quickjs.ml`. |
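The pooling idea can be sketched with `Lwt_pool` and `Lwt_list` as follows. This is a minimal sketch, not the code in `src/cli/strings.ml` or `src/quickjs/quickjs.ml`: the pool size, the integer "worker" standing in for a QuickJS runtime, and the `parse_with_worker` placeholder are all assumptions made for illustration. It needs the `lwt` and `lwt.unix` libraries.

```ocaml
(* Sketch only: requires the lwt and lwt.unix libraries. *)

(* Assumed limit, not taken from the project. *)
let max_workers = 4

(* Each pool slot holds a hypothetical worker id standing in for a JS runtime. *)
let next_id = ref 0

let pool : int Lwt_pool.t =
  Lwt_pool.create max_workers (fun () ->
      incr next_id;
      Lwt.return !next_id)

(* Placeholder for real parsing: yields once, then reports the source length. *)
let parse_with_worker (_worker : int) (source : string) : int Lwt.t =
  Lwt.map (fun () -> String.length source) (Lwt.pause ())

(* Start all jobs concurrently; the pool caps how many hold a worker at once.
   Lwt_list.map_p returns results in the same order as the input list. *)
let parse_all (sources : string list) : int list Lwt.t =
  Lwt_list.map_p
    (fun source -> Lwt_pool.use pool (fun w -> parse_with_worker w source))
    sources
```

A caller would typically drive this with `Lwt_main.run (parse_all sources)`. Work that blocks inside C code would additionally need something like `Lwt_preemptive.detach` so it does not stall the cooperative scheduler.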
| 61 | + |
| 62 | +## Important Gotchas |
| 63 | + |
| 64 | +- **QuickJS Dependency**: Requires a compiled `quickjs` directory at the project root for building. `dune` rules in `src/quickjs/dune` copy headers and libraries from there. |
| 65 | +- **Generated Headers**: `src/quickjs/runtime.h` is generated from `src/quickjs/parsers.js` using `browserify` and `qjsc`. |
| 66 | +- **Linking**: macOS builds use specific link flags (e.g., `ld64.lld`) defined in `src/cli/link_flags.*`. |
| 67 | +- **OCamlFormat**: `.ocamlformat` is present; ensure you format OCaml code before submitting. |
| 68 | +- **Memory Safety**: Be cautious with C++ FFI code in `src/quickjs/quickjs.cpp`, particularly regarding OCaml's GC interaction (`CAMLparam`, `CAMLreturn`, `caml_release_runtime_system`). |
| 69 | + |
| 70 | +## Testing Approach |
| 71 | + |
| 72 | +- **Inline Tests**: The project uses `ppx_inline_test`. Parsers in `src/parsing/` can be tested directly within the OCaml files or in the `tests/` directory. |
| 73 | +- **Test Suite**: A standard test suite is located in `tests/test_runner.ml`. It covers JS, HTML, Pug, and `.strings` file parsing. |
| 74 | +- **Integration Tests**: Verification can be performed by running the built binary against fixtures in `tests/fixtures/` and checking the generated output in the `strings/` directory. |
| 75 | +- **Debug Flags**: Use `--show-debugging` or `--debug-pug` / `--debug-html` flags in the CLI to inspect internal parsing results. |
| 76 | + |
| 77 | +## Troubleshooting |
| 78 | + |
| 79 | +### "File modified since last read" |
| 80 | +If you receive an error stating that a file has been **"modified since it was last read"**, it usually means the file's modification timestamp on disk no longer matches the timestamp the editing tool recorded when it last read the file. |
| 81 | + |
| 82 | +**Example Error:** |
| 83 | +> `Edit failed: The file '/path/to/file' was modified since it was last read. Please read the file again before trying to edit it.` |
| 84 | + |
| 85 | +**Recommended Fix:** |
| 86 | +1. Execute `touch filename` to reset the file's modification time to the current system time. |
| 87 | +2. Re-read the file using the `view` tool. |
| 88 | +3. Attempt the edit again. |