Merged
198 changes: 197 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default.

4 changes: 3 additions & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "codem8"
version = "0.1.0"
version = "0.2.0"
edition = "2021"
license = "MIT"
description = "A deterministic source code analysis CLI for duplicate code reports."
@@ -9,5 +9,7 @@ keywords = ["cli", "duplicate-detection", "source-code", "analysis"]
categories = ["command-line-utilities", "development-tools"]

[dependencies]
ignore = "0.4"
rayon = "1"
regex = "1"
xxhash-rust = { version = "0.8", features = ["xxh3"] }
56 changes: 12 additions & 44 deletions README.md
@@ -40,13 +40,13 @@ cargo run -- --report-duplicate

## Usage

Analyze TypeScript files from the current directory:
Analyze supported source files from the current directory:

```bash
codem8 --report-duplicate
```

Analyze multiple extensions:
Restrict analysis to specific extensions:

```bash
codem8 --report-duplicate -file-extension=ts,tsx,js,jsx
@@ -65,16 +65,17 @@ base branch:
codem8 --report-duplicate -git-branch
```

Include duplicate block metrics:
Include duplicate block metrics and timing information:

```bash
codem8 --report-duplicate -verbose
```

## Duplicate Report

By default, CodeM8 analyzes `.ts` files. Recursive discovery skips common
irrelevant directories such as `.git`, `node_modules`, `target`, `dist`,
By default, CodeM8 analyzes all registered source file extensions. Recursive
discovery respects Git ignore rules, works outside Git repositories, and skips
common irrelevant directories such as `.git`, `node_modules`, `target`, `dist`,
`build`, `coverage`, `.next`, `.nuxt`, `.svelte-kit`, `.idea`, and `.vscode`.
Symbolic links are not followed.
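
Reviewer note: the discovery rules described above can be illustrated with a minimal, std-only sketch. It is not the project's implementation. `Cargo.toml` in this PR adds the `ignore` crate, which presumably handles Git ignore rules, and this sketch does not attempt that part. The names `SKIPPED_DIRS`, `is_skipped_dir`, and `discover` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Directory names listed in the README paragraph above.
const SKIPPED_DIRS: &[&str] = &[
    ".git", "node_modules", "target", "dist", "build", "coverage",
    ".next", ".nuxt", ".svelte-kit", ".idea", ".vscode",
];

/// Hypothetical helper: true when a directory name is on the skip list.
fn is_skipped_dir(name: &str) -> bool {
    SKIPPED_DIRS.contains(&name)
}

/// Hypothetical helper: recursively collects files whose extension is in
/// `extensions`, skipping listed directories and never following symlinks.
/// Git ignore rules are NOT handled here.
fn discover(root: &Path, extensions: &[&str], out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        // DirEntry::file_type does not follow symbolic links.
        let file_type = entry.file_type()?;
        let path = entry.path();
        if file_type.is_symlink() {
            continue;
        }
        if file_type.is_dir() {
            let name = entry.file_name();
            if name.to_str().is_some_and(|n| is_skipped_dir(n)) {
                continue;
            }
            discover(&path, extensions, out)?;
        } else if file_type.is_file() {
            let wanted = path
                .extension()
                .and_then(|ext| ext.to_str())
                .is_some_and(|ext| extensions.iter().any(|x| *x == ext));
            if wanted {
                out.push(path);
            }
        }
    }
    Ok(())
}
```

`fs::read_dir` does not guarantee an iteration order, so a caller that needs deterministic output would sort `out` after the walk.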

@@ -99,55 +100,22 @@ Reports are sorted deterministically by descending weight, then by line count,
character count, first location, and normalized block text.
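
Reviewer note: the sort order described above can be sketched as a chained comparator. The struct and field names are hypothetical, not taken from the codebase. The README only states that weight is descending. Treating line count and character count as descending, and first location and normalized text as ascending, is an assumption made for this sketch.

```rust
use std::cmp::Ordering;

/// Hypothetical shape of one duplicate block in the report.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DuplicateBlock {
    weight: u64,
    line_count: usize,
    char_count: usize,
    /// (file path, starting line) of the first occurrence.
    first_location: (String, usize),
    normalized_text: String,
}

/// Orders blocks by weight (descending), then by the tie-breakers the
/// README lists. Directions after weight are assumptions of this sketch.
fn compare_blocks(a: &DuplicateBlock, b: &DuplicateBlock) -> Ordering {
    b.weight
        .cmp(&a.weight)
        .then_with(|| b.line_count.cmp(&a.line_count))
        .then_with(|| b.char_count.cmp(&a.char_count))
        .then_with(|| a.first_location.cmp(&b.first_location))
        .then_with(|| a.normalized_text.cmp(&b.normalized_text))
}

/// Sorts a report in place using the comparator above.
fn sort_report(blocks: &mut [DuplicateBlock]) {
    blocks.sort_by(compare_blocks);
}
```

Because every field takes part in the comparison, two blocks compare as equal only when they are identical, so the resulting order does not depend on the order in which blocks were discovered.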

By default, each duplicate block prints the duplicated code before its
locations. Use `-verbose` to also show weight, line count, and occurrence
count. Character counts are used internally for scoring and sorting, but are
not printed.

## Language Heuristics

CodeM8 includes a hard-coded registry of block-only line patterns for common
languages and markup formats:

- TypeScript / JavaScript
- Rust
- C / C++ / Objective-C
- C#
- Java / Kotlin / Scala
- Go
- Python
- Ruby
- PHP
- Swift
- Shell
- PowerShell
- HTML / XML
- CSS / SCSS / Sass / Less
- SQL
- YAML / JSON / TOML

Block-only lines, such as braces or closing tags, cannot start a duplicate by
themselves. They can still be included inside a larger duplicated block when
surrounding comparison lines match.
locations. Use `-verbose` to also show weight, line count, occurrence count, and
timings for discovery, file processing, and duplicate detection. Character
counts are used internally for scoring and sorting, but are not printed.

## Development

Run the full local verification set:

```bash
cargo test
cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features -- -D warnings -W clippy::too_many_lines -W clippy::too_many_arguments -W clippy::type_complexity -W clippy::excessive_nesting -W clippy::cognitive_complexity
rtk cargo build --locked --all-targets
cargo test --all-targets
cargo clippy --workspace --all-targets --all-features -- -D warnings -W clippy::too_many_lines -W clippy::too_many_arguments -W clippy::type_complexity -W clippy::excessive_nesting -W clippy::cognitive_complexity -W clippy::pedantic -W clippy::nursery -W clippy::cargo
cargo build --locked --all-targets
```

The repository includes GitHub Actions workflows for Rust CI and a CodeRabbit
review gate. CI verifies formatting, build success, and tests on pushes and pull
requests. The CodeRabbit gate runs when CodeRabbit submits or edits a pull
request review and fails if CodeRabbit requests changes on the current PR head.

## Dependency Policy

CodeM8 avoids external packages for functionality that is simple to implement
and maintain directly. The first implementation uses one runtime dependency,
`xxhash-rust`, for the required XXH3 128-bit hash implementation. The crate is
widely used and permissively licensed under MIT or Apache-2.0.