This repository was archived by the owner on Mar 27, 2026. It is now read-only.

Commit 8fe6b43

use single-pass approach
1 parent f5c6971 commit 8fe6b43

9 files changed

Lines changed: 578 additions & 111 deletions


.cursor/rules/general.mdc

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: false
5+
---
6+
# RepoDiff Development Guidelines
7+
8+
You are an expert in **Python** focused on building high-performance, maintainable, and scalable command-line applications.
9+
10+
## Project Structure
11+
```
12+
RepoDiff/
13+
├── repodiff/ # Main package
14+
│ ├── __init__.py # Package initialization
15+
│ ├── __main__.py # Entry point for python -m repodiff
16+
│ ├── main.py # Main application logic
17+
│ ├── utils.py # Utility functions
18+
│ ├── diff/ # Diff parsing and processing
19+
│ │ ├── __init__.py
20+
│ │ ├── parser.py # Diff parsing
21+
│ │ └── processor.py # Diff processing
22+
│ ├── filters/ # Filter implementations
23+
│ │ ├── __init__.py
24+
│ │ ├── base.py # Base filter class and registry
25+
│ │ ├── context_filter.py # Context filter implementation
26+
│ │ └── signature_filter.py # Signature filter implementation
27+
│ └── git/ # Git operations
28+
│ ├── __init__.py
29+
│ └── operations.py # Git command wrappers
30+
├── tests/ # Unit tests
31+
│ ├── __init__.py
32+
│ ├── test_utils.py
33+
│ ├── test_git_operations.py
34+
│ ├── test_diff_parser.py
35+
│ ├── test_processor.py
36+
│ └── test_filters.py
37+
├── setup.py # Package setup script
38+
├── config.json # Configuration file
39+
└── README.md # User documentation
40+
```
41+
42+
## Code Structure and Best Practices
43+
- Use **object-oriented programming (OOP)** principles to structure the application effectively.
44+
- Follow **PEP 8** guidelines for code readability.
45+
- Use **descriptive function and method names** that reflect their behavior.
46+
- Implement **logging** using Python's `logging` module instead of `print` statements.
47+
- Modularize code into **separate files** based on functionality.
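
As a rough illustration of the logging guideline above, the sketch below routes progress messages through the `logging` module instead of `print`. The logger name `repodiff.example` and both function names are hypothetical and are not taken from the RepoDiff code base.

```python
import logging

# Hypothetical logger name, chosen only for this example.
logger = logging.getLogger("repodiff.example")


def configure_logging(verbose: bool = False) -> None:
    """Configure logging once, at program start-up."""
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.INFO,
        format="%(levelname)s %(name)s: %(message)s",
    )


def count_changed_files(paths: list[str]) -> int:
    """Report progress through the logger rather than print()."""
    logger.info("Processing %d changed file(s)", len(paths))
    for path in paths:
        logger.debug("Visiting %s", path)
    return len(paths)
```

Calling `configure_logging(verbose=True)` once in the entry point would then surface the debug messages without touching the library code.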
48+
49+
## Extending with Custom Filters
50+
You can create custom filters by extending the `DiffFilter` base class and registering them with the `FilterRegistry`:
51+
52+
```python
53+
from repodiff.filters.base import DiffFilter, FilterRegistry
54+
55+
@FilterRegistry.register("my_custom_filter")
56+
class MyCustomFilter(DiffFilter):
57+
    def apply(self, hunks, rule):
58+
        processed_hunks = hunks  # Custom filter implementation goes here (placeholder: pass-through)
59+
        return processed_hunks
60+
```
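
For readers without the package at hand, here is a self-contained sketch of how a decorator-based registry like the one used above might work. The `DiffFilter` and `FilterRegistry` classes below are stand-ins written for this example; the real classes in `repodiff.filters.base` may differ, and the `create` helper and the `drop_empty` filter are hypothetical.

```python
from typing import Callable, Dict, List, Type


class DiffFilter:
    """Stand-in base class: subclasses transform a list of hunks."""

    def apply(self, hunks: List[str], rule: dict) -> List[str]:
        raise NotImplementedError


class FilterRegistry:
    """Stand-in registry mapping a filter name to a filter class."""

    _filters: Dict[str, Type[DiffFilter]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type[DiffFilter]], Type[DiffFilter]]:
        def decorator(filter_cls: Type[DiffFilter]) -> Type[DiffFilter]:
            cls._filters[name] = filter_cls
            return filter_cls  # return the class unchanged so it stays usable

        return decorator

    @classmethod
    def create(cls, name: str) -> DiffFilter:
        # Hypothetical lookup helper, not taken from the project.
        return cls._filters[name]()


@FilterRegistry.register("drop_empty")
class DropEmptyHunks(DiffFilter):
    """Example filter: discard hunks that contain only whitespace."""

    def apply(self, hunks, rule):
        return [h for h in hunks if h.strip()]
```

Registering by name keeps the configuration file decoupled from class names: a rule only needs to mention the string key.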
61+
62+
## Testing Guidelines
63+
- Write **unit tests** with `pytest` for all new functionality.
64+
- Use **mock objects** to isolate tests from external dependencies.
65+
- Aim for high test coverage, especially for core functionality.
66+
- Run tests with coverage to identify untested code:
67+
```bash
68+
pytest --cov=repodiff tests/
69+
```
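
To illustrate the mocking guideline above, the following sketch isolates a git wrapper from the real `git` binary with `unittest.mock`. The `run_git_diff` function here is a hypothetical stand-in written for the example, not the project's actual wrapper.

```python
import subprocess
from unittest import mock


def run_git_diff(commit1: str, commit2: str) -> str:
    """Hypothetical wrapper around `git diff` between two commits."""
    result = subprocess.run(
        ["git", "diff", commit1, commit2],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def test_run_git_diff_returns_stdout():
    # Replace subprocess.run so the test never invokes git.
    fake = mock.Mock(stdout="diff --git a/x b/x\n")
    with mock.patch("subprocess.run", return_value=fake) as patched:
        assert run_git_diff("abc", "def") == "diff --git a/x b/x\n"
    patched.assert_called_once()
    assert patched.call_args.args[0] == ["git", "diff", "abc", "def"]
```

Because the wrapper looks up `subprocess.run` at call time, patching the attribute on the `subprocess` module is enough here.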
70+
71+
## Performance Considerations
72+
- Optimize for large diffs by processing files incrementally.
73+
- Consider memory usage when handling large repositories.
74+
- Use efficient data structures for storing and processing diffs.
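
One way to read "processing files incrementally" is to stream the diff and hand over one file at a time rather than holding every file in memory. The generator below is a minimal sketch under that assumption; the name `iter_file_diffs` is hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_file_diffs(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one file's diff lines at a time from a unified diff stream."""
    current: List[str] = []
    for line in lines:
        # Each file section in git's output starts with a "diff --git" header.
        if line.startswith("diff --git ") and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current
```

Since `lines` can be any iterable, the same function works on a list in tests and on a subprocess's stdout in production code.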
75+
76+
## Documentation
77+
- Document all public functions, classes, and methods.
78+
- Keep the README.md focused on end-user documentation.
79+
- Use this file (`.cursor/rules/general.mdc`) for developer-specific documentation.
80+
- Update documentation when making significant changes.
81+
82+
## Deployment and Packaging
83+
- Use **setuptools** for package management.
84+
- Ensure all dependencies are properly specified in setup.py.
85+
- Test the package installation in a clean environment before release.

.cursorrules

Lines changed: 0 additions & 37 deletions
This file was deleted.

README.md

Lines changed: 109 additions & 56 deletions
@@ -1,105 +1,158 @@
1-
21
# RepoDiff
32

4-
**RepoDiff** is a tool designed to simplify code reviews by generating dynamic git diffs between two commits or branches. It allows you to configure diff options based on file paths, optimizing the output for consumption by large language models (LLMs).
3+
**RepoDiff** is a tool designed to simplify code reviews by generating dynamic git diffs between two commits or branches. It optimizes diffs for analysis by large language models (LLMs) with features like context line adjustment and method body removal.
54

65
## Features
76

8-
- Generate diffs between two commits or branches with customizable options.
9-
- Supports customized diff options depending on file type.
10-
- Combines diffs into a single file.
11-
- Calculates token counts for estimating the query cost for LLMs.
7+
- Generate diffs between two commits or branches with a single pass
8+
- Configurable file pattern matching for different file types
9+
- Smart method body removal for C# files to improve readability
10+
- Adjustable context lines per file pattern
11+
- Token counting for estimating LLM query costs
12+
- Combines all changes into a single, well-formatted output
1213

13-
## Usage
14+
## Installation
1415

15-
You can either provide commit hashes to compare directly, or use the -b option to compare the latest commit in the current branch with the latest common commit in another branch (e.g., `master`). You can also specify an output file, or let the script default to the system's temporary directory if no output file is provided.
16+
### Option 1: Download the Executable
1617

17-
### Compare Latest Commit with Another Branch
18+
1. Go to the [Releases](https://github.com/EntityProcess/RepoDiff/releases) page.
19+
2. Download the latest version of the `repodiff.exe` executable.
20+
3. Move the `repodiff.exe` file to a directory included in your system's `PATH`.
21+
22+
### Option 2: Install from Source
1823

19-
To compare the latest commit in the current branch with the latest common commit in another branch (e.g., `master`), use the `-b` option:
2024
```bash
21-
repodiff -b <branch> [-o /path/to/output_file.txt]
25+
git clone https://github.com/EntityProcess/RepoDiff.git
26+
cd RepoDiff
27+
pip install -e .
2228
```
2329

24-
**Example:**
25-
Compare the latest commit in the current branch with the latest common commit in master, and write the result to a default file in the system's temporary directory
30+
### Option 3: Build the Executable Yourself
31+
32+
Clone the repository and navigate to the directory:
2633

2734
```bash
28-
repodiff -b master
35+
git clone https://github.com/EntityProcess/RepoDiff.git
36+
cd RepoDiff
2937
```
3038

31-
### Compare Two Commits
39+
Install PyInstaller and build the executable:
3240

3341
```bash
34-
repodiff -c1 <commit1> -c2 <commit2> [-o /path/to/output_file.txt]
42+
pip install pyinstaller
43+
# On Windows, run:
44+
build.bat
3545
```
3646

37-
* `-c1`, `--commit1`: First commit hash.
38-
* `-c2`, `--commit2`: Second commit hash.
39-
* `-o`, `--output_file`: (Optional) Path to the output file. If not provided, the diff will be written to a default file in the system's temporary directory.
47+
Add `./RepoDiff/dist` to your `PATH` environment variable.
4048

41-
### Configuring Diff Options
49+
## Usage
4250

43-
You can customize the diff options using a `config.json` file. This allows you to apply different diff strategies depending on the file path.
51+
### Compare Latest Commit with Another Branch
4452

45-
For example:
53+
To compare the latest commit in the current branch with the latest common commit in another branch:
4654

4755
```bash
56+
repodiff -b main -o output.txt
57+
```
58+
59+
### Compare Two Specific Commits
60+
61+
```bash
62+
repodiff -c1 abcdef1234567890 -c2 0987654321fedcba -o output.txt
63+
```
64+
65+
Parameters:
66+
* `-b`, `--branch`: Branch to compare with (e.g., `main` or `master`)
67+
* `-c1`, `--commit1`: First commit hash
68+
* `-c2`, `--commit2`: Second commit hash
69+
* `-o`, `--output_file`: (Optional) Path to the output file. If not provided, the diff will be written to a default file in the system's temporary directory.
70+
* `--version`: Display the current version of RepoDiff
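
The parameters listed above could be declared with `argparse` roughly as follows. This is a sketch, not the project's actual parser: the default file name `repodiff_output.txt` and the version string `0.0.0` are placeholders invented for the example.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="repodiff")
    parser.add_argument("-b", "--branch", help="Branch to compare with")
    parser.add_argument("-c1", "--commit1", help="First commit hash")
    parser.add_argument("-c2", "--commit2", help="Second commit hash")
    parser.add_argument(
        "-o",
        "--output_file",
        # Placeholder default file name inside the system temp directory.
        default=os.path.join(tempfile.gettempdir(), "repodiff_output.txt"),
        help="Path to the output file",
    )
    # "0.0.0" is a placeholder, not the project's real version.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser
```

With this parser, `build_parser().parse_args(["-b", "main"])` leaves `commit1` and `commit2` as `None` and fills `output_file` with the temp-directory default.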
71+
72+
## Configuration
73+
74+
RepoDiff uses a `config.json` file in the project root directory. Example configuration:
75+
76+
```json
4877
{
4978
"tiktoken_model": "gpt-4o",
50-
"diffs": [
51-
["-U50", "--ignore-all-space", "--", ":!*Test*"],
52-
["-U20", "--ignore-all-space", "--", "*Test*"]
79+
"filters": [
80+
{
81+
"file_pattern": "*.cs",
82+
"include_entire_file_with_signatures": true,
83+
"method_body_threshold": 10
84+
},
85+
{
86+
"file_pattern": "*Test*.cs",
87+
"context_lines": 20
88+
},
89+
{
90+
"file_pattern": "*.xml",
91+
"context_lines": 5
92+
},
93+
{
94+
"file_pattern": "*",
95+
"context_lines": 3
96+
}
5397
]
5498
}
5599
```
56100

57-
Explanation of the options:
101+
Configuration options:
58102

59-
* `tiktoken_model`: This specifies the language model you're using (for example, gpt-4o), which helps estimate how many tokens the output will contain.
60-
* `diffs`: This is a list of different comparison rules. Each rule has settings that control how Git compares the files:
61-
* `-U50`: Show 50 lines of context around changes (default is 3 lines).
62-
* `--ignore-all-space`: Ignore spaces when comparing files (useful when whitespace changes don't matter).
63-
* `--`: Signals the end of options and the start of file patterns.
64-
* `:!*Test*`: Exclude files with Test in their path.
65-
* `*Test*`: Include only files with Test in their path.
103+
* `tiktoken_model`: Specifies the language model for token counting (e.g., "gpt-4o").
104+
* `filters`: An array of filter rules that determine how different files are processed.
105+
* `file_pattern`: Glob pattern to match files (e.g., "*.cs", "*Test*.cs").
106+
* `include_entire_file_with_signatures`: (Optional) When true, keeps method signatures but replaces large method bodies with `{ ... }`.
107+
  * `method_body_threshold`: (Optional) Maximum number of lines a method body may contain before it is replaced with `{ ... }`.
108+
* `context_lines`: (Optional) Number of context lines to show around changes (default: 3).
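
To make the effect of `context_lines` concrete, the sketch below trims a hunk that was produced with full context down to a fixed number of unchanged lines around each change. The function name `trim_context` is hypothetical, and the sketch deliberately does not recompute `@@` headers or split hunks, which a real implementation would need to do.

```python
from typing import List


def trim_context(lines: List[str], context_lines: int) -> List[str]:
    """Keep changed lines plus up to `context_lines` unchanged lines around them."""
    # Changed lines in a unified diff start with "+" or "-".
    changed = [i for i, line in enumerate(lines) if line.startswith(("+", "-"))]
    keep = set()
    for i in changed:
        lo = max(0, i - context_lines)
        hi = min(len(lines), i + context_lines + 1)
        keep.update(range(lo, hi))
    return [line for i, line in enumerate(lines) if i in keep]
```

A hunk with no changed lines yields an empty list, which matches the intuition that pure context carries no review value.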
66109

67-
This setup means:
68-
* For most files, it shows a larger context (50 lines around each change) and ignores spaces.
69-
* For test files (*Test*), it shows fewer lines of context (20 lines) and also ignores spaces.
110+
Filter rules are evaluated in order and the first matching pattern wins, so more specific patterns (e.g., `*Test*.cs`) should be listed before broader ones (e.g., `*.cs`).
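
A first-match lookup of this kind can be sketched with the standard library's `fnmatch` module. The work-in-progress notes mention a `find_matching_rule` function, but the body below is an assumption written for illustration; in particular, matching against the file's base name rather than its full path is a guess.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def find_matching_rule(path: str, filters: List[dict]) -> Optional[dict]:
    """Return the first rule whose file_pattern matches the file name."""
    name = path.rsplit("/", 1)[-1]  # assumption: match on the base name only
    for rule in filters:
        if fnmatchcase(name, rule["file_pattern"]):
            return rule
    return None


specific_first = [
    {"file_pattern": "*Test*.cs", "context_lines": 20},
    {"file_pattern": "*.cs", "context_lines": 10},
    {"file_pattern": "*", "context_lines": 3},
]
```

Under first-match semantics, a rule for `*Test*.cs` placed after `*.cs` would never be selected, because every test file also matches `*.cs`.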
70111

71-
## Prerequisites
112+
## Output Format
72113

73-
- **PowerShell (Windows Only)**: If you're using Windows, you need to run the script in PowerShell. The pattern matching functionality in the script will not work properly in Command Prompt (`cmd`).
74-
- **Python 3.x**: Ensure Python is installed on your system.
114+
The tool generates a unified diff format with some enhancements:
75115

76-
## Installation
116+
1. A header explaining any placeholders used (e.g., `{ ... }` for removed method bodies).
117+
2. Standard git diff headers for each file.
118+
3. Modified hunks based on the applied filters:
119+
- Adjusted context lines
120+
- Method bodies replaced with `{ ... }` where applicable
121+
- Original line numbers preserved
77122

78-
### Option 1: Download the Executable
123+
Example output:
79124

80-
1. Go to the [Releases](https://github.com/EntityProcess/RepoDiff/releases) page.
81-
2. Download the latest version of the `repodiff.exe` executable.
82-
3. Move the `repodiff.exe` file to a directory included in your system's `PATH`.
125+
```diff
126+
NOTE: Some method bodies have been replaced with "{ ... }" to improve clarity for code reviews and LLM analysis.
83127

84-
### Option 2: Build the Executable Yourself
128+
diff --git a/src/MyClass.cs b/src/MyClass.cs
129+
--- a/src/MyClass.cs
130+
+++ b/src/MyClass.cs
131+
@@ -10,7 +10,7 @@ public class MyClass
132+
public void ProcessData(int value)
133+
{
134+
{ ... }
135+
}
136+
```
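
The `{ ... }` substitution can be approximated with simple brace counting. The sketch below assumes the caller already knows which line opens the method body; the function name `collapse_body` is hypothetical, braces inside strings or comments are not handled, and treating "longer than the threshold" as the trigger is an assumption about how `method_body_threshold` is compared.

```python
from typing import List


def collapse_body(lines: List[str], open_idx: int, threshold: int) -> List[str]:
    """Replace the block opened at lines[open_idx] with '{ ... }' if it is long.

    `open_idx` must point at the line holding the block's opening brace.
    """
    depth = 0
    for i in range(open_idx, len(lines)):
        depth += lines[i].count("{") - lines[i].count("}")
        if depth == 0:
            close_idx = i
            break
    else:
        return lines  # unbalanced braces: leave the input untouched
    body_len = close_idx - open_idx - 1
    if body_len <= threshold:
        return lines
    opener = lines[open_idx]
    indent = opener[: len(opener) - len(opener.lstrip())]
    return lines[: open_idx + 1] + [indent + "    { ... }"] + lines[close_idx:]
```

Short bodies are returned unchanged, so small methods stay fully readable in the output.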
85137

86-
Clone the repository and navigate to the directory:
138+
## Prerequisites
87139

88-
```bash
89-
git clone https://github.com/EntityProcess/RepoDiff.git
90-
cd RepoDiff
91-
```
140+
- **PowerShell (Windows Only)**: If you're using Windows, you need to run the script in PowerShell. The pattern matching functionality in the script will not work properly in Command Prompt (`cmd`).
141+
- **Python 3.x**: Ensure Python is installed on your system.
92142

93-
Install PyInstaller by running the following command:
143+
## Running Tests
94144

95145
```bash
96-
pip install pyinstaller
97-
```
146+
# Run all tests
147+
pytest
98148

99-
Generate the executable by running `build.bat`.
149+
# Run specific test file
150+
pytest tests/test_filters.py
100151

101-
Add `./RepoDiff/dist` to your `PATH` environmental variable.
152+
# Run with coverage
153+
pytest --cov=repodiff
154+
```
102155

103156
## Contributing
104157

105-
Contributions are welcome! Please feel free to submit a pull request or open an issue for any bugs or feature requests.
158+
Contributions are welcome! Please feel free to submit a pull request or open an issue for any bugs or feature requests.

ai-workspace/work_in_progress.md

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
# RepoDiff Work in Progress
2+
3+
## Current Task: Single-Pass Diff Implementation
4+
Converting RepoDiff to use a single git diff command with post-processing.
5+
6+
### Changes Required:
7+
1. [x] Update config.json format to match new structure from PRD
8+
- [x] Replace "diffs" array with "filters" array
9+
- [x] Add support for file patterns and context lines
10+
- [x] Add support for method body thresholds
11+
12+
2. [x] Modify repodiff.py to implement single-pass approach:
13+
- [x] Update run_git_diff to use --unified=999999
14+
- [x] Create parse_unified_diff function to parse git diff output
15+
- [x] Implement post_process_files function for applying filters
16+
- [x] Add find_matching_rule function to match files with filters
17+
- [x] Create apply_signature_removal for C# method body handling
18+
- [x] Create apply_context_filter for adjusting context lines
19+
- [x] Add explanatory header to output
20+
- [x] Fix C# file line ordering in diff output
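
Parsing the output of a single `git diff --unified=999999` run starts with recognising hunk headers. The sketch below covers only that step; the names `HunkHeader` and `parse_hunk_header` are hypothetical and are not the project's `parse_unified_diff`. It relies on the unified diff convention that an omitted line count means 1.

```python
import re
from typing import NamedTuple, Optional

# Matches headers such as "@@ -10,7 +10,7 @@ optional section text".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class HunkHeader(NamedTuple):
    old_start: int
    old_count: int
    new_start: int
    new_count: int


def parse_hunk_header(line: str) -> Optional[HunkHeader]:
    """Parse a unified diff hunk header, or return None for other lines."""
    m = HUNK_RE.match(line)
    if m is None:
        return None
    old_start, old_count, new_start, new_count = m.groups()
    return HunkHeader(
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )
```

Keeping the original start positions in a small record like this makes it easier to preserve line numbers after context lines are trimmed.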
21+
22+
3. [ ] Testing:
23+
- [ ] Test with C# files for method body removal
24+
- [ ] Test with different context line settings
25+
- [ ] Test with multiple file patterns
26+
- [ ] Verify token counting still works
27+
- [ ] Test C# file line ordering with various file structures
28+
29+
4. [x] Documentation:
30+
- [x] Update README.md with new config format
31+
- [x] Document new features and behavior
32+
- [x] Add examples of new config options
33+
34+
### Future Tasks:
35+
1. [ ] UI Implementation (PyQt)
36+
- Dark mode support
37+
- Commit selection
38+
- Filter configuration
39+
- Export options
40+
41+
2. [ ] Performance Optimization
42+
- Profile git diff performance
43+
- Optimize post-processing
44+
- Memory usage analysis
45+
46+
3. [ ] Testing Framework
47+
- Unit tests
48+
- Integration tests
49+
- PyQt UI tests
