Thank you for your interest in contributing! This document provides guidelines and instructions for contributing to the project.
- Code of Conduct
- Getting Started
- Development Workflow
- Adding a New Normalizer
- Code Style
- Testing
- Commit Messages
- Pull Request Process
This project follows a simple code of conduct:
- Be respectful and inclusive
- Focus on constructive feedback
- Help others learn and grow
- Prioritize data quality and accuracy
- Node.js 22.x or higher
- pnpm 9.x or higher
- Git
- A GitHub account
- Basic understanding of TypeScript and React
- Fork the repository on GitHub
- Clone your fork:
      git clone https://github.com/YOUR_USERNAME/data-normalization-platform.git
      cd data-normalization-platform
- Add upstream remote:
      git remote add upstream https://github.com/ORIGINAL_OWNER/data-normalization-platform.git
- Install dependencies:
      pnpm install
- Set up environment:
      cp .env.example .env
      # Edit .env with your local configuration
- Start development server:
      pnpm dev
Always create a new branch for your work:
git checkout -b feature/your-feature-name
Branch naming conventions:
- `feature/` - New features (e.g., `feature/email-normalizer`)
- `fix/` - Bug fixes (e.g., `fix/regex-escaping`)
- `docs/` - Documentation updates (e.g., `docs/api-examples`)
- `refactor/` - Code refactoring (e.g., `refactor/csv-parser`)
- `test/` - Adding tests (e.g., `test/phone-validation`)
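The prefixes above can be checked mechanically, for example in a local pre-push script. The sketch below is only an illustration: `isValidBranchName` is a hypothetical helper that does not exist in the repository, and the rule that the part after the slash is lowercase words joined by hyphens is an assumption inferred from the examples.

```typescript
// Hypothetical helper, not part of the repository.
// The prefixes come from the branch naming conventions in this guide.
const BRANCH_PREFIXES = ["feature", "fix", "docs", "refactor", "test"];

// Assumed rule: a known prefix, a slash, then lowercase words or digits
// separated by single hyphens (e.g. "feature/email-normalizer").
const BRANCH_PATTERN = new RegExp(
  `^(${BRANCH_PREFIXES.join("|")})/[a-z0-9]+(-[a-z0-9]+)*$`
);

function isValidBranchName(name: string): boolean {
  return BRANCH_PATTERN.test(name);
}

console.log(isValidBranchName("feature/email-normalizer")); // true
console.log(isValidBranchName("email-normalizer")); // false
```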
Regularly sync with the upstream repository:
git fetch upstream
git checkout master
git merge upstream/master
git push origin master
Follow these steps to add a new data normalizer (e.g., Email, Company, Address):
Create client/src/lib/[type]Config.ts:
export const emailConfig = {
DISPOSABLE_DOMAINS: [
"tempmail.com",
"guerrillamail.com",
// ... more domains
],
COMMON_TYPOS: {
"gmial.com": "gmail.com",
"yahooo.com": "yahoo.com",
// ... more typos
},
// Add other configuration data
};
Create client/src/lib/[Type]Normalizer.ts:
export interface [Type]Options {
// Define options
}
export class [Type]Normalizer {
raw[Type]: string;
// ... properties
constructor(raw[Type]: string, options: [Type]Options = {}) {
this.raw[Type] = raw[Type];
this.parse();
}
private parse() {
// Implement parsing logic
}
// Add formatting methods
toJSON(): string {
// Return JSON representation
}
toCSVRow(): string {
// Return CSV row
}
}
// Export batch processing function
export function parseBatch(
items: string[],
options: [Type]Options = {}
): ParseResult[] {
// Implement batch processing
}
Create client/src/pages/[Type]Demo.tsx:
export default function [Type]Demo() {
// Implement interactive demo UI
// Include:
// - Single item processing
// - Batch processing
// - CSV upload
// - Results display
// - Export options
}
Update client/src/App.tsx:
import [Type]Demo from "./pages/[Type]Demo";
// Add route
<Route path={"/[type]"} component={[Type]Demo} />
Update server/jobProcessor.ts to handle the new type:
case 'email':
// Add processing logic
break;
Create client/src/lib/__tests__/[Type]Normalizer.test.ts:
import { [Type]Normalizer } from '../[Type]Normalizer';
describe('[Type]Normalizer', () => {
it('should parse valid [type]', () => {
// Add test cases
});
it('should handle edge cases', () => {
// Add edge case tests
});
});
- Add section to README.md
- Create `docs/[type]-normalizer.md` with detailed API documentation
- Add examples to the demo page
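To make the normalizer steps above concrete, here is a minimal sketch of what an email normalizer following the templates might look like. The class, method, and config names mirror the templates, but the parsing rules, the `fixTypos` option, the `ParseResult` shape, and the `toResult` helper are assumptions for illustration and not the project's actual implementation. In a real module the config, class, and `parseBatch` would be exported from their own files.

```typescript
// Illustrative sketch only; the validation rules here are deliberately simple.
const emailConfig = {
  DISPOSABLE_DOMAINS: ["tempmail.com", "guerrillamail.com"],
  COMMON_TYPOS: { "gmial.com": "gmail.com", "yahooo.com": "yahoo.com" } as Record<string, string>,
};

interface EmailOptions {
  fixTypos?: boolean; // hypothetical option; typo correction is on unless set to false
}

// Assumed result shape; the guide does not define ParseResult.
interface ParseResult {
  raw: string;
  normalized: string | null;
  isValid: boolean;
  isDisposable: boolean;
}

class EmailNormalizer {
  rawEmail: string;
  normalized: string | null = null;
  isValid = false;
  isDisposable = false;
  private options: EmailOptions;

  constructor(rawEmail: string, options: EmailOptions = {}) {
    this.rawEmail = rawEmail;
    this.options = options;
    this.parse();
  }

  private parse(): void {
    const cleaned = this.rawEmail.trim().toLowerCase();
    const at = cleaned.indexOf("@");
    // Require exactly one "@" with a non-empty local part before it.
    if (at <= 0 || at !== cleaned.lastIndexOf("@")) return;
    const local = cleaned.slice(0, at);
    let domain = cleaned.slice(at + 1);
    // Require a dot in the domain; real validation would be stricter.
    if (!domain.includes(".")) return;
    if (this.options.fixTypos !== false) {
      domain = emailConfig.COMMON_TYPOS[domain] ?? domain;
    }
    this.normalized = `${local}@${domain}`;
    this.isValid = true;
    this.isDisposable = emailConfig.DISPOSABLE_DOMAINS.includes(domain);
  }

  // Hypothetical helper that gathers the parsed fields into one object.
  toResult(): ParseResult {
    return {
      raw: this.rawEmail,
      normalized: this.normalized,
      isValid: this.isValid,
      isDisposable: this.isDisposable,
    };
  }

  toJSON(): string {
    return JSON.stringify(this.toResult());
  }

  toCSVRow(): string {
    // Quote every field and double embedded quotes so commas stay safe.
    const fields = [
      this.rawEmail,
      this.normalized ?? "",
      String(this.isValid),
      String(this.isDisposable),
    ];
    return fields.map((f) => `"${f.replace(/"/g, '""')}"`).join(",");
  }
}

// Batch processing: one result per input, in input order.
function parseBatch(items: string[], options: EmailOptions = {}): ParseResult[] {
  return items.map((item) => new EmailNormalizer(item, options).toResult());
}
```

Keeping `emailConfig` separate from the class means new typo mappings or disposable domains can be added without touching the parsing logic.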
- Use TypeScript for all new code
- Define interfaces for all data structures
- Avoid the `any` type; use proper typing
- Export types that other modules might need
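One common way to follow the typing guidelines above is to accept `unknown` at the boundary and narrow it with a type guard. The sketch below assumes a hypothetical `ContactRow` shape; neither the interface nor the functions are taken from the project.

```typescript
// Hypothetical shape for a row read from an uploaded file.
interface ContactRow {
  name: string;
  phone: string;
}

// Type guard: narrows an unknown value to ContactRow after runtime checks.
function isContactRow(value: unknown): value is ContactRow {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["name"] === "string" && typeof record["phone"] === "string";
}

// Taking unknown instead of any forces the value through the guard
// before its fields can be used.
function describeRow(value: unknown): string {
  return isContactRow(value) ? `${value.name} <${value.phone}>` : "invalid row";
}

console.log(describeRow({ name: "Ada", phone: "+14155552671" })); // Ada <+14155552671>
console.log(describeRow(null)); // invalid row
```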
- Classes: PascalCase (`NameNormalizer`, `PhoneNormalizer`)
- Functions: camelCase (`parseBatch`, `escapeRegex`)
- Constants: UPPER_SNAKE_CASE (`CREDENTIALS`, `JOB_WORDS`)
- Interfaces: PascalCase with descriptive names (`ParseOptions`, `RepairLog`)
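The conventions above can be seen together in one short sketch. The names `JOB_WORDS`, `ParseOptions`, and `escapeRegex` are borrowed from the examples in the list, but their contents here are placeholders, and `JobWordMatcher` and the `caseSensitive` field are hypothetical.

```typescript
// Constant: UPPER_SNAKE_CASE. The entries are placeholders, not project data.
const JOB_WORDS = ["manager", "c++ developer"];

// Interface: PascalCase with a descriptive name. The field is hypothetical.
interface ParseOptions {
  caseSensitive?: boolean;
}

// Function: camelCase. Escapes characters that have special meaning in a
// regular expression so config strings can be embedded in a pattern safely.
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Class: PascalCase. Hypothetical matcher built from the constant above.
class JobWordMatcher {
  private pattern: RegExp;

  constructor(options: ParseOptions = {}) {
    const source = JOB_WORDS.map((word) => escapeRegex(word)).join("|");
    this.pattern = new RegExp(source, options.caseSensitive ? "" : "i");
  }

  matches(text: string): boolean {
    return this.pattern.test(text);
  }
}

console.log(new JobWordMatcher().matches("Senior C++ Developer")); // true
```

Without the escaping step, the `+` characters in "c++ developer" would make the pattern invalid.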
- One class per file for normalizers
- Group related functions in utility files
- Keep configuration separate from logic
- Limit file length to ~500 lines (split if longer)
/**
* Parse and normalize a phone number
*
* @param rawPhone - The raw phone number string
* @param options - Parsing options
* @returns Normalized phone number object
*
* @example
* ```typescript
* const phone = new PhoneNormalizer("+1 (415) 555-2671");
* console.log(phone.e164); // "+14155552671"
* ```
*/
- Use functional components with hooks
- Extract reusable logic into custom hooks
- Keep components focused - single responsibility
- Use TypeScript for prop types
interface ComponentProps {
data: string[];
onProcess: (results: Result[]) => void;
}
export function Component({ data, onProcess }: ComponentProps) {
// Implementation
}
# Run all tests
pnpm test
# Run tests in watch mode
pnpm test:watch
# Run tests with coverage
pnpm test:coverage
- Test edge cases: Empty strings, null, undefined, special characters
- Test error handling: Invalid input, malformed data
- Test performance: Large datasets, batch processing
- Use descriptive test names: "should handle phone numbers with extensions"
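One way to cover the edge cases listed above without repeating boilerplate is a table of named cases. The sketch below is framework-free so that it runs on its own; in the project the same table would normally be looped over inside `describe`/`it` blocks. `normalizeWhitespace` is a hypothetical function under test, not part of the codebase.

```typescript
// Hypothetical function under test: collapses runs of whitespace and
// treats missing input as an empty string.
function normalizeWhitespace(input: string | null | undefined): string {
  return (input ?? "").replace(/\s+/g, " ").trim();
}

interface EdgeCase {
  name: string; // descriptive test name
  input: string | null | undefined;
  expected: string;
}

const edgeCases: EdgeCase[] = [
  { name: "should return empty string for empty input", input: "", expected: "" },
  { name: "should treat null as empty", input: null, expected: "" },
  { name: "should treat undefined as empty", input: undefined, expected: "" },
  { name: "should collapse tabs and newlines", input: " a\t\nb ", expected: "a b" },
  { name: "should keep special characters", input: "O'Brien-Smith", expected: "O'Brien-Smith" },
];

// Returns the names of failing cases, so an empty array means all passed.
function runEdgeCases(): string[] {
  return edgeCases
    .filter((c) => normalizeWhitespace(c.input) !== c.expected)
    .map((c) => c.name);
}

console.log(runEdgeCases()); // []
```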
describe('PhoneNormalizer', () => {
describe('constructor', () => {
it('should parse valid US phone number', () => {
const phone = new PhoneNormalizer('+14155552671');
expect(phone.isValid).toBe(true);
expect(phone.countryCode).toBe('1');
});
it('should handle invalid phone numbers', () => {
const phone = new PhoneNormalizer('invalid');
expect(phone.isValid).toBe(false);
});
});
describe('format', () => {
it('should format as E.164', () => {
const phone = new PhoneNormalizer('(415) 555-2671', { defaultCountry: 'US' });
expect(phone.e164).toBe('+14155552671');
});
});
});
Follow the Conventional Commits specification:
<type>(<scope>): <subject>
<body>
<footer>
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes (formatting, etc.)
- `refactor`: Code refactoring
- `test`: Adding or updating tests
- `chore`: Maintenance tasks
feat(email): add email normalizer with disposable detection
- Implement EmailNormalizer class
- Add disposable domain detection
- Create interactive demo page
- Add batch processing support
Closes #123
fix(name): escape special regex characters in config
The question mark in MISENCODED_MAP was causing regex errors.
Added escapeRegex utility to properly escape special characters.
Fixes #456
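The header format shown in these examples can be checked with a small parser. The sketch below is an assumption-laden illustration: `parseCommitHeader` is a hypothetical helper, it accepts only the seven types listed in this guide, it treats the scope as optional, and it does not handle the `!` breaking-change marker from the full specification.

```typescript
// Types accepted by this guide.
const COMMIT_TYPES = ["feat", "fix", "docs", "style", "refactor", "test", "chore"];

interface CommitHeader {
  type: string;
  scope: string | null;
  subject: string;
}

// <type>(<scope>): <subject>, with the "(<scope>)" part optional.
const HEADER_PATTERN = /^([a-z]+)(?:\(([^()]+)\))?: (.+)$/;

// Hypothetical helper: returns the parsed header, or null if it is malformed.
function parseCommitHeader(header: string): CommitHeader | null {
  const match = HEADER_PATTERN.exec(header);
  if (!match) return null;
  const [, type, scope, subject] = match;
  if (type === undefined || subject === undefined) return null;
  if (!COMMIT_TYPES.includes(type)) return null;
  return { type, scope: scope ?? null, subject };
}

console.log(parseCommitHeader("feature(email): add normalizer")); // null
```

Only the first line of a commit message is checked here; the body and footer are free-form.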
- Update your branch with latest master:
      git fetch upstream
      git rebase upstream/master
- Run tests and ensure they pass:
      pnpm test
      pnpm type-check
      pnpm lint
- Update documentation if needed
- Add tests for new functionality
- Update CHANGELOG.md with your changes
- Push your branch to your fork:
      git push origin feature/your-feature-name
- Create Pull Request on GitHub
- Fill out the PR template completely:
  - Description of changes
  - Related issues
  - Testing performed
  - Screenshots (if UI changes)
- Request review from maintainers
Use the same format as commit messages:
feat(email): add email normalizer
fix(csv): handle quoted fields correctly
docs(readme): update installation instructions
- Maintainers will review your PR within 2-3 business days
- Address any requested changes
- Once approved, a maintainer will merge your PR
- Delete your feature branch:
      git branch -d feature/your-feature-name
      git push origin --delete feature/your-feature-name
- Update your master branch:
      git checkout master
      git pull upstream master
- Open an Issue
- Join our discussions
- Email the maintainers
Thank you for contributing! 🎉