Add Machine Learning Engineer archetype#39

Open
IKetutWidiyane wants to merge 5 commits into
TechImmigrants:mainfrom
IKetutWidiyane:feature/ml-engineer-archetype

Conversation

@IKetutWidiyane IKetutWidiyane commented Jun 1, 2026

Fixes #3

Changes

  • Added machine-learning-engineer archetype
  • Added evaluator tests for ML Engineer auto-detection
  • Added research references
  • Updated README supported archetypes list

Validation

  • pnpm test
  • pnpm build

Notes

Auto-detection now recognizes ML Engineer signals such as:

  • scikit-learn
  • XGBoost
  • feature engineering
  • cross-validation

References

Summary by CodeRabbit

  • New Features

    • Added Machine Learning Engineer as a supported role archetype.
  • Documentation

    • Updated the supported role archetypes list.
    • Updated v0.1 roadmap to reflect 8 available archetypes.
    • Added research sources for archetype definitions.

coderabbitai Bot commented Jun 16, 2026

Walkthrough

Adds a machine-learning-engineer archetype to the ARCHETYPES registry with keywords, evaluation weights, action verbs, and anti-patterns. Two new Vitest cases validate archetype metadata and detection. The README archetype list and roadmap counter are updated, and research/sources.md gains an ML Engineer sources table.

Changes

Machine Learning Engineer Archetype

| Layer / File(s) | Summary |
| --- | --- |
| Archetype registry entry: packages/core/src/archetypes/index.ts | Registers machine-learning-engineer via ARCHETYPES.set(...) with all required metadata fields, making it resolvable by getArchetype and selectable by detectArchetype. |
| Tests for lookup and detection: packages/core/src/__tests__/evaluator.test.ts | Imports detectArchetype, adds a metadata shape test for machine-learning-engineer, and adds a classification test asserting detectArchetype returns the correct id for ML-focused text. |
| README and research docs: README.md, research/sources.md | Updates the supported archetypes list and v0.1 roadmap counter from 6 to 8; adds an ML Engineer sources table with references and accessed dates. |
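
As a rough illustration of the registry entry described in the table above, the sketch below registers an archetype in a `Map` and looks it up by id. Only the names `ARCHETYPES`, `getArchetype`, and the id `machine-learning-engineer` come from this PR; the `Archetype` interface, its field names, and all values are assumptions made for the example and may not match packages/core/src/archetypes/index.ts.

```typescript
// Hypothetical shape: field names are assumed, not taken from the project.
interface Archetype {
  id: string;
  name: string;
  keywords: string[];
  weights: Record<string, number>;
  actionVerbs: string[];
  antiPatterns: string[];
}

// Registry keyed by archetype id.
const ARCHETYPES = new Map<string, Archetype>();

ARCHETYPES.set("machine-learning-engineer", {
  id: "machine-learning-engineer",
  name: "Machine Learning Engineer",
  // A few of the signals named in the PR description.
  keywords: ["scikit-learn", "xgboost", "feature engineering", "cross-validation"],
  // Illustrative weights only; the real evaluation weights are not shown in this thread.
  weights: { impact: 0.4, technicalDepth: 0.4, clarity: 0.2 },
  actionVerbs: ["trained", "tuned", "validated"],
  antiPatterns: ["listing libraries without measurable outcomes"],
});

// Lookup by id; returns undefined for unknown ids.
function getArchetype(id: string): Archetype | undefined {
  return ARCHETYPES.get(id);
}
```

Keeping every archetype in one keyed map means a metadata shape test only has to fetch the entry by id and check that each required field is present.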

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

  • #15 (Write unit tests for detectArchetype()): The new test cases in evaluator.test.ts directly add detectArchetype coverage for the machine-learning-engineer archetype, which is exactly what this issue requests.

Suggested reviewers

  • alexNJF
  • rfatideh

🐇 A new archetype hops into the fold,
ML Engineer, shiny and bold!
With keywords and verbs all set in a map,
detectArchetype won't need a nap.
Eight archetypes now, the roadmap rings true —
scikit and XGBoost, hooray to you! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely captures the main change: adding the Machine Learning Engineer archetype. |
| Linked Issues check | ✅ Passed | The PR implements all required components from Issue #3: 15-30 keywords, evaluation weights, 10+ action verbs, 5+ anti-patterns, and source references for the Machine Learning Engineer archetype. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to Issue #3 requirements. README and test updates appropriately document and validate the new archetype; no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/core/src/archetypes/index.ts (1)

157-165: ⚡ Quick win

Tighten generic ML keywords to avoid cross-archetype misclassification.

Because detectArchetype scores text by substring matching (includes) with a normalized count, broad terms in this list (python, sql, classification, regression) can dilute precision and cause false positives against adjacent archetypes. Prefer more role-specific signals.

Suggested keyword refinement
   keywords: [
-    "python", "scikit-learn", "xgboost", "lightgbm", "catboost",
-    "pandas", "numpy", "scipy", "sql", "jupyter", "feature engineering",
+    "scikit-learn", "xgboost", "lightgbm", "catboost",
+    "pandas", "numpy", "scipy", "jupyter", "feature engineering",
     "feature selection", "data preprocessing", "model validation",
     "cross-validation", "hyperparameter tuning", "grid search",
     "random forest", "gradient boosting", "logistic regression",
-    "classification", "regression", "time series", "model evaluation",
+    "time series forecasting", "model evaluation",
     "precision", "recall", "f1 score", "roc auc", "mlflow", "model monitoring",
   ],
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/archetypes/index.ts` around lines 157 - 165, The keywords
array in this archetype definition contains overly generic terms like "python",
"sql", "classification", and "regression" that are too broad and can match
multiple archetypes when detectArchetype uses substring frequency matching.
Remove or replace these generic keywords with more role-specific and distinctive
terms that uniquely identify this particular archetype and reduce false
positives against adjacent archetypes. Keep only keywords that provide strong,
specific signals for accurate archetype detection.
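
To make the concern above concrete, here is a minimal sketch of substring-based scoring. It assumes the score is the fraction of keywords found via `includes`; the actual `detectArchetype` implementation is not shown in this thread, so `keywordScore`, the sample text, and both keyword lists are hypothetical.

```typescript
// Assumed scoring rule: fraction of keywords that occur as substrings of the text.
function keywordScore(text: string, keywords: string[]): number {
  const haystack = text.toLowerCase();
  const hits = keywords.filter((k) => haystack.includes(k)).length;
  return hits / keywords.length;
}

// Backend-flavored text with no ML work in it.
const backendResume =
  "Built REST APIs in Python, tuned SQL queries, and fixed a regression in the billing service.";

// A list that mixes generic terms with ML-specific ones.
const broad = ["python", "sql", "regression", "scikit-learn", "xgboost"];
// A list restricted to ML-specific signals.
const tight = ["scikit-learn", "xgboost", "lightgbm", "feature engineering", "cross-validation"];

const broadScore = keywordScore(backendResume, broad); // "python", "sql", "regression" all match
const tightScore = keywordScore(backendResume, tight); // nothing matches
```

Under this assumed rule, the broad list gives the backend text a non-trivial ML score even though "regression" refers to a bug, while the tight list gives it zero.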
packages/core/src/__tests__/evaluator.test.ts (1)

98-105: ⚡ Quick win

Add a disambiguation test to protect against false-positive ML detection.

This validates the positive path only. Please add one counterexample (e.g., backend/data-engineering-heavy text with minimal ML-specific terms) to lock in classifier precision as archetypes evolve.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/__tests__/evaluator.test.ts` around lines 98 - 105, The
test for detectArchetype only validates the positive case where ML-specific text
is correctly identified as a machine-learning-engineer archetype. Add a new test
case using the same test pattern (it block calling detectArchetype) that passes
backend or data-engineering-heavy text with minimal ML-specific terms to ensure
the function does not incorrectly classify non-ML content as
machine-learning-engineer, thereby protecting against false-positive detections
as the classifier evolves.
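
A counterexample of the kind requested above might look like the following sketch. It uses a stand-in detector so the snippet runs on its own; `detectArchetypeSketch`, the `data-engineer` id, and both keyword lists are hypothetical and are not the project's real `detectArchetype` or registry.

```typescript
// Stand-in keyword lists; the real registry lives in packages/core/src/archetypes/index.ts.
const keywordsById: Record<string, string[]> = {
  "machine-learning-engineer": ["scikit-learn", "xgboost", "feature engineering", "cross-validation"],
  "data-engineer": ["airflow", "etl", "kafka", "data warehouse"], // hypothetical id and keywords
};

// Stand-in detector: picks the id with the highest fraction of matched keywords.
function detectArchetypeSketch(text: string): string | undefined {
  const haystack = text.toLowerCase();
  let best: string | undefined;
  let bestScore = 0;
  for (const [id, keywords] of Object.entries(keywordsById)) {
    const score = keywords.filter((k) => haystack.includes(k)).length / keywords.length;
    if (score > bestScore) {
      best = id;
      bestScore = score;
    }
  }
  return best;
}

// Data-engineering-heavy text that mentions a single ML term in passing.
const pipelineText =
  "Maintained Airflow ETL jobs and Kafka consumers feeding the data warehouse; " +
  "occasionally exported features for an XGBoost model.";

const detected = detectArchetypeSketch(pipelineText);
```

In evaluator.test.ts the same input could sit in an `it(...)` block that asserts `expect(detectArchetype(pipelineText)).not.toBe("machine-learning-engineer")`, assuming `detectArchetype` returns an archetype id as the walkthrough suggests.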
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@research/sources.md`:
- Around line 27-29: The source citations for the Machine Learning Engineer role
(Google Cloud Professional Machine Learning Engineer Exam Guide, scikit-learn
User Guide, and XGBoost Python API Reference) currently reference only top-level
documentation URLs without specific section names or anchors. Add section-level
citations by including the specific documentation sections, page anchors, or
subsection names (such as "model selection" for scikit-learn or "early stopping"
for XGBoost) directly in the source reference column so that each claim can be
traced to and verified in the actual documentation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 795a9282-c658-4128-b7fe-fcf1c5f835a3

📥 Commits

Reviewing files that changed from the base of the PR and between 6398541 and 881dd1b.

📒 Files selected for processing (4)
  • README.md
  • packages/core/src/__tests__/evaluator.test.ts
  • packages/core/src/archetypes/index.ts
  • research/sources.md

Comment thread research/sources.md
@IKetutWidiyane

Author

Hi maintainers @SaharPak, just a friendly follow-up. The PR is ready for review, and all CodeRabbit feedback has been addressed. Thanks for your time!

Development

Successfully merging this pull request may close these issues.

[Archetype] Machine Learning Engineer (Classical ML)

1 participant