Add Machine Learning Engineer archetype #39
📝 Walkthrough
Adds a
Changes: Machine Learning Engineer Archetype
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🧹 Nitpick comments (2)
packages/core/src/archetypes/index.ts (1)
157-165: ⚡ Quick win: Tighten generic ML keywords to avoid cross-archetype misclassification.
Given that `detectArchetype` uses substring frequency (`includes`) plus a normalized count, broad terms here (`python`, `sql`, `classification`, `regression`) can dilute precision and cause false positives against adjacent archetypes. Prefer more role-specific signals in this list.
Suggested keyword refinement
 keywords: [
-  "python", "scikit-learn", "xgboost", "lightgbm", "catboost",
-  "pandas", "numpy", "scipy", "sql", "jupyter", "feature engineering",
+  "scikit-learn", "xgboost", "lightgbm", "catboost",
+  "pandas", "numpy", "scipy", "jupyter", "feature engineering",
   "feature selection", "data preprocessing", "model validation",
   "cross-validation", "hyperparameter tuning", "grid search",
   "random forest", "gradient boosting", "logistic regression",
-  "classification", "regression", "time series", "model evaluation",
+  "time series forecasting", "model evaluation",
   "precision", "recall", "f1 score", "roc auc", "mlflow", "model monitoring",
 ],
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/core/src/archetypes/index.ts` around lines 157 - 165, The keywords array in this archetype definition contains overly generic terms like "python", "sql", "classification", and "regression" that are too broad and can match multiple archetypes when detectArchetype uses substring frequency matching. Remove or replace these generic keywords with more role-specific and distinctive terms that uniquely identify this particular archetype and reduce false positives against adjacent archetypes. Keep only keywords that provide strong, specific signals for accurate archetype detection.
packages/core/src/__tests__/evaluator.test.ts (1)
98-105: ⚡ Quick win: Add a disambiguation test to protect against false-positive ML detection.
This validates the positive path only. Please add one counterexample (e.g., backend/data-engineering-heavy text with minimal ML-specific terms) to lock in classifier precision as archetypes evolve.
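A minimal sketch of what such a counterexample could look like. It is written against stand-ins so it runs on its own: `runCase` plays the role of the test framework's `it` block, and `detectArchetype`, the `data-engineer` id, and the keyword table are all hypothetical placeholders, not the code in `packages/core`. In the repository, the case would call the real `detectArchetype` inside a regular `it` block.

```typescript
// Stand-in for the test framework's `it`; records the names of cases that ran.
const passedCases: string[] = [];
function runCase(name: string, fn: () => void): void {
  fn(); // a thrown error fails the case
  passedCases.push(name);
}

// Hypothetical keyword table; the ids and keywords are illustrative only.
const ARCHETYPES: { id: string; keywords: string[] }[] = [
  {
    id: "machine-learning-engineer",
    keywords: ["scikit-learn", "xgboost", "feature engineering", "mlflow"],
  },
  {
    id: "data-engineer",
    keywords: ["airflow", "etl", "data warehouse", "kafka"],
  },
];

// Hypothetical detector: picks the archetype with the most substring hits.
function detectArchetype(text: string): string | null {
  const haystack = text.toLowerCase();
  let bestId: string | null = null;
  let bestHits = 0;
  for (const archetype of ARCHETYPES) {
    const hits = archetype.keywords.filter((k) => haystack.includes(k)).length;
    if (hits > bestHits) {
      bestId = archetype.id;
      bestHits = hits;
    }
  }
  return bestId;
}

// Data-engineering-heavy text with only incidental ML-adjacent wording.
const dataEngineeringText =
  "Own Airflow ETL pipelines feeding the data warehouse; consume Kafka topics " +
  "and export features to pandas notebooks.";

runCase("does not classify data-engineering text as machine-learning-engineer", () => {
  const detected = detectArchetype(dataEngineeringText);
  if (detected === "machine-learning-engineer") {
    throw new Error(`false positive: detected ${detected}`);
  }
});
```

Asserting on what the text must not be, rather than on one exact expected id, keeps the case stable if adjacent archetypes are added or renamed later.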
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/core/src/__tests__/evaluator.test.ts` around lines 98 - 105, The test for detectArchetype only validates the positive case where ML-specific text is correctly identified as a machine-learning-engineer archetype. Add a new test case using the same test pattern (it block calling detectArchetype) that passes backend or data-engineering-heavy text with minimal ML-specific terms to ensure the function does not incorrectly classify non-ML content as machine-learning-engineer, thereby protecting against false-positive detections as the classifier evolves.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@research/sources.md`:
- Around line 27-29: The source citations for the Machine Learning Engineer role
(Google Cloud Professional Machine Learning Engineer Exam Guide, scikit-learn
User Guide, and XGBoost Python API Reference) currently reference only top-level
documentation URLs without specific section names or anchors. Add section-level
citations by including the specific documentation sections, page anchors, or
subsection names (such as "model selection" for scikit-learn or "early stopping"
for XGBoost) directly in the source reference column so that each claim can be
traced to and verified in the actual documentation.
---
Nitpick comments:
In `@packages/core/src/__tests__/evaluator.test.ts`:
- Around line 98-105: The test for detectArchetype only validates the positive
case where ML-specific text is correctly identified as a
machine-learning-engineer archetype. Add a new test case using the same test
pattern (it block calling detectArchetype) that passes backend or
data-engineering-heavy text with minimal ML-specific terms to ensure the
function does not incorrectly classify non-ML content as
machine-learning-engineer, thereby protecting against false-positive detections
as the classifier evolves.
In `@packages/core/src/archetypes/index.ts`:
- Around line 157-165: The keywords array in this archetype definition contains
overly generic terms like "python", "sql", "classification", and "regression"
that are too broad and can match multiple archetypes when detectArchetype uses
substring frequency matching. Remove or replace these generic keywords with more
role-specific and distinctive terms that uniquely identify this particular
archetype and reduce false positives against adjacent archetypes. Keep only
keywords that provide strong, specific signals for accurate archetype detection.
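To illustrate why the generic keywords matter, here is a small sketch of substring-based scoring. The real `detectArchetype` implementation is not shown in this thread, so `score`, `detect`, the `backend-engineer` archetype, and the shortened keyword lists are assumptions; "normalized count" is read here as hits divided by the number of keywords, which is only one plausible interpretation.

```typescript
type Archetype = { id: string; keywords: string[] };

// Assumed scoring: fraction of an archetype's keywords found as substrings.
function score(text: string, archetype: Archetype): number {
  const haystack = text.toLowerCase();
  const hits = archetype.keywords.filter((k) => haystack.includes(k)).length;
  return hits / archetype.keywords.length;
}

// Returns the id of the highest-scoring archetype, or null if nothing matches.
function detect(text: string, archetypes: Archetype[]): string | null {
  let bestId: string | null = null;
  let bestScore = 0;
  for (const archetype of archetypes) {
    const s = score(text, archetype);
    if (s > bestScore) {
      bestId = archetype.id;
      bestScore = s;
    }
  }
  return bestId;
}

// Hypothetical adjacent archetype, for illustration only.
const backend: Archetype = {
  id: "backend-engineer",
  keywords: ["rest api", "postgresql", "microservices", "docker", "kafka", "graphql"],
};

// Shortened ML keyword lists: one with the generic terms, one without.
const mlBroad: Archetype = {
  id: "machine-learning-engineer",
  keywords: ["python", "sql", "regression", "xgboost"],
};
const mlTight: Archetype = {
  id: "machine-learning-engineer",
  keywords: ["scikit-learn", "xgboost", "hyperparameter tuning", "mlflow"],
};

const backendText =
  "Built a REST API in Python backed by PostgreSQL; wrote SQL migrations and regression tests.";

// "python", "sql" (also inside "postgresql"), and "regression" all match,
// so the broad list scores 3/4 against 2/6 for the backend archetype.
console.log(detect(backendText, [backend, mlBroad])); // machine-learning-engineer
console.log(detect(backendText, [backend, mlTight])); // backend-engineer
```

Note that `sql` matches inside `postgresql` and `regression` matches `regression tests`, which is the kind of incidental substring hit the review comment warns about.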
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 795a9282-c658-4128-b7fe-fcf1c5f835a3
📒 Files selected for processing (4)
- README.md
- packages/core/src/__tests__/evaluator.test.ts
- packages/core/src/archetypes/index.ts
- research/sources.md
Hi maintainers @SaharPak, just a friendly follow-up. The PR is ready for review and all CodeRabbit feedback has been addressed. Thanks for your time!
fixes #3
Changes
Validation
Notes
Auto-detection now recognizes ML Engineer signals such as:
References
Summary by CodeRabbit
New Features
Documentation