40 changes: 40 additions & 0 deletions docs/sagemaker_core/index.rst
@@ -151,6 +151,46 @@ Key Core Features
* **Monitoring Integration** - Built-in support for CloudWatch metrics, logging, and resource status tracking
* **Error Handling** - Comprehensive error handling with detailed feedback for troubleshooting and debugging

Feature Store
~~~~~~~~~~~~~

SageMaker Core provides the foundational ``FeatureGroup`` resource class used by the Feature Store module.
For the full V3 Feature Store functionality (DataFrame ingestion, Athena queries, dataset building,
and feature definitions), use the ``sagemaker.mlops.feature_store`` package:

.. code-block:: python

   from sagemaker.mlops.feature_store import (
       FeatureGroup,
       OnlineStoreConfig,
       OfflineStoreConfig,
       S3StorageConfig,
       load_feature_definitions_from_dataframe,
       ingest_dataframe,
       create_athena_query,
       DatasetBuilder,
   )

   # `df` is a pandas DataFrame and `role` is an IAM role ARN, both defined elsewhere.
   # Create a feature group
   feature_defs = load_feature_definitions_from_dataframe(df)
   FeatureGroup.create(
       feature_group_name="my-feature-group",
       feature_definitions=feature_defs,
       record_identifier_feature_name="id",
       event_time_feature_name="timestamp",
       role_arn=role,
       online_store_config=OnlineStoreConfig(enable_online_store=True),
   )

   # Ingest data from a DataFrame
   ingest_dataframe(feature_group_name="my-feature-group", data_frame=df, max_workers=4)

.. note::

   If you are migrating from V2 (``sagemaker.feature_store``), see the
   `Feature Store Migration Guide <https://github.com/aws/sagemaker-python-sdk/blob/main/sagemaker-mlops/src/sagemaker/mlops/feature_store/MIGRATION_GUIDE.md>`_
   for detailed V2-to-V3 migration instructions.

Supported Core Scenarios
------------------------

160 changes: 160 additions & 0 deletions migration.md
@@ -418,6 +418,139 @@ pipeline = Pipeline(
)
```

### 5. Feature Store

Feature Store is fully supported in V3 under the `sagemaker.mlops.feature_store` namespace. The V2 `sagemaker.feature_store` module has been reorganized: `FeatureGroup` and the config shapes come from `sagemaker-core`, while utility functions, ingestion, Athena queries, and dataset building are provided by the `sagemaker-mlops` package. All of these are re-exported from `sagemaker.mlops.feature_store`, so a single import path covers them.

> **Detailed Migration Guide:** For comprehensive V2-to-V3 Feature Store migration instructions, see [`sagemaker-mlops/src/sagemaker/mlops/feature_store/MIGRATION_GUIDE.md`](sagemaker-mlops/src/sagemaker/mlops/feature_store/MIGRATION_GUIDE.md).

**Import Changes:**

```python
# V2 imports
from sagemaker.feature_store.feature_group import FeatureGroup # ❌
from sagemaker.feature_store.inputs import FeatureValue # ❌
from sagemaker.session import Session # ❌

# V3 imports — everything from one place
from sagemaker.mlops.feature_store import FeatureGroup # ✅
from sagemaker.mlops.feature_store import FeatureValue # ✅
from sagemaker.mlops.feature_store import ingest_dataframe # ✅
```
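The import changes above are mechanical, so they can be scripted when a codebase has many call sites. The following is a minimal sketch, not part of the SDK: `V2_TO_V3_MODULES` and `rewrite_import` are hypothetical names, and the mapping only covers the two module paths shown in this guide. Other names exported by the V2 modules may need manual review.

```python
import re

# V2 -> V3 module paths, taken from the import changes listed in this guide.
V2_TO_V3_MODULES = {
    "sagemaker.feature_store.feature_group": "sagemaker.mlops.feature_store",
    "sagemaker.feature_store.inputs": "sagemaker.mlops.feature_store",
}

# Matches single-line statements of the form "from <module> import <names>".
_IMPORT_RE = re.compile(r"^(\s*from\s+)(\S+)(\s+import\s+.+)$")


def rewrite_import(line: str) -> str:
    """Return the line with a known V2 module path swapped for its V3 path.

    Lines that are not `from ... import ...` statements, or that import from
    a module not in the mapping, are returned unchanged.
    """
    match = _IMPORT_RE.match(line)
    if not match:
        return line
    prefix, module, rest = match.groups()
    return prefix + V2_TO_V3_MODULES.get(module, module) + rest


print(rewrite_import("from sagemaker.feature_store.feature_group import FeatureGroup"))
```

The helper only rewrites the module path. Call sites still need the signature changes described in the rest of this section.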

**Create a FeatureGroup:**

**V2:**

```python
from sagemaker.feature_store.feature_group import FeatureGroup
from sagemaker.session import Session

session = Session()
fg = FeatureGroup(name="my-fg", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)
fg.create(
    s3_uri="s3://bucket/prefix",
    record_identifier_name="id",
    event_time_feature_name="ts",
    role_arn=role,
    enable_online_store=True,
)
```

**V3:**

```python
from sagemaker.mlops.feature_store import (
    FeatureGroup,
    OnlineStoreConfig,
    OfflineStoreConfig,
    S3StorageConfig,
    load_feature_definitions_from_dataframe,
)

feature_defs = load_feature_definitions_from_dataframe(df)

FeatureGroup.create(
    feature_group_name="my-fg",
    feature_definitions=feature_defs,
    record_identifier_feature_name="id",
    event_time_feature_name="ts",
    role_arn=role,
    online_store_config=OnlineStoreConfig(enable_online_store=True),
    offline_store_config=OfflineStoreConfig(
        s3_storage_config=S3StorageConfig(s3_uri="s3://bucket/prefix")
    ),
)
```
```
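The V2 and V3 calls above differ mainly in argument names and in how store settings are nested. The sketch below makes that mapping explicit. It is an illustration only: `v2_create_kwargs_to_v3` is a hypothetical helper, and plain dictionaries stand in for `OnlineStoreConfig`, `OfflineStoreConfig`, and `S3StorageConfig` so the example runs without the SDK installed.

```python
from typing import Any, Dict


def v2_create_kwargs_to_v3(feature_group_name: str, v2_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Map V2 `fg.create(...)` keyword arguments onto the V3 argument layout.

    Nested dicts are stand-ins for the V3 config shapes; in real code they
    would be replaced by the corresponding config objects.
    """
    v3: Dict[str, Any] = {
        # In V2 the name was passed to the FeatureGroup constructor.
        "feature_group_name": feature_group_name,
        # Renamed: record_identifier_name -> record_identifier_feature_name.
        "record_identifier_feature_name": v2_kwargs["record_identifier_name"],
        "event_time_feature_name": v2_kwargs["event_time_feature_name"],
        "role_arn": v2_kwargs["role_arn"],
    }
    # The flat enable_online_store flag becomes a nested online store config.
    if v2_kwargs.get("enable_online_store"):
        v3["online_store_config"] = {"enable_online_store": True}
    # The flat s3_uri becomes an offline store config wrapping an S3 config.
    if v2_kwargs.get("s3_uri"):
        v3["offline_store_config"] = {
            "s3_storage_config": {"s3_uri": v2_kwargs["s3_uri"]}
        }
    return v3


v3_kwargs = v2_create_kwargs_to_v3(
    "my-fg",
    {
        "s3_uri": "s3://bucket/prefix",
        "record_identifier_name": "id",
        "event_time_feature_name": "ts",
        "role_arn": "arn:aws:iam::123456789012:role/example",  # placeholder ARN
        "enable_online_store": True,
    },
)
print(v3_kwargs)
```

Feature definitions are not part of this mapping because V3 passes them explicitly as `feature_definitions` instead of loading them onto the object first.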

**Record Operations (Put/Get/Delete):**

**V2:**

```python
from sagemaker.feature_store.inputs import FeatureValue

fg.put_record(record=[FeatureValue(feature_name="id", value_as_string="123")])
response = fg.get_record(record_identifier_value_as_string="123")
fg.delete_record(record_identifier_value_as_string="123", event_time="2024-01-15T00:00:00Z")
```

**V3:**

```python
from sagemaker.mlops.feature_store import FeatureGroup, FeatureValue

fg = FeatureGroup(feature_group_name="my-fg")
fg.put_record(record=[FeatureValue(feature_name="id", value_as_string="123")])
response = fg.get_record(record_identifier_value_as_string="123")
fg.delete_record(record_identifier_value_as_string="123", event_time="2024-01-15T00:00:00Z")
```
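In both versions a record is a list of name/value pairs whose values are passed as strings. The sketch below shows one way to build such a list from a row of data. It assumes nothing about the SDK beyond the two field names used above: `FeatureValueStandIn` is a hypothetical local dataclass, not the real `FeatureValue`, and skipping missing values is a choice made for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FeatureValueStandIn:
    """Hypothetical stand-in for FeatureValue with the two fields used above."""

    feature_name: str
    value_as_string: str


def row_to_record(row: Dict[str, Any]) -> List[FeatureValueStandIn]:
    """Turn one row (column name -> value) into a record.

    Every value is converted with str(); entries whose value is None are
    left out of the record.
    """
    return [
        FeatureValueStandIn(feature_name=name, value_as_string=str(value))
        for name, value in row.items()
        if value is not None
    ]


record = row_to_record({"id": 123, "ts": "2024-01-15T00:00:00Z", "score": None})
print(record)
```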

**DataFrame Ingestion:**

**V2:**

```python
fg.ingest(data_frame=df, max_workers=4, max_processes=2, wait=True)
```

**V3:**

```python
from sagemaker.mlops.feature_store import ingest_dataframe

manager = ingest_dataframe(
    feature_group_name="my-fg",
    data_frame=df,
    max_workers=4,
    max_processes=2,
    wait=True,
)
```
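To give a feel for what a `max_workers` setting controls, here is a conceptual sketch of thread-based ingestion. It is not the ingestion manager's actual implementation: `split_rows` and `ingest_rows` are hypothetical names, rows are plain dictionaries instead of a DataFrame, `put_record` is any callable supplied by the caller, and `max_processes` is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]


def split_rows(rows: Sequence[Row], num_chunks: int) -> List[List[Row]]:
    """Split rows into at most `num_chunks` contiguous, near-equal chunks."""
    num_chunks = max(1, min(num_chunks, len(rows)))
    size, extra = divmod(len(rows), num_chunks)
    chunks: List[List[Row]] = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(rows[start:end]))
        start = end
    return chunks


def ingest_rows(
    rows: Sequence[Row],
    put_record: Callable[[Row], None],
    max_workers: int = 4,
) -> List[Row]:
    """Write rows using `max_workers` threads and return the rows that failed."""

    def ingest_chunk(chunk: List[Row]) -> List[Row]:
        chunk_failed: List[Row] = []
        for row in chunk:
            try:
                put_record(row)
            except Exception:
                # Collect the failure and keep going with the rest of the chunk.
                chunk_failed.append(row)
        return chunk_failed

    failed: List[Row] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in chunk order, so `failed` keeps row order.
        for chunk_failed in pool.map(ingest_chunk, split_rows(rows, max_workers)):
            failed.extend(chunk_failed)
    return failed
```

Collecting failed rows instead of raising on the first error lets the caller retry only the rows that did not go through.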

**Athena Queries:**

**V2:**

```python
query = fg.athena_query()
query.run(query_string="SELECT * FROM ...", output_location="s3://...")
query.wait()
df = query.as_dataframe()
```

**V3:**

```python
from sagemaker.mlops.feature_store import create_athena_query

# `session` is assumed to be a SageMaker session object created elsewhere.
query = create_athena_query("my-fg", session)
query.run(query_string="SELECT * FROM ...", output_location="s3://...")
query.wait()
df = query.as_dataframe()
```

## Feature Mapping

### Training Features
@@ -450,6 +583,27 @@ pipeline = Pipeline(
| ScriptProcessor | ProcessingJob | Script-based processing |
| FrameworkProcessor | ProcessingJob | Framework-specific processing |

### Feature Store Features

| V2 Feature | V3 Equivalent | Notes |
|------------|---------------|-------|
| `sagemaker.feature_store.feature_group.FeatureGroup` | `sagemaker.mlops.feature_store.FeatureGroup` | Re-exported from sagemaker-core |
| `FeatureGroup(name=..., sagemaker_session=...)` | `FeatureGroup(feature_group_name=...)` | Session managed internally by core |
| `fg.create(s3_uri=..., enable_online_store=...)` | `FeatureGroup.create(online_store_config=..., offline_store_config=...)` | Structured config objects |
| `fg.describe()` | `FeatureGroup.get(feature_group_name=...)` | Returns typed object |
| `fg.delete()` | `FeatureGroup(feature_group_name=...).delete()` | Same pattern |
| `fg.put_record(record=...)` | `FeatureGroup(feature_group_name=...).put_record(record=...)` | FeatureValue from core |
| `fg.get_record(...)` | `FeatureGroup(feature_group_name=...).get_record(...)` | Same interface |
| `fg.delete_record(...)` | `FeatureGroup(feature_group_name=...).delete_record(...)` | Use strings not enums |
| `fg.ingest(data_frame=df)` | `ingest_dataframe(feature_group_name=..., data_frame=df)` | Standalone function |
| `fg.athena_query()` | `create_athena_query(feature_group_name, session)` | Standalone function |
| `fg.as_hive_ddl()` | `as_hive_ddl(feature_group_name)` | Standalone function |
| `fg.load_feature_definitions(df)` | `load_feature_definitions_from_dataframe(df)` | Returns list, no mutation |
| `FeatureStore(session).search(...)` | `FeatureStore.search(resource=..., search_expression=...)` | Core resource class |
| `FeatureStore.create_dataset(...)` | `DatasetBuilder.create(...)` | Dataclass-based builder |
| Config shapes with `to_dict()` | Pydantic shapes (auto-serialization) | No manual serialization needed |
| `TargetStoreEnum.ONLINE_STORE.value` | `"OnlineStore"` (plain strings) | Enums available but strings preferred |
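The last row of the table says that plain strings can replace enum values. The sketch below shows how code that must accept both forms during a migration could normalize them. It is an assumption-laden illustration: the `TargetStoreEnum` defined here is a local stand-in, not the SDK class. The `"OnlineStore"` value comes from the table, the `"OfflineStore"` value is assumed by analogy, and `normalize_target_stores` is a hypothetical helper.

```python
from enum import Enum
from typing import Iterable, List, Union


class TargetStoreEnum(Enum):
    """Local stand-in for the SDK enum; values are the plain strings."""

    ONLINE_STORE = "OnlineStore"
    OFFLINE_STORE = "OfflineStore"  # assumed value, not shown in the table


def normalize_target_stores(stores: Iterable[Union[TargetStoreEnum, str]]) -> List[str]:
    """Accept enum members or plain strings and return validated plain strings."""
    allowed = {member.value for member in TargetStoreEnum}
    result: List[str] = []
    for store in stores:
        value = store.value if isinstance(store, TargetStoreEnum) else store
        if value not in allowed:
            raise ValueError(f"Unknown target store: {value!r}")
        result.append(value)
    return result


print(normalize_target_stores([TargetStoreEnum.ONLINE_STORE, "OfflineStore"]))
```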

## Functionality Level Mapping

### Training
@@ -766,6 +920,12 @@ from sagemaker.train import ModelTrainer # ✅

from sagemaker.model import Model # ❌
from sagemaker.serve import ModelBuilder # ✅

from sagemaker.feature_store.feature_group import FeatureGroup # ❌
from sagemaker.mlops.feature_store import FeatureGroup # ✅

from sagemaker.feature_store.inputs import FeatureValue # ❌
from sagemaker.mlops.feature_store import FeatureValue # ✅
```

### 2. Parameter Mapping
45 changes: 45 additions & 0 deletions sagemaker-mlops/README.md
@@ -77,6 +77,51 @@ The following files were moved from `sagemaker-core/src/sagemaker/core/workflow/
- `retry.py` - Retry policies
- `selective_execution_config.py` - Selective execution settings

### Feature Store

The Feature Store module (`sagemaker.mlops.feature_store`) covers Amazon SageMaker Feature Store operations: feature group management, feature definitions, DataFrame ingestion, Athena queries, and dataset building. It is the V3 equivalent of the V2 `sagemaker.feature_store` module.

**Key Modules:**

- `__init__.py` - Re-exports all Feature Store components from a single entry point
- `feature_definition.py` - Feature definition helpers (FractionalFeatureDefinition, IntegralFeatureDefinition, etc.)
- `feature_utils.py` - Utility functions (ingest_dataframe, create_athena_query, as_hive_ddl, etc.)
- `ingestion_manager_pandas.py` - Multi-threaded DataFrame ingestion manager
- `athena_query.py` - Athena query execution and result retrieval
- `dataset_builder.py` - Dataset building with point-in-time joins across feature groups
- `inputs.py` - Enums for Feature Store operations (TargetStoreEnum, DeletionModeEnum, etc.)
- `feature_processor/` - Feature processor for PySpark-based transformations

**Quick Start:**

```python
from sagemaker.mlops.feature_store import (
    FeatureGroup,
    OnlineStoreConfig,
    OfflineStoreConfig,
    S3StorageConfig,
    load_feature_definitions_from_dataframe,
    ingest_dataframe,
    create_athena_query,
)

# `df` is a pandas DataFrame and `role` is an IAM role ARN, both defined elsewhere.
# Create a feature group
feature_defs = load_feature_definitions_from_dataframe(df)
FeatureGroup.create(
    feature_group_name="my-feature-group",
    feature_definitions=feature_defs,
    record_identifier_feature_name="id",
    event_time_feature_name="timestamp",
    role_arn=role,
    online_store_config=OnlineStoreConfig(enable_online_store=True),
)

# Ingest data
ingest_dataframe(feature_group_name="my-feature-group", data_frame=df, max_workers=4)
```
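The Quick Start derives feature definitions from a DataFrame before creating the group. As a rough illustration of that idea, the sketch below infers a feature type per column from plain Python values. It is not the SDK's implementation, which works on a pandas DataFrame. `infer_feature_type` and `infer_feature_definitions` are hypothetical names, the output is a list of plain dictionaries, and the inference rules are assumptions chosen for the example. The type names `Integral`, `Fractional`, and `String` follow the feature definition helpers listed above.

```python
from typing import Any, Dict, List


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def infer_feature_type(values: List[Any]) -> str:
    """Pick a feature type name from a column's non-missing values."""
    present = [v for v in values if v is not None]
    if present and all(_is_int(v) for v in present):
        return "Integral"
    if present and all(_is_number(v) for v in present):
        return "Fractional"
    # Anything else, including empty columns, falls back to String.
    return "String"


def infer_feature_definitions(columns: Dict[str, List[Any]]) -> List[Dict[str, str]]:
    """Return one definition per column without modifying the input."""
    return [
        {"feature_name": name, "feature_type": infer_feature_type(values)}
        for name, values in columns.items()
    ]


definitions = infer_feature_definitions(
    {"id": [1, 2], "score": [0.5, 1], "timestamp": ["2024-01-15T00:00:00Z", None]}
)
print(definitions)
```

Returning a new list instead of storing definitions on an object means the result is passed explicitly to whatever creates the feature group.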

> **Migrating from V2?** See the detailed [Feature Store Migration Guide](src/sagemaker/mlops/feature_store/MIGRATION_GUIDE.md) for V2-to-V3 migration instructions.

### Model Building

ModelBuilder is now located in the `sagemaker-serve` package but is re-exported from MLOps for convenience.
10 changes: 9 additions & 1 deletion sagemaker-mlops/src/sagemaker/mlops/__init__.py
@@ -1,7 +1,8 @@
"""SageMaker MLOps package for workflow orchestration and model building.

This package provides high-level orchestration capabilities for SageMaker workflows,
including pipeline definitions, step implementations, and model building utilities.
including pipeline definitions, step implementations, model building utilities,
and Feature Store operations.

The MLOps package sits at the top of the dependency hierarchy and can import from:
- sagemaker.core (foundation primitives)
@@ -11,10 +12,12 @@
Key components:
- workflow: Pipeline and step orchestration
- model_builder: Model building and orchestration
- feature_store: Feature Store operations (FeatureGroup, ingestion, Athena queries)

Example usage:
from sagemaker.mlops import ModelBuilder
from sagemaker.mlops.workflow import Pipeline, TrainingStep
from sagemaker.mlops.feature_store import FeatureGroup, ingest_dataframe
"""
from __future__ import absolute_import

@@ -27,7 +30,12 @@
# from sagemaker.mlops import workflow
# from sagemaker.mlops.workflow import Pipeline, TrainingStep, etc.

# Feature Store submodule is available via:
# from sagemaker.mlops import feature_store
# from sagemaker.mlops.feature_store import FeatureGroup, ingest_dataframe, etc.

__all__ = [
    "ModelBuilder",
    "workflow", # Submodule
    "feature_store", # Submodule - Feature Store operations
]