Type Preservation in Redis Data Structures

This document explains how Python types are preserved when storing data in Redis using our data structures.

Overview

Redis natively supports only a limited set of data types, and JSON serialization typically loses information about Python's rich type system. Our implementation preserves Python types through:

- A type registry system for managing custom types
- Built-in type handlers for common Python types
- Automatic serialization/deserialization with type information
- Pydantic model support for schema validation and complex types
- Automatic compression of large data using zlib (configurable via the REDIS_DS_COMPRESSION_THRESHOLD environment variable or compression_threshold in the Config class)

Built-in Type Support

The following Python types are automatically preserved through built-in type handlers:

Primitive Types
- int, float, str, bool, NoneType
- Automatically preserved without special handling
```
hash_map.set("key", 42)  # int
hash_map.set("key2", True)  # bool
```

Collections

- list, dict, set, tuple
- Nested structures maintain their type information

```
data = {
    "tuple": (1, 2, 3),
    "set": {4, 5, 6},
    "list": [7, 8, (9, 10)]
}
hash_map.set("nested", data)
```
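The standard `json` module cannot do this on its own: tuples come back as lists, and sets raise `TypeError`. The sketch below shows one way to tag such values so they survive a round trip. The `{"_type": ..., "value": ...}` envelope and the `encode`/`decode` helpers are illustrative assumptions, not necessarily the library's actual wire format.

```python
import json
from typing import Any


def encode(obj: Any) -> Any:
    """Recursively wrap tuples and sets in a tagged envelope."""
    if isinstance(obj, tuple):
        return {"_type": "tuple", "value": [encode(x) for x in obj]}
    if isinstance(obj, set):
        return {"_type": "set", "value": [encode(x) for x in obj]}
    if isinstance(obj, list):
        return [encode(x) for x in obj]
    if isinstance(obj, dict):
        return {k: encode(v) for k, v in obj.items()}
    return obj  # primitives pass through unchanged


def decode(obj: Any) -> Any:
    """Reverse encode(), rebuilding tuples and sets from their tags."""
    if isinstance(obj, list):
        return [decode(x) for x in obj]
    if isinstance(obj, dict):
        if obj.get("_type") == "tuple":
            return tuple(decode(x) for x in obj["value"])
        if obj.get("_type") == "set":
            return {decode(x) for x in obj["value"]}
        return {k: decode(v) for k, v in obj.items()}
    return obj


data = {"tuple": (1, 2, 3), "set": {4, 5, 6}, "list": [7, 8, (9, 10)]}
restored = decode(json.loads(json.dumps(encode(data))))
```

A real implementation also has to handle user dictionaries that happen to contain a `_type` key of their own.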

Date and Time

- datetime (preserved with timezone in ISO format)
- timedelta (stored as total seconds)

```
from datetime import datetime, timezone, timedelta
data = datetime.now(timezone.utc)
hash_map.set("date", data)
hash_map.set("duration", timedelta(hours=1))
```
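Here is a minimal sketch of the two encodings described above, using only the standard library. The helper names and the tagged-dict layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def encode_datetime(value: datetime) -> dict:
    # isoformat() keeps the UTC offset, e.g. "...T12:00:00+00:00"
    return {"_type": "datetime", "value": value.isoformat()}


def decode_datetime(payload: dict) -> datetime:
    return datetime.fromisoformat(payload["value"])


def encode_timedelta(value: timedelta) -> dict:
    # total_seconds() returns a float, so sub-second parts are kept
    return {"_type": "timedelta", "value": value.total_seconds()}


def decode_timedelta(payload: dict) -> timedelta:
    return timedelta(seconds=payload["value"])


now = datetime.now(timezone.utc)
restored_now = decode_datetime(encode_datetime(now))
restored_hour = decode_timedelta(encode_timedelta(timedelta(hours=1)))
```

`fromisoformat` restores the offset, so an aware datetime stays aware. Storing seconds as a float is compact, but for very large intervals `total_seconds()` can lose microsecond accuracy.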

Binary Data
- bytes (preserved using hexadecimal encoding)
```
data = b"binary data"
hash_map.set("bytes", data)
```
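The hex encoding can be sketched with `bytes.hex()` and `bytes.fromhex()`. The helper names and the envelope are assumptions for illustration.

```python
import json


def encode_bytes(value: bytes) -> dict:
    # Two lowercase hex digits per byte: plain ASCII, safe inside JSON
    return {"_type": "bytes", "value": value.hex()}


def decode_bytes(payload: dict) -> bytes:
    return bytes.fromhex(payload["value"])


data = b"binary data"
wire = json.dumps(encode_bytes(data))  # what would be written to Redis
restored = decode_bytes(json.loads(wire))
```

Hex is simple and JSON-safe, but it doubles the payload size. Base64 would add only about a third instead.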

Unique Identifiers

- uuid.UUID objects

```
import uuid
data = uuid.uuid4()
hash_map.set("id", data)
```

Custom Type Support

Standard Class Approach

For custom types, inherit from SerializableType and implement the required methods:

```
from redis_data_structures.base import SerializableType

class User(SerializableType):
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

    def to_dict(self) -> dict:
        return {
            "name": self.name,
            "age": self.age
        }

    @classmethod
    def from_dict(cls, data: dict) -> "User":
        return cls(data["name"], data["age"])

    def __eq__(self, other):
        # Override __eq__ for proper equality comparison
        return isinstance(other, User) and self.name == other.name and self.age == other.age
```
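The contract behind `to_dict()` and `from_dict()` can be shown without Redis. This sketch uses a plain-Python copy of `User`, without the `SerializableType` base so it runs standalone. It does by hand what a store/load cycle does: convert to a dict, encode as JSON, decode, and rebuild.

```python
import json


class User:
    """Plain-Python stand-in for the User class above (no Redis needed)."""

    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

    def to_dict(self) -> dict:
        return {"name": self.name, "age": self.age}

    @classmethod
    def from_dict(cls, data: dict) -> "User":
        return cls(data["name"], data["age"])

    def __eq__(self, other):
        return isinstance(other, User) and self.name == other.name and self.age == other.age


original = User("John", 30)
raw = json.dumps(original.to_dict())         # roughly what gets stored
restored = User.from_dict(json.loads(raw))   # roughly what a read returns
```

`restored` is a new object. It compares equal to `original` only because `__eq__` compares field values.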

Pydantic Integration

For complex types with validation requirements, use Pydantic models directly:

```
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional, Set

class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: Optional[str] = None

class UserModel(BaseModel):
    name: str
    email: str
    age: int = Field(gt=0, lt=150)
    joined: datetime
    address: Optional[Address] = None
    tags: Set[str] = set()
```

See more examples here: type_preservation_example.py, serialization_example.py

Type Registration

Automatic Registration

Types are automatically registered when you first store them in a Redis data structure:

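```
from datetime import datetime, timezone

user = User("John", 30)
hash_map.set("user", user)  # User type is automatically registered

# joined is a required field of UserModel, so it must be supplied
model = UserModel(name="Jane", email="jane@example.com", age=25, joined=datetime.now(timezone.utc))
hash_map.set("model", model)  # UserModel is automatically registered
```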
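Registration as a side effect of storing can be pictured with a small in-memory registry. `SketchRegistry`, `serialize`, `Point`, and the envelope are hypothetical stand-ins, not the library's classes. The point is only that the registry stays empty until the first store.

```python
import json
from typing import Dict


class SketchRegistry:
    """Hypothetical registry mapping a type name to its class."""

    def __init__(self) -> None:
        self._types: Dict[str, type] = {}

    def register(self, cls: type) -> None:
        self._types[cls.__name__] = cls

    def __contains__(self, name: str) -> bool:
        return name in self._types


class Point:
    def __init__(self, x: int, y: int):
        self.x, self.y = x, y

    def to_dict(self) -> dict:
        return {"x": self.x, "y": self.y}


registry = SketchRegistry()


def serialize(obj) -> str:
    # Registration happens while storing, not up front
    registry.register(type(obj))
    return json.dumps({"_type": type(obj).__name__, "data": obj.to_dict()})


print("Point" in registry)  # False: nothing stored yet
raw = serialize(Point(1, 2))
print("Point" in registry)  # True: registered on first store
```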

Manual Registration for Consumers

In distributed systems, some processes only consume data and never store any. Types are registered automatically only when data is stored, so these consumer processes must register their types manually before deserializing:

```
# In consumer processes, register types before reading data
hash_map = HashMap("my_key")

# Register custom types
hash_map.register_types(User)
hash_map.register_types(UserModel)

# Or you can also register multiple types at once
hash_map.register_types([User, UserModel])

# Now you can safely deserialize data
user = hash_map.get("user")  # Will correctly deserialize as User instance
model = hash_map.get("model")  # Will correctly deserialize as UserModel instance
```

This is particularly important in scenarios like:

- Worker processes that only read from queues
- Read-only replicas or analytics services
- Monitoring or logging systems
- Any process that doesn't write data but needs to read it
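The failure mode for a consumer can be sketched the same way. A fresh process has an empty registry, so it cannot rebuild a payload written elsewhere until the type is registered. The registry dict, `deserialize`, and the choice of `ValueError` are assumptions for illustration.

```python
import json
from typing import Dict


class Point:
    def __init__(self, x: int, y: int):
        self.x, self.y = x, y

    @classmethod
    def from_dict(cls, data: dict) -> "Point":
        return cls(data["x"], data["y"])


# Payload written earlier by some producer process (hypothetical envelope)
raw = json.dumps({"_type": "Point", "data": {"x": 1, "y": 2}})

# A freshly started consumer has never stored a Point, so this is empty
consumer_registry: Dict[str, type] = {}


def deserialize(raw_str: str):
    payload = json.loads(raw_str)
    cls = consumer_registry.get(payload["_type"])
    if cls is None:
        raise ValueError(f"unknown type {payload['_type']!r}; register it first")
    return cls.from_dict(payload["data"])


try:
    deserialize(raw)
except ValueError as exc:
    print(exc)  # unknown type 'Point'; register it first

consumer_registry["Point"] = Point  # the manual registration step
point = deserialize(raw)
print(point.x, point.y)  # 1 2
```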

Complex Type Hints and Keys

Complex Types as Keys

Our Redis data structures support using complex types (custom classes or Pydantic models) as keys with automatic serialization and type preservation. Unlike standard Python dictionaries, you don't need to implement __hash__ or make your types immutable:

```
from pydantic import BaseModel
from redis_data_structures.base import SerializableType

class TestModel(BaseModel):
    id: int
    name: str

# Complex types as keys with automatic serialization
cache = LRUCache[TestModel, str]("test_cache", max_size=10)
cache.put(TestModel(id=1, name="one"), "one")
cache.put(TestModel(id=2, name="two"), "two")

# Keys maintain their types when retrieved
assert cache.get(TestModel(id=1, name="one")) == "one"
assert cache.get(TestModel(id=2, name="two")) == "two"

# Works with custom SerializableType classes too
class User(SerializableType):
    def __init__(self, id: int, name: str):
        self.id = id
        self.name = name

    def to_dict(self) -> dict:
        return {"id": self.id, "name": self.name}

    @classmethod
    def from_dict(cls, data: dict) -> "User":
        return cls(data["id"], data["name"])

hash_map = HashMap[User, dict]("user_data")
hash_map[User(id=1, name="Alice")] = {"role": "admin"}
hash_map[User(id=2, name="Bob")] = {"role": "user"}

# Keys are automatically serialized and deserialized
stored_data = hash_map[User(id=1, name="Alice")]
assert stored_data["role"] == "admin"

# You can get all keys with their original types
keys = hash_map.keys()  # Returns list of User objects
```
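A library can use complex objects as keys without `__hash__` by turning each key into a canonical string before it reaches Redis. The sketch below uses a hypothetical `key_to_str` helper, not the library's code, to show two distinct but equal-valued objects mapping to the same string.

```python
import json


def key_to_str(type_name: str, fields: dict) -> str:
    # sort_keys plus compact separators make the output canonical, so
    # equal field values always produce the same Redis field name
    return json.dumps({"_type": type_name, "data": fields}, sort_keys=True, separators=(",", ":"))


class User:
    def __init__(self, id: int, name: str):
        self.id = id
        self.name = name

    def to_dict(self) -> dict:
        return {"id": self.id, "name": self.name}


a = User(1, "Alice")
b = User(1, "Alice")  # different object, same values, no __eq__/__hash__ defined

same_object = a == b  # False: default equality is identity
same_key = key_to_str("User", a.to_dict()) == key_to_str("User", b.to_dict())  # True
```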

Type Hints Support

The library supports Python's type hints system for better IDE support and runtime type checking:

```
from datetime import datetime
from typing import Any, Dict, List, Set

from pydantic import BaseModel

class UserProfile(BaseModel):
    name: str
    age: int
    joined: datetime
    tags: Set[str] = set()

# Type hints provide IDE support and runtime checking
cache: LRUCache[str, UserProfile] = LRUCache("user_profiles", max_size=1000)
hash_map: HashMap[int, UserProfile] = HashMap("user_data")

# Complex nested types are supported
cache: LRUCache[UserProfile, Dict[str, List[UserProfile]]] = LRUCache("nested_data")
hash_map: HashMap[UserProfile, List[Dict[str, Any]]] = HashMap("complex_data")
```

When using type hints:

- Both key and value types are automatically serialized and deserialized
- Type information is preserved across Redis operations
- IDE autocompletion and type checking work as expected
- Keys do not need special methods like __hash__ or __eq__, because lookups use the serialized form of the key
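Subscripting such as `LRUCache[str, UserProfile]` is ordinary `typing.Generic` machinery. Whether the library is built exactly this way is an assumption. The in-memory `TypedCache` below only shows how the parameters are declared and how they can be read back with `typing.get_args`.

```python
from typing import Dict, Generic, Optional, TypeVar, get_args

K = TypeVar("K")
V = TypeVar("V")


class TypedCache(Generic[K, V]):
    """Hypothetical in-memory stand-in for a generic Redis-backed cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[K, V] = {}

    def put(self, key: K, value: V) -> None:
        self._data[key] = value

    def get(self, key: K) -> Optional[V]:
        return self._data.get(key)


# Both spellings used in the examples above are accepted
cache = TypedCache[str, int]("scores")
other: TypedCache[str, int] = TypedCache("scores_copy")

cache.put("alice", 10)
print(get_args(TypedCache[str, int]))  # (<class 'str'>, <class 'int'>)
```

A static checker such as mypy would flag `cache.put("bob", "ten")`. Nothing in this sketch checks types at runtime.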

Best Practices for Complex Keys

Keep Keys Simple and Meaningful

```
# Good: Key contains only necessary identifying information
class UserKey(SerializableType):
    def __init__(self, id: int, department: str):
        self.id = id
        self.department = department

# Bad: Key contains unnecessary data
class UserKeyBad(SerializableType):
    def __init__(self, id: int, department: str, full_profile: dict):
        self.id = id
        self.department = department
        self.full_profile = full_profile  # Unnecessary in key
```

Use Appropriate Types

```
# Good: Using proper types for key components
class SessionKey(BaseModel):
    user_id: int
    session_id: str
    created_at: datetime

# Bad: Using strings for everything
class SessionKeyBad(BaseModel):
    user_id: str  # Should be int
    session_id: str
    created_at: str  # Should be datetime
```

Consider Key Size

- Keep keys concise to minimize storage and lookup overhead
- Include only fields necessary for uniquely identifying the data

```
# Good: Minimal key with essential fields
class DocumentKey(BaseModel):
    doc_id: int
    version: int

# Bad: Bloated key
class DocumentKeyBad(BaseModel):
    doc_id: int
    version: int
    title: str  # Unnecessary
    author: str  # Unnecessary
    created_at: datetime  # Unnecessary
    tags: List[str]  # Unnecessary
```
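To get a rough sense of key size, the sketch below compares the JSON-encoded size of the `DocumentKey` and `DocumentKeyBad` fields. It uses plain dicts and `json.dumps` as a proxy. The library's actual encoding adds type information, so real sizes will differ.

```python
import json
from datetime import datetime, timezone


def serialized_size(fields: dict) -> int:
    # Bytes of UTF-8 JSON text: a rough proxy for the stored key size
    return len(json.dumps(fields, default=str).encode("utf-8"))


minimal = {"doc_id": 42, "version": 3}
bloated = {
    "doc_id": 42,
    "version": 3,
    "title": "Quarterly report",
    "author": "Jane Doe",
    "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "tags": ["finance", "q1"],
}

print(serialized_size(minimal), serialized_size(bloated))
```

Every entry stores its own copy of the key, so the difference is multiplied by the number of entries.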

Implementation Details

Type Registry System

The serialization system uses two registries managed by the TypeRegistry class:

- Custom Type Registry: For classes inheriting from SerializableType
- Pydantic Type Registry: For Pydantic models

```
from datetime import datetime, timezone

# Types are automatically registered during serialization
user = User("John", 30)
hash_map.set("user", user)  # Registers User class in custom type registry

model = UserModel(name="Jane", email="jane@example.com", age=25, joined=datetime.now(timezone.utc))
hash_map.set("model", model)  # Registers UserModel in pydantic registry
```

Serialization Process

```mermaid
graph TD
    A[Start Serialization] --> B{Is data a Pydantic model?}
    B -- Yes --> C[Prepare raw_str_data for Pydantic]
    C --> D[Register Pydantic type]
    B -- No --> E{Is data a SerializableType?}
    E -- Yes --> F[Prepare raw_str_data for Custom Type]
    F --> G[Register Custom type]
    E -- No --> H[Call _serialize_recursive]
    H --> I[Get serialized data]
    I --> J[Convert to JSON string]
    J --> K{Is data large enough for compression?}
    K -- Yes --> L[Compress data]
    K -- No --> M[Return raw_str_data]

    L --> N[End Serialization]
    M --> N
```

The serialization process follows these steps:

- Type Detection
  - Checks if the value is a Pydantic model
  - Checks if the value is a SerializableType
  - Falls back to built-in type handlers
- Data Transformation
  - Converts objects to a dictionary format with type information
  - Handles nested structures recursively
  - Preserves type information in the _type field
- Compression
  - Large serialized data is automatically compressed using zlib
  - Compression threshold is configurable
  - Compressed data is marked with a special prefix
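The three steps can be sketched in plain Python. This is not the library's implementation. `to_jsonable`, `HANDLERS`, `COMPRESSED_MARKER`, and the tag layout are hypothetical, and Pydantic handling is omitted. `hasattr(value, "to_dict")` stands in for the `SerializableType` check. Only the 1024-byte default threshold comes from this document.

```python
import json
import zlib
from datetime import timedelta
from typing import Any, Callable, Dict

COMPRESSION_THRESHOLD = 1024      # bytes; the documented default
COMPRESSED_MARKER = b"\x00zlib:"  # hypothetical prefix; JSON text never starts with NUL

# Built-in handlers: exact type -> tagged dict (illustrative subset)
HANDLERS: Dict[type, Callable[[Any], dict]] = {
    tuple: lambda v: {"_type": "tuple", "value": [to_jsonable(x) for x in v]},
    set: lambda v: {"_type": "set", "value": [to_jsonable(x) for x in v]},
    bytes: lambda v: {"_type": "bytes", "value": v.hex()},
    timedelta: lambda v: {"_type": "timedelta", "value": v.total_seconds()},
}


def to_jsonable(value: Any) -> Any:
    # Step 1: type detection (custom types first, then built-in handlers)
    if hasattr(value, "to_dict"):
        return {"_type": type(value).__name__, "value": to_jsonable(value.to_dict())}
    handler = HANDLERS.get(type(value))
    if handler is not None:
        return handler(value)
    # Step 2: recurse into containers so nested values are tagged too
    if isinstance(value, list):
        return [to_jsonable(x) for x in value]
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    return value  # int, float, str, bool, None pass through


def serialize(value: Any) -> bytes:
    encoded = json.dumps(to_jsonable(value)).encode("utf-8")
    # Step 3: compress only payloads at or above the threshold
    if len(encoded) >= COMPRESSION_THRESHOLD:
        return COMPRESSED_MARKER + zlib.compress(encoded)
    return encoded
```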

Deserialization Process

```mermaid
graph TD
    N[Start Deserialization] --> O{Is data empty?}
    O -- Yes --> Q[Return None]
    O -- No --> R[Decode data]
    R --> S{Is data compressed?}
    S -- Yes --> T[Decompress data]
    T --> U
    S -- No --> U[Load JSON data]
    U --> V{Is _registry present?}
    V -- Yes --> W{Is it Pydantic?}
    W -- Yes --> X[Validate using Pydantic model]
    W -- No --> Y[Use Custom type to create instance]
    V -- No --> Z[Call _deserialize_recursive]
    Z --> AA[Return deserialized data]

    AA --> AB[End Deserialization]
```

The deserialization process:

- Compression Check
  - Detects if data is compressed (checks for compression marker)
  - Decompresses if necessary
- Type Resolution
  - Checks registry type (_registry field)
  - Uses appropriate registry to reconstruct objects
  - Falls back to built-in type handlers
- Object Recreation
  - Reconstructs objects using registered type information
  - Handles nested structures recursively
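A read path can also be sketched in plain Python. The marker, tag names, and both registries are hypothetical, and there is no Pydantic branch. The order of checks follows the steps described above: empty value, then compression marker, then registered custom types, then built-in handlers.

```python
import json
import zlib
from datetime import timedelta
from typing import Any, Callable, Dict

COMPRESSED_MARKER = b"\x00zlib:"  # hypothetical prefix; must match the writer

# Built-in decoders keyed by the "_type" tag (illustrative subset)
BUILTIN_DECODERS: Dict[str, Callable[[Any], Any]] = {
    "tuple": lambda v: tuple(from_jsonable(x) for x in v),
    "set": lambda v: {from_jsonable(x) for x in v},
    "bytes": bytes.fromhex,
    "timedelta": lambda v: timedelta(seconds=v),
}

# Registered custom types keyed by name; consumers fill this in themselves
CUSTOM_TYPES: Dict[str, type] = {}


def from_jsonable(value: Any) -> Any:
    if isinstance(value, list):
        return [from_jsonable(x) for x in value]
    if isinstance(value, dict):
        tag = value.get("_type")
        if tag in CUSTOM_TYPES:        # registered types are tried first
            return CUSTOM_TYPES[tag].from_dict(value["value"])
        if tag in BUILTIN_DECODERS:    # then built-in handlers
            return BUILTIN_DECODERS[tag](value["value"])
        return {k: from_jsonable(v) for k, v in value.items()}
    return value


def deserialize(blob: bytes) -> Any:
    if not blob:
        return None                                   # empty value
    if blob.startswith(COMPRESSED_MARKER):            # compression check
        blob = zlib.decompress(blob[len(COMPRESSED_MARKER):])
    return from_jsonable(json.loads(blob.decode("utf-8")))
```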

Best Practices

- Type Registration
  - Register all types explicitly to avoid missing types when deserializing (important for consumer processes)
  - Keep type names unique across your application
- Performance Considerations
  - Configure the compression threshold based on your data size (default: 1024 bytes)
  - Use appropriate serialization methods for your data types
  - Consider the overhead of complex nested structures

Custom Type Implementation

- Always override __eq__ in SerializableType subclasses
- The default __eq__ implementation compares to_dict() output, which may not be what you want
- Implement proper equality comparison based on your type's semantics

```
def __eq__(self, other):
    return (
        isinstance(other, self.__class__) and  # Check type
        self.field1 == other.field1 and        # Compare relevant fields
        self.field2 == other.field2
    )
```
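Comparing `to_dict()` output can be the wrong notion of equality. Consider a hypothetical `Session` type whose dict must include a volatile `last_seen` timestamp. `dict_eq` approximates the default behavior described above, while `__eq__` compares only the identifying field. Defining `__hash__` alongside it keeps the class usable in sets and as a dict key.

```python
class Session:
    """Hypothetical type: identity is session_id, last_seen is volatile."""

    def __init__(self, session_id: str, last_seen: float):
        self.session_id = session_id
        self.last_seen = last_seen

    def to_dict(self) -> dict:
        # last_seen must be persisted, so it is part of to_dict()
        return {"session_id": self.session_id, "last_seen": self.last_seen}

    def dict_eq(self, other) -> bool:
        # Approximates a to_dict()-based default comparison
        return isinstance(other, Session) and self.to_dict() == other.to_dict()

    def __eq__(self, other) -> bool:
        # Semantic equality: same session, whatever the activity timestamp
        return isinstance(other, self.__class__) and self.session_id == other.session_id

    def __hash__(self) -> int:
        # Consistent with __eq__: equal objects hash equally
        return hash(self.session_id)


earlier = Session("abc", last_seen=100.0)
later = Session("abc", last_seen=250.0)
```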

Error Handling

```
import logging

logger = logging.getLogger(__name__)

try:
    result = hash_map.get("key")
except ValueError as e:
    logger.error("Unsupported type or serialization error: %s", e)
```

Limitations

- Circular References
  - Not supported due to JSON serialization
  - Will raise RecursionError
- Dynamic Types
  - Lambda functions and dynamic code cannot be serialized
  - File handles and sockets are not supported
- Type Consistency
  - Type names must be consistent across your application
  - Changing class definitions may break deserialization
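The circular-reference limitation is easy to reproduce. A recursive walk with no cycle detection, like the hypothetical `naive_serialize` below, recurses until Python raises `RecursionError`. The standard `json` module detects the same cycle and raises `ValueError` instead.

```python
import json


def naive_serialize(value):
    # Recursive walk with no cycle detection
    if isinstance(value, list):
        return [naive_serialize(v) for v in value]
    if isinstance(value, dict):
        return {k: naive_serialize(v) for k, v in value.items()}
    return value


node = {"name": "a"}
node["self"] = node  # circular reference

try:
    naive_serialize(node)
except RecursionError:
    print("recursive walk: RecursionError")

try:
    json.dumps(node)
except ValueError as exc:
    print(f"json.dumps: {exc}")  # json.dumps: Circular reference detected
```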

Future Enhancements

- Additional Type Support
  - Support for more built-in Python types
  - Custom type handler registration API
- Performance Optimizations
  - Alternative compression algorithms
  - Lazy deserialization options
  - Caching improvements
- Developer Experience
  - Enhanced error messages
  - Debug logging options
  - Type hint improvements

Contributing

To add support for new built-in types:

- Add a type handler to the Serializer class:

```
self.type_handlers["your_type"] = {
    "serialize": lambda x: {"_type": "your_type", "value": ...},
    "deserialize": lambda x: YourType(x["value"])
}
```

- Add tests for the new type handler
- Update this documentation
- Submit a pull request
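A new handler could look like the sketch below, which uses `decimal.Decimal` as the example type. `MiniSerializer` is a hypothetical stand-in that only mimics the `type_handlers` layout shown above. The real `Serializer` class will differ.

```python
import json
from decimal import Decimal
from typing import Any, Callable, Dict


class MiniSerializer:
    """Hypothetical stand-in mirroring the type_handlers layout above."""

    def __init__(self) -> None:
        self.type_handlers: Dict[str, Dict[str, Callable[[Any], Any]]] = {}
        # Proposed handler: store the exact string form of the Decimal
        self.type_handlers["Decimal"] = {
            "serialize": lambda x: {"_type": "Decimal", "value": str(x)},
            "deserialize": lambda x: Decimal(x["value"]),
        }

    def dumps(self, value: Any) -> str:
        handler = self.type_handlers.get(type(value).__name__)
        return json.dumps(handler["serialize"](value) if handler else value)

    def loads(self, raw: str) -> Any:
        data = json.loads(raw)
        if isinstance(data, dict) and data.get("_type") in self.type_handlers:
            return self.type_handlers[data["_type"]]["deserialize"](data)
        return data


serializer = MiniSerializer()
price = Decimal("19.99")
restored = serializer.loads(serializer.dumps(price))
```

Storing the string form keeps the value exact. Converting to `float` would not.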
