DataJoint 2.0 #1311
Draft: dimitri-yatsenko wants to merge 249 commits into master from pre/v2.0
use src layout
update test workflow to use src layout
use pytest to manage docker container startup for tests
Chore/dev env fixes
Add CSS media query for prefers-color-scheme: dark to automatically adapt table preview styling to dark mode environments.
Dark mode colors:
- Table header: #4a4a4a background
- Odd rows: #2d2d2d background, #e0e0e0 text
- Even rows: #3d3d3d background, #e0e0e0 text
- Primary key: #bd93f9 (purple accent)
- Borders: #555555
Uses browser-native dark mode detection - no JavaScript or config needed. Light mode styling remains unchanged for backward compatibility.
Fixes #1167
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
When a user answers "no" to "Commit deletes?", the transaction is rolled back but delete() still returned the count of rows that would have been deleted. This was unintuitive - if nothing was deleted, the return value should be 0.
Now delete() returns 0 when:
- User cancels at the prompt
- Nothing to delete (already worked correctly)
Fixes #1155
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
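The return-value rule can be sketched without a database. The names below (`delete_with_prompt`, `confirm`) are hypothetical and are not DataJoint's implementation; the list snapshot only stands in for a transaction.

```python
from typing import Callable, List


def delete_with_prompt(rows: List[dict], confirm: Callable[[str], bool]) -> int:
    """Delete all rows in place and return how many were actually removed.

    `confirm` is a hypothetical stand-in for the interactive prompt.
    """
    if not rows:
        return 0  # nothing to delete

    snapshot = list(rows)  # stands in for the start of a transaction
    count = len(rows)
    rows.clear()  # tentative delete inside the "transaction"

    if confirm("Commit deletes?"):
        return count  # committed: report the rows removed

    rows.extend(snapshot)  # roll back the tentative delete
    return 0  # cancelled: nothing was deleted, so report 0


table = [{"id": 1}, {"id": 2}]
cancelled = delete_with_prompt(table, confirm=lambda _: False)
rows_after_cancel = len(table)
committed = delete_with_prompt(table, confirm=lambda _: True)
```

The point of the sketch is that the count is only returned on the commit path; both the cancel path and the empty case return 0.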
When inspect.getmembers() encounters modules with objects that have non-standard __bases__ attributes (like _ClassNamespace from typing internals), it raises TypeError. This caused dj.Diagram(schema) to fail intermittently depending on what modules were imported.
Now catches TypeError in addition to ImportError, allowing the search to continue by skipping problematic modules.
Fixes #1072
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
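A minimal sketch of the skip-and-continue pattern described above. `modules_defining` and `BrokenNamespace` are hypothetical names; `BrokenNamespace` simply forces `inspect.getmembers()` to raise `TypeError` and is not the object from the original bug report.

```python
import inspect
import types


class BrokenNamespace:
    """Stand-in for an object that makes inspect.getmembers() raise TypeError."""

    def __dir__(self):
        raise TypeError("non-standard namespace")


def modules_defining(class_name, modules):
    """Return the names of the modules that define a class called `class_name`."""
    found = []
    for name, module in modules.items():
        try:
            members = inspect.getmembers(module, inspect.isclass)
        except (ImportError, TypeError):
            continue  # skip the problematic entry and keep searching
        if any(member_name == class_name for member_name, _ in members):
            found.append(name)
    return found


good = types.ModuleType("good")
good.Session = type("Session", (), {})
result = modules_defining("Session", {"broken": BrokenNamespace(), "good": good})
```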
`text` is no longer a core DataJoint type. It remains available as a native SQL passthrough type (with portability warning).
Rationale:
- Core types should encourage structured, bounded data
- varchar(n) covers most legitimate text needs with explicit bounds
- json handles structured text better
- <object> is better for large/unbounded text (files, sequences, docs)
- text behavior varies across databases, hurting portability
Changes:
- Remove `text` from CORE_TYPES in declare.py
- Update NATIVE_TEXT pattern to match plain `text` (in addition to tinytext, mediumtext, longtext)
- Update archive docs to note text is native-only
Users who need unlimited text can:
- Use varchar(n) with generous limit
- Use json for structured content
- Use <object> for large text files
- Use native text types with portability warning
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
When dropping a table/schema referenced by foreign key constraints, MySQL returns error 3730. This was passing through as a raw pymysql OperationalError, making it difficult for users to catch and handle.
Now translates to datajoint.errors.IntegrityError, consistent with other foreign key constraint errors (1217, 1451, 1452).
Before: pymysql.err.OperationalError: (3730, "Cannot drop table...")
After: datajoint.errors.IntegrityError: Cannot drop table '#table' referenced by a foreign key constraint...
Fixes #1032
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
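The translation step might look roughly like this. Both exception classes are local stand-ins for `pymysql.err.OperationalError` and `datajoint.errors.IntegrityError`, and `translate_error` is a hypothetical helper name; only the error codes come from the commit message.

```python
class OperationalError(Exception):
    """Stand-in for pymysql.err.OperationalError (args: code, message)."""


class IntegrityError(Exception):
    """Stand-in for datajoint.errors.IntegrityError."""


# Foreign key constraint error codes named in the commit message.
FOREIGN_KEY_ERROR_CODES = {1217, 1451, 1452, 3730}


def translate_error(error: Exception) -> Exception:
    """Map foreign key constraint errors to IntegrityError; pass others through."""
    code = error.args[0] if error.args else None
    if isinstance(code, int) and code in FOREIGN_KEY_ERROR_CODES:
        message = error.args[1] if len(error.args) > 1 else str(error)
        return IntegrityError(message)
    return error


raw = OperationalError(3730, "Cannot drop table '#table' referenced by a foreign key constraint")
translated = translate_error(raw)
unrelated = OperationalError(2006, "MySQL server has gone away")
```

With a single translated type, callers can write one `except IntegrityError` clause instead of inspecting driver-specific error codes.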
Add __enter__ and __exit__ methods to Connection for use with Python's
`with` statement. This enables automatic connection cleanup, particularly
useful for serverless environments (AWS Lambda, Cloud Functions).
Usage:
    with dj.Connection(host, user, password) as conn:
        schema = dj.schema('my_schema', connection=conn)
        # perform operations
    # connection automatically closed
Closes #1081
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
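For readers unfamiliar with the protocol, the two methods can be as small as the sketch below. The `Connection` class here is a hypothetical stand-in that only tracks an open/closed flag; it is not the DataJoint class, and `is_connected` is an assumed attribute name.

```python
class Connection:
    """Hypothetical stand-in for dj.Connection; only tracks open/closed state."""

    def __init__(self, host, user, password):
        self.host, self.user = host, user
        self.is_connected = True  # a real connection would open a socket here

    def close(self):
        self.is_connected = False

    def __enter__(self):
        return self  # the object bound by "as conn"

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()  # runs on normal exit and when the block raises
        return False  # do not suppress exceptions from the with-block


with Connection("localhost", "user", "secret") as conn:
    inside = conn.is_connected
after = conn.is_connected
```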
* perf: implement lazy imports for heavy dependencies
Defer loading of heavy dependencies (networkx, matplotlib, click, pymysql) until their associated features are accessed:
- dj.Diagram, dj.Di, dj.ERD -> loads diagram.py (networkx, matplotlib)
- dj.kill -> loads admin.py (pymysql via connection)
- dj.cli -> loads cli.py (click)
This reduces `import datajoint` time significantly, especially on macOS where import overhead is higher. Core functionality (Schema, Table, Connection, etc.) remains immediately available.
Closes #1220
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: cache lazy imports correctly and expose diagram module
- Cache lazy imports in globals() to override the submodule that importlib automatically sets on the parent module
- Add dj.diagram to lazy modules (returns module for diagram_active access)
- Add tests for cli callable and diagram module access
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
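One common way to get this behavior is a module-level `__getattr__` (PEP 562) that imports on first access and caches the result in the module namespace. The sketch below builds a throwaway module to show the idea; `make_lazy_module` and the `_LAZY` table are hypothetical, the lazily loaded targets are stdlib functions chosen only so the example runs anywhere, and it does not model the submodule-override detail mentioned in the second commit.

```python
import importlib
import types

# Hypothetical table: public attribute -> (module to import, attribute inside it).
_LAZY = {
    "dumps": ("json", "dumps"),
    "sqrt": ("math", "sqrt"),
}


def make_lazy_module(name, lazy):
    """Build a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        try:
            target_module, target_attr = lazy[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(target_module), target_attr)
        # Cache in the module namespace so __getattr__ is not called again.
        module.__dict__[attr] = value
        return value

    # PEP 562: a "__getattr__" entry in the module dict handles missing attributes.
    module.__getattr__ = __getattr__
    return module


pkg = make_lazy_module("demo_pkg", _LAZY)
first = pkg.sqrt(9.0)          # triggers the deferred import
cached = "sqrt" in vars(pkg)   # later lookups bypass __getattr__
```

In a real package the same function would live at the top level of `__init__.py`, where the module namespace is `globals()`.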
* fix: raise error when table declaration fails due to permissions
Previously, AccessError during table declaration was silently swallowed, causing tables with cross-schema foreign keys to fail without any feedback when the user lacked REFERENCES privilege.
Now:
- If table already exists: suppress error (idempotent declaration)
- If table doesn't exist: raise AccessError with helpful message about CREATE and REFERENCES privileges
Closes #1161
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* test: update test to expect AccessError at declaration time
The test previously expected silent failure at declaration followed by error at insert time. Now we fail fast at declaration time (better UX).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
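The two branches can be sketched as follows. `AccessError` is a local stand-in for the library's exception, and `declare_table`, `existing_tables`, and `create` are hypothetical names used only to make the control flow concrete; the error message text is illustrative.

```python
class AccessError(Exception):
    """Stand-in for the library's access error."""


def declare_table(name, existing_tables, create):
    """Create `name`; tolerate permission errors only if the table already exists."""
    try:
        create(name)
    except AccessError as error:
        if name in existing_tables:
            return  # table already exists: declaration stays idempotent
        raise AccessError(
            f"Cannot declare table {name!r}: check CREATE and REFERENCES privileges."
        ) from error
    existing_tables.add(name)


def denied(name):
    """Simulates a server that rejects every CREATE TABLE statement."""
    raise AccessError("permission denied")


tables = {"session"}
declare_table("session", tables, denied)  # existing table: error suppressed

try:
    declare_table("trial", tables, denied)  # missing table: fail fast
    outcome = "silent"
except AccessError as error:
    outcome = str(error)
```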
Fix read_cell_array to handle edge cases from MATLAB:
- Empty cell arrays ({})
- Cell arrays with empty elements ({[], [], []})
- Nested/ragged arrays ({[1,2], [3,4,5]})
- Cell matrices with mixed content
The fix uses dtype='object' to avoid NumPy's array homogeneity
requirements that caused reshape failures with ragged arrays.
Closes #1056
Closes #1098
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix: provide helpful error when table heading is not configured
When using tables from non-activated schemas, operations that access
the heading now raise a clear DataJointError instead of confusing
"NoneType has no attribute" errors.
Example:
    schema = dj.Schema()  # Not activated
    @schema
    class MyTable(dj.Manual): ...
    MyTable().heading  # Now raises: "Table `MyTable` is not properly
                       # configured. Ensure the schema is activated..."
Closes #1039
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: Allow heading introspection on base tier classes
The heading property now returns None for base tier classes (Lookup,
Manual, Imported, Computed, Part) instead of raising an error. This
allows Python's help() and inspect modules to work correctly.
User-defined table classes still get the helpful error message when
trying to access heading on a non-activated schema.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
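Taken together, the two commits above describe a three-way decision in the heading property. The sketch below illustrates that decision with hypothetical stand-in classes; the class layout, the `_heading` attribute, and the error text are assumptions, not the library's code.

```python
class DataJointError(Exception):
    """Stand-in for datajoint's DataJointError."""


# Base tier class names listed in the commit message.
BASE_TIERS = ("Lookup", "Manual", "Imported", "Computed", "Part")


class Table:
    _heading = None  # assumed to be filled in when the schema is activated

    @property
    def heading(self):
        if self._heading is not None:
            return self._heading
        if type(self).__name__ in BASE_TIERS:
            return None  # base tier classes: no error, per the second commit
        raise DataJointError(
            f"Table `{type(self).__name__}` is not properly configured."
        )


class Manual(Table):
    pass


class MyTable(Manual):
    pass


base_heading = Manual().heading
try:
    MyTable().heading
    message = None
except DataJointError as error:
    message = str(error)
```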
…cs (#1328)
* feat: Add consistent URL representation for all storage paths (#1326)
Implements unified URL handling for all storage backends including local files:
- Add URL_PROTOCOLS tuple including file://
- Add is_url() to check if path is a URL
- Add normalize_to_url() to convert local paths to file:// URLs
- Add parse_url() to parse any URL into protocol and path
- Add StorageBackend.get_url() to return full URLs for any backend
- Add comprehensive unit tests for URL functions
This enables consistent internal representation across all storage types, aligning with fsspec's unified approach to filesystems.
Closes #1326
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* test: Remove redundant URL tests from test_object.py
The TestRemoteURLSupport class tested is_remote_url and parse_remote_url which were renamed to is_url and parse_url. These tests are now redundant as comprehensive coverage exists in tests/unit/test_storage_urls.py.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: Remove trailing whitespace from blank line in json.ipynb
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* style: Apply ruff formatting
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* chore: Remove accidentally committed local config files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* style: Apply ruff formatting to test files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: Remove orphaned archive documentation
Content has been migrated to datajoint-docs repository.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
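To make the three helpers concrete, here is a stdlib-only sketch that reuses the function names from the commit message. The bodies, the return shape of `parse_url`, and every entry in `URL_PROTOCOLS` other than `file://` are assumptions, so the actual signatures in the PR may differ.

```python
from pathlib import Path

# Only "file://" is confirmed by the commit message; the rest are assumed.
URL_PROTOCOLS = ("file://", "s3://", "gs://", "https://")


def is_url(path: str) -> bool:
    """True if `path` starts with one of the known protocol prefixes."""
    return path.startswith(URL_PROTOCOLS)


def normalize_to_url(path: str) -> str:
    """Leave URLs untouched; convert a local path to an absolute file:// URL."""
    if is_url(path):
        return path
    return Path(path).resolve().as_uri()


def parse_url(url: str):
    """Split a URL into (protocol, path), e.g. ("s3", "bucket/key")."""
    protocol, separator, path = url.partition("://")
    if not separator:
        raise ValueError(f"not a URL: {url!r}")
    return protocol, path


local = normalize_to_url("data/session1.dat")
remote = normalize_to_url("s3://bucket/key")
parsed = parse_url("file:///data/x")
```

Normalizing local paths up front means downstream code can branch on the protocol alone rather than on "path or URL".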
Ensures PR #1311 automatically receives breaking and bug labels. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Makes tables more compact in notebook displays. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documentation is now consolidated in datajoint-docs repository.
Changes:
- Delete docs/ folder (legacy MkDocs infrastructure)
- Create ARCHITECTURE.md with transpiler design docs
- Update README.md links to point to docs.datajoint.com
The Developer Guide remains in README.md. Internal architecture documentation for contributors is now in ARCHITECTURE.md.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Transpilation documentation moved to datajoint-docs query-algebra spec. Developer docs now consolidated in README.md. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rename CHANGELOG.md to CHANGELOG-archive.md with redirect to GitHub Releases
- Add "Writing Release Notes" section to RELEASE_MEMO.md:
  - Categories (BREAKING, Added, Changed, Deprecated, Fixed, Security)
  - Format template with examples
  - Guidelines for good release notes
  - PR label mapping for release drafter
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Slim README.md to essentials (intro, badges, install, links)
- Create CONTRIBUTING.md with:
  - Development setup (pixi and pip)
  - Test running instructions
  - Pre-commit hooks
  - Environment variables
  - Condensed docstring style guide
- Delete DOCSTRING_STYLE.md (merged into CONTRIBUTING.md)
README: 218 → 82 lines
All detailed docs now at docs.datajoint.com
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add .github/DISCUSSION_TEMPLATE/rfc.yml for enhancement proposals
- Fix table header alignment (center instead of right)
- Fix excessive padding in table headers by removing p tag margins
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
DataJoint 2.0 is a major release that modernizes the entire codebase while maintaining backward compatibility for core functionality. This release focuses on extensibility, type safety, and developer experience.
Planning: DataJoint 2.0 Plan | Milestone 2.0
Major Features
Codec System (Extensible Types)
Replaces the adapter system with a modern, composable codec architecture:
- <blob>, <json>, <attach>, <filepath>, <object>, <hash>
- <blob> wraps <json> for external storage)
- __init_subclass__
- validate() method for type checking before insert
Semantic Matching
Attribute lineage tracking ensures joins only match semantically compatible attributes:
- id or name
- semantic_check=False for legacy permissive behavior
Primary Key Rules
Rigorous primary key propagation through all operators:
- dj.U('attr') creates ad-hoc grouping entities
AutoPopulate 2.0 (Jobs System)
Per-table job management with enhanced tracking:
- ~~_job_timestamp and ~~_job_duration columns
- ~~table_name job table
- table.progress() returns (remaining, total)
Modern Fetch & Insert API
New fetch methods:
- to_dicts() - List of dictionaries
- to_pandas() - DataFrame with PK as index
- to_arrays(*attrs) - NumPy arrays (structured or individual)
- keys() - Primary keys only
- fetch1() - Single row
Insert improvements:
- validate() - Check rows before inserting
- chunk_size - Batch large inserts
- insert_dataframe() - DataFrame with index handling
Type Aliases
Core DataJoint types for portability:
- int8, int16, int32, int64
- uint8, uint16, uint32, uint64
- float32, float64
- bool
- uuid
Object Storage
Content-addressed and object storage types:
- <hash> - Content-addressed storage with deduplication
- <object> - Named object storage (Zarr, folders)
- <filepath> - Reference to managed files
- <attach> - File attachments (uploaded on insert)
Virtual Schema Infrastructure (#1307)
New schema introspection API for exploring existing databases:
- Schema.get_table(name) - Direct table access with auto tier prefix detection
- Schema['TableName'] - Bracket notation access
- for table in schema - Iterate tables in dependency order
- 'TableName' in schema - Check table existence
- dj.virtual_schema() - Clean entry point for accessing schemas
- dj.VirtualModule() - Virtual modules with custom names
CLI Improvements
The dj command-line interface for interactive exploration:
- dj -s schema:alias - Load schemas as virtual modules
- --host, --user, --password - Connection options
- -h conflict with --help
Settings Modernization
Pydantic-based configuration with validation:
- dj.config.override() context manager
- .secrets/)
- DJ_HOST, etc.)
License Change
Changed from LGPL to Apache 2.0 license (#1235 (discussion)):
Breaking Changes
Removed Support
- fetch() with format parameters
- safemode parameter (use prompt)
- create_virtual_module (use dj.virtual_schema() or dj.VirtualModule())
- ~log table (IMPR: Deprecate and Remove the ~log Table. #1298)
API Changes
- fetch() → to_dicts(), to_pandas(), to_arrays()
- fetch(format='frame') → to_pandas()
- fetch(as_dict=True) → to_dicts()
- safemode=False → prompt=False
Semantic Changes
Documentation
Developer Documentation (this repo)
Comprehensive updates in docs/:
User Documentation (datajoint-docs)
Full documentation site following the Diátaxis framework:
Tutorials (learning-oriented, Jupyter notebooks):
How-To Guides (task-oriented):
Reference (specifications):
Project Structure
- src/ layout for proper packaging (IMPR: src layout #1267)
Test Plan
Closes
Milestone 2.0 Issues
- ~log Table. #1298 - Deprecate and remove ~log table
- super.delete kwargs to Part.delete #1276 - Part.delete kwargs pass-through
- src layout #1267 - src layout
- dj.Top orders the preview with order_by #1242 - dj.Top orders the preview with order_by
Bug Fixes
- pyarrow (a pandas dependency) #1202 - DataJoint import error with missing pyarrow
- ValueError in DataJoint-Python 0.14.3 when using numpy 2.2.* #1201 - ValueError with numpy 2.2
- dj.Diagram() and new release of pydot==3.0.* #1169 - Error with dj.Diagram() and pydot 3.0
Improvements
Related PRs
Migration Guide
See How to Migrate from 1.x for detailed migration instructions.
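As a rough aid when auditing older code, the replacements listed under API Changes in this description can be put in a lookup table and matched against source text. `LEGACY_TO_V2` and `suggest_replacements` are hypothetical helpers, not part of DataJoint, and plain substring matching will miss calls that are formatted differently.

```python
# Legacy call pattern -> 2.0 replacement, copied from the API Changes list.
LEGACY_TO_V2 = {
    "fetch(format='frame')": "to_pandas()",
    "fetch(as_dict=True)": "to_dicts()",
    "safemode=False": "prompt=False",
}


def suggest_replacements(source: str):
    """Return (legacy, replacement) pairs for each legacy pattern found in `source`."""
    return [(old, new) for old, new in LEGACY_TO_V2.items() if old in source]


snippet = "rows = Session.fetch(as_dict=True)\nSession.delete(safemode=False)"
suggestions = suggest_replacements(snippet)
```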
🤖 Generated with Claude Code