0.7 alpha 2 #258
Open
Chenglong-MS wants to merge 393 commits into main from
…vider model, transitioning from a multi-provider chain to a single provider with anonymous fallback. Updated sections on protocol support and plugin design patterns for improved clarity and structure.
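- Remove the duplicate plugin interface definitions from 1-sso-plugin-architecture.md and reference 1-data-source-plugin-architecture.md instead
- Add 2-external-dataloader-enhancements.md, detailing three proposed improvements to the external data loaders:
  1. Database metadata fetching (P0)
  2. SSO token passthrough (P1)
  3. Credential persistence (P2)
- Clarify the implementation details for each database and the priority plan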
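Add a detailed development roadmap document covering the implementation plans and test strategies for SSO authentication, the data source plugin framework, the credential vault, and related features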
Add migration notes for the Superset integration code, renumber the steps, and add documentation delivery requirements
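Document the design decisions and implementation of data provenance descriptions, which are built by template concatenation rather than AI generation to guarantee accuracy and refreshability. A description covers the source, filter conditions, time range, and so on, and is automatically stored in loader_metadata for use by the frontend and the AI.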
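A minimal sketch of the template-concatenation idea, not the code in this PR: the function name `build_provenance_description`, the `description` key, and the sample values are all hypothetical. The point it illustrates is that the same inputs always yield the same text, so the description can be regenerated on every refresh.

```python
from typing import Optional


def build_provenance_description(
    source: str,
    filters: Optional[list[str]] = None,
    time_range: Optional[tuple[str, str]] = None,
) -> str:
    """Assemble a provenance sentence from fixed templates (no model call)."""
    parts = [f"Source: {source}"]
    if filters:
        parts.append("Filters: " + "; ".join(filters))
    if time_range:
        parts.append(f"Time range: {time_range[0]} to {time_range[1]}")
    return ". ".join(parts) + "."


# Hypothetical metadata dict; the real loader_metadata layout is not shown in the PR.
loader_metadata = {}
loader_metadata["description"] = build_provenance_description(
    "superset.sales_orders",
    filters=["region = 'EMEA'"],
    time_range=("2024-01-01", "2024-03-31"),
)
```

Because the text is derived only from the loader parameters, re-running the loader with a new time range rewrites the description without any risk of a generated summary drifting from the actual query.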
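Update the design documents to change the environment variable prefix of all data source plugins from bare prefixes (such as SUPERSET_) to the PLG_ prefix (such as PLG_SUPERSET_), to avoid conflicts with the environment variables used for LLM model configuration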
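As an illustration of why the prefix helps, the sketch below collects one plugin's settings from `PLG_<PLUGIN>_*` variables only. The helper `plugin_settings` and the variable names in the sample environment are assumptions made for this example, not identifiers from the PR.

```python
import os
from typing import Mapping

PLUGIN_ENV_PREFIX = "PLG_"  # prefix named in the commit message


def plugin_settings(plugin: str, environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect PLG_<PLUGIN>_* variables for one plugin (hypothetical helper)."""
    prefix = f"{PLUGIN_ENV_PREFIX}{plugin.upper()}_"
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


# Sample environment: only the PLG_-prefixed variable belongs to the plugin.
env = {
    "PLG_SUPERSET_URL": "http://localhost:8088",
    "SUPERSET_URL": "http://ignored.example",  # bare prefix, ignored
    "OPENAI_API_KEY": "placeholder",           # model configuration, ignored
}
settings = plugin_settings("superset", env)
```

With a dedicated namespace, a plugin lookup can never pick up a variable that was meant for model configuration, and vice versa.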
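Update the data source plugin architecture document, simplifying the external metadata design to an opaque blob format
Add a new design document analyzing the multilingual prompt injection problem and its solutions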
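Refine the language injection architecture notes and identify the workspace naming issue that needs fixing
Remove unrelated solutions and focus on proper use of the existing architecture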
…rning functionality
Add sanitize_error_message feature in error handling module to securely sanitize all error details returned to clients
Add streaming warning handling mechanism, including stream_warning_event and collect_stream_warning/flush_stream_warnings utilities
Update development documentation, supplement streaming warning specifications and usage instructions
fix(oidc): use issuer returned by discovery endpoint to update configuration
docs: update OIDC security guide, add GitHub private email and state validation instructions
feat(github): add state parameter validation and private email retrieval functionality
test: add contract tests for mainstream SSO providers
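For readers unfamiliar with the `state` check mentioned above, here is a generic sketch of the pattern using only the Python standard library. The `session` dict, the `oauth_state` key, and both function names are assumptions for illustration; the PR's actual implementation is not shown on this page.

```python
import hmac
import secrets


def new_state(session: dict) -> str:
    """Create an unguessable state value and remember it for this session."""
    state = secrets.token_urlsafe(32)
    session["oauth_state"] = state  # hypothetical session key
    return state


def validate_state(session: dict, returned_state: str) -> bool:
    """Accept the callback only if it echoes the stored state; single use."""
    expected = session.pop("oauth_state", None)
    if not expected or not returned_state:
        return False
    # Constant-time comparison on bytes, so non-ASCII input cannot raise.
    return hmac.compare_digest(expected.encode(), returned_state.encode())


session = {}
state = new_state(session)
first_check = validate_state(session, state)   # callback echoes the state
replay_check = validate_state(session, state)  # same value replayed
```

Popping the stored value makes each state single use, so a replayed callback fails even when it carries a value that was once valid.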
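refactor(test): optimize the BigQuery emulator probe logic to avoid blocking
docs(test): update the README to describe the new way of managing test services
fix(test): remove unnecessary row_count checks from the MySQL and PostgreSQL tests
style(test): format the Superset test code and update API endpoints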
Feature/plugin architecture
…ensitive query parameters
refactor(frontend): extract rule pre-content processing logic into independent module
docs: update documentation for error handling and log desensitization
test: add unit tests for log desensitization and rule pre-content
fix(security): enhance log desensitization functionality to support s…
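One plausible shape for masking sensitive query parameters before a URL reaches the logs is sketched below. The parameter list and the name `redact_url` are assumptions; the commit does not say which parameters it covers or how it masks them.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameter names worth masking.
SENSITIVE_PARAMS = {"token", "access_token", "api_key", "password", "secret"}


def redact_url(url: str, mask: str = "***") -> str:
    """Return the URL with the values of sensitive query parameters masked."""
    parts = urlsplit(url)
    query = [
        (key, mask if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # safe="*" keeps the mask readable instead of percent-encoding it.
    return urlunsplit(parts._replace(query=urlencode(query, safe="*")))


redacted = redact_url("https://example.com/cb?code=abc&access_token=s3cr3t")
```

Parsing the query string rather than applying a regular expression to the whole URL keeps the non-sensitive parameters intact and in their original order.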
Add test cases to verify automatic log rotation by date functionality
Refactor ReasoningLogger class, extract log file path related properties as instance variables
Add _ensure_fd_for_today method to handle log file switching when date changes
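The date-switching behavior can be pictured with the sketch below. `DailyLogger`, its file naming scheme, and the injectable `today` clock are hypothetical; only the method name `_ensure_fd_for_today` is taken from the commit message, and its real body may differ.

```python
import datetime as dt
import os
import tempfile


class DailyLogger:
    """Hypothetical stand-in for a logger that writes one file per date."""

    def __init__(self, log_dir, today=dt.date.today):
        self._log_dir = log_dir
        self._today = today  # injectable clock, convenient for tests
        self._current_date = None
        self._fd = None

    def _path_for(self, day):
        return os.path.join(self._log_dir, f"reasoning-{day.isoformat()}.log")

    def _ensure_fd_for_today(self):
        """Reopen the log file whenever the calendar date has changed."""
        day = self._today()
        if day != self._current_date:
            if self._fd is not None:
                self._fd.close()
            self._fd = open(self._path_for(day), "a", encoding="utf-8")
            self._current_date = day
        return self._fd

    def write(self, message):
        fd = self._ensure_fd_for_today()
        fd.write(message + "\n")
        fd.flush()

    def close(self):
        if self._fd is not None:
            self._fd.close()
            self._fd = None
            self._current_date = None


# Simulate a date change between two writes.
clock = [dt.date(2025, 1, 1)]
log_dir = tempfile.mkdtemp()
logger = DailyLogger(log_dir, today=lambda: clock[0])
logger.write("first day")
clock[0] = dt.date(2025, 1, 2)
logger.write("second day")
logger.close()
written = sorted(os.listdir(log_dir))
```

Checking the date on every write keeps rotation correct for long-running processes without a background timer.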
…ing document
Add document 15.3 with detailed planning for Agent knowledge injection matrix and search tool implementation
Update document 15 table format and add link to document 15.3
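1. Refactor the metadata sync flow to support reading rich metadata from the cache
2. Automatically inject catalog metadata (table/column descriptions, tags, etc.) into data summaries
3. Show the merged metadata status and user annotation hints in the frontend
4. Add tests verifying the metadata sync and display logic
- The catalog cache supports two write modes (replace/seed_if_missing)
- The search tool returns source_id/table_key for reading metadata
- Improve the row-truncation notice shown during data loading
- Unify the metadata status enum and frontend multilingual support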
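The two write modes named above could behave as in this sketch. It assumes the usual meaning of the names (overwrite versus write only when absent); the function `write_catalog_cache` and the dict-backed cache are hypothetical.

```python
def write_catalog_cache(cache: dict, key: str, metadata: dict, mode: str = "replace") -> bool:
    """Write metadata under key; return True if the cache was modified."""
    if mode == "replace":
        cache[key] = metadata  # always overwrite
        return True
    if mode == "seed_if_missing":
        if key in cache:
            return False       # keep the existing entry
        cache[key] = metadata
        return True
    raise ValueError(f"unknown write mode: {mode!r}")


cache = {"orders": {"description": "annotated by a user"}}
seeded = write_catalog_cache(cache, "orders", {"description": "from source"}, "seed_if_missing")
replaced = write_catalog_cache(cache, "orders", {"description": "from source"}, "replace")
```

Under this reading, seed_if_missing lets an initial sync populate the cache without clobbering entries that already carry richer metadata.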
…on field support
Add support for verbose_name and expression fields in Agent context, while preserving dual-source descriptions from source_description and user_description
Feature/plugin architecture
Add _build_sort_clause static method for building ORDER BY clauses
Modify fetch_data_as_arrow method to support sorting options
Add related unit tests to verify sorting functionality
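A sketch of how an ORDER BY builder can avoid unsafe column-name concatenation follows. It is not the PR's `_build_sort_clause`; the signature, the allow-list argument, and the ANSI double-quote style are assumptions (MySQL, for example, quotes identifiers with backticks).

```python
from typing import Iterable, Optional, Sequence, Tuple


def quote_identifier(name: str) -> str:
    """Quote a column name ANSI-style, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_sort_clause(
    sort: Optional[Sequence[Tuple[str, str]]],
    allowed_columns: Iterable[str],
) -> str:
    """Return an ORDER BY clause, or an empty string when no sort is requested."""
    if not sort:
        return ""
    allowed = set(allowed_columns)
    terms = []
    for column, direction in sort:
        if column not in allowed:
            raise ValueError(f"unknown sort column: {column!r}")
        keyword = direction.upper()
        if keyword not in ("ASC", "DESC"):
            raise ValueError(f"invalid sort direction: {direction!r}")
        terms.append(f"{quote_identifier(column)} {keyword}")
    return "ORDER BY " + ", ".join(terms)


clause = build_sort_clause([("price", "desc"), ("id", "asc")], ["id", "price"])
```

Validating against the known column list rejects injected fragments outright, and quoting covers legitimate column names that contain spaces or reserved words.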
… security design document
Add ISSUE-005 design document, detailing the current status of sorting capabilities and column name concatenation security issues in data loaders, root cause analysis, fix plan, and testing strategy
Feature/plugin architecture
… library
Merge skills and experiences directories into a unified library directory, keeping rules independent. Main changes include:
- Modify constant definitions and initialization logic in knowledge/store.py
- Update category references in API routes and Agent tools
- Adjust frontend type definitions and state management
- Optimize ExperienceDistillAgent prompts to distill general methodologies
- Update related test cases
…periences
- Merge original three directories rules/skills/experiences into rules/experiences
- KnowledgeStore.search() automatically skips rules with alwaysApply=true
- Update ExperienceDistillAgent to distill general methodologies rather than specific cases
- Frontend simplified from three tabs to Rules/Experiences two sections
- Injection text uses semantic tags [knowledge]/[rule] instead of directory names
…ience distillation
Add timeout parameter configuration in the API, backend routes, and agent; the frontend sets its request timeout based on this configuration
…gorithm
Refactor knowledge base rule injection logic, unify duplicate code into KnowledgeStore.format_rules_block() method, support preloading data to avoid secondary disk reads. Improve search algorithm to tokenization + multi-field weighted matching, support Chinese-English mixed query splitting and table name tag bonus. Update related documentation and test cases.
- Add format_rules_block() and load_always_apply_rules() methods
- Implement _tokenize_query() supporting Chinese-English mixed tokenization
- Improve _match_score() weighted algorithm and table name tag bonus
- Update DataLoadingAgent to use last user message as search query
- Refactor rule injection code in 6 Agents
- Improve test coverage and development documentation
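To make "tokenization + multi-field weighted matching" concrete, here is a simplified sketch. It is not the PR's `_tokenize_query()` or `_match_score()`: the per-character split for Chinese, the field names, the weights, and the bonus value are all assumptions chosen for illustration.

```python
import re

# ASCII word runs stay whole; each CJK character becomes its own token (a simplification).
_TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[\u4e00-\u9fff]")

FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "body": 1.0}  # assumed weights
TABLE_TAG_BONUS = 2.0                                     # assumed bonus


def tokenize_query(query: str) -> list[str]:
    """Split a mixed Chinese-English query into lowercase tokens."""
    return [token.lower() for token in _TOKEN_RE.findall(query)]


def match_score(query: str, entry: dict, table_names=()) -> float:
    """Sum weighted token hits per field, plus a bonus for a table-name tag."""
    tokens = tokenize_query(query)
    fields = {
        "title": entry.get("title", ""),
        "tags": " ".join(entry.get("tags", [])),
        "body": entry.get("body", ""),
    }
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = fields[field].lower()
        score += weight * sum(1 for token in tokens if token in text)
    tags = {tag.lower() for tag in entry.get("tags", [])}
    if any(name.lower() in tags for name in table_names):
        score += TABLE_TAG_BONUS
    return score


entry = {"title": "Sales report", "tags": ["sales_2024"], "body": "销售数据说明"}
tokens = tokenize_query("销售 sales_2024")
score = match_score("销售 sales_2024", entry, table_names=["sales_2024"])
```

In this example the tags field contributes 2.0, the body contributes 2.0 for its two matching characters, and the table-name tag adds the 2.0 bonus.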
- Update Superset SSO configuration example, add DF Token Exchange endpoint
- Supplement SSO token exchange mode documentation, including flow, deployment steps, and security notes
- Mark user metadata feature as completed, abandon imported table editing approach
- Streamline knowledge system documentation, archive completed content
- Update user isolation design document, record implemented portions
- Adjust knowledge injection planning document, highlight core conclusions and implemented portions
Add Chinese and English translations for Agent logs, including status messages and expand/collapse functionality
Implement long message folding and expanding functionality, optimize message display experience
Enhance Agent step display, support icon differentiation for error, warning, and info states
Fix JSON parsing issues when weak models call tools, add validation logic
Feature/plugin architecture
…le loading plan support
Implement AI functionality for data loading assistant, including:
1. Add LoadPlan type to support multi-table loading plans
2. Add tools for searching data candidates, reading metadata, and proposing loading plans
3. Implement loading plan confirmation card UI component
4. Update i18n translations to support new features
5. Add backend test cases to verify data discovery tools
6. Rename "Data Extraction" to "Data Assistant" to better reflect functionality
feat(data loading): add AI data assistant functionality and multi-tab…
- Add pagination and sorting functionality to table component
- Support backend pagination queries and local sorting
- Add loading state display and pagination information
- Update internationalization text to support pagination display
Set unified maximum row limit for data loading to 2 million rows and remove frontend row selector and related code. Modify all data loaders to use MAX_IMPORT_ROWS constant and update related tests.
- Add MAX_IMPORT_ROWS constant (2 million rows) and apply to all data loaders
- Remove frontend row selector component and related UI code
- Update test cases to verify row limit logic
- Adjust data loading chat agent to use system-configured row limit
Update related design documents and development guidelines to reflect unified limit policy
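One way a loader might apply a shared cap is sketched below. Only the constant name `MAX_IMPORT_ROWS` and its value come from the commit message; the helper `apply_row_limit` and the subquery-wrapping approach are assumptions, and individual loaders in the PR may enforce the limit differently.

```python
MAX_IMPORT_ROWS = 2_000_000  # value stated in the commit message


def apply_row_limit(query: str, max_rows: int = MAX_IMPORT_ROWS) -> str:
    """Wrap a SELECT statement so that at most max_rows rows are returned."""
    inner = query.rstrip().rstrip(";").rstrip()  # drop a trailing semicolon
    return f"SELECT * FROM ({inner}) AS limited LIMIT {int(max_rows)}"


limited_query = apply_row_limit("SELECT * FROM orders;")
```

Keeping the cap in one constant means the frontend no longer needs a row selector, and every loader truncates at the same point.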
Feature/plugin architecture
Global row limit is now handled automatically by the system, so remove all rowLimit related logic from frontend, backend, and test code
refactor(data loading): remove rowLimit related code
PR Summary
Agents & AI Pipeline
- agent_py_data_rec, agent_sql_data_rec, agent_py_data_transform, agent_sql_data_transform, agent_concept_derive, agent_py_concept_derive, agent_data_clean, and agent_exploration into three unified agents: data_agent.py, agent_data_rec.py, and agent_data_transform.py
- semantic_types.py backend module and full frontend type registry (src/lib/agents-chart/core/type-registry.ts, field-semantics.ts, semantic-types.ts) with domain shape inference, tick constraints, zero-baseline classification, and snap-to-bound heuristics
- agent_chart_insight.py for AI-generated chart takeaways
- agent_language.py for i18n-aware prompts
- agent_diagnostics.py with unified diagnostic information builder for better error reporting
Visualization
- (src/lib/agents-chart/, 120 files, ~44K lines) with multi-backend support for Vega-Lite, ECharts, Chart.js, and GoFish; includes template system, semantic-aware axis/domain/tick handling, color decisions, layout computation, faceting, and overflow filtering
- ChartGallery.tsx with expanded chart type support including pie, US map, world map, bump, candlestick, density, lollipop, pyramid, radar, rose, streamgraph, strip plot, waterfall, and more
- ChartRenderService.tsx replacing static SVG rendering with vega-embed for interactive charts
- SimpleChartRecBox.tsx and chartRecommendation.ts for improved chart suggestion workflow
- Score type with small domain spans (e.g., [0,1]) no longer forces integer-only ticks, preserving intermediate decimal ticks
Data Thread & Workflow
- (DataThread.tsx rewrite, new DataThreadCards.tsx, InteractionEntryCard.tsx)
- useFormulateData.ts consolidating data derivation logic
- (TiptapReportEditor.tsx) with richer editing support
Data Loading & Management
- UnifiedDataUploadDialog.tsx replacing the old table selection view; supports file upload, URL, paste, database, and sample datasets in a single dialog with loading state indicators
- MultiTablePreview.tsx for previewing multiple tables before loading
- tableThunks.ts handling all data source types with server-side workspace storage
- useDataRefresh.tsx with auto-refresh, stream data sources, and RefreshDataDialog.tsx
- (#rowId) via ROW_NUMBER() in DuckDB and pandas paths, preserving original row positions after sort
Data Loaders (Database Plugins)
Datalake / Workspace Backend
- datalake/ package with workspace.py, azure_blob_workspace.py, cached_azure_blob_workspace.py, file_manager.py, metadata.py, cache_manager.py, parquet_utils.py, and table_names.py
- workspace_factory.py for configuration-driven workspace initialization
- session_routes.py for session-level API endpoints
Security
- code_signing.py for generated code integrity verification
- auth.py for authentication handling
- url_allowlist.py for URL validation
- sanitize.py to prevent leaking sensitive info in error messages
- sandbox/ package with local_sandbox.py, docker_sandbox.py, not_a_sandbox.py, and Dockerfile.sandbox replacing the old py_sandbox.py
- identity.ts with browser-based identity for multi-user support
Internationalization (i18n)
- react-i18next with English and Chinese locale files across 7 namespaces (common, chart, encoding, messages, model, navigation, upload)
- TRANSLATION_GUIDE.md for contributors
UI & Design System
- tokens.ts with centralized color, spacing, shadow, transition, and radius tokens
- DataFormulator.tsx and App.tsx with TopNavButton, AppShell navigation, and model management UI
- EncodingShelfCard.tsx and EncodingShelfThread.tsx
- ConceptCard.tsx, ConceptShelf.tsx, DerivedDataDialog.tsx
Model Management
- model_registry.py for managing model configurations server-side
- ModelSelectionDialog.tsx with multi-model support
Infrastructure & DevOps
- Dockerfile, docker-compose.yml, docker-compose.test.yml with volume permissions and sandbox user handling
- .devcontainer/devcontainer.json
- uv.lock, updated pyproject.toml and requirements.txt
Testing
- vitest.config.ts, pytest.ini, conftest.py, frontend setup, and test_plan.md