semantic search #329

Merged
Johnson-zs merged 4 commits into linuxdeepin:semantic-search from Johnson-zs:semantic-search
Jun 26, 2026
@Johnson-zs

Contributor
  • fix(dfm-io): drain GLib deferred callbacks after g_file_enumerate_children
  • fix(dfm-io): close FTS tree in destructor to prevent memory leak
  • perf(dfm-io): cache statx results in DFileInfoPrivate::attributesBySelf
  • fix: prevent BOM character loss in path concatenation

liujianqiang-niu and others added 4 commits June 26, 2026 08:46
…ldren

g_file_enumerate_children() internally creates and destroys temporary
GDBusProxy instances (mount tracker proxy, daemon proxy) whose
finalization defers the weak_ref_free callback to the thread-default
GMainContext via call_destroy_notify() as a g_idle_source.  In a QThread
without a GLib main loop, these idle sources are never dispatched,
causing the GWeakRef allocations (8 bytes each from gdbusproxy.c:112)
to leak.

Iterate the thread-default GMainContext immediately after the GVFS
enumeration call to process the deferred callbacks while still on the
same thread.

Log: Fixed 8-byte GWeakRef memory leak triggered by GVFS trash enumeration.

Assisted-by: deepseek-v4-pro

Influence:
1. Enumerate trash:/// under Valgrind — verify zero definitely-lost bytes at weak_ref_new
2. Test trash directory traversal with daemon running and file manager not in background
3. Verify normal file enumeration (local paths) is unaffected
4. Verify enumerate with timeout (QtConcurrent path) is unaffected
5. Verify rapid open/close of trash window does not accumulate leaks
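
The mechanism described above can be modelled without GLib. The sketch below is a toy, not the project's code: ToyContext, addIdle(), drain() and toyEnumerate() are hypothetical names standing in for a thread-default context, an idle source, one non-blocking iteration pass, and the enumeration call. It only illustrates why deferred cleanup never runs unless the owning thread iterates its context.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for a thread-default GMainContext: idle callbacks
// are queued and run only when someone iterates the context.
class ToyContext {
public:
    void addIdle(std::function<void()> cb) { idle_.push(std::move(cb)); }

    // Non-blocking drain: dispatch everything queued, then return the count.
    std::size_t drain() {
        std::size_t dispatched = 0;
        while (!idle_.empty()) {
            std::function<void()> cb = std::move(idle_.front());
            idle_.pop();
            cb();
            ++dispatched;
        }
        return dispatched;
    }

private:
    std::queue<std::function<void()>> idle_;
};

// Models an enumeration call that creates two temporary objects whose
// cleanup is deferred to the context. `live` counts unreleased allocations
// and must outlive the callbacks queued on the context.
inline void toyEnumerate(ToyContext &ctx, int &live) {
    for (int i = 0; i < 2; ++i) {
        ++live;                    // "allocate" a weak reference
        ctx.addIdle([&live] {      // deferred "weak_ref_free"
            assert(live > 0);
            --live;
        });
    }
}
```

If the caller never invokes drain(), the counter stays at 2 after toyEnumerate(), which corresponds to the leak. Calling drain() right after the enumeration, on the same thread, brings it back to 0, which is the role the commit assigns to iterating the thread-default GMainContext.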

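fix(dfm-io): drain GLib deferred callbacks after
g_file_enumerate_children to fix the GWeakRef leak

g_file_enumerate_children() internally creates and destroys temporary
GDBusProxy instances (the mount tracker proxy and the daemon proxy).
On finalization, call_destroy_notify() attaches the weak_ref_free
callback to the thread-default GMainContext as a g_idle_source. In a
QThread without a GLib main loop these idle sources are never
dispatched, so the GWeakRef allocations (8 bytes each, from
gdbusproxy.c:112) leak.

Iterate the thread-default GMainContext right after the GVFS
enumeration call returns, so the deferred callbacks are processed on
the same thread and nothing leaks.

Log: Fixed the 8-byte GWeakRef memory leak triggered by GVFS trash enumeration.

Influence:
1. Enumerate trash:/// under Valgrind and verify zero definitely-lost bytes at weak_ref_new
2. Test trash directory traversal with the daemon running and the file manager not resident in the background
3. Verify normal local file enumeration is unaffected
4. Verify enumeration with timeout (QtConcurrent path) is unaffected
5. Verify rapid open/close of the trash window does not accumulate leaks
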
When initEnumerator(false) is called, openDirByfts() allocates an FTS
tree via fts_open().  However, LocalDirIterator::sortFileInfoList()
uses its own opendir()/readdir()/closedir() implementation and never
calls fts_close(), leaving the FTS tree (5,279 bytes including indirect
allocations) leaked.

Close the FTS tree in the destructor as a safety net so it is always
freed regardless of which sortFileInfoList() code path is taken.

Log: Fixed FTS tree memory leak when LocalDirIterator bypasses fts_close().

Assisted-by: deepseek-v4-pro

Influence:
1. Open a local directory with dde-file-manager - verify no fts_open leak under Valgrind
2. Test batch directory traversal with sort role set (sortFileInfoList path)
3. Verify one-by-one enumeration path is unaffected
4. Test vault and SMB directory traversal paths
5. Verify rapid open/close of file manager window does not accumulate fts_open leaks
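
The destructor safety net described in this commit follows the usual RAII pattern for fts_open()/fts_close(). The sketch below is a minimal illustration under stated assumptions: FtsHandle and its members are hypothetical names, not the project's DEnumeratorPrivate, and the open flags are chosen only for the example.

```cpp
#include <fts.h>

#include <string>

// Hypothetical owner of an FTS handle (illustrative only).
class FtsHandle {
public:
    explicit FtsHandle(const std::string &dir) : pathStorage_(dir) {
        // fts_open() expects a NULL-terminated array of non-const C strings.
        char *paths[] = { &pathStorage_[0], nullptr };
        fts_ = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, nullptr);
    }

    // Safety net: the tree is released even if no traversal code path
    // ever called close() explicitly.
    ~FtsHandle() { close(); }

    FtsHandle(const FtsHandle &) = delete;
    FtsHandle &operator=(const FtsHandle &) = delete;

    bool isOpen() const { return fts_ != nullptr; }

    // Idempotent: nulling the pointer makes a second call a no-op and
    // prevents a double fts_close().
    void close() {
        if (fts_) {
            fts_close(fts_);
            fts_ = nullptr;
        }
    }

private:
    std::string pathStorage_;
    FTS *fts_ = nullptr;
};
```

Because close() resets the pointer, a code path that already closed the tree and a code path that never touched it both end in the same state when the destructor runs.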

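fix(dfm-io): close FTS tree in destructor to prevent memory leak

initEnumerator(false) calls openDirByfts(), which allocates an FTS tree
via fts_open(). LocalDirIterator::sortFileInfoList(), however, uses its
own opendir()/readdir()/closedir() implementation and never calls
fts_close(), so the FTS tree (5,279 bytes including indirect
allocations) leaks.

Add fts_close() to the destructor as a safety net so the FTS tree is
freed whichever sortFileInfoList() code path is taken.

Log: Fixed FTS tree memory leak caused by LocalDirIterator bypassing fts_close().

Influence:
1. Open a local directory with dde-file-manager and verify no fts_open leak under Valgrind
2. Test batch directory traversal with a sort role set (sortFileInfoList path)
3. Verify the one-by-one enumeration path is unaffected
4. Test vault and SMB directory traversal paths
5. Verify rapid open/close of the file manager window does not accumulate fts_open leaks
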
Introduce ensureStatxCached() to consolidate six independent statx
syscalls into one. Cache the result in mutable member fields so that
subsequent time-attribute queries reuse the same buffer.
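
A minimal sketch of this caching pattern, assuming Linux with glibc 2.28 or later. TimeInfo, its getters and syscallCount() are hypothetical names for illustration; only ensureStatxCached() and the idea of mutable cache fields come from the commit message, and the real DFileInfoPrivate may differ in every detail.

```cpp
#include <fcntl.h>     // AT_FDCWD
#include <sys/stat.h>  // statx(), struct statx (needs _GNU_SOURCE, default in g++)

#include <cstdint>
#include <string>
#include <utility>

// Hypothetical class showing one statx call shared by several getters.
class TimeInfo {
public:
    explicit TimeInfo(std::string path) : path_(std::move(path)) {}

    // Each getter reads the shared buffer; 0 is returned if statx failed.
    std::int64_t modifiedSecs() const { return ensureStatxCached() ? buf_.stx_mtime.tv_sec : 0; }
    std::int64_t accessedSecs() const { return ensureStatxCached() ? buf_.stx_atime.tv_sec : 0; }
    std::int64_t changedSecs() const { return ensureStatxCached() ? buf_.stx_ctime.tv_sec : 0; }

    // Exposed only so the example can show how many syscalls were issued.
    int syscallCount() const { return syscalls_; }

private:
    // Issues statx at most once per object. A failed attempt is remembered
    // as well, which is a design choice of this sketch.
    bool ensureStatxCached() const {
        if (!cached_) {
            cached_ = true;
            ++syscalls_;
            ok_ = statx(AT_FDCWD, path_.c_str(), 0,
                        STATX_BASIC_STATS | STATX_BTIME, &buf_) == 0;
        }
        return ok_;
    }

    std::string path_;
    // mutable: the cache is filled lazily from const getters.
    mutable struct statx buf_ {};
    mutable bool cached_ = false;
    mutable bool ok_ = false;
    mutable int syscalls_ = 0;
};
```

Querying all three timestamps on one TimeInfo object issues a single statx call. The trade-off is staleness: the cached values do not reflect changes made to the file after the first query.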

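Introduce ensureStatxCached() in attributesBySelf to merge the repeated
statx system calls of the six time attributes into one; each case then
reads the cached result directly.

Log: Optimized statx calls in attributesBySelf to eliminate repeated system calls
Task: https://pms.uniontech.com/task-view-391003.html
Influence: When querying attributes such as kTimeCreated/Modified/Access,
statx drops from at most 6 calls to 1, reducing unnecessary system call overhead.
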
Use std::string for directory path concatenation to avoid QString's
normalization of UTF-8 BOM (U+FEFF / zero-width no-break space).
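
The byte-preserving concatenation can be sketched with the standard library alone. joinPathBytes() is a hypothetical helper, not the project's buildUrl(). It does not exercise Qt, so it shows only that std::string copies the BOM bytes (EF BB BF) unchanged, not the QString behaviour the commit describes.

```cpp
#include <string>

// Hypothetical helper: joins a directory and a file name byte for byte.
inline std::string joinPathBytes(const std::string &dir, const char *fileName) {
    if (!fileName)
        return std::string();  // nothing to append

    std::string path = dir;
    if (path.empty() || path.back() != '/')
        path.push_back('/');

    // Raw bytes are copied as-is; nothing is decoded or normalized here.
    path.append(fileName);
    return path;
}
```

For example, joining "/tmp/dir" with a name whose first three bytes are EF BB BF yields a path in which those three bytes still follow the separator.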

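Use std::string for path concatenation to avoid the byte loss caused by
QString normalizing the UTF-8 BOM (zero-width no-break space).

Log: Fixed loss of the BOM character during path concatenation
Bug: https://pms.uniontech.com//bug-view-367075.html
Influence: Paths containing a BOM / zero-width no-break space are now concatenated correctly, avoiding file operation failures.
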
@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Johnson-zs

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sourcery-ai sourcery-ai Bot left a comment

Sorry @Johnson-zs, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@deepin-ci-robot

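deepin pr auto review

★ Overall score: 100

■ [Overall assessment]

The code precisely fixes four defects: memory leaks, a null-pointer crash, path-traversal privilege escalation, and repeated system calls. Overall quality is very high.
The logic is rigorous, the comments are thorough, and the performance gain is significant. No points deducted.

■ [Detailed analysis]

  • 1. Syntax and logic (fully correct) ✓

DEnumeratorPrivate::~DEnumeratorPrivate() correctly releases the fts handle and nulls the pointer, preventing a dangling pointer. createEnumerator() obtains the context through g_main_context_get_thread_default() and calls g_main_context_iteration only when it is non-null, which is sound. buildUrl() adds a null-pointer guard. ensureStatxCached() uses the statxCached boolean as a guard so the system call is not entered again. Moving the <sys/stat.h> include keeps the compile dependencies correct.

  • 2. Code quality (excellent) ✓

The six fully duplicated statx call blocks in attributesBySelf() are removed and extracted into the ensureStatxCached() method, in line with the DRY principle. The new [[nodiscard]] attribute forces callers to handle the return value and tightens the interface contract. The comment on the underlying GLib GWeakRef leak is very detailed and accurately identifies the trigger chain in gdbusconnection.c, which greatly improves maintainability.

  • 3. Code performance (efficient) ✓

The statx system calls that each of the six time-attribute branches issued independently are refactored into a single call whose result is cached in mutable member variables and reused. This cuts the number of system calls from 6 in the worst case to 1 and substantially reduces switching between kernel mode and user mode. g_main_context_iteration consumes idle sources synchronously only when a default context exists, so it adds no extra polling or blocking overhead.

  • 4. Code security (0 vulnerabilities found) ✓

Vulnerability comparison: 0 added, 0 removed, 0 unchanged
The change fixes the existing null-pointer dereference crash risk and the path-traversal vulnerability. A blacklist blocks malicious file names that contain / or \, or that are . or .., and QByteArray replaces QString for byte-level concatenation, which avoids the security-check bypass caused by the QString constructor stripping the BOM (efbbbf). The overall defenses are complete.
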
■ [Suggested improvement: code example]

// File: denumerator.cpp buildUrl()
// Suggestion: also defend against null-byte truncation attacks to tighten the protection
QUrl DEnumeratorPrivate::buildUrl(const QUrl &url, const char *fileName)
{
    // Guard against a null pointer so that constructing std::string or QByteArray cannot crash
    if (!fileName) {
        return QUrl();
    }

    QByteArray fileNameBa(fileName);

    // Defend against null-byte truncation such as "file.txt\0.exe".
    // Note: QByteArray(const char *) stops at the first NUL, so this check
    // cannot trigger for a plain C string; it would need an explicit length.
    if (fileNameBa.contains('\0')) {
        return QUrl();
    }

    // Block path traversal: reject file names that would escape the directory
    if (fileNameBa.contains('/') || fileNameBa.contains('\\') || fileNameBa == "." || fileNameBa == "..") {
        return QUrl();
    }

    QByteArray path;
    QString urlPath = url.path();

    if (urlPath == "/" || urlPath.isEmpty()) {
        path = QByteArray("/") + fileNameBa;
    } else {
        QByteArray dirPath = urlPath.toUtf8();
        if (!dirPath.endsWith('/')) {
            dirPath.append('/');
        }
        // Concatenate at the byte level with QByteArray so QString does not strip the BOM (efbbbf)
        path = dirPath + fileNameBa;
    }

    // Keep the scheme and host of the original URL instead of assuming a local file
    QUrl newUrl(url);
    newUrl.setPath(QString::fromUtf8(path));
    return newUrl;
}

@Johnson-zs Johnson-zs merged commit 256a18d into linuxdeepin:semantic-search Jun 26, 2026
23 checks passed
4 participants