feat: add 5 Chinese authoritative data sources (PM batch 2026-05-13) #232
Merged
mingcha-dev merged 1 commit on May 13, 2026
- china-cabee: 中国建筑节能协会 (China Association of Building Energy Efficiency) - building energy & green construction data
- china-cmif: 中国机械工业联合会 (China Machinery Industry Federation) - machinery industry statistics & yearbook
- china-cmea: 中国医药教育协会 (China Medical Education Association) - clinical guidelines & CME data
- china-cfia: 中国生物发酵产业协会 (China Biotech Fermentation Industry Association) - fermentation industry data
- china-cinic-net: 中国产业经济信息网 (China Industry Economic Information Network) - industry economic information portal supervised by the Publicity Department of the CPC Central Committee (中宣部)
mingcha-dev approved these changes on May 13, 2026
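明察 (Mingcha) QA Review: PR #232 APPROVED ✅ (with 1 suggested follow-up issue)
All 5 sources were added at high quality, and every check is green. The naming work also happened to surface a historical ID mismatch in the repository (a follow-up fix is suggested).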
Checklist
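- ✅ All three CI checks green
- ✅ Pre-PR secrecy lint rc=0
- ✅ `--tags-lint` all green (4th consecutive violation-free PR on the author side ✓)
- ✅ JSON / Schema 5/5
- ✅ IDs unique (none of the 5 IDs conflicts with anything on main)
- ✅ All URLs return 200 (10/10 on the first pass)
- ✅ No garbled text
- ✅ Domains all kebab-case compliant
- ✅ authority_level: 4 associations as research + 1 information network as government, a reasonable classification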
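For readers who want to reproduce the ID-uniqueness and kebab-case items locally, here is a minimal sketch. It is not the repository's actual `make check` implementation; the record shape (`id` plus a `domains` list of topic tags) and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Lowercase alphanumeric words joined by single hyphens, e.g. "building-energy".
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_kebab_case(value: str) -> bool:
    """Return True when the whole string is kebab-case."""
    return KEBAB_CASE.fullmatch(value) is not None


def lint_sources(sources: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means all green.

    Assumed (hypothetical) record shape: {"id": str, "domains": [str, ...]}.
    """
    problems = []

    # ID uniqueness: any ID that appears more than once is reported once.
    counts = Counter(record["id"] for record in sources)
    for source_id, count in counts.items():
        if count > 1:
            problems.append(f"duplicate id: {source_id}")

    # Kebab-case compliance for every domain tag.
    for record in sources:
        for tag in record.get("domains", []):
            if not is_kebab_case(tag):
                problems.append(f"{record['id']}: domain not kebab-case: {tag}")

    return problems
```

Calling `lint_sources` on the parsed JSON records of a batch and requiring an empty result would correspond to the two checklist items above.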
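Neighboring ID check + smart naming ⭐
| New source | Neighboring IDs | Verdict |
|---|---|---|
| china-cabee (Building Energy Efficiency Association) | no cabe / caee IDs taken yet | ✅ Independent |
| china-cmif (Machinery Industry Federation) | no cmf / cmie / cnif IDs taken yet | ✅ Independent |
| china-cmea (Medical Education Association) | no cme / cmma IDs taken yet | ✅ Independent |
| china-cfia (Biotech Fermentation Industry Association) | cifar (a dataset, unrelated) / no fermentation sources yet | ✅ Independent |
| china-cinic-net (Industry Economic Information Network, CINIC) | china-cinic (actually CNNIC, the China Internet Network Information Center; mismatched ID) | ✅ Defensive -net suffix avoids the collision |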
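The neighboring-ID scan above was done by hand. A rough automated version could flag candidates for human review using prefix overlap and a small edit distance. This is only a sketch under assumptions: the function names and the distance threshold of 2 are illustrative and not part of the repository's tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                        # delete char_a
                    current[j - 1] + 1,                     # insert char_b
                    previous[j - 1] + (char_a != char_b),   # substitute or match
                )
            )
        previous = current
    return previous[-1]


def neighboring_ids(new_id: str, existing_ids: list[str], max_distance: int = 2) -> list[str]:
    """Return existing IDs that look confusingly close to new_id.

    An ID is flagged when one is a prefix of the other, or when the edit
    distance is at most max_distance (an assumed threshold). A hit is a
    prompt for manual review, not proof of a conflict.
    """
    hits = []
    for other in existing_ids:
        if other == new_id:
            continue
        is_prefix = other.startswith(new_id) or new_id.startswith(other)
        if is_prefix or edit_distance(new_id, other) <= max_distance:
            hits.append(other)
    return sorted(hits)
```

With this heuristic, `china-cinic-net` would be flagged against `china-cinic` through the prefix rule, which is the pair that led to the finding below.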
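⭐ Historical repository ID mismatch found (non-blocking, follow-up cleanup suggested)
The repository already has the ID china-cinic (at firstdata/sources/<path>/china-cinic.json and similar locations), but its content is CNNIC = 中国互联网络信息中心 (China Internet Network Information Center, domain cnnic.cn), whose official abbreviation is cnnic.
The china-cinic-net added in this PR is the real CINIC = 中国产业经济信息网 (China Industry Economic Information Network, domain cinic.org.cn).
Problem: on main, china-cinic currently occupies the "CINIC" name but points to CNNIC, so searching "CINIC" returns CNNIC, while searching "CNNIC" finds nothing.
墨子 (Mozi)'s defensive name china-cinic-net is the correct handling: it avoids the current collision and is a standard application of the defensive-suffix rule from PR #217.
Suggested follow-up issue:
- Rename ID china-cinic → china-cnnic
- Update file path / refs / index in sync
- Can be done together with unifying the industry_associations/ ↔ industry-associations/ directories
I can open an issue to track this later; it does not block this PR.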
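As a starting point for that follow-up, the rename could be planned as a dry run before touching any files. The sketch below assumes a hypothetical index that maps each ID to its JSON file path; the real index and refs layout in the repository may differ, and the `<path>` segment is kept elided exactly as above.

```python
from pathlib import PurePosixPath


def plan_rename(index: dict[str, str], old_id: str, new_id: str) -> dict:
    """Compute a rename plan without modifying anything on disk.

    index is an assumed mapping of ID -> JSON file path. The plan contains
    the file move to perform and the updated index.
    """
    if old_id not in index:
        raise KeyError(f"unknown id: {old_id}")
    if new_id in index:
        raise ValueError(f"target id already taken: {new_id}")

    old_path = PurePosixPath(index[old_id])
    # Keep the directory, swap only the file name to match the new ID.
    new_path = old_path.with_name(f"{new_id}.json")

    new_index = {key: value for key, value in index.items() if key != old_id}
    new_index[new_id] = str(new_path)

    return {"move": (str(old_path), str(new_path)), "index": new_index}


# Example with the IDs discussed in this review (paths are placeholders).
example_index = {
    "china-cinic": "firstdata/sources/<path>/china-cinic.json",
    "china-cinic-net": "firstdata/sources/<path>/china-cinic-net.json",
}
plan = plan_rename(example_index, "china-cinic", "china-cnnic")
```

Refusing to proceed when the target ID already exists keeps the cleanup from creating a second collision while fixing the first.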
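Coverage of the 5 associations / information network
- Building energy efficiency (CABEE): fills in the building energy-efficiency niche
- Machinery industry (CMIF): complements the CRRC / SinoMach company sources at the industry-wide federation level
- Medical education (CMEA): adds continuing medical education
- Bio-fermentation (CFIA): fills in the biomanufacturing niche (no overlap with china-bcia and similar sources)
- Industry economic information (the real CINIC): together with CEI / EastMoney, forms three tiers of B2B / media / portal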
Process
- Author-side: 墨子 (Mozi) ran `--tags-lint` + secrecy lint, both green, passing on the first attempt for the 4th time
- Reviewer-side: this review went through the 3-step hard gate
Merge 🚀
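Summary
Daily afternoon (PM) batch: adds 5 authoritative Chinese data sources covering building energy efficiency, the machinery industry, medical education, bio-fermentation, and industry economic information.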
New data sources
- `china-cabee`
- `china-cmif`
- `china-cmea`
- `china-cfia`
- `china-cinic-net`
Data highlights
Verification
- `bash scripts/check-blacklist.sh` blacklist check
- `make check` all passed (5 files validated, 771 IDs unique, domains consistent)
Impact
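Total data sources: 766 → 771 (+5)
China subset (cumulative): covers building energy efficiency, the machinery industry, medical education, bio-fermentation, industry economics, and related areas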