Commit 8ee3378

Update to ChEBI 2.0 & SDF data (#147)
* remove outdated JCI files
* get molecule data from SDF file
* add new tokens
* add chembl dependency
* update tests for SDF files
* fix 3-STAR preprocessing
* Revert "fix 3-STAR preprocessing"
  This reverts commit 9166d9e.
* add new tokens from SDF
* fix 3-star processing
* add sanitize function
1 parent 5243e02 commit 8ee3378

9 files changed

Lines changed: 390 additions & 1469 deletions

chebai/preprocessing/bin/smiles_token/tokens.txt

Lines changed: 145 additions & 0 deletions
@@ -4375,3 +4375,148 @@ b
 [OH2]
 [TlH2+]
 [SbH6+3]
+[1*]
+[2*]
+[3*]
+[4*]
+[5*]
+[6*]
+[7*]
+[8*]
+[9*]
+[3He+]
+[12C+4]
+[16O+6]
+[11B-3]
+[11B+3]
+[31P+3]
+[31P+5]
+[34S+2]
+[34S+4]
+[34S+6]
+[55Mn+2]
+[55Mn+4]
+[55Mn+7]
+[57Fe+3]
+[59Co+2]
+[75As-3]
+[98Mo+3]
+[98Mo+6]
+[Cl:1]
+[c:2]
+[n:3]
+[c:4]
+[c:5]
+[H:24]
+[c:6]
+[H:25]
+[c:7]
+[H:26]
+[c:8]
+[H:27]
+[c:9]
+[c:10]
+[H:28]
+[c:11]
+[C:12]
+[O:13]
+[c:14]
+[c:15]
+[H:31]
+[c:16]
+[H:32]
+[c:17]
+[H:33]
+[c:18]
+[c:19]
+[H:34]
+[c:20]
+[H:35]
+[c:21]
+[H:36]
+[n:22]
+[c:23]
+[H:29]
+[H:30]
+[C:1]
+[C:2]
+[O:3]
+[O:4]
+[H:41]
+[H:42]
+[H:43]
+[H:44]
+[C:11]
+[O:12]
+[C:14]
+[C:15]
+[O:16]
+[N:17]
+[C:18]
+[C:19]
+[H:50]
+[C:20]
+[H:51]
+[H:52]
+[N:21]
+[c:25]
+[c:26]
+[H:53]
+[c:27]
+[F:37]
+[c:28]
+[N:31]
+[C:32]
+[H:56]
+[H:57]
+[C:33]
+[H:58]
+[H:59]
+[O:34]
+[C:35]
+[H:60]
+[H:61]
+[C:36]
+[H:62]
+[H:63]
+[c:29]
+[H:54]
+[c:30]
+[H:55]
+[C:22]
+[O:23]
+[O:24]
+[H:48]
+[H:49]
+[H:47]
+[H:45]
+[H:46]
+[H:38]
+[H:39]
+[H:40]
+[NaH2-]
+[KH2-]
+[C-2]
+[As+2]
+[P+2]
+[LiH2-]
+[BH2-3]
+[O+2]
+[BeH2-]
+[W@]
+[W@@]
+[RbH2-]
+[FrH2-]
+[AlH-2]
+[CsH2-]
+[B-2]
+[V@]
+[V@@]
+[V@OH]
+[*:0]
+[1*:0]
+[2*:0]
+[3*:0]
+[224RaH2]
+[226RaH2]
+[228RaH2]
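
The additions above are all bracket atoms (isotope labels, atom classes such as `[c:2]`, unusual charges) appended after the existing entries. As a rough illustration of how such tokens could end up at the end of a vocabulary file, here is a minimal sketch. It is not the chebai implementation: `tokenize`, `extend_vocabulary`, and the regular expression are hypothetical, and the sketch assumes that a token's index is its position in the file, so new tokens are only ever appended.

```python
import re

# Hypothetical pattern: keep bracket atoms such as [34S+6] or [c:2] whole,
# then two-letter halogens, two-digit ring closures (%NN), then any single
# character. This is an assumption, not the tokenizer used by chebai.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def extend_vocabulary(vocab, smiles_list):
    """Append unseen tokens to vocab in first-seen order; return the new ones."""
    known = set(vocab)
    added = []
    for smiles in smiles_list:
        for tok in tokenize(smiles):
            if tok not in known:
                known.add(tok)
                vocab.append(tok)  # append only, so earlier positions stay fixed
                added.append(tok)
    return added


# Toy vocabulary and inputs, made up for illustration.
vocab = ["C", "O", "[OH2]"]
new_tokens = extend_vocabulary(vocab, ["[1*]C(=O)[34S+6]", "[Cl:1][c:2]"])
```

Appending rather than re-sorting matches the shape of the hunk, where the three context lines are unchanged and every added line follows them.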

chebai/preprocessing/collect_all.py

Lines changed: 0 additions & 226 deletions
This file was deleted.
