Commit bec4078

Merge pull request #72 from fujiki-1emon/dev/japanese
dev japanese
2 parents 4642a80 + b6dc764 commit bec4078

2 files changed

Lines changed: 23 additions & 1 deletion

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+base_dir: 'PROJECT_DIR/datasets/'
+targets:
+- "oscar2323_0-100"
+output_dir: 'PROJECT_DIR/datasets/oscar2323_0-100_QFv2'
+
+n_dist: 128
+n_output: 1
+is_cluster: True
+is_local: False
+
+use_column: "content"
+min_doc_len: 50
+max_doc_len: 100000
+min_mean_word_len: 1
+max_mean_word_len: 10
+symbol_to_word_ratio: 0.1
+bullet_point_ratio: 0.9
+ellipsis_ratio: 0.3
+japanese_word_ratio: 0.8
+freq_char_cnt: 1
+separator_ratio: 0.1
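
The commit does not show how the filter reads these keys, so the following is only a minimal sketch of how three of the thresholds might be applied to one document. The helper names (`japanese_char_ratio`, `ellipsis_line_ratio`, `keep_document`) are hypothetical. The sketch assumes that `japanese_word_ratio` is a lower bound, that `ellipsis_ratio` is an upper bound, and that the Japanese ratio can be approximated per character rather than per word. None of this is confirmed by the diff.

```python
import re

# Subset of the thresholds from the config above, inlined as a dict so the
# sketch has no YAML dependency.
CONFIG = {
    "min_doc_len": 50,
    "max_doc_len": 100000,
    "ellipsis_ratio": 0.3,
    "japanese_word_ratio": 0.8,
}

# Hiragana, katakana, and CJK unified ideographs.
JAPANESE_CHAR = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def japanese_char_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are kana or kanji.

    Assumption: a per-character stand-in for the per-word ratio the config
    key name suggests.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if JAPANESE_CHAR.match(c)) / len(chars)


def ellipsis_line_ratio(text: str) -> float:
    """Fraction of non-empty lines that end with an ellipsis."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(1 for ln in lines if ln.endswith(("...", "…"))) / len(lines)


def keep_document(text: str, cfg: dict = CONFIG) -> bool:
    """Return True when the document passes all three assumed checks."""
    if not cfg["min_doc_len"] <= len(text) <= cfg["max_doc_len"]:
        return False
    # Assumed direction: too little Japanese text is rejected.
    if japanese_char_ratio(text) < cfg["japanese_word_ratio"]:
        return False
    # Assumed direction: too many truncated-looking lines are rejected.
    if ellipsis_line_ratio(text) > cfg["ellipsis_ratio"]:
        return False
    return True
```

For example, `keep_document("短い")` is rejected by the length check alone, since the text is shorter than `min_doc_len`.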

requirements-ja.txt

Lines changed: 2 additions & 1 deletion
@@ -5,4 +5,5 @@ bs4
 html2text
 python-stdnum
 numpy
-SudachiPy==0.5.4
+SudachiPy==0.5.4
+SudachiDict-core
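
SudachiPy ships without a dictionary and needs a separately installed one such as SudachiDict-core (import name `sudachidict_core`), which is presumably why this change adds it. The sketch below checks that both packages are importable before tokenizing. The helper names `sudachi_ready` and `tokenize` are hypothetical and not part of this repository, and the choice of `SplitMode.C` is an assumption.

```python
import importlib.util


def sudachi_ready() -> bool:
    """True when both SudachiPy and the core dictionary package are importable."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("sudachipy", "sudachidict_core")
    )


def tokenize(text: str):
    """Return surface forms, or None when SudachiPy or its dictionary is missing."""
    if not sudachi_ready():
        return None
    # Imported lazily so the module still loads without the optional packages.
    from sudachipy import dictionary, tokenizer

    tok = dictionary.Dictionary().create()
    mode = tokenizer.Tokenizer.SplitMode.C  # coarsest split; an assumption here
    return [m.surface() for m in tok.tokenize(text, mode)]
```

Returning `None` instead of raising lets a caller skip tokenizer-based checks when the environment was installed without `requirements-ja.txt`.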
