fix: Perform the last action with the column name modified.#9
fix: Perform the last action with the column name modified.#9tashiro-akira wants to merge 22 commits intosapientml:mainfrom
Conversation
…ng types Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
…ng types Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
…ng types Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
AkiraUra
left a comment
There was a problem hiding this comment.
I think the following process is easy to implement. What do you think?
rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) in cols_has_symbols }
train_dataset = train_dataset.rename(columns=rename_symbol_cols)
...
rename_symbol_cols = {v: k for k, v in rename_symbol_cols.items()}
... # restore column names by ".rename(columns=rename_symbol_cols)"An issue would be that renamed names are the same as the original ones or renamed names of two columns are the same.
Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
|
An error occurred during the process of outputting a csv when the review results were applied. |
AkiraUra
left a comment
There was a problem hiding this comment.
Could you consider my comment?
| {% if training %} | ||
| rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col in cols_has_symbols for col in cols_has_symbols } | ||
| rename_symbol_cols = {v: k for k, v in rename_symbol_cols.items()} | ||
| train_dataset = train_dataset.rename(columns=lambda col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col) | ||
| {% endif %} | ||
| {% if test %} |
There was a problem hiding this comment.
How about the following code?
rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col in cols_has_symbols for col in cols_has_symbols }
{% if training %}
train_dataset = train_dataset.rename(columns=rename_symbol_cols)
{% endif %}
{% if test %}
test_dataset = test_dataset.rename(columns=rename_symbol_cols)
{% endif %}
rename_symbol_cols = {v: k for k, v in rename_symbol_cols.items()}
Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
|
Reflected the content of the review. |
Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>
sapientml_preprocess/generator.py
Outdated
| logger = setup_logger() | ||
|
|
||
| INHIBITED_SYMBOL_PATTERN = re.compile(r"[\{\}\[\]\",:<'\\]+") | ||
| INHIBITED_SYMBOL_PATTERN = re.compile(r"[\{\}\[\]\",:<'\\\+]+") |
There was a problem hiding this comment.
I was checking if "+" is treated as a symbol in the test string.
I put it back.
| org_df_column = df.columns.values | ||
| org_target_column = task.target_columns | ||
| df = df.rename(columns=lambda col: remove_symbols(col) if col in cols_has_symbols else col) | ||
| task.target_columns = [ | ||
| remove_symbols(col) if col in cols_has_symbols else col for col in task.target_columns | ||
| ] | ||
| same_column = {k: v for k, v in collections.Counter(list(df.columns.values)).items() if v > 1} | ||
| rename_dict = {} | ||
| if len(same_column) != 0: | ||
| for target in same_column.keys(): | ||
| rename_dict = {} | ||
| rename_target_col = [] | ||
| df_cols = list(df.columns.values) | ||
| i = 1 | ||
| for col in df_cols: | ||
| if target in col: | ||
| rename_dict[org_df_column[len(rename_dict)]] = str(col + str(i)) | ||
| i = i + 1 | ||
| else: | ||
| rename_dict[org_df_column[len(rename_dict)]] = col | ||
| df = df.set_axis(list(rename_dict.values()), axis=1) | ||
| i = 1 | ||
| for col in org_target_column: | ||
| rename_target_col.append(rename_dict[col]) | ||
|
|
||
| task.target_columns = rename_target_col | ||
|
|
There was a problem hiding this comment.
- I think the code may rename column names which originally don't contain symbols, right? If so, could you keep the names when the column names contain symbols.
- This code is hard to read and can have unexpected issues. Could you rewrite it?
There was a problem hiding this comment.
The review has been applied.
| if len(rename_dict) == 0 : | ||
| rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col in cols_has_symbols for col in cols_has_symbols } | ||
| else: | ||
| rename_symbol_cols = rename_dict |
There was a problem hiding this comment.
We don't need show this conditional branch to users. Could you move it to "template's if statement"?
There was a problem hiding this comment.
Fixed conditional branching in template if statement.
Signed-off-by: tashiro-akira <fj0822cr@fujitsu.com>
Signed-off-by: tashiro-akira <fj0822cr@fujitsu.com>
Signed-off-by: tashiro-akira <fj0822cr@fujitsu.com>
Signed-off-by: tashiro-akira <fj0822cr@fujitsu.com>
AkiraUra
left a comment
There was a problem hiding this comment.
Could you consider my comment?
| same_column[target] = same_column[target] - 1 | ||
| else: | ||
| rename_dict[org_column] = target | ||
|
|
There was a problem hiding this comment.
The current method fails when the renamed names are the same as original names.
For example, there are original columns Age , Age{} and Age1.
In the case, Age -> Age1, Age{} -> Age0, so there are two Age1 columns.
Could you consider the case?
@AkiraUra @kimusaku
Fix target column names to revert after removing special characters.
Please check.