fix: Perform the last action with the column name modified. by tashiro-akira · Pull Request #9 · sapientml/preprocess

tashiro-akira · 2023-12-12T06:42:14Z

@AkiraUra @kimusaku
This fixes the target column names so that they are restored to their original names after special characters have been removed.
Please check.

AkiraUra

I think the following process is easy to implement. What do you think?

rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) for col in cols_has_symbols}
train_dataset = train_dataset.rename(columns=rename_symbol_cols)
...
rename_symbol_cols = {v: k for k, v in rename_symbol_cols.items()}
... # restore column names by ".rename(columns=rename_symbol_cols)"

One issue is that a renamed name may be the same as an existing original name, or that two columns may end up with the same renamed name.
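
The two failure modes described here can be detected before any renaming is done. The following is a minimal sketch, not the project's implementation: the pattern is copied from `INHIBITED_SYMBOL_PATTERN` in this PR, while `find_rename_conflicts` and the sample column names are hypothetical.

```python
import re
from collections import Counter

# Pattern copied from INHIBITED_SYMBOL_PATTERN in sapientml_preprocess/generator.py.
inhibited_symbol_pattern = re.compile(r"[\{\}\[\]\",:<'\\]+")


def find_rename_conflicts(columns):
    """Return (rename map, conflicting new names) for the given column names.

    Hypothetical helper: a new name conflicts if two columns map to it, or
    if it equals the name of a column that is not being renamed.
    """
    cols_has_symbols = [c for c in columns if inhibited_symbol_pattern.search(c)]
    rename_symbol_cols = {c: inhibited_symbol_pattern.sub("", c) for c in cols_has_symbols}
    untouched = [c for c in columns if c not in rename_symbol_cols]
    # Count every name that would exist after renaming.
    counts = Counter(list(rename_symbol_cols.values()) + untouched)
    conflicts = sorted(name for name, n in counts.items() if n > 1)
    return rename_symbol_cols, conflicts


rename_map, conflicts = find_rename_conflicts(["Age", "Age{}", "Fare[a]", "Fare<a"])
print(rename_map)  # mapping from original names to cleaned names
print(conflicts)   # names that would appear more than once

# Inverting the mapping to restore names is only safe when nothing conflicts.
restore_map = {v: k for k, v in rename_map.items()} if not conflicts else None
```

In this sample, `Age{}` collides with the existing `Age`, and `Fare[a]` and `Fare<a` collide with each other, so the inverse mapping is not built.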

tashiro-akira · 2023-12-22T01:18:53Z

When I applied the review suggestion, an error occurred while writing the CSV output.
I fixed it so that the original column names are restored in the CSV file written from the data frame.

AkiraUra

Could you consider my comment?

AkiraUra · 2024-01-05T07:39:57Z

sapientml_preprocess/templates/rename_columns.py.jinja

 {% if training %}
+rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col in cols_has_symbols for col in cols_has_symbols }
+rename_symbol_cols = {v: k for k, v in rename_symbol_cols.items()}
 train_dataset = train_dataset.rename(columns=lambda col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col)
 {% endif %}
 {% if test %}


How about the following code?

rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col in cols_has_symbols for col in cols_has_symbols }
{% if training %}
train_dataset = train_dataset.rename(columns=rename_symbol_cols)
{% endif %}
{% if test %}
test_dataset = test_dataset.rename(columns=rename_symbol_cols)
{% endif %}
rename_symbol_cols = {v: k for k, v in rename_symbol_cols.items()}

tashiro-akira · 2024-01-11T07:24:30Z

Applied the review feedback.
Reverted the unnecessary modifications.

AkiraUra · 2024-04-08T09:08:38Z

sapientml_preprocess/generator.py

 logger = setup_logger()

-INHIBITED_SYMBOL_PATTERN = re.compile(r"[\{\}\[\]\",:<'\\]+")
+INHIBITED_SYMBOL_PATTERN = re.compile(r"[\{\}\[\]\",:<'\\\+]+")


Why did you add \+?

I was checking whether "+" in the test string is treated as a symbol.
I have reverted the change.
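
Whether the pattern strips "+" can be confirmed directly. Inside a character class "+" is a literal character, so the added `\+` puts "+" into the set of removed symbols, while the trailing "+" outside the brackets is the usual one-or-more quantifier. Both patterns below are copied from the diff; the sample string is made up for illustration.

```python
import re

# The two versions of INHIBITED_SYMBOL_PATTERN shown in the diff.
original = re.compile(r"[\{\}\[\]\",:<'\\]+")
with_plus = re.compile(r"[\{\}\[\]\",:<'\\\+]+")

sample = "a+b{c}"  # hypothetical column name
print(original.sub("", sample))   # a+bc
print(with_plus.sub("", sample))  # abc
```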

AkiraUra · 2024-04-08T09:18:40Z

sapientml_preprocess/generator.py

+            org_df_column = df.columns.values
+            org_target_column = task.target_columns
            df = df.rename(columns=lambda col: remove_symbols(col) if col in cols_has_symbols else col)
            task.target_columns = [
                remove_symbols(col) if col in cols_has_symbols else col for col in task.target_columns
            ]
+            same_column = {k: v for k, v in collections.Counter(list(df.columns.values)).items() if v > 1}
+            rename_dict = {}
+            if len(same_column) != 0:
+                for target in same_column.keys():
+                    rename_dict = {}
+                    rename_target_col = []
+                    df_cols = list(df.columns.values)
+                    i = 1
+                    for col in df_cols:
+                        if target in col:
+                            rename_dict[org_df_column[len(rename_dict)]] = str(col + str(i))
+                            i = i + 1
+                        else:
+                            rename_dict[org_df_column[len(rename_dict)]] = col
+                    df = df.set_axis(list(rename_dict.values()), axis=1)
+                    i = 1
+                    for col in org_target_column:
+                        rename_target_col.append(rename_dict[col])
+
+                    task.target_columns = rename_target_col
+


I think the code may rename columns whose names originally don't contain symbols, right? If so, could you keep the names of columns that don't contain symbols?

This code is hard to read and can have unexpected issues. Could you rewrite it?

The review has been applied.
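
One possible reading of the concern about the diff: it tests `target in col`, which is a substring test when both values are strings, so a duplicated name can also match an unrelated column that merely contains it. The sketch below only illustrates that difference; the column names are hypothetical and this is not the code that was eventually committed.

```python
from collections import Counter

# Hypothetical column names after symbol removal: "Age" appears twice.
cleaned = ["Age", "Age", "Age_group"]
same_column = {k: v for k, v in Counter(cleaned).items() if v > 1}

for target in same_column:
    # Substring test, as in the diff: also matches "Age_group".
    by_substring = [col for col in cleaned if target in col]
    # Equality test: matches only the duplicated name itself.
    by_equality = [col for col in cleaned if col == target]
    print(target, by_substring, by_equality)
```

With the substring test, `Age_group` would receive a numeric suffix even though it never contained a symbol and is not duplicated.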

AkiraUra · 2024-04-08T09:33:49Z

sapientml_preprocess/templates/rename_columns.py.jinja

+if len(rename_dict) == 0 :
+    rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col in cols_has_symbols for col in cols_has_symbols }
+else:
+    rename_symbol_cols = rename_dict


We don't need to show this conditional branch to users. Could you move it into the template's if statement?

Moved the conditional branch into the template's if statement.

AkiraUra

Could you consider my comment?

AkiraUra · 2024-05-30T10:33:24Z

sapientml_preprocess/generator.py

+                        same_column[target] = same_column[target] - 1
+                    else:
+                        rename_dict[org_column] = target
+


The current method fails when a renamed name is the same as an original name.
For example, suppose the original columns are Age, Age{} and Age1.
In that case, Age -> Age1 and Age{} -> Age0, so there are two Age1 columns.
Could you handle this case?
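
One way to handle this case is to reserve the names of symbol-free columns first and add a suffix only when a cleaned name is already taken. This is a minimal sketch under that assumption, not the fix adopted in the PR: `build_rename_dict` is a hypothetical helper, and only the pattern is taken from the source.

```python
import re

# Pattern copied from INHIBITED_SYMBOL_PATTERN in sapientml_preprocess/generator.py.
inhibited_symbol_pattern = re.compile(r"[\{\}\[\]\",:<'\\]+")


def build_rename_dict(columns):
    """Map each symbol-bearing column to a cleaned name that is unique.

    Hypothetical helper. Columns without symbols are never renamed; a numeric
    suffix is appended only when the cleaned name is already in use.
    """
    # Names of columns without symbols are reserved up front.
    used = {c for c in columns if not inhibited_symbol_pattern.search(c)}
    rename_dict = {}
    for col in columns:
        if not inhibited_symbol_pattern.search(col):
            continue  # leave symbol-free columns untouched
        base = inhibited_symbol_pattern.sub("", col)
        candidate, i = base, 0
        while candidate in used:
            candidate = f"{base}{i}"
            i += 1
        used.add(candidate)
        rename_dict[col] = candidate
    return rename_dict


rename_dict = build_rename_dict(["Age", "Age{}", "Age1", "Age[]"])
print(rename_dict)  # {'Age{}': 'Age0', 'Age[]': 'Age2'}

# Every new name is unique, so the mapping can be inverted to restore names.
restore_dict = {v: k for k, v in rename_dict.items()}
```

Here `Age` and `Age1` keep their names, `Age{}` becomes `Age0`, and `Age[]` skips the taken `Age0` and `Age1` to become `Age2`.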

tashiro akira and others added 11 commits October 25, 2023 14:48

fix: Fix error caused when input data is mixed with datetime and stri…

55786bc

…ng types Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

fix: Fix error caused when input data is mixed with datetime and stri…

5466bed

…ng types Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

fix: Fix error caused when input data is mixed with datetime and stri…

f4807c2

…ng types Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

fix:Modifying Source Code Formatting

99a9d7e

Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

Merge branch 'sapientml:main' into main

e951d23

fix:Reflect Review

812da98

Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

fix: Remove Unnecessary Imports

937880c

Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

fix: Reflect the point

79e39e3

Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

Merge branch 'sapientml:main' into main

22e0080

fix:Reflected review results

edd4d49

Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

style:Modified to fit the format

e063795

Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

tashiro-akira requested a review from a team as a code owner December 12, 2023 06:42

tashiro-akira requested review from AkiraUra and kimusaku and removed request for a team December 12, 2023 06:42

AkiraUra suggested changes Dec 14, 2023

fix:Fixed to return column names in csv file

22be366

Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

AkiraUra suggested changes Jan 5, 2024

fix:Reflected the content of the review

3b62abb

Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

tashiro-akira and others added 3 commits February 13, 2024 11:05

Merge branch 'sapientml:main' into main

ce996da

Merge branch 'sapientml:main' into main

9ea7ad3

fix:Reflected review results

6433d99

Signed-off-by: tashiro akira <fj1755jk@fujitsu.com>

tashiro-akira requested a review from AkiraUra April 4, 2024 02:59

AkiraUra suggested changes Apr 8, 2024

tashiro-akira and others added 4 commits April 22, 2024 16:49

Merge branch 'sapientml:main' into main

bed2b88

fix:Reflect Review Results

e1da54c

Signed-off-by: tashiro-akira <fj0822cr@fujitsu.com>

Merge branch 'main' of https://github.com/tashiro-akira/preprocess_ta…

895d5c4

…shiro

fix:Fixed error in running lint

bde5ad9

Signed-off-by: tashiro-akira <fj0822cr@fujitsu.com>

tashiro-akira added 2 commits April 25, 2024 11:06

fix:Reflect Review Results

8c15e21

Signed-off-by: tashiro-akira <fj0822cr@fujitsu.com>

fix:Reflect Review Results

1f62efd

Signed-off-by: tashiro-akira <fj0822cr@fujitsu.com>

tashiro-akira requested a review from AkiraUra April 26, 2024 07:26

AkiraUra suggested changes May 30, 2024

2 participants