Conversation
0321bca to
ff67e06
Compare
ff67e06 to
07a74c0
Compare
07a74c0 to
8ab5335
Compare
8ab5335 to
61f172f
Compare
EstebanMontandon
left a comment
There was a problem hiding this comment.
Nice work, this is finally taking shape :') congrats
| self.prefix_candidate_data: str = "candidate_" | ||
| self.prefix_reference_data: str = "reference_" | ||
|
|
||
| def _init_logger(self, logger: logging.Logger | None) -> None: |
There was a problem hiding this comment.
I'm starting to doubt a bit about the logging. The pyramid matcher returns a exhaustive result summary, probably we don't need to additionally provide a logging. We can always decide how to handle any situation based on the outputs. We could remove this, what do you think?
There was a problem hiding this comment.
Mmmmmmmmmm
I am not sure! I have two worries:
- (sometimes) the levels and (always) the attributes are detected by the algorithm. I think its nice if they get printed
- The function can take a long time (specially if we are matching a lot of level_5's) -- I think its nice to have an indication of what is happening.
| - `repeated_level_*` columns indicate whether the match for that level is repeated (i.e., if the same reference level is matched to multiple candidate levels). | ||
|
|
||
|
|
||
| The output `matched_data_simplified` will be: |
There was a problem hiding this comment.
This is very good, as a User I want to get quick simple mapping table based on matching columns that will allow me to perform my task.
Just to clarify my thoughts about the algorithm: as we use parameters names reference and candidate pyramids, how is the output table structured relative to the reference input?
As a user, I would expect:
- The output is essentially a copy of the reference pyramid
- Each (unique) organization unit from reference is paired with a candidate at the matching level
- Is expected that some rows would have None values in the "candidate_" prefixed columns (for unpaired units)
Is this correct?
There was a problem hiding this comment.
mmmmmmmmmm no
- The output is essentially a copy of the reference pyramid --> yes
- Each (unique) organization unit from reference is paired with a candidate at the matching level --> yes
- Is expected that some rows would have None values in the "candidate" prefixed columns (for unpaired units)_ --> no
The output does look like what you said -- a copy of the candidate pyramid with the relevant reference columns attached to it.
Nevertheless, there are no None's on the candidate columns... If a row is not matched, it is not in the output. (I think this makes sense -- if I have not matched a row, I do not want it...)
(You can see the not-matched things on the not_matched outputs)
What do you think?
lgarridobsq
left a comment
There was a problem hiding this comment.
Hola!
I changed a couple of things and answered your comments...
if you want, approve and we merge ⭐
| self.prefix_candidate_data: str = "candidate_" | ||
| self.prefix_reference_data: str = "reference_" | ||
|
|
||
| def _init_logger(self, logger: logging.Logger | None) -> None: |
There was a problem hiding this comment.
Mmmmmmmmmm
I am not sure! I have two worries:
- (sometimes) the levels and (always) the attributes are detected by the algorithm. I think its nice if they get printed
- The function can take a long time (specially if we are matching a lot of level_5's) -- I think its nice to have an indication of what is happening.
| - `repeated_level_*` columns indicate whether the match for that level is repeated (i.e., if the same reference level is matched to multiple candidate levels). | ||
|
|
||
|
|
||
| The output `matched_data_simplified` will be: |
There was a problem hiding this comment.
mmmmmmmmmm no
- The output is essentially a copy of the reference pyramid --> yes
- Each (unique) organization unit from reference is paired with a candidate at the matching level --> yes
- Is expected that some rows would have None values in the "candidate" prefixed columns (for unpaired units)_ --> no
The output does look like what you said -- a copy of the candidate pyramid with the relevant reference columns attached to it.
Nevertheless, there are no None's on the candidate columns... If a row is not matched, it is not in the output. (I think this makes sense -- if I have not matched a row, I do not want it...)
(You can see the not-matched things on the not_matched outputs)
What do you think?
ca1fc3b to
96ba4f7
Compare
@EstebanMontandon ,
Tell me what you htink!