Skip to content

Improve the code and add the tests#8

Open
lgarridobsq wants to merge 2 commits intomainfrom
DSDE-218_finalize_pyramid_matching
Open

Improve the code and add the tests#8
lgarridobsq wants to merge 2 commits intomainfrom
DSDE-218_finalize_pyramid_matching

Conversation

@lgarridobsq
Copy link
Copy Markdown
Collaborator

@lgarridobsq lgarridobsq commented Mar 26, 2026

@EstebanMontandon ,

  • I improved the code a bit
  • I added the tests
  • I tested in my computer and i could use the funcitons once I downloaded the package

Tell me what you htink!

This comment was marked as resolved.

@lgarridobsq lgarridobsq force-pushed the DSDE-218_finalize_pyramid_matching branch 2 times, most recently from 0321bca to ff67e06 Compare March 26, 2026 10:56
@lgarridobsq lgarridobsq requested a review from Copilot March 26, 2026 10:56

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as outdated.

@lgarridobsq lgarridobsq force-pushed the DSDE-218_finalize_pyramid_matching branch from 8ab5335 to 61f172f Compare March 26, 2026 13:06
Copy link
Copy Markdown
Collaborator

@EstebanMontandon EstebanMontandon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, this is finally taking shape :') congrats

self.prefix_candidate_data: str = "candidate_"
self.prefix_reference_data: str = "reference_"

def _init_logger(self, logger: logging.Logger | None) -> None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to doubt a bit about the logging. The pyramid matcher returns a exhaustive result summary, probably we don't need to additionally provide a logging. We can always decide how to handle any situation based on the outputs. We could remove this, what do you think?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmmmmmmmm
I am not sure! I have two worries:

  • (sometimes) the levels and (always) the attributes are detected by the algorithm. I think its nice if they get printed
  • The function can take a long time (specially if we are matching a lot of level_5's) -- I think its nice to have an indication of what is happening.

- `repeated_level_*` columns indicate whether the match for that level is repeated (i.e., if the same reference level is matched to multiple candidate levels).


The output `matched_data_simplified` will be:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very good, as a User I want to get quick simple mapping table based on matching columns that will allow me to perform my task.

Just to clarify my thoughts about the algorithm: as we use parameters names reference and candidate pyramids, how is the output table structured relative to the reference input?

As a user, I would expect:

  • The output is essentially a copy of the reference pyramid
  • Each (unique) organization unit from reference is paired with a candidate at the matching level
  • Is expected that some rows would have None values in the "candidate_" prefixed columns (for unpaired units)

Is this correct?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmmmmmmmmm no

  • The output is essentially a copy of the reference pyramid --> yes
  • Each (unique) organization unit from reference is paired with a candidate at the matching level --> yes
  • Is expected that some rows would have None values in the "candidate" prefixed columns (for unpaired units)_ --> no
Image

The output does look like what you said -- a copy of the candidate pyramid with the relevant reference columns attached to it.

Nevertheless, there are no None's on the candidate columns... If a row is not matched, it is not in the output. (I think this makes sense -- if I have not matched a row, I do not want it...)

(You can see the not-matched things on the not_matched outputs)

What do you think?

Copy link
Copy Markdown
Collaborator Author

@lgarridobsq lgarridobsq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hola!

I changed a couple of things and answered your comments...

if you want, approve and we merge ⭐

self.prefix_candidate_data: str = "candidate_"
self.prefix_reference_data: str = "reference_"

def _init_logger(self, logger: logging.Logger | None) -> None:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmmmmmmmm
I am not sure! I have two worries:

  • (sometimes) the levels and (always) the attributes are detected by the algorithm. I think its nice if they get printed
  • The function can take a long time (specially if we are matching a lot of level_5's) -- I think its nice to have an indication of what is happening.

- `repeated_level_*` columns indicate whether the match for that level is repeated (i.e., if the same reference level is matched to multiple candidate levels).


The output `matched_data_simplified` will be:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmmmmmmmmm no

  • The output is essentially a copy of the reference pyramid --> yes
  • Each (unique) organization unit from reference is paired with a candidate at the matching level --> yes
  • Is expected that some rows would have None values in the "candidate" prefixed columns (for unpaired units)_ --> no
Image

The output does look like what you said -- a copy of the candidate pyramid with the relevant reference columns attached to it.

Nevertheless, there are no None's on the candidate columns... If a row is not matched, it is not in the output. (I think this makes sense -- if I have not matched a row, I do not want it...)

(You can see the not-matched things on the not_matched outputs)

What do you think?

@lgarridobsq lgarridobsq force-pushed the DSDE-218_finalize_pyramid_matching branch from ca1fc3b to 96ba4f7 Compare March 31, 2026 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants