precommit hooks#408
Conversation
Jday7879
left a comment
There was a problem hiding this comment.
Code changes look good, you have two unit tests that are failing for me, can you double check these?
tests/integration_tests/test_integration_cluster_summaries.py
tests/post_processing/test_cluster_summaries.py
Jday7879
left a comment
There was a problem hiding this comment.
Updated install modules and fixed issue, ready to merge
| # subgroup have an errors when creating the clustergram so skip this subcluster instead. | ||
| logger.info(f"Skipping cluster {subcluster} due to insufficient data points ({len(subcluster_df)}).") | ||
| # For example when running on number_of_times_k_means_initialised = 1000 the | ||
| # Oxford and Cambridge subgroup have an errors when creating the clustergram |
There was a problem hiding this comment.
"Have an errors" - one error or multiple?
| Create and save a clustergram for evaluating k-means clustering solutions. | ||
|
|
||
| The clustergram visualizes clustering stability and helps identify the optimal | ||
| The clustergram visualizes clustering stability and helps identify the optimal |
| numbers. | ||
| Since k-means is sensitive to initialization, `n_init` determines the number of | ||
| times the algorithm runs with different centroid seeds. The final result is the | ||
| Since k-means is sensitive to initialization, `n_init` determines the number of |
| # variance_value = variance_df.loc[cluster_number_int, v_code] # if running through main hash this! | ||
| variance_value = variance_df.loc[ | ||
| cluster_number_str, v_code | ||
| ] # if running through main un hash / if running through main hash this! |
There was a problem hiding this comment.
A bit confusing? I think having these statements completely separate is easier to follow
| axes[i].set_title(domain) | ||
| axes[i].axvline(0, color='grey', linewidth=2, linestyle='--', zorder=2) | ||
|
|
||
| # Split y-axis labels at 28 characters |
| As mentioned in the main README for this repo, disability data for Northern Ireland needs to | ||
| be downloaded manually from the Northern Ireland Statistics and Research Agency (NISRA) | ||
| website as it is not available in the bulk download. It should have been manually saved | ||
| into the 'data/inputs/ni_downloads folder. The file should be named 'census-2021-ms-d02.xlsx'. |
There was a problem hiding this comment.
'data/inputs/ni_downloads needs closing string quote mark (')?
| #### 4.2.3 Two groups of radial plots: | ||
| One is based on the comparison of a cluster to the UK mean. The other group compare variables within a cluster to the mean of parent cluster. For example, a radial plot for cluster 5a (group level) represents a comparison to the mean of all clusters that make up supergroup 5. | ||
|
|
||
| #### 4.2.5 Bar charts |
There was a problem hiding this comment.
Double check this is meant to be 4.2.5 as 4.2.4 is missing
Summary
Add your summary here - keep it brief, to the point, and in plain English. For further
information about pull requests, check out the GDS
Way.
Checklists
This pull/merge request meets the following requirements:
docs/analytical_quality_assurance/analytical_quality_assurance_plan.md)docs/analytical_quality_assurance/data_log.md), if necessarydocs/analytical_quality_assurance/assumptions_caveats.md), ifnecessary
docsfolderComments have been added below around the incomplete checks.