Skip to content

precommit hooks#408

Open
ella-goodman wants to merge 7 commits into
mainfrom
pre-commit-hooks
Open

precommit hooks#408
ella-goodman wants to merge 7 commits into
mainfrom
pre-commit-hooks

Conversation

@ella-goodman
Copy link
Copy Markdown
Contributor

Summary

Add your summary here - keep it brief, to the point, and in plain English. For further
information about pull requests, check out the GDS
Way
.

Checklists

This pull/merge request meets the following requirements:

  • code runs
  • developments are ethical and secure
  • you have made proportionate checks that the code works correctly
  • test suite passes
  • developments adhere to analytical quality assurance plan (see docs/analytical_quality_assurance/analytical_quality_assurance_plan.md)
  • data log updated (see docs/analytical_quality_assurance/data_log.md), if necessary
  • assumptions, and caveats log updated (see docs/analytical_quality_assurance/assumptions_caveats.md), if
    necessary
  • minimum usable documentation written in the docs folder

Comments have been added below around the incomplete checks.

@ella-goodman ella-goodman requested a review from Jday7879 March 5, 2026 10:16
Copy link
Copy Markdown
Contributor

@Jday7879 Jday7879 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look good, you have two unit tests that are failing for me, can you double check these?
tests/integration_tests/test_integration_cluster_summaries.py
tests/post_processing/test_cluster_summaries.py

Copy link
Copy Markdown
Contributor

@Jday7879 Jday7879 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated install modules and fixed issue, ready to merge

# subgroup have an errors when creating the clustergram so skip this subcluster instead.
logger.info(f"Skipping cluster {subcluster} due to insufficient data points ({len(subcluster_df)}).")
# For example when running on number_of_times_k_means_initialised = 1000 the
# Oxford and Cambridge subgroup have an errors when creating the clustergram
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Have an errors" - one error or multiple?

Create and save a clustergram for evaluating k-means clustering solutions.

The clustergram visualizes clustering stability and helps identify the optimal
The clustergram visualizes clustering stability and helps identify the optimal
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visualises

numbers.
Since k-means is sensitive to initialization, `n_init` determines the number of
times the algorithm runs with different centroid seeds. The final result is the
Since k-means is sensitive to initialization, `n_init` determines the number of
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialisation

# variance_value = variance_df.loc[cluster_number_int, v_code] # if running through main hash this!
variance_value = variance_df.loc[
cluster_number_str, v_code
] # if running through main un hash / if running through main hash this!
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit confusing? I think having these statements completely separate is easier to follow

axes[i].set_title(domain)
axes[i].axvline(0, color='grey', linewidth=2, linestyle='--', zorder=2)

# Split y-axis labels at 28 characters
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this 28 or 31 characters?

As mentioned in the main README for this repo, disability data for Northern Ireland needs to
be downloaded manually from the Northern Ireland Statistics and Research Agency (NISRA)
website as it is not available in the bulk download. It should have been manually saved
into the 'data/inputs/ni_downloads folder. The file should be named 'census-2021-ms-d02.xlsx'.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'data/inputs/ni_downloads needs closing string quote mark (')?

#### 4.2.3 Two groups of radial plots:
One is based on the comparison of a cluster to the UK mean. The other group compare variables within a cluster to the mean of parent cluster. For example, a radial plot for cluster 5a (group level) represents a comparison to the mean of all clusters that make up supergroup 5.

#### 4.2.5 Bar charts
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check this is meant to be 4.2.5 as 4.2.4 is missing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants