feat: add GO enrichment analysis page for ProteomicsLFQ results by hjn0415a · Pull Request #8 · OpenMS/quantms-web

hjn0415a · 2026-02-04T05:50:12Z

This PR adds a new GO Enrichment Analysis page for ProteomicsLFQ results.
The page allows users to perform GO term enrichment (BP, CC, MF) based on protein-level differential abundance results.

Added a new Streamlit results page: results_proteomicslfq.py
Integrated GO enrichment analysis using MyGene.info for GO annotation
Foreground proteins are selected based on configurable p-value and |log2FC| thresholds
Enrichment is computed using Fisher’s exact test
Results are visualized as bar plots and tables, separated by GO category (BP / CC / MF)
Added mygene as a new dependency
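
For readers unfamiliar with the test, here is a minimal sketch of the enrichment p-value for a single GO term. It is not the code in this PR: the function name `enrichment_p_value` and its parameters are hypothetical, and it uses only the standard library. The one-sided Fisher's exact test is the upper tail of a hypergeometric distribution, so for a table `[[a, b], [c, d]]` the inputs are `n_fg = a + b`, `k_bg = a + c`, and `n_bg = a + b + c + d`.

```python
from math import comb


def enrichment_p_value(a, n_fg, k_bg, n_bg):
    """One-sided (enrichment) p-value: P(X >= a) under the hypergeometric null.

    a    -- foreground proteins annotated with the term
    n_fg -- foreground size
    k_bg -- background proteins annotated with the term (foreground included)
    n_bg -- background size (the foreground is a subset of the background)
    """
    upper = min(k_bg, n_fg)          # largest possible overlap
    total = comb(n_bg, n_fg)         # number of ways to draw the foreground
    tail = sum(
        comb(k_bg, x) * comb(n_bg - k_bg, n_fg - x)
        for x in range(a, upper + 1)
    )
    return tail / total


# Toy numbers (not from the PR): all 5 foreground proteins carry a term
# that 5 of the 10 background proteins carry.
p = enrichment_p_value(a=5, n_fg=5, k_bg=5, n_bg=10)
```

With real data, `scipy.stats.fisher_exact(table, alternative="greater")`, as used in the PR, is the more practical choice; the sketch only makes explicit what that p-value measures.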

Summary by CodeRabbit

Chores
- Updated file formatting to ensure proper line endings (no user-facing impact).

coderabbitai · 2026-02-04T05:55:23Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fa7af7c4-c7ca-48e9-bcb1-d9071d893697

📥 Commits

Reviewing files that changed from the base of the PR and between 827367e and 87f95fb.

📒 Files selected for processing (2)

content/results_proteomicslfq.py
requirements.txt

✅ Files skipped from review due to trivial changes (2)

requirements.txt
content/results_proteomicslfq.py

📝 Walkthrough

This pull request adds trailing newlines to two files: content/results_proteomicslfq.py and requirements.txt. Both files previously lacked newline characters at their end, and this change ensures they terminate with proper newlines as per standard code formatting conventions.

Changes

Cohort / File(s)	Summary
File Formatting `content/results_proteomicslfq.py`	Added trailing newline at end of file to comply with standard formatting requirements.
Dependency Configuration `requirements.txt`	Added trailing newline after `statsmodels` entry to ensure file terminates with proper newline character.

Possibly related PRs

refactor: decouple GO enrichment logic and improve data flow safety #10: Introduces the content/results_proteomicslfq.py module that this PR modifies with formatting fixes.

Poem

🐰 A newline here, a newline there,
Tidy files floating through the air,
No trailing chaos left behind,
Just properly formatted code, so refined! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title claims to 'add GO enrichment analysis page' but the actual changes only fix newline formatting in two files without implementing any GO analysis functionality.	Align the title with the actual changes, such as 'fix: add trailing newlines to results_proteomicslfq.py and requirements.txt' or create a separate PR for the actual GO enrichment feature.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@content/results_proteomicslfq.py`:
- Around line 68-117: The foreground/background counts are using all proteins
(bg_ids/fg_ids) even if MyGene returned no annotation, so update
run_go_enrichment to first compute annotated_ids = set(res["query"].astype(str))
(or otherwise derive the set of IDs present in the filtered res) and then
replace bg_set and fg_set with their intersections with annotated_ids before
computing N_bg/N_fg and running the Fisher tests; keep building go2bg/go2fg from
res rows as-is so counts and p-values reflect only annotated proteins.

🧹 Nitpick comments (4)

requirements.txt (1)

152-152: Consider pinning mygene for deterministic builds.

requirements.txt is generated by pip-compile, but mygene is unpinned. Align it with the rest of the lockfile by re-running pip-compile or pinning a version to avoid non-reproducible installs.

content/results_proteomicslfq.py (3)

45-50: Wrap the GO enrichment UI in @st.fragment to avoid full reruns.

This keeps slider/button interactions from re-running the entire page. As per coding guidelines, **/*.py: Use @st.fragment decorator for interactive UI updates without full page reloads.

Suggested refactor (skeleton)

+@st.fragment
+def go_enrichment_panel(pivot_df):
     st.subheader("🧬 GO Enrichment Analysis")
     p_cutoff = st.slider(...)
     fc_cutoff = st.slider(...)
     if st.button("Run GO Enrichment"):
         ...
+
+go_enrichment_panel(pivot_df)

55-65: Avoid blind Exception catches to improve debuggability.

Catching broad exceptions hides unexpected failures. Consider narrowing to the likely exceptions (e.g., AttributeError/IndexError in parsing and request-related exceptions around the API call) or re-raise after logging.

Also applies to: 140-141
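
As an illustration of the narrowing suggested above, here is a minimal sketch. The parsing code at lines 55-65 is not shown in this thread, so the helper `extract_uniprot_accession` and the `db|ACCESSION|NAME` input format are assumptions made only for the example.

```python
def extract_uniprot_accession(protein_id):
    """Return the accession from a 'db|ACCESSION|NAME' string, or None.

    Hypothetical helper: the input format is assumed, not taken from the PR.
    """
    try:
        return protein_id.split("|")[1]
    except (AttributeError, IndexError):
        # AttributeError: the value is not a string (e.g. None or NaN).
        # IndexError: the string has no '|' separator.
        return None


ok = extract_uniprot_accession("sp|P12345|EXAMPLE_HUMAN")
missing = extract_uniprot_accession(None)
malformed = extract_uniprot_accession("P12345")
```

Any other exception type still propagates, so an unexpected failure shows up as a traceback instead of being swallowed.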

134-137: Use streamlit_plotly_events for interactive Plotly charts.

Right now the chart is displayed but you aren’t capturing interactions. Consider using plotly_events to support click/selection actions. As per coding guidelines, **/*.py: Use Plotly and streamlit_plotly_events for interactive visualizations.

Example integration

+from streamlit_plotly_events import plotly_events
 ...
-                                st.plotly_chart(fig, use_container_width=True)
+                                selected = plotly_events(fig, click_event=True, select_event=True)
+                                st.plotly_chart(fig, use_container_width=True)

coderabbitai · 2026-02-04T05:55:26Z

+                    bg_ids = analysis_df["UniProt"].dropna().unique().tolist()
+                    fg_ids = analysis_df[
+                        (analysis_df["p-value"] < p_cutoff) &
+                        (analysis_df["log2FC"].abs() >= fc_cutoff)
+                    ]["UniProt"].dropna().unique().tolist()
+
+                    if len(fg_ids) < 3:
+                        st.warning(f"Not enough significant proteins (p < {p_cutoff}, |log2FC| ≥ {fc_cutoff}). Found: {len(fg_ids)}")
+                    else:
+                        res_list = mg.querymany(bg_ids, scopes="uniprot", fields="go", as_dataframe=False)
+                        res = pd.DataFrame(res_list)
+                        if "notfound" in res.columns:
+                            res = res[res["notfound"] != True]
+
+                        def extract_go_terms(go_data, go_type):
+                            if not isinstance(go_data, dict) or go_type not in go_data:
+                                return []
+                            terms = go_data[go_type]
+                            if isinstance(terms, dict):
+                                terms = [terms]
+                            return list({t.get("term") for t in terms if "term" in t})
+
+                        for go_type in ["BP", "CC", "MF"]:
+                            res[f"{go_type}_terms"] = res["go"].apply(lambda x: extract_go_terms(x, go_type))
+
+                        fg_set = set(fg_ids)
+                        bg_set = set(bg_ids)
+
+                        def run_go_enrichment(go_type):
+                            go2fg = defaultdict(set)
+                            go2bg = defaultdict(set)
+                            for _, row in res.iterrows():
+                                uid = str(row["query"])
+                                for term in row[f"{go_type}_terms"]:
+                                    go2bg[term].add(uid)
+                                    if uid in fg_set:
+                                        go2fg[term].add(uid)
+
+                            records = []
+                            N_fg = len(fg_set)
+                            N_bg = len(bg_set)
+                            for term, fg_genes in go2fg.items():
+                                a = len(fg_genes)
+                                if a == 0:
+                                    continue
+                                b = N_fg - a
+                                c = len(go2bg[term]) - a
+                                d = N_bg - (a + b + c)
+                                _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
+                                records.append({"GO_Term": term, "Count": a, "GeneRatio": f"{a}/{N_fg}", "p_value": p})


⚠️ Potential issue | 🟠 Major

Foreground/background counts include unannotated proteins, biasing Fisher p-values.

N_bg/N_fg are computed from all proteins, even those without GO annotations. This inflates the background and can understate enrichment. Restrict both sets to annotated proteins returned by MyGene before computing Fisher’s exact test.

Proposed fix

-                    bg_ids = analysis_df["UniProt"].dropna().unique().tolist()
+                    bg_ids = analysis_df["UniProt"].dropna().unique().tolist()
                     fg_ids = analysis_df[
                         (analysis_df["p-value"] < p_cutoff) &
                         (analysis_df["log2FC"].abs() >= fc_cutoff)
                     ]["UniProt"].dropna().unique().tolist()
 ...
-                        fg_set = set(fg_ids)
-                        bg_set = set(bg_ids)
+                        annotated_ids = set(res["query"].astype(str))
+                        bg_set = annotated_ids
+                        fg_set = annotated_ids.intersection(map(str, fg_ids))
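
To make the effect of the restriction concrete, here is a self-contained sketch that builds the 2x2 tables from annotated proteins only. The function `contingency_tables` and the toy IDs are hypothetical. It also makes one assumption beyond the proposed fix: "annotated" here means "has at least one GO term", whereas the fix above uses every ID present in the filtered MyGene result.

```python
from collections import defaultdict


def contingency_tables(annotations, fg_ids, bg_ids):
    """Build [[a, b], [c, d]] per GO term, counting annotated proteins only.

    annotations -- dict mapping protein ID (str) to a set of GO terms
    """
    # Assumption: a protein counts as annotated if it has at least one term.
    annotated = {uid for uid, terms in annotations.items() if terms}
    bg_set = set(map(str, bg_ids)) & annotated
    fg_set = set(map(str, fg_ids)) & bg_set

    go2bg = defaultdict(set)
    for uid in bg_set:
        for term in annotations[uid]:
            go2bg[term].add(uid)

    tables = {}
    for term, members in go2bg.items():
        a = len(members & fg_set)        # foreground, with term
        if a == 0:
            continue
        b = len(fg_set) - a              # foreground, without term
        c = len(members) - a             # rest of background, with term
        d = len(bg_set) - (a + b + c)    # rest of background, without term
        tables[term] = [[a, b], [c, d]]
    return tables


# Toy data: P4 has no terms and P5 was never returned, so both are dropped.
annotations = {"P1": {"t1"}, "P2": {"t1", "t2"}, "P3": {"t2"}, "P4": set()}
tables = contingency_tables(
    annotations,
    fg_ids=["P1", "P2", "P5"],
    bg_ids=["P1", "P2", "P3", "P4", "P5"],
)
```

Each table can then be passed to `fisher_exact(..., alternative="greater")` as in the reviewed code. Without the intersection, `d` would grow by the number of unannotated proteins, which is the bias described above.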

🧰 Tools

🪛 Ruff (0.14.14)

[error] 80-80: Avoid inequality comparisons to True; use not res["notfound"]: for false checks

Replace with not res["notfound"]

(E712)

[warning] 91-91: Function definition does not bind loop variable go_type

(B023)
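
A short sketch of what B023 is warning about, using hypothetical names rather than the code under review. A lambda defined in a loop looks up the loop variable when it is called, not when it is defined. Binding the variable as a default argument fixes the value at definition time.

```python
def build_extractors():
    """Return two dicts of extractor functions keyed by GO category."""
    late, bound = {}, {}
    for go_type in ["BP", "CC", "MF"]:
        # Late binding: go_type is looked up when the lambda is called.
        late[go_type] = lambda go: go.get(go_type, [])
        # Early binding: the default argument captures the current value.
        bound[go_type] = lambda go, go_type=go_type: go.get(go_type, [])
    return late, bound


late, bound = build_extractors()
sample = {"BP": ["a"], "CC": ["b"], "MF": ["c"]}
late_result = late["BP"](sample)    # sees the last loop value, "MF"
bound_result = bound["BP"](sample)  # sees "BP" as intended
```

In the reviewed hunk the lambda is consumed by `apply` inside the same iteration, so the late lookup should not produce wrong results there; binding the variable mainly silences the warning and guards against later refactoring. Separately, the E712 suggestion is phrased for plain booleans, while `res["notfound"]` is a pandas Series, where an element-wise boolean mask is the usual idiom rather than `not`.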


@hjn0415a Could you check this?

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

t0mdavid-m · 2026-02-05T07:45:31Z

I just noticed that the abundance data is not actually calculated within the workflow. This could lead to issues in displaying abundance data if the user changes the labels after running the workflow. Could you please integrate it, together with the GO-term annotation, into the execution section of the workflow?

The results pages should then only display preprocessed results. The results should only be influenced by the configuration if the user reruns the workflow.

t0mdavid-m · 2026-02-05T07:46:24Z

Make sure you use the file manager to write the output to file.

feat: add GO enrichment analysis page for ProteomicsLFQ results

827367e

coderabbitai Bot reviewed Feb 4, 2026

Merge branch 'main' into feature/go-terms

87f95fb

hjn0415a wants to merge 2 commits into OpenMS:main from hjn0415a:feature/go-terms
