Add runHMMER function for HMMER analysis #18

Open
AbhirupaGhosh wants to merge 4 commits into main from add-hmmer

@AbhirupaGhosh
Contributor

These functions will be added to data_processing.R once approved.

The scripts are modifications of @epbrenner's hmmering and rhmmer.

Description

What kind of change(s) are included?

  • Feature (adds or updates new capabilities)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • I have added comments to my code to help provide understanding.
  • I have added a test which covers the code changes found within this PR.
  • I have deleted all non-relevant text in this pull request template.
  • Reviewer assignment: Tag a relevant team member to review and approve the changes.

AbhirupaGhosh and others added 3 commits March 30, 2026 22:16
Removed redundant line reading from file as it was already handled earlier in the code.
Member

@jananiravi left a comment

It looks like you moved a line -- unless I'm missing something, it's good to merge!

Member

@jananiravi left a comment

I see that I commented on one commit earlier -- sorry about that!

In principle, it looks good. I would like to request @eboyer221 or @epbrenner to run this locally and suggest non-alpine placeholders to ensure this works for all!

  combined_drug_data <- unlist(batch_drug_data, use.names = FALSE)
- if (length(combined_drug_data) == 0) { message("No drug data returned."); return(NULL) }
+ if (length(combined_drug_data) == 0) {
+   message("No drug data returned.")
Member

found or returned?

  combined_genome_data <- unlist(batch_genome_data, use.names = FALSE)
- if (length(combined_genome_data) == 0) { message("No genome data returned."); return(NULL) }
+ if (length(combined_genome_data) == 0) {
+   message("No genome data returned.")
Member

found/returned/retrieved? same Q as before.

chunk_size <- ceiling(length(records) / chunk_count)
chunks <- split(records, ceiling(seq_along(records) / chunk_size))

purrr::walk2(chunks, seq_along(chunks), function(chunk, i) {

"exec",
"-B", paste0(mount_host, ":", mount_cont),
"-B", paste0(db_host_dir, ":", db_cont_dir),
"/scratch/alpine/aghosh5@xsede.org/software/hmmer_latest.sif",
Member

⚠️ hardcoded path alert
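
One possible way to address this, sketched under assumptions: accept the container image path as an argument and fall back to an environment variable, so no cluster-specific path lives in the package. The helper name `resolve_hmmer_sif` and the variable name `HMMER_SIF` are hypothetical and not part of this PR.

```r
# Hypothetical helper (not in the PR): resolve the HMMER container image
# from an explicit argument, falling back to the HMMER_SIF environment
# variable (an assumed name) when no argument is given.
resolve_hmmer_sif <- function(sif_path = Sys.getenv("HMMER_SIF", unset = "")) {
  if (!nzchar(sif_path)) {
    stop("No HMMER image given; pass `sif_path` or set HMMER_SIF.",
         call. = FALSE)
  }
  if (!file.exists(sif_path)) {
    stop("HMMER image not found: ", sif_path, call. = FALSE)
  }
  # Return an absolute path so the call works from any working directory.
  normalizePath(sif_path)
}
```

The hardcoded string in the argument vector could then be replaced with `resolve_hmmer_sif(sif_path)`, where `sif_path` would be a new argument on the calling function.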


message("Combined parquet written")

# arrow::read_parquet("/scratch/alpine/aghosh5@xsede.org/AMR/data/Campylobacter_jejuni/protein_COG_count.parquet") |> DBI::dbWriteTable(conn=con, name="protein_COG_count")
Member

hardcoded path alert. cannot be part of the public amRdata repo.

cdhit_extra_args = c("-g", "1"),
cdhit_output_prefix = "cdhit_out",
# InterPro
ipr_appl = c("Pfam"),
Member

user can switch: Pfam vs. something else? @AbhirupaGhosh @epbrenner
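
If the answer is yes, one way to let users switch while failing early on typos is an explicit check against a set of accepted names. This is only a sketch: the helper name `validate_ipr_appl` and the contents of `allowed` are illustrative assumptions, not the set that amRdata or InterProScan actually supports.

```r
# Hypothetical validator (not in the PR): check user-supplied InterPro
# application names against an accepted set before building the command.
# The default `allowed` values are placeholders for illustration only.
validate_ipr_appl <- function(ipr_appl = "Pfam",
                              allowed = c("Pfam", "CDD", "SMART")) {
  stopifnot(is.character(ipr_appl), length(ipr_appl) > 0)
  unknown <- setdiff(ipr_appl, allowed)
  if (length(unknown) > 0) {
    stop("Unsupported InterPro application(s): ",
         paste(unknown, collapse = ", "),
         ". Choose from: ", paste(allowed, collapse = ", "),
         call. = FALSE)
  }
  # Drop duplicates while keeping the order the user supplied.
  unique(ipr_appl)
}
```

An explicit `setdiff()` check is used here rather than `match.arg(several.ok = TRUE)`, because the latter silently drops unmatched entries as long as at least one name matches.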
