Skip to content

Conversation

@alex-sandercock
Copy link
Collaborator

Improving the Amatrix and Hmatrix support for genomic prediction

@alex-sandercock alex-sandercock added the work in progress More commits are coming label Oct 2, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the BIGapp package to version 1.6.0 with improvements to VCF validation, matrix alignment, error handling, and cross-validation logic.

  • Conditionally skips VCF sanity checks when using A-matrix (pedigree-based relationship matrix)
  • Ensures proper sample alignment between G-matrix and Pedigree-matrix in H-matrix computation
  • Fixes hardcoded fold value (5) to use dynamic folds parameter throughout cross-validation

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
R/mod_GSAcc.R Adds conditional logic to skip VCF validation when A-matrix is selected instead of genotype-based matrices
R/GS_functions.R Ensures sample order alignment in H-matrix computation, adds error handling for kin.blup, fixes hardcoded fold references, and improves progress bar updates
DESCRIPTION Updates package version from 1.5.1 to 1.6.0

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +244 to +247
# Ensure G.mat and Ped.mat have the same sample order
common_samples <- intersect(rownames(G.mat), rownames(Ped.mat))
G.mat <- G.mat[common_samples, common_samples]
Ped.mat <- Ped.mat[common_samples, common_samples]
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sample alignment logic is redundant. Both G.mat (line 237) and Ped.mat (line 233) are already filtered to the same valid_ids from the phenotype file just before this code. The intersect here will always return the same samples unless there's a bug in the earlier filtering. Consider adding validation that the samples already match instead of re-filtering, or document why this additional alignment is necessary.

Suggested change
# Ensure G.mat and Ped.mat have the same sample order
common_samples <- intersect(rownames(G.mat), rownames(Ped.mat))
G.mat <- G.mat[common_samples, common_samples]
Ped.mat <- Ped.mat[common_samples, common_samples]
# Validate that G.mat and Ped.mat have the same sample order
if (!identical(rownames(G.mat), rownames(Ped.mat))) {
stop("Sample alignment error: G.mat and Ped.mat rownames do not match after filtering to valid_ids.")
}

Copilot uses AI. Check for mistakes.
Comment on lines +338 to +340
if (is.null(traitpred)) {
cat("kin.blup failed for trait", traits[trait_idx], "- skipping this trait\n")
next
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using next to skip a failed trait will leave NA values in the results matrix for that trait in this fold, but the code continues to populate subsequent folds. This creates incomplete data structures where some rows have results and others don't, potentially causing issues in downstream analysis. Consider either: (1) stopping the entire analysis with an error, (2) filling with explicit NA or sentinel values, or (3) documenting this behavior clearly.

Copilot uses AI. Check for mistakes.
# Add error handling for kin.blup
traitpred <- tryCatch({
kin.blup(data = Pheno_test,
geno = colnames(pheno_dat)[1], # This assumes first column is sample ID
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states 'This assumes first column is sample ID' but this assumption is not validated. If the assumption is violated, the error handling will catch it but the error message won't indicate the root cause. Consider validating this assumption earlier in the function or making it an explicit parameter.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

work in progress More commits are coming

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants