-
Notifications
You must be signed in to change notification settings - Fork 6
improved Amatrix/Hmatrix support #148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: development
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the BIGapp package to version 1.6.0 with improvements to VCF validation, matrix alignment, error handling, and cross-validation logic.
- Conditionally skips VCF sanity checks when using A-matrix (pedigree-based relationship matrix)
- Ensures proper sample alignment between G-matrix and Pedigree-matrix in H-matrix computation
- Fixes hardcoded fold value (5) to use dynamic
foldsparameter throughout cross-validation
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| R/mod_GSAcc.R | Adds conditional logic to skip VCF validation when A-matrix is selected instead of genotype-based matrices |
| R/GS_functions.R | Ensures sample order alignment in H-matrix computation, adds error handling for kin.blup, fixes hardcoded fold references, and improves progress bar updates |
| DESCRIPTION | Updates package version from 1.5.1 to 1.6.0 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Ensure G.mat and Ped.mat have the same sample order | ||
| common_samples <- intersect(rownames(G.mat), rownames(Ped.mat)) | ||
| G.mat <- G.mat[common_samples, common_samples] | ||
| Ped.mat <- Ped.mat[common_samples, common_samples] |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sample alignment logic is redundant. Both G.mat (line 237) and Ped.mat (line 233) are already filtered to the same valid_ids from the phenotype file just before this code. The intersect here will always return the same samples unless there's a bug in the earlier filtering. Consider adding validation that the samples already match instead of re-filtering, or document why this additional alignment is necessary.
| # Ensure G.mat and Ped.mat have the same sample order | |
| common_samples <- intersect(rownames(G.mat), rownames(Ped.mat)) | |
| G.mat <- G.mat[common_samples, common_samples] | |
| Ped.mat <- Ped.mat[common_samples, common_samples] | |
| # Validate that G.mat and Ped.mat have the same sample order | |
| if (!identical(rownames(G.mat), rownames(Ped.mat))) { | |
| stop("Sample alignment error: G.mat and Ped.mat rownames do not match after filtering to valid_ids.") | |
| } |
| if (is.null(traitpred)) { | ||
| cat("kin.blup failed for trait", traits[trait_idx], "- skipping this trait\n") | ||
| next |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using next to skip a failed trait will leave NA values in the results matrix for that trait in this fold, but the code continues to populate subsequent folds. This creates incomplete data structures where some rows have results and others don't, potentially causing issues in downstream analysis. Consider either: (1) stopping the entire analysis with an error, (2) filling with explicit NA or sentinel values, or (3) documenting this behavior clearly.
| # Add error handling for kin.blup | ||
| traitpred <- tryCatch({ | ||
| kin.blup(data = Pheno_test, | ||
| geno = colnames(pheno_dat)[1], # This assumes first column is sample ID |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states 'This assumes first column is sample ID' but this assumption is not validated. If the assumption is violated, the error handling will catch it but the error message won't indicate the root cause. Consider validating this assumption earlier in the function or making it an explicit parameter.
Improving the Amatrix and Hmatrix support for genomic prediction