## Accuracy of Genomic Prediction under Different Genetic Architectures and Estimation Methods

Iranian Journal of Applied Animal Science

Article 6, Volume 8, Issue 1, Khordad (May/June) 2018, Pages 43-52

Article type: Research Article

Authors

A. Atefi; A.A. Shadparvar; N. Ghavi Hossein-Zadeh

Department of Animal Science, Faculty of Agricultural Science, University of Guilan, Rasht, Iran

Abstract

The accuracy of genomic breeding value prediction was investigated at various levels of reference population size, trait heritability and number of quantitative trait loci (QTL). Five Bayesian methods, namely Bayesian ridge regression, BayesA, BayesB, BayesC and Bayesian LASSO, were used to estimate marker effects in each of 27 scenarios obtained by combining three levels of heritability (0.1, 0.3 and 0.5), training population size (600, 1000 and 1600) and QTL number (50, 100 and 150). A finite locus model was used to stochastically simulate a historical population of 100 animals over the first 100 generations. Over the next 100 generations, the population size gradually increased to 1000 individuals. The animals in generations 201 and 202, which had both genotypic and phenotypic records, were assigned to the reference population, and individuals in generations 203 and 204 formed the validation population. The genome comprised five chromosomes, each 100 cM long and carrying 500 single nucleotide polymorphism markers distributed randomly. The QTL and markers were bi-allelic. Heritability had a highly significant positive effect on accuracy (P<0.001). As the size of the reference population increased, the average genomic accuracy increased from 0.64 ± 0.03 to 0.70 ± 0.04 (P<0.001). Accuracy responded non-linearly to an increasing number of QTL. The lowest and highest accuracies of the Bayesian methods were 0.40 ± 0.04 and 0.84 ± 0.05, respectively. The highest accuracy (0.84) was obtained with all investigated estimation methods when the amount of information was greatest (i.e. highest heritability, highest contribution of gene action to phenotypic variation and large reference population size).

Keywords

accuracy; Bayesian; genetic architecture; genomic; heritability; QTL

Full Text

The estimation of breeding values in order to select the best animals as parents of the next generation is the main goal of animal breeding programs. Traditional genetic evaluation combined phenotypic and pedigree information to produce estimated breeding values (EBV) (Dekkers, 2012). Rapid progress in whole-genome genotyping and its falling cost have led to great interest in using molecular marker information to identify individuals of high genetic merit (Daetwyler
Various scenarios were defined from all combinations of three levels of heritability, training population size and QTL number. For each scenario, five Bayesian estimation methods were compared in terms of prediction accuracy, defined as the correlation between the predicted genomic breeding values and the true breeding values. Parameters were estimated with the Gibbs sampler implemented in the BGLR package of R (Perez and De los Campos, 2014). A historical population with an effective size of 100 and an equal sex ratio was simulated using QMSim software, assuming heritability values of 0.1, 0.3 or 0.5. During the first 100 historical generations, the parents of each animal were drawn randomly from the animals of the previous generation. Then, in order to reach mutation-drift balance, 100 more generations were simulated while the population size was gradually increased to 1000 individuals. After the last historical generation, the recent population was founded by randomly selecting 300, 500 or 800 individuals, and four successive generations were generated by random mating. The animals in generations 201 and 202, with known genotypes and records for the trait, formed the training population. The animals of generations 203 and 204 formed the validation population, which was assumed to have no phenotypic records. The genome comprised five chromosomes of 100 cM, on which 500 marker loci and the QTL loci were randomly distributed. All marker and QTL loci were bi-allelic. The number of segregating QTL affecting the trait was set at 50, 100 or 150. The marker and QTL allele frequencies were assumed to be equal in the 200
To achieve accurate genomic prediction, a sufficient level of linkage disequilibrium (LD) is imperative. The extent of LD in the training populations was measured by r^{2}:

$$r^{2} = \frac{D^{2}}{\mathrm{freq}(A1)\,\mathrm{freq}(A2)\,\mathrm{freq}(B1)\,\mathrm{freq}(B2)} \qquad [1]$$

Where:
freq(A1): frequency of the A1 allele in the population, and likewise for the other alleles.
D: another statistic of linkage disequilibrium, calculated as:

$$D = \mathrm{freq}(A1\_B1)\,\mathrm{freq}(A2\_B2) - \mathrm{freq}(A1\_B2)\,\mathrm{freq}(A2\_B1)$$

PLINK software and the synbreed and ggplot2 packages were used to calculate and display the LD properties.
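The r^{2} statistic above can be illustrated with a small calculation. The sketch below assumes the standard two-locus definition, with D computed from the four haplotype frequencies and r^{2} = D^{2} divided by the product of the four allele frequencies. The function name `ld_r2` and the example frequencies are hypothetical and are not taken from the study, which used PLINK for this step.

```python
def ld_r2(f_a1b1, f_a1b2, f_a2b1, f_a2b2):
    """Return (D, r2) for two bi-allelic loci from haplotype frequencies.

    Assumes the four haplotype frequencies sum to 1 and both loci segregate.
    """
    total = f_a1b1 + f_a1b2 + f_a2b1 + f_a2b2
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    # Allele frequencies are the marginal sums of the haplotype frequencies.
    p_a1 = f_a1b1 + f_a1b2
    p_a2 = 1.0 - p_a1
    p_b1 = f_a1b1 + f_a2b1
    p_b2 = 1.0 - p_b1

    denom = p_a1 * p_a2 * p_b1 * p_b2
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")

    # D: coupling haplotypes minus repulsion haplotypes.
    d = f_a1b1 * f_a2b2 - f_a1b2 * f_a2b1
    return d, d * d / denom


# Hypothetical example: complete LD versus linkage equilibrium.
d_full, r2_full = ld_r2(0.5, 0.0, 0.0, 0.5)
d_none, r2_none = ld_r2(0.25, 0.25, 0.25, 0.25)
print(r2_full, r2_none)
```

With only the two coupling haplotypes present, r^{2} reaches its maximum of 1. With all four haplotypes equally frequent, D and r^{2} are both 0.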
The following linear model was used to estimate the marker effects:

$$Y = \mu + X\beta + \varepsilon \qquad [2]$$

Where:
Y: phenotypic value.
μ: population mean.
X: marker design matrix.
β: vector of marker effects.
ε: error term, assumed to be normally distributed with mean 0 and variance σ^{2}.
The estimator of β is:

$$\hat{\beta} = (X'X + \lambda I)^{-1} X'Y$$

Where:
λ: regularization parameter.
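As a rough illustration of a regularized estimator of this kind, the sketch below assumes the usual ridge form, β̂ = (X'X + λI)^{-1}X'Y, applied to column-centred genotypes coded 0, 1 or 2. The function name `ridge_marker_effects`, the simulated data and the values of λ are hypothetical. The study itself fitted its models with the Gibbs sampler in BGLR, not with this closed form.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Closed-form ridge estimate of marker effects.

    X holds allele counts (individuals x markers), y holds phenotypes and
    lam is the regularization parameter. Columns of X are centred, so the
    returned intercept is the phenotypic mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    intercept = y.mean()
    Xc = X - X.mean(axis=0)
    n_markers = X.shape[1]
    lhs = Xc.T @ Xc + lam * np.eye(n_markers)
    rhs = Xc.T @ (y - intercept)
    return intercept, np.linalg.solve(lhs, rhs)


# Hypothetical data: 200 individuals, 20 markers coded as 0/1/2 allele counts.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 20)).astype(float)
true_beta = rng.normal(0.0, 0.5, size=20)
y = 10.0 + X @ true_beta + rng.normal(0.0, 1.0, size=200)

_, beta_unshrunk = ridge_marker_effects(X, y, lam=0.0)
_, beta_ridge = ridge_marker_effects(X, y, lam=50.0)
print(np.linalg.norm(beta_unshrunk), np.linalg.norm(beta_ridge))
```

A larger λ shrinks the estimated effects towards zero, which is the behaviour expected when all markers share a common variance.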
The elements of X for each individual depended on the number of alleles present in its genotype. For example, per i
Ridge regression best linear unbiased predictor (RR-BLUP) assumes all markers have a common variance (Meuwissen
Where:
p(β_{j} | θ_{βj}, σ^{2}): prior density of the j^{th} marker effect.
p(θ_{βj} | ω): prior density assigned to θ_{βj}.
Meuwissen ν; S^{2}) with degrees of freedom ν and scale parameter S^{2} as the prior distribution. BayesB assumes a normal prior distribution on the marker effects with zero mean and variance σ_{j}^{2}. A mixture distribution is then assumed for this variance: it equals zero with probability π and is distributed as in BayesA with probability 1 - π. BayesC was proposed to address some deficiencies of BayesB, such as the estimation of the probability π and the mixture distribution, which in BayesC is applied to the SNP effects instead of the variances. In a comparison using simulated data, Bayes BLUP, BayesA, BayesB and BayesC had the same predictive ability, with correlations over 0.85 (Verbyla et al. 2010). Park and Casella (2008) introduced the Bayesian LASSO method for estimating regression coefficients, and De los Campos et al. (2009) used it in genomic selection (GS). The LASSO estimates can be viewed as the posterior mode in a Bayesian model with a double-exponential prior on the regression coefficients. The investigated scenarios (each repeated 10 times) and statistical methods are summarized in Table 1.
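The BayesB prior described in this paragraph can be sketched by drawing from it directly. The code assumes that the BayesA component is a scaled inverse chi-square distribution, drawn as ν·S^{2} divided by a chi-square variate with ν degrees of freedom. The function name `draw_bayesb_effects` and the values of π, ν and S^{2} are hypothetical. This shows only the prior as worded in the text; it is not the BGLR implementation or a Gibbs sampler.

```python
import numpy as np


def draw_bayesb_effects(n_markers, pi, nu, s2, rng):
    """Draw marker variances and effects from a BayesB-style prior.

    Each variance is zero with probability pi and otherwise follows a
    scaled inverse chi-square distribution with nu degrees of freedom
    and scale s2 (the BayesA prior).
    """
    # Scaled inverse chi-square draw: nu * s2 / chi-square(nu).
    variances = nu * s2 / rng.chisquare(nu, size=n_markers)

    # Mixture step: set the variance to zero with probability pi.
    is_zero = rng.random(n_markers) < pi
    variances[is_zero] = 0.0

    # Effects are normal with mean zero and the marker-specific variance.
    effects = np.sqrt(variances) * rng.standard_normal(n_markers)
    return variances, effects


# Hypothetical settings: 5000 markers, 90% of them with no effect.
rng = np.random.default_rng(42)
variances, effects = draw_bayesb_effects(5000, pi=0.9, nu=4.2, s2=0.01, rng=rng)
print((effects == 0.0).mean())
```

Setting `pi=0.0` removes the point mass at zero, so every marker receives a BayesA-type variance.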
The correlation coefficient between the true breeding values (BV) and the genomic predicted BV (r
N_{QTL} + N_{IND} + interaction effects + ε [3]
Where:
μ: overall mean,
ε: random error.
The statistical analysis of all main and interaction effects was conducted using the GLM procedure of SAS software (SAS, 2003).

The expected accuracy of genome-wide selection has been anticipated as a function of the training population size (N), the heritability of the trait (h^{2}) and the number of independent chromosome segments (M_{e}) (Daetwyler et al. 2008):

$$r = \left[\frac{N h^{2}}{N h^{2} + M_{e}}\right]^{1/2} \qquad [2]$$

M_{e} is a function of the breeding history of the population and of the length of the genome. The objective of this research was to investigate the accuracy of GEBV under various underlying genetic architectures using several Bayesian methods.
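The expected-accuracy expression attributed above to Daetwyler et al. (2008) can be evaluated for the three training population sizes in this study. The sketch assumes the commonly cited form r = [N h^{2} / (N h^{2} + M_{e})]^{1/2}. The function name `expected_accuracy` and the value M_{e} = 300 are hypothetical placeholders, since the text does not report M_{e}.

```python
import math


def expected_accuracy(n_train, h2, m_e):
    """Expected accuracy r = sqrt(N h2 / (N h2 + Me)).

    n_train is the training population size, h2 the heritability and
    m_e the number of independent chromosome segments.
    """
    if n_train <= 0 or m_e <= 0 or not 0.0 < h2 <= 1.0:
        raise ValueError("need n_train > 0, m_e > 0 and 0 < h2 <= 1")
    information = n_train * h2
    return math.sqrt(information / (information + m_e))


# Training sizes and heritabilities from the study; Me = 300 is an assumption.
M_E_ASSUMED = 300
grid = {
    (n, h2): expected_accuracy(n, h2, M_E_ASSUMED)
    for n in (600, 1000, 1600)
    for h2 in (0.1, 0.3, 0.5)
}
for (n, h2), r in sorted(grid.items()):
    print(f"N={n:5d}  h2={h2:.1f}  expected r={r:.2f}")
```

Under this expression, accuracy rises with both N and h^{2}, which matches the direction of the heritability and population size effects reported in the abstract.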
The mean values of ^{2}. The largest gap between SNPs (12.18 cM) was located on chromosome 4. The highest and lowest numbers of SNPs, and therefore the highest and lowest mean r^{2}, were located on chromosomes 1 and 4, respectively (Figures 1a and 1b). A sufficient average LD over the entire genome is necessary for accurate estimation in genomic selection and whole-genome association studies. Calus et al. (2007) demonstrated that accurate genomic breeding values could be obtained if the mean r^{2} between adjacent SNPs was > 0.2. In Holstein-Friesian cattle, an r^{2} of 0.2 occurs at approximately 100 kb, suggesting that 30000 markers should be sufficient to apply genomic selection. The extent of genome-wide LD depends considerably on the past effective population size. In a simulation study, Meuwissen et al. (2001) demonstrated that 10N_{e}L markers are required to obtain very accurate genomic estimated breeding values, where L is the length of the genome in Morgans and N_{e} is the effective population size. In Holstein-Friesian cattle, N_{e} is approximately 100 and the length of the genome is 30 Morgans, again suggesting that 30000 markers are required. In species with large effective population sizes, dense marker panels will be required. Provided that the number of markers is sufficient (i.e. LD = 0.2, as obtained in the current study), the accuracy of GEBV will depend on the number of individuals genotyped and phenotyped in the reference population, the heritability of the trait and the number of loci affecting the trait (Daetwyler et al. 2008; Goddard, 2009).
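The 10N_{e}L rule of thumb quoted above is simple arithmetic, and the short sketch below reproduces the Holstein-Friesian figure from the text. The function name `required_markers` is hypothetical.

```python
def required_markers(effective_size, genome_length_morgans, factor=10):
    """Marker count suggested by the 10 * Ne * L rule of thumb."""
    return factor * effective_size * genome_length_morgans


# Holstein-Friesian values quoted in the text: Ne of about 100, L = 30 Morgans.
holstein_markers = required_markers(100, 30)
print(holstein_markers)  # 30000
```

The same function can be called with other effective population sizes to see how quickly the required marker density grows.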
Table 3 shows the results of the analysis of variance for accuracy and implies that the effects of all main factors (method, heritability, number of QTL and number of individuals in each generation of the training population) and all interaction effects, except Method × N_{QTL} × h^{2}, were significant (P<0.05).
Figure 1b: visualization of pairwise LD estimates versus marker distance
According to the F values in Table 3, the descending order of the main factors in terms of importance was heritability, reference population size, number of QTL and the estimating method. Among the interaction effects, the effects containing the reference population size had higher importance.
Figure 2 presents the plots of correlations (R) between true breeding value and GEBV obtained for the validation population, for the different heritability (plot N_{IND} the accuracy from 150 QTL was the lowest and that from 50 QTL was the highest. The efficiency of increasing the number of animals was higher with 50 QTL than with 150 QTL. With an N_{IND} of 800, a different situation was observed and the accuracy from 100 QTL was the lowest.
Investigation of the accuracy of genomic prediction of the standard marker effects using method BayesB showed that in the case of N
Figure 2 panels: a) different numbers of individuals per generation and heritabilities; b) different numbers of individuals per generation and numbers of QTL; c) different numbers of individuals per generation and Bayesian methods; d) Bayesian methods and numbers of QTL
However, the trend differed slightly among methods, which indicated an interaction between training population size and estimation method. In the study by Clark
Although sufficient LD is essential for high accuracy, other factors relating to population structure and the genetic architecture of the trait are also important. The results of this study indicated that, among the well-known Bayesian methods for genomic prediction, the methods introduced by Meuwissen (BayesB and BayesA) had the highest accuracies in most scenarios. Therefore, among the Bayesian methods, we propose these methods, especially BayesB, for estimating marker effects because of its more realistic prior density for marker effects. The economically important traits involved in breeding programs vary in their heritability and number of QTL. In both traditional and genomic methods, accuracy is higher for traits with high heritability than for traits with low heritability, because gene effects contribute little to phenotypic variation in low-heritability traits. Increasing the number of records (training population size) led to higher accuracies because, with more records, the estimated marker effects were more accurate, and applying these effects to the testing population gave more accurate GEBV.

References

Bastiaansen J.W., Bink M.C., Coster A., Maliepaard C. and Calus M.P. (2010). Comparison of analyses of the QTLMAS XIII common dataset. I: genomic selection.

Calus M. and Veerkamp R. (2007). Accuracy of breeding values when using and ignoring the polygenic effect in genomic breeding value estimation with a marker density of one SNP per cM.

Clark S.A., Hickey J.M. and Van der Werf J.H. (2011). Different models of genetic variation and their effect on genomic evaluation.

Coster A., Bastiaansen J.W., Calus M.P., van Arendonk J.A. and Bovenhuis H. (2010). Sensitivity of methods for estimating breeding values using genetic markers to the number of QTL and distribution of QTL variance.

Daetwyler H.D., Pong-Wong R., Villanueva B. and Woolliams J.A. (2010). The impact of genetic architecture on genome-wide evaluation methods.

Daetwyler H.D., Villanueva B., Bijma P. and Woolliams J.A. (2007). Inbreeding in genome wide selection.

Daetwyler H.D., Villanueva B. and Woolliams J.A. (2008). Accuracy of predicting the genetic risk of disease using a genome-wide approach.

De Los Campos G., Hickey J.M., Pong-Wong R., Daetwyler H.D. and Calus M.P. (2013). Whole-genome regression and prediction methods applied to plant and animal breeding.

De Los Campos G., Naya H., Gianola D., Crossa J., Legarra A., Manfredi E., Weigel K. and Cotes J.M. (2009). Predicting quantitative traits with regression models for dense molecular markers and pedigree.

Dekkers J. (2007). Prediction of response to marker assisted and genomic selection using selection index theory.

Dekkers J. (2012). Application of genomics tools to animal breeding.

Gianola D., Fernando R.L. and Stella A. (2006). Genomic-assisted prediction of genetic value with semiparametric procedures.

Goddard M. (2009). Genomic selection: Prediction of accuracy and maximisation of long term response.

Habier D., Fernando R. and Dekkers J. (2007). The impact of genetic relationship information on genome-assisted breeding values.

Hayes B., Bowman P., Chamberlain A. and Goddard M. (2009). Invited review: Genomic selection in dairy cattle: Progress and challenges.

Heslot N., Yang H.P., Sorrells M.E. and Jannink J.L. (2012). Genomic selection in plant breeding: a comparison of models.

Lorenzana R.E. and Bernardo R. (2009). Accuracy of genotypic value predictions for marker-based selection in biparental plant populations.

Luan T., Woolliams J.A., Lien S., Kent M., Svendsen M. and Meuwissen T.H. (2009). The accuracy of genomic selection in Norwegian red cattle assessed by cross-validation.

Moser G., Tier B., Crump R.E., Khatkar M.S. and Raadsma H.W. (2009). A comparison of five methods to predict genomic breeding values of dairy bulls from genome-wide SNP markers. Genet. Sel. Evol. 41, 56.

Muir W. (2007). Comparison of genomic and traditional BLUP estimated breeding value accuracy and selection response under alternative trait and genomic parameters.

Park T. and Casella G. (2008). The Bayesian lasso.

Pérez P. and De Los Campos G. (2014). Genome-wide regression and prediction with the BGLR statistical package.

Piyasatian N. and Dekkers J. (2013). Accuracy of genomic prediction when accounting for population structure and polygenic effects.

Tibshirani R. (1996). Regression shrinkage and selection via the lasso.

VanRaden P.M. and Sullivan P.G. (2010). International genomic evaluation methods for dairy cattle.

Verbyla K.L., Bowman P.J., Hayes B.J. and Goddard M.E. (2010). Sensitivity of genomic selection to using different prior distributions.

Wientjes Y.C., Veerkamp R.F., Bijma P., Bovenhuis H., Schrooten C. and Calus M.P. (2015). Empirical and deterministic accuracies of across population genomic prediction. Genet. Sel. Evol. 47, 5.
