MULTIPLE IMPUTATION MIGAMMI ALGORITHM
Abstract
Missing data are common in multi-environment experiments, however sophisticated those experiments are. It is therefore essential to use appropriate methods of analysis to reduce the impact of the lost information. Data imputation is one of the most common techniques for handling missing values: it replaces the missing data with plausible values, and the analyses are then carried out on the completed data. This work proposes a new multiple imputation method for data from multi-environment trials, based on the simple residuals of a linear regression model. The single imputation algorithm EM-AMMI was modified to accommodate the generalized additive main effects and multiplicative interaction (GAMMI) model. The quality of the multiple imputation method was evaluated using a general accuracy statistic, which combines the variance among imputations and the mean square deviation, and the normalized root mean square error (NRMSE). To this end, missing values were simulated at random at rates of 10%, 20%, 30%, and 40% in two real data sets, and the corresponding imputations were obtained. The low overall mean accuracy and NRMSE values obtained with the proposed method indicate the high quality of the proposed multiple imputation algorithm, MIGAMMI.
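The evaluation scheme described in the abstract (remove values at random at a given rate, impute them, and score the imputations with NRMSE) can be sketched roughly as follows. This is an illustrative sketch only, not the MIGAMMI algorithm: the column-mean imputer is a stand-in, the toy table holds hypothetical values, the function names are invented for this example, and NRMSE is assumed here to be the root mean square error divided by the standard deviation of the removed true values, which is one common normalization and may differ from the one used in the article.

```python
import math
import random
import statistics

# Hypothetical genotype-by-environment table (6 genotypes x 4 environments).
TOY_TABLE = [
    [4.1, 5.3, 3.8, 6.2],
    [4.6, 5.9, 4.0, 6.8],
    [3.9, 5.1, 3.5, 5.7],
    [5.0, 6.4, 4.4, 7.1],
    [4.3, 5.6, 3.7, 6.5],
    [4.8, 6.0, 4.2, 6.9],
]


def mask_at_random(table, rate, rng):
    """Copy `table`, set about `rate` of its cells to None, and return the removed positions."""
    cells = [(i, j) for i, row in enumerate(table) for j in range(len(row))]
    removed = rng.sample(cells, round(rate * len(cells)))
    masked = [list(row) for row in table]
    for i, j in removed:
        masked[i][j] = None
    return masked, removed


def column_mean_impute(masked):
    """Stand-in imputer (NOT MIGAMMI): fill each gap with its column mean."""
    observed = [v for row in masked for v in row if v is not None]
    grand_mean = statistics.fmean(observed)
    filled = [list(row) for row in masked]
    for j in range(len(masked[0])):
        column = [row[j] for row in masked if row[j] is not None]
        # Fall back to the grand mean if an entire column was removed.
        fill = statistics.fmean(column) if column else grand_mean
        for row in filled:
            if row[j] is None:
                row[j] = fill
    return filled


def nrmse(true_values, imputed_values):
    """RMSE of the imputations divided by the standard deviation of the true values (assumed normalization)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    return math.sqrt(statistics.fmean(squared_errors)) / statistics.stdev(true_values)


rng = random.Random(0)  # fixed seed so the run is repeatable
for rate in (0.10, 0.20, 0.30, 0.40):
    masked, removed = mask_at_random(TOY_TABLE, rate, rng)
    filled = column_mean_impute(masked)
    truth = [TOY_TABLE[i][j] for i, j in removed]
    guess = [filled[i][j] for i, j in removed]
    print(f"{rate:.0%} missing: NRMSE = {nrmse(truth, guess):.3f}")
```

Lower NRMSE values mean the imputed values lie closer to the removed ones. In a multiple imputation setting, the same scoring would be repeated over each imputed data set, with the variance among imputations assessed separately.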
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.