ANALYSIS OF MULTINOMIAL DATA WITH OVERDISPERSION DIAGNOSTICS AND APPLICATION


Maria Letícia SALVADOR
Eduardo Elías RIBEIRO JUNIOR
César Augusto TACONELI
Idemauro Antonio Rodrigues LARA

Abstract

In agronomic experiments, polytomous response variables are common, and the generalized logit model can be used to analyze such data. One characteristic of the generalized logit model is the assumption that the variance is a known function of the mean, so the observed variance is expected to be close to the one assumed by the model. However, extra-multinomial variation is not uncommon: the observed data are often systematically more heterogeneous than the variance specified by the model allows, a phenomenon known as overdispersion. In this context, the present work discusses the diagnosis of overdispersion in multinomial data, proposes a descriptive measure for this problem, and presents a methodological alternative through the Dirichlet-multinomial model. The descriptive measure is evaluated through simulation under two particular scenarios. As a motivating study, we report an experiment in fruit growing whose objective was to compare the flowering of adult orange trees grafted on “Rangpur” lime or “Swingle” citrumelo, with the response variable being the classification of branches into three categories: lateral flower, no flower or aborted flower, and terminal flower. The proposed descriptive measure showed evidence of overdispersion, indicating that the generalized logit model may not be the most appropriate. Thus, as a methodological alternative, the Dirichlet-multinomial model was used. Compared to the generalized logit model, the Dirichlet-multinomial model proved more suitable for fitting the overdispersed data, since it includes an additional parameter to accommodate the excess extra-multinomial dispersion.
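The abstract does not spell out the model equations, so the following is only a minimal sketch, assuming the standard Dirichlet-multinomial parameterisation: each experimental unit has m trials over K categories, its probability vector is drawn from a Dirichlet with concentration parameters alpha_1, ..., alpha_K (sum alpha0), and the mean probabilities are pi_j = alpha_j / alpha0. Under that assumption Var(Y_j) = m pi_j (1 - pi_j) (m + alpha0) / (1 + alpha0), i.e. the multinomial variance inflated by a factor that the extra parameter controls. The overdispersion check used here is a generic Pearson X^2 / (n(K - 1)) ratio with known probabilities; it is NOT the descriptive measure proposed in the article. All function names and the numbers in the demo (m = 20 branches, alpha = [2, 3, 5]) are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a probability vector p ~ Dirichlet(alpha) by normalising gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_multinomial(m, probs, rng):
    """Draw category counts for m independent trials with fixed probabilities."""
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=m):
        counts[k] += 1
    return counts


def sample_dirichlet_multinomial(m, alpha, rng):
    """Dirichlet-multinomial: each unit gets its own Dirichlet probability vector."""
    return sample_multinomial(m, sample_dirichlet(alpha, rng), rng)


def dm_variance_factor(m, alpha0):
    """Factor by which Var(Y_j) exceeds the multinomial m*pi_j*(1 - pi_j)."""
    return (m + alpha0) / (1.0 + alpha0)


def pearson_dispersion(samples, m, probs):
    """Pearson X^2 over all units and categories divided by n*(K - 1).

    With known (not estimated) probabilities, each unit contributes K - 1 to
    X^2 on average under the multinomial, so the ratio is close to 1; under
    the Dirichlet-multinomial it is close to dm_variance_factor(m, alpha0).
    """
    x2 = 0.0
    for counts in samples:
        for y, p in zip(counts, probs):
            expected = m * p
            x2 += (y - expected) ** 2 / expected
    return x2 / (len(samples) * (len(probs) - 1))


if __name__ == "__main__":
    rng = random.Random(2022)
    m, alpha = 20, [2.0, 3.0, 5.0]  # hypothetical: 20 branches per unit, 3 categories
    alpha0 = sum(alpha)
    probs = [a / alpha0 for a in alpha]  # Dirichlet mean = multinomial probabilities
    n = 5000
    multi = [sample_multinomial(m, probs, rng) for _ in range(n)]
    dm = [sample_dirichlet_multinomial(m, alpha, rng) for _ in range(n)]
    print("multinomial ratio:", round(pearson_dispersion(multi, m, probs), 2))
    print("Dirichlet-multinomial ratio:", round(pearson_dispersion(dm, m, probs), 2))
    print("theoretical factor:", round(dm_variance_factor(m, alpha0), 2))
```

Both simulated data sets share the same mean probabilities, so any difference in the ratio comes only from the extra between-unit variation. Note that the inflation factor equals 1 when m = 1 and grows with m, which is why overdispersion only becomes visible when several trials (here, branches) are observed per unit.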

Article Details

How to Cite
SALVADOR, M. L., RIBEIRO JUNIOR, E. E., TACONELI, C. A., & LARA, I. A. R. (2022). ANALYSIS OF MULTINOMIAL DATA WITH OVERDISPERSION DIAGNOSTICS AND APPLICATION. Brazilian Journal of Biometrics, 40(3). https://doi.org/10.28951/bjb.v40i3.584
