Procrustes analysis, multivariate regression, variable selection and outlier detection in compositional data for social vulnerability
Abstract
Vulnerability denotes fragility or weakness in people, objects, situations and ideas. People considered “socially vulnerable” are those who have lost their representation in society and generally depend on help from third parties to survive. The main characteristics of this vulnerability are precarious housing and sanitation, a lack of means of subsistence, and the absence of a family environment. Among its different types, notable ones are the vulnerability of young people in the area of health and vulnerability arising from marginalization, exclusion and territory. The Social Vulnerability Index (SVI) is built from indicators of income and social disadvantage across dimensions such as identification, housing, education, income, poverty, family, work and other assets. Variable selection consists of finding the subset of variables that best explains a response vector without losing relevant information. Procrustes analysis is a method for determining how well a subset of variables represents the structure of the original data. Compositional data are quantitative descriptions of the parts of a whole and convey only relative information. Principal components are mutually uncorrelated linear combinations of all the original variables, estimated so that each successive component retains the largest possible share of the remaining total variance. Univariate outliers are observations whose value on a single variable differs greatly from the others; multivariate outliers are observations that are atypical when two or more variables are considered jointly. In this work, we use the Procrustes method together with multivariate regression methods to select variables formed from compositional data, after detecting multivariate outliers with the Mahalanobis distance and the comedian approach.
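Because compositional data carry only relative information, multivariate tools such as covariance matrices, Mahalanobis distances and principal components are usually applied to log-ratio coordinates rather than to raw proportions. The abstract does not say which transformation the authors used; the sketch below assumes the centered log-ratio (clr) and an isometric log-ratio (ilr) built from a Helmert-type orthonormal basis, which is one common choice among several. All function names are hypothetical.

```python
import numpy as np


def closure(x):
    """Rescale each row of a positive matrix so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)


def clr(x):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=1, keepdims=True)


def helmert_basis(D):
    """Orthonormal (D-1) x D basis of the zero-sum (clr) hyperplane."""
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i              # average of the first i parts ...
        V[i - 1, i] = -1.0                  # ... contrasted with the next part
        V[i - 1] *= np.sqrt(i / (i + 1.0))  # normalise the row to unit length
    return V


def ilr(x):
    """Isometric log-ratio coordinates: clr vectors expressed in the basis."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ helmert_basis(x.shape[1]).T
```

For example, `ilr(np.array([[20.0, 30.0, 50.0]]))` returns a 1 x 2 array of unconstrained coordinates, and rescaling the row (counts instead of shares) leaves the result unchanged. Since the basis is orthonormal, Euclidean distances between ilr coordinates equal those between clr vectors.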
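The outlier-detection step can be sketched by comparing the classical Mahalanobis distance (sample mean and covariance) with a robust version built on the comedian matrix, whose entries are COM(X_j, X_k) = med[(X_j - med X_j)(X_k - med X_k)]. The raw comedian matrix is not guaranteed to be positive definite or consistent for the covariance, so this sketch re-estimates robust variances along its eigenvectors. That refinement, the MAD constant 1.4826, the 0.975 chi-square cutoff and the Wilson-Hilferty approximation are assumptions for illustration, not details taken from the article. `X` is assumed to hold ilr coordinates or other unconstrained real data, because the clr covariance matrix is singular.

```python
import numpy as np
from statistics import NormalDist

MAD_TO_SD = 1.4826  # makes the MAD consistent for sigma under normality


def chi2_quantile(p, df):
    """Wilson-Hilferty approximation of the chi-square quantile (no SciPy)."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * np.sqrt(h)) ** 3


def mahalanobis_sq(X, center, scatter):
    """Squared Mahalanobis distance of every row of X from `center`."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(scatter), diff)


def comedian_matrix(X):
    """COM[j, k] = median over rows of (X_j - med X_j) * (X_k - med X_k)."""
    dev = X - np.median(X, axis=0)
    return np.median(dev[:, :, None] * dev[:, None, :], axis=0)


def comedian_estimates(X):
    """Robust location/scatter: comedian eigenvectors plus robust variances."""
    center = np.median(X, axis=0)
    _, E = np.linalg.eigh(comedian_matrix(X))  # columns are eigenvectors
    Z = (X - center) @ E                       # data in the eigenbasis
    z_med = np.median(Z, axis=0)
    mad = MAD_TO_SD * np.median(np.abs(Z - z_med), axis=0)
    center = center + E @ z_med                # map robust centre back
    scatter = E @ np.diag(mad ** 2) @ E.T
    return center, scatter


def flag_outliers(X, alpha=0.025, robust=True):
    """Return squared distances and a boolean outlier flag for each row."""
    X = np.asarray(X, dtype=float)
    if robust:
        center, scatter = comedian_estimates(X)
    else:  # classical mean and sample covariance
        center, scatter = X.mean(axis=0), np.cov(X, rowvar=False)
    d2 = mahalanobis_sq(X, center, scatter)
    return d2, d2 > chi2_quantile(1.0 - alpha, X.shape[1])
```

When a cluster of outliers is present, it inflates the sample covariance, so the classical distances of the outlying rows shrink (masking); the comedian-based distances are much less affected by those rows.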
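Procrustes-based variable selection can be sketched with Krzanowski's backward elimination. First compute the scores Y of the first k principal components from all variables. Then, at each step, drop the variable whose removal gives the smallest Procrustes statistic M² = tr(YY') + tr(ZZ') - 2 tr(Σ), where Z holds the first k principal-component scores of the remaining variables and Σ holds the singular values of Z'Y. The values of k and `n_keep`, and the use of column-centred but unscaled data, are assumptions for illustration; the abstract does not report the authors' settings.

```python
import numpy as np


def pc_scores(X, k):
    """Scores on the first k principal components of column-centred X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


def procrustes_m2(Y, Z):
    """M^2 = tr(YY') + tr(ZZ') - 2 * (sum of singular values of Z'Y)."""
    sv = np.linalg.svd(Z.T @ Y, compute_uv=False)
    return np.sum(Y ** 2) + np.sum(Z ** 2) - 2.0 * np.sum(sv)


def backward_select(X, k, n_keep):
    """Drop, one at a time, the variable whose removal gives the smallest M^2."""
    X = np.asarray(X, dtype=float)
    if not k <= n_keep <= X.shape[1]:
        raise ValueError("need k <= n_keep <= number of variables")
    Y = pc_scores(X, k)  # reference configuration from all variables
    kept = list(range(X.shape[1]))
    history = []  # (dropped variable index, M^2 after dropping it)
    while len(kept) > n_keep:
        trials = []
        for j in kept:
            subset = [v for v in kept if v != j]
            trials.append((procrustes_m2(Y, pc_scores(X[:, subset], k)), j))
        m2, drop = min(trials)
        kept.remove(drop)
        history.append((drop, m2))
    return kept, history
```

Because M² is the residual sum of squares after the best rotation or reflection of Z onto Y, it is zero when a removed variable carries no structure (for example, a constant column), and it grows as the subset departs from the full-data configuration.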
This work is licensed under a Creative Commons Attribution 4.0 International License.