A PROPOSAL FOR PRINCIPAL COMPONENT ANALYSIS IN THE PRESENCE OF NON-RANDOM VARIABLES
Abstract
Exploratory principal component (PC) analysis does not require the variables to follow a multivariate normal distribution, nor even to be random, so variables that do not behave randomly can also be included. To carry out PC analysis regardless of whether the variables are random, a correction of the matrix based on the coefficients of variation (Campana et al., 2010) was proposed, applying the method of Lenth (1989); the resulting matrix was named . To verify its feasibility, ten data sets of the random variables Y1, Y2, Y3 and Y4 were simulated, each with 10,000 values following a multivariate normal distribution. After the simulation, 0%, 1%, 2%, 3% and 4% of the random values of Y4 were replaced by outliers in order to break its randomness. Response surface analyses were then performed for eight mean absolute percentage errors, one for each of eight parameters related to the performance of the PC analysis, as a function of the percentage of Y4 values replaced by outliers (0, 1, 2, 3 and 4) and of the matrix used in the PC analysis. According to the results, it was concluded that, in the presence of only normal random variables, it is the best matrix. On the other hand, when there are outliers, it is the most recommended.
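The simulation design described above can be sketched as follows. This is only an illustrative sketch, not the authors' code: the mean vector, the covariance matrix, the seed and the way outliers are generated (a constant placed 10 standard deviations above the mean of Y4) are all assumptions, since the abstract does not state them. The sketch compares only the ordinary covariance and correlation matrices; it does not implement the corrected matrix proposed in the article.

```python
import numpy as np


def simulate(n=10_000, outlier_frac=0.0, seed=0):
    """Draw Y1..Y4 from a multivariate normal, then overwrite a
    fraction of the Y4 values with outliers."""
    rng = np.random.default_rng(seed)
    # Assumed parameters; the abstract does not give them.
    mean = np.array([10.0, 20.0, 30.0, 40.0])
    cov = np.array([
        [4.0, 1.0, 0.5, 0.5],
        [1.0, 9.0, 1.5, 1.0],
        [0.5, 1.5, 16.0, 2.0],
        [0.5, 1.0, 2.0, 25.0],
    ])
    y = rng.multivariate_normal(mean, cov, size=n)

    n_out = int(round(outlier_frac * n))
    if n_out > 0:
        idx = rng.choice(n, size=n_out, replace=False)
        # Assumed outlier mechanism: a constant 10 SD above the mean of Y4.
        y[idx, 3] = mean[3] + 10.0 * np.sqrt(cov[3, 3])
    return y


def explained_variance(matrix):
    """Proportion of total variance carried by each PC (largest first)."""
    eigvals = np.linalg.eigvalsh(matrix)[::-1]  # eigvalsh sorts ascending
    return eigvals / eigvals.sum()


for pct in (0, 1, 2, 3, 4):
    y = simulate(outlier_frac=pct / 100)
    ev_cov = explained_variance(np.cov(y, rowvar=False))
    ev_cor = explained_variance(np.corrcoef(y, rowvar=False))
    print(f"{pct}% outliers  covariance PC1: {ev_cov[0]:.3f}  "
          f"correlation PC1: {ev_cor[0]:.3f}")
```

Under these assumptions, the printed proportions show how the share of variance attributed to the first PC shifts as more of Y4 is contaminated, which is the kind of sensitivity the study quantifies through percentage errors.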
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.
References
CAMPANA, A. C. M.; RIBEIRO JÚNIOR, J. I.; NASCIMENTO, M. Uma proposta de transformação de dados para a análise de componentes principais. Revista Brasileira de Biometria, v. 28, p. 1-15, 2010.
FERREIRA, D. F. Estatística multivariada. 2.ed. Lavras: Editora UFLA, 2009. 676p.
HOTELLING, H. Review of the triumph of mediocrity in business. Journal of the American Statistical Association, v. 28, p. 463-465, 1933.
JOHNSON, R. A.; WICHERN, D. W. Applied multivariate statistical analysis. 5.ed. New Jersey: Prentice Hall, 2002. 767p.
LAWSON, J. SAS macros for analysis of unreplicated 2^k and 2^(k-p) designs with a possible outlier. Journal of Statistical Software, v. 25, p. 1-17, 2008.
LENTH, R. V. Quick and easy analysis of unreplicated factorials. Technometrics, v. 31, p. 469-473, 1989.
MINGOTI, S. A. Análise de dados através de métodos de estatística multivariada: uma abordagem aplicada. Belo Horizonte: Editora UFMG, 2007. 297p.
R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing, 2020. URL https://www.r-project.org