Principal Component Analysis and Exploratory Factor Analysis of the Mechanical Waves Conceptual Survey

The Mechanical Waves Conceptual Survey (MWCS) is a measurement tool established by the physics education research (PER) community to evaluate conceptual understanding of mechanical waves. A validation study is still needed to determine the factor structure of the MWCS; this paper does so using two data reduction techniques, exploratory factor analysis (EFA) and principal component analysis (PCA). The MWCS dataset analyzed in this paper was gathered from physics students (n = 419) at nineteen Ugandan secondary schools. The findings suggest that a single factor, rather than the theoretical multi-factor structure, emerged from the dataset explored in this study. Several issues involving the inter-item correlations within the dataset are suspected as the leading cause of the missing component solution and unstable loadings, and other issues remain open for future exploration. The findings reported in this paper can inform further discussion of the validity of the MWCS as a research-based assessment (RBA) for measuring students' conceptual understanding of mechanical waves within PER studies.


Introduction
A mechanical wave is one of the physics topics discussed in curricula from secondary school to college-level courses. A solid understanding of this topic is needed to study other physics concepts further. Therefore, a conceptual inventory measuring students' understanding of this topic should be developed, either to gauge how well the wave concept has been grasped or to examine the effectiveness of physics learning reforms.
Exploring the validity of a research-based assessment (RBA) should be carried out routinely: the longer the PER community grows, the more studies should investigate the characteristics of its established RBAs (Ding & Beichner, 2009). Formerly, the validity of the MWCS was explored using classical item analysis, as reported by Tongchai et al. (2009). Further research with different analytical frameworks should therefore be conducted to evaluate the psychometric characteristics of the MWCS. Evidence of validity should be accumulated continuously to support the claim that the MWCS is a valid conceptual survey. The MWCS can be regarded as a sufficient RBA in its current form; however, further study of its psychometric characteristics can provide additional evidence of its validity.
To enrich the value of the MWCS, this paper explores the factor structure of the MWCS through its pretest and post-test datasets using two data reduction techniques, principal component analysis (PCA) and exploratory factor analysis (EFA). Through PCA, we can learn how the structure of the analyzed dataset forms the factor structure that should belong to the MWCS measurement. Then, following the PCA with EFA, we examine whether the formed factors classify the items into the expected sets. Nevertheless, in some literature the definitions and uses of these techniques are vague and interchangeable. There has been a long-lasting debate on the pros and cons of PCA versus EFA, with some arguing in favor of PCA (Steiger, 1979; Velicer & Jackson, 1990a, 1990b) and others in favor of EFA (Bentler & Kano, 1990; Gorsuch, 2010; Mulaik, 1990). Distinctions between the two are primarily theoretical; in research practice, the selection does not seem to affect empirical findings or substantive conclusions (Arrindell & van der Ende, 1985; Mulaik, 1990; Velicer et al., 1982). This study used both methods and found that they are complementary rather than interchangeable. The insights reported in this paper contribute in two ways. First, they expand the current evidence for the validity of the MWCS as a measurement tool disseminated by the PER community. Second, the difference between PCA and EFA still deserves discussion, so this study can offer broader insight into these analytical frameworks as used in validation studies.
The current version of the MWCS is based on the open-ended instrument developed by Wittmann (1998) in his dissertation project. From Wittmann's (1998) point of view and related studies, i.e., Wittmann et al. (1999), Wittmann (2002), and Wittmann et al. (2003), students' difficulties have been documented through several open-ended items. Nevertheless, multiple-choice items are the preferable format for administering a conceptual survey in large-scale settings. The MWCS was crafted as a multiple-choice test based on these previous investigations of students' difficulties with waves in four constructs: propagation, superposition, reflection, and standing waves. Tongchai et al. (2009) report the test development phase of the original MWCS and its standard psychometric analysis. The MWCS was pilot tested on 632 Australian students (high school to second-year college) and 270 Thai high school students. Table 1 describes the original construct of the MWCS as initially proposed by Tongchai et al. (2009). The MWCS comprises 22 items, of which 17 follow the conventional multiple-choice format with different numbers of responses (five items with five responses, one item with three responses, five items with four responses, four items with six responses, and two items with eight responses). In addition to the multiple-choice (MC) format, the MWCS also uses five two-tiered (TT) items, which ask students to give an answer and the scientific reasoning behind it in two stages.

Table 1 (fragment): item 19, TT, 3 and 4 options; item 20, MC, 6 options; item 21, TT, 3 and 5 options; item 22, TT, 3 and 5 options.

In this paper, we decided to use the original version of the MWCS reported by Tongchai et al. (2009), even though Barniol & Zavala (2016) proposed a modified version. We see potential room for further exploring the factor structure of the original MWCS of Tongchai et al. (2009), which seems unknown from the available literature within the community. We acknowledge, however, that this question could be studied further using a dataset collected with the modified version of the MWCS.

http://journal.uinjkt.ac.id/index.php/jp3i This is an open access article under the CC-BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)

To the best of our knowledge, Tongchai et al. (2009), Barniol & Zavala (2016), and most recently Kanyesigye et al. (2022) have employed only classical analyses in their validation studies of the MWCS, including item difficulty, discrimination index, and internal reliability estimation. Despite classical test theory's drawbacks in characterizing measurement quality, it is undeniably easier to conduct and interpret, and most educational practice still relies on it. Tongchai et al. (2009) reported that the original version of the MWCS possessed sufficient item difficulty, discrimination indices, and reliability estimates, with only slight shifts across study contexts. The difficulty indices ranged between 0.2 and 0.8, as recommended by the conventional rule. Likewise, the discrimination indices proved acceptable for distinguishing higher- and lower-performing students within a class. These were supplemented by a sufficient Cronbach's alpha reliability value estimated from a sample of 902 students.
One can argue that evidence for the validity of a measurement tool is tied to the perspective through which the tool has been analyzed; comparing evidence across analytical frameworks is therefore always needed. Factor analysis is one of the psychometric analyses that can examine how the constructs in Table 1 are represented by the MWCS dataset (Ding & Beichner, 2009; Hair et al., 2018). Through factor analysis, the factor structure of several RBAs within the PER community has been characterized to describe their validity as conceptual physics measurements, e.g., the Force Concept Inventory (FCI) (Eaton & Willoughby, 2018; Semak et al., 2017), the Force and Motion Conceptual Evaluation (FMCE) (Wells et al., 2020), and the Colorado Learning Attitudes about Science Survey (CLASS) (Kontro & Buschhüter, 2020). However, to the best of our knowledge, no such evidence has been provided for the construct validity of the MWCS. Using the two reduction techniques reported in this paper, we can evaluate PCA and EFA findings to explore the items' ability to capture the structure of students' conceptual knowledge about mechanical waves.

Methods
The dataset analyzed in this paper was downloaded from the Mendeley® cloud-based repository, where it is hosted as an open data resource; thus, everyone can access the data legally under the CC BY 4.0 license. The MWCS dataset was recently reported by Kanyesigye et al. (2022), a study conducted in southwest Uganda to evaluate problem-based learning in physics education. Data were gathered from February to April 2021 at nineteen Ugandan secondary schools. The Mechanical Waves Conceptual Survey (MWCS) (Tongchai et al., 2009), Views about Science Survey (VASS) (Halloun & Hestenes, 2002), and Reformed Teaching Observation Protocol (RTOP) (Sawada et al., 2002) were reported as measures of the three variables probed in their study. For this paper, we used only the MWCS dataset among those reported by Kanyesigye et al. (2022); interested readers are encouraged to take up the other datasets in further studies. The dataset explored in this paper includes students' scores on the MWCS pretest (n = 239, mean = 6.87, SD = 2.38) and the MWCS post-test (n = 419, mean = 10.42, SD = 3.00) separately. Table 2 lists the attributes, data types, and explanations of each attribute as demographic aspects that represent the context of the dataset. The different numbers of pretest and post-test records are driven by the different class assignments discussed in Kanyesigye et al. (2022).
The initial stage prior to analysis, for researchers working with secondary data sources, is data preparation. The first thing to note was missing data in several places in the Kanyesigye et al. (2022) dataset. Fortunately, missing values accounted for less than 5% of the data. To address this, an imputation method drawing on the possible responses (Table 1) was chosen, and the missing values were filled randomly (Kempf-Leonard, 2004). Subsequently, students' scores were coded based on the answer key provided in the dataset: each dichotomous response was coded as one for a correct answer and zero otherwise. The results were then filed on a separate sheet to simplify the working process. In addition, we studied the characteristics of the data to understand the context of the participants described in the Kanyesigye et al. (2022) dataset; Table 3 summarizes the demographic description of the MWCS pretest and post-test scores by class, gender, age, major, school status, and school ownership. The factor structure of the MWCS was explored with principal component analysis (PCA) in the R programming language, employing several open-source packages: "readxl" (Wickham et al., 2019), "psych" (Revelle et al., 2015), and "factoextra" (Kassambara et al., 2016). These packages are freely available to every scholar in any field, place, and time. The "readxl" package was used to import the MWCS data filed in .xlsx format into an R data frame. The "psych" package, a reliable library for psychometricians, was the primary tool for the PCA in this study, and "factoextra" supplied the visualization plots so the reported results could be clearly delivered.
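The imputation and dichotomous coding steps described above can be sketched in a few lines. The study itself worked in R; the following is a minimal Python illustration with a hypothetical four-item answer key (the item labels, option sets, and student responses are invented for the example, not taken from the MWCS).

```python
import random

# Hypothetical answer key and option sets for a 4-item excerpt;
# "" marks a missing response (the real MWCS has 22 items).
KEY = {"Q1": "B", "Q2": "D", "Q3": "A", "Q4": "C"}
OPTIONS = {"Q1": "ABCDE", "Q2": "ABCD", "Q3": "ABCDE", "Q4": "ABCDEF"}

def impute_missing(responses, rng):
    """Fill each missing answer with a random valid option for that item."""
    return {q: (r if r else rng.choice(OPTIONS[q])) for q, r in responses.items()}

def score_dichotomous(responses):
    """Code each response as 1 (correct) or 0 (incorrect)."""
    return {q: int(responses[q] == KEY[q]) for q in KEY}

rng = random.Random(0)          # seeded for reproducibility
student = {"Q1": "B", "Q2": "", "Q3": "A", "Q4": "E"}
filled = impute_missing(student, rng)
scores = score_dichotomous(filled)
```

Random imputation is defensible here only because the missing fraction is small; the imputed item (Q2) may score either 0 or 1 depending on the draw.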
After running the PCA, EFA was performed to examine the theoretical construct of the MWCS (Table 1) in the same R script, equipped with additional packages complementing the PCA technique, i.e., "corpcor" (Schaefer et al., 2012), "ggplot2" (Wickham, 2017), "MASS" (Ripley et al., 2019), "MVN" (Korkmaz et al., 2014), and "psy" (Falissard, 2012). Some of these packages were supplementary, providing summarized information or additional visualizations for better readability of this paper.

Students' Response Patterns within the MWCS Pretest and Post-Test Dataset
The mean score of the MWCS pretest was 6.87 with a standard deviation of 2.38, and that of the MWCS post-test was 10.42 with a standard deviation of 3.00, out of the 22 items in the MWCS. Recall that there are five two-tier questions (see Table 1); students receive a score only if they answer both the item and the reasoning correctly. Normality can be evaluated from the skewness, which should be less than 2, and the kurtosis, which should be less than 7 (Kline, 2011). By this rule, both the MWCS pretest and post-test datasets are approximately Gaussian distributed. Table 4 summarizes the students' responses in the MWCS pretest and post-test sessions and reports the proportion distribution of responses; proportions are preferable since they are more comparable in the discussion below. We treat the pretest and post-test datasets as separate sources of information to preserve the nature of each test session. Conceptual change can be discovered from the shift of the most-selected response between the two sessions. For instance, on the first MWCS item most students (30%) answered "C" in the pretest, which is incorrect; in the post-test, most of them (53%) succeeded in shifting to the correct answer ("B"). Nevertheless, eight MWCS items (14, 15, 16, 17, 18, 19, 20, and 22) remained difficult for students even in the post-test session. Before PCA can be performed on the MWCS pretest and post-test datasets, a statistical assumption must be examined: Bartlett's test of sphericity. The null hypothesis states that the correlation matrix is an identity matrix, i.e., that it is impossible to reduce the dimensions of the MWCS dataset. Table 5 reports the results of Bartlett's sphericity test. The p-value is less than 0.05; thus, the null hypothesis is rejected.
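Bartlett's sphericity statistic is computed from the determinant of the correlation matrix. A minimal sketch, assuming the standard chi-square form of the test; the 3-item correlation matrix below is illustrative, not the actual MWCS values.

```python
import math

def det(m):
    """Determinant by cofactor expansion (fine for small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += ((-1) ** j) * m[0][j] * det(minor)
    return total

def bartlett_sphericity(R, n):
    """Chi-square statistic and df for Bartlett's test of sphericity.

    H0: the correlation matrix is an identity matrix (items uncorrelated),
    i.e. the data cannot be meaningfully reduced; p < .05 rejects H0.
    """
    p = len(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * math.log(det(R))
    df = p * (p - 1) // 2
    return chi2, df

# Illustrative 3-item correlation matrix.
R = [[1.0, 0.3, 0.2],
     [0.3, 1.0, 0.4],
     [0.2, 0.4, 1.0]]
chi2, df = bartlett_sphericity(R, n=419)
```

Note that the statistic grows with the sample size n, so even weakly correlated items can reject the null hypothesis in a large sample, which is consistent with the pattern discussed below.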
This result indicates that PCA could, in principle, reduce the dimensionality of the MWCS dataset. The next fundamental factor affecting the overall data reduction results of PCA is the inter-item correlation matrix: highly correlated items (variables) are more likely to form components. Figure 1 shows that the MWCS pretest data cannot be categorized as highly correlated. The figure is a heatmap visualization with circle representation, which we prefer for its easier interpretation. The "perfect" correlations lie on the main diagonal of Figure 1. Reading the heatmap is supported by the continuous color spectrum on the right side of the image, spread across red and blue: the deeper the red or blue, the higher the correlation, whereas a faded circle, or one the same color as the page, indicates no substantial correlation between the items. Figure 1 indicates that the MWCS dataset reported by Kanyesigye et al. (2022) lacks correlation among the items, which may interfere with the overall results of the PCA and even the EFA discussed below. The same pattern is found in the MWCS post-test data with no substantive difference. Therefore, neither the pretest nor the post-test data has sufficient inter-item correlation to form the factors in Table 1. We excluded the post-test correlation matrix to keep this article readable, because its results do not differ significantly. PCA can then be calculated in three ways: through the "prcomp" function, singular value decomposition ("svd"), and "princomp". The results of the three methods are reported in Table 6 and show no substantive differences.
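The inter-item correlation matrix underlying Figure 1 is simply the Pearson correlation computed on the 0/1 item scores (for dichotomous data this is the phi coefficient). A minimal sketch with hypothetical score vectors, not the actual MWCS responses:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def corr_matrix(items):
    """Inter-item correlation matrix; for 0/1 scores this is phi."""
    p = len(items)
    return [[pearson(items[i], items[j]) for j in range(p)] for i in range(p)]

# Hypothetical 0/1 score vectors for three items over six students.
items = [
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0],
]
R = corr_matrix(items)
```

When most off-diagonal entries of such a matrix are near zero, as Figure 1 suggests for the MWCS data, there is little shared variance for PCA or EFA to summarize.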
Table 6 provides only the first five principal components (PCs). We obtained 27 PCs from the MWCS dataset, which does not match the expectation based on Table 1. Whether computed by "prcomp", "svd", or "princomp", the leading components report a small cumulative proportion (Pr.K), less than the fifty percent recommended by the conventional rule. Because of the low Pr.K values, these components cannot adequately represent students' understanding of mechanical waves. Alavi et al. (2020) argue that PCA places more emphasis on data reduction than on interpretation; therefore, the interpretation of these PCs should be treated carefully, and we argue that the MWCS dataset tends to form a single-factor structure (see Figures 2 and 3 below). Unsurprisingly, this indicates that the MWCS dataset is less able to validate the theoretical construct described in Table 1. Instead, we suspect this immediate single-factor pattern is driven by the lack of inter-item correlation noted earlier (Figure 1). Furthermore, we can group by variables (items) in Figure 3, which is essentially what EFA does in the next part of our analysis. Accordingly, a single dimension is also demonstrated in the aggregation of variables; this figure essentially lets clustering emerge in the structure of the MWCS items. The variance explained by the emerging dimensions is not as expected for either the first or the second dimension (less than fifty percent, as recommended by the conventional rule). Hence, interpreting these dimensions to examine the theoretical construct of Tongchai et al. (2009) would be problematic to report for wider contexts. In short, the dataset of Kanyesigye et al. (2022) is problematic for providing empirical validity for the theoretical foundations (Table 1), and we can confirm that the percentages on the y-axes are below the fifty percent recommended by the conventional rule. Therefore, our PCA result cannot represent the MWCS dataset sufficiently.
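For a PCA on a correlation matrix, component k's share of variance is its eigenvalue divided by the number of items p. The two-item case makes the dependence on inter-item correlation explicit; the correlation values below are illustrative, not the actual MWCS figures.

```python
def pca_proportions_2x2(r):
    """For a 2x2 correlation matrix [[1, r], [r, 1]], the eigenvalues are
    1 + |r| and 1 - |r|; dividing by p = 2 gives each PC's variance share."""
    eig = sorted([1 + abs(r), 1 - abs(r)], reverse=True)
    return [lam / 2 for lam in eig]

# Weakly correlated items: PC1 explains barely more variance than a single
# standardized item would, so no dominant component emerges.
weak = pca_proportions_2x2(0.1)
# Strongly correlated items: PC1 dominates and data reduction is worthwhile.
strong = pca_proportions_2x2(0.8)
```

This is why the low inter-item correlations in Figure 1 translate directly into the small Pr.K values of Table 6: with near-zero correlations, eigenvalues stay close to 1 and no small set of components can absorb most of the variance.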

Exploratory Factor Analysis (EFA) and MWCS Dataset
Factor analysis approaches data reduction from a fundamentally different perspective than PCA. Factor analysis is a measurement model of latent variables: a latent construct cannot be directly measured by a single variable, i.e., students' understanding of mechanical waves in our case, so latent variables are measured through their causal relationships with the MWCS items. In this section, we report the EFA results for the MWCS dataset using two extraction methods: principal component analysis (PCA) and principal axis factoring (PAF). The shared "PCA" terminology can be misleading, as if this extraction method and the earlier analysis shared standard features; they are theoretically distinct. Two rotation methods are also examined: "varimax" (an orthogonal rotation) and "oblimin" (an oblique rotation).
Several assumptions must be examined prior to the factor analysis. Some have been reported previously at the PCA stage and will not be discussed here again. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is higher than 0.5, and the diagonal values of the anti-image correlation matrix are greater than 0.5. Therefore, we can proceed to reduce the MWCS dataset to its factor structure with EFA.
To extract the constructs explained by the MWCS dataset, EFA can also use PCA as its extraction method; however, this extraction is different from what was executed earlier. PCA assumes no outliers in the data, while EFA under maximum likelihood extraction assumes a multivariate normal distribution. In contrast to PCA, EFA decomposes an "adjusted" correlation matrix whose diagonal has been adjusted for unique factors. The amount of variance explained equals the trace of this matrix, i.e., the sum of the communalities. Factors represent the shared variance in the dataset. Squared multiple correlations (SMC) are used as initial communality estimates on the diagonal, and each observed variable is modeled as a linear combination of common and unique factors.
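The SMC-based communality estimate can be illustrated in the two-item case, where the diagonal of the inverse correlation matrix has a closed form and the SMC reduces to r². This is a sketch of the general rule SMC_i = 1 - 1/(R⁻¹)_ii, not the actual MWCS computation.

```python
def smc_two_items(r):
    """Initial communality estimate via squared multiple correlation:
    SMC_i = 1 - 1/(R^-1)_ii. For [[1, r], [r, 1]] the inverse has
    diagonal 1/(1 - r**2), so the SMC reduces to r**2 for both items."""
    inv_diag = 1.0 / (1.0 - r * r)  # diagonal entry of the inverse matrix
    return 1.0 - 1.0 / inv_diag

# Weakly correlated items start with tiny communalities, leaving most of
# their variance "unique" rather than shared with a common factor.
weak_communality = smc_two_items(0.3)
strong_communality = smc_two_items(0.8)
```

This again ties the weak inter-item correlations to the EFA outcome: with SMCs near zero on the adjusted diagonal, there is little common variance for any extracted factor to explain.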
EFA is generally used to investigate the empirical structure of a construct and to estimate the instrument's reliability; it is recommended for researchers who do not have a hypothesis about the instrument's structure (Kline, 2011). Several extraction methods can determine the number of factors to be formed: principal component analysis (PCA), principal axis factoring (PAF), maximum likelihood (canonical factoring), alpha factoring, and image factoring. In this paper, the authors performed the two extraction methods most often chosen in previous studies (Eaton & Willoughby, 2018; Scott et al., 2012; Semak et al., 2017). These methods try to determine the smallest number of factors that can represent the variability of the original variables that is related to the factors (in contrast with the PCA above, which sought components reflecting the total variability of the variables). PCA and EFA should yield comparable results when the variables are highly correlated and/or the number of original variables is very large. Four principal components (PCs) are reported in Table 7 as the selected number of factors extracted with the EFA technique; this number was determined based on the theoretical construct proposed by Tongchai et al. (2009) in Table 1. Table 6 above reports five PCs, but the two sets are theoretically distinct: in Table 7, the number of factors extracted by the EFA method must be determined prior to the calculation. Our determination was made from the theoretical construct in Table 1, though we must admit that EFA rests on an exploratory rather than confirmatory assumption. We can then evaluate the results by considering Cum. V (cumulative variance), as discussed for Table 6 above. Table 7 shows that four components cannot explain the variance of the MWCS dataset (less than 50%, as recommended by the literature).
In summary, our findings suggest that the MWCS dataset possesses a single factor and fails to form the four constructs expected theoretically. Table 7 also presents the results of the two extraction methods, PCA and PAF; we use "PC" for the PCA results and "PA" (principal axis) for the PAF extraction. Contrasting the two reveals common features in several places, with no significant difference in mean item complexity. The "SS loadings" row is the sum of squared loadings, sometimes used to judge the worth of a particular factor: a factor is worth keeping if its SS loading is greater than 1. In our case, the PCA results possess values greater than 1, while only one "PA" satisfies the cutoff. The "Pr. Var" row reflects the proportion of variance explained by a particular factor, and the variance explained by our four PCs and PAs is small. In Table 8, the varimax and oblimin rotation methods find the same mean item complexity, though a small difference occurs for oblimin on the MWCS post-test; the oblimin method obtains slightly (non-significantly) lower results than varimax in all aspects (SS loadings, Pr. Var, cumulative variance, Pr. Exp, and cumulative proportion). These findings align with the observation that different factor-analytic frameworks can be applied to the same data and yield similar results (Yanai & Ichikawa, 2006).
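The SS-loading retention rule mentioned above is easy to compute from a loading matrix: square each loading and sum down the columns. The 4-item by 2-factor loading matrix below is hypothetical, chosen only to show the cutoff at work.

```python
def ss_loadings(loadings):
    """Sum of squared loadings per factor; a factor is conventionally
    retained when its SS loading exceeds 1 (it then explains more variance
    than a single standardized item would)."""
    n_factors = len(loadings[0])
    return [sum(row[k] ** 2 for row in loadings) for k in range(n_factors)]

# Hypothetical loading matrix (rows = items, columns = factors);
# not actual MWCS loadings.
L = [[0.7, 0.1],
     [0.6, 0.2],
     [0.5, 0.3],
     [0.4, 0.2]]
ss = ss_loadings(L)
kept = [k for k, v in enumerate(ss) if v > 1.0]
```

Here only the first factor clears the cutoff, mirroring the single-factor pattern reported for the MWCS dataset.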

Internal Reliability of the MWCS Dataset (Cronbach's α)
This study calculated Cronbach's alpha to estimate the reliability of the MWCS instrument. Neither the pretest nor the post-test MWCS can be regarded as adequately reliable (α = 0.352 for the pretest and α = 0.284 for the post-test). These unsatisfactory values are one indication that the factor structure solution generated in this article cannot be generalized to broader contexts. Internal consistency measures such as Cronbach's alpha depend heavily on the item-total correlations, which are in turn indirectly affected by the inter-item correlations. As seen in Figure 1, the inter-item correlations are problematic in the Kanyesigye et al. (2022) dataset. In addition, discrepancies or student misconceptions within the sample are entirely possible.

Exploring the MWCS Construct using PCA and EFA on the Dataset Reported by Kanyesigye et al. (2022)
This study aims to extract the components or factors in the MWCS dataset in order to examine, empirically, the MWCS construct proposed by Tongchai et al. (2009) using the dataset reported by Kanyesigye et al. (2022). We can start this discussion from the descriptive distribution of the MWCS pretest and post-test responses in Table 4 above, where the numbers in bold mark the answer keys that respondents should choose. There is a conceptual shift in the respondent group from pretest to post-test. This can be explained rationally because the dataset comes from a problem-based learning experiment, but that effect is outside the purpose of this study. Starting with the first item, respondents were evenly distributed across the answers (A-D) in the pretest and tended to choose option B in the post-test, which implies that item 1 works well. However, some items fail to be good items.
Consider the fifteenth item, where participants' answers are not dominated by the correct option: in the post-test, students congregate on option B, an incorrect answer. Two explanations are plausible. First, the item may fail to be understood by the students, so that even those with high ability choose the wrong answer; such failure can be driven by grammatical errors or by confusing distractors. Second, this finding can be related to students' misconceptions at the class level, in which case physics teachers must evaluate their instruction. Misconceptions usually occur in minority groups in a classroom; misconceptions in large groups indicate incorrect content delivery or even errors in item construction.
PCA results should describe the number of components into which the MWCS dataset can be reduced. The variance explained by the components is summarized in Table 6 for the three applied methods. Some literature suggests that the total variance explained by all retained components should be between 70 and 80 percent of the item variance; another argument holds that social studies can use a cutoff of 50-60 percent as acceptable for inferring the final components. Either way, the results in the Pr. K (cumulative proportion) column are inadequate. The components expected from the ideas proposed by Tongchai et al. (2009) correspond to the four MWCS topics (propagation, superposition, reflection, and standing waves) (Table 1). The MWCS dataset provided by Kanyesigye et al. (2022) fulfilled several analytical assumptions, such as data normality, Bartlett's sphericity test, and the KMO sampling adequacy test. However, as explained above, the dataset is problematic in the most basic ingredient of factor analysis, the inter-item correlation matrix (Figure 1). The correlation problem is also related to the reliability value, which falls short of the statistical literature's requirement of an alpha greater than 0.7. These issues invite other researchers to examine these data using more advanced analyses, such as item response theory (Hansen & Stewart, 2021; Smith et al., 2020; Stewart et al., 2018).
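The Cronbach's alpha values discussed above follow directly from the item and total-score variances. A minimal sketch on hypothetical 0/1 score vectors (three items, four students; not the actual MWCS data):

```python
def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances)/variance(totals)),
    using population variances throughout."""
    k = len(items)
    n = len(items[0])

    def var(x):
        m = sum(x) / len(x)
        return sum((v - m) ** 2 for v in x) / len(x)

    totals = [sum(item[s] for item in items) for s in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Hypothetical dichotomous scores: rows are items, columns are students.
scores = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
alpha = cronbach_alpha(scores)
```

Because the denominator is the variance of the total score, alpha rises only when items covary; weakly correlated items keep it well below the 0.7 threshold, exactly the situation reported for the MWCS dataset.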
Several extraction and rotation methods for EFA were compared in Table 8. Although the results differ somewhat from the earlier PCA (the data reduction technique, not the extraction method within EFA), the PCA extraction method still shows a cumulative proportion below the minimum suggested by the literature and not expected from the findings of Tongchai et al. (2009). We can therefore state that PCA and EFA reached similar conclusions about the factor structure of the MWCS dataset. Nevertheless, EFA cannot replace the merit of PCA as a data reduction technique; the two rest on different theoretical frameworks and are complementary.
We find that the theoretical construct of the MWCS cannot be empirically satisfied based on the Kanyesigye et al. (2022) dataset. Several issues can drive this finding; one fundamental factor is the inadequately correlated items within the MWCS pretest and post-test datasets. Furthermore, one can argue that the number of respondents should be increased in cases of low correlation (Schreiber, 2021). This seems to contradict the KMO sampling adequacy test, which claimed that the data used in this study are adequate (Table 5). However, the next question is whether more data points with the same instrument would guarantee MWCS reliability. According to Schreiber (2021), before deciding to collect more data, we should return to the nature of the measurement itself: valid measurements are supported by valid data, and valid data are gathered from good items. As explained earlier, some MWCS items do not work correctly, which several aspects can influence. Further research should be planned to diagnose them through more advanced analysis.
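The KMO sampling adequacy measure referred to in this discussion compares raw correlations with partial correlations obtained from the inverse correlation matrix. The following sketch computes the overall KMO for a small illustrative matrix (the correlation values are invented, not the MWCS estimates):

```python
import math

def invert(m):
    """Gauss-Jordan inverse (no pivoting; adequate for small,
    well-conditioned correlation matrices like the one below)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = a[col][col]
        a[col] = [v / piv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [row[n:] for row in a]

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure: sum of squared off-diagonal
    correlations divided by that sum plus the squared partial correlations."""
    n = len(R)
    inv = invert(R)
    r2 = q2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                r2 += R[i][j] ** 2
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                q2 += partial ** 2
    return r2 / (r2 + q2)

# Illustrative 3-item correlation matrix.
R = [[1.0, 0.3, 0.2],
     [0.3, 1.0, 0.4],
     [0.2, 0.4, 1.0]]
value = kmo(R)
```

Because KMO is a ratio of correlations to partial correlations rather than a function of their absolute size, a dataset can pass the adequacy cutoff while its correlations remain too weak to support a stable factor solution, which may explain the apparent contradiction noted above.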

Conclusion
Investigating the psychometric characteristics of a research-based assessment (RBA) is an ongoing process of validity studies aimed at better measurement. In this paper, the MWCS, one of the conceptual inventories within the PER community, was examined for its construct validity using two frameworks, PCA and EFA. We found no substantial evidence to empirically confirm the established theoretical factor structure of the MWCS through these frameworks. We suspect that the most fundamental factor directly affecting our PCA and EFA findings is the inadequate inter-item correlation matrix that emerged from the employed dataset; the low correlations can contribute to the unstable solutions for the components or factors extracted by PCA and EFA.