Abstract
The Kaiser-Meyer-Olkin (KMO) index is a measure of sampling adequacy used by researchers to assess whether a data matrix is factorable prior to a factor analysis. Since its conception, the KMO index has remained a Frequentist statistic, leaving researchers unable to employ the advantages of Bayesian inference when assessing sampling adequacy. Building on the increasing relevance of the Bayesian statistical approach, as well as advancements in Markov-Chain Monte Carlo methods, the author proposes a re-conceptualization of the KMO index within the Bayesian framework that enables researchers to incorporate prior information and make coherent probabilistic statements about the sampling adequacy of a data matrix.
Keywords: Kaiser-Meyer-Olkin index, KMO, Bayesian Kaiser-Meyer-Olkin index, BKMO, Measure of Sampling Adequacy, MSA, Bayesian Measure of Sampling Adequacy, BMSA, Robust KMO index, Bootstrap KMO index, Bayesian inference, Likelihood-based inference, Frequentist inference
The Kaiser-Meyer-Olkin (KMO) index, formally referred to as a Measure of Sampling Adequacy (MSA, Kaiser 1970, 1981; Kaiser and Rice 1974), is a statistic useful for determining whether a data matrix can be considered ‘appropriate’ for factor analysis (Dziuban and Shirkey 1974; Field 2018: 798; Revelle 2025). Named in honor of Henry F. Kaiser, Edward P. Meyer, and Ingram Olkin (cf. Dziuban and Shirkey 1974; Kaiser 1970: 405; Kaiser 1981: 380; Kaiser and Rice 1974: 112), it is often used to assess whether a data matrix is factorable (cf. Revelle 2025; e.g., Field 2018: 797-799), which is a desired property to ascertain prior to an exploratory factor analysis (EFA, cf. Field 2018: 778-833). While originally introduced as ‘a second generation Little Jiffy’ by Kaiser (1970), the currently most used iteration of the KMO index is the result of two subsequent revisions, including a collaboration between Kaiser and John Rice, titled ‘Little Jiffy, Mark IV’ (Kaiser and Rice 1974). Though it historically may have seen most use within psychology (e.g., Dziuban and Shirkey 1974; Ghadiri et al. 2015; Zabihi et al. 2014), often being used in conjunction with Maurice S. Bartlett’s (1950) test of sphericity (e.g., Dziuban and Shirkey 1974; Field 2018), to this day, it remains a standard measure in statistical software (e.g., IBM 2025; Revelle 2025) that helps researchers across disciplines gauge whether their data can be meaningfully explored using factor analysis (e.g., Ahmad et al. 2024; Almeida et al. 2022; Danaei et al. 2024; Sadowska et al. 2021; Sarfaraz et al. 2024).
Conceived within the classical statistical framework (for introductory texts, see, e.g., Field 2018; Stock and Watson 2019; Wooldridge 2019), the KMO index can be shown to be conceptually ‘Frequentist’ (cf. Kaiser 1970, 1981; Kaiser and Rice 1974), and a counterpart suitable for the Bayesian statistical framework (for introductory texts, see, e.g., Gelman et al. 2014, 2021; Kruschke 2014; Levy and Mislevy 2020; McElreath 2019) is currently missing. This is a relevant shortcoming, since recent advances in Bayesian inference (for a review, see, e.g., Gelman 2022) have made this framework more viable for applied research (e.g., Gelman et al. 1996, 2019; Hoffman and Gelman 2014; Vehtari et al. 2021), and considering the numerous criticisms of Frequentism (e.g., Clayton 2021; Cohen 1994; Gelman 2023; Gelman and Stern 2006; Gigerenzer 2004, 2018), transitioning to the Bayesian approach may be the best option (cf. Kruschke 2014; McElreath 2019; Wagenmakers et al. 2010). Accordingly, a multitude of Frequentist statistics have already been converted into Bayesian counterparts (e.g., Ben-Shachar et al. 2020; Gelman et al. 2019; Makowski et al. 2019a), and researchers acquainted with the KMO index may take issue with the lack of a Bayesian version that can properly exploit the numerous advantages of Bayesian inference, such as the ability to incorporate prior information into the analysis and make coherent probabilistic statements from a posterior distribution (cf. Clayton 2021; Gelman et al. 2014; Kruschke 2014; McElreath 2019).
This paper solves the issue of a missing Bayesian KMO index by re-conceptualizing it within the Bayesian framework and demonstrating its implementation using standard statistical software. To build towards a Bayesian concept, the current KMO index is first revisited (Section 2), discussing its mathematical and conceptual properties within the Frequentist, Likelihood-based, and Bayesian frameworks, as well as re-examining its robustness and viability for inference. The resulting Bayesian Kaiser-Meyer-Olkin (BKMO) index is then introduced (Section 3), which can be considered a Bayesian Measure of Sampling Adequacy (BMSA) that can incorporate prior information and express uncertainty about the sampling adequacy with posterior distributions. With weakly informative priors and small samples, the BKMO index is effectively estimated using regularization, and with ‘non-informative’ priors, the BKMO index approximates results derived from its Frequentist and Likelihood-based counterparts. Finally, the BKMO is discussed (Section 4): analogous to its classical counterpart, it can be made robust using rank-normalization, and similar to a bootstrapped version, it enables inferential statements to be made regarding the sampling adequacy of the data matrix.
To build towards a Bayesian conceptualization, it is necessary to first understand the classical and most widely used version of the KMO index, that is, ‘Little Jiffy, Mark IV’ (Kaiser and Rice 1974), including the mathematical formulas involved in its calculation. Readers should note that this paper uses the terms ‘KMO index’ and ‘Measure of Sampling Adequacy’ interchangeably, since the KMO index is a measure of the ‘sampling adequacy’ of a data matrix (Kaiser 1970, 1981; Kaiser and Rice 1974). This helps emphasize that the viability of the KMO index in applied research is highly dependent on the characteristics of the data matrix from which it is calculated. While explicitly specifying this data matrix is often neglected (e.g., Kaiser 1970, 1981; Kaiser and Rice 1974), this will be done here (Section 2.1) to reveal the assumptions of the KMO index and help make sense of its formula (Section 2.2) and interpretation (Section 2.3). After demonstrating its application (Section 2.4), these assumptions are discussed in relation to its robustness (Section 2.5) and viability for inference (Section 2.6).
Specifying the data matrix can be done using set theory (Burgess 2022[1948]; Cantor 1874) and standard notation (e.g., McElreath 2019; Wooldridge 2019) that, following Kruschke (2014), is made more mnemonic for the purposes of this paper. Suppose the existence of an \(n \cdot k\) data matrix (i.e., \(\mathcal{D}\)). Let \(n\) be the positive and finite total number of independent unit observations (i.e., \(n \in \mathbb{N}\)), randomly sampled to an adequate extent (i.e., \(n \gg 1\)) from a possibly larger population (i.e., \(N\), with \(N \geq n\)), which (in principle) could be infinitely large (i.e., \(N \to \infty\)), with \(u\) indexing an arbitrary individual \(u\)nit observation. Similarly, let \(k\) be the positive and finite total number of independently measured phenomena (i.e., \(k \in \mathbb{N}\)), theorized to be manifestations (i.e., \(\theta\)) of some latent phenomenon (i.e., \(\phi\)), with an arbitrary \(m\)anifestation indexed by \(m\), randomly sampled to an adequate extent (i.e., \(k \gg 1\)) from a possibly larger number (i.e., \(K\), with \(K \geq k\)) of manifestations (i.e., \(\Theta\), with \(\theta \subseteq \Theta\)), which (in principle) could be infinitely large (i.e., \(K \to \infty\)). This data matrix (\(\mathcal{D}\)), with \(\theta_{1,1}, \dots, \theta_{n,k}\), is illustrated in equation 2.1.1.
Assume then that both the latent phenomenon (\(\phi\)) and all of its manifestations (\(\theta\)) can be meaningfully expressed as real-valued vectors (\(\{\phi, \theta\} \subset \mathbb{R}\)), and that the latent phenomenon is independent and identically distributed (iid, cf. McElreath 2019: 81; Stock and Watson 2019: 81-82) as a conditionally Normal distribution (\(\mathcal{N}\), i.e., Gaussian, Gauss 2012[1809]; cf. Wooldridge 2019: 704-705) so that: \(\phi_u | \nu_u, \sigma^2 \overset{iid}{\sim} \mathcal{N}(\nu_u, \sigma^2)\), for \(u\) in \(1\), \(\dots\), \(n\) (cf. Gelman et al. 2014; see also Levy and Mislevy 2020: 88; Kruschke 2014; McElreath 2019).
\[ \mathcal{D} = \begin{bmatrix} \theta_{1,1} & \cdots & \theta_{1,k} \\ \vdots & \ddots & \vdots \\ \theta_{n,1} & \cdots & \theta_{n,k} \end{bmatrix} \tag{2.1.1} \] Further, assume that the latent phenomenon is physically distinct from, and temporally precedes in origin, all of its manifestations (i.e., \(\forall \theta_m \, (\phi \neq \theta_m \land \phi \prec \theta_m)\)), with every manifestation also being physically distinct from, and temporally simultaneous in origin with, every other manifestation (i.e., \(\forall \theta_m \, (\theta_m \neq \theta_{m'} \land \theta_m \not \prec \theta_{m'} \land \theta_m \not \succ \theta_{m'})\)), with \(m'\) denoting an arbitrary manifestation that is not \(m\). Lastly, assume that the latent phenomenon is exogenous to all other causes of the manifestations (i.e., \(\zeta\), so that \(\forall \zeta \, (\phi \perp \zeta)\)). In line with its standard conceptualization in factor analysis (Field 2018: 781-785; see also Levy and Mislevy 2020: 187-190), suppose also that the manifestations are endogenous and iid as a function of the latent phenomenon, with the following conditionally linear functional form: \(\theta_{u,m} | \tau_m, \lambda_m, \varsigma_m^2 \overset{iid}{\sim} \mathcal{N}(\tau_m + \lambda_m \cdot \phi_u, \varsigma_m^2)\), for \(u\) in \(1\), \(\dots\), \(n\), and \(m\) in \(1\), \(\dots\), \(k\), where \(\tau_m\) is the intercept, \(\lambda_m\) is the factor loading (i.e., coefficient), and \(\varsigma_m\) is the dispersion of manifestation \(m\). These assumptions are discussed in Section 2.6.
With this general data matrix (\(\mathcal{D}\)) just outlined in Section 2.1 in mind, the steps typically used to calculate the KMO index can be shown (cf. Kaiser 1981: 380; Kaiser and Rice 1974; consistent with Dziuban and Shirkey 1974: 359; IBM 2025; Revelle 2025). Conceptual considerations from the Frequentist, Likelihood-based, and Bayesian frameworks will be discussed throughout to build towards a Bayesian conceptualization.
First, let \(\text{Mean}(\theta_m)\) denote a piecewise defined function (cf. Stewart 2010: 16-17) for calculating the mean of manifestation \(m\) (cf. Field 2018: 26-27; Gelman et al. 2021: 41-42; Stock and Watson 2019: 82-83; Wooldridge 2019: 666-668):
\[ \text{Mean}(\theta_m) = \left\{ \begin{array}{lll} \hat{\mu}_m = \frac{1}{n} \sum_{u=1}^{n} \theta_{u,m} & \text{if Freq.} \land\ n < N & \text{(2.2.1.1)} \\ \mu_m = \frac{1}{n} \sum_{u=1}^{n} \theta_{u,m} & \text{if Freq.} \land\ n = N & \text{(2.2.1.2)} \\ \bar{\theta}_m = \frac{1}{n} \sum_{u=1}^{n} \theta_{u,m} & \text{if MLE-Bayes.} & \text{(2.2.1.3)} \end{array} \right. \tag{2.2.1} \]where \(\Sigma\) denotes the summation operation (Cajori 2007; Euler 1755), and the aforementioned \(u\) indexes the rows of \(\mathcal{D}\) (i.e., \(u\)nit observations), with \(n\) rows in total, while \(m\) indexes columns of \(\mathcal{D}\) (i.e., \(m\)anifestations). Here, the simplicity of function 2.2.1 helps express some of the conceptual differences between the Frequentist, Likelihood-based, and Bayesian frameworks, since the formulas used in each are mathematically equivalent. While all frameworks employ formulas that equivalently calculate the equally-weighted arithmetic mean of manifestation \(m\), the calculated mean can be conceptualized in at least three different ways. For example, a Frequentist researcher is conceptually interested in estimating the mean of the population (\(\mu_m\)). To achieve this, they conceptualize the formula used to arrive at \(\mu_m\) as an estimator (cf. Stock and Watson 2019: 105). The particular estimator of the population mean employed by formula 2.2.1.1 in function 2.2.1 can be denoted \(\hat{\mu}_{OLS}\) to reflect its association with the estimation method of ordinary least squares (OLS, Legendre 1805; Wooldridge 2019: 70-73). Accordingly, it is chosen not necessarily because it calculates the mean of the sample (i.e., \(\bar{\theta}_m\)), but because of its desirable properties across an (in principle) infinitely large number of samples, where it can be considered an unbiased estimator of the population mean when relying on OLS, because its expectation across repeated samples equals the population mean (i.e., \(\mathbb{E}(\hat{\mu}_{OLS[m]}) = \mu_m\), Stock and Watson 2019: 106-108). Accordingly, formula 2.2.1.1 calculates the estimated mean of the population (\(\hat{\mu}_m\)). For Frequentist researchers in possession of data on the entire population (i.e., \(n = N\)), the calculated mean is conceptually different from \(\hat{\mu}_m\), since the sample mean here is the population mean (i.e., \(\mu_m\)), enabling the use of formula 2.2.1.2. However, both of these conceptualizations are different from the Likelihood-based and Bayesian approaches.
By comparison, formula 2.2.1.3 in function 2.2.1 can be considered conceptually Bayesian, because it calculates the mean of the sample (i.e., \(\bar{\theta}_m\)). This is because in the event of no prior information, formula 2.2.1.3 provides the best possible estimate of the mean of manifestation \(m\) given the data. Here, the Bayesian framework is not inherently built around estimating population characteristics through some concept of drawing infinite samples from that population (cf. Clayton 2021; McElreath 2019), and accordingly, there is no need for varying formulas to distinguish between data from a sample or population. While 2.2.1.3 is here referred to as ‘Bayesian’, readers should note that this formula is only a point estimator (cf. Casella and Berger 2002: 311) of the sample mean and fails to incorporate prior information. This leaves the calculated mean of formula 2.2.1.3 conceptually equivalent to the mode of the likelihood function, which is conceptually of interest to the Likelihood-based approach (cf. King 1998). In this framework, the conceptual output of formula 2.2.1.3 is also preferred for function 2.2.1, since in a manner similar to the Frequentist approach, it can be considered a maximum-likelihood estimator (\(\hat{\mu}_{MLE}\)), which is unbiased under maximum-likelihood estimation. This means that the conceptual output of formula 2.2.1.3 could alternatively be denoted the maximum-likelihood estimate (Wooldridge 2019: 725-726). As such, formula 2.2.1.3 is only conceptually Bayesian in the context of no prior information, and it is otherwise mostly of interest to researchers subscribing to the likelihood-based framework.
It should be noted that while the likelihood and Bayesian approaches are conceptually distinct, for simplicity, when in the context of point estimators and no prior information, this paper makes no distinction between them and jointly refers to them as ‘MLE-Bayesians’. For ‘proper’ Bayesian inference, which is concerned with the incorporation of prior information and the estimation of the entire posterior distribution, more complex calculations are required, which typically involve Markov-Chain Monte Carlo methods (MCMC, see Brooks et al. 2011). This will be expanded upon when re-conceptualizing the KMO index within the Bayesian framework (see Section 3.1). Given the Frequentist formulas otherwise generally employed when calculating the KMO index (as will become apparent throughout this section), the conceptual mean involved in the calculation of the classical KMO index (Kaiser and Rice 1974) is taken to be the estimated mean of the population (\(\hat{\mu}_m\)), with formula 2.2.1.1 thus used for function 2.2.1.
Second, let \(\text{Var}(\theta_m)\) denote the function for calculating the variance of manifestation \(m\) (cf. Field 2018: 28-31; Gelman et al. 2021: 41-42; Stock and Watson 2019: 83-84; Wooldridge 2019: 695-696). Here, not just the conceptual differences between the Frequentist and Bayesian approaches but also their mathematical differences become apparent, since the three different conceptualizations of variance involve at least two mathematically distinct formulas, as shown in piecewise function 2.2.2. The three formulas are similar in that they all involve the calculation of the sum of squares (SS, cf. Field 2018: 29, 56-57) of manifestation \(m\), which can be conceptualized irrespective of statistical framework as: \(\sum_{u=1}^{n} (\theta_{u,m} - \text{Mean}(\theta_m))^2\). What makes the function ‘Frequentist’, ‘Likelihoodist’, or ‘Bayesian’ is the conceptualization of the formula employed by \(\text{Mean}(\theta_m)\) and the choice of how to weight the contributions of the observations. For example, formulas 2.2.2.1 and 2.2.2.2 are Frequentist, while 2.2.2.3 can be considered MLE-Bayesian.
\[ \text{Var}(\theta_m) = \left\{ \begin{array}{lll} \hat{\sigma}^2_m = \frac{1}{n - 1} \sum_{u=1}^{n} (\theta_{u,m} - \hat{\mu}_m)^2 & \text{if Freq.} \land\ n < N & \text{(2.2.2.1)} \\ \sigma^2_m = \frac{1}{n} \sum_{u=1}^{n} (\theta_{u,m} - \mu_m)^2 & \text{if Freq.} \land\ n = N & \text{(2.2.2.2)} \\ s^2_m = \frac{1}{n} \sum_{u=1}^{n} (\theta_{u,m} - \bar{\theta}_m)^2 & \text{if MLE-Bayes.} & \text{(2.2.2.3)} \end{array} \right. \tag{2.2.2} \]Formula 2.2.2.1 is again a Frequentist estimator (denoted \(\hat{\sigma}^2_{OLS}\)) that calculates the variance by dividing the SS by \(n - 1\), a divisor known as Bessel’s correction (cf. Field 2018: 29-30; Stock and Watson 2019: 112). This is a Frequentist adjustment based on the degrees of freedom (df) that makes the estimator unbiased for OLS (i.e., \(\mathbb{E}(\hat{\sigma}^2_{OLS[m]}) = \sigma^2_m\), cf. Field 2018: 58-59; Stock and Watson 2019: 112). This correction, however, is only applied when working with samples (i.e., \(n < N\)), and Frequentist researchers with data on the entire population (i.e., \(n = N\)) need not apply Bessel’s correction but can instead rely on formula 2.2.2.2, since the population mean (\(\mu_m\)) is known for such data (cf. Stock and Watson 2019: 112).
For conceptual clarity, it should be noted that Frequentist researchers may sometimes refer to \(\hat{\sigma}^2_m\) as the ‘sample variance’ (denoting it \(s^2_m\), e.g., Field 2018: 30; Gelman et al. 2021: 168; Stock and Watson 2019: 112), but this is inappropriate (cf. McElreath 2019: 197): \(\hat{\sigma}^2_m\) is not the sample variance (i.e., \(s^2_m\)) but instead an estimate of the population variance, and for any finite sample, \(s^2_m\) does not necessarily equal \(\hat{\sigma}^2_m\). Instead, the ‘sample variance’, that is, the variance in the sample (\(s^2_m\)), is calculated by formula 2.2.2.3, which despite being mathematically equivalent to formula 2.2.2.2 is conceptually different for the aforementioned reasons and thus preferred by Bayesian researchers (cf. McElreath 2019: 197).
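The practical difference between formulas 2.2.2.1 and 2.2.2.3 is easily verified numerically. The following minimal R sketch, using an arbitrary illustrative vector, contrasts the two and shows that base R’s `var()` applies Bessel’s correction by default:

```r
# Contrast the Bessel-corrected estimator (formula 2.2.2.1) with the
# sample variance (formula 2.2.2.3); x is an arbitrary illustrative vector
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
ss <- sum((x - mean(x))^2)    # sum of squares (SS)
var_ols <- ss / (n - 1)       # formula 2.2.2.1: estimated population variance
var_mle <- ss / n             # formula 2.2.2.3: sample (MLE) variance
all.equal(var_ols, var(x))    # TRUE: base R's var() applies Bessel's correction
```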
Frequentists and likelihoodists refer to the formula shared by 2.2.2.2 and 2.2.2.3 as the maximum-likelihood estimator (\(\hat{\sigma}^2_{MLE}\), cf. Casella and Berger 2002: 313), in contrast to the unbiased estimator for OLS (\(\hat{\sigma}^2_{OLS}\), cf. Field 2018: 29-30; Stock and Watson 2019: 112) used in formula 2.2.2.1. While formula 2.2.2.3 is generally preferred by MLE-Bayesians (cf. McElreath 2019: 197), a ‘proper’ Bayesian estimate of the variance will again typically involve MCMC methods to incorporate prior information and estimate the entire posterior distribution. Consistent with existing implementations (Dziuban and Shirkey 1974; IBM 2025; Revelle 2025), the formula generally used by the current KMO index (Kaiser and Rice 1974) for function 2.2.2 is based on formula 2.2.2.1, which outputs the estimated population variance (\(\hat{\sigma}^2_m\)). As such, the current KMO index (cf. Kaiser and Rice 1974) can be considered conceptually Frequentist, and the lack of a Bayesian or Likelihood-based conceptualization of the KMO index is a detriment to researchers working within those frameworks who seek to utilize the KMO index to assess ‘sampling adequacy’ prior to a factor analysis. The fact that the classical KMO index is conceptually Frequentist is further corroborated in the following.
Let \(\text{SD}(\theta_m)\) then denote the piecewise function for calculating the standard deviation of manifestation \(m\) (cf. Field 2018: 28-31; Gelman et al. 2021: 41-42; Stock and Watson 2019: 84; Wooldridge 2019: 696). This is illustrated in function 2.2.3, where the radical (i.e., \(\sqrt{}\)) denotes the square-root operation (Descartes 1637; Rudolff 1525). Similar to the variance, the Frequentist and Bayesian approaches differ with respect to \(\text{SD}(\theta_m)\), as shown by conceptual disagreement about the output of function 2.2.3. Formulas 2.2.3.1 and 2.2.3.2 are again Frequentist, with 2.2.3.1 being the result of relying on formula 2.2.2.1 for calculating the variance (function 2.2.2), while 2.2.3.2 relies on formula 2.2.2.2. Formula 2.2.3.1 is again an estimator (i.e., \(\hat{\sigma}_{OLS}\)) and thus calculates the estimated standard deviation in the population (i.e., \(\hat{\sigma}_m\)), while 2.2.3.2 calculates the population standard deviation (i.e., \(\sigma_m\)), with the former being used when in possession of a sample, while the latter is reserved for population data, where the population mean (\(\mu_m\)) is known.
\[ \text{SD}(\theta_m) = \left\{ \begin{array}{lll} \hat{\sigma}_m = \sqrt{\hat{\sigma}^2_m} & \text{if Freq.} \land n < N & \text{(2.2.3.1)} \\ \sigma_m = \sqrt{\sigma^2_m} & \text{if Freq.} \land n = N & \text{(2.2.3.2)} \\ s_m = \sqrt{s^2_m} & \text{if MLE-Bayes.} & \text{(2.2.3.3)} \end{array} \right. \tag{2.2.3} \]Similar to the conceptualization of variance, Frequentists often denote the conceptual output of formula 2.2.3.1 the ‘sample standard deviation’ (i.e., \(s_m\), Field 2018: 30), but this is again inappropriate (cf. McElreath 2019: 197), since it is an estimate of the population standard deviation (i.e., \(\hat{\sigma}_m\)). Frequentist researchers should also note that while formula 2.2.3.1 is a less biased estimator than 2.2.3.2 when estimating the population standard deviation with OLS (due to 2.2.2.1 applying Bessel’s correction), it is not unbiased due to Jensen’s (1906) inequality (i.e., \(\mathbb{E}(\hat{\sigma}_{OLS[m]}) \neq \sigma_m\), cf. Bolch 1968). Instead, to reduce the bias of this estimator, one would have to apply a further correction that depends on the assumed distribution and sample size (cf. Park et al. 2022; Park and Wang 2020, 2022). This, however, is typically not applied when calculating the Frequentist KMO index (cf. IBM 2025; Revelle 2025), meaning a biased estimator of standard deviation is used, and given that the present aim is the development of a Bayesian KMO index, this Frequentist correction will not be covered here.
The last formula, 2.2.3.3, can be considered MLE-Bayesian in the same manner as formula 2.2.2.3 in function 2.2.2, with formula 2.2.3.3 calculating the sample standard deviation (i.e., \(s_m\), alternatively denoted \(\hat{\sigma}_{MLE}\)). MCMC is again the preferred method to estimate the entire posterior distribution and incorporate prior information rather than merely relying on the point estimator of formula 2.2.3.3. In line with the above choice of formula 2.2.2.1 for this section, and consistent with common implementations (IBM 2025; Revelle 2025), the classical KMO index is calculated using 2.2.3.1 in function 2.2.3, meaning it uses a biased estimate of the population standard deviation, further corroborating the claim that it is conceptually Frequentist and in need of a Bayesian re-conceptualization.
Third, let \(\text{Cov}(\theta_m, \theta_{m'})\) denote the piecewise function for calculating the covariance between manifestation \(m\) and \(m'\) (Field 2018: 336-338; Wooldridge 2019: 697-698):
\[ \text{Cov}(\theta_m, \theta_{m'}) = \left\{ \begin{array}{lll} \hat{\sigma}_{\theta_m, \theta_{m'}} = \frac{1}{n - 1} \sum_{u=1}^{n} (\theta_{u,m} - \hat{\mu}_m)(\theta_{u,m'} - \hat{\mu}_{m'}) & \text{if Freq.} \land n < N & \text{(2.2.4.1)} \\ \sigma_{\theta_m, \theta_{m'}} = \frac{1}{n} \sum_{u=1}^{n} (\theta_{u,m} - \mu_m)(\theta_{u,m'} - \mu_{m'}) & \text{if Freq.} \land n = N & \text{(2.2.4.2)} \\ s_{\theta_m, \theta_{m'}} = \frac{1}{n} \sum_{u=1}^{n} (\theta_{u,m} - \bar{\theta}_m)(\theta_{u,m'} - \bar{\theta}_{m'}) & \text{if MLE-Bayes.} & \text{(2.2.4.3)} \end{array} \right. \tag{2.2.4} \]By now, it should come as no surprise that the conceptual output of function 2.2.4 differs across approaches, where Frequentists rely on formulas 2.2.4.1 and 2.2.4.2, while MLE-Bayesians would apply 2.2.4.3. Again, formula 2.2.4.1 applies Bessel’s correction to create an unbiased estimator of the covariance in the population (i.e., \(\hat{\sigma}_{OLS[\theta_m, \theta_{m'}]}\), with \(\mathbb{E}(\hat{\sigma}_{OLS[\theta_m, \theta_{m'}]}) = \sigma_{\theta_m, \theta_{m'}}\)). Conceptually speaking, Frequentists applying formula 2.2.4.1 are thus interested in an estimate of the population covariance (\(\hat{\sigma}_{\theta_m, \theta_{m'}}\)). Bessel’s correction is again only applied when in possession of a sample, and with population data, Frequentists apply formula 2.2.4.2 to calculate the population covariance (i.e., \(\sigma_{\theta_m, \theta_{m'}}\)), since the mean of each manifestation is known.
By contrast, MLE-Bayesians are interested in the sample covariance (i.e., \(s_{\theta_m, \theta_{m'}}\)) and so do not apply Bessel’s correction. Researchers applying either formula 2.2.4.2 or 2.2.4.3 can alternatively be said to rely on a maximum-likelihood estimator (i.e., \(\hat{\sigma}_{MLE[\theta_m, \theta_{m'}]}\)), though non-MLE Bayesians generally substitute formula 2.2.4.3 with MCMC methods for their aforementioned advantages. Consistent with existing implementations (IBM 2025; Revelle 2025), the Frequentist nature of the current KMO index is thus further corroborated due to calculating an estimate of the population covariance (\(\hat{\sigma}_{\theta_m, \theta_{m'}}\)) by relying on formula 2.2.4.1 for function 2.2.4.
Let \(\text{Cor}(\theta_m, \theta_{m'})\) then denote the piecewise function for calculating the Pearson product-moment correlation coefficients (cf. Bravais 1844; Pearson 1895a; Stigler 1989) between manifestation \(m\) and \(m'\), which when applied between the \(k\) manifestations in \(\mathcal{D}\) can be used to construct a correlation matrix. This matrix, whose rows are indexed by \(i\) and columns by \(j\), is composed of \(k^2\) correlation coefficients. Given the obvious conceptual differences across and within the statistical approaches, the correlation matrix will be denoted \(\hat{P}\) if relying on Frequentist estimators for a sample, \(P\) if the data is the population, and \(R\) if relying on equations preferable for the MLE-Bayesian approach. With this denotation in mind, the correlation coefficients are calculated using the following formulas (cf. Field 2018: 338-340; Stock and Watson 2019: 71; Wooldridge 2019: 698-699):
\[ \text{Cor}(\theta_m, \theta_{m'}) = \left\{ \begin{array}{lll} \hat{\rho}_{mm'} = \frac{\hat{\sigma}_{\theta_m, \theta_{m'}}}{\hat{\sigma}_m \hat{\sigma}_{m'}} & \text{if Freq.} \land n < N & \text{(2.2.5.1)} \\ \rho_{mm'} = \frac{\sigma_{\theta_m, \theta_{m'}}}{\sigma_m \sigma_{m'}} & \text{if Freq.} \land n = N & \text{(2.2.5.2)} \\ r_{mm'} = \frac{s_{\theta_m, \theta_{m'}}}{s_m s_{m'}} & \text{if MLE-Bayes.} & \text{(2.2.5.3)} \end{array} \right. \tag{2.2.5} \]where the statistical framework and data used for \(\text{Cov}(\theta_m, \theta_{m'})\) and \(\text{SD}(\theta_m)\) determine whether function 2.2.5 results in an \(\hat{P}\), \(P\), or \(R\). Given the aforementioned conceptualizations, \(\hat{\rho}\) is the estimated correlation in the population, \(\rho\) is the correlation in the population, and \(r\) is the correlation in the sample.
Researchers should again note that while applying Bessel’s correction makes the estimator of the variance and covariance unbiased, conceptualizing 2.2.5.1 as an OLS-estimator of the population correlation (i.e., \(\hat{\rho}_{OLS[mm']}\)) does not yield an unbiased estimator of the population correlation (i.e., \(\mathbb{E}(\hat{\rho}_{OLS[mm']}) \neq \rho_{mm'}\)). While corrections to the correlation coefficient (or versions thereof) have been proposed, for example, Kelley’s (1935) \(\epsilon\), Hays’ (1963) \(\omega\), or regularized estimation (Schäfer and Strimmer 2005), the literature lacks consensus on whether any of these are properly unbiased (e.g., Albers and Lakens 2018; Carroll and Nordholm 1975; Mordkoff 2019), and since the typical calculation of the KMO index does not make any adjustments besides applying Bessel’s correction for the aforementioned functions, these will not be further covered here.
By comparison, this issue of ‘unbiasedness’ can be avoided with formula 2.2.5.2, which calculates the population correlation (i.e., \(\rho_{mm'}\)) when in possession of population data. A conceptual alternative is formula 2.2.5.3, preferred by MLE-Bayesians, which calculates the correlation in the sample (\(r_{mm'}\), alternatively conceptualized as \(\hat{\rho}_{MLE[mm']}\)). The latter can again be expanded to estimate the entire posterior distribution and include prior information with the use of MCMC methods. Consistent with the use of Frequentist estimators to calculate the covariance and standard deviation (Dziuban and Shirkey 1974; IBM 2025; Revelle 2025), the KMO index is conceptually Frequentist and relies on the calculation of the estimated correlation matrix in the population (\(\hat{P}\)), applying the biased formula 2.2.5.1 for function 2.2.5.
Finally, let \(\text{Cor}_q(\theta_m, \theta_{m'})\) denote the function for calculating the anti-image correlations between manifestation \(m\) and \(m'\), that is, the ‘unique’ part of the correlation between the two that remains after the ‘shared’ linear influence of all other manifestations has been partialled out. Again, when applied to every manifestation in \(\mathcal{D}\), the resulting \(k^2\) correlations can be used to construct an anti-image correlation matrix (consistent with Kaiser 1970: 405; Kaiser and Rice 1974: 113-114; Kaiser 1981: 379). These can be calculated directly from the elements of the inverse of the correlation matrix:
\[ \text{Cor}_q(\theta_m, \theta_{m'}) = \left\{ \begin{array}{lll} \hat{\rho}_{q[i,j]} = -\frac{\hat{\rho}_{i,j}^{-1}}{\sqrt{\hat{\rho}_{ii}^{-1}\hat{\rho}_{jj}^{-1}}} & \text{if Freq.} \land n < N & \text{(2.2.6.1)} \\ \rho_{q[i,j]} = -\frac{\rho_{i,j}^{-1}}{\sqrt{\rho_{ii}^{-1}\rho_{jj}^{-1}}} & \text{if Freq.} \land n = N & \text{(2.2.6.2)} \\ \textit{r}_{q[i,j]} = -\frac{\textit{r}_{i,j}^{-1}}{\sqrt{\textit{r}_{ii}^{-1}\textit{r}_{jj}^{-1}}} & \text{if MLE-Bayes.} & \text{(2.2.6.3)} \end{array} \right. \tag{2.2.6} \]Again, the conceptual output of \(\text{Cor}_q(\theta_m, \theta_{m'})\) depends entirely on the used statistical framework and data, with \(\hat{P}_q\) being the estimated anti-image correlation matrix in the population, whose elements are given by formula 2.2.6.1, while \(P_q\) denotes the anti-image correlation matrix in the population, and formula 2.2.6.2 is thus used for population data. MLE-Bayesians can conceptualize the output of \(\text{Cor}_q(\theta_m, \theta_{m'})\) as \(\textit{R}_q\) to denote the anti-image correlation matrix in the sample, with elements given by formula 2.2.6.3. In line with the previous considerations, it is apparent that this Frequentist KMO index conceptually involves the estimated anti-image correlation in the population (\(\hat{P}_q\)) through its reliance on formula 2.2.6.1 for function 2.2.6 (cf. Dziuban and Shirkey 1974; IBM 2025; Revelle 2025).
For all these conceptualizations, it should be noted that the calculation of \(\text{Cor}_q(\theta_m, \theta_{m'})\) necessitates that the correlation matrix is positive definite because it relies on the calculation of the inverse correlation matrix (cf. Kaiser and Rice 1974: 113), whose components are denoted as \(\hat{\rho}^{-1}\), \(\rho^{-1}\), and \(r^{-1}\) in formulas 2.2.6.1 - 2.2.6.3, respectively. This necessity is the reason why the aforementioned stipulations of the data matrix (\(\mathcal{D}\)) in Section 2.1 explicitly specified that the manifest phenomena were all physically distinct (i.e., \(\forall \theta_m (\theta_m \neq \theta_{m'})\)), because any instance of perfect multicollinearity in the correlation matrix makes it non-invertible. Another requirement of piecewise functions 2.2.1 - 2.2.6 is complete data, meaning that no values can be missing. While Kaiser and Rice (1974: 113) recommend using mean imputation to handle missing values, an arguably more appropriate procedure would involve multiple imputation by chained equations (MICE, see van Buuren and Groothuis-Oudshoorn 2011) to improve imputation accuracy and account for imperfect imputations.
\[ \text{KMO}(\mathcal{D}) = \left\{ \begin{array}{lll} \hat{\alpha} = \frac{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j}}{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j} + \underset{i \neq j}{\sum \sum} \hat{\rho}_{q[i,j]}^2} & \text{if Freq.} \land n < N & \text{(2.2.7.1)} \\ \alpha = \frac{\underset{i \neq j}{\sum \sum} \rho^2_{i,j}}{\underset{i \neq j}{\sum \sum} \rho^2_{i,j} + \underset{i \neq j}{\sum \sum} \rho_{q[i,j]}^2} & \text{if Freq.} \land n = N & \text{(2.2.7.2)} \\ a = \frac{\underset{i \neq j}{\sum \sum} r^2_{i,j}}{\underset{i \neq j}{\sum \sum} r^2_{i,j} + \underset{i \neq j}{\sum \sum} r_{q[i,j]}^2} & \text{if MLE-Bayes.} & \text{(2.2.7.3)} \end{array} \right. \tag{2.2.7} \]Now, using the values computed from these standard formulas, let \(\text{KMO}(\mathcal{D})\) denote the piecewise function for calculating the overall ‘sampling adequacy’ of a data matrix (Kaiser and Rice 1974: 112-113; consistent with Dziuban and Shirkey 1974; IBM 2025; Revelle 2025). Since it has become apparent that the functions involved in the calculation of the KMO index differ in conceptualization and mathematical formulas, \(\text{KMO}(\mathcal{D})\) can account for these variations and be expressed as function 2.2.7. Here, \(\underset{i \neq j}{\sum \sum}\) denotes summation over all off-diagonal elements (i.e., over all pairs with \(i \neq j\)), \(\hat{\alpha}\) denotes the estimated sampling \(\alpha\)dequacy in the population (formula 2.2.7.1), \(\alpha\) denotes the sampling \(\alpha\)dequacy in the population (formula 2.2.7.2), and \(a\) denotes the sampling \(a\)dequacy in the sample (formula 2.2.7.3), which help distinguish the nuances of the otherwise same concept of a Measure of Sampling Adequacy (MSA, cf. Kaiser 1970, 1981; Kaiser and Rice 1974). The individual KMO indices, that is, the MSA of manifestation \(m\) in the data (\(\mathcal{D}_m\)), are given by the similarly conceptually-distinguishing piecewise function:
\[ \text{KMO}(\mathcal{D}_m) = \left\{ \begin{array}{lll} \hat{\alpha}_m = \frac{\sum_{j, j \neq i} \hat{\rho}^2_{i,j}}{\sum_{j, j \neq i} \hat{\rho}^2_{i,j} +\sum_{j, j \neq i} \hat{\rho}^2_{q[i,j]}} & \text{if Freq.} \land n < N & \text{(2.2.8.1)} \\ \alpha_m = \frac{\sum_{j, j \neq i} \rho_{i,j}^2}{\sum_{j, j \neq i} \rho_{i,j}^2 +\sum_{j, j \neq i} \rho^2_{q[i,j]}} & \text{if Freq.} \land n = N & \text{(2.2.8.2)} \\ a_m = \frac{\sum_{j, j \neq i} r_{i,j}^2}{\sum_{j, j \neq i} r_{i,j}^2 +\sum_{j, j \neq i} r^2_{q[i,j]}} & \text{if MLE-Bayes.} & \text{(2.2.8.3)} \end{array} \right. \tag{2.2.8} \]The reason for distinguishing between the varying conceptual outputs produced by the formulas in functions 2.2.7 and 2.2.8 should by now be apparent from the discussion of the different concepts and formulas involved in their calculation. While it will here be left to the discretion of the researcher to determine whether they are interested in assessing the ‘sampling adequacy in the population’ (i.e., formulas 2.2.7.1 - 2.2.7.2) or the ‘sampling adequacy in the sample’ (i.e., formula 2.2.7.3), the latter MLE-Bayesian conceptualization of a KMO index, despite applying different formulas, does prima facie appear to be more consistent with the original conceptual intentions of the KMO index (see Kaiser 1970, 1981; Kaiser and Rice 1974). Accordingly, re-conceptualizing the KMO index within the Bayesian framework can be considered entirely appropriate.
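To make functions 2.2.6 - 2.2.8 concrete, a minimal R sketch is provided below. Here, `kmo_indices()` is a hypothetical helper (not part of any cited software) that takes a positive-definite correlation matrix, derives the anti-image correlations from its inverse, and returns the overall and individual KMO indices:

```r
# Hypothetical helper: overall and individual KMO indices from a
# positive-definite correlation matrix R_mat (functions 2.2.6 - 2.2.8)
kmo_indices <- function(R_mat) {
  R_inv <- solve(R_mat)                                # inverse correlation matrix
  Q <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))  # anti-image correlations (2.2.6)
  r2 <- R_mat^2; q2 <- Q^2
  diag(r2) <- 0; diag(q2) <- 0                         # sums run over i != j only
  list(overall    = sum(r2) / (sum(r2) + sum(q2)),               # function 2.2.7
       individual = rowSums(r2) / (rowSums(r2) + rowSums(q2)))   # function 2.2.8
}
```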
Irrespective of conceptualization or formula, the ‘sampling adequacy’ measured by the KMO index can be meaningfully interpreted as the ratio of the sum of squared common (i.e., non-unique) inter-correlations relative to the sum of the squared common (non-unique) and unique (i.e., non-common) inter-correlations. This makes the KMO index continuous (\(\text{KMO} \in \mathbb{R}\)), where a value of .5 occurs when the sum of squared common inter-correlations is equal to the sum of the squared unique inter-correlations (cf. Kaiser and Rice 1974: 113).
Numerous conflicting claims have been made regarding the theoretical lower and upper bounds of the KMO index. Early researchers asserted that its bounds are 0 and 1 (Dziuban and Shirkey 1974: 359; Kaiser and Rice 1974: 113), where a value of 0 would indicate that ‘the sum of [squared unique inter-, ed.] correlations is large relative to the sum of [squared inter-, ed.] correlations, indicating diffusion in the pattern of correlations’ (Field 2018: 798), while a value close to 1 would indicate that ‘patterns of correlations are relatively compact and so factor analysis should yield distinct and reliable factors.’ (Field 2018: 798). These bounds were challenged by Shirkey and Dziuban (1976), who by simulating independent variables mostly failed to produce KMO indices below .5 (as cited in Kaiser 1981: 380). By comparison, examinations of real-world data from Armstrong and Soelberg (1968) and Harman (1967) yielded a KMO index as low as .42 (Dziuban and Shirkey 1974: 359) and as high as .93 (Dziuban and Shirkey 1974: 360), respectively. In an attempt to fix what was perceived as non-intuitive bounds, Kaiser (1981) abandoned the ‘Mark IV’ revision (i.e., Kaiser and Rice 1974) in favor of the original ‘Mark II’ version (i.e., Kaiser 1970), whose lower bound instead was negative (Kaiser 1981: 381). To improve interpretability, however, Kaiser (1981) revised the KMO index as the square root of the original, but this inadvertently meant that the lower bound became an imaginary number (Kaiser 1981: 381; cf. Nahin 1998), and perhaps, in part, for this reason, this ‘Mark V’ failed to gain traction in the literature.
To solve this continuing discrepancy and obtain more clarity about the theoretical bounds of the ‘Mark IV’ KMO index (i.e., Kaiser and Rice 1974), which in turn will be informative about the bounds of the Bayesian KMO index, this paper makes some analytic observations and briefly investigates the sampling behavior of the KMO index using a series of Monte Carlo simulations (Robert and Casella 2013; Rubinstein and Kroese 2016). These were partly motivated by Kaiser’s (1970: 405) statements about ‘Mark II’ (i.e., Kaiser 1970) that, ceteris paribus, the value of the KMO index increases as a function of \(n\), \(k\), and \(\text{Cor}(\theta_m, \theta_{m'})\), while it decreases as a function of \(\text{Cor}_q(\theta_m, \theta_{m'})\). Following this, 480,060 simulations were conducted that directly manipulated the dependency structure of the correlation matrix, the magnitude and direction of the correlations, and the number of theorized manifestations (i.e., \(k\)), which enabled gauging the association between the common and unique correlations, and the values of the KMO index under codependent and independent (i.e., random) correlation structures, expanding upon the simulation study by Shirkey and Dziuban (1976).
Before consulting the simulated results, the analytic observations made were these: For any correlation matrix that, apart from the diagonal, consists entirely of inter-correlations of zero, the KMO index is undefined in the real number system, because the formulas in function 2.2.7 used in its calculation then involve a division by zero. Similarly, for any correlation matrix that, apart from the diagonal, consists entirely of inter-correlations of one, the KMO index is undefined, because the correlation matrix is singular (as also observed by Kaiser and Rice 1974: 114). This means that the perfectly independent and codependent correlation matrices that could intuitively constitute a lower and upper bound do not yield KMO indices of 0 and 1, respectively.
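Both degenerate cases can be checked directly in R, reusing the hypothetical `kmo_indices()` sketch from Section 2.2:

```r
# Degenerate cases from the analytic observations above
R_zero <- diag(3)            # all inter-correlations are zero
kmo_indices(R_zero)$overall  # 0 / (0 + 0) -> NaN: the KMO index is undefined
R_one <- matrix(1, 3, 3)     # all inter-correlations are one (singular matrix)
try(solve(R_one))            # inversion fails, so the KMO index is again undefined
```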
These insights are complemented by the simulations, which reveal that the association between the correlation matrix and the anti-image correlation matrix is non-monotonic and complex, being independent, positive, and negative under different circumstances. This means that an increase in the correlations of the correlation matrix does not necessarily lead to an improved KMO index, because the unique correlations in the anti-image correlation matrix may similarly increase. This complexity may be one reason why data matrices with ‘bad’ sampling adequacy often yield a KMO index of .5 rather than approaching the lower bound. Regarding the theoretical bounds, the lowest calculated KMO index was \(2.54 \times 10^{-9}\), while the highest was \(9.98 \times 10^{-1}\) (for more information, see Section 7.5 of the Appendix).
These results are consistent with the earliest claims that the theoretical bounds of the KMO index are indeed 0 and 1 (i.e., \(\text{KMO} \in [0, 1]\), though \(\text{KMO} \in (0, 1)\) remains possible). However, they also corroborate the claim that values below .5 and values close to 1 (e.g., > .95) are highly unlikely (cf. Field 2018: 798), occurring across the simulations at frequencies of 6.02% and 3.03%, respectively. Another key insight is that a KMO index of exactly .5 is a common outcome, occurring across the simulations at a frequency of 83.2%. Since this value occurred consistently and almost irrespective of how ‘good’ or ‘bad’ the inter-correlations were, a KMO index value of .5 can be taken to provide indecisive information in an assessment of the ‘sampling adequacy’ of the data matrix, which will be considered later when discussing inference in relation to the KMO index. Taken together, and with the mentioned caveats in mind, higher values of the KMO index thus generally indicate a greater percentage of shared information between the theorized manifestations, which is desirable for an EFA.
To help researchers make discrete decisions of ‘sampling adequacy’, guidelines by Kaiser (1974), reproduced in table 2.1, are often referenced when interpreting the KMO index (e.g., Dziuban and Shirkey 1974: 359; Revelle 2025). While these guidelines may be appropriate for interpreting it, they were not originally formulated in relation to the KMO index. In the paper referenced as the source of these guidelines (i.e., Kaiser 1974; cf. Dziuban and Shirkey 1974: 359; Revelle 2025), Kaiser explicitly introduces them for a separate Index of Factor Simplicity (IFS, Kaiser 1974: 35; see also Kaiser and Rice 1974: 112), and the only relationship between these guidelines and ‘sampling adequacy’ is that the IFS is partly based on the KMO index (see Kaiser and Rice 1974). Instead, in the original version of the KMO index, Kaiser (1970: 405) states that ‘good factor-analytic data’ requires a KMO index of at least .8, while ‘really excellent data’ only occurs with a KMO index of at least .9. Whether these generalize to the ‘Mark IV’ of the KMO index is not mentioned in Kaiser and Rice (1974), and Kaiser (1981) fails to comment on Dziuban and Shirkey’s (1974: 359) use of the IFS-guidelines for interpreting the KMO index. However, given that Kaiser acknowledges the substantial differences between ‘Mark II’ and ‘Mark IV’ (Kaiser 1981: 380-381), it is unlikely that he would favor interpreting them using the same guidelines.
Table 2.1

IFS | Interpretation |
---|---|
\(1 \geq IFS \geq .9\) | ‘Marvelous’ |
\(.9 \gt IFS \geq .8\) | ‘Meritorious’ |
\(.8 \gt IFS \geq .7\) | ‘Middling’ |
\(.7 \gt IFS \geq .6\) | ‘Mediocre’ |
\(.6 \gt IFS \geq .5\) | ‘Miserable’ |
\(.5 \gt IFS \geq 0\) | ‘Unacceptable’ |
NOTE: Interpretive guidelines developed for the Index of Factor Simplicity (IFS, Kaiser 1974: 35; see also Kaiser and Rice 1974: 112), which are often incorrectly associated with the KMO index (e.g., Dziuban and Shirkey 1974: 359; Revelle 2025).
While it can be useful to develop interpretive heuristics, instead of developing them entirely on a theoretical basis and applying them across fields indiscriminately (e.g., Cohen 1988), one could argue that they should instead be empirically based and specified to suit the particular contexts of each field (cf. Funder and Ozer 2019; Gignac and Szodorai 2016; Lovakov and Agadullina 2021). The currently missing guidelines for interpreting the KMO index could thus be addressed through an empirical examination of KMO indices in the literature, with interpretive guidelines being specified for each particular field. At the same time, one could also argue that the discretization involved in producing such rules-of-thumb for a continuous measure likely oversimplifies interpretation (cf. Field 2018: 117), and researchers should thus ideally use guidelines with caution and report non-discretized KMO indices. It is left to the reader’s discretion which of these perspectives to take, and this study makes no attempt to empirically derive guidelines for interpreting the KMO index. To more practically illustrate the utility of the KMO index, an example using simulated data is provided in the subsequent section.
In line with the aforementioned stipulations in Section 2.1, a 1,000 \(\cdot\) 10 data matrix is simulated using Monte Carlo methods (cf. Rubinstein and Kroese 2016), which, to abide by the open science standard (cf. Christensen et al. 2019), is done with the R programming language (R Core Team 2024). The latent phenomenon is generated with the parameter specifications: \(\forall \nu_u(\nu_u = 0), \sigma = 1\), while its manifestations are generated with the specifications: \(\forall \theta_m(\tau_m = 0, \lambda_m \overset{iid}{\sim} \mathcal{U}(.3, .7), \varsigma_m = 1)\), with \(\mathcal{U}\) being the uniform distribution (cf. McElreath 2019: 36). While the specification of 10 manifestations is rather arbitrary and merely serves to help demonstrate a computation of the KMO index, this number has previously been considered adequate for reliably measuring latent phenomena (e.g., Ashton and Lee 2009). The smallest effect size of interest (SESOI, see Lakens 2017) is specified to be .3, since, for simplicity, this constitutes a general threshold for the lowest factor loading of interest (cf. Field 2018: 806). Similarly, the specified sample size of 1,000 is also arbitrary, though in relation to EFA, it has been interpreted as ‘excellent’ (Comrey and Lee 1992; Field 2018: 797), and it provides 99.9% statistical power (see Cohen 1988, 1991) to detect a factor loading as low as .3, which can be considered more than ‘adequate’ (cf. Cohen 1988, 1991). At the same time, with this sample size one can expect to detect (i.e., with 80% power) factor loadings as low as .088. The upper threshold of a factor loading is specified to be .7 to mitigate multicollinearity among the 10 manifestations (cf. Field 2018: 799; McElreath 2019), which could otherwise make the correlation matrix non-positive definite, thus making the KMO index undefined. Computing the variance inflation factor (VIF, James et al. 2017; Lüdecke et al. 2021; Snee 1982) from an OLS model predicting the first manifestation using all other manifestations reveals that multicollinearity in this simulated data is interpretable as ‘low’ (i.e., \(\forall \theta_m(VIF \leq 1.091)\), Ben-Shachar et al. 2020; Zuur et al. 2010).
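The simulation just described can be reproduced along the following lines, a minimal sketch in which the seed is arbitrary and the KMO indices are computed with the `psych` package (Revelle 2025):

```r
# Simulate the 1,000 x 10 data matrix under the specifications above
set.seed(1)                             # arbitrary seed for reproducibility
n <- 1000; k <- 10
phi <- rnorm(n, mean = 0, sd = 1)       # latent phenomenon: nu_u = 0, sigma = 1
lambda <- runif(k, min = .3, max = .7)  # factor loadings drawn iid from U(.3, .7)
theta <- sapply(lambda, function(l) rnorm(n, mean = 0 + l * phi, sd = 1))

# Overall and individual (Frequentist) KMO indices
psych::KMO(theta)
```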
To ease model specification (cf. McElreath 2019: 111), all measures are transformed into z-scores (henceforth standardization, cf. Field 2018: 37; Stock and Watson 2019: 65; Wooldridge 2019: 696-697). In line with the ongoing conceptual distinction of formulas between statistical frameworks, let \(\text{Z}(\theta_m)\) denote the piecewise function for standardizing manifestation \(m\):
\[ \text{Z}(\theta_m) = \left\{ \begin{array}{lll} \frac{\theta_{u,m} - \hat{\mu}_m}{\hat{\sigma}_m} & \text{if Freq.} \land n < N & \text{(2.4.1.1)} \\ \frac{\theta_{u,m} - \mu_m}{\sigma_m} & \text{if Freq.} \land n = N & \text{(2.4.1.2)} \\ \frac{\theta_{u,m} - \bar{\theta}_m}{s_m} & \text{if MLE-Bayes.} & \text{(2.4.1.3)} \end{array} \right. \tag{2.4.1} \]Function 2.4.1 makes it evident that the conceptualization of standardization varies across approaches, with formula 2.4.1.1 being a Frequentist standardization based on the estimated mean (i.e., \(\hat{\mu}_m\)) and (biased) standard deviation (i.e., \(\hat{\sigma}_m\)) of the population. Formula 2.4.1.2 is similarly Frequentist and is used with population data, with standardization being based on the mean (i.e., \(\mu_m\)) and standard deviation (i.e., \(\sigma_m\)) of the population. By contrast, formula 2.4.1.3 can be considered MLE-Bayesian, with the standardization being based on the mean (i.e., \(\bar{\theta}_m\)) and standard deviation (i.e., \(s_m\)) of the sample. This could again be made ‘properly’ Bayesian using MCMC methods, though that will be beyond the scope of this paper. For conceptual clarity, this demonstration of the Frequentist KMO index uses formula 2.4.1.1 for function 2.4.1, and preprocessing each manifestation using standardization simplifies the calculation and illustration of the correlation matrix because it becomes equivalent to the variance-covariance matrix (i.e., \(\text{Cov}(\text{Z}(\theta_m), \text{Z}(\theta_{m'})) = \text{Cor}(\text{Z}(\theta_m), \text{Z}(\theta_{m'}))\)).
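The equivalence noted at the end of the preceding paragraph is easily confirmed numerically (assuming the simulated `theta` from the sketch in Section 2.4):

```r
# After standardization, the variance-covariance matrix equals the correlation matrix
Z <- scale(theta)  # formula 2.4.1.1 applied column-wise
all.equal(cov(Z), cor(theta), check.attributes = FALSE)  # TRUE
```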
The usefulness of standardization can be further illustrated by expressing the statistical model that implicitly underlies the calculation of the KMO index, which will prove important when discussing robustness and inferential viability. In line with Kruschke (2014), numerous functions can be conceptualized as special cases of the Generalized Linear Model (GLM, see also Field 2018), and the correlations of a correlation matrix are no exception. For reasons that will become apparent when conceptualizing a Bayesian KMO index, a GLM-expression of the correlations of a correlation matrix, using notation based on McElreath (2019: 441-442), is provided in model 2.4.2.
\[\begin{gather*} \tag{2.4.2} \theta_u | \mu_u, \Sigma \overset{iid}{\sim} \mathcal{MVN}(\mu_u, \Sigma) \; \text{for} \: u \: \text{in 1, } \dots \text{, } n \\ \Sigma = \text{DRD} \\ \mu_u = 0 \; \text{for all} \: u \\ \text{diag(D)} = 1 \end{gather*}\]In model 2.4.2, the rows (\(u\)) of the set of manifestations (\(\theta_m\)) are taken to be iid as a conditionally multivariate normal distribution (i.e., \(\mathcal{MVN}\), McElreath 2019: 441-442; Gelman et al. 2014: 582). Due to the standardization, the mean across the set of theorized manifestations is invariably zero, while the diagonal of the dispersion matrix, \(\text{diag(D)}\), is invariably 1; since \(\text{D}\) thereby becomes an identity matrix, the variance-covariance matrix (\(\Sigma\)) simplifies to the correlation matrix (i.e., \(\text{R}\)). As such, due to the standardization, the estimands of the model become solely the correlations. Once the correlations are used to construct the correlation matrix, the anti-image correlation matrix can be computed, and from these the overall and individual KMO indices can be derived. Using the conceptually Frequentist formulas, a matrix of the correlations (i.e., \(\hat{P}\)), anti-image correlations (i.e., \(\hat{P}_q\)), and the KMO indices, calculated from the simulated data, is provided for ease of overview in table 7.1 in Section 7.6 of the Appendix.
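As a generative counterpart, model 2.4.2 can be sketched in R with `MASS::mvrnorm()`, drawing standardized manifestations whose estimands are solely the correlations; the correlation matrix is here estimated from the simulated data above, and the overall index should match `psych::KMO()`:

```r
# Generative sketch of model 2.4.2: standardized manifestations drawn from a
# multivariate normal whose covariance matrix is the correlation matrix
library(MASS)
R_mat <- cor(theta)  # estimated correlation matrix of the simulated data
theta_sim <- mvrnorm(n = 1000, mu = rep(0, ncol(R_mat)), Sigma = R_mat)
kmo_indices(R_mat)$overall  # same value as psych::KMO(theta)$MSA
```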
The overall KMO index of this simulated data is .846, which is indicative of relatively little diffusion in the correlations, consistent with either relatively low noise or few latent factors. This means that by calculating the KMO index, the researcher obtains information that enables them to consider the ‘sampling adequacy’ of the data matrix fit for an EFA and to expect few latent factors. As such, the KMO index is demonstratively useful for obtaining relevant information about the factor structure prior to a factor analysis. By now, the reader should be familiar with the KMO index, its calculation, Frequentist conceptualization, interpretation, and usefulness, which can be leveraged for a discussion of its robustness and use in relation to inference.
This section on robustness, and the subsequent section on inference, will clarify why the arguably strict stipulations were necessary when specifying the data matrix in Section 2.1. While those stipulations by definition hold for data simulated under such specifications, for any real-world data, they merely constitute assumptions that may be violated, and should this be the case, the conclusions regarding the ‘sampling adequacy’ of a data matrix provided by the KMO index may be incorrect.
A simple solution for making the KMO index more robust to violations of assumptions is to relax the assumption that the theorized manifestations are endogenous as a linear function of the latent phenomenon, which is relevant, since relations between many real-world phenomena are non-linear (e.g., Kruschke 2014: 424). This can be accomplished by transforming the measured manifestations into ranks (cf. Field 2018: 284), followed by applying the aforementioned standardization function 2.4.1 covered in Section 2.4. This joint rescaling-procedure, henceforth denoted rank-standardization, effectively transforms the Pearson product-moment correlation coefficients into Spearman’s (1904) rank correlation coefficients (cf. Field 2018: 351-352; Spearman 1910). Such a rescaling-procedure can be shown by letting \(\text{Rank}(\theta_m)\) denote the ranking function (cf. Field 2018: 288; R Core Team 2024):
\[ \text{Rank}(\theta_m)_u = \frac{1 + \#\{ l \mid \theta_{l,m} < \theta_{u,m} \} + \#\{ l \mid \theta_{l,m} \le \theta_{u,m} \}}{2} \tag{2.5.1} \]where \(\#\{ l \mid \theta_{l,m} < \theta_{u,m} \}\) counts the observations less than the \(u\)th value of manifestation \(m\) and \(\#\{ l \mid \theta_{l,m} \le \theta_{u,m} \}\) counts the observations less than or equal to the \(u\)th value of manifestation \(m\), which together with the division by 2 serves to account for ties through averaging. Then, let \(Z_{rank}(\theta_m)\) denote the rank-standardization function of manifestation \(m\), and while its conceptual output, similar to function 2.4.1, varies across statistical approaches, it can be specified irrespective of framework as:
\[ \text{Z}_\text{rank}(\theta_m) = \frac{\text{Rank}(\theta_m)_u - \text{Mean}(\text{Rank}(\theta_m))}{\text{SD}(\text{Rank}(\theta_m))} \tag{2.5.2} \]Calculating the KMO index on data transformed in this manner enables it to capture ‘sampling adequacy’ in relation to ordinal relationships, which unlike Pearson’s product-moment correlation coefficient does not require linear relationships between the latent phenomenon and its manifestations. Instead, it merely requires the relationships between the theorized manifestations and the latent phenomenon to be monotonic (cf. Clapham and Nicholson 2014). Researchers may be unsure when to apply such a transformation, but in the event of linearity, Spearman’s rank correlation coefficient approximates Pearson’s product-moment correlation coefficient, so researchers could by default do rank-standardization and compare results with and without this rescaling-procedure. For the above example, rank-standardizing every manifestation in the data matrix and then calculating the KMO index yields an overall KMO of .840. The negligible difference between this result and the previous linear-dependent result (i.e., \(\Delta = .006\)) can be interpreted as being consistent with linearity of the functional form underlying the relationship between the latent phenomenon and the theorized manifestations. Besides assumptions of functional form, robustness can also be discussed in relation to inferential viability.
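A minimal R sketch of this rescaling-procedure, again assuming the simulated `theta` from Section 2.4 and introducing `rank_standardize()` as a hypothetical helper:

```r
# Hypothetical helper: rank-standardization (equations 2.5.1 - 2.5.2)
rank_standardize <- function(D) {
  ranks <- apply(D, 2, rank, ties.method = "average")  # midranks, as in eq. 2.5.1
  scale(ranks)                                         # standardize the ranks (eq. 2.5.2)
}
psych::KMO(rank_standardize(theta))  # Spearman-based (robust) KMO indices
```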
While inferential statements are seldom made in relation to the KMO index, these can be useful to express uncertainty about the estimated ‘sampling adequacy’ of a data matrix. The absence of such practices may be due to a lack of knowledge about the sampling distribution of the KMO index, which makes it difficult to make inferential statements about it. However, researchers with adequately large samples that are representative of the population of interest can easily overcome this limitation with the non-parametric bootstrap (Efron 1979, 2003; Efron and Tibshirani 1994). Here, the data matrix (\(\mathcal{D}\)) can be resampled with replacement an adequate number of times to produce a distribution of counterfactually sampled data matrices. From each resampled data matrix, an overall and \(k\) individual KMO indices (one for each manifestation) can be calculated by applying functions 2.2.7 and 2.2.8, to produce approximations of the sampling distributions of the overall and individual KMO indices. Probability statements subject to the limitations of Frequentism (cf. Cohen 1994; Kruschke 2014; Wagenmakers et al. 2010) can then be made, for example, in the form of p-values (cf. Stock and Watson 2019: 110-117) and/or 95% confidence intervals (CI, Stock and Watson 2019: 117-118). While Frequentist inferential statements may not necessarily be appropriate within the context of an exploratory factor analysis, because they are confirmatory and assume a priori specified hypotheses, they could be considered useful in relation to a Confirmatory Factor Analysis (CFA, Brown 2015), where prespecified hypotheses of the ‘sampling adequacy’ of a data matrix can be tested prior to testing hypotheses about factor loadings and model structure. Since the sampling adequacy matters just as much for a CFA as for an EFA, such hypotheses are relevant for applied research, and the evidence provided by this procedure can corroborate the conclusion that the data matrix is factorable at the population level and appropriate for a factor analysis.
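A minimal R sketch of this bootstrap procedure, again assuming `theta` from Section 2.4; the number of resamples is kept small here for speed, and a simple percentile interval stands in for the BCa interval reported below (which would require, e.g., the `boot` package):

```r
# Non-parametric bootstrap of the overall KMO index
B <- 2000                                     # number of bootstrap resamples
boot_kmo <- replicate(B, {
  idx <- sample(nrow(theta), replace = TRUE)  # resample rows with replacement
  psych::KMO(theta[idx, ])$MSA                # overall KMO of the resample
})
mean(boot_kmo); sd(boot_kmo)                  # bootstrap mean and standard error
quantile(boot_kmo, c(.025, .975))             # simple percentile 95% CI
```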
An example of applying the Frequentist Null Hypothesis Significance Testing (NHST, cf. Field 2018: 72-90; Kruschke 2014: 300-333) procedure to make inferential statements about the KMO index can be demonstrated for the previous example. Here, the null hypothesis (\(H_0\)) is stated as \(\text{KMO}_{\text{overall}} = .5\), which reflects the aforementioned ‘indiscernible’ value of the KMO index (see Section 2.3), since researchers should preferably be able to eliminate the uncertainty expressed by this plausible value. The alternative hypothesis (\(H_a\)) accordingly states that \(\text{KMO}_{\text{overall}} \neq .5\), and the rejection level for \(H_0\) is prespecified at the conventional 5% level. Here, the mean, the bias-reduced (Park et al. 2022; Park and Wang 2020, 2022) standard error (SE, Stock and Watson 2019: 113-114), and the 95% bias-corrected and accelerated interval (BCa, Efron 1987) are calculated from 40,005 bootstrap re-samples of the data. This number of re-samples was chosen to help invoke the law of large numbers (cf. Angrist and Pischke 2015: 13-16; Sen and Singer 1993; Stock and Watson 2019: 85-86) and to ensure comparability with a later demonstration of the Bayesian KMO index (see Section 3.2). This results in an overall KMO index of .831 (SE = .012; 95% BCa[.804; .851]), where the SE and 95% BCa, given their Frequentist conceptualization, help express the (sampling) uncertainty about the previously conceptualized estimated sampling adequacy in the population (i.e., \(\hat{\alpha}_{OLS}\)). Given that the 95% BCa interval lies entirely above .5, the result is consistent with the sampling adequacy being discernible, and the relatively high values are also consistent with a low level of diffusion in the correlations at the population level, suggesting that the data matrix is factorable and appropriate for a subsequent factor analysis.
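A sketch of the BCa interval calculation, assuming the same data matrix df and using the boot package rather than the exact software behind the reported numbers, could be:

```r
library("boot")
library("psych")
set.seed(42)
# Statistic function: overall KMO index of a re-sampled data matrix
kmo_stat <- function(data, idx) psych::KMO(data[idx, ])$MSA
kmo_bca <- boot::boot(df, statistic = kmo_stat, R = 5000) # R is illustrative
# 95% bias-corrected and accelerated (BCa) interval
boot::boot.ci(kmo_bca, conf = .95, type = "bca")
# Ordinary bootstrap SE (the bias-reduced SE reported above stems from
# the rQCC package and is not reproduced here)
sd(kmo_bca$t)
```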
Having discussed robustness and demonstrated inference in relation to the KMO index, the viability of this approach can be discussed. The previously outlined stipulations regarding the data matrix (see Section 2.1) here prove detrimental. In relation to model 2.4.2, which is often implicitly used when calculating the KMO index, its simple linear functional form assumes iid observations and fails to account for data characterized by a clustered structure (e.g., a repeated-measures design, Sullivan 2008), which could lead to biased results (cf. Stock and Watson 2019: 108) unless the model is appropriately modified. Another assumption reflected in the stipulation of the data matrix is that the data must be a random sample (cf. Stock and Watson 2019: 108), which fits the context of researchers often being limited to samples drawn from a population of interest. Violating this assumption can prevent inference from the KMO index to the intended population of interest and result in a Type III error (cf. Gigerenzer 2004: 599), though this issue is mitigated if the data constitute the population. Similarly, with an unrepresentative sample, the factor structure implied by the correlations used to calculate the KMO index may suffer from a wrongly estimated magnitude or direction (i.e., a Type M or Type S error, respectively, cf. Gelman et al. 2021: 59; Gelman and Carlin 2014). Furthermore, if the latent phenomenon violates the stipulation of being exogenous to confounders, its relationships with the theorized manifestations may be spurious (cf. Gumbel 1933) or masked/suppressed (McElreath 2019: 144-153; Lenz and Sahn 2021), leading to false positive or false negative conclusions regarding the factor structure (i.e., a Type I and Type II error, respectively, cf. Neyman and Pearson 1933). Accordingly, to ensure a valid and reliable KMO index, researchers should be wary of violating the stipulations made in Section 2.1. Having thoroughly revisited the Frequentist KMO index, a Bayesian re-conceptualization can now be made.
The first part of this paper (Section 2) revisited the classical KMO index (i.e., ‘Little Jiffy, Mark IV,’ Kaiser and Rice 1974), thoroughly reviewing its formula, robustness, and viability for inference. By considering its calculation and conceptualization within different statistical frameworks, this KMO index was found to be Frequentist, leaving it incompatible with Bayesian inference. For the purposes of that review, the term ‘MLE-Bayesian’ was specifically introduced in Section 2.2 to jointly refer to a Likelihood-based conceptualization of the KMO index that could be considered Bayesian in the context of ‘non-informative’ priors and maximum-likelihood point estimators. While this served to build towards a possible Bayesian conceptualization, the formula and concepts specific to that terminology evidently failed to incorporate prior information and did not enable an estimation of the entire posterior distribution, making it somewhat ‘improper’ for Bayesian inference. In this section, the limitations of that MLE-Bayesian approach are addressed by introducing an arguably more proper Bayesian Kaiser-Meyer-Olkin index (BKMO). This is accomplished by re-conceptualizing the KMO index through use of Bayes’ theorem and discussing the specification of priors and the calculation of a posterior distribution (Section 3.1), followed by an example demonstrating the computation of the BKMO (Section 3.2).
To re-conceptualize the KMO index in a ‘proper’ Bayesian manner, it is necessary to consider Bayes’ theorem (Bayes and Price 1763; Laplace 2009[1814]), which is the conceptual foundation of Bayesian inference (Gelman et al. 2014; Kruschke 2014; McElreath 2019). This theorem can be formulated as (cf. McElreath 2019: 49):
\[ P(a | b) = \frac{P(a)P(b | a)}{P(b)} \tag{3.1.1} \]where \(P(b | a)\) is the likelihood, \(P(b)\) is the marginal likelihood, \(P(a)\) is the prior, and \(P(a | b)\) is the posterior. Within Bayesian inference, \(a\) and \(b\) are often respectively conceptualized as the parameters and the data. This means that \(P(b)\) is the marginal likelihood of the data, \(P(a)\) is the probability of the parameter, and \(P(b | a)\) is the likelihood of the data given the parameter. While this often leads researchers to conceptualize \(P(a | b)\) as the ‘probability of the parameter given the data’ (e.g., Kruschke 2014), it is worth emphasizing that this posterior is entirely dependent on the prior, \(P(a)\), and likelihood, \(P(b | a)\), making conclusions sensitive to their specifications. Accordingly, the choices involved in specifying the prior and likelihood can be taken to reflect a modeling of the data that may impact the posterior, and since researchers could disagree about the choice of prior and likelihood, these choices may be a point of contention. To appropriately reflect this dependency, the posterior, \(P(a | b)\), will be referred to as the probability of the parameter given the (modeled) data.
Researchers working within the Frequentist and Likelihood-based frameworks should be familiar with the likelihood, \(P(b | a)\), since this is the only component of Bayes’ theorem that is explicitly used within these frameworks, specifically because these approaches require the specification of a likelihood distribution for the statistical models that aim to describe the data (cf. Field 2018; King 1998). The Bayesian framework differs by instead considering the posterior, \(P(a | b)\), to be of central interest. To enable its calculation, in addition to specifying a distribution for the likelihood, Bayesian researchers must also specify a distribution for the prior. A historical limitation of this Bayesian approach is that the marginal likelihood, \(P(b)\), also needs to be calculated, which can be difficult, if not practically impossible, for complex statistical models, restricting its use to only the simplest of models (cf. Kruschke 2014: 115). However, recent advances in Markov-Chain Monte Carlo methods (MCMC, Brooks et al. 2011) have solved this limitation (cf. Kruschke 2014: 143-191), enabling entirely new applications of Bayesian inference. These advancements also make a Bayesian re-conceptualization of the KMO index more viable, since it can now be calculated relatively easily with software implementing such MCMC methods.
With Bayes’ theorem in mind, \(a\) and \(b\) will thus here be respectively conceptualized as the ‘sampling adequacy’ and the data matrix (\(\mathcal{D}\)), consistent with the denotation previously introduced for the MLE-Bayesian approach in Section 2.2. This is appropriate, since formula 2.2.8.3 in function 2.2.8 showed that \(a\) is a parameter derivable from the common (\(r\)) and unique correlations (\(r_q\)), both of which are derived from the data matrix (\(\mathcal{D}\)), and accordingly, \(\mathcal{D}\) is conceptualized as \(b\). \(P(a)\) can then be conceptualized as the prior probability of the ‘sampling adequacy’, i.e., the probability of the ‘sampling adequacy’ before observing the data matrix. \(P(b)\) is the marginal likelihood of the data matrix, but as previously mentioned, this parameter can be mostly ignored due to the reliance on MCMC methods. With this in mind, \(P(b|a)\) then reflects the likelihood of the data matrix given the prior ‘sampling adequacy’. From this, it follows that \(P(a|b)\) can be conceptualized as the posterior probability of the ‘sampling adequacy’, i.e., the probability of the ‘sampling adequacy’ given the (modeled) data.
With the reliance on MCMC methods, researchers thus only need to specify the prior of the sampling adequacy, \(P(a)\), and the likelihood of the data given the ‘sampling adequacy’, \(P(b | a)\), to arrive at the posterior probability of the sampling adequacy, \(P(a | b)\). However, considering the relatively abstract concept and interpretation of ‘sampling adequacy’ and the aforementioned confusion regarding its mathematical properties, researchers may not find it easy to specify a prior and likelihood distribution for it. These difficulties can be circumvented by recognizing that researchers need not necessarily specify a prior and likelihood for the KMO index per se, contrary to the specification of most statistical models in Bayesian inference (e.g., McElreath 2019; for an exception, see Kruschke 2014: 226-230). This is possible because the KMO index is a deterministic function of the correlation matrix and anti-image correlation matrix, and since the anti-image correlation matrix is itself derived from the correlation matrix, the prior and likelihood need only be specified for the correlation matrix. Stated differently, the prior and likelihood of the correlation matrix imply a prior and likelihood for the anti-image correlation matrix and the KMO index. Applying Bayesian inference in relation to the KMO index thus becomes a mere question of applying Bayesian inference in relation to a correlation matrix.
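To make this concrete, a minimal sketch (the helper name kmo_from_R is illustrative, and the calculation mirrors the ‘Mark IV’ implementation provided in Section 7.4.2) shows how the overall KMO index follows deterministically from any correlation matrix, such that a posterior over correlation matrices implies a posterior over the KMO index:

```r
# The overall KMO index as a deterministic function of a correlation matrix
kmo_from_R <- function(R){
  R_inv <- solve(R)
  # Anti-image correlations from the inverse correlation matrix
  R_q <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
  r2 <- R^2
  q2 <- R_q^2
  diag(r2) <- 0
  diag(q2) <- 0
  # 'Mark IV': common correlations over common plus unique correlations
  sum(r2) / (sum(r2) + sum(q2))
}
# Applying kmo_from_R() to each posterior draw of the correlation matrix
# yields the posterior distribution of the KMO index
```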
In line with the above conceptualization of the correlations of the correlation matrix as being special cases of the GLM (see Section 2.4), a likelihood useful for deriving the posterior of a correlation matrix, and thus the anti-image correlation matrix and ‘sampling adequacy’, is the aforementioned multivariate normal (\(\mathcal{MVN}\), McElreath 2019: 441-442; Gelman et al. 2014: 582) that enables the calculation of correlations across a data matrix. With this likelihood distribution in mind, this leaves the question of specifying the priors for the correlations of the correlation matrix. A natural choice for a joint prior is the Lewandowski-Kurowicka-Joe distribution (\(\mathcal{LKJ}\), Lewandowski et al. 2009), which is an often-used prior for correlation matrices in Bayesian inference (e.g., Gelman et al. 2014: 55-56; Gelman et al. 2021: 123-127). The priors can be specified with this \(\mathcal{LKJ}\) distribution through use of its single shape parameter (i.e., \(\eta\)), where higher values of this shape parameter produce less extreme correlations (e.g., -1 and 1) in the correlation matrix, and vice versa (cf. McElreath 2019: 442). This means that the \(\mathcal{LKJ}\) simultaneously specifies a prior for every correlation coefficient, making it easy for researchers to specify priors even with many theorized manifestations.
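The behavior of the shape parameter can be sketched by sampling correlation matrices directly from the \(\mathcal{LKJ}\) prior, here with the rlkjcorr() function from the trialr package (Brock 2023); the specific values of \(\eta\) are purely illustrative:

```r
library("trialr")
set.seed(42)
# Draw a 10 x 10 correlation matrix from the LKJ prior at two shape values;
# higher eta concentrates the correlations closer to 0
R_loose <- trialr::rlkjcorr(n = 1, K = 10, eta = 1)  # jointly uniform
R_tight <- trialr::rlkjcorr(n = 1, K = 10, eta = 50) # near-identity
range(R_loose[upper.tri(R_loose)])
range(R_tight[upper.tri(R_tight)])
```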
While this \(\mathcal{LKJ}\) prior can be considered generally useful in relation to a Bayesian KMO index, it does not permit specifying individual priors for the correlations of the correlation matrix. However, since the KMO index is more commonly applied to exploratory rather than confirmatory factor analysis, researchers are here expected not to possess prior information that would enable discriminating between manifestations, making the \(\mathcal{LKJ}\) prior appropriate for most practical uses. To reflect this lack of substantial prior information about the correlations, researchers should ideally specify the shape parameter to produce a ‘weak’ or ‘non-informative’ prior (Gelman et al. 2014: 51-56). Here, in line with Gelman et al. (2014: 51-52), the author recommends specifying the former type of prior, because ‘non-informative’ priors are typically specified using the uniform distribution (\(\mathcal{U}\)), which is technically an ‘improper’ probability distribution (cf. Gelman et al. 2014: 52), and in real-world applications, there is almost always some prior information that can be extrapolated from existing research and applied to the particular research context. For example, in relation to prior information on correlation coefficients between novel phenomena, extant research on correlation coefficients in the literature can be used to shape realistic prior expectations about these correlations (e.g., Funder and Ozer 2019; Gignac and Szodorai 2016; Lovakov and Agadullina 2021). Should researchers nevertheless prefer ‘non-informative’ priors, alternatives to the uniform distribution are recommended (Gelman et al. 2014: 51-55; Jeffreys 1946; Kruschke 2014: 293; Lee and Webb 2005; Zhu and Lu 2004).
Specifying ‘weakly-informative’ priors can be done in numerous ways. One approach involves specifying ‘skeptical, yet persuadable’ priors, whose distributions are deliberately conservative by expecting small correlations, but which remain wide enough to ‘let the data speak for itself’. Such priors can affect estimation in a manner similar to regularization and help prevent overfitting, which can improve the generalization of results (cf. McElreath 2019: 214-217; Kruschke 2014: 531-532). Another approach (cf. Kruschke 2014: 294) takes advantage of the fact that ‘tomorrow’s prior is today’s posterior’ (Lindley 1972: 2) by specifying priors using results from previous posteriors. Such priors could be derived from previous research, or in a manner similar to cross-validation (cf. Stone 1974), by deriving the posterior from a relatively small subset of the current data (e.g., 10% - 20%), and then using this posterior as the prior for an analysis of the remaining data (for a discussion, see, e.g., Berger and Pericchi 2001). However, in the event that researchers do possess substantial prior information, they should not shy away from using this information to specify ‘strongly informative’ priors (cf. Kruschke 2014: 113-115).
Regardless of approach, researchers may be concerned that their priors could inadvertently influence the results. In response, it can be stated that for most statistical models, as long as the priors assign non-zero probability to every possible parameter value, data can quickly overwhelm the prior (cf. Kruschke 2014: 293; Wagenmakers et al. 2010: 167), producing relatively prior-insensitive results that closely resemble Frequentist and Likelihood-based estimates, but for which a Bayesian interpretation can be made. Similarly, should researchers specify priors using the (improper) uniform distribution, results approximate likelihood estimates (cf. Hastie et al. 2017: 272) even with little data (cf. Kruschke 2014: 113). Researchers can always compare results based on ‘strongly’ or ‘weakly informative’ priors to results based on ‘non-informative’ priors, for example, results derived from the non-parametric bootstrap, to assess whether results are robust to the prior specification, though researchers should note that the re-sampling involved in the non-parametric bootstrap is not equivalent to the MCMC methods of Bayesian inference (cf. Kruschke 2014: 161). Whatever the approach to specifying priors, researchers should always transparently report their priors to open them up to criticism from their peers.
In summary, the Bayesian KMO index (BKMO) is conceptualized as a ‘measure of sampling adequacy given the (modeled) data’. Borrowing denotation from the Frequentist approach to distinguish this concept from the MLE-Bayesian approach, this sampling adequacy given the (modeled) data can be denoted \(\alpha\), and alternatively \(\alpha_{Bayes}\) if one wishes to distinguish it from its Frequentist and Likelihood-based conceptualizations. Since the Bayesian approach, similar to Frequentist or Likelihood-based procedures, involves estimating the sampling adequacy, the quantity expectedly produced from Bayesian inference with MCMC methods is the estimated sampling adequacy given the (modeled) data (i.e., \(\hat{\alpha}\), or \(\hat{\alpha}_{Bayes}\)). This is a relevant distinction, since the two quantities (i.e., \(\alpha\) and \(\hat{\alpha}\)) may not necessarily be equivalent.
By now, it should be clear that the BKMO index is not directly computed using MCMC methods, but is instead calculated from the posterior distribution of correlation matrices, which is why the likelihood and prior only needed to be specified for the correlations. This means that the BKMO index can be calculated, analogous to the bootstrapped Frequentist KMO index, simply by, for each posterior draw, constructing a correlation matrix from the estimated correlations, calculating the anti-image correlation matrix, and then calculating the overall KMO index in line with the MLE-Bayesian formula 2.2.7.3 in function 2.2.7 mentioned in Section 2.2. This can be concisely expressed with the following formula:
\[ \text{BKMO}(\mathcal{D}) = \hat{\alpha}_d = \frac{\underset{i \neq j}{\sum \sum} r^2_{i,j,d}}{\underset{i \neq j}{\sum \sum} r^2_{i,j,d} + \underset{i \neq j}{\sum \sum} r_{q[i,j,d]}^2} \tag{3.1.2} \]where \(d\) denotes the index for the \(d\)th posterior draw. Similarly, adapting the MLE-Bayesian formula 2.2.8.3 in function 2.2.8 using the same Bayesian re-conceptualization yields the following for estimating the posterior ‘sampling adequacy’ of manifestation \(m\):
\[ \text{BKMO}(\mathcal{D}_m) = \hat{\alpha}_{m,d} = \frac{\sum_{j, j \neq m} r_{m,j,d}^2}{\sum_{j, j \neq m} r_{m,j,d}^2 +\sum_{j, j \neq m} r^2_{q[m,j,d]}} \tag{3.1.3} \]In accordance with this Bayesian conceptualization of the KMO index, from the distribution of its estimated posterior values, coherent probabilistic statements can be made about the sampling adequacy given the (modeled) data. For example, one can ascertain the posterior probability that the ‘sampling adequacy’ is greater than .5 (i.e., \(P(\hat{\alpha} > .5 \:\ | \:\ \mathcal{D})\)), or one can compute the 95% highest density (contiguous) interval (HD(C)I, Kay 2024a; 2024b; Kruschke 2014: 28; Makowski et al. 2019b; McElreath 2019: 56-58), which contains the 95% most probable values of the ‘sampling adequacy’, both of which are coherent probabilistic statements within the Bayesian framework (cf. Clayton 2021; Kruschke 2014). Having thus re-conceptualized the KMO index within the Bayesian statistical framework and introduced it as the Bayesian KMO index (BKMO), a demonstration is provided.
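As a sketch of such statements, assuming posterior draws of the overall BKMO index as produced by the BKMO() function provided in Section 7.3 (here kmo_bayes$overall), one could compute:

```r
library("bayestestR")
# Posterior probability that the sampling adequacy exceeds .5
mean(kmo_bayes$overall > .5)
# 95% highest density interval of the sampling adequacy
bayestestR::hdi(kmo_bayes$overall, ci = .95)
```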
To demonstrate the conceptualized Bayesian KMO index, the simulated data matrix previously specified (see Section 2.4) is once again used. Contrary to the Frequentist approach, the manifestations are standardized using the MLE-Bayesian formula 2.3.1.1 for function 2.3.1 mentioned in Section 2.2, which is preferred to a ‘proper’ Bayesian standardization for computational ease, since the estimation would otherwise have to account for a posterior of standardizations as input. Using the same notation introduced in Section 2.4, based in part on Richard McElreath (2019: 441-442), a Bayesian Generalized Linear Model (BGLM) is formalized as:
\[\begin{gather*} \tag{3.2.1} \theta_u | \mu_u, \Sigma \overset{iid}{\sim} \mathcal{MVN}(\mu_u, \Sigma) \; \text{for} \: u \: \text{in 1, } \dots \text{, } n \\ \Sigma = \text{DRD} \\ \mu_u = 0 \; \text{for all} \: u \\ \text{diag(D)} = 1 \\ \text{R} \sim \mathcal{LKJ}(2) \end{gather*}\]Model 3.2.1 states that the individual manifestations (i.e., \(\theta_u\)) in the data matrix (\(\mathcal{D}\)) are, conditional on the parameters, iid draws from a multivariate normal distribution (i.e., \(\mathcal{MVN}\)) with mean vector (i.e., \(\mu_u\)) and variance-covariance matrix (i.e., \(\Sigma\)) parameters. The variance-covariance matrix is the product of a dispersion matrix, \(\text{D}\), a correlation matrix, \(\text{R}\), and \(\text{D}\) again, and due to the standardization of each manifestation in \(\mathcal{D}\), the mean vector (\(\mu_u\)) is invariably 0, while the diagonal of the dispersion matrix, \(\text{diag(D)}\), is invariably 1 across all manifestations, which simplifies \(\text{D}\) to an identity matrix. This reduces the variance-covariance matrix to the correlation matrix (\(\text{R}\)), whose correlations become the only estimands of the model for which priors will have to be specified. The prior distribution of \(\text{R}\) is specified using the \(\mathcal{LKJ}\) distribution (Lewandowski et al. 2009), whose shape parameter (i.e., \(\eta\)), consistent with recommendations by McElreath (2019: 442), is specified as 2 to reflect an a priori belief in small correlations, which is intended to be ‘regularizing/weakly informative’ relative to the data (cf. McElreath 2019: 214-217, 442; see also Gelman et al. 2014: 55-56; Gelman et al. 2021: 123-127).
To visualize the effective prior for the sampling adequacy that results from this prior for the correlation matrix, MCMC can be used to sample from the prior (cf. Kruschke 2014: 212). The effective prior distribution of the sampling adequacy resulting from this \(\mathcal{LKJ}\) prior has an average of .304 (SD = .093; 95% HDI[.127; .480]), with the entire distribution being shown in figure 3.1. Inspecting this prior for the ‘sampling adequacy’ reveals an a priori belief in a relatively low ‘sampling adequacy’. This can be substantively interpreted as a conservative expectation that the data will be relatively unfactorable due to a high diffusion in correlations and will thus be inappropriate for factor analysis. As such, a shape parameter of 2 for the \(\mathcal{LKJ}\) prior distribution with 10 theorized manifestations effectively implies a ‘skeptical’ prior for the ‘sampling adequacy’. While the width of this distribution can be taken to indicate that it is also a priori ‘persuadable’, this property of the prior is ultimately determined in relation to the strength of the data, and as such, whether the prior is ‘persuadable’ will be assessed through comparisons with results derived from a ‘non-informative’ prior.
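A sketch of this prior-predictive exercise, sampling the \(\mathcal{LKJ}\)(2) prior directly via trialr::rlkjcorr() (Brock 2023) rather than via MCMC, and pushing each draw through the deterministic calculation sketched in Section 3.1 (the kmo_from_R() helper), could be:

```r
library("trialr")
set.seed(42)
# Effective prior of the overall sampling adequacy implied by LKJ(2)
# for 10 theorized manifestations (the number of draws is illustrative)
prior_kmo <- replicate(4000, kmo_from_R(trialr::rlkjcorr(n = 1, K = 10, eta = 2)))
c(mean = mean(prior_kmo), sd = sd(prior_kmo))
hist(prior_kmo) # compare with the prior distribution in figure 3.1
```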
Model 3.2.1 is then fit to the simulated data using the `brms` R package (Bürkner 2017, 2018), which is useful due to its flexibility in model specification and reliance on the Stan probabilistic programming language (Stan Development Team 2024a) that enables efficient sampling from the posterior using the No-U-Turn Sampler (NUTS, Hoffman and Gelman 2014) for Hamiltonian MCMC (cf. Kruschke 2014: 400-416). To help assess whether the sampler converged to the posterior distribution, 15 independent sampling chains were used, and to improve estimation reliability, 2,000 warm-up samples were drawn per chain to increase the chances of convergence prior to sampling. After the warm-up period, 2,667 samples were drawn from each chain, resulting in a total of 40,005 posterior samples, a size chosen to help invoke the law of large numbers (Angrist and Pischke 2015; Sen and Singer 1993; Stock and Watson 2019: 85-86), as well as ensure a sufficient effective sample size (i.e., ESS ≥ 10,000, Bürkner 2017; Kass et al. 1998; Kruschke 2014: 182-184) after accounting for autocorrelation within the sampling chains (Kruschke 2014: 182). Based on a visual inspection of trace plots (Kruschke 2014: 179-180) and adequate values of the Gelman-Rubin convergence metric (\(\hat{R}\) < 1.01, cf. Vehtari et al. 2021; see also Gelman and Rubin 1992) and Monte Carlo Standard Error (i.e., MCSE ≤ .001, cf. Kruschke 2014: 186-187), the sampling chains were assessed to have converged, suggesting that the posterior distribution had been reliably identified.
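The sampling plan and the subsequent convergence checks can be sketched as follows, with bglm_fit denoting the fitted brms model (see the code in Section 7.3); the diagnostic calls are standard brms functionality:

```r
# Sampling plan: 15 chains x (2,000 warm-up + 2,667 post-warm-up draws)
chains <- 15
warmup <- 2000
iter <- warmup + 2667    # iterations per chain
chains * (iter - warmup) # 40,005 posterior draws in total
# Convergence diagnostics for the fitted model:
max(brms::rhat(bglm_fit)) # Gelman-Rubin metric, should be < 1.01
summary(bglm_fit)         # includes effective sample sizes per parameter
```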
Figure 3.1: Overall Sampling Adequacy
NOTE: Prior/Posterior distributions of the estimated overall Bayesian KMO index (i.e., \(\hat{\alpha}\)). Prior/Posterior samples = 40,005. The markers below the distributions indicate the mode (circle) and the 95% HDCI (bar). Results derived from simulated data.
Having estimated the posterior distribution of the correlation matrix, the BKMO index can then be calculated using the formulas outlined in Section 2.2, by constructing a correlation matrix from the estimated correlation coefficients, which, in line with the above Bayesian conceptualization of the BKMO as the estimated sampling adequacy given the (modeled) data (i.e., \(\hat{\alpha}\)), will be denoted \(\hat{P}\) (alternatively \(\hat{P}_{Bayes}\)). This is followed by calculating the anti-image correlation matrix (denoted \(\hat{P_q}\), or \(\hat{P}_{q[Bayes]}\)) from the inverse correlation matrix (i.e., \(\hat{P}^{-1}\), or \(\hat{P}^{-1}_{Bayes}\)) using the framework-appropriate formula 2.2.6.3 for function 2.2.6. From these matrices, the overall and individual BKMO indices are calculated using the similarly appropriate formulas 2.2.7.3 and 2.2.8.3 for functions 2.2.7 and 2.2.8, respectively. The reliance on these MLE-Bayesian point-estimating formulas is appropriate, because the prior information has already been incorporated into a posterior correlation matrix, from which a posterior sampling adequacy can be computed simply by applying point estimators to each draw. Since the posterior distribution consists of 40,005 samples, these steps had to be iterated for each sample to produce an estimated posterior distribution of the BKMO index, which is why the functions implementing the calculation of the BKMO relied on vectorized matrix operations to speed up computation (cf. Kruschke 2014: 408). These functions were implemented in R and are provided for ease of implementation in Section 7.3 of the Appendix.
The estimated posterior distribution of the BKMO index, that is, the estimated sampling adequacy given the (modeled) data (\(\hat{\alpha}\)), is provided visually next to the prior distribution in figure 3.1. These results can be summarized metrically, with the posterior average of the overall KMO index being .825, with a posterior standard deviation14 (SD) of .011. The 95% HDI reveals that the sampling adequacy was with 95% credibility between .802 and .845. Comparing the posterior to the prior distribution reveals an overlap of .000% and an average difference of .521 (SD = .093; 95% HDI[.346; .701]), revealing that the two distributions are credibly different. The fact that the credible values of the sampling adequacy have crossed the threshold of .5 and become credibly higher than this value, reflected in the lack of overlap between .5 and the 95% HDI, is consistent with a discernible and credibly high ‘sampling adequacy’. This means that the diffusion in the correlations is relatively low with a high degree of credibility, and the data matrix could thus be considered factorable and appropriate for a subsequent factor analysis.
To demonstrate how researchers can assess the robustness of results in relation to their choice of priors, the posterior derived from strongly or weakly informative priors can be compared to posteriors derived using ‘non-informative’ priors. This is done here by respecifying model 3.2.1 so that the \(\mathcal{LKJ}\) prior has a shape parameter of 1, which reflects a jointly uniform prior over correlation matrices, and refitting it using the same MCMC software, which yields similarly reliable model diagnostics that are not reported here. Since the priors of this model are ‘non-informative’, the model approximates a maximum-likelihood model, and the resulting ‘likelihood’ distribution is illustrated in figure 3.2, where the average value is .825 (SD = .011; 95% HDI[.802; .845]). The overlap between this ‘likelihood’ distribution and the posterior is approximately 100%, with an average difference of .000 (SD = .000; 95% HDI[.000; .000]), revealing them to be not credibly different. This corroborates the claim that the prior specified in model 3.2.1 is ‘weakly informative’ and showcases that the BKMO can approximate likelihood-based results with enough data.
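A sketch of this comparison, assuming two sets of posterior draws of the overall BKMO index, namely kmo_bayes$overall under the \(\mathcal{LKJ}\)(2) prior (see Section 7.3) and a hypothetical kmo_bayes_lkj1$overall from the refitted \(\mathcal{LKJ}\)(1) model, could use the bayestestR package (Makowski et al. 2019b):

```r
library("bayestestR")
# Proportion of overlap between the two distributions of the overall BKMO
bayestestR::overlap(kmo_bayes$overall, kmo_bayes_lkj1$overall)
# Distribution of pairwise differences between draws and its 95% HDI;
# an HDI covering 0 indicates the absence of a credible difference
diff_draws <- kmo_bayes$overall - kmo_bayes_lkj1$overall
mean(diff_draws)
bayestestR::hdi(diff_draws, ci = .95)
```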
While a Bayesian model with ‘non-informative’ priors can often be fitted using MCMC methods, it can sometimes be difficult to get complex models to reliably converge. As an alternative method for assessing the robustness of results to the specification of priors, a model with ‘non-informative’ priors can be estimated using the non-parametric bootstrap, which similarly approximates MCMC with ‘non-informative’ priors. This model was fit previously when discussing the inferential viability of the Frequentist KMO index in Section 2.6, and the resulting (non-parametric) bootstrap distribution is also provided in figure 3.2. The results of this model are not repeated here, but the overlap between this bootstrap distribution and the posterior is approximately 79.4%, with an average difference of -.006 (SD = .016; 95% HDI[-.038; .025]). This relatively high degree of overlap, and the lack of a credible difference, again corroborate the claim that the prior in model 3.2.1 is ‘weakly informative’ and showcase that the BKMO can approximate a bootstrapped Frequentist KMO index.
Figure 3.2: Overall Sampling Adequacy
NOTE: Likelihood, Posterior, and (Non-parametric) bootstrap distributions of the estimated overall KMO index (i.e., \(\hat{\alpha}\)). Likelihood/Posterior/Bootstrap samples/re-samples = 40,005. The markers below the distributions indicate the mode (circle) and the 95% HDCI (bar). Results derived from simulated data.
In summary, the similarities in results across the Bayesian, Likelihood-based, and bootstrapped Frequentist approaches reflect the aforementioned properties of Bayesian inference, where ‘weakly informative’ priors can be quickly overwhelmed by data, resulting in the convergence of results. As such, results from Bayesian inference often approximate Frequentist/Likelihood-based results, though only Bayesian inference enables the incorporation of prior information and coherent probabilistic statements from the posterior.
This paper has introduced the Bayesian KMO index. This was accomplished by reviewing the classical Kaiser-Meyer-Olkin (KMO) index, which is a Measure of Sampling Adequacy (MSA) that helps assess whether a data matrix is appropriate for a factor analysis. To build towards a Bayesian KMO index (Section 2), this review revisited its calculation and discussed its mathematical and conceptual properties, which served to show that the KMO index is a Frequentist statistic, whose robustness and inferential viability rely on strict assumptions about the data matrix. Bayes’ theorem was then used to re-conceptualize the KMO index (Section 3), resulting in a Bayesian KMO (BKMO) index that can incorporate prior information and enable coherent probabilistic statements about the sampling adequacy given the (modeled) data. Estimated with MCMC methods, its ease of computation was demonstrated, and this paper provides code to implement it in R (Section 7.3).
The conceptualization of a Bayesian KMO index constitutes a novel contribution to the literature on statistical inference, in particular, the subdiscipline of factor analysis (e.g., Dziuban and Shirkey 1974; Kaiser 1970), since it enables researchers to incorporate prior information when assessing the ‘sampling adequacy’ of a data matrix, and from the resulting posterior distributions, coherent probabilistic statements about sampling adequacy can be made. To remain consistent within their statistical framework, Bayesian researchers should ideally replace the Frequentist KMO index with its Bayesian counterpart whenever they are interested in assessing the ‘sampling adequacy’ prior to a factor analysis. The areas of application for the BKMO index are numerous, especially within the psychometrics subdiscipline of psychology (e.g., Brown 2015; Levy and Mislevy 2020), where it can provide critical information to researchers developing new measurement scales for latent phenomena. Though not explicitly covered here, like its Frequentist counterpart, the Bayesian KMO index can be made more robust by rank-standardizing the data matrix. Similarly, the usage of the BKMO is not restricted to exploratory factor analyses but can in principle be used in relation to a Bayesian confirmatory factor analysis (BCFA, Levy and Mislevy 2020: 187-230), where researchers can prespecify and test hypotheses about the sampling adequacy.
Author Contributions
Funding
Conflicts of Interest
Availability of Data, Code & Materials
Acknowledgements
This appendix contains the changelog (Section 7.1); software specifications (Section 7.2); the R function implementing the Bayesian Kaiser-Meyer-Olkin (BKMO) index (Section 7.3); the formulas and R function implementations of the different versions of the KMO index, more formally denoted Measures of Sampling Adequacy (MSA) (Section 7.4); the results from the Monte Carlo simulations assessing the characteristics of the KMO index discussed in Section 2.3 (Section 7.5); a matrix containing the correlations, unique correlations, and KMO indices of the simulated data used in Section 2.4 (Section 7.6); and information on citation (Section 7.7).
This document was created on a Windows 11 x64 (build 26100) operating system (OS) on the 2025-06-05 07:28 CEST (Timezone: Europe/Copenhagen). The machine has 16 cores available, and the range of doubles is \(2.23\cdot10^{-308}\) - \(1.8\cdot10^{+308}\).
All code is written using the R programming language (v4.4.3 [2025-02-28 ucrt], R Core Team 2024), while the document is written with markdown (Gruber 2014) and LaTeX (Lamport 1986; The LaTeX Project 2025) in the RStudio IDE (2024.12.1.563, Posit Team 2025), using the R packages `bayesplot` (v1.11.1, Gabry et al. 2019), `bayestestR` (v0.15.2, Makowski et al. 2019b), `bookdown` (v0.42, Xie 2016, 2025), `brms` (v2.22.0, Bürkner 2017, 2018), `datawizard` (v1.0.0, Patil et al. 2022), `doParallel` (v1.0.17, Microsoft and Weston 2022a), `dplyr` (v1.1.4, Wickham et al. 2023), `effectsize` (v1.0.0, Ben-Shachar et al. 2020), `forcats` (v1.0.0, Wickham 2023a), `foreach` (v1.5.2, Microsoft and Weston 2022b), `ggplot2` (v3.5.1, Wickham 2016), `Hmisc` (v5.2.2, Harrell Jr 2025), `iterators` (v1.0.14, Revolution Analytics and Weston 2022), `kableExtra` (v1.4.0, Zhu 2024), `klippy` (v0.0.0.9500, Lesur 2025), `knitr` (v1.49, Xie 2014, 2015, 2024), `latex2exp` (v0.9.6, Meschiari 2022), `lubridate` (v1.9.4, Grolemund and Wickham 2011), `pacman` (v0.5.1, Rinker and Kurkiewicz 2018), `parallel` (v4.4.3, R Core Team 2024), `performance` (v0.13.0, Lüdecke et al. 2021), `Polychrome` (v1.5.4, Coombes et al. 2019), `psych` (v2.5.3, Revelle 2025), `purrr` (v1.0.4, Wickham and Henry 2025), `pwr` (v1.3.0, Champely 2020), `Rcpp` (v1.0.14, Eddelbuettel 2013; Eddelbuettel et al. 2025; Eddelbuettel and Balamuta 2018; Eddelbuettel and François 2011), `readr` (v2.1.5, Wickham et al. 2024a), `rmarkdown` (v2.29, Allaire et al. 2024; Xie et al. 2018, 2020), `rQCC` (v2.22.12, Park et al. 2022; Park and Wang 2020, 2022), `rstan` (v2.32.6, Stan Development Team 2024b), `rstudioapi` (v0.17.1, Ushey et al. 2024), `scrutiny` (v0.5.0, Jung 2024), `StanHeaders` (v2.32.10, Stan Development Team 2020), `stringr` (v1.5.1, Wickham 2023b), `tibble` (v3.2.1, Müller and Wickham 2023), `tidybayes` (v3.0.7, Kay 2024c), `tidyr` (v1.3.1, Wickham et al. 2024b), `tidyverse` (v2.0.0, Wickham et al. 2019), `trialr` (v0.1.6, Brock 2023), and `weights` (v1.0.4, Pasek et al. 2021). The seed used for generating random numbers (e.g., for MCMC) was exogenously predetermined using the `random` R package (v0.2.6, Eddelbuettel 2017).
A simple implementation of the Bayesian Kaiser-Meyer-Olkin (BKMO) index in R (v4.4.3 [2025-02-28 ucrt], R Core Team 2024) that computes the posterior sampling adequacy from the posterior correlation matrix using the `brms` R package (v2.22.0, Bürkner 2017, 2018) is provided in the code below:
# Compute Bayesian KMO index from a Bayesian multivariate linear model
#
# @param r Posterior draws of the (residual) correlations from a Bayesian multivariate linear model.
# @return A matrix of Bayesian KMO indices, where rows are posterior draws (with the number of rows equal to the number of posterior draws) and columns are the individual BKMO indices and one overall BKMO index (with the number of columns equal to the number of theorized manifestations plus one).
BKMO <- function(r = NULL){
if(!is.matrix(r)) r <- as.matrix(r)
# Compute p (i.e., number of theorized manifestations)
disc <- sqrt(1 + 8 * ncol(r))
p <- (1 + disc) / 2
# Define utility function applied to a single posterior draw
utility_KMO <- function(r, p){
# Construct correlation matrix
R <- diag(1, p)
R[upper.tri(R)] <- r
R[lower.tri(R)] <- t(R)[lower.tri(R)]
row.names(R) <- paste0('m', 1:p)
colnames(R) <- paste0('m', 1:p)
# Compute inverse R
R_inv <- solve(R)
# Compute anti-image R
R_q <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
# Compute KMO
diag(R_q) <- 0
r2 <- R^2
q2 <- R_q^2
diag(r2) <- 0
diag(q2) <- 0
kmo <- c(
rowSums(r2) / (rowSums(r2) + rowSums(q2)),
overall = sum(r2) / (sum(r2) + sum(q2))
)
return(kmo)
}
# Compute KMO for each posterior draw (matrix operations within each draw are vectorized)
bkmo <- do.call(rbind, lapply(1:nrow(r), function(i) utility_KMO(
r = r[i, , drop = FALSE], p = p
)
)
)
return(as.data.frame(bkmo))
}
## Example
# Simulate data
library("tidyverse")
sample_size <- 1000
n_manifestations <- 10
sesoi <- .3
loadings <- runif(n_manifestations, sesoi, .7)
df <- data.frame(
latent = rnorm(sample_size)
)
for(i in 1:n_manifestations){
df[,paste0("m", i)] <- loadings[i]*df$latent + rnorm(sample_size)
}
df <- df %>% select(-latent)
# Define maximum-likelihood functions
sd2 <- function(x = NULL, na.rm = FALSE){
stopifnot(is.numeric(x))
stopifnot(is.logical(na.rm))
x_mean <- mean(x, na.rm = na.rm)
x_var <- mean((x - x_mean)^2, na.rm = na.rm)
x_sd <- sqrt(x_var)
return(x_sd)
}
standardize2 <- function(x = NULL, na.rm = FALSE){
stopifnot(is.numeric(x))
stopifnot(is.logical(na.rm))
x_mean <- mean(x, na.rm = na.rm)
x_var <- mean((x - x_mean)^2, na.rm = na.rm)
x_sd <- sqrt(x_var)
x_standardized <- (x - x_mean) / x_sd
return(x_standardized)
}
# Standardize data matrix
df <- apply(df, 2, standardize2) %>% as.data.frame()
# Specify MCMC
library("parallel")
posterior_samples <- 4000
chains <- cores <- parallel::detectCores()-1
warmup <- 1000
iter <- ceiling((posterior_samples / chains) + warmup)
posterior_samples <- (iter - warmup)*chains
# Specify Bayesian Generalized Linear Model (BGLM)
library("brms")
bglm_model <- brms::bf(
mvbind(m1, m2, m3, m4, m5, m6, m7, m8, m9, m10) ~ 0,
sigma ~ 0,
family = gaussian(
link = "identity"
)
) + set_rescor(TRUE)
# Specify 'weakly-informative' model priors
brms::get_prior(bglm_model, data = df) # inspect the default priors
bglm_priors <- brms::set_prior(
"lkj(2)",
class = "rescor"
)
# Fit Bayesian multivariate linear model with brms
bglm_fit <- brms::brm(
formula = bglm_model,
family = gaussian(
link = "identity"
),
prior = bglm_priors,
data = df,
chains = chains,
cores = cores,
iter = iter,
warmup = warmup
)
# Extract posterior draws of the (residual) correlations from the fitted Bayesian model
library("stringr")
bglm_samples <- brms::as_draws_df(bglm_fit)
bglm_correlations <- bglm_samples[, stringr::str_detect(colnames(bglm_samples), "rescor__")]
# Compute Bayesian KMO index
library("bayestestR")
kmo_bayes <- BKMO(r = bglm_correlations)
hist(kmo_bayes$overall)
print(
paste0(
"Mean = ", mean(kmo_bayes$overall) %>% round(digits = 3),
" (SD = ", sd2(kmo_bayes$overall) %>% round(digits = 3),
"; 95% HDI[",
bayestestR::hdi(kmo_bayes$overall)$CI_low %>% round(digits = 3), "; ",
bayestestR::hdi(kmo_bayes$overall)$CI_high %>% round(digits = 3), "])"
)
)
Generalizing the code for more sophisticated scenarios is trivial but beyond the scope of this paper. Researchers interested in robust versions of the BKMO can simply rank-standardize their variables before computing the posterior correlation matrix.
The formulas underlying the Kaiser-Meyer-Olkin (KMO) indices discussed in Section 2.3, more formally denoted Measures of Sampling Adequacy (MSA), are provided here. These formulas are based on the notation established in this paper, using the Frequentist conceptualizations (see Section 2.2). This section also includes code implementing each of these MSAs as functions in R. Section 7.4.1 covers the ‘Mark II’ version (i.e., Kaiser 1970), Section 7.4.2 covers ‘Mark IV’ (i.e., Kaiser and Rice 1974), and Section 7.4.3 covers ‘Mark V’ (i.e., Kaiser 1981).
The initial MSA (i.e., \(\text{MSA}_1\)), also known as the ‘second generation Little Jiffy’, was proposed by Kaiser (1970). The formula for calculating the overall MSA is provided in equation 7.4.1.1:
\[ \text{MSA}_1 = 1 - \frac{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{q[i,j]}}{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j}} \tag{7.4.1.1} \]where \(\Sigma \Sigma\) denotes the row-wise summation operator, \(\hat{\rho}_{q}\) denotes the anti-image correlation coefficient, and \(\hat{\rho}\) denotes the Pearson product-moment correlation coefficient. The formula for calculating individual MSAs is provided in equation 7.4.1.2:
\[ \text{MSA}_1(i) = 1 - \frac{ \sum_{j, j \neq i} \hat{\rho}^2_{q[i,j]} }{ \sum_{j, j \neq i} \hat{\rho}^2_{i,j} } \tag{7.4.1.2} \]where \(\Sigma\) denotes the summation operator, \(\hat{\rho}_{q}\) denotes the anti-image correlation coefficient, and \(\hat{\rho}\) denotes the Pearson product-moment correlation coefficient. Implementing \(\text{MSA}_1\) as a function in R can be done with:
# MSA #1 ('Mark II', Kaiser, 1970)
msa1 <- function(R = NULL){
if (!is.matrix(R)) stop("Input must be a matrix.")
if (!all(R == t(R))) stop("Matrix must be symmetric.")
inv_R <- solve(R)
P <- -inv_R / sqrt(outer(diag(inv_R), diag(inv_R)))
r2 <- R^2
p2 <- P^2
diag(r2) <- 0
diag(p2) <- 0
kmo_i <- 1 - (rowSums(p2) / rowSums(r2))
kmo_overall <- 1 - (sum(p2) / sum(r2))
return(list(overall = kmo_overall, individual = kmo_i))
}
The ‘second’ MSA (i.e., \(\text{MSA}_2\)), also known as ‘Little Jiffy, Mark IV’, was proposed by Kaiser and Rice (1974). The formula for calculating the overall MSA is provided in equation 7.4.2.1:
\[ \text{MSA}_2 = \frac{ \underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j} }{ \underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j} + \underset{i \neq j}{\sum \sum} \hat{\rho}^2_{q[i,j]} } \tag{7.4.2.1} \]where \(\Sigma \Sigma\) denotes the row-wise summation operator, \(\hat{\rho}_{q}\) denotes the anti-image correlation coefficient, and \(\hat{\rho}\) denotes the Pearson product-moment correlation coefficient. The formula for calculating individual MSAs is provided in equation 7.4.2.2:
\[ \text{MSA}_2(i) = \frac{ \sum_{j, j \neq i} \hat{\rho}^2_{i,j} }{ \sum_{j, j \neq i} \hat{\rho}^2_{i,j} + \sum_{j, j \neq i} \hat{\rho}^2_{q[i,j]} } \tag{7.4.2.2} \]where \(\Sigma\) denotes the summation operator, \(\hat{\rho}_{q}\) denotes the anti-image correlation coefficient, and \(\hat{\rho}\) denotes the Pearson product-moment correlation coefficient. A function implementing \(\text{MSA}_2\) in R is provided below:
# MSA #2 ('Mark IV', Kaiser & Rice, 1974)
msa2 <- function(R = NULL){
if (!is.matrix(R)) stop("Input must be a matrix.")
if (!all(R == t(R))) stop("Matrix must be symmetric.")
inv_R <- solve(R)
P <- -inv_R / sqrt(outer(diag(inv_R), diag(inv_R)))
r2 <- R^2
p2 <- P^2
diag(r2) <- 0
diag(p2) <- 0
kmo_i <- rowSums(r2) / (rowSums(r2) + rowSums(p2))
kmo_overall <- sum(r2) / (sum(r2) + sum(p2))
return(list(overall = kmo_overall, individual = kmo_i))
}
The ‘third’ MSA (i.e., \(\text{MSA}_3\)), proposed by Kaiser (1981), can also be considered ‘Mark V’ of ‘Little Jiffy’. The formula for calculating the overall MSA is provided in equation 7.4.3.1:
\[ \text{MSA}_3 = \sqrt{ 1 - \frac{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{q[i,j]}}{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j}}} \tag{7.4.3.1} \]where \(\Sigma \Sigma\) denotes the row-wise summation operator, the radical (i.e., \(\sqrt{}\)) denotes the square-root operation, \(\hat{\rho}_{q}\) denotes the anti-image correlation coefficient, and \(\hat{\rho}\) denotes the Pearson product-moment correlation coefficient. The formula for calculating individual MSAs is provided in equation 7.4.3.2:
\[ \text{MSA}_3(i) = \sqrt{ 1 - \frac{ \sum_{j, j \neq i} \hat{\rho}^2_{q[i,j]} }{ \sum_{j, j \neq i} \hat{\rho}^2_{i,j} }} \tag{7.4.3.2} \]where \(\Sigma\) denotes the summation operator, the radical (i.e., \(\sqrt{}\)) denotes the square-root operation, \(\hat{\rho}_{q}\) denotes the anti-image correlation coefficient, and \(\hat{\rho}\) denotes the Pearson product-moment correlation coefficient. A function implementing both equations related to \(\text{MSA}_3\) is provided below:
# MSA #3 ('Mark V', Kaiser, 1981)
msa3 <- function(R = NULL){
if (!is.matrix(R)) stop("Input must be a matrix.")
if (!all(R == t(R))) stop("Matrix must be symmetric.")
inv_R <- solve(R)
P <- -inv_R / sqrt(outer(diag(inv_R), diag(inv_R)))
r2 <- R^2
p2 <- P^2
diag(r2) <- 0
diag(p2) <- 0
kmo_i <- sqrt(1 - (rowSums(p2) / rowSums(r2)))
kmo_overall <- sqrt(1 - (sum(p2) / sum(r2)))
return(list(overall = kmo_overall, individual = kmo_i))
}
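For completeness, a short usage sketch applying all three versions to the same correlation matrix (with df denoting the data matrix simulated in Section 7.3):

```r
# Compare the three MSA versions on the same correlation matrix
R <- cor(df)
c(mark_II = msa1(R)$overall,
  mark_IV = msa2(R)$overall,
  mark_V = msa3(R)$overall)
```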
The results from the Monte Carlo simulation study of the characteristics of the KMO index discussed in Section 2.3 are provided here. Results from the first set of simulations investigating the relationship between the correlations of the correlation matrix and the anti-image correlations of the anti-image correlation matrix are illustrated in figure 7.1 in Section 7.5.1. Results from the second set of simulations used to gauge the relationship between the KMO index and independent correlations of a correlation matrix are shown in figure 7.2 in Section 7.5.2. Results from the third and last set of simulations that sought to assess the relationship between the KMO index and codependent correlations of a correlation matrix are shown in figure 7.3 in Section 7.5.3.
Figure 7.1: Association between Inter-Correlations & Partial Inter-Correlations
NOTE: Simulated data, with 40,005 simulations for each panel.
Figure 7.2: KMO index Characteristics with Independent Inter-Correlations
NOTE: Simulated data, with 40,005 simulations for each panel.
Figure 7.3: KMO index Characteristics with Codependent Inter-Correlations
NOTE: Simulated data, with 40,005 simulations for each panel.
The matrix showcasing the results from the Monte Carlo simulated data covered in Section 2.4 is provided here. The lower triangle of the matrix in table 7.1 showcases the estimated correlations (i.e., \(\hat{P}\)), the upper triangle contains the anti-image correlations (i.e., \(\hat{P}_q\)) computed using the inverse correlation matrix (i.e., \(\hat{P}^{-1}\)), and the diagonal displays the individual KMO indices, one for each manifestation.
| | \(\theta_{1}\) | \(\theta_{2}\) | \(\theta_{3}\) | \(\theta_{4}\) | \(\theta_{5}\) | \(\theta_{6}\) | \(\theta_{7}\) | \(\theta_{8}\) | \(\theta_{9}\) | \(\theta_{10}\) |
|---|---|---|---|---|---|---|---|---|---|---|
| \(\theta_{1}\) | 0.844 | 0.016 | 0.066 | 0.125 | 0.053 | -0.003 | 0.040 | 0.051 | 0.081 | 0.005 |
| \(\theta_{2}\) | 0.101 | 0.852 | 0.053 | 0.156 | -0.011 | 0.072 | 0.040 | 0.036 | 0.081 | 0.075 |
| \(\theta_{3}\) | 0.130 | 0.139 | 0.869 | 0.046 | 0.010 | 0.076 | 0.063 | 0.078 | 0.048 | 0.079 |
| \(\theta_{4}\) | 0.222 | 0.277 | 0.186 | 0.820 | 0.044 | 0.131 | 0.108 | 0.125 | 0.177 | 0.131 |
| \(\theta_{5}\) | 0.115 | 0.080 | 0.094 | 0.176 | 0.836 | 0.073 | 0.111 | 0.090 | 0.014 | 0.107 |
| \(\theta_{6}\) | 0.101 | 0.187 | 0.174 | 0.288 | 0.169 | 0.857 | 0.108 | 0.054 | 0.078 | 0.103 |
| \(\theta_{7}\) | 0.141 | 0.163 | 0.166 | 0.280 | 0.200 | 0.239 | 0.851 | 0.070 | 0.147 | 0.054 |
| \(\theta_{8}\) | 0.136 | 0.142 | 0.164 | 0.260 | 0.172 | 0.179 | 0.195 | 0.864 | 0.048 | 0.064 |
| \(\theta_{9}\) | 0.179 | 0.210 | 0.167 | 0.342 | 0.138 | 0.232 | 0.281 | 0.189 | 0.840 | 0.121 |
| \(\theta_{10}\) | 0.113 | 0.194 | 0.180 | 0.296 | 0.196 | 0.239 | 0.210 | 0.192 | 0.264 | 0.851 |
NOTE: Correlation matrix (\(\hat{P}\)) provided in the lower triangle, anti-image correlation matrix (\(\hat{P}_q\)) provided in the upper triangle, and KMO indices provided in the diagonal. Based on simulated data (n = 1,000) and computed using Frequentist estimators (see functions 2.2.1 - 2.2.8 in Section 2.2).
Building on previous conceptualizations (Kaiser 1970, 1981; Kaiser and Rice 1974), the Bayesian Kaiser-Meyer-Olkin (BKMO) index is an original Bayesian re-conceptualization by Emil Niclas Meyer-Hansen, conceived as part of this paper. For correspondence, contact the author via email: emil098meyerhansen@gmail.com.
If you use, refer to, modify, and/or continue the development of the Bayesian KMO index, please provide proper reference and citation to its founding author. An example of proper citation is provided below:
For LaTeX users, a BibTeX entry is provided below:
@unpublished{meyerhansen2025bkmo,
title = {Revisiting 'Little Jiffy, Mark IV': Towards a Bayesian KMO index},
author = {Emil Niclas Meyer-Hansen},
publisher = {Open Science Framework},
year = {2025},
doi = {10.17605/OSF.IO/T3UPD},
pubstate = {Working Paper (v2025-06-03-09-14-HTML)}
}
License: Except where otherwise indicated, all contents of this document and associated files are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). All software, source code, executable code, code snippets, code chunks, algorithms, and/or scripts within this document and associated files are expressly excluded from the foregoing license, and unless otherwise indicated, are instead licensed under the GNU General Public License, version 3 (GPL-3.0). By engaging with this document and/or any associated files, which includes, but is not necessarily limited to, downloading, using, and/or distributing any of them, in part or in whole, you agree to comply with the applicable license terms for the respective content types. For information on citation, see Section 7.7 in the Appendix.↩︎
Author: Independent Researcher. Holds a Master of Science and Bachelor of Science in Political Science from Aarhus University, Denmark (email: emil098meyerhansen@gmail.com).↩︎
Version: 2025-06-03-09-14-HTML. For a changelog, see Section 7.1 in the Appendix.↩︎
Former Professor of the University of California, Berkeley.↩︎
Former professor of Loyola University, Chicago. It should be noted that there is some uncertainty about Edward P. Meyer being the ‘Meyer’ of ‘Kaiser-Meyer-Olkin’, because Kaiser never mentions this ‘Meyer’ by first name, but only makes vague mentions of contributions from a ‘Professor Meyer of Loyola, Chicago’ (Kaiser 1970: 405). At the same time, it appears that Kaiser did not himself coin the term ‘Kaiser-Meyer-Olkin’ as a name for the MSA, which instead appears to be an invention of Dziuban and Shirkey (1974), but these authors also make no explicit mention of a ‘Professor Meyer’. However, based on personal communication with staff of Loyola University, Chicago, this ‘Professor Meyer of Loyola, Chicago’ would appear to be Edward Paul Meyer, who was a Professor of Psychology at Loyola during the period (1969 - 1972) when the KMO index was invented.↩︎
Former Professor of Stanford University.↩︎
The term ‘Little Jiffy’ was coined by former Professor Chester W. Harris of the University of Wisconsin–Madison and was used by Henry F. Kaiser to denote the analytic procedure of ‘principal components with associated eigenvalues greater than one followed by normal varimax rotation’ (Kaiser 1970: 402). Note that this ‘Little Jiffy, Mark I’ is not directly related to later versions of ‘Little Jiffy’ (e.g., Kaiser 1970).↩︎
The first revision of the KMO index, ‘Little Jiffy, Mark III’, went unpublished, though its changes were incorporated into ‘Mark IV’ (Kaiser and Rice 1974: 112).↩︎
Emeritus Professor of the University of California, Berkeley. Former Professor of the University of California, San Diego.↩︎
A final revision of the KMO index was made by Kaiser (1981), which is simply the square-root of the original KMO index by Kaiser (1970), but this revised version appears mostly ignored across statistical software packages and applied research. Accordingly, this paper focuses on ‘Little Jiffy, Mark IV’ (Kaiser and Rice 1974).↩︎
For researchers interested in these mostly forgotten versions of the KMO index, their mathematical formula, and functions implementing them in R, are provided in Section 7.4 of the Appendix.↩︎
For software specifications, see Section 7.2 of the Appendix.↩︎
An alternative to Spearman’s rank correlation coefficient is Kendall’s (1938) \(\tau\), which despite providing more conservative estimates of correlation (Field 2018: 353; Howell 2012) is harder to meaningfully interpret, and comparability with the aforementioned coefficients is mostly lost (Field 2018: 363; Strahan 1982; though see Moran 1948).↩︎
Consistent with the MLE-Bayesian approach, the standard deviation of the posterior is calculated using formula 2.2.3.3 for function 2.2.3 (see Section 2.2).↩︎