Keywords: Kaiser-Meyer-Olkin index, KMO, Bayesian Kaiser-Meyer-Olkin index, BKMO, Measure of Sampling Adequacy, MSA, Bayesian Measure of Sampling Adequacy, BMSA, Bootstrap KMO index, Robust KMO index, Robust BKMO index, Bayesian inference, Likelihood-based inference, Frequentist inference



1 Introduction

The Kaiser-Meyer-Olkin (KMO) index, formally referred to as a Measure of Sampling Adequacy (MSA, Kaiser 1970, 1981; Kaiser and Rice 1974), is a statistic useful for determining whether a data matrix can be considered ‘appropriate’ for factor analysis (Dziuban and Shirkey 1974; Field 2018: 798; Revelle 2025). Named in honor of Henry F. Kaiser, Edward P. Meyer, and Ingram Olkin (cf. Dziuban and Shirkey 1974; Kaiser 1970: 405; Kaiser 1981: 380; Kaiser and Rice 1974: 112), it is often used to assess whether a data matrix is factorable (cf. Revelle 2025; e.g., Field 2018: 797-799), which is a desirable property to ascertain prior to an exploratory factor analysis (EFA, cf. Field 2018: 778-833). While originally introduced as ‘a second generation Little Jiffy’ by Kaiser (1970), the currently most used iteration of the KMO index is the result of two subsequent revisions, including a collaboration between Kaiser and John Rice, titled ‘Little Jiffy, Mark IV’ (Kaiser and Rice 1974). Though it historically may have seen most use within psychology (e.g., Dziuban and Shirkey 1974; Ghadiri et al. 2015; Zabihi et al. 2014), often being used in conjunction with Maurice S. Bartlett’s (1950) test of sphericity (e.g., Dziuban and Shirkey 1974; Field 2018), to this day, it remains a standard measure in statistical software (e.g., IBM 2025; Revelle 2025) that helps researchers across disciplines gauge whether their data can be meaningfully explored using factor analysis (e.g., Ahmad et al. 2024; Almeida et al. 2022; Danaei et al. 2024; Sadowska et al. 2021; Sarfaraz et al. 2024).

Conceived within the classical statistical framework (for introductory texts, see, e.g., Agresti 2018; Field 2018; Imai 2017; Stock and Watson 2019; Wooldridge 2019), the KMO index can be shown to be conceptually ‘Frequentist’ (cf. Kaiser 1970, 1981; Kaiser and Rice 1974), and a counterpart suitable for the alternative Bayesian statistical framework (for introductory texts, see, e.g., Gelman et al. 2014, 2021; Kruschke 2014; Levy and Mislevy 2020; McElreath 2019) is currently missing. This is a relevant shortcoming, because these frameworks fundamentally differ in their conceptualization of probability and statistical uncertainty (see, e.g., Clayton 2021; Field 2018: 122-131; Imai 2017: 242-244; Kruschke 2014; Levy and Mislevy 2020; McElreath 2019). Given that recent advances in Bayesian inference (for a review, see, e.g., Gelman 2022) have made it more viable for applied research (e.g., Gelman et al. 1996, 2019; Hoffman and Gelman 2014; Vehtari et al. 2021), and given the numerous emerging criticisms of Frequentism (e.g., Clayton 2021; Cohen 1994; Gelman 2023; Gelman and Stern 2006; Gigerenzer 2004, 2018), transitioning to the Bayesian framework may be the best option for researchers (cf. Kruschke 2014; McElreath 2019; Wagenmakers et al. 2010). Accordingly, a multitude of Frequentist statistics have been converted into Bayesian counterparts (e.g., Ben-Shachar et al. 2020; Gelman et al. 2019; Makowski et al. 2019a), but researchers acquainted with the KMO index may take issue with the lack of a Bayesian version that can properly exploit the numerous advantages of Bayesian inference, such as the ability to incorporate prior information into the analysis and make coherent probabilistic statements from a posterior distribution (cf. Clayton 2021; Gelman et al. 2014; Kruschke 2014; McElreath 2019).

This paper solves the issue of a missing Bayesian KMO index by re-conceptualizing it within the Bayesian framework and demonstrating its implementation using standard statistical software. To build towards a Bayesian concept, the current KMO index is first revisited (Section 2), discussing its mathematical and conceptual properties within the Frequentist, Likelihood-based, and Bayesian frameworks, and re-examining its viability for inference and robustness. The resulting Bayesian Kaiser-Meyer-Olkin (BKMO) index is then introduced and discussed (Section 3): a Bayesian Measure of Sampling Adequacy (BMSA) that can incorporate prior information and express uncertainty about the ‘sampling adequacy’ (i.e., factorability) of a data matrix in the form of posterior distributions. With weakly informative priors and small samples, the BKMO index is effectively estimated using regularization, and with ‘non-informative’ priors, the BKMO index approximates results derived from its Frequentist and Likelihood-based counterparts. Analogous to its classical counterpart, the BKMO can be made robust using rank-standardization, though a more Bayesian approach to robustness can also be achieved by changing the likelihood involved in its computation from a Gaussian to a Student t probability distribution. Similar to its bootstrapped Frequentist version, the Bayesian version enables inferential statements to be made regarding the ‘sampling adequacy’ of the data matrix. The paper concludes by discussing the developed BKMO index (Section 4), reviewing its properties, computation, areas of application, and robustness.

2 The Kaiser-Meyer-Olkin index

To build towards a Bayesian conceptualization, it is necessary to first understand the classical and most widely used version of the KMO index, that is, ‘Little Jiffy, Mark IV’ (Kaiser and Rice 1974), including the mathematical formula involved in its calculation. Readers should note that this paper uses the terms ‘KMO index’, ‘sampling adequacy’, and ‘factorability’ interchangeably, since the KMO index is a measure of the factorability of a data matrix (Kaiser 1970, 1981; Kaiser and Rice 1974). This helps emphasize that the viability of the KMO index in applied research is highly dependent on the characteristics of this data matrix from which it is calculated. While explicitly specifying this data matrix is often neglected (e.g., Kaiser 1970, 1981; Kaiser and Rice 1974), this will be done here (Section 2.1) to help make sense of its formula (Section 2.2) and interpretation (Section 2.3). After demonstrating its application (Section 2.4), its viability for inference is examined (Section 2.5), followed by a discussion of its assumptions and robustness (Section 2.6).

2.1 Data Matrix

Specifying the data matrix can be done using set theory (Burgess 2022[1948]; Cantor 1874) and standard notation (e.g., Levy and Mislevy 2020) that is made more mnemonic for the purposes of the paper (cf. Kruschke 2014). Suppose the existence of an \(n\) times \(k\) data matrix (i.e., \(\mathcal{D}\)). Let \(n\) be a positive integer denoting the total number of independent unit observations constituting a representative and sufficiently large random sample, taken from a large and heterogeneous population of units. This population could, in principle, be infinitely large, but for practical purposes, it is taken to be of finite size (i.e., \(N\), with \(\{n, N\} \subset \mathbb{N}\)) at no loss of generality (i.e., \(1 \ll n \leq N < \infty\)), with \(u\) indexing an arbitrary individual \(u\)nit observation. Similarly, let \(k\) be a positive integer reflecting the total number of independently measured phenomena \(-\) theorized to be a set of manifestations (i.e., \(\theta\)) of some latent phenomenon (i.e., \(\phi\)) \(-\) that constitute a representative and sufficiently large random sample, taken from a large and heterogeneous population of manifestations (i.e., \(\Theta\), with \(\theta \subseteq \Theta\)), with an arbitrary \(m\)anifestation indexed by \(m\). For similar reasons, the size of that population is also taken to be finite (i.e., \(K\), with \(\{k, K\} \subset \mathbb{N}\) and \(1 \ll k \leq K < \infty\)). This data matrix (\(\mathcal{D}\)) is illustrated in definition 2.1.1.

\[ \mathcal{D} \equiv \begin{bmatrix} \theta_{1,1} & \cdots & \theta_{1,k} \\ \vdots & \ddots & \vdots \\ \theta_{n,1} & \cdots & \theta_{n,k} \end{bmatrix} \tag{2.1.1} \]

Assume that the latent phenomenon (\(\phi\)) and all of its manifestations (\(\theta\)) can be meaningfully expressed as real-valued vectors (\(\{\phi, \theta\} \subset \mathbb{R}\), Boyd and Vandenberghe 2018: 4; Stock and Watson 2019: 748-749), and that the latent phenomenon is independent and identically distributed (iid, cf. McElreath 2019: 81; Stock and Watson 2019: 81-82) in the population following a Normal probability distribution (\(\mathcal{N}\), i.e., Gaussian, Gauss 2012[1809]; cf. Agresti 2018: 84-91; Imai 2017: 286-292; Stock and Watson 2019: 710-711; Wooldridge 2019: 704-705) so that its values in the sample can be characterized as: \(\phi_u \sim \mathcal{N}(\gamma, \iota^2)\), for \(u\) in \(1\), \(\dots\), \(n\) (cf. Levy and Mislevy 2020: 88), where \(\gamma\) is a real-valued scalar (Boyd and Vandenberghe 2018: 4; Stock and Watson 2019: 749) that denotes the mean parameter, while \(\iota\) is a positive real-valued scalar denoting the standard deviation parameter.

Further, assume that the latent phenomenon is linearly distinct from its manifestations, with all manifestations also being linearly distinct from each other, meaning that \(\phi\) and \(\theta\) could together form a full column rank matrix (Stock and Watson 2019: 751). Assume that the latent phenomenon temporally precedes in origin all of its manifestations (i.e., \(\forall \theta_m: \phi \prec \theta_m\)), and that every manifestation is temporally simultaneous in origin with every other manifestation (i.e., \(\forall \theta_m: \theta_m \substack{\not \prec \\ \not \succ} \theta_{m'}\)), with \(m'\) denoting an arbitrary manifestation that is not \(m\). Suppose that the latent phenomenon is exogenous of all other causes of the manifestations (i.e., \(\zeta\), so that \(\forall \zeta : \phi \perp \zeta\)).

In line with its standard conceptualization in factor analysis (Field 2018: 781-785; see also Levy and Mislevy 2020: 187-190), assume finally that the manifestations are endogenous and iid as a linear function of the latent phenomenon in the population, so that their values in the sample can be characterized with the following conditionally Normal probability distribution: \(\theta_{u,m} | \tau_m, \lambda_m, \varsigma_m^2 \sim \mathcal{N}(\tau_m + \lambda_m \phi_u, \varsigma_m^2)\), for \(u\) in \(1\), \(\dots\), \(n\), and \(m\) in \(1\), \(\dots\), \(k\). While \(\varsigma_m\) is a positive and real \(k\)-length vector denoting the standard deviations for the manifestations, the mean vector parameter is decomposed into \(\tau_m\) and \(\lambda_m\), where the former is a real \(k\)-length vector denoting the intercepts, while the latter is a real \(k\)-length vector denoting the factor loadings (Field 2018: 781; Levy and Mislevy 2020: 144-145). The reasons for these specifications and assumptions are discussed in section 2.5.
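To make these stipulations concrete, the following is a minimal \(\textsf{R}\) sketch of this generative model, where the seed, sample size, number of manifestations, and parameter values are arbitrary illustrative choices (not those used later in the paper):

```r
# Minimal sketch of the generative model in section 2.1 (illustrative values)
set.seed(1)
n <- 500; k <- 6                      # units (rows) and manifestations (columns)
gamma <- 0; iota <- 1                 # mean and SD of the latent phenomenon
phi <- rnorm(n, gamma, iota)          # phi_u ~ N(gamma, iota^2)
tau     <- rep(0, k)                  # intercepts
lambda  <- runif(k, .3, .7)           # factor loadings
sigma_m <- rep(1, k)                  # residual standard deviations
# theta_{u,m} | tau_m, lambda_m, sigma_m ~ N(tau_m + lambda_m * phi_u, sigma_m^2)
D <- sapply(1:k, function(m) rnorm(n, tau[m] + lambda[m] * phi, sigma_m[m]))
colnames(D) <- paste0("theta_", 1:k)  # D is the n x k data matrix
```

This simulated data matrix D is reused in the sketches throughout section 2.2.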

2.2 Formula

With this general data matrix (\(\mathcal{D}\)) just outlined in section 2.1 in mind, the steps typically used to calculate the KMO index can be shown (cf. Kaiser 1981: 380; Kaiser and Rice 1974; consistent with Dziuban and Shirkey 1974: 359; IBM 2025; Revelle 2025). Conceptual considerations from the Frequentist, Likelihood-based, and Bayesian frameworks will be discussed throughout to build towards a Bayesian reconceptualization.

First, let \(\text{Mean}(\theta_m)\) denote a piecewise defined function (cf. Stewart 2010: 16-17) for calculating the mean of manifestation \(m\) (cf. Agresti 2018: 47-48; Field 2018: 26-27; Gelman et al. 2021: 41-42; Stock and Watson 2019: 82-83; Wooldridge 2019: 666-668), illustrated in definition 2.2.1, where \(\sum\) denotes the summation operation (Cajori 2007; Euler 1755), and the aforementioned \(u\) indexes the rows of \(\mathcal{D}\) (i.e., \(u\)nit observations), with \(n\) rows in total, while \(m\) indexes the columns of \(\mathcal{D}\) (i.e., \(m\)anifestations). Here, the simplicity of function 2.2.1 helps express some of the conceptual differences between the Frequentist, Likelihood-based, and Bayesian frameworks, since the formulas used in each are mathematically equivalent.

\[ \text{Mean}(\theta_m) \equiv \left\{ \begin{array}{lll} \hat{\mu}_m \equiv \frac{1}{n} \sum_{u=1}^{n} \theta_{u,m} & \text{if Freq.} \land\ n < N & \text{(2.2.1.1)} \\ \mu_m \equiv \frac{1}{n} \sum_{u=1}^{n} \theta_{u,m} & \text{if Freq.} \land\ n = N & \text{(2.2.1.2)} \\ \bar{\theta}_m \equiv \frac{1}{n} \sum_{u=1}^{n} \theta_{u,m} & \text{if MLE-Bayes.} & \text{(2.2.1.3)} \end{array} \right. \tag{2.2.1} \]

While all these frameworks employ formulas that equivalently calculate the equally-weighted arithmetic mean of manifestation \(m\), the calculated mean can be conceptualized in at least three different ways. For example, a Frequentist researcher is conceptually interested in estimating the mean of the population (\(\mu_m\)). To achieve this, they conceptualize the formula used to arrive at \(\mu_m\) as an estimator (cf. Stock and Watson 2019: 105). The particular Frequentist estimator of the population mean employed by formula 2.2.1.1 in function 2.2.1 can be denoted \(\hat{\mu}_{OLS}\) to reflect its association with the estimation method of ordinary least squares (OLS, Legendre 1805; Wooldridge 2019: 70-73). Here, it is chosen not necessarily because it calculates the mean of the sample (i.e., \(\bar{\theta}_m\)), but because of its properties across an (in principle) infinitely large number of samples, where it can be considered an unbiased estimator of the population mean when relying on OLS, because its expectation across samples equals the population mean (i.e., \(\mathbb{E}(\hat{\mu}_{OLS[m]}) = \mu_m\), Agresti 2018: 116; Imai 2017: 316-317; Stock and Watson 2019: 106-108). Accordingly, formula 2.2.1.1 calculates the estimated mean of the population (\(\hat{\mu}_m\)). For Frequentist researchers in possession of data on the entire population (i.e., \(n = N\)), the calculated mean is conceptually different from \(\hat{\mu}_m\), since the sample mean here is the population mean (i.e., \(\mu_m\)), enabling the use of formula 2.2.1.2. However, both of these conceptualizations are different from the Likelihood-based and Bayesian approaches.

By comparison, formula 2.2.1.3 in function 2.2.1 can be considered conceptually Bayesian, because it calculates the mean of the sample (i.e., \(\bar{\theta}_m\)), which, in the event of no prior information, provides the best possible estimate of the mean of manifestation \(m\) given the data. Here, the Bayesian framework is not inherently built around estimating population characteristics through some concept of drawing infinite samples from that population (cf. Clayton 2021; McElreath 2019), and accordingly, there is no need for varying formulas to distinguish between data from a sample or a population. While 2.2.1.3 is here referred to as ‘Bayesian’, readers should note that this formula is only a point estimator (cf. Agresti 2018: 115-116; Casella and Berger 2002: 311) of the sample mean and fails to incorporate prior information. This leaves the calculated mean of formula 2.2.1.3 conceptually equivalent to the mode of the likelihood function, which is of interest to the Likelihood-based statistical framework (cf. Agresti 2018: 138-140; King 1998). In this approach, the conceptual output of formula 2.2.1.3 is also preferred for function 2.2.1, since, in a manner similar to the Frequentist framework, it can be considered a maximum-likelihood estimator (\(\hat{\mu}_{MLE}\)) that is unbiased under maximum-likelihood estimation, meaning the conceptual output of formula 2.2.1.3 could alternatively be denoted the maximum-likelihood estimate (Agresti 2018: 139; Stock and Watson 2019: 405-406; Wooldridge 2019: 725-726). As such, formula 2.2.1.3 is only conceptually Bayesian in the context of no prior information, and it is otherwise mostly of interest to researchers subscribing to the likelihood-based framework.

It should be noted that while the likelihood and Bayesian approaches are conceptually distinct, for simplicity, when in the context of point estimators and no prior information, this paper makes no distinction between them and jointly refers to them as ‘MLE-Bayesian’. For ‘proper’ Bayesian inference, which is concerned with the incorporation of prior information and the estimation of the entire posterior distribution, more complex calculations are required, which typically involve Markov-Chain Monte Carlo methods (MCMC, see Brooks et al. 2011). This will be expanded upon when reconceptualizing the KMO index within the Bayesian framework (see Section 3.1). Given the Frequentist formulas otherwise generally employed when calculating the KMO index (as elaborated throughout this section), the conceptual mean involved in the calculation of the classical KMO index (Kaiser and Rice 1974) is taken to be the estimated mean of the population (\(\hat{\mu}_m\)), with formula 2.2.1.1 thus used for function 2.2.1.

Second, let \(\text{Var}(\theta_m)\) denote the piecewise function for calculating the variance of manifestation \(m\) (cf. Agresti 2018: 54-55; Field 2018: 28-31; Gelman et al. 2021: 41-42; Imai 2017: 66-68; Stock and Watson 2019: 83-84; Wooldridge 2019: 695-696). Here, not just the conceptual but also the mathematical differences between the Frequentist and MLE-Bayesian approaches become apparent, since the three conceptualizations of variance involve at least two mathematically distinct formulas, as shown in function 2.2.2. The three formulas are similar because they all involve the calculation of the sum of squares (SS, cf. Agresti 2018: 55; Field 2018: 29, 56-57) of manifestation \(m\), which the use of piecewise functions allows to be expressed irrespective of statistical framework as: \(\sum_{u=1}^{n} (\theta_{u,m} - \text{Mean}(\theta_m))^2\). What makes the function ‘Frequentist’ or ‘MLE-Bayesian’ is the conceptual output of \(\text{Mean}(\theta_m)\) and the weighting of the contributions of the observations. For example, formulas 2.2.2.1 and 2.2.2.2 are Frequentist, while 2.2.2.3 can be considered MLE-Bayesian.

\[ \text{Var}(\theta_m) \equiv \left\{ \begin{array}{lll} \hat{\sigma}^2_m \equiv \frac{1}{n - 1} \sum_{u=1}^{n} (\theta_{u,m} - \hat{\mu}_m)^2 & \text{if Freq.} \land\ n < N & \text{(2.2.2.1)} \\ \sigma^2_m \equiv \frac{1}{n} \sum_{u=1}^{n} (\theta_{u,m} - \mu_m)^2 & \text{if Freq.} \land\ n = N & \text{(2.2.2.2)} \\ s^2_m \equiv \frac{1}{n} \sum_{u=1}^{n} (\theta_{u,m} - \bar{\theta}_m)^2 & \text{if MLE-Bayes.} & \text{(2.2.2.3)} \end{array} \right. \tag{2.2.2} \]

Formula 2.2.2.1 is again a Frequentist estimator (denoted \(\hat{\sigma}^2_{OLS}\)) that calculates the variance by dividing the SS by \(n - 1\), a weight also known as Bessel’s correction (cf. Field 2018: 29-30; Stock and Watson 2019: 112). This is a Frequentist adjustment based on the degrees of freedom that makes the estimator unbiased for OLS by accounting for the mean itself being estimated (i.e., \(\mathbb{E}(\hat{\sigma}^2_{OLS[m]}) = \sigma^2_m\), cf. Field 2018: 58-59; Stock and Watson 2019: 112). This correction, however, is only applied when working with samples (i.e., \(n < N\)), and Frequentist researchers with census data (i.e., \(n = N\)) need not apply Bessel’s correction but can instead rely on formula 2.2.2.2, since the population mean (\(\mu_m\)) is known for such data (cf. Agresti 2018: 56; Stock and Watson 2019: 112).

For conceptual clarity, it should be noted that Frequentist researchers may sometimes refer to \(\hat{\sigma}^2_m\) as the ‘sample variance’ (denoting it \(s^2_m\), e.g., Field 2018: 30; Gelman et al. 2021: 168; Imai 2017: 325; Stock and Watson 2019: 112), but this is inappropriate (cf. McElreath 2019: 197), since \(\hat{\sigma}^2_m\) is not the sample variance (i.e., \(s^2_m\)) but instead an estimate of the population variance, and for any finite sample, \(s^2_m\) does not necessarily equal \(\hat{\sigma}^2_m\). Instead, the ‘sample variance’, that is, the variance in the sample (\(s^2_m\)), is distinct and calculated by formula 2.2.2.3, which, despite being mathematically equivalent to formula 2.2.2.2, is conceptually different for the aforementioned reasons and thus preferred by Bayesian researchers (cf. McElreath 2019: 197).

Consistent with the likelihood-based framework, the formula shared by 2.2.2.2 and 2.2.2.3 is the maximum-likelihood estimator (\(\hat{\sigma}^2_{MLE}\), cf. Casella and Berger 2002: 313), in contrast to the unbiased OLS estimator (\(\hat{\sigma}^2_{OLS}\), cf. Field 2018: 29-30; Stock and Watson 2019: 112) used in formula 2.2.2.1. While formula 2.2.2.3 is generally preferred by MLE-Bayesians (cf. McElreath 2019: 197), a ‘proper’ Bayesian estimate of the variance will again typically involve MCMC methods to estimate the posterior distribution and incorporate prior information. Consistent with existing implementations (Dziuban and Shirkey 1974; IBM 2025; Revelle 2025), the formula generally used by the current KMO index (Kaiser and Rice 1974) for function 2.2.2 is formula 2.2.2.1, which outputs the estimated population variance (\(\hat{\sigma}^2_m\)). As such, the KMO index (cf. Kaiser and Rice 1974) can be considered conceptually Frequentist and not equivalent to a Bayesian or Likelihood-based conceptualization for finite data, which is a detriment to researchers working within those frameworks who seek to assess ‘sampling adequacy’ prior to a factor analysis. The fact that the classical KMO index is conceptually Frequentist is further corroborated in the following.

Let \(\text{SD}(\theta_m)\) then denote the piecewise function for calculating the standard deviation of manifestation \(m\) (cf. Agresti 2018: 54-55; Field 2018: 28-31; Gelman et al. 2021: 41-42; Imai 2017: 66-68; Stock and Watson 2019: 84; Wooldridge 2019: 696). This is illustrated in function 2.2.3, where the radical (i.e., \(\sqrt{}\)) denotes the square-root operation (Descartes 1637; Rudolff 1525). Similar to the variance, the Frequentist and Bayesian approaches differ with respect to \(\text{SD}(\theta_m)\), as shown by conceptual disagreement about the output of function 2.2.3. Formulas 2.2.3.1 and 2.2.3.2 are again Frequentist, with 2.2.3.1 being the result of relying on formula 2.2.2.1 for calculating the variance (function 2.2.2), while 2.2.3.2 relies on formula 2.2.2.2. Formula 2.2.3.1 is again an estimator (i.e., \(\hat{\sigma}_{OLS}\)) and thus calculates the estimated standard deviation in the population (i.e., \(\hat{\sigma}_m\)), while 2.2.3.2 calculates the population standard deviation (i.e., \(\sigma_m\)), with the former being used when in possession of a sample, while the latter is reserved for census data, where the population mean (\(\mu_m\)) is known.

\[ \text{SD}(\theta_m) \equiv \left\{ \begin{array}{lll} \hat{\sigma}_m \equiv \sqrt{\hat{\sigma}^2_m} & \text{if Freq.} \land n < N & \text{(2.2.3.1)} \\ \sigma_m \equiv \sqrt{\sigma^2_m} & \text{if Freq.} \land n = N & \text{(2.2.3.2)} \\ s_m \equiv \sqrt{s^2_m} & \text{if MLE-Bayes.} & \text{(2.2.3.3)} \end{array} \right. \tag{2.2.3} \]

Similar to the conceptualization of variance, Frequentists often denote the conceptual output of formula 2.2.3.1 the ‘sample standard deviation’ (i.e., \(s_m\), e.g., Agresti 2018: 55; Field 2018: 30), but this is again inappropriate (cf. McElreath 2019: 197), because it provides an estimate of the population standard deviation (i.e., \(\hat{\sigma}_m\)) that does not equal the sample standard deviation for finite data (Imai 2017: 66-68). Frequentist researchers should also note that while formula 2.2.3.1 is a less biased estimator than 2.2.3.2 when estimating the population standard deviation with OLS (due to 2.2.2.1 applying Bessel’s correction), it is not unbiased due to Jensen’s (1906) inequality (i.e., \(\mathbb{E}(\hat{\sigma}_{OLS[m]}) \neq \sigma_m\), cf. Bolch 1968). Instead, one would have to apply more complicated corrections (cf. Park et al. 2022; Park and Wang 2020, 2022), which are typically not applied when calculating the KMO index (cf. IBM 2025; Revelle 2025), meaning it relies on a biased estimator. Given the present aim of developing a Bayesian KMO index, these Frequentist corrections will not be covered here.

The last formula, 2.2.3.3, can be considered MLE-Bayesian in the same manner as formula 2.2.2.3 in function 2.2.2, with formula 2.2.3.3 calculating the sample standard deviation (i.e., \(s_m\), alternatively denoted \(\hat{\sigma}_{MLE}\)). MCMC is again the preferred method to estimate the entire posterior distribution and incorporate prior information rather than merely relying on the point estimator of formula 2.2.3.3. In line with the above choice of formula 2.2.2.1 for this section, and consistent with common implementations (IBM 2025; Revelle 2025), the classical KMO index is calculated using 2.2.3.1 in function 2.2.3, meaning it uses a biased estimate of the population standard deviation, further corroborating the claim that it is conceptually Frequentist and in need of a Bayesian reconceptualization.
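As a brief \(\textsf{R}\) sketch of this divisor distinction (with hypothetical data), note that base \(\textsf{R}\)’s var() and sd() implement the Frequentist formulas 2.2.2.1 and 2.2.3.1, while rescaling the shared sum of squares by \(1/n\) recovers the MLE-Bayesian quantities:

```r
# Sketch: Frequentist vs. MLE-Bayesian variance and standard deviation
x <- c(2, 4, 4, 4, 5, 5, 7, 9)        # hypothetical data for one manifestation
n <- length(x)
ss <- sum((x - mean(x))^2)            # sum of squares shared by all formulas
ss / (n - 1)                          # 2.2.2.1: estimated population variance
ss / n                                # 2.2.2.3: sample variance (MLE)
all.equal(ss / (n - 1), var(x))       # var() applies Bessel's correction
all.equal(sqrt(ss / (n - 1)), sd(x))  # sd() implements 2.2.3.1
sqrt(ss / n)                          # 2.2.3.3: sample standard deviation
```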

Third, let \(\text{Cov}(\theta_m, \theta_{m'})\) denote the piecewise function for calculating the covariance between manifestations \(m\) and \(m'\) (Agresti 2018: 92; Field 2018: 336-338; Imai 2017: 384; Wooldridge 2019: 697-698). As defined in function 2.2.4, the conceptual output again differs across approaches, where Frequentists rely on formulas 2.2.4.1 and 2.2.4.2, while MLE-Bayesians would apply 2.2.4.3. Again, formula 2.2.4.1 applies Bessel’s correction to create an unbiased estimator of the covariance in the population (i.e., \(\hat{\sigma}_{OLS[\theta_{m,m'}]}\), with \(\mathbb{E}(\hat{\sigma}_{OLS[\theta_{m,m'}]}) = \sigma_{\theta_{m,m'}}\), Field 2018: 336-338). Conceptually speaking, Frequentists applying formula 2.2.4.1 are thus interested in an estimate of the population covariance (\(\hat{\sigma}_{\theta_{m,m'}}\)). Bessel’s correction is again only applied when in possession of a sample, and with population data, Frequentists apply formula 2.2.4.2 to calculate the population covariance (i.e., \(\sigma_{\theta_{m,m'}}\)), since the mean of each manifestation is known.

\[ \small \text{Cov}(\theta_m, \theta_{m'}) \equiv \left\{ \begin{array}{lll} \hat{\sigma}_{\theta_{m,m'}} \equiv \frac{1}{n - 1} \sum_{u=1}^{n} (\theta_{u,m} - \hat{\mu}_m)(\theta_{u,m'} - \hat{\mu}_{m'}) & \text{if Freq.} \land n < N & \text{(2.2.4.1)} \\ \sigma_{\theta_{m,m'}} \equiv \frac{1}{n} \sum_{u=1}^{n} (\theta_{u,m} - \mu_m)(\theta_{u,m'} - \mu_{m'}) & \text{if Freq.} \land n = N & \text{(2.2.4.2)} \\ s_{\theta_{m,m'}} \equiv \frac{1}{n} \sum_{u=1}^{n} (\theta_{u,m} - \bar{\theta}_m)(\theta_{u,m'} - \bar{\theta}_{m'}) & \text{if MLE-Bayes.} & \text{(2.2.4.3)} \end{array} \right. \tag{2.2.4} \]

By contrast, MLE-Bayesians are conceptually more interested in the sample covariance (i.e., \(s_{\theta_{m,m'}}\)) and so do not apply Bessel’s correction. Researchers applying either formula 2.2.4.2 or 2.2.4.3 can alternatively be said to rely on a maximum-likelihood estimator (i.e., \(\hat{\sigma}_{MLE[\theta_{m,m'}]}\)), though ‘proper’ Bayesians generally substitute formula 2.2.4.3 with MCMC methods for their aforementioned advantages. Consistent with existing implementations (IBM 2025; Revelle 2025), the Frequentist nature of the current KMO index is thus further corroborated by its calculation of an estimate of the population covariance (\(\hat{\sigma}_{\theta_{m,m'}}\)), relying on formula 2.2.4.1 for function 2.2.4.

Let \(\text{Cor}(\theta_m, \theta_{m'})\) then denote the piecewise function for calculating the Pearson product-moment correlation coefficient (PPCC, cf. Bravais 1844; Pearson 1895a; Stigler 1989) between manifestations \(m\) and \(m'\):

\[ \text{Cor}(\theta_m, \theta_{m'}) \equiv \left\{ \begin{array}{lll} \hat{\rho}_{m,m'} \equiv \frac{\hat{\sigma}_{\theta_{m,m'}}}{\hat{\sigma}_m \hat{\sigma}_{m'}} & \text{if Freq.} \land n < N & \text{(2.2.5.1)} \\ \rho_{m,m'} \equiv \frac{\sigma_{\theta_{m,m'}}}{\sigma_m \sigma_{m'}} & \text{if Freq.} \land n = N & \text{(2.2.5.2)} \\ r_{m,m'} \equiv \frac{s_{\theta_{m,m'}}}{s_m s_{m'}} & \text{if MLE-Bayes.} & \text{(2.2.5.3)} \end{array} \right. \tag{2.2.5} \]

When function 2.2.5 is applied between the \(k\) manifestations in \(\mathcal{D}\), the resulting correlations can be used to construct a correlation matrix. This square matrix (Stock and Watson 2019: 749), whose rows are indexed by \(i\) and columns by \(j\), is composed of \(k^2\) correlations, one for each relationship between the manifestations (including each manifestation with itself). The resulting matrix is symmetric (Stock and Watson 2019: 749), with off-diagonal elements bounded between -1 and 1 (i.e., due to the Cauchy-Bunyakovsky-Schwarz inequality, Bunyakovsky 1859; Cauchy 1821; Schwarz 1888; Wooldridge 2019: 698-699), reflecting the linear correlations (Imai 2017: 143; Stock and Watson 2019: 157) between manifestations (i.e., inter-correlations). By comparison, its diagonal is composed of a \(k\)-length ones vector (Boyd and Vandenberghe 2018: 5), reflecting the perfect linear association of each manifestation with itself (i.e., intra-correlation). Given the obvious conceptual differences across and within the statistical approaches, the correlation matrix will be denoted \(\hat{P}\) if relying on Frequentist estimators for a sample, \(P\) if the data is the population, and \(R\) if relying on definitions preferable to the MLE-Bayesian approach.

With this denotation in mind, the inter-correlations are calculated as defined in 2.2.5 (cf. Agresti 2018: 92; Field 2018: 338-340; Imai 2017: 101-105, 384; Stock and Watson 2019: 71; Wooldridge 2019: 698-699), where the statistical framework and data used for \(\text{Cov}(\theta_m, \theta_{m'})\) and \(\text{SD}(\theta_m)\) determine whether function 2.2.5 results in \(\hat{P}\), \(P\), or \(R\). Given the aforementioned conceptualizations, \(\hat{\rho}\) is the estimated linear correlation in the population, \(\rho\) is the linear correlation in the population, and \(r\) is the linear correlation in the sample.

Researchers should similarly note that while applying Bessel’s correction makes the estimators of the variance and covariance unbiased, the OLS estimator of the population linear correlation in 2.2.5.1 (i.e., \(\hat{\rho}_{OLS[m,m']}\)) is not an unbiased estimator of the population correlation (i.e., \(\mathbb{E}(\hat{\rho}_{OLS[m,m']}) \neq \rho_{m,m'}\)), again because of Jensen’s inequality. While corrections to the correlations (or versions thereof) have been proposed, for example, Kelley’s (1935) \(\epsilon\), Hays’ (1963) \(\omega\), or regularized estimation (Schäfer and Strimmer 2005), the literature lacks consensus on whether any of these are properly unbiased (e.g., Albers and Lakens 2018; Carroll and Nordholm 1975; Mordkoff 2019), and since the typical calculation of the KMO index makes no adjustments besides applying Bessel’s correction for the aforementioned functions, these will not be further covered here.

By comparison, this issue of ‘unbiasedness’ can be avoided with formula 2.2.5.2, which calculates the population linear correlation (i.e., \(\rho_{m,m'}\)) when in possession of census data. A conceptual alternative is formula 2.2.5.3, preferable to MLE-Bayesians, which calculates the linear correlation in the sample (\(r_{m,m'}\), alternatively conceptualized as \(\hat{\rho}_{MLE[m,m']}\)). Formula 2.2.5.3 can again be expanded to estimate the entire posterior distribution and include prior information with the use of MCMC methods. Consistent with the use of Frequentist estimators to calculate the covariance and standard deviation (Dziuban and Shirkey 1974; IBM 2025; Revelle 2025), the KMO index is conceptually Frequentist and relies on the calculation of the estimated correlation matrix in the population (\(\hat{P}\)), applying the biased formula 2.2.5.1 for function 2.2.5.
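It is worth noting that the \(\frac{1}{n-1}\) and \(\frac{1}{n}\) weights cancel when taking the ratio in function 2.2.5, so formulas 2.2.5.1 and 2.2.5.3 coincide numerically despite their conceptual differences, and base \(\textsf{R}\)’s cor() can thus serve either conceptualization. A brief sketch, reusing the simulated data matrix D from the sketch in section 2.1:

```r
# Sketch: 2.2.5.1 and 2.2.5.3 reduce to the same ratio of sums of squares
m <- 1; mp <- 2                       # an arbitrary pair of manifestations
ss_m  <- sum((D[, m]  - mean(D[, m]))^2)
ss_mp <- sum((D[, mp] - mean(D[, mp]))^2)
ss_mm <- sum((D[, m] - mean(D[, m])) * (D[, mp] - mean(D[, mp])))
all.equal(ss_mm / sqrt(ss_m * ss_mp), cor(D)[m, mp])
P_hat <- cor(D)                       # the full k x k correlation matrix
```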

Finally, let \(\text{Cor}_q(\theta_m, \theta_{m'})\) denote the piecewise function for calculating the anti-image product-moment correlation coefficient between manifestation \(m\) and \(m'\) (Kaiser 1970: 405; Kaiser and Rice 1974: 113-114; Kaiser 1981: 379). This is a linear coefficient that is adjusted for the linear relationships between all the manifestations and is closely related to the partial linear correlation (Agresti 2018: 343-346, 362; Field 2018: 358-361). Again, when applied to every manifestation in \(\mathcal{D}\), the resulting correlations can be used to construct a \(k\) times \(k\) anti-image correlation matrix (consistent with Kaiser 1970: 405; Kaiser and Rice 1974: 113-114; Kaiser 1981: 379), which, similar to the correlation matrix, is square and symmetric, with values ranging from -1 to 1, though its diagonal is instead a negative ones vector. The anti-image inter-correlations can be calculated with reference to the \(k\) rows and \(k\) columns of the correlation matrix (where \(i \neq j\)):

\[ \text{Cor}_q(\theta_m, \theta_{m'}) \equiv \left\{ \begin{array}{lll} \hat{\rho}_{q[i,j]} \equiv -\frac{\hat{\rho}_{i,j}^{-1}}{\sqrt{\hat{\rho}_{i,i}^{-1}\hat{\rho}_{j,j}^{-1}}} & \text{if Freq.} \land n < N & \text{(2.2.6.1)} \\ \rho_{q[i,j]} \equiv -\frac{\rho_{i,j}^{-1}}{\sqrt{\rho_{i,i}^{-1}\rho_{j,j}^{-1}}} & \text{if Freq.} \land n = N & \text{(2.2.6.2)} \\ \textit{r}_{q[i,j]} \equiv -\frac{\textit{r}_{i,j}^{-1}}{\sqrt{\textit{r}_{i,i}^{-1}\textit{r}_{j,j}^{-1}}} & \text{if MLE-Bayes.} & \text{(2.2.6.3)} \end{array} \right. \tag{2.2.6} \]

Again, the conceptual output of \(\text{Cor}_q(\theta_m, \theta_{m'})\) depends entirely on the used statistical framework and data, with \(\hat{P}_q\) being the estimated anti-image correlation matrix in the population, whose elements are given by formula 2.2.6.1, while \(P_q\) denotes the anti-image correlation matrix in the population, and formula 2.2.6.2 is thus used for census data. MLE-Bayesians can conceptualize the output of \(\text{Cor}_q(\theta_m, \theta_{m'})\) as \(\textit{R}_q\) to denote the anti-image correlation matrix in the sample, with elements given by formula 2.2.6.3. In line with the previous considerations, it is apparent that this Frequentist KMO index conceptually involves the estimated anti-image correlation in the population (\(\hat{P}_q\)) through its reliance on formula 2.2.6.1 for function 2.2.6 (cf. Dziuban and Shirkey 1974; IBM 2025; Revelle 2025).

For all these conceptualizations, it should be noted that the calculation of \(\text{Cor}_q(\theta_m, \theta_{m'})\) necessitates that the correlation matrix is invertible (Boyd and Vandenberghe 2018: 202), because it relies on the calculation of the inverse correlation matrix (cf. Kaiser and Rice 1974: 113), whose components are denoted as \(\hat{\rho}^{-1}\), \(\rho^{-1}\), and \(r^{-1}\) in formulas 2.2.6.1 - 2.2.6.3, respectively. This necessity is the reason why the stipulations of the data matrix (\(\mathcal{D}\)) in section 2.1 explicitly specified that the manifest phenomena were all linearly distinct and could form a full column rank matrix, because any instance of perfect multicollinearity in the correlation matrix makes it singular and thus not invertible (Boyd and Vandenberghe 2018: 202-203). Piecewise functions 2.2.1 - 2.2.6 additionally require complete data, meaning that no values can be missing. While Kaiser and Rice (1974: 113) recommend using mean imputation to handle missing values, an arguably more appropriate procedure would involve multiple imputation by chained equations (MICE, see van Buuren and Groothuis-Oudshoorn 2011) to improve imputation accuracy and account for imperfect imputations.
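As a minimal \(\textsf{R}\) sketch of function 2.2.6 (reusing P_hat from the previous sketch), the anti-image correlations follow directly from the inverse correlation matrix, provided that inverse exists:

```r
# Sketch of formula 2.2.6.1: anti-image correlations from the inverse of P_hat
P_inv <- solve(P_hat)   # fails if P_hat is singular (perfect multicollinearity)
P_q   <- -P_inv / sqrt(outer(diag(P_inv), diag(P_inv)))
diag(P_q)               # a vector of negative ones, as noted above
```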

Now, using the values computed from these standard formulas, let \(\text{KMO}(\mathcal{D})\) denote the piecewise function for calculating the overall ‘sampling adequacy’ of a data matrix (Kaiser and Rice 1974: 112-113; consistent with Dziuban and Shirkey 1974; IBM 2025; Revelle 2025). Since it has become apparent that the functions involved in the calculation of the KMO index differ in conceptualization and mathematical formula, \(\text{KMO}(\mathcal{D})\) can be made to account for these variations, as expressed in function 2.2.7.

\[ \text{KMO}(\mathcal{D}) \equiv \left\{ \begin{array}{lll} \hat{\alpha} \equiv \frac{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j}}{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j} + \underset{i \neq j}{\sum \sum} \hat{\rho}_{q[i,j]}^2} & \text{if Freq.} \land n < N & \text{(2.2.7.1)} \\ \alpha \equiv \frac{\underset{i \neq j}{\sum \sum} \rho^2_{i,j}}{\underset{i \neq j}{\sum \sum} \rho^2_{i,j} + \underset{i \neq j}{\sum \sum} \rho_{q[i,j]}^2} & \text{if Freq.} \land n = N & \text{(2.2.7.2)} \\ a \equiv \frac{\underset{i \neq j}{\sum \sum} r^2_{i,j}}{\underset{i \neq j}{\sum \sum} r^2_{i,j} + \underset{i \neq j}{\sum \sum} r_{q[i,j]}^2} & \text{if MLE-Bayes.} & \text{(2.2.7.3)} \end{array} \right. \tag{2.2.7} \]

Here, \(\sum \sum\) denotes the double summation over the off-diagonal elements (i.e., \(i \neq j\)), \(\hat{\alpha}\) denotes the estimated sampling \(\alpha\)dequacy in the population (formula 2.2.7.1), \(\alpha\) denotes the sampling \(\alpha\)dequacy in the population (formula 2.2.7.2), and \(a\) denotes the sampling \(a\)dequacy in the sample (formula 2.2.7.3), which help distinguish the conceptual nuances of the KMO index. Calculating the individual KMO indices, that is, the MSA for manifestation \(m\) in the data (\(\mathcal{D}\)), is given by the similarly distinguishing piecewise function:

\[ \text{KMO}(\mathcal{D}_m) \equiv \left\{ \begin{array}{lll} \hat{\alpha}_m \equiv \frac{\sum_{j, j \neq i} \hat{\rho}^2_{i,j}}{\sum_{j, j \neq i} \hat{\rho}^2_{i,j} +\sum_{j, j \neq i} \hat{\rho}^2_{q[i,j]}} & \text{if Freq.} \land n < N & \text{(2.2.8.1)} \\ \alpha_m \equiv \frac{\sum_{j, j \neq i} \rho_{i,j}^2}{\sum_{j, j \neq i} \rho_{i,j}^2 +\sum_{j, j \neq i} \rho^2_{q[i,j]}} & \text{if Freq.} \land n = N & \text{(2.2.8.2)} \\ a_m \equiv \frac{\sum_{j, j \neq i} r_{i,j}^2}{\sum_{j, j \neq i} r_{i,j}^2 +\sum_{j, j \neq i} r^2_{q[i,j]}} & \text{if MLE-Bayes.} & \text{(2.2.8.3)} \end{array} \right. \tag{2.2.8} \]

The reason for distinguishing between the varying conceptual outputs produced by the formulas in functions 2.2.7 and 2.2.8 should by now be apparent from the discussion of the different concepts and formulas involved in their calculation. While it will here be left to the discretion of the researcher to determine whether they are interested in calculating the ‘sampling adequacy in the population’ (e.g., formulas 2.2.7.1 - 2.2.7.2) or the ‘sampling adequacy in the sample’ (e.g., formula 2.2.7.3), the latter MLE-Bayesian conceptualization does appear more consistent with the aim of the KMO index (see Kaiser 1970, 1981; Kaiser and Rice 1974), since it provides a metric about the factorability of the data matrix that was actually collected, instead of some hypothetical metric about infinitely many sampled data matrices. Accordingly, reconceptualizing the KMO index within the Bayesian framework can be considered appropriate and in line with its intended purpose.
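Gathering the preceding steps, a minimal \(\textsf{R}\) sketch of functions 2.2.7 and 2.2.8 might look as follows, where kmo() is a hypothetical helper name and the Frequentist formulas (via cor()) are used throughout:

```r
# Sketch of functions 2.2.7 and 2.2.8: overall and per-manifestation KMO
kmo <- function(D) {
  P  <- cor(D)                                    # correlation matrix
  Pi <- solve(P)                                  # its inverse
  Q  <- -Pi / sqrt(outer(diag(Pi), diag(Pi)))     # anti-image correlations
  diag(P) <- 0; diag(Q) <- 0                      # keep only i != j terms
  list(overall = sum(P^2) / (sum(P^2) + sum(Q^2)),              # function 2.2.7
       per_m   = colSums(P^2) / (colSums(P^2) + colSums(Q^2)))  # function 2.2.8
}
kmo(D)  # using the simulated D from the sketch in section 2.1
```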

2.3 Interpretation

Irrespective of conceptualization or formula, the ‘sampling adequacy’ measured by the KMO index can be meaningfully interpreted as the ratio of the sum of squared inter-correlations relative to the sum of the squared inter-correlations and squared anti-image inter-correlations. This means that the KMO index is a real scalar (\(\text{KMO} \in \mathbb{R}\)), where a value of .5 occurs when the sum of squared inter-correlations is equal to the sum of the squared anti-image inter-correlations (cf. Kaiser and Rice 1974: 113).

Regarding the theoretical lower and upper bounds of the KMO index, numerous conflicting claims have been made. Early researchers asserted that its bounds are 0 and 1 (Dziuban and Shirkey 1974: 359; Kaiser and Rice 1974: 113), where a value of 0 would indicate that ‘the sum of [squared anti-image inter-, ed.] correlations is large relative to the sum of [squared inter-, ed.] correlations, indicating diffusion in the pattern of correlations’ (Field 2018: 798), while a value close to 1 would indicate that ‘patterns of correlations are relatively compact and so factor analysis should yield distinct and reliable factors.’ (Field 2018: 798). These bounds were challenged by Shirkey and Dziuban (1976), who by simulating independent variables mostly failed to produce KMO index values below .5 (as cited in Kaiser 1981: 380). By comparison, examinations of real-world data from Harman (1967) and from Armstrong and Soelberg (1968) yielded a KMO index as high as .93 (Dziuban and Shirkey 1974: 360) and as low as .42 (Dziuban and Shirkey 1974: 359). In an attempt to fix what was perceived as non-intuitive bounds, Kaiser (1981) abandoned the ‘Mark IV’ revision (i.e., Kaiser and Rice 1974) in favor of the original ‘Mark II’ version (i.e., Kaiser 1970), whose lower bound instead was negative (Kaiser 1981: 381). To improve interpretability, Kaiser (1981) revised ‘Mark II’ to instead be its square root, but this inadvertently meant that the lower bound of this new ‘Mark V’ became an imaginary number (Kaiser 1981: 381; cf. Nahin 1998), and that version has seen little use compared to ‘Mark IV’ (i.e., Kaiser and Rice 1974).

To solve this continuing discrepancy and obtain more clarity about the theoretical bounds of the ‘Mark IV’ KMO index (i.e., Kaiser and Rice 1974), which in turn will be informative about the bounds of its Bayesian reconceptualization, this paper makes some analytic observations and briefly investigates the sampling behavior of the KMO index using a series of Monte Carlo simulations (Robert and Casella 2013; Rubinstein and Kroese 2016) using the \(\textsf{R}\) programming language (R Core Team 2024). These were partly motivated by Kaiser’s (1970: 405) statements about ‘Mark II’ (i.e., Kaiser 1970) that, ceteris paribus, the value of the KMO index increases as a function of \(n\), \(k\), and \(\text{Cor}(\theta_m, \theta_{m'})\), while it decreases as a function of \(\text{Cor}_q(\theta_m, \theta_{m'})\). Following this, 480,060 simulations were conducted that directly manipulated the dependency structure of the correlation matrix, the magnitude and direction of the correlations, and the number of theorized manifestations (\(k\)), to gauge the association between the inter-correlations and anti-image inter-correlations, and the values of the KMO index under linearly interdependent and independent (i.e., random) correlation structures, expanding upon the simulation study by Shirkey and Dziuban (1976).

Before consulting the simulated results, the analytic observations made were these: It can be stated that for any correlation matrix composed of perfectly linearly independent variables (i.e., its off-diagonals are zero vectors), the KMO index is undefined for the real number system, because both the inter-correlations and the anti-image inter-correlations become zero, resulting in a division by zero in the formulas of functions 2.2.7 and 2.2.8; it can further be stated that for any correlation matrix with perfectly linearly dependent variables (i.e., its off-diagonals are ones vectors), the KMO index is also undefined, because the correlation matrix is singular (as similarly observed by Kaiser and Rice 1974: 114), making it impossible to calculate the inverse inter-correlations needed in function 2.2.6. This inadvertently means that the correlation matrices that could intuitively constitute the lower and upper bounds do not yield a KMO index of 0 or 1.
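Both degenerate cases are easily verified; a brief \(\textsf{R}\) sketch with \(k = 3\):

```r
# Case 1: perfectly independent variables (off-diagonals of zero)
P <- diag(3)
Pi <- solve(P)
Q <- -Pi / sqrt(outer(diag(Pi), diag(Pi)))
diag(P) <- 0; diag(Q) <- 0
sum(P^2) / (sum(P^2) + sum(Q^2))   # 0/0: NaN, so the KMO index is undefined

# Case 2: perfectly dependent variables (off-diagonals of one)
P1 <- matrix(1, 3, 3)
try(solve(P1))                     # singular: inversion fails, KMO undefined
```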

These insights are complemented by the simulations, which reveal that the association between the correlation matrix and the anti-image correlation matrix is non-monotonic and complex, being independent, positive, and negative under different circumstances. This means that an increase in the inter-correlations does not necessarily lead to an improved KMO index, because the anti-image inter-correlations may similarly increase. This complexity may help explain why data matrices with ‘bad’ sampling adequacy often yield a KMO index of .5. Regarding the theoretical bounds, the lowest calculated KMO index was \(2.54 \times 10^{-9}\), while the highest was \(9.98 \times 10^{-1}\) (for more information, see Section 7.4 of the Appendix).

These results are consistent with the earliest claims that the theoretical bounds of the KMO index are indeed 0 and 1, though based on the analytic insights, these bounds are more likely exclusive rather than inclusive (i.e., \(\text{KMO} \in \mathbb{R}^{(0; 1)}\) rather than \(\text{KMO} \in \mathbb{R}^{[0; 1]}\)). However, they also corroborate claims (e.g., Field 2018: 798) that values below .5 and values close to 1 (e.g., > .95) are highly unlikely, occurring across these simulations at frequencies of 6.02% and 3.03%, respectively. Another key insight is that a KMO index of exactly .5 is a remarkably common outcome, occurring across the simulations at a frequency of 83.2%. Since this value occurred consistently and almost irrespective of how ‘good’ or ‘bad’ the inter-correlations were, a KMO index value of .5 can be taken to provide indecisive information in an assessment of the ‘sampling adequacy’ of the data matrix, which will be considered later when discussing inferential viability (see Section 2.5). Taken together, and with the mentioned caveats in mind, higher values of the KMO index thus generally indicate a greater percentage of shared information between the theorized manifestations (i.e., factorability), which is desirable for a factor analysis.

To help researchers make discrete decisions about ‘sampling adequacy’, guidelines by Kaiser (1974), reproduced in table 2.1, are often referenced when interpreting the KMO index (e.g., Dziuban and Shirkey 1974: 359; Revelle 2025). While these guidelines may be appropriate for interpreting it, they were not originally formulated in relation to the KMO index. In the paper referenced as the source of these guidelines (i.e., Kaiser 1974; cf. Dziuban and Shirkey 1974: 359; Revelle 2025), Kaiser explicitly introduces them for a separate Index of Factor Simplicity (IFS, Kaiser 1974: 35; see also Kaiser and Rice 1974: 112), and the only relationship between these guidelines and ‘sampling adequacy’ is that the IFS is partly based on the KMO index (see Kaiser and Rice 1974). Instead, in reference to the original ‘Mark II’ version of the KMO index, Kaiser (1970: 405) states that ‘good factor-analytic data’ requires a KMO index of at least .8, while ‘really excellent data’ only occurs with a KMO index of at least .9. Whether these generalize to the ‘Mark IV’ covered here is not mentioned in Kaiser and Rice (1974), and Kaiser (1981) fails to comment on Dziuban and Shirkey’s (1974: 359) use of the IFS guidelines for interpreting the KMO index. However, given that Kaiser acknowledges the substantial differences between ‘Mark II’ and ‘Mark IV’ (Kaiser 1981: 380-381), it is unlikely that he would favor interpreting them using the same guidelines.

Table 2.1: Interpretive Guidelines Wrongly Associated with the KMO index
IFS Interpretation
\(1 \geq IFS \geq .9\) ‘Marvelous’
\(.9 > IFS \geq .8\) ‘Meritorious’
\(.8 > IFS \geq .7\) ‘Middling’
\(.7 > IFS \geq .6\) ‘Mediocre’
\(.6 > IFS \geq .5\) ‘Miserable’
\(.5 > IFS \geq 0\) ‘Unacceptable’

NOTE: Interpretive guidelines developed for the Index of Factor Simplicity (IFS, Kaiser 1974: 35; see also Kaiser and Rice 1974: 112), which are often incorrectly associated with the KMO index (e.g., Dziuban and Shirkey 1974: 359; Revelle 2025).

While it can be useful to derive interpretive heuristics, rather than developing them entirely on a theoretical basis and applying them across fields indiscriminately (e.g., Cohen 1988), one could argue that they should be empirically based and specified to suit the particular contexts of each field (cf. Funder and Ozer 2019; Gignac and Szodorai 2016; Lovakov and Agadullina 2021). The currently missing guidelines for interpreting the KMO index could thus be addressed through an empirical examination of KMO indices in the literature, with interpretive guidelines being specified for each particular field. At the same time, one could also argue that the discretization involved in producing such rules-of-thumb for a continuous measure likely oversimplifies interpretation (cf. Field 2018: 117), and researchers should thus ideally use guidelines with caution and report non-discretized KMO indices. It is left to the reader’s discretion which of these perspectives to take, and this paper makes no attempt to empirically derive guidelines for interpreting the KMO index. To more practically illustrate the utility of the KMO index, an example using simulated data is provided in the subsequent section.

2.4 Example

To demonstrate the usefulness of the KMO index, a 1,000 times 10 data matrix is simulated, again with Monte Carlo methods (Rubinstein and Kroese 2016) using \(\textsf{R}\) (R Core Team 2024). In line with the aforementioned stipulations in section 2.1, the latent phenomenon (\(\phi\)) is generated with the parameter specifications: \(\gamma = 0\) and \(\iota = 1\), while its manifestations (\(\theta\)) are generated with the specifications: \(\forall \theta_m: \tau_m = 0, \lambda_m \sim \mathcal{U}(.3, .7), \varsigma_m = 1\), with \(\mathcal{U}\) being the continuous uniform probability distribution (cf. Gelman et al. 2014: 577-578; McElreath 2019: 36), where .3 inclusively denotes the lower bound and .7 the upper bound. While the specification of 10 manifestations is rather arbitrary and merely serves to help demonstrate a computation of the KMO index, this number has previously been considered adequate for reliably measuring latent phenomena (e.g., Ashton and Lee 2009). The smallest effect size of interest (SESOI, see Lakens 2017) is specified to be .3, since, for simplicity, this constitutes a general threshold for the lowest factor loading of interest (cf. Field 2018: 806). Similarly, the specified sample size of 1,000 is also arbitrary, though in relation to EFA, it has been interpreted as ‘excellent’ (Comrey and Lee 1992; Field 2018: 797), and it provides 99.9% statistical power (Cohen 1988, 1991) to detect a factor loading as low as .3, which can be considered more than ‘adequate’ (cf. Cohen 1988, 1991). The upper threshold of a factor loading is specified to be .7 to mitigate multicollinearity among the 10 manifestations (cf. Field 2018: 799; McElreath 2019), which would make the correlation matrix singular and the KMO index undefined. This issue can be assessed by computing the variance inflation factor (VIF, Agresti 2018: 445-446; Lüdecke et al. 2021; Wooldridge 2019: 92) when predicting one manifestation using all other manifestations, and based on its results, the risk of multicollinearity can be considered ‘low’ (i.e., \(\forall \theta_m: VIF \leq 1.091\), Ben-Shachar et al. 2020; Zuur et al. 2010).
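A condensed \(\textsf{R}\) sketch of this simulation and the VIF check might look as follows; the seed is a hypothetical choice, so exact results will differ slightly from those reported here:

```r
# Sketch of the simulated 1,000 x 10 data matrix described above
set.seed(42)                          # hypothetical seed
n <- 1000; k <- 10
phi    <- rnorm(n, 0, 1)              # gamma = 0, iota = 1
lambda <- runif(k, .3, .7)            # loadings bounded to limit collinearity
D <- sapply(1:k, function(m) rnorm(n, 0 + lambda[m] * phi, 1))
# Multicollinearity check: VIF from regressing each theta_m on the others
vif <- sapply(1:k, function(m)
  1 / (1 - summary(lm(D[, m] ~ D[, -m]))$r.squared))
max(vif)                              # 'low' risk if close to 1
```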

To ease model specification (cf. McElreath 2019: 111), all measures are transformed using z-scoring (henceforth standardization, cf. Field 2018: 37; Stock and Watson 2019: 65; Wooldridge 2019: 696-697). In line with the ongoing conceptual distinction of formulas between statistical frameworks, let \(\text{Z}(\theta_m)\) denote the piecewise function for standardizing manifestation \(m\), defined as:

\[ \text{Z}(\theta_m) \equiv \left\{ \begin{array}{lll} \frac{\theta_{u,m} - \hat{\mu}_m}{\hat{\sigma}_m} & \text{if Freq.} \land n < N & \text{(2.4.1.1)} \\ \frac{\theta_{u,m} - \mu_m}{\sigma_m} & \text{if Freq.} \land n = N & \text{(2.4.1.2)} \\ \frac{\theta_{u,m} - \bar{\theta}_m}{s_m} & \text{if MLE-Bayes.} & \text{(2.4.1.3)} \end{array} \right. \tag{2.4.1} \]

where formula 2.4.1.1 is a Frequentist standardization based on the estimated mean (i.e., \(\hat{\mu}_m\)) and (biased) standard deviation (i.e., \(\hat{\sigma}_m\)) of the population. Formula 2.4.1.2 is similarly Frequentist and is used with census data, with standardization being based on the mean (i.e., \(\mu_m\)) and standard deviation (i.e., \(\sigma_m\)) of the population. By contrast, formula 2.4.1.3 can be considered MLE-Bayesian, with the standardization being based on the mean (i.e., \(\bar{\theta}_m\)) and standard deviation (i.e., \(s_m\)) of the sample. This could again be made ‘properly’ Bayesian using MCMC methods, though that is beyond the scope of this paper. For conceptual clarity, this demonstration of the Frequentist KMO index uses formula 2.4.1.1 for function 2.4.1, and applying this function across every column of the data matrix simplifies the computation of the correlation matrix, because it becomes equivalent to the variance-covariance matrix (i.e., \(\text{Cov}(\text{Z}(\theta_m), \text{Z}(\theta_{m'})) = \text{Cor}(\theta_m, \theta_{m'})\)).
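As a brief sketch, base \(\textsf{R}\)’s scale() implements the Frequentist formula 2.4.1.1, centering on \(\hat{\mu}_m\) and dividing by \(\hat{\sigma}_m\):

```r
# Sketch: formula 2.4.1.1 is equivalent to base R's scale()
Z <- sweep(sweep(D, 2, colMeans(D)), 2, apply(D, 2, sd), "/")
all.equal(Z, scale(D), check.attributes = FALSE)
```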

To help illustrate the usefulness of standardization, it is important to recognize that numerous statistical functions can be conceptualized as special cases of the Generalized Linear Model (GLM, Field 2018; Gelman et al. 2021: 263-264; Kruschke 2014), and the correlations of a correlation matrix are no exception. For reasons that will become apparent when demonstrating the Bayesian KMO index (Section 3.3), a GLM-expression for calculating the correlations between the manifestations in the data, using standard notation (e.g., Levy and Mislevy 2020; McElreath 2019: 441-442), is provided in model 2.4.2.

\[ Z(\theta)_u \mid \mu, \Sigma \sim \mathcal{MVN}(\mu, \Sigma), \qquad \Sigma \equiv D P D, \qquad \text{for } u \text{ in } 1, \dots, n \tag{2.4.2} \]

Here, each row (\(u\)) in the standardized set of \(k\) manifestations, with \(Z(\theta)_u = \{Z(\theta_{u, 1}), \dots, Z(\theta_{u, k})\}\), is taken to be iid following a conditionally multivariate Normal probability distribution (i.e., \(\mathcal{MVN}\), Gelman et al. 2014: 578, 582; Stock and Watson 2019: 752-753), with a \(k\)-length manifestation-varying mean vector parameter (i.e., \(\mu_m\)) and a \(k\) times \(k\) variance-covariance matrix parameter (i.e., \(\Sigma\)). Following Barnard and colleagues (2000; see also McElreath 2019: 441-442), the variance-covariance matrix can be decomposed into two \(k\) times \(k\) diagonal matrices (Boyd and Vandenberghe 2018: 114) of standard deviations (i.e., \(D\)) and one \(k\) times \(k\) correlation matrix (i.e., \(P\)). Due to the aforementioned standardization of the data, and the absence of predictors in the model, the mean vector becomes a \(k\)-length zero vector (Boyd and Vandenberghe 2018: 5) of intercepts; and \(D\) becomes a \(k\) times \(k\) identity matrix (Boyd and Vandenberghe 2018: 113; Stock and Watson 2019: 749), because its diagonal becomes a ones vector, which simplifies the variance-covariance matrix (\(\Sigma\)) into the correlation matrix (i.e., \(P\)). As such, due to the standardization, symmetry, and ones vector diagonal, the model will solely have to estimate the unique \(\frac{k(k-1)}{2}\) inter-correlations in \(P\), which speeds up computation. A more cumbersome alternative would involve fitting \(\frac{k(k-1)}{2}\) bivariate linear models to the standardized data matrix, one for each pair of manifestations, since the resulting standardized coefficients (Wooldridge 2019: 184-185) are equivalent to the linear inter-correlations (cf. Agresti 2018: 273).
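A short \(\textsf{R}\) sketch of this decomposition, reusing the simulated data matrix from above (note that in the code, D denotes the data matrix, so Dsd is an ad hoc name for the model’s diagonal matrix of standard deviations):

```r
# Sketch: Sigma = D P D, collapsing to P after standardization
S   <- cov(D)                        # variance-covariance matrix (Sigma)
Dsd <- diag(sqrt(diag(S)))           # diagonal matrix of standard deviations
P   <- cov2cor(S)                    # correlation matrix
all.equal(S, Dsd %*% P %*% Dsd, check.attributes = FALSE)  # Sigma = D P D
all.equal(cov(scale(D)), P)          # standardized data: Sigma reduces to P
```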

Irrespective of approach, once the inter-correlations are estimated, they can be used to construct the estimated correlation matrix (\(\hat{P}\)), which, together with its inverse (\(\hat{P}^{-1}\)), can be used to first compute the anti-image correlation matrix (\(\hat{P}_q\)) and then the overall (\(\hat{\alpha}\)) and \(k\) individual KMO indices (\(\hat{\alpha}_m\)). Applying the Frequentist formulas to the simulated data, a matrix containing the estimated inter-correlations (\(\hat{\rho}\)), anti-image inter-correlations (i.e., \(\hat{\rho}_q\)), and the KMO indices, is provided for ease of overview in table 7.1 in section 7.5 of the Appendix.

The estimated overall KMO index of this simulated data (\(\hat{\alpha}\)) is .846, which is indicative of relatively little diffusion in the correlations, consistent with relatively few latent factors, and thus suggests the data are appropriate for a factor analysis. This demonstrates that by calculating the KMO index, the researcher can obtain useful information that enables them to consider the ‘sampling adequacy’ of the data matrix fit for an EFA and to expect few latent factors. By now, the reader should be familiar with the KMO index, its calculation, Frequentist conceptualization, interpretation, and usefulness, and this review thus turns to a discussion of its inferential viability and robustness.

2.5 Inference

While inferential statements are seldom made in relation to the KMO index, these can be useful to express uncertainty about the estimated ‘sampling adequacy’ of a data matrix. The absence of such practices may be due to the lack of studies on the sampling distribution of the KMO index, making researchers unable to make inferential statements with reference to a known parametric distribution. However, researchers with sufficiently large and representative samples can generally overcome such a limitation (cf. Agresti 2018: 140) with the non-parametric bootstrap (Efron 1979, 2003; Efron and Tibshirani 1994). Here, the data matrix (\(\mathcal{D}\)) can be re-sampled with replacement a sufficiently large number of times to produce a distribution of counterfactually sampled data matrices. From each re-sampled data matrix, an overall and \(k\) individual KMO indices (one for each manifestation) can be calculated by applying the functions in section 2.2, producing an approximation of the sampling distribution of the KMO indices. Probability statements subject to the limitations of Frequentism and the Likelihood-based approach (cf. Cohen 1994; Kruschke 2014; Wagenmakers et al. 2010) can then be made, for example, in the form of p-values (cf. Stock and Watson 2019: 110-117) and/or 95% confidence intervals (Stock and Watson 2019: 117-118). While such inferential statements may not necessarily be appropriate within the context of an exploratory factor analysis (EFA), because such statements are theoretically confirmatory and assume a priori specified hypotheses, they could be considered useful in relation to a Confirmatory Factor Analysis (CFA, Brown 2015; Levy and Mislevy 2020), where prespecified hypotheses about the ‘sampling adequacy’ of a data matrix can be tested prior to testing hypotheses about factor loadings and model structure. Since the ‘sampling adequacy’ matters just as much for a CFA as for an EFA, such hypotheses are relevant for applied research, and the evidence provided by this procedure can corroborate the Frequentist conclusion that the data matrix is factorable at the population level and appropriate for a factor analysis.

An example of applying the Frequentist Null Hypothesis Significance Testing procedure (NHST, cf. Field 2018: 72-90; Kruschke 2014: 300-333) to make inferential statements about the KMO index can be demonstrated for the previous example. Here, the null hypothesis (\(H_0\)) can be stated as \(\text{KMO}_{\text{overall}} = .5\), which reflects the aforementioned KMO index value that can be considered ‘indiscernible’ in relation to an assessment of ‘sampling adequacy’ (see Section 2.3), and researchers should preferably be able to rule out this demonstrably plausible value. As such, the alternative hypothesis (\(H_a\)) will state that \(\text{KMO}_{\text{overall}} \neq .5\), and the rejection level for \(H_0\) will be prespecified at the conventional 5% level. Here, the mean, bias-reduced (Park et al. 2022; Park and Wang 2020, 2022) standard error (SE, Stock and Watson 2019: 113-114), and 95% bias-corrected and accelerated interval (BCa, Efron 1987) are calculated from 40,005 bootstrap re-samples of the data. This number of re-samples was chosen to help invoke the law of large numbers (Angrist and Pischke 2015: 13-16; Imai 2017: 300-302; Sen and Singer 1993; Stock and Watson 2019: 85-86) and for comparability with a later demonstration of the Bayesian KMO index (see Section 3.3). This results in an estimated overall KMO index (\(\hat{\alpha}\)) of .831 (SE = .012; 95% BCa[.805; .851]), where the SE and 95% BCa, given their Frequentist conceptualization, help express the (sampling) uncertainty about the previously conceptualized estimated sampling adequacy in the population (\(\hat{\alpha}\)). Since the 95% interval lies entirely above .5, this is consistent with the sampling adequacy being discernible enough for a proper assessment, and the relatively high values are also consistent with a low level of diffusion in the correlations at the population level, suggesting that the data matrix is factorable and appropriate for a subsequent factor analysis.
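
A hedged \(\textsf{R}\) sketch of this bootstrap procedure is given below, using the boot package; the data object dat and the kmo() function sketched earlier are assumptions of this example, and its plain bootstrap SE will differ slightly from the reported bias-reduced estimates.

```r
# Hedged sketch of the non-parametric bootstrap for the overall KMO
# index; 'dat' denotes the simulated data matrix and kmo() the function
# sketched above (both are assumptions of this example).
library(boot)
kmo_stat <- function(d, idx) kmo(cor(d[idx, ]))$overall
set.seed(42)
boot_out <- boot(dat, statistic = kmo_stat, R = 40005)  # 40,005 re-samples
sd(boot_out$t)                                          # bootstrap SE
boot.ci(boot_out, conf = 0.95, type = "bca")            # 95% BCa interval
```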

2.6 Robustness

With the inferential viability of the KMO index having been demonstrated, this section clarifies why the arguably strict stipulations were necessary when specifying the data matrix in section 2.1. While those stipulations by definition hold for data simulated under such specifications, for any real-world data, they merely constitute assumptions that may be violated. Should this be the case, the conclusions regarding the ‘sampling adequacy’ of a data matrix provided by the KMO index may be incorrect.

As made evident by the formula in section 2.2, the KMO index is a function of the correlation matrix. Because the calculation of the correlation matrix can be considered a special application of the GLM, the robustness of the KMO index is directly tied to the Gauss-Markov theorem (Stock and Watson 2019: 103-104, 195; Wooldridge 2019: 88). Besides requiring the manifestations to be measurable on an interval scale and meaningfully reducible to quantitative constructs (Agresti 2018: 24; Field 2018: 10-13), this means that the KMO index generally makes assumptions that include the following:

Random sampling (Agresti 2018: 30; Stock and Watson 2019: 158-161, 176; Wooldridge 2019: 41, 80): If one wishes to credibly infer results about the ‘sampling adequacy’ from a sample to a population, the probability of the units being included in the sample must effectively be equivalent to their proportion in the population. As used in the data specification in sections 2.1 and 2.4, this is generally best achieved with random sampling. Violating this assumption could bias inference if the sampled units are systematically different from the unit population (i.e., sampling bias, Agresti 2018: 30-31), possibly resulting in incorrect inference (i.e., a Type III error, Gigerenzer 2004: 599), for example, if the correlations used to compute the KMO index suffer from wrongly estimated magnitudes or directions (i.e., Type M and Type S errors, respectively, cf. Gelman et al. 2021: 59; Gelman and Carlin 2014). In real-world scenarios, there are no easy solutions to a violation of this assumption, though common approaches involve re-weighting data to better match the population (e.g., Downes et al. 2018), which could improve the inferential viability of the KMO index.

Sample variation (Wooldridge 2019: 42): None of the measured manifestations can be constant, that is, \(\forall \theta_m: SD(\theta_m) > 0\). This is because the linear correlation is only defined for variable phenomena, as the formulas in function 2.2.5 otherwise include a division by zero. For that reason, sections 2.1 and 2.4 specified a large and heterogeneous population, with positive standard deviations for both the latent and manifest phenomena. This assumption can easily be assessed by checking whether all of the measured manifestations vary; a violation can sometimes be remedied by collecting more data but is otherwise difficult to address.

No perfect multicollinearity (Stock and Watson 2019: 226-231; Wooldridge 2019: 80): No manifestation can be an exact linear combination of the others, because, as previously noted in section 2.2, the correlation matrix will otherwise be singular and not invertible, making the KMO index undefined. If the data matrix has full column rank, as specified in section 2.1, this assumption is met by definition. As demonstrated in section 2.4, multicollinearity can be assessed with the VIF. If it is violated, a solution may be to remove the manifestations responsible for the multicollinearity from the analysis.

Homoskedasticity (Stock and Watson 2019: 188-192; Wooldridge 2019: 45, 88): All of the manifestations must have constant variance, as assumed by the variance-covariance matrix (\(\Sigma\)) of the GLM in section 2.4. This was ensured by specifying constant standard deviations for both the latent and manifest phenomena in sections 2.1 and 2.4. Violating this assumption can distort uncertainty estimates and thereby weaken inference, though this can be somewhat mitigated if inference is made with the non-parametric bootstrap. Alternative solutions could involve modeling instances of heteroskedasticity (e.g., with weighted least squares, Agresti 2018: 449; Field 2018: 239; Stock and Watson 2019: 195-196; Wooldridge 2019: 273-283).

Zero conditional mean (Imai 2017: 370-371; Stock and Watson 2019: 157-158, 160-161; Wooldridge 2019: 42, 82-83): None of the manifestations can be measured with systematic error. This assumption would be violated if the manifestations were systematically mismeasured, or if their relationship with the latent phenomenon was spurious (Agresti 2018: 306-308; Gumbel 1933) or masked/suppressed (i.e., confounded, Agresti 2018: 309; McElreath 2019: 144-153; Lenz and Sahn 2021), potentially leading to false positive or false negative conclusions regarding the factorability of the data matrix (i.e., a Type I and Type II error, respectively, cf. Neyman and Pearson 1933). In sections 2.1 and 2.4, this was guaranteed by specifying the latent phenomenon to be exogenous to all confounders and temporally preceding the manifestations, which, for simplicity, were also specified to occur simultaneously in time. Violations of this assumption are hard to solve and are perhaps best addressed a priori, with accurate measures, exogenous manipulation of the latent phenomenon, and/or sound theory.

Independent and identically distributed (Stock and Watson 2019: 82): All the units and manifestations must be iid, as specified in sections 2.1 and 2.4. Violating this assumption can bias inference (Stock and Watson 2019: 108), with uncertainty estimates often becoming too small, though solutions could possibly include modeling the clustering (i.e., interdependence) of the units and/or manifestations with multilevel multivariate models.

No outliers (Stock and Watson 2019: 176): There can be no values of the measured manifestations in the sample with a sufficiently high leverage (Agresti 2018: 442) to make the inter-correlations unrepresentative of the population of interest. This assumption was met by specifying that the sample was to be sufficiently large, representative, and drawn from the Normal probability distribution in sections 2.1 and 2.4. While the leverage of outliers can generally be reduced by increasing the sample size, another approach is to change the likelihood distribution of the GLM from a Normal to a Student t (Gosset 1908; Helmert 1875, 1876a, 1876b; Lüroth 1876; Pearson 1895b). This is because the Student t probability distribution, besides containing a mean and standard deviation parameter, includes a degrees of freedom parameter (df, Agresti 2018: 126-132; Imai 2017: 339-342; McElreath 2019: 233-234; Stock and Watson 2019: 80; Wooldridge 2019: 708-709), which controls the width of the tails of the distribution: The tails are widest at the lower bound of 1 and narrow as df approaches infinity, where the Student t becomes equivalent to the Normal. By allowing for a greater width of the tails, the Student t can be considered more robust, because it is better able to handle outliers (Gelman et al. 2021: 264-286; Juárez and Steel 2010; Kruschke 2014: 458-468; McElreath 2019: 233-234; O’Hagan 1979), reducing their leverage and ability to influence the results. Since the Student t is seldom implemented for GLMs in the Frequentist framework, a Student t multivariate model is not implemented in relation to the Frequentist KMO index, but given its common use in Bayesian statistics (e.g., Juárez and Steel 2010; O’Hagan 1979), it will be discussed in relation to the Bayesian KMO index (see Section 3.5).

Linearity (Wooldridge 2019: 40, 80): All of the manifestations must be (approximately) linearly related to the latent phenomena, as specified in sections 2.1 and 2.4, since the PPCC is a linear measure of association (Imai 2017: 143; Stock and Watson 2019: 157). This assumption is commonly violated (cf. Kruschke 2014: 424), for example, if the manifestations have been measured on an ordinal scale (Agresti 2018: 24-25; Field 2018: 12) instead of an interval scale. Such issues can be mitigated by transforming the measured manifestations into ranks (cf. Field 2018: 284) and applying the aforementioned standardization function 2.4.1 covered in section 2.4. This joint rescaling-procedure, henceforth denoted rank-standardization, effectively transforms the PPCC into Spearman’s (1904) rank correlation coefficient (SRCC, cf. Agresti 2018: 247-248; Field 2018: 351-352; Spearman 1910). Such a rescaling-procedure can be shown by letting \(\text{Rank}(\theta_m)\) denote the ranking function (cf. Field 2018: 288; R Core Team 2024):

\[ \text{Rank}(\theta_m) \equiv \frac{1 + \#\{ l \mid x_l < \theta_{u,m} \} + \#\{ l \mid x_l \le \theta_{u,m} \}}{2} \tag{2.6.1} \]

where \(\#\{ l \mid x_l < \theta_{u,m} \}\) denotes the number of values less than the \(u\)th value of manifestation \(m\) and \(\#\{ l \mid x_l \le \theta_{u,m} \}\) denotes the number of values less than or equal to the \(u\)th value of manifestation \(m\), which, together with the 1 in the numerator and the division by 2, averages the ranks of tied values. Then, let \(Z_{rank}(\theta_m)\) denote the rank-standardization function of manifestation \(m\), and while its conceptual output, similar to definition 2.4.1, varies across statistical approaches, it can be specified irrespective of framework as:

\[ \text{Z}_\text{rank}(\theta_m) \equiv \frac{\text{Rank}(\theta_m)_u - \text{Mean}(\text{Rank}(\theta_m))}{\text{SD}(\text{Rank}(\theta_m))} \tag{2.6.2} \]

Calculating the KMO index on data transformed using function 2.6.2 enables it to capture ‘sampling adequacy’ in relation to ordinal relationships, which, unlike the PPCC, does not require linear relationships between the latent phenomenon and its manifestations. Instead, it merely requires the relationships between the theorized manifestations and the latent phenomenon to be monotonic (cf. Clapham and Nicholson 2014). Researchers may be unsure when to apply such a transformation, but in the event of linearity, the SRCC approximates the PPCC, so researchers could apply rank-standardization by default and compare results with and without this rescaling-procedure.13 For the above example, rank-standardizing every manifestation in the data matrix and then calculating the KMO index yields an estimated overall KMO (\(\hat{\alpha}\)) of .840. The negligible difference between this result and the previous linear-dependent result (i.e., \(\Delta = .006\)) can be interpreted as being consistent with linearity of the functional form underlying the relationship between the latent phenomenon and the theorized manifestations.
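
This rescaling-procedure can be sketched in \(\textsf{R}\) as follows, where dat again denotes the (assumed) data matrix; the equality with the SRCC provides a quick check of the implementation.

```r
# Sketch of the rank-standardization in functions 2.6.1-2.6.2: ranking
# with ties averaged, then standardizing; the PPCC of the transformed
# data then equals Spearman's rank correlation of the raw data.
Z_rank <- scale(apply(dat, 2, rank, ties.method = "average"))
all.equal(cor(Z_rank), cor(dat, method = "spearman"))  # TRUE
kmo(cor(Z_rank))$overall                               # rank-based KMO index
```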

Taken together, the robustness of the KMO index thus relies on numerous assumptions that were accounted for when specifying the data matrix in sections 2.1 and 2.4. However, for any real-world data, any of these could potentially be violated, possibly resulting in misleading results. Accordingly, to ensure a valid and reliable KMO index, researchers should be wary of violating the stipulations made here. Having revisited the Frequentist KMO index, a Bayesian reconceptualization can now be made.

3 Bayesian KMO

The first part of this paper (Section 2) revisited the classical KMO index (i.e., ‘Little Jiffy, Mark IV,’ Kaiser and Rice 1974), thoroughly reviewing its formula, interpretation, viability for inference, and robustness. By considering its calculation and conceptualization within different statistical frameworks, this KMO index was found to be Frequentist, leaving it incompatible with Bayesian inference. For the purposes of that review, the term ‘MLE-Bayesian’ was specifically introduced in section 2.2 to jointly refer to a Likelihood-based conceptualization of the KMO index that could be considered Bayesian in the context of ‘non-informative’ priors and maximum-likelihood point estimators. While this served to build towards a possible Bayesian conceptualization, the formula and concepts specific to that terminology evidently failed to enable the estimation of a posterior distribution and the incorporation of prior information, making it somewhat ‘improper’ for Bayesian inference.

In this section, the limitations of that MLE-Bayesian approach are solved by introducing an arguably more proper Bayesian Kaiser-Meyer-Olkin index. This is accomplished by reconceptualizing the KMO index through use of Bayes’ theorem and discussing the specification of priors and estimation of a posterior distribution (Section 3.1), resulting in an adjustment of the formula introduced in section 2.2 to derive the Bayesian KMO (BKMO) index from a posterior of correlations (Section 3.2). This is followed by an example demonstrating how to compute this BKMO index (Section 3.3), how to assess whether results are sensitive to priors (Section 3.4), and a discussion of a Robust BKMO index (Section 3.5).

3.1 Reconceptualization

To reconceptualize the KMO index in a ‘proper’ Bayesian manner, it is necessary to consider Bayes’ theorem (Bayes and Price 1763; Laplace 2009[1814]), sometimes known as Bayes’ rule (e.g., Imai 2017: 266; Kruschke 2014: 100-104), which is the conceptual foundation of Bayesian inference (Gelman et al. 2014; Kruschke 2014; Levy and Mislevy 2020; McElreath 2019). As shown in equation 3.1.1 (cf. Field 2018: 124; Imai 2017: 266; McElreath 2019: 49), Bayes’ theorem consists of \(P(b | a)\), which is the likelihood; \(P(b)\), the marginal likelihood; \(P(a)\), the prior probability; and \(P(a | b)\), the posterior probability.

\[ P(a | b) = \frac{P(a)P(b | a)}{P(b)} \tag{3.1.1} \]

Within Bayesian inference (Gelman et al. 2014; Kruschke 2014; Levy and Mislevy 2020; McElreath 2019), \(a\) and \(b\) are often respectively conceptualized as the parameter(s) and the data. This means that \(P(b)\) is the marginal likelihood of the data, \(P(a)\) is the (prior) probability of the parameter, and \(P(b | a)\) is the likelihood of the data given the parameter. While this often leads researchers to conceptualize \(P(a | b)\) as the ‘probability of the parameter given the data’ (e.g., Kruschke 2014), it is worth emphasizing that this posterior is dependent on the prior, \(P(a)\), and likelihood, \(P(b | a)\), making conclusions sensitive to their specifications. Accordingly, the choices involved in specifying the prior and likelihood can be taken to reflect a modeling of the data that may impact the posterior, and since researchers could disagree about the choice of prior and likelihood, these choices may be a point of contention. To appropriately reflect this dependency, the posterior, \(P(a | b)\), will be referred to as the probability of the parameter given the (modeled) data.

Researchers working within the Frequentist and Likelihood-based frameworks should be familiar with the likelihood, \(P(b | a)\), since this is the only component of Bayes’ theorem that is used within these frameworks, either implicitly or explicitly, specifically because these approaches require the specification of a likelihood distribution for the statistical models that aim to describe the data (e.g., the Normal likelihood for the GLM, cf. Agresti 2018; Field 2018; King 1998; Stock and Watson 2019; Wooldridge 2019). The Bayesian framework differs by instead considering the posterior, \(P(a | b)\), to be of central interest. To enable its calculation, in addition to specifying a distribution for the likelihood, Bayesian researchers must also specify a distribution for the prior. A historical limitation of this approach is that the marginal likelihood, \(P(b)\), also needs to be calculated, which can be difficult, if not practically impossible, except for only the simplest of models (cf. Kruschke 2014: 115). However, recent advances in Markov-Chain Monte Carlo methods (MCMC, Brooks et al. 2011) have solved this limitation by circumventing the need to calculate \(P(b)\) through approximation-based solutions (cf. Gelman et al. 2014: 275-308; Kruschke 2014: 143-191; Levy and Mislevy 2020: 93-112; McElreath 2019: 263-296), enabling entirely new applications of Bayesian inference. These advancements also make a Bayesian reconceptualization of the KMO index more viable, since it can now be relatively easily calculated with MCMC methods.

With Bayes’ theorem in mind, \(a\) and \(b\) will thus here be respectively reconceptualized as the ‘sampling adequacy’ (i.e., \(\alpha_{Bayes}\)) and the data matrix (\(\mathcal{D}\)). Here, \(\alpha_{Bayes}\) replaces the MLE-Bayesian concept of the ‘sampling adequacy in the data’ (\(a\)) introduced in section 2.2, since this ‘proper’ Bayesian conceptualization reflects the ‘sampling adequacy given the data’, where \(a\) may not equal \(\alpha_{Bayes}\) due to the infusion of prior information. While \(\alpha\) could generally be used instead of \(\alpha_{Bayes}\), the latter denotation is preferred here to distinguish it from the Frequentist concept of \(\alpha\) used in section 2. This reconceptualization of \(a\) and \(b\) in Bayes’ theorem is appropriate, since formula 2.2.8.3 for function 2.2.8 in section 2.2 showed that \(a\) is a function of the inter-correlations (\(r\)) and anti-image inter-correlations (\(r_q\)), which are both based on the data matrix (\(\mathcal{D}\)). For similar reasons, the inter-correlations, inverse inter-correlations, and anti-image inter-correlations, when infused with prior information, can be distinctly denoted \(\rho_{Bayes}\), \(\rho_{Bayes}^{-1}\), and \(\rho_{qBayes}\), respectively.

\(P(a)\) can then be denoted \(P(\alpha_{Bayes})\) and conceptualized as the prior probability of the ‘sampling adequacy’, which can be more meaningfully interpreted as the uncertainty about the ‘sampling adequacy’ before observing the data. Such prior uncertainty can be taken to reflect the stochasticity involved in data collection (e.g., random sampling can produce data of varying quality) or fundamental epistemological uncertainty about the phenomenon being studied. \(P(b)\) can then be denoted \(P(\mathcal{D})\) and becomes the marginal likelihood of the data matrix, which, as previously mentioned, can be ignored due to the reliance on MCMC methods. With this in mind, \(P(b|a)\) becomes \(P(\mathcal{D}|\alpha_{Bayes})\), which reflects the likelihood of the data matrix given the ‘sampling adequacy’, and from this, it follows that \(P(\alpha_{Bayes}|\mathcal{D})\) can be conceptualized as the posterior probability of the ‘sampling adequacy’, that is, the probability of the ‘sampling adequacy’ given the (modeled) data.

With the reliance on MCMC methods, researchers thus only need to specify the prior of the ‘sampling adequacy’, \(P(\alpha_{Bayes})\), and the likelihood of the data given the ‘sampling adequacy’, \(P(\mathcal{D} | \alpha_{Bayes})\), to arrive at the posterior probability of the ‘sampling adequacy’, \(P(\alpha_{Bayes} | \mathcal{D})\). However, considering the relatively abstract concept and interpretation of ‘sampling adequacy’ and the aforementioned confusion regarding its mathematical properties, researchers may not find it easy to specify a prior and likelihood distribution for it. These difficulties can be circumvented by recognizing that researchers need not necessarily specify a prior and likelihood for the KMO index per se, which is contrary to the specification of most statistical models in Bayesian inference (e.g., McElreath 2019; for an exception, see, e.g., Kruschke 2014: 226-230). This circumvention is possible because the KMO index has been shown to be a deterministic function of the correlation matrix, meaning that no stochastic or epistemological uncertainty is added to our knowledge of \(a\) in transitioning from \(r\), through \(r^{-1}\) and \(r_q\), to \(a\), which means that the prior and likelihood need only be specified for the correlations. Stated differently, the prior and likelihood of the correlation matrix imply a prior and likelihood for the KMO index. Estimating the Bayesian KMO index thus becomes a question of applying Bayesian inference in relation to a correlation matrix.

In line with the above application of the GLM for calculating correlations (see Section 2.4), a likelihood useful for deriving the posterior of a correlation matrix, and thus the ‘sampling adequacy’, is the aforementioned multivariate Normal probability distribution (\(\mathcal{MVN}\), Gelman et al. 2014: 578, 582). With this distribution in mind for the likelihood, this leaves the question of specifying the priors for the correlation matrix. A natural choice for such a prior is the Lewandowski-Kurowicka-Joe probability distribution (\(\mathcal{LKJ}\), Gelman et al. 2014: 578, 584; Lewandowski et al. 2009), which is a commonly used prior for correlation matrices in Bayesian inference (e.g., Gelman et al. 2014: 55-56; Gelman et al. 2021: 123-127; McElreath 2019: 441-442). This is, in part, because it accounts for correlation matrices having symmetric off-diagonal elements bounded between -1 and 1 and a diagonal that is a ones vector. The priors can be specified with this \(\mathcal{LKJ}\) distribution through use of its single positive real scalar shape parameter (i.e., \(\eta\)). At an \(\eta\) of one, the distribution of correlations is flat; as \(\eta\) increases from one, the distribution of correlations becomes less extreme, approaching an identity matrix; as \(\eta\) decreases from one, the distribution becomes more extreme (cf. McElreath 2019: 442). This means that the \(\mathcal{LKJ}\) specifies a joint prior for all the correlations, making it easy for researchers to specify priors with many theorized manifestations, though it should be noted that low values of \(\eta\) can result in singular correlation matrices.
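
The behavior of the shape parameter can be illustrated by direct sampling, as in the sketch below; trialr::rlkjcorr (Brock 2023) is used here as one available sampler, which is an assumption of this example.

```r
# Illustrative: how the LKJ shape parameter governs the spread of a
# single correlation, sampled with trialr::rlkjcorr (Brock 2023).
set.seed(7)
etas <- c(0.5, 1, 2, 10)
spread <- sapply(etas, function(e) sd(trialr::rlkjcorr(5000, K = 2, eta = e)[, 1, 2]))
round(setNames(spread, paste0("eta=", etas)), 2)  # spread shrinks as eta grows
```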

While this \(\mathcal{LKJ}\) prior can be considered generally useful in relation to a Bayesian KMO index, it does not permit specifying individual priors for the elements of the correlation matrix. Since the KMO index is more commonly applied to exploratory rather than confirmatory factor analysis, researchers are here expected not to possess prior information that would enable discriminating between the inter-correlations of manifestations, making the \(\mathcal{LKJ}\) prior appropriate for most practical uses. Should researchers nevertheless possess varying prior information about the inter-correlations, instead of a multivariate Normal, they can specify \(\frac{k(k-1)}{2}\) individual bivariate linear models with different priors to account for that information when estimating the inter-correlations, as mentioned in section 2.4.

To reflect the expected lack of substantial prior information about the inter-correlations, researchers should ideally specify the shape parameter to produce a ‘weak’ or ‘non-informative’ prior (cf. Gelman et al. 2014: 51-56). Here, in line with Gelman and colleagues (2014: 51-52), the author recommends specifying the former. While a ‘non-informative’ uniform prior can be specified for the \(\mathcal{LKJ}\) with \(\eta = 1\), uniform distributions are often not considered ‘proper’ (cf. Gelman et al. 2014: 52), and in real-world applications, there is almost always some prior information that can be extrapolated from existing research and applied to the particular research context. For example, in relation to prior information on correlations between novel phenomena, research on inter-correlations in the literature can be used to create realistic prior expectations about these inter-correlations (e.g., Funder and Ozer 2019; Gignac and Szodorai 2016; Lovakov and Agadullina 2021). Should researchers nevertheless prefer ‘non-informative’ priors, caution is advised, since a uniform distribution is not always the correct choice (cf. Gelman et al. 2014: 51-55; Jeffreys 1946; Kruschke 2014: 293; Lee and Webb 2005; Zhu and Lu 2004).

Specifying ‘weakly-informative’ priors for the \(\mathcal{LKJ}\) prior can be done in numerous ways. One general approach involves specifying ‘skeptical, yet persuadable’ priors, whose distributions are deliberately conservative by expecting small inter-correlations, but which remain wide enough to ‘let data speak for itself’ (e.g., \(\eta = 2\), McElreath 2019: 441-442). Such priors can influence estimates in a manner similar to regularization and help prevent overfitting, which can improve the generalization of results (cf. McElreath 2019: 214-217; Kruschke 2014: 531-532). Another approach (cf. Kruschke 2014: 294) could take advantage of the fact that ‘tomorrow’s prior is today’s posterior’ (Lindley 1972: 2) by specifying a prior for each individual inter-correlation using existing results. Such priors could be derived from previous posteriors, or in a manner similar to cross-validation (cf. Stone 1974), by deriving the posterior from a relatively small subset of the current data (e.g., 10% - 20%), and then using this posterior to specify priors for an analysis of the remaining data (for a discussion, see, e.g., Berger and Pericchi 2001). However, in the event that researchers do possess quality prior information, they should not shy away from using this information to specify ‘strongly informative’ priors (cf. Kruschke 2014: 113-115), though they should always transparently report their priors to open them up to criticism from their peers.

Regardless of approach, researchers may be concerned whether their choice of priors for the \(\mathcal{LKJ}\) inadvertently influences the results. As a response to this, it can be stated that for most statistical models, as long as the priors assign non-zero probability to every possible parameter value, data can quickly overwhelm the prior (cf. Kruschke 2014: 293; Wagenmakers et al. 2010: 167), producing relatively prior-insensitive results that closely resemble Frequentist and Likelihood-based estimates, but for which a Bayesian interpretation can be made. Similarly, should researchers specify priors using a uniform distribution (e.g., \(\eta = 1\), McElreath 2019: 441-442), results approximate likelihood estimates (cf. Hastie et al. 2017: 272) even with little data (cf. Kruschke 2014: 113). Researchers can always compare results based on ‘strong’ or ‘weakly informative’ priors to results based on ‘non-informative’ priors, or results derived from the non-parametric bootstrap, to assess whether results are robust to the prior specification, though researchers should note that the re-sampling involved in the non-parametric bootstrap is not equivalent to the MCMC method of Bayesian inference (cf. Kruschke 2014: 161).

With these considerations for the specification of the prior in mind, the Bayesian KMO index (BKMO) is thus conceptualized as ‘a measure of sampling adequacy given the data’. Borrowing denotation from the Frequentist approach to distinguish this concept from the MLE-Bayesian approach, this sampling adequacy given the (modeled) data can be denoted \(\alpha_{Bayes}\), or alternatively \(\alpha\), depending on the need to distinguish it from its Frequentist and Likelihood-based conceptualizations. Since the Bayesian approach, similar to Frequentist or Likelihood-based procedures, involves estimating the posterior ‘sampling adequacy’ with MCMC methods, the resulting quantity is the estimated sampling adequacy given the (modeled) data (i.e., \(\hat{\alpha}_{Bayes}\), or \(\hat{\alpha}\)). This is a relevant distinction, since the two quantities (i.e., \(\alpha_{Bayes}\) and \(\hat{\alpha}_{Bayes}\)) may not necessarily be equivalent. For the same reasons, notation is introduced to denote the estimated inter-correlations (i.e., \(\hat{\rho}_{Bayes}\)), inverse inter-correlations (i.e., \(\hat{\rho}_{Bayes}^{-1}\)), and anti-image inter-correlations (i.e., \(\hat{\rho}_{qBayes}\)).

3.2 Formula

By now, it should be clear that the BKMO index is not directly computed using MCMC methods, but is instead calculated from the posterior distribution of correlation matrices, which is why the likelihood and prior only needed to be specified for the correlation matrix. This means that the BKMO index can be calculated, analogous to the bootstrapped Frequentist KMO index, simply by constructing, for each posterior draw, a correlation matrix from the estimated inter-correlations (\(\hat{\rho}_{Bayes}\)), which together with the estimated anti-image inter-correlations (\(\hat{\rho}_{qBayes}\)) can be used to estimate the posterior of the overall (\(\hat{\alpha}_{Bayes}\)) and individual KMO indices (\(\hat{\alpha}_{Bayes[m]}\)) using the MLE-Bayesian formulas in functions 2.2.7 and 2.2.8 from section 2.2.

\[ \text{BKMO}(\mathcal{D}) \equiv \hat{\alpha}_{Bayes[d]} \equiv \frac{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{Bayes[i,j,d]}}{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{Bayes[i,j,d]} + \underset{i \neq j}{\sum \sum} \hat{\rho}_{qBayes[i,j,d]}^2} \tag{3.2.1} \]

The reliance on these MLE-Bayesian point-estimating formulas is appropriate, because all prior information has already been incorporated into the estimated posterior correlation matrix (i.e., \(\hat{P}_{Bayes}\)). Accordingly, the formula adjusted for the Bayesian reconceptualization can be concisely expressed as formula 3.2.1, where \(d\) denotes the index for the \(d\)th posterior draw. Similarly, adapting the MLE-Bayesian formula 2.2.8.3 in function 2.2.8 using the same Bayesian reconceptualization yields the following formula for estimating the posterior ‘sampling adequacy’ of manifestation \(m\):

\[ \text{BKMO}(\mathcal{D}_m) \equiv \hat{\alpha}_{Bayes[m,d]} \equiv \frac{\sum_{j, j \neq i} \hat{\rho}_{Bayes[i,j,d]}^2}{\sum_{j, j \neq i} \hat{\rho}_{Bayes[i,j,d]}^2 +\sum_{j, j \neq i} \hat{\rho}^2_{qBayes[i,j,d]}} \tag{3.2.2} \]

In accordance with this Bayesian conceptualization of the KMO index, coherent probabilistic statements about the sampling adequacy given the (modeled) data can be made from the distribution of its estimated posterior values. For example, one can ascertain the posterior probability that the ‘sampling adequacy’ is greater than .5 (i.e., \(P(\hat{\alpha}_{Bayes} > .5 \:\ | \:\ \mathcal{D})\)), or one can compute the 95% highest density (contiguous) interval (HD(C)I, Kay 2024a; 2024b; Kruschke 2014: 28; Makowski et al. 2019b; McElreath 2019: 56-58), which contains the 95% most probable values of the ‘sampling adequacy’; both are coherent probabilistic statements within the Bayesian framework (cf. Clayton 2021; Kruschke 2014). This is different from the Frequentist KMO index, which cannot incorporate prior information and is highly sensitive to sampling intentions, and for which probabilistic statements are less intuitive, since the ‘sampling adequacy’ is fixed for any particular sample and varies solely across samples, and since such statements are about the data given the ‘sampling adequacy’ (i.e., \(P(\mathcal{D} | \alpha)\)) instead of the typically more interesting statements about the ‘sampling adequacy’ given the data (i.e., \(P(\alpha | \mathcal{D})\)) (cf. Clayton 2021; Kruschke 2014; McElreath 2019). Having thus reconceptualized the KMO index within the Bayesian statistical framework and introduced it as the Bayesian KMO index (BKMO), a demonstration is provided.
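
Given a vector of posterior draws of the overall BKMO index, such statements reduce to simple summaries of the draws, as in the hedged sketch below; alpha_draws is an assumed object, and the bayestestR package (cf. Makowski et al. 2019b) provides one HDI implementation.

```r
# Illustrative posterior summaries; 'alpha_draws' is assumed to hold the
# posterior draws of the overall BKMO index.
mean(alpha_draws > 0.5)                  # P(alpha_Bayes > .5 | D)
bayestestR::hdi(alpha_draws, ci = 0.95)  # 95% highest density interval
```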

3.3 Example

To demonstrate the conceptualized Bayesian KMO index, the simulated data matrix previously specified (see Section 2.4) is once again used. Contrary to the Frequentist approach, the manifestations are standardized using the MLE-Bayesian formula 2.3.1.1 for function 2.3.1 mentioned in section 2.2, which is preferred to a ‘proper’ Bayesian standardization for computational ease, since the estimation would otherwise have to account for a posterior of standardizations as input. Using the same notation introduced in section 2.4 (cf. Levy and Mislevy 2020; McElreath 2019: 441-442), a Bayesian Generalized Linear Model (BGLM) is formalized.

Model 3.3.1 states that the individual standardized manifestations (i.e., \(Z(\theta)_u\)) in the data matrix (\(\mathcal{D}\)) are iid14 as a conditionally multivariate Normal distribution (i.e., \(\mathcal{MVN}\)) with a \(k\)-length manifestation-varying mean vector parameter (i.e., \(\mu_m\)) and a \(k\) times \(k\) variance-covariance matrix parameter (i.e., \(\Sigma\)). Similar to the GLM in section 2.4 (cf. Barnard et al. 2000), the variance-covariance matrix can be decomposed into two \(k\) times \(k\) diagonal matrices of standard deviations (i.e., \(D\)) and one \(k\) times \(k\) correlation matrix (i.e., \(P\)). Given the standardization of each manifestation in \(\mathcal{D}\) and lack of predictors, the mean vector (\(\mu_m\)) becomes a zero vector of intercepts, while \(D\) simplifies to an identity matrix. This reduces the variance-covariance matrix to the correlation matrix (\(P\)), whose unique \(\frac{k(k-1)}{2}\) inter-correlations become the only estimands of the model for which priors will have to be specified. The prior distribution of \(P\) is specified using the \(\mathcal{LKJ}\) distribution (Lewandowski et al. 2009), whose shape parameter (i.e., \(\eta\)), consistent with recommendations by McElreath (2019: 442), is specified as 2 to reflect an a priori belief in relatively small inter-correlations, which is intended to be ‘regularizing/weakly informative’ relative to the data (cf. McElreath 2019: 214-217, 442; Gelman et al. 2014: 55-56; Gelman et al. 2021: 123-127).

To visualize the effective prior for the ‘sampling adequacy’ that results from this prior for the correlation matrix, MCMC can be used to sample from the prior distribution (cf. Kruschke 2014: 212). The effective prior distribution of the ‘sampling adequacy’ resulting from this \(\mathcal{LKJ}\) prior has an average of .304 (SD = .093; 95% HDI[.127; .480]), with the estimated distribution being shown in figure 3.1. Inspecting this prior for the ‘sampling adequacy’ reveals an a priori belief in a relatively low ‘sampling adequacy’. This can be substantively interpreted to reflect a conservative expectation that the data will be relatively unfactorable due to a high diffusion in correlations and thus inappropriate for factor analysis. In this instance, a shape parameter for the \(\mathcal{LKJ}\) prior distribution of 2 for 10 theorized manifestations effectively implies a ‘skeptical’ prior for the ‘sampling adequacy’. While the width of this distribution can be taken to indicate that it is also a priori ‘persuadable’, this property of the prior is ultimately determined in relation to the strength of the data, and as such, whether the prior is ‘persuadable’ will be assessed through comparisons with results derived from a ‘non-informative’ prior.
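
This effective prior can also be approximated by direct Monte Carlo simulation, as in the sketch below, which assumes the kmo() function and the trialr::rlkjcorr sampler from the earlier sketches rather than the MCMC-based prior sampling used here.

```r
# Sketch: the effective prior of the 'sampling adequacy' implied by an
# LKJ(2) prior over 10 manifestations, approximated by sampling
# correlation matrices and applying the kmo() function sketched earlier.
set.seed(7)
prior_P   <- trialr::rlkjcorr(40005, K = 10, eta = 2)  # 40005 x 10 x 10
prior_kmo <- apply(prior_P, 1, function(P) kmo(P)$overall)
c(mean = mean(prior_kmo), sd = sd(prior_kmo))          # cf. Figure 3.1
```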

Figure 3.1: Overall ‘Sampling Adequacy’


NOTE: Prior/Posterior distributions of the estimated overall Bayesian KMO index (i.e., \(\hat{\alpha}_{Bayes}\)). Prior/Posterior samples = 40,005. The geometries below the distributions indicate the mode (circle) and the 95% HDCI (bar). Results derived from simulated data.

Model 3.3.1 is then fit to the simulated data using the brms \(\textsf{R}\) package (Bürkner 2017, 2018), which is useful due to its flexibility in model specification and reliance on the STAN probabilistic programming language (Stan Development Team 2024a) that enables efficient sampling from the posterior using the No-U-Turn Sampler (NUTS, Hoffman and Gelman 2014) for Hamiltonian MCMC (cf. Kruschke 2014: 400-416). To help assess whether the sampler converged to the posterior distribution, 15 independent sampling chains were used, and to improve estimation reliability, 2,000 warm-up samples were drawn per chain to increase the chances of convergence prior to sampling. After the warm-up period, 2,667 samples were drawn from each chain, resulting in a total of 40,005 posterior samples, whose size was chosen to help invoke the law of large numbers (Angrist and Pischke 2015; Imai 2017: 300-302; Sen and Singer 1993; Stock and Watson 2019: 85-86), as well as ensure a sufficient effective sample size (i.e., ESS ≥ 10,000, Bürkner 2017; Kass et al. 1998; Kruschke 2014: 182-184) after accounting for autocorrelation (Kruschke 2014: 182). Based on visual inspections of trace plots (Kruschke 2014: 179-180), adequate values of the Gelman-Rubin convergence metric (\(\hat{R}\) < 1.01, cf. Vehtari et al. 2021; see also Gelman and Rubin 1992), and the Monte Carlo Standard Error (i.e., MCSE ≤ .001, cf. Kruschke 2014: 186-187), the sampling chains showed signs of convergence, suggesting the posterior distribution had been reliably identified.
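
A hedged brms sketch of this fit is provided below; the data frame Z and the response names z1, …, z10 are illustrative assumptions, not the implementation in the Appendix.

```r
# Hedged sketch of fitting model 3.3.1 with brms; 'Z' is a data frame of
# the 10 standardized manifestations z1, ..., z10 (names illustrative).
library(brms)
f <- bf(mvbind(z1, z2, z3, z4, z5, z6, z7, z8, z9, z10) ~ 0) + set_rescor(TRUE)
fit <- brm(
  f, data = Z, family = gaussian(),
  prior = set_prior("lkj(2)", class = "rescor"),  # LKJ(2) prior on P
  chains = 15, warmup = 2000, iter = 4667,        # 15 x 2,667 = 40,005 draws
  seed = 1
)
summary(fit)  # inspect Rhat, ESS, and the rescor estimates
```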

Having estimated the posterior distribution of the correlation matrix, the BKMO index can then be calculated, using the MLE-Bayesian formula outlined in section 2.2, by constructing a correlation matrix from the estimated correlation coefficients for each posterior draw. Since this estimated correlation matrix is infused with prior information, it is no longer denoted \(R\), as previously conceptualized in section 2.2 for the MLE-Bayesian approach. Instead, like the Bayesian KMO index, it is the estimated correlation matrix given the (modeled) data, which can be denoted \(\hat{P}_{Bayes}\) to distinguish it from the Frequentist conceptualization, though it may also be denoted \(\hat{P}\) when the context is clear. This is followed by calculating the estimated anti-image correlation matrix given the (modeled) data (i.e., \(\hat{P}_{q[Bayes]}\), alternatively \(\hat{P_q}\)) from the inverse estimated correlation matrix given the (modeled) data (i.e., \(\hat{P}^{-1}_{Bayes}\), alternatively \(\hat{P}^{-1}\)), using the framework-appropriate formula 2.2.6.3 in function 2.2.6 for each posterior draw. From the resulting distribution of matrices, the overall and individual BKMO indices are calculated using the Bayesian formulas 3.2.1 and 3.2.2, respectively. Since the posterior distribution consists of 40,005 samples, the function implementing the calculation of the BKMO relied on vectorization to speed up computation (cf. Kruschke 2014: 408). This BKMO index function was implemented in \(\textsf{R}\) code and is provided for ease of implementation in section 7.2 of the Appendix.
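
A compact sketch of that calculation is shown below; the draw extraction and the assumed ordering of the rescor parameters should be verified against the fitted model's parameter names, and the vectorized Appendix implementation should be preferred in practice.

```r
# Sketch: rebuild a correlation matrix from each posterior draw of the
# residual correlations and apply the kmo() function sketched earlier.
# The columns are assumed to follow R's column-major upper-triangle
# order; verify against the 'rescor__' parameter names before use.
draws <- as_draws_matrix(fit, variable = "^rescor__", regex = TRUE)
bkmo_draws <- apply(draws, 1, function(r) {
  P <- diag(10)
  P[upper.tri(P)] <- r
  P[lower.tri(P)] <- t(P)[lower.tri(P)]  # enforce symmetry
  kmo(P)$overall
})
c(mean = mean(bkmo_draws), sd = sd(bkmo_draws))
```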

The estimated posterior distribution of the BKMO index, that is, the estimated factorability given the (modeled) data (\(\hat{\alpha}_{Bayes}\)), is provided visually next to the prior distribution in figure 3.1. These results can be summarized metrically, with the posterior average of the overall BKMO index being .825, with a posterior standard deviation15 (SD) of .011. The 95% HDI reveals that the ‘sampling adequacy’ with 95% credibility is between .802 and .845. Comparing the posterior to the prior distribution reveals an overlap of .000% and an average difference of .521 (SD = .093; 95% HDI[.346; .701]), revealing that the two distributions are credibly different. The fact that the credible values of the ‘sampling adequacy’ exclude .5 and are substantially higher than this value is consistent with a discernible and credibly high ‘sampling adequacy’. This means that the diffusion in the correlations is relatively low with a high degree of credibility, and the data matrix could thus be considered factorable and appropriate for a subsequent factor analysis.

3.4 Prior Sensitivity

To demonstrate how researchers can assess the sensitivity of results to their choice of priors, the posterior derived from strongly or weakly informative priors can be compared to results derived from ‘non-informative’ priors or bootstrap-based approaches.

Figure 3.2: Overall ‘Sampling Adequacy’


NOTE: Likelihood, Posterior, and (Non-parametric) bootstrap distributions of the estimated overall KMO index (i.e., \(\hat{\alpha}\)). Likelihood/Posterior/Bootstrap samples/re-samples = 40,005. The geometries below the distributions indicate the mode (circle) and the 95% HDCI (bar). Results derived from simulated data.

The first approach can be demonstrated here by respecifying model 3.3.1 so that the \(\mathcal{LKJ}\) prior has a shape parameter of 1, creating a flat prior probability for the correlation matrix. Refitting it using the same MCMC software yields similarly reliable model diagnostics, which are not reported here for brevity. Since the priors of this model are ‘non-informative’, this model approximates a maximum-likelihood model, and the resulting ‘likelihood’ distribution is illustrated in figure 3.2, where the average BKMO index value is .825 (SD = .011; 95% HDI[.802; .845]). The overlap between this ‘likelihood’ distribution and the posterior is approximately 100%, with an average difference of .000 (SD = .000; 95% HDI[.000; .000]), revealing no credible difference between them. This corroborates the claim that the prior specified in model 3.3.1 is ‘persuadable’/‘weakly informative’ and showcases that the BKMO can approximate likelihood-based results with enough data.

While a Bayesian model with ‘non-informative’ priors can often be fitted using MCMC methods, for complex models it can sometimes be difficult to get the model to reliably converge. An alternative approach involves a comparison with results from the non-parametric bootstrap, which can similarly approximate MCMC-based results under ‘non-informative’ priors. This procedure was already applied when discussing the inferential viability of the Frequentist KMO index in section 2.5, and the resulting (non-parametric) bootstrap distribution is also provided in figure 3.2. Those results are not repeated here, but the overlap between this bootstrap distribution and the posterior is approximately 79%, with an average difference of -.006 (SD = .016; 95% HDI[-.038; .026]). This relatively high degree of overlap and lack of a credible difference is again consistent with a ‘weakly informative’ prior, and it showcases that the BKMO can approximate a bootstrapped Frequentist KMO index.
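
For completeness, the sketch below illustrates how such comparisons can be computed, assuming the bkmo_draws and boot_out objects from the earlier sketches and the overlap coefficient implemented in bayestestR.

```r
# Sketch of the prior-sensitivity comparison, assuming 'bkmo_draws'
# (posterior draws) and 'boot_out$t' (bootstrap draws) from earlier sketches.
bayestestR::overlap(bkmo_draws, as.numeric(boot_out$t))  # distribution overlap
summary(bkmo_draws - sample(boot_out$t, length(bkmo_draws),
                            replace = TRUE))             # average difference
```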

As such, the similarities in results across the Bayesian, likelihood-based, and bootstrapped Frequentist approaches reflect the aforementioned properties of Bayesian inference, where ‘weakly informative’ priors can be quickly overwhelmed by data, resulting in the convergence of results. Results from Bayesian inference can thus approximate Frequentist/Likelihood-based results, though solely the Bayesian-based approach enables the incorporation of prior information and coherent probabilistic statements from the posterior.

3.5 Robustness

Similar to its Frequentist counterpart, the BKMO index can also be made more robust to non-linearity and the presence of outliers. Because the data matrix was specified and simulated in sections 2.1 and 2.4 to ensure linearity and no outliers, and because the Frequentist and Bayesian approaches have already been demonstrated to provide similar results, the implementation of a Robust BKMO index is, for brevity, not demonstrated here. Instead, it will only be conceptualized and discussed.

As previously discussed (see Section 2.6), if the manifestations exhibit non-linearity or are measured on an ordinal scale, the manifestations can be rank-standardized to reduce the influence of outliers and identify ordinal relationships, making the resulting KMO index more robust to these violations. Consistent with the Bayesian framework, for the BKMO index this would involve rank-standardizing the data matrix using the rank-standardization function 2.6.2 before fitting model 3.3.1, resulting in a Robust BKMO index.

While rank-standardizing is one way to derive a Robust BKMO index, a more common Bayesian approach involves changing the likelihood of the multivariate model from the Normal to the more outlier-robust Student t (Gelman et al. 2021: 264-286, 437; Juárez and Steel 2010; Kruschke 2014: 458-468; McElreath 2019: 233-234; O’Hagan 1979), as previously alluded to in section 2.6. By basing the likelihood of the model on a multivariate Student t probability distribution (\(\mathcal{MVT}\), Gelman et al. 2014: 580), this introduces a normality (i.e., \(\nu\)) parameter into the model16 (Kruschke 2014: 458-459), which also requires a prior specification. At the lower bound of 1, the Student t is equivalent to a Cauchy probability distribution (Li and Nadarajah 2020), for which the mean and variance are not defined (the mean being defined for \(\nu > 1\) and the variance for \(\nu > 2\)), and as previously mentioned, as \(\nu\) approaches infinity, the Student t becomes equivalent to the Normal probability distribution (Gelman et al. 2021: 264-286, 437; Stock and Watson 2019: 80; Wooldridge 2019: 708-709).

A common Bayesian approach to specifying the prior for \(\nu\) (e.g., Juárez and Steel 2010; Simpson et al. 2017) involves the natural logarithmic function (i.e., \(\text{ln}(\cdot)\), Imai 2017: 91-92) and the Gamma probability distribution (\(\mathcal{G}\), Agresti 2018: 449-450; Gelman et al. 2014: 578, 583; Kruschke 2014: 236-245; McElreath 2019: 315), whose joint application ensures that the prior for \(\nu\) has a lower bound of 1. This Gamma distribution can be characterized by a shape and a rate parameter (Kruschke 2014: 236-245), and Juárez and Steel (2010) find that a shape parameter of 2 and a rate of 0.1 have general advantages when estimating the model. Here, a Robust BKMO index could thus be achieved by respecifying the likelihood of model 3.3.1 to: \(Z(\theta)_u | \mu_m, \Sigma \sim \mathcal{MVT}_{\nu_u}(\mu_m, \Sigma) \; \text{for} \: u \: \text{in 1, } \dots \text{, } n\); and appending the Gamma specification for the prior at the end of the model: \(\text{ln}(\nu)_u \sim \mathcal{G}(2, 0.1)\). Fitting that model would enable the inter-correlations to be estimated with greater robustness, yielding a more robust BKMO index in turn.
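
A hedged brms sketch of such a respecification is provided below, reusing the formula and data objects assumed earlier; brms's nu parameter corresponds to the normality parameter, and whether it is shared across manifestations follows brms's multivariate defaults.

```r
# Hedged sketch of the Robust BKMO model: model 3.3.1 with a Student t
# likelihood; brms's 'nu' is the normality parameter, here given the
# Gamma(2, 0.1) prior of Juarez and Steel (2010).
fit_robust <- brm(
  f, data = Z, family = student(),
  prior = c(set_prior("lkj(2)", class = "rescor"),
            set_prior("gamma(2, 0.1)", class = "nu")),
  chains = 15, warmup = 2000, iter = 4667, seed = 1
)
```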

4 Conclusion

This paper has introduced the Bayesian KMO index. This was accomplished by reviewing the classical Kaiser-Meyer-Olkin (KMO) index, which is a Measure of Sampling Adequacy (MSA) that helps assess whether a data matrix is appropriate for a factor analysis. To build towards a Bayesian KMO index (Section 2), this review revisited its calculation and discussed its mathematical and conceptual properties, which served to show that the KMO index is a Frequentist statistic whose inferential viability and robustness rely on strict assumptions about the data matrix. Bayes’ theorem was then used to reconceptualize the KMO index (Section 3), resulting in a Bayesian KMO (BKMO) index that can incorporate prior information and enable coherent probabilistic statements to be made about the sampling adequacy given the (modeled) data. Estimated with MCMC methods, its ease of computation was demonstrated, and this paper provides efficient code to easily implement it in \(\textsf{R}\) software.

The conceptualization of a BKMO index constitutes a novel contribution to the literature on statistical inference, in particular, the subdiscipline of factor analysis. This is because it enables researchers to incorporate prior information when assessing the ‘sampling adequacy’ of a data matrix, and from the resulting posterior distributions, coherent probabilistic statements about the factorability of the data matrix can be made. To remain consistent within their statistical framework, Bayesian researchers should ideally replace the Frequentist KMO index with its Bayesian counterpart whenever they are interested in assessing the ‘sampling adequacy’ prior to a factor analysis. The areas of application for the BKMO index are numerous, especially within the psychometrics subdiscipline of psychology (e.g., Brown 2015; Levy and Mislevy 2020), where it can provide critical information to researchers developing new measurement scales for latent phenomena. Researchers can assess the sensitivity of results to their priors through comparisons to results based on uniform priors or bootstrap-based approaches. Like its Frequentist counterpart, the BKMO can also be made more robust to violations of linearity and the presence of outliers, for example, by rank-standardizing the data matrix prior to the analysis or by using a Student t multivariate likelihood for the statistical model. The usefulness of the BKMO index is not restricted to exploratory factor analyses and can be extended to confirmatory factor analyses, where researchers can prespecify and test hypotheses about the sampling adequacy. As such, the BKMO index is a novel and widely applicable tool for Bayesian researchers that can offer relevant insights prior to a factor analysis.

5 Declarations

Author Contributions

  • Emil Niclas Meyer-Hansen independently conceived, conceptualized, and implemented the Bayesian version of the Kaiser-Meyer-Olkin (KMO) index, simulated the data, conducted the analyses, and wrote this manuscript.

Funding

  • Emil Niclas Meyer-Hansen received no funding to assist with this paper.

Conflicts of Interest

  • Emil Niclas Meyer-Hansen has no competing interests to declare that are relevant to the content of this paper.

Availability of Data, Code & Materials

  • Data, code, and materials are made freely available on the OSF page of this study (DOI: 10.17605/OSF.IO/T3UPD).

Acknowledgements

  • This study was made possible by building on insights and contributions from numerous researchers across disciplines, in particular Henry F. Kaiser, Edward P. Meyer, Ingram Olkin, and John Rice. The author would like to thank those who provided helpful comments on the paper and is similarly indebted to the STAN Development Team and Paul-Christian Bürkner, whose contributions to Bayesian inference are fundamental to the Bayesian KMO as conceptualized and implemented here.

6 Literature

Agresti, A. (2018), Statistical methods for the social sciences, Pearson.
Ahmad, U. S., Safdar, S., and Azam, M. (2024), “An assessment of bilateral debt swap financing indispensable for economic growth and environment sustainability: A policy implication for heavily indebted countries,” Environmental Science and Pollution Research International, 31, 5716–5734. https://doi.org/10.1007/s11356-023-31577-3.
Albers, C., and Lakens, D. (2018), “When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias,” Journal of experimental social psychology, 74, 187–195. https://doi.org/10.1016/j.jesp.2017.09.004.
Allaire, J. J., Xie, Y., Dervieux, C., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J., Chang, W., and Iannone, R. (2024), Rmarkdown: Dynamic documents for R.
Almeida, L. N., Behlau, M., Santos Ramos, N. dos, Barbosa, I. K., and Almeida, A. A. (2022), “Factor analysis of the brazilian version of the voice-related quality of life (v-RQOL) questionnaire,” Journal of Voice, 36, 736.e17–736.e24. https://doi.org/10.1016/j.jvoice.2020.08.033.
Angrist, J. D., and Pischke, J.-S. (2015), Mastering ’metrics: The path from cause to effect, Princeton: Princeton University Press.
Armstrong, J. S., and Soelberg, P. (1968), “On the interpretation of factor analysis,” Psychological Bulletin, 70, 361–364. https://doi.org/10.1037/h0026434.
Ashton, M. C., and Lee, K. (2009), “The HEXACO-60: A short measure of the major dimensions of personality,” Journal of Personality Assessment, 91, 340–345. https://doi.org/10.1080/00223890902935878.
Barnard, J., McCulloch, R., and Meng, X.-L. (2000), “Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage,” Statistica Sinica, 10, 1281–1311.
Bartlett, M. S. (1950), “Tests of significance in factor analysis,” British Journal of Psychology, 3, 77–85. https://doi.org/10.1111/j.2044-8317.1950.tb00285.x.
Bayes, T., and Price, R. (1763), “An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, A.M.F.R.S.,” Philosophical Transactions of the Royal Society of London, 53, 370–417. https://doi.org/10.1098/rstl.1763.0053.
Ben-Shachar, M. S., Lüdecke, D., and Makowski, D. (2020), “effectsize: Estimation of effect size indices and standardized parameters,” Journal of Open Source Software, 5, 2815. https://doi.org/10.21105/joss.02815.
Berger, J. O., and Pericchi, L. R. (2001), “Objective Bayesian methods for model selection: Introduction and comparison,” in Model selection, IMS Lecture Notes—Monograph Series, Institute of Mathematical Statistics, pp. 135–207. https://doi.org/10.1214/lnms/1215540968.
Bolch, B. W. (1968), “More on unbiased estimation of the standard deviation,” The American Statistician, 22, 27. https://doi.org/10.2307/2681802.
Boyd, S. P., and Vandenberghe, L. (2018), Introduction to applied linear algebra: Vectors, matrices, and least squares, Cambridge University Press.
Bravais, A. (1844), Analyse mathématique sur les probabilités des erreurs de situation d’un point, Impr. royale.
Brock, K. (2023), Trialr: Clinical trial designs in ’rstan’.
Brooks, S., Gelman, A., Jones, G., and Meng, X.-L. (2011), Handbook of Markov chain Monte Carlo, Chapman & Hall/CRC. https://doi.org/10.1201/b10905.
Brown, T. A. (2015), Confirmatory factor analysis for applied research, Guilford.
Bunyakovsky, V. (1859), “Sur quelques inégalités concernant les intégrales aux différences finies,” Mémoires présentés à l’Académie impériale des sciences de St. Pétersbourg, 7, 6.
Burgess, J. P. (2022[1948]), Set theory, Cambridge University Press.
Bürkner, P.-C. (2017), “brms: An R package for Bayesian multilevel models using Stan,” Journal of Statistical Software, 80, 1–28. https://doi.org/10.18637/jss.v080.i01.
Bürkner, P.-C. (2018), “Advanced Bayesian multilevel modeling with the R package brms,” The R Journal, 10, 395–411. https://doi.org/10.32614/RJ-2018-017.
Cajori, F. (2007), A history of mathematical notations, Cosimo Classics.
Cantor, G. (1874), “Über eine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen,” Journal für die Reine und Angewandte Mathematik, 77, 258–262. https://doi.org/10.1515/crll.1874.77.258.
Carroll, R. M., and Nordholm, L. A. (1975), “Sampling characteristics of Kelley’s epsilon and Hays’ omega,” Educational and Psychological Measurement, 35, 541–554.
Casella, G., and Berger, R. L. (2002), Statistical inference, Duxbury.
Cauchy, A.-L. (1821), “Sur les formules qui résultent de l’emploi du signe > ou <, et sur les moyennes entre plusieurs quantités,” Cours d’Analyse, 1re Partie: Analyse Algébrique, Œuvres Ser. 2 III, 373–377.
Champely, S. (2020), Pwr: Basic functions for power analysis.
Clapham, C., and Nicholson, J. (2014), Oxford concise dictionary of mathematics, Oxford University Press.
Clayton, A. (2021), Bernoulli’s fallacy: Statistical illogic and the crisis of modern science, Columbia University Press. https://doi.org/10.7312/clay19994.
Cohen, J. (1988), Statistical power analysis for the behavioral sciences, Lawrence Erlbaum Associates.
Cohen, J. (1991), “A power primer,” Psychological Bulletin, 112, 155–159. https://doi.org/10.1037/0033-2909.112.1.155.
Cohen, J. (1994), “The earth is round (p < .05),” The American Psychologist, 49, 997–1003. https://doi.org/10.1037/0003-066X.49.12.997.
Comrey, A. L., and Lee, H. B. (1992), A first course in factor analysis, Erlbaum.
Coombes, K. R., Brock, G., Abrams, Z. B., and Abruzzo, L. V. (2019), “Polychrome: Creating and assessing qualitative palettes with many colors,” Journal of Statistical Software, Code Snippets, 90, 1–23. https://doi.org/10.18637/jss.v090.c01.
Danaei, Z., Madadizadeh, F., Sheikhshoaei, F., and Dehdarirad, H. (2024), “Translation and cross-cultural adaptation of the Persian version of the Evidence-Based Medicine Questionnaire (EBMQ) in postgraduate medical students in Iran,” PLoS One, 19, e0301831. https://doi.org/10.1371/journal.pone.0301831.
de Finetti, B. (1974), Theory of probability, volume 1, Wiley.
Descartes, R. (1637), La géométrie.
Downes, M., Gurrin, L. C., English, D. R., Pirkis, J., Currier, D., Spittal, M. J., and Carlin, J. B. (2018), “Multilevel regression and poststratification: A modeling approach to estimating population quantities from highly selected survey samples,” American Journal of Epidemiology, 187, 1780–1790. https://doi.org/10.1093/aje/kwy070.
Dziuban, C. D., and Shirkey, E. C. (1974), “When is a correlation matrix appropriate for factor analysis? Some decision rules,” Psychological Bulletin, 81, 358–361. https://doi.org/10.1037/h0036316.
Eddelbuettel, D. (2013), Seamless R and C++ integration with Rcpp, New York: Springer. https://doi.org/10.1007/978-1-4614-6868-4.
Eddelbuettel, D. (2017), Random: True random numbers using RANDOM.ORG.
Eddelbuettel, D., and Balamuta, J. J. (2018), “Extending R with C++: A Brief Introduction to Rcpp,” The American Statistician, 72, 28–36. https://doi.org/10.1080/00031305.2017.1375990.
Eddelbuettel, D., and François, R. (2011), “Rcpp: Seamless R and C++ integration,” Journal of Statistical Software, 40, 1–18. https://doi.org/10.18637/jss.v040.i08.
Eddelbuettel, D., François, R., Allaire, J. J., Ushey, K., Kou, Q., Russell, N., Ucar, I., Bates, D., and Chambers, J. (2025), Rcpp: Seamless R and C++ integration.
Efron, B. (1979), “Bootstrap methods: Another look at the jackknife,” The Annals of Statistics, 7, 1–26. https://doi.org/10.1214/aos/1176344552.
Efron, B. (1987), “Better bootstrap confidence intervals,” Journal of the American Statistical Association, 82, 171–185. https://doi.org/10.1080/01621459.1987.10478410.
Efron, B. (2003), “Second thoughts on the bootstrap,” Statistical Science, 18, 135–140. https://doi.org/10.1214/ss/1063994968.
Efron, B., and Tibshirani, R. (1994), An introduction to the bootstrap, Chapman & Hall.
Euler, L. (1755), Institutiones calculi differentialis, Academia Imperialis Scientiarum.
Field, A. (2018), Discovering statistics using IBM SPSS statistics, SAGE.
Funder, D. C., and Ozer, D. J. (2019), “Evaluating effect size in psychological research: Sense and nonsense,” Advances in Methods and Practices in Psychological Science, 2, 155–168. https://doi.org/10.1177/2515245919847202.
Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., and Gelman, A. (2019), “Visualization in Bayesian workflow,” Journal of the Royal Statistical Society, Series A (Statistics in Society), 182, 389–402. https://doi.org/10.1111/rssa.12378.
Gauss, C. F. (2012[1809]), Theoria motus corporum coelestium in sectionibus conicis solem ambientium, Cambridge University Press.
Gelman, A. (2022), “The development of Bayesian statistics,” Journal of the Indian Institute of Science, 102, 1131–1134. https://doi.org/10.1007/s41745-022-00307-y.
Gelman, A. (2023), “What is a standard error?” Journal of Econometrics, 237, 105516. https://doi.org/10.1016/j.jeconom.2023.105516.
Gelman, A., and Carlin, J. B. (2014), “Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors,” Perspectives on Psychological Science, 9, 641–651. https://doi.org/10.1177/1745691614551642.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014), Bayesian data analysis, Chapman & Hall/CRC Press.
Gelman, A., Goodrich, B., Gabry, J., and Vehtari, A. (2019), “R-squared for Bayesian regression models,” The American Statistician, 73, 307–309. https://doi.org/10.1080/00031305.2018.1549100.
Gelman, A., Hill, J., and Vehtari, A. (2021), Regression and other stories, Cambridge University Press. https://doi.org/10.1017/9781139161879.
Gelman, A., Meng, X.-L., and Stern, H. (1996), “Posterior predictive assessment of model fitness via realized discrepancies,” Statistica Sinica, 6, 733–760.
Gelman, A., and Rubin, D. B. (1992), “Inference from iterative simulation using multiple sequences,” Statistical Science, 7, 457–472. https://doi.org/10.1214/ss/1177011136.
Gelman, A., and Stern, H. (2006), “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant,” The American Statistician, 60, 328–331. https://doi.org/10.1198/000313006X152649.
Ghadiri, M., Tavakoli, M., and Ketabi, S. (2015), “Development of a Criticality-Oriented English Teaching Perceptions Inventory (CETPI) and exploring its internal consistency and underlying factor structure,” Current Psychology (New Brunswick, N.J.), 34, 268–281. https://doi.org/10.1007/s12144-014-9256-z.
Gigerenzer, G. (2004), “Mindless statistics,” The Journal of Socio-Economics, 33, 587–606. https://doi.org/10.1016/j.socec.2004.09.033.
Gigerenzer, G. (2018), “Statistical rituals: The replication delusion and how we got there,” Advances in Methods and Practices in Psychological Science, 1, 198–218. https://doi.org/10.1177/2515245918771329.
Gignac, G. E., and Szodorai, E. T. (2016), “Effect size guidelines for individual differences researchers,” Personality and Individual Differences, 102, 74–78. https://doi.org/10.1016/j.paid.2016.06.069.
Gosset, W. S. (1908), “The probable error of a mean,” Biometrika, 6, 1–25. https://doi.org/10.2307/2331554.
Grolemund, G., and Wickham, H. (2011), “Dates and times made easy with lubridate,” Journal of Statistical Software, 40, 1–25.
Gruber, J. (2014), The markdown file extension, The Daring Fireball Company, LLC.
Gumbel, E. J. (1933), “Spurious correlation and its significance to physiology,” Journal of the American Statistical Association, 21, 179–194. https://doi.org/10.1080/01621459.1926.10502169.
Harman, H. H. (1967), Modern factor analysis, University of Chicago Press.
Harrell Jr, F. E. (2025), Hmisc: Harrell miscellaneous.
Hastie, T. J., Tibshirani, R. J., and Friedman, J. (2017), The elements of statistical learning: Data mining, inference, and prediction, Springer.
Hays, W. (1963), Statistics for psychologists, Holt, Rinehart & Winston.
Helmert, F. R. (1875), “Über die Berechnung des wahrscheinlichen Fehlers aus einer endlichen Anzahl wahrer Beobachtungsfehler,” Zeitschrift für Angewandte Mathematik und Physik, 20, 300–303.
Helmert, F. R. (1876a), “Über die Wahrscheinlichkeit der Potenzsummen der Beobachtungsfehler und über einige damit in Zusammenhang stehende Fragen,” Zeitschrift für Angewandte Mathematik und Physik, 21, 192–218.
Helmert, F. R. (1876b), “Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen Beobachtungsfehlers directer Beobachtungen gleicher Genauigkeit,” Astronomische Nachrichten, 88, 113–132.
Hoffman, M. D., and Gelman, A. (2014), “The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo,” Journal of Machine Learning Research, 15, 1593–1623. https://doi.org/10.48550/arxiv.1111.4246.
Howell, D. C. (2012), Statistical methods for psychology, Wadsworth.
IBM (2025), IBM SPSS statistics: KMO and bartlett’s test, IBM.
Imai, K. (2017), Quantitative social science: An introduction, Princeton University Press.
Jeffreys, H. (1946), “An invariant form for the prior probability in estimation problems,” Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 186, 453–461. https://doi.org/10.1098/rspa.1946.0056.
Jensen, J. L. W. V. (1906), “Sur les fonctions convexes et les inégalités entre les valeurs moyennes,” Acta Mathematica, 30, 175–193. https://doi.org/10.1007/BF02418571.
Juárez, M. A., and Steel, M. F. J. (2010), “Model-based clustering of non-Gaussian panel data based on skew-t distributions,” Journal of Business & Economic Statistics, 28, 52–66. https://doi.org/10.1198/jbes.2009.07145.
Jung, L. (2024), Scrutiny: Error detection in science.
Kaiser, H. F. (1970), “A second generation little jiffy,” Psychometrika, 35, 401–415. https://doi.org/10.1007/BF02291817.
Kaiser, H. F. (1974), “An index of factor simplicity,” Psychometrika, 39, 31–36. https://doi.org/10.1007/BF02291575.
Kaiser, H. F. (1981), “A revised measure of sampling adequacy for factor-analytic data matrices,” Educational and Psychological Measurement, 41, 379–381. https://doi.org/10.1177/001316448104100216.
Kaiser, H. F., and Rice, J. (1974), “Little jiffy, mark IV,” Educational and Psychological Measurement, 34, 111–117. https://doi.org/10.1177/001316447403400115.
Kass, R. E., Carlin, B. P., Gelman, A., and Neal, R. M. (1998), “Markov chain Monte Carlo in practice: A roundtable discussion,” The American Statistician, 52, 93–100. https://doi.org/10.1080/00031305.1998.10480547.
Kay, M. (2024a), “ggdist: Visualizations of distributions and uncertainty in the grammar of graphics,” IEEE Transactions on Visualization and Computer Graphics, 30, 414–424. https://doi.org/10.1109/TVCG.2023.3327195.
Kay, M. (2024b), ggdist: Visualizations of distributions and uncertainty. https://doi.org/10.5281/zenodo.3879620.
Kay, M. (2024c), tidybayes: Tidy data and geoms for Bayesian models. https://doi.org/10.5281/zenodo.1308151.
Kelley, T. L. (1935), “An unbiased correlation ratio measure,” Proceedings of the National Academy of Sciences, 21, 554–559. https://doi.org/10.1073/pnas.21.9.554.
Kendall, M. G. (1938), “A new measure of rank correlation,” Biometrika, 30, 81–89. https://doi.org/10.1093/biomet/30.1-2.81.
King, G. (1998), Unifying political methodology - the likelihood theory of statistical inference, The University Of Michigan Press.
Kruschke, J. K. (2014), Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan, Academic Press.
Lakens, D. (2017), “Equivalence tests: A practical primer for t tests, correlations, and meta-analyses,” Social Psychological & Personality Science, 8, 355–363. https://doi.org/10.1177/1948550617697177.
Lamport, L. (1986), LaTeX: A document preparation system, Addison-Wesley Pub. Co.
Laplace, P. S. (2009[1814]), Essai philosophique sur les probabilités, Cambridge: Cambridge University Press.
Lee, M. D., and Webb, M. R. (2005), “Modeling individual differences in cognition,” Psychonomic Bulletin & Review, 12, 605–621. https://doi.org/10.3758/BF03196751.
Legendre, A. M. (1805), Nouvelles méthodes pour la détermination des orbites des comètes, F. Didot.
Lenz, G. S., and Sahn, A. (2021), “Achieving statistical significance with covariates and without transparency,” Political Analysis, 29, 356–369. https://doi.org/10.1017/pan.2020.31.
Lesur, R. (2025), Klippy: Copy to clipboard buttons for R Markdown HTML documents.
Levy, R., and Mislevy, R. J. (2020), Bayesian psychometric modeling, Chapman & Hall/CRC.
Lewandowski, D., Kurowicka, D., and Joe, H. (2009), “Generating random correlation matrices based on vines and extended onion method,” Journal of Multivariate Analysis, 100, 1989–2001. https://doi.org/10.1016/j.jmva.2009.04.008.
Li, R., and Nadarajah, S. (2020), “A review of Student’s t distribution and its generalizations,” Empirical Economics, 58, 1461–1490. https://doi.org/10.1007/s00181-018-1570-0.
Lindley, D. V. (1972), Bayesian statistics, a review, SIAM.
Lovakov, A., and Agadullina, E. R. (2021), “Empirically derived guidelines for effect size interpretation in social psychology,” European Journal of Social Psychology, 51, 485–504. https://doi.org/10.1002/ejsp.2752.
Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., and Makowski, D. (2021), “performance: An R package for assessment, comparison and testing of statistical models,” Journal of Open Source Software, 6, 3139. https://doi.org/10.21105/joss.03139.
Lüroth, J. (1876), “Vergleichung von zwei werten des wahrscheinlichen fehlers,” Astronomische Nachrichten, 87, 209–220. https://doi.org/10.1002/asna.18760871402.
Makowski, D., Ben-Shachar, M. S., Chen, S. H. A., and Lüdecke, D. (2019a), “Indices of effect existence and significance in the Bayesian framework,” Frontiers in Psychology, 10, 2767. https://doi.org/10.3389/fpsyg.2019.02767.
Makowski, D., Ben-Shachar, M. S., and Lüdecke, D. (2019b), “bayestestR: Describing effects and their uncertainty, existence and significance within the Bayesian framework,” Journal of Open Source Software, 4, 1541. https://doi.org/10.21105/joss.01541.
McElreath, R. (2019), Statistical rethinking: A Bayesian course with examples in R, Chapman & Hall/CRC. https://doi.org/10.1201/9780429029608.
Meschiari, S. (2022), latex2exp: Use LaTeX expressions in plots.
Microsoft, and Weston, S. (2022a), doParallel: Foreach parallel adaptor for the ’parallel’ package.
Microsoft, and Weston, S. (2022b), Foreach: Provides foreach looping construct.
Moran, P. A. P. (1948), “Rank correlation and product-moment correlation,” Biometrika, 35, 203–206. https://doi.org/10.2307/2332641.
Mordkoff, J. T. (2019), “A simple method for removing bias from a popular measure of standardized effect size: Adjusted partial eta squared,” Advances in Methods and Practices in Psychological Science, 2, 228–232. https://doi.org/10.1177/2515245919855053.
Müller, K., and Wickham, H. (2023), Tibble: Simple data frames.
Nahin, P. (1998), An imaginary tale: The story of the square root of −1, Princeton University Press.
Neyman, J., and Pearson, E. S. (1933), “The testing of statistical hypotheses in relation to probabilities a priori,” Mathematical Proceedings of the Cambridge Philosophical Society, 29, 492–510. https://doi.org/10.1017/S030500410001152X.
O’Hagan, A. (1979), “On outlier rejection phenomena in Bayes inference,” Journal of the Royal Statistical Society. Series B (Methodological), 41, 358–367. https://doi.org/10.1111/j.2517-6161.1979.tb01090.x.
Park, C., Kim, H., and Wang, M. (2022), “Investigation of finite-sample properties of robust location and scale estimators,” Communications in Statistics - Simulation and Computation, 51, 2619–2645. https://doi.org/10.1080/03610918.2019.1699114.
Park, C., and Wang, M. (2020), “A study on the x-bar and s control charts with unequal sample sizes,” Mathematics, 8, 698. https://doi.org/10.3390/math8050698.
Park, C., and Wang, M. (2022), rQCC: Robust quality control chart.
Pasek, J., Tahk, A., Culter, G., and Schwemmle, M. (2021), Weights: Weighting and weighted statistics.
Patil, I., Makowski, D., Ben-Shachar, M. S., Wiernik, B. M., Bacher, E., and Lüdecke, D. (2022), datawizard: An R package for easy data preparation and statistical transformations,” Journal of Open Source Software, 7, 4684. https://doi.org/10.21105/joss.04684.
Pearson, K. (1895a), “Notes on regression and inheritance in the case of two parents,” Proceedings of the Royal Society of London, 58, 240–242. https://doi.org/10.1098/rspl.1895.0041.
Pearson, K. (1895b), “Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 186, 343–414. https://doi.org/10.1098/rsta.1895.0010.
Posit Team (2025), RStudio: Integrated development environment for R, Boston, MA: Posit Software, PBC.
R Core Team (2024), R: A language and environment for statistical computing, Vienna, Austria: R Foundation for Statistical Computing.
Revelle, W. (2025), Psych: Procedures for psychological, psychometric, and personality research, Evanston, Illinois: Northwestern University.
Revolution Analytics, and Weston, S. (2022), Iterators: Provides iterator construct.
Rinker, T. W., and Kurkiewicz, D. (2018), pacman: Package management for R, Buffalo, New York.
Robert, C., and Casella, G. (2013), Monte Carlo statistical methods, Springer Nature.
Rubinstein, R. Y., and Kroese, D. P. (2016), Simulation and the Monte Carlo method, Wiley.
Rudolff, C. (1525), Die Coss.
Sadowska, A., Nowak, M., and Czarkowska-Pączek, B. (2021), “Assessment of the reliability of the Polish language version of the FATCOD-b scale among nursing students,” Journal of Cancer Education, 36, 561–566. https://doi.org/10.1007/s13187-019-01665-5.
Sarfaraz, S., Surti, A., Ali, R., Rehman, R., Heboyan, A., and Ahmed, N. (2024), “Faculty application and perceived effectiveness of cognitive psychology principles in medical education. A mixed method study,” BMC Medical Education, 24, 911. https://doi.org/10.1186/s12909-024-05892-3.
Schäfer, J., and Strimmer, K. (2005), “A shrinkage approach to large-scale covariance estimation and implications for functional genomics,” Statistical Applications in Genetics and Molecular Biology, 4, 32. https://doi.org/10.2202/1544-6115.1175.
Schwarz, H. A. (1888), “Über ein die Flächen kleinsten Flächeninhalts betreffendes Problem der Variationsrechnung,” Acta Societatis Scientiarum Fennicae, XV, 318.
Sen, P. K., and Singer, J. M. (1993), Large sample methods in statistics: An introduction with applications, Chapman & Hall/CRC.
Shirkey, E. C., and Dziuban, C. D. (1976), “A note on some sampling characteristics of the measure of sampling adequacy (MSA),” Multivariate Behavioral Research, 11, 125–128. https://doi.org/10.1207/s15327906mbr1101_9.
Simpson, D., Rue, H., Riebler, A., Martins, T. G., and Sørbye, S. H. (2017), “Penalising model component complexity: A principled, practical approach to constructing priors,” Statistical Science, 32, 1–28. https://doi.org/10.1214/16-STS576.
Spearman, C. (1904), “The proof and measurement of association between two things,” The American Journal of Psychology, 15, 72–101. https://doi.org/10.2307/1412159.
Spearman, C. (1910), “Correlation calculated with faulty data,” British Journal of Psychology, 3, 271–295. https://doi.org/10.1111/j.2044-8295.1910.tb00206.x.
Stan Development Team (2024a), Stan reference manual.
Stan Development Team (2024b), “RStan: The R interface to Stan.”
Stan Development Team (2020), “StanHeaders: Headers for the R interface to Stan.”
Stewart, J. (2010), Calculus: Early transcendentals, Cengage Learning.
Stigler, S. M. (1989), “Francis Galton’s account of the invention of correlation,” Statistical Science, 4, 73–79. https://doi.org/10.1214/ss/1177012580.
Stock, J., and Watson, M. W. (2019), Introduction to econometrics, Pearson Education Limited.
Stone, M. (1974), “Cross-validatory choice and assessment of statistical predictions,” Journal of the Royal Statistical Society, Series B (Methodological), 36, 111–147. https://doi.org/10.1111/j.2517-6161.1974.tb00994.x.
Strahan, R. S. (1982), “Assessing magnitude of effect from rank-order correlation coefficients,” Educational and Psychological Measurement, 42, 763–765. https://doi.org/10.1177/001316448204200306.
The LaTeX Project (2025), The LaTeX Project, Available at https://www.latex-project.org/.
Ushey, K., Allaire, J. J., Wickham, H., and Ritchie, G. (2024), Rstudioapi: Safely access the RStudio API.
van Buuren, S., and Groothuis-Oudshoorn, K. (2011), “mice: Multivariate imputation by chained equations in R,” Journal of Statistical Software, 45, 1–67. https://doi.org/10.18637/jss.v045.i03.
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., and Bürkner, P.-C. (2021), “Rank-normalization, folding, and localization: An improved \(\hat{R}\) for assessing convergence of MCMC (with discussion),” Bayesian Analysis, 16, 667–718. https://doi.org/10.1214/20-BA1221.
Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., and Grasman, R. (2010), “Bayesian hypothesis testing for psychologists: A tutorial on the Savage-Dickey method,” Cognitive Psychology, 60, 158–189. https://doi.org/10.1016/j.cogpsych.2009.12.001.
Wickham, H. (2023a), Forcats: Tools for working with categorical variables (factors).
Wickham, H. (2023b), Stringr: Simple, consistent wrappers for common string operations.
Wickham, H. (2016), ggplot2: Elegant graphics for data analysis, Springer-Verlag New York.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K., and Yutani, H. (2019), “Welcome to the tidyverse,” Journal of Open Source Software, 4, 1686. https://doi.org/10.21105/joss.01686.
Wickham, H., François, R., Henry, L., Müller, K., and Vaughan, D. (2023), Dplyr: A grammar of data manipulation.
Wickham, H., and Henry, L. (2025), Purrr: Functional programming tools.
Wickham, H., Hester, J., and Bryan, J. (2024a), Readr: Read rectangular text data.
Wickham, H., Vaughan, D., and Girlich, M. (2024b), Tidyr: Tidy messy data.
Wooldridge, J. M. (2019), Introductory econometrics, Cengage.
Xie, Y. (2014), “Knitr: A comprehensive tool for reproducible research in R,” in Implementing reproducible computational research, eds. V. Stodden, F. Leisch, and R. D. Peng, Chapman & Hall/CRC.
Xie, Y. (2015), Dynamic documents with R and knitr, Boca Raton, Florida: Chapman & Hall/CRC.
Xie, Y. (2016), Bookdown: Authoring books and technical documents with R Markdown, Boca Raton, Florida: Chapman & Hall/CRC.
Xie, Y. (2024), Knitr: A general-purpose package for dynamic report generation in R.
Xie, Y. (2025), Bookdown: Authoring books and technical documents with R Markdown.
Xie, Y., Allaire, J. J., and Grolemund, G. (2018), R Markdown: The definitive guide, Boca Raton, Florida: Chapman & Hall/CRC.
Xie, Y., Dervieux, C., and Riederer, E. (2020), R Markdown cookbook, Boca Raton, Florida: Chapman & Hall/CRC.
Zabihi, R., Ketabi, S., Tavakoli, M., and Ghadiri, M. (2014), “Examining the internal consistency reliability and construct validity of the Authentic Happiness Inventory (AHI) among Iranian EFL learners,” Current Psychology (New Brunswick, N.J.), 33, 377–392. https://doi.org/10.1007/s12144-014-9217-6.
Zhu, H. (2024), kableExtra: Construct complex table with ’kable’ and pipe syntax.
Zhu, M., and Lu, A. Y. (2004), “The counter-intuitive non-informative prior for the Bernoulli family,” Journal of Statistics Education, 12. https://doi.org/10.1080/10691898.2004.11910734.
Zuur, A. F., Ieno, E. N., and Elphick, C. S. (2010), “A protocol for data exploration to avoid common statistical problems: Data exploration,” Methods in Ecology and Evolution, 1, 3–14. https://doi.org/10.1111/j.2041-210X.2009.00001.x.

7 Appendix

This appendix contains the specifications for the software used in this paper (Section 7.1); the \(\textsf{R}\) function implementing the Bayesian Kaiser-Meyer-Olkin (BKMO) index (Section 7.2); the formulas and \(\textsf{R}\) function implementations of the different versions of the Frequentist KMO index (Section 7.3); the results from the Monte Carlo simulations assessing the characteristics of the Frequentist KMO index discussed in section 2.3 (Section 7.4); and a results matrix containing the Frequentist inter-correlations, anti-image inter-correlations, and KMO indices of the simulated data used in section 2.4 (Section 7.5).

7.1 Software Specifications

This document was created on a Windows 11 x64 (build 26100) operating system (OS) on 2025-09-20 at 14:45 CEST (timezone: Europe/Copenhagen). The machine has 16 cores available, and the range of doubles is \(2.23\cdot10^{-308}\) to \(1.8\cdot10^{+308}\).

All code is written using the \(\textsf{R}\) programming language (v4.4.3 [2025-02-28 ucrt], R Core Team 2024), while the document is written with markdown (Gruber 2014) and LaTeX (Lamport 1986; The LaTeX Project 2025) in the RStudio IDE (2025.5.1.513, Posit Team 2025), using the \(\textsf{R}\) packages bayesplot (v1.13.0, Gabry et al. 2019), bayestestR (v0.16.1, Makowski et al. 2019b), bookdown (v0.43, Xie 2016, 2025), brms (v2.22.0, Bürkner 2017, 2018), datawizard (v1.1.0, Patil et al. 2022), doParallel (v1.0.17, Microsoft and Weston 2022a), dplyr (v1.1.4, Wickham et al. 2023), effectsize (v1.0.1, Ben-Shachar et al. 2020), forcats (v1.0.0, Wickham 2023a), foreach (v1.5.2, Microsoft and Weston 2022b), ggplot2 (v3.5.2, Wickham 2016), Hmisc (v5.2.3, Harrell Jr 2025), iterators (v1.0.14, Revolution Analytics and Weston 2022), kableExtra (v1.4.0, Zhu 2024), klippy (v0.0.0.9500, Lesur 2025), knitr (v1.50, Xie 2014, 2015, 2024), latex2exp (v0.9.6, Meschiari 2022), lubridate (v1.9.4, Grolemund and Wickham 2011), pacman (v0.5.1, Rinker and Kurkiewicz 2018), parallel (v4.4.3, R Core Team 2024), performance (v0.15.0, Lüdecke et al. 2021), Polychrome (v1.5.4, Coombes et al. 2019), psych (v2.5.6, Revelle 2025), purrr (v1.1.0, Wickham and Henry 2025), pwr (v1.3.0, Champely 2020), Rcpp (v1.1.0, Eddelbuettel 2013; Eddelbuettel et al. 2025; Eddelbuettel and Balamuta 2018; Eddelbuettel and François 2011), readr (v2.1.5, Wickham et al. 2024a), rmarkdown (v2.29, Allaire et al. 2024; Xie et al. 2018, 2020), rQCC (v2.22.12, Park et al. 2022; Park and Wang 2020, 2022), rstan (v2.32.7, Stan Development Team 2024b), rstudioapi (v0.17.1, Ushey et al. 2024), scrutiny (v0.5.0, Jung 2024), StanHeaders (v2.32.10, Stan Development Team 2020), stringr (v1.5.1, Wickham 2023b), tibble (v3.3.0, Müller and Wickham 2023), tidybayes (v3.0.7, Kay 2024c), tidyr (v1.3.1, Wickham et al. 2024b), tidyverse (v2.0.0, Wickham et al. 2019), trialr (v0.1.6, Brock 2023), and weights (v1.1.2, Pasek et al. 2021). The seed used for generating random numbers (e.g., for MCMC) was exogenously predetermined using the random \(\textsf{R}\) package (v0.2.6, Eddelbuettel 2017).

7.2 Function

A simple implementation of a function for the Bayesian Kaiser-Meyer-Olkin (BKMO) index in \(\textsf{R}\) (v4.4.3 [2025-02-28 ucrt], R Core Team 2024), which computes the posterior ‘sampling adequacy’ from the posterior inter-correlations estimated by the brms \(\textsf{R}\) package (v2.22.0, Bürkner 2017, 2018), is provided in the code below:

# Compute Bayesian KMO index from a Bayesian multivariate linear model
#
# @param r A two-dimensional matrix, where rows are posterior draws and columns
# are the unique inter-correlations (one per pair of manifestations) estimated
# by a Bayesian multivariate linear model
# @return A data frame of Bayesian KMO indices, where rows are posterior draws
# (with the number of rows equal to the number of posterior draws) and columns
# are the individual BKMO indices plus one overall BKMO index (with the number
# of columns equal to the number of theorized manifestations plus one).

BKMO <- function(r = NULL){
  if(!is.matrix(r)) r <- as.matrix(r)
  
  # Compute p (i.e., number of theorized manifestations)
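  # ncol(r) = p*(p-1)/2 unique pairs, so p solves the quadratic
  # p^2 - p - 2*ncol(r) = 0, i.e., p = (1 + sqrt(1 + 8*ncol(r))) / 2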
  disc <- sqrt(1 + 8 * ncol(r))
  p <- (1 + disc) / 2
  
  # Define utility function applied to each posterior draw
  utility_KMO <- function(r, p){
    # Construct correlation matrix 
    R <- diag(1, p)
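    # Fill the upper triangle column-by-column; note that this assumes the
    # columns of r follow brms' rescor ordering (m1-m2, m1-m3, m2-m3, m1-m4, ...)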
    R[upper.tri(R)] <- r
    R[lower.tri(R)] <- t(R)[lower.tri(R)]
    row.names(R) <- paste0('m', 1:p)
    colnames(R) <- paste0('m', 1:p)
    
    # Compute inverse R
    R_inv <- solve(R)
    
    # Compute anti-image R
    R_q <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
    
    # Compute KMO
    r2 <- R^2
    q2 <- R_q^2
    diag(r2) <- 0
    diag(q2) <- 0
    kmo <- c(
      rowSums(r2) / (rowSums(r2) + rowSums(q2)),
      overall = sum(r2) / (sum(r2) + sum(q2))
    )
    return(kmo)
  }
  
  # Compute the KMO indices for each posterior draw
  bkmo <- do.call(rbind, lapply(1:nrow(r), function(i) utility_KMO(
        r = r[i, , drop = FALSE], p = p
      )
    )
  )
  return(as.data.frame(bkmo))
}
## Example

# Simulate data
library("tidyverse")
sample_size <- 1000
n_manifestations <- 10
sesoi <- .3
loadings <- runif(n_manifestations, sesoi, .7)
df <- data.frame(
  latent = rnorm(sample_size)
)
for(i in 1:n_manifestations){
  df[,paste0("m", i)] <- loadings[i]*df$latent + rnorm(sample_size)
}
df <- df %>% select(-latent)

# Define maximum-likelihood functions
sd2 <- function(x = NULL, na.rm = FALSE){
  stopifnot(is.numeric(x))
  stopifnot(is.logical(na.rm))

  x_mean <- mean(x, na.rm = na.rm)
  x_var <- mean((x - x_mean)^2, na.rm = na.rm)
  x_sd <- sqrt(x_var)
  return(x_sd)
}
standardize2 <- function(x = NULL, na.rm = FALSE){
  stopifnot(is.numeric(x))
  stopifnot(is.logical(na.rm))
  
  x_mean <- mean(x, na.rm = na.rm)
  x_var <- mean((x - x_mean)^2, na.rm = na.rm)
  x_sd <- sqrt(x_var)

  x_standardized <- (x - x_mean) / x_sd
  return(x_standardized)
}

# Standardize data matrix
df <- apply(df, 2, standardize2) %>% as.data.frame()

# Specify MCMC
library("parallel")
posterior_samples <- 4000
chains <- cores <- parallel::detectCores()-1
warmup <- 1000
iter <- ceiling((posterior_samples / chains) + warmup)
posterior_samples <- (iter - warmup)*chains

# Specify Bayesian Generalized Linear Model (BGLM)
library("brms")
bglm_model <- brms::bf(
  mvbind(m1, m2, m3, m4, m5, m6, m7, m8, m9, m10) ~ 0,
  sigma ~ 0,
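  # sigma ~ 0 fixes log(sigma) at 0 (brms models sigma on the log scale when
  # given a formula), i.e., sigma = 1, matching the standardized data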
  family = gaussian(
    link = "identity"
  )
) + set_rescor(TRUE)


# Specify 'weakly-informative' model priors
bglm_priors <- brms::get_prior(bglm_model, data = df)
bglm_priors[1,] <- brms::set_prior(
  "lkj(2)",
  class = "rescor"
)

# Fit Bayesian multivariate linear model with brms
bglm_fit <- brms::brm(
  formula = bglm_model,
  family = gaussian(
    link = "identity"
  ),
  prior = bglm_priors,
  data = df, 
  chains = chains,
  cores = cores,
  iter = iter,
  warmup = warmup
)

# Extract posterior draws of the (residual) correlations from the fitted Bayesian model
library("stringr")
bglm_samples <- brms::as_draws_df(bglm_fit)
bglm_correlations <- bglm_samples[, stringr::str_detect(colnames(bglm_samples), "rescor__")]

# Compute Bayesian KMO index
library("bayestestR")
kmo_bayes <- BKMO(r = bglm_correlations)
hist(kmo_bayes$overall)
print(
  paste0(
    "Mean = ", mean(kmo_bayes$overall) %>% round(digits = 3),
    " (SD = ", sd2(kmo_bayes$overall)  %>% round(digits = 3),
    "; 95% HDI[",
    bayestestR::hdi(kmo_bayes$overall)$CI_low  %>% round(digits = 3), "; ",
    bayestestR::hdi(kmo_bayes$overall)$CI_high  %>% round(digits = 3), "])"
  )
)
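
Because the BKMO index yields a full posterior distribution, coherent probabilistic statements follow directly from the draws. A minimal sketch, reusing the kmo_bayes object from above and assuming, for illustration, the .5 threshold below which Kaiser and Rice (1974) deem sampling adequacy ‘unacceptable’:

# Posterior probability that the overall sampling adequacy exceeds a threshold
threshold <- .5
prob_adequate <- mean(kmo_bayes$overall > threshold)
print(paste0("Pr(BKMO > ", threshold, ") = ", round(prob_adequate, digits = 3)))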

Generalizing the code for more sophisticated scenarios is straightforward but beyond the scope of this paper. Researchers interested in robust versions of the BKMO can simply rank-standardize their variables before estimating the posterior correlation matrix, as sketched below. Alternatively, they can change the multivariate likelihood from a Normal to a Student t by manually editing the Stan code produced by brms (Bürkner 2017, 2018), since a multivariate Student t likelihood is currently unsupported by that package.
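
A minimal sketch of the rank-standardization approach, reusing the df data frame and the standardize2 function from the example above (the object name df_robust is illustrative):

# Rank-standardize each manifestation, so that the estimated posterior
# inter-correlations become robust (Spearman-type) correlations
df_robust <- apply(df, 2, function(x) standardize2(rank(x))) %>% as.data.frame()
# Refit the model above with data = df_robust before calling BKMO()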

7.3 Measures of Sampling Adequacy

The formulas underlying the Kaiser-Meyer-Olkin (KMO) indices discussed in section 2.3, more formally denoted Measures of Sampling Adequacy (MSA), are provided here. These formulas are based on the notation established in this paper, using the Frequentist conceptualizations (see Section 2.2). This section also includes code implementing each of these MSAs as functions in \(\textsf{R}\). Section 7.3.1 covers the ‘Mark II’ version (i.e., Kaiser 1970), section 7.3.2 covers ‘Mark IV’ (i.e., Kaiser and Rice 1974), and section 7.3.3 covers ‘Mark V’ (i.e., Kaiser 1981).

7.3.1 Kaiser (1970)

The initial MSA (i.e., \(\text{MSA}_1\)), also known as the ‘second generation Little Jiffy’, was proposed by Kaiser (1970). The formula for calculating the overall MSA is provided in definition 7.4.1.1:

\[ \text{MSA}_1 \equiv 1 - \frac{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{q[i,j]}}{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j}} \tag{7.4.1.1} \]

where \(\Sigma \Sigma\) denotes double summation over all off-diagonal pairs \(i \neq j\), \(\hat{\rho}_{q}\) denotes the estimated anti-image product-moment correlation coefficient, and \(\hat{\rho}\) denotes the estimated Pearson product-moment correlation coefficient. The formula for calculating the individual MSAs is provided in definition 7.4.1.2:

\[ \text{MSA}_1(i) \equiv 1 - \frac{ \sum_{j, j \neq i} \hat{\rho}^2_{q[i,j]} }{ \sum_{j, j \neq i} \hat{\rho}^2_{i,j} } \tag{7.4.1.2} \]

where \(\Sigma\) denotes the summation operator, \(\hat{\rho}_{q}\) denotes the estimated anti-image product-moment correlation coefficient, and \(\hat{\rho}\) denotes the estimated Pearson product-moment correlation coefficient. Implementing \(\text{MSA}_1\) as a function in \(\textsf{R}\) can be done with:

# MSA #1 ('Mark II', Kaiser, 1970)
#
# @param R A correlation matrix
# @return A list containing the overall KMO index and a named vector of
# individual KMO indices

msa1 <- function(R = NULL){
  if (!is.matrix(R)) stop("Input must be a matrix.")
  if (!isSymmetric(unname(R))) stop("Matrix must be symmetric.")
  inv_R <- solve(R)
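  # Anti-image correlations: the negated inverse correlation matrix, rescaled
  # by the square roots of its diagonal elements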
  P <- -inv_R / sqrt(outer(diag(inv_R), diag(inv_R)))
  r2 <- R^2
  p2 <- P^2
  diag(r2) <- 0
  diag(p2) <- 0
  kmo_i <- 1 - (rowSums(p2) / rowSums(r2))
  kmo_overall <- 1 - (sum(p2) / sum(r2))
  return(list(overall = kmo_overall, individual = kmo_i))
}

7.3.2 Kaiser and Rice (1974)

The ‘second’ MSA (i.e., \(\text{MSA}_2\)), also known as ‘Little Jiffy, Mark IV’, was proposed by Kaiser and Rice (1974). The formula for calculating the overall MSA is provided in definition 7.4.2.1:

\[ \text{MSA}_2 \equiv \frac{ \underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j} }{ \underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j} + \underset{i \neq j}{\sum \sum} \hat{\rho}^2_{q[i,j]} } \tag{7.4.2.1} \]

where \(\Sigma \Sigma\) denotes double summation over all off-diagonal pairs \(i \neq j\), \(\hat{\rho}_{q}\) denotes the estimated anti-image product-moment correlation coefficient, and \(\hat{\rho}\) denotes the estimated Pearson product-moment correlation coefficient. The formula for calculating the individual MSAs is provided in definition 7.4.2.2:

\[ \text{MSA}_2(i) \equiv \frac{ \sum_{j, j \neq i} \hat{\rho}^2_{i,j} }{ \sum_{j, j \neq i} \hat{\rho}^2_{i,j} + \sum_{j, j \neq i} \hat{\rho}^2_{q[i,j]} } \tag{7.4.2.2} \]

where \(\Sigma\) denotes the summation operator, \(\hat{\rho}_{q}\) denotes the estimated anti-image product-moment correlation coefficient, and \(\hat{\rho}\) denotes the estimated Pearson product-moment correlation coefficient. A function implementing \(\text{MSA}_2\) in \(\textsf{R}\) is provided below:

# MSA #2 ('Mark IV', Kaiser & Rice, 1974)
#
# @param R A correlation matrix
# @return A list containing the overall KMO index and a named vector of
# individual KMO indices

msa2 <- function(R = NULL){
  if (!is.matrix(R)) stop("Input must be a matrix.")
  if (!isSymmetric(unname(R))) stop("Matrix must be symmetric.")
  inv_R <- solve(R)
  P <- -inv_R / sqrt(outer(diag(inv_R), diag(inv_R)))
  r2 <- R^2
  p2 <- P^2
  diag(r2) <- 0
  diag(p2) <- 0
  kmo_i <- rowSums(r2) / (rowSums(r2) + rowSums(p2))
  kmo_overall <- sum(r2) / (sum(r2) + sum(p2))
  return(list(overall = kmo_overall, individual = kmo_i))
}

7.3.3 Kaiser (1981)

The ‘third’ MSA (i.e., \(\text{MSA}_3\)), which can also be considered ‘Mark V’ of ‘Little Jiffy’, was proposed by Kaiser (1981) and is simply the square root of \(\text{MSA}_1\). The formula for calculating the overall MSA is provided in definition 7.4.3.1:

\[ \text{MSA}_3 \equiv \sqrt{ 1 - \frac{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{q[i,j]}}{\underset{i \neq j}{\sum \sum} \hat{\rho}^2_{i,j}}} \tag{7.4.3.1} \]

where \(\Sigma \Sigma\) denotes double summation over all off-diagonal pairs \(i \neq j\), the radical (i.e., \(\sqrt{}\)) denotes the square-root operation, \(\hat{\rho}_{q}\) denotes the estimated anti-image product-moment correlation coefficient, and \(\hat{\rho}\) denotes the estimated Pearson product-moment correlation coefficient. The formula for calculating the individual MSAs is provided in definition 7.4.3.2:

\[ \text{MSA}_3(i) \equiv \sqrt{ 1 - \frac{ \sum_{j, j \neq i} \hat{\rho}^2_{q[i,j]} }{ \sum_{j, j \neq i} \hat{\rho}^2_{i,j} }} \tag{7.4.3.2} \]

where \(\Sigma\) denotes the summation operator, the radical (i.e., \(\sqrt{}\)) denotes the square-root operation, \(\hat{\rho}_{q}\) denotes the estimated anti-image product-moment correlation coefficient, and \(\hat{\rho}\) denotes the estimated Pearson product-moment correlation coefficient. An \(\textsf{R}\) function implementing both definitions related to \(\text{MSA}_3\) is provided below:

# MSA #3 ('Mark V', Kaiser, 1981)
#
# @param R A correlation matrix
# @return A list containing the overall KMO index and a named vector of
# individual KMO indices

msa3 <- function(R = NULL){
  if (!is.matrix(R)) stop("Input must be a matrix.")
  if (!isSymmetric(unname(R))) stop("Matrix must be symmetric.")
  inv_R <- solve(R)
  P <- -inv_R / sqrt(outer(diag(inv_R), diag(inv_R)))
  r2 <- R^2
  p2 <- P^2
  diag(r2) <- 0
  diag(p2) <- 0
  kmo_i <- sqrt(1 - (rowSums(p2) / rowSums(r2)))
  kmo_overall <- sqrt(1 - (sum(p2) / sum(r2)))
  return(list(overall = kmo_overall, individual = kmo_i))
}
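
For illustration, the three functions can be applied to the same estimated correlation matrix; the following minimal sketch reuses the simulated df data frame from section 7.2:

# Compare the three Frequentist MSAs on the same correlation matrix
R_hat <- cor(df)
msa1(R_hat)$overall # 'Mark II' (Kaiser 1970)
msa2(R_hat)$overall # 'Mark IV' (Kaiser and Rice 1974)
msa3(R_hat)$overall # 'Mark V' (Kaiser 1981), i.e., sqrt(msa1(R_hat)$overall)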

7.4 Characteristics of the KMO index

The results from the Monte Carlo simulation study of the characteristics of the KMO index discussed in section 2.3 are provided here. Results from the first set of simulations investigating the relationship between the inter-correlations and the anti-image inter-correlations are illustrated in figure 7.1 in section 7.4.1. Results from the second set of simulations used to gauge the relationship between the overall KMO index and linearly independent inter-correlations are shown in figure 7.2 in section 7.4.2. Results from the third and last set of simulations that sought to assess the relationship between the overall KMO index and linearly interdependent inter-correlations are shown in figure 7.3 in section 7.4.3.

7.4.1 Association between Inter-Correlations & Anti-Image Inter-Correlations

Figure 7.1: Association between Inter-Correlations & Anti-Image Inter-Correlations


NOTE: Simulated data, with 5,000 of 40,005 simulations randomly sampled for display in each panel.

7.4.2 Independent Inter-Correlations

Figure 7.2: KMO index Characteristics with Independent Inter-Correlations


NOTE: Simulated data, with 5,000 of 40,005 simulations randomly sampled for display in each panel.

7.4.3 Interdependent Inter-Correlations

Figure 7.3: KMO index Characteristics with Codependent Inter-Correlations


NOTE: Simulated data, with 5,000 of 40,005 simulations randomly sampled for display in each panel.

7.5 Results Matrix

The matrix of results from the Monte Carlo simulated data covered in section 2.4 is provided here. The lower triangle of the matrix in table 7.1 showcases the estimated inter-correlations (i.e., \(\hat{\rho}\)), with the upper triangle containing the estimated anti-image inter-correlations (i.e., \(\hat{\rho}_q\)), while the diagonal displays the estimated individual KMO indices (i.e., \(\hat{\alpha}_m\)), one for each manifestation.
Table 7.1: Results Matrix
\(\theta_{1}\) \(\theta_{2}\) \(\theta_{3}\) \(\theta_{4}\) \(\theta_{5}\) \(\theta_{6}\) \(\theta_{7}\) \(\theta_{8}\) \(\theta_{9}\) \(\theta_{10}\)
\(\theta_{1}\) 0.844 0.016 0.066 0.125 0.053 -0.003 0.040 0.051 0.081 0.005
\(\theta_{2}\) 0.101 0.852 0.053 0.156 -0.011 0.072 0.040 0.036 0.081 0.075
\(\theta_{3}\) 0.130 0.139 0.869 0.046 0.010 0.076 0.063 0.078 0.048 0.079
\(\theta_{4}\) 0.222 0.277 0.186 0.820 0.044 0.131 0.108 0.125 0.177 0.131
\(\theta_{5}\) 0.115 0.080 0.094 0.176 0.836 0.073 0.111 0.090 0.014 0.107
\(\theta_{6}\) 0.101 0.187 0.174 0.288 0.169 0.857 0.108 0.054 0.078 0.103
\(\theta_{7}\) 0.141 0.163 0.166 0.280 0.200 0.239 0.851 0.070 0.147 0.054
\(\theta_{8}\) 0.136 0.142 0.164 0.260 0.172 0.179 0.195 0.864 0.048 0.064
\(\theta_{9}\) 0.179 0.210 0.167 0.342 0.138 0.232 0.281 0.189 0.840 0.121
\(\theta_{10}\) 0.113 0.194 0.180 0.296 0.196 0.239 0.210 0.192 0.264 0.851

NOTE: Estimated inter-correlations (\(\hat{\rho}\)) provided in the lower triangle, estimated anti-image inter-correlations (\(\hat{\rho}_q\)) provided in the upper triangle, and estimated individual KMO indices provided in the diagonal. Based on simulated data (\(n\) = 1,000) and computed using Frequentist estimators (see Section 2.2).


  1. License: Except where otherwise indicated, all contents of this document and/or associated files are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). All software, including but not necessarily limited to, source code, executable code, code snippets, code chunks, algorithms, and/or scripts, attributable to this document and/or its associated files are expressly excluded from the foregoing license, and unless otherwise indicated, are instead licensed under the GNU General Public License, version 3 (GPL-3.0). By engaging with this document and/or any associated files, which includes, but is not necessarily limited to, downloading, using, viewing, and/or distributing any of them, in part or in whole, you agree to comply with the applicable license terms for the respective content types.↩︎

  2. Author: Independent Researcher. Holds a Master of Science (MSc.) and Bachelor of Science (BSc.) in Political Science from Aarhus University, Denmark (email: ).↩︎

  3. Version: 2025-09-19-10-52-HTML. Citation: Meyer-Hansen, E. N. (2025): ‘Revisiting “Little Jiffy, Mark IV”: Towards a Bayesian KMO index’, Open Science Framework, Working paper (v2025-09-19-10-52-HTML). DOI: 10.17605/OSF.IO/T3UPD↩︎

  4. Former Professor of the University of California, Berkeley.↩︎

  5. Former Professor of Loyola University, Chicago. It should be noted that there is some uncertainty about Edward P. Meyer being the ‘Meyer’ of ‘Kaiser-Meyer-Olkin’, because Henry F. Kaiser never appears to mention this ‘Meyer’ by first name, only making vague mentions of contributions from a ‘Professor Meyer of Loyola, Chicago’ (Kaiser 1970: 405). At the same time, it seems that Kaiser did not himself coin the term ‘Kaiser-Meyer-Olkin’ as a name for the MSA, which instead appears to be an invention of Dziuban and Shirkey (1974), but these authors also make no explicit mention of a ‘Professor Meyer’. However, based on personal communication with staff of Loyola University, Chicago, this ‘Professor Meyer of Loyola, Chicago’ is likely to be Edward Paul Meyer, who was a Professor of Psychology at Loyola during the period (1969 - 1972) when the KMO index was invented.↩︎

  6. Former Professor of Stanford University.↩︎

  7. The term ‘Little Jiffy’ was coined by former Professor Chester W. Harris of the University of Wisconsin–Madison and was used by Henry F. Kaiser to denote the analytic procedure of ‘principal components with associated eigenvalues greater than one followed by normal varimax rotation’ (cf. Kaiser 1970: 402). Note that this ‘Little Jiffy, Mark I’ is not directly related to later versions of ‘Little Jiffy’ (e.g., Kaiser 1970).↩︎

  8. The first revision of the KMO index, ‘Little Jiffy, Mark III’, went unpublished, though its changes were incorporated into ‘Mark IV’ (cf. Kaiser and Rice 1974: 112).↩︎

  9. Emeritus Professor of the University of California, Berkeley. Former Professor of the University of California, San Diego.↩︎

  10. A final revision of the KMO index was made by Kaiser (1981), which is simply the square-root of the original KMO index by Kaiser (1970), but this revised version appears mostly ignored across statistical software packages and applied research. Accordingly, this paper focuses on ‘Little Jiffy, Mark IV’ (Kaiser and Rice 1974).↩︎

  11. For researchers interested in these mostly forgotten versions of the KMO index, their mathematical formulas, along with functions implementing them in \(\textsf{R}\) (R Core Team 2024), are provided in section 7.3 of the Appendix.↩︎

  12. For software specifications, see Section 7.1 of the Appendix.↩︎

  13. An alternative to the PPCC and SRCC is Kendall’s (1938) \(\tau\), which despite providing more conservative estimates of correlation (cf. Field 2018: 353; Howell 2012) is harder to meaningfully interpret, and comparability with the aforementioned coefficients is mostly lost (cf. Field 2018: 363; Strahan 1982; though see Moran 1948).↩︎

  14. Consistent with the Bayesian framework (Gelman et al. 2014; Levy and Mislevy 2020; McElreath 2019), the relatively stringent iid assumption in model 3.3.1 may be relaxed by replacing it with the weaker assumption of exchangeability (de Finetti 1974). This means that the units could be interdependent (e.g., by belonging to distinct clusters), but that the priors and model treat them as exchangeable given the available information (cf. Levy and Mislevy 2020: 56-64; McElreath 2019: 81).↩︎

  15. Consistent with the MLE-Bayesian approach, the standard deviation of the posterior is calculated using formula 2.2.3.3 for function 2.2.3 (see Section 2.2).↩︎

  16. In the Frequentist framework (Agresti 2018: 270; Field 2018: 59; Imai 2017: 168-169, 339-342; Stock and Watson 2019: 154; Wooldridge 2019: 94), \(\nu\) is referred to as a degrees of freedom parameter (df), which is the number of parameters that are free to vary, being the difference between the sample size (i.e., \(n\)) and the number of estimated parameters (i.e., \(p\)). In that approach, \(\nu\) must be at least 0 for the model to be identifiable and at least 1 to be able to assess uncertainty about the model. This is different from the Bayesian approach, where models technically can be identified at a df less than zero due to the reliance on priors.↩︎