Swap distance minimization in SOV languages. Cognitive and mathematical foundations.

Ramon Ferrer-i-Cancho1* (0000-0002-7820-923X), Savithry Namboodiripad2 (0000-0002-7685-5895)

1 Quantitative, Mathematical and Computational Linguistics Research Group. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya (UPC), Barcelona, Catalonia, Spain.
2 Linguistics Department, University of Michigan, Ann Arbor, Michigan, USA.
* Corresponding author’s email: rferrericancho@cs.upc.edu.

arXiv:2312.04219v1 [cs.CL] 7 Dec 2023

DOI:

ABSTRACT

Distance minimization is a general principle of language. A special case of this principle in the domain of word order is swap distance minimization. This principle predicts that variations from a canonical order that are reached by fewer swaps of adjacent constituents are less costly and thus more likely. Here we investigate the principle in the context of the triple formed by subject (S), object (O) and verb (V). We introduce the concept of word order rotation as a cognitive underpinning of that prediction. When the canonical order of a language is SOV, the principle predicts SOV < SVO, OSV < VSO, OVS < VOS, in order of increasing cognitive cost. We test the prediction in three flexible order SOV languages: Korean (Koreanic), Malayalam (Dravidian), and Sinhalese (Indo-European). Evidence of swap distance minimization is found in all three languages, but it is weaker in Sinhalese. Swap distance minimization is stronger than a preference for the canonical order in Korean and especially Malayalam.

Keywords: word order preferences, canonical order, swap distance minimization

1 Introduction

Distance minimization pervades languages. In the domain of word order, there is massive evidence that the distance between words in a syntactic dependency representation of the sentence is minimized (), a consequence of the syntactic dependency distance minimization principle (Ferrer-i-Cancho, 2004).
A general principle of distance minimization in word order, which instantiates as syntactic dependency distance minimization, has been proposed (Ferrer-i-Cancho, 2014). Furthermore, the action of distance minimization in languages goes beyond the common notion of physical distance. Iconicity – which has also been argued to shape word order (Motamedi et al., 2022) – can be viewed as a response to a pressure to minimize the distance between a linguistic form and meaning in production and interpretation ().

Glottometrics XX, 20XX 1 Ferrer-i-Cancho & Namboodiripad Swap distance minimization in SOV languages.

Alignment in dialog () is the minimization of the distance between two or more speakers involved in a conversation. Because it operates across domains, distance minimization is likely to be one of the most general principles of language. Distance minimization in word order (Ferrer-i-Cancho, 2014) presents itself as the syntactic dependency distance minimization principle (Ferrer-i-Cancho, 2004) and the swap distance minimization principle (Ferrer-i-Cancho, 2016).

A compact but general theory of language must specify (a) the cognitive origins of its principles, (b) the cross-linguistic support of its principles, and (c) the separation between principles and manifestations. Compactness is then achieved by uncovering the many distinct manifestations of the same principle (alone or interacting with other principles). Further, among the manifestations of a given principle, one has to distinguish direct from indirect manifestations.

1.1 Syntactic dependency distance minimization

Next we will review the principle of syntactic dependency distance minimization from the standpoint of (a), (b) and (c) as a road map for research on swap distance minimization.
Concerning (a), syntactic dependency distance minimization is argued to result from counteracting interference and decay of activation in linguistic processes () and, accordingly, syntactic dependency distance in sentences is positively correlated with reading times (Niu and Liu, 2022). Concerning (b), direct evidence of the principle of syntactic dependency distance minimization stems from the finding that syntactic dependency distances are smaller than expected by chance in samples of languages that have been growing in size and typological diversity (). Concerning (c), various manifestations of syntactic dependency distance minimization have been predicted. First, the acceptability of word orders and related word order preferences (). Second, formal properties of syntactic dependency structures such as the scarcity of crossing dependencies (Gómez-Rodríguez and Ferrer-i-Cancho, 2017) and the tendency to uncover the root (Ferrer-i-Cancho, 2008), thus predicting projectivity (continuous constituents) and planarity with high probability. Furthermore, syntactic dependency distance minimization predicts, in combination with projectivity, that the root of a sentence should be placed at the center (). An implication of these predictions is that verbs, which are typically the roots of a sentence, should be placed at the center, as in SVO orders or SVOI orders. For word orders in which the verb appears first or last, syntactic dependency distance minimization predicts consistent branching for dependents of nominal heads (Ferrer-i-Cancho, 2015b), demonstrating the “unnecessity” of the headedness parameter of principles & parameters theory ().1 The principle of swap distance minimization has received much less attention.

1 See Table 1 of Ferrer-i-Cancho and Gómez-Rodríguez (2021b) for further predictions.

Figure 1: The word order permutation ring. (Vertex labels in the figure: SVO, VSO, SOV, VOS, OSV, OVS.)
1.2 The order of S, V and O

Research on the order of S, V and O is biased towards SOV and SVO languages. SOV and SVO are the most attested dominant orders (76.5% according to Dryer (2013); 83.6% of languages and 69.6% of families according to Hammarström (2016)). Accordingly, a large body of experimental research in the silent gesture paradigm has focused on factors that determine the choice between SOV and SVO (see Motamedi et al. (2022) and references therein). That bias neglects that there are languages that lack a dominant order (13.7% of languages according to Dryer (2013); 2.3% of languages and 6.1% of families according to Hammarström (2016)) or that exhibit two, rather than one, dominant orders (Dryer, 2013). Crucially, in many languages which do exhibit a dominant order, the other five non-dominant orders are produced. Though understanding such variation is vital, documentation and analyses of non-dominant orders receive relatively little attention (Levshina et al., 2023). This is reflected in psycholinguistic work, where the bulk of experimental research on the processing cost of word order focuses on just two orders, e.g. SVO versus OVS () or SVO versus VOS (Koizumi and Kim, 2016).2 This challenge is the motivation of Namboodiripad’s research program on the cognitive cost of the six possible orders of S, V, and O in flexible order languages (). This is also why swap distance minimization is brought into play in this article.

2 Note that practical challenges contribute to this. Comparing all six orders in an experiment requires more participants and different statistical tools as compared to simpler experimental designs; cf. Ohta et al. (2017).

1.3 Swap distance minimization

Swap distance minimization predicts pairs of primary alternating dominant orders (Ferrer-i-Cancho, 2016) and has been applied to shed light on the evolution of the dominant orders of S, V, and O from an ancestral SOV order (). In general, the principle of swap distance minimization states that variations from a certain word order (canonical or not) that require fewer swaps of adjacent constituents are less costly (). To illustrate how the principle works on triples, let us consider the case of the triple formed by subject (S), object (O) and verb (V). The so-called word order permutation ring is a graph where the vertices are all the six possible orderings of the triple, and edges between two orders indicate that one order can be obtained from the other by swapping a pair of adjacent constituents (Figure 1). SOV and SVO are linked because swapping OV in SOV produces SVO, or equivalently, swapping VO in SVO produces SOV. For the case of triples, the permutation ring is an instance of a kind of graph which is called permutahedron in combinatorics (Ceballos et al., 2015). The swap distance between two orders is their distance (in edges) in the permutahedron, namely the minimum number of swaps of adjacent constituents that transforms one order into the other and vice versa.

A prediction of swap distance minimization is that the cognitive cost of a word order will depend on its distance to the canonical order. When the canonical order of a language is SOV, SOV is at swap distance 0, SVO and OSV are at swap distance 1, VSO and OVS are at swap distance 2, and VOS is at swap distance 3 (Figure 1). Thus, the principle predicts (from easiest to most costly) the sequence3

(1)  𝑆𝑂𝑉 < 𝑆𝑉𝑂, 𝑂𝑆𝑉 < 𝑉𝑆𝑂, 𝑂𝑉𝑆 < 𝑉𝑂𝑆.

For other canonical orders, the predictions that the permutahedron generates as a function of the canonical order are, in order of increasing processing cost (the canonical order appears first)

(2)  𝑆𝑉𝑂 < 𝑆𝑂𝑉, 𝑉𝑆𝑂 < 𝑉𝑂𝑆, 𝑂𝑆𝑉 < 𝑂𝑉𝑆
     𝑉𝑆𝑂 < 𝑆𝑉𝑂, 𝑉𝑂𝑆 < 𝑆𝑂𝑉, 𝑂𝑉𝑆 < 𝑂𝑆𝑉
     𝑉𝑂𝑆 < 𝑉𝑆𝑂, 𝑂𝑉𝑆 < 𝑆𝑉𝑂, 𝑂𝑆𝑉 < 𝑆𝑂𝑉
     𝑂𝑉𝑆 < 𝑉𝑂𝑆, 𝑂𝑆𝑉 < 𝑆𝑂𝑉, 𝑉𝑆𝑂 < 𝑆𝑉𝑂
     𝑂𝑆𝑉 < 𝑆𝑂𝑉, 𝑂𝑉𝑆 < 𝑆𝑉𝑂, 𝑉𝑂𝑆 < 𝑉𝑆𝑂.
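The swap distances and predicted sequences above can be checked mechanically. The following is a minimal sketch, not the authors' code: it relies on the standard fact that the minimum number of adjacent swaps between two orderings equals the number of pairs of constituents whose relative order differs. The function names (swap_distance, orders_by_distance, predicted_sequence) are hypothetical and introduced only for illustration.

```python
from itertools import permutations


def swap_distance(a: str, b: str) -> int:
    """Minimum number of swaps of adjacent constituents turning order a into b.

    Computed as the number of pairs of constituents that appear in a
    different relative order in a and in b (the inversion count).
    """
    position_in_b = {constituent: i for i, constituent in enumerate(b)}
    seq = [position_in_b[constituent] for constituent in a]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def orders_by_distance(canonical: str) -> dict:
    """Group the six orders by their swap distance to the canonical order."""
    groups = {}
    for perm in permutations(canonical):
        order = "".join(perm)
        groups.setdefault(swap_distance(order, canonical), []).append(order)
    return dict(sorted(groups.items()))


def predicted_sequence(canonical: str) -> str:
    """Render the prediction in the notation of Equation 1 (comma = same cost)."""
    return " < ".join(", ".join(g) for g in orders_by_distance(canonical).values())


if __name__ == "__main__":
    for canonical in ["SOV", "SVO", "VSO", "VOS", "OVS", "OSV"]:
        print(predicted_sequence(canonical))
```

Within each group of equal distance, the order in which the sketch lists the word orders is arbitrary and carries no meaning.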
It is well-known that canonical orders are easier to process than non-canonical orders and thus canonical orders are processed faster than non-canonical orders (). The principle of swap distance minimization subsumes a preference for the canonical order but, crucially, it introduces a gradation for non-canonical orders, namely not all non-canonical orders are equally easy to process. The gradation is determined by a precise definition of distance to the canonical order (Equation 1 and Equation 2). In contrast to Equation 1, a mere preference for the canonical word order is expressed simply as

(3)  𝑆𝑂𝑉 < 𝑆𝑉𝑂, 𝑂𝑆𝑉, 𝑉𝑆𝑂, 𝑂𝑉𝑆, 𝑉𝑂𝑆.

3 A sequence of this sort can be expressed with the following notation (Tamaoka et al., 2011): 𝑆𝑂𝑉 < 𝑆𝑉𝑂 = 𝑂𝑆𝑉 < 𝑉𝑆𝑂 = 𝑂𝑉𝑆 < 𝑉𝑂𝑆. In our notation, = is replaced by a comma.

1.4 The present article

Here we aim to contribute to research on swap distance minimization in the three directions above: (a), (b) and (c). We will increase the support for the principle both in terms of (a) and (b). As for (a), here we will introduce the concept of word order rotation as the analog of rotation in visual recognition experiments (). In addition, we aim to validate the arguments using proxies of cognitive cost that are commonly used in cognitive science research such as reaction times and error rates (). As for (b), we will investigate the principle in languages from distinct linguistic families and quantify its effect with respect to other word order principles. As for (c), we will show that swap distance minimization predicts the acceptability of the order of subject, verb and object as syntactic dependency distance minimization predicts the acceptability of sentences (). Put differently, we will show that swap distance minimization manifests in the form of acceptability preferences.
We select three SOV languages which exhibit considerable word order flexibility, each from a different language family: Sinhalese (Indo-European), Malayalam (Dravidian), and Korean (Koreanic). For each of these languages, all of the six possible orderings of S, V, and O are grammatical, attested, and have the same truth-conditional meaning (), though the degree of flexibility may vary depending on the context or measure of flexibility (). Sinhalese and Malayalam have been regarded as non-configurational (). Interestingly, Malayalam exhibits more word order flexibility than Korean while, in turn, the flexibility of Korean is closer to that of English (Figure 8 of Levshina et al. (2023)). In the context of Malayalam, the acceptability of a certain order has been argued to be determined by the position of the verb (Namboodiripad and Goodall, 2016). We will transform this specific proposal into a general competing hypothesis, namely that the cost of a certain order (no matter how it is measured) is determined to some degree by the position of the verb, and link it with the theory of word order: a decrease in the processing cost of the verb as it is placed closer to the end is actually a prediction of the principle of minimization of the surprisal (maximization of the predictability) of the head (Ferrer-i-Cancho, 2017).4 In contrast to Equation 1, a preference for verb-final orders would be expressed simply as

(4)  𝑆𝑂𝑉, 𝑂𝑆𝑉 < 𝑆𝑉𝑂, 𝑂𝑉𝑆 < 𝑉𝑆𝑂, 𝑉𝑂𝑆.

4 A word of caution is necessary concerning the term competing hypothesis. It does not mean that maximization of predictability excludes swap distance minimization. Both forces can co-exist, and it is tempting to think that swap distance minimization implies the maximization of the predictability of the head for certain canonical orders, e.g., SOV or OSV. Indeed, we will show that swap distance and the position of the head (the verb) are significantly correlated.
The remainder of the article is organized as follows. Section 2 introduces the concept of word order rotation and a new mathematical framework. Section 3 justifies the choice of SOV languages and presents the data while Section 4 presents the statistical analysis methods. Section 5 shows evidence of swap distance minimization as predicted by Equation 1 in these three languages and compares it against two competing principles: a preference for the canonical order and a preference for the verb towards the end. Section 6 provides a hawk-eye view of the results, speculates on their relation with the degree of word order flexibility of the languages, and proposes some issues for future research.

2 Theoretical foundations

2.1 Word order rotations

Here we present an argument on the cognitive support of the minimization of swap distance to the canonical order that is inspired by classic research on the cognitive effort of the visual recognition of objects (). That research revealed that such cost depends on the rotation angle with respect to some canonical representation of the object. By analogy, the object is the triple formed by subject, object, and verb; we assume that its canonical representation is the order that language experts have identified as canonical; the rotation angle is the swap distance to the canonical order. The analogy with visual rotation can be made stronger by drawing the word order permutation ring on a circle as in Figure 1, placing a rotation axis at the center of the circle, and replacing the swap distance to the canonical order by the absolute value of the minimum angle of the rotation that is needed to put

• The word order of interest in the original position of the canonical order, or equivalently,
• The canonical order in the original position of the word order of interest.
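The minimum rotation angle described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the six orders are assumed to be equally spaced on the circle (60 degrees apart), the list RING is an assumed clockwise layout in which neighbours differ by one swap of adjacent constituents, and anticlockwise rotations are assumed to be positive. The names RING, rotation_angle and distance_from_angle are hypothetical.

```python
# Assumed clockwise layout of the permutation ring, starting at SOV.
# Consecutive entries (and the last and first) differ by one adjacent swap.
RING = ["SOV", "SVO", "VSO", "VOS", "OVS", "OSV"]


def rotation_angle(order: str, canonical: str = "SOV") -> int:
    """Signed minimum rotation (degrees) that moves `order` to the position
    of `canonical`. Assumption: anticlockwise is positive; the opposite
    vertex is reported as +180 although -180 is equally valid."""
    steps = (RING.index(order) - RING.index(canonical)) % len(RING)
    return 60 * steps if steps <= 3 else 60 * (steps - len(RING))


def distance_from_angle(angle: int) -> int:
    """Swap distance recovered from the rotation angle: |angle| / 60."""
    return abs(angle) // 60


if __name__ == "__main__":
    for order in RING:
        angle = rotation_angle(order)
        print(order, angle, distance_from_angle(angle))
```

Under these assumptions, each step along the ring adds 60 degrees, so the absolute angle divided by 60 recovers the swap distance to the canonical order.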
The rotations that are needed to transform any order of S, V and O into SOV are shown in Figure 2. Accordingly, the orders at distance 1 imply a rotation angle of ±60°, orders at distance 2 imply a rotation angle of ±120°, and finally the order at distance 3 implies a rotation angle of ±180°. In mathematical language, 𝛼, the angle of rotation (in degrees) that is required to transform a certain word order into the canonical word order, and 𝑑, the swap distance between an order and the canonical, obey

d = \frac{|\alpha|}{60}.

Figure 2: Rotations of word orders with respect to an axis at the center of the ring (marked in red). Recall that clockwise rotations have negative sign while anticlockwise rotations have positive sign. To become the canonical order SOV, (a) SOV needs a rotation of ±0 degrees, (b) SVO needs a rotation of 60 degrees, (c) VSO needs a rotation of 120 degrees, (d) VOS needs a rotation of ±180 degrees, (e) OSV needs a rotation of −60 degrees, (f) OVS needs a rotation of −120 degrees.

2.2 The correlation between a distance measure and cognitive cost

Here we present a new mathematical framework to measure the effect of distinct word order principles by translating Equation 1, Equation 3, and Equation 4 into Kendall 𝜏 correlations and also to understand how these principles interact. We define 𝑠 as the cognitive cost of a certain ordering of S, V, and O. Swap distance minimization predicts that 𝑠 should increase following the ordering in Equation 1. Accordingly, we test the swap distance minimization hypothesis by measuring 𝜏(𝑑, 𝑠), the Kendall 𝜏 correlation between the target score 𝑠 and 𝑑, which is the swap distance between an order and the canonical order SOV.
To test the hypothesis of the minimization of surprisal of the verb (Equation 4), we measure 𝜏(𝑝, 𝑠), namely the Kendall 𝜏 correlation between the target score 𝑠 and 𝑝, the distance of the verb to the end (0 for verb-last, 1 for verb-medial and 2 for verb-first). Finally, as swap distance minimization subsumes a preference for the canonical order (Equation 3), we also define a control hypothesis, namely that the effect is simply determined by the word order being canonical or not. That hypothesis is tested by means of 𝜏(𝑐, 𝑠), the Kendall correlation between the target score and 𝑐, a binary variable that is 0 if the order is canonical and 1 otherwise. We refer to 𝑑, 𝑝 and 𝑐 as distance measures; 𝑐 is a binary distance to the canonical order. The values of these distances in an SOV language are shown in Table 1.

Table 1: For each of the six possible orders, we show the swap distance to the canonical order SOV (𝑑), the distance of the verb to the end of the triple (𝑝), the binary distance to canonical order (𝑐), the mean 𝑧-score acceptability according to the results of the experiments by Namboodiripad (2017, Table 2.7) and the corresponding rank transformation (the most acceptable has rank 1, the second most acceptable has rank 2 and so on).

Order   𝑑   𝑝   𝑐   Acceptability   Rank transformation
SOV     0   0   0    1.05           1
OSV     1   0   1    0.80           2
SVO     1   1   1    0.36           3
OVS     2   1   1    0.30           4
VSO     2   2   1   -0.14           5
VOS     3   2   1   -0.36           6

Note: 𝑝 takes the values 0 for verb final, 1 for verb medial, and 2 for verb initial. 𝑐 takes a value of 0 if the order is canonical and 1 otherwise.

There are three main variants of the Kendall 𝜏 correlation: 𝜏_a, 𝜏_b and 𝜏_c (Kendall, 1970). The simplest definition is that of 𝜏_a, which is defined, for a bivariate sample of size 𝑛, as

(5)  \tau_a = \frac{n_c - n_d}{\binom{n}{2}},

where n_c is the number of concordant pairs and n_d is the number of discordant pairs. 𝜏_a performs no adjustment for ties, while 𝜏_b and 𝜏_c do. In our study, adjustments for ties are problematic. As swap distance minimization subsumes the preference for the canonical order, we want to guarantee that if 𝜏(𝑑, 𝑠) is sufficiently large then 𝜏(𝑑, 𝑠) > 𝜏(𝑐, 𝑠) because swap distance minimization is a more precise hypothesis than a preference for the canonical order. In the Appendix, we show two very useful properties of 𝜏_a: if 𝜏_a is large enough, then one can be certain that swap distance minimization does not reduce to a preference for the canonical order or to a preference for verb-last. In the language of mathematics, if 𝜏_a(𝑑, 𝑠) > 0.3̄ then 𝜏_a(𝑑, 𝑠) > 𝜏_a(𝑐, 𝑠); if 𝜏_a(𝑑, 𝑠) > 0.8 then 𝜏_a(𝑑, 𝑠) > 𝜏_a(𝑝, 𝑠), 𝜏_a(𝑐, 𝑠). We also want to ensure that the comparison between 𝜏(𝑑, 𝑠) and 𝜏(𝑝, 𝑠) is fair; notice that 𝑝 has lower precision than 𝑑 (𝑑 is on an integer scale between 0 and 3 while 𝑝 is on an integer scale between 0 and 2). Adjustments for ties may cause the illusion of a weaker manifestation of swap distance minimization compared to other cognitive pressures.5 Hereafter 𝜏 means 𝜏_a.

Finally, notice that distinct word order principles are related and thus the Kendall 𝜏 correlations between pairs of distance measures are all positive (Table 2). The Kendall 𝜏 correlation between 𝑑 and 𝑝, 𝜏(𝑑, 𝑝), is significantly high while 𝜏(𝑑, 𝑐) and 𝜏(𝑝, 𝑐) are not (Table 2). The fact that 𝜏(𝑑, 𝑐) is not significant is due to a lack of statistical power: the arguments in the Appendix for the correlation between 𝑐 and some other variable allow one to conclude that 𝜏(𝑑, 𝑐) is maximum and its right 𝑝-value is minimum.

Table 2: Correlogram of Kendall 𝜏 correlation between each pair of distance measures. We use right-sided exact tests of correlation with 𝜏_a on the matrix in Table 1. Recall that 𝑑 is the swap distance to the canonical order, 𝑝 is the distance of the verb to the end of the triple and 𝑐 is the binary canonical distance.
Variables   Kendall 𝜏 correlation   𝑝-value
𝑑 and 𝑝     0.67                    0.044
𝑑 and 𝑐     0.33                    0.166
𝑝 and 𝑐     0.27                    0.333

5 Finally, another reason for not using 𝜏_b is a further consequence of the adjustment for ties: 𝜏_b is undefined when the variance of one of the variables is zero. In this respect, 𝜏_a is robust across conditions and simplifies the coding as it does not require dealing with the special case of zero variance.

3 Material

3.1 Why SOV languages

The predictions in Equations 1 and 2 raise the question of the ideal conditions where swap distance minimization should be tested (point (b) in Section 1). One could naively argue that these predictions should hold for every language in any condition. The challenge is that swap distance minimization is just one of the various principles that shape word order in languages: word order is a multiconstraint satisfaction problem (). Thus, the observation of the action of a specific word order principle requires identifying the conditions where that principle will suffer from less interference from other word order principles. For instance, it has been predicted theoretically and demonstrated empirically that the action of surprisal minimization (predictability maximization) should be more visible in short sentences (). Interestingly, it has been shown that syntactic dependency distance minimization is weaker in Warlpiri, a non-configurational language (Ferrer-i-Cancho et al., 2022). Indeed, discontinuous constituents, one of the hallmarks of non-configurational languages (), may indicate that dependency distance minimization is weaker, as it has been demonstrated that pressure to reduce the distance between syntactically related elements reduces the chance of discontinuity (). Thus, interference from dependency distance minimization is expected to be weaker in non-configurational languages.
Recall that dependency distance minimization alone would draw the verb, the root of the triple, towards the center of the triple (). In addition, we expect that, in languages that exhibit word order flexibility, there is more room for capturing the manifestation of swap distance minimization. English, which is an SVO language, is an example of a non-ideal language to test this because of its word order rigidity (Figure 8 of Levshina et al. (2023)).

Given the considerations above, this article focuses on SOV languages. SOV languages are an ideal arena for testing this principle. In terms of representativity, SOV represents the most common dominant word order across languages (). Furthermore, SOV has been hypothesized to be an early stage in spoken languages (), and it has been regarded as a default basic word order (). This view is supported by the fact that SOV is often the dominant order found in sign languages which are at the early stages of community-level conventionalisation ().

3.2 Data

Data is borrowed from existing publications but is available as a single file in the repository of the article.6 We borrow data from word order experiments in Malayalam (Namboodiripad, 2017), Korean (Namboodiripad et al., 2019), and Sinhalese (Tamaoka et al., 2011).7 In Korean and Malayalam, the target scores are average 𝑧-scored acceptability ratings from experiments in the spoken (listening) modality that are obtained from Namboodiripad (2017, Table 2.7 in Chapter 2) for Malayalam and Table 2 of Namboodiripad et al. (2019) for Korean.

6 In the data folder of https://osf.io/b62ep/.
7 For each language, the target sentences have the same structure: animate subjects, inanimate objects, and active transitive verbs; sample stimuli can be found in each paper. Due to space limitations, we refer the reader to those original sources for further methodological details.

As is typical in acceptability judgment experiments, 𝑧-scores are used to control for individual variation in the use of the rating scale. All participants in the Malayalam experiment (𝑁 = 18) grew up speaking Malayalam in Kerala, India, where it is the dominant language. For Korean, we consider three groups that are borrowed from Namboodiripad et al. (2019): bilingual speakers of Korean and English that are split into Korean-dominant (𝑁 = 30), English-dominant active (individuals who are fluent in comprehension and production of spoken Korean; 𝑁 = 13), and English-dominant passive (individuals who are far more proficient in comprehension of spoken Korean than they are in production; 𝑁 = 14). For Sinhalese, the participants are described as native speakers. The target scores are mean reaction times and mean error rates in the spoken (𝑁 = 42) and written (𝑁 = 36) modality. Mean reaction times and mean error rates are borrowed from Table 1 and Table 2 of Tamaoka et al. (2011) for the written (reading) and spoken (listening) modality, respectively. Here, it is not clear how the authors controlled for individual variation (i.e., via 𝑧-scores or other statistical methods).

To validate findings in Malayalam as () did, we borrow frequencies of each of the six orders of S, V and O from an online corpus (Leela, 2016, Table 4) as an additional target score.8

8 The corpus comprises three types of discourse: interviews, discussions or debates, and conversations appearing in printed form in online media. The genres are relatively comparable with the experimental items because they come from more casual and conversational contexts. The whole corpus comprises 5598 monotransitive sentences but only 67.1% contain S, V and O (Leela, 2016, Table 4). Thus we estimate that the frequencies of S, V and O are based on 3756 sentences. Further details can be found at http://hdl.handle.net/10803/399556 in Section 3.2.1 Methodology.

By target score, we mean acceptability, reaction time, error, frequency, and the variants that result from pairwise contrasts. Every target score (other than frequency) yields a rank variant that results from comparing the scores of every pair of distinct orders by means of some statistical test. Here we adopt the convention that these ranks reflect cognitive cost: the least costly order has rank 1, the second least costly has rank 2 and so on. The pairwise contrasts for Malayalam give, in order of decreasing acceptability (Namboodiripad, 2017)

𝑆𝑂𝑉, 𝑂𝑆𝑉 > 𝑆𝑉𝑂, 𝑂𝑉𝑆 > 𝑉𝑆𝑂, 𝑉𝑂𝑆.

Thus, SOV and OSV have acceptability rank 1, SVO and OVS have acceptability rank 2, and VSO and VOS have acceptability rank 3. For Sinhalese, the pairwise contrasts for reaction time in spoken language give, in order of increasing reaction time (Tamaoka et al., 2011),

𝑆𝑂𝑉 < 𝑆𝑉𝑂, 𝑂𝑉𝑆 < 𝑂𝑆𝑉, 𝑉𝑆𝑂, 𝑉𝑂𝑆

and thus SOV has reaction time rank 1, SVO and OVS have reaction time rank 2 and OSV, VSO and VOS have reaction time rank 3. For Korean, Namboodiripad et al. (2019) report in prose that the verb-medial orders and verb-initial orders group together, but the authors do not give more details. However, Namboodiripad et al. (2020) report pairwise comparisons9 in a reanalysis of the same data. The ranking in order of decreasing acceptability is

𝑆𝑂𝑉 > 𝑂𝑆𝑉 > 𝑆𝑉𝑂, 𝑂𝑉𝑆 > 𝑉𝑆𝑂, 𝑉𝑂𝑆.

Thus, SOV has acceptability rank 1, OSV has acceptability rank 2, SVO and OVS have acceptability rank 3, and VSO and VOS have acceptability rank 4. All the pairwise contrasts for the languages investigated in this article are summarized in Table 3.

Table 3: Summary of pairwise contrasts, in order of increasing cognitive cost for Korean (Namboodiripad et al., 2020), Malayalam (Namboodiripad, 2017) and Sinhalese (Tamaoka et al., 2011).
Language    Group                      Score           Modality   Pairwise contrasts
Korean      Korean-dominant            acceptability   spoken     𝑆𝑂𝑉 < 𝑂𝑆𝑉 < 𝑆𝑉𝑂, 𝑂𝑉𝑆 < 𝑉𝑆𝑂, 𝑉𝑂𝑆
Korean      English-dominant active    acceptability   spoken     𝑆𝑂𝑉 < 𝑂𝑆𝑉 < 𝑆𝑉𝑂, 𝑂𝑉𝑆 < 𝑉𝑆𝑂, 𝑉𝑂𝑆
Korean      English-dominant passive   acceptability   spoken     𝑆𝑂𝑉 < 𝑂𝑆𝑉 < 𝑆𝑉𝑂, 𝑂𝑉𝑆 < 𝑉𝑆𝑂, 𝑉𝑂𝑆
Malayalam                              acceptability   spoken     𝑆𝑂𝑉, 𝑂𝑆𝑉 < 𝑆𝑉𝑂, 𝑂𝑉𝑆 < 𝑉𝑆𝑂, 𝑉𝑂𝑆
Sinhalese                              reaction time   spoken     𝑆𝑂𝑉 < 𝑆𝑉𝑂, 𝑂𝑉𝑆 < 𝑂𝑆𝑉, 𝑉𝑆𝑂, 𝑉𝑂𝑆
Sinhalese                              reaction time   written    𝑆𝑂𝑉 < 𝑆𝑉𝑂, 𝑂𝑉𝑆, 𝑂𝑆𝑉, 𝑉𝑆𝑂, 𝑉𝑂𝑆
Sinhalese                              error           spoken     𝑆𝑂𝑉 < 𝑆𝑉𝑂, 𝑂𝑉𝑆, 𝑉𝑆𝑂 < 𝑂𝑆𝑉, 𝑉𝑂𝑆
Sinhalese                              error           written    𝑆𝑂𝑉, 𝑆𝑉𝑂, 𝑉𝑆𝑂, 𝑉𝑂𝑆, 𝑂𝑉𝑆, 𝑂𝑆𝑉

We define a condition as the combination of modality (spoken or written), the target score, and, optionally, a group. The sign of certain scores that measure cognitive ease is inverted before the analyses to transform them into scores of cognitive cost. This is the case of acceptability ratings in Malayalam and Korean and word order frequencies in Malayalam. As we are using Kendall 𝜏 correlation, the transformation does not alter the potential conclusions and has a clear advantage: all target scores can then be submitted to a right-sided Kendall correlation test. The resulting association between swap distance and acceptability rank is shown in Table 1.

4 Methodology

All the code used to produce the results is available in the repository of the article.10

9 Bonferroni corrected, with pooled SD.
10 In the code folder of https://osf.io/b62ep/.

4.1 Kendall 𝜏 correlation

We used R for the analyses. To compute Kendall 𝜏 correlation, we used neither the standard function to compute Kendall correlation, i.e. cor (that runs in 𝑂(𝑛²) time, where 𝑛 is the size of the sample), nor the faster implementation cor.fk (that runs in 𝑂(𝑛 log 𝑛) time) from the pcaPP library.
The reason is that the cor function computes Kendall 𝜏_b instead of 𝜏_a when there are ties.11 The documentation of cor.fk is not clear on this matter, but our experience suggests that it also implements 𝜏_b: when we compute Kendall 𝜏 between the vector (1, 1, 2, 2, 3, 3) and itself, cor and cor.fk yield 1, the maximum value, as expected by the definition of 𝜏_b. In contrast, our implementation of 𝜏_a yields 0.8 because of the presence of ties. Therefore we computed 𝜏_a using our own naive implementation, which runs in 𝑂(𝑛²) time.

4.2 Kendall 𝜏 correlation test

The standard function for the Kendall correlation test, i.e. cor.test, fails to compute accurate enough 𝑝-values. To fix it, we implemented a function that computes, exactly, the right 𝑝-value of the Kendall correlation test by generating all permutations of the values of one of the variables and computing the Kendall 𝜏 correlation on each of those permutations. This exact test was also used for the differences 𝜏(𝑑, 𝑠) − 𝜏(𝑝, 𝑠) and 𝜏(𝑑, 𝑠) − 𝜏(𝑐, 𝑠).

4.3 Maximum correlation

We distinguish two reasons why a Kendall correlation is maximum:

• Maximum given a distance measure. Namely, given the sample as a matrix with two columns, one for the distance measure and the other for the score, there is no possible replacement of the values of the score that gives a higher correlation. See Property 3 for the maximum correlation and Property 5 for the minimum right 𝑝-value that is obtained when the correlation is maximum.
• Maximum given the sample. In this case, the correlation is the maximum given the bivariate sample used to compute the correlation. Namely, given the sample as a matrix with two columns, no permutation of a column of the sample matrix yields a higher correlation. This kind of maximum correlation is determined computationally from its definition.

It is easy to see that if a correlation is maximum given the distance measure, then it is also maximum given the sample.
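The 𝜏_a correlation, the exact right-sided test and the notion of maximum given the sample can be illustrated as follows. This is a minimal Python sketch of the procedures described above, not the authors' R code from the repository; the function names (concordance, kendall_tau_a, exact_right_p_value, is_maximum_given_sample) are hypothetical. Comparisons are made on the integer numerator n_c − n_d, which orders samples of the same size exactly as 𝜏_a does while avoiding floating-point ties.

```python
from itertools import combinations, permutations


def concordance(x, y) -> int:
    """Return n_c - n_d: concordant minus discordant pairs.

    Pairs tied in x or in y contribute 0, as in tau_a (no tie adjustment).
    """
    total = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        total += (product > 0) - (product < 0)
    return total


def kendall_tau_a(x, y) -> float:
    """Kendall tau_a: (n_c - n_d) divided by the number of pairs, n(n-1)/2."""
    n = len(x)
    return concordance(x, y) / (n * (n - 1) / 2)


def exact_right_p_value(x, y) -> float:
    """Proportion of the n! permutations of y whose correlation with x
    is at least as large as the observed one."""
    observed = concordance(x, y)
    all_perms = list(permutations(y))
    hits = sum(concordance(x, perm) >= observed for perm in all_perms)
    return hits / len(all_perms)


def is_maximum_given_sample(x, y) -> bool:
    """True if no permutation of y yields a higher correlation with x."""
    observed = concordance(x, y)
    return all(concordance(x, perm) <= observed for perm in permutations(y))


if __name__ == "__main__":
    v = [1, 1, 2, 2, 3, 3]
    print(kendall_tau_a(v, v))  # 0.8: three tied pairs out of fifteen
    d = [0, 1, 1, 2, 2, 3]      # swap distance, order as in Table 1
    rank = [1, 2, 3, 4, 5, 6]   # rank transformation from Table 1
    print(kendall_tau_a(d, rank), exact_right_p_value(d, rank))
```

With six orders, the exhaustive enumeration visits only 6! = 720 permutations, so the brute-force approach is cheap. For the Table 1 ranks, 4 of the 720 permutations reach the observed correlation with 𝑑, which gives a right 𝑝-value of 1/180.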
We also extend these notions to the differences 𝜏(𝑑, 𝑠) − 𝜏(𝑝, 𝑠) and 𝜏(𝑑, 𝑠) − 𝜏(𝑐, 𝑠).

¹¹ https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html

4.4 A Monte Carlo global analysis

The Kendall 𝜏 correlation tests above suffer from lack of statistical power: the minimum 𝑝-value for the Kendall 𝜏 depends on the distance measure and ranges between 0.16̄ for 𝑐 and 0.005̄ for 𝑑 (Property 5). In the case of Sinhalese, none of the correlations across conditions and distance measures was statistically significant. To gain statistical power, we decided to perform a global statistical test for a given distance measure across all conditions. The statistic of that test is 𝑆, which is defined as the sum of all the Kendall correlations across all conditions for a given language and distance measure. The right 𝑝-value of the test was estimated by a Monte Carlo procedure as the proportion of 𝑇 = 10⁶ randomizations where 𝑆′, the value of 𝑆 in a randomization, satisfied 𝑆′ ≥ 𝑆. Each randomization consists of producing a uniformly random permutation of the values of the target score that are assigned to the distance measure for each language and distance measure. Therefore, the smallest non-zero estimated 𝑝-value that this test can produce is 1/𝑇 = 10⁻⁶. The test was adapted to assess the significance of the difference between pairs of distance measures.

As an orientation for discussion, we assume a significance level of 𝛼 = 0.05 throughout this article. When we perform statistical tests over various individual conditions, we may suffer from multiple comparisons. When presenting results on individual conditions, we do not correct 𝑝-values for them because this problem is addressed by the Monte Carlo test, where we apply Holm correction in two contexts.
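As an aside, the randomization procedure behind the global test can be sketched in a few lines. The Python code below is only an illustration under stated assumptions: the function names are hypothetical, the scores are invented, every condition is assumed to share the same six word orders, each condition is shuffled independently in every randomization (our reading of the procedure), and 𝑇 is much smaller than the 10⁶ randomizations used in the article. It is not the R code from the repository.

```python
import random

def concordance(x, y):
    """Concordant minus discordant pairs: the numerator of Kendall tau-a."""
    total = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            total += (prod > 0) - (prod < 0)
    return total

def global_statistic(distance, scores_by_condition):
    """S: the sum of Kendall tau-a over all conditions for one distance measure."""
    pairs = len(distance) * (len(distance) - 1) / 2
    return sum(concordance(distance, s) for s in scores_by_condition) / pairs

def monte_carlo_right_p_value(distance, scores_by_condition, T=10_000, seed=0):
    """Estimate the right p-value of S as the proportion of T randomizations
    with S' >= S. Integer numerators are compared so that ties between S'
    and S are not affected by floating point rounding."""
    rng = random.Random(seed)
    observed = sum(concordance(distance, s) for s in scores_by_condition)
    hits = 0
    for _ in range(T):
        randomized = sum(
            concordance(distance, rng.sample(s, len(s)))  # uniformly random permutation
            for s in scores_by_condition
        )
        if randomized >= observed:
            hits += 1
    return hits / T

# Assumed swap distances for SOV, SVO, OSV, VSO, OVS, VOS.
d = [0, 1, 1, 2, 2, 3]
# Two hypothetical conditions with invented cost scores.
conditions = [
    [1.2, 2.0, 1.9, 3.1, 2.8, 3.5],
    [0.9, 1.5, 2.2, 2.1, 3.0, 2.7],
]
print(global_statistic(d, conditions))
print(monte_carlo_right_p_value(d, conditions))
```

Fixing the seed makes the estimate reproducible; the resolution of the estimate is 1/𝑇, which is why a large 𝑇 is needed to report very small 𝑝-values.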
When answering the question of when a distance measure yields significance, we adjust the 𝑝-values of 𝑆(𝑑), 𝑆(𝑝) and 𝑆(𝑐) for each language (9 comparisons). When answering the question of when the difference between swap distance minimization and another principle yields significance, we adjust the 𝑝-values of 𝑆(𝑑) − 𝑆(𝑐) and 𝑆(𝑑) − 𝑆(𝑝) for each language (6 comparisons).

5 Results

5.1 Evidence of swap distance minimization

In Korean, the correlation between acceptability and swap distance to the canonical order, 𝜏(𝑑, 𝑠), is statistically significant in all three groups: Korean-dominant, English-dominant active, and English-dominant passive (Table 4), suggesting that swap distance minimization is a robust effect. When acceptability ranks are used, the correlation turns out to be maximum given the sample. In the English-dominant active group, the correlation increases when mean acceptability is replaced by acceptability rank.

In Malayalam, that correlation is statistically significant and maximum given the distance measure (Table 4). When raw mean acceptability scores are replaced by acceptability ranks resulting from pairwise contrasts, the correlation 𝜏(𝑑, 𝑠) weakens (the opposite of what happens in the English-dominant active group in Korean) but it is still significant. That suggests that, in Malayalam, raw mean acceptability scores contain some information about swap distance minimization that is lost when using these ranks, likely due to lack of statistical power in the pairwise contrasts. The support for swap distance minimization from the canonical order is confirmed when acceptability ratings are replaced by frequencies from Leela’s corpus, which achieve a maximum correlation given the sample (Table 4). These findings suggest that swap distance minimization in Malayalam is a robust phenomenon because it is captured by independent measures.
Table 4: The outcome of three correlation tests. First, the Kendall 𝜏 correlation test between 𝑠, the target score, and 𝑑, its swap distance to the canonical order SOV. Second, the Kendall 𝜏 correlation test between 𝑠 and 𝑝, the distance of the verb to the end. Third, the Kendall 𝜏 correlation test between 𝑠 and 𝑐, a binary variable that indicates if the order is canonical or not. For each correlation test, red indicates that the correlation is maximum (and the 𝑝-value is minimum) given the distance measure; orange indicates that the correlation is maximum (and the 𝑝-value is minimum) given the sample.

| Language | Group | Score | Modality | 𝜏(𝑑, 𝑠) | 𝑝-value | 𝜏(𝑝, 𝑠) | 𝑝-value | 𝜏(𝑐, 𝑠) | 𝑝-value |
|---|---|---|---|---|---|---|---|---|---|
| Korean | Korean-d | acceptability | spoken | 0.733 | 0.022 | 0.8 | 0.011 | 0.333 | 0.167 |
| Korean | Korean-d | acceptability rank | spoken | 0.733 | 0.022 | 0.8 | 0.011 | 0.333 | 0.167 |
| Korean | English-d a | acceptability | spoken | 0.667 | 0.033 | 0.8 | 0.011 | 0.333 | 0.167 |
| Korean | English-d a | acceptability rank | spoken | 0.733 | 0.022 | 0.8 | 0.011 | 0.333 | 0.167 |
| Korean | English-d p | acceptability | spoken | 0.733 | 0.022 | 0.8 | 0.011 | 0.333 | 0.167 |
| Korean | English-d p | acceptability rank | spoken | 0.733 | 0.022 | 0.8 | 0.011 | 0.333 | 0.167 |
| Malayalam | - | acceptability | spoken | 0.867 | 0.006 | 0.8 | 0.011 | 0.333 | 0.167 |
| Malayalam | - | acceptability rank | spoken | 0.667 | 0.044 | 0.8 | 0.011 | 0.267 | 0.333 |
| Malayalam | - | frequency | - | 0.8 | 0.011 | 0.8 | 0.011 | 0.333 | 0.167 |
| Sinhalese | - | reaction time | spoken | 0.333 | 0.228 | 0.267 | 0.289 | 0.333 | 0.167 |
| Sinhalese | - | reaction time rank | spoken | 0.467 | 0.117 | 0.4 | 0.133 | 0.333 | 0.167 |
| Sinhalese | - | reaction time | written | 0.6 | 0.061 | 0.4 | 0.167 | 0.333 | 0.167 |
| Sinhalese | - | reaction time rank | written | 0.333 | 0.167 | 0.267 | 0.333 | 0.333 | 0.167 |
| Sinhalese | - | error | spoken | 0.267 | 0.239 | 0.133 | 0.422 | 0.333 | 0.167 |
| Sinhalese | - | error rank | spoken | 0.4 | 0.15 | 0.2 | 0.333 | 0.333 | 0.167 |
| Sinhalese | - | error | written | 0 | 0.6 | -0.133 | 0.733 | 0.2 | 0.5 |
| Sinhalese | - | error rank | written | 0 | 1 | 0 | 1 | 0 | 1 |

Note: 𝑐 is 0 if the order is canonical and 1 otherwise. 𝑝 is 0 for verb-last, 1 for verb-medial and 2 for verb-first.
In Korean, the groups are Korean-d (Korean-dominant), English-d a (English-dominant active) and English-d p (English-dominant passive).

In Sinhalese, we find no support for swap distance minimization on individual conditions except for reaction times in the written modality, where the correlation between reaction time and swap distance to the canonical order yields a borderline 𝑝-value (𝑝-value = 0.061). When the raw mean reaction times in that modality are replaced by ranks obtained from pairwise contrasts, the correlation 𝜏(𝑑, 𝑠) decreases (from 0.6 to 0.333), suggesting that raw reaction times may contain some information about swap distance minimization that is lost during the pairwise contrasts. Interestingly, the correlation with these ranks is maximum given the sample (Table 4). In contrast, the rank transformation resulting from pairwise contrasts has the opposite effect for reaction time and error in the spoken modality: 𝜏(𝑑, 𝑠) increases after applying that transformation. That suggests that mean reaction time and mean error rate are noisy in the spoken modality.

Table 5: Summary of the outcome of the Monte Carlo global analysis over all conditions for each language. 𝑆 is the sum of the Kendall 𝜏 correlation over all conditions for a certain distance measure. 𝑑 is swap distance to the canonical order, 𝑝 is distance of the verb to the end of the triple, and 𝑐 is binary canonical distance. 𝑝-values have been adjusted with Holm correction (as explained in Section 4).
| Language | 𝑆(𝑑) | 𝑝-value | 𝑆(𝑝) | 𝑝-value | 𝑆(𝑐) | 𝑝-value | 𝑆(𝑑) − 𝑆(𝑐) | 𝑝-value | 𝑆(𝑑) − 𝑆(𝑝) | 𝑝-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Korean | 4.33 | < 10⁻⁶ | 4.8 | < 10⁻⁶ | 2 | 2.4 · 10⁻⁵ | 2.33 | 9 · 10⁻⁴ | -0.47 | 1 |
| Malayalam | 2.33 | 1.8 · 10⁻⁵ | 2.4 | 1.4 · 10⁻⁵ | 0.93 | 9.3 · 10⁻³ | 1.4 | 2.1 · 10⁻³ | -0.07 | 1 |
| Sinhalese | 2.4 | 4.6 · 10⁻³ | 1.53 | 0.065 | 2.2 | 1.6 · 10⁻⁵ | 0.2 | 1 | 0.87 | 0.11 |

Although statistical support for swap distance minimization is missing on individual conditions in Sinhalese, the Monte Carlo global analysis (Table 5) indicates that the sum of Kendall 𝜏 correlations over all conditions is significantly high (𝑆(𝑑) = 2.4, 𝑝-value = 1.5 · 10⁻³), suggesting that swap distance minimization is present but weak in Sinhalese. In Korean and Malayalam, the Monte Carlo global analysis just confirms the findings on individual languages (Table 5; 𝑝-value < 10⁻⁵ in both languages).

5.2 Evidence of maximization of the predictability of the verb

The correlation between the distance from the verb to the end of the sentence and each of the scores, 𝜏(𝑝, 𝑠), was statistically significant for Korean and Malayalam over all conditions, and it was indeed maximum given the distance measure (Table 4). In Sinhalese, 𝜏(𝑝, 𝑠) was not significant in any individual condition (Table 4). However, the global analysis (Table 5) revealed that the sum of Kendall 𝜏 correlations over all conditions is borderline significant in Sinhalese (𝑆(𝑝) = 1.53, 𝑝-value = 0.066), suggesting that the maximization of the predictability of the verb has some global effect on that language. In Korean and Malayalam, the Monte Carlo global analysis based on 𝑆(𝑝) just confirms the findings on individual languages (Table 5; 𝑝-value < 10⁻⁵ in both languages).
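Since 𝑆 is a plain sum of per-condition correlations, the 𝑆 columns of Table 5 can be cross-checked against the 𝜏 columns of Table 4. The short Python sketch below does that arithmetic; the numbers are copied from Table 4 as printed (rounded to three decimals), and the dictionary layout and variable names are our own choice for illustration.

```python
# Per-condition Kendall correlations copied from Table 4 (rounded values).
tau = {
    "Korean": {
        "d": [0.733, 0.733, 0.667, 0.733, 0.733, 0.733],
        "p": [0.8] * 6,
        "c": [0.333] * 6,
    },
    "Malayalam": {
        "d": [0.867, 0.667, 0.8],
        "p": [0.8] * 3,
        "c": [0.333, 0.267, 0.333],
    },
    "Sinhalese": {
        "d": [0.333, 0.467, 0.6, 0.333, 0.267, 0.4, 0, 0],
        "p": [0.267, 0.4, 0.4, 0.267, 0.133, 0.2, -0.133, 0],
        "c": [0.333] * 6 + [0.2, 0],
    },
}

for language, columns in tau.items():
    # S for each distance measure is the sum over that language's conditions.
    S = {measure: sum(values) for measure, values in columns.items()}
    print(
        language,
        "S(d) =", round(S["d"], 2),
        "S(p) =", round(S["p"], 2),
        "S(c) =", round(S["c"], 2),
        "S(d)-S(c) =", round(S["d"] - S["c"], 2),
        "S(d)-S(p) =", round(S["d"] - S["p"], 2),
    )
```

Because the inputs are already rounded, the sums agree with Table 5 only to two decimals; the check confirms the arithmetic of the 𝑆 values, not the Monte Carlo 𝑝-values.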
5.3 Evidence of a preference for the canonical order

The correlation between the binary distance to the canonical order and each of the scores, 𝜏(𝑐, 𝑠), was never statistically significant across languages and conditions (Table 4), but this is due to the lack of statistical power of the test (the minimum 𝑝-value is 0.16̄ as explained in the Appendix). Indeed, the Monte Carlo global analysis based on 𝑆(𝑐) shows that a preference for the canonical order has a significant effect in all languages but much more strongly in Korean and Sinhalese (Table 5; 𝑝-value < 10⁻² in all languages). The latter could be due to the larger number of conditions in Sinhalese and Korean, which may amplify the statistical effect.

Table 6: The outcome of two Kendall correlation difference tests. The first test is on 𝜏(𝑑, 𝑠) − 𝜏(𝑐, 𝑠). The second test is on 𝜏(𝑑, 𝑠) − 𝜏(𝑝, 𝑠). In each correlation test, orange indicates that the correlation is maximum (and then the 𝑝-value is minimum) given the sample.
| Language | Group | Score | Modality | 𝜏(𝑑, 𝑠) − 𝜏(𝑐, 𝑠) | 𝑝-value | 𝜏(𝑑, 𝑠) − 𝜏(𝑝, 𝑠) | 𝑝-value |
|---|---|---|---|---|---|---|---|
| Korean | Korean-d | acceptability | spoken | 0.4 | 0.1 | -0.067 | 0.753 |
| Korean | Korean-d | acceptability rank | spoken | 0.4 | 0.078 | -0.067 | 0.728 |
| Korean | English-d a | acceptability | spoken | 0.333 | 0.133 | -0.133 | 0.778 |
| Korean | English-d a | acceptability rank | spoken | 0.4 | 0.078 | -0.067 | 0.728 |
| Korean | English-d p | acceptability | spoken | 0.4 | 0.1 | -0.067 | 0.753 |
| Korean | English-d p | acceptability rank | spoken | 0.4 | 0.078 | -0.067 | 0.728 |
| Malayalam | - | acceptability | spoken | 0.533 | 0.006 | 0.067 | 0.5 |
| Malayalam | - | acceptability rank | spoken | 0.4 | 0.078 | -0.133 | 0.833 |
| Malayalam | - | frequency | - | 0.467 | 0.022 | 0 | 0.558 |
| Sinhalese | - | reaction time | spoken | 0 | 0.6 | 0.067 | 0.5 |
| Sinhalese | - | reaction time rank | spoken | 0.133 | 0.35 | 0.067 | 0.433 |
| Sinhalese | - | reaction time | written | 0.267 | 0.233 | 0.2 | 0.247 |
| Sinhalese | - | reaction time rank | written | 0 | 0.5 | 0.067 | 0.5 |
| Sinhalese | - | error | spoken | -0.067 | 0.611 | 0.133 | 0.256 |
| Sinhalese | - | error rank | spoken | 0.067 | 0.383 | 0.2 | 0.167 |
| Sinhalese | - | error | written | -0.2 | 0.883 | 0.133 | 0.267 |
| Sinhalese | - | error rank | written | 0 | 1 | 0 | 1 |

Note: 𝜏(𝑑, 𝑠) is the correlation between a score and swap distance. 𝜏(𝑐, 𝑠) is the correlation between a score and the binary distance to canonical order. 𝜏(𝑝, 𝑠) is the correlation between a score and the distance of the verb to the end. In Korean, the groups are Korean-d (Korean-dominant), English-d a (English-dominant active) and English-d p (English-dominant passive).

5.4 Can the results be reduced to simply a preference for the canonical order?

It could be argued that the finding of swap distance minimization effects is a mere consequence of a rather obvious expectation: canonical orders are easier to process than non-canonical orders. Indeed, swap distance minimization also predicts a preference for canonical orders but adds a gradation on non-canonical orders.
However, we find that the correlation between a target score and swap distance to canonical order (𝜏(𝑑, 𝑠)) as well as the correlation between a target score and distance of the verb to the end (𝜏(𝑝, 𝑠)) are always greater than the correlation between the target score and being canonical or not (𝜏(𝑐, 𝑠)) in both Korean and Malayalam; this is also the case in Sinhalese with two exceptions: error in the spoken and written modality (Table 4 and Table 6).

In Korean, the difference 𝜏(𝑑, 𝑠) − 𝜏(𝑐, 𝑠) is always positive but never significant. However, the difference is borderline significant in all groups when acceptability ranks are used (𝑝-value = 0.078). In Malayalam, the analysis of 𝜏(𝑑, 𝑠) − 𝜏(𝑐, 𝑠) (Table 6) indicates that swap distance minimization has a significantly stronger effect than a preference for a canonical order across conditions (although the 𝑝-value of acceptability ranks, i.e. 0.078, is borderline). Furthermore, concerning mean acceptability, the difference is maximum given the sample. The Monte Carlo global analysis shows that indeed 𝑆(𝑑) − 𝑆(𝑐) is significantly large in both Korean and Malayalam (𝑝-value < 10⁻⁴), indicating that swap distance minimization is significantly stronger than a preference for a canonical order (Table 5). In Sinhalese, the difference 𝜏(𝑑, 𝑠) − 𝜏(𝑐, 𝑠) is never statistically significant across conditions and that is confirmed by the Monte Carlo global analysis (𝑝-value = 0.369; Table 5).

5.5 Swap distance minimization versus maximization of the predictability of the verb

In Korean, the effect of swap distance minimization is weaker than the force that drags the verb towards the end. In particular, the correlation between acceptability and swap distance to the canonical order (𝜏(𝑑, 𝑠)) is always smaller than the correlation between mean acceptability and verb position (𝜏(𝑝, 𝑠)).
In Table 4 and Table 6, we can check that 𝜏(𝑑, 𝑠) < 𝜏(𝑝, 𝑠) in all conditions. The 𝑝-values of 𝜏(𝑑, 𝑠) are greater than those of 𝜏(𝑝, 𝑠) (Table 4). Unsurprisingly, we find that 𝜏(𝑑, 𝑠) − 𝜏(𝑝, 𝑠) is never significant, neither on individual conditions (Table 6) nor on the global analysis (see 𝑆(𝑑) − 𝑆(𝑝) in Table 5).

In Malayalam, results are mixed: the sign of 𝜏(𝑑, 𝑠) − 𝜏(𝑝, 𝑠) depends on the condition but 𝜏(𝑑, 𝑠) beats 𝜏(𝑝, 𝑠) in the condition where both 𝜏(𝑑, 𝑠) and 𝜏(𝑝, 𝑠) are maximum given the distance measure (𝜏(𝑑, 𝑠) = 0.867 > 𝜏(𝑝, 𝑠) = 0.8 in Table 4). Thus, in that condition, swap distance minimization has an effect in Malayalam that cannot be reduced to a preference for verb-last. The lack of verb-initial orders with two overt arguments in Leela’s corpus, in spite of being grammatically possible, suggests that undersampling may be limiting the observation of a stronger swap distance minimization effect when frequencies are used as a proxy for cognitive cost. As with Korean, we find that 𝜏(𝑑, 𝑠) − 𝜏(𝑝, 𝑠) is never significant, neither on individual conditions (Table 6) nor on the global analysis (see 𝑆(𝑑) − 𝑆(𝑝) in Table 5).

In Sinhalese, we find the opposite phenomenon with respect to Korean: the effect of swap distance minimization is stronger, namely, given a score and a condition, 𝜏(𝑑, 𝑠) > 𝜏(𝑝, 𝑠) in all cases. Interestingly, we find that 𝜏(𝑑, 𝑠) − 𝜏(𝑝, 𝑠) is never significant on individual conditions (Table 6) and this is confirmed in the global analysis (see 𝑆(𝑑) − 𝑆(𝑝) in Table 5).

6 Discussion

We have seen that an effect consistent with swap distance minimization is found in all three languages (Table 4). However, we have seen that in Sinhalese, the effect is weak and requires a global analysis over all conditions for it to become statistically significant (Table 5).
We have demonstrated that swap distance minimization is significantly stronger than a preference for the canonical order in Korean and Malayalam by means of a global analysis across conditions (Table 5). In Malayalam, swap distance minimization is so strong that its superiority with respect to a preference for the canonical order manifests also on individual conditions (Table 6).

Figure 3: The word order permutation ring with the acceptability rank of every word order marked in red below each word order. The word order with the highest mean acceptability has rank 1, the word order with the 2nd highest mean acceptability has rank 2 and so on. [Ring labels: SOV 1, OSV 2, SVO 3, OVS 4, VSO 5, VOS 6.]

Notice that the acceptability ranks in Table 1 coincide with a labelling of the vertices of the permutahedron following a traversal of the permutahedron from SOV (Figure 3), which is known as breadth first traversal in computer science (Cormen et al., 1990). There are 5! = 120 possible traversals starting at SOV, but only 4 of them are breadth first traversals; the acceptability rank (that results from transforming mean acceptability scores into ranks) has hit one of them. In Sinhalese, swap distance minimization is neither significantly stronger than a preference for the canonical order nor significantly stronger than the preference for verb-last (Table 5) that is believed to explain acceptability in Malayalam ().

We have provided evidence that swap distance minimization is cognitively relevant in capturing human behavior: it is significantly stronger than the principle it subsumes, i.e. the preference for the canonical order, in Korean and in Malayalam. In Sinhalese, we failed to find that swap distance minimization is acting significantly stronger than a preference for the canonical order.
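The count of traversals given above (4 out of 5! = 120) can be checked by enumeration. The Python sketch below is only an illustration: it reads "breadth first" as visiting word orders in non-decreasing swap distance from SOV, which is our assumption about the intended definition, and the function and variable names are hypothetical.

```python
from itertools import permutations

ORDERS = ["SOV", "SVO", "VSO", "VOS", "OVS", "OSV"]

def swap_distance(order, canonical="SOV"):
    """Minimum number of swaps of adjacent constituents that turn `order`
    into `canonical`, computed as the number of inversions."""
    position = {symbol: i for i, symbol in enumerate(canonical)}
    seq = [position[symbol] for symbol in order]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )

distance = {order: swap_distance(order) for order in ORDERS}
others = [order for order in ORDERS if order != "SOV"]

# All orderings of the five non-canonical orders (SOV is always visited first).
all_traversals = list(permutations(others))

# Keep those that never move back to a smaller swap distance.
level_order = [
    t for t in all_traversals
    if all(distance[t[i]] <= distance[t[i + 1]] for i in range(len(t) - 1))
]

print(len(all_traversals), len(level_order))  # 120 4

# Ranks 2 to 6 as labelled in Figure 3: OSV, SVO, OVS, VSO, VOS.
observed = ("OSV", "SVO", "OVS", "VSO", "VOS")
print(observed in level_order)  # True
```

Under this reading, the chance that a uniformly random ranking of the five non-canonical orders is a breadth first labelling is 4/120, i.e. 1 in 30.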
It is possible that swap distance minimization is acting beyond a preference for the canonical order, but its additional contribution with respect to other word order principles may remain statistically invisible. First, recall that swap distance minimization subsumes the preference for the canonical word order. Second, swap distance minimization and preference for verb-last are strongly correlated. Recall that the Kendall 𝜏 correlation between 𝑑 and 𝑝, 𝜏(𝑑, 𝑝), is significantly high while 𝜏(𝑑, 𝑐) and 𝜏(𝑝, 𝑐) are not (Table 2). This is in line with the view that word order is a multiconstraint satisfaction problem, and word order principles can compete or collaborate (Ferrer-i-Cancho, 2017). Third, our analyses on Sinhalese are based on data which is averaged across participants. Because we could not control for individual variation in that language as in Namboodiripad’s dataset (Section 3), the effects of swap distance minimization could indeed be stronger than what our analysis has revealed. Thus, controlling for individual variation in Sinhalese should be the subject of future research. Finally, the behavioral measures are not uniform across languages, as we currently do not have acceptability scores for Sinhalese, which could contribute to apparent differences across languages.

In neurolinguistics, it has been found that activity in certain brain regions (e.g., the left inferior frontal gyrus) is higher for non-canonical orders than for canonical orders (Meyer and Friederici, 2016). We suggest an interpretation of this finding as a consequence of a mental “rotation” operation to retrieve the canonical order (Figure 2) and propose a new research line: the use of swap distance as a more fine-grained predictor of brain activity with respect to the traditional binary contrast of canonical versus non-canonical order (Meyer and Friederici, 2016, Table 48.1).
The strength of swap distance minimization compared to the effect of other principles depends on the language. In Korean, the manifestation of swap distance minimization is weaker than that of the maximization of the predictability of the verb but stronger than a preference for the canonical order (Table 6). In Malayalam, swap distance minimization exhibits the strongest effect (Table 4). In Sinhalese, swap distance minimization is the second strongest, as in Korean, but the preference for a canonical order exhibits the strongest effect (Table 4).

We speculate that the major findings summarized above are consistent with the following scenario. First, recall that there is evidence that Korean exhibits a word order flexibility close to that of English and that Korean is more rigid than Malayalam (Levshina et al., 2023). The proposals of Sinhalese and Malayalam as non-configurational languages () suggest these two languages exhibit more word order freedom than Korean.¹² Second, consider the following arguments. As we discussed in Section 1, strong evidence of swap distance minimization requires that interference from other word order principles is reduced. The fact that Korean is the only language where the maximization of the predictability of the verb has the strongest effect provides additional support for the rigidity of Korean and the possible interference of that principle with swap distance minimization. As one moves from more rigid word orders to more flexible word orders, one expects that the manifestation of swap distance minimization becomes clearer. Accordingly, Malayalam exhibits the strongest manifestation of swap distance minimization but a weaker effect of the maximization of the predictability of the verb. However, an excess of word order flexibility may shadow the manifestation of swap distance minimization.
If we assume that Sinhalese has the highest degree of word order flexibility, it is not surprising that none of the principles has a significant effect on individual conditions (Table 4) and that swap distance minimization does not show a significantly stronger effect than other word order preferences after a global analysis over conditions (Table 5). A weakness of the arguments above is that, for Sinhalese, we are not measuring word order flexibility in the same way as for Korean and Malayalam. We are just assuming it should be very flexible according to the non-configurational hypothesis (), and, as argued in Levshina et al. (2023), going from categorical to gradient characterizations of constituent order typology is critical to building explanatory models in this domain (see also Yan and Liu (2023) for research on categorical versus gradient characterizations). Thus, an urgent task is to investigate word order flexibility in Sinhalese in a cross-linguistically comparable way, perhaps with the same methodology as in Namboodiripad’s research program (). The complementary question is also important for future research, namely, investigating reaction times and error rates in Malayalam and Korean with the methodology of Tamaoka et al. (2011).

¹² Non-configurationality can be seen from a strong a priori theoretical assumption, namely that non-configurationality is an adjustable parameter in a language as opposed to an emergent property which becomes apparent via the interaction of a constellation of other factors (Ferrer-i-Cancho, 2017). We take the position of Levshina et al. (2023) that languages are not separable into configurational or non-configurational, but rather that they vary along a cline in degree of flexibility. However, we do currently mention a role for non-configurationality on Page 19.
We hope this research stimulates researchers also to investigate languages with canonical orders other than SOV (cf. Garrido Rodriguez et al., 2023). The predictions of swap distance minimization on non-SOV languages are already available in Equation 2.

Finally, an implication of swap distance minimization for word order evolution is a tendency to preserve the canonical order, as variants that deviate from it will be more costly (contra misinterpretations of efficiency-based explanations which might lead one to predict that SOV languages should eventually change to SVO). That tendency would be reinforced by other principles that determine the optimality of the canonical word order, e.g., in verb-final languages, the placement of the verb is optimal with respect to maximization of the predictability of the verb (Ferrer-i-Cancho, 2017), and we have shown that a preference for verb-last and swap distance minimization are strongly correlated (Table 2). Therefore, it is not surprising that grammars are robustly transmitted even during instances of rapid discontinuities in language change, such as the emergence of creole languages; the dominant word order in creoles is overwhelmingly that of the lexifiers (Blasi et al., 2017). As such, swap distance minimization provides one potential answer for why languages vary when it comes to how much they minimize dependencies. Moreover, the findings here exemplify cases where general efficiency-based explanations do not lead to the same outcomes for every language, even when those languages on the surface seem to be very similar. Additional typological features, such as degree of flexibility, interact with swap distance minimization and dependency length minimization, leading us to predict structured variation across languages in how these very general principles are applied and manifest.

Acknowledgments

We are very grateful to L.
Alemany-Puig for a careful revision of the manuscript and to L. Meyer for helpful comments. We also thank V. Franco-Sánchez and A. Martí-Llobet for helpful discussions on swap distance minimization. We became aware of the concept of permutahedron in combinatorics thanks to V. Franco-Sánchez. RFC is supported by a recognition 2021SGR-Cat (01266 LQMC) from AGAUR (Generalitat de Catalunya) and the grants AGRUPS-2022 and AGRUPS-2023 from Universitat Politècnica de Catalunya.

References

Alemany-Puig, L., Esteban, J. L., Ferrer-i-Cancho, R. (2022). Minimum projective linearizations of trees in linear time. Information Processing Letters, 174, 106204. https://doi.org/10.1016/j.ipl.2021.106204
Austin, P., Bresnan, J. (1996). Non-configurationality in Australian aboriginal languages. Natural Language and Linguistic Theory, 14(2), 215–268. https://doi.org/10.1007/bf00133684
Blasi, D. E., Michaelis, S. M., Haspelmath, M. (2017). Grammars are robustly transmitted even during the emergence of creole languages. Nature Human Behaviour, 1(10), 723–729. https://doi.org/10.1038/s41562-017-0192-4
Ceballos, C., Manneville, T., Pilaud, V., Pournin, L. (2015). Diameters and geodesic properties of generalizations of the associahedron. Discrete Mathematics & Theoretical Computer Science, DMTCS Proceedings, 27th International Conference on Formal Power Series and Algebraic Combinatorics (FPSAC 2015). https://doi.org/10.46298/dmtcs.2540
Cooper, L. A., Shepard, R. N. (1973). Chronometric studies of the rotation of mental images. In W. G. Chase (Ed.), Visual information processing (pp. 75–176). Academic Press. https://doi.org/10.1016/B978-0-12-170150-5.50009-3
Corbett, G. (1993). The head of Russian numeral expressions. In Heads in grammatical theory (pp. 11–35). Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511659454
Cormen, T. H., Leiserson, C. E., Rivest, R. L. (1990). Introduction to algorithms. The MIT Press.
Dingemanse, M., Blasi, D.
E., Lupyan, G., Christiansen, M. H., Monaghan, P. (2015). Arbitrariness, iconicity, and systematicity in language. Trends in Cognitive Sciences, 19(10), 603–615. https://doi.org/10.1016/j.tics.2015.07.013
Dryer, M. S. (2013). Order of subject, object and verb. In M. S. Dryer & M. Haspelmath (Eds.), The world atlas of language structures online. Max Planck Institute for Evolutionary Anthropology. http://wals.info/chapter/81
Ferrer-i-Cancho, R. (2004). Euclidean distance between syntactically linked words. Physical Review E, 70, 056135. https://doi.org/10.1103/PhysRevE.70.056135
Ferrer-i-Cancho, R. (2008). Some word order biases from limited brain resources. A mathematical approach. Advances in Complex Systems, 11(3), 393–414. https://doi.org/10.1142/S0219525908001702
Ferrer-i-Cancho, R. (2014). Towards a theory of word order. Comment on “Dependency distance: A new perspective on syntactic patterns in natural language” by Haitao Liu et al. Physics of Life Reviews, 21, 218–220. https://doi.org/10.1016/j.plrev.2017.06.019
Ferrer-i-Cancho, R. (2015a). The placement of the head that minimizes online memory. A complex systems approach. Language Dynamics and Change, 5(1), 114–137. https://doi.org/10.1163/22105832-00501007
Ferrer-i-Cancho, R. (2015b). Reply to the commentary “Be careful when assuming the obvious”, by P. Alday. Language Dynamics and Change, 5(1), 147–155. https://doi.org/10.1163/22105832-00501009
Ferrer-i-Cancho, R. (2016). Kauffman’s adjacent possible in word order evolution. The evolution of language: Proceedings of the 11th International Conference (EVOLANG11).
Ferrer-i-Cancho, R. (2017). The placement of the head that maximizes predictability. An information theoretic approach. Glottometrics, 39, 38–71.
Ferrer-i-Cancho, R., Gómez-Rodríguez, C. (2021a). Anti dependency distance minimization in short sequences. A graph theoretic approach.
Journal of Quantitative Linguistics, 28(1), 50–76. https://doi.org/10.1080/09296174.2019.1645547
Ferrer-i-Cancho, R., Gómez-Rodríguez, C., Esteban, J. L., Alemany-Puig, L. (2022). Optimality of syntactic dependency distances. Physical Review E, 105(1), 014308. https://doi.org/10.1103/PhysRevE.105.014308
Ferrer-i-Cancho, R., Gómez-Rodríguez, C. (2021b). Dependency distance minimization predicts compression. Proceedings of the Second Workshop on Quantitative Syntax (Quasy, SyntaxFest 2021), 45–57. https://aclanthology.org/2021.quasy-1.4/
Futrell, R., Levy, R. P., Gibson, E. (2020). Dependency locality as an explanatory principle for word order. Language, 96(2), 371–412. https://doi.org/10.1353/lan.2020.0024
Futrell, R., Mahowald, K., Gibson, E. (2015). Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences USA, 112(33), 10336–10341. https://doi.org/10.1073/pnas.1502134112
Garrido Rodriguez, G., Norcliffe, E., Brown, P., Huettig, F., Levinson, S. C. (2023). Anticipatory processing in a verb-initial Mayan language: Eye-tracking evidence during sentence comprehension in Tseltal. Cognitive Science, 47(1), e13292. https://doi.org/10.1111/cogs.13219
Garrod, S., Pickering, M. J. (2013). Dialogue: Interactive alignment and its implications for language learning and language change. In The language phenomenon (pp. 47–64). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-36086-2_3
Gell-Mann, M., Ruhlen, M. (2011). The origin and evolution of word order. Proceedings of the National Academy of Sciences USA, 108(42), 17290–17295. https://doi.org/10.1073/pnas.1113716108
Gildea, D., Temperley, D. (2007). Optimizing grammars for minimum dependency length. Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 184–191.
https://www.aclweb.org/anthology/P07-1024
Givón, T. (1979). On understanding grammar. Academic.
Gómez-Rodríguez, C., Christiansen, M., Ferrer-i-Cancho, R. (2022). Memory limitations are hidden in grammar. Glottometrics, 52, 39–64. https://doi.org/10.53482/2022_52_397
Gómez-Rodríguez, C., Ferrer-i-Cancho, R. (2017). Scarcity of crossing dependencies: A direct outcome of a specific constraint? Physical Review E, 96, 062304. https://doi.org/10.1103/PhysRevE.96.062304
Hale, K. (1983). Warlpiri and the grammar of non-configurational languages. Natural Language and Linguistic Theory, 1(1). https://doi.org/10.1007/bf00210374
Hammarström, H. (2016). Linguistic diversity and language evolution. Journal of Language Evolution, 1(1), 19–29. https://doi.org/10.1093/jole/lzw002
Hyönä, J., Hujanen, H. (1997). Effects of case marking and word order on sentence parsing in Finnish: An eye fixation analysis. Quarterly Journal of Experimental Psychology, 50, 841–858. https://doi.org/10.1080/713755738
Kaiser, E., Trueswell, J. C. (2004). The role of discourse context in the processing of a flexible word-order language. Cognition, 94(2), 113–147. https://doi.org/10.1016/j.cognition.2004.01.002
Kendall, M. G. (1970). Rank correlation methods (4th). Griffin.
Koizumi, M., Kim, J. (2016). Greater left inferior frontal activation for SVO than VOS during sentence comprehension in Kaqchikel. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01541
Leela, M. (2016). Early acquisition of word order: Evidence from Hindi, Urdu and Malayalam [Doctoral dissertation, Universitat Autonoma de Barcelona]. http://hdl.handle.net/10803/399556
Levshina, N., Namboodiripad, S., Allassonnière-Tang, M., Kramer, M., Talamo, L., Verkerk, A., Wilmoth, S., Rodriguez, G. G., Gupton, T. M., Kidd, E., Liu, Z., Naccarato, C., Nordlinger, R., Panova, A., Stoynova, N. (2023). Why we need a gradient approach to word order. Linguistics, 61(4), 825–883. https://doi.org/10.1515/ling-2021-0098
Lin, D. (1996).
On the structural complexity of natural language sentences. COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics. https://aclanthology.org/C96-2123 Liu, H. (2008). Dependency distance as a metric of language comprehension difficulty. Journal of Cognitive Science, 9, 159–191. https://doi.org/10.17791/jcs.2008.9.2.159 Liu, H., Xu, C., Liang, J. (2017). Dependency distance: A new perspective on syntactic patterns in natural languages. Physics of Life Reviews, 21, 171–193. https://doi.org/10.1016/j.plrev.2017.03.002 Glottometrics 23 Ferrer-i-Cancho & Namboodiripad Swap distance minimization in SOV languages. Meir, I., Sandler, W., Padden, C., Aronoff. (2010). Emerging sign languages. In M. Marschark P. E. Spencer (Eds.), Oxford handbook of deaf studies, language, and education (pp. 267–280, Vol. 2). Oxford University Press Oxford. https://doi.org/10.1093/oxfordhb/9780195390032.013.0018 Menn, L. (2000). It’s time to face a simple question: Why is canonical form simple? Brain and Language, 71(1), 157–159. https://doi.org/10.1006/brln.1999.2239 Meyer, L., Friederici, A. D. (2016). Chapter 48 - neural systems underlying the processing of complex sentences. In G. Hickok S. L. Small (Eds.), Neurobiology of language (pp. 597–606). Academic Press. https://doi.org/https: //doi.org/10.1016/B978-0-12-407794-2.00048-1 Mohanan, K. (1983). Lexical and configurational structures. The Linguistics Review, 3, 113–139. https://doi.org/ 10.1515/tlir.1983.3.2.113 Morrill, G. (2000). Incremental processing and acceptability. Computational Linguistics, 25(3), 319–338. https: //aclanthology.org/J00-3002 Motamedi, Y., Wolters, L., Schouwstra, M., Kirby, S. (2022). The effects of iconicity and conventionalization on word order preferences. Cognitive Science, 46(10). https://doi.org/10.1111/cogs.13203 Namboodiripad, S., Garcia-Amaya, L., Kramer, M., Tobin, S., Sedarous, Y., Henriksen, N., Boland, J., Coetzee, A. (2020). 
Verb position and flexible constituent order processing: Comparing verb-final and verbmedial languages. Poster at 33rd CUNY Conference on Human Sentence Processing. Amherst, Massachusetts. https://osf.io/d9wq8/ Namboodiripad, S., Goodall, G. (2016). Verb position predicts acceptability in a flexible SOV language. Poster at 29th CUNY Conference on Human Sentence Processing. Gainesville, Florida. Namboodiripad, S. (2017). An Experimental Approach to Variation and Variability in Constituent Order [PhD Thesis]. UC San Diego. https://escholarship.org/uc/item/2sv6z8bz Namboodiripad, S. (2019). A gradient approach to flexible constituent order. https://doi.org/10.31234/osf.io/rvjn5 Namboodiripad, S., Kim, D., Kim, G. (2019). English dominant and Korean speakers show reduced flexibility in constituent order. Proceedings of Chicago Linguistics Society 53. http://savi.ling.lsa.umich.edu/publications/ CLSmanuscript.pdf Newmeyer, F. J. (2000). On the reconstruction of ’proto-world’ word order. In C. K. et al. (Ed.), The evolutionary emergence of language (pp. 372–388). Cambridge University Press. Niu, R., Liu, H. (2022). Effects of syntactic distance and word order on language processing: An investigation based on a psycholinguistic treebank of English. Journal of Psycholinguistic Research, 51(5), 1043–1062. https: //doi.org/10.1007/s10936-022-09878-4 Occhino, C., Anible, B., Wilkinson, E., Morford, J. P. (2017). Iconicity is in the eye of the beholder: How language experience affects perceived iconicity. Gesture, 16(1), 100–126. Glottometrics 24 Ferrer-i-Cancho & Namboodiripad Swap distance minimization in SOV languages. Ohta, S., Koizumi, M., Sakai, K. L. (2017). Dissociating effects of scrambling and topicalization within the left frontal and temporal language areas: An fMRI study in Kaqchikel Maya. Frontiers in Psychology, 8, 748. Perniss, P., Thompson, R., Vigliocco, G. (2010). Iconicity as a general property of language: Evidence from spoken and signed languages. 
Frontiers in Psychology, 1. https://doi.org/10.3389/fpsyg.2010.00227 Pickering, M. J., Garrod, S. (2006). Alignment as the basis for successful communication. Research on Language and Computation, 4(2-3), 203–228. https://doi.org/10.1007/s11168-006-9004-0 Prabath, K., Ananda, M. L. (2017). Configurationality and mental grammars: Sentences in Sinhala with reduplicated expressions. International Journal of Multidisciplinary Studies, 3(2), 25. https://doi.org/10.4038/ijms. v3i2.4 Sandler, W., Meir, I., Padden, C., Aronoff, M. (2005). The emergence of grammar: Systematic structure in a new language. Proceedings of the National Academy of Sciences USA, 102, 2661–2665. https://doi.org/10.1073/ pnas.0405448102 Tamaoka, K., Kanduboda, P., Sakai, H. (2011). Effects of word order alternation on the sentence processing of Sinhalese written and spoken forms. Open Journal of Modern Linguistics, 1, 24–32. https://doi.org/10.4236/ojml. 2011.12004 Tarr, M. J., Pinker, S. (1989). Mental rotation and orientation-dependence in shape recognition. Cognitive Psychology, 21, 233–282. https://doi.org/10.1016/0010-0285(89)90009-1 Temperley, D. (2008). Dependency-length minimization in natural and artificial languages. Journal of Quantitative Linguistics, 15(3), 256–282. https://doi.org/10.1080/09296170802159512 Temperley, D., Gildea, D. (2018). Minimizing syntactic dependency lengths: Typological/Cognitive universal? Annual Review of Linguistics, 4(1), 67–80. https://doi.org/10.1146/annurev-linguistics-011817-045617 Winter, B., Sóskuthy, M., Perlman, M., Dingemanse, M. (2022). Trilled /r/ is associated with roughness, linking sound and touch across spoken languages. Scientific Reports, 12(1). https://doi.org/10.1038/s41598-021-04311-7 Xu, C., Liang, J., Liu, H. (2017). DDM at work. Physics of Life Reviews, 21, 233–240. https://doi.org/10.1016/j. plrev.2017.07.001 Yan, J., Liu, H. (2023). Basic word order typology revisited: A crosslinguistic quantitative study based on UD and WALS. 
Linguistics Vanguard. https://doi.org/10.1515/lingvan-2021-0001

Appendix

The maximum Kendall correlation

Recall the definition of $\tau$ in Equation 5. Let $n_0$ be the number of pairs that are neither concordant nor discordant.

Property 1.
\[ \frac{n_0}{\binom{n}{2}} - 1 \le \tau \le 1 - \frac{n_0}{\binom{n}{2}}. \tag{6} \]

Proof. By definition,
\[ n_c + n_d + n_0 = \binom{n}{2}. \]
The substitution
\[ n_c = \binom{n}{2} - n_d - n_0 \]
transforms Equation 5 into
\[ \tau = 1 - \frac{2 n_d + n_0}{\binom{n}{2}}. \]
The latter and the fact that $n_d \ge 0$ by definition lead to
\[ \tau \le 1 - \frac{n_0}{\binom{n}{2}}. \]
By symmetry, the substitution
\[ n_d = \binom{n}{2} - n_c - n_0 \]
transforms Equation 5 into
\[ \tau = \frac{2 n_c + n_0}{\binom{n}{2}} - 1. \]
The latter and the fact that $n_c \ge 0$ by definition lead to
\[ \tau \ge \frac{n_0}{\binom{n}{2}} - 1. \]
Hence we conclude Equation 6.

Consider the Kendall $\tau$ correlation between $x$ and $y$. Let $N_x$ be the number of distinct values of $x$ and $N_y$ be the number of distinct values of $y$. Let us group the tied values of $x$ and define $t_i$ as the number of tied values in the $i$-th group. Likewise, let us group the tied values of $y$ and define $u_i$ as the number of tied values in the $i$-th group. Then

Property 2.
\[ n_0 \ge \max\left( \sum_{i=1}^{N_x} \binom{t_i}{2}, \sum_{i=1}^{N_y} \binom{u_i}{2} \right). \tag{7} \]

Proof. Notice that pairs formed with values in a tie can be neither concordant nor discordant. Then the $i$-th tie group of $x$ contributes $\binom{t_i}{2}$ pairs of points that are neither concordant nor discordant. Then, the overall contribution to pairs of this sort by $x$ is
\[ \sum_{i=1}^{N_x} \binom{t_i}{2}. \]
Similarly, the contribution by $y$ to pairs of points that are neither concordant nor discordant is
\[ \sum_{i=1}^{N_y} \binom{u_i}{2}. \]
Since $n_0$ counts all the pairs of both kinds, it is at least as large as each contribution. Combining the contributions of $x$ and $y$ one retrieves Equation 7.
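Properties 1 and 2 can be checked numerically with a minimal Python sketch. It assumes that Equation 5 (not reproduced in this appendix) has the form $\tau = (n_c - n_d)/\binom{n}{2}$, which is the form consistent with the substitutions in the proof above. The function names and the scores `s` are hypothetical and chosen only for illustration; the vector `d` holds the swap distances of the six orders to canonical SOV.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def pair_counts(x, y):
    """Return (n_c, n_d, n_0): concordant, discordant and remaining pairs."""
    n_c = n_d = n_0 = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            n_c += 1
        elif sign < 0:
            n_d += 1
        else:  # a tie in x or in y: neither concordant nor discordant
            n_0 += 1
    return n_c, n_d, n_0


def kendall_tau(x, y):
    """Assumed form of Equation 5: (n_c - n_d) / C(n, 2)."""
    n_c, n_d, _ = pair_counts(x, y)
    return Fraction(n_c - n_d, comb(len(x), 2))


def tied_pairs(values):
    """Sum of C(t_i, 2) over the tie groups of a variable."""
    return sum(comb(t, 2) for t in Counter(values).values())


# Swap distances to SOV for SOV, SVO, OSV, VSO, OVS, VOS.
d = [0, 1, 1, 2, 2, 3]
# Made-up scores, all distinct, decreasing as d increases.
s = [0.9, 0.7, 0.8, 0.3, 0.5, 0.1]

n_c, n_d, n_0 = pair_counts(d, s)
tau = kendall_tau(d, s)
upper = 1 - Fraction(n_0, comb(len(d), 2))

assert n_0 >= max(tied_pairs(d), tied_pairs(s))  # Property 2
assert -upper <= tau <= upper                    # Property 1
print(n_c, n_d, n_0, tau)
```

With these made-up scores every pair with distinct swap distances is discordant, so $\tau$ attains the lower bound of Equation 6.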
The reader with some statistical background may have already realized that the summations over the number of distinct pairs in a group above are the ingredients of the adjustment for ties in the denominator in the definition of $\tau_b$ (Kendall, 1970). The next property presents the range of variation of $\tau$ for each distance measure.

Property 3. Consider the Kendall correlation, i.e. $\tau(x, y)$, where $x$ is some distance measure and $y$ can be any variable (for instance, $y$ can be some score $s$). We have that
\[ -\frac{13}{15} = -0.8\overline{6} \le \tau(d, y) \le \frac{13}{15} = 0.8\overline{6}, \]
\[ -\frac{4}{5} = -0.8 \le \tau(p, y) \le \frac{4}{5} = 0.8, \]
\[ -\frac{1}{3} = -0.\overline{3} \le \tau(c, y) \le \frac{1}{3} = 0.\overline{3}. \]

Proof. Now we will derive the range of variation of $\tau$ for each distance measure by applying an implication of Equation 7, namely
\[ n_0 \ge \sum_{i=1}^{N_x} \binom{t_i}{2}. \]
Notice that
\[ n_0 = \sum_{i=1}^{N_x} \binom{t_i}{2} \]
holds when all the values of $y$ are different. This is a typical situation when using continuous scores, as repeated values are unlikely except in case of lack of numerical precision. Consider the matrix in Table 1, which has $n = 6$ rows and hence $\binom{6}{2} = 15$ pairs. In case of $\tau(d, s)$, there are four groups with $t_1 = t_4 = 1$ (for $d = 0$ and $d = 3$) and $t_2 = t_3 = 2$ (for $d = 1$ and $d = 2$), that yield
\[ n_0 = \sum_{i=1}^{N_x} \binom{t_i}{2} = 2 \binom{2}{2} = 2 \]
and then Equation 6 gives
\[ \tau(d, s) \le 1 - \frac{2}{15} = \frac{13}{15}. \]
In case of $\tau(p, s)$, there are three groups with $t_1 = t_2 = t_3 = 2$ (two points in a tie for $p = 0$, $p = 1$ and also $p = 2$), that yield
\[ n_0 = 3 \binom{2}{2} = 3 \]
and then Equation 6 gives
\[ \tau(p, s) \le 1 - \frac{3}{15} = \frac{4}{5}. \]
Finally, in case of $\tau(c, s)$, there are only two groups with $t_1 = 1$ and $t_2 = 5$ (5 points in a tie for $c = 1$), that yield
\[ n_0 = \binom{5}{2} = 10 \]
and then Equation 6 gives
\[ \tau(c, s) \le 1 - \frac{10}{15} = \frac{1}{3}. \]
The lower bounds are obtained just by inverting the sign, thanks to the symmetry of Equation 6.
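The arithmetic in the proof of Property 3 can be reproduced with a short Python sketch. The tie group sizes are taken from the proof itself (Table 1 is not reproduced here); the names `TIE_GROUPS` and `max_tau` are hypothetical, and the computation assumes, as in the proof, that all the values of $y$ are distinct.

```python
from fractions import Fraction
from math import comb

N_ORDERS = 6  # the six orders of S, O and V

# Tie group sizes t_i of each distance measure, as stated in the proof.
TIE_GROUPS = {
    "d": [1, 2, 2, 1],  # swap distance: d = 0, 1, 2, 3
    "p": [2, 2, 2],     # p = 0, 1, 2
    "c": [1, 5],        # c = 0 and c = 1
}


def max_tau(tie_sizes, n=N_ORDERS):
    """Upper bound of Equation 6 when n_0 equals the sum of C(t_i, 2)."""
    n_0 = sum(comb(t, 2) for t in tie_sizes)
    return 1 - Fraction(n_0, comb(n, 2))


bounds = {name: max_tau(sizes) for name, sizes in TIE_GROUPS.items()}
for name, bound in bounds.items():
    print(f"-{bound} <= tau({name}, y) <= {bound}")
```

Exact fractions are used so that the bounds can be compared directly with 13/15, 4/5 and 1/3 without rounding.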
The following corollary indicates that if $\tau(d, y)$ is sufficiently large then no other distance measure can give a higher correlation and, symmetrically, if $\tau(d, y)$ is sufficiently small then no other distance measure can give a smaller correlation.

Corollary 1. If $\tau(d, y) > 1/3$ then $\tau(d, y) > \tau(c, y)$. If $\tau(d, y) > 4/5$ then $\tau(d, y) > \tau(p, y), \tau(c, y)$. If $\tau(d, y) < -1/3$ then $\tau(d, y) < \tau(c, y)$. If $\tau(d, y) < -4/5$ then $\tau(d, y) < \tau(p, y), \tau(c, y)$.

Proof. A trivial consequence of Property 3.

The minimum $p$-value of the Kendall correlation test

As we explain in Section 4, the $p$-value of the Kendall $\tau$ correlation test is computed exactly by enumerating all the $6! = 720$ permutations. In general,
\[ p\text{-value} \ge \frac{m}{n!}, \]
where $m$ is the number of permutations with the same $\tau$ as the actual one. Notice that $m \ge 1$ because the permutation that coincides with the current ordering yields the same $\tau$. As the test is one-sided and $m \ge 1$, one obtains
\[ p\text{-value} \ge \frac{1}{6!} = \frac{1}{720} = 0.0013\overline{8}. \]
However, a more accurate lower bound of $m$ is given by

Property 4.
\[ m \ge \max\left( \prod_{i=1}^{N_x} t_i!, \prod_{i=1}^{N_y} u_i! \right). \tag{8} \]

Proof. Permuting values within the same tie group does not produce a different sequence. For the $i$-th group of $x$, there are $t_i!$ permutations of values in the same group that do not produce a different sequence. Integrating all the groups, one obtains that there are
\[ \prod_{i=1}^{N_x} t_i! \]
permutations of the $x$ column of the matrix that produce the same sequence. By symmetry, there are
\[ \prod_{i=1}^{N_y} u_i! \]
permutations of the $y$ column of the matrix that produce the same sequence. Combining the contributions of $x$ and $y$, we obtain Equation 8.

Equation 8 leads to more accurate lower bounds of the $p$-value of $\tau$ that are presented in the following property.

Property 5.
Consider the $p$-value of the exact right-sided correlation test of $\tau(x, y)$, where $x$ is some distance measure and $y$ can be any variable (for instance, $y$ can be some score $s$). The $p$-value of $\tau(d, y)$ satisfies
\[ p\text{-value} \ge \frac{1}{180} = 0.00\overline{5}. \]
The $p$-value of $\tau(p, y)$ satisfies
\[ p\text{-value} \ge \frac{1}{90} = 0.0\overline{1}. \]
The $p$-value of $\tau(c, y)$ satisfies
\[ p\text{-value} \ge \frac{1}{6} = 0.1\overline{6}. \]

Proof. Now we will derive a lower bound of the $p$-value for each distance measure neglecting any information about the distribution of the values of $y$, namely applying an implication of Equation 8, that is
\[ m \ge \prod_{i=1}^{N_x} t_i!. \]
Notice that
\[ m = \prod_{i=1}^{N_x} t_i! \]
holds when all the values of $y$ are different. This is a typical situation when using continuous scores, as we have explained above. For $\tau(d, s)$, the four groups with $t_1 = t_4 = 1$ (for $d = 0$ and $d = 3$) and $t_2 = t_3 = 2$ (for $d = 1$ and $d = 2$) give
\[ p\text{-value} \ge \frac{1! \cdot 2! \cdot 2! \cdot 1!}{6!} = \frac{4}{6!} = \frac{1}{180}. \]
For $\tau(p, s)$, the three groups with $t_1 = t_2 = t_3 = 2$ (two points in a tie for $p = 0$, $p = 1$ and also $p = 2$) give
\[ p\text{-value} \ge \frac{2! \cdot 2! \cdot 2!}{6!} = \frac{8}{6!} = \frac{1}{90}. \]
Finally, for $\tau(c, s)$, the only two groups with $t_1 = 1$ and $t_2 = 5$ (5 points in a tie for $c = 1$) give
\[ p\text{-value} \ge \frac{1! \cdot 5!}{6!} = \frac{5!}{6!} = \frac{1}{6} = 0.1\overline{6}. \]
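The bounds of Property 5 can be verified by brute force with the following Python sketch. It rests on two assumptions that are not stated in this appendix: that the right-sided exact test counts the permutations of $y$ whose $\tau$ is at least the observed one, and that the vectors below, which only reproduce the tie structure given in the proof, are adequate stand-ins for the columns of Table 1. The function names are hypothetical. The scores are chosen to be perfectly concordant with each distance so that the smallest possible $p$-value is attained.

```python
from fractions import Fraction
from itertools import combinations, permutations


def concordance(x, y):
    """n_c - n_d, the numerator of Kendall tau; tied pairs contribute 0."""
    total = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        total += (prod > 0) - (prod < 0)
    return total


def exact_right_sided_p(x, y):
    """Fraction of the n! permutations of y with tau >= the observed tau."""
    observed = concordance(x, y)
    perms = list(permutations(y))
    hits = sum(concordance(x, perm) >= observed for perm in perms)
    return Fraction(hits, len(perms))


# Distance vectors sorted by value; only the tie structure matters here.
DISTANCES = {
    "d": [0, 1, 1, 2, 2, 3],
    "p": [0, 0, 1, 1, 2, 2],
    "c": [0, 1, 1, 1, 1, 1],
}
best_scores = [1, 2, 3, 4, 5, 6]  # distinct and increasing with each distance

min_p = {name: exact_right_sided_p(x, best_scores) for name, x in DISTANCES.items()}
for name, value in min_p.items():
    print(name, value)
```

Enumerating all $6! = 720$ permutations is cheap at this size, so no approximation is needed.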