Reviewing Lemhöfer and Broersma (LexTALE)


A review of the article ‘Introducing LexTALE: A quick and valid Lexical Test for Advanced Learners of English’ by Kristin Lemhöfer and Mirjam Broersma, Radboud University, the Netherlands

This paper (Lemhöfer & Broersma, 2012) aims at validating the LexTALE vocabulary test’s ability to reliably place test takers in CEFR-like fluency categories for advanced learners of English. The study is organized around a comparison of the test results with those of control tests commonly used in psycholinguistic studies. The ultimate goal is a reliable, free, short, easily accessible test that could be used in future studies. As pointed out below, this work stems from previous work by Paul Meara in the field of second language acquisition and is not the first study evaluating LexTALE.

The literature review in the introduction does a great job summarizing the point made by previous research (Anderson and Freebody, 1979) that there is a general correlation between vocabulary knowledge and other aspects of language proficiency in both L1 and L2, and also supports the idea that proficiency in non-native languages means more native-like language processing at the cognitive level. These two facts should naturally lead to the conclusion, by syllogism, that vocabulary size is an indicator of how efficiently a language, native or not, is processed by the brain. However, the paper never draws this conclusion and instead focuses on L2 learners who started learning English in middle school and have reached a rather advanced level. The goal of the paper is to see whether the test can effectively sort them into subcategories, from upper intermediate (B2) to proficient (C2).

Comparing a panel of commercial and non-commercial tests in their scopes and methodological approaches, the introduction displays a thorough understanding of the state of research in the field of L2 proficiency and justifies every aspect of the methodology used. No attempt is made, however, to point at the limitations of those choices. Perhaps the other available metrics are so unreliable that the format of the test is implicitly understood to be the “least bad”. Still, identifying the limitations of the study would allow both a better understanding of the results and prospects for others to alleviate those potential weaknesses in future work.

The goal of a vocabulary test seems simple: counting the words in the mind of the subjects. Answering this question, however, pushes the boundaries of psycholinguistics and spreads into cognitive psychology and other fields of cognitive science. Looking beyond psycholinguistics, we see that others have tried to answer similar questions with their own methods and protocols (Smith et al., 2010). The Wechsler intelligence scales ask for synonyms of words of different frequencies in one of the subtests used to build the Verbal Comprehension Index (VCI), one of the four indices used in calculating the final IQ. While the subjectivity and time-consuming character of this task would have made it impractical for the requirements the LexTALE format tries to fulfil, other approaches would have been as well, if not better, suited. The Peabody Picture Vocabulary Test, for instance, relies on matrices of four pictures, one of which the subjects are supposed to select after hearing words of different levels of difficulty (understand: frequency). The advantages of this second format are numerous. It is usually fast and does not rely on subjective assessment by anyone, which would make it easy to port to software. The four choices reduce the chance of getting the right answer by pure luck. The Peabody test would also be easier to translate into other languages and would work better for dyslexic people, making it a good candidate in the field of second language acquisition studies. However, none of what I mention in this paragraph is ever alluded to in the paper, which is more centered on proving the validity of LexTALE.
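The point about guessing can be made concrete with a small simulation (my own illustration, not from the paper): with a blind guesser, an unadjusted yes/no format has a chance level of 50%, while a Peabody-style four-alternative format caps it at 25%.

```python
# Toy simulation of chance-level performance under blind guessing,
# for a yes/no format versus a four-alternative (4AFC) format.
import random

random.seed(0)

def guess_score(n_items: int, n_choices: int, trials: int = 10_000) -> float:
    """Mean proportion correct when every answer is a blind guess."""
    total = sum(
        sum(random.randrange(n_choices) == 0 for _ in range(n_items))
        for _ in range(trials)
    )
    return total / (trials * n_items)

print(guess_score(60, 2))  # yes/no: chance level near 0.5
print(guess_score(60, 4))  # 4AFC: chance level near 0.25
```

This is exactly why raw yes/no scores need a correction for guessing, which the indices discussed later (ΔM, I_SDT) try to provide.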

Both the lexical items used in the test and the participants in the study were carefully selected in order to avoid biased results. The words and non-words are selected from an unpublished list made by Meara for a similar test; the selection process was a pilot study in its own right, aiming at creating four categories of difficulty based on the words’ item-total correlations. The participants are drawn from two very different population pools to avoid the L1 potentially biasing the results, and care is taken that they all have a solid knowledge of English: the Koreans in the study were required to have scored a minimum of 750 on the TOEIC, which corresponds to at least a CEFR B2 level, and the Dutch participants are expected to be even more fluent than that. The fixed nature of the list would make it impractical to test the dynamics of individuals’ progress, as passing the same test several times would flaw the results; this narrows the use cases for the test, but those two points are not in the scope of the paper. The other control tests are carefully crafted, and the protocols are clearly laid out and made reproducible. In the case of the translation tests (L1 to L2 and vice versa), the list of words is given in the Appendix and the protocol for the selection of the words is fully described in the methodology section. The two other control tests are a Quick Placement Test (QPT) and a background questionnaire. The QPT was published by Oxford University Press in 2001, but the reference did not allow me to find it online. It may be the predecessor of the current Oxford Placement Test, but I could not find more; no study of its accuracy or validity is mentioned, yet the authors seem confident that it is a reliable indicator of CEFR level in English. The questionnaire cannot be found in the Appendices, but it is generally described in the methodology section.

The mean results of each of the two groups are given for each test independently. Different methods of calculation are used to analyze the LexTALE results: the so-called mean rating, a mean of the proportions of correct yes and correct no answers; the $ΔM$, introduced by Meara in 1992 (Huibregtse et al., 2002); and the $I_{SDT}$, a correction of the previous index (ibid.). All these metrics are normalized to values between 0 and 1, which allows for a correlation analysis between LexTALE and each of the control tests in scatter diagrams in figures 1 to 4. The following two figures analyze in depth the agreement between correct and incorrect recognition rates in LexTALE in the two groups. All the results yield strong statistical evidence (with no p-value over 0.05) in favour of the tested hypothesis. The paper meticulously examines the consistency of the figures using the split-half reliabilities of each test with regard to each group to ensure it avoids any confirmation bias. It does find somewhat more inconsistency in the Korean group but avoids drawing hasty conclusions from it. Later, in the discussion, the paper compares the results to those of previous papers by the same authors.
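Two of these scoring schemes can be sketched in a few lines (my own reconstruction, not the authors’ code). The mean rating averages the proportions of correct yes and correct no answers over LexTALE’s 40 word and 20 non-word items, so that accepting everything does not inflate the score; the $I_{SDT}$ formula below is the correction for guessing and response style as I read it in Huibregtse et al. (2002), with h the hit rate and f the false-alarm rate.

```python
# Sketch of two LexTALE scoring schemes; illustrative reconstruction only.

def mean_rating(words_correct: int, nonwords_correct: int) -> float:
    """Mean of the proportions of correct yes (40 words) and
    correct no (20 non-words) answers, normalized to [0, 1]."""
    return (words_correct / 40 + nonwords_correct / 20) / 2

def i_sdt(h: float, f: float) -> float:
    """I_SDT index correcting for guessing and response style
    (formula as given by Huibregtse et al., 2002)."""
    num = 4 * h * (1 - f) - 2 * (h - f) * (1 + h - f)
    den = 4 * h * (1 - f) - (h - f) * (1 + h - f)
    return 1 - num / den

# A participant recognizing 36/40 words but wrongly accepting 8/20 non-words:
print(mean_rating(36, 12))   # mean of 0.9 and 0.6
print(i_sdt(0.9, 0.4))       # penalized for the liberal response style
```

Note how the yes-bias that leaves the mean rating at a respectable level drags the $I_{SDT}$ down, which is precisely what the correction is designed to do.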

In conclusion, this study displays consistent evidence confirming the hypothesis that the LexTALE test can be used to differentiate intermediate from upper-intermediate and advanced non-native speakers of English. This result can be used in selecting participants for psycholinguistic studies in L2 research, regardless of the L1, with a high degree of confidence. Without pretending to be the only candidate metric for this end, the test has several advantages that could make it a reliable standard if researchers in the field were to integrate it systematically into their studies. But some may find it too specialized to become a standard, arguing that a potential standard ought to have finer disambiguation granularity (not just CEFR levels) and should not rely on a fixed list of words, so as to allow successive evaluations.

References

Anderson, R.C. and Freebody, P. (1979) ‘Vocabulary Knowledge, Technical Report No. 136’, Center for the Study of Reading [Preprint].

Council of Europe (2020) Common European Framework of Reference for Languages: Companion Volume. Namur: Council of Europe.

Huibregtse, I., Admiraal, W. and Meara, P. (2002) ‘Scores on a yes-no vocabulary test: correction for guessing and response style’, Language Testing, 19(3), pp. 227–245. Available at: https://doi.org/10.1191/0265532202lt229oa.

Lemhöfer, K. and Broersma, M. (2012) ‘Introducing LexTALE: A quick and valid Lexical Test for Advanced Learners of English’, Behavior Research Methods, 44(2), pp. 325–343. Available at: https://doi.org/10.3758/s13428-011-0146-0.

Meara, P. and Buxton, B. (1987) ‘An alternative to multiple choice vocabulary tests’, Language Testing, 4(2), pp. 142–154. Available at: https://doi.org/10.1177/026553228700400202.

Meara, P. and Jones, G. (1988) ‘Vocabulary size as a placement indicator’.

Smith, N. et al. (2010) The Signs of a Genius: Language Against the Odds. Cambridge: Cambridge University Press. Available at: https://doi.org/10.1017/CBO9780511780530.

On how to create pseudowords using n-gram algorithms:

New, B., Bourgin, J., Barra, J. and Pallier, C. (2023) ‘UniPseudo: A universal pseudoword generator’, Quarterly Journal of Experimental Psychology. Available at: https://doi.org/10.1177/17470218231164373.

Shannon, C.E. (1953) ‘The Redundancy of English’, in Cybernetics: The Macy Conferences 1946–1953. The Complete Transactions.
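The idea behind the pseudoword references above can be sketched very simply (my own toy version, not the UniPseudo algorithm): train letter-bigram transition statistics on a small lexicon, then sample new strings that follow those statistics but are not themselves real entries.

```python
# Toy bigram pseudoword generator in the spirit of n-gram approaches:
# learn which letter tends to follow which, then sample plausible strings.
import random
from collections import defaultdict

def train_bigrams(lexicon):
    """Collect, for each character, the list of observed successors
    ('^' and '$' mark word start and end)."""
    successors = defaultdict(list)
    for word in lexicon:
        padded = "^" + word + "$"
        for a, b in zip(padded, padded[1:]):
            successors[a].append(b)
    return successors

def generate(successors, lexicon, max_len=10):
    """Sample a string from the bigram model, rejecting real words."""
    while True:
        out, ch = [], "^"
        while len(out) < max_len:
            ch = random.choice(successors[ch])
            if ch == "$":
                break
            out.append(ch)
        candidate = "".join(out)
        if candidate and candidate not in lexicon:
            return candidate

lexicon = {"stone", "store", "story", "storm", "start", "smart"}
random.seed(1)
successors = train_bigrams(lexicon)
print([generate(successors, lexicon) for _ in range(3)])
```

With a realistic training lexicon and higher-order n-grams, the outputs become the kind of orthographically legal non-words a yes/no test like LexTALE needs.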