Review: Language Testing – Tim McNamara

I’ve been revising my testing, evaluation and assessment theory of late in preparation for some work I have coming up, and it’s been great to get back into an area that I’ve always been interested in. I’ve been involved in assessment in some way or another for the past ten years, and I’ve even written about exam moderation, but I’m always surprised at how much I don’t know when I go back to the literature – there is so much! Many years ago I read Brown’s Testing in Language Programs and it really opened my eyes to just how much statistics was involved, and how much ground there was to cover. Fast forward a good eight years, and here I am still learning! I recently picked up Tim McNamara’s very short and to-the-point Language Testing for next to nothing (in Spain we have Vinted, which is mainly for clothes, but you can also pick up second-hand books really cheaply), and so what follows is my review of the book: what I learnt/revised and what I liked/didn’t like about it.

Three-sentence summary

McNamara’s Language Testing is a short introduction to testing, evaluation and assessment in the world of language teaching. Readers encounter four sections within the book (survey, readings, references and glossary), which provide an overview of the main concepts of testing and plenty of material to keep them thinking. Whilst this book does not go deep into testing theory, it does provide the necessary detail to prepare readers to engage with other, somewhat denser pieces of literature (e.g., Bachman’s Fundamental Considerations in Language Testing).

Three takeaways

“Designing and introducing a new test is a little like getting a new car on the road. It involves a design stage, a construction stage, and a try-out stage before the test is finally operational. But that suggests a linear process, whereas in fact test development involves a cycle of activity, because the actual operational use of the test generates evidence about its own qualities.”

McNamara, 2000, p.23

  • Test development is a somewhat complex, iterative process: McNamara makes clear that in order for a test to become operational, it needs to go through a whole range of processes. These involve test designers identifying the construct to be assessed (e.g., reading fluency), the domain – “the set of tasks or the kinds of behaviours in the criterion setting, as informed by our understanding of the test construct” (McNamara, 2000, p.25) – the test content, and the test method (i.e., how test takers will interact with the test). From there, once everything is ‘ready to go’, the test goes through a testing phase and then a trialling phase with real learners, and through these phases we collect data on how well each test item provides evidence of the test taker’s ability within the identified construct. Based on that data, we can make changes to items (e.g., remove an item from the test, modify it, etc.) to ensure that the test is both valid and reliable. Only once all of this is done is the test ready to go. (A minimal sketch of the kind of item analysis involved follows below.)
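
To make this concrete, here is a minimal, illustrative sketch (in Python, and not taken from McNamara’s book) of the kind of item-level statistics a trialling phase might produce: item facility (how easy an item is), item discrimination (how well it separates stronger from weaker candidates), and Cronbach’s alpha as a rough reliability estimate. The data are invented purely for illustration.

```python
# Illustrative item analysis on invented trial data: dichotomous items scored 0/1.
# rows = test takers, columns = items (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
]

n_takers = len(responses)
n_items = len(responses[0])
totals = [sum(row) for row in responses]  # each taker's total score

def item_facility(item: int) -> float:
    """Proportion of test takers answering the item correctly (0 = very hard, 1 = very easy)."""
    return sum(row[item] for row in responses) / n_takers

def item_discrimination(item: int) -> float:
    """Point-biserial correlation between the item and the total score.
    Low or negative values suggest the item may not be measuring the same
    ability as the rest of the test and may need revision or removal."""
    item_scores = [row[item] for row in responses]
    mean_i = sum(item_scores) / n_takers
    mean_t = sum(totals) / n_takers
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item_scores, totals)) / n_takers
    var_i = sum((i - mean_i) ** 2 for i in item_scores) / n_takers
    var_t = sum((t - mean_t) ** 2 for t in totals) / n_takers
    return cov / (var_i ** 0.5 * var_t ** 0.5) if var_i and var_t else 0.0

def cronbach_alpha() -> float:
    """Internal-consistency reliability estimate for the whole trial test."""
    item_vars = []
    for j in range(n_items):
        scores = [row[j] for row in responses]
        m = sum(scores) / n_takers
        item_vars.append(sum((s - m) ** 2 for s in scores) / n_takers)
    m_t = sum(totals) / n_takers
    total_var = sum((t - m_t) ** 2 for t in totals) / n_takers
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

for j in range(n_items):
    print(f"Item {j + 1}: facility={item_facility(j):.2f}, discrimination={item_discrimination(j):.2f}")
print(f"Cronbach's alpha: {cronbach_alpha():.2f}")
```

In this toy example, the item that every trial candidate answers correctly shows zero discrimination – exactly the kind of evidence that might lead a test designer to revise or drop an item before the test becomes operational.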

“The purpose of validation in language testing is to ensure the defensibility and fairness of interpretations based on test performance. It asks ‘On what basis is it proposed that individuals be admitted or denied access to the criterion setting being sought? Is this a sufficient basis?'”

McNamara, 2000, p.48

  • Validity is vital, but not always easy to achieve: So, why would we want a test to be valid? Well, we want to be able to draw correct inferences about a test taker’s ability in a certain construct from their performance, so that the test score accurately represents that ability. In effect, we want the test to be accurate and fair. Now, if we take these two words – accurate and fair – and break them down a little further, we can start to unravel the somewhat mysterious world of validation.
    • Accuracy: In general terms, we can think of validity as the test testing what it says it’s testing (although within the literature this definition is somewhat contested, with more recent publications arguing that this is only part of what validity is). McNamara writes that when validity enters the conversation, much of the focus is on test content, and thus touches on content validity. When thinking about content, test designers need to be sure that the items testing construct X are doing so in such a way that they connect appropriately to the domain in which the construct is used. McNamara (2000, p.51) uses the example of academic reading: “For example, in a test of ability to read academic texts, does it matter from which academic domain the texts are drawn? Should someone studying law be asked to read texts drawn from fields such as education or medicine?” He also makes clear that establishing whether or not a test has content validity is a complex and elaborate process.
    • Fairness: Now, accuracy and fairness are obviously intertwined, but I wanted to separate them to distinguish between test content and test method. So, test designers need to consider how test takers interact with the test, and how the test method may impact test takers’ ability to engage with the test fairly. For example, “tests may introduce factors that are irrelevant to the aspect of ability being measured (construct irrelevant variance); or they may require too little of the candidate (construct under-representation)” (McNamara, 2000, p.53). If the test favours X test takers because of Y factors, then can the test be considered ‘fair’? These are difficult questions that test designers need to answer. (A rough sketch of how such a question might first be probed with trial data follows below.)
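
As a very rough illustration (again in Python, and not a procedure McNamara describes in the book), one crude first step towards the fairness question is to compare item facility across two groups of trial test takers and flag large gaps for closer inspection. Proper fairness work uses differential item functioning (DIF) analyses that control for overall ability; the group labels, data and 0.3 threshold below are all invented.

```python
# A crude fairness probe on invented trial data: flag items where the facility
# gap between two groups is large enough to warrant a closer look.
group_a = [  # e.g., test takers from one first-language background (invented data)
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
]
group_b = [  # a second group of broadly comparable ability (invented data)
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 1, 0, 1],
]

def facility(group, item):
    """Proportion of the group answering the item correctly."""
    return sum(row[item] for row in group) / len(group)

for item in range(len(group_a[0])):
    gap = facility(group_a, item) - facility(group_b, item)
    flag = "  <-- check for construct-irrelevant factors" if abs(gap) >= 0.3 else ""
    print(f"Item {item + 1}: group A {facility(group_a, item):.2f}, "
          f"group B {facility(group_b, item):.2f}, gap {gap:+.2f}{flag}")
```

A flagged item is not automatically unfair – the gap may reflect a genuine ability difference – but it is a prompt to ask whether construct-irrelevant factors (topic familiarity, cultural knowledge, test method) are at play.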

“Authorities responsible for assessment sometimes use assessment reform to drive curriculum reform, believing that the assessment can be designed to have positive washback on the curriculum. However, research both on the presumed negative washback of conservative test formats, and on the presumed positive washback of communicative assessment (assumed to be more progressive) has shown that washback is often rather unpredictable. Whether or not the desired effect is achieved will depend on local conditions in classrooms, the established traditions of teaching, the immediate motivations of learners, and the frequently unpredictable ways in which classroom interactions develop.”

McNamara, 2000, p.74

  • Washback and test impact are important to consider, but are by no means ‘simple’: When I was first getting started with assessment theory many years ago (right as I started studying for my Delta), one of the major terms that kept coming up was washback. Washback is the idea that the test influences the teaching that occurs – and this can be either positive or negative (in its simplest form, although rarely is it so simple). What I hadn’t known about was the research into positive and negative washback, and how the reality is that classroom dynamics, teacher actions, etc. might have a greater effect on what occurs in the classroom than the test itself. As a teacher educator and manager, this makes intuitive sense to me, although in my experience the washback of exams is also shaped by the social value of the test – otherwise known as test impact. So, we’ve got these two terms – washback and test impact – but they are not the same thing. Washback focuses on the classroom, whereas test impact looks at “the effects of the test on the community as a whole” (McNamara, 2000, p.74). Where tests play gatekeeping roles (e.g., IELTS, TOEFL, Cambridge English Qualifications, etc.), they carry significant weight in society, and as such often have a significant impact on learners’ lives. Personally, I believe that the tests with the largest test impact (whether positive or negative is another matter) are the ones with the largest washback effect.

What I liked

  • Short and to the point: This book is part of the Oxford Introductions series, and so, as you can imagine, it is very much to the point, with very little fuss and not a great deal of depth. It emphasises breadth and the explanation of assessment metalanguage. Its brevity is also a negative (I’ll explore this shortly), but for those looking to get a grasp of language assessment more broadly, this is a good start.
  • Split into nice sections: The book is split into four sections: Survey, Readings, References and Glossary. The Survey section covers the bread and butter of language assessment, and it’s here that the reader will spend the majority of their time. The Readings section was quite interesting – basically, it contains extracts from other articles and books on assessment (most of them VERY famous within the assessment world), and McNamara has included some post-reading questions for each of them. The References section is one of my favourites, though, as all of the references are classified using a one-to-three-box system (see the image below): one box means the work is written at an introductory level, two boxes at a more advanced and technical level, and three boxes at a specialised level. Lastly, there is the Glossary, which is very useful if you need to get familiar with a lot of the metalanguage.
[Image: the coded list of references, McNamara, 2000, p. 122]

What I didn’t like

  • A little dated: I’m also reading Fulcher and Davidson’s Language Testing and Assessment, which was written only about seven years after McNamara’s Language Testing – and it’s clear that even in that short span there were some important developments within language testing. Fast forward to 2024 and we are now in the age of multi-level testing, AI, and perhaps even remote proctoring – basically, there are some things that would need to be added for this to be considered a great introductory book on assessment nowadays.
  • Scratches the surface: Even though this is an introductory book on assessment, I felt at times that there was scope to go just a little deeper. For example, the chapter on validity, whilst covering most of the broad aspects, could have benefited from a more in-depth discussion and more fully worked examples. Having said this, I’m being really picky.

Who should read this book?

  • Teachers interested in assessment: For those who are interested in understanding the basics of language testing, this is a great start. It is written in easy-to-understand language, and its lack of depth is a strength for introductory readers.
  • Delta/DipTESOL candidates and MA students: Those who are preparing for these courses and who have no previous experience with assessment would benefit from reading this book. That being said, as it lacks depth, you would need to continue your reading elsewhere; however, it will give you a great overview of the main elements, so use it as your prep reading.

Final notes

Hopefully I’ve conveyed that McNamara’s Language Testing is a good introductory book to language testing, albeit a little dated. It will give the reader a good overview of the major concepts, and it will leave those of you who then need to engage with more complex assessment literature (e.g., Bachman’s works) better prepared. This is a great little book to have on the shelf, and as the assessment world is so alien in many respects (compared to ‘teaching’), I think an introductory book like this is necessary for many teachers.

I’d love to hear your thoughts, though. What assessment literature are you reading at the moment? Have you read McNamara’s Language Testing? Feel free to share your thoughts in the comments!

Book details

Book title: Language Testing

Author: Tim McNamara

Pages: 140

ISBN: 9780194372220

References

McNamara, T. (2000). Language Testing. Oxford University Press.
