Assessment Literacy - what is it and why does it matter?
Part 3 looks at traditional notions of test validity and offers some tips for creating and evaluating language tests which do what you want them to do for your learners.
How do we know whether a test can do what it claims to be able to do? Why are some ESOL examinations perceived as more reliable or trustworthy than others? How can we design tests which help our learners to progress in their study while giving us a realistic view of their ability?
What is Validity?
The notion of validity is essential when we talk about assessment of any kind. Engineering different types of validity into a language assessment helps us to ensure that the test does what we want it to do. This article aims to define some ways in which a test can be said to be ‘valid’ for specific purposes, and gives examples of how we can design assessments with these criteria in mind. Each type of validity is presented as a question that we can ask about any language assessment. Following this, some considerations for teachers and test designers are presented which can help to maintain these different types of validity, with the aim of producing more effective assessment tools and practices.
Do my learners know what they are being tested on?
One reason why many test-takers underperform in language assessments is that they do not understand what is being asked of them in the different sections of the test. Face validity relates to how clear the assessment criteria are to the test-taker, and how much their performance may be affected by this. Face validity can only be judged by the test-takers themselves, as only they experience the test with (or without) the clarity necessary to perform to the standard required of them. Face validity can be maintained by ensuring that questions clearly address specific language or skills. This may be achieved by clearly highlighting target forms, framing questions in clear and specific ways, providing examples to guide learners through specific question types, and making sure that there is not too much distraction from the qualities of language being tested.
How can I maintain face validity?
Face validity can be ascertained by asking students for reflective feedback after the tests they take, in terms of the purposes, task types and assessed criteria they felt they had experienced during the test itself. Feedback activity could include questions such as:
‘Was anything in this test surprising to you?’
‘Why do you think I decided to give you this test?’
‘Did this test help you to understand your strengths and weaknesses in (language area)?’
‘Would you like to do a similar test in a few weeks?’
Responses can be gathered via a written post-test feedback form, or more detailed responses can be achieved through short meetings with students who are representative of the level of the class.
As a teacher, gathering information about face validity can be a good way of assessing how effective your learners think your tests are, and whether there is any area in which they feel it could be improved. The more face-valid students feel your tests to be, the more likely they will be to see the assessments that you design as an important part of their learning. This can in turn add motivation and engagement to their learning experience, indirectly leading to higher performance overall.
Does the test do what I want it to do?
Aside from the subjective views of test-takers and other stakeholders about the value of language assessment, a test can also be valid (or not) in terms of the overlap between the language which appears on the test and the language which it claims to assess (content validity), and in terms of the way in which test questions are designed, structured and ordered through the test (construct validity). For example, a test which claims to assess students’ comprehension of written grammar through reading may contain a text with a lot of challenging vocabulary, none of which is typically studied at the test-takers’ level. This would give the test low content validity, as the challenge posed by the vocabulary would prevent students from performing well in their grammar comprehension (which the test claims to assess). Similarly, if only a partial range of the language and skills claimed to be tested appears on the test, key language areas may slip through the assessment net, and your test would provide little evidence of test-takers’ performance in those areas; this is another way in which content validity can be compromised. A test which claims to assess students’ production of appropriate grammar forms, but which consists of reading texts followed by comprehension questions, would have low construct validity, as comprehension tasks alone are not an appropriate question type for assessing productive skills.
How can I ensure content and construct validity?
Evidence of both construct and content validity can be gained by observing the types of student activity that occur in different parts of a test. If test-takers are underlining content words and pausing to produce the right vocabulary for a task answer, then whatever the desired purpose of the task, this question would seem to be testing students’ vocabulary. By contrast, if test-takers are completing tasks well, referring often to a text and using that information in their responses, it could be said that these tasks are prompting reading skills, as students are working receptively in order to prepare their responses. This activity may overshadow the assessment of language work (grammar or vocabulary, say), as more processing time would be devoted to the skill of reading rather than the processing of language itself. You can gauge the test-takers’ experience of the content and construct of a test by asking them questions such as:
‘What did you have to do in section 1 of the test?’
‘Did you have to think about spelling in section 2?’
‘Describe the process you took when you answered questions 12 to 16. What did you do first? And then?’
‘What language did you have to think about for questions 8 to 15?’
‘When did you focus on vocabulary the most during the test?’
If the test-takers report areas of focus which match your intended purpose for each section, it is likely that you have achieved content and construct validity. If not, revisit those sections and think about the language and skills being asked of the test-takers in the terms outlined above.
How can I focus a test for specific language?
As a teacher or test designer, focus the challenge of assessment tasks onto the area you are claiming to test. In a grammar assessment, use vocabulary that is familiar to the test-takers to avoid distraction from the purpose of the assessment. In a productive skills assessment (writing or speaking), don’t provide too much textual support (unless you are specifically aiming to assess the ways in which test-takers use information in their own speaking or writing); a short prompt leaving space for expression and interpretation will assess a wider range of speaking or writing skills than a long, complicated task with a lot of information to draw on. It is also important to include a range of language which relates to the area being tested. Drawing on a wide range of relevant topics, for example, to ensure breadth of vocabulary, or including tasks focusing on different types of interaction (dialogues, articles, interviews, etc.), can add range, and therefore content validity, to a grammar test.
What are the effects of language testing?
In the previous part of this article, we saw that the term ‘test’ is often taken to mean ‘summative test’, the assumption being that language assessment is the end point of a process for teachers and learners. However, most high-stakes ESOL examinations actually represent the starting point of a new stage of life for test-takers. IELTS candidates may study for many hours per week in preparation for the exam, in which they must achieve a certain score to enable them to study in an English-speaking country. Little thought is given to whether the skills they develop in preparation for the exam will be of any use to them in their new lives overseas. Success in IELTS or TOEFL is not the end point of a period of study; it is the gateway to a new stage of test-takers’ lives in another country, working under a new educational system in a different culture. How far can language examinations develop skills for life in these new situations?
The next article in this series will focus on two types of assessment validity which relate to exactly this point, and which are the basis of many contemporary language assessments. These key features of language performance assessment aim at transferable skills alongside language and exam skills development: consequential validity and cognitive validity.