Dr Danny Hinton, Faculty of Education, Health & Wellbeing

Dr Danny Hinton is a Lecturer in Psychology and Acting Course Director for the MSc Occupational Psychology programme.  Danny completed his PhD in 2015, investigating ethnic group performance differences on measures of cognitive ability.  His research interests focus on the use of psychometric measures within organisations, particularly on how these tools can be deployed fairly without compromising their psychometric properties.

Through his consultancy, Danny has worked with a diverse range of clients both within the UK and internationally, developing psychometric tools and delivering HRM solutions in both the public and private sectors.


Accounting for ethnic group test performance differences through test familiarity

Introduction

Cognitive ability tests are a powerful way to predict an applicant’s future job performance.  However, many minority ethnic groups score consistently lower than the White majority on these tests, representing an often unseen barrier to opportunity in Western societies.  One potential explanation for these differences is test familiarity; however, no robust measure of test familiarity exists within the literature.  If such a scale were created, it could be used to establish whether differences in test performance between ethnic groups can be explained by differences in those groups’ test familiarity.  If so, this would provide proof of concept for a future solution that would allow accurate job selection that is fair to all, addressing a problem that has affected society for over 100 years.

Background

Psychometric tests of cognitive ability have consistently been shown to be the single best predictor of future job performance available for job selection (e.g. Schmidt & Hunter, 1998; Robertson & Smith, 2001).  In spite of this, a persistent problem that has plagued the use of ability testing is the observation that some minority ethnic groups score substantially lower than the White majority (e.g. Gottfredson, 2005; Herrnstein & Murray, 1994; Martocchio & Whitener, 1992; Schmitt, Clause & Pulakos, 1996).  What is perhaps more troubling from an academic point of view is that very little consensus has been reached as to why these group differences arise, with the research world split between the Hereditarian argument (Rushton & Jensen, 2005) and the Environmental argument (Dickens & Flynn, 2006).

However, neither of these explanations adequately describes the mechanism by which ethnic group differences arise.  One possible explanation that has recently been identified is differences in test familiarity between ethnic groups.  Hinton (2015) observed that test familiarity could explain between 60 and 90% of the variance in ethnic group test score differences.  That study, however, was based on a limited sample and used a flawed measure of test familiarity.  A robust measure of test familiarity therefore needs to be developed before its relationship with ethnic group test performance differences can be properly examined.

Aims & objectives

The aim of the present study was to design and develop a measure of test familiarity that is short, reliable and valid, and that can be used in future studies to further investigate the link between ethnic group test performance differences and test familiarity differences.

Methods

The project followed the scale development procedures laid out by Hinkin (1995) and DeVellis (2016) and consisted of three main stages.  In the first, a deductive approach to scale design was taken: the initial pool of items to measure test familiarity was generated from an adapted version of the construct definition given by Reeve, Heggestad and Lievens (2009).  This process resulted in an initial pool of 34 items.

In the next stage, content validity was assessed by recruiting five subject matter experts, all of them lecturers in psychology at the University of Wolverhampton with expertise in psychometric testing.  These experts reviewed each item alongside the construct definition, rating each item on the degree to which it tapped into the construct and suggesting improvements.  They were also asked to comment on whether anything was missing and whether further items were needed to fully capture the construct.  On the basis of this stage, nine items were removed, leaving 25 items in the reduced set; some of the remaining items were also reworded for clarity.
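The summary does not state how the expert ratings were aggregated.  One common way to quantify such judgements in scale development is Lawshe's content validity ratio (CVR); the Python sketch below, using hypothetical endorsement counts, shows how items might be flagged for removal on that basis.

def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), where n_e is the number of
    experts rating an item as tapping the construct and N is the total
    number of experts.  Ranges from -1 (no experts) to +1 (all experts)."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

# Hypothetical counts: how many of the five experts endorsed each item.
endorsements = {"item_01": 5, "item_02": 2, "item_03": 4}
for item, n_essential in endorsements.items():
    cvr = content_validity_ratio(n_essential, n_experts=5)
    print(f"{item}: CVR = {cvr:+.2f}")  # low or negative CVR -> removal candidate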

In the final stage of data collection, participants were recruited through the SONA psychology participant pool, through convenience sampling, and through the Prolific participant recruitment service.  A total of 522 participants took part.  All participants completed Pearson Talent Lens’ Core Abilities Assessment (CAA; NCS Pearson, 2007), a measure of general mental ability consisting of 20 questions (6 Verbal, 7 Numerical and 7 Abstract Reasoning) administered in 12 minutes.  Participants then completed a short questionnaire on Qualtrics that recorded their gender and age and presented the test familiarity items.  Finally, participants completed the items used by Furnham, Moutafi and Chamorro-Premuzic (2005) to measure self-estimated intelligence (SEI).

Summary findings

Exploratory Factor Analysis (EFA) was conducted on the test familiarity items to determine the number of underlying factors.  This pointed to a three-factor solution, which was confirmed through parallel analysis.  The scale was therefore developed as a set of three subscales: Test Familiarity at Work, Test Familiarity from Education, and Test Skills Developed through Familiarity.
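For readers unfamiliar with parallel analysis, the following Python sketch implements Horn's procedure from first principles: factors are retained only while the observed eigenvalues exceed the average eigenvalues obtained from random data of the same dimensions.  The simulated data and variable names are illustrative, not the study's own.

import numpy as np

def parallel_analysis(data: np.ndarray, n_iter: int = 100, seed: int = 0) -> int:
    """Return the number of factors whose observed eigenvalues exceed the
    mean eigenvalues of random data of the same shape (Horn's method)."""
    rng = np.random.default_rng(seed)
    n_obs, n_items = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_eigs = np.zeros((n_iter, n_items))
    for i in range(n_iter):
        noise = rng.standard_normal((n_obs, n_items))
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    return int(np.sum(observed > random_eigs.mean(axis=0)))

# Demonstration: 522 simulated participants, 25 items driven by 3 latent factors.
rng = np.random.default_rng(1)
responses = (rng.standard_normal((522, 3)) @ rng.standard_normal((3, 25))
             + rng.standard_normal((522, 25)))
print(parallel_analysis(responses))  # should recover the three simulated factors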

The internal consistency of each subscale was checked using Cronbach’s alpha, and further EFA was used to identify any item that did not tap into the construct its scale was measuring.  This led to the removal of a further five items, leaving 20 in the final item set (Work: 6 items; Education: 8 items; Skills: 6 items).
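Cronbach's alpha can be computed directly from the item variances; the short Python sketch below does so for a single hypothetical six-item subscale, to make the reliability check concrete.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_participants x n_items) responses for one subscale.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: six items sharing a common true score plus noise.
rng = np.random.default_rng(0)
true_score = rng.standard_normal((522, 1))
skills_items = true_score + 0.8 * rng.standard_normal((522, 6))
print(f"alpha = {cronbach_alpha(skills_items):.2f}")  # about .90 for this simulation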

To further establish the psychometric properties of the scale, a number of analyses were conducted.  The three-factor structure was confirmed using Confirmatory Factor Analysis (CFA).  Divergent validity was established by demonstrating, again using CFA, that the test familiarity items and the SEI items measured different constructs.
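The report does not name the software used for the CFA, but the comparison it describes can be sketched as follows, here using the Python semopy package (an assumption) with simulated stand-in data: divergent validity is supported when a model treating test familiarity and SEI as separate factors fits better than one collapsing them into a single factor.

import numpy as np
import pandas as pd
from semopy import Model, calc_stats

# Simulated stand-in data: two distinct latent traits, three indicators each.
rng = np.random.default_rng(0)
tf, sei = rng.standard_normal((2, 522))
df = pd.DataFrame({**{f"tf{i}": tf + 0.6 * rng.standard_normal(522) for i in (1, 2, 3)},
                   **{f"sei{i}": sei + 0.6 * rng.standard_normal(522) for i in (1, 2, 3)}})

two_factor = Model("TF =~ tf1 + tf2 + tf3\nSEI =~ sei1 + sei2 + sei3")
one_factor = Model("G =~ tf1 + tf2 + tf3 + sei1 + sei2 + sei3")

for label, model in (("two-factor", two_factor), ("one-factor", one_factor)):
    model.fit(df)
    print(label)
    print(calc_stats(model))  # fit indices (chi-square, CFI, RMSEA, ...)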

Finally, predictive validity was examined by correlating performance on the CAA with scores on the test familiarity scale.  Although scores on the overall scale were moderately correlated with test performance, some interesting findings emerged at the facet level: whereas the Test Skills subscale correlated strongly with test performance, scores on the other two subscales were unrelated to it.
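As a concrete illustration of this facet-level check, the Python sketch below correlates CAA totals with each subscale score using scipy.  All data here are random placeholders (so the printed correlations will be near zero); real scores would be substituted in practice.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
caa_scores = rng.integers(0, 21, size=522)  # hypothetical CAA totals (0-20)
subscales = {"Work": rng.integers(6, 31, size=522),  # hypothetical subscale totals
             "Education": rng.integers(8, 41, size=522),
             "Skills": rng.integers(6, 31, size=522)}
for name, scores in subscales.items():
    r, p = pearsonr(caa_scores, scores)
    print(f"{name}: r = {r:+.2f}, p = {p:.3f}")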

These findings provide some interesting insight into the nature of test familiarity as a facilitator of test performance: mere exposure to testing as part of one’s schooling or work does not appear to be enough to improve later test performance.  Rather, it is the test familiarity that leads to the development of test-taking skills that appears most useful for improving test performance.

Follow-up research in this area will need to focus on this distinction.  The tool developed through this research is a short, robust measure that can be used in future studies to investigate whether test familiarity can explain ethnic group performance differences on ability measures.

References

DeVellis, R. F. (2016). Scale development: Theory and applications (Vol. 26). Sage Publications.

Dickens, W. T., & Flynn, J. R. (2006). Black Americans reduce the racial IQ gap: Evidence from standardization samples. Psychological Science, 17 (10), 913–920.

Furnham, A., Moutafi, J., & Chamorro-Premuzic, T. (2005). Personality and intelligence: Gender, the Big Five, self-estimated and psychometric intelligence. International Journal of Selection and Assessment, 13 (1), 11–24.

Gottfredson, L. S. (2005). What if the hereditarian hypothesis is true? Psychology, Public Policy, and Law, 11, 311–319.

Herrnstein, R. J., & Murray, C. (1994). The Bell Curve. London: Simon and Schuster.

Hinkin, T. R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21 (5), 967–988.

Hinton, D. (2015). Uncovering the root cause of ethnic difference in ability testing: differential test functioning, test familiarity and trait optimism as explanations of ethnic group differences (Doctoral dissertation, Aston University).

Martocchio, J. J., & Whitener, E. M. (1992). Fairness in personnel selection: A meta-analysis and policy implications. Human Relations, 45 (5), 489–506.

NCS Pearson (2007). Core Abilities Assessment Manual.

Reeve, C. L., Heggestad, E. D., & Lievens, F. (2009). Modeling the impact of test anxiety and test familiarity on the criterion-related validity of cognitive ability tests. Intelligence, 37 (1), 34–41.

Robertson, I., & Smith, M. (2001). Personnel selection. Journal of Occupational and Organizational Psychology, 74, 441–472.

Rushton, J. P., & Jensen, A. R. (2005). Thirty years of research on race differences in cognitive ability. Psychology, Public Policy, and Law, 11, 235–294.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124 (2), 262–274.

Schmitt, N., Clause, C. S., & Pulakos, E. D. (1996). Subgroup differences associated with different measures of some common job-relevant constructs. International Review of Industrial and Organizational Psychology, 11, 115–140.