Science series materials are brought to you by TestGorilla’s Assessment Team: A group of IO psychology, data science, psychometric experts and assessment development specialists with a deep understanding of the science behind skills-based hiring.
Predicting the future is a difficult task, usually done only by skilled oracles, recruiters, and hiring managers. You read that right. Employers looking to hire top candidates are aiming to predict the future success of a candidate in their role. While there are no fortune tellers to help you make the right hire, there is something even better – strong, empirical science.
In our previous blog post on the criterion-related validity of different hiring tools, we reported average validity coefficients calculated from numerous studies and discussed what the results tell us about the effectiveness of these tools and what that means for your hiring process.
In this blog post, we’ll dive deeper into the science and explain how you can use these insights to improve your hiring process.
While very useful, average validity coefficients do not capture the variability of the results found by individual studies. For instance, the mean validity of .42 found for structured interviews by Sackett et al. (2022) is the average of the validity coefficient values reported by different studies included in their meta-analysis. A meta-analysis is a super-study that summarizes the results of tens or hundreds of different studies. Some of the studies analyzed in the paper reported validity coefficients higher than the average, and some reported values lower than the average.
A credibility interval is useful in efficiently summarizing this variability between studies. The credibility interval is a range of values between an upper and lower bound that covers the majority of the results of the studies included in a meta-analysis. An 80% credibility interval captures 80% of the values around the average observed from many different studies.
An example can help show how these credibility intervals offer a more comprehensive picture. Imagine you're trying to find out how tall people are in your city. You can make some measurements and calculate that the average height is 5'7". This value would correspond to the average value reported in a meta-analytic study.
To get a more inclusive picture, you can conduct additional analyses and conclude that 80% of people are between 5'2" and 6'2" tall. This range would correspond to the 80% credibility interval reported in the meta-analytic study. By including both this interval and the average value, you’re able to get a much more precise and nuanced picture.
Similar to our example, Sackett et al. (2022) reported both the average values and the 80% credibility intervals to provide a nuanced picture of the validity coefficients of different hiring tools. Table 1 shows the validity coefficients found by Sackett and colleagues, along with their 80% credibility intervals.
For example, structured interviews, which are at the top of the list with a mean validity of .42, have an 80% credibility interval ranging from .18 to .66. Thus, the validity of structured interviews should really be read as “the validity coefficient is .42, plus or minus .24”. This conveys a different message than simply “the validity coefficient is .42”.
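The "plus or minus" arithmetic can be sketched in a few lines of Python. This is a minimal illustration that assumes validities are roughly normally distributed around the mean, so an 80% interval spans about 1.28 standard deviations on either side. The function names and the back-calculated standard deviation are our own illustrative assumptions, not figures quoted from Sackett et al. (2022).

```python
from statistics import NormalDist

def credibility_interval(mean, sd, level=0.80):
    """Return (lower, upper) bounds covering `level` of a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.28 for an 80% interval
    return mean - z * sd, mean + z * sd

def implied_sd(lower, upper, level=0.80):
    """Back out the standard deviation implied by a symmetric interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)

# Structured interviews: mean .42, 80% credibility interval .18 to .66.
sd = implied_sd(0.18, 0.66)
lower, upper = credibility_interval(0.42, sd)
print(f"implied SD: {sd:.2f}")                                   # 0.19
print(f"80% credibility interval: {lower:.2f} to {upper:.2f}")  # 0.18 to 0.66
```

Under the normality assumption, a half-width of .24 corresponds to a standard deviation of roughly .19, which is why the interval is so much wider than the single number .42 suggests.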
Large variability around the average indicated by wide credibility intervals usually means uncertainty. The wider the credibility interval, the less certain we can be about the exact validity coefficient of a hiring tool in our context. The organizational context, the nature of the role, and the overall goal of the selection process can all impact the exact validity coefficient in your application of the hiring tool.
Take, for example, structured interviews. This hiring tool has an 80% credibility interval between .18 and .66, with an average estimate of .42. To better understand the variability of the estimate, it pays to consider this hiring tool closely.
A single structured interview can cover multiple skills and competencies, ranging from general ones like communication or command of English to role-specific skills like negotiation. When skills scored during the interview are similar to the skills needed for success in a role, it is more likely that this hiring tool will reach above-average validity estimates. Moreover, the validity of structured interviews is also likely to vary depending on how well the interview has been designed and implemented (how the questions are formulated, how interviews are scored, how well interviewers are trained, etc.).
Another good example of the importance of context is personality tests, specifically those that assess the Conscientiousness (C) trait from the OCEAN personality test.
Looking at Table 1, you can see that researchers report results for two different measures of this trait: overall and contextualized. An overall measure of personality assesses candidates’ personality in general, while a contextualized measure assesses candidates’ personality at work. By introducing this distinction, Sackett et al. (2022) were able to narrow the credibility intervals so much that the contextualized measure of Conscientiousness has an average validity of .25 – with both the lower and the upper bound of the 80% credibility interval sitting at .25 as well.
Other researchers have shown that the validity coefficient of Conscientiousness can change depending on the complexity of the role a person is hired for. For example, in roles with high complexity, Conscientiousness is less important than in roles with moderate or low complexity.
The take-home message is that while these mean meta-analytic validity coefficients can provide general insights into the efficacy of different hiring tools, there is rarely a one-size-fits-all approach to hiring. You should mix and match hiring tools to balance each tool's benefits for the specific role and context you are hiring for.
You can take a few steps to deal with this uncertainty in the hiring process.
First, when comparing different hiring methods, pay attention to both the lower end of the 80% credibility interval and the average validity coefficient. This approach can help you identify predictors that have similar validities in different contexts or those with lower downside risk.
For example, Sackett and colleagues found that structured interviews have a higher mean validity than empirically keyed biodata (.42 versus .38). However, biodata has a higher value at the lower end of the 80% credibility interval (.26 versus .18). A risk-averse employer might prefer the predictor with less downside risk, and thus focus on the lower-end credibility value when identifying potential predictors.
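The comparison above can be sketched as a simple ranking exercise. The numbers are the ones quoted in this post from Sackett et al. (2022); the dictionary layout and the `rank_by` helper are hypothetical names used only for illustration.

```python
# Mean validity and lower bound of the 80% credibility interval,
# as quoted in the text above.
predictors = {
    "structured interview": {"mean": 0.42, "lower_80": 0.18},
    "empirically keyed biodata": {"mean": 0.38, "lower_80": 0.26},
}

def rank_by(key):
    """Return predictor names sorted from highest to lowest on `key`."""
    return sorted(predictors, key=lambda name: predictors[name][key], reverse=True)

by_mean = rank_by("mean")
by_lower = rank_by("lower_80")
print("Ranked by mean validity:", by_mean)
print("Ranked by lower bound:  ", by_lower)
```

The two rankings disagree: structured interviews come first on the mean, while biodata comes first on the lower bound. Which ordering matters more depends on how much downside risk you are willing to accept.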
Second, you can run a criterion-related validation study yourself to examine the relationship between the hiring tools you use and relevant job criteria for specific jobs or job families. Such a study can provide you with data-driven insights that are directly applicable to your specific situation. However, large sample sizes are needed to get meaningful insights from such a study (read more here). Organizations that don’t have a large number of hires in a position would be unable to do this type of study. To help all our customers, TestGorilla has a range of ongoing criterion-related validity initiatives that include small and large organizations. You can learn more about the opportunity here.
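To see why sample size matters so much, here is a rough sketch of the confidence interval around an observed correlation, using the standard Fisher z transformation. The observed validity of .30 and the sample sizes are hypothetical, and the sketch ignores complications such as range restriction and unreliable performance ratings.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                 # Fisher z transformation of r
    se = 1 / math.sqrt(n - 3)         # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Hypothetical observed validity of .30 at three sample sizes.
for n in (30, 100, 500):
    lo, hi = correlation_ci(0.30, n)
    print(f"n={n}: {lo:.2f} to {hi:.2f}")
```

With only 30 hires, the interval around an observed .30 still includes zero, so the study cannot rule out that the tool has no validity at all. With several hundred hires, the interval becomes narrow enough to support a practical conclusion.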
Lastly, you can look at the original meta-analytic studies included in the Sackett et al. (2022) meta-analysis and find studies that most closely describe your hiring process and situation. These studies often break down results based on job type, different types of performance criteria, and other factors that might help you make more informed decisions about the best hiring method for your situation (see, for example, this meta-analysis on the predictive validity of conscientiousness). By generalizing the validity of hiring tools to situations that are closer to yours, you can be more confident that the pre-employment tests you are using are valid for the positions you are testing for.
Crafting a high-quality selection procedure is no small feat, and reading this piece and other blogs in our science series is a good place to start. Purposefully building a high-quality selection process means that you carefully pick the hiring tools, plan out different stages of the process, and develop a standardized way to evaluate and rank candidates. Once this process is implemented, it ensures that every decision in the hiring funnel is supported by a strong rationale and data. Additionally, it is likely to bring you closer to above-average validity estimates.
You cannot make a good cake with bad ingredients, nor can you craft a high-quality hiring process with poor hiring tools. Using psychometrically sound hiring tools helps to ensure that the hiring decisions you make are based on trustworthy data.
At TestGorilla, we invest a lot of time and resources to develop high-quality assessments. By applying rigorous science every step of the way, we are able to deliver high-quality, reliable, and valid hiring tests. You can read more about our process here.
Every organization and role is unique, and sometimes, you may need to develop your own custom test internally to suit your needs. When building a custom hiring tool, validity and reliability are no less important than when using an existing one. Following best practices when developing custom tools, like the ones for conducting a structured interview, can help you land in the range of above-average validity coefficients.
There are many ways to evaluate how good someone is at their job. By proactively thinking about what the job entails and how performance will be evaluated, you can ensure that you’re using the most relevant hiring tools in the selection process. The science is clear that the more closely a hiring tool relates to the actual job a candidate is hired to do, the better it is at differentiating between good and bad candidates. Using a job-relevant hiring tool can mean the difference between missing a good candidate and hiring them. Moreover, using job-relevant hiring tools is necessary to demonstrate the legal defensibility of your hiring process.
Roles with different tasks and responsibilities can benefit from different hiring tools. Take some time to assess what everyday work will look like for the new hire. It also pays to understand what the first few days, weeks, and months will look like for these individuals. You can use job task analysis to get a comprehensive picture of the role you’re hiring for. Once you understand the requirements of the role, you can combine and select different hiring tools to maximize the validity of the process and avoid an overlap between them.
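Why overlap matters can be illustrated with the standard formula for the multiple correlation of two predictors with one outcome. This is a minimal sketch: all three input correlations are hypothetical values chosen for illustration, and `combined_validity` is our own name, not part of any library.

```python
import math

def combined_validity(r1, r2, r12):
    """Multiple correlation of two predictors with a criterion.

    r1, r2: each predictor's correlation with job performance.
    r12: correlation between the two predictors (their overlap).
    """
    if not -1 < r12 < 1:
        raise ValueError("r12 must be strictly between -1 and 1")
    r_squared = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return math.sqrt(r_squared)

# Hypothetical validities of .40 and .30, with and without overlap.
no_overlap = combined_validity(0.40, 0.30, 0.0)
high_overlap = combined_validity(0.40, 0.30, 0.6)
print(f"no overlap:   {no_overlap:.2f}")    # 0.50
print(f"high overlap: {high_overlap:.2f}")  # 0.41
```

In this example, two tools that measure unrelated things lift the combined validity from .40 to .50, while two tools that overlap heavily add almost nothing to the better tool on its own.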
When opening a role where you expect candidates with a lot of experience, knowledge, and relevant skills to apply, consider including role-specific job knowledge and skills tests. When opening a role where you expect applicants mainly to be fresh graduates, starters, or people making a career shift, consider assessing more universal skills such as problem solving, time management, or communication. With a test library comprising nearly 400 tests, we have you covered. Our upcoming behavioral competency framework can help you identify and assess universal competencies relevant to many different roles.
Lastly, adding a personality assessment like the OCEAN test or contextualized measures such as the Culture add or Motivation tests can help you understand the unique way in which candidates can contribute to your organization.
The science of hiring has evolved to provide a nuanced understanding of the effectiveness of different hiring tools. Organizational context, hiring goals, and ways in which performance is measured are all important factors that influence the validity and utility of a hiring tool. By carefully selecting hiring tools to match the specific needs of your organization and the roles you are hiring for, you can ensure you’re making sound hiring decisions.
Sackett, P. R., Zhang, C., Berry, C. M., & Lievens, F. (2022). Revisiting meta-analytic estimates of validity in personnel selection: Addressing systematic overcorrection for restriction of range. Journal of Applied Psychology, 107(11), 2040.
Sackett, P. R., Zhang, C., Berry, C. M., & Lievens, F. (2023). Revisiting the design of selection systems in light of new findings regarding the validity of widely used predictors. Industrial and Organizational Psychology, 1-18.
Wilmot, M. P., & Ones, D. S. (2019). A century of research on conscientiousness at work. Proceedings of the National Academy of Sciences, 116(46), 23004-23010.