An introduction to: Benchmarking


Lindie van der Westhuizen, Matthew Borneman

When using any hiring tool—be it internally developed structured interviews or TestGorilla’s pre-employment tests—one of the most significant challenges organizations face is figuring out how scores obtained on these hiring tools ultimately translate into job success. For example, we often get the question, “What scores are associated with high performance?” or “How do I know what’s a ‘good’ score?”

In this blog, we’ll explore some strategies and techniques that will help you interpret test scores and use them to drive your hiring decisions. We’ll also discuss common misconceptions and bad practices regarding scoring benchmarks and introduce best practices to counter these.

Test scores in the hiring context: Some key principles

Pre-employment test scores ≠ Academic test scores

It's crucial to understand that most tests and assessments used in the hiring context are fundamentally different from academic exams. While school and university tests aim to gauge how much knowledge or skill has been acquired, hiring assessments are designed to help employers distinguish between candidates who are likely to excel in a job and those who may not.

This distinction underscores the importance of using test scores as a tool for comparison against others or some type of standard rather than a definitive measure of a candidate's abilities.

Test scores mean nothing in isolation

Test scores in isolation are akin to numbers floating in a vacuum—they hold little meaning without a benchmark or comparative group to anchor them. This is why using scoring benchmarks to interpret test scores is essential—they provide the context needed to interpret individual scores as low, high, or somewhere in between.

TestGorilla’s scoring benchmarks are presented as percentile rank scores that benchmark each candidate against different norm groups. A norm group is a collection of candidates representing the group being tested. In the case of TestGorilla, the groups are sorted by education level, seniority level, and job function. By default, the selected norm group is All candidates, meaning all other eligible candidates who have taken the same test. You can select a norm group that is more specific to your hiring needs.

A percentile rank score indicates the percentage of candidates in the selected norm group whose test score is less than or equal to the raw score of the candidate in question. So, if a candidate obtains a percentile rank score of 75, it means they did as well as or better than 75% of the other candidates in the norm group for a specific test.
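The definition above can be sketched in a few lines of Python. This is only an illustration: the function name percentile_rank and the norm-group scores are made up for the example, and it is not a description of how TestGorilla computes its benchmarks internally.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of norm-group scores that are less than or equal to raw_score."""
    if not norm_scores:
        raise ValueError("norm group must contain at least one score")
    at_or_below = sum(1 for s in norm_scores if s <= raw_score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical raw scores (percent correct) for a norm group of 20 candidates.
norm_group = [35, 40, 42, 48, 50, 52, 55, 57, 60, 61,
              63, 65, 66, 68, 70, 74, 78, 81, 85, 92]

# 15 of the 20 scores are at or below 70, so the percentile rank is 75.
candidate_rank = percentile_rank(70, norm_group)
print(candidate_rank)  # 75.0
```

Here a raw score of 70 corresponds to a percentile rank of 75, which matches the example in the text: the candidate did as well as or better than 75% of the norm group.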

Percentile rank scores have two big advantages. First, they normalize for differences in the difficulty level of tests. In this way, the scores of different tests in an assessment become comparable.

Second, percentile rank scores give you meaningful insight into test performance, even if you have only one candidate. While it's helpful to have the scores from many candidates (it increases the odds that you have at least a few very good ones), you can interpret an individual candidate's performance using a percentile rank score.
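To illustrate the first advantage, the sketch below scores the same raw result against two hypothetical norm groups, one for an easier test and one for a harder test. All numbers and names are invented for the example.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of norm-group scores that are less than or equal to raw_score."""
    at_or_below = sum(1 for s in norm_scores if s <= raw_score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical norm-group raw scores for two tests of different difficulty.
easier_test = [60, 65, 70, 72, 75, 78, 80, 85, 90, 95]
harder_test = [30, 35, 40, 45, 50, 55, 60, 65, 70, 80]

raw = 70  # the same raw score on both tests
rank_easier = percentile_rank(raw, easier_test)
rank_harder = percentile_rank(raw, harder_test)

print(rank_easier)  # 30.0
print(rank_harder)  # 90.0
```

A raw score of 70 is below average on the easier test but near the top on the harder one. Comparing the two raw scores directly would hide that difference, while the percentile ranks make it visible.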

What “good” looks like is different for each job and/or organization

Understanding what constitutes a "good" score on a pre-employment test requires a nuanced approach, as the requirements for job success vary significantly from one role to another. For example, a score that indicates strong Working with Data skills might be crucial for a data engineering position but less critical for a customer service role in the same organization.

Moreover, the requirements of a specific role (e.g., product marketing) might even differ substantially from one organization to the next. For example, a Working with Data score that's deemed exceptional for a product marketer with a team of analysts might be considered merely satisfactory at an organization where they would be responsible for the analytics themselves.

Furthermore, organizational culture plays an important role in defining what "good" looks like. For example, a candidate who is a perfect fit for a product manager in a fast-paced, innovation-driven company might not thrive in a more structured, process-oriented environment.

This means that what is considered an “acceptable” or “good” score will vary considerably from role to role and organization to organization. Recognizing and embracing this complexity is key to leveraging pre-employment assessments effectively, ensuring that the selection process not only identifies capable candidates but also aligns with the broader objectives and culture of the organization.

Common benchmarking pitfalls

Organizations that conduct their own internal benchmarking often fall victim to several common pitfalls.

Pitfall 1: The anecdotal data trap

While anecdotal data can sometimes ignite the spark for innovative ideas or strategies, it can represent a significant pitfall when setting benchmarks and cut-off scores. When organizations step into the anecdotal data trap, they allow their decisions to be overly swayed by individual experiences or isolated feedback.

For instance, if one hiring manager declares a test too easy after their own attempt, and this singular feedback shapes the entire hiring strategy, you're likely overlooking a wealth of objective data. Such an approach can skew your benchmarks away from what's genuinely predictive of job success, leading to a less effective selection process.

Pitfall 2: The top performer bias

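Another common oversight is setting benchmarks based exclusively on the scores of top performers. Without scores from average and underperforming employees, you cannot tell whether a given score actually distinguishes strong performers from everyone else. It's crucial to assess the range of performance levels within your workforce, including employees who meet, exceed, or fall short of expectations.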

This holistic approach ensures your benchmarks reflect the full spectrum of relevant skills and behaviors that drive success in your organization.

Pitfall 3: The small sample size dilemma

For internal benchmarking to be meaningful, it's critical to use a sufficiently large sample size. Small organizations, in particular, may find it challenging to gather meaningful data if only a few employees hold a similar position.

Benchmarking in a company where, for example, there are only four customer service agents might not yield representative insights. Leveraging TestGorilla’s scoring benchmarks in these scenarios can offer a more reliable benchmarking foundation.
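A quick sketch shows why four incumbents make a coarse benchmark. The scores below are hypothetical, and percentile_rank is the same illustrative helper as in the definition of a percentile rank score, not a TestGorilla function.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of norm-group scores that are less than or equal to raw_score."""
    at_or_below = sum(1 for s in norm_scores if s <= raw_score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical raw scores of four customer service agents.
incumbents = [55, 62, 70, 81]

# Every possible raw score from 0 to 100 maps onto only five percentile ranks.
possible_ranks = sorted({percentile_rank(s, incumbents) for s in range(101)})
print(possible_ranks)  # [0.0, 25.0, 50.0, 75.0, 100.0]

# One extra raw point can move a candidate a full 25 percentile points.
just_below = percentile_rank(69, incumbents)
just_at = percentile_rank(70, incumbents)
print(just_below, just_at)  # 50.0 75.0
```

With so few data points, each incumbent's score accounts for 25 percentile points, so one unusual score can move the whole benchmark. A larger norm group smooths this out.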

Pitfall 4: The unmotivated benchmark trap

Relying on unmotivated or distracted job incumbents who do not recognize the importance of setting internal benchmarks correctly can significantly skew your understanding of what constitutes good performance.

When employees who lack engagement or motivation participate in benchmarking assessments, their scores might reflect their lack of interest rather than their actual capabilities or the demands of the job. This discrepancy can lead to benchmarks that are artificially low, setting an underwhelming benchmark for new hires. Educating participating employees on the importance of the assessments and how their participation contributes to the organization's success is important.

Ensuring that employees are motivated and understand the value of their input can lead to more accurate and meaningful benchmark data.


Best practices when using benchmarks

To avoid these common pitfalls, organizations can adopt the following best practices:

Best Practice 1: Use TestGorilla’s percentile scoring benchmarks

TestGorilla’s percentile scoring benchmarks offer a reliable and standardized method to evaluate candidates. These benchmarks are derived from extensive data analysis, reflecting a wide range of performance across various industries and roles. By comparing candidate scores against these percentiles, organizations can more accurately determine where a candidate stands in relation to a broad talent pool.

This approach streamlines the selection process and helps ensure that decisions are based on robust, comparative data rather than subjective impressions. Our help center provides a comprehensive guide on how to use our scoring benchmarks.

Best Practice 2: Conduct a criterion-related validation study

A criterion-related validation study aims to confirm and quantify the relationship between test results and job performance.

This rigorous approach involves analyzing how well the tests or assessments predict the specific outcomes they are intended to measure within your organization. The goal is to ensure that the tests used genuinely indicate future job performance, thereby making them a solid basis for setting benchmarks and cut-off scores. When executed correctly, such studies can significantly enhance the legal defensibility and effectiveness of your hiring process.
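At its simplest, quantifying the relationship between test results and job performance means correlating the two. The sketch below computes a Pearson correlation on made-up data. The scores, the 1 to 5 performance ratings, and the function name pearson_r are assumptions for illustration. A real validation study needs a much larger sample and careful choice of performance criteria.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: test scores and later performance ratings (1 to 5)
# for the same ten employees, listed in the same order.
test_scores = [45, 52, 58, 60, 66, 71, 75, 80, 84, 90]
performance = [2, 3, 2, 3, 3, 4, 3, 4, 5, 4]

r = pearson_r(test_scores, performance)
print(round(r, 2))
```

A coefficient near 0 would suggest the test says little about performance in this role, while a clearly positive value supports using the test as a basis for benchmarks. Note that individual employees can still deviate from the overall trend, as the rating of 3 for the employee who scored 75 does here.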

Best Practice 3: Embrace a data-driven approach

To mitigate the "anecdotal data trap," prioritize a data-driven strategy over individual opinions. This will not only increase the quality of your hiring decisions but also make them legally defensible.

Collect and analyze data from a wide range of sources, including assessments from current employees, job performance metrics, and TestGorilla’s scoring benchmarks. This approach ensures that your benchmarks are grounded in reality and reflect the skills and attributes that truly predict job success.

Best Practice 4: Evaluate across the performance spectrum

If you do conduct internal benchmarking, ensure you assess and incorporate data from employees across all performance levels. This includes those who excel, those who meet expectations, and those who underperform. By understanding the wide array of skills and qualities present in your workforce, you can develop more nuanced and inclusive benchmarks that capture the diversity of successful performance.

It is important to remember that testing existing employees might yield surprising results. For instance, after evaluating employees across the spectrum of performance, including high achievers, average performers, and those falling short, you might discover that a few of your star employees didn’t score as expected on the test or assessment.

Don’t panic. It's not unusual for even the most successful employees to fall short on some tests, which makes sense given that no assessment can perfectly predict job performance. Often, such instances are exceptions rather than the rule within the broader dataset. Assessing the effectiveness of your selection tools requires a comprehensive review of the entire dataset to understand how accurately the test reflects job performance across all participants.

Conclusion

Navigating the intricacies of pre-employment testing and benchmarking is no small feat. Yet, understanding and applying the right strategies can significantly enhance your hiring process and help you hire the best talent for your organization. By steering clear of common pitfalls and embracing best practices, you can use TestGorilla's tools and insights to make informed decisions that align with your organizational goals and values.

Visit our Help Center to learn more about leveraging percentile scoring benchmarks, or check out our Science series blog on setting cut-off scores. Let TestGorilla be your partner in making every hire a step towards a brighter future for your company.
