Science glossary

10min

TestGorilla staff

Science series materials are brought to you by TestGorilla’s team of assessment experts: A group of IO psychology, data science, psychometricians, and IP development specialists with a deep understanding of the science behind skills-based hiring.

At TestGorilla, we take a deeply scientific and rigorous approach to the development, monitoring, and improvement of the tests in our test library. There’s a bunch of assessment science-related jargon related to these processes, so we’ve put together this science glossary to help you better understand the science behind TestGorilla.

How to use this glossary

You can refer back to this glossary if you come across any terminology in our science blogs, or any of our other content, that you’re unsure of. The words in the glossary are listed alphabetically, and each word’s definition references related words that we suggest you check out too.

Key terms and definitions

Below is a list of key terms and definitions that can help you better understand the science behind TestGorila, as well as assessment science in general. Happy reading!

Adverse impact

A situation where an action or policy, while not intentionally discriminatory, results in the unequal treatment of a particular group of individuals. Also known as “disparate impact.”

Assessment

The combination of individual TestGorilla tests and custom questions crafted to evaluate candidates' suitability for a particular job role. For example, when recruiting a Customer Support Representative, a TestGorilla Assessment could contain our Communication, Customer Service, Zendesk CS, and Culture Add tests, plus custom video, essay, multiple-choice, or file upload questions. This is also known in the industry as an assessment battery.

Behavioral competencies

Observable and measurable behaviors and skills that are required to perform effectively in a job function or role. They typically focus on soft skills or universal skills that are transferable across different roles, organizations, and industries rather than specific technical expertise (“hard skills”).

Examples include embracing innovation, communicating effectively, and showing resilience. Check out our brief introduction to behavioral competencies for more info.

Benchmark

Benchmarks contextualize a candidate’s test scores by comparing them to the scores achieved by a broader group of test-takers. They are available for various education levels (ranging from Some high school to Master’s degree or higher), business functions (from Administrative to Software development), and seniority levels (Junior and Senior).

Bias

Unfair and systematic preferences or disadvantages that hinder the selection of the most qualified candidates. Bias occurs when decisions about who to hire or select for a job are influenced by factors unrelated to a candidate's qualifications, skills, or abilities, and can be based on various characteristics such as gender, race, age, ethnicity, disability, or other personal attributes.

Candidate assessment satisfaction

The overall measurement of how satisfied your candidates, both successful and unsuccessful, are with the assessment(s) included as part of your hiring process.

Classical test theory (CTT)

A psychometric paradigm that centers on understanding the intricate connection between the test scores we observe and the true scores that individuals would achieve if measurement were completely free from error.

See also: Item response theory (IRT) and test theory.

Concurrent validity

A subtype of criterion-related validity. It evaluates how well a test score or measurement correlates with a known criterion measure at roughly the same time. For example, if you give the Sales Management test to your existing sales managers and see how it aligns with their current/recent sales performance. If the test scores correlate highly with the performance of your current sales managers, it suggests the Sales Management test has good concurrent validity.

Construct

The underlying qualities, skills, knowledge, abilities, attitudes, or attributes that an assessment or test is designed to measure. Examples include intelligence, creativity, problem-solving ability, and language proficiency.

Construct validity

The extent to which a test accurately measures the construct it is intended to measure.

See also: Convergent validity, discriminant validity, and factorial validity.

Content validity

The extent to which a test covers a representative sample of the skills and knowledge content relevant to the topic in question. TestGorilla uses a standardized test development process and formal test structures to ensure the skills and knowledge necessary for a particular topic are well-represented by the test and the test items.

Convergent validity

A type of construct validity. Convergent validity examines whether constructs that are supposed to be theoretically related to each other are, in fact, related. For example, one would expect a substantial relationship to exist between English Communication skills test scores and English B1 test scores as both tests assess written and verbal communication. Construct validity is the opposite of discriminant validity.

Criterion-related validity

The degree to which test scores relate to outcome measures of interest, such as job performance ratings and employee turnover. For example, TestGorilla examines the relationship between test scores and hiring outcomes (e.g., ratings from hiring team, hired/not hired).

We also conduct criterion-related validation studies in collaboration with our customers. Check out our blog about preparing for a criterion-related validity study to learn more about them.

Cronbach’s alpha

A statistic used to measure the internal consistency or reliability of a set of items or questions in a test or survey. Cronbach's alpha assesses how closely related a group of items are as a measure of a single underlying skill or construct. TestGorilla tracks the reliability of all our tests using internal consistency and other reliability metrics and coefficients.

Discriminant validity

A type of construct validity. Discriminant validity examines whether tests that are not supposed to be theoretically related are, in fact, unrelated. For example, we wouldn’t expect scores on the Communication test to correlate with scores on the HTML5 test as these two tests measure substantially different skills. Discriminant validity is the opposite of convergent validity.

Face validity

The extent to which a test appears to measure what it is intended to measure, and whether, on the surface, the test feels relevant and appropriate for what it is supposed to be assessing. TestGorilla surveys candidates about the perceived validity and relevance of a test after they complete it.

Factorial validity

A type of construct validity. Factorial validity examines whether the underlying structure of the test matches the theorized structure of what the test is measuring.

Fairness

Ensuring that the assessment process does not result in unjust advantages or disadvantages for individuals based on factors unrelated to the skills, knowledge, or abilities required for the job.

Group differences

The extent to which groups, such as those based on age, gender, ethnicity and/or race, differ significantly from each other in terms of the scores obtained on a test. At TestGorilla we regularly monitor group differences on our tests where data is available.

Item Response Theory (IRT)

A psychometric paradigm which enables the development of more precise and efficient tests by modeling how likely a candidate is to respond correctly to a particular test question, taking into account their underlying proficiency in the trait or skill being assessed.

See also: Classical test theory (CTT) and test theory.

Predictive validity

A subtype of criterion-related validity. The extent to which the results of a particular assessment or test can accurately predict future behavior or outcomes (e.g., an individual's future job performance).

For instance, if a candidate scores highly on TestGorilla’s Sales Management test, you might predict that they will perform well in a sales manager role at your organization. If high scores on the Sales Management test correlate with being a top sales manager eight months after hiring, a test can be said to have good predictive validity.

Psychometrics

The field of study and practice that involves the measurement of psychological attributes, characteristics, and abilities using standardized methods, tools, and techniques. It encompasses the development, administration, and analysis of tests and assessments designed to quantify and evaluate various psychological constructs, such as intelligence, personality, skills, knowledge, attitudes, and aptitudes.

Reliability

The extent to which test scores are stable, consistent, and free from measurement error. Reliability coefficients between .6 and .69 are typically considered reasonable, values between .7 and .79 are considered acceptable, values between .8 and .89 are considered good, and values above .9 are considered great.

Cronbach’s alpha, a measure of internal consistency, is one way that TestGorilla currently measures reliability. Reliability is impacted by a number of factors, including the breadth of content assessed, the number of items in a test, population heterogeneity, and range restriction.

Test

A test is a systematic procedure used to evaluate a person's performance, attributes, or capabilities. It provides specific questions, tasks, or stimuli to which individuals must respond, allowing the measurement of specific characteristics such as knowledge, skills, abilities, aptitudes, or personality traits. Individual TestGorilla tests in our test library are considered tests, for example the Communication test or the Customer Service test.

Test theory

A branch of psychometrics that focuses on the development, analysis, and interpretation of assessments and tests used to measure psychological constructs such as knowledge, abilities, skills, personality traits, and attitudes. Test theory provides a systematic framework for understanding how tests are constructed, how they function, and how to assess their reliability and validity. The two main psychometric paradigms are classical test theory (CTT) and item response theory (IRT).

See also: Classical test theory (CTT) and item response theory (IRT).

Unconscious bias

Unintentional and automatic biases that individuals hold, often at a subconscious level, which can influence their decisions during the recruitment and selection process. These biases are typically based on factors such as race, gender, age, physical appearance, and other characteristics that are unrelated to a candidate's qualifications, skills, or potential for success in a job.

An unconscious bias may also be an affinity bias, which occurs when people show a preference or bias towards candidates who are similar to them.

Validity

The extent to which accurate inferences or interpretations can be drawn from test scores. There are several types of validity detailed in this glossary.

See also: Concurrent validity; construct validity, content validity, convergent validity, criterion-related validity, discriminant validity, face validity, factorial validity, and predictive validity.

Validity coefficient

A statistic (typically ranging from -1 to 1) that represents the strength and direction of the relationship between a test score (or measurement) and a criterion measure (e.g., overall job performance rating). A higher absolute value indicates a stronger relationship, while the sign (positive or negative) indicates the direction of the relationship.