Criteria Corp vs SHL: Validity Research and What the Numbers Mean

ClarityHire Team (Editorial)

What "validated" actually means

When a cognitive test vendor says "validated," they mean one (or several) of:

  1. Construct validity. The test measures the thing it claims to measure (general mental ability, numerical reasoning, etc.).
  2. Criterion validity. Scores on the test correlate with on-the-job performance measures, usually supervisor ratings or productivity data.
  3. Reliability. Repeat administrations produce similar scores; alternate forms are equivalent.
  4. Fairness. Score distributions and predictive accuracy do not differ across protected demographic groups in ways unrelated to job performance.

Both Criteria Corp and SHL publish technical manuals covering all four. The manuals are useful but easy to misread. This post walks through what each vendor's research actually shows and how to evaluate vendor validity claims more generally — see also our broader predictive validity research summary.

Criteria Corp's CCAT

The flagship claim. CCAT correlates with job performance at validity coefficients in the 0.40–0.65 range across diverse job families, in line with general cognitive ability tests from the broader meta-analytic literature.

What the research base looks like. Criteria publishes a technical manual covering several hundred validation studies, including local validation studies done at customer organizations. The methodology is conventional: collect CCAT scores from a sample of incumbents, collect supervisor performance ratings, compute the correlation between the two, then correct it for range restriction and measurement error.
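The two corrections named above can be sketched in a few lines. This is a minimal illustration, not Criteria's actual procedure: it assumes the common textbook formulas (disattenuation for criterion unreliability, and Thorndike's Case II for direct range restriction), and the input values (an observed coefficient of 0.30, a criterion reliability of 0.52, an SD ratio of 1.5) are hypothetical.

```python
import math


def correct_for_criterion_unreliability(r_observed: float, criterion_reliability: float) -> float:
    """Disattenuate an observed validity for unreliability in the criterion."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Thorndike Case II correction for direct range restriction.

    sd_ratio is SD(unrestricted group) / SD(restricted sample) on the test score.
    """
    if sd_ratio <= 0.0:
        raise ValueError("sd_ratio must be positive")
    denominator = math.sqrt(1.0 + r_restricted**2 * (sd_ratio**2 - 1.0))
    return (r_restricted * sd_ratio) / denominator


# Hypothetical inputs, chosen only to show the size of the adjustment.
r_observed = 0.30
after_reliability = correct_for_criterion_unreliability(r_observed, 0.52)
corrected = correct_for_range_restriction(after_reliability, 1.5)
print(f"observed={r_observed:.2f} corrected={corrected:.2f}")
```

With these made-up inputs the coefficient moves from 0.30 to roughly 0.57. The order in which the corrections are applied is itself a methodological choice that affects the result, which is one more reason to check how a given study did it.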

Adverse impact data. CCAT shows the standard demographic pattern for cognitive ability tests: modest mean score differences across race/ethnicity groups in the U.S., consistent with the broader literature on cognitive testing. Criteria publishes these differences openly in the technical manual. The four-fifths (4/5ths) rule applies: a selection rate for any group below 80% of the rate for the highest-selected group is generally treated as evidence of adverse impact. Hiring teams using CCAT as a strict cutoff should run their own adverse impact analysis at their specific selection ratio.
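A local adverse impact check is simple arithmetic. The sketch below is illustrative only: the function names, group labels, and counts are hypothetical, and a real analysis would typically add significance testing for small samples.

```python
def adverse_impact_ratios(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each group to its selection rate divided by the highest group's rate.

    outcomes maps a group label to (number selected, number of applicants).
    """
    rates = {}
    for group, (selected, applicants) in outcomes.items():
        if applicants <= 0:
            raise ValueError(f"group {group!r} has no applicants")
        rates[group] = selected / applicants
    top_rate = max(rates.values())
    if top_rate == 0:
        raise ValueError("no group had any selections")
    return {group: rate / top_rate for group, rate in rates.items()}


def flagged_groups(outcomes: dict[str, tuple[int, int]], threshold: float = 0.8) -> list[str]:
    """Return the groups whose impact ratio falls below the four-fifths threshold."""
    ratios = adverse_impact_ratios(outcomes)
    return [group for group, ratio in ratios.items() if ratio < threshold]


# Hypothetical counts: (passed the cutoff, took the test).
example = {"group_a": (48, 80), "group_b": (12, 40)}
print(adverse_impact_ratios(example))
print(flagged_groups(example))
```

Here group_a passes at 60% and group_b at 30%, so group_b's impact ratio is 0.5 and it is flagged. Because the ratio depends on where the cutoff sits, the same test can pass the check at one selection ratio and fail it at another.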

Strengths. Long publication history. Adequate sample sizes in most validation studies. Transparent methodology. The validity numbers are credible at the meta-analytic level.

Weaknesses to read for. Many published validation studies use a concurrent ("incumbent") design rather than a predictive one: they correlate current employees' test scores with their current performance, rather than testing applicants and following them over time. The concurrent design can produce higher coefficients than a predictive design would. Read the methodology section of any specific study before quoting its number.

SHL Verify Interactive G+

The flagship claim. Verify Interactive G+ measures general mental ability with adaptive precision and produces validity coefficients in a similar band (0.50–0.65) against job performance, with the additional claim that adaptive testing reduces measurement error compared with fixed-form tests.

What the research base looks like. SHL has an extensive global validation database — hundreds of studies, many large samples, deployed across dozens of countries. The technical manual is dense and covers construct validity (factor structure of the G+ score), criterion validity (large meta-analytic samples), and cross-cultural fairness.

Adverse impact data. SHL also publishes group difference data. Like CCAT, Verify shows the standard cognitive test pattern. SHL's localization work — items adjusted and re-normed for different countries — reduces some sources of cross-cultural unfairness, but the underlying ability score differences remain consistent with the broader research.

Strengths. Larger and more diverse validation database than Criteria's, especially outside the U.S. Adaptive testing methodology is psychometrically more efficient. Strong IRT-based item analysis.

Weaknesses to read for. SHL's largest validation studies are concentrated in specific industries (financial services, consulting, oil & gas). Generalization to your specific role family deserves a local validation study, which SHL will sell you. Some of the published coefficients come from concurrent rather than predictive designs, same caveat as Criteria.

How the two compare on validity, head-to-head

Both vendors land in roughly the same validity band (around 0.50 corrected, somewhere in the 0.30s uncorrected), which is consistent with the broader research on cognitive ability tests. There is no published direct head-to-head study showing one is meaningfully more predictive than the other for general use.

Where they actually differ:

  • Measurement error. SHL Verify's adaptive design reduces test-level measurement error, especially at the high end of ability. CCAT's fixed form is more sensitive to guessing and time pressure effects.
  • Range restriction in practice. If you only test candidates who already passed a resume screen, both tests will show lower observed coefficients than the meta-analytic numbers because the candidate pool is range-restricted. This is a property of any test, not a vendor difference.
  • Cross-cultural validity. SHL has more rigorous localization for non-U.S. hiring. For U.S.-only hiring, the gap is smaller.
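The range restriction point can be made concrete with a rough calculation. The sketch below runs the Thorndike Case II formula in the forward direction to estimate what an unrestricted coefficient would look like in a narrower pool. It assumes direct selection on the test score itself, so it is only an approximation for a resume screen (which restricts range indirectly), and the 0.50 and 0.6 inputs are hypothetical.

```python
import math


def expected_restricted_validity(r_unrestricted: float, sd_ratio: float) -> float:
    """Estimate the coefficient observed in a range-restricted pool.

    sd_ratio is SD(screened pool) / SD(full applicant pool) on the test score,
    so values below 1.0 mean the screened pool is less varied.
    """
    if sd_ratio <= 0.0:
        raise ValueError("sd_ratio must be positive")
    scaled = r_unrestricted * sd_ratio
    return scaled / math.sqrt(1.0 - r_unrestricted**2 + scaled**2)


# Hypothetical: a 0.50 coefficient, and a screen that cuts score SD to 60%.
observed = expected_restricted_validity(0.50, 0.6)
print(f"expected observed validity: {observed:.2f}")
```

Under these assumptions a 0.50 coefficient shows up as roughly 0.33 in the screened pool, which applies equally to either vendor's test.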

Validity is necessary but not sufficient

A high validity coefficient tells you the test predicts performance. It does not tell you:

  • Whether the test is the highest-leverage assessment for your specific role
  • Whether the cost is worth the marginal improvement over a simpler alternative
  • Whether hiring managers will actually use the score, or override it on gut feel
  • Whether the test creates candidate experience problems that cost you good candidates upstream

The research on hiring methods consistently shows that combining cognitive ability with one other valid method (work sample, structured interview) produces meaningfully higher combined validity than cognitive ability alone. The coefficients do not simply add: the gain depends on how much the two methods overlap, so a second method that correlates only weakly with cognitive ability contributes the most incremental validity.
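How much a second method adds can be estimated with the standard two-predictor multiple correlation formula. This is a sketch under stated assumptions: the three input correlations below are hypothetical placeholders, not figures from either vendor's manual, and the function name is our own.

```python
import math


def combined_validity(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation R for two predictors of one criterion.

    r1, r2: validity of each predictor against job performance.
    r12:    correlation between the two predictors.
    """
    if abs(r12) >= 1.0:
        raise ValueError("r12 must be strictly between -1 and 1")
    r_squared = (r1**2 + r2**2 - 2.0 * r1 * r2 * r12) / (1.0 - r12**2)
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("inputs are not a consistent set of correlations")
    return math.sqrt(r_squared)


# Hypothetical: cognitive test at 0.50, a second method at 0.40.
overlapping = combined_validity(0.50, 0.40, 0.30)   # methods correlate 0.30
independent = combined_validity(0.50, 0.40, 0.0)    # methods do not correlate
print(f"overlapping={overlapping:.2f} independent={independent:.2f}")
```

With these placeholder values the combined coefficient is about 0.56 when the two methods correlate at 0.30 and about 0.64 when they are uncorrelated, so the less the second method overlaps with the cognitive test, the more it adds.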

In practice, this means: do not pick CCAT or SHL Verify as your single hiring filter. Pick one of them as the cognitive component, then combine with a structured behavioral interview and a work sample. See our highest-validity hiring loop writeup.

Where ClarityHire fits

ClarityHire does not ship a cognitive ability test. We focus on the work-sample side: coding assessments, live coding, structured behavioral scorecards, and integrity verification.

The pairing of CCAT or SHL Verify (cognitive) with ClarityHire (work sample + structured interview + integrity) is the configuration the research base most strongly supports for knowledge-worker hiring. Validity coefficients of the combined loop reach 0.60+ in the meta-analytic literature, materially higher than any single method alone.

How to evaluate any vendor's validity claims

Whether you are looking at Criteria, SHL, or any other vendor, ask:

  1. What sample is the coefficient from? Concurrent vs predictive design matters. A predictive design, which tests applicants before hire and follows their later performance, is the gold standard.
  2. What correction methods were applied? Corrected vs uncorrected coefficients can differ by 0.10–0.20. Both are legitimate; just know which one you're reading.
  3. What is the adverse impact ratio in your context? Vendor-published numbers are aggregate. Run your own analysis on your selection ratio.
  4. What is the local validation story? General validity is a strong default, but a custom local study is the only thing that proves the test works in your setting.

Both Criteria and SHL will support a customer-specific validation study. SHL's is more elaborate (and more expensive); Criteria's is lighter weight. A local study is worth doing before scaling either tool across an organization.

See also: our feature comparison and our pricing and ROI breakdown.
