How to Design Situational Judgment Tests: A Step-by-Step Process

ClarityHire Team (Editorial) · 8 min read

Why design your own SJT

Off-the-shelf situational judgment tests are generic. They measure judgment in a vacuum, not judgment in your context. A good SJT is rooted in the specific dilemmas your role actually faces. The candidates who excel at your company's version of "handling ambiguity" may not be the same people who ace a generic SJT.

Building your own takes work upfront. It pays back immediately: you surface the dimensions of judgment that matter to your business, and your assessment becomes a training tool for your own hiring managers.

This guide walks through the process of designing a situational judgment test from scratch.

Step 1: Job analysis and critical incidents

Start by asking your top performers: "What is the hardest judgment call you made in the last six months?" This is grounded research built on real incidents, not on the abstract job descriptions behind traditional hiring rubric templates.

Document 8–12 real dilemmas from your organization:

  • A customer escalation with conflicting stakeholder needs
  • A priority conflict between two legitimate asks
  • An ethical or compliance question with no clear rule
  • A resource constraint forcing a trade-off
  • Ambiguity that required a decision despite incomplete information
  • A situation where "following the process" conflicted with customer value
  • A delegation decision, or a delegation that failed
  • A mistake someone discovered and had to own

Record these as narratives. Who was involved? What was the constraint? What made it hard? How did your best performer think through it?

These become your dilemma scenarios. You are borrowing from real experience, not imagining hypotheticals.

Step 2: Develop response options

For each dilemma, brainstorm 5–6 plausible response options.

The trick: every option should be defensible to someone. You are not looking for one right answer. You are looking for a spectrum from "most effective for our context" to "less effective." This is what distinguishes SJTs from personality tests—there is no "type," only judgment.

Use these lenses to generate options:

  • Risk tolerance: Escalate immediately vs. investigate first
  • Speed vs. quality: Ship fast vs. ensure polish
  • Data vs. intuition: Gather information vs. decide based on experience
  • Process vs. pragmatism: Follow the documented path vs. bend rules for the outcome
  • Individual vs. team: Own the problem vs. involve stakeholders
  • Short-term vs. long-term: Solve today vs. invest in systemic fix

For example, in an incident response scenario for a software engineer, options might range from "page the manager immediately" to "investigate in isolation first." Both are defensible depending on severity and context.

Document why each option is included. What judgment pattern does it reveal?
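One way to keep this documentation consistent is a small record per option. The sketch below is illustrative only: the `ResponseOption` class, its field names, and the lens identifiers are hypothetical names derived from the lens list above, not part of any tool.

```python
from dataclasses import dataclass

# Lens identifiers mirror the list above; the names themselves are illustrative.
LENSES = {
    "risk_tolerance",
    "speed_vs_quality",
    "data_vs_intuition",
    "process_vs_pragmatism",
    "individual_vs_team",
    "short_vs_long_term",
}


@dataclass(frozen=True)
class ResponseOption:
    label: str      # e.g. "A"
    text: str       # what the candidate reads
    lens: str       # which trade-off the option probes
    rationale: str  # why it is included and what judgment pattern it reveals


def check_options(options: list[ResponseOption]) -> list[str]:
    """Return a list of problems; an empty list means the set looks complete."""
    problems = []
    if not 5 <= len(options) <= 6:
        problems.append(f"expected 5-6 options, got {len(options)}")
    labels = [o.label for o in options]
    if len(set(labels)) != len(labels):
        problems.append("duplicate option labels")
    for o in options:
        if o.lens not in LENSES:
            problems.append(f"option {o.label}: unknown lens {o.lens!r}")
        if not o.rationale.strip():
            problems.append(f"option {o.label}: missing rationale")
    return problems
```

Running `check_options` on each dilemma before moving to Step 3 catches options that were added without a stated reason.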

Step 3: Rank options with your top performers

Do not rank the options yourself. Ask your top three performers, independently, to rank each option from most to least effective in your context.

Collect rankings. Look for consensus:

  • Strong consensus (for example, all three rank the same option first): This reveals what your organization values.
  • Disagreement: This is interesting. It may mean the scenario is genuinely ambiguous (which is good, because real dilemmas are), or it may reveal subcultures in your organization (also interesting).

Example:

  • Performer A: B > D > A > C > E
  • Performer B: B > A > D > C > E
  • Performer C: D > B > A > C > E

Strong consensus on B in the top two. Clear consensus against E. Some debate between A and D.

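Combine the individual rankings into a single "master ranking," resolving close calls such as A versus D by discussion or by averaging positions. It reflects your organization's definition of good judgment for that scenario.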
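One simple way to turn the three individual rankings into a master ranking is to order options by their mean position. This is an assumption for illustration, since the process above does not prescribe an aggregation method; the `master_ranking` function name is hypothetical.

```python
from statistics import mean


def master_ranking(rankings: list[list[str]]) -> list[str]:
    """Order options by mean position across experts (1 = most effective)."""
    options = sorted(rankings[0])
    for r in rankings:
        if sorted(r) != options:
            raise ValueError("every expert must rank the same set of options")
    mean_rank = {
        opt: mean(r.index(opt) + 1 for r in rankings) for opt in options
    }
    # Ties in mean rank fall back to alphabetical order here; in practice
    # a tie is better resolved by discussion with the experts.
    return sorted(options, key=lambda opt: (mean_rank[opt], opt))


experts = [
    list("BDACE"),  # Performer A
    list("BADCE"),  # Performer B
    list("DBACE"),  # Performer C
]
print(master_ranking(experts))  # ['B', 'D', 'A', 'C', 'E']
```

With the example rankings, B averages 1.33, D averages 2.0, and A averages 2.67, so the A versus D debate resolves in favor of D under this method.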

Step 4: Draft the assessment instructions

Candidates will not understand the context the way you do. Write a one-paragraph setup that includes:

  • The role (e.g., "You are a senior engineer on-call")
  • The immediate constraint (e.g., "It is 2 AM and a production alert fires")
  • Why the decision is hard (e.g., "The issue is impacting a small subset of users but you cannot immediately identify the cause")
  • What the candidate should do (e.g., "Rank these response options from most to least effective")

Keep setup to 2–3 sentences. Too much context and candidates overthink. Too little and they are lost.

Step 5: Pilot with recent hires

Before using the assessment in hiring, give it to 3–5 recent hires or internal transfers. They understand your context but are close enough to the role transition to think hard about the questions.

Score their responses against your master ranking:

  • How consistent are their rankings with your top performers?
  • Did they misunderstand any scenario?
  • Were any options confusing?
  • Do the questions actually reveal different judgment patterns, or does everyone rank the options the same way?

Revise questions that fail this test. A question where everyone picks the same option (or splits randomly) is not measuring anything.
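The check for questions that measure nothing can be sketched as a tally of first choices per scenario. The function names and the data layout (scenario id mapped to a list of pilot rankings) are assumptions for illustration. With only 3–5 pilot participants, "no two participants agreed" is a rough proxy for a random split, not a statistical test.

```python
from collections import Counter


def top_choice_counts(pilot_rankings: list[list[str]]) -> Counter:
    """Count how often each option was ranked first in the pilot."""
    return Counter(r[0] for r in pilot_rankings)


def flag_scenarios(pilot: dict[str, list[list[str]]]) -> dict[str, str]:
    """Map scenario id -> reason, for scenarios that need a second look."""
    flags = {}
    for scenario_id, rankings in pilot.items():
        counts = top_choice_counts(rankings)
        if len(counts) == 1:
            # No variation at all: the question cannot separate candidates.
            flags[scenario_id] = "everyone picked the same top option"
        elif len(counts) == len(rankings) and len(rankings) > 2:
            # Crude proxy for a random split in a very small pilot group.
            flags[scenario_id] = "no two participants agreed on a top option"
    return flags
```

Flagged scenarios are candidates for revision, not automatic removal; a quick debrief with the pilot group usually shows whether the wording or the dilemma itself is the problem.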

Step 6: Set scoring criteria

Decide in advance how you will use the scores:

  • Most-effective (ME) scoring: Candidates earn points only if their top choice matches the expert ranking. Simple, binary, no subjectivity.
  • Distance scoring: Candidates earn points based on how close their ranking is to the expert ranking. More granular, rewards partial alignment.
  • Pattern matching: Score based on the shape of their choices (e.g., "preference for escalation" or "bias toward action"), not exact ranking.

Most-effective scoring is the most defensible and easiest to apply. "Did the candidate's first choice match the expert consensus?" Yes or no. Clear.

For distance scoring, take each option's position in the candidate's ranking and in the expert ranking, then sum the absolute differences across options. Lower is better: 0 means the two rankings are identical.
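A minimal sketch of distance scoring follows. The sum of absolute position differences is known as Spearman's footrule. The normalization to a 0–1 score is an added convenience, not something the process above requires, and the function names are hypothetical.

```python
def footrule_distance(candidate: list[str], expert: list[str]) -> int:
    """Sum of absolute differences between each option's two positions."""
    if sorted(candidate) != sorted(expert):
        raise ValueError("both rankings must contain the same options")
    expert_pos = {opt: i for i, opt in enumerate(expert)}
    return sum(abs(i - expert_pos[opt]) for i, opt in enumerate(candidate))


def distance_score(candidate: list[str], expert: list[str]) -> float:
    """1.0 = identical ranking, 0.0 = maximally distant (e.g. fully reversed)."""
    n = len(expert)
    if n < 2:
        return 1.0
    max_distance = (n * n) // 2  # largest possible footrule for n options
    return 1 - footrule_distance(candidate, expert) / max_distance


expert = list("BDACE")
print(footrule_distance(list("DBACE"), expert))  # 2: B and D swapped
print(footrule_distance(list("ECADB"), expert))  # 12: fully reversed
```

Normalizing makes scores comparable across scenarios with different numbers of options, which matters if some dilemmas have five options and others six.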

Decide the threshold: is 70% of ranked questions correct enough to pass? Or 85%? This should reflect your role requirements—a safety-critical role might demand higher consistency than a creative role.
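Applying a threshold with most-effective scoring could look like the sketch below. It assumes each scenario id maps to a ranking (a list of option labels, best first); the function names and the 0.70 default are illustrative, taken from the example figure above.

```python
def most_effective_score(
    candidate: dict[str, list[str]], master: dict[str, list[str]]
) -> float:
    """Fraction of scenarios where the candidate's top choice matches the master's."""
    if candidate.keys() != master.keys():
        raise ValueError("candidate must answer exactly the master's scenarios")
    hits = sum(candidate[s][0] == master[s][0] for s in master)
    return hits / len(master)


def passes(score: float, threshold: float = 0.70) -> bool:
    """Fix the threshold before scoring anyone, then apply it uniformly."""
    return score >= threshold


master = {"s1": list("BDACE"), "s2": list("ACBED"), "s3": list("DBACE")}
candidate = {"s1": list("BADCE"), "s2": list("CABED"), "s3": list("DBACE")}
score = most_effective_score(candidate, master)  # 2 of 3 top choices match
print(passes(score))  # False: 2/3 is below the 0.70 default
```

Raising the threshold for a safety-critical role is then a one-argument change, for example `passes(score, threshold=0.85)`.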

Step 7: Validate your assessment

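After 10–15 hires, run a first comparison (with a sample this small, treat the results as directional and repeat as more hires accumulate):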

  • Performance correlation: Do candidates who scored higher on the SJT perform better in the role (measured by manager rating, performance review, or tenure)?
  • Adverse impact: Are any demographic groups scoring significantly lower? If so, review scenarios for bias or cultural specificity.
  • Predictiveness: Can the SJT differentiate between strong performers and weak performers?

If the assessment does not correlate with performance, it is not measuring what matters. Iterate.
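The first two checks can be sketched with two small functions. Both are assumptions layered on the text above: Pearson correlation is one of several reasonable choices for the performance check, and the impact ratio (lowest group pass rate divided by the highest) is the commonly used "four-fifths" screening heuristic, swapped in here as a concrete stand-in for "scoring significantly lower." Function names are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between SJT scores and a performance measure."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


def impact_ratio(pass_rates: dict[str, float]) -> float:
    """Lowest group pass rate divided by the highest group pass rate."""
    highest = max(pass_rates.values())
    if highest == 0:
        raise ValueError("no group has a non-zero pass rate")
    return min(pass_rates.values()) / highest
```

An impact ratio below 0.8 is a conventional signal to review scenarios for bias. With 10–15 hires, both numbers are noisy, so they are prompts for a closer look rather than conclusions.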

Use hiring rubric discipline: document your scoring criteria and review a few assessments together as a team before using the assessment at scale. Calibration prevents drift. You can also layer this with structured interview questions to probe the examples behind candidates' SJT choices.

Common pitfalls to avoid

Scenario is too simple. "You see a bug in production. Do you: A) Report it, B) Fix it." Everyone picks the same answer. Make the scenario harder by adding a real constraint.

Response options are uneven in quality. If one option is obviously right, the question tests nothing. Ensure every option is defensible in at least some context.

Scenario is too specific to one person. "Your boss does X, which reminds you of your previous company's culture. How do you respond?" This is measuring personality, not judgment.

Ranking methodology is not clear. Candidates need to understand that "most effective" means "best for your organization in this context," not "most ethical" or "safest." Tell them explicitly.

You ask too many questions. A 10-question SJT takes 15–20 minutes and provides enough signal. A 50-question SJT takes an hour and fatigues candidates without adding precision.

Making the assessment fair

Situational judgment tests can introduce bias if scenarios reflect cultural assumptions or require knowledge of specific industries or regions.

Review scenarios for:

  • Language accessibility: No idioms, no cultural references, no jargon that requires insider knowledge
  • Equity in stakes: Do all candidates have experience with the dilemmas described, or do some candidates have an obvious advantage from privilege?
  • Representation: Do scenarios reflect diverse teams and roles, not just your current composition?

Do not shy away from complex scenarios. Just ensure the complexity is in the judgment call itself, not in cultural translation.

Moving to implementation

Once your SJT is designed and piloted, implement it in your hiring workflow. Common placement:

  • After resume screen. Early assessment to filter for judgment fit before behavioral or technical rounds.
  • Before interview. Complement the interview with objective, ranked assessment.
  • During interview. Use one scenario as a group discussion exercise, not a scored assessment.

Most hiring teams place the SJT after the resume screen and before interviews, as an efficient way to screen for judgment without consuming interview time.

When interpreting situational judgment test results, pair the score with interview context. A strong SJT result plus a weak interview signals that something is off (the candidate can theorize but not execute). A weak SJT result plus a strong interview signals that the assessment may not be measuring what your interviews measure, or what matters in the role.

Research on SJTs generally finds that well-designed tests predict job performance and show lower adverse impact than many alternatives, such as cognitive ability tests. Those results depend on the kind of disciplined design process described here.

ClarityHire's assessment platform supports custom SJT design with built-in scoring, reporting, and candidate experience tools. You can also design in a spreadsheet and grade manually if you prefer—the discipline of design matters more than the tool.
