Glossary of Testing Terminology

AI Proctored (Secure Browser)

Exam delivery through a secure browser with AI-based remote proctoring. The secure browser prevents access to other applications, while AI algorithms monitor the test-taker's environment for suspicious behavior.

Anchor Items

A set of Items that are shared across Forms to enable Equating.

Angoff/Modified Angoff

A method of Standard Setting. In the Angoff method, experts review each item on an exam form and estimate the probability of a minimally qualified candidate answering each item correctly. This data is analyzed, and where differences exist, discussion occurs. In Modified Angoff, empirical data is used as a “reality check” to better align SME ratings to actual test taker data. Ultimately, final ratings are averaged to calculate the cut score for each level.

API

A system integration method where tests are delivered via an API (Application Programming Interface), allowing secure, automated test data exchange between systems. This enables custom platforms to connect directly with the test delivery service.

Assessment

See Form

Beta Test [ Beta ]

An instance of an Exam composed of Pretest Items to be administered to Candidates for the purpose of gathering Result data which, in turn, will be analyzed to determine the performance of those items.

Beta Testing

The process of administering a Beta Test to Candidates.

Beuk Compromise

An add-on method of adjusting ratings in Angoff method Standard Settings utilizing additional SME estimates combined with delivery data.

Blueprint/Exam Blueprint [ BP ]

A collection of a content areas organized - often hierarchically - into Topics which describe the knowledge, skills, and abilities to be assessed by an Exam. Usually contains weighting by topic to indicates the distribution of Items needed to sample the blueprint. This informs requirements for Item Authoring.

The initial configuration of the blueprint along with its initial topics and their weights are generally derived from a Job Task Analysis.

Booking/Exam Booking

The flow that allows a Candidate to schedule, and where required, pay for an Exam. Booking is often subject to a check for Eligibility.

Candidate/Test Taker

An individual who may take, or has taken, an Exam.

Classical Test Theory (CTT)

A traditional psychometric approach that assumes a person's observed score is the sum of their true score and random error. CTT evaluates item quality using overall test performance.

Competency

A combination of knowledge, skills and/or abilities [ KSA ] that defines an industry, organizationally relevant capability, or similar. Competency is usually expressed in the language of an organization or profession.

Competency Testing

Testing used to make a determination of Minimum Competency, that is, is Candidate minimally Competent to practice in a specific field, perform a specific role, or operate specific equipment or technology, etc. These Exams often lead to a Credential.

This testing only makes a determination of minimum competency - pass or fail - and generally should make no claims or determinations beyond that.

Constraint

A parameter used in Test Assembly to limit and control the Items which are eligible for a given Form of an Exam.

Corrected item-Total Correlation (CITC)

The correlation between an item and the score based on the rest of items on a form, correcting for inflation of the ITC. See Item Discrimination.

Credential [ Certification/License/Badge ]

An indicator that a person is Competent in a given area of interest. Credentials are used to gate hiring or to enforce legal right to practice or operate. Qualification examples include ...

Performing a job or role
Operating equipment or technology
Licensing for practice in a specific field

Dichotomous

In response processing for Item interactions, the notion that the Outcome will evaluate to either correct or incorrect. In most cases this also means the item outcome value for score evaluates to either 1 or 0 points.

See Polytomous for more information.

Domain

In the Certiverse system, domains are used to represent the dimensions, that together, make up the knowledge, skills, and/or abilities [ KSAs ] needed to accomplish a Task. Domains serve as a bridge to an Exam Blueprint, allowing us to automate the initial configuration of Topics and their relative weights on the blueprint.

Eligibility

The notion that some requirement or prerequisite must be met before a Candidate is allowed to Book an Exam.

Enemy Items

At its simplest, Items that should not appear on the same Form of an Exam. Items may be classified as enemies for any number of reasons, but common reasons include:

- One item gives away, or hints at, the answer to another item.
- Items are too similar
- Items measure the same or very similar concepts

Equating

The statistical process of determining comparable Outcomes on different Forms of an Exam. See Anchor Items.

Exam

A collection of 1 or more equivalent Forms. In the Certiverse system, an exam serves as the candidate facing name for Eligibility and Booking purposes as well as for centralizing elements shared across forms such as start/end screens and Score Reports.

Exam Delivery [ Delivery ]

The process of administering an instance of a Form to a Candidate. The output of exam delivery is a Result.

Form

Minimally, a specific set of Items making up a version of the Exam. When multiple forms of an exam exist, they should be functionally equivalent instruments, i.e. any form should be equally valid when administered to any candidate.

In the Certiverse system a form defines the set of items plus additional configuration data required to administer or Deliver the form. This includes delivery parameters indicating how items are to be selected and ordered as well as Scoring rules.

Formative Testing

Testing with the goal of providing feedback to reinforce/practice learning and/or to identify weaknesses for remediation. See Summative Testing for comparison.

Item

Generally, a discrete and self-contained test question. Items are typically composed of three elements:

Stimulus - the question or prompt. Designed to elicit a Response from the Candidate.
Interaction - the way the candidate indicates their answer. One example - and by far the most common - is a multiple-choice interaction. Interactions often provide various configurations. For example, a multiple-choice interaction may be True/False, single response, or multiple response.
Response Processing - the rules that determine how a response to an item is to be evaluated and the values to output. Outcomes processing is necessarily coupled to the interaction type.

The three elements described above are mediated by item type templates in the Certiverse platform.

Item Analysis

A process that evaluates test Items by examining statistics like difficulty and discrimination to determine how well they perform in distinguishing between high and low scorers.

Item Authoring/Item Writing

In general, the process of producing Items for an Exam. In Certiverse, a highly automated process that allows SMEs to write, review, and edit content for exams.

Item Difficulty

A measure of how challenging an Item is for test-takers, typically calculated as the proportion of examinees who answered it correctly (a higher proportion indicates easier items). Also called “p-value,” proportion correct, item mean, and item facility.

Item Discrimination

A statistic, such as the item-total correlation (ITC), that indicates how well an Item differentiates between high-performing and low-performing test-takers. Higher discrimination means the item is better at distinguishing between these groups.

Item Facility

This name for item difficulty acknowledges that numerical scale for item difficulty is backwards (values near 1 are easy, values closer to zero are harder).

Item Mean

See item difficulty.

Item p-value

This term is used for item difficulty (proportion correct) and in hypothesis testing is is the probability value assuming the null hypothesis. In item analysis, we should chose another make like proportion correct, difficulty, or facility; see item difficulty.

Item Response Theory (IRT)

A modern psychometric framework that models the probability of a correct answer based on an examinee's ability and the Item's properties (difficulty, discrimination, and guessing).

Item-Total Correlation (ITC)

See Item Discrimination.

Job Task Analysis [ JTA ]

A Job Task Analysis is a methodology used to establish the Tasks and Domains to be assessed by an Exam. Typically, a JTA includes the following steps ...

Task Elicitation – establish the tasks performed by competent practitioners
Survey – a broad review to provide data to support tasks and their validity
Analysis - judicious filtering and calculations to identify critical tasks and related domains

JTAs are often customized to the job or role. In the case of a job, the domains may be referred to as KSAs but the more generic term Competencies is also widely used. JTA results are the evidence used to author the Exam Blueprint.

Job/Role Title

The title of the job or role intended for this credential.

Job/Role Description

A description of the job or role that the exam is intended to address. This description should clarify to subject matter experts what JTA task statements need to be written, help provide a context for the blueprint topics and weights, and help guide items writers and reviewers. Typically, this description includes the main job duties, expected background and experience, education, etc. It is recommended that this is a description of the minimally-qualified candidate. Many programs include “should be able to...” and “should not be expected to...” qualifications.

Key [ Correct Answer ]

The - or one of the - Response Configurations that, when selected by a Candidate, will evaluate to a positive Outcome. The term is most applicable to interactions where response processing is simple and Dichotomous. For example, a multiple-choice interaction where the key is a single choice (like option "a") or a single collection of choices (like option "b" AND option "d").

The notion of key can break down or become unclear where an interaction has Polytomous response processing. For example, take the following a multiple-response item:

Select all of the US States from the list of choices:

Massachusetts

Missoula

Missouri

Mississippi

Montana

There are 4 correct options (indicated with italics). Assume each correct option is worth 1 point while the single incorrect option is worth -4 points with a lower bound set for the entire interaction at 0 points (to prevent negative scores). It is therefore possible to achieve any integer value between 0 and 4 points.

So, what is the "key"? Arguably each correct option is a key or maybe part of the key.

A more precise way to think about this is that there is not a singular key. Rather, each Choice is a Response Opportunity. The final response submitted is the candidate's response configuration, which in turn is evaluated using response processing to generate an outcome.

For simplicity, in a case like the above, the key may be thought of as the most correct response configuration ... the response configuration that yields the most points. Even this breaks down where there is more than one response configuration that yields maximum points.

Learning Outcome/Learning Outcome Statement

See Task/Task Statement

Learning Objective

See Task/Task Statement

Live [ Form/Item ]

Live simply refers to the notion that the Outcomes from the Item or Form may be used in determination of Competency. This typically means that the content has followed a rigorous process, is fit for purpose, and has the supporting documentation necessary.

Live Proctored (Secure Browser, SEB)

Exam delivered through a secure browser with a live proctor monitoring in real-time via webcam. The secure browser ensures that the test-taker can only access the test environment, and the proctor intervenes in case of violations.

Live Proctored (Without Secure Browser, SEB)

Exam delivered with a live proctor who monitors the test-taker remotely, providing support and guidance throughout the testing process. While the test-taker has the freedom to access their environment, the proctor is available to ensure a smooth and efficient exam experience

Minimally Qualified Candidate

A representation of the theoretical person who's just barely qualified - Competent - by having just enough knowledge or skill to enter the job or role.

Operational Form

A Form of an Exam that is used to make decisions about test taker Competency including calculating Outcomes such as Scores and Pass/Fail determination.

Option [ Response Option/Choice ]

A discrete Response Opportunity associated with a multiple-choice (or similar) Item.

Option, Correct

A discrete Response Opportunity associated with a multiple-choice (or similar) Item, selection of which results in either a positive Outcome OR contributes to a positive outcome. May also be called a Key or a component thereof.

Option, Incorrect

Also known as a distracter as it "distracts" from the correct choice or choices, a discrete Response Opportunity associated with an Item, selection of which results in a negative Outcome.

Organization

In the Certiverse system, represents a real-world entity and acts as a logical container for content owned or administered by that entity. The organization sponsoring the Exam.

Outcome/Score

A value generated or assigned by evaluating a candidate's Response to an Item or set of item responses against a response or outcomes processing rule. Outcome values may be numeric or text depending on the outcome type.

Outcomes are typically classified as item-level or aggregate. Aggregate outcomes are based on sets of items that have been grouped into elements on the form.

Examples of outcomes provided by Certiverse Exam Delivery include ...

Item-level
- score - points achieved
Aggregate
- rawScore - sum of item-level points achieved for aggregated set of items
- percent - calculated percentage correct value
- scaled - mapped value using rawScore as input
- passFail - mapped value using cut score value as input

Pass/Fail

A type of Standard designed to separate Candidates into two categories, either competent (Pass) or not competent (Fail). Common in Competency Testing (certification, licensure).

Polytomous

In response processing for Item interactions, the notion that the Outcome may evaluate to more than correct or incorrect ... often partially correct. This also means the item outcome value for score may evaluate to values other than 1 or 0 points. See example interaction and outcome in Key).

See Dichotomous for more information.

Practice Effects

The tendency of test takers to demonstrate improved performance when Retaking an exam. Typically most notable where test takers are allowed to retake the same Form.

Pretesting/Pretest Items [ Beta Items ]

Pretesting is the notion of administering Items for the purpose of gathering statistical data on those items. For new testing projects, this is typically done en masse via a Beta Test. However, it is often necessary to develop new items for introduction into an existing testing project. This is accomplished by administering pretest items on Live Forms.

As we do not yet know the performance characteristics of these items, their scores will not contribute to any aggregate Outcomes generated for the form. However, from the perspective of the Candidates, it is highly desirable that pretest items are indistinguishable from live items. We want them to apply the same effort in answering these as if they were live items. For this reason, it is best practice to obfuscate pretest items.

Proportion Correct (PC)

See item difficulty.

Response

The Response Configuration of the Item Interaction submitted by the Candidate.

Response Configuration

Any of the possible final states in which an Item might be configured to indicate a Response. For example, a 4-option multiple-choice allowing a single response has 4 response configurations possible: "A", "B", "C", & "D"

However, a 4-option multiple-choice which requires the selection of 2 choices has 6 possible response configurations: "AB" or "AC" or "AD" or "BC" or "BD" or "CD".

With complex items, the number of possible response configurations can grow rapidly!

Response Opportunity

Any opportunity presented to a Candidate to provide a discrete response to an Item Interaction. In the case of interactions such as multiple choice, these response opportunities take the form of predefined Options or choices.

In the case of an interaction such as fill-in-the blank or essay, the response opportunity takes the form of alpha numeric string input. More complicated interactions may provide for other response opportunities such as X Y coordinates on an image or intersectional data on a matching interaction or arbitrarily complex data for a simulation.

Result

A result is the accumulated data from a Candidate sitting a single Form. Data captured includes but is not limited to ...

Candidate identifying data
Timing data (overall, time per item)
Item-level and aggregate Outcomes

Retake/Retest

Retesting/Retaking is the concept of allowing test takers to attempt an Exam more than once. There are often policies or rules around retesting to ensure test takers do not see the same Form when retesting to increase Security and to minimize Practice Effects.

Sampling Error

The discrepancy between the characteristics of a sample and the population from which it was drawn, due to the random nature of sampling. This error decreases with larger sample sizes.

Score

A numeric Outcome.

Score Report/Score Reporting/Outcomes Reporting

The set of Outcomes reported to an individual Candidate for the Form administered to that candidate. Often provides additional information to help the candidate interpret their performance.

[Exam] Security

All of the measures taken to secure the integrity of a testing program.

Standard/Cut Score

The value, typically in “points achieved,” that defines the lower bound for achieving a distinct level of proficiency on an Exam. In Competency Testing (certification, licensure) there is often simply one standard or cut score, often referred to as the passing score.

Standard Setting

For an Exam, the process of defining the distinct levels of achievement or proficiency and the corresponding Cut Scores for those levels. In Competency Testing (certification, licensure) there is often simply one standard or cut score, often referred to as the passing standard.

Standard Setting Form

The specific collection of Items used to perform Standard Setting. The standard setting form need not be a “real” form, i.e. this form may never be intended as a version for delivery. Often it useful to place all available, valid Items onto the standard setting form, making it easy to derive the cut score for each form from item ratings. If a “real” form is used, the choice of form may be arbitrary.

Standard Setting Panel

The group of Subject Matter Experts (SMEs) contributing to a Standard Setting.

Standard Setting Report

A component of Standard Setting, this deliverable contains:

- A description of contributing to the standard setting including their credentials
- Data collected: SME review data, discussion notes, any other data generated, and all decisions
- The final Standard/Cut Score recommendation along with any and all additional considerations that resulted in modifications to that cut score
- purpose that is consumable by all stakeholders, often including the public
- A description of the methodology and the specifics of its implementation
- A listing of the individuals

Subject Matter Expert (SME)

An individual qualified to provide content and/or input on elements of a testing program. Usually someone that has either held the job and performed the Tasks in question, or who has supervised the position, but may include other bona fide experts.

Summative Testing

Testing that assesses the test takers current knowledge, skills, or abilities. These Exams are often used to make serious decisions about the Candidate.

Task/Task Statement

In the Certiverse system, an explicit description of a task required to be Competent in the area (job, role, curriculum) being assessed. Other terms for this include ...

Outcome, Learning Outcome, Learning Outcome Statement
Objective, Learning Objective

Test

See Form, Exam

Test Assembly/Form Assembly

A mechanism for arriving at a set of Items making up a Form. This always addresses the Live items to include but may also address Pretest item selection. Test assembly should honor the Exam Blueprint, assuring the proper distribution content to be assessed.

Test assembly may be accomplished in a number of ways, with item selection being fully manual, semi-automated, or even fully automated - often referred to as automated test assembly or ATA.

Test Center

Traditional exam delivery at a physical test center, where a proctor oversees the test in person. The environment is controlled to ensure exam integrity. (it can be with SEB or without SEB)

Topic/Subtopic

In the Certiverse system, an element on the Exam Blueprint representing a content area to assess and its relative weight. Topics may be nested hierarchically to indicate increasing specificity. Topics and their weights are used to drive both Item Authoring and Test Assembly.

Unproctored

Unproctored exam delivery without a live or AI proctor. With security, this can include measures like locked-down browsers or exam access restrictions.

Unproctored Secure Browser

Unproctored exam delivery with the use of a secure browser that limits the test-taker's ability to navigate away from the test interface, ensuring basic security measures without human or Ai