Cambridge Cognition Examination (CAMCOG)

Evidence Reviewed as of before: 18-03-2009

Editor(s): Lisa Zeltzer, MSc OT; Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Cambridge Cognition Examination (CAMCOG) is the cognitive and self-contained part of the Cambridge Examination for Mental Disorders of the Elderly (CAMDEX). The CAMCOG is a standardized instrument used to measure the extent of dementia, and to assess the level of cognitive impairment. The measure assesses orientation, language, memory, praxis, attention, abstract thinking, perception and calculation (Roth, Tym, Mountjoy, Huppert, Hendrie, Verma, et al., 1986).

In-Depth Review

Purpose of the measure

Available versions

The CAMCOG was developed in 1986 by Roth, Tym, Mountjoy, Huppert, Hendrie, Verma and Goddard. In 1999, Roth, Huppert, Mountjoy and Tym revised it and published the CAMCOG-R. In 2000, de Koning, Dippel, van Kooten and Koudstaal shortened the 67 items of the CAMCOG to 25 items, producing the Rotterdam CAMCOG (R-CAMCOG).

Features of the measure

Items:

The CAMCOG consists of 67 items, including the 19 items from the Mini Mental State Examination (MMSE) (Folstein, Folstein, & McHugh, 1975). It is divided into 8 subscales: orientation, language (comprehension and expression), memory (remote, recent and learning), attention, praxis, calculation, abstraction and perception (de Koning, van Kooten, Dippel, van Harskamp, Grobbee, Kluft, et al., 1998).

The orientation subscale comprises 10 items taken from the MMSE. In the language subscale, comprehension is assessed through nonverbal and verbal responses to spoken and written questions, and expression is assessed through tests of naming, repetition, fluency and definitions. The memory subscale assesses remote memory (famous events and people), recent memory (news items, prime minister, etc.), and learning (the recall and recognition of non-verbal and pictorial information learned incidentally as well as intentionally). Attention is assessed by serial sevens and counting backwards from 20. Praxis is assessed by copying, drawing, and writing as well as carrying out instructions. In the calculation subscale, the client is asked to perform an addition and a subtraction question involving money. For the abstraction subscale, the client is asked about similarities between an apple and a banana, a shirt and a dress, a chair and a table, and a plant and an animal. In the perception subscale, the client is asked to identify photographs of famous people and familiar objects from unusual angles, in addition to the tactile recognition of coins (Huppert, Jorm, Brayne, Girling, Barkley, Beardsall et al., 1996).

The number of scored items for each subscale is as follows (de Koning et al., 1998; Huppert et al., 1996):

CAMCOG subscales	Number of scored items
Orientation	10
Language: comprehension	9
Language: expression	8
Memory: learning	3
Memory: recent	4
Memory: remote	6
Concentration	2
Praxis	8
Calculation	2
Perception	3
Abstraction	4
Total number of scored items	59

Items related to aphasia or upper extremity paresis may not be tested in all clients, depending on stroke severity.

Detailed administration guidelines are in the CAMCOG manual that can be obtained from the Cambridge University Department of Psychiatry.

Scoring:

The CAMCOG total score ranges from 0 to 107. Scores lower than 80 are considered indicative of dementia (de Koning et al., 1998; Roth et al., 1986). Among the 67 CAMCOG items, 39 are scored as ‘right’ or ‘wrong’; 11 are scored on a 3-point scale with ‘wrong’, ‘right to a certain degree’ or ‘completely right’ as response options; 9 items encompass questions or commands, and the score for each item is the sum of the correct answers; and finally 8 items are not scored. Five of the non-scored items are from the MMSE and are not included in the total score because they are assessed in more detail by other CAMCOG items. The remaining 3 items are optional during the examination (de Koning, Dippel, van Kooten, & Koudstaal, 2000; Huppert et al., 1996).
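
The item counts quoted above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the dictionary keys and variable names are our own labels, not terms taken from the CAMCOG manual.

```python
# Item counts by scoring format, as reported in the text above.
# Key names are illustrative labels, not official CAMCOG terminology.
ITEM_FORMATS = {
    "right_or_wrong": 39,
    "three_point_scale": 11,
    "sum_of_correct_answers": 9,
    "not_scored": 8,
}

total_items = sum(ITEM_FORMATS.values())
scored_items = total_items - ITEM_FORMATS["not_scored"]

# The 8 non-scored items are 5 MMSE items plus 3 optional items.
assert ITEM_FORMATS["not_scored"] == 5 + 3

print(total_items)   # 67
print(scored_items)  # 59
```

The 59 scored items obtained here match the total reported in the table of scored items per subscale.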

The maximum score per subscale is as follows (Huppert et al., 1996):

CAMCOG subscales	Maximum score
Orientation	10
Language: comprehension	9
Language: expression	21
Memory: learning	17
Memory: recent	4
Memory: remote	6
Concentration	4
Praxis	12
Calculation	5
Perception	11
Abstraction	8
Maximum total score	107
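
As a rough illustration of how these maxima combine with the cut-off of 80 mentioned earlier, the sketch below sums subscale scores and flags totals under the cut-off. The function names, dictionary keys and range checks are assumptions made for this example; they are not part of the CAMCOG manual, and the sketch is no substitute for its administration and scoring guidelines.

```python
# Maximum score per subscale (Huppert et al., 1996), as listed above.
# Key names are illustrative labels chosen for this sketch.
SUBSCALE_MAX = {
    "orientation": 10,
    "language_comprehension": 9,
    "language_expression": 21,
    "memory_learning": 17,
    "memory_recent": 4,
    "memory_remote": 6,
    "concentration": 4,
    "praxis": 12,
    "calculation": 5,
    "perception": 11,
    "abstraction": 8,
}

MAX_TOTAL = sum(SUBSCALE_MAX.values())
DEMENTIA_CUTOFF = 80  # scores lower than 80 are considered indicative of dementia


def camcog_total(subscale_scores):
    """Sum subscale scores after checking each one against its maximum."""
    if set(subscale_scores) != set(SUBSCALE_MAX):
        raise ValueError("expected exactly one score for every subscale")
    for name, score in subscale_scores.items():
        if not 0 <= score <= SUBSCALE_MAX[name]:
            raise ValueError(f"{name} score {score} is out of range")
    return sum(subscale_scores.values())


def below_cutoff(total, cutoff=DEMENTIA_CUTOFF):
    """Return True when the total score falls below the cut-off."""
    return total < cutoff


print(MAX_TOTAL)                   # 107
print(camcog_total(SUBSCALE_MAX))  # 107
print(below_cutoff(79))            # True
print(below_cutoff(80))            # False
```

The comparison is strict because the text describes scores lower than 80, not scores of 80 or lower, as indicative of dementia.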

Time:

The CAMCOG takes 20 to 30 minutes to administer and the R-CAMCOG takes 10 to 15 minutes to administer (de Koning et al., 1998; de Koning et al., 2000; Huppert et al., 1996).

Subscales:

The CAMCOG is comprised of 8 subscales:

Orientation
Language: subdivided into comprehension and expression
Memory: subdivided into remote, recent and learning memory
Attention
Praxis
Calculation
Abstraction
Perception

Equipment:

Apart from a pencil, the CAMCOG requires specialized test materials, which are enclosed with its manual. The manual can be purchased from the Cambridge University Department of Psychiatry.

Alternative forms of the CAMCOG

Revised CAMCOG (CAMCOG-R): Published in 1999 by Roth, Huppert, Mountjoy and Tym, the CAMCOG-R improved the ability of the measure to detect certain types of dementia and to make clinical diagnoses based on the ICD-10 and DSM-IV. This version includes updated items from the remote memory subscale and the addition of items to assess executive function (Leeds, Meara, Woods & Hobson, 2001; Roth, Huppert, Mountjoy & Tym, 1999).
Rotterdam CAMCOG (R-CAMCOG): Published in 2000, the R-CAMCOG is a shortened version of the CAMCOG with 25 items. It takes 10 to 15 minutes to administer and is as accurate as the CAMCOG in screening for post-stroke dementia (de Koning et al., 2000).
General Practitioner Assessment of Cognition (GPCOG): Published in 2002 to be used in primary care settings, the GPCOG contains 9 cognitive and 6 informant items that were derived from the Cambridge Cognitive Examination, the Psychogeriatric Assessment Scale (Jorm, Mackinnon, Henderson, Scott, Christensen, Korten et al., 1995) and the Instrumental Activities of Daily Living Scale (Lawton & Brody, 1969). The GPCOG takes 4 to 5 minutes to administer and appears to have a diagnostic accuracy similar to the Mini-Mental State Examination (Folstein, Folstein, & McHugh, 1975) in detecting dementia (Brodaty, Pond, Kemp, Luscombe, Harding, Berman et al., 2002).

Client suitability

Can be used with:

Clients with stroke
Clients with different types of dementia

Should not be used with:

The CAMCOG should not be used with clients with severe cognitive impairment.
Items related to aphasia and upper extremity paresis might not be tested on all clients, and appropriate use depends on stroke severity.

In what languages is the measure available?

English and Dutch (de Koning et al., 2000).

Summary

What does the tool measure?	The CAMCOG is a standardized instrument for diagnosis and grading of dementia.
What types of clients can the tool be used for?	The CAMCOG can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	The CAMCOG takes 20 to 30 minutes to administer.
Versions	Revised CAMCOG (CAMCOG-R); Rotterdam-CAMCOG (R-CAMCOG); General Practitioner Assessment of Cognition (GPCOG)
Other Languages	English; Dutch
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the CAMCOG in clients with stroke.
Validity	Content: No studies have examined the content validity of the CAMCOG in clients with stroke. One study examined the content validity of the R-CAMCOG by reporting the steps for generating the shortened version of the CAMCOG.
	Criterion (concurrent): No studies have examined the concurrent validity of the CAMCOG.
	Criterion (predictive): Six studies examined the predictive validity of the CAMCOG and reported that the CAMCOG can be predicted by age, the R-CAMCOG, the Mini-Mental State Examination and cognitive and emotional impairments. Additionally, the CAMCOG was an excellent predictor of dementia 3 to 9 months post-stroke. However, the CAMCOG was not able to predict quality of life in clients with stroke and is not predicted by the Functional Independence Measure.
	Construct (convergent): One study examined the convergent validity of the CAMCOG in clients with stroke and reported excellent correlations between the CAMCOG and both the R-CAMCOG and the Mini-Mental State Examination shortly after stroke and at 1 year post-stroke. Correlations between the CAMCOG and the Functional Independence Measure range from adequate after stroke to poor at 1 year post-stroke. One study examined the convergent validity of the CAMCOG-R and, using Pearson correlation, reported excellent correlations between the CAMCOG-R and both the Raven Test and the Weigl Test, and poor correlations between the CAMCOG-R and both the Geriatric Depression Scale and the Barthel Index.
	Construct (known groups): Two studies using Student's t-test examined the known groups validity of the CAMCOG and reported that the CAMCOG is able to distinguish between clients with and without dementia, as well as between levels of aphasia severity, in clients with stroke.
Floor and ceiling effects	One study examined the floor and ceiling effects of the CAMCOG in clients with stroke and reported that 14 items showed ceiling effects but none showed floor effects.
Does the tool detect change in patients?	No studies have examined the responsiveness of the CAMCOG in clients with stroke. One study examined the responsiveness of the CAMCOG-R and reported that score changes at follow-up were all statistically significant (p<0.01).
Acceptability	Items related to aphasia and upper extremity paresis might not be tested on all clients, depending on stroke severity.
Feasibility	The instructions for administration and coding must be followed closely (Ruchinskas and Curyto, 2003).
How to obtain the tool?	The CAMCOG can be obtained by purchasing the entire CAMDEX from the Cambridge University Department of Psychiatry

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Cambridge Cognition Examination (CAMCOG) in individuals with stroke. We identified 6 studies on the CAMCOG, 1 on the CAMCOG-R and 1 on the R-CAMCOG.

Floor/Ceiling Effects

de Koning, Dippel, van Kooten and Koudstaal (2000) analyzed the floor and ceiling effects of the CAMCOG in 300 clients with stroke. A ceiling effect, defined as more than 20% of participants achieving the maximum score on an item, was found in 2 out of 10 orientation items, 8 out of 17 language items, 2 out of 13 memory items, 1 out of 8 praxis items, and 1 out of 3 perception items. No floor effect was observed in the CAMCOG.
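
The 20% criterion described above can be expressed as a simple per-item check. This is a minimal sketch under stated assumptions: the function name, its parameters and the sample score lists are hypothetical and do not reproduce the authors' analysis or data. Only the per-subscale counts are taken from the text.

```python
def has_ceiling_effect(item_scores, max_score, threshold=0.20):
    """Flag an item when more than `threshold` of participants score the maximum.

    `item_scores` holds one score per participant for a single item.
    The function name and signature are hypothetical, for illustration only.
    """
    if not item_scores:
        raise ValueError("item_scores must not be empty")
    at_max = sum(1 for score in item_scores if score == max_score)
    return at_max / len(item_scores) > threshold


# Made-up scores for a right/wrong item (maximum score 1), ten participants.
print(has_ceiling_effect([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], max_score=1))  # True
print(has_ceiling_effect([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], max_score=1))  # False

# Number of items with a ceiling effect per subscale, as reported above.
CEILING_ITEMS = {
    "orientation": 2,
    "language": 8,
    "memory": 2,
    "praxis": 1,
    "perception": 1,
}
print(sum(CEILING_ITEMS.values()))  # 14
```

The second call returns False because exactly 20% of participants at the maximum does not exceed the threshold. The per-subscale counts add up to the 14 items with ceiling effects that were removed when the R-CAMCOG was constructed.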

Reliability

No studies have examined the reliability of the CAMCOG in clients with stroke.

Validity

Content:

No studies have examined the content validity of the CAMCOG in clients with stroke.

de Koning et al. (2000) analyzed CAMCOG scores from 300 clients with stroke and reduced the 59 items of the CAMCOG to the 25 items of the R-CAMCOG. Initially, item reduction was performed by removing 14 items with ceiling effects on the CAMCOG. Next, the language, attention, praxis, and calculation subscales were eliminated due to their low diagnostic accuracy. Finally, items with a very low or very high inter-item correlation were removed.

Criterion:

Concurrent:
No studies have examined the concurrent validity of the CAMCOG in clients with stroke.

Predictive:
Kwa, Limburg, Voogel, Teunisse, Derix and Hijdra (1996a) examined whether age, educational level, side and volume of the infarct, aphasia severity, and motor function predicted CAMCOG scores at 3 months after stroke in 129 clients. A cut-off of 80 was used to discriminate between normal and abnormal cognitive function. Based on a regression analysis that included the above-mentioned variables, age appeared to be the best predictor of CAMCOG scores 3 months post-stroke.
Note: The timeline for the baseline measurements was not reported in the study.

Kwa, Limburg and de Haan (1996b) verified the ability of the CAMCOG, the Rankin Scale (Rankin, 1957), the Barthel Index (Mahoney & Barthel, 1965), the Motricity Index (Collin & Wade, 1990), aphasia severity, age, educational level, and volume and side of the infarct to predict quality of life in 97 clients with stroke. Linear regression analysis indicated that quality of life was best predicted by the Rankin Scale, volume of infarct and aphasia severity.
Note: The timeline for the measurements was not reported in the study.

de Koning, van Kooten, Dippel, van Harskamp, Grobbee, Kluft, et al. (1998) analyzed the ability of the CAMCOG and the Mini-Mental State Examination (MMSE – Folstein, Folstein, & McHugh, 1975), measured shortly after stroke, to predict dementia measured 3 to 9 months later in 300 clients with stroke. Predictive validity was calculated using c-statistics to estimate the area under the Receiver Operating Characteristic (ROC) curve (AUC). The ability to predict dementia after stroke was considered excellent for both the CAMCOG (AUC = 0.95) and the MMSE (AUC = 0.90). These results suggest that the percentage of patients correctly classified according to their dementia level at 3 to 9 months post-stroke is only slightly higher when using the CAMCOG than when using the MMSE.
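The c-statistic reported above can be read as the probability that a randomly chosen client without dementia scores higher on the measure than a randomly chosen client with dementia. The following is a minimal sketch of that calculation in Python. The function name and the scores are hypothetical and invented for illustration; they are not data from de Koning et al. (1998).

```python
from itertools import product


def c_statistic(scores_with_condition, scores_without_condition):
    """Estimate the area under the ROC curve (AUC) by pairwise comparison.

    Assumes that lower scores indicate the condition, so a pair counts as
    correctly ordered when the person without the condition scores higher.
    Tied pairs count as one half.
    """
    pairs = list(product(scores_with_condition, scores_without_condition))
    correctly_ordered = 0.0
    for with_score, without_score in pairs:
        if without_score > with_score:
            correctly_ordered += 1.0
        elif without_score == with_score:
            correctly_ordered += 0.5
    return correctly_ordered / len(pairs)


# Hypothetical scores for illustration only (not study data).
dementia_scores = [55, 62, 70, 78]
no_dementia_scores = [74, 85, 90, 96]

auc = c_statistic(dementia_scores, no_dementia_scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.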

de Koning et al. (2000) examined whether the CAMCOG and the R-CAMCOG, measured at hospital admission, predicted dementia at 3 to 9 months post-stroke in 300 clients. Predictive validity, as calculated using c-statistics to estimate the area under the Receiver Operating Characteristic (ROC) curve, was excellent for both the CAMCOG (AUC = 0.95) and the R-CAMCOG (AUC = 0.95). These results suggest that the percentage of patients correctly classified according to their dementia level at 3 to 9 months post-stroke is the same when using the CAMCOG and the R-CAMCOG. Additionally, when using a cut-off of 77 for the CAMCOG and 33 for the R-CAMCOG, both measures showed a sensitivity of 91%, and the specificity was 88% and 90%, respectively.
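To make the cut-off figures concrete, the sketch below shows how sensitivity and specificity follow from applying a cut-off to a set of scores. It assumes that a score at or below the cut-off counts as a positive screen; the source does not state whether the boundary score itself is included, so that choice is an assumption. The scores and diagnoses are hypothetical and are not data from de Koning et al. (2000).

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Return (sensitivity, specificity) for a screening cut-off.

    Assumption: a score at or below `cutoff` is treated as screen-positive.
    """
    true_pos = false_neg = true_neg = false_pos = 0
    for score, condition in zip(scores, has_condition):
        screen_positive = score <= cutoff
        if condition and screen_positive:
            true_pos += 1
        elif condition:
            false_neg += 1
        elif screen_positive:
            false_pos += 1
        else:
            true_neg += 1
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical scores and diagnoses for illustration only (not study data).
scores = [60, 70, 76, 80, 75, 82, 90, 95, 88, 79]
has_dementia = [True, True, True, True, False, False, False, False, False, False]

sens, spec = sensitivity_specificity(scores, has_dementia, cutoff=77)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cut-off generally increases sensitivity at the expense of specificity, which is why a cut-off is reported together with both values.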

van Heugten, Rasquin, Winkens, Beusmans, and Verhey (2007) estimated the ability of a checklist of cognitive and emotional impairments measured 6 months post-stroke to predict the CAMCOG and the Mini-Mental State Examination (MMSE – Folstein, Folstein, & McHugh, 1975) scores at 12 months in 69 clients. Regression analysis showed that cognitive and emotional impairments explained 31% of the variance on the MMSE and 22% of the variance on the CAMCOG. These results suggest that cognitive and emotional impairments were able to predict the scores of both measures.

Winkel-Witlox, Post, Visser-Meily, and Lindeman (2008) analyzed the ability of the R-CAMCOG, the Mini-Mental State Examination (MMSE – Folstein, Folstein, & McHugh, 1975) and the Functional Independence Measure (FIM – Keith, Granger, Hamilton, & Sherwin, 1987) to predict the CAMCOG in 169 clients. All four outcome measures were collected shortly after stroke and at 1 year post-stroke. Regression analysis showed that shortly after stroke the R-CAMCOG explained 83% of the variance on the CAMCOG, the MMSE explained 53% and the FIM 11%. At 1 year post-stroke the R-CAMCOG explained 82% of the variance on the CAMCOG, the MMSE explained 57% and the FIM only 4%. These results suggest that the R-CAMCOG is the best predictor of the CAMCOG among these independent variables.
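"Variance explained" in a regression is the coefficient of determination (R squared): the share of the variation in the outcome that the fitted line accounts for. The sketch below computes it for a simple regression with a single predictor, which is an assumption made for illustration; the source does not describe how the regression models were specified. The scores are hypothetical and are not data from Winkel-Witlox et al. (2008).

```python
def r_squared(predictor, outcome):
    """Proportion of outcome variance explained by a one-predictor least-squares line."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(outcome) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variation left over after fitting the line.
    ss_residual = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(predictor, outcome)
    )
    # Total variation of the outcome around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in outcome)
    return 1 - ss_residual / ss_total


# Hypothetical short-form and full-form scores for illustration only.
short_form_scores = [20, 25, 30, 35, 40]
full_form_scores = [60, 68, 81, 88, 100]

explained = r_squared(short_form_scores, full_form_scores)
print(f"variance explained = {explained:.1%}")
```

With a single predictor, this value equals the square of the Pearson correlation between the two measures.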

Construct:

Convergent/Discriminant:
Winkel-Witlox et al. (2008) examined the convergent validity of the CAMCOG by comparing it to the R-CAMCOG, the Mini-Mental State Examination (MMSE – Folstein, Folstein, & McHugh, 1975) and the Functional Independence Measure (FIM – Keith, Granger, Hamilton, & Sherwin, 1987) in 169 clients with stroke. Shortly after stroke and at 1 year post-stroke, correlations between the CAMCOG and the R-CAMCOG and the MMSE were all excellent (rho1 = 0.92; 0.66, rho2 = 0.92; 0.69, respectively). The correlation between the CAMCOG and the FIM was adequate shortly after stroke (rho1 = 0.35) and poor after 1 year (rho2 = 0.27).

Leeds, Meara, Woods and Hobson (2001) analyzed the construct validity of the CAMCOG-R by comparing it to the Raven Test (Raven, 1982), the Weigl Test (Grewal, Haward, & Davies, 1986), the Geriatric Depression Scale (Sheikh & Yesavage, 1986) and the Barthel Index (Mahoney & Barthel, 1965) in 83 clients with stroke. Pearson correlations were excellent between the CAMCOG-R and the Raven Test (r = 0.75) and the Weigl Test (r = 0.70). Correlations between the CAMCOG-R and the Geriatric Depression Scale (r = -0.30) and the Barthel Index (r = 0.20) were poor.

Known groups:
de Koning et al. (1998) analyzed whether the CAMCOG is able to distinguish individuals with dementia from those without dementia in 300 clients with stroke. Known groups validity, as calculated using Student's t-test, showed that the CAMCOG was able to discriminate clients with dementia from those without dementia. These results demonstrated that clients with dementia have significantly lower scores on the CAMCOG.

Kwa et al. (1996a) verified the ability of the CAMCOG to discriminate between clients without aphasia and those with severe aphasia in 129 clients with stroke. Known groups validity, as calculated using Student's t-test, showed that the CAMCOG was able to differentiate between levels of aphasia severity.

Responsiveness

No studies have examined the responsiveness of the CAMCOG in clients with stroke.

Leeds et al. (2001) examined the responsiveness of the CAMCOG-R in 83 clients with stroke. Participants were assessed at baseline and 63 days later. At follow-up, changes in CAMCOG-R scores were all statistically significant (p<0.01). These results suggest that the CAMCOG-R appears sensitive to change in the cognitive status of clients with stroke.

References

Brodaty, H., Pond, D., Kemp, N.M., Luscombe, G., Harding, L., Berman, K. et al. (2002). The GPCOG: A new screening test for dementia designed for general practice. Journal of the American Geriatrics Society, 50, 530-534.
Collin, C. & Wade, D. (1990). Assessing motor impairment after stroke: A pilot reliability study. J Neurol Neurosurg Psychiatry, 53, 576-579.
de Koning, I., Dippel, D.W.J., van Kooten, F. & Koudstaal, P.J. (2000). A short screening instrument for poststroke dementia: The R-CAMCOG. Stroke, 31, 1502-1508.
de Koning, I., van Kooten, F., Dippel, D.W.J., van Harskamp, F., Grobbee, D.E., Kluft, C. & Koudstaal, P.J. (1998). The CAMCOG: A useful screening instrument for dementia in stroke patients. Stroke, 29, 2080-2086.
Folstein, M.F., Folstein, S. E. & McHugh, P. R. (1975). “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res, 12(3), 189-198.
Grewal, B., Haward, L. & Davies, I. (1986). Color and form stimulus values in a test of dementia. IRCS Med Sci, 14, 693-694.
Huppert, F.A., Jorm, A.F., Brayne, C., Girling, D.M., Barkley, C., Beardsall, L., et al. (1996). Psychometric properties of the CAMCOG and its efficacy in the diagnosis of dementia. Aging, Neuropsychology, and Cognition, 3, 201-214.
Jorm, A.F., Mackinnon, A.J., Henderson, A.S., Scott, H., Christensen, H., Korten, A.E., et al. (1995). The Psychogeriatric Assessment Scales: A multidimensional alternative to categorical diagnoses of dementia and depression in the elderly. Psychol Med, 25, 447-460.
Keith, R.A., Granger, C.V., Hamilton, B.B., & Sherwin, F.S. (1987). The functional independence measure: A new tool for rehabilitation. Adv Clin Rehabil, 1, 6-18.
Kwa, V.I.H., Limburg, M. & de Haan, R.J. (1996b). The role of cognitive impairment in the quality of life after ischaemic stroke. J Neurol, 243, 599-604.
Kwa, V.I.H., Limburg, M., Voogel, A.J., Teunisse, S., Derix, M.M.A. & Hijdra, A. (1996a). Feasibility of cognitive screening of patients with ischaemic stroke using the CAMCOG: a hospital based study. J Neurol, 243, 405-409.
Lawton, M.P. & Brody, E.M. (1969). Assessment of older people: Self-maintaining and instrumental activities of daily living. Gerontologist, 9, 179-186.
Leeds, L., Meara, R.J., Woods, R. & Hobson, J.P. (2001). A comparison of the new executive functioning domains of the CAMCOG-R with existing tests of executive function in elderly stroke survivors. Age and Ageing, 30, 251-254.
Mahoney, F. & Barthel, D. (1965). Functional evaluation: The Barthel Index. MD State J, 14, 61-65.
Rankin, J. (1957). Cerebral vascular accidents in patients over the age of 60. Scott Med J, 2, 200-215.
Raven, J.C. (1982). Revised manual for Raven’s Coloured Progressive Matrices. Windsor, UK: NFER-Nelson.
Roth, M., Huppert, F., Mountjoy, C., & Tym, E. (1999). The Cambridge Examination for Mental Disorders of the Elderly – Revised. Cambridge: Cambridge University Press.
Roth, M., Tym, E., Mountjoy, C., Huppert, F.A., Hendrie, H., Verma, S. et al. (1986). CAMDEX: A standardized instrument for the diagnosis of mental disorder in the elderly with special reference to the early detection of dementia. British Journal of Psychiatry, 149, 698-709.
Ruchinskas, R.A. & Curyto, K. (2003). Cognitive screening in geriatric rehabilitation. Rehabilitation Psychology, 48(1), 14-22.
Sheikh, J.A. & Yesavage, J.A. (1986). Geriatric depression scale (GDS): Recent findings and development of a shorter version. Clinical Gerontologist, 5, 165-172.
Winkel-Witlox, A.C.M.Te, Post, M.W.M., Visser-Meily, J.M.A., & Lindeman, E. (2008). Efficient screening of cognitive dysfunction in stroke patients: Comparison between the CAMCOG and the R-CAMCOG, Mini-Mental State Examination and Functional Independence Measure-cognition score. Disability and Rehabilitation, 30(18), 1386-1391.
van Heugten, C., Rasquin, S., Winkens, I., Beusmans, G., & Verhey, F. (2007). Checklist for cognitive and emotional consequences following stroke (CLCE-24): Development, usability and quality of the self-report version. Clinical Neurology and Neurosurgery, 109, 257-262.

See the measure

How to obtain the CAMCOG

The CAMCOG can be obtained by purchasing the entire CAMDEX from the Cambridge University Department of Psychiatry.

Clock Drawing Test (CDT)

Evidence Reviewed as of before: 19-08-2008

Author(s)*: Lisa Zeltzer, MSc OT; Anita Menon, MSc

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The CDT is used to quickly assess visuospatial and praxis abilities, and may determine the presence of both attention and executive dysfunctions (Adunsky, Fleissig, Levenkrohn, Arad, & Nov, 2002; Suhr, Grace, Allen, Nadler, & McKenna, 1998; McDowell & Newell, 1996).

The CDT may be used in addition to other quick screening tests such as the Mini-Mental State Examination (MMSE), and the Functional Independence Measure (FIM).

In-Depth Review

Purpose of the measure

The CDT may be used in addition to other quick screening tests such as the Mini-Mental State Examination (MMSE), and the Functional Independence Measure (FIM).

Available versions

The CDT is a simple task completion test in its most basic form. There are several variations to the CDT:

Verbal command:

Free drawn clock:
The individual is given a blank sheet of paper and asked first to draw the face of a clock, place the numbers on the clock, and then draw the hands to indicate a given time. To successfully complete this task, the patient must first draw the contour of the clock, then place the numbers 1 through 12 inside, and finally indicate the correct time by drawing in the hands of the clock.
Pre-drawn clock:
Alternatively, some clinicians prefer to provide the individual with a pre-drawn circle and the patient is only required to place the numbers and the hands on the face of the clock. They argue that the patient’s ability to fill in the numbers may be adversely affected if the contour is poorly drawn. In this task, if an individual draws a completely normal clock, it is a fast indication that a number of functions are intact. However, a markedly abnormal clock is an important indication that the individual may have a cognitive deficit, warranting further investigation.

Regardless of which type is used (free drawn or pre-drawn), the verbal command CDT can simultaneously assess a patient’s language function (verbal comprehension); memory function (recall of a visual engram, short-term storage, and recall of time setting instructions); and executive function. The verbal command variation of the CDT is highly sensitive for temporal lobe dysfunction (due to its heavy involvement in both memory and language processes) and frontal lobe dysfunction (due to its mediation of executive planning) (Shah, 2002).

Copy command:

The individual is given a fully drawn clock with a certain time pre-marked and is asked to replicate the drawing as closely as possible. The successful completion of the copy command requires less use of language and memory functions but requires greater reliance on visuospatial and perceptual processes.

Copy command clock

Clock reading test:
A modified version of the copy command CDT simply asks the patient to read aloud the indicated time on a clock drawn by the examiner. The copy command clock-drawing and clock reading tests are good for assessing parietal lobe lesions such as those that may result in hemineglect. It is important to do both the verbal command and the copy command tests for every patient as a patient with a temporal lobe lesion may copy a pre-drawn clock adequately, whereas their clock drawn to verbal command may show poor number spacing and incorrect time setting. Conversely, a patient with a parietal lobe lesion may draw an adequate clock to verbal command, while their clock drawing with the copy command may show obvious signs of neglect.

Time-Setting Instructions:

The most common setting chosen by clinicians is “3 O’clock” (Freedman, Leach, Kaplan, Winocur, Shulman, & Delis, 1994). Although this setting adequately assesses comprehension and motor execution, it does not indicate the presence of any left neglect the patient may have because it does not require the left half of the clock to be used at all. The time setting “10 after 11” is an ideal setting (Kaplan, 1988). It forces the patient to attend to the whole clock and requires the recoding of the command “10” to the number “2” on the clock. It also has the added advantage of uncovering any stimulus-bound errors that the patient may make. For example, the presence of the number “10” on the clock may trap some patients and prevent the recoding of the command “10” into the number “2.” Instead of drawing the minute hand towards the number “2” on the clock to indicate “10 after,” patients prone to stimulus-bound errors will fixate and draw the minute hand toward the number “10” on the clock.
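The recoding step described above can be written out as simple arithmetic: the minute hand points to the numeral equal to the minutes divided by 5, so "10 after" maps to the numeral 2. The sketch below is illustrative only; the function names are hypothetical, the left half of the clock face is taken to be the numerals 7 through 11, and the small drift of the hour hand between numerals is ignored.

```python
LEFT_HALF_NUMERALS = {7, 8, 9, 10, 11}  # assumption: 12 and 6 lie on the midline


def hand_targets(hour, minutes_past):
    """Return the numerals (hour hand, minute hand) for a time setting.

    Assumes `minutes_past` is a multiple of 5 and ignores hour-hand drift.
    """
    if not 0 <= minutes_past < 60 or minutes_past % 5 != 0:
        raise ValueError("minutes_past must be a multiple of 5 between 0 and 55")
    hour_numeral = 12 if hour % 12 == 0 else hour % 12
    minute_numeral = 12 if minutes_past == 0 else minutes_past // 5
    return hour_numeral, minute_numeral


def uses_left_half(hour, minutes_past):
    """True if at least one hand must point into the left half of the clock."""
    return any(n in LEFT_HALF_NUMERALS for n in hand_targets(hour, minutes_past))


print(hand_targets(3, 0), uses_left_half(3, 0))      # "3 o'clock"
print(hand_targets(11, 10), uses_left_half(11, 10))  # "10 after 11"
```

A stimulus-bound error on "10 after 11" corresponds to drawing the minute hand toward the numeral 10 rather than the numeral 2 that `hand_targets` returns.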

Features of the measure

Scoring:

There are a number of different ways to score the CDT. In general, the scores are used to evaluate any errors or distortions such as neglecting to include numbers, putting numbers in the wrong place, or having incorrect spacing (McDowell & Newell, 1996). Scoring systems may be simple or complex, quantitative or qualitative in nature. As a quick preliminary screening tool to simply detect the presence or absence of cognitive impairment, you may wish to use a simple quantitative method (Lorentz et al., 2002). However, if a more complex assessment is required, a qualitative scoring system would be more telling.

Different scoring methods have been found to be better suited for different subject groups (Richardson & Glass, 2002; Heinik, Solomesh, & Berkman, 2004). In patients with stroke, no single standardized method of scoring exists. Suhr, Grace, Allen, Nadler, and McKenna (1998) examined the utility of the CDT in localizing lesions in 76 patients with stroke and 71 controls. Six scoring systems were used to assess clock drawings (Freedman et al., 1994; Ishiai, Sugishita, Ichikawa, Gono, & Watabiki, 1993; Mendez, Ala, & Underwood, 1992; Rouleau, Salmon, Butters, Kennedy, & McGuire, 1992; Sunderland et al., 1989; Tuokko, Hadjistavropoulos, Miller, & Beattie, 1992; Watson, Arfken, & Birge, 1993; Wolf-Klein et al., 1989). Significant differences were found between controls and patients with stroke on all scoring systems for both quantitative and qualitative features of the CDT.
However, quantitative indices were not helpful in differentiating between various stroke groups (left versus right versus bilateral stroke; cortical versus subcortical stroke; anterior versus posterior stroke). Qualitative features were helpful in lateralizing lesion site and differentiating subcortical from cortical groups.

A psychometric study in patients with stroke by South, Greve, Bianchini, and Adams (2001) compared three scoring systems: the Rouleau rating scale (1992), the Freedman scoring system (1994), and the Libon revised system (1993). These scoring systems were found to be reliable in patients with stroke.

Subscales:

None typically reported.

Equipment:

Only paper and a pencil are required. Depending on the method chosen, you may need to prepare a circle (about 10 cm in diameter) on the paper for the patient.

Training:

The CDT can be administered by individuals with little or no training in cognitive assessment. Scanlan, Brush, Quijano, & Borson (2002) found that a simple binary rating of clock drawings (normal or abnormal) by untrained raters was surprisingly effective in classifying subjects as having dementia or not. In this study, a common mistake of untrained scorers was failure to recognize incorrect spacing of numbers on the clock face as abnormal. By directing attention to this type of error, concordance between untrained and expert raters should improve.

Time:

All variations of the CDT should take approximately 1-2 minutes to complete (Ruchinskas & Curyto, 2003).

Alternative forms of the CDT

The Clock Drawing Test-Modified and Integrated Approach (CDT-MIA) is a 4-step, 20-item instrument, with a maximum score of 33. The CDT-MIA emphasizes differential scoring of contour, numbers, hands, and center. It integrates 3 existing CDTs:

Some item definitions from Freedman et al.’s free-drawn clock (1994)
Scoring techniques adapted from Paganini-Hill, Clark, Henderson, & Birge (2001)
Some items borrowed from Royall, Cordes, & Polk (1998) executive CLOX

The CDT-MIA was found to be reliable and valid in individuals with dementia; however, this measure has not been validated in the stroke population (Heinik et al., 2004).

Client suitability

Can be used as a screening instrument with:

Virtually any patient population (Wagner, Nayak, & Fink, 1995). The test appears to be differentially sensitive to some types of disease processes. Particularly, it has proven to be clinically useful in differentiating among normal elderly, patients with neurodegenerative or vascular diseases, and those with psychiatric disorders, such as depression and schizophrenia (Dastoor, Schwartz, & Kurzman, 1991; Heinik, Vainer-Benaiah, Lahav, & Drummer, 1997; Lee & Lawlor, 1995; Shulman, Gold, & Cohen, 1993; Spreen & Strauss, 1991; Tracy, De Leon, Doonan, Musciente, Ballas, & Josiassen, 1996; Wagner et al., 1995; Wolf-Klein, Silverstone, Levy, & Brod, 1989).

Can be used with:

Patients with stroke. Because the CDT requires a nonverbal response, it may be administered to those who have speech difficulties but sufficient comprehension to understand the requirements of the task.

Should not be used in:

Patients who cannot understand spoken or written instructions
Patients who cannot write

As with many other neuropsychological screening measures, the CDT is affected by age, education, conditions such as visual neglect and hemiparesis, and other factors such as the presence of depression (Ruchinskas & Curyto, 2003; Lorentz, Scanlan, & Borson, 2002). The degree to which these factors affect one’s score depends largely on the scoring method applied (McDowell & Newell, 1996). Moreover, the CDT focuses on right hemisphere function, so it is important to use this test in conjunction with other neuropsychological tests (McDowell & Newell, 1996).

In what languages is the measure available?

The CDT can be conducted in any language. Borson et al. (1999) found that language spoken did not have any direct effect on CDT test performance.

Summary

What does the tool measure?	Visuospatial and praxis abilities, and may determine the presence of both attention and executive dysfunctions.
What types of clients can the tool be used for?	Virtually any patient population. It has proven to be clinically useful in differentiating among normal elderly, patients with neurodegenerative or vascular diseases, and those with psychiatric disorders, such as depression and schizophrenia.
Is this a screening or assessment tool?	Screening
Time to administer	All variations of the CDT should take approximately 1-2 minutes to complete.
Versions	Verbal command: free-drawn clock; pre-drawn clock. Copy command: copy command; clock reading test. Time-setting: “10 after 11”. The Clock Drawing Test-Modified and Integrated Approach (CDT-MIA).
Languages	The CDT can be conducted in any language.
Measurement Properties
Reliability	Test-retest: Out of four studies examining test-retest reliability, three reported excellent test-retest reliability and one found adequate test-retest reliability. Inter-rater: Out of seven studies examining inter-rater reliability, six reported excellent inter-rater reliability and one reported adequate (for examiner clocks) to excellent (for free-drawn and pre-drawn clocks) inter-rater reliability.
Validity	Criterion: Predicted lower functional ability and increased need for supervision on hospital discharge; poor physical ability and longer length of stay in geriatric rehabilitation; activities of daily living at maximal recovery. Construct: The CDT correlated adequately with the Mini-Mental State Examination and the Functional Independence Measure. Known groups: Significant differences between Alzheimer’s patients and controls detected by CDT.
Does the tool detect change in patients?	Not applicable
Acceptability	The CDT is short and simple. It is a nonverbal task and may be less threatening to patients than responding to a series of questions.
Feasibility	The CDT is inexpensive and highly portable. It can be administered in situations in which longer tests would be impossible or inconvenient. Even the most complex administration and scoring system requires approximately 2 minutes. It can be administered by individuals with minimal training in cognitive assessment.
How to obtain the tool?	A pre-drawn circle can be downloaded by clicking on this link: pre-drawn circle

Psychometric Properties

Overview

Until recently, data on the psychometric properties of the CDT were limited. While there are many possible ways to administer and score the CDT, the psychometric properties of all the various systems seem consistent and all forms correlate strongly with other cognitive measures (Scanlan et al., 2002; Ruchinskas & Curyto, 2003; McDowell & Newell, 1996). Further, scoring of the CDT has been found to be both accurate and consistent in patients with stroke (South et al., 2001).

For the purposes of this review, we conducted a literature search to identify all relevant publications on the psychometric properties of the more commonly applied scoring methods of the CDT. We then selected articles for review from high-impact journals and from a variety of authors.

Reliability

Test-retest:

Test-retest reliability of the CDT, as indicated by Spearman rank order correlations, has been reported by several investigators using a variety of scoring systems:

Manos and Wu (1994) reported an “excellent” 2-day test-retest reliability of 0.87 for medical patients and 0.94 for surgical patients.
Tuokko et al. (1992) reported an “adequate” test-retest reliability of 0.70 at 4 days.
Mendez et al. (1992) reported “excellent” coefficients of 0.78 and 0.76 at 3 and 6 months, respectively.
Freedman et al. (1994) reported test-retest reliability as “very low”. However, when the “10 after 11” time setting was used with the examiner clock, which is known to be a more sensitive setting for detecting cognitive dysfunction, test-retest reliability was found to be “excellent” (0.94).
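
As a rough illustration of how a Spearman rank order coefficient of this kind can be obtained, the sketch below ranks two sets of scores (tied scores share the average rank) and correlates the ranks. The scores are hypothetical values invented for the example, not data from any of the studies above, and the function names are our own.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical CDT scores for 8 patients tested on two occasions.
first_session = [10, 8, 9, 4, 6, 7, 3, 5]
second_session = [9, 8, 10, 5, 6, 6, 2, 5]
rho = spearman_rho(first_session, second_session)
print(round(rho, 2))
```

A coefficient near 1 indicates that patients kept nearly the same rank order across the two sessions; the thresholds for labels such as “adequate” or “excellent” are set by the reviewers and are not computed here.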

Inter-rater:

Inter-rater reliability of the CDT, as indicated by Spearman rank order correlations (not the preferred method of analysis for assessing inter-rater reliability but one used in earlier measurement research), has also been reported by several investigators:

Sunderland et al. (1989) found “excellent” coefficients ranging from 0.86 to 0.97 and found no difference between clinician and non-clinician raters (0.84 and 0.86, respectively).
Rouleau et al. (1992) found “excellent” inter-rater reliability, with coefficients ranging from 0.92 to 0.97.
Mendez et al. (1992) reported “excellent” inter-rater reliability of 0.94.
Tuokko et al. (1992) reported high coefficients ranging from 0.94 to 0.97 across three annual assessments.
The modified Shulman scale (Shulman, Gold, Cohen, & Zucchero, 1993) also has “excellent” inter-rater reliability (0.94 at baseline, 0.97 at 6 months, and 0.97 at 12 months).
Manos and Wu (1994) obtained “excellent” inter-rater reliability coefficients ranging from 0.88 to 0.96.
Freedman et al. (1994) reported coefficients ranging from 0.79 to 0.99 on the free-drawn clocks, 0.84 to 0.85 using the pre-drawn contours, and 0.63 to 0.74 for the examiner clocks, demonstrating “excellent” inter-rater reliability.

South et al. (2001) compared the psychometrics of 3 different scoring methods of the CDT (Libon revised system; Rouleau rating scale; and Freedman scoring system) in a sample of 20 patients with stroke. Inter-rater and intra-rater reliability were measured using the intraclass correlation coefficient (ICC). Raters used comparable criteria for each score, demonstrating “excellent” inter-rater reliability. Raters used similar scoring criteria throughout, demonstrating “excellent” intra-rater reliability. South et al. (2001) concluded that while the Libon scoring system demonstrated a range of reliabilities across different domains, the Rouleau and Freedman systems were in the excellent range.
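
For readers unfamiliar with the ICC, the sketch below shows one common form, the two-way random-effects, absolute-agreement, single-rater coefficient often written ICC(2,1). This is an assumption made for illustration: the review does not state which ICC form South et al. (2001) used, and the ratings in the example are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) from a table with one row per drawing and one column per rater.

    Uses the mean squares of a two-way layout without replication:
    rows (drawings), columns (raters), and residual error.
    """
    n = len(scores)       # number of drawings (subjects)
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical scores given by 3 raters to the same 5 clock drawings.
ratings = [
    [9, 9, 8],
    [4, 5, 4],
    [7, 6, 7],
    [2, 3, 2],
    [10, 9, 10],
]
print(round(icc_2_1(ratings), 2))
```

The same function can be applied to intra-rater data by treating each column as one scoring occasion by the same rater rather than as a different rater.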

Validity

In a review, Shulman (2000) reported that most studies achieved sensitivities and specificities of approximately 85% and concluded that the CDT, in conjunction with other widely used tests such as the Mini-Mental State Examination (MMSE), could provide a significant advance in the early detection of dementia. In contrast, Powlishta et al. (2002) concluded from their study that the CDT did not appear to be a useful screening tool for detecting very mild dementia. Other authors have concluded that the CDT should not be used alone as a dementia screening test because of its overall inadequate performance (Borson & Brush, 2002; Storey et al., 2001). However, most of the previous studies were based on relatively small sample sizes or were undertaken in a clinical setting, and their results may not be applicable to a larger community population.

Nishiwaki et al. (2004) studied the validity of the CDT in comparison to the MMSE in a large general elderly population (aged 75 years or older). The sensitivity and specificity of the CDT for detecting moderate-to-severe cognitive impairment (MMSE score ≤ 17) were 77% and 87%, respectively, for nurse administration and 40% and 91%, respectively, for postal administration. The authors concluded that the CDT may have value as a brief face-to-face screening tool for moderate/severe cognitive impairment in an older community population but is relatively poor at detecting milder cognitive impairment.
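
To make the two percentages concrete, the sketch below shows how sensitivity and specificity are derived from a 2×2 table of screening results against a reference standard. The counts are hypothetical, chosen only so that the arithmetic reproduces 77% and 87%; they are not the counts from Nishiwaki et al. (2004).

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table.

    Sensitivity: proportion of people with the condition who screen positive.
    Specificity: proportion of people without the condition who screen negative.
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical counts: 100 people with impairment, 1000 without.
sens, spec = sensitivity_specificity(
    true_pos=77, false_neg=23, true_neg=870, false_pos=130
)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

With these illustrative counts, 23 of every 100 impaired people would be missed by the screen and 130 of every 1000 unimpaired people would be flagged for further assessment.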

Few studies have examined the validity of the CDT specifically in patients with stroke. Adunsky et al. (2002) compared the CDT with the MMSE and cognitive Functional Independence Measure (FIM) (cognitive tests used for the evaluation of functional outcomes at discharge in elderly patients with stroke). The tests were administered to 151 patients admitted for inpatient rehabilitation following an acute stroke. Pearson correlation coefficients between the three cognitive tests ranged from 0.51 to 0.59. Adunsky et al. (2002) concluded that the tests share a reasonable degree of resemblance to each other, indicating “adequate” concurrent validity.

Bailey, Riddoch, and Crome (2000) evaluated a test battery for hemineglect in elderly patients with stroke and determined that the CDT had questionable validity in the assessment of representational neglect. Further, consistent with previous findings (Ishiai et al., 1993; Kaplan et al., 1991), the utility of the CDT as a screening measure for neglect was not supported by these results. Reasons include the subjectivity in scoring, and questionable validity in that the task may also reflect cognitive impairment (Friedman, 1991), constructional apraxia, or impaired planning ability (Kinsella, Packer, Ng, Olver, & Stark, 1995).

Responsiveness

Not applicable.

References

Adunsky, A., Fleissig, Y., Levenkrohn, S., Arad, M., Nov, S. (2002). Clock drawing task, mini-mental state examination and cognitive-functional independence measure: relation to functional outcome of stroke patients. Arch Gerontol Geriatr, 35(2), 153-60.
Bailey, M. J., Riddoch, J., Crome, P. (2000). Evaluation of a test battery for hemineglect in elderly stroke patients for use by therapists in clinical practice. Neurorehabilitation, 14(3), 139-150.
Borson, S., Brush, M., Gil, E., Scanlan, J., Vitaliano, P., Chen, J., Cashman, J., Sta Maria, M. M., Barnhart, R., Roques, J. (1999). The Clock Drawing Test: Utility for dementia detection in multiethnic elders. J Gerontol A Biol Sci Med Sci, 54, M534-40.
Dastoor, D. P., Schwartz, G., Kurzman, D. (1991). Clock-drawing: An assessment technique in dementia. Journal of Clinical and Experimental Gerontology, 13, 69-85.
Freedman, M., Leach, L., Kaplan, E., Winocur, G., Shulman, K. I., Delis, D. C. (1994). Clock Drawing: A Neuropsychological Analysis (pp. 5). New York: Oxford University Press.
Friedman, P. J. (1991). Clock drawing in acute stroke. Age and Ageing, 20(2), 140-145.
Heinik, J., Vainer-Benaiah, Z., Lahav, D., Drummer, D. (1997). Clock drawing test in elderly schizophrenia patients. International Journal of Geriatric Psychiatry, 12, 653-655.
Heinik, J., Solomesh, I., Berkman, P. (2004). Correlation between the CAMCOG, the MMSE and three clock drawing tests in a specialized outpatient psychogeriatric service. Arch Gerontol Geriatr, 38, 77-84.
Heinik, J., Solomesh, I., Lin, R., Raikher, B., Goldray, D., Merdler, C., Kemelman, P. (2004). Clock drawing test-modified and integrated approach (CDT-MIA): Description and preliminary examination of its validity and reliability in dementia patients referred to a specialized psychogeriatric setting. J Geriatr Psychiatry Neurol, 17, 73-80.
Ishiai, S., Sugishita, M., Ichikawa, T., Gono, S., Watabiki, S. (1993). Clock drawing test and unilateral spatial neglect. Neurology, 43, 106-110.
Kaplan, E. (1988). A process approach to neuropsychological assessment. In: T. Boll & B. K. Bryant (Eds.), Clinical neuropsychology and brain function: Research, measurement, and practice (pp. 129-167). Washington, DC: American Psychological Association.
Kaplan, R. F., Verfaillie, M., Meadows, M., Caplan, L. R., Pessin, M. S., DeWitt, L. (1991). Changing attentional demands in left hemispatial neglect. Archives of Neurology, 48, 1263-1267.
Kinsella, G., Packer, S., Ng, K., Olver, J., Stark, R. (1995). Continuing issues in the assessment of neglect. Neuropsychological Rehabilitation, 5, 239-258.
Lee, H., Lawlor, B. A. (1995). State-dependent nature of the Clock Drawing Task in geriatric depression. Journal of the American Geriatrics Society, 43, 796-798.
Lorentz, W. J., Scanlan, J. M., Borson, S. (2002). Brief screening tests for dementia. Can J Psychiatry, 47, 723-733.
Manos, P. J., Wu, R. (1994). The Ten Point Clock Test: A quick screen and grading system for cognitive impairment in medical and surgical patients. International Journal of Psychiatry in Medicine, 24, 229-244.
McDowell, I., Newell, C. (1996). Measuring Health. A Guide to Rating Scales and Questionnaires. 2nd ed. New York: Oxford University Press.
Mendez, M. F., Ala, T., Underwood, K. L. (1992). Development of scoring criteria for the clock drawing task in Alzheimer’s disease. Journal of the American Geriatrics Society, 40, 1095-1099.
Nishiwaki, Y., Breeze, E., Smeeth, L., Bulpitt, C. J., Peters, R., Fletcher, A. E. (2004). Validity of the Clock-Drawing Test as a Screening Tool for Cognitive Impairment in the Elderly. American Journal of Epidemiology, 160(8), 797-807.
Paganini-Hill, A., Clark, L. J., Henderson, V. W., Birge, S. J. (2001). Clock drawing: Analysis in a retirement community. J Am Geriatr Soc, 49, 941-947.
Powlishta, K. K., von Dras, D. D., Stanford, A., Carr, D. B., Tsering, C., Miller, J. P., Morris, J. C. (2002). The Clock Drawing Test is a poor screen for very mild dementia. Neurology, 59, 898-903.
Richardson, H. E., Glass, J. N. (2002). A comparison of scoring protocols on the clock drawing test in relation to ease of use, diagnostic group and correlations with mini-mental state examination. Journal of the American Geriatrics Society, 50, 169-173.
Rouleau, I., Salmon, D. P., Butters, N., Kennedy, C., McGuire, K. (1992). Quantitative and qualitative analyses of clock drawings in Alzheimer’s and Huntington’s disease. Brain and Cognition, 18, 70-87.
Royall, D. R., Cordes, J. A., Polk, M. (1998). CLOX: an executive clock drawing task. J Neurol Neurosurg Psychiatry, 64, 588-594.
Ruchinskas, R. A., Curyto, K. J. (2003). Cognitive screening in geriatric rehabilitation. Rehabil Psychol, 48, 14-22.
Scanlan, J. M., Brush, M., Quijano, C., Borson, S. (2002). Comparing clock tests for dementia screening: naïve judgments vs formal systems – what is optimal? International Journal of Geriatric Psychiatry, 17(1), 14-21.
Shah, J. (2001). Only time will tell: Clock drawing as an early indicator of neurological dysfunction. P&S Medical Review, 7(2), 30-34.
Shulman, K. I., Gold, D. P., Cohen, C. A., Zucchero, C. A. (1993). Clock-drawing and dementia in the community: A longitudinal study. International Journal of Geriatric Psychiatry, 8(6), 487-496.
Shulman, K. I. (2000). Clock-drawing: Is it the ideal cognitive screening test? International Journal of Geriatric Psychiatry, 15, 548-561.
Shulman, K., Shedletsky, R., Silver, I. (1986). The challenge of time: Clock-drawing and cognitive function in the elderly. International Journal of Geriatric Psychiatry, 1, 135-140.
South, M. B., Greve, K. W., Bianchini, K. J., Adams, D. (2001). Inter-rater reliability of Three Clock Drawing Test scoring systems. Applied Neuropsychology, 8(3), 174-179.
Spreen, O., Strauss, E. A. (1991). Compendium of neuropsychological tests: Administration, norms, and commentary. New York: Oxford University Press.
Storey, J. E., Rowland, J. T., Basic, D., Conforti, D. A. (2001). A comparison of five clock scoring methods using ROC (receiver operating characteristic) curve analysis. Int J Geriatr Psychiatry, 16, 394-9.
Sunderland, T., Hill, J. L., Mellow, A. M., Lawlor, B. A., Gundersheimer, J., Newhouse, P. A., Grafman, J. H. (1989). Clock drawing in Alzheimer’s disease: a novel measure of dementia severity. J Am Geriatr Soc, 37(8), 725-729.
Suhr, J., Grace, J., Allen, J., Nadler, J., McKenna, M. (1998). Quantitative and Qualitative Performance of Stroke Versus Normal Elderly on Six Clock Drawing Systems. Archives of Clinical Neuropsychology, 13(6), 495-502.
Tracy, J. I., De Leon, J., Doonan, R., Musciente, J., Ballas, T., Josiassen, R. C. (1996). Clock drawing in schizophrenia. Psychological Reports, 79, 923-928.
Tuokko, H., Hadjistavropoulos, T., Miller, J. A., Beattie, B. L. (1992). The Clock Test, a sensitive measure to differentiate normal elderly from those with Alzheimer disease. Journal of the American Geriatrics Society, 40, 579-584.
Wagner, M. T., Nayak, M., Fink, C. (1995). Bedside screening of neurocognitive function. In: L. A. Cushman & M. J. Scherer (Eds.), Psychological assessment in medical rehabilitation: Measurement and instrumentation in psychology (pp. 145-198). Washington, DC: American Psychological Association.
Watson, Y. I., Arfken, C. L., Birge, S. J. (1993). Clock completion: An objective screening test for dementia. J Am Geriatr Soc, 41(11), 1235-40.
Wolf-Klein, G. P., Silverstone, F. A., Levy, A. P., Brod, M. S. (1989). Screening for Alzheimer’s disease by clock drawing. Journal of the American Geriatrics Society, 37, 730-734.

See the measure

Click here to find a pre-drawn circle that can be used when administering the CDT.

Color Trails Test (CTT)

Evidence Reviewed as of before: 08-11-2012

Author(s)*: Lisa Zeltzer, MSc OT; Valerie Poulin, OT, PhD candidate

Editor(s): Nicol Korner-Bitensky, PhD OT; Annabel McDermott, BOccThy

Purpose

The Color Trails Test (CTT) is a language-free version of the Trail Making Test (TMT) that was developed to allow for broader cross-cultural assessment of sustained attention and divided attention in adults.

In-Depth Review

Purpose of the measure

The Color Trails Test (CTT) (Maj, D’Elia, Satz, Janssen, Zaudig, Uchiyama et al., 1993; D’Elia, Satz, Uchiyama & White, 1996) is a language-free version of the Trail Making Test (TMT) that was developed to allow for broader cross-cultural application to measure sustained attention and divided attention in adults. Divided attention is defined as "the allocation of attentional resources across more than one task" (Ponsford, 2008, p. 514).

Available versions

There are 4 versions of the CTT (forms A, B, C, and D). Only form A has normative data, so it is the only version that should be used in a clinical setting. Forms B-D are experimental and should be used in research only (Mitrushina, Boone, Razzani, & D’Elia, 2005).

Features of the measure

Items:

The CTT is comprised of two tasks:

CTT1: Must be administered first and requires the respondent to connect circles in an ascending numbered sequence (1-25).
CTT2: Must follow the CTT1 and requires the respondent to connect numbers in an ascending sequence while alternating between pink and yellow colors. Numbers are presented twice, once in pink and once in yellow, so the client must ignore the distracter item (e.g. start at pink 1, avoid pink 2 to select yellow 2, avoid yellow 3 to select pink 3, etc.).

Untimed practice trials are completed for both the CTT1 and CTT2 to ensure that the client understands the task.
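The CTT2 target order can be made concrete with a short sketch. It follows the example given in the task description (start at pink 1, then yellow 2, then pink 3) and assumes that the colour alternates at every step and that CTT2 runs to 25 like CTT1, which the text does not state explicitly. The function name and colour labels are illustrative and are not part of the test materials.

```python
def ctt2_target_sequence(last_number=25, first_colour="pink"):
    """Return the (number, colour) targets for CTT2 in the order they are connected.

    Assumption: colours strictly alternate, starting with `first_colour`.
    """
    if first_colour not in ("pink", "yellow"):
        raise ValueError("first_colour must be 'pink' or 'yellow'")
    if last_number < 1:
        raise ValueError("last_number must be at least 1")
    colours = ("pink", "yellow") if first_colour == "pink" else ("yellow", "pink")
    # Numbers ascend by one; the colour flips at every step, so the
    # same-numbered circle in the other colour is the distracter to skip.
    return [(n, colours[(n - 1) % 2]) for n in range(1, last_number + 1)]


if __name__ == "__main__":
    # First four targets: pink 1, yellow 2, pink 3, yellow 4
    print(ctt2_target_sequence()[:4])
```

Each number also appears in the other colour on the page, so every step after the first involves ignoring one distracter circle.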

Scoring and score interpretation:

Time taken to complete each part of the CTT is recorded in seconds and is compared to normative data. Qualitative aspects of the performance that may be indicative of brain dysfunction (e.g. near misses, prompts required, sequencing errors for colour and number) are also recorded. Sequencing is defined as "the coordination and proper ordering of the steps that comprise the task, requiring a proper allotment of attention to each step" (Lezak, 1989; as cited in Baum, Morrison, Hahn & Edwards, 2007).

Time:

The CTT manual reports that it takes 3-8 minutes to complete the CTT. A task is discontinued if the client takes longer than 240 seconds to complete it.
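As a rough illustration of how a timed trial might be logged, the sketch below records completion time in seconds together with the qualitative observations listed under scoring, and applies the 240-second discontinuation rule. The record layout, field names and function name are hypothetical; they are not taken from the CTT manual, and comparison against normative data is not shown because the norms are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

DISCONTINUE_AFTER_S = 240.0  # discontinuation limit described in the text


@dataclass
class TrialRecord:
    part: str                  # "CTT1" or "CTT2"
    seconds: Optional[float]   # None when the trial was discontinued
    discontinued: bool
    near_misses: int = 0       # qualitative observations (hypothetical fields)
    prompts: int = 0
    sequencing_errors: int = 0


def record_trial(part, elapsed_seconds, near_misses=0, prompts=0, sequencing_errors=0):
    """Build a record for one timed trial, flagging trials that ran over the limit."""
    if part not in ("CTT1", "CTT2"):
        raise ValueError("part must be 'CTT1' or 'CTT2'")
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds cannot be negative")
    over_limit = elapsed_seconds > DISCONTINUE_AFTER_S
    return TrialRecord(
        part=part,
        seconds=None if over_limit else float(elapsed_seconds),
        discontinued=over_limit,
        near_misses=near_misses,
        prompts=prompts,
        sequencing_errors=sequencing_errors,
    )
```

Storing `None` for a discontinued trial, rather than the elapsed time, is a design choice made here so that an incomplete attempt cannot be mistaken for a completion time.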

Equipment:

Table and chair
Test
Pencil
Stopwatch

Training requirements:

This is a level “C” qualification meaning that it requires an experienced professional to administer the test.

Alternative Forms of the Colour Trails Test

Trail Making Test (TMT)
Comprehensive Trail Making Test (Reynolds, 2002)
Delis-Kaplan Executive Function Scale (D-KEFS): includes subtests modeled after the TMT
Oral TMT: an alternative for patients with motor deficits or visual impairments (Ricker & Axelrod, 1994).
Repeat testing TMT: alternate forms have been developed for repeat testing purposes (Franzen et al., 1996; Lewis & Rennick, 1979)

Client suitability

Can be used with:

Individuals with stroke
Clients 18-89 years old
Individuals who are colourblind
The CTT requires relatively intact motor abilities (i.e. ability to hold and manoeuvre a pen or pencil, ability to move the upper extremity). The Oral TMT may be more appropriate if the examiner considers that the participant’s motor ability may impact his/her performance.
Clients must be able to understand Arabic numbers and numerical sequence.

Should not be used with:

Clients with motor or coordination impairments (e.g. apraxia). If motor ability may impact performance, consider using the Oral TMT.
Should be used with caution in older adults with low education. Age and education have been reported to influence response times in both parts of the CTT, such that older individuals with low education levels have demonstrated significantly slower response times (D’Elia et al., 1996; Messinis, Malegiannaki, Christodoulou, Panagiotopoulos, & Papathanasopoulos, 2011).

In what languages is the measure available?

This is a language-free measure; however, cultural norms have been published for the following populations:

Adult Greek population with stroke (Messinis et al., 2011)
Turkish population with schizophrenia (Güleç, Kavakçı, Güleç, & Küçükalioğlu, 2006)
Healthy Turkish population (Dugbartey, Townes & Mahurin, 2000)
Healthy Spanish population (LaRue, Romero, Ortiz, Liang, & Lindeman, 1999)
Healthy Brazilian sample (Sant’Ana Rabelo, Pacanaro, Rossetti, Almeida de Sa Leme, de Castro, Guntert, et al., 2010)
Healthy sample from China (Hsieh & Riley, 1997)
Healthy sample from Hong Kong (Lee & Chan, 2000).

Summary

What does the tool measure?	Language-free measure of sustained and divided attention.
What types of clients can the tool be used for?	The CTT can be used with, but is not limited to, patients with stroke.
Is this a screening or assessment tool?	Assessment tool
Time to administer	The CTT takes approximately 3 to 8 minutes to administer.
Versions	Trail Making Test (TMT); Comprehensive TMT; Oral TMT; Repeat testing TMT (developed for repeat testing purposes); Symbol TMT; Delis-Kaplan Executive Function Scale (D-KEFS)
Other Languages	Language-free measure but norms established for Greek, Turkish, Chinese, Brazilian, and Spanish populations
Measurement Properties
Reliability	Internal consistency: No studies have examined internal consistency of the CTT in patients with stroke. Test-retest: No studies have examined test-retest reliability of the CTT in a stroke population, but the authors of the measure report excellent test-retest reliability for CTT2 and adequate test-retest reliability for CTT1 in a healthy sample. Inter-rater: No studies have examined inter-rater reliability of the CTT in patients with stroke.
Validity	Content: No studies have examined content validity of the CTT in patients with stroke. Criterion: Concurrent: One study reported excellent correlations between the CTT1 and CTT2 and the TMT-A and TMT-B respectively. Predictive: Two studies reported that the CTT1 predicted on-road driving test failure in samples of clients that included stroke. Construct: Convergent: One study reported adequate to excellent correlations between the CTT and the Useful Field of View (UFOV) subtests. Known groups: One study reported significant differences in time to complete the CTT between patients with stroke and healthy adults.
Floor/Ceiling Effects	No studies have examined floor/ceiling effects of the CTT in patients with stroke.
Does the tool detect change in patients?	The responsiveness of the CTT has not been formally studied; however, it has been used to detect changes in a clinical trial of 2 participants with stroke.
Acceptability	The CTT is simple and easy to administer and is language-free.
Feasibility	The CTT is relatively inexpensive and highly portable. The CTT must be purchased and should be administered by an experienced professional.
How to obtain the tool?	The CTT can be purchased from: Psychological Assessment Resources (http://www4.parinc.com/Products/Product.aspx?ProductID=CTT)

* Initially developed for a traumatic-brain injured population, the psychometric properties of the tool with this population are described in the administration guide of the tool.

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the CTT in individuals with stroke. We identified 4 studies.

Floor/Ceiling Effects

No studies have reported on floor/ceiling effects of the CTT when used with an adult stroke population.

Reliability

Test-retest:
D’Elia et al. (1996) examined the test-retest reliability of the CTT in 27 healthy individuals. The CTT was administered twice, two weeks apart. Excellent test-retest reliability was reported for the CTT2 (r=0.79), and adequate test-retest reliability was reported for the CTT1 (r=0.64).
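The coefficients above express how closely completion times at the first administration track those at the second. The sketch below shows how such a coefficient could be computed, assuming a Pearson product-moment correlation (the text does not state which coefficient was used). The completion times are made up for illustration and are not data from D’Elia et al. (1996).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical CTT1 completion times (seconds) for five people, two weeks apart.
time1 = [38.0, 45.5, 52.0, 61.0, 70.5]
time2 = [36.0, 47.0, 49.5, 58.0, 72.0]

if __name__ == "__main__":
    print(round(pearson_r(time1, time2), 2))
```

A correlation of this kind reflects consistency of rank and spacing between sessions; it does not detect a uniform practice effect in which everyone is faster on the second occasion.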

Inter-rater:
No studies have reported on inter-rater reliability of the CTT when used with an adult stroke population.

Validity

Content:

No studies have reported on content validity of the CTT when used with an adult stroke population.

Criterion:

Concurrent:
Elkin-Frankston, Lebowitz, Kapust, Hollis, & O’Connor (2007) examined the concurrent validity of the CTT with the TMT in 29 individuals with various medical conditions including stroke (n=8). Completion times on the CTT and TMT were highly correlated (CTT1 vs. TMT-A: r=0.91; CTT2 vs. TMT-B: r=0.72), suggesting excellent concurrent validity with the original TMT.

Predictive:
Elkin-Frankston et al. (2007) examined the ability of the CTT to predict on-road driving test failure in 29 individuals with various medical conditions including stroke (n=8). Patients who failed an on-road driver evaluation performed the CTT1 significantly slower than those who passed (Cohen’s d=0.66, p<0.05). This relationship was also found for the CTT2 but it did not reach statistical significance.

Hartman-Maeir et al. (2008) examined predictive validity of the CTT in a sample of 30 individuals with acquired brain injury including stroke (n=17) wishing to obtain a driver’s licence. There was a significant difference in time taken to complete CTT1 between those who passed and failed the on-road test (Cohen’s d = 0.67, p=0.02). Performance time <60 seconds on the CTT1 was found to predict passing the on-road evaluation, whereas >60 seconds was predictive of failing.
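The 60-second finding reported by Hartman-Maeir et al. (2008) can be written as a simple threshold rule. The sketch below is illustrative only: the function name and return labels are hypothetical, the rule comes from a single sample of 30 people, and a time of exactly 60 seconds is left unclassified because the text does not say how it should be treated.

```python
CTT1_CUTOFF_S = 60.0  # threshold reported in the study summarised above


def predicted_on_road_outcome(ctt1_seconds):
    """Map a CTT1 completion time to the outcome the reported cut-off would predict."""
    if ctt1_seconds < 0:
        raise ValueError("completion time cannot be negative")
    if ctt1_seconds < CTT1_CUTOFF_S:
        return "pass"
    if ctt1_seconds > CTT1_CUTOFF_S:
        return "fail"
    # Exactly 60 seconds: not covered by the reported rule.
    return "unspecified"
```

A rule like this describes a group-level association; it is not a substitute for the on-road evaluation itself.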

Construct:

Convergent/Discriminant:
Hartman-Maeir, Erez, Ratzon, Mattatia and Weiss (2008) examined convergent validity of the CTT in a sample of 30 individuals with acquired brain injury (including stroke, n=17) wishing to obtain a driver’s licence, using Spearman correlation coefficients. The CTT1 and CTT2 showed adequate to excellent correlations with Useful Field of View (UFOV) subtests of processing speed (CTT1 r=0.407; CTT2 not significant), divided attention (r=0.457, 0.486 respectively) and selective attention (r=0.602, 0.629 respectively). Results support validity of the CTT as a pre-driving assessment tool.

Known groups:
Messinis, Malegiannaki, Christodoulou, Panagiotopoulos, and Papathanasopoulos (2011) examined known groups validity of the CTT with 25 clients who had recently experienced a stroke and 26 healthy participants matched for age, educational level and gender (Greek population). Clients in the stroke group required significantly more time to complete the CTT1 and CTT2 than the healthy controls (p < 0.001).

Responsiveness

Liu, Chan, Lee, and Hui-Chan (2004) used the CTT to evaluate the effectiveness of mental imagery in clients with stroke (n=2). In this study, the CTT detected change in both clients, with reduced time to complete the CTT1 and CTT2 post-intervention.

Sensitivity/ Specificity

No studies have reported on sensitivity/specificity of the CTT when used with an adult stroke population.

References

Barncord, S. W. & Wanlass, R. L. (2001). The Symbol Trail Making Test: test development and utility as a measure of cognitive impairment. Applied Neuropsychology, 8, 99-103.
D’Elia, L. F., Satz, P., Uchiyama, C.L., & White, T. (1996). Color Trails Test. Odessa, FL: PAR.
Dugbartey, A. T., Townes, B. D., & Mahurin, R. K. (2000). Equivalence of the Color Trail Making Test in nonnative English-speakers. Archives of Clinical Neuropsychology, 15, 425-31.
Elkin-Frankston, S., Lebowitz, B. K., Kapust, L. R., Hollis, A. M., & O’Connor, M. G. (2007). The use of the Colour Trails Test in the assessment of driver competence: preliminary reports of a culture-fair instrument. Archives of Clinical Neuropsychology, 22(5), 631-5.
Franzen, M., Paul, D., & Iverson, G. L. (1996). Reliability of alternate forms of the trail making test. The Clinical Neuropsychologist, 10(2), 125-9.
Güleç, H., Kavakçı, O., Güleç, M. Y., & Küçükalioğlu, C. I. (2006). The reliability and validity of the Turkish Color Trails Test in evaluating frontal assessment among Turkish patients with schizophrenia. Düşünen Adam, 19(4), 180-5.
Hartman-Maeir, A., Erez, A. B., Ratzon, N., Mattatia, T., & Weiss, P. (2008). The validity of the Color Trails Test in the pre-driver assessment of individuals with acquired brain injury. Brain Injury, 22, 994-1008.
Hsieh, S. & Riley, N. (1997, November). Neuropsychological performance in the People’s Republic of China: Age and educational norms for four attentional tasks. Presented at the National Academy of Neuropsychology, Las Vegas, Nevada. In Mitrushina, M., Boone, K., & D’Elia, L. Handbook of Normative Data for Neuropsychological Assessment. (pp. 70-73). New York, NY: Oxford University Press.
LaRue, A., Romero, L., Ortiz, I., Liang, H. C., & Lindeman, R. D. (1999). Neuropsychological performance of Hispanic and non-Hispanic older adults: an epidemiologic survey. Clinical Neuropsychologist, 13, 474-86.
Lee, T. M. & Chan, C. C. (2000). Are Trail Making and Color Trails Tests of equivalent constructs? Journal of Clinical and Experimental Neuropsychology, 22, 529-34.
Lewis, R. F. & Rennick, P. M. (1979). Manual for the repeatable Cognitive-Perceptual-Motor Battery. Grosse Point Park, MI: Axon Publishing Company.
Liu, K. P., Chan, C. C., Lee, T. M., & Hui-Chan, C.W. (2004). Mental imagery for relearning of people after brain injury. Brain Injury, 18(11), 1163-72.
Maj, M., D’Elia, L. D., Satz, P., Janssen, R., Zaudig, M., Uchiyama, C., Starace, F., Galderisi, S., & Chervinsky, A. (1993). Evaluation of two new neuropsychological tests designed to minimize cultural bias in the assessment of HIV-1 Seropositive persons: a WHO study. Archives of Clinical Neuropsychology, 8, 123-35.
Messinis, L., Malegiannaki, A. C., Christodoulou, T., Panagiotopoulos, V., & Papathanasopoulos, P. (2011). Color Trails Test: normative data and criterion validity for the Greek adult population. Archives of Clinical Neuropsychology, 26(4), 322-30.
Mitrushina, M., Boone, K. B., Razzani J., & D’Elia, L. F. (2005). Handbook of normative data for neuropsychological assessment. (2nd ed.). New York: Oxford University Press.
Reynolds, C. (2002). Comprehensive Trail Making Test. Austin, TX: Pro-Ed.
Ricker, J. H. & Axelrod, B. N. (1994). Analysis of an oral paradigm for the Trail Making Test. Assessment, 1, 47-51.
Sant’Ana Rabelo, I., Pacanaro, S.V., de Oliveira Rosetti, M., de Sa Leme, I.F., de Castro, N.R., Guntert, C. M., Correa Miotto, E., & Souza de Lucia, M. C. (2010). Color Trails Test: a Brazilian normative sample. Psychology and Neuroscience, 3, 93-9.

See the measure

How to obtain the CTT

The CTT can be purchased from Psychological Assessment Resources (http://www4.parinc.com/Products/Product.aspx?ProductID=CTT)

DOC Screen

Evidence Reviewed as of before: 30-04-2019

Author(s)*: Alexandra Matteau

Editor(s): Annabel McDermott

Content consistency: Gabriel Plumier

Purpose

The DOC screen is a screening tool that can be used to identify individuals at high risk of depression, obstructive sleep apnea and cognitive impairment following a stroke.

In-Depth Review

Purpose of the measure

The DOC screen is a screening tool that identifies individuals at high risk of depression, obstructive sleep apnea and cognitive impairment following a stroke.

Available versions

The DOC screen was developed by Swartz et al. and was first published in 2013. The tool was developed by combining and modifying three existing validated brief screens: the 2-item Patient Health Questionnaire (PHQ-2), the STOP questionnaire and a 10-point version of the Montreal Cognitive Assessment (MoCA).

Features of the measure

Items:

The DOC screen comprises three screening tests:

DOC – Mood (PHQ-2)

This test comprises two items with the purpose of screening for depression. The test evaluates the degree to which an individual has experienced depressed mood and anhedonia over the past two weeks.

DOC – Apnea (STOP Questionnaire)

This test comprises four items with the purpose of screening for obstructive sleep apnea: snoring, tiredness during daytime, breathing interruption during sleep, and hypertension.

DOC – Cog (10-point version of the MoCA)

This test comprises three tasks with the purpose of screening for cognitive impairment: clock drawing, abstraction, and 5-word recall (memory).

Scoring:

Each subscale has different scoring and is interpreted independently.

DOC – Mood (total score 0-6)

The two items are scored from 0-3 whereby the respondent is asked to rate how often each symptom occurred over the last 2 weeks:

0 = not at all
1 = several days
2 = more than half of the days
3 = nearly every day.

DOC – Apnea (total score 0-4)

The four items are scored on a dichotomous scale (0 = no, 1 = yes) according to whether or not the respondent experiences each symptom.

DOC – Cog (total score 0-10)

Clock drawing task (0-3 points): 1 point each is given for (i) contour, (ii) numbers and (iii) the hands of the clock.
Abstraction task (0-2 points): 1 point is given for each item pair correctly answered.
Delayed recall task (0-5 points): 1 point is given for each word recalled without any cues.

The scores for the three tasks are summed to calculate the subscale score.

The three subscale scores are then summed to obtain a total score ranging from 0 to 20.
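The scoring rules above reduce to simple arithmetic. The following minimal Python sketch is illustrative only and is not part of the published DOC screen materials; the function name `score_doc_screen` and its argument names are assumptions made for this example. It only sums item scores into subscale and total scores and does not reproduce the raw score or regression interpretation.

```python
def score_doc_screen(mood_items, apnea_items, clock, abstraction, recall):
    """Sum raw DOC screen item scores into subscale and total scores.

    Hypothetical helper for illustration; argument names are assumptions.
    mood_items:  two PHQ-2 item scores, each 0-3
    apnea_items: four STOP item scores, each 0 (no) or 1 (yes)
    clock:       clock drawing score, 0-3
    abstraction: abstraction score, 0-2
    recall:      delayed recall score, 0-5
    """
    if len(mood_items) != 2 or any(i not in range(4) for i in mood_items):
        raise ValueError("DOC-Mood needs two items scored 0-3")
    if len(apnea_items) != 4 or any(i not in (0, 1) for i in apnea_items):
        raise ValueError("DOC-Apnea needs four items scored 0 or 1")
    limits = (("clock", clock, 3), ("abstraction", abstraction, 2), ("recall", recall, 5))
    for name, value, top in limits:
        if not 0 <= value <= top:
            raise ValueError(f"{name} must be between 0 and {top}")

    mood = sum(mood_items)              # subscale range 0-6
    apnea = sum(apnea_items)            # subscale range 0-4
    cog = clock + abstraction + recall  # subscale range 0-10
    return {"mood": mood, "apnea": apnea, "cog": cog, "total": mood + apnea + cog}


# Example with made-up responses: mood 3, apnea 2, cog 8, total 13.
example = score_doc_screen([1, 2], [1, 0, 1, 0], clock=3, abstraction=1, recall=4)
```

Because each subscale is interpreted independently, the sketch returns the three subscale scores alongside the total rather than the total alone.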

A raw score interpretation and a regression interpretation can be obtained at http://www.docscreen.ca/.

Time:

The DOC screen takes approximately 5 minutes to complete.

Subscales:

The DOC screen comprises three subscales: DOC Mood, DOC Apnea and DOC Cog.

Equipment:

A pencil and the test form are needed to complete the DOC screen.

Training:

No training requirements have been reported. The DOC screen can be administered by any individual who is able to correctly follow the instructions, but must be interpreted by a qualified health professional.

Alternative forms of the DOC Screen:

An alternative version is available and uses different words for the memory and abstraction tasks. This version must be used if the patient has previously been exposed to the MoCA or DOC screen to minimize any learning effects associated with repeated administration.

The E-DOC screen is an electronic version of the tool, which is available through the DOC screen website. The E-DOC screen has not been validated.

Client suitability

Can be used with:

Patients with stroke.
The DOC screen may also be suitable for use among patients with other neurological and vascular disorders such as multiple sclerosis, Alzheimer’s disease, mild cognitive impairment, Parkinson’s disease and traumatic brain injury. However, no study has been conducted with these populations.

Should not be used with:

While no contraindications have been reported, some considerations must be made when completing the test:

A translator, family member or caregiver can provide translation for patients who do not speak English fluently;
Provide visual aid (e.g. glasses) for patients with visual loss;
Speak loudly and clearly for patients with reduced hearing;
Motor tasks such as the clock drawing activity may be difficult for patients with motor impairments – use sound clinical judgement for this task;
Use alternative communication strategies for patients with aphasia.

In what languages is the measure available?

English

Summary

What does the tool measure?	Depression, obstructive sleep apnea and cognitive impairment following stroke.
What types of clients can the tool be used for?	Patients with stroke.
Is this a screening or assessment tool?	Screening.
Time to administer	Five minutes.
Versions	DOC screen; E-DOC screen. A second version is available to minimize learning effects associated with repeated administration.
Languages	The DOC screen is only available in English.
Measurement Properties
Reliability	Internal consistency: No studies have examined internal consistency of the DOC screen. Test-retest: No studies have examined test-retest reliability of the DOC screen. Intra-rater: No studies have examined intra-rater reliability of the DOC screen. Inter-rater: No studies have examined inter-rater reliability of the DOC screen.
Validity	Criterion: Concurrent: No studies have examined concurrent validity of the DOC screen. Predictive: No studies have examined predictive validity of the DOC screen. Construct: Convergent/Discriminant: No studies have examined convergent validity of the DOC screen. Known groups: No studies have examined known groups validity. However, one study examined the sensitivity and specificity and reported that the DOC screen is a valid measure that can reliably identify patients at high risk of depression, obstructive sleep apnea and cognitive impairment.
Floor/Ceiling Effects	No studies have examined the floor or ceiling effects of the DOC screen.
Does the tool detect change in patients?	Not reported.
Acceptability	The DOC screen is a standardized screening tool suitable for use with stroke patients.
Feasibility	The measure is brief, easy to score and requires no formal training. A study of 1503 patients showed that 89% of participants completed the screen in 5 minutes or less.
How to obtain the tool?	The DOC screen is free to use for clinical and educational purposes. The administration manual and forms are available online from the following website: http://www.docscreen.ca/

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the DOC screen in individuals with stroke. We identified only one study, which was published in part by the developers of the measure. More studies are required before definitive conclusions can be drawn regarding the reliability and validity of the DOC screen.

Floor/Ceiling Effects

No studies have examined the floor or ceiling effects of the DOC screen.

Reliability

Test-retest:
No studies have examined the test-retest reliability of the DOC screen.

Inter-rater:
No studies have examined the inter-rater reliability of the DOC screen.

Intra-rater:
No studies have examined the intra-rater reliability of the DOC screen.

Validity

Criterion:

Predictive:
No studies have examined the predictive validity of the DOC screen.

Construct:

Convergent/Discriminant:
No studies have examined the convergent validity of the DOC screen.

Known groups:
No studies have examined the known groups validity of the DOC screen.

Responsiveness

No studies have examined the responsiveness of the DOC screen.

Sensitivity and Specificity:

Swartz et al. (2017) examined the sensitivity and specificity of the DOC screen for detecting depression, obstructive sleep apnea and cognitive impairment using receiver operating characteristic (ROC) and area under the curve (AUC) analyses and the two-cut-point approach. DOC-Mood was compared with the Structured Clinical Interview for DSM Disorders (SCID-D), and excellent sensitivity (92%) and specificity (99%) were identified for detecting depression (AUC=0.898). DOC-Apnea was compared with results on polysomnography (PSG), and excellent sensitivity (95%) and specificity (96%) were identified for detecting obstructive sleep apnea (AUC=0.660). DOC-Cog was compared with a 30-minute neuropsychological test protocol proposed by Hachinski et al. (2006), and excellent sensitivity (100%) and specificity (95%) were identified for detecting cognitive impairment (AUC=0.776).
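For readers less familiar with these statistics, the short Python sketch below shows how sensitivity and specificity are calculated when a screen is compared with a reference standard. It is an illustration only: the function name and the counts in the example are hypothetical and are not taken from Swartz et al. (2017).

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from counts against a reference standard.

    sensitivity = proportion of people WITH the condition who screen positive
    specificity = proportion of people WITHOUT the condition who screen negative
    """
    with_condition = true_pos + false_neg
    without_condition = true_neg + false_pos
    if with_condition == 0 or without_condition == 0:
        raise ValueError("need at least one person with and one without the condition")
    return true_pos / with_condition, true_neg / without_condition


# Hypothetical counts (not study data): 50 people with the condition,
# 100 people without it, as classified by the reference standard.
sens, spec = sensitivity_specificity(true_pos=46, false_neg=4, true_neg=95, false_pos=5)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```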

References

Hachinski, V., Iadecola, C., Petersen, R. C., Breteler, M. M., Nyenhuis, D. L., Black, S. E., … & Vinters, H. V. (2006). National Institute of Neurological Disorders and Stroke–Canadian Stroke Network vascular cognitive impairment harmonization standards. Stroke, 37(9), 2220-2241.
Swartz, R. H., Cayley, M. L., Lanctôt, K. L., Murray, B. J., Cohen, A., Thorpe, K. E., … & Herrmann, N. (2017). The “DOC” screen: Feasible and valid screening for depression, Obstructive Sleep Apnea (OSA) and cognitive impairment in stroke prevention clinics. PLoS ONE, 12(4), e0174451.

See the measure

How to obtain the DOC Screen?

The form and manual of administration are available online from the following website: http://www.docscreen.ca/

The DOC screen is free to use for clinical and educational purposes and therefore no permissions are required.

Executive Function Performance Test (EFPT)

Evidence Reviewed as of before: 25-02-2013

Author(s)*: Valérie Poulin, OT, PhD candidate; Annabel McDermott, OT

Editor(s): Nicol Korner-Bitensky, PhD OT

Content consistency: Gabriel Plumier

Purpose

The Executive Function Performance Test (EFPT) is a performance-based assessment of executive function through observation of four Instrumental Activities of Daily Living (I-ADLs).

In-Depth Review

Purpose of the measure

The Executive Function Performance Test (EFPT) is a performance-based standardized assessment of cognitive function using Instrumental Activities of Daily Living (I-ADLs). The EFPT adopts a top-down approach and is performed in an environmental (real-world) context. The EFPT is used to identify an individual’s: (a) impaired executive functions; (b) capacity for independent functioning; and (c) required amount of assistance for task completion (Baum, 2011).

Available versions

The EFPT was developed by Baum, Morrison, Hahn & Edwards (2003) at the Program in Occupational Therapy at Washington University Medical School.

Features of the measure

Description of Tasks:

The EFPT assesses performance of four functional tasks, completed in the following order:

Simple cooking (oatmeal preparation)
Telephone use
Medication management
Bill payment

The EFPT assesses the client’s ability to complete three executive function components of the task:

Task initiation
Task execution (comprising organization, sequencing, and judgment and safety)
Task completion

The EFPT uses a standardized cueing system that enables use with individuals of varying ability (Baum, 2011).

Scoring and Score Interpretation:

The examiner observes the client’s executive functioning during task performance and also records the level of cueing required to support task performance.

Executive functions

Initiation: beginning the task. The individual moves to the materials table to collect items needed for the task
Execution: the individual carries out the steps of the task
Organization: arrangement of the tools/materials to complete the task. The individual correctly retrieves and uses the items that are necessary for the task
Sequencing: execution of steps in an appropriate order. The individual carries out the steps in an appropriate order, attends to each step appropriately, and can switch attention from one step to the next
Judgment and safety: avoidance of dangerous situations. The individual exhibits an awareness of safety by actively avoiding or preventing the creation of a situation that would be unsafe.
Completion: termination of the task. The individual indicates that he/she is finished or moves away from the area of the last step.

Cueing hierarchy:

Cues required	Score
No cues required	0
Indirect verbal guidance	1
Gestural guidance	2
Direct verbal assistance	3
Physical assistance	4
Do for the participant	5

The score is the highest level of cue needed by the client to perform the task.

The EFPT results in three overall scores:

Scores	How is it calculated?	What is the score range?
1. Executive function component score	Sum of the numbers recorded on each of the four tasks for initiation, organization, sequencing, judgment and completion	Each EF component can range from 0-5 per task, with a total across all four tasks ranging from 0-20
2. Task score	Sum of the five scores for each task	Each task can range from 0-25
3. Total score	Sum of the performance on all four tasks	0-100

A higher score indicates that the client requires more cueing and demonstrates more difficulties with executive functions.
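The three overall scores described above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic only, not an official scoring program: the task and component labels, the data layout and the function names are assumptions made for this example, and the EFPT training manual (Baum, 2011) remains the authority on administration and scoring.

```python
TASKS = ("cooking", "telephone", "medication", "bills")
COMPONENTS = ("initiation", "organization", "sequencing", "judgment", "completion")


def highest_cue(cues_given):
    """Score for one component of one task: the highest cue level (0-5) that was needed."""
    return max(cues_given, default=0)  # no cues given -> 0


def score_efpt(cue_levels):
    """cue_levels[task][component] holds the highest cue level (0-5) needed.

    Returns (component_scores, task_scores, total):
      component_scores: each 0-20 (one component summed over the four tasks)
      task_scores:      each 0-25 (five components summed within one task)
      total:            0-100
    """
    for task in TASKS:
        for comp in COMPONENTS:
            if cue_levels[task][comp] not in range(6):
                raise ValueError(f"{task}/{comp}: cue level must be an integer 0-5")
    component_scores = {c: sum(cue_levels[t][c] for t in TASKS) for c in COMPONENTS}
    task_scores = {t: sum(cue_levels[t][c] for c in COMPONENTS) for t in TASKS}
    return component_scores, task_scores, sum(task_scores.values())


# Made-up example: start from "no cues required" everywhere, then record a few cues.
levels = {t: dict.fromkeys(COMPONENTS, 0) for t in TASKS}
levels["cooking"]["sequencing"] = highest_cue([1, 3])  # indirect verbal, then direct verbal
levels["bills"]["initiation"] = 1
levels["bills"]["sequencing"] = 2
components, tasks, total = score_efpt(levels)
```

In this made-up example the sequencing component score is 5, the cooking and bill payment task scores are 3 each, and the total score is 6.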

Time:

The EFPT takes approximately 30 – 45 minutes to complete.

Training requirements:

While there are no specific training requirements, the examiner should have experience delivering cues (as per the cue guidance sheet; please see the training manual: Baum, 2011).

Equipment:

Leave all of the items necessary for all of the tasks in a clear storage box on a table (the “materials table”). Put the box on a lower table or stool if the person is in a wheelchair.

Hand soap in dispenser (as one would find in a home)
Paper Towels (if you use cloth they will need to be washed after each use)
Pan (with handle that gets hot and requires a pot holder)
Pot holder
A pad to put beside the burner to set the pan on when finished (have on the table before they start)
A spoon rest
Measuring cup (glass) – 1 cup
Dry measuring cups
Spoon for stirring
Rubber spatula
Old-fashioned Oats
Bowl
Spoon for eating
Salt shaker
Timer – a timer that can be used for 2 minutes
Pencil/Paper
Phone book
Magnifying Glass
Medicine bottle with instructions with the person’s name on it – filled with sugar-free candy
Medicine bottle with instructions with another person’s name on it filled with sugar-free candy
Crackers
Claritin (or other over-the-counter version) bottle (non prescription) as a distracter – filled with sugar-free candy
Drinking cups
Two bills: one cable (due in 30 days), one phone (due immediately) with pre-addressed envelopes, mixed with 5 other pieces of mail (letter from credit card company, postcard, flier, letter in a plain white envelope, mail order catalogue) in a Ziploc bag
Chequebook with person’s name on the cheques
Balance sheet (i.e. account book) with a balance $5.00 less than the bills total
Pen
Calculator
Other distracter items
Tongs
Pepper shaker
An enlarged direction sheet for the cooking task as on the oatmeal box (they may not be able to read it in small print). EXCEPTION: Say cook for 2 minutes (so there is time for them to use the timer and be cued if necessary.)
A stop watch or timer (it is acceptable to use the timer function on a phone)
Prepare a response card for the pre-test questions.
Put Bills and distracter mail in a gallon plastic bag
Put medications in a quart plastic bag

Additional items:

Pre-test questions
Script
Forms B-E
Cueing chart
Behaviour assessment chart

What to consider before beginning:

The EFPT is a standardized cognitive assessment; testing procedures should be followed precisely in order to maintain test validity. All items must be administered; if a client refuses to perform a task it can be skipped and performed later.

Conversations and verbal feedback are not permitted.

Multiple administrations may result in a learning effect.

Alternative Forms of the measure

There are no other forms of the assessment.

Client suitability

Can be used with:

Adolescents, adults and elderly adults.
The EFPT is suitable for use with clients with motor impairment. Clients are scored according to the cue level required but are not penalized if they ask for assistance because the impairment necessitates physical assistance (Baum et al., 2008).
The EFPT has been tested on populations with stroke (Baum et al., 2008), multiple sclerosis (Goverover et al., 2005) and schizophrenia (Katz et al., 2007).
The EFPT has been used with patients with chronic traumatic brain injury (Toglia et al., 2010).

Should not be used with:

The EFPT is not suitable for use with individuals with severe cognitive impairment who are not able to follow directions.

Note: Assessors should carefully consider the effect of apraxia and aphasia on performance.

Languages of the measure

The EFPT training manual is available in English. It has been translated and validated in Swedish and Hebrew.

Summary

What does the tool measure?	The EFPT examines executive functions in the context of performing a task.
What types of clients can the tool be used for?	The EFPT can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
What ICF domain is measured?	Activity
Time to administer	30-45 minutes
Versions	An updated EFPT training manual was published in 2011.
Other Languages	The EFPT has been translated and validated in Swedish and Hebrew.
Measurement Properties
Reliability	Internal consistency: One study reported excellent internal consistency for the EFPT total score and adequate to excellent internal consistency for tasks. Correlations between the EFPT total score and executive function components were excellent. Test-retest: No studies have reported on test-retest reliability of the EFPT in a stroke population.
Intra-rater: No studies have reported on the intra-rater reliability of the EFPT in a stroke population. Inter-rater: One study reported excellent inter-rater reliability for the EFPT total score and all tasks.
Validity	Content: The EFPT was developed based on Baum & Edwards’ (1993) Kitchen Task Assessment. Criterion: Concurrent: Three studies have examined concurrent validity of the EFPT in patients with acute or chronic stroke and reported an excellent correlation with the Functional Assessment Measure, an adequate to excellent correlation with the Assessment of Motor and Process Skills (AMPS) and the Short Blessed Test, and an adequate correlation with the Functional Independence Measure, Wechsler Memory Scale-Revised Logical Memory Total Recall Test and Digit Span Backward subtests, Animal Naming Test, Delis-Kaplan Executive Function System (DKEFS) Sorting Test, Verbal Fluency Test and Colour Word Interference Test and the Trail Making Test Part B. Predictive: No studies have reported on the predictive validity of the EFPT in a stroke population.
Construct: Convergent/Discriminant: No studies have reported on discriminant validity of the EFPT in a stroke population. Known Groups: One study reported that the EFPT was able to discriminate between clients with mild and moderate stroke, and between clients with mild stroke and healthy controls.
Floor/Ceiling Effects	No studies have reported on floor or ceiling effects of the EFPT in a stroke population.
Sensitivity / Specificity	No studies have reported on sensitivity or specificity of the EFPT in a stroke population.
Does the tool detect change in patients?	No studies have reported on responsiveness of the EFPT in a stroke population.
Acceptability	The EFPT is comprised of real world tasks. The tool can be administered to individuals of varying ability due to the flexibility to provide a hierarchy of cues as required.
Feasibility	The EFPT can be administered in a home or rehabilitation setting. The tool is simple to administer and guidelines are clearly stipulated in the test manual. The EFPT assesses what an individual is able to do rather than what he/she cannot do.
How to obtain the tool?	The EFPT is free and can be obtained from Carolyn Baum at baumc@wustl.edu, or online through the following websites: Washington University School of Medicine Program in Occupational Therapy; The Practice Change Fellows Program.

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the Executive Function Performance Test (EFPT). While this assessment can be used with various populations, this module addresses the psychometric properties of the measure specifically when used with patients with stroke. Three studies were identified.

Floor/Ceiling Effects

No studies have reported on the floor or ceiling effects of the EFPT in a stroke population.

Reliability

Internal consistency:
Baum et al. (2008) examined internal consistency of the EFPT with a sample of 73 patients with mild to moderate chronic stroke and 22 age- and education-matched healthy controls. Internal consistency, calculated using Cronbach’s alpha, was excellent for the total score (α=0.94) and adequate to excellent for test items (cooking: α=0.86; paying bills: α=0.78; managing medication: α=0.88; telephone use: α=0.77). Correlations between the EFPT total score and executive function components were excellent (initiation: r=0.91; organization: r=0.93; sequencing: r=0.88; safety and judgment: r=0.78; task completion: r=0.89).
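
For readers unfamiliar with the statistic, the following is a minimal Python sketch of how Cronbach's alpha is computed from item-level scores, using the standard formula: alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the sample scores are hypothetical; they are not data from Baum et al. (2008).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Return Cronbach's alpha for a list of items.

    item_scores: one list per item, each holding that item's score for
    every respondent (all lists must have the same length).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical scores for four tasks (rows) across five respondents (columns).
example_scores = [
    [0, 2, 5, 7, 10],
    [1, 2, 4, 8, 9],
    [0, 3, 5, 6, 11],
    [1, 1, 6, 7, 10],
]
alpha = cronbach_alpha(example_scores)
print(f"alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, which is why a high alpha is read as the tasks measuring the same underlying characteristic.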

Test-retest:
No studies have reported on test-retest reliability of the EFPT in a stroke population.

Intra-rater:
No studies have reported on the intra-rater reliability of the EFPT in a stroke population.

Inter-rater:
Baum et al. (2008) examined inter-rater reliability of the EFPT with three trained raters and 10 participants (5 clients with stroke and 5 healthy controls). Inter-rater reliability, calculated using intra-class correlation coefficients (ICCs), was excellent for the total score (ICC=0.91) and all test items (cooking: ICC=0.94; paying bills: ICC=0.89; managing medication: ICC=0.87; telephone use: ICC=0.79).

Validity

Content:

The EFPT was developed at the Program in Occupational Therapy at Washington University Medical School.

The EFPT was developed based on Baum & Edwards’ (1993) Kitchen Task Assessment.

Criterion:

Concurrent:
Baum et al. (2008) examined concurrent validity of the EFPT by comparison with functional and neuropsychological tests in a sample of 73 patients with mild to moderate chronic stroke, using Pearson correlation coefficients. Functional tests included the Functional Assessment Measure and the Functional Independence Measure (FIM). Neuropsychological tests included the Wechsler Memory Scale-Revised (WMS-R) Logical Memory Total Recall, Digit Span Forward and Digit Span Backward subtests, Animal Naming Test, Short Blessed Test and Trail Making Test. The EFPT showed an excellent correlation with the Functional Assessment Measure (r=-0.68), and adequate correlations with the FIM (r=-0.40), WMS-R Logical Memory Total Recall Test (r=-0.59) and Digit Span Backward (r=-0.49) subtests, Animal Naming Test (r=-0.47), Short Blessed Test (r=0.39) and the Trail Making Test Part B (r=0.39). Correlations with cognitive tests that are not considered to assess executive function were not significant (Trail Making Test Part A, WMS-R Digit Span Forward).

Wolf et al. (2010) examined concurrent validity of the EFPT by comparison with neuropsychological tests in a sample of 20 patients with mild to moderate acute stroke, using Pearson correlation coefficients. The EFPT total score showed adequate correlations with the Short Blessed Test (r=0.548) and the Delis-Kaplan Executive Function System (DKEFS) Sorting Test (r=-0.511), Verbal Fluency Test (r=-0.474) and Colour Word Interference Test (r=-0.566), but not the Trail Making Test. The EFPT Cooking task showed adequate correlations with the DKEFS Sorting (1: r=-0.498; 2: r=-0.587) and Verbal Fluency (r=0.527) Tests, and an excellent correlation with the Short Blessed Test (r=0.710). The EFPT Bill Payment task showed adequate correlations with DKEFS Sorting, Colour Word Interference and Trail Making Tests (r=-0.484 to -0.594). The EFPT Telephone task showed an adequate correlation with the DKEFS Colour Word Interference Test (r=-0.499). There were no significant correlations between the EFPT Medication Management task and other neuropsychological tests.

Cederfeldt et al. (2011) examined concurrent validity of the EFPT by comparison with the Assessment of Motor and Process Skills (AMPS) in a sample of 23 patients with mild acute stroke, using Spearman’s rank correlation test. The correlation between the EFPT total sum of all tasks and AMPS process skills was excellent (rho=0.61). Correlations between the four EFPT tasks and AMPS process skills were adequate to excellent (rho=0.54 – 0.60).
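
The qualitative labels used in these comparisons ("adequate", "excellent") can be read as cutoffs on the absolute value of the coefficient. The sketch below assumes cutoffs of 0.60 and above for excellent, 0.31 to 0.59 for adequate, and 0.30 or below for poor. These thresholds are an assumption chosen to be consistent with the values reported above, not a rule stated in this section, and `correlation_strength` is a hypothetical helper name.

```python
def correlation_strength(coefficient):
    """Label a correlation coefficient by its absolute size.

    Cutoffs are assumptions (excellent >= 0.60, adequate 0.31-0.59,
    poor <= 0.30); the sign only shows the direction of the relationship.
    """
    if not -1.0 <= coefficient <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = round(abs(coefficient), 2)  # compare at two-decimal precision
    if size >= 0.60:
        return "excellent"
    if size >= 0.31:
        return "adequate"
    return "poor"


# Coefficients reported in this section (Baum et al., 2008; Cederfeldt et al., 2011).
reported = {
    "Functional Assessment Measure": -0.68,
    "Functional Independence Measure": -0.40,
    "Trail Making Test Part B": 0.39,
    "AMPS process skills (total)": 0.61,
}
for test_name, r in reported.items():
    print(f"{test_name}: r={r} -> {correlation_strength(r)}")
```

Negative coefficients arise here because lower EFPT scores indicate better performance, whereas higher scores are better on several comparison measures; the label depends only on the size of the coefficient.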

Predictive:
No studies have reported on the predictive validity of the EFPT in a stroke population.

Construct:

Convergent/Discriminant:
No studies have reported on convergent/discriminant validity of the EFPT in a stroke population.

Known Group:
Baum et al. (2008) examined known group validity of the EFPT with a sample of 73 patients with mild (n=59) to moderate (n=14) chronic stroke and 22 age- and education-matched healthy controls. Stroke severity was classified using the National Institutes of Health Stroke Scale (≤ 5 = mild stroke, 6-15 = moderate stroke). The EFPT was able to discriminate among groups, with healthy controls achieving a lower (better) total score than clients with mild stroke (p<0.05) and moderate stroke (p<0.0001), and clients with mild stroke achieving a lower score than those with moderate stroke (p<0.0001). Significant differences were seen between healthy controls and clients with mild stroke for Cooking (p=0.008) and Paying Bills (p=0.03). Significant differences were seen between clients with mild and moderate stroke for Paying Bills (p=0.01), Managing Medication (p=0.001) and Telephone Use (p=0.0001). Analysis of test EF components showed significant differences between healthy controls and clients with mild stroke for sequencing (p<0.001) and organization (p<0.04). Significant differences between clients with mild and moderate stroke were seen for organization (p<0.0001), sequencing (p<0.001), safety and judgment (p<0.004) and task completion (p<0.01).

Responsiveness

No studies have examined responsiveness of the EFPT in a sample of patients with stroke, although studies have been conducted among patient groups with other upper limb conditions (see: Beaton et al., 2001; Bot et al., 2004; MacDermid & Tottenham, 2004; Schmitt & Di Fabio, 2004).

References

Baum, C.M. (2011). Executive Function Performance Test: training manual. St. Louis, MO: Washington University.
Baum, C.M. & Edwards, D. (1993). Cognitive performance in senile dementia of the Alzheimer’s type: the Kitchen Task Assessment. The American Journal of Occupational Therapy, 47, 431-6.
Baum, C.M., Morrison, T., Hahn, M., & Edwards, D.F. (2003). Test manual: Executive Function Performance Test. St. Louis, MO: Washington University.
Baum, C.M., Tabor Connor, L., Morrison, T., Hahn, M., Dromerick, A.W., & Edwards, D.F. (2008). Reliability, validity, and clinical utility of the Executive Function Performance Test: a measure of executive function in a sample of people with stroke. The American Journal of Occupational Therapy, 62(4), 446-455.
Cederfeldt, M., Widell, Y., Elgmark Andersson, E., Dahlin-Ivanoff, S., & Gosman-Hedström, G. (2011). Concurrent validity of the Executive Function Performance Test in people with mild stroke. British Journal of Occupational Therapy, 74(9), 443-9.
Goverover, Y., Kalmar, J., Gaudino-Goering, E., Shawaryn, M., Moore, N.B., Halper, J., & DeLuca, J. (2005). The relation between subjective and objective measures of everyday life activities in persons with multiple sclerosis. Archives of Physical Medicine and Rehabilitation, 86, 2303-8.
Katz, N., Tadmore, I., Felzen, B., & Hartman-Maeir, A. (2007). Validity of the Executive Function Performance Test in individuals with schizophrenia. Occupational Therapy Journal of Research, 27, 1-8.
Toglia, J., Johnston, M.V., Goverover, Y., & Dain, B. (2010). A multicontext approach to promoting transfer of strategy use and self regulation after brain injury: an exploratory study. Brain Injury, 24(4), 664-77.
Wolf, T.J., Stift, S., Tabor Connor, L., Baum, C., & The Cognitive Rehabilitation Research Group. (2010). Feasibility of using the EFPT to detect executive function deficits at the acute stage of stroke. Work: Journal of Prevention, Assessment & Rehabilitation, 36(4), 405-12.

See the measure

How to obtain the assessment?

The EFPT can be obtained from Carolyn Baum at baumc@wustl.edu, or online through the following websites:

Kettle Test (KT)

Evidence Reviewed as of before: 22-03-2011

Author(s)*: Katie Marvin, MSc, PT Candidate

Editor(s): Nicol Korner-Bitensky, PhD OT; Annabel McDermott, OT

Purpose

The Kettle Test was developed as a brief performance-based measure designed to assess cognitive skills in a functional context.

In-Depth Review

Purpose of the measure

The Kettle Test was developed as a brief performance-based measure designed to assess cognitive skills in a functional context. The Kettle Test can be used to evaluate the capacity for independent community living in clients with cognitive impairments. Using the functional task of preparing a hot beverage, the cognitive-functional and problem-solving skills of the client are assessed.

Available versions

The Kettle Test was developed by Dr. Adina Hartman-Maeir, Nira Armon and Dr. Noomi Katz in 2005, and later validated (Hartman-Maeir, Harel & Katz, 2009).

Features of the measure

Items:

The task of preparing two hot beverages is broken down into 13 discrete steps that can be evaluated. These items are described below.

Description of task

The client prepares two cups of hot beverages – one for him/herself and another for the examiner. The examiner asks the client to prepare a hot drink that differs in two ingredients from the one the client chose for him/herself.

Opening the water faucet
Filling the kettle with approximately 2 cups of water
Turning off the faucet
Assembling the kettle
Attaching the electric cord to the kettle
Plugging the electric cord in an electric socket
Turning on the kettle
Assembling the ingredients
Putting the ingredients into the cups
Picking up the kettle when water boils
Pouring the water into the cups
Adding milk
Indication of task completion (e.g. verbal, gesture, serving)

What to consider before beginning:

The kettle must be disassembled and the equipment set up.

Scoring and Score Interpretation:

All 13 discrete steps of the task are scored on a scale from 0 to 4. The total score ranges from 0 to 52, with higher scores indicating the need for greater assistance. The administrator should note any cueing provided to the client in the “comments” section.

The following scoring scale should be used:

0 = Performance intact.
1 = Item completed independently but completed slowly, by trial and error and/or performance was questionable.
2 = Received general cues
3 = Received specific cueing; or
Performance was incomplete (for example, only places part of ingredients in cup, removes the kettle before water boils etc.); or
Performance is deficient (for example, places lid of kettle upside down, uses wrong ingredients or fails to perform step, for example did not turn on kettle, did not add milk etc.)
4 = Received physical demonstration or assistance.
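
As a worked illustration of the scoring arithmetic described above, the following minimal Python sketch sums the 13 step scores (each from 0 to 4) into a total between 0 and 52. The function name `kettle_total` and the example ratings are hypothetical and are not part of the Kettle Test manual.

```python
from typing import Sequence

NUM_STEPS = 13        # discrete steps in the task, per the description above
MAX_STEP_SCORE = 4    # 0 = performance intact ... 4 = physical demonstration or assistance


def kettle_total(step_scores: Sequence[int]) -> int:
    """Sum the 13 step scores into a total (0-52); higher means more assistance."""
    if len(step_scores) != NUM_STEPS:
        raise ValueError(f"expected {NUM_STEPS} step scores, got {len(step_scores)}")
    for score in step_scores:
        if not isinstance(score, int) or not 0 <= score <= MAX_STEP_SCORE:
            raise ValueError(f"each step score must be an integer from 0 to 4, got {score!r}")
    return sum(step_scores)


# Hypothetical ratings for one client (illustrative only, not study data).
example_scores = [0, 0, 0, 2, 1, 0, 0, 1, 3, 0, 0, 3, 0]
print(kettle_total(example_scores))
```

The sketch only totals the item scores; the cueing notes and post-task comments described in this section would still be recorded separately.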

Following performance, the client and administrator are asked to comment on the following:

Description of the process by the examiner.
Recall of the instructions by the client: “What were the steps you had to do?”
The client’s description of the process: “Describe to me what you did from the beginning to the end of the task.”
Rating of performance by the client: “How do you rate your performance on this task between 0 to 100 percent?” (If the client cannot rate his/her performance then suggest the following options: “very good”, “fair”, “not so good”, “not good at all”).
Rating of difficulty by the client: “How difficult was the task for you? Easy (able to do it by yourself easily); a little difficult; or very difficult (I needed help)”.
Additional comments

Please note that as with most tests that involve everyday problem solving tasks, immediate learning may occur which may impact performance on retesting.

Time:

The average completion time has not been reported; however, it is estimated that the Kettle Test takes approximately 5 to 20 minutes to complete.

Training requirements:

There is no formal training required to administer the Kettle Test; however, the examiner should have some experience and training in observational evaluation of functional performance. Familiarity with the process and scoring is also recommended.

Subscales:

None typically reported.

Equipment:

Electric kettle: it is important to use a kettle that can be disassembled because assembly of the kettle is part of the task.
Ingredients for beverages (e.g. instant/decaffeinated coffee, black/herbal tea, sugar/artificial sweeteners, milk, honey)
Other ingredients (to be used to distract the client, e.g. salt, pepper, oil)
Tray
Dishes and utensils for use during the task, plus extra to distract the client (3 cups, milk pitcher, a bowl, 2 plates, 3 tea spoons, a large spoon, 2 forks, a knife, can opener)

Alternative form of the KT

There are no alternative versions of the Kettle Test.

Client suitability

Can be used with:

Clients with stroke who were living independently in the community prior to stroke.
Clients with stroke who understand spoken or written language.

Should not be used in:

Clients who do not understand spoken or written language.
Since the Kettle Test is administered through direct observation of a task, a proxy respondent cannot complete it.

In what languages is the measure available?

The manual has only been released in English (Hartman-Maeir, Armon & Katz, 2005); however, only comprehension of spoken language is required of the client during administration.

Summary

What does the tool measure?	The Kettle Test measures cognitive skills in a functional context.
What types of clients can the tool be used for?	Clients with stroke who were living independently in the community prior to stroke
Is this a screening or assessment tool?	Assessment tool
Time to administer	Approximately 5 to 20 minutes.
Versions	There are no alternative versions.
Other Languages	None
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the Kettle Test.
Test-retest: No studies have examined the test-retest reliability of the Kettle Test. Intra-rater: No studies have examined the intra-rater reliability of the Kettle Test. Inter-rater: One study examined the inter-rater reliability of the Kettle Test and reported excellent inter-rater reliability.
Validity	Construct: Convergent: One study reported excellent correlation with the Functional Independence Measure (FIM) Cognitive scale and adequate correlation with the Mini-Mental State Examination (MMSE), Clock Drawing Test and the Behavioural Inattention Test (BIT) Star Cancellation subtest. Known groups: The Kettle Test was able to discriminate clients with stroke from healthy controls.
Floor/Ceiling Effects	Not yet examined in a stroke population.
Does the tool detect change in patients?	Not yet examined in a stroke population.
Acceptability	The Kettle Test is accepted by clients with stroke as it involves a real-life functional task.
Feasibility	The administration of the Kettle Test is easy and quick to perform.
How to obtain the tool?	A preliminary version of the Kettle Test manual can be obtained from: https://www.sralab.org/rehabilitation-measures/kettle-test

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Kettle Test. We identified only one study on the psychometric properties of the Kettle Test, which was published in part by the developers of the measure. More studies are required before definitive conclusions can be drawn regarding the reliability and validity of the Kettle Test.

Floor/Ceiling Effects

Not yet examined in a stroke population.

Reliability

Internal consistency:
Not yet examined in a stroke population.

Test-retest:
Not yet examined in a stroke population.

Intra-rater:
Not yet examined in a stroke population.

Inter-rater:
Hartman-Maeir, Harel & Katz (2009) examined the inter-rater reliability of the Kettle Test in 21 clients with stroke admitted to one of two rehabilitation hospitals. Clients were within 1 month post-stroke and had been living independently prior to stroke. Inter-rater reliability between four occupational therapists, as measured using the Spearman correlation coefficient, was found to be excellent at both sites (r=.851, p=.001; and r=.916, p<.001).
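
To make the statistic concrete, the sketch below computes a Spearman rank correlation between two raters' total scores in plain Python (values are ranked with ties averaged, then a Pearson correlation is taken on the ranks). The ratings are invented for illustration and are not data from Hartman-Maeir, Harel & Katz (2009); the helper names `average_ranks` and `spearman` are likewise hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one rater gives constant scores")
    return cov / sqrt(var_x * var_y)


# Invented total scores from two raters observing the same six clients.
rater_a = [3, 10, 7, 15, 22, 5]
rater_b = [4, 9, 11, 17, 20, 5]
print(round(spearman(rater_a, rater_b), 3))
```

Because the coefficient depends only on rank order, two raters who differ by a point or two on each client can still show a high correlation as long as they order the clients similarly.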

Validity

Content:

Not yet examined in a stroke population.

Criterion:

Concurrent:
Not yet examined in a stroke population.

Predictive:
Not yet examined in a stroke population.

Construct:

Convergent/Discriminant:
Hartman-Maeir, Harel & Katz (2009) examined the convergent validity of the Kettle Test by comparing it to other commonly used measures of cognitive ability in 36 clients with stroke and 36 healthy controls. Correlations were calculated using Pearson correlation coefficients. Excellent correlation was found between the Kettle Test and the Cognitive domain of the Functional Independence Measure (FIM) (r=-.659). Adequate correlations were found between the Kettle Test and the Mini-Mental State Examination (MMSE), Clock Drawing Test and the Behavioural Inattention Test (BIT) Star Cancellation subtest (r=-.478, r=-.566 and r=-.578, respectively).

Known groups:
Hartman-Maeir, Harel & Katz (2009) verified the ability of the Kettle Test to discriminate between healthy controls (n=36) and individuals with stroke (n=36). The healthy controls showed little variability in performance and all scored within a narrow range of 0 to 3 points. The individuals with stroke demonstrated great variability in performance and scored within a large range of 1 to 29 points (with higher scores indicating greater need for assistance). The patients with stroke required significantly more assistance in completing the Kettle Test, whereas the healthy controls required very minimal to no assistance.

Ecological:

Hartman-Maeir, Harel & Katz (2009) investigated the ecological validity of the Kettle Test in 36 patients with stroke. Basic activities of daily living (BADL) and safety were measured prior to discharge home, using the Motor domain of the Functional Independence Measure (FIM) and the Safety Rating Scale portion of the Routine Task Inventory (RTI-E) (Allen, 1989; Katz, 2006). One month later, instrumental activities of daily living (IADL) were assessed using the IADL Scale (Lawton & Brody, 1969). The Kettle Test was found to have excellent correlation with the Motor domain of the FIM (r=-.759) and adequate correlation with the Safety Rating Scale of the RTI-E and the IADL Scale (r=-.571 and r=-.505, respectively), using Pearson correlation coefficients. The results of this study suggest that performance on the Kettle Test is representative of the functional outcome of patients who are discharged to home.

Responsiveness

Not yet examined in a stroke population.

References

Hartman-Maeir, A., Armon, N. & Katz, N. (2005). The Kettle Test: A cognitive functional screening test. Unpublished protocol. Hebrew University, Jerusalem, Israel. Retrieved on February 1, 2010 from: http://www.rehabmeasures.org/Lists/RehabMeasures/DispForm.aspx?ID=939
Hartman-Maeir, A., Harel, H. & Katz, N. (2009). Kettle Test – A brief measure of cognitive functional performance: Reliability and validity in a stroke population. American Journal of Occupational Therapy, 64, 592-599.

See the measure

How to obtain the Kettle Test?

https://www.sralab.org/rehabilitation-measures/kettle-test

Mini-Mental State Examination (MMSE)

Evidence Reviewed as of before: 07-11-2010

Author(s)*: Lisa Zeltzer, MSc OT

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Mini-Mental State Examination (MMSE) was originally developed as a brief screening tool to provide a quantitative evaluation of cognitive impairment and to record cognitive changes over time (Folstein, Folstein, & McHugh, 1975). Since that time it has become recognized that repeated use of the MMSE with the same client reduces its validity, so it is advised that this screening tool not be used repeatedly with the same individual if the time interval between testing is short. Rather than provide a diagnosis, the measure should be used to detect the presence of cognitive impairment (Folstein, Robins, & Helzer, 1983). The MMSE briefly measures orientation to time and place, immediate recall, short-term verbal memory, calculation, language, and constructional ability. While the measure was originally used to detect dementia within a psychiatric setting, its use has become widespread. Since 1993, the MMSE has been available with an attached table that enables patient-specific norms to be identified on the basis of age and educational level (Crum, Anthony, Bassett, & Folstein, 1993).

In-Depth Review

Purpose of the measure

The Mini-Mental State Examination (MMSE) was originally developed as a brief screening tool to provide a quantitative evaluation of cognitive impairment and to record cognitive changes over time (Folstein, Folstein, & McHugh, 1975). Since that time it has become recognized that repeated use of the MMSE with the same client reduces its validity, so it is advised that this screening tool not be used repeatedly with the same individual if the time interval between testing is short. Rather than provide a diagnosis, the measure should be used to detect the presence of cognitive impairment (Folstein, Robins, & Helzer, 1983). The MMSE briefly measures orientation to time and place, immediate recall, short-term verbal memory, calculation, language, and constructional ability. While the measure was originally used to detect dementia within a psychiatric setting, its use has become widespread. Since 1993, the MMSE has been available with an attached table that enables patient-specific norms to be identified on the basis of age and educational level (Crum, Anthony, Bassett, & Folstein, 1993).

Available versions

The MMSE was published by Folstein et al. in 1975.

Features of the measure

Items:

The MMSE consists of 11 simple questions or tasks that look at various functions including: arithmetic, memory and orientation.

Scoring:

The score is the number of correct items. The measure yields a total score of 30. A score of 23 or less is the generally accepted cutoff point indicating the presence of cognitive impairment (Ruchinskas & Curyto, 2003).

Levels of impairment have also been classified as none (24-30); mild (18-23) and severe (0-17) (Tombaugh & McIntyre 1992).

More recently, Folstein, Folstein, McHugh, and Fanjiang (2001) recommended the following cutoff scores:

Score	Level of impairment
≥ 27	None
21-26	Mild
11-20	Moderate
≤ 10	Severe
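
The cutoff table above can be expressed as a small lookup. The following Python sketch is only an illustration of the Folstein et al. (2001) bands and the 23-or-less cutoff mentioned earlier in this section; the function names are hypothetical, and the sketch does not adjust for age or education.

```python
def mmse_impairment_level(score: int) -> str:
    """Map an MMSE total (0-30) to the Folstein et al. (2001) bands listed above."""
    if not isinstance(score, int) or not 0 <= score <= 30:
        raise ValueError("MMSE total must be an integer from 0 to 30")
    if score >= 27:
        return "None"
    if score >= 21:
        return "Mild"
    if score >= 11:
        return "Moderate"
    return "Severe"


def at_or_below_general_cutoff(score: int) -> bool:
    """True when the total is 23 or less, the generally accepted cutoff cited above."""
    return score <= 23


for total in (28, 24, 15, 8):
    print(total, mmse_impairment_level(total), at_or_below_general_cutoff(total))
```

Note that the two conventions can disagree: a total of 24 falls in the "Mild" band of the 2001 scheme yet sits above the 23-or-less cutoff.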

Crum et al. (1993) reported that cognitive performance as measured by the MMSE varies within the population by age and educational level. There is an inverse relationship between MMSE scores and age, ranging from a median of 29 for those aged 18 to 24 years to 25 for individuals 80 years of age and older. There is a direct relationship between MMSE scores and education: the median MMSE score is 29 for individuals with at least 9 years of schooling, 26 for those with 5 to 8 years of schooling, and 22 for those with 0 to 4 years of schooling.

The following table, created by Crum et al. (1993) can be used to compare your patient’s MMSE score with a reference group based on age and education level.

(Source: Crum et al., 1993)

Age
Education	20-24	25-29	30-34	35-39	40-44
4th grade	22	25	25	23	23
8th grade	27	27	26	26	27
High school	29	29	29	28	28
College	29	29	29	29	29

Age
Education	45-49	50-54	55-59	60-64	65-69
4th grade	23	23	22	23	22
8th grade	26	27	26	26	26
High school	28	28	28	28	28
College	29	29	29	29	29

Age
Education	70-74	75-79	80-84	>84
4th grade	22	21	20	19
8th grade	25	25	25	23
High school	27	27	25	26
College	28	28	27	27
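
For readers who want to look up a reference value programmatically, the sketch below encodes the table above as a dictionary and finds the entry for a given age and education level. The values are copied from the table as shown here; the names `AGE_BAND_STARTS`, `REFERENCE_SCORES` and `reference_score` are hypothetical, and the ">84" column is assumed to mean ages 85 and over.

```python
from bisect import bisect_right
from typing import Dict, List

# Lower bound of each age band in the table above; the last band (">84") starts at 85.
AGE_BAND_STARTS: List[int] = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

# Reference scores copied from the table above, one value per age band.
REFERENCE_SCORES: Dict[str, List[int]] = {
    "4th grade":   [22, 25, 25, 23, 23, 23, 23, 22, 23, 22, 22, 21, 20, 19],
    "8th grade":   [27, 27, 26, 26, 27, 26, 27, 26, 26, 26, 25, 25, 25, 23],
    "High school": [29, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 27, 25, 26],
    "College":     [29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 28, 28, 27, 27],
}


def reference_score(age: int, education: str) -> int:
    """Return the tabled reference score for an age (20 or older) and education level."""
    if age < AGE_BAND_STARTS[0]:
        raise ValueError("the table above starts at age 20")
    if education not in REFERENCE_SCORES:
        raise ValueError(f"education must be one of {sorted(REFERENCE_SCORES)}")
    # Index of the last band whose lower bound does not exceed the age.
    band = bisect_right(AGE_BAND_STARTS, age) - 1
    return REFERENCE_SCORES[education][band]


# Example: compare a hypothetical patient's score with the reference value.
patient_score, patient_age, patient_education = 24, 67, "High school"
print(patient_score - reference_score(patient_age, patient_education))
```

A negative difference means the patient scored below the reference value for that age and education group; how large a gap is clinically meaningful is not something the table itself specifies.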

Subscales:

Orientation (total points = 10), Registration (total points = 3), Attention and calculation (total points = 5), Recall (total points = 3), and Language (total points = 9).

Equipment:

The MMSE requires no specialized equipment.

Training:

Little information has been reported on training for the MMSE; however, a standardized version of the MMSE has been developed (Molloy & Standish, 1997).

Time:

Administration by a trained interviewer takes approximately 10 minutes.

Alternative form of the MMSE

The Modified Mini-Mental State Examination (3MS) (Teng & Chui, 1987).

An expanded version of the MMSE was developed by Teng and Chui (1987), increasing the content, number and difficulty of items included in the assessment. The score of the 3MS ranges from 0 to 100, with a standardized cut-off point of 79/80 for the presence of cognitive impairment. This expanded assessment takes approximately 5 minutes more to administer than the original MMSE, which takes approximately 10 minutes to complete. Grace et al. (1995) compared the MMSE to the 3MS in geriatric patients with stroke. Test-retest reliability of the 3MS was excellent (r = 0.80). The 3MS also correlated with a battery of neuropsychological assessments and with some cognitive domains missed by the MMSE. The 3MS was a significantly better predictor of functional outcome (as measured by the Functional Independence Measure) than the MMSE. The 3MS was found to have higher sensitivity than the MMSE (69% vs. 44%) and similar specificity (80% vs. 79%). The area under the curve (AUC) was 0.798 for the 3MS.
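As an illustration of how sensitivity and specificity figures such as those above are derived, the following minimal Python sketch computes both from a 2x2 table of screening results against a reference diagnosis. The counts are hypothetical and are not taken from Grace et al. (1995); the names `ScreeningCounts` and `screens_positive_3ms` are invented for this example, and reading the 79/80 cut-off as "79 or below screens positive" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """2x2 table of a screening result against a reference diagnosis."""
    true_positives: int   # impaired and flagged by the test
    false_negatives: int  # impaired but missed by the test
    true_negatives: int   # not impaired and cleared by the test
    false_positives: int  # not impaired but flagged by the test


def sensitivity(c: ScreeningCounts) -> float:
    """Proportion of impaired individuals that the test detects."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def specificity(c: ScreeningCounts) -> float:
    """Proportion of unimpaired individuals that the test clears."""
    return c.true_negatives / (c.true_negatives + c.false_positives)


def screens_positive_3ms(score: int, cutoff: int = 79) -> bool:
    """Apply a 79/80 cut-off, assumed to mean: 79 or below screens positive."""
    if not 0 <= score <= 100:
        raise ValueError("3MS scores range from 0 to 100")
    return score <= cutoff


# Hypothetical counts chosen only to make the arithmetic easy to follow.
counts = ScreeningCounts(true_positives=69, false_negatives=31,
                         true_negatives=80, false_positives=20)
print(f"sensitivity = {sensitivity(counts):.2f}")
print(f"specificity = {specificity(counts):.2f}")
```

Lowering the cut-off in a sketch like this flags fewer people, which generally lowers sensitivity and raises specificity; the AUC summarizes that trade-off across all possible cut-offs.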

3MS + Clock-drawing (Suhr & Grace, 1999).

The addition of clock drawing, a simple measure of constructional ability, increased the sensitivity of the 3MS in detecting focal brain damage in patients with right hemisphere stroke (87%). The addition of the Clock Drawing Test requires about two extra minutes in administration time.

Standardized MMSE (SMMSE) (Molloy & Standish, 1997).

Molloy and Standish (1997) developed the SMMSE to improve the reliability of the measure. The idea was to develop strict guidelines for administration and scoring. To examine the reliability of the SMMSE in 48 older adults, university students were randomized to administer either the MMSE or the SMMSE and were trained on that test, which they then gave to participants on three different occasions. The SMMSE had significantly better inter-rater and intra-rater reliability compared with the MMSE. The inter-rater variance was reduced by 76% and the intra-rater variance was reduced by 86%. It took less time to administer the SMMSE compared with the MMSE (average 10.5 minutes and 13.4 minutes, respectively). The intraclass correlation coefficient (ICC) for the MMSE was adequate (ICC = 0.69), and was excellent for the SMMSE (ICC = 0.90).
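For readers unfamiliar with the intraclass correlation coefficient, the sketch below shows one common form, the one-way random-effects ICC for single ratings. The text above does not state which ICC form Molloy and Standish (1997) used, so the choice of formula is an assumption, and the scores are hypothetical.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings.

    `ratings` holds one row per participant and one column per rating
    occasion (or rater). Every row must have the same length.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 participants with k >= 2 ratings each")

    grand_mean = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]

    # Between-participant and within-participant mean squares (one-way ANOVA).
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all ratings are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical total scores for five participants rated on two occasions.
scores = [[24, 25], [18, 17], [29, 29], [12, 14], [27, 26]]
print(f"ICC = {icc_oneway(scores):.2f}")
```

The ICC approaches 1 when disagreement between ratings of the same participant is small relative to the differences between participants, which is why reducing rater variance through stricter administration guidelines would be expected to raise it.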

Telephone version (ALFI-MMSE) (Roccaforte, Burke, Bayer, & Wengel, 1992).

This version includes 22 of the 30 original MMSE items; most of the removed items came from the last section (language and motor skills). Roccaforte et al. (1992) examined the validity of the ALFI-MMSE in 100 geriatric outpatients. Correlations between phone and face-to-face versions of the MMSE were excellent (Pearson’s r = 0.85). Patients tended to score slightly higher on in-person testing than on the telephone. Sensitivity (using a brief neurological screening test as the criterion) of 67% and specificity of 100% were reported in a population of elderly, community-dwelling individuals. This was similar to the sensitivity (68%) and specificity (100%) reported for screening with the traditional MMSE.

26-item version of the ALFI-MMSE (T-MMSE) (Roccaforte et al. cited in Newkirk, Kim, Thompson, Tinklenberg, Yesavage, & Taylor, 2004).

The T-MMSE was developed from the ALFI-MMSE. It is a 26-point adaptation, containing a 3-step command: “Say hello, tap the mouthpiece of the phone 3 times, then say I’m back”. It also contains a new question that asks the patient to give the interviewer a phone number where they can usually be reached. The T-MMSE had an excellent correlation with the MMSE (r = 0.88). Neither hearing impairment nor years of education were associated with T-MMSE scores. On the 22 points in common between the 2 scales, scores had an excellent correlation (r = 0.88); however, telephone scores tended to be lower than face-to-face scores (Newkirk et al., 2004). The authors provide tables for the conversion of T-MMSE scores to MMSE scores.

Client suitability

Can be used with:

Patients with stroke (Agrell & Dehlin, 2000; Ozdemir, Birtane, Tabatabaei, Ekuklu, Kokino, & Siranus, 2001; Grace et al., 1995; Suhr & Grace, 1999).

Should not be used with:

The MMSE was ineffective in detecting cognitive impairment in patients with right-sided stroke (Grace et al., 1995).
The MMSE is not suitable for use with a proxy respondent as it is administered via direct observation of task completion.
Because the MMSE is heavily language dependent, it is likely to misclassify patients with aphasia.
The MMSE has a limited ability to diagnose dementia in general practice and should therefore be used as only one aspect of a patient’s overall cognitive profile (Wind, Schellevis, van Staveren, Scholten, Jonker, & van Eijk, 1997).
The MMSE has been criticized for attempting to assess too many functions in one brief test. An individual’s performance on individual items or within a single domain may be more useful than interpretation of a single, overall score (Tombaugh & McIntyre, 1992). However, when used to screen for visual or verbal memory problems, or for problems in orientation or attention, it is not possible to identify acceptable cut-off scores (Blake, McKinney, Treece, Lee, & Lincoln, 2002).
MMSE scores have been shown to be affected by age, level of education, ethnicity, and sociocultural background (Tombaugh & McIntyre, 1992; Bleeker et al., 1988; Lorentz et al., 2002; Shadlen, Larson, Gibbons, McCormick, & Teri, 1999). These variables may introduce bias leading to the misclassification of individuals. For example, highly educated individuals who have mild dementia may well score within the normal range on the MMSE because they find the questions easy. Conversely, poorly educated individuals may have low scores on the MMSE simply because they find the questions difficult, so their scores may suggest dementia when none is present. These biases are not always present: Agrell and Dehlin (2000) found that age and education did not influence scores in their study. Nevertheless, attention to these factors is warranted when interpreting MMSE results.
The MMSE has been found to lack sensitivity in patients with stroke (Blake et al., 2002; Suhr & Grace, 1999; Nys et al., 2005). Other studies have reported low levels of sensitivity among individuals with mild cognitive impairment (Tombaugh & McIntyre, 1992; de Koning et al., 1998) and in patients with right-hemisphere lesions (Dick et al., 1984). One potential solution to increase the sensitivity of the MMSE is the addition of a Clock Drawing Test (Suhr & Grace, 1999). Another solution that has been offered is to administer the Neurobehavioral Cognitive Status Examination (NCSE) in lieu of the MMSE. The NCSE is a highly sensitive measure to detect cognitive impairment in patients with brain lesions (Schwamm, Van Dyke, Kiernan, Merrin, & Mueller, 1997).
Da Costa et al. (2010) investigated the cognitive evolution and clinical severity of illiterate and schooled patients with stroke during a 6-month follow-up, using the MMSE and the National Institutes of Health Stroke Scale (NIHSS), respectively. Significant improvement in clinical severity as measured by the NIHSS was observed in both groups (P<0.001); however, only schooled individuals showed a significant improvement in MMSE scores, indicating an improvement in their overall cognitive function (P=0.008). Schooling was found to significantly influence MMSE scores.
Folstein, Folstein, and McHugh (1998) reported that the MMSE demonstrates marked ceiling effects in younger intact individuals and marked floor effects in moderately to severely impaired individuals.

In what languages is the measure available?

Afrikaans	Dutch	Israeli English	Romanian
Arabic	Estonian	Italian	Russian
Argentinean Spanish	Filipino	Japanese	Russian for Estonia
Belgian Dutch	Finnish	Kannada	Serbian
Belgian French	French	Korean	Slovakian
Bosnian	Austrian German	Latvian	South African English
Brazilian Portuguese	German	Lithuanian	Spanish
Bulgarian	Greek	Macedonian	Swedish
Chilean Spanish	Gujarati	Malayalam	Telugu
Chinese	Hebrew	Marathi	Turkish
Croatian	Hindi	Norwegian	UK English
Czech	Hungarian	Polish	Ukrainian
Danish	Indian English	Portuguese	Urdu

Authorized translations of the MMSE can be obtained by contacting Custsupp@parinc.com or calling 1.800.331.8378.

Summary

What does the tool measure?	Cognitive impairment
What types of clients can the tool be used for?	While originally used to detect dementia within a psychiatric setting, the MMSE is now widely used and is available with an attached table that provides patient-specific norms.
Is this a screening or assessment tool?	Screening
Time to administer	Administration by a trained interviewer takes approximately 10 minutes.
Versions	The modified mini-mental state examination (3MS); 3MS + Clock-drawing; Standardized MMSE (SMMSE); Telephone version (ALFI-MMSE); 26-item version of the ALFI-MMSE (T-MMSE)
Other Languages	Afrikaans; Dutch; Romanian; Arabic; Estonian; Italian; Russian; Argentinean Spanish; Filipino; Japanese; Russian for Estonia; Belgian Dutch; Finnish; Kannada; Serbian; Belgian French; French; Korean; Slovakian; Bosnian; Austrian German; Latvian; Brazilian Portuguese; German; Lithuanian; Spanish; Bulgarian; Greek; Macedonian; Swedish; Chilean Spanish; Gujarati; Malayalam; Telugu; Chinese; Hebrew; Marathi; Turkish; Croatian; Hindi; Norwegian; Czech; Hungarian; Polish; Ukrainian; Danish; Portuguese; Urdu
Floor/Ceiling effects	Folstein, Folstein, and McHugh (1998) reported that the MMSE demonstrates marked ceiling effects in younger intact individuals and marked floor effects in individuals with moderate to severe cognitive impairment.
Reliability	Internal consistency: Out of nine studies examining the internal consistency of the MMSE, three reported poor internal consistency, one reported adequate internal consistency, two reported poor to excellent internal consistency, two reported excellent internal consistency, and one reported excellent internal consistency in patients with Alzheimer’s Disease and poor internal consistency in participants without cognitive impairment. Test-retest: Out of six studies examining the test-retest reliability of the MMSE, two reported excellent test-retest reliability, one reported adequate test-retest reliability, one reported adequate to excellent test-retest reliability, one reported poor to adequate test-retest reliability and one reported poor test-retest reliability. Inter-rater: Out of three studies examining the inter-rater reliability of the MMSE, one reported excellent inter-rater reliability and two reported adequate inter-rater reliability.
Validity	Criterion: The MMSE can discriminate between patients with Alzheimer’s Disease and frontotemporal dementia, and between patients with left- and right-hemispheric stroke. Construct: Concurrent: The MMSE had a poor correlation with the Mattis Dementia Rating Scale; poor to excellent correlations with the Wechsler Adult Intelligence Scale; an adequate correlation with the Functional Independence Measure; and significant correlations with the Montgomery Asberg Depression Rating Scale and the Zung Depression Scale. Predictive: MMSE scores were found to be predictive of functional improvement in patients with stroke following rehabilitation; discharge destination; developing functional dependence at a 3-year follow-up interval; ambulatory level; length of hospital stay such that for patients with moderate dementia; death.
Does the tool detect change in patients?	Not applicable.
Acceptability	The MMSE is a brief measure to administer. Patient variables such as age, level of education and sociocultural background may affect scores on the measure. It is administered by direct observation and is therefore not appropriate for proxy use.
Feasibility	No specialized equipment is required, and therefore it is a highly portable and inexpensive measure. However, one study reported that physicians found the MMSE too lengthy and unable to contribute much useful information.
How to obtain the tool?	The MMSE can be obtained from the current copyright owner, Psychological Assessment Resources (PAR).

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the MMSE.

Floor/Ceiling Effects

Folstein, Folstein, and McHugh (1998) reported that the MMSE demonstrates marked ceiling effects in younger intact individuals and marked floor effects in individuals with moderate to severe impairment.

Reliability

McDowell, Kristjansson, Hill, and Hebert (1997) examined the internal consistency of the MMSE used as a screening test for cognitive impairment and dementia. The internal consistency was adequate (alpha = 0.78).
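The alpha values reported in this section are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is calculated, the Python example below applies the standard formula to a small set of hypothetical item scores. The rating bands in `rate_alpha` (below 0.70 poor, 0.70 to 0.79 adequate, 0.80 and above excellent) are an assumption inferred from how values are labelled in this review (for example, 0.68 poor, 0.78 adequate, 0.81 excellent), not a stated standard from the cited studies.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of respondents and columns of items.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    n_respondents = len(item_scores)
    n_items = len(item_scores[0]) if n_respondents else 0
    if n_respondents < 2 or n_items < 2:
        raise ValueError("need at least 2 respondents and 2 items")

    columns = list(zip(*item_scores))  # one tuple of scores per item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum_item_variances / total_variance)


def rate_alpha(alpha):
    """Label an alpha value using bands assumed from this review's wording."""
    if alpha >= 0.80:
        return "excellent"
    if alpha >= 0.70:
        return "adequate"
    return "poor"


# Hypothetical scores: five respondents, four items each scored 0 or 1.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f} ({rate_alpha(alpha)})")
```

Because alpha depends on the variance of total scores, samples with a restricted range of scores tend to produce lower values than samples covering the full range of impairment.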

Holzer, Tischler, Leaf, and Myers (1984) examined the prevalence of dementia in a community sample (n = 4,917). In this study, the internal consistency of the MMSE was found to be adequate (alpha = 0.77). Reliability of individual items ranged from poor (alpha = 0.43 for Orientation) to excellent (alpha = 0.82 for Registration). Calculation/attention items were omitted from this study.

Kay, Henderson, Scott, Wilson, Rickwood, and Grayson (1985) conducted a community survey in 274 individuals over 70 years of age. Rates of dementia were measured by interviewing participants with the MMSE. In this study, the internal consistency of the MMSE was poor (alpha = 0.68).

Foreman (1987) examined the reliability of the MMSE in 66 hospitalized medical-surgical patients (normal, dementia, or delirium) over 65 years of age. The MMSE was found to have excellent internal consistency (alpha = 0.96).

Jorm, Scott, Henderson, and Kay (1988) examined whether there was a bias in the MMSE such that individuals with less education (less than or equal to 8th grade) would perform worse on the measure than individuals with more education (more than 8th grade). The MMSE was administered to 269 elderly participants. The internal consistency was found to be poor in both the more educated group (alpha = 0.54) and the less educated group (alpha = 0.65).

Albert and Cohen (1992) administered the MMSE to 40 elderly residents with severe cognitive impairment. The internal consistency of the MMSE was poor in patients with an MMSE score ≤ 10 (alpha = 0.56). However, when subjects representing the full range of MMSE scores were included, the internal consistency was excellent (alpha = 0.90).

Tombaugh, McDowell, Kristjansson, and Hubley (1996) compared the psychometric properties of the MMSE to the 3MS in community-dwelling participants between the ages of 65-89. Participants were divided into two groups, one with no cognitive impairment (n = 406) and one with Alzheimer’s disease (n = 119). The internal consistency of the MMSE was poor in the group without cognitive impairment (alpha = 0.62) and was found to be excellent in patients with Alzheimer’s disease (alpha = 0.81).

Hopp, Dixon, Grut, and Backman (1997) administered the MMSE to 44 adults without dementia, who were over the age of 75 years. In this sample, the internal consistency of the MMSE was poor (alpha ranged from 0.31 to 0.52).
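
The alpha values reported in these studies are presumably Cronbach's alpha coefficients, the usual statistic for internal consistency. As a rough illustration of how such a coefficient is computed, the Python sketch below uses hypothetical item scores (not data from any study cited here); the function name `cronbach_alpha` and the sample data are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical pass/fail scores on three items for six respondents.
mixed_items = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
print(round(cronbach_alpha(mixed_items), 3))
```

When every item ranks respondents identically, alpha reaches 1; the more the items disagree with one another, the lower the coefficient falls.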

Test-retest:
Tombaugh and McIntyre (1992) reviewed studies published on the psychometric properties of the MMSE over the last 26 years. They reported that in studies having a re-test interval of < 2 months, the MMSE has poor to excellent test-retest reliability, with correlations ranging from 0.38 to 0.99. Twenty-four out of 30 studies reported excellent test-retest reliability (r > 0.75).

Folstein et al. (1975) administered the MMSE to 206 patients with dementia syndromes, affective disorder, affective disorder with cognitive impairment, mania, schizophrenia, personality disorders, and to 63 healthy controls. The test-retest reliability of the MMSE when administered twice within 24 hours was excellent, with a Pearson correlation coefficient of r = 0.89. When the MMSE was given to patients with depression and dementia twice, 28 days apart, the test-retest reliability was again excellent (Pearson r = 0.99).
Note: Pearson correlation coefficients are likely to over-estimate reliability, and the Pearson coefficient is no longer used for test-retest reliability.
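
One commonly given reason that a Pearson coefficient can over-estimate test-retest reliability is that it ignores a systematic shift in scores between sessions (for example, a practice effect), whereas an intraclass correlation coefficient (ICC) for absolute agreement is lowered by it. The Python sketch below illustrates this with hypothetical scores; the function names, the data, and the choice of the two-way, absolute-agreement, single-measurement ICC are assumptions made for the example, not the method of any study cited here.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_absolute_agreement(x, y):
    """Two-way, absolute-agreement, single-measurement ICC for two sessions."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]   # per-subject means
    col_means = [sum(x) / n, sum(y) / n]              # per-session means
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores: every subject gains exactly 3 points at retest.
test_scores = [15, 18, 21, 24, 27]
retest_scores = [s + 3 for s in test_scores]

print(round(pearson_r(test_scores, retest_scores), 3))
print(round(icc_absolute_agreement(test_scores, retest_scores), 3))
```

In this constructed example the Pearson coefficient is a perfect 1.0 even though no subject obtained the same score twice, while the ICC is about 0.83 because it counts the uniform 3-point gain as disagreement.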

Schmand, Lindeboom, Launer, Dinkgreve, Hooijer, and Jonker (1995) examined the test-retest reliability of the MMSE in healthy older subjects who were examined twice with an interval of 1 year between evaluations. Test-retest reliability was adequate (Spearman’s correlation = 0.58). The results of this study are similar to those found in O’Connor et al. (1989). These results suggest that the MMSE is not an appropriate measure for detecting subtle cognitive impairment.

Hopp et al. (1997) administered the MMSE to 44 adults without dementia, who were over the age of 75 years. The test-retest reliability for 6-, 12-, and 18-month intervals, using Pearson’s correlations, ranged from adequate to excellent (r = 0.56 to r = 0.80).

Olin and Zelinski (1991) examined the 12-month reliability of the MMSE in 57 elderly participants without dementia. Poor 12-month test-retest correlations were found for the total MMSE score (r = 0.34 when administering the alternate Attention item, r = 0.23 when administering the same Attention item).

Uhlmann, Larson, and Buchner (1987) also examined the 12-month test-retest reliability of the MMSE in outpatients with dementia. In this study, the test-retest reliability was found to be excellent (r = 0.86).

Mitrushina and Satz (1991) examined the test-retest reliability of the MMSE in 122 healthy community-residing elderly volunteers between the ages of 57-85. The test-retest reliability of the MMSE was adequate (ranging from r = 0.45 to r = 0.50) over a 1-year interval, and poor over a 2-year period (r = 0.38).

Intra-rater/Inter-rater:
Molloy and Standish (1997) examined the intra-rater reliability of the MMSE in comparison to the SMMSE in 48 older adults. University students, who were trained to administer either the MMSE or the SMMSE, tested participants on three different occasions to assess their inter-rater and intra-rater reliability. An adequate ICC of 0.69 was reported for the traditional MMSE.

Inter-rater:
Dick et al. (1984) examined the inter-rater reliability of the MMSE in patients with neurological disorders and reported a kappa of 0.63, demonstrating adequate inter-rater reliability of the MMSE.

Fabrigoule, Lechevallier, Crasborn, Dartigues, and Orgogozo (2003) examined the reliability of the MMSE in patients who were likely to develop dementia. Fifty trained general practitioners and psychologists examined patients. There was a significant difference in scores between the general practitioners and the psychologists for the MMSE. The concordance correlation coefficient was 0.87 between evaluations performed by general practitioners and those performed by psychologists.

In a study by O’Connor et al. (1989), 5 coders rated taped interviews with 54 general practice patients aged 75 and over. In this study, the inter-rater reliability was excellent, with a mean kappa value of 0.97.
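
The kappa statistic reported in the inter-rater studies corrects raw agreement for the agreement expected by chance. The studies do not state which variant was used; the Python sketch below assumes Cohen's kappa for two raters, and the function name `cohen_kappa` and the ratings are hypothetical values chosen for illustration only.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical pass (1) / fail (0) scores given by two raters on ten items.
rater_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rater_2 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

print(round(cohen_kappa(rater_1, rater_2), 3))
```

Here the raters agree on 8 of 10 items (80%), but because 52% agreement would be expected by chance alone, kappa is only about 0.58.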

Validity

Criterion:

Although the MMSE is generally considered unidimensional, Jones and Gallo (2000) identified five factors (concentration, language and praxis, orientation, memory, and attention) to support the construct validity of the MMSE as a measure of cognitive mental state among community dwelling older adults.

Concurrent:
Friedl, Schmidt, Stronegger, Fazekas, and Reinhart (1996) examined the concurrent validity of the MMSE and the Mattis Dementia Rating Scale (MDRS) (Mattis, 1976), two measures commonly used to screen for dementia. Concurrent validity between the MMSE and the MDRS was found to be poor (Pearson’s r = 0.29), as were correlations between the MMSE and MDRS subtests (attention r = 0.18; initiation and perseveration r = 0.04; construction r = 0.10; conceptualization r = 0.17; verbal and non-verbal short-term memory r = 0.27).

Folstein et al. (1975) administered the MMSE to 206 patients with dementia syndromes, affective disorder, affective disorder with cognitive impairment, mania, schizophrenia, personality disorders, and to 63 healthy controls. The concurrent validity of the MMSE was examined by correlating the measure with the Wechsler Adult Intelligence Scale (WAIS – Wechsler, 1955). The correlations between the MMSE and the WAIS verbal IQ (r = 0.78) and between the MMSE and the WAIS performance IQ (r = 0.66) were both excellent.

Hopp, Dixon, Grut, and Backman (1997) administered the MMSE and the Wechsler Adult Intelligence Scale-Revised (WAIS-R, Wechsler, 1981) to 44 adults without dementia, who were over the age of 75 years. Correlations between the MMSE and the WAIS-R Verbal IQ were adequate, ranging from r = 0.36 to r = 0.52. Correlations between the MMSE and WAIS-R Performance IQ were also adequate, ranging from r = 0.37 to r = 0.57. Correlations between the MMSE and the WAIS-R subtests ranged from poor to excellent (r = 0.20 to r = 0.60). Correlations between the MMSE subscales and the WAIS-R were generally lower than r = 0.41. The Language subscale of the MMSE showed the lowest correlations with both WAIS-R Verbal and WAIS-R Performance. Correlations between MMSE subscales and WAIS-R subtests showed that the MMSE Orientation subscale had the lowest correlations with all WAIS-R subtests (r = 0.001 to r = 0.40).

Dick et al. (1984) examined the utility of the MMSE for bedside screening and serial assessment of cognitive function in 126 neurological patients. Similar to the results of Hopp et al. (1997), they found adequate correlations between the MMSE and the Wechsler Adult Intelligence Scale (WAIS) (r = 0.55 for WAIS-Verbal; r = 0.56 for WAIS-Performance).

Agrell and Dehlin (2000) reported significant correlations between MMSE scores and the Barthel Index (Mahoney & Barthel, 1965), the Montgomery Asberg Depression Rating Scale (MADRS – Montgomery & Asberg, 1979) and the Zung Depression Scale (Zung, 1965).

Diamond, Felsenthal, Macciocci, Butler, and Lally-Cassady (1996) examined the relationship between cognition and ability to benefit from inpatient rehabilitation in 52 patients admitted to geriatric rehabilitation. Functional gain was assessed using the change in Functional Independence Measure (FIM – Keith, Granger, Hamilton, & Sherwin, 1987) score from admission to discharge. The MMSE was not found to be associated with change in FIM score (r = 0.10). However, the MMSE alone and in combination with age correlated adequately with functional status on admission (r = 0.58) and discharge (r = 0.49).

Predictive:
Ozdemir et al. (2001) examined the predictive validity of the MMSE in 43 patients with stroke. Baseline total MMSE scores were correlated with discharge Motor Functional Independence Measure (Keith et al., 1987) improvement (r = 0.31). The baseline Orientation subscore of the MMSE correlated significantly with functional ambulation score improvement as measured by the Adapted Patient Evaluation and Conference System functional scale (r = 0.31). These results suggest that baseline total MMSE scores are somewhat predictive of functional improvement in patients with stroke after rehabilitation.

Diamond et al. (1996) examined the relationship between cognition and the ability to benefit from inpatient rehabilitation in 52 patients admitted to geriatric rehabilitation. The MMSE was found to be highly predictive of discharge destination such that low MMSE scores were associated with a greater likelihood of nursing home placement (r = 0.68). While only 8% of the uppermost MMSE quartile was discharged to nursing home placement, 62% of the lowest MMSE quartile was discharged to nursing homes.

Aguero-Torres, Fratiglioni, Guo, Viitanen, von Strauss, and Winblad (1998) examined predictors of dependence in activities of daily living (as measured by the Katz Index of Activities of Daily Living (Katz, Downs, Cash, & Grotz, 1970)) in the elderly. In patients without dementia, the MMSE was found to be one of the strongest predictors for developing functional dependence at a 3-year follow-up interval. Lower MMSE scores were associated with functional dependence in both adults with dementia (OR = 0.8) and in adults without dementia (OR = 0.8). Initial MMSE performance also predicted future functional dependence and decline among adults without dementia (OR = 0.7). Thus, independent of the presence of other chronic conditions, the MMSE may indicate subsequent functional status in a cognitively intact elderly population.

Matsueda and Ishii (2000) retrospectively examined the relationship between MMSE score and ambulatory level (divided into three groups: dependent, partially dependent, and independent) in 162 elderly patients who experienced a hip fracture. A significant relationship was found between initial MMSE score and ambulatory level such that those in the dependent group had the lowest mean MMSE score of only 6.6, those in the partially dependent group had a mean score of 17.9, and those in the independent group had the highest MMSE score of 24.6.

Huusko, Karppi, Avikainen, Kautiainen, and Sulkava (2000) examined the effect of intensive geriatric rehabilitation (intervention group) versus local hospital treatment (control group) on patients with dementia and a hip fracture. MMSE scores were predictive of length of hospital stay such that for patients with moderate dementia (MMSE score of 12-17), the median length of stay was 47 days in the intervention group and 147 days in the control group. Patients with mild dementia (MMSE score of 18-23) had a length of stay of 29 days in the intervention group and 46.5 days in the control group. No significant differences in mortality or in the length of hospital stay were observed for patients with severe dementia. In the intervention group, 3 months after surgery 91% of the patients with mild dementia and 63% of the patients with moderate dementia were living independently. In the control group, the corresponding figures were 67% and 17%, respectively. The results of this study suggest that the MMSE is associated with the length of hospital and rehabilitation stay, and that length of stay can be affected by intervention for those with cognitive impairment.

Pettigrew, Thomas, Howard, Veltkamp, and Toole (2000) examined whether low MMSE scores predict transient ischemic attack, stroke, myocardial infarction, or death. Patients were randomized to receive a carotid endarterectomy or best medical therapy as a means to preserve cognition. A significant relationship was found between a low post-randomization MMSE score and an increased risk of death. Furthermore, patients who experienced stroke after randomization had a significant and persistent reduction in MMSE score.

Construct:

Convergent:
Snowden et al. (1999) examined 140 patients who were part of the Alzheimer’s Disease Patient Registry to evaluate the psychometric properties of a new measure, the Minimum Data Set (MDS). The cognitive performance scores from the MDS were correlated with the MMSE. The MMSE correlated adequately with the MDS (Spearman’s r = -0.45) (this correlation is negative because a low score on the MMSE indicates cognitive impairment, whereas a high score on the MDS indicates impairment). Consistent with previous studies, the MMSE had excellent correlations with the Wechsler Adult Intelligence Scale (WAIS) Verbal and Performance IQ scores (r = 0.78 and r = 0.66, respectively).
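The sign of the correlation above can be illustrated with a minimal sketch. It assumes two scales scored in opposite directions and uses invented scores (not data from Snowden et al.); the helper names `ranks` and `spearman_rho` are hypothetical, and the simple formula used here is only valid when there are no tied scores.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores for six patients, invented for illustration only.
mmse_scores = [29, 27, 24, 20, 15, 9]    # higher score = better cognition
impairment_scores = [0, 1, 3, 2, 4, 6]   # higher score = more impairment

rho = spearman_rho(mmse_scores, impairment_scores)
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = -0.94"
```

Because the patients who rank highest on one scale rank lowest on the other, the coefficient is negative even though both scales agree about who is most impaired.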

Discriminant:
Winograd et al. (1994) developed the Physical Performance and Mobility Examination, a measure used to assess 6 domains of physical functioning and mobility for hospitalized elderly. The construct validity of this measure was examined by comparing it to the MMSE, Activities of Daily Living (ADL), Instrumental Activities of Daily Living (IADL) (Lawton & Brody, 1969), Geriatric Depression Scale (Yesavage et al., 1983), and modified Medical Outcomes Study Measure of Physical Functioning (MOS-PFR). The MMSE correlated poorly with the Physical Performance and Mobility Examination (r = 0.36), suggesting that these two measures assess different constructs.

MacKnight and Rockwood (1995) examined the discriminant validity of the MMSE by comparing it to a new measure, the Hierarchical Assessment of Balance and Mobility (HABAM), in patients 65 and older. Discriminant validity was demonstrated, as the two measures correlated poorly (r = 0.15).

Known groups:
Wetherell, Darby, Emerson, and Miller (1997) found that the MMSE was able to discriminate between patients with Alzheimer’s Disease and frontotemporal dementia.

Kase, Wolf, Kelly-Hayes, Kannel, Beiser, and D’Agostino (1998) found that baseline pre-stroke MMSE scores were significantly lower for patients with stroke than were the scores for matched controls. This difference became more pronounced when the post-stroke scores were compared. The MMSE could discriminate between patients with left- and right-hemispheric stroke. In patients with right-hemispheric stroke, cognitive impairment was characterized by a significant decline in scores from pre-stroke to post-stroke specifically in the areas of orientation and language. For patients with left-hemispheric stroke, a significant decline in scores from pre-stroke to post-stroke was found in all five domains of the MMSE except memory.

Sensitivity and Specificity

Low levels of sensitivity, particularly among individuals with mild cognitive impairment, have been reported for the MMSE (Tombaugh & McIntyre, 1992; de Koning et al., 1998) and may be due to the emphasis placed on language items and a lack of items assessing visual-spatial ability (Grace et al., 1995; de Koning et al., 1998; Suhr & Grace, 1999).

Blake et al. (2002) examined the sensitivity and specificity of the MMSE for detecting cognitive impairment after stroke. Comparing MMSE scores against identified cognitive impairment yielded an optimum cutoff of <24, with good specificity (88%) and moderate sensitivity (62%). However, it was not possible to identify suitable cutoff scores to use the MMSE to assess for the presence of either visual or verbal memory deficits.

Nys, van Zandvoort, de Kort, Jansen, Kappelle, and de Haan (2005) administered the MMSE to 34 patients with stroke and 34 healthy controls. In this study, no optimum cut-off scores yielding both sensitivity greater than 80% and specificity greater than 60% could be identified.
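How a cutoff score translates into sensitivity and specificity can be sketched as follows. This is an illustrative example only: the scores and reference classifications are invented (they are not data from Blake et al. or Nys et al.), and the function name `sensitivity_specificity` is hypothetical. A score below the cutoff is treated as a positive screen.

```python
def sensitivity_specificity(scores, impaired, cutoff):
    """Compare a 'score < cutoff' screen against a reference classification.

    Returns (sensitivity, specificity) as proportions between 0 and 1.
    """
    true_pos = false_neg = true_neg = false_pos = 0
    for score, is_impaired in zip(scores, impaired):
        screen_positive = score < cutoff
        if is_impaired and screen_positive:
            true_pos += 1
        elif is_impaired and not screen_positive:
            false_neg += 1
        elif not is_impaired and not screen_positive:
            true_neg += 1
        else:
            false_pos += 1
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical MMSE scores and reference diagnoses for ten people.
scores = [18, 22, 25, 27, 21, 23, 26, 28, 29, 30]
impaired = [True, True, True, True, True, False, False, False, False, False]

sens, spec = sensitivity_specificity(scores, impaired, cutoff=24)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

In this invented sample, two impaired people score 24 or above and are missed by the screen, which lowers sensitivity, while one unimpaired person scores below 24 and is flagged, which lowers specificity. Raising the cutoff would trade specificity for sensitivity.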

References

Agrell, B., Dehlin, O. (2000). Mini mental state examination in geriatric stroke patients. Validity, differences between subgroups of patients, and relationships to somatic and mental variables. Aging (Milano), 12(6), 439-444.
Aguero-Torres, H., Fratiglioni, L., Guo, Z., Viitanen, M., von Strauss, E., Winblad, B. (1998). Dementia is the major cause of functional dependence in the elderly: 3-year follow-up data from population-based study. American Journal of Public Health, 88, 1452-1456.
Albert, M., Cohen, C. (1992). The test for severe impairment: An instrument for the assessment of patients with severe cognitive dysfunction. J Am Geriatr Soc, 40(5), 449-453.
Blake, H., McKinney, M., Treece, K., Lee, E., Lincoln, N. B. (2002). An evaluation of screening measures for cognitive impairment after stroke. Age and Ageing, 31, 451-456.
Bleecker, M. L., Bolla-Wilson, K., Kawas, C., Agnew, J. (1988). Age-specific norms for the Mini-Mental State Exam. Neurology, 10, 1565-1568.
Crum, R. M., Anthony, J. C., Bassett, S. S., Folstein, M. F. (1993). Population-based norms for the mini-mental state examination by age and educational level. JAMA, 18, 2386-2391.
Da Costa, F.A., Bezerra, I.F.D., de Araujo Silva, D.L., de Oliveira, R. & da Rocha, V.M. (2010). Cognitive evolution by MMSE in poststroke patients. International Journal of Rehabilitation Research, 33, 248-253.
de Koning, I., van Kooten, F., Dippel, D. W. J., van Harskamp, F., Grobbee, D. E., Kluft, C., Koudstaal, P. J. (1998). The CAMCOG: A useful screening instrument for dementia in stroke patients. Stroke, 29, 2080-2086.
Diamond, P. T., Felsenthal, G., Macciocci, S. N., Butler, D. H., Lally-Cassady, D. (1996). Effect of cognitive impairment on rehabilitation outcome. American Journal of Physical Medicine & Rehabilitation, 75(1), 40-43.
Dick, J. P., Guiloff, R. J., Stewart, A., Blackstock, J., Bielawska, C., Paul, E. A., Marsden, C. D. (1984). Mini-mental state examination in neurological patients. Journal of Neurology, Neurosurgery, and Psychiatry, 47, 496-499.
Fabrigoule, C., Lechevallier, N., Crasborn, L., Dartigues, J. F., Orgogozo, J. M. (2003). Inter-rater reliability of scales used to measure mild cognitive impairment by general practitioners and psychologists. Current Medical Research and Opinion, 19(7), 603-608.
Folstein, M. F., Folstein, S. E., McHugh, P. R. (1975). “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res, 12(3), 189-198.
Folstein, M. F., Folstein, S. E., McHugh, P. R. (1998). Key papers in geriatric psychiatry. Mini-Mental State: A practical method for grading the cognitive state of patients for the clinician. Int J Geriat Psychiatry, 13(5), 285-294.
Folstein, M. F., Folstein, S. E., McHugh, P. R., Fanjiang, G. (2001). Mini-Mental State Examination User’s Guide. Odessa, FL: Psychological Assessment Resources.
Folstein, M. F., Robins, L. N., Helzer, J. E. (1983). The Mini-Mental State Examination. Arch Gen Psychiatry, 40(7), 812.
Foreman, M. D. (1987). Reliability and validity of mental status questionnaires in elderly hospitalized patients. Nurs Res, 36(4), 216-220.
Friedl, W., Schmidt, R., Stronegger, W. J., Fazekas, F., Reinhart, B. (1996). Sociodemographic predictors and concurrent validity of the Mini Mental State Examination and the Mattis Dementia Rating Scale. European Archives of Psychiatry and Clinical Neuroscience, 246(6), 317-319.
Grace, J., Nadler, J. D., White, D. A., Guilmette, T. J., Giuliano, A. J., Monsch, A. U., Snow, M. G. (1995). Folstein vs modified Mini-Mental State Examination in geriatric stroke. Stability, validity, and screening utility. Archives of Neurology, 52(5), 477-484.
Holzer, C. E., Tischler, G. L., Leaf, P. J., Myers, J. K. (1984). An epidemiologic assessment of cognitive impairment in a community. Research in Community Mental Health, 4, 3-32.
Hopp, G. A., Dixon, R. A., Grut, M., Backman, L. (1997). Longitudinal and psychometric profiles of two cognitive status tests in very old adults. J Clin Psychol, 53(7), 673-686.
Huusko, T. M., Karppi, P., Avikainen, V., Kautiainen, H., Sulkava, R. (2000). Randomised, clinically controlled trial of intensive geriatric rehabilitation in patients with hip fracture: Subgroup analysis of patients with dementia. British Medical Journal, 321,1107-1111.
Jones, R. N., Gallo, J. J. (2000). Dimensions of the Mini-Mental State Examination among community dwelling older adults. Psychological Medicine, 30, 605-618.
Jorm, A. F., Scott, R., Henderson, A. S., Kay, K. W. (1988). Educational level differences on the Mini-Mental State: The role of test bias. Psychol Med, 18(3), 727-731.
Kase, C. S., Wolf, P. A., Kelly-Hayes, M., Kannel, W. B., Beiser, A., D’Agostino, R. B. (1998). Intellectual decline after stroke: The Framingham study. Stroke, 29, 805-812.
Katz, S., Downs, T. D., Cash, H. R., Grotz, R. C. (1970). Index of Activities of Daily Living. The Gerontologist, 1, 20-30.
Kay, K. W., Henderson, A. S., Scott, R., Wilson, J., Rickwood, D., Grayson, D. A. (1985). Dementia and depression among the elderly living in the Hobart community: The effect of the diagnostic criteria on the prevalence rates. Psychol Med, 15(4), 771-788.
Keith, R. A., Granger, C. V., Hamilton, B. B., Sherwin, F. S. (1987). The functional independence measure: A new tool for rehabilitation. Adv Clin Rehabil, 1, 6-18.
Lawton, M. P., Brody, E. M. (1969). Assessment of older people: Self-maintaining and instrumental activities of daily living. Gerontologist, 9, 179-186.
Lorentz, W. J., Scanlan, J. M., Borson, S. (2002). Brief screening test for dementia. Can J Psychiatry, 47, 723-733.
MacKnight, C., Rockwood, K. (1995). A hierarchical assessment of balance and mobility. Age and Ageing, 24(2), 126-130.
Mahoney, F. I., Barthel, D. W. (1965). Functional evaluation: The Barthel Index. Md State Med J, 14, 61-5.
Matsueda, M., Ishii, Y. (2000). The relationship between dementia score and ambulatory level after hip fracture in the elderly. American Journal of Orthopedics, 29,691-693.
Mattis, S. (1976). Mental status examination for organic mental syndrome in the elderly patient. In: Bellak L, Karasu TB, editors. Geriatric Psychiatry. New York: Grune and Stratton, 77-101.
McDowell, I., Kristjansson, B., Hill, G. B., Hebert, R. (1997). Community screening for dementia: The Mini Mental State Exam (MMSE) and modified Mini-Mental State Exam (3MS) compared. Journal of Clinical Epidemiology, 50(4), 377-383.
Mitrushina, M., Satz, P. (1991). Reliability and validity of the Mini-Mental State Exam in neurologically intact elderly. J Clin Psychol, 47(4), 537-543.
Molloy, D. W., Standish, T. I. M. (1997). A guide to the Standardized Mini-Mental State Examination. International Psychogeriatrics, 9(1), 87-94.
Montgomery, S. A., Asberg, M. (1979). A new depression scale designed to be sensitive to change. Brit J Psychiat, 134, 382-389.
Newkirk, L. A., Kim, J. M., Thompson, J. M., Tinklenberg, J. R., Yesavage, J. A., Taylor, J. L. (2004). Validation of a 26-point telephone version of the Mini-Mental State Examination. Journal of Geriatric Psychiatry and Neurology, 17(2), 81-87.
Nys, G. M., van Zandvoort, M. J., de Kort, P. L., Jansen, B. P., Kappelle, L. J., de Haan, E. H. (2005). Restrictions of the Mini-Mental State Examination in acute stroke. Arch Clin Neuropsychol, 20(5), 623-629.
O’Connor, D. W., Pollitt, P. A., Hyde, J. B., Fellows, J. L., Miller, N. D., Brooke, C. P., Reiss, B. B. (1989). The reliability and validity of the Mini-Mental State in a British community survey. J Psychiatr Res, 23(1), 87-96.
Olin, J.T., Zelinski, E.M. (1991). The 12-month reliability of the Mini-Mental State Examination. Psychological Assessment, 3, 427-432.
Ozdemir, F., Birtane, M., Tabatabaei, R., Ekuklu, G., Kokino, S. (2001). Cognitive evaluation and functional outcome after stroke. American Journal of Physical Medicine & Rehabilitation. 80(6), 410-415.
Pettigrew, L. C., Thomas, N., Howard, V. J., Veltkamp, R., Toole, J. F. (2000). Low mini-mental status predicts mortality in asymptomatic carotid arterial stenosis. Neurology, 55,30-34.
Roccaforte, W. H., Burke, W. J., Bayer, B. L., Wengel, S. P. (1992). Validation of a telephone version of the mini-mental state examination. J Am Geriatr Soc, 40(7), 697-702.
Ruchinskas, R. A., Curyto, K. J. (2003). Cognitive screening in geriatric rehabilitation. Rehab Psychol, 48, 14-22.
Schmand, B., Lindeboom, J., Launer, L., Dinkgreve, M., Hooijer, C., Jonker, C. (1995). What is a significant score change on the Mini-Mental State Examination? International Journal of Geriatric Psychiatry, 10, 411-414.
Schwamm, L. H., Van Dyke, C., Kiernan, R. J., Merrin, E. L., Mueller, J. (1987). The Neurobehavioral Cognitive Status Examination: Comparison with the Cognitive Capacity Screening Examination and the Mini-Mental State Examination in a neurosurgical population. Ann Intern Med, 107(4), 486-491.
Shadlen, M. F., Larson, E. B., Gibbons, L., McCormick, W. C., Teri, L. (1999). Alzheimer’s disease symptom severity in Blacks and Whites. Journal of the American Geriatrics Society, 47,482-486.
Snowden, M., McCormick, W., Russo, J., Srebnik, D., Comtois, K., Bowen, J., Teri, L., Larson, E. B. (1999). Validity and responsiveness of the Minimum Data Set. Journal of the American Geriatrics Society, 47(8), 1000-1004.
Suhr, J. A., Grace, J. (1999). Brief cognitive screening of right hemisphere stroke: Relation to functional outcome. Arch Phys Med Rehabil, 80(7), 773-776.
Teng, E. L., Chui, H. C. (1987). The Modified Mini-Mental State (3MS) examination. J Clin Psychiatry, 48(8), 314-318.
Tombaugh, T. N., McIntyre, N. J. (1992). The mini-mental state examination: A comprehensive review. J Am Geriatr Soc, 40(9), 922-935.
Tombaugh, T. N., McDowell, I., Kristjansson, B., Hubley, A. M. (1996). Mini-Mental State Examination (MMSE) and the modified MMSE (3MS): A psychometric comparison and normative data. Psychol Assess, 8(1), 48-59.
Uhlmann, R. F., Larson, E. B., Buchner, D. M. (1987). Correlations of Mini-Mental State and modified Dementia Rating Scale to measures of transitional health status in dementia. J Gerontol, 42(1), 33-36.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale-Revised: Test. New York: Harcourt Brace.
Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.
Wetherell, M., Darby, A., Emerson, K., & Miller, B. L. (1997). Mini-Mental State Examination performance in Alzheimer’s disease and frontotemporal dementia. International Journal of Rehabilitation and Health, 3, 253-265.
Wind, A. W., Schellevis, F. G., van Staveren, G., Scholten, R. J. P. M., Jonker, C., van Eijk, J. M. (1997). Limitations of the mini-mental state examination in diagnosing dementia in general practice. International Journal of Geriatric Psychiatry, 12(1), 101-108.
Winograd, C. H., Lemsky, C. M., Nevitt, M. C., Nordstrom, T. M., Stewart, A. L., Miller, C. J., Bloch, D. A. (1994). Development of a physical performance and mobility examination. J Am Geriatr Soc, 42(7), 743-749.
Yesavage, J. A., Brink, T. L., Rose, T. L., Lum, O., Huang, V., Adey, M. B., Leirer, V. O. (1983). Development and validation of a geriatric depression screening scale: A preliminary report. Journal of Psychiatric Research, 17, 37-49.
Zung, W. W. K. (1965). A self-rating depression scale. Arch Gen Psychiatry, 12, 63-70.

See the measure

How to obtain the MMSE

The MMSE can be obtained from the current copyright owner, Psychological Assessment Resources (PAR).

Montreal Cognitive Assessment (MoCA)

Evidence Reviewed as of before: 20-01-2011

Author(s)*: Lisa Zeltzer, MSc OT; Katie Marvin, MSc PT Candidate

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Montreal Cognitive Assessment (MoCA) was designed as a rapid screening instrument for the detection of mild cognitive impairment. It was developed in response to the poor sensitivity of the Mini-Mental State Examination (MMSE) in distinguishing clients with mild cognitive impairment from normal elderly clients (Nasreddine et al., 2005). Thus, the MoCA is intended for clients with memory complaints who score within the normal range on the MMSE.

The MoCA assesses the following cognitive domains: attention and concentration, executive functions, memory, language, visuoconstructional skills, conceptual thinking, calculations, and orientation. The measure can be used with, but is not limited to, patients with stroke.

In-Depth Review

Purpose of the measure

The Montreal Cognitive Assessment (MoCA) was designed as a rapid screening instrument for the detection of mild cognitive impairment. It was developed in response to the poor sensitivity of the Mini-Mental State Examination (MMSE) in distinguishing clients with mild cognitive impairment from normal elderly clients (Nasreddine et al., 2005). Thus, the MoCA is intended for clients with memory complaints who score within the normal range on the MMSE.

Available versions

The Montreal Cognitive Assessment was developed by Dr Nasreddine in 1996, then validated with the help of Chertkow, Phillips, Whitehead, Bergman, Collin, Cummings, and Hébert in 2004-2005.

Features of the measure

Items:

The items of the MoCA examine attention and concentration, executive functions, memory, language, visuoconstructional skills, conceptual thinking, calculations, and orientation. These items are described in detail below.

Alternating Trail Making: The examiner instructs the client: “Please draw a line, going from a number to a letter in ascending order. Begin here [points to 1] and draw a line from 1 then to A then to 2 and so on. End here [points to E].”
Visuoconstructional Skills – Cube: The examiner gives the following instructions, pointing to the cube: “Copy this drawing as accurately as you can, in the space below.”
Visuoconstructional Skills – Clock: Indicate the right third of the test sheet where a space is provided for the clock drawing item, and give the following instructions: “Draw a clock. Put in all the numbers and set the time to 10 after 11.”
Naming: Beginning on the left, point to each figure and say: “Tell me the name of this animal.”
Memory: The examiner reads a list of 5 words at a rate of one per second, giving the following instructions: “This is a memory test. I am going to read a list of words that you will have to remember now and later on. Listen carefully. When I am through, tell me as many words as you can remember. It doesn’t matter in what order you say them.” Checkmark the space allocated for each word the client produces on the first trial on the test sheet. When the client indicates that he/she has finished (has recalled all the words), or can recall no more words, read the list a second time with the following instructions: “I am going to read the same list for a second time. Try to remember and tell me as many words as you can, including words you said the first time.” Put a checkmark in the allocated space for each word on the test sheet the client recalls after the second trial. At the end of the second trial, inform the client that she/he will be asked to recall these words again by saying, “I will ask you to recall those words again at the end of the test.”
Attention:
- Forward Digit Span: Give the following instruction: “I am going to say some numbers and when I am through, repeat them to me exactly as I said them.” Read the five number sequences at a rate of one digit per second.
- Backward Digit Span: Give the following instruction: “Now I am going to say some more numbers, but when I am through you must repeat them to me in the backwards order.” Read the three number sequences at a rate of one digit per second.
- Vigilance: The examiner reads the list of letters at a rate of one per second, after giving the following instruction: “I am going to read a sequence of letters. Every time I say the letter A, tap your hand once. If I say a different letter, do not tap your hand.”
- Serial 7s: The examiner gives the following instruction: “Now I will ask you to count by subtracting seven from 100, and then, keep subtracting seven from your answer until I tell you to stop.” Give this instruction twice if necessary.
Sentence Repetition: The examiner gives the following instructions: “I am going to read you a sentence. Repeat it after me, exactly as I say it [pause]. I only know that John is the one to help today.” Following the response say: “Now I am going to read you another sentence. Repeat it after me, exactly as I say it [pause]. The cat always hid under the couch when dogs were in the room.”
Verbal Fluency: The examiner gives the following instruction: “Tell me as many words as you can think of that begin with a certain letter of the alphabet that I will tell you in a moment. You can say any kind of word you want, except for proper nouns (like Bob or Boston), numbers, or words that begin with the same sound but have a different suffix, for example, love, lover, loving. I will tell you to stop after one minute. Are you ready? [pause]. Now, tell me as many words as you can beginning with the letter F” [time 60 seconds]. “Stop.”
Abstraction: The examiner asks the client to explain what each pair of words has in common, starting with the example: “Tell me how an orange and a banana are alike.” If the client answers in a concrete manner, then say only one additional time: “Tell me another way in which those items are alike.” If the client still doesn’t give the appropriate response (fruit), say “Yes, and they are also both fruit.” Do not give any additional instructions or clarification. After the practice trial say: “Now tell me how a train and a bicycle are alike.” Following the response, administer the second trial, saying: “Now, tell me how a ruler and a watch are alike.” Do not give any additional instructions or prompts.
Delayed Recall: The examiner gives the following instruction: “I read some words to you earlier, which I asked you to remember. Tell me as many of those words as you can remember.” Make a checkmark on the test sheet for each of the words correctly recalled spontaneously without any cues, in the allocated space.
Optional: The client can be prompted with semantic category cues for any word that is not recalled. This is to elicit clinical information in order to provide the examiner with additional information regarding the type of memory disorder. For memory deficits due to retrieval failures, performance can be improved with a cue. For memory deficits due to encoding failures, performance does not improve with a cue. No points are awarded for words recalled from a cue.

Make a checkmark in the allocated space if the client remembers the word with the help of a category cue. If not, give the client a multiple-choice cue.

Use the following category and/or multiple-choice cues for each word, when appropriate:
- FACE: category cue: part of the body; multiple choice: nose, face, hand
- VELVET: category cue: type of fabric; multiple choice: denim, cotton, velvet
- CHURCH: category cue: type of building; multiple choice: church, school, hospital
- DAISY: category cue: type of flower; multiple choice: rose, daisy, tulip
- RED: category cue: a color; multiple choice: red, blue, green
Orientation: The examiner gives the following instructions: “Tell me the date today.” If the client does not give a complete answer, then prompt accordingly by saying: “Tell me the [year, month, exact date, and day of the week].” Then say: “Now, tell me the name of this place, and which city it is in.”

Scoring:

Sum all subscores. Add one point for a client who has had 12 years or fewer of formal education, for a possible maximum of 30 points. A final total score of 26 and above is considered normal. A final total score below 26 is indicative of mild cognitive impairment.
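The total-score rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official scoring tool: the function name `moca_total`, the subscale keys and the example values are hypothetical, and capping the adjusted total at 30 is an assumption based on the stated maximum of 30 points.

```python
def moca_total(subscores, years_of_education):
    """Return (total, interpretation) for a dict of MoCA subscores."""
    raw = sum(subscores.values())
    # One extra point for 12 years or fewer of formal education.
    bonus = 1 if years_of_education <= 12 else 0
    # Assumption: the adjusted total cannot exceed the 30-point maximum.
    total = min(raw + bonus, 30)
    if total >= 26:
        return total, "normal"
    return total, "indicative of mild cognitive impairment"


# Hypothetical client: raw subscores sum to 25.
example = {
    "visuospatial_executive": 4,
    "naming": 3,
    "attention": 5,
    "language": 2,
    "abstraction": 2,
    "delayed_recall": 3,
    "orientation": 6,
}

print(moca_total(example, years_of_education=10))  # (26, 'normal')
print(moca_total(example, years_of_education=16))  # (25, 'indicative of mild cognitive impairment')
```

The same raw score falls on either side of the cutoff depending on the education adjustment, which is why years of education should be recorded before scoring.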

Below is a breakdown of how each item of the MoCA is to be scored:

Item	How to score
Alternate Trail Making (1 point)	Give 1 point if the following pattern is drawn without drawing any lines that cross: 1-A-2-B-3-C-4-D-5-E. Any error that is not immediately self-corrected earns a score of 0.
Visuoconstructional skills Cube (1 point)	Give 1 point for a correctly executed drawing. Drawing must be 3D; all lines drawn; no lines added; lines are relatively parallel and lengths are similar (rectangular prisms are accepted). A point is not assigned if any of the above criteria are not met.
Visuoconstructional skills Clock (3 points)	Contour (1 point): The clock face must be a circle with only minor distortion acceptable (e.g. slight imperfection in closing the circle). Numbers (1 point): All clock numbers must be present with no additional numbers; numbers must be in correct order and placed in approximate quadrants on the clock face; Roman numerals are accepted; numbers can be placed outside the circle contour. Hands (1 point): There must be 2 hands jointly indicating the correct time; the hour hand must be clearly shorter than the minute hand; hands must be centered within the clock face with their junction close to the clock centre. A point is not assigned for a given element if any of the above criteria are not met.
Naming (3 points)	One point each is given for the following responses: (1) camel/dromedary, (2) lion, (3) rhinoceros/rhino.
Memory (0 points)	No points are given for Trials 1 and 2.
Attention (6 points)	Digit span (2 points): Give 1 point for each sequence correctly repeated (the correct response for the backwards trial is 2-4-7). Vigilance (1 point): Give 1 point if there are 0-1 errors (an error includes a tap on a wrong letter, or a failure to tap on letter A). Serial 7s (3 points): This item is scored out of 3 points. Give 0 points for no correct subtractions; 1 point for 1 correct subtraction; 2 points for 2-3 correct subtractions; and 3 points if the client successfully makes 4-5 correct subtractions. Count each correct subtraction of 7 beginning at 100. Each subtraction is evaluated independently; that is, if the client responds with an incorrect number but continues to correctly subtract 7 from it, give a point for each correct subtraction. For example, a client may respond “92-85-78-71-64” where the “92” is incorrect, but all subsequent numbers are subtracted correctly. This is 1 error and the item would be given a score of 3.
Sentence Repetition (2 points)	Give 1 point for each sentence correctly repeated. Repetition must be exact. Be alert for errors that are omissions (e.g., omitting “only”, “always”) and substitutions/additions.
Verbal fluency (1 point)	Give 1 point if 11 words or more are generated in 60 seconds. Record responses in the margins.
Abstraction (2 points)	Only the last 2 item pairs are scored. Give 1 point for each item pair correctly answered. The following responses are acceptable: Train-bicycle = means of transportation, means of traveling, you take trips in both; Ruler-watch = measuring instruments, used to measure. The following responses are not acceptable: Train-bicycle = they have wheels; Ruler-watch = they have numbers.
Delayed recall (5 points)	Give 1 point for each word recalled freely without any cues.
Orientation (6 points)	Give 1 point for each item correctly answered. The client must tell the exact date and place (name of hospital, clinic, office). No points are awarded if the client makes an error of 1 day for the day and date.
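The Serial 7s rule in the table above, where each subtraction is judged independently against the client's own previous answer, is easy to misapply by hand. The sketch below shows one way to express it; the function name `serial_sevens_points` is hypothetical, and the sketch assumes that up to five responses are recorded as integers.

```python
def serial_sevens_points(responses):
    """Score the Serial 7s item (0 to 3 points) from up to five responses."""
    previous = 100  # the client starts counting down from 100
    correct = 0
    for answer in responses[:5]:
        # Each subtraction is evaluated against the client's previous number,
        # so one early slip does not invalidate later correct subtractions.
        if previous - answer == 7:
            correct += 1
        previous = answer
    if correct >= 4:
        return 3  # 4-5 correct subtractions
    if correct >= 2:
        return 2  # 2-3 correct subtractions
    return correct  # 0 or 1 correct subtraction


# The example from the table: "92" is wrong, the next four are correct.
print(serial_sevens_points([92, 85, 78, 71, 64]))  # 3
# Hypothetical response with two slips (93-85 and 78-70): three correct.
print(serial_sevens_points([93, 85, 78, 70, 63]))  # 2
```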

Time:

The MoCA takes approximately 10-15 minutes to administer for clients with mild cognitive impairment.

Subscales:

Visuospatial/Executive; Naming; Memory; Attention; Language; Abstraction; Delayed recall; Orientation

Equipment:

Only the MoCA test sheet and a pencil are required to complete the measure.

Training:

The MoCA should be administered by a health professional. No formal training is required to administer the measure.

Alternative form of the MoCA

MoCA – version 2 & 3 (English)

Two alternative versions of the MoCA (English) have been validated for use in instances when repeated administration is necessary, to avoid possible learning effects.

MoCA – modified for individuals with visual impairments.

An alternative version of the MoCA has been validated for use with patients with visual impairments.

Please visit http://www.mocatest.org for further information and to download the alternative forms.

Client suitability

Can be used with:

Patients with stroke.
The MoCA is suitable for any individual who is experiencing memory difficulties but who scores within the normal range on the Mini-Mental State Examination.

Should not be used with:

Because the MoCA is heavily language dependent, it is likely to misclassify patients with aphasia.
The MoCA is not suitable for use with a proxy respondent as it is administered via direct observation of task completion.

In what languages is the measure available?

The MoCA has been translated into Arabic, Afrikaans, Chinese (Beijing, Cantonese, Changsha, Hong Kong, Taiwan), Czech, Croatian, Danish, Dutch, Estonian, French, Finnish, German, Greek, Hebrew, Italian, Japanese, Korean, Persian, Polish, Portuguese (Brazil), Russian, Serbian, Sinhalese, Spanish, Swedish, Thai, Turkish, Ukrainian and Vietnamese. These translations can be found at the following website: http://www.mocatest.org.

Summary

What does the tool measure?	Mild cognitive impairment
What types of clients can the tool be used for?	Can be used with but not limited to: • Patients with stroke • Any individual who is experiencing memory difficulties but scores within the normal range on the Mini Mental State Examination.
Is this a screening or assessment tool?	Screening
Time to administer	The MoCA takes approximately 10-15 minutes to administer for clients with mild cognitive impairment.
Versions	MoCA (original); MoCA English (version 2); and MoCA English (version 3); MoCA (modified for individuals with visual impairments).
Other Languages	The MoCA has been translated into Arabic, Afrikaans, Chinese (Beijing, Cantonese, Changsha, Hong Kong, Taiwan), Czech, Croatian, Danish, Dutch, Estonian, French, Finnish, German, Greek, Hebrew, Italian, Japanese, Korean, Persian, Polish, Portuguese (Brazil), Russian, Serbian, Sinhalese, Spanish, Swedish, Thai, Turkish, Ukrainian and Vietnamese.
Measurement Properties
Reliability	Internal consistency: Only one study has examined the internal consistency of the MoCA and reported excellent levels of internal consistency. Test-retest: Only one study has examined the test-retest reliability of the MoCA, and reported excellent test-retest reliability. Intra-rater: No studies have examined the intra-rater reliability of the MoCA. Inter-rater: No studies have examined the inter-rater reliability of the MoCA.
Validity	Criterion: Concurrent: Excellent correlations with the Mini Mental State Examination (MMSE) have been reported. Construct: Known groups: One study reported that the MoCA can distinguish between patients with mild cognitive impairment and healthy controls.
Floor/Ceiling Effects	No studies have examined the ceiling or floor effects of the MoCA.
Does the tool detect change in patients?	Not Applicable.
Acceptability	The MoCA is not suitable for individuals with aphasia or for use with a proxy respondent.
Feasibility	The measure is simple to score and only the MoCA test sheet and a pencil are required to complete the measure.
How to obtain the tool?	The MoCA is available at: http://www.mocatest.org.

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the MoCA. As the MoCA is a relatively new measure, to our knowledge, the creators have personally gathered the majority of psychometric data that are currently published on the scale.

Reliability

Test-retest:
Nasreddine et al. (2005) examined the test-retest reliability of the MoCA by administering the measure to a subsample of 26 clients (clients with mild cognitive impairment or Alzheimer’s disease, and healthy elderly controls) twice, on average 35 days apart. The correlation between the two evaluations was excellent (r = 0.92). The mean change in MoCA scores from the first to second evaluation was 0.9 points.

Validity

Criterion:

Concurrent:
Nasreddine et al. (2005) administered the MoCA and the Mini Mental State Examination to 94 patients with mild cognitive impairment, 93 patients with mild Alzheimer’s disease, and 90 healthy elderly controls. The correlation between the MoCA and the MMSE was excellent (r = 0.87).

Sensitivity and Specificity

Four studies examined whether the MoCA could detect patients known to have varying degrees of cognitive impairment and found the MoCA to be more sensitive than the Mini-Mental State Examination (MMSE) in detecting these differences.

Nasreddine et al. (2005) examined whether the MoCA could distinguish between patients with mild cognitive impairment and healthy controls. The DSM-IV and NINCDS-ADRDA criteria were used to establish diagnosis of Alzheimer’s disease and neurological assessments performed by neurologists and geriatricians were used to establish diagnosis of cognitive impairment. At a cutoff score of 26, the MoCA had a sensitivity of 90% in identifying clients with mild cognitive impairment and 100% in identifying clients with Alzheimer’s disease, and a specificity of 87%. The MoCA was considerably more sensitive than the Mini-Mental State Examination (MMSE) in detecting mild cognitive impairment (the sensitivity of the MMSE was poor: 18% for patients with mild cognitive impairment; 78% for patients with Alzheimer’s disease).
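Sensitivity is the proportion of people with a condition whom a test correctly flags, and specificity is the proportion of people without the condition whom the test correctly clears. The short sketch below shows how both figures are derived from a 2x2 classification table. The function name and the counts are hypothetical and chosen only for illustration; they are not data from any of the studies reviewed here.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from 2x2 classification counts."""
    # Sensitivity: share of people WITH the condition who screen positive.
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: share of people WITHOUT the condition who screen negative.
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical sample: 50 clients with impairment (45 score below the cutoff)
# and 50 healthy controls (40 score at or above the cutoff).
sens, spec = sensitivity_specificity(true_pos=45, false_neg=5,
                                     true_neg=40, false_pos=10)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
# sensitivity = 90%, specificity = 80%
```

Lowering the cutoff score generally trades sensitivity for specificity, which is why studies in different populations report different optimal cutoffs.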

Smith, Gildeh and Holmes (2007) evaluated whether the MoCA could detect mild cognitive impairment and dementia in patients attending a memory clinic. Dementia and mild cognitive impairment were diagnosed by neuropsychological assessment involving the ICD-10 criteria and CAMCOG scores. At a cutoff score of 26, the MoCA was found to have excellent sensitivity for detecting mild cognitive impairment (83%) and dementia (94%), but poor specificity (50% for both mild cognitive impairment and dementia). The specificity was lower than that identified in the earlier study by Nasreddine et al. (2005), likely due to the heterogeneous nature of the control group. The MoCA was also found to be more sensitive than the MMSE (the sensitivity of the MMSE was poor: 17% for patients with mild cognitive impairment and 25% for patients with dementia).

Luis, Keegan and Mullan (2009) examined whether the MoCA could distinguish between healthy controls and patients with Alzheimer’s disease or mild cognitive impairment. A diagnosis of Alzheimer’s disease was made by neuropsychological assessment using NINCDS-ADRDA criteria and mild cognitive impairment (MCI) by Petersen’s criteria (Petersen et al., 1999 as cited in Luis, Keegan & Mullan, 2009). At a cutoff score of 26, the MoCA was found to have excellent sensitivity for detecting MCI (100%) and Alzheimer’s disease and MCI combined (97%), with poor specificity (35% for both the MCI group and the Alzheimer’s disease+MCI group). A cutoff score of 23 was found to be optimal for identifying MCI, providing excellent sensitivity and specificity (96% and 95%, respectively). The MoCA was found to be more sensitive than the MMSE (at a cut-off score of ≤ 24, MMSE sensitivity for detecting MCI and Alzheimer’s disease+MCI was 17% and 36%, respectively).

Dong et al. (2010) evaluated the sensitivity and specificity of an alternative language version of the MoCA for detecting vascular cognitive impairment and dementia after stroke. Patients underwent neuro-imaging and neuropsychological assessment in order to establish a diagnosis of cognitive impairment or dementia using the DSM-IV criteria. Using an optimum cutoff score of 21, the MoCA correctly identified 90% of patients with cognitive impairment (excellent sensitivity) and 77% of those without cognitive impairment (adequate specificity). The MoCA was also found to be more sensitive than the MMSE (MMSE sensitivity of 86% and specificity of 82% for detecting cognitive impairment).

In a population-based study of 413 patients with stroke or TIA, the MoCA was found to detect more cognitive deficits than the MMSE. For the purposes of the study, a score of ≥ 27 on the MMSE was used to classify patients as having normal cognitive function, and < 26 on the MoCA to classify mild cognitive impairment (no formal neuropsychological testing was performed to confirm diagnosis). 58% of patients with normal MMSE scores (≥ 27) were found to have scores indicative of mild cognitive impairment when the MoCA was used for screening (< 26). Several of the deficits detected by the MoCA were in domains either not assessed or not detected by the MMSE, including executive function and attention (not assessed) and recall and repetition (not detected) (Pendlebury, Cuthbertson, Welch, Mehta & Rothwell, 2010). Sensitivity and specificity of the MoCA for cognitive impairment could not be established in the study because no formal neuropsychological testing was performed to confirm diagnosis.

Responsiveness

Koski, Xie and Finch (2009) evaluated the MoCA as a quantitative measure of cognitive ability and its responsiveness. By applying Rasch analysis techniques to existing data from a geriatric outpatient clinic, the researchers found that, in addition to being useful as a screening instrument, the MoCA can be used to quantify a person’s cognitive ability and to track changes in cognitive ability over time. The significance of scores and of change in scores can be interpreted based on the respondent’s baseline score. For example, a 5-point decrease from a baseline score of 25 is a more statistically significant and meaningful change than a 5-point decrease from a baseline score of 15 (please refer to Table 4 in Koski et al., 2009 for statistical significance of change in MoCA scores). Further research is required to determine the minimal clinically important difference.

References

Dong, Y.H., Sharma, V.K., Chan, B.P.L., Venketasubramanian, N., Teoh, H.L., Seet, R.C.S., Tanicala, S., Chan, Y.H. & Chen, C. (2010). The Montreal Cognitive Assessment (MoCA) is superior to the Mini-Mental State Examination (MMSE) for the detection of vascular cognitive impairment after acute stroke. Journal of the Neurological Sciences. doi:10.1016/j.jns.2010.08.051
Koski, L., Xie, H. & Finch, L. (2009). Measuring cognition in a geriatric outpatient clinic: Rasch analysis of the Montreal Cognitive Assessment. Journal of Geriatric Psychiatry and Neurology, 22, 151-160.
Luis, C.A, Keegan, A.P. & Mullan, M. (2009). Cross validation of the Montreal Cognitive Assessment in community dwelling older adults residing in the Southeastern US. International Journal of Geriatric Psychiatry, 24, 197-201.
Nasreddine, Z. S., Phillips, N. A., Bédirian, V., Charbonneau, S., Whitehead, V., Collin, I., Cummings, J. L. & Chertkow, H. (2005). The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. Journal of the American Geriatrics Society, 53(4), 695-699.
Nasreddine, Z. S., Chertkow, H., Phillips, N., Whitehead, V., Collin, I., Cummings, J. L. The Montreal Cognitive Assessment (MoCA): A brief cognitive screening tool for detection of mild cognitive impairment. Neurology, 62(7): S5, A132. Presented at the American Academy of Neurology Meeting, San Francisco, May 2004.
Nasreddine, Z. S., Chertkow, H., Phillips, N., Whitehead, V., Bergman, H., Collin, I., Cummings, J. L., Hébert, L. The Montreal Cognitive Assessment (MoCA): a Brief Cognitive Screening Tool for Detection of Mild Cognitive Impairment. Presented at the 8th International Montreal/Springfield Symposium on Advances in Alzheimer Therapy. http://www.siumed.edu/cme/AlzBrochure04.pdf p. 90, April 14-17, 2004.
Nasreddine, Z. S., Collin, I., Chertkow, H., Phillips, N., Bergman, H., Whitehead, V. Sensitivity and Specificity of The Montreal Cognitive Assessment (MoCA) for Detection of Mild Cognitive Deficits. Can J Neurol Sci, 30 (2), S2, 30. Presented at Canadian Congress of Neurological Sciences Meeting, Québec City, Québec, June 2003.
Pendlebury, S.T., Cuthbertson, F.C., Welch, S.J.V., Mehta, Z. & Rothwell, P.M. (2010). Underestimation of cognitive impairment by Mini-Mental State Examination versus the Montreal Cognitive Assessment in patients with transient ischemic attack and stroke. Stroke, 41, 1290-1293.
Smith, T., Gildeh, N. & Holmes, C. (2007). The Montreal Cognitive Assessment: Validity and utility in a memory clinic setting. The Canadian Journal of Psychiatry, 52, 329-332.
Wittich, W., Phillips, N., Nasreddine, Z.S. & Chertkow, H. (2010). Sensitivity and specificity of the Montreal Cognitive Assessment modified for individuals who are visually impaired. Journal of Visual Impairment & Blindness, 104(6), 360-368.

See the measure

How to obtain the MoCA?

The MoCA is available at: http://www.mocatest.org.

Multiple Errands Test (MET)

Evidence Reviewed as of before: 08-05-2013

Author(s)*: Valérie Poulin, OT, PhD candidate; Annabel McDermott, OT

Editor(s): Nicol Korner-Bitensky, PhD OT

Expert Reviewer: Deirdre Dawson, PhD OT

Purpose

The Multiple Errands Test (MET) evaluates the effect of executive function deficits on everyday functioning through a number of real-world tasks (e.g. purchasing specific items, collecting and writing down specific information, arriving at a stated location). Tasks are performed in a hospital or community setting within the constraints of specified rules. The participant is observed performing the test and the number and type of errors (e.g. rule breaks, omissions) are recorded.

In-Depth Review

Purpose of the measure

The Multiple Errands Test was developed by Shallice and Burgess in 1991. The measure was intended to evaluate a patient’s ability to organize performance of a number of simple unstructured tasks while following several simple rules.

See Alternative Forms sections below for information regarding other versions.

Features of the measure

Items:

The original Multiple Errands Test (Shallice and Burgess, 1991) was comprised of 8 items: 6 simple tasks (e.g. buy a brown loaf of bread, buy a packet of throat pastilles), 1 task that is time-dependent, and 1 that comprises 4 subtasks (see Description of tasks, below). It should be noted that the MET was originally devised in an experimental context, rather than as a formal assessment.

Description of tasks:

The original Multiple Errands Test (Shallice and Burgess, 1991) was comprised of 8 written tasks to be completed in a pedestrian shopping precinct. Tasks and rules are written on a card provided to the participant before arriving at the shopping precinct. Of the 8 tasks, 6 are simple (e.g. buy a brown loaf of bread, buy a packet of throat pastilles), the 7th requires the participant to be at a particular place 15 minutes after starting the test, and the 8th is more demanding as it comprises 4 sets of information that the participant must obtain and write on a postcard:

the name of the shop most likely to have the most expensive item;
the price of a pound of tomatoes;
the name of the coldest place in Britain yesterday; and
the rate of the exchange of the French franc yesterday.

The card also includes instructions and rules, which are repeated to the participant on arrival at the shopping precinct:

“You are to spend as little money as possible (within reason) and take as little time as possible (without rushing excessively). No shop should be entered other than to buy something. Please tell one or other of us when you leave a shop what you have bought. You are not to use anything not bought on the street (other than a watch) to assist you. You may do the tasks in any order.”

Scoring:

The participant is observed performing the test and errors are recorded according to the following categorizations:

Inefficiencies: where a more effective strategy could have been applied
Rule breaks: where a specific rule (either social or explicitly mentioned in the task) is broken
Interpretation failure: where requirements of a particular task are misunderstood
Task failure: where a task is either not carried out or not completed satisfactorily.

Time taken to complete the assessment is recorded and the total number of errors is calculated.
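As an illustration of the scoring step described above, the sketch below tallies observed errors by category and reports the total. It assumes a hypothetical recording format in which the assessor notes one category label per observed error; the function name, the labels and the example observations are illustrative only and are not part of the published test.

```python
from collections import Counter

# The four error categories described for the original MET.
CATEGORIES = ("inefficiency", "rule break", "interpretation failure", "task failure")


def summarize_errors(observed):
    """Count errors per category and in total from a list of category labels."""
    counts = Counter(observed)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unrecognized error categories: {sorted(unknown)}")
    return {
        "by_category": {category: counts[category] for category in CATEGORIES},
        "total": sum(counts.values()),
    }


# Hypothetical observation record for one participant.
observed = ["rule break", "inefficiency", "rule break", "task failure"]
summary = summarize_errors(observed)
print(summary["by_category"])
print("Total errors:", summary["total"])  # Total errors: 4
```

Time taken to complete the test would be recorded separately alongside this summary.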

Alternative versions of the Multiple Errands Test

Different versions of the MET were developed for use in specific hospitals (MET – Hospital Version and Baycrest MET), a small shopping plaza (MET – Simplified Version), and a virtual reality environment (Virtual MET). For each of these versions, 12 tasks must be performed (e.g. purchasing specific items and collecting specific information) while following several rules.

MET – Hospital Version (MET-HV – Knight, Alderman & Burgess, 2002)

The MET-HV was developed for use with a wider range of participants than the original version by adopting more concrete rules and simpler tasks. Clients are provided with an instruction sheet that explicitly directs them to record designated information. Clients must achieve four sets of simple tasks, with a total of 12 separate subtasks:

The client must complete six specific errands (purchase 3 items, use the internal phone, collect an envelope from reception, and send a letter to an external address).
The client must obtain and write down four items of designated information (e.g. the opening time of a shop on Saturday).
The client must meet the assessor outside the hospital reception 20 minutes after the test has begun and state the time.
The client must inform the assessor when he/she finishes the test.

The MET-HV uses 9 rules in order to reduce ambiguity and simplify task demands (Knight et al., 2002). Errors are categorized according to the same definitions as the original MET. The test is preceded by (a) an efficiency question rated using an end-point weighted 10-point Likert scale (“How efficient would you say you were with tasks like shopping, finding out information, and meeting people on time?”); and (b) a familiarity question rated using a 4-point scale (“How well would you say you know the hospital grounds?”). On completion the client answers a question rated using a 10-point scale (“How well do you think you did with the task?”).

MET – Simplified Version (MET-SV – Alderman, Burgess, Knight & Henman, 2003)

The MET-SV includes four sets of simple tasks analogous to those in the original MET; however, it incorporates 3 main modifications to the original version:

More concrete rules to enhance task clarity and reduce likelihood of interpretation failures;
Simplification of task demands; and
Space provided on the instruction sheet for the participant to record the information they were required to collect.

The MET-SV has 9 rules that are more explicit than the original MET and are clearly presented on the instruction sheet.

Baycrest MET (BMET – Dawson, Anderson, Burgess, Cooper, Krpan & Stuss, 2009)

The BMET was developed with an identical structure to the MET-HV, except that some items, information and a meeting place are specific to the testing environment (Baycrest Center, Toronto). The BMET comprises 12 items and 8 rules. The test manual provides explicit instructions including collecting test materials, language to be used in describing the test, and a pretest section to ensure participants understand the tasks. Scoring was standardized to allow for increased usability. The score sheet allows identification of specific task errors or omissions, other inefficiencies, rule breaks and strategy use (please contact the authors for further details regarding the manual: ddawson@research.baycrest.org).

Virtual MET (VMET – Rand, Rukan, Weiss & Katz, 2009)

The VMET was developed within the Virtual Mall, a functional video-capture virtual shopping environment that consists of a large supermarket with 9 aisles. The system includes a single camera that films the user and displays his/her image within the virtual environment. The VMET is a complex shopping task that includes the same number of tasks (items to be bought and information to be obtained) as the MET-HV. However, the client is required to check the contents of the shopping cart at a particular time instead of meeting the tester at a certain time. Virtual reality enables the assessor to objectively measure the client’s behaviour in a safe, controlled and ecologically valid environment. It enables repeated learning trials and adaptability of the environment and task according to the client’s needs.

What to consider before beginning:

The MET is performed in a real-world shopping area that allows for minor unpredicted events to occur.

Time:

The BMET takes approximately 60 minutes to administer (Dawson et al., 2009).

Training requirements:

It is advised that the assessor reads the test manual and becomes familiar with the procedures for test administration and scoring.

Equipment:

Access to a shopping precinct or virtual shopping environment
Pen and paper
Instruction sheet (according to version being used)

Client suitability

Can be used with:

The MET has been tested on populations with acquired brain injury including stroke.

Should not be used with:

The MET cannot be administered to patients who are confined to bed.
Participants require sufficient language skills.
Some tasks may need to be adapted depending on the rehabilitation setting.

In what languages is the measure available?

The MET was developed in English.

Summary

What does the tool measure?	The effect of executive function deficits on everyday functioning.
What types of clients can the tool be used for?	The Multiple Errands Test can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	Baycrest MET: approximately 60 minutes (Dawson et al., 2009).
Versions	Multiple Errands Test (MET) (Shallice and Burgess, 1991); MET – Simplified Version (MET-SV) (Alderman et al., 2003); MET – Hospital Version (MET-HV) (Knight, Alderman & Burgess, 2002); Virtual MET (Rand, Rukan, Weiss & Katz, 2009); Baycrest MET (Dawson et al., 2009); Modified version of the MET-SV and MET-HV (including 3 alternate versions) (Novakovic-Agopian et al., 2011, 2012)
Other Languages	N/A
Measurement Properties
Reliability	Internal consistency: One study reported adequate internal consistency of the MET-HV in a sample of patients with chronic acquired brain injury including stroke. Test-retest: No studies have reported on the test-retest reliability of the MET with a population of patients with stroke. Intra-rater: No studies have reported on the intra-rater reliability of the MET with a population of patients with stroke. Inter-rater: – One study reported excellent inter-rater reliability of the MET-HV in a sample of patients with chronic acquired brain injury including stroke. – One study reported adequate to excellent inter-rater reliability of the BMET in a sample of patients with acquired brain injury including stroke.
Validity	Criterion: Concurrent: No studies have reported on the concurrent validity of the MET in a stroke population. Predictive: One study examined predictive validity of the MET-HV with a sample of patients with acquired brain injury including stroke and reported poor to adequate correlations between discharge MET-HV performance and community participation measured by the Mayo-Portland Adaptability Inventory (MPAI-4). Construct: Convergent/Discriminant: – Three studies* examined convergent validity of the MET-HV and reported excellent correlations with the Modified Wisconsin Card Sorting Test (MWCST), Behavioural Assessment of Dysexecutive Syndrome battery (BADS), Dysexecutive questionnaire (DEX), IADL questionnaire and FIM Cognitive score; and an adequate correlation with the Rivermead Behavioural Memory Test (RBMT). – One study* examined convergent validity of the MET-SV and reported adequate correlations with the Wechsler Adult Intelligence Scale – Revised Full Scale IQ (WAIS-R FSIQ), MWCST, BADS and Cognitive Estimates test; and poor to adequate correlations with the DEX. – One study* examined convergent validity of the BMET and reported adequate to excellent correlations with the Sickness Impact Profile and Assessment of Motor and Process Skills. – Three studies* examined convergent validity of the VMET and reported excellent correlations with the MET-HV, BADS, IADL questionnaire, Semantic Fluencies test, Tower of London test, Trail Making Test, Corsi’s supra-span test, Street’s Completion Test and the Test of Attentional Performance. Note: Correlations between the MET and other measures of everyday executive functioning and IADLs used in these studies also provide support for the ecological validity of the MET. Known Groups:* – Two studies reported that the MET-HV is able to differentiate between individuals with acquired brain injury (including stroke) vs. healthy adults, and between healthy older adults vs. healthy younger adults. – One study reported that the MET-SV is able to differentiate between clients with brain injury including stroke vs. healthy adults. – One study reported that the BMET is able to differentiate between clients with stroke vs. healthy adults. – Three studies reported that the VMET is able to differentiate between clients with stroke vs. healthy adults, and between healthy older adults vs. healthy younger adults. Sensitivity/Specificity: – One study reported 85% sensitivity and 95% specificity when using a cut-off score ≥ 7 errors on the MET-HV with clients with chronic acquired brain injury including stroke. – One study reported 82% sensitivity and 95.3% specificity when using a cut-off score ≥ 12 errors on the MET-SV with clients with brain injury including stroke.
Floor/Ceiling Effects	No studies have reported on the floor/ceiling effects of the MET.
Does the tool detect change in patients?	Responsiveness of the MET has not been formally evaluated; however: – One study used a modified version of the MET-HV and MET-SV to measure change following intervention; – One study used the MET-HV and the VMET to detect change in multi-tasking skills of clients with stroke following intervention.
Acceptability	The MET provides functional assessment of executive function as it enables clients to participate in real-world activities.
Feasibility	Administration of the MET requires access to a shopping area and so is not always feasible in a typical clinical setting. Some tasks may need to be adapted depending on the rehabilitation setting. Administration time can be lengthy. Ecological validity is supported.
How to obtain the tool?	The Baycrest MET can be obtained at https://cognitionandeverydaylifelabs.com/multiple-errands-test/
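
The sensitivity and specificity values in the summary table depend on a cut-off applied to the total error score (for example, ≥ 7 errors on the MET-HV). As a rough illustration of how such figures are derived, the sketch below flags any score at or above the cut-off and compares the flags with known group membership. The function name and all scores are hypothetical and invented for illustration; they are not data from the studies reviewed.

```python
def sensitivity_specificity(patient_errors, control_errors, cutoff):
    """Return (sensitivity, specificity) for a 'total errors >= cutoff' rule.

    patient_errors: total error scores of people known to have the condition.
    control_errors: total error scores of people known not to have it.
    """
    if not patient_errors or not control_errors:
        raise ValueError("both groups need at least one score")
    # True positives: patients whose score reaches the cut-off.
    true_positives = sum(1 for score in patient_errors if score >= cutoff)
    # True negatives: controls whose score stays below the cut-off.
    true_negatives = sum(1 for score in control_errors if score < cutoff)
    sensitivity = true_positives / len(patient_errors)
    specificity = true_negatives / len(control_errors)
    return sensitivity, specificity


# Hypothetical scores, invented for illustration only.
patients = [9, 7, 5, 12]
controls = [2, 6, 8, 3]
print(sensitivity_specificity(patients, controls, cutoff=7))
```

Raising the cut-off generally trades sensitivity for specificity, so a cut-off reported for one version of the test should not be assumed to carry over to another.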

Psychometric Properties

Overview

A literature search was conducted to identify publications on the psychometric properties of the Multiple Errands Test (MET) relevant to a population of patients with stroke. Of the 10 studies reviewed, 8 included a mixed population of patients with acquired brain injury including stroke. Studies have reviewed psychometric properties of the original MET, Hospital Version (MET-HV), Simplified Version (MET-SV), Baycrest MET (BMET) and Virtual MET (VMET), as indicated in the summaries below. While research indicates that the MET demonstrates adequate validity and reliability in populations with acquired brain injury including stroke, further research regarding responsiveness of the measure is warranted.

Floor/Ceiling Effects

No studies have reported on floor/ceiling effects of the MET with a stroke population.

Reliability

Internal consistency:
Knight, Alderman & Burgess (2002) calculated internal consistency of the MET-HV in a sample of 20 patients with chronic acquired brain injury (traumatic brain injury, n=12; stroke, n=5; both TBI and stroke, n=3) and 20 healthy control subjects matched for gender, age and IQ, using Cronbach’s alpha. Internal consistency was adequate (α=0.77).
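
For readers unfamiliar with Cronbach’s alpha, the sketch below shows the standard formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores), where k is the number of items. Which MET-HV scores were treated as items in the study is not specified here, so the function name, the data layout and the example scores are assumptions made purely for illustration.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row of item scores per participant)."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 participants")
    # Transpose rows into per-item columns and sum the sample variances.
    columns = list(zip(*item_scores))
    sum_item_variances = sum(variance(column) for column in columns)
    # Sample variance of each participant's total score.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical scores for 4 participants on 3 items, invented for illustration.
scores = [
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
]
print(round(cronbach_alpha(scores), 2))
```

Values closer to 1 indicate that the items vary together, which is the sense in which the reported α=0.77 is described as adequate.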

Test-retest:
No studies have reported on the test-retest reliability of the MET.

Inter-rater:
Knight, Alderman & Burgess (2002) calculated inter-rater reliability of the MET-HV error categories in a sample of 20 patients with chronic acquired brain injury (traumatic brain injury, n=12; stroke, n=5; both TBI and stroke, n=3) and 20 healthy control subjects matched for gender, age and IQ, using intraclass correlation coefficients. Participants were scored by 2 assessors. Inter-rater reliability was excellent (ICC ranging from 0.81-1.00). The ‘rule breaks’ error category demonstrated the strongest inter-rater reliability (ICC=1.00).

Dawson, Anderson, Burgess, Cooper, Krpan and Stuss (2009) examined inter-rater reliability of the BMET with clients with stroke (n=14) or traumatic brain injury (n=13) and healthy matched controls (n=25), using Intraclass Correlation Coefficients and 2-way random effects models. Participants were scored by two assessors. Inter-rater reliability was adequate to excellent for the five summary measures used: mean number of tasks completed accurately (ICC = 0.80), mean number of rules adhered to (ICC = 0.71), mean number of total errors (ICC = 0.82), mean number of total rules broken (ICC = 0.88) and mean number of requests for help (ICC = 0.71).

Validity

Content:

Shallice & Burgess (1991) evaluated the MET in a sample of 3 patients with traumatic brain injury (TBI) who demonstrated above-average performance on measures of general ability and normal or near-normal performance on frontal lobe tests, and 9 age- and IQ-matched controls. Participants were monitored by two observers and were scored according to number of errors (inefficiencies, rule breaks, interpretation failures, task failures and total score) and qualitative observation. The patients demonstrated qualitatively and quantitatively impaired performance, particularly relating to rule breaks and inefficiencies. The most difficult subtest was the least sensitive part of the procedure and presented difficulties for both patients and control subjects.

Criterion:

Concurrent:
No studies have reported on the concurrent validity of the MET in a stroke population.

Predictive:
Maier, Krauss & Katz (2011) examined predictive validity of the MET-HV in relation to community participation with a sample of 30 patients with acquired brain injury including stroke (n=19). Community participation was measured using the Mayo-Portland Adaptability Inventory (MPAI-4) Participation Index (M2PI), completed by the participant and a significant other. The MET-HV was administered 1 week prior to discharge from rehabilitation and the M2PI was administered at 3 months post-discharge. Analyses were performed using Pearson correlation analysis and partial correlation controlling for cognitive status using FIM Cognitive scores. Predictably, higher MET-HV error scores correlated with more restrictions in community participation. There were adequate correlations between participants’ and significant others’ M2PI total score and MET-HV total error score (r = 0.403, 0.510 respectively), inefficiencies (r = 0.353, 0.524 respectively) and rule breaks (r = 0.361, 0.449 respectively). The ability of the MET-HV total error score to predict the M2PI significant other score remained significant but poor following partial correlation controlling for cognitive status using FIM Cognitive scores (r = 0.212).
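The partial correlation reported above can be illustrated with a minimal sketch. It assumes the standard first-order formula, which removes the linear influence of one control variable (here, cognitive status) from the association between two others. The function name and the input values are hypothetical and are not data from Maier et al. (2011), who do not report all of the pairwise correlations needed to reproduce their result.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    r_xy, r_xz and r_yz are ordinary pairwise Pearson correlations.
    """
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical values, for illustration only:
# x = error score, y = participation score, z = cognitive score.
r_partial = partial_correlation(r_xy=0.5, r_xz=0.6, r_yz=0.5)
print(round(r_partial, 3))  # 0.289
```

When both variables are themselves related to the control variable, the partial correlation is typically smaller than the raw correlation, which matches the pattern described in the study.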

Construct:

Convergent/Discriminant:
Knight, Alderman & Burgess (2002)* examined convergent validity of the MET-HV by comparison with tests of IQ and cognitive functioning, traditional frontal lobe tests and ecologically sensitive executive function tests, in a sample of 20 patients with chronic acquired brain injury (traumatic brain injury, n=12; stroke, n=5; both TBI and stroke, n=3). Tests of IQ and cognitive functioning included the National Adult Reading Test – Revised Full Scale Intelligence Quotient (NART-R FSIQ), Wechsler Adult Intelligence Scale – Revised Full Scale Intelligence Quotient (WAIS-R FSIQ), Adult Memory and Information Processing Battery (AMIPB), Rivermead Behavioural Memory Test (RBMT) and Visual Objects and Space Perception battery (VOSP). Frontal lobe tests included verbal fluency, the Cognitive Estimates Test (CET), Modified Card Sorting Test (MCST), Tower of London Test (TOLT) and versions of the hand manipulation and hand alternation tests. Ecologically sensitive executive function tests included the Behavioural Assessment of the Dysexecutive Syndrome battery (BADS) and the Test of Everyday Attention (TEA) Map Search and Visual Elevator tasks. The Dysexecutive (DEX) questionnaire was also used, although proxy reports were used rather than self-reports due to identified lack of insight of individuals with brain injury.
There were excellent correlations between the MCST percentage perseverative errors and MET-HV rule breaks (r=0.66) and MET-HV total errors (r=0.67) following Bonferroni adjustment. There were excellent correlations between the BADS Profile score and the MET-HV task failures (r = -0.58), interpretation failures (r = 0.64) and total errors (r = -0.57). There was an excellent correlation between the DEX intentionality factor and MET-HV task failures (r = 0.70). In addition, the relationship between the MET-HV and DEX was re-evaluated to control for possible confounding effects; controlling for age, familiarity and memory function with respect to MET-HV task failures resulted in excellent correlations with the DEX total score (r = 0.79) and DEX inhibition (r = 0.69), intentionality (r = 0.76) and executive memory (r = 0.67) factors. There was an adequate correlation between the RBMT Profile Score and the MET-HV number of task failures (r=-0.57). There were no significant correlations between the MET-HV and other tests of IQ and cognitive functioning (NART-R FSIQ, WAIS-R FSIQ, AMIPB, VOSP), other frontal lobe tests (verbal fluency, CET, TOLT, hand manipulation and hand alternation tests), other ecologically sensitive executive function tests (TEA Map Search and Visual Elevator tasks) or other DEX factors (positive affect, negative affect).
Note: Initial correlations were measured using Pearson correlation coefficient and significance levels were subsequently adjusted by Bonferroni adjustment to account for multiple comparisons; results reported indicate significant correlations following Bonferroni adjustment.
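As a rough illustration of the Bonferroni adjustment mentioned in the note, the sketch below divides the significance level by the number of comparisons and flags only the p-values that meet the stricter threshold. The function name, the alpha of 0.05 and the p-values are assumptions for illustration; the number of comparisons made by Knight et al. (2002) is not stated here.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni adjustment.

    Each p-value is compared against alpha divided by the number of tests.
    """
    adjusted_threshold = alpha / len(p_values)
    return [p <= adjusted_threshold for p in p_values]


# Hypothetical p-values from four correlations (threshold = 0.05 / 4 = 0.0125).
flags = bonferroni_significant([0.001, 0.02, 0.04, 0.0004])
print(flags)  # [True, False, False, True]
```

The adjustment is conservative: correlations that would be significant at 0.05 on their own (0.02 and 0.04 in this made-up example) are no longer counted as significant.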

Rand, Rukan, Weiss & Katz (2009a)* examined convergent validity of the MET-HV by comparison with measures of executive function and IADLs with a sample of 9 patients with subacute or chronic stroke, using Spearman correlation coefficients. Executive function was measured using the BADS Zoo Map test and IADLs were measured using the IADL questionnaire. There were excellent negative correlations between the BADS Zoo Map and MET-HV outcome measures of total number of mistakes (r = -0.93), partial mistakes in completing tasks (r = -0.80), non-efficiency mistakes (r = -0.86) and time to complete the MET (r = -0.79). There were excellent correlations between the IADL questionnaire and the MET-HV number of rule break mistakes (r = 0.80) and total number of mistakes (r = -0.76).
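For readers unfamiliar with the Spearman coefficient used in this study, the following is a minimal sketch. It assumes the usual definition: Pearson's correlation computed on ranks, with tied values given their average rank. All function names and scores are hypothetical and do not come from Rand et al. (2009a).

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = (sum((a - mx) ** 2 for a in x)
                   * sum((b - my) ** 2 for b in y)) ** 0.5
    return numerator / denominator


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: higher test scores paired with fewer mistakes.
zoo_map_scores = [1, 2, 3, 4]
total_mistakes = [10, 9, 7, 2]
print(spearman(zoo_map_scores, total_mistakes))  # -1.0
```

Because the coefficient depends only on rank order, it is less sensitive to outliers than Pearson's coefficient, which is one reason it is often chosen for small samples.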

Maier, Krauss & Katz (2011)* examined convergent validity of the MET-HV by comparison with the FIM Cognitive score with a sample of 30 patients with acquired brain injury including stroke (n=19), using Pearson correlation analysis. There was an excellent negative correlation between MET-HV total errors score and FIM Cognitive score (r = -0.67).

Alderman, Burgess, Knight and Henman (2003)* examined convergent validity of the MET-SV by comparison with tests of IQ, executive function and everyday executive abilities with 50 clients with brain injury including stroke (n=9). Neuropsychological tests included the WAIS-R FSIQ, BADS, Cognitive Estimates Test, FAS verbal fluency test, a modified version of the Wisconsin Card Sorting Test (MWCST) and the DEX. There were adequate correlations between MET-SV task failure errors and WAIS-R FSIQ (r = -0.32), MWCST perseverative errors (r = 0.39), BADS profile score (r = -0.46) and Zoo-Map (r = -0.46) and Six Element Test (r = -0.41) subtests. There were adequate negative correlations between MET-SV social rule breaks and the Cognitive Estimates (r = -0.33), and between MET-SV task rule breaks, social rule breaks and total rule breaks and the BADS Action Program subtest (r = -0.42, -0.40, -0.43 respectively). There were poor to adequate negative correlations between the DEX and MET-SV rule breaks (r = -0.30), task failures (r = -0.25) and total errors (r = -0.37).

In a subgroup analysis of individuals with brain injury who passed traditional executive function tests but failed the MET-SV (n=17), there were adequate to excellent correlations between MET-SV inefficiencies and DEX factors of intentionality and negative affect (r = 0.59, -0.76); MET-SV interpretation failures and DEX inhibition and total (r = -0.67, -0.57); MET-SV total and actual rule breaks and DEX inhibition (r = -0.70, 0.66), intentionality (r = 0.60, 0.64) and total (r = -0.57, 0.59); MET-SV social rule breaks and DEX positive and negative affect (r = 0.79, -0.59); MET-SV task failures and DEX inhibition and positive affect (r = -0.58, -0.52), and MET-SV total errors and DEX intentionality (r = 0.67).

Dawson et al. (2009)* examined convergent validity of the BMET by comparison with other measures of IADL and everyday function with 14 clients with stroke, using Pearson correlation. Other measures included the DEX (significant other report), Stroke Impact Profile (SIP), Assessment of Motor and Process Skills (AMPS) and Mayo Portland Adaptability Inventory (MPAI) (significant other report). There were excellent correlations between the BMET number of rules broken and the SIP – Physical (r = 0.78) and Affective behavior (r = 0.64) scores and the AMPS motor score (r = -0.75). There was an adequate correlation between the BMET time to completion and SIP physical score (r = 0.54).

Rand et al. (2009a)* examined convergent validity of the VMET by comparison with the BADS Zoo Map test and IADL questionnaire with the same sample of 9 patients with subacute or chronic stroke, using Spearman correlation coefficients. There was an excellent negative correlation between the BADS Zoo Map and VMET outcome measure of non-efficiency mistakes (r = -0.87), and between the IADL and VMET total number of mistakes (r = -0.82).

Rand et al. (2009a) also examined the relationships between the scores of the VMET and those of the MET-HV using Spearman and Pearson correlation coefficients. Among patients with stroke, there were excellent correlations between MET-HV and VMET outcomes for the total number of mistakes (r = 0.70), partial mistakes in completing tasks (r = 0.88) and non-efficiency mistakes (r = 0.73). Analysis of the whole population indicated adequate to excellent correlations between MET-HV and VMET outcomes for the total number of mistakes (r = 0.77), mistakes in completing tasks (r = 0.63), partial mistakes in completing tasks (r = 0.80), non-efficiency mistakes (r = 0.72) and use of strategies (r = 0.44), but not for rule break mistakes.

Raspelli et al. (2010) examined convergent validity of the VMET by comparison with neuropsychological tests, with 6 clients with stroke and 14 healthy subjects. VMET outcome measures included time, searched item in the correct area, sustained attention, maintained sequence and no perseveration. Neuropsychological tests included the Trail Making Test, Corsi spatial memory supra-span test, Street’s Completion Test, Semantic Fluencies and Tower of London test. There were excellent correlations between the VMET variable ‘time’ and the Semantic Fluencies test (r = -0.87) and the Tower of London test (r = -0.82); between the VMET variable ‘searched item in the correct area’ and the Trail Making Test (r = 0.96); and between the VMET variables ‘sustained attention’, ‘maintained sequence’ and ‘no perseveration’ and Corsi’s supra-span test (r = 0.84) and Street’s Completion Test (r = -0.86).

Raspelli et al. (2012) examined convergent validity of the VMET by comparison with the Test of Attentional Performance (TEA) with 9 clients with stroke. VMET outcome measures included time, errors, inefficiencies, rule breaks, strategies, interpretation failures and partial-task failures. Authors reported excellent correlations between the VMET outcomes time, inefficiencies and total errors and TEA tests (range r = -0.67 to 0.81).
Note: Other neuropsychological tests were administered but correlations are not reported (Mini Mental State Examination (MMSE), Beck Depression Inventory (BDI), State-Trait Anxiety Inventory (STAI), Behavioural Inattention Test (BIT) – Star Cancellation Test, Brief Neuropsychological Examination (ENB) – Token Test, Street’s Completion Test, Stroop Colour-Word Test, Iowa Gambling Task, DEX and ADL/IADL Tests).
*Note: The correlations between the MET and other measures of everyday executive functioning and IADLs also provide support for the ecological validity of the MET (as reported by the authors of these articles).

Known Group:
Knight, Alderman & Burgess (2002) examined known-group validity of the MET-HV in a sample of 20 patients with chronic acquired brain injury (traumatic brain injury, n=12; stroke, n=5; both TBI and stroke, n=3) and 20 healthy control subjects (hospital staff members) matched for gender, age and IQ*. Clients with brain injury made significantly more rule breaks (p=0.002) and total errors (p<0.001), and achieved significantly fewer tasks (p<0.001) than control subjects. Clients with brain injury used significantly more strategies such as looking at a map (p=0.008) and reading signs (p=0.006), although use of strategies had little effect on test performance. The test was able to discriminate between individuals with acquired brain injury and healthy controls.
*Note: IQ was measured using the National Adult Reading Test – Revised Full Scale Intelligence Quotient (NART-R FSIQ).

Rand et al. (2009a) examined known group validity of the MET-HV with 9 patients with subacute or chronic stroke, 20 healthy young adults and 20 healthy older adults, using Kruskal Wallis H. Patients with stroke made more mistakes than older adults on MET-HV outcomes of total mistakes, mistakes in completing tasks, partial mistakes in completing tasks and non-efficiency mistakes, but not rule break mistakes or use of strategies mistakes. Older adults made more mistakes than younger adults on MET-HV outcomes of total mistakes, partial mistakes in completing tasks and non-efficiency mistakes, but not mistakes in completing tasks, rule break mistakes or use of strategies mistakes.

Alderman et al. (2003) examined known group validity of the MET-SV with 46 individuals with no history of neurological disease (hospital staff members) and 50 clients with brain injury including stroke (n=9), using a series of t-tests. Clients with brain injury made significantly more rule breaks (t = 4.03), task failures (t = 10.10), total errors (t = 7.18), and social rule breaks (chi square 4.3) than individuals with no history of neurological disease. Results regarding errors were preserved when group comparisons were repeated using age, familiarity and cognitive ability (measured by the NART-R FSIQ) as covariates (F = 11.79, 40.82, 27.92 respectively). There was a significant difference in task failures between groups after covarying for age, IQ (measured by the WAIS-R FSIQ) and familiarity with the shopping centre (F = 11.57). Clients with brain injury made approximately three times as many errors as healthy individuals. For both groups, rule breaks and task failures were the most common errors.

Dawson et al. (2009) examined known group validity of the BMET with 14 clients with stroke and 13 healthy matched controls, using a series of t-tests. Clients with stroke performed significantly worse on number of tasks completed accurately (d = 0.84, p<0.05), rule breaks (d = 0.92, p<0.05) and total failures (d = 1.05, p<0.01). The proportion of group members who completed fewer than 40% (< 5) of tasks satisfactorily was also significantly different between the two groups (28% of clients with stroke vs. 0% of healthy matched controls, p<0.05).
Note: d is the effect size; effect sizes ≥ 0.7 are considered large.
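
The review does not state which formula was used to obtain d. As a minimal sketch, assuming the pooled-standard-deviation form of Cohen's d, the effect size for two independent groups could be computed as below. The function names and the sample scores are hypothetical and are not taken from Dawson et al. (2009).

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation (assumed formula)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def is_large(d, threshold=0.7):
    """Apply the 'large' threshold quoted in the note above (>= 0.7)."""
    return abs(d) >= threshold


# Hypothetical error counts for two small groups (illustration only).
patients = [2, 4, 6]
controls = [1, 3, 5]
d = cohens_d(patients, controls)
print(f"d = {d:.2f}, large = {is_large(d)}")
```

The sign of d only reflects which group is listed first, so the threshold is applied to its absolute value.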

Rand et al. (2009a) examined known group validity of the VMET with a sample of 9 patients with subacute or chronic stroke, 20 healthy young adults and 20 healthy older adults, using the Kruskal-Wallis H test. Patients with stroke made more mistakes than older adults on all VMET outcomes except for rule break mistakes. Older adults made more mistakes than young adults on all VMET outcomes except for the use of strategies mistakes.

Raspelli et al. (2010) examined known group validity of the VMET with 6 clients with stroke and 14 healthy subjects. There were significant differences between groups in time taken to execute the task (higher for healthy subjects) and in the partial error ‘Maintained task objective to completion’.

Raspelli et al. (2012) examined known group validity of the VMET with 9 clients with stroke, 10 healthy young adults and 10 healthy older adults, using Kruskal-Wallis procedures. Results showed that clients with stroke scored lower in VMET time and errors than older adults, and that older adults scored lower in VMET time and errors than young adults.

Sensitivity / Specificity:
Knight, Alderman & Burgess (2002) investigated sensitivity and specificity of the MET-HV in a sample of 20 patients with chronic acquired brain injury (traumatic brain injury, n=12; stroke, n=5; both TBI and stroke, n=3) and 20 healthy control subjects matched for gender, age and IQ*. A cut-off score ≥ 7 errors (i.e. 5th percentile of total errors of control subjects) resulted in correct identification of 85% of participants with acquired brain injury (85% sensitivity, 95% specificity).
*Note: IQ was measured using the National Adult Reading Test – Revised Full Scale Intelligence Quotient (NART-R FSIQ).
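
To illustrate how the two percentages relate to classification counts, the sketch below computes sensitivity and specificity from a 2x2 breakdown. The counts used (17 of 20 patients at or above the cut-off, 19 of 20 controls below it) are back-calculated from the reported percentages and group sizes and are an assumption, not figures quoted from Knight et al. (2002). Function names are hypothetical.

```python
def flags_impairment(total_errors, cutoff=7):
    """Return True when the error count meets or exceeds the cut-off score."""
    return total_errors >= cutoff


def sensitivity(true_positives, false_negatives):
    """Proportion of people with the condition who are correctly flagged."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of people without the condition who are correctly not flagged."""
    return true_negatives / (true_negatives + false_positives)


# Assumed counts, derived from 85% of 20 patients and 95% of 20 controls.
sens = sensitivity(true_positives=17, false_negatives=3)
spec = specificity(true_negatives=19, false_positives=1)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```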

Alderman et al. (2003) reported on sensitivity and specificity of the MET-SV with 46 individuals with no history of neurological disease and 50 clients with brain injury including stroke (n=9). Using a cutoff score ≥ 12 errors (i.e. 5th percentile of controls) resulted in 44% sensitivity (i.e. correct classification of clients with brain injury) and 95.3% specificity (i.e. correct classification of healthy individuals). The authors caution that deriving a single measure based only on number of errors fails to consider between-group qualitative differences in performance. Accordingly, error scores were recalculated to reflect “normality” of the error type, with weighting of errors according to prevalence in the healthy control group (acceptable errors seen in up to 95% of healthy controls = 1; errors demonstrated by ≥ 5% of healthy controls = 2; errors unique to the patient group = 3). Using a cutoff score ≥ 12 errors (5th percentile of controls) resulted in 82% sensitivity and 95.3% specificity. The MET-SV was more sensitive than traditional tests of executive function (Cognitive Estimates, FAS Verbal Fluency, MWCST), and MET-SV error category scores were highly predictive of ratings of executive symptoms of patients who passed traditional executive function tests but failed the MET-SV shopping task.

Responsiveness

Two studies used the MET (MET-HV, VMET and modified version of the MET-HV & MET-SV) to measure change following intervention.

Novakovic-Agopian et al. (2011) developed a modified version of the MET-HV and MET-SV to be used in local hospital settings. They developed 3 alternate forms that were used in a pilot study examining the effect of goal-oriented attentional self-regulation training with a sample of 16 patients with chronic brain injury including stroke or cerebral hemorrhage (n=3). A pseudo-random crossover design was used. During the first 5 weeks, one group (Group A) completed goal-oriented attentional self-regulation training while the other group (Group B) only received a 2-hour educational instructional session. In the subsequent phase, conditions were switched such that participants in Group B received goals training for 5 weeks while those in Group A received educational instruction. At week 5 the group that received goal training first demonstrated a significant reduction in task failures (p<0.01), whereas the group that received the educational session demonstrated no significant improvement in MET scores. From week 5 to week 10 there were no significant changes in MET scores in either group.

Rand, Weiss and Katz (2009b) used the MET-HV and VMET to detect change in multi-tasking skills of 4 clients with subacute stroke following virtual reality intervention using the VMall virtual supermarket. Clients demonstrated improved performance on both measures following 3 weeks of multi-tasking training using the VMall virtual supermarket.

References

Alderman, N., Burgess, P.W., Knight, C., & Henman, C. (2003). Ecological validity of a simplified version of the multiple errands shopping test. Journal of the International Neuropsychological Society, 9, 31-44.
Dawson, D.R., Anderson, N.D., Burgess, P., Cooper, E., Krpan, K.M., & Stuss, D.T. (2009). Further development of the Multiple Errands Test: Standardized scoring, reliability, and ecological validity for the Baycrest version. Archives of Physical Medicine and Rehabilitation, 90, S41-51.
Knight, C., Alderman, N., & Burgess, P.W. (2002). Development of a simplified version of the Multiple Errands Test for use in hospital settings. Neuropsychological Rehabilitation, 12(3), 231-255.
Maier, A., Krauss, S., & Katz, N. (2011). Ecological validity of the Multiple Errands Test (MET) on discharge from neurorehabilitation hospital. Occupational Therapy Journal of Research: Occupation, Participation and Health, 31(1), S38-46.
Novakovic-Agopian, T., Chen, A.J.W., Rome, S., Abrams, G., Castelli, H., Rossi, A., McKim, R., Hills, N., & D’Esposito, M. (2011). Rehabilitation of executive functioning with training in attention regulation applied to individually defined goals: A pilot study bridging theory, assessment, and treatment. The Journal of Head Trauma Rehabilitation, 26(5), 325-338.
Novakovic-Agopian, T., Chen, A.J., Rome, S., Rossi, A., Abrams, G., D’Esposito, M., Turner, G., McKim, R., Muir, J., Hills, N., Kennedy, C., Garfinkle, J., Murphy, M., Binder, D., & Castelli, H. (2012). Assessment of Subcomponents of Executive Functioning in Ecologically Valid Settings: The Goal Processing Scale. The Journal of Head Trauma Rehabilitation, 2012 Oct 16. [Epub ahead of print]
Rand, D., Rukan, S., Weiss, P.L., & Katz, N. (2009a). Validation of the Virtual MET as an assessment tool for executive functions. Neuropsychological Rehabilitation, 19(4), 583-602.
Rand, D., Weiss, P., & Katz, N. (2009b). Training multitasking in a virtual supermarket: A novel intervention after stroke. American Journal of Occupational Therapy, 63, 535-542.
Raspelli, S., Carelli, L., Morganti, F., Poletti, B., Corra, B., Silani, V., & Riva, G. (2010). Implementation of the Multiple Errands Test in a NeuroVR-supermarket: A possible approach. Studies in Health Technology and Informatics, 154, 115-119.
Raspelli, S., Pallavicini, F., Carelli, L., Morganti, F., Pedroli, E., Cipresso, P., Poletti, B., Corra, B., Sangalli, D., Silani, V., & Riva, G. (2012). Validating the Neuro VR-based virtual version of the Multiple Errands Test: Preliminary results. Presence, 21(1), 31-42.
Shallice, T. & Burgess, P.W. (1991). Deficits in strategy application following frontal lobe damage in man. Brain, 114, 727-741.

See the measure

How to obtain the Multiple Errands Test?

See the papers below for test instructions of the Simplified Version (MET-SV) and the Hospital Version (MET-HV):

Alderman, N., Burgess, P.W., Knight, C., & Henman, C. (2003). Ecological validity of a simplified version of the multiple errands shopping test. Journal of the International Neuropsychological Society, 9, 31-44.
Knight, C., Alderman, N., & Burgess, P.W. (2002). Development of a simplified version of the Multiple Errands Test for use in hospital settings. Neuropsychological Rehabilitation, 12(3), 231-255.

The Baycrest MET can be obtained at https://cognitionandeverydaylifelabs.com/multiple-errands-test/

Trail Making Test (TMT)

Evidence Reviewed as of before: 22-04-2012

Author(s)*: Katie Marvin, MSc. PT

Editor(s): Nicol Korner-Bitensky, PhD OT; Annabel McDermott, OT

Purpose

The Trail Making Test (TMT) is a widely used test to assess executive function in patients with stroke. Successful performance of the TMT requires a variety of mental abilities including letter and number recognition, mental flexibility, visual scanning, and motor function.

In-Depth Review

Purpose of the measure

The Trail Making Test (TMT) is a widely used test to assess executive abilities in patients with stroke. Successful performance of the TMT requires a variety of mental abilities including letter and number recognition, mental flexibility, visual scanning, and motor function.

Performance is evaluated using two different visual conceptual and visuomotor tracking conditions: Part A involves connecting numbers 1-25 in ascending order; and Part B involves connecting numbers and letters in an alternating and ascending fashion.

Available versions

The TMT was originally included as a component of the Army Individual Test Battery and is also a part of the Halstead-Reitan Neuropsychological Test Battery (HNTB).

Features of the measure

Description of tasks:

The TMT is comprised of 2 tasks – Part A and B:

Part A: Consists of 25 circles numbered from 1 to 25 randomly distributed over a page of letter size paper. The participant is required to connect the circles with a pencil as quickly as possible in numerical sequence beginning with the number 1.
Part B: Consists of 25 circles numbered 1 to 13 and lettered A to L, randomly distributed over a page of paper. The participant is required to connect the circles with a pencil as quickly as possible, but alternating between numbers and letters and taking both series in ascending sequence (i.e. 1, A, 2, B, 3, C…).
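
The alternating order required in Part B can be written out explicitly, which may be useful when checking a completed form for sequencing errors. The sketch below is only an illustration; the function name is hypothetical and the default counts (13 numbers, 12 letters) follow the description above.

```python
from itertools import zip_longest
from string import ascii_uppercase


def part_b_sequence(n_numbers=13, n_letters=12):
    """Build the expected Part B order: 1, A, 2, B, ... ending on the last number."""
    numbers = [str(i) for i in range(1, n_numbers + 1)]
    letters = list(ascii_uppercase[:n_letters])
    sequence = []
    for number, letter in zip_longest(numbers, letters):
        sequence.append(number)
        if letter is not None:  # there is one fewer letter than numbers
            sequence.append(letter)
    return sequence


expected = part_b_sequence()
print(" -> ".join(expected))
```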

What to consider before beginning:

The TMT requires relatively intact motor abilities (i.e. ability to hold and maneuver a pen or pencil, ability to move the upper extremity). The Oral TMT may be a more appropriate version to use if the examiner considers that the participant’s motor ability may impact his/her performance.
Cultural and linguistic variables may impact performance and affect scores.

Scoring and Score Interpretation:

Time taken to complete each task and number of errors made during each task are recorded and compared with normative data. Time to complete the task is recorded in seconds, whereby the greater the number of seconds, the greater the impairment.

In some reported methods of administration, the examiner pointed out and explained mistakes during the administration.

A maximum time of 5 minutes is typically allowed for Part B. Participants who are unable to complete Part B within 5 minutes are given a score of 300 or 301 seconds. Performance on Part B has not been found to yield any more information on stroke severity than performance on Part A (Tamez et al., 2011).

Ranges and Cut-Off Scores
	Normal	Brain damage
TMT Part A	1-39 seconds	40 or more seconds
TMT Part B	1-91 seconds	92 or more seconds

Adapted from Reitan (1958) as cited in Matarazzo, Wiens, Matarazzo & Goldstein (1974).
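
As a minimal sketch of how the cut-off scores above could be applied, the function below compares a completion time with the threshold for the relevant part. The names are hypothetical, and the handling of fractional seconds (anything below the threshold counts as normal) is an assumption, since the table only lists whole seconds.

```python
# Thresholds in seconds, taken from the table above (Reitan, 1958, as cited).
CUTOFF_SECONDS = {"A": 40, "B": 92}


def classify_tmt(part, seconds):
    """Return the range a completion time falls into for TMT Part A or Part B."""
    if part not in CUTOFF_SECONDS:
        raise ValueError("part must be 'A' or 'B'")
    if seconds < 1:
        raise ValueError("completion time must be at least 1 second")
    if seconds >= CUTOFF_SECONDS[part]:
        return "brain damage"
    return "normal"


for part, seconds in [("A", 35), ("A", 40), ("B", 91), ("B", 120)]:
    print(part, seconds, classify_tmt(part, seconds))
```

A range label of this kind is only a comparison against the published cut-off; it is not a diagnosis.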

Time:

Approximately 5 to 10 minutes

Training requirements:

No training requirements have been reported.

Equipment:

A copy of the measure
Pencil or pen
Stopwatch

Alternative versions of the Trail Making Test

Color Trails (D’Elia et al., 1996)
Comprehensive Trail Making Test (Reynolds, 2002)
Delis-Kaplan Executive Function Scale (D-KEFS) – includes subtests modeled after the TMT
Oral TMT – an alternative for patients with motor deficits or visual impairments (Ricker & Axelrod, 1994).
Repeat testing – alternate forms have been developed for repeat testing purposes (Franzen et al., 1996; Lewis & Rennick, 1979)
Symbol Trail Making Test – developed as an alternative to the Arabic version of the TMT, for populations with no familiarity with the Arabic numerical system (Barncord & Wanlass, 2001)

Client suitability

Can be used with:

Patients with stroke and brain damage.

Should not be used with:

Patients with motor deficiencies. If motor ability may impact performance, consider using the Oral TMT.

In what languages is the measure available?

Arabic, Chinese and Hebrew

Summary

What does the tool measure?	Executive function in patients with stroke.
What types of clients can the tool be used for?	The TMT can be used with, but is not limited to, patients with stroke.
Is this a screening or assessment tool?	Assessment tool
Time to administer	The TMT takes approximately 5 to 10 minutes to administer.
Versions	Color Trails; Comprehensive Trail Making Test; Delis-Kaplan Executive Function Scale (D-KEFS); Oral TMT; Repeat testing – alternate forms have been developed for repeat testing purposes; Symbol Trail Making Test
Other Languages	Arabic, Chinese and Hebrew
Measurement Properties
Reliability	Test-retest: Two studies examined the test-retest reliability of the TMT among patients with stroke and found adequate to excellent test-retest reliability.
Validity	Content: One study examined the content validity of the TMT and found it to be a complex test that involves aspects of abstraction, visual scanning and attention. Criterion: Predictive: Several studies have examined the predictive validity of the TMT and have found Part B to be predictive of fitness to drive following stroke. Construct: Convergent: One study examined the convergent validity of the TMT and found poor to adequate correlations with the Category Test, Wisconsin Card Sort Test, Paced Auditory Serial Addition Task and the Visual Search and Attention Test.
Known groups: Three studies have examined the known groups validity of the TMT and found that the TMT was able to differentiate between patients with and without brain damage; however, it was not sensitive to differentiating between frontal and non-frontal brain damage.
Floor/Ceiling Effects	One study found Part A of the TMT to have significant ceiling effects.
Does the tool detect change in patients?	The responsiveness of the TMT has not formally been studied; however, the TMT has been used to detect changes in a clinical trial with patients with stroke.
Acceptability	The TMT is simple and easy to administer.
Feasibility	The TMT is relatively inexpensive and highly portable. The TMT is public domain and can be reproduced without permission. It can be administered by individuals with minimal training in cognitive assessment.
How to obtain the tool?	The Trail Making Test (TMT) can be purchased from: http://www.reitanlabs.com

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the Trail Making Test (TMT) involving patients with stroke.

Floor/Ceiling Effects

In a study by Mazer, Korner-Bitensky and Sofer (1989) that investigated the ability of perceptual testing to predict on-road driving outcomes in patients with stroke, part A of the TMT was found to have significant ceiling effects. For this reason, Part A was excluded from study results as it was deemed too easy for participants when evaluating the ability of the TMT to predict on-road driving test outcomes. No ceiling effects for part B were found.

Reliability

Test-retest:
Matarazzo, Wiens, Matarazzo and Goldstein (1974) examined the test-retest reliability of the TMT and other components of the Halstead Impairment Index with 29 healthy males and 16 60-year-old patients with diffuse cerebrovascular disease. Adequate test-retest reliability was found for both Part A and Part B of the TMT in the healthy control group (r=0.46 and 0.44 respectively), as calculated using Pearson correlation coefficients. Excellent and adequate test-retest reliability was found for Part A and Part B of the TMT respectively (r=0.78 and 0.67), among participants with diffuse cerebrovascular disease.

Goldstein and Watson (1989) investigated the test-retest reliability of the TMT as part of the Halstead-Reitan Battery in a sample of 150 neuropsychiatric patients, including patients with stroke. Test-retest correlations were calculated using Pearson correlation coefficients for the entire sample and for the sub-group of patients with stroke. Excellent test-retest reliability was found for both Part A and Part B (0.94 and 0.86 respectively) in the sub-group of patients with stroke, and adequate reliability was found for the entire participant sample (0.69 and 0.66 respectively).
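
As a rough illustration of how a test-retest coefficient of this kind is obtained, the sketch below computes a Pearson product-moment correlation between two administrations of the same test. The function name `pearson_r` and the completion times are hypothetical and are not taken from Goldstein and Watson (1989).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical Part A completion times in seconds (illustrative only).
first_session = [35.0, 48.0, 60.0, 72.0, 90.0]
second_session = [33.0, 50.0, 58.0, 75.0, 86.0]
r = pearson_r(first_session, second_session)
print(f"test-retest r = {r:.2f}")
```

A coefficient close to 1 indicates that patients keep roughly the same rank order across the two sessions; it does not show that scores are free of practice effects.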

Intra-rater:
No studies were identified examining the intra-rater reliability of the TMT in patients with stroke.

Inter-rater:
No studies were identified examining the inter-rater reliability of the TMT in patients with stroke.

Validity

Content:

O’Donnell, MacGregor, Dabrowski, Oestreicher & Romero (1994) examined the face validity of the TMT in a sample of 117 community-dwelling patients, including patients with stroke. The results suggest that the TMT is a complex test that involves aspects of abstraction, visual scanning and attention.

Criterion:

Concurrent:
No studies were identified examining the concurrent validity of the TMT.

Predictive:
Mazer, Korner-Bitensky and Sofer (1998) examined the ability of the TMT and other measures of perceptual function to predict on-road driving test outcomes in 84 patients with subacute stroke. For Part B of the TMT, a cut-off score of < 3 errors demonstrated high positive predictive value (85%) and low negative predictive value (48%) for successful completion of driving evaluation. The Motor Free Visual Perception Test (MFVP) and the TMT Part B, when combined, demonstrated the highest predictive value for on-road driving test outcome. Participants who scored poorly on both the MFVP and TMT Part B had 22 times the likelihood of failing the on-road evaluation.

Devos, Akinwuntan & Nieuwboer (2011) conducted a systematic review to identify the best determinants of fitness to drive following stroke. The TMT Part B was evaluated in 2 studies (Mazer et al., 1998 and Mazer et al., 2003) and found to be one of the best predictors of passing on-road driving evaluation tests (effect size = 0.81, p<0.0001). In addition, when using a cutoff score of 90 seconds, the TMT Part B had a sensitivity of 80% and a specificity of 62% for detecting unsafe on-road performance. In an earlier systematic review by Marshall et al. (2007), the TMT was also found to be one of the most useful predictors of fitness for driving post-stroke.
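
To make the cutoff-based figures above concrete, the sketch below shows how sensitivity and specificity can be derived once a time cutoff is applied to Part B. It is a minimal illustration only: the function name, the sample times and the on-road outcomes are hypothetical, and treating a time strictly greater than 90 seconds as a positive screen is an assumption, since the reviews do not state how a time of exactly 90 seconds is handled.

```python
def sensitivity_specificity(times_sec, unsafe, cutoff_sec=90.0):
    """Return (sensitivity, specificity) for flagging unsafe drivers.

    Assumption: a screen is 'positive' when the Part B completion time
    is strictly greater than cutoff_sec.
    """
    tp = fp = tn = fn = 0
    for t, is_unsafe in zip(times_sec, unsafe):
        positive = t > cutoff_sec
        if positive and is_unsafe:
            tp += 1  # flagged and unsafe on the road
        elif positive and not is_unsafe:
            fp += 1  # flagged but safe on the road
        elif not positive and is_unsafe:
            fn += 1  # missed unsafe driver
        else:
            tn += 1  # correctly not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one unsafe and one safe driver")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical Part B times (seconds) and on-road outcomes (True = unsafe).
times = [60, 85, 95, 120, 150, 70, 100, 88]
unsafe = [False, False, False, True, True, False, True, True]
sens, spec = sensitivity_specificity(times, unsafe)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cutoff generally trades sensitivity for specificity, which is why a reported cutoff should always be read together with both values.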

Construct:

Convergent/Discriminant:
O’Donnell et al. (1994) examined the convergent validity of the TMT and four other neuropsychological tests: Category Test (CAT), Wisconsin Card Sorting Test (WCST), Paced Auditory Serial Addition Task (PASAT), and Visual Search and Attention Test (VSAT). The study involved 117 community-dwelling adults, including patients with stroke. Poor to adequate correlations were found between the TMT and the other measures (CAT r=0.38; WCST r=0.31; PASAT r=0.44; VSAT r=0.30), using Pearson product-moment correlations.

Known groups:
Reitan (1955) examined the ability of the TMT to differentiate between patients with and without organic brain damage, including patients with stroke. Highly significant differences in mean and sum scores were found between the two groups (p<0.001) on both parts of the TMT, suggesting that the TMT is able to differentiate between patients with and without brain damage.

Corrigan and Hinkeldey (1987) examined the relationship between Part A and Part B of the TMT. Data were collected from the charts of 497 patients receiving treatment at a rehabilitation centre. Patients with traumatic brain injury and stroke comprised a large majority of the sample. A difference (B-A) and a ratio (B/A) score were calculated. The difference score was highly correlated with intelligence and severity of impairment and only moderately correlated with age, education and memory functioning. The B/A ratio appeared to show greatest sensitivity to differences in cerebral lateralization of damage.
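
The two derived scores described above are simple arithmetic on the Part A and Part B completion times. The sketch below shows the calculation; the function name and the example times are hypothetical, and it assumes both parts are scored as completion time in seconds.

```python
def derived_tmt_scores(part_a_sec, part_b_sec):
    """Return the B-A difference and the B/A ratio from completion times."""
    if part_a_sec <= 0 or part_b_sec <= 0:
        raise ValueError("completion times must be positive")
    return {
        "difference": part_b_sec - part_a_sec,  # B - A, in seconds
        "ratio": part_b_sec / part_a_sec,       # B / A, unitless
    }


# Hypothetical patient: 40 s on Part A, 100 s on Part B.
scores = derived_tmt_scores(40.0, 100.0)
print(scores)
```

Because the ratio divides out overall speed, two patients with very different Part A times can share the same B/A value, whereas their B-A differences would differ.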

Tamez et al. (2011) examined the effects of frontal versus non-frontal stroke and severity of stroke on TMT performance in 689 patients with stroke. The TMT, Digit Span and National Institutes of Health Stroke Scale (NIHSS) were administered within 72 hours of hospital admission. Stroke severity was classified according to the NIHSS, and frontal or non-frontal lesions by CT or MRI scans. Performance on both Part A and Part B of the TMT was significantly correlated with stroke severity using the NIHSS. Patients with frontal and non-frontal lesions were found to score equally on Part A and Part B (p>0.05).
Results of this study suggest that the TMT is sensitive to brain damage. However, there is little evidence to support the widely held assumption that Part B is more sensitive to frontal lesions than Part A.

Sensitivity/Specificity:

No studies were identified examining the specificity of the TMT in patients with stroke.

Responsiveness

Barker-Collo, Feigin, Lawes, Senior and Parag (2010) assessed the course of recovery of attention span in 43 patients with acute stroke over a 6-month period. The TMT and other measures of attention were administered at baseline (within 4 weeks following stroke onset), 6 weeks, and 6 months after stroke. Although the responsiveness of the TMT was not formally assessed in this study, the scale was sensitive enough to detect an improvement in attention at 6 weeks and 6 months following stroke.

References

Barker-Collo, S., Feigin, V., Lawes, C., Senior, H., & Parag, V. (2010). Natural history of attention deficits and their influence on functional recovery from acute stages to 6 months after stroke. Neuroepidemiology, 35(4), 255-262.
Barncord, S.W. & Wanlass, R.L. (2001). The Symbol Trail Making Test: Test development and utility as a measure of cognitive impairment. Applied Neuropsychology, 8, 99-103.
Corrigan, J. D. & Hinkeldey, N. S. (1987). Relationships between Parts A and B of the Trail Making Test. Journal of Clinical Psychology, 43(4), 402-409.
D’Elia, L.F., Satz, P., Uchiyama, C.I. & White, T. (1996). Color Trails Test. Odessa, Fla.:PAR.
Devos, H., Akinwuntan, A. E., Nieuwboer, A., Truijen, S., Tant, M., & De Weerdt, W. (2011). Screening for fitness to drive after stroke: a systematic review and meta-analysis. Neurology, 76(8), 747-756.
Elkin-Frankston, S., Lebowitz, B.K., Kapust, L.R., Hollis, A.M., & O’Connor, M.G. (2007). The use of the Colour Trails Test in the assessment of driver competence: Preliminary reports of a culture-fair instrument. Archives of Clinical Neuropsychology, 22, 631-635.
Goldstein, G. & Watson, J.R. (1989). Test-retest reliability of the Halstead-Reitan Battery and the WAIS in a Neuropsychiatric Population. The Clinical Neuropsychologist, 3(3), 265-273.
O’Donnell, J.P., MacGregor, L.A., Dabrowski, J.J., Oestreicher, J.M., & Romero, J.J. (1994). Construct validity of neuropsychological tests of conceptual and attentional abilities. Journal of Clinical Psychology, 50(4), 596-600.
Mark, V. W., Woods, A. J., Mennemeier, M., Abbas, S., & Taub, E. Cognitive assessment for CI therapy in the outpatient clinic. Neurorehabilitation, 21(2), 139-146.
Marshall, S.C., Molnar, F., Man-Son-Hing, M., Blair, R., Brosseau, L., Finestone, H.M., Lamothe, C., Korner-Bitensky, N., & Wilson, K. (2007). Predictors of driving ability following stroke: A systematic review. Topics in Stroke Rehabilitation, 14(1), 98-114.
Matarazzo, J.D., Wiens, A.N., Matarazzo, R.G., & Goldstein, S.G. (1974). Psychometric and clinical test-retest reliability of the Halstead Impairment Index in a sample of healthy, young, normal men. The Journal of Nervous and Mental Disease, 188(1), 37-49.
Mazer, B.L., Korner-Bitensky, N.A., & Sofer, S. (1998). Predicting ability to drive after stroke. Archives of Physical Medicine and Rehabilitation, 79, 743-750.
Mazer, B.L., Sofer, S., Korner-Bitensky, N., Gelinas, I., Hanley, J. & Wood-Dauphinee, S. (2003). Effectiveness of a visual attention retraining program on the driving performance of clients with stroke. Archives of Physical Medicine and Rehabilitation, 84, 541-550.
Reitan, R.M. (1955). The relation of the Trail Making Test to organic brain damage. Journal of Consulting Psychology, 19(5), 393-394.
Reynolds, C. (2002). Comprehensive Trail Making Test. Austin, Tex.: Pro-Ed.
Ricker, J.H. & Axelrod, B.N. (1994). Analysis of an oral paradigm for the Trail Making Test. Assessment, 1, 47-51.
Strauss, E., Sherman, E.M.S., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). NY: Oxford University Press.
Tamez, E., Myerson, J., Morris, L., White, D. A., Baum, C., & Connor, L. T. (2011). Assessing executive abilities following acute stroke with the trail making test and digit span. Behavioural Neurology, 24(3), 177-185.

See the measure

How to obtain the Trail Making Test (TMT)?

The Trail Making Test (TMT) can be purchased from:

Reitan Neuropsychology Laboratory
P.O. Box 66080
Tucson, AZ
85728

http://www.reitanlabs.com