ABILHAND

Evidence Reviewed as of before: 17-06-2012

Author(s)*: Annabel McDermott, OT

Editor(s): Nicol Korner-Bitensky, PhD OT

Purpose

The ABILHAND is a semi-structured item-response questionnaire that measures manual ability according to an individual’s perceived difficulty performing daily bimanual tasks.

In-Depth Review

Purpose of the measure

The ABILHAND is an interview-based assessment tool that measures a patient’s perceived difficulty using his/her hands to perform manual activities in daily life. The ABILHAND assesses active function of the upper limbs. The tool measures an individual’s ability to perform bimanual tasks, regardless of strategies used to complete the task (Ashford et al., 2008; Penta et al., 1998).

Available versions

The ABILHAND was originally developed by Penta et al. (1998) as a 56-item, 4-level questionnaire of unimanual and bimanual ability for patients with rheumatoid arthritis. The original ABILHAND was intended to measure rehabilitation outcomes and to provide guidelines for goal setting in treatment planning (Gustafsson et al., 2004). Penta et al. (2001) found that patients with stroke were able to complete unimanual activities with the unaffected limb, regardless of hand dominance, whereas bimanual tasks were more difficult. Accordingly, a version was developed specifically for patients with stroke that only included bimanual items, as well as two alternate ‘unimanual’ activities that require skillful use of the affected hand (cutting nails, filing nails). Penta et al. (2001) also reviewed the 4-level scoring criterion (impossible, very difficult, difficult, easy) and found that patients rarely used the ‘very difficult’ score. This indicated that the two intermediate scoring criteria (very difficult, difficult) were not sufficiently distinct. Accordingly, the stroke version of the ABILHAND was developed with a 3-level scoring criterion (impossible, any difficulty, easy).

Other impairment-specific versions were subsequently created with modified item sets and levels. Each version of the ABILHAND has its own Rasch-derived item difficulty calibrations that rely on computerized algorithms to obtain the patient’s overall measure from his/her responses (Simone et al., 2011).

Features of the measure

Items:

The ABILHAND is an inventory of 23 bimanual activities (from most difficult to least difficult):

Hammering a nail
Threading a needle
Peeling potatoes with a knife
Cutting own nails
Wrapping up gifts
Filing own nails
Cutting meat
Peeling onions
Shelling hazel nuts
Opening a screw-topped jar
Fastening zipper of jacket
Tearing open pack of chips
Buttoning up a shirt
Sharpening a pencil
Spreading butter on a slice of bread
Fastening a snap
Buttoning up trousers
Taking the cap off a bottle
Opening mail
Squeezing toothpaste on a toothbrush
Pulling up the zipper of trousers
Unwrapping a chocolate bar
Washing hands

Scoring:

The patient is asked to rate his/her perceived difficulty performing items without help, according to the following scoring criteria:

0 = impossible
1 = difficult
2 = easy

Tasks that the patient has not performed in the past 3 months are not scored and are encoded as missing responses.
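
The scoring criteria above can be illustrated with a minimal Python sketch. It assumes responses are recorded as text labels and that a task not performed in the past 3 months is passed as None; the function name, the dictionary and the sample answers are hypothetical and are not part of the published questionnaire or of WINSTEPS.

```python
from typing import Optional

# Ordinal codes for the three rating levels listed in the scoring criteria.
SCORE_CODES = {"impossible": 0, "difficult": 1, "easy": 2}


def encode_response(rating: Optional[str]) -> Optional[int]:
    """Return the ordinal code for a rating, or None for a missing response.

    None stands for a task the patient has not performed in the past
    3 months (a convention assumed for this sketch only).
    """
    if rating is None:
        return None
    return SCORE_CODES[rating.strip().lower()]


# Hypothetical answers for three items; the second task was not attempted.
answers = {
    "Hammering a nail": "difficult",
    "Threading a needle": None,
    "Washing hands": "Easy",
}
encoded = {item: encode_response(rating) for item, rating in answers.items()}
```

Keeping missing responses as None rather than 0 avoids treating an unattempted task as an impossible one.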

The ABILHAND was developed using the Rasch measurement model, which provides a method to convert the ordinal raw score into a linear measure on a unidimensional scale. Item scores are entered into the WINSTEPS computer program, and raw ordinal data is converted to linear measures expressed in logits (log-odds probability units). The total score is scaled along a unidimensional continuum with 0 at the centre of the scale, whereby the higher the logit number, the greater the patient’s perceived ability (Gustafsson et al., 2004).
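
To make the logit scale more concrete, the sketch below computes the probability of each response category for one item. It assumes a rating scale formulation of the Rasch model, which is one common way of handling 3-level items; the source does not state which formulation is used. The ability, difficulty and threshold values are invented for illustration, and the code does not reproduce the WINSTEPS estimation procedure, which works in the opposite direction by estimating these parameters from observed responses.

```python
import math


def category_probabilities(ability, difficulty, thresholds):
    """Probability of each response category under a rating scale Rasch model.

    ability, difficulty and thresholds are all expressed in logits.
    With two thresholds the result has three entries, matching the
    categories 0 = impossible, 1 = difficult, 2 = easy.
    """
    # Category 0 has a cumulative logit of 0; each later category adds
    # (ability - difficulty - threshold) for the step leading into it.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (ability - difficulty - tau))
    weights = [math.exp(value) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Illustrative values only: an item of average difficulty (0 logits),
# rated by a patient of average ability and by a more able patient.
average_patient = category_probabilities(0.0, 0.0, [-1.0, 1.0])
able_patient = category_probabilities(2.0, 0.0, [-1.0, 1.0])
```

Under these assumed values, the more able patient has a higher probability of rating the item as easy, which is the sense in which a higher logit measure reflects greater perceived ability.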

What to consider before beginning:

Users should note that self-estimated measures (i.e. when scores are not based on clinician observation of performance) are subject to overestimation or underestimation of actual performance, depending on motivation and cognitive skills (Penta et al., 2001).

Clinicians should consider patient factors such as self-esteem, insight, vision, hearing, language and cognitive function prior to administering the ABILHAND (Gustafsson et al., 2004).

Mpofu & Oakland (2010) advise caution when using the ABILHAND to measure improvements in impairment of the affected upper limb after stroke rehabilitation. The ABILHAND does not take into consideration the arm used to perform a task or compensatory strategies employed to complete the task. Accordingly, improvement in scores may be based on use of compensatory strategies rather than on improvement in the affected arm.

Time:

The ABILHAND takes 10 to 30 minutes to administer (Ashford et al., 2008; Connell et al., 2012).

Training requirements:

No training requirements have been specified for the ABILHAND, although administration by a clinician is recommended (Ashford et al., 2008).

Equipment:

The ABILHAND is a semi-structured questionnaire that does not require specific equipment; however, the WINSTEPS computer program is required to process raw scores.

Client suitability

Can be used with:

Individuals with chronic stroke
Individuals with rheumatoid arthritis
Individuals with systemic sclerosis

Should not be used with:

Due to the subjective nature of the patient’s reports, this measure should not be used with individuals with severe cognitive deficits (Penta et al., 2001).
The ABILHAND may not be suitable for use with patients with aphasia or apraxia (Gustafsson et al., 2004).

In what languages is the measure available?

French
English
Dutch
Italian
Swedish

Summary

What does the tool measure?	Manual ability of the upper extremity.
What types of clients can the tool be used for?	The ABILHAND can be used with, but is not limited to, patients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	10-30 minutes
Versions	AH-RA for rheumatoid arthritis (46 items, 4 levels); AH-RA revised version (27 items, 3 levels); ABILHAND-ULA for upper limb amputees (22 items, 4 levels); SSC-adapted ABILHAND for systemic sclerosis (26 items, 3 levels); ABILHAND – neuromuscular age-independent version (22 items); ABILHAND-Kids (21 items)
Other Languages	French, English, Swedish, Dutch, Italian
Measurement Properties
Reliability	Internal consistency: – Order of difficulty of items has been confirmed by Rasch analysis. – One study reported a high item reliability index. – One study reported high person separation reliability. Test-retest: No studies have reported on the test-retest reliability of the ABILHAND. Intra-rater: No studies have reported on the intra-rater reliability of the ABILHAND. Inter-rater: No studies have reported on the inter-rater reliability of the ABILHAND.
Validity	Content: – One study reported that the 23 items of the ABILHAND define a common continuum of manual ability, and items are coherent with the overall questionnaire and contribute to the measurement of manual ability. – One study examined stability of item difficulty of the ABILHAND and found that item hierarchy was substantially retained across different groupings (impairment, age, sex, ability). – One study reported that scores explained 84% of observed variance. The main factor across the residuals explained only 11.4% of the residual variance (1.8% of the total variance). Criterion: Concurrent: One study examined the concurrent validity of the ABILHAND among patients with chronic upper limb impairment resulting from conditions including stroke and reported adequate correlations with the Box and Block Test, Jamar handgrip and Purdue pegboard test, and an adequate negative correlation with the Nine Hole Peg Test. Predictive: No studies have reported on the predictive validity of the ABILHAND. Construct: Convergent/Discriminant: No studies have reported on the convergent/discriminant validity of the ABILHAND. Known Groups: – One study reported highly significant differences in ABILHAND scores between patients with tetraparesis, hemiparesis, other neurological impairments (multiple sclerosis, Parkinson’s disease, ataxia) and healthy subjects. – One study reported no correlation between ABILHAND scores and country, age, sex, time since stroke, affected side, lesion site or tactile sensitivity; poor correlation with grip strength and manual dexterity of the unaffected limb; poor negative correlation with depression; adequate correlation with grip strength and manual dexterity of the affected limb; and excellent correlation with upper limb motricity.
Floor/Ceiling Effects	No studies have reported on the floor/ceiling effects of the ABILHAND.
Does the tool detect change in patients?	– No studies have reported on the responsiveness of the ABILHAND. – One study reported that the ABILHAND demonstrates 92% sensitivity and 80% specificity at a lower cutoff score of 80/100.
Acceptability	The ABILHAND is non-invasive and quick to administer. The items are considered reflective of real-life activities (i.e. ecologically valid).
Feasibility	The ABILHAND is portable and is suitable for administration in various settings. The assessment is quick to administer and requires minimal specialist equipment or training.
How to obtain the tool?	The ABILHAND is available in Penta, M., Tesio, L., Arnould, C., Zancan, A., & Thonnard, J-L. (2001). The ABILHAND questionnaire as a measure of manual ability in chronic stroke patients: Rasch-based validation and relationship to upper limb impairment. Stroke, 32, 1627-34.

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the ABILHAND. While additional studies have been conducted on other ABILHAND versions, this review specifically addresses the psychometric properties of the 23-item stroke version of the ABILHAND, unless otherwise specified. Two studies were identified.

Floor/Ceiling Effects

No studies have reported on the floor or ceiling effects of the ABILHAND. However, given the hierarchical relationship of items, lower-level tasks of the ABILHAND may be susceptible to floor effects (Ashford et al., 2008).

Reliability

Internal consistency:
Penta et al. (2001) examined the internal consistency of the original 56-item ABILHAND in a sample of 103 patients with chronic stroke using Rasch analysis and reported high reliability (Rasch separation reliability=0.90; person separation reliability=0.90). The authors examined the stability of the scale through differential item functioning (DIF) tests among 12 subgroups: sex (male/female); country (Belgium/Italy); age (< 60/≥ 60); affected side (dominant/nondominant); delay since stroke (< 2 years/≥ 2 years); level of depression; dexterity and manual ability of the unaffected limb; grip strength, dexterity and sensitivity of the affected limb; and motricity of the affected limb. The difficulty hierarchy of the ABILHAND was uniformly perceived by patients with chronic stroke.

Simone et al. (2011) examined the internal consistency of the ABILHAND in a sample of 126 patients with chronic upper limb impairment resulting from stroke (n=83), multiple sclerosis (n=17), peripheral or cerebellar ataxia (n=13), spinal cord lesion (n=10) or Parkinson’s disease (n=3), and 24 healthy subjects. The ABILHAND demonstrated high reliability (item reliability index=0.94; Cronbach’s alpha=0.99). All items of the ABILHAND fit the Rasch model satisfactorily. There were at least 4 strata of statistically different measures, indicating that variance across scores did not reflect randomness. The authors also examined stability of item difficulty through differential item functioning (DIF) by comparing 4 different groupings of the sample pool: impairment (hemiparesis vs. other); age (≤ 69 vs. > 69); sex (male vs. female); and ability (above median vs. below median). There was a very moderate DIF across the grouping criteria, whereby item hierarchy was substantially retained for all subgroups: impairment (1 outlier: buttoning a shirt); sex (6 outliers: fastening a snap, shelling hazel nuts, hammering a nail, wrapping up gifts, peeling potatoes, spreading butter); age (4 outliers: threading a needle, wrapping up gifts, spreading butter, fastening a snap); and ability (2 outliers: sharpening a pencil, cutting meat).
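
For readers unfamiliar with the statistic, the sketch below shows how Cronbach’s alpha is conventionally computed from raw item scores. It is an illustration only and not the analysis performed by Simone et al. (2011): the function name `cronbach_alpha` and the 0-2 ratings in `example_ratings` are invented for this example.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    `scores` is a list of respondents; each respondent is a list of
    item scores, all of the same length.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the respondents' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings (rows = respondents, columns = items); not study data.
example_ratings = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 1, 2],
    [2, 2, 2],
]
alpha = cronbach_alpha(example_ratings)
```

Values close to 1 indicate that the items vary together across respondents; the same variance estimator must be used for the item variances and the total variance, as it is here.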

Test-retest:
No studies have reported on the test-retest reliability of the ABILHAND.

Intra-rater:
No studies have reported on the intra-rater reliability of the ABILHAND.

Inter-rater:
No studies have reported on the inter-rater reliability of the ABILHAND. Note, however, that inter-rater reliability is less necessary because administration of the ABILHAND does not rely on clinician observation of patient performance.

Validity

Content:

Penta et al. (2001) examined the measure of perceived difficulty of the ABILHAND in a sample of 103 patients with chronic stroke. Item distribution ranged from 1.72 to -2.18 logits. All items fit the Rasch model and the 23 items define a common continuum of manual ability. All point measure correlation coefficients (RPM) were positive, indicating that all items are coherent with the overall questionnaire and contribute to the measurement of manual ability. Although fit statistics indicated that most activities adequately measure recovery of manual ability in chronic stroke, 1 item obtained an outlier outfit value (buttoning up a shirt, mean square=1.64), and four items obtained outlier infit values (cutting meat, mnsq=0.69; shelling hazel nuts, mnsq=1.33; tearing open a packet of chips, mnsq=1.22; sharpening a pencil, mnsq=0.65).

Penta et al. (2001) examined the content validity of the ABILHAND by comparing the ranking of item difficulty with expert opinion of four occupational therapists regarding the involvement of the affected hand in each activity. The following classifications were used: (1) the item does not require the affected limb, if it is broken down into several unimanual sequences; (2) the task requires the affected upper limb to stabilize an object but does not involve any fingers; and (3) the task requires precision grip, grip strength, dexterity or any digital activity from the affected side. Findings indicate that more difficult items also tend to require a greater degree of use of the affected limb, whereas easier items do not require the use of the affected limb.

Simone et al. (2011) examined the validity of the ABILHAND in a sample of 126 patients with chronic upper limb impairment resulting from stroke (n=83), multiple sclerosis (n=17), peripheral or cerebellar ataxia (n=13), spinal cord lesion (n=10) or Parkinson’s disease (n=3), and 24 healthy subjects. Modeled scores explained 84% of observed variance. The main factor across the residuals explained only 11.4% of the residual variance (1.8% of the total variance).

Criterion:

Concurrent:
Simone et al. (2011) compared the concurrent validity of the ABILHAND, Jamar handgrip, Box and Block Test (BBT), Purdue pegboard test and Nine Hole Peg Test (NHPT) in a sample of 126 patients with chronic upper limb impairment resulting from stroke, multiple sclerosis, sensory or cerebellar ataxia, spinal cord lesion or Parkinson’s disease, and 24 healthy subjects, using Pearson’s r. Adequate correlations were found between the ABILHAND and the Jamar handgrip (r=0.377, p=0.001), BBT (r=0.481, p=0.000) and the Purdue pegboard test (r=0.493, p=0.000), and an adequate negative correlation was found between the ABILHAND and the NHPT (r=-0.370, p=0.007).

Construct:

Convergent/Discriminant:
No studies have reported on the convergent/discriminant validity of the ABILHAND.

Known Group:
Penta et al. (2001) examined the relationship of the ABILHAND measures to other demographic and clinical variables in a sample of 103 patients with chronic stroke, using univariate ANOVA and correlation coefficients (Mann-Whitney U test, Kruskal-Wallis H tests, Spearman ρ, Pearson r). Tests revealed no significant differences in ABILHAND measures according to demographic indexes of country (Belgium/Italy), sex or age. Clinical variables such as time since stroke, affected side (dominant/nondominant), lesion site and tactile sensitivity of either limb (measured using the Semmes-Weinstein tactile sensation test) were not significantly related to ABILHAND measures. There was a poor correlation between ABILHAND measures and grip strength (Jamar handgrip, R=0.242, P<0.014) and manual dexterity (Box and Block Test, R=0.248, P=0.012) of the unaffected limb, and a poor negative correlation with depression (Geriatric Depression Scale, ρ=-0.213, P=0.030). ABILHAND measures demonstrated an adequate correlation with grip strength (R=0.562, P<0.001) and manual dexterity (R=0.598, P<0.001) of the affected limb, and an excellent correlation with upper limb motricity (Brunnstrom upper limb motricity test, ρ=0.730, P<0.001). Results showed a direct relationship between ABILHAND measures of manual ability and impairment on the affected side, where more complex combinations of manual dexterity with/without grip strength and/or upper limb motricity impairment correlated with higher manual disability.

Simone et al. (2011) examined the known-group validity of the ABILHAND in a sample of 126 patients with chronic upper limb impairment resulting from stroke, multiple sclerosis, sensory or cerebellar ataxia, spinal cord lesion or Parkinson’s disease, and 24 healthy subjects, using the Kruskal-Wallis test. Highly significant differences (P<0.001) were found between patients with tetraparesis, hemiparesis, other neurological impairments (multiple sclerosis, Parkinson’s disease, ataxia) and control participants.

Responsiveness

Simone et al. (2011) reported a satisfactory match between the distribution of item difficulty levels and patients’ ability levels. The average ability of healthy controls vs. patients with chronic upper limb impairment resulting from stroke, multiple sclerosis, sensory or cerebellar ataxia, spinal cord lesion or Parkinson’s disease was 89 (standard error=8) vs. 63 (standard error=17).

Sensitivity & Specificity:
Simone et al. (2011) examined the sensitivity and specificity of the ABILHAND in a sample of 126 patients with chronic upper limb impairment resulting from stroke, multiple sclerosis, sensory or cerebellar ataxia, spinal cord lesion or Parkinson’s disease, and 24 healthy subjects. An “impairment-normality” cut-off was computed through logistic regression and a lower cut-off score of 80/100 is proposed for healthy controls (area under ROC curve=0.9097, p<0.05). This allowed correct classification of patients vs. healthy controls with a 92% sensitivity rate and 80% specificity rate, whereby 82% of the sample was correctly classified.
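
As a rough illustration of how a cut-off such as 80/100 translates into sensitivity and specificity, the sketch below classifies scores against a threshold. It makes two assumptions that are not stated in the source: scores strictly below the cut-off are labelled “impaired”, and a score of exactly 80 counts as “normal”. The function name and all scores are hypothetical and do not reproduce the study data.

```python
def classify_at_cutoff(patient_scores, control_scores, cutoff=80):
    """Sensitivity, specificity and overall accuracy at a given cut-off.

    Assumption for this sketch: a score below `cutoff` is classified as
    impaired; a score at or above it is classified as normal.
    """
    # Patients correctly flagged as impaired.
    true_positives = sum(1 for s in patient_scores if s < cutoff)
    # Healthy controls correctly classified as normal.
    true_negatives = sum(1 for s in control_scores if s >= cutoff)
    total = len(patient_scores) + len(control_scores)
    return {
        "sensitivity": true_positives / len(patient_scores),
        "specificity": true_negatives / len(control_scores),
        "accuracy": (true_positives + true_negatives) / total,
    }

# Invented 0-100 scores for five patients and five healthy controls.
hypothetical_patients = [45, 63, 70, 82, 55]
hypothetical_controls = [89, 95, 78, 84, 91]
summary = classify_at_cutoff(hypothetical_patients, hypothetical_controls)
```

Raising the cut-off flags more patients as impaired (higher sensitivity) at the cost of misclassifying more healthy controls (lower specificity), which is the trade-off summarised by the ROC curve.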

References

Ashford, S., Slade, M., Malaprade, F., & Turner-Stokes, L. (2008). Evaluation of functional outcome measures for the hemiparetic upper limb: a systematic review. Journal of Rehabilitation Medicine, 40, 787-95.
Connell, L.A. & Tyson, S.F. (2012). Clinical reality of measuring upper-limb ability in neurological conditions: a systematic review. Archives of Physical Medicine and Rehabilitation, 93, 221-8.
Gustafsson, S., Sunnerhagen, K.S., & Dahlin-Ivanoff, D. (2004). Occupational therapists’ and patients’ perceptions of ABILHAND, a new assessment tool for measuring manual ability. Scandinavian Journal of Occupational Therapy, 11, 107-17.
Mpofu, E. & Oakland, T. (2010). Rehabilitation and Health Assessment: Applying ICF Guidelines. New York: Springer Publishing Company.
Penta, M., Tesio, L., Arnould, C., Zancan, A., & Thonnard, J-L. (2001). The ABILHAND questionnaire as a measure of manual ability in chronic stroke patients: Rasch-based validation and relationship to upper limb impairment. Stroke, 32, 1627-34.
Simone, A., Rota, V., Tesio, L., & Perucca, L. (2011). Generic ABILHAND questionnaire can measure manual ability across a variety of motor impairments. International Journal of Rehabilitation and Research, 34, 131-40.

See the measure

How to obtain the ABILHAND:

The ABILHAND is available in Penta, M., Tesio, L., Arnould, C., Zancan, A., & Thonnard, J-L. (2001). The ABILHAND questionnaire as a measure of manual ability in chronic stroke patients: Rasch-based validation and relationship to upper limb impairment. Stroke, 32, 1627-34.

Action Research Arm Test (ARAT)

Evidence Reviewed as of before: 09-06-2011

Author(s)*: Sabrina Figueiredo, BSc

Editor(s): Lisa Zeltzer, MSc OT; Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Action Research Arm Test (ARAT) is an evaluative measure to assess specific changes in limb function among individuals who sustained cortical damage resulting in hemiplegia (Lyle, 1981). It assesses a client’s ability to handle objects differing in size, weight and shape and therefore can be considered to be an arm-specific measure of activity limitation (Platz, Pinkowski, Kim, di Bella, & Johnson, 2005).

In-Depth Review

Purpose of the measure

The Action Research Arm Test (ARAT) is an evaluative measure to assess specific changes in limb function among individuals who sustained cortical damage resulting in hemiplegia (Lyle, 1981). It assesses a client’s ability to handle objects differing in size, weight and shape and therefore can be considered to be an arm-specific measure of activity limitation (Platz, Pinkowski, Kim, di Bella, & Johnson, 2005).

Available versions

The ARAT was developed by Ronald Lyle in 1981 by adapting the Upper Extremity Function Test (UEFT) (Carroll, 1965). The UEFT test administration and scoring were simplified, the time required to administer the test was shortened, and items were grouped based on the hierarchical scale (Guttman Scale) (Lang, Wagner, Dromerick, & Edwards, 2006). Due to the need for more specific and detailed instructions related to the client’s position, scoring and test administration, Yozbatiran, Der-Yeghiaian, and Cramer (2008) proposed a standardized approach to the ARAT.

Features of the measure

Items:

The ARAT consists of 19 items grouped into four subscales: grasp, grip, pinch, and gross movement. Each subscale constitutes a hierarchical Guttman scale, which means that all items are ordered according to ascending difficulty. In the ARAT, if the client succeeds in completing the most difficult item in a subscale, this suggests he/she will succeed in the easier items for that same subscale. Similarly, failure on an item suggests the client will be unable to complete the remaining more challenging items in the subscale.

According to the rules defained by Lyle (1981), the client must first try to perform the most difficult task in a subscaleMany measurement instruments are multidimensional and are designed to measure more than one construct or more than one domain of a single construct. In such instances subscales can be constructed in which the various items from a scale are grouped into subscales. Although a subscale could consist of a single item, in most cases subscales consist of multiple individual items that have been combined into a composite score (National Multiple Sclerosis Society).
. If the maximum score (score = 3) is obtained for this task, then the maximum score for this entire subscale should be assigned, and the evaluator should move to the next subscale to be administered. When the client is unable to complete the most difficult item (scoring between 0-2), then the easiest item in this specific subscale should be performed. If the client fails completely (score = 0) when performing the easiest task, then the other intermediate items must not be tested, the entire subscale should be scored as zero, and the evaluator should then move to the next subscale. However, if the client succeeds at the easiest task either partially (score = 1 or 2) or completely (score = 3), then all the other tasks in that same subscale should be tested before moving to the next subscale. Following these rules, the items administered will range from a minimum of 4 to a maximum of 19 (van der Lee, Roorda, & Lankhorst, 2002).
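
The hierarchical rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the published test: it assumes that the most difficult item is listed first and the easiest item second within each subscale, and the function names and the `rate` callback are hypothetical.

```python
from typing import Callable, Dict, List

# Items per subscale, in the testing order given by Lyle (1981).
SUBSCALE_ITEMS: Dict[str, int] = {
    "grasp": 6,
    "grip": 4,
    "pinch": 6,
    "gross_movement": 3,
}


def administer_subscale(n_items: int, rate: Callable[[int], int]) -> List[int]:
    """Apply the hierarchical rule to one subscale and return its item scores.

    Assumption: index 0 is the most difficult item and index 1 the easiest;
    `rate(i)` is a hypothetical callback returning the observed 0-3 score.
    """
    hardest = rate(0)
    if hardest == 3:
        # Full marks on the hardest item: credit the whole subscale.
        return [3] * n_items
    easiest = rate(1)
    if easiest == 0:
        # Complete failure on the easiest item: whole subscale scores zero.
        return [0] * n_items
    # Otherwise every remaining item in the subscale is tested.
    return [hardest, easiest] + [rate(i) for i in range(2, n_items)]


def count_items_administered(fixed_score: int) -> int:
    """Count items actually tested when every rated item gets `fixed_score`."""
    administered = 0

    def rate(_index: int) -> int:
        nonlocal administered
        administered += 1
        return fixed_score

    for n_items in SUBSCALE_ITEMS.values():
        administer_subscale(n_items, rate)
    return administered
```

Under these assumptions, a client who scores 3 on every hardest item is tested on 4 items, and a client who scores 1 on every item is tested on all 19, which matches the range reported by van der Lee et al. (2002).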

The ARAT must be administered in a formal setting, since a specially designed table and chair are required (see equipment section for more information). For the starting position, the client should be seated in a chair with a firm back and no armrests. The client’s trunk should be in contact with the back of the chair at all times during the test performance. Instructions about the required seating posture should be provided to the client prior to initiating the test, and the client should be reminded to maintain this position whenever it is not respected. The client’s feet should be in contact with the floor throughout testing (van der Lee, DeGroot, Beckerman, Wagenaar, Lankhorst, & Bouter, 2001a; Yozbatiran et al., 2008). Both hands should be tested, beginning with the non- or less-affected hand, in order to practice and register baseline scores. Should the client be unable to understand the instructions for the required task, the evaluator should demonstrate the task and allow the client to try it as a trial (Yozbatiran et al., 2008). To facilitate recording the time for each task, the client’s hands should start and finish the task with palms down on the table. However, for the gross movement tasks, the client’s hands should be placed pronated on their lap (Lyle, 1981; Yozbatiran et al., 2008).

In the grasp and pinch subscales, testing materials are lifted 37 cm from the surface of the table to the top of the shelf. In the grip subscale, testing materials are moved from one side of the table to the other. Finally, in the gross movement subscale, the client is requested to place the hand being tested either behind his/her head, on top of his/her head, or to his/her mouth (Lyle, 1981; Hsieh, Hsueh, Chiang, & Lin, 1998; Hsueh, Lee, & Hsieh, 2002a). The proper sequence for testing is 1) grasp subscale, 2) grip subscale, 3) pinch subscale, 4) gross movement subscale (Lyle, 1981). The ARAT comes with simple instructions to guide the evaluator on scoring and administering the test (Lyle, 1981).

Scoring:

The ARAT is scored on a four-level ordinal scale (0-3) (Lyle, 1981).

0 = cannot perform any part of the test,
1 = performs the test partially,
2 = completes the test, but takes an abnormally long time,
3 = performs the test normally

In order to facilitate scoring, time limits have been suggested (Wagenaar, Meijer, van Wierinen, Kuik, Hazenberg, Lindeboom, Wichers, & Rijswijk, 1990; Yozbatiran et al., 2008). Incorporating these time limits into Lyle’s scoring definition, the new scoring system would be:

0 = cannot perform any part of the test;
1 = performs the test partially;
2 = completes the test, but takes an abnormally long time, varying from 5 to 60 seconds;
3 = performs the test normally in less than 5 seconds.

If a client takes more than 60 seconds to perform an item, the evaluator should interrupt after 60 seconds and a score of 1 is given on that specific item.
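
As an illustration, the time-based scoring rules above can be written as a small Python function. This is a sketch under stated assumptions rather than an official scoring tool: the function name and its parameters are hypothetical, and the evaluator's judgement of whether any part of the task was performed is taken as an input.

```python
def timed_item_score(performed_any_part: bool, completed: bool, seconds: float) -> int:
    """Return a 0-3 item score using the suggested 5 s and 60 s time limits.

    Hypothetical helper: `performed_any_part` and `completed` are the
    evaluator's observations; `seconds` is the stopwatch time for the item.
    """
    if not performed_any_part:
        return 0  # cannot perform any part of the test
    if not completed or seconds > 60:
        return 1  # partial performance, or interrupted after 60 seconds
    if seconds >= 5:
        return 2  # completed, but took 5 to 60 seconds
    return 3  # completed normally in less than 5 seconds
```

For example, `timed_item_score(True, True, 12.0)` gives 2, while a task that is still unfinished after 60 seconds is scored 1.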

The subscale scores range according to the number of items on each subscale, as follows:

Subscales on the ARAT	Number of items per subscale	Score range per subscale
Grasp subscale	6 items	Score 0-18
Grip subscale	4 items	Score 0-12
Pinch subscale	6 items	Score 0-18
Gross movement subscale	3 items	Score 0-9

The total score on the ARAT ranges from 0 to 57, with the lowest score indicating that no movements can be performed, and the highest score indicating normal performance. Thus, higher scores indicate better performance (Lang et al., 2006; van der Lee et al., 2002). The ARAT score is a continuous measure, with no categorical cutoff scores. Therefore, the score obtained on the ARAT does not allow clients to be classified into categories such as normal, mildly limited, or severely limited.
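
The arithmetic behind the subscale ranges and the 57-point total can be checked with a short Python sketch. The item counts come from the text; the dictionary keys and function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Number of items per subscale (Lyle, 1981); each item is scored 0-3.
SUBSCALE_ITEMS: Dict[str, int] = {
    "grasp": 6,
    "grip": 4,
    "pinch": 6,
    "gross_movement": 3,
}
MAX_ITEM_SCORE = 3


def subscale_max(name: str) -> int:
    """Highest possible score for one subscale (items x 3)."""
    return SUBSCALE_ITEMS[name] * MAX_ITEM_SCORE


def total_score(item_scores: Dict[str, List[int]]) -> int:
    """Sum all item scores after checking item counts and the 0-3 range."""
    total = 0
    for name, n_items in SUBSCALE_ITEMS.items():
        scores = item_scores[name]
        if len(scores) != n_items:
            raise ValueError(f"{name} needs {n_items} item scores")
        if any(s not in range(MAX_ITEM_SCORE + 1) for s in scores):
            raise ValueError(f"{name} scores must be integers from 0 to 3")
        total += sum(scores)
    return total
```

With every item scored 3, the subscale maxima are 18, 12, 18 and 9, which sum to the total of 57 given above.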

Time:

The time required to complete the ARAT will depend on the number of items administered. Based on its hierarchical design, the ARAT was constructed to save testing time. Thus, no more than 7-10 minutes should be required to assess a client with stroke (DeWeerdt & Harrinson, 1985). However, if all 19 items are performed, the ARAT usually takes 20 minutes to administer (van der Lee et al., 2002). In one study by Hsieh and colleagues (1998), the ARAT took, on average, 8 minutes to administer to clients with stroke.

Subscales:

The ARAT is divided into four subscales: grasp, grip, pinch, and gross movement.

The grasp and pinch subscales have 6 items each, the grip subscale has 4 items, and the gross movement subscale has 3 items (Lyle, 1981).

Equipment:

Standardized equipment is required to administer the ARAT. It can be ordered only from representatives in the Netherlands. The equipment costs approximately 850 Euros ($1200 CAD), with an additional delivery fee of 179 Euros ($252 CAD).

The complete ARAT kit consists of:

A specially designed table of 92cm x 45cm x 83cm high, with a shelf of 93cm x 10cm, positioned 37cm above the main surface of the table (Lyle, 1981; Hsueh et al., 2002a).
A chair with a back rest and no arm rests, with the seat 44cm above floor level (Lyle, 1981; Hsueh et al., 2002a).
Wooden blocks (cubes) of 2.5, 5, 7.5 and 10cm (Lyle, 1981; Hsueh et al., 2002a).
A cricket ball 7.5cm in diameter (Lyle, 1981; Hsueh et al., 2002a).
Two alloy tubes: one 2.25cm in diameter x 11.5 cm long, the second one 1.0cm in diameter x 16cm long (Lyle, 1981; Hsueh et al., 2002a).
A washer and bolt (a type of screw with its anchor) (Lyle, 1981; Hsueh et al., 2002a).
Two glasses (Lyle, 1981; Hsueh et al., 2002a).
A marble 1.5cm in diameter (Lyle, 1981; Hsueh et al., 2002a).
A ball bearing 6mm in diameter (Lyle, 1981; Hsueh et al., 2002a).
A stopwatch (Wagenaar et al., 1990; Yozbatiran et al., 2008)
Paper and pencil for the evaluator.

Training:

None typically reported.

Alternative forms of the Action Research Arm Test

None.

Client suitability

Can be used with:

The ARAT was constructed for assessing recovery of upper limb function following cortical damage (Lyle, 1981).
Clients with stroke.

Should not be used in:

When administering the ARAT to clients with finger amputation, the pinch subscale should be scored as 0, as should all other tasks that require movement of an amputated body part (Yozbatiran et al., 2008).

In what languages is the measure available?

There are no official translations of the ARAT.

Nevertheless, some peer-reviewed publications from the Netherlands and Taiwan have used the ARAT as an outcome measure, which may indicate that instructions have been informally translated to other languages (Hsieh et al., 1998; Hsueh et al., 2002a; van der Lee et al., 2002).

Summary

What does the tool measure?	The ARAT measures specific changes in limb function among individuals who sustained cortical damage resulting in hemiplegia.
What types of clients can the tool be used for?	The ARAT can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	An average of 7 to 10 minutes.
Versions	There are no alternative versions.
Other Languages	There are no official translations.
Measurement Properties
Reliability	Internal consistency: One study examined the internal consistency of the ARAT and reported excellent internal consistency using Cronbach’s alpha. Test-retest: Three studies examined the test-retest reliability of the ARAT; all reported excellent test-retest reliability using ICCs. Intra-rater: Four studies examined the intra-rater reliability of the ARAT and reported excellent intra-rater reliability using Spearman rho correlation, intraclass correlation coefficients (ICC) and weighted kappa. Inter-rater: Seven studies examined the inter-rater reliability of the ARAT and reported excellent inter-rater reliability using Spearman rho correlation, ICC and weighted kappa.
Validity	Criterion: Concurrent: One study examined the concurrent validity of the ARAT and reported adequate to excellent correlations with the Box and Block Test (BBT) and the Nine-Hole Peg Test (NHPT) at pre- and post-treatment. Predictive: No studies have examined the predictive validity of the ARAT. Construct: Convergent: Seven studies examined convergent validity of the ARAT and reported excellent correlations between the ARAT and the Brunnstrom-Fugl-Meyer test; the upper extremity subscale of the Motor Assessment Scale; the Motricity Index; the upper extremity movement of the Modified Motor Assessment Chart; the BBT; the motor function subscore of the Fugl-Meyer test; the Hemispheric Stroke Scale; upper extremity strength and grasp speed. Adequate correlations were reported between the ARAT and the passive joint motion/joint pain of the Fugl-Meyer test, the Functional Independence Measure and spasticity. Poor correlations were reported between the ARAT and the sensation score of the Fugl-Meyer test, the Ashworth scale, the Modified Barthel Index, the National Institutes of Health Stroke Scale, the light touch sensation and pain.
Floor/Ceiling Effects	One study examined the floor/ceiling effects of the ARAT in clients with acute stroke and reported that at earlier phases of the stroke, floor effects were poor. At discharge from the acute rehabilitation ward, ceiling effects on the ARAT were adequate. Another study examined the floor/ceiling effects of the ARAT in stroke clients with mild to moderate hemiparesis and reported adequate floor and ceiling effects.
Sensitivity / Specificity	No studies have examined the specificity of the ARAT.
Does the tool detect change in patients?	Six studies have examined the responsiveness of the ARAT and reported that the ARAT has a moderate to large Standardized Response Mean, moderate to large effect size and large responsiveness ratio, and is therefore able to detect change in clients with stroke.
Acceptability	When administering the ARAT to clients with upper extremity amputations, attention is required when scoring (i.e., a score of 0 is given).
Feasibility	The administration of the ARAT is quick and simple, but requires standardized equipment.
How to obtain the tool?	Information on the ARAT can be obtained from the studies by Lyle (1981), Hsieh et al. (1998), van der Lee et al. (2002), Rabadi & Rabadi (2006), and Yozbatiran et al. (2008) and at the website: http://www.aratest.eu/Index_english.htm Standardized equipment can be purchased from the following website: http://www.aratest.eu/ or from http://www.saliarehab.com/
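
The Standardized Response Mean named in the summary table is conventionally calculated by dividing the mean change score by the standard deviation of the change scores. The Python sketch below illustrates that calculation; the function name is hypothetical, the use of the sample standard deviation is an assumption, and any scores passed to it in an example are made up rather than taken from the studies reviewed here.

```python
from statistics import mean, stdev
from typing import Sequence


def standardized_response_mean(
    baseline: Sequence[float], follow_up: Sequence[float]
) -> float:
    """Mean change divided by the sample SD of the change scores.

    Assumes paired scores from the same clients, in the same order,
    and at least two clients whose change scores are not all identical.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)
```

For instance, hypothetical ARAT totals of 10, 20, 30 and 40 at baseline and 15, 28, 36 and 49 at follow-up give change scores of 5, 8, 6 and 9, a mean change of 7, and an SRM of roughly 3.83.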

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Action Research Arm Test (ARAT) in individuals with stroke. We identified twelve studies. The ARAT appears to be subject to floor effects.

Floor/Ceiling Effects

Hsueh and Hsieh (2002b) examined floor and ceiling effects for the ARAT and the Upper Extremity Motor Assessment Scale (Carr, Shepherd, Nordholm, & Lynne, 1985) in 48 clients with acute stroke. Participants were assessed at admission and discharge from an acute rehabilitation ward. At admission, the ARAT total score demonstrated a poor floor effect, with 52.1% of participants scoring 0. Although all subscales were classified as having a poor floor effect, when comparing the ARAT’s subscales among themselves, 72.9% of participants were unable to perform the pinch subscale, 70.8% were unable to perform both the grasp and grip subscales and 52.1% were unable to complete the gross movement subscale. At discharge, the ARAT total score demonstrated an adequate ceiling effect, with only 7% of participants scoring the maximal value. When analyzing the ARAT’s subscales individually, the gross movement subscale presented the poorest ceiling effect, with 29.2% of participants scoring the maximum score, followed by 27% of participants on the grasp subscale. The grip and pinch subscales had the best classification, with an adequate ceiling effect of 18.8% and 16.7%, respectively.

Compared to the ARAT, at admission the Upper Extremity Motor Assessment Scale had 58% of participants scoring the minimal value, indicating a poor floor effect. However, at discharge the Upper Extremity Motor Assessment Scale demonstrated a more adequate ceiling effect than the ARAT, with only 4.3% of participants obtaining the maximum score.
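The floor and ceiling percentages above are simply the share of participants who obtain the lowest or highest possible score. The following is a minimal sketch of that calculation. The 20% cutoff between "adequate" and "poor" is an assumption chosen to be consistent with the classifications reported above (18.8% adequate, 27% poor); the function names and the score list are hypothetical and are not taken from the study.

```python
def extreme_score_share(scores, extreme):
    """Return the percentage of participants whose score equals `extreme`."""
    if not scores:
        raise ValueError("scores must not be empty")
    return 100.0 * sum(1 for s in scores if s == extreme) / len(scores)


def classify_effect(percent, cutoff=20.0):
    """Label a floor or ceiling effect. The 20% cutoff is an assumption."""
    return "poor" if percent > cutoff else "adequate"


# Hypothetical admission totals for 8 clients (ARAT totals range from 0 to 57).
admission_totals = [0, 0, 0, 0, 12, 30, 45, 57]

floor_pct = extreme_score_share(admission_totals, extreme=0)
ceiling_pct = extreme_score_share(admission_totals, extreme=57)

print(f"floor: {floor_pct:.1f}% ({classify_effect(floor_pct)})")
print(f"ceiling: {ceiling_pct:.1f}% ({classify_effect(ceiling_pct)})")
```

With this hypothetical sample, half of the clients score 0 (a poor floor effect under the assumed cutoff) and one in eight reaches the maximum (an adequate ceiling effect).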

Reliability

Internal Consistency:
Nijland et al. (2010) investigated the internal consistency of the ARAT in 40 patients with stroke with mild to moderate hemiparesis. Internal consistency of the ARAT, as calculated using Cronbach’s Coefficient Alpha, was excellent (α = 0.98).
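For readers unfamiliar with Cronbach’s alpha, the sketch below shows the standard formula: alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The data are hypothetical item scores on the ARAT’s 0 to 3 item scale, not values from Nijland et al. (2010), and the function name is illustrative only.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of participants and columns of items."""
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical scores: 5 participants x 4 items, each item scored 0 to 3.
ratings = [
    [3, 3, 3, 2],
    [2, 2, 1, 2],
    [0, 0, 0, 0],
    [1, 1, 2, 1],
    [3, 2, 3, 3],
]

alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values close to 1 indicate that the items vary together, which is the sense in which α = 0.98 is described as excellent above.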

Test-retest:
Note: From the descriptions provided of the following studies it appears that some authors called the testing test-retest reliability while others called the same analysis intra-rater reliability.

Lyle (1981) examined test-retest reliability in 20 individuals who sustained cortical damage, either from stroke or traumatic brain lesion. The mean age was 53 years, ranging from 26 to 72 years. Participants were re-assessed after a 1-week interval by the same rater and under the same conditions. The test-retest reliability, as calculated using Pearson correlation, was excellent (r = 0.98).

Hsueh, Lee, and Hsieh (2002a) evaluated test-retest reliability of the ARAT performed using a regular table instead of the specially designed table for this test in 61 individuals with sub-acute stroke and a mean age of 63 years. Participants were re-assessed after a two-day interval by the same rater. The test-retest reliability, as calculated using the Intraclass Correlation Coefficient (ICC), was excellent for the total score (ICC = 0.99) as well as for the grasp, grip, pinch and gross movement subscales (ICC = 0.99, 0.98, 0.96 and 0.95, respectively).
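The ICC compares the variance between participants with the total variance, so it approaches 1 when repeated scores for the same person agree closely. The sketch below assumes the two-way random effects, absolute agreement, single measurement form, often written ICC(2,1). This review does not state which ICC form each study used, so the choice is an assumption. The function name and the scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for rows of participants and columns of sessions or raters."""
    n = len(scores)          # participants
    k = len(scores[0])       # sessions or raters
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical ARAT totals for 6 participants at test and retest.
test_retest = [[0, 0], [12, 14], [30, 29], [45, 45], [57, 57], [21, 20]]
print(f"ICC(2,1) = {icc_2_1(test_retest):.2f}")
```

Unlike a simple correlation, this form of the ICC is lowered by a systematic shift between sessions, because the between-session variance appears in the denominator.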

Platz, Pinkowski, van Wijck, Kim, di Bella, and Johnson (2005) estimated test-retest reliability for the ARAT, the Box and Block Test (Cromwell, 1965; Mathiowetz, Volland, Kashman, & Weber, 1985a), and the Fugl-Meyer Test upper extremity items (including items from the Motor function, Sensation and Passive Joint Motion/Joint pain subscores) (Fugl-Meyer, Jääskö, Leyman, Olsson, & Steglind, 1975) in 23 participants with upper extremity paresis either from stroke, multiple sclerosis, or traumatic brain injury. Each participant’s most affected arm was re-assessed 1 week later by the same rater. The test-retest reliability of the ARAT total score, as calculated using ICCs and Spearman rho correlation, was excellent (ICC = 0.96 and rho = 0.96). Furthermore, test-retest reliabilities for each subscale were all excellent: grasp (ICC = 0.94 and rho = 0.96), grip (ICC = 0.94 and rho = 0.95), pinch (ICC = 0.89 and rho = 0.89) and gross movement (ICC = 0.97 and rho = 0.97).
Note: These results apply only to the most affected upper limb.
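The Pearson r and Spearman rho coefficients reported in the test-retest studies above can be sketched as follows. Pearson’s r is computed on the raw scores, while Spearman’s rho is the same calculation applied to ranks, with tied scores given their average rank. The function names and the example scores are hypothetical and do not come from the cited studies.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # mean of 1-based positions i+1..j+1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson's r computed on ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical ARAT totals at the first and second assessment.
first = [0, 12, 30, 45, 57, 21]
second = [0, 14, 29, 45, 57, 20]
print(f"r = {pearson_r(first, second):.2f}, rho = {spearman_rho(first, second):.2f}")
```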

Intra-rater:
Wagenaar, Meijer, van Wieringen, Kuik, Hazenberg, Lindeboom, Wichers and Rijswijk (1990) evaluated intra-rater reliability in seven patients with acute stroke. The timeframe for assessments was not provided by the authors. Intra-rater reliability, as calculated using Spearman rho correlation, was excellent (rho = 0.99).

Van der Lee, DeGroot, Beckerman, Wagenaar, Lankhorst, and Bouter (2001a) estimated intra-rater reliability in 20 patients with chronic stroke and a median age of 62 years. Participants were evaluated by the same rater at three points in time. At the baseline assessment participants were videotaped. The second assessment was 4-27 months following the first assessment, and the final assessment was 4-6 weeks after. Scoring of the last two assessments was based on the videotape recorded at baseline. Intra-rater reliability results were analyzed between the first two assessments, where scoring sources were different (live vs. videotape), and between the last two assessments, where scoring sources were the same (videotape only). Intra-rater reliability, as calculated using ICC and Spearman rho correlation, was excellent (ICC = 0.99 and rho = 0.99), independent of scoring sources. Intra-rater reliability, as calculated using weighted kappa, was also excellent: scoring from the same information source resulted in kappa = 1.00, and scoring from two different information sources resulted in an only slightly lower value (kappa = 0.94). The gross movement subscale showed the lowest weighted kappa value (kappa = 0.83), suggesting that this subscale had the lowest agreement level.
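Weighted kappa measures agreement on ordinal scores while giving partial credit for near misses, which suits ARAT items scored from 0 to 3. The sketch below is one common formulation, kappa = 1 - (weighted observed disagreement / weighted chance disagreement). The weighting scheme used by Van der Lee et al. (2001a) is not stated in this review, so the quadratic default here is an assumption, and the function name and ratings are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weights="quadratic"):
    """Weighted kappa for two sets of ordinal ratings of the same cases."""
    index = {category: i for i, category in enumerate(categories)}
    m = len(categories)
    n = len(ratings_a)

    # Observed proportion of cases in each (rating A, rating B) cell.
    observed = [[0.0] * m for _ in range(m)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1 / n

    row_totals = [sum(observed[i]) for i in range(m)]
    col_totals = [sum(observed[i][j] for i in range(m)) for j in range(m)]

    power = 2 if weights == "quadratic" else 1   # otherwise linear weights
    observed_disagreement = 0.0
    chance_disagreement = 0.0
    for i in range(m):
        for j in range(m):
            weight = (abs(i - j) / (m - 1)) ** power
            observed_disagreement += weight * observed[i][j]
            chance_disagreement += weight * row_totals[i] * col_totals[j]

    # Undefined (division by zero) if every rating falls in one category.
    return 1.0 - observed_disagreement / chance_disagreement


# Hypothetical item scores (0 to 3) given on two occasions by the same rater.
first_scoring = [0, 1, 2, 3, 3, 2, 1, 0]
second_scoring = [0, 1, 2, 3, 2, 2, 1, 0]
print(f"weighted kappa = {weighted_kappa(first_scoring, second_scoring, [0, 1, 2, 3]):.2f}")
```

Identical ratings on both occasions give kappa = 1.00, matching the value reported when scoring from the same videotape.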

Yozbatiran, Der-Yeghiaian, and Cramer (2008) examined intra-rater reliability in 8 clients with chronic stroke. Participants were re-assessed by the same rater and under the same conditions after a 1-week interval. Intra-rater reliability for the total score, as calculated using ICC and Spearman rho correlation, was excellent (ICC = 0.99 and rho = 0.99). Additionally, the same excellent level of intra-rater reliability was found for the grasp, grip, pinch, and gross movement subscales (ICC = 0.98 and rho = 0.93; ICC = 0.97 and rho = 0.93; ICC = 0.99 and rho = 0.98; ICC = 0.93 and rho = 0.91, respectively).

Nijland et al. (2010) investigated the psychometric properties of the ARAT and Wolf Motor Function Test in 40 patients with stroke with mild to moderate hemiparesis. Eighteen patients participated in the reproducibility testing of the ARAT and were assessed twice by the same observer approximately 10 days apart. Intra-rater reliability, as analyzed using the ICC, was found to be excellent (ICC = 0.97).

Inter-rater:
Lyle (1981) examined inter-rater reliability in 20 individuals who had sustained cortical damage, either from stroke or traumatic brain injury. The mean age was 53 years, ranging from 26 to 72 years. Participants were assessed independently by two different raters. Agreement between raters, as calculated using Pearson correlation, was excellent (r = 0.99).

Hsieh, Hsueh, Chiang, and Lin (1998) assessed inter-rater reliability in 50 clients with stroke. Their mean age was 65 years. Participants were evaluated independently, on three different days, by three raters. ICC for the total score showed excellent agreement (ICC = 0.98). Agreement between raters was also excellent for the grasp, grip, pinch and gross movement subscales (ICC = 0.98; ICC = 0.96; ICC = 0.96; ICC = 0.95, respectively).

Van der Lee et al. (2001a) estimated inter-rater reliability in 20 patients with chronic stroke and a median age of 62 years. Participants were videotaped and scored independently by two raters. Inter-rater reliability, as calculated using ICC, weighted kappa, and Spearman rho correlation, was excellent (ICC = 0.98; kappa = 0.93; rho = 0.99). With respect to the individual subscales, the gross movement scale had the lowest weighted kappa value (kappa = 0.87), suggesting this subscale has the lowest agreement between raters.

Hsueh, Lee, and Hsieh (2002a) evaluated inter-rater reliability of the ARAT performed with a regular table, instead of the table specially designed for this test, in 61 individuals with sub-acute stroke and a mean age of 63 years. Participants were re-assessed with a two-day interval by three different raters. ICC for the total score showed excellent agreement (ICC = 0.99), as did the ICCs for the grasp, grip, pinch and gross movement subscales (ICC = 0.99; ICC = 0.98; ICC = 0.96; ICC = 0.94, respectively).

Platz et al. (2005) analyzed inter-rater reliability of the ARAT, the Box and Block Test and the Fugl-Meyer Test upper extremity items (including items from the Motor Function, Sensation and Passive Joint Motion/Joint Pain subscores) in 44 individuals with upper limb paresis either from stroke, multiple sclerosis, or traumatic brain injury. Participants had the most affected arm videotaped and scored independently by two raters. Inter-rater reliability for the ARAT total score, as calculated using the ICC and Spearman rho correlation, was excellent (ICC = 0.99 and rho = 0.99). Additionally, the scores for each subscale were provided, and inter-rater reliability for the grasp (ICC = 0.99 and rho = 0.99), grip (ICC = 0.96 and rho = 0.95), pinch (ICC = 0.99 and rho = 0.99) and gross movement (ICC = 0.98 and rho = 0.98) subscales was excellent in every case.
Note: These results apply only to the most affected upper limb.

Yozbatiran et al. (2008) evaluated inter-rater reliability in 9 clients with chronic stroke. Participants were scored simultaneously and independently by two raters. Inter-rater reliability for the total score, as calculated using the ICC and Spearman rho correlation, was excellent (ICC = 0.99 and rho = 0.96). The same excellent level of inter-rater reliability was found for the grasp, grip, pinch and gross motor movement subscales (ICC = 0.99 and rho = 1; ICC = 0.99 and rho = 0.99; ICC = 0.99 and rho = 0.98; ICC = 0.97 and rho = 0.93, respectively).

Nijland et al. (2010) investigated the psychometric properties of the ARAT and Wolf Motor Function Test in 40 patients with stroke with mild to moderate hemiparesis. Eighteen patients participated in the reproducibility testing of the ARAT and were assessed in random order by two observers within one week. Inter-rater reliability, as analyzed using the ICC, was found to be excellent (ICC = 0.92).

Validity

Content:

Lyle (1981) generated the 19 ARAT items from the 33 items of the Upper Extremity Function Test (UEFT; Carroll, 1965). Item reduction was based on a low inter-item correlation, on item redundancy (confirmed through a very high inter-item correlation, above r = 0.9) and on items that were extremely difficult to perform. Nevertheless, ARAT items were not based on a theoretical model (Finch, Brooks, Stratford, & Mayo, 2002).

Criterion:

Concurrent:
No gold standard exists against which to compare the ARAT.

Lin, Chuang, Wu, Hsieh and Chang (2010) compared the concurrent validity of the ARAT, Box and Block Test (BBT) and Nine-Hole Peg Test (NHPT) for evaluating hand dexterity in 59 patients with stroke. The Fugl-Meyer Assessment of Sensorimotor Recovery After Stroke (FMA), Motor Activity Log (MAL) and Stroke Impact Scale (SIS) were also administered to assess the concurrent validity of the ARAT, BBT and NHPT. Using the Spearman rank correlation coefficient, the ARAT, BBT and NHPT were found to have adequate to excellent correlations at pre-treatment (ranging from rho = -0.55 to -0.80) and post-treatment (ranging from rho = -0.57 to -0.71). In addition, the ARAT and BBT were found to have adequate correlations with the FMA, MAL and SIS (ranging from rho = 0.31 to 0.59); however, the NHPT had only poor to adequate correlations with the FMA and MAL (ranging from rho = -0.16 to -0.33) and adequate to excellent correlations with the SIS (ranging from rho = -0.58 to -0.66). When considering both the responsiveness and validation components of the study, the ARAT and BBT are believed to be more appropriate than the NHPT for evaluating dexterity.

Construct:

Convergent/Discriminant:
DeWeerdt and Harrison (1985) evaluated the convergent validity of the ARAT by comparing it to the Fugl-Meyer test (Fugl-Meyer et al., 1975) in 53 clients with acute stroke. Their mean age was 68 years. Correlations were calculated at two points in time after stroke onset using the Spearman correlation coefficient. Excellent correlations were found between the ARAT and Fugl-Meyer test at 2 months (rho = 0.91) and at 8 months (rho = 0.94) post-stroke.

Wagenaar, Meijer, van Wieringen, Kuik, Hazenberg, Lindeboom, Wichers and Rijswijk (1990) evaluated the convergent validity of the ARAT by comparing it to the Sollerman test (Jacobson-Sollerman & Sperling, 1977) in seven patients with acute stroke. An excellent correlation, as calculated using Spearman rho, was found (rho = 0.94).
Note: The Sollerman test measures hand grip function using 20 different daily life activities requiring hand movements.

Hsieh et al. (1998) assessed convergent validity of the ARAT by comparing it to the Upper Extremity portion of the Motor Assessment Scale (Carr et al., 1985), the arm subscale of the Motricity Index (Demeurisse, Demol, & Robaye, 1980), and the upper extremity movements of the Modified Motor Assessment Chart (Lindmark & Hamrin, 1988) in 50 clients with stroke. The mean age of clients was 65 years. Correlations were calculated using Pearson correlation coefficients. Excellent correlations were found between the ARAT and the Upper Extremity part of the Motor Assessment Scale (r = 0.96), the Motricity Index (r = 0.87) and the upper extremity movements of the Modified Motor Assessment Chart (r = 0.94).

Platz et al. (2005) tested convergent validity of the ARAT by comparing it to the Box and Block Test (Cromwell, 1965; Mathiowetz et al., 1985a), the Fugl-Meyer Test upper extremity items (including items from the Motor Function, Sensation and Passive Joint Motion/Joint Pain subscores) (Fugl-Meyer et al., 1975), the Motricity Index (Demeurisse et al., 1980), the Ashworth Scale (Ashworth, 1964), the Hemispheric Stroke Scale (Adams, Meador, Sethi, Grotta, & Thomson, 1986) and the Modified Barthel Index (Collin, Wade, Davies, & Horne, 1988) in 56 participants with upper extremity paresis either from stroke (n=37), multiple sclerosis (n=14), or traumatic brain injury (n=5). Correlations were calculated using the Spearman correlation coefficient. Excellent correlations were found between the ARAT and the Box and Block Test (rho = 0.95), the Motor Function subscore of the Fugl-Meyer Test (rho = 0.92), the Motricity Index (rho = 0.81), and the Hemispheric Stroke Scale (rho = -0.66). An adequate correlation was found between the ARAT and the Passive Joint Motion/Joint Pain subscore of the Fugl-Meyer Test (rho = 0.42). Poor correlations were found between the ARAT and the Sensation subscore of the Fugl-Meyer Test (rho = 0.29), the Ashworth Scale (rho = -0.29) and the Modified Barthel Index (rho = 0.04).
Note: Negative correlations are observed because a high score on the ARAT indicates normal performance, whereas a low score on the Hemispheric Stroke Scale and the Ashworth Scale indicates normal performance.
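
The sign convention described in the note can be illustrated with a minimal Python sketch. The scores below are hypothetical and are not data from Platz et al. (2005); the helper functions `ranks` and `spearman_rho` are written here for illustration only. The sketch computes Spearman rho as the Pearson correlation of the ranks and shows that reverse-scoring one scale changes the sign of rho but not its magnitude.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for six patients (illustrative only, not study data).
arat = [57, 40, 33, 21, 10, 3]   # higher score = better performance
impairment = [1, 2, 4, 3, 5, 6]  # hypothetical scale: lower score = better

rho = spearman_rho(arat, impairment)

# Reverse-score the second scale so that higher = better on both scales.
reversed_scale = [max(impairment) - s for s in impairment]
rho_reversed = spearman_rho(arat, reversed_scale)

print(f"original: {rho:.2f}, reverse-scored: {rho_reversed:.2f}")
```

A negative rho here therefore reflects the opposite scoring directions of the two scales rather than a weaker relationship.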

Lang, Wagner, Dromerick, and Edwards (2006) evaluated the convergent validity of the ARAT in 50 individuals with acute to sub-acute stroke, mean age of 63 years, attending an acute neurology stroke service, at three points in time: admission (day 0); post-intervention (day 14); and 90 days post-stroke (day 90). The ARAT was compared to measures of sensorimotor impairment (e.g. light touch sensation, pain, elbow joint spasticity, upper extremity strength), to kinematic measures (e.g. reach and grasp), to the Functional Independence Measure (FIM) (Keith, Granger, Hamilton, & Sherwin, 1987), and to the National Institutes of Health Stroke Scale (NIHSS) (Brott, Adams, Olinger, Marler, Barsan, Biller, et al., 1989). At day 0, excellent correlations were found between the ARAT and upper extremity strength (r = 0.60) and grasp speed (r = 0.60). Adequate correlations were found between the ARAT and grasp efficiency (r = 0.42), reach efficiency (r = -0.38), reach speed (r = 0.40), and the FIM upper extremity score (r = 0.38). Poor correlations were found between the ARAT and the NIHSS (r = -0.15), light touch sensation (r = 0.15), pain (r = 0.10), elbow joint spasticity (r = -0.28) and the FIM total score (r = 0.20). At day 14, excellent correlations were found between the ARAT and grasp efficiency (r = 0.60) and the FIM upper extremity scores (r = 0.62). Adequate correlations were found between the ARAT and elbow spasticity (r = 0.49), upper extremity strength (r = 0.42), reach efficiency (r = -0.58), grasp speed (r = 0.36) and the FIM total score (r = 0.52). Poor correlations were found between the ARAT and the NIHSS (r = -0.24), light touch sensation (r = -0.20), and pain (r = -0.12). At day 90, an excellent correlation was found between the ARAT and upper extremity strength (r = 0.60). Adequate correlations were found between the ARAT and elbow spasticity (r = -0.42), reach efficiency (r = -0.42), reach speed (r = 0.50), grasp efficiency (r = -0.48), grasp speed (r = 0.38) and the FIM upper extremity (r = 0.42) and total scores (r = 0.40). Poor correlations were found between the ARAT and the NIHSS (r = -0.29), light touch sensation (r = 0.00), and pain (r = 0.22). In summary, this study's findings suggest that the NIHSS, light touch sensation, and pain are not related to the ARAT. The relationship between the ARAT and FIM scores is stronger early on post-stroke and stabilizes by the ninetieth day.
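
The labels "excellent", "adequate" and "poor" used above follow the magnitude of each coefficient. As a minimal sketch, the Python function below applies cut-offs that are an assumption inferred from the values reported in this section (for example, r = 0.60 described as excellent, rho = 0.31 to 0.59 as adequate, and r = 0.29 as poor); they are not thresholds stated by the cited authors, and `classify_correlation` is a hypothetical helper name.

```python
def classify_correlation(r: float) -> str:
    """Label the strength of a correlation coefficient by its magnitude.

    The cut-offs are assumptions inferred from the values reported above,
    not thresholds stated by the cited authors.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # the sign only reflects the scoring direction
    if magnitude >= 0.60:
        return "excellent"
    if magnitude >= 0.31:
        return "adequate"
    return "poor"


# Three of the day 0 values reported by Lang et al. (2006).
day0 = {
    "upper extremity strength": 0.60,
    "reach efficiency": -0.38,
    "NIHSS": -0.15,
}
for measure, r in day0.items():
    print(f"{measure}: r = {r:+.2f} -> {classify_correlation(r)}")
```

Classifying on the absolute value keeps the label independent of whether the comparison scale is scored in the same direction as the ARAT.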

Rabadi and Rabadi (2006) examined convergent validity of the ARAT by comparing it to the Fugl-Meyer Assessment (Fugl-Meyer et al., 1975) at admission and discharge from an acute stroke rehabilitation unit in 104 inpatients with acute stroke with a mean age of 72 years. The correlation between the ARAT and the Fugl-Meyer Assessment was excellent both at admission (rho = 0.77) and discharge (rho = 0.87).

Yozbatiran et al. (2008) estimated the convergent validity of the ARAT by comparing it to the arm motor Fugl-Meyer Assessment (Fugl-Meyer et al., 1975) score in 12 clients with chronic stroke at a mean age of 61 years. An excellent correlation (r = 0.94) was found between the ARAT and arm motor Fugl-Meyer score.

Known groups:
No studies have examined known groups validity of the ARAT.

Responsiveness

Van der Lee, Beckerman, Lankhorst, and Bouter (2001a) evaluated the responsiveness of the ARAT and the Fugl-Meyer Assessment (Fugl-Meyer et al., 1975) in 22 clients with chronic stroke, with a mean age of 58 years, receiving intensive forced use treatment. Participants were assessed two weeks pre- and two weeks post-treatment. A responsiveness ratio was calculated. Compared to the Fugl-Meyer Assessment, the ARAT had a greater responsiveness ratio (2.03 for the ARAT vs. 0.41 for the Fugl-Meyer Assessment), suggesting that the ARAT is more sensitive in detecting change.
Note: The responsiveness ratio is a variant of effect size; higher values indicate better responsiveness.

Van der Lee, Roorda, Beckerman, and Lankhorst (2002) estimated the responsiveness of a modified version of the ARAT in 63 participants with chronic stroke. In this study, researchers did not follow Lyle’s standardized instructions. Instead, they administered all 19 ARAT items to verify any possible effect of this format on the measure’s psychometric properties. A responsiveness ratio was calculated. Compared to the hierarchical version proposed by Lyle, performing all 19 items was found to improve the measure’s responsiveness, with a responsiveness ratio of 1.7 compared to 1.2 with Lyle’s version.
Note: The responsiveness ratio can be considered an estimate of effect size normalized to the variability in a stable population; higher values indicate better responsiveness.
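
The responsiveness ratio described in the notes above can be illustrated with a short Python sketch. It assumes one common formulation, the mean change in a group expected to change divided by the standard deviation of change in a stable group; the reviewed studies may have used a different variant. The function name and all scores below are hypothetical and invented for illustration only.

```python
from statistics import mean, stdev


def responsiveness_ratio(change_in_treated, change_in_stable):
    """Mean change in the group expected to change, divided by the
    standard deviation of change scores in a stable group.

    This is an assumed formulation, not necessarily the one used
    in the studies reviewed here.
    """
    return mean(change_in_treated) / stdev(change_in_stable)


# Hypothetical change scores (post minus pre), invented for illustration.
treated_change = [6, 8, 5, 9, 7]   # clients receiving treatment
stable_change = [1, -1, 0, 2, -2]  # clients measured twice with no intervention

ratio = responsiveness_ratio(treated_change, stable_change)
print(f"Responsiveness ratio: {ratio:.2f}")
```

Because the denominator reflects variability when no real change is expected, a larger ratio suggests that observed change stands out more clearly from measurement noise.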

Hsueh and Hsieh (2002b) analyzed the responsiveness of the ARAT and the upper extremity section of the Motor Assessment Scale (Carr et al., 1985) in 48 participants with acute stroke and a mean age of 62 years. Participants were assessed at two points in time: admission and discharge from the acute rehabilitation centre. The ARAT total score demonstrated a moderate effect size of 0.52, while the Motor Assessment Scale total score demonstrated a small effect size of 0.45.

Lang et al. (2006) examined the responsiveness of the ARAT in 50 participants with acute to subacute stroke, with a mean age of 63 years, receiving constraint-induced movement therapy (CIMT). Assessments were performed at three points in time: baseline, immediately post-treatment, and 2.5 months post-treatment. Effect sizes and responsiveness ratios were calculated. ARAT total and subscale scores at the first follow-up evaluation were similar, with moderate to large effect sizes (ARAT total score = 1.01; grasp subscore = 1.04; pinch subscore = 0.85; grip subscore = 1.01; and gross movement subscore = 0.72). The second follow-up evaluation demonstrated large effect sizes, with higher individual values than at the first evaluation (ARAT total score = 1.39; grasp subscore = 1.22; pinch subscore = 1.49; grip subscore = 1.32; and gross movement subscore = 0.98). The responsiveness ratio for the ARAT total score was 5.2 at the first follow-up evaluation and 7.0 at the second. These two responsiveness estimates suggest that the ARAT is a sensitive tool for detecting change even months after stroke onset.
Note: The responsiveness ratio is a variant of effect size; higher values indicate better responsiveness.

Rabadi and Rabadi (2006) assessed the responsiveness of the ARAT and the Fugl-Meyer Assessment (Fugl-Meyer et al., 1975) in 104 participants with acute stroke, with a mean age of 72 years, undergoing inpatient rehabilitation. Participants were evaluated at admission and discharge from acute care. The Standardized Response Mean (SRM) was used to calculate responsiveness. Of these two upper extremity tests, the ARAT was less sensitive than the Fugl-Meyer Assessment (SRM = 0.68 and 0.74, respectively). However, since the difference between the SRMs for these two measures was minimal, the tests can be considered equally sensitive to change during inpatient acute rehabilitation. This result is contrary to the one presented by Van der Lee et al. (2002). The discrepancy may be due to differences in population age and stroke severity between the studies.
Note: The SRM is a variant of effect size; higher values indicate better responsiveness.
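
As a rough illustration of the two indices reported in this section, the Python sketch below computes an effect size and a Standardized Response Mean from paired scores. The SRM is the mean change divided by the standard deviation of the change scores. For the effect size, the sketch assumes the mean change divided by the standard deviation of baseline scores, which is one common convention and may not match the formula used in each study. Function names and all scores are hypothetical.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Paired change scores: follow-up minus baseline for each participant."""
    return [post - pre for pre, post in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores (assumed convention)."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical admission and discharge scores, invented for illustration.
admission = [10, 20, 30, 40, 50]
discharge = [14, 26, 33, 48, 54]

es = effect_size(admission, discharge)
srm = standardized_response_mean(admission, discharge)
print(f"Effect size: {es:.2f}, SRM: {srm:.2f}")
```

The two indices can differ considerably for the same data because they use different denominators, which is one reason values from different studies should be compared with caution.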

Lin, Chuang, Wu, Hsieh and Chang (2010) evaluated the responsiveness of the ARAT, the Box and Block Test (BBT) and the Nine-Hole Peg Test (NHPT) for evaluating hand dexterity in 59 patients with subacute stroke (< 6 months) and Brunnstrom stage IV to VI for proximal and distal upper extremity function. Patients were randomly assigned to receive constraint-induced therapy, bilateral arm training or control treatment, and received 2 hours of therapy, 5 days per week for 3 weeks. Assessments were performed at baseline and at 3 weeks. Using the Standardized Response Mean (SRM) to calculate responsiveness, the ARAT, BBT and NHPT were all found to have a moderate SRM (0.79, 0.74 and 0.64, respectively), indicating sensitivity for detecting change in hand dexterity. Considering the results of both the responsiveness and validation components of the study, the ARAT and BBT are believed to be more appropriate than the NHPT for evaluating dexterity.

References

Adams, R.J., Meador, K.J., Sethi, K.D., Grotta, J.C., & Thomson, D.S. (1986). Graded neurologic scale for the use in acute hemispheric stroke treatment protocols. Stroke, 18, 665-669.
Ashworth, B. (1964). Preliminary trial of carisoprodol in multiple sclerosis. Practitioner, 192, 540-542.
Brott, T.G., Adams, H.P., Olinger, C.P., Marler, J.R., Barsan, W.G., Biller, J., Spilker, J., Holleran, R., Eberle, R., Hertzberg, V., Rorick, M., Moomaw, C.J., & Walker, M. (1989). Measurements of acute cerebral infarction: a clinical examination scale. Stroke, 20, 864-870.
Carroll, D. (1965). A quantitative test of upper extremity function. Journal of Chronic Diseases, 18, 479-491.
Carr, J.H., Shepherd, R.B., Nordholm, L., & Lynne, D. (1985). Investigation of a new motor assessment scale for stroke patients. Physical Therapy, 65, 175-180.
Collin, C., Wade, D.T., Davies, S., & Horne, V. (1988). The Barthel ADL Index: a reliability study. International Disability Studies, 10, 61-63.
Cromwell, F.S. (1965). Occupational therapists manual for basic skills assessment: primary prevocational evaluation. Pasadena, CA: Fair Oaks Printing; 29-31.
Demeurisse, G., Demol, O., & Robaye, E. (1980). Motor evaluation in vascular hemiplegia. European Neurology, 19(6), 382-389.
De Weerdt, W.J.G., & Harrison, M.A. (1985). Measuring recovery of arm hand function in stroke patients: a comparison of the Brunnstrom-Fugl-Meyer test and the Action Research Arm Test. Physiotherapy Canada, 37, 65-70.
Finch, E., Brooks, D., Stratford, P.W., & Mayo, N.E. (2002). Physical Outcome Measures: A guide to enhance physical outcome measures. Ontario, Canada: Lippincott, Williams, & Wilkins.
Fugl-Meyer, A.R., Jääskö, L., Leyman, I., Olsson, S., & Steglind, S. (1975). The post-stroke hemiplegic patient 1. A method for evaluation of physical performance. Scandinavian Journal of Rehabilitation Medicine, 7, 13-31.
Gowland, C., Van-Hullenaar, S., Torresin, W., et al. (1995). Chedoke-McMaster Stroke Assessment: development, validation, and administration manual. Hamilton, ON, Canada: School of Rehabilitation Science, McMaster University.
Heller, A., Wade, D.T., Wood, V.A., Sunderland, A., Hewer, R., & Ward, E. (1987). Arm function after stroke: measurement and recovery over the first three months. Journal of Neurology, Neurosurgery & Psychiatry, 50(6), 714-719.
Hsieh, C.L., Hsueh, I.P., Chiang, F., & Lin, P. (1998). Inter-rater reliability and validity of the action research arm test in stroke patients. Age and Ageing, 27, 107-113.
Hsueh, I.P., Lee, M.M., & Hsieh, C.L. (2002a). The action research arm test: Is it necessary for patients being tested to sit at a standardized table? Clinical Rehabilitation, 16, 382-388.
Hsueh, I.P. & Hsieh, C.L. (2002b). Responsiveness of two upper extremity function instruments for stroke inpatients receiving rehabilitation. Clinical Rehabilitation, 16, 617-624.
Jacobson-Sollerman, X & Sperling, Y. (1977). Grip function of the healthy hand in a standardized hand function test. A study of the Rancho Los Amigos test. Scandinavian Journal of Rehabilitation Medicine, 9(3), 123-129.
Keith, R.A., Granger, C.V., Hamilton, B.B., & Sherwin, F.S. (1987). The Functional Independence Measure: a new tool for rehabilitation. In: Eisenberg, M.G. & Grzesiak, R.C. (Eds.), Advances in clinical rehabilitation (pp. 6-18). New York: Springer Publishing Company.
Kellor, M., Frost, J., Silberberg, N., Iversen, I., & Cummings, R. (1971). Hand strength and dexterity. American Journal of Occupational Therapy, 25, 77-83.
Lang, C.E., Wagner, J.M., Dromerick, A.W., & Edwards, D.F. (2006). Measurement of upper extremity function early after stroke: properties of the action research arm test. Archives of Physical Medicine and Rehabilitation, 87, 1605-1610.
Lin, K-C., Chuang, L-L., Wu, C-Y., Hsieh, Y-W., & Chang, W-Y. (2010). Responsiveness and validity of three dexterous function measures in stroke rehabilitation. Journal of Rehabilitation Research and Development, 47(6), 563-572.
Lindmark, B. & Hamrin, E. (1988). Evaluation of function capacity after stroke as a basis for active intervention: Presentation of a modified chart for motor capacity assessment and its reliability. Scandinavian Journal of Rehabilitation Medicine, 20, 103-109.
Lyle, R.C. (1981). A performance test for assessment of upper limb function in physical rehabilitation treatment and research. International Journal of Rehabilitation Research, 4, 483-492.
Mathiowetz, V., Volland, G., Kashman, N., & Weber, K. (1985a). Adult norms for the box and block test of manual dexterity. American Journal of Occupational Therapy, 39, 386-391.
Mathiowetz, V., Weber, K., Kashman, N., & Volland, G. (1985b). Adult norms for the nine hole peg test of finger dexterity. Occupational Therapy Journal of Research, 5, 24-33.
Nijland, R., van Wegen, E., Verbunt, J., van Wijk, R., van Kordelaar, J., & Kwakkel, G. (2010). A comparison of two validated tests for upper limb function after stroke: The Wolf Motor Function Test and the Action Research Arm Test. Journal of Rehabilitation Medicine, 42, 694-696.
Platz, T., Pinkowski, C., van Wijck, F., Kim, I.H., di Bella, P., & Johnson, G. (2005). Reliability and validity of arm function assessment with standardized guidelines for the Fugl-Meyer Test, Action Research Arm Test and Box and Block Test: a multicentre study. Clinical Rehabilitation, 19(4), 404-411.
Rabadi, M.H. & Rabadi, F.M. (2006). Comparison of the action research arm test and the Fugl-Meyer Assessment as measures of upper-extremity motor weakness after stroke. Archives of Physical Medicine and Rehabilitation, 87, 962-966.
Van der Lee, J.H., Beckerman, H., Lankhorst, G.J., & Bouter, L.M. (2001a). The responsiveness of the Action Research Arm Test and the Fugl-Meyer Assessment Scale in chronic stroke patients. Journal of Rehabilitation Medicine, 33, 110-113.
Van der Lee, J.H., Groot, V., Beckerman, H., Wagenaar, R.C., Lankhorst, G.J., & Bouter, L.M. (2001b). The intra-rater and interrater reliability of the action research arm test: a practical test of upper extremity function in patients with stroke. Archives of Physical Medicine and Rehabilitation, 82, 14-19.
Van der Lee, J.H., Roorda, L.D., & Lankhorst, G.J. (2002). Improving the Action Research Arm Test: a unidimensional hierarchical scale. Clinical Rehabilitation, 16, 646-653.
Yozbatiran, N., Der-Yerghiaian, L., & Cramer, S.C. (2008). A standardized approach to performing the action research arm test. Neurorehabilitation & Neural Repair, 22(1), 78-90.
Wagenaar, R.C., Meijer, O.G., van Wieringen, P.C., Kuik, D.J., Hazenberg, G.J., Lindeboom, J., et al. (1990). The functional recovery of stroke: a comparison between neuro-developmental treatment and the Brunnstrom method. Scandinavian Journal of Rehabilitation Medicine, 22, 1-8.

See the measure

How to obtain the Action Research Arm Test:

The ARAT is described in the studies by Lyle (1981), Hsieh et al. (1998), Van der Lee et al. (2002), Rabadi & Rabadi (2006), and Yozbatiran et al. (2008), and can be obtained from the website: http://www.aratest.eu/Index_english.htm. Standardized equipment can be purchased from http://www.aratest.eu/ or from http://www.saliarehab.com/.

Box and Block Test (BBT)

Evidence Reviewed as of before: 09-06-2011

Author(s)*: Sabrina Figueiredo, BSc

Editor(s): Lisa Zeltzer, MSc OT; Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Box and Block Test (BBT) measures unilateral gross manual dexterity. It is a quick, simple and inexpensive test. It can be used with a wide range of populations, including clients with stroke.

In-Depth Review

Purpose of the measure

The Box and Block Test (BBT) measures unilateral gross manual dexterity. It is a quick, simple and inexpensive test. It can be used with a wide range of populations, including clients with stroke.

Available versions

The original version of the BBT was developed in 1957 by Jean Hyres and Patricia Buhler. This version was modified into the current one by E. Fuchs and P. Buhler (Cromwell, 1976). In 1985, normative data for the BBT were established by Mathiowetz, Volland, Kashman, and Weber.

Features of the measure

Items:

The BBT is composed of a wooden box divided into two compartments by a partition, and 150 blocks. Administration consists of asking the client to move, one by one, the maximum number of blocks from one compartment of the box to the other, of equal size, within 60 seconds. The box should be oriented lengthwise and placed at the client’s midline, with the compartment holding the blocks oriented towards the hand being tested. In order to practice and register baseline scores, the test should begin with the unaffected upper limb. Additionally, a 15-second trial period is permitted before testing each side. Before the trial, after the standardized instructions are given, clients should be advised that their fingertips must cross the partition when transferring the blocks, and that they do not need to pick up blocks that fall outside of the box (Mathiowetz, Volland, Kashman, & Weber, 1985-1).

Scoring:

Clients are scored based on the number of blocks transferred from one compartment to the other in 60 seconds (Mathiowetz et al., 1985-1). Higher scores are indicative of better manual dexterity. During the performance of the BBT, the evaluator should check whether the client’s fingertips are crossing the partition. Blocks should be counted only when this condition is met. Furthermore, if two blocks are transferred at once, only one block is counted. Blocks that fall outside the box after crossing the partition should be counted, even if they do not make it into the other compartment.
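
The counting rules above can be summarized in a short Python sketch. The `Transfer` record and `bbt_score` function are hypothetical names introduced for illustration; they are not part of the BBT materials, and the sketch assumes the evaluator records one entry per transfer attempt within the 60-second period.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Transfer:
    """One transfer attempt as observed by the evaluator (hypothetical record)."""
    blocks_carried: int                # number of blocks moved in this attempt
    fingertips_crossed: bool           # did the fingertips cross the partition?
    landed_outside_box: bool = False   # recorded only; does not affect the score


def bbt_score(transfers: Iterable[Transfer]) -> int:
    """Count the blocks credited under the scoring rules described above."""
    score = 0
    for attempt in transfers:
        if attempt.blocks_carried < 1:
            continue  # nothing was moved
        if not attempt.fingertips_crossed:
            continue  # fingertips must cross the partition to count
        # Several blocks moved at once are credited as a single block.
        # A block that lands outside the box still counts once the
        # fingertips have crossed the partition.
        score += 1
    return score


attempts = [
    Transfer(blocks_carried=1, fingertips_crossed=True),
    Transfer(blocks_carried=2, fingertips_crossed=True),   # credited as one block
    Transfer(blocks_carried=1, fingertips_crossed=False),  # not credited
    Transfer(blocks_carried=1, fingertips_crossed=True, landed_outside_box=True),
]
total = bbt_score(attempts)
print(total)
```

With the four hypothetical attempts shown, three blocks are credited: the double transfer counts once and the attempt without a partition crossing is excluded.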

Mathiowetz et al. (1985-1) reported that healthy male adults, aged 20 to 80 years, transfer an average of 77 blocks (SD ±11.6) with the right hand and 75 blocks (SD ±11.4) with the left hand within the 60-second limit. Scores for healthy men aged 60 years or older ranged from 61 to 70 blocks. Healthy female adults, aged 20 to 80 years, transfer an average of 78 blocks (SD ±10.4) with the right hand and 76 blocks (SD ±9.5) with the left hand. Scores for healthy women aged 60 years or older ranged from 63 to 76 blocks. BBT scores and age are inversely correlated, meaning that average scores on the BBT decrease with older age.

Time:

The BBT requires 2 to 5 minutes to administer (Finch, Brooks, Stratford, & Mayo, 2002; Mathiowetz et al., 1985-1).

Subscales:

None.

Equipment:

The standardized equipment consists of:
A wooden box measuring 53.7 cm x 25.4 cm x 8.5 cm. The partition should be placed at the middle of the box, dividing it into two containers of 25.4 cm each (Mathiowetz et al., 1985-1).
150 wooden cubes, 2.5 cm in size (Mathiowetz et al., 1985-1).
Stopwatch.

Training of administrator:

None typically reported.

Alternative forms of the Box and Block Test

None.

Client suitability

Can be used with:

Clients with stroke.

Should not be used in:

The BBT cannot be used with clients who have severe upper extremity impairment.
The BBT cannot be used with clients with severe cognitive impairment.

In what languages is the measure available?

There are no official translations of the BBT. The specific instructions provided to the client are in English. Clinicians and researchers may be using “home-grown” translations of the instructions, as evidenced by peer-reviewed publications from Sweden, French Canada, Italy and Germany that have used the BBT as an outcome measure (Broeren, Rydmark, Bjorkdahl, & Sunnerhagen, 2007; Dannenbaun, Michalsen, Desrosiers, & Levin, 2002; Mercier & Bourbonnais, 2004; Platz, Pinkowski, Kim, di Bella, & Johnson, 2005; Schneider, Schonle, Altenmuller, & Munte, 2007).

Summary

What does the tool measure?	Unilateral gross manual dexterity.
What types of clients can the tool be used for?	The BBT can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	From 2 to 5 minutes.
Versions	There are no alternative versions.
Other Languages	There are no official translations.
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the BBT.
Test-retest: Two studies have examined the test-retest reliability of the BBT. Both reported excellent test-retest reliability using ICCs.
Inter-rater: Two studies have examined the inter-rater reliability of the BBT and reported excellent inter-rater reliability using correlation coefficients and ICC. One study used Pearson correlation and the other, ICC and Spearman rho correlation.
Validity	Criterion: Concurrent: One study has examined the concurrent validity of the BBT and reported adequate to excellent correlations with the Action Research Arm Test (ARAT) and the Nine-Hole Peg Test (NHPT) at pre- and post-treatment.
Predictive: One study has examined predictive validity and reported that the BBT, compared to the NHPT, the Frenchay Arm Test, Grip Strength and the Stroke Rehabilitation Assessment of Movement (STREAM), was the best predictor of upper limb function 5 weeks post-stroke.
Construct: Convergent: Three studies have examined convergent validity of the BBT and reported excellent correlations between the BBT and the Minnesota Rate of Manipulation Test, the ARAT, the Hemispheric Stroke Scale and the motor function score of the Fugl-Meyer Assessment (FMA). Adequate correlations were reported between the BBT and the SMAF, the Ashworth scale and the Passive Joint Motion/Joint Pain subscore of the FMA. Poor correlations were reported between the BBT and the Sensation subscore of the FMA and the Modified Barthel Index.
Floor/Ceiling Effects	No studies have examined floor/ceiling effects of the BBT.
Sensitivity / Specificity	No studies have examined sensitivity/specificity of the BBT.
Does the tool detect change in patients?	Two studies have examined the responsiveness of the BBT and reported moderate to large Standardized Response Means (SRM); the BBT is therefore able to detect change in clients with stroke.
Acceptability	The BBT should not be used with clients with severe upper extremity impairment and severe cognitive impairments.
Feasibility	The administration of the BBT is quick and simple; however, it requires standardized equipment.
How to obtain the tool?	The BBT instructions can be obtained in the study by Mathiowetz et al. (1985). Standardized equipment can be obtained at the website: http://www.sammonspreston.com/Supply/Product.asp?Leaf_Id=7531

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Box and Block Test (BBT) in healthy individuals and individuals with stroke. We identified four studies. The BBT appears to be responsive in clients with stroke.

Floor/Ceiling Effects

No studies have examined floor/ceiling effects of the BBT.

Reliability

Test-retest:
Desrosiers, Bravo, Hébert, Dutil, and Mercier (1994) examined the test-retest reliability of the BBT in 34 elderly individuals with upper limb sensorimotor impairments from stroke (n=13) and other conditions. Participants were re-assessed after a 1-week interval by the same rater and under the same conditions. The test-retest reliability of the BBT was reported as excellent (ICC = 0.97; ICC = 0.96) for the right and left hand, respectively.

Inter-rater:
Mathiowetz, Volland, Kashman, and Weber (1985-1) assessed the inter-rater reliability of the BBT in 26 healthy young females. Participants were evaluated simultaneously and independently by two raters. Pearson correlation coefficients showed excellent agreement (r = 1.00; r = 0.99) for the right and left hand, respectively.
Note: The Pearson correlation coefficient is not the statistical analysis of choice for assessing inter-rater reliability, as it may artificially inflate agreement.
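
The point made in the note can be illustrated with a minimal sketch. It assumes hypothetical scores (not data from any cited study) in which the second rater counts 10 more blocks than the first for every participant. The ICC form used here, a two-way random-effects, absolute-agreement, single-rater ICC, is an assumption chosen for illustration and is not necessarily the form used in the studies reviewed.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_absolute_agreement(x, y):
    """Two-way random-effects, absolute-agreement, single-rater ICC for two raters."""
    n, k = len(x), 2
    rows = list(zip(x, y))                      # one row per participant
    grand = mean(list(x) + list(y))
    row_means = [mean(r) for r in rows]
    col_means = [mean(x), mean(y)]              # one column per rater
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical BBT scores (blocks per minute): rater B is always 10 blocks higher.
rater_a = [40, 45, 50, 55, 60]
rater_b = [score + 10 for score in rater_a]

print(round(pearson_r(rater_a, rater_b), 3))               # 1.0
print(round(icc_absolute_agreement(rater_a, rater_b), 3))  # 0.556
```

In this made-up example the two raters never agree on a score, yet the Pearson coefficient is perfect because it only reflects how consistently the scores rise together. The absolute-agreement ICC penalizes the systematic 10-block difference.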

Platz et al. (2005) also analyzed the inter-rater reliability of the BBT, the Action Research Arm Test (Lyle, 1981), and the FMA upper extremity items, including items from the motor function, sensation and passive joint motion/joint pain sub-scores (Fugl-Meyer et al., 1975), in 44 individuals with upper limb paresis from stroke, multiple sclerosis, or traumatic brain injury. Participants had the most affected arm videotaped and scored independently by two raters. Inter-rater reliability for the BBT, as calculated using the ICC and Spearman rho correlation, was excellent (ICC = 0.99 and r = 0.99).
Note: This result applies only to the most affected upper limb.

Validity

Content:

Not available.

Criterion:

Concurrent:
No gold standard exists against which to compare the BBT.

Lin, Chuang, Wu, Hsieh and Chang (2010) compared the concurrent validity of the BBT, Action Research Arm Test (ARAT) and Nine-Hole Peg Test (NHPT) for evaluating hand dexterity in 59 patients with stroke. The Fugl-Meyer Assessment (FMA), Motor Activity Log (MAL) and Stroke Impact Scale (SIS) were also administered to assess the concurrent validity of the BBT, ARAT and NHPT. Using Spearman rank correlation coefficients, the BBT, ARAT and NHPT were found to have adequate to excellent correlations at pre-treatment (ranging from rho=-0.55 to -0.80) and post-treatment (ranging from rho=-0.57 to -0.71). In addition, the BBT and ARAT were found to have adequate correlations with the FMA, MAL and SIS (ranging from rho=0.31 to 0.59); however, the NHPT had only poor to adequate correlations with the FMA and MAL (ranging from rho=-0.16 to -0.33) and adequate to excellent correlations with the SIS (ranging from rho=-0.58 to -0.66). When considering both the responsiveness and validation components of the study, the BBT and ARAT are believed to be more appropriate than the NHPT for evaluating dexterity.

Predictive:
Higgins, Mayo, Desrosiers, Salbach and Ahmed (2005) estimated whether the BBT, Nine-Hole Peg Test (Kellor, Frost, Silberberg, Iversen, & Cummings, 1971; Mathiowetz, Weber, Kashman, & Volland, 1985-2), Frenchay Arm Test (Heller, Wade, Wood, Sunderland, Hewer, & Ward, 1987), Grip Strength (Mathiowetz, Kashman, Volland, Weber, Dowe, & Rogers, 1985-3), and Stroke Rehabilitation Assessment of Movement (STREAM – Daley, Mayo, Wood-Dauphinee, Danys, & Cabot, 1997) were able to predict upper limb function, measured by the BBT, at 5 weeks post-stroke. Predictive validity of the BBT was measured in 55 participants with acute stroke. Assessments were performed at two points in time: one and five weeks post-stroke. Compared to the other upper limb performance tests, the BBT, when performed at one week post-stroke, was the best predictor of upper limb function at five weeks post-stroke, followed by the STREAM.

Construct:

Convergent/Discriminant:
Cromwell (1976) examined the convergent validity of the BBT by comparing it to the Minnesota Rate of Manipulation Test (American Guidance Service, 1969) in an unspecified population. The correlation between the BBT and the Minnesota Rate of Manipulation Test was excellent (r = 0.91).

Desrosiers et al. (1994) assessed the convergent validity of the BBT by comparing it to the Functional Autonomy Measurement System – FAMS, known as the SMAF in French (Hébert, Carrier, & Bilodeau, 1988), and to the Action Research Arm Test (ARAT – Lyle, 1981) in 104 elderly individuals with upper limb impairments secondary to stroke (n=53) amongst other conditions. Excellent correlations (r = 0.80) were found between the BBT and the ARAT. Adequate Pearson correlations were found between the BBT and the FAMS (r = 0.47; r = 0.51) for the right and left hand, respectively.

Platz et al. (2005) tested the convergent validity of the BBT by comparing it to the Action Research Arm Test (ARAT – Lyle, 1981) and to the Fugl-Meyer Assessment (FMA) upper extremity items, including items from the motor function, sensation and passive joint motion/joint pain sub-scores (Fugl-Meyer et al., 1975), using Spearman correlation in 56 participants with upper extremity paresis either from stroke (n=37) or other conditions. Excellent correlations were found between the BBT and the ARAT (r = 0.95) and the Motor Function sub-score (r = 0.92) of the FMA. Furthermore, the BBT was correlated with more general measures of impairment and activity limitation, such as the Ashworth Scale (Ashworth, 1964), the Hemispheric Stroke Scale (Adams, Meador, Sethi, Grotta, & Thomson, 1986) and the Modified Barthel Index (Collin, Wade, Davies, & Horne, 1988). Excellent correlation was found between the BBT and the Hemispheric Stroke Scale (r = -0.67). Adequate correlations were found between the BBT and the passive joint motion/joint pain sub-score of the FMA (r = 0.43) and the Ashworth Scale (r = -0.38). Poor correlations were found between the BBT and the sensation sub-score of the FMA (r = 0.28) and the Modified Barthel Index (r = 0.04).
Note: Negative correlations are observed because a high score on the BBT indicates better performance, whereas a low score on the Hemispheric Stroke Scale or the Ashworth Scale indicates better performance.

Responsiveness

Higgins et al. (2005) evaluated the responsiveness of the BBT, Frenchay Arm Test (Heller et al., 1987), Grip Strength (Mathiowetz et al., 1985-3) and the Stroke Rehabilitation Assessment of Movement (STREAM – Daley et al., 1997) in 50 participants with acute stroke. Participants were assessed one and four weeks post-stroke. The Standardized Response Mean (SRM) was used to calculate responsiveness. Amongst these upper extremity performance tests, the BBT was the most sensitive to detecting change, having a large SRM of 0.8.
Note: The SRM is a variant of effect size, calculated by dividing the mean change by the standard deviation of the change scores; higher values indicate better responsiveness.
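
As a minimal sketch of the SRM calculation (mean change divided by the standard deviation of the change scores), the following uses hypothetical BBT scores rather than data from any cited study. The function name is illustrative, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM = mean of the change scores / standard deviation of the change scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    # stdev() is the sample standard deviation (n - 1 denominator); an assumption here.
    return mean(changes) / stdev(changes)


# Hypothetical BBT scores (blocks per minute) for five participants.
week_1 = [20, 25, 30, 35, 40]
week_5 = [24, 33, 32, 45, 46]

srm = standardized_response_mean(week_1, week_5)
print(round(srm, 2))
```

Because the SRM divides by the variability of the change scores, the same average improvement yields a larger SRM when participants improve by similar amounts than when improvement varies widely.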

Lin, Chuang, Wu, Hsieh and Chang (2010) evaluated the responsiveness of the BBT, the Action Research Arm Test (ARAT) and the Nine-Hole Peg Test (NHPT) for evaluating hand dexterity in 59 patients with subacute stroke (< 6 months) and Brunnstrom stage IV to VI for proximal and distal upper extremity function. Patients were randomly assigned to receive constraint-induced therapy, bilateral arm training or control treatment and received 2 hours of therapy, 5 days per week for 3 weeks. Assessments were performed at baseline and 3 weeks. Using the Standardized Response Mean (SRM) to calculate responsiveness, the BBT, ARAT and NHPT were all found to have moderate SRM (0.74, 0.64, 0.79 respectively), indicating sensitivity for detecting change in hand dexterity. When considering both the responsiveness and validation components of the study, the BBT and ARAT are believed to be more appropriate than the NHPT for evaluating dexterity.

References

American Guidance Service. (1969). The Minnesota Rate Manipulative Tests. Examiner’s manual. Circle Pines, MN: Author.
Adams, R.J., Meador, K.J., Sethi, K.D., Grotta, J.C., & Thomson, D.S. (1986). Graded neurologic scale for the use in acute hemispheric stroke treatment protocols. Stroke, 18, 665-669.
Ashworth, B. (1964). Preliminary trial of carisoprodol in multiple sclerosis. Practitioner, 192, 540-542.
Broeren, J., Rydmark, M., Bjorkdahl, A., & Sunnerhagen, K.S. (2007). Assessment and training in a 3-dimensional virtual environment with haptics: a report on 5 cases of motor rehabilitation in the chronic stage after stroke. Neurorehabilitation & Neural Repair, 21(2), 180-189.
Collin, C., Wade, D.T., Davies, S., & Horne, V. (1988). The Barthel ADL Index: a reliability study. International Disability Study, 10, 61-63.
Cromwell, F.S. (1965). Occupational therapists manual for basic skills assessment: primary prevocational evaluation. Pasadena, CA: Fair Oaks Printing; 29-31.
Daley, K., Mayo, N.E., Wood-Dauphinee, S., Danys, I., & Cabot, R. (1997). Verification of the Stroke Rehabilitation Assessment of Movement (STREAM). Physiotherapy Canada, 49, 269-278.
Dannenbaum, R.M., Michaelsen, S.M., Desrosiers, J., & Levin, M.F. (2002). Development and validation of two new sensory tests of the hand for patients with stroke. Clinical Rehabilitation, 16(6), 630-639.
Desrosiers, J., Bravo, G., Hébert, R., Dutil, É., & Mercier, L. (1994). Validation of the box and block test as a measure of dexterity of elderly people: reliability, validity and norms studies. Archives of Physical Medicine and Rehabilitation, 75, 751-755.
Desrosiers, J., Rochette, A., Hébert, R., & Bravo, G. (1997). The Minnesota manual dexterity test: reliability, validity and reference values studies with healthy elderly people. Canadian Journal of Occupational Therapy, 64(5), 270-276.
Finch, E., Brooks, D., Stratford, P.W., & Mayo, N.E. (2002). Physical Outcome Measures: A guide to enhance physical outcome measures. Ontario, Canada: Lippincott, Williams & Wilkins.
Fugl-Meyer, A.R., Jääskö, L., Leyman, I., Olsson, S., & Steglind, S. (1975). The post-stroke hemiplegic patient 1. A method for evaluation of physical performance. Scandinavian Journal of Rehabilitation Medicine, 7, 13-31.
Hébert, R., Carrier, R., & Bilodeau, A. (1988). The functional autonomy measurement system (SMAF): description and validation of an instrument for the measurement of handicaps. Age Ageing, 17, 293-302.
Heller, A., Wade, D.T., Wood, V.A., Sunderland, A., Hewer, R., & Ward, E. (1987). Arm function after stroke: measurement and recovery over the first three months. Journal of Neurology, Neurosurgery & Psychiatry, 50(6), 714-719.
Higgins, J., Mayo, N.E., Desrosiers, J., Salbach, N.M., & Ahmed, S. (2005). Upper-limb function and recovery in the acute phase poststroke. Journal of Rehabilitation Research & Development, 42(1), 65-76.
Jebsen, R.H., Taylor, N., Trieschmann, R.B., Trotter, M.J., & Howard, L.A. (1969). An objective and standardized test of hand function. Archives of Physical Medicine and Rehabilitation, 50, 311-319.
Kellor, M., Frost, J., Silberberg, N., Iversen, I., & Cummings R. (1971). Hand strength and dexterity. American Journal of Occupational Therapy, 25, 77-83.
Lin, K-C., Chuang, L-L., Wu, C-Y., Hsieh, Y-W., & Chang, W-Y. (2010). Responsiveness and validity of three dexterous function measures in stroke rehabilitation. Journal of Rehabilitation Research and Development, 47(6), 563-572.
Lyle, R.C. (1981). A performance test for assessment of upper limb function in physical rehabilitation treatment and research. International Journal of Rehabilitation and Research, 4, 483-492.
Mathiowetz, V., Volland, G., Kashman, N., & Weber, K. (1985-1). Adult norms for the box and block test of manual dexterity. American Journal of Occupational Therapy, 39, 386-391.
Mathiowetz, V., Weber, K., Kashman, N., & Volland, G. (1985-2). Adult norms for the nine hole peg test of finger dexterity. Occupational Therapy Journal of Research, 5, 24-33.
Mathiowetz, V., Kashman, N., Volland, G., Weber, K., Dowe, M., & Rogers, S. (1985-3). Grip and pinch strength: normative data for adults. Archives of Physical Medicine and Rehabilitation, 66, 69-72.
Mercier, C. & Bourbonnais, D. (2004). Relative shoulder flexor and handgrip strength is related to upper limb function after stroke. Clinical Rehabilitation, 18(2), 215-221.
Platz, T., Pinkowski, C., van Wijck, F., Kim, I.H., di Bella, P., & Johnson, G. (2005). Reliability and validity of arm function assessment with standardized guidelines for the Fugl-Meyer Test, Action Research Arm Test and Box and Block Test: a multicentre study. Clinical Rehabilitation, 19(4), 404-411.
Schneider, S., Schonle, P.W., Altenmuller, E., & Munte, T.F. (2007). Using musical instruments to improve motor skill recovery following a stroke. Journal of Neurology, 254(10), 1339-1346.
Tiffin, J. (1968). Purdue Pegboard Examiner Manual. Chicago, USA: Science Research Associates.

See the measure

How to obtain the BBT

The BBT instructions can be obtained in the study by Mathiowetz et al. (1985).

Standardized equipment can be obtained at the website:
http://www.sammonspreston.com/Supply/Product.asp?Leaf_Id=7531

By clicking here, you can access a video showing how to administer the assessment.

Chedoke Arm and Hand Activity Inventory (CAHAI)

Evidence Reviewed as of before: 08-01-2009

Author(s)*: Sabrina Figueiredo, BSc

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Expert Reviewer: Susan Barreca, MSc, PT

Purpose

The Chedoke Arm and Hand Activity Inventory (CAHAI) is a functional assessment of the recovering arm and hand after stroke. The CAHAI complements the Chedoke-McMaster Stroke Assessment (Barreca, Stratford, Masters, Lambert, Griffiths, and McBay, 2006).

In-Depth Review

Purpose of the measure

The Chedoke Arm and Hand Activity Inventory (CAHAI) is a functional assessment of the recovering arm and hand after stroke. The CAHAI complements the Chedoke-McMaster Stroke Assessment (Barreca, Stratford, Masters, Lambert, Griffiths, and McBay, 2006).

Available versions

The CAHAI was developed by Barreca, Gowland, Stratford, Huijbregts, Griffiths, Torresin, Dunkley, Miller, and Masters in 2004 to address the need for a valid, clinically relevant, and responsive functional assessment of the recovering paretic upper limb.

Three shortened versions of the CAHAI were developed by Barreca, Stratford, Masters, Lambert, Griffiths, and McBay in 2006. The shortened versions have 7, 8 or 9 items and are identified as CAHAI-7, CAHAI-8, CAHAI-9, respectively.

Features of the measure

Items:

The original CAHAI consists of 13 functional items that are non-gender specific, involve both upper limbs, and incorporate a range of movements and grasps that reflect stages of motor recovery following stroke. The following items were generated from a review of the scientific literature on stroke, as well as from input from individuals with stroke and their families (Barreca et al., 2004):

Open a jar of coffee
Dial 911
Draw a line with a ruler
Pour a glass of water
Wring out a washcloth
Do up five buttons
Dry back with a towel
Put toothpaste on a toothbrush
Cut medium consistency putty
Clean eye glasses
Zip up a zipper
Place a container on a table
Carry a bag up the stairs

The CAHAI-7 utilizes the first 7 items, CAHAI-8 the first 8 items, and CAHAI-9 the first 9 items. The 13 items together represent the original CAHAI (Barreca et al., 2006). On average, clients with stroke consider items 1, 2, 4 and 12 easy to perform; items 8, 10, 11, and 13 moderately difficult; and items 3, 6, 7, and 9 the most difficult (Barreca et al., 2004).

Detailed administration guidelines are in the development manual, which can be obtained by visiting the official website: http://www.cahai.ca

Scoring:

Each item of the CAHAI is scored on a 7-point quantitative scale, similar to the scale used in the Functional Independence Measure (FIM) (Keith, Granger, Hamilton, & Sherwin, 1987).

A score of

1 = client needs total assistance and the weak upper limb performs less than 25% of the task;
2 = client needs maximal assistance and the weak upper limb performs 25% to 49% of the task. There are no signs of arm or hand manipulation, only stabilization;
3 = client needs moderate assistance and the weak upper limb performs 50% to 74% of the task. Begins to show signs of arm or hand manipulation;
4 = client needs minimal assistance (light touch) and the weak upper limb performs more than 75% of the task;
5 = client requires supervision, coaxing, or cueing;
6 = client requires use of assistive devices or requires more than reasonable time, or there are safety concerns; and
7 = total independence in completing the task.

The minimum possible score for the CAHAI is 13 and the maximum is 91, with higher scores indicating greater functional independence (Barreca et al., 2004; Barreca, Stratford, Lambert, Masters, & Streiner, 2005; Barreca, Stratford, Masters, Lambert, & Griffiths, 2006b).
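The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the CAHAI materials: the function and variable names are hypothetical, and the score ranges for the shortened versions are simply derived from the 1 to 7 item scale rather than quoted from the source.

```python
from typing import Dict, Sequence, Tuple

# Number of items in each version, as listed in this review.
VERSION_ITEMS: Dict[str, int] = {
    "CAHAI-7": 7,
    "CAHAI-8": 8,
    "CAHAI-9": 9,
    "CAHAI-13": 13,
}

def score_range(version: str) -> Tuple[int, int]:
    """Return (minimum, maximum) total score: every item scored 1, or every item scored 7."""
    n_items = VERSION_ITEMS[version]
    return n_items * 1, n_items * 7

def total_score(item_scores: Sequence[int], version: str = "CAHAI-13") -> int:
    """Sum the item scores after checking the item count and the 1-7 range."""
    n_items = VERSION_ITEMS[version]
    if len(item_scores) != n_items:
        raise ValueError(f"{version} expects {n_items} item scores, got {len(item_scores)}")
    for score in item_scores:
        if not 1 <= score <= 7:
            raise ValueError(f"item score {score} is outside the 1-7 scale")
    return sum(item_scores)

if __name__ == "__main__":
    print(score_range("CAHAI-13"))  # (13, 91), matching the range stated above
    print(total_score([7, 6, 5, 7, 4, 3, 6], version="CAHAI-7"))
```

Higher totals indicate greater functional independence, so a total is only comparable with other totals from the same version.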

The affected limb is also scored according to its positioning and functioning during test performance. The therapist should record the performance of the affected limb on each item by checking the appropriate box. The scoring table for the CAHAI is as follows (Barreca et al., 2004):

Items	Affected Limb
1) Open a jar of coffee	Holds jar	Holds lid
2) Call 911	Holds receiver	Dials phone
3) Draw a line with ruler	Holds ruler	Holds pen
4) Put toothpaste on toothbrush	Holds toothpaste	Holds brush
5) Cut medium consistency putty	Holds knife	Holds fork
6) Pour a glass of water	Holds glass	Holds pitcher
7) Clean a pair of eyeglasses	Holds glasses	Wipes lenses
8) Zip up the zipper	Holds zipper	Holds zipper pull
9) Dry back with towel	Reaches for towel	Grasps towel end

Note: Standardized instructions on scoring can be obtained by visiting the official website: http://www.cahai.ca

Time:

The time to administer and score the CAHAI is approximately 25 minutes (Barreca et al., 2004; Barreca et al., 2006).

Subscales:

None

Equipment required:

CAHAI-7

Version (Items 1-7) requires all items in Equipment List A

Equipment List A

height adjustable table
chair/wheelchair without armrests
dycem
200g jar of coffee
push-button telephone
12″/30cm ruler
8.5″ x 11″ paper
pencil
2.3L plastic pitcher with lid filled with 1600 ml water
250 ml plastic cup
wash cloth
wash basin (24.5 cm. in diameter, height 8 cm.)
Pull-on vest with 5 buttons (one side male & one side female), buttons (1.5 cm. in diameter, 7 cm. apart)
bath towel (65cm X 100cm)

CAHAI-8

Version (Items 1-8) requires all items in Equipment List A and B

Equipment List B

75ml toothpaste with screw lid, >50% full
toothbrush

CAHAI-9

Version (Items 1-9) requires all items in Equipment List A, B, and C

Equipment List C

dinner plate (Melamine or heavy plastic, 25 cm. in diameter)
medium resistance putty
knife and fork
built up handles the length of the utensil handle

CAHAI-13

Version (Items 1-13) requires all items in Equipment List A, B, C, and D

Equipment List D

27″/67cm metal zipper in polar fleece poncho
eyeglasses
handkerchief
Rubbermaid 38L container (50 x 37 x 27cm)
4 standard size steps with rail
plastic grocery bag holding 4lb/2kg weight
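Because each version adds one equipment list to the lists required by the shorter versions, the full kit for a given version can be assembled cumulatively. The sketch below is only an illustration of that bookkeeping; the dictionary and function names are hypothetical, and the item descriptions are abbreviated from the lists above.

```python
from typing import Dict, List

# Equipment lists A-D, abbreviated from the lists in this section.
EQUIPMENT_LISTS: Dict[str, List[str]] = {
    "A": [
        "height adjustable table", "chair/wheelchair without armrests", "dycem",
        "200g jar of coffee", "push-button telephone", "12in/30cm ruler",
        "8.5in x 11in paper", "pencil", "2.3L plastic pitcher with lid",
        "250 ml plastic cup", "wash cloth", "wash basin",
        "pull-on vest with 5 buttons", "bath towel",
    ],
    "B": ["75ml toothpaste with screw lid", "toothbrush"],
    "C": ["dinner plate", "medium resistance putty", "knife and fork", "built up handles"],
    "D": [
        "metal zipper in polar fleece poncho", "eyeglasses", "handkerchief",
        "38L container", "4 standard size steps with rail", "plastic grocery bag with weight",
    ],
}

# Which lists each version requires, as stated above.
VERSION_LISTS: Dict[str, str] = {
    "CAHAI-7": "A",
    "CAHAI-8": "AB",
    "CAHAI-9": "ABC",
    "CAHAI-13": "ABCD",
}

def equipment_for(version: str) -> List[str]:
    """Return every item needed for the given version, in list order A to D."""
    items: List[str] = []
    for list_name in VERSION_LISTS[version]:
        items.extend(EQUIPMENT_LISTS[list_name])
    return items

if __name__ == "__main__":
    for name in VERSION_LISTS:
        print(name, len(equipment_for(name)), "items")
```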

Training:

Training may be provided by the authors as a half-day workshop. There is a training DVD available in English for a cost of $29.00 Canadian including shipping. Only cheques or money orders are accepted.

Alternative forms of the CAHAI

CAHAI-7, CAHAI-8, CAHAI-9

Client suitability

Can be used with:

Clients with stroke.

Should not be used in:

To date, there is no information on restrictions of using the CAHAI.

In what languages is the measure available?

English, French, German, Hebrew, Italian

Summary

What does the tool measure?	The CAHAI assesses upper limb functional recovery.
What types of clients can the tool be used for?	The CAHAI can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	An average of 20 to 25 minutes
Versions	CAHAI, CAHAI-9, CAHAI-8, CAHAI-7.
Other Languages	English, French, German, Hebrew and Italian.
Measurement Properties
Reliability	Internal consistency: Two studies have examined the internal consistency of the CAHAI and its shortened versions and reported excellent internal consistency using Cronbach’s alpha. Test-retest: One study examined the test-retest reliability of the CAHAI and reported excellent test-retest reliability using the Intraclass Correlation Coefficient (ICC). Intra-rater: No studies have examined the intra-rater reliability of the CAHAI. Inter-rater: One study examined the inter-rater reliability of the CAHAI and reported excellent inter-rater reliability using ICC.
Validity	Content: One study examined the content validity of the CAHAI and reported that items were generated from a review of scientific literature and from input from clients with stroke, their family and caregivers. Items with poor frequency endorsement, items that were difficult to standardize, and items with high inter-item correlation were eliminated. Criterion: Concurrent: One study examined the concurrent validity of the CAHAI and the CAHAI-9 and reported that the CAHAI-9 was not able to predict individual scores and individual change scores of the CAHAI, using regression analysis. Predictive: No studies have examined the predictive validity of the CAHAI. Construct: Convergent: Three studies examined convergent validity of the CAHAI and reported excellent correlations between all versions of the CAHAI and the Action Research Arm Test, and all versions of the CAHAI and the Chedoke-McMaster Stroke Assessment (CMSA), and poor to moderate correlations between the CAHAI and the CMSA shoulder pain score, using Pearson correlation. Known Groups: Three studies examined longitudinal/known groups validity of all versions of the CAHAI and reported that all versions are able to distinguish changes between subjects with acute and chronic stroke, and mild from severe impairments, using Receiver Operating Characteristic (ROC) curves.
Floor/Ceiling Effects	No studies have examined the floor/ceiling effects of the CAHAI.
Sensitivity / Specificity	No studies have examined the sensitivity/specificity of the CAHAI.
Does the tool detect change in patients?	One study examined the responsiveness of the CAHAI and reported that the minimal detectable change between two evaluations in stable patients was 6.3 points.
Acceptability	The CAHAI is highly accepted by clients with stroke since it is made up of real-life and non-gender specific items.
Feasibility	The administration of the CAHAI is easy and quick to perform.
How to obtain the tool?	The CAHAI can be obtained free of charge by visiting the official website: http://www.cahai.ca

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Chedoke Arm and Hand Activity Inventory (CAHAI) in individuals with stroke. We identified four studies. The CAHAI appears to be responsive in clients with stroke.

Floor/Ceiling Effects

No studies have examined floor/ceiling effects of the CAHAI.

Reliability

Internal Consistency:
Barreca, Gowland, Stratford, Huijbregts, Griffiths, Torresin, Dunkley, Miller, and Masters (2004) assessed the internal consistency of the CAHAI in 100 clients with stroke. Internal consistency of the CAHAI, as calculated using Cronbach’s Coefficient Alpha, was excellent (α = 0.98).

Barreca, Stratford, Masters, Lambert, Griffiths, and McBay (2006) examined the internal consistency of the CAHAI-7, CAHAI-8, and CAHAI-9 in 39 clients with stroke. Internal consistency of all shortened versions of the CAHAI, as calculated using Cronbach’s Coefficient Alpha, was excellent (α = 0.97; α = 0.98; α = 0.98, respectively).
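For readers unfamiliar with the statistic, Cronbach's alpha can be computed from a clients-by-items score matrix as α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below uses only the Python standard library; the function name and the example scores are hypothetical and are not data from the studies cited above.

```python
from statistics import variance
from typing import Sequence

def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per client and one column per item."""
    if len(scores) < 2:
        raise ValueError("need scores from at least two clients")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Transpose rows (clients) into columns (items) to get per-item variances.
    item_columns = list(zip(*scores))
    item_variance_sum = sum(variance(column) for column in item_columns)
    # Variance of each client's total score across all items.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

if __name__ == "__main__":
    # Hypothetical 1-7 item scores for four clients on three items (illustration only).
    example = [
        [1, 2, 1],
        [3, 3, 4],
        [5, 5, 5],
        [7, 6, 7],
    ]
    print(round(cronbach_alpha(example), 3))
```

Values close to 1 indicate that the items vary together across clients, which is the pattern the studies above report for the CAHAI and its shortened versions.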

Test-retest:
Barreca et al. (2006) examined the test-retest reliability of the shortened versions of the CAHAI in 39 clients with stroke. Participants were stratified into two different groups based on the amount of expected improvement. Participants were re-assessed following a 36-hour interval. The test-retest reliability, as calculated using the Intraclass Correlation Coefficient (ICC), was excellent for all shortened versions: CAHAI-7 (ICC = 0.96), CAHAI-8 (ICC = 0.97), and CAHAI-9 (ICC = 0.97).

Intra-rater:
No studies have examined the intra-rater reliability of the CAHAI.

Inter-rater:
Barreca, Stratford, Lambert, Masters, and Streiner (2005) assessed the inter-rater reliability of the CAHAI in 39 clients with stroke. Participants were stratified into two different groups based on the amount of expected improvement. Participants were re-assessed following a 36-hour interval. The inter-rater reliability, as calculated using the Intraclass Correlation Coefficient (ICC), was excellent (ICC = 0.98).

Validity

Content:

Barreca et al. (2004) performed a literature review to generate items for the CAHAI. From this review, 177 items were selected. Eighty-one clients with stroke, their families and caregivers were surveyed about important and relevant items regarding stroke recovery, which generated an additional 574 items. To reduce the 725 generated items to 26 items, only bilateral, gender-neutral items that fell into the domains identified by the clients as important and that were easy to obtain were kept. This version, with 26 items, was then tested in 20 participants with stroke. Items that were difficult to standardize or those with the potential for safety concerns were eliminated. Items with a high degree of difficulty were added in order to minimize possible ceiling effects. Inter-item correlation analyses of this new version (which contained 25 items) identified some redundant items (r > 0.90). Items with poor frequency endorsement, items that were difficult to standardize, and items with high inter-item correlation were eliminated, resulting in the 13 finalized items.

Criterion:

Concurrent:
Barreca, Stratford, Masters, Lambert, & Griffiths (2006b) examined the ability of the CAHAI-9 to predict the scores and change scores of the original CAHAI in 105 clients with stroke. Mean scores and mean change scores of the CAHAI-9 accurately predicted mean scores and mean change scores of the CAHAI. However, individual scores and individual change scores of the CAHAI-9 displayed moderate variability in predicting individual scores and change scores of the CAHAI. The findings indicate that the CAHAI-9 should not be administered with the intent to predict the CAHAI.

Construct:

Convergent/Discriminant:
Barreca et al. (2005) estimated convergent validity of the CAHAI by comparing it to the Chedoke-McMaster Stroke Assessment (CMSA – Gowland, Stratford, Ward, Moreland, Torresin, VanHullenaar et al., 1993; Gowland, VanHullenaar, Torresin, et al., 1995) arm-hand sum score, and with the Action Research Arm Test (ARAT – Lyle, 1981) in 39 participants with stroke. Assessments were performed at baseline and 2 to 6 weeks later. Correlations, as calculated using the Pearson Correlation Coefficient, were excellent between the CAHAI and the ARAT (r = 0.93) and between the CAHAI and the CMSA arm-hand at baseline (r = 0.81) and at follow-up (r = 0.89). In the same study, the authors analyzed discriminant validity of the CAHAI by comparing it to the CMSA shoulder pain score in the same 39 participants with stroke. The correlation between the CAHAI and CMSA shoulder pain score, as calculated using the Pearson Correlation Coefficient, was adequate at baseline (r = 0.47) and at follow-up (r = 0.39).

Barreca et al. (2006) assessed the convergent validity of the CAHAI-7, CAHAI-8 and CAHAI-9 by comparing them to the Action Research Arm Test (ARAT), CAHAI and CMSA in 39 individuals with stroke. Pearson Correlations were used. Correlations between the ARAT and CAHAI-7 (r = 0.95), CAHAI-8 (r = 0.95) and CAHAI-9 (r = 0.94) were all excellent, as well as between the CAHAI and all the shortened versions (r = 0.99), and between the CMSA and CAHAI-7 (r = 0.85), CAHAI-8 (r = 0.84), and CAHAI-9 (r = 0.84).

Barreca et al. (2006b) determined the convergent validity of the CAHAI-9 and CAHAI by comparing them to the ARAT (Lyle, 1981) in 105 individuals with stroke. Reassessments were performed with a 36-hour interval. Pearson correlation coefficients were excellent between the CAHAI-9 and ARAT at baseline (r = 0.93) and at follow-up (r = 0.95), as well as between the CAHAI and ARAT at baseline (r = 0.93) and at follow-up (r = 0.95).

Known groups:
Barreca et al. (2005) analyzed the longitudinal validity of the CAHAI in 39 clients with stroke by comparing change scores on the CAHAI with change scores on the arm-hand sum and on the shoulder pain dimensions of the Chedoke-McMaster Stroke Assessment (CMSA; Gowland et al., 1995) and on the Action Research Arm Test (ARAT; Lyle, 1981). Change score correlations, as calculated using the Pearson correlation coefficient, were excellent between the CAHAI and the ARAT (r = 0.86), adequate between the CAHAI and the CMSA arm-hand sum (r = 0.52) and poor between the CAHAI and the CMSA shoulder pain (r = -0.24). In a second analysis, Barreca et al. (2005) analyzed whether the CAHAI was more adept than the CMSA and the ARAT at distinguishing change in patients with mild/moderate impairments from change in patients with severe impairments in 39 clients with stroke. Longitudinal/known groups validity, as calculated using Receiver Operating Characteristic (ROC) analysis, demonstrated an excellent area under the curve for the CAHAI (ROC = 0.95). The ARAT and CMSA presented adequate areas under the curve (ROC = 0.88 and ROC = 0.76, respectively).
Note: ROC curve analysis quantifies a measure’s ability to distinguish between groups as an area under the ROC curve. Greater areas indicate the measure is better at discriminating between individuals in the two groups.
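As a rough illustration of the note above, the area under the ROC curve can be computed as the proportion of pairs (one person from each group) in which the person from the first group has the larger score, with ties counted as half. The sketch below assumes this pairwise formulation. The function name `roc_auc` and the change scores are hypothetical and are not taken from any of the studies reviewed here.

```python
def roc_auc(group_a, group_b):
    """Area under the ROC curve for separating group_a (expected to score
    higher) from group_b.

    Computed as the share of (a, b) pairs in which a > b, with ties
    counted as half a pair.
    """
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


# Hypothetical change scores, invented for illustration only.
mild_moderate = [12, 9, 15, 7, 11]
severe = [3, 8, 2, 5, 13]

auc = roc_auc(mild_moderate, severe)
print(auc)  # prints 0.8 (20 of the 25 pairs favour the first group)
```

A value of 0.5 would mean the measure separates the two groups no better than chance, and a value of 1.0 would mean perfect separation.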

Barreca et al. (2006) assessed the longitudinal validity of the CAHAI and its three shortened versions in 39 participants with stroke. Participants were divided according to stroke severity into acute and chronic groups. The CAHAI, CAHAI-7, CAHAI-8, and CAHAI-9 were administered at admission and discharge (2 to 6 weeks after admission) to verify which version was more adept at distinguishing change in patients with acute stroke from change in patients with chronic stroke. Longitudinal/known groups validity, as calculated using Receiver Operating Characteristic (ROC) analysis, demonstrated an excellent area under the curve for all versions of the CAHAI as follows: CAHAI (ROC = 0.95); CAHAI-7 (ROC = 0.97); CAHAI-8 (ROC = 0.93); and CAHAI-9 (ROC = 0.94). This means that all versions of the CAHAI are comparably able to distinguish change between different groups of patients with stroke.

Barreca et al. (2006b) examined the longitudinal validity of the CAHAI, CAHAI-9 and the ARAT in 105 individuals with stroke. Participants were stratified into mild/moderate impairment and severe impairment groups, and those with mild/moderate impairments were expected to show greater changes across two repeated measures. The three outcome measures were administered at two points in time to verify which of them was more adept at distinguishing change in clients with mild/moderate impairment from change in clients with severe impairment. Longitudinal/known groups validity, as calculated using Receiver Operating Characteristic (ROC) analysis, was adequate for the ARAT (ROC = 0.72), the CAHAI-9 (ROC = 0.82), and the CAHAI (ROC = 0.86). This ROC analysis indicated that the CAHAI was the best measure for distinguishing change in patients with mild/moderate impairment from change in patients with severe impairment.

Responsiveness

Barreca et al. (2005) assessed the minimal detectable change of the CAHAI in 39 clients with stroke. Participants were assessed at two points in time: at admission, and after 2 to 6 weeks. For the CAHAI, the minimal detectable change was 6.3 points, meaning that stable patients displayed random fluctuations of 6.3 CAHAI points or less when assessed on two different occasions.
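In practice, the minimal detectable change is used as a threshold: a change between two assessments that does not exceed 6.3 points cannot be told apart from measurement error. The sketch below shows one way this check might be applied. The helper name `exceeds_mdc` and the patient scores are hypothetical, and only the 6.3-point value comes from the text above.

```python
CAHAI_MDC = 6.3  # minimal detectable change reported by Barreca et al. (2005)


def exceeds_mdc(first_score, second_score, mdc=CAHAI_MDC):
    """Return True when the absolute change between two assessments is
    larger than the minimal detectable change."""
    return abs(second_score - first_score) > mdc


# Hypothetical CAHAI total scores, invented for illustration only.
print(exceeds_mdc(40, 45))  # change of 5 points: prints False
print(exceeds_mdc(40, 48))  # change of 8 points: prints True
```

A result of `False` does not show that the patient is unchanged, only that the observed change is within the range of random fluctuation seen in stable patients.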

References

Barreca, S.R., Gowland, C.K., Stratford, P.W., et al. (2004). Development of the Chedoke Arm and Hand Activity Inventory: Theoretical constructs, item generation, and selection. Topics in Stroke Rehabilitation, 11(4), 31-42.
Barreca, S.R., Stratford, P.W., Lambert, C.L., Masters, L.M., & Streiner, D.L. (2005). Test-retest reliability, validity, and sensitivity of the Chedoke Arm and Hand Activity Inventory: a new measure of upper-limb function for survivors of stroke. Archives of Physical Medicine and Rehabilitation, 86, 1616-1622.
Barreca, S.R., Stratford, P.W., Masters, L.M., Lambert, C.L., Griffiths, J., & McBay, C. (2006). Validation of three shortened versions of the Chedoke Arm and Hand Activity Inventory. Physiotherapy Canada, 58, 148-156.
Barreca, S.R., Stratford, P.W., Masters, L.M., Lambert, C.L., & Griffiths, J. (2006b). Comparing two versions of the Chedoke Arm and Hand Activity Inventory with the Action Research Arm Test. Physical Therapy, 86(2), 245-253.
Gowland, C., Stratford, P., Ward, M., Moreland, J., Torresin, W., VanHullenaar, S., et al. (1993). Measuring physical impairment and disability with the Chedoke-McMaster Stroke Assessment. Stroke, 24, 58-63.
Gowland, C., VanHullenaar, S., Torresin, W., et al. (1995). Chedoke-McMaster Stroke Assessment: development, validation, and administration manual. Hamilton, ON, Canada: School of Rehabilitation Science, McMaster University.
Heller, A., Wade, D.T., Wood, V.A., Sunderland, A., Hewer, R., & Ward, E. (1987). Arm function after stroke: measurement and recovery over the first three months. Journal of Neurology, Neurosurgery & Psychiatry, 50(6), 714-719.
Keith, R.A., Granger, C.V., Hamilton, B.B., & Sherwin, F.S. (1987). The Functional Independence Measure: a new tool for rehabilitation. In: Eisenberg, M.G. & Grzesiak, R.C. (Eds.), Advances in clinical rehabilitation (pp. 6-18). New York: Springer Publishing Company.
Kellor, M., Frost, J., Silberberg, N., Iversen, I., & Cummings, R. (1971). Hand strength and dexterity. American Journal of Occupational Therapy, 25, 77-83.
Lyle, R.C. (1981). A performance test for assessment of upper limb function in physical rehabilitation treatment and research. International Journal of Rehabilitation Research, 4, 483-492.
Mathiowetz, V., Kashman, N., Volland, G., Weber, K., Dowe, M., & Rogers, S. (1985). Grip and pinch strength: normative data for adults. Archives of Physical Medicine and Rehabilitation, 66, 69-72.
Mathiowetz, V., Weber, K., Kashman, N., & Volland, G. (1985b). Adult norms for the nine hole peg test of finger dexterity. Occupational Therapy Journal of Research, 5, 24-33.

See the measure

How to obtain the CAHAI

The CAHAI can be obtained free of charge by visiting the official website: http://www.cahai.ca

Comprehensive Coordination Scale (CCS)

Evidence Reviewed as of before: 11-11-2021

Author(s)*: Sandra R. Alouche; Marika Demers; Roni Molad; Mindy F. Levin

Purpose

In-Depth Review

Purpose of the measure

The Comprehensive Coordination Scale (CCS) is a measure of coordination of multiple body segments at both motor performance (endpoint movement) and quality of movement (joint rotations and interjoint coordination) levels based on observational kinematics. Coordinated movements are defined as movements of one or more limbs or body segments that occur together in identifiable temporal (i.e., timing) and spatial (i.e., positional/angular) patterns in relation to the desired action. Coordination can be measured at a specific point in time during the movement or over the whole movement time.

The CCS can be used by healthcare professionals to assess coordination in older adults and individuals with various neurological conditions. The CCS is composed of six different tests: the Finger-to-Nose Test, the Arm-Trunk Coordination Test, the Finger Opposition Test, the Interlimb Coordination (synchronous anti-phase forearm rotations) Test, the Lower Extremity MOtor COordination Test (LEMOCOT) and the Four-limb Coordination (Upper and lower limb movements) Test.

Available versions

The CCS was developed by Alouche et al. (2021) from valid and reliable tests used in clinical practice and research to assess complementary aspects of motor coordination of the trunk, upper limb (UL), lower limb (LL) and combinations of them. Behavioral elements used to perform each test were identified and rating scales were developed to guide observational kinematic analysis by expert consensus (Alouche et al., 2021).

Features of the measure

Items:
The CCS consists of 6 different tests used in either clinical practice or research to assess complementary aspects of motor coordination of the trunk, upper limb (UL), lower limb (LL) and combinations of them.

Finger-to-Nose Test (FTN)
Arm-Trunk Coordination Test (ATC)
Finger Opposition Test (FOT)
Interlimb Coordination Test (ILC-2)
Lower Extremity MOtor COordination Test (LEMOCOT)
Four-limb Coordination Test (ILC-4)

Body parts tested	Type of test	Test	Behavioral elements scored
Upper limb	Unilateral	Finger-to-Nose Test (FTN)	Spatial: stability, smoothness, accuracy; Temporal: speed
Trunk and arm	Unilateral	Arm-Trunk Coordination Test (ATC)	Spatial: accuracy, interjoint coordination
Upper limb (fine dexterity)	Unilateral	Finger Opposition Test (FOT)	Spatial: selectivity; Temporal: timing
Interlimb coordination (both upper limbs)	Bilateral	Alternate movements of two upper limbs (ILC-2)	Spatial: compensation; Temporal: synchronicity/timing
Lower limb	Unilateral	Lower Extremity MOtor COordination Test (LEMOCOT)	Spatial: smoothness, accuracy; Temporal: speed
Four-limb coordination (upper limbs and lower limbs)	Bilateral	Alternate movements of both hands and feet (ILC-4)	Temporal: timing/complexity

Scoring:
Multiple behavioral elements of each test are scored on separate rating scales ranging from 3 (normal coordination) to 0 (impaired coordination) to assess different elements of motor behavior needed to perform the action.
The CCS includes a total of 13 rating scales for the 6 tests.
The CCS score ranges from 0 to 69 points, with higher scores indicating better motor coordination. The CCS total score represents a coordination score for the whole body.
The CCS scores can be broken into 4 subscores: UL, LL, Unilateral, Bilateral.
UL: 54 points (includes FTN-24 points, ATC-12 points, FOT-12 points, and ILC2-6 points).
LL: 12 points (includes LEMOCOT-12 points).
Unilateral: 30 points (includes FTN-12 points, ATC-6 points, FOT-6 points, and LEMOCOT-6 points).
Bilateral: 9 points (includes ILC2-6 points and ILC4-3 points).
The manual describes the initial position, the instructions, and the detailed scoring.
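The score breakdown above can be checked with simple arithmetic. The sketch below encodes only the maximum points stated in the scoring list and confirms that they add up to the reported subscore and total maxima. The dictionary and variable names are hypothetical, and the sketch does not reproduce the rating scales in the CCS manual.

```python
# Maximum points per test, as stated in the scoring breakdown above.
MAX_POINTS = {"FTN": 24, "ATC": 12, "FOT": 12, "ILC2": 6, "LEMOCOT": 12, "ILC4": 3}

# Per-test contributions to the Unilateral subscore, as stated above.
UNILATERAL_POINTS = {"FTN": 12, "ATC": 6, "FOT": 6, "LEMOCOT": 6}

ul_max = sum(MAX_POINTS[test] for test in ("FTN", "ATC", "FOT", "ILC2"))
ll_max = MAX_POINTS["LEMOCOT"]
unilateral_max = sum(UNILATERAL_POINTS.values())
bilateral_max = MAX_POINTS["ILC2"] + MAX_POINTS["ILC4"]
total_max = sum(MAX_POINTS.values())

print(ul_max, ll_max, unilateral_max, bilateral_max, total_max)
# prints: 54 12 30 9 69
```

The total of 69 points is the sum of the UL (54) and LL (12) subscores plus the 3 points from the four-limb test (ILC4), which belongs to neither of those two subscores.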

What to consider before beginning:
The CCS is scored based on observational kinematics.

Time:
The CCS takes approximately 10-15 minutes to administer (Molad et al., 2021).

Training requirements:
The healthcare professional should read the CCS manual available on Open Science Framework: Marika Demers, Mindy F Levin, Roni Molad, and Sandra Alouche. 2021. “Comprehensive Coordination Scale.” OSF. July 12. osf.io/8h7nm.

Equipment:

Chair with back support and without armrests (suggested seat height: 46 cm)
Footstool, if needed
Targets:
- One 2.54 cm-diameter sticker (FTN)
- One target (sphere of 2.54 cm-diameter or a cube of similar dimensions) on an adjustable height support (ATC)
- Two 5 cm-diameter stickers placed 30 cm (centre-to-centre) apart and attached to a cardboard (LEMOCOT test)
Stopwatch / timer
Table (optional, suggested height: 72 cm)
Pillow (optional)

Client suitability

Can be used with:

Individuals with neurological disorders

Should not be used with:

No information available

In what languages is the measure available?

English

Summary

What does the tool measure?	Temporal and spatial aspects of coordination.
What types of clients can the tool be used for?	The CCS can be used with patients with neurological disorders.
Is this a screening or assessment tool?	Assessment tool.
Time to administer	10-15 minutes.
ICF Domain	Body function.
Other Languages	French Canadian, Portuguese (both not published)
Measurement Properties
Reliability	Internal consistency: One study has reported high internal consistency of the CCS in a stroke population (Molad et al., 2021). Test-retest: One study examined test-retest reliability of the CCS within a stroke population and reported excellent test-retest reliability (ICC = 0.97; 95% CI: 0.93-0.98; Molad et al., 2021). Intra-rater: One study examined intra-rater reliability of the CCS within a stroke population and reported excellent intra-rater reliability (ICC = 0.97; 95% CI: 0.93-0.98; Molad et al., 2021). Inter-rater: One study examined inter-rater reliability of the CCS within a stroke population and reported excellent inter-rater reliability (ICC = 0.98; 95% CI: 0.95-0.99; Molad et al., 2021).
Validity	Content: One study has examined the content validity of the CCS using a Delphi study conducted by a panel of experts. The CCS was found to have strong content validity (Alouche et al., 2021). Criterion: Concurrent: Concurrent validity of the CCS has not been examined within a stroke population. Predictive: Predictive validity of the CCS has not been examined within a stroke population. Construct: Convergent/Discriminant: One study has examined convergent validity of the CCS within a stroke population and reported adequate convergent validity with the Fugl-Meyer Total Score (ρ = 0.602; p = 0.001) and Fugl-Meyer Motor Score (ρ = 0.585; p < 0.001) (Molad et al., 2021). Known Groups: One study has examined the known-group validity of the upper-limb Interlimb Coordination Test (ILC2), a subscale of the CCS, within a stroke population and reported that the ILC2 is able to distinguish between age-matched healthy individuals and chronic stroke survivors (Molad & Levin, 2021).
Floor/Ceiling Effects	One study reported no floor or ceiling effects for the CCS total score (Molad et al., 2021).
Does the tool detect change in patients?	No studies have reported on the responsiveness of the CCS within a stroke population.
Acceptability	The CCS is non-invasive and quick to administer. The use of visual observation instead of complex and costly motion analysis equipment to analyze movement makes this scale clinically accessible and easy to use.
Feasibility	The CCS is free and is suitable for administration in various settings. The assessment requires minimal specialist equipment or training. It takes 10-15 minutes to be completed.
How to obtain the tool?	Alouche SR, Molad R, Demers M, Levin MF. Development of a Comprehensive Outcome Measure for Motor Coordination; Step 1: Three-Phase Content Validity Process. Neurorehabil Neural Repair. 2021 Feb;35(2):185-193. doi: 10.1177/1545968320981955. [Supplementary materials] The CCS manual can be accessed on the Open Science Framework website: Marika Demers, Mindy F Levin, Roni Molad, and Sandra Alouche. 2021. “Comprehensive Coordination Scale.” OSF. July 12. osf.io/8h7nm.

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the Comprehensive Coordination Scale (CCS) in individuals with stroke. We identified two studies.

Floor/Ceiling Effects

Molad et al. (2021) examined floor/ceiling effects of the CCS in a sample of 30 participants with chronic stroke. There were no floor/ceiling effects for the total score of the CCS and the CCS-Bilateral subscale. For the CCS-UL and CCS-LL subscales, 3.3% and 6.7% of participants reached the maximal score, respectively. Ten percent of participants scored 0 or 30 on the CCS-Unilateral subscale.
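With a sample of 30, each reported percentage corresponds to a small number of participants. The sketch below converts the percentages to the nearest whole number of people. These counts are inferred by rounding and are not reported in the source, and the variable names are hypothetical.

```python
SAMPLE_SIZE = 30  # participants in Molad et al. (2021)

# Percentages as reported in the paragraph above.
reported_percentages = {
    "CCS-UL at maximal score": 3.3,
    "CCS-LL at maximal score": 6.7,
    "CCS-Unilateral at 0 or 30": 10.0,
}

# Nearest whole number of participants implied by each percentage
# (an inference from rounding, not a figure given in the source).
implied_counts = {
    label: round(SAMPLE_SIZE * pct / 100)
    for label, pct in reported_percentages.items()
}

for label, count in implied_counts.items():
    print(f"{label}: about {count} of {SAMPLE_SIZE} participants")
```

Under this rounding assumption, the percentages correspond to roughly 1, 2 and 3 participants, respectively.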

Reliability

Intra-rater:
Molad et al. (2021) assessed the intra-rater reliability of the CCS in 30 chronic stroke survivors. The intra-rater reliability was evaluated with intraclass correlation coefficients (ICC) with 95% confidence intervals (CI). The CCS has excellent intra-rater reliability (ICC = 0.97; 95% CI: 0.93-0.98). All four subscales also have excellent intra-rater reliability: CCS-UL (ICC = 0.96; 95% CI: 0.92-0.98), CCS-LL (ICC = 0.79; 95% CI: 0.36-0.92), CCS-Unilateral (ICC = 0.98; 95% CI: 0.96-0.99) and CCS-Bilateral (ICC = 0.95; 95% CI: 0.89-0.97).

Inter-rater:
Molad et al. (2021) assessed the inter-rater reliability of the CCS in 30 chronic stroke survivors. The inter-rater reliability was evaluated with intraclass correlation coefficients (ICC) with 95% confidence intervals (CI). The CCS has excellent inter-rater reliability (ICC = 0.98; 95% CI: 0.95-0.99). All four subscales also have excellent inter-rater reliability: CCS-UL (ICC = 0.96; 95% CI: 0.91-0.98), CCS-LL (ICC = 0.76; 95% CI: 0.25-0.9), CCS-Unilateral (ICC = 0.99; 95% CI: 0.97-0.99) and CCS-Bilateral (ICC = 0.95; 95% CI: 0.89-0.98).

Validity

Content:
Alouche et al. (2021) conducted a 3-phase content validation supporting the importance, level of comprehension and feasibility of the CCS in identifying and quantifying coordination of movements made by individuals with neurological deficits in a clinical setting. First, a literature review was performed to generate unilateral and bilateral tests of UL, LL, and trunk coordination currently used in clinical practice or research studies for the CCS. From the 2761 studies reviewed, 5 tests were selected: FTN, ATC, LEMOCOT, ILC2, and ILC4. A Delphi study, using a structured questionnaire with open-ended questions, was done with 8 expert clinicians and researchers to identify the relative importance of each test, test element, and rating scales, the level of comprehension of the instructions, and the feasibility of each test. Then, a focus group meeting was held with 6 experts to refine the instructions and the rating scales. A consensus was reached to add the Finger Opposition Test (FOT) to the final version of the CCS to assess the selectivity and timing of finger movements.

Criterion:
Concurrent:
No studies have reported on the concurrent validity of the CCS.

Construct:
Convergent/Discriminant:
Molad et al. (2021) examined the convergent validity of the CCS in a sample of 30 chronic stroke survivors. Convergent validity of the total CCS was measured with the Fugl-Meyer Assessment (total score and motor score). Adequate convergent validity of the CCS with FMA-Total Score (ρ=0.602; p=0.001) and FMA-Motor Score (ρ=0.585; p<0.001) was obtained. The convergent validity of the subscales was measured with the Fugl-Meyer Assessment, prehension and pinch strength, Box and Blocks and 10-meter walk test. CCS-UL and CCS-Unilateral scores were moderately to strongly correlated with the Fugl-Meyer Assessment (total score and motor score), prehension and pinch strength, Box and Blocks and 10-meter walk test. The CCS-LL subscale was moderately correlated with the Fugl-Meyer Assessment (total score and motor score) and the Box and Blocks. The CCS-Bilateral subscale was moderately correlated with the Fugl-Meyer Assessment (total score and UL motor score) and the Box and Blocks.

Known Group:
Molad & Levin (2021) examined the known group validity of the ILC2 subscale in a sample of 13 stroke survivors and 13 healthy participants. They compared ILC2 scores with trunk and upper limb kinematics during synchronous bilateral anti-phase forearm rotations in 4 conditions: self-paced internally-paced, fast internally-paced, slow externally-paced, and fast externally-paced. Healthy participants had near maximal ILC2 scores and high temporal and spatial coordination indices. However, participants with stroke had lower ILC2 scores and used trunk and shoulder compensations to perform the task. ILC2 scores distinguished between healthy participants and participants with chronic stroke.

Responsiveness

The responsiveness of the CCS has not been established.

Measurement error:
Molad et al. (2021) examined the measurement error in a sample of 30 chronic stroke survivors. The standard error of the measurement (SEM) was calculated based on the standard deviation (SD) of the sample and the reliability of measurement. The minimal detectable change (MDC) at the 95% confidence level was computed. The CCS SEM was 1.80 points and the MDC95 was 4.98 points. The SEM and MDC values for the CCS, the CCS-UL, CCS-Unilateral and CCS-Bilateral were less than 17%. Only the CCS-LL had an MDC greater than 17%. For the CCS and all subscales, the SEM was smaller than the MDC.
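
The relationship between the SEM and the MDC95 can be illustrated with a short sketch. The formulas below (SEM = SD × √(1 − reliability) and MDC95 = 1.96 × √2 × SEM) are the conventional ones and are an assumption here; the review text does not state exactly which formulas Molad et al. (2021) applied. The function names are hypothetical.

```python
import math


def sem_from_sd(sd: float, reliability: float) -> float:
    """Standard error of measurement from sample SD and a reliability
    coefficient (e.g. an ICC). Assumes the conventional formula
    SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% confidence level.
    Assumes the conventional formula MDC95 = 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


if __name__ == "__main__":
    # Using the SEM reported for the CCS total score (1.80 points).
    print(f"MDC95 = {mdc95(1.80):.2f} points")
```

With the reported SEM of 1.80 points this gives an MDC95 of roughly 4.99 points, close to the reported 4.98; the small difference is presumably due to rounding of the SEM.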

References

Alouche, S.R., Molad, R., Demers, M., Levin, M.F. (2021) Development of a Comprehensive Outcome Measure for Motor Coordination; Step 1: Three-Phase Content Validity Process. Neurorehabil Neural Repair. 35(2):185-193. doi: 10.1177/1545968320981955. PMID: 33349134.

Molad, R., Alouche, S.R., Demers, M., Levin, M.F. (2021) Development of a Comprehensive Outcome Measure for Motor Coordination, Step 2: Reliability and Construct Validity in Chronic Stroke Patients. Neurorehabil Neural Repair. 35(2):194-203. doi: 10.1177/1545968320981943. PMID: 33410389.

Molad, R., & Levin, M. F. (2021) Construct validity of the upper-limb Interlimb Coordination Test (ILC2) in stroke. Neurorehabil Neural Repair [epub ahead of print]. doi: 10.1177/1545968321105809. PMID: 34715755

See the measure

The tool is available as supplementary material in:
Alouche SR, Molad R, Demers M, Levin MF. Development of a Comprehensive Outcome Measure for Motor Coordination; Step 1: Three-Phase Content Validity Process. Neurorehabil Neural Repair. 2021 Feb;35(2):185-193. doi: 10.1177/1545968320981955. [Supplementary materials]

The CCS manual can be accessed on the Open Science Framework website:
Marika Demers, Mindy F Levin, Roni Molad, and Sandra Alouche. 2021. “Comprehensive Coordination Scale.” OSF. July 12. osf.io/8h7nm.

Disabilities of the Arm, Shoulder and Hand (DASH)

Evidence Reviewed as of before: 19-06-2012

Author(s)*: Annabel McDermott, OT

Editor(s): Nicol Korner-Bitensky, PhD OT

Expert Reviewer: Natasha Lannin (Associate Professor, OT)

Content consistency: Gabriel Plumier

Purpose

The Disabilities of the Arm, Shoulder and Hand (DASH) is a self-report questionnaire that measures disability and symptoms of upper limb musculoskeletal disorders.

In-Depth Review

Purpose of the measure

The Disabilities of the Arm, Shoulder and Hand (DASH) is a self-report questionnaire that measures physical function and symptoms of the upper limb. The DASH can be used for any joint and any musculoskeletal condition of the upper limb (Hudak et al., 1996; Veehof et al., 2002), which permits comparison across upper limb diagnoses (Atroshi et al., 2000). The DASH is intended for discriminative and evaluative purposes (Schmitt & Di Fabio, 2004).

The DASH demonstrates validity and responsiveness in proximal and distal upper limb disorders (Beaton et al., 2001). The DASH demonstrated better clinimetric properties than other shoulder disability questionnaires including the Simple Shoulder Test (SST), American Shoulder and Elbow Surgeons Standardised Shoulder Assessment Form (ASES) and the Shoulder Pain and Disability Index (SPADI) (Bot et al., 2004).

Available versions

The DASH was developed by the American Academy of Orthopedic Surgeons, the Council of the Musculoskeletal Specialty Societies, and the Institute for Work and Health as a region-specific instrument to measure patients’ perception of disability and symptoms associated with any joint or condition of the upper limb (Hudak et al., 1996; Veehof et al., 2002).

The third edition of the DASH has been recently published to incorporate the latest research and new information regarding cross-cultural use of the measure.

Features of the measure

Items:

The DASH consists of 30 items that measure: (a) physical function (21 items); (b) symptom severity (5 items); and (c) social or role function (4 items).

Ability to do the following activities:

Open a tight or new jar
Write
Turn a key
Prepare a meal
Push open a heavy door
Place an object on a shelf above your head
Do heavy household chores (e.g. wash walls, wash floors)
Garden or do yard work
Make a bed
Carry a shopping bag or briefcase
Carry a heavy object (over 5kg)
Change a light bulb overhead
Wash or blow dry your hair
Wash your back
Put on a pullover sweater
Use a knife to cut food
Recreational activities that require little effort (e.g. card playing, knitting)
Recreational activities that require taking some force or impact through the arm, shoulder or hand (e.g. golf, hammering, tennis)
Recreational activities that require you to move the arm freely (e.g. Frisbee, badminton)
Managing transportation needs (getting from one place to another)
Sexual activities
Extent to which arm, shoulder or hand problems interfered with normal social activities with family, friends, neighbours or groups
Extent to which arm, shoulder or hand problems limited work or other regular daily activities

Severity of the following symptoms:

Arm, shoulder or hand pain
Arm, shoulder or hand pain when performing activities
Tingling
Weakness
Stiffness
Difficulty in sleeping
Impact on self-image

The DASH also includes two optional modules regarding work and sports/performing arts that investigate the individual’s difficulty:

Using the usual technique for the activity (work; sport/instrument)
Performing the activity due to arm, shoulder or hand pain
Performing the activity as well as he/she would like
Spending the usual amount of time on the activity

Scoring:

The most recent version of the DASH uses a 5-point Likert scale that rates the individual’s difficulties during the preceding week. Lower scores indicate no difficulty, limitations or symptoms whereas higher scores indicate inability to perform tasks or extreme difficulties or symptomatology.

Items 1 – 21	1 = no difficulty; 2 = mild difficulty; 3 = moderate difficulty; 4 = severe difficulty; 5 = unable
Item 22	1 = not at all; 2 = slightly; 3 = moderately; 4 = quite a bit; 5 = extremely
Item 23	1 = not limited at all; 2 = slightly limited; 3 = moderately limited; 4 = very limited; 5 = unable
Items 24 – 28	1 = none; 2 = mild; 3 = moderate; 4 = severe; 5 = extreme
Optional work and sports/performing arts modules	1 = no difficulty; 2 = mild difficulty; 3 = moderate difficulty; 4 = severe difficulty; 5 = unable

The DASH total score is calculated as a percentage (0=no disability to 100=maximal disability), using the following calculation:

[(Sum of completed responses ÷ number of completed responses) – 1] x 25

The final score for each optional module is calculated as follows:

[(Sum of completed responses ÷ 4) – 1] x 25

Note: A DASH total score cannot be calculated if more than 3 items have not been answered. Total scores for the additional modules cannot be calculated if there are any missing items.

Where 3 or fewer items have been missed, missing responses are replaced by the mean value of the responses to other items before summing.
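
The scoring rules above can be sketched in code. This is an illustrative sketch only, not an official scoring tool: the function names (`dash_score`, `module_score`) and the use of `None` to mark an unanswered item are assumptions made for the example.

```python
from typing import Optional, Sequence

VALID_RATINGS = (1, 2, 3, 4, 5)


def _check(ratings: Sequence[int]) -> None:
    """Reject any rating outside the 1-5 response scale."""
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")


def dash_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Total DASH score (0 = no disability, 100 = maximal disability).

    `responses` holds the 30 item ratings; None marks a missing item.
    Returns None when more than 3 items are missing.
    """
    if len(responses) != 30:
        raise ValueError("the DASH has 30 items")
    completed = [r for r in responses if r is not None]
    _check(completed)
    if len(responses) - len(completed) > 3:
        return None
    # Averaging the completed responses gives the same result as
    # replacing missing items with the mean of the other items.
    return (sum(completed) / len(completed) - 1) * 25


def module_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Score for an optional 4-item module; None if any item is missing."""
    if len(responses) != 4:
        raise ValueError("each optional module has 4 items")
    if any(r is None for r in responses):
        return None
    _check(responses)
    return (sum(responses) / 4 - 1) * 25


if __name__ == "__main__":
    # 27 items rated 3 (moderate) and 3 items left unanswered.
    print(dash_score([3] * 27 + [None] * 3))
    print(module_score([2, 2, 2, 2]))
```

Returning `None` rather than raising an error for an unscorable questionnaire is a design choice for this sketch; it lets a caller distinguish "cannot be scored" from invalid input.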

Please note that earlier versions of the DASH use a different scoring system.

What to consider before beginning:

A study by Ring et al. (2006) showed a strong correlation between the DASH and measures of depression (Center for Epidemiologic Studies – Depression) and anxiety (Pain Anxiety Symptoms Scale) in a sample of 235 patients with discrete hand problems (e.g. carpal tunnel syndrome, de Quervain tenosynovitis, lateral elbow pain, trigger finger, distal radial fracture). Subsequently, Lozano Calderon et al. (2010) conducted a study with 516 patients requiring hand surgery and adjusted DASH scores for the influence of depression. This resulted in a significant decrease in the mean and standard deviation of DASH scores, although the decrease in variation was small. There was a high correlation between DASH and depression-adjusted DASH scores, indicating no notable benefit to adjusting DASH scores for depression. Given the high incidence of depression among patients with stroke, the correlation between disability and depression should be considered when using the DASH.

Time:

The DASH takes approximately 5 minutes to administer with patients with musculoskeletal disorders (Bot et al., 2004). Administration with patients with stroke may require more time and support materials.

Training requirements:

No specific training requirements are specified.

Equipment:

No specific equipment is required.

Alternative Forms of the Measure

The QuickDASH is an 11-item questionnaire that was developed from the DASH using a ‘concept-retention’ approach (Beaton et al., 2005). The QuickDASH comprises the following items:

Open a tight or new jar
Do heavy household chores (e.g. wash walls, wash floors)
Carry a shopping bag or briefcase
Wash your back
Use a knife to cut food
Recreational activities that require taking some force or impact through the arm, shoulder or hand (e.g. golf, hammering, tennis)
Extent to which arm, shoulder or hand problems interfered with normal social activities with family, friends, neighbours or groups
Extent to which arm, shoulder or hand problems limited work or other regular daily activities
Arm, shoulder or hand pain
Tingling
Difficulty in sleeping

The QuickDASH also retains the optional work and sports/performing arts modules (Beaton et al., 2005).

Like the DASH, the QuickDASH uses a 5-point Likert rating scale and the total score is calculated as a percentage (0=no disability to 100=most severe disability). At least 10 of the 11 items must be completed for correct use. The QuickDASH demonstrates similar test-retest reliability, validity and responsiveness to the DASH and may demonstrate better precision in detecting different degrees of disability than the DASH. Although there is a high correlation between the QuickDASH and the DASH, an exact match between the numeric scores of the two assessments is not guaranteed (Beaton et al., 2005). Due to the smaller number of items, the QuickDASH is considered to be more efficient than the DASH (Beaton et al., 2005; Gummesson et al., 2006). However, the DASH is more suitable than the QuickDASH for use when monitoring arm pain and function over time in individual patients.

Client suitability

Can be used with:

Individuals with upper limb musculoskeletal impairment.
Due to limited research regarding patient acceptability, the DASH may be more suitable for patients with mild impairment.

Should not be used with:

Languages of the measure

Approved translations have been made in the following languages:

Afrikaans
Arabic
Armenian
Chinese (Hong Kong)
Chinese (Taiwan)
Czech
Danish
Dutch
English (Australia)
English (Hong Kong)
English (South Africa)
Finnish
French Canadian
French
German
Greek
Hebrew
Hungarian
Italian
Japanese
Korean
Lithuanian
Malay
Norwegian
Persian (Iran)
Polish
Portuguese (Brazil)
Portuguese (Portugal)
Romanian
Russian
Serbian
Sinhala (Sri Lanka)
Spanish (Argentina)
Spanish (Puerto Rico)
Spanish (Spain)
Swedish
Thai
Turkish

Translations are also in progress for the following languages:

Croatian
Estonian
Filipino
Isi-Xhosa
Latvian
Malayalam
Slovak
Spanish (Chile)
Spanish (Dominican Republic)
Ukrainian

Summary

What does the tool measure?	Upper extremity disability and pain.
What types of clients can the tool be used for?	Individuals with musculoskeletal disorders of the upper limb.
Is this a screening or assessment tool?	Assessment
Time to administer	Five minutes.
Versions	DASH QuickDASH
Other Languages	Afrikaans, Arabic, Armenian, Chinese (Hong Kong), Chinese (Taiwan), Czech, Danish, Dutch, English (Australia), English (Hong Kong), English (South Africa), Finnish, French Canadian, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Lithuanian, Malay, Norwegian, Persian (Iran), Polish, Portuguese (Brazil), Portuguese (Portugal), Romanian, Russian, Serbian, Sinhala (Sri Lanka), Spanish (Argentina), Spanish (Puerto Rico), Spanish (Spain), Swedish, Thai, Turkish.
Measurement Properties
Reliability	Internal consistency: No studies have reported on the internal consistency of the DASH among patients with stroke. Test-retest: No studies have reported on the test-retest reliability of the DASH among patients with stroke. Intra-rater: No studies have reported on the intra-rater reliability of the DASH among patients with stroke. Inter-rater: No studies have reported on the inter-rater reliability of the DASH among patients with stroke.
Validity	Content: The DASH was developed by item generation (clinical expert input, literature review and patient focus groups) and item reduction (expert review, and psychometric and clinimetric analysis). One study that examined the content validity of the DASH in a sample of patients with stroke suggested a disordered rating scale structure and item hierarchy that is not suitable for clinical use. Criterion: Concurrent: No studies have reported on the concurrent validity of the DASH among patients with stroke. Predictive: No studies have reported on the predictive validity of the DASH among patients with stroke. Construct: Convergent/Discriminant: One study reported moderate correlations between manual ability and pain. Known Groups: No studies have reported on the known-groups validity of the DASH among patients with stroke.
Floor/Ceiling Effects	No studies have reported on the floor/ceiling effects of the DASH among patients with stroke.
Does the tool detect change in patients?	No studies have reported on the responsiveness of the DASH among patients with stroke.
Acceptability	The DASH is simple to comprehend, quick to complete and is comprised of real-life, non-gender specific items. Due to limited research regarding patient acceptance, this tool may be more suitable for patients with mild impairment.
Feasibility	The DASH is a versatile measure that can be used for clinical or research purposes. However, there is insufficient research on use of the DASH with patients with stroke, and without such testing its clinical utility in this population remains unknown.
How to obtain the tool?	Visit the DASH website for more information: https://dash.iwh.on.ca/

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the DASH. While numerous studies have been conducted with other patient groups, this review specifically addresses the psychometric properties relevant to patients with stroke. At the time of publication there was 1 conference paper but no published studies specific to patients with stroke.

Floor/Ceiling Effects

No studies have reported on the floor/ceiling effects of the DASH in a sample of patients with stroke. The DASH demonstrates no floor or ceiling effects in patients with shoulder and combined shoulder-upper limb problems (Bot et al., 2004).

Reliability

Test-retest:
No studies have examined test-retest reliability of the DASH in a sample of patients with stroke, although studies conducted among patient groups with other upper limb conditions indicate excellent test-retest reliability (see: Atroshi et al., 2000; Bot et al., 2004; Beaton et al., 2001).

Intra-rater:
No studies have examined intra-rater reliability of the DASH in a sample of patients with stroke.

Inter-rater:
No studies have examined inter-rater reliability of the DASH in a sample of patients with stroke.

Validity

Content:

The DASH was developed in two stages of item generation and item reduction. The first stage of item generation involved clinical expert input, review of 13 relevant outcome measurement scales and patient focus groups to identify possible items. The second stage of item reduction involved preliminary item review by three content experts, secondary review by a panel of 15 experts for content/face validity and item importance, and subsequent pre-testing on 20 individuals with upper extremity difficulties. Further item reduction was conducted by psychometric and clinimetric analysis among patients with upper limb conditions, including (i) field-testing in a cross-sectional study of 407 patients with various upper limb problems, and (ii) importance- and difficulty- rating in a second sample of 76 patients. This resulted in the 30-item questionnaire (Hudak et al., 1996; Marx et al., 1999).

Lannin et al. (2010) examined the content validity of the DASH in a sample of 157 patients with stroke. Analysis of the original rating scale revealed a disordered structure; Rasch measurement modeling was used to transform ordinal ratings into a collapsed linear measure, which then conformed to the expectations of the model. The study also found that the hierarchy of the original 30 items is not appropriate for clinical use, as there are few items suitable for the most disabled patients.

Franchignoni et al. (2010) investigated the dimensionality, rating scale diagnostics and model fit of the DASH (Italian version) in a sample of 238 patients with upper extremity disorders (excluding stroke). The authors noted that some items do not rely exclusively on upper limb function (e.g. item 9: make a bed; item 20: manage transportation needs), and that items measure different ICF constructs (impairment, activity limitation and participation restriction). The authors found that patients were not able to reliably use the 5-level rating scale. Factor analysis revealed 3 underlying constructs: (i) manual functioning (items 1-5, 7-11, 16-18, 20, 21); (ii) shoulder range of motion (items 6, 12-15, 19); and (iii) symptoms and consequences (items 22-30). Two items (Tingling, Sexual Activities) showed misfit on Rasch analysis. While results from this study identify issues to consider when using the DASH, it is important to note that patients with stroke were excluded from the sample population.

Criterion:

Concurrent:
No studies have reported on the concurrent validity of the DASH in a sample of patients with stroke.

Predictive:
No studies have reported on the predictive validity of the DASH in a sample of patients with stroke.

Construct:

Convergent/Discriminant:
Lannin et al. (2010) compared the DASH with a self-report questionnaire of upper limb function and an observational upper limb movement assessment in 90 patients with stroke. The authors reported moderate correlations between manual ability and pain (statistical data not provided).

While no other studies have examined construct validity of the DASH in a sample of patients with stroke, numerous studies conducted among patient groups with other upper limb conditions report adequate to excellent correlations with constructs of function and pain (see: Atroshi et al., 2000; Beaton et al., 2001; Bot et al., 2004; Kirkley et al., 1998; Schmitt & Di Fabio, 2004; SooHoo et al., 2002; Turchin et al., 1998).

Known Group:
No studies have examined known-group validity of the DASH in a sample of patients with stroke, although studies have been conducted among patient groups with other upper limb conditions (see: Beaton et al., 2001).

Responsiveness

No studies have examined responsiveness of the DASH in a sample of patients with stroke, although studies have been conducted among patient groups with other upper limb conditions (see: Beaton et al., 2001; Bot et al., 2004; MacDermid & Tottenham, 2004; Schmitt & Di Fabio, 2004).

References

Atroshi, I., Gummesson, C., Andersson, B., Dahlgren, E. & Johansson, A. (2000). The disabilities of the arm, shoulder and hand (DASH) outcome questionnaire: reliability and validity of the Swedish version evaluated in 176 patients. Acta Orthopaedica Scandinavica, 71(6), 613-8.
Beaton, D.E., Katz, J.N., Fossel, A.H., Wright, J.G., Tarasuk, V., & Bombardier, C. (2001). Measuring the whole or the parts? Validity, reliability, and responsiveness of the Disabilities of the Arm, Shoulder and Hand outcome measure in different regions of the upper extremity. Journal of Hand Therapy, 14, 128-46.
Beaton, D.E., Wright, J.G., Katz, J.N., and the Upper Extremity Collaborative Group. (2005). Development of the QuickDASH: comparison of three item-reduction approaches. The Journal of Bone and Joint Surgery, 87-A(5), 1038-46.
Bot, S.D.M., Terwee, C.B., van der Windt, D.A.W.M., Bouter, L.M., Dekker, J., & de Vet, H.C.W. (2004). Clinimetric evaluation of shoulder disability questionnaires: a systematic review of the literature. Annals of the Rheumatic Diseases, 63, 335-41.
Franchignoni, F., Giordano, A., Sartorio, F., Vercelli, S., Pascariello, B., & Ferriero, G. (2010). Suggestions for refinement of the Disabilities of the Arm, Shoulder and Hand outcome measure (DASH): a factor analysis and Rasch validation study. Archives of Physical Medicine and Rehabilitation, 91, 1370-7.
Gummesson, C., Ward, M.M., & Atroshi, I. (2006). The shortened disabilities of the arm, shoulder and hand questionnaire (QuickDASH): validity and reliability based on responses within the full-length DASH. BMC Musculoskeletal Disorders, 7(44). doi:10.1186/1471-2474-7-44.
Hudak, P.L., Amadio, P.C., Bombardier, C., and the Upper Extremity Collaborative Group. (1996). Development of an upper extremity outcome measure: the DASH (Disabilities of the Arm, Shoulder, and Hand). American Journal of Industrial Medicine, 29, 602-8.
Kirkley, A., Griffin, S., McLintock, H., & Ng, L. (1998). The development and evaluation of a disease-specific quality of life measurement tool for shoulder instability: The Western Ontario Shoulder Instability Index (WOSI). The American Journal of Sports Medicine, 26(6), 764-72.
Lannin, N., McCluskey, A., Cusick, A., Ashford, S., & Ross, L. (2010). Measuring function in everyday life: enhancing the Disabilities of the Arm Shoulder Hand questionnaire for use post-stroke. World Federation of Occupational Therapy, Santiago, Chile, May.
Lozano Calderon, S.A., Zurakowski, D., Davis, J.S., & Ring, D. (2010). Quantitative adjustment of the influence of depression on the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire. Hand, 5, 49-55.
MacDermid, J.C. & Tottenham, V. (2004). Responsiveness of the Disabilities of the Arm, Shoulder and Hand (DASH) and patient-rated wrist/hand evaluation (PRWHE) in evaluating change after hand therapy. Journal of Hand Therapy, 17, 18-23.
Marx, R.G., Bombardier, C., Hogg-Johnson, S., & Wright, J.G. (1999). Clinimetric and psychometric strategies for development of a health measurement scale. Journal of Clinical Epidemiology, 52(2) 105-11.
Ring, D., Kadzielski, J., Fabien, L., Zurakowski, D., Malhotra, L.R., & Jupiter, J.B. (2006). Self-reported upper extremity health status correlates with depression. The Journal of Bone and Joint Surgery, 88-A(9), 1983-8.
Schmitt, J.S. & Di Fabio, R. (2004). Reliable change and minimum important difference (MID) proportions facilitated group responsiveness comparisons using individual threshold criteria. Journal of Clinical Epidemiology, 57, 1008-18.
SooHoo, N.F., McDonald, A.P., Seiler, J.G., & McGillivary, G.R. (2002). Evaluation of construct validity of the DASH questionnaire by correlation to the SF-36. Journal of Hand Surgery, 27A, 537-41.
Turchin, D.C., Beaton, D.E. & Richards, R.R. (1998). Validity of observer-based aggregate scoring systems as descriptors of elbow pain, function and disability. The Journal of Bone and Joint Surgery, 80A(2), 154-62.
Veehof, M.M., Sleegers, E.J.A., van Veldhoven, N.H.M.J., Schuurman, A.H., & van Meeteren, N.L.U. (2002). Psychometric qualities of the Dutch language version of the Disabilities of the Arm, Shoulder, and Hand questionnaire (DASH-DLV). Journal of Hand Therapy, 15, 347-54.

See the measure

How to obtain the DASH?

You can obtain a copy of the DASH through https://dash.iwh.on.ca/

Frenchay Arm Test (FAT)

Evidence Reviewed as of before: 17-09-2012

Author(s)*: Katie Marvin, MPT

Editor(s): Annabel McDermott, OT

Purpose

The Frenchay Arm Test (FAT) is a measure of upper extremity proximal motor control and dexterity during ADL performance in patients with impairments resulting from neurological conditions. The FAT is an upper extremity specific measure of activity limitation.

In-Depth Review

Purpose of the measure

The Frenchay Arm Test (FAT) is a measure of upper extremity proximal motor control and dexterity during ADL performance in patients with impairments of the upper extremity resulting from neurological conditions. The FAT is an upper extremity specific measure of activity limitation.

Available versions

None typically reported.

Features of the measure

Description of tasks:

Clients sit comfortably at a table with hands on their lap; each test item starts from this position. Clients are then asked to use their affected arm to:

Stabilize a ruler, while drawing a line with a pencil held in the other hand. To pass, the ruler must be held firmly.
Grasp a cylinder (12 mm diameter, 5 cm long), set on its side approximately 15 cm from the table edge, lift it about 30 cm and replace it without dropping.
Pick up a glass, half full of water positioned about 15 to 30 cm from the edge of the table, drink some water and replace without spilling.
Remove and replace a sprung clothes peg from a 10 mm diameter dowel, 15 cm long, set in a 10 cm base, 15 to 30 cm from the table edge. To pass, the client must not drop the peg or knock the dowel over.
Comb hair (or imitate); must comb across top, down the back and down each side of head.

What to consider before beginning:

Before administering the FAT, the clinician should ensure that the client is able to comprehend either written or spoken language.
The FAT has been criticized for lacking assessment of quality of movement and performance (Kopp, 1997). In addition, clients were found to either pass or fail all or most subtests, indicating that the FAT may not be sensitive to change or subtleties in progress (Hsieh, Hsueh, Chiang & Lin, 1998), especially in clients performing in the upper range of arm function (Wade et al., 1983).

Scoring and Score Interpretation:

Each item is scored as either pass (=1) or fail (=0). Total scores range from 0 to 5.
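
The scoring rule above can be illustrated with a minimal Python sketch. The item labels and the function name are hypothetical shorthand for the five tasks described earlier; they are not part of the published measure, and the example results are made up.

```python
# Hypothetical labels for the five FAT tasks (shorthand, not official item names).
FAT_ITEMS = (
    "stabilize_ruler",
    "grasp_cylinder",
    "drink_from_glass",
    "clothes_peg",
    "comb_hair",
)


def fat_total(results):
    """Return the FAT total (0 to 5) from a mapping of item -> pass (True) / fail (False)."""
    missing = [item for item in FAT_ITEMS if item not in results]
    if missing:
        # Refuse to score an incomplete assessment rather than treating gaps as fails.
        raise ValueError(f"Missing FAT items: {missing}")
    # Each passed item contributes 1 point, each failed item 0 points.
    return sum(1 if results[item] else 0 for item in FAT_ITEMS)


# Made-up example: three items passed, two failed.
example = {
    "stabilize_ruler": True,
    "grasp_cylinder": True,
    "drink_from_glass": False,
    "clothes_peg": False,
    "comb_hair": True,
}
print(fat_total(example))  # 3
```

Raising an error for missing items is a design choice of this sketch only; the source does not state how incomplete administrations should be handled.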

Time:

The FAT takes approximately 3 minutes to administer.

Training requirements:

None typically reported; however, familiarity with the measure is recommended.

Equipment:

Ruler
Pencil
Paper
Cylinder (12mm diameter, 5 cm long)
Glass (Half filled with water)
Clothes peg
Dowel (15mm)
Hair comb

Alternative Forms of the FAT

None typically reported

Client suitability

Can be used with:

Clients with stroke

Should not be used in:

Clients with difficulty understanding written and spoken language

Languages of the measure

English
French
Dutch

Summary

What does the tool measure?	The FAT measures upper extremity proximal control and dexterity during performance of functional tasks.
What types of clients can the tool be used for?	The FAT can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	The FAT takes approximately 3 minutes to administer.
Versions	There are no alternative versions of the FAT.
Other Languages	French and Dutch
Measurement Properties
Reliability	Intra-rater: One study examined the intra-rater reliability of the FAT in clients with stroke and found adequate to excellent intra-rater reliability. Inter-rater: One study examined the inter-rater reliability of the FAT in clients with stroke and found excellent inter-rater reliability.
Validity	Sensitivity/Specificity: Two studies compared the sensitivity of the FAT with that of the Nine-Hole Peg Test (NHPT) and found the NHPT to be more sensitive than the FAT for detecting impaired upper extremity function in clients with stroke.
Floor/Ceiling Effects	No studies have examined the floor/ceiling effects of the FAT in clients with stroke.
Does the tool detect change in patients?	No studies have investigated the responsiveness of the FAT in clients with stroke.
Acceptability	The FAT has been criticized for lacking assessment of quality of movement and performance (Kopp, 1997). In addition, clients were found to either pass or fail all or most subtests, indicating that the FAT may not be sensitive to change (Hsieh, Hsueh, Chiang & Lin, 1998). The FAT is quick to complete and should not produce any undue fatigue for patients.
Feasibility	The FAT is short and easy to administer and score.
How to obtain the tool?	For more information on the FAT, please visit the article by Parker, Wade & Langton Hewer (1986).

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the Frenchay Arm Test (FAT) in clients with stroke. Two studies were found and have been reviewed in this module. More studies are required before definitive conclusions can be drawn regarding the reliability and validity of the FAT.

Floor/Ceiling Effects

No studies have examined the floor/ceiling effects of the FAT in clients with stroke.

Reliability

Internal consistency:
No studies have examined the internal consistency of the FAT in clients with stroke.

Intra-rater:
Heller, Wade, Wood, Sunderland, Hewer, and Ward (1987) examined the intra-rater reliability of the FAT, Nine-Hole Peg Test (NHPT), Finger Tapping Rate (Lezak, 1983), and Grip Strength (Mathiowetz, Kashman, Volland, Weber, Dowe, & Rogers, 1985) in 10 patients with subacute stroke. Participants were re-assessed after a 2-week interval by the same rater. The study reported only the range of reliability coefficients across the four measures; values for each individual measure were not provided. Spearman rho correlation coefficients were adequate to excellent (ranging across all four measures from r = 0.68 to 0.99).
Note: Although it is not possible to discern the exact value for the FAT reliability, all values were considered adequate to excellent and statistically significant, suggesting that the FAT may be reliable with stable stroke clients.

Inter-rater:
Heller et al. (1987) examined the inter-rater reliability of the FAT, Nine-Hole Peg Test (NHPT), Finger Tapping Rate (Lezak, 1983), and Grip Strength (Mathiowetz et al., 1985) in 10 patients with subacute stroke. Participants were assessed twice within a week by two raters. Spearman rho correlation coefficients were excellent (ranging for all four measures from r = 0.75 to 0.99).
Note: In this study, individual values for each measure were not provided. Although it is not possible to discern the exact value for the FAT reliability, all values were considered excellent.

Test-retest:
No studies have examined the test-retest reliability of the FAT in clients with stroke.

Validity

Content:

No studies have examined the content validity of the FAT in clients with stroke.

Criterion:

Concurrent:
No studies have examined the concurrent validity of the FAT in clients with stroke.

Predictive:
No studies have examined the predictive validity of the FAT in clients with stroke.

Construct:

Convergent/Discriminant:
No studies have examined the discriminant validity of the FAT in clients with stroke.

Known Groups:
No studies have examined the known groups validity of the FAT in clients with stroke.

Sensitivity/specificity:
Heller et al. (1987) investigated the specificity of the FAT and the Nine Hole Peg Test (NHPT) in 56 clients with chronic stroke. All of the clients who scored less than 5/5 on the FAT were correctly identified as having impaired dexterity, as determined by the normal cut-off scores for the NHPT. However, 48 percent of patients who scored 5/5 on the FAT scored in the below-normal range on the NHPT. These results indicate that the NHPT is more sensitive than the FAT for detecting impaired upper extremity function in clients with stroke.

Parker, Wade & Hewer (1986) compared the specificity of the FAT and the Nine-Hole Peg Test (NHPT) in 187 clients with sub-acute stroke. Participants who were able to successfully place nine pegs in the pegboard were further categorized according to those who completed the NHPT in less than 19 seconds (n=37) and those who required over 19 seconds (n=69). For the FAT, 114 participants scored 5/5, 33 participants scored in the middle range (1/5 – 4/5) and 36 participants scored 0/5. Researchers concluded that the NHPT is more sensitive than the FAT because 13 percent of participants who scored perfectly on the FAT placed fewer than 9 pegs on the NHPT, and all participants who scored perfectly on the NHPT (9 pegs placed in less than 19 seconds) also scored 5/5 on the FAT.
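
The sensitivity and specificity comparisons above can be made concrete with a 2×2 table. The following is a minimal sketch in Python, assuming the NHPT cut-off is treated as the reference standard and a FAT score below 5/5 as a positive (index) test result. The counts are invented for illustration and are not taken from Heller et al. (1987) or Parker et al. (1986); the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 table counts.

    tp: impaired on the reference test and flagged by the index test
    fp: not impaired on the reference test but flagged by the index test
    fn: impaired on the reference test but missed by the index test
    tn: not impaired on the reference test and not flagged by the index test
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one impaired and one unimpaired client")
    sensitivity = tp / (tp + fn)  # share of impaired clients that are detected
    specificity = tn / (tn + fp)  # share of unimpaired clients correctly cleared
    return sensitivity, specificity


# Hypothetical counts (NOT study data): every client flagged by the index
# test is impaired on the reference test (fp = 0), but 12 of the 25 clients
# with a perfect index-test score are impaired on the reference test.
sens, spec = sensitivity_specificity(tp=20, fp=0, fn=12, tn=13)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

With these made-up counts the index test raises no false alarms yet misses a substantial share of impaired clients, which is the pattern the two studies describe for the FAT relative to the NHPT.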

Responsiveness

No studies have examined the responsiveness of the FAT in clients with stroke.

References

Heller, A., Wade, D.T., Wood, V.A., Sunderland, A., Langton Hewer, R., & Ward, E. (1987). Arm function after stroke: Measurement and recovery over the first three months. Journal of Neurology, Neurosurgery, and Psychiatry, 50, 714-719.
Hsieh, C-L., Hsueh, P., Chiang, F-M., & Lin, P-H. (1998). Inter-rater reliability and validity of the Action Research Arm Test in stroke patients. Age and Ageing, 27, 107-113.
Parker, V.M., Wade, D.T., & Langton Hewer, R. (1986). Loss of arm function after stroke: Measurement, frequency, and recovery. International Rehabilitative Medicine, 8, 69-73.
Wade, D.T., Langton-Hewer, R., Wood, V.A., Skilbeck, C.E., & Ismail, H.M. (1983). The hemiplegic arm after stroke: Measurement and recovery. Journal of Neurology, Neurosurgery and Psychiatry, 46, 521-524.

See the measure

For more information on the FAT, please review the article by Parker, Wade & Langton Hewer (1986).

Jebsen Hand Function Test (JHFT)

Evidence Reviewed as of before: 17-09-2012

Author(s)*: Jennifer Vissers

Editor(s): Annabel McDermott, OT; Nicol Korner-Bitensky, PhD OT

Purpose

The Jebsen Hand Function Test (JHFT) assesses fine motor skills and weighted and non-weighted hand function during performance of activities of daily living.

In-Depth Review

Purpose of the measure

The Jebsen Hand Function Test (JHFT) is a standardized evaluative measure of functional hand motor skills (Hummel et al., 2005).

Available versions

The JHFT was developed in 1969 by Jebsen, Taylor, Trieschmann, Trotter, and Howard (Cook, McCluskey, & Bowman, 2006). The JHFT is also referred to as the Jebsen-Taylor Hand Function Test or the Jebsen-Taylor Test of Hand Function.

A 3-item version (Modified Jebsen Hand Function Test, MJT) was developed by Bovend’Eerdt et al. (2004) to measure gross functional dexterity in patients with moderate unilateral or bilateral upper limb impairment.

An 8-item Australian version was developed by Agnew and Maas (1982). It consists of the original 7 items with the addition of a grip strength item, measured using the Jamar dynamometer (Cook, McCluskey, & Bowman, 2006).

Features of the measure

Items:

The JHFT consists of 7 items that measure: (a) fine motor skills; (b) weighted functional tasks; and (c) non-weighted functional tasks (Jebsen et al., 1969):

Writing a short sentence (24 letters, 3rd grade reading difficulty)
Turning over a 3×5 inch card
Picking up small common objects
Simulated feeding
Stacking checkers
Picking up large light cans
Picking up large heavy cans

Administration guidelines specify that testing begin with the non-dominant hand (Jebsen et al., 1969). Further details about the administration procedures of the JHFT can be found in the original article by Jebsen et al. (1969).

Items of the Modified Jebsen Hand Function Test (MJT) (Bovend’Eerdt et al., 2004):

Turning over 5 cards
Stacking 4 cones
Spooning 5 kidney beans into a bowl (simulated feeding)

Scoring:

Each item is scored according to time taken to complete the task. Times are rounded to the nearest second (Spinal Cord Injury Rehabilitation Evidence, 2010). The scores for all 7 items are then summed for a total score. Jebsen et al. (1969) established norms with a sample of 300 healthy subjects of different age groups (20-29 years, 30-39 years, 40-49 years, 50-59 years, 60-94 years). With the exception of writing, all items took under 10 seconds to perform. See Jebsen et al. (1969) for norms according to age, gender and hand use (dominant/non-dominant).
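
The scoring rule above can be sketched in Python. This is a minimal illustration, assuming item times are recorded in seconds and that half-seconds round up; the item keys, the function name `jhft_total`, and the example times are hypothetical and are not taken from Jebsen et al. (1969).

```python
import math

# Hypothetical short keys for the 7 JHFT items listed earlier.
JHFT_ITEMS = (
    "writing",
    "card_turning",
    "small_common_objects",
    "simulated_feeding",
    "checkers",
    "large_light_cans",
    "large_heavy_cans",
)


def jhft_total(times_seconds):
    """Round each item time to the nearest second and sum all 7 items.

    times_seconds: dict mapping each key in JHFT_ITEMS to a time in seconds.
    Returns (total_seconds, rounded_item_times).
    """
    missing = [item for item in JHFT_ITEMS if item not in times_seconds]
    if missing:
        raise ValueError(f"missing item times: {missing}")
    # floor(t + 0.5) rounds halves up; Python's round() rounds halves to even.
    rounded = {item: math.floor(times_seconds[item] + 0.5) for item in JHFT_ITEMS}
    return sum(rounded.values()), rounded


# Invented times for one hand, for illustration only.
example_times = {
    "writing": 12.4,
    "card_turning": 4.6,
    "small_common_objects": 6.2,
    "simulated_feeding": 7.5,
    "checkers": 3.9,
    "large_light_cans": 3.2,
    "large_heavy_cans": 3.4,
}
total, per_item = jhft_total(example_times)
print(total, per_item)
```

Because the published norms are reported separately by hand use, a total would typically be computed once per hand and then compared against the norms for the matching age group, gender and hand.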

What to consider before beginning:

It is necessary to identify the patient’s dominant hand before beginning the JHFT. When working with patients with stroke it is recommended to take into consideration the area(s) of cortical insult, as damage to areas of the brain responsible for speech and language function may affect performance on the writing task (Celink et al., 2007). Prior to beginning the writing task, individuals should be reminded to use reading glasses if necessary (Jebsen et al., 1969).

Time:

The JHFT requires 15 – 45 minutes to complete.

Training requirements:

No specific training is required.

Equipment:

The JHFT does not require standardized equipment but the following equipment is used (Jebsen et al., 1969):

wooden board (41 1/2 inches long x 11 1/4 inches wide x 3/4 inch thick)
ball point pen
8×11 inch sheets unruled paper
5×8 inch index cards
3×5 inch index cards
1 pound coffee can
1 inch paper clips
teaspoon
5 kidney beans
standard size wooden checkers
5 empty 303 cans
5 full (1 pound) 303 cans.

Test equipment can be collated by the clinician or purchased as pre-packaged assessment kits from suppliers including:

Performance Health (https://www.performancehealth.com/jamar-hand-function-test)
Mobility Smart (https://www.mobilitysmart.co.uk/jebsen-taylor-hand-function-test-kit.html)
Amazon (amazon.com)

Client suitability

Can be used with:

Clients with neurological or musculoskeletal conditions, e.g. stroke, spinal cord injury, arthritis (Cook, McCluskey, & Bowman, 2006).
This assessment has been administered in clients over 8 years of age (Cook, McCluskey, & Bowman, 2006).

Should not be used with:

Individuals with speech and language disorders may have difficulty understanding instructions.
The writing task can be excluded for individuals with speech and language difficulties due to dominant cerebral hemisphere stroke (Beebe & Lang, 2009, 2007; Hummel et al., 2005).

Languages of the measure

English
Portuguese (Ferreiro, dos Santos, & Conforto, 2010)

Summary

What does the tool measure?	Hand function
What types of clients can the tool be used for?	The JHFT can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	15-45 minutes
Versions	JHFT; Modified Jebsen Hand Function Test (MJT); JHFT Australian version; JHFT Portuguese version
Other Languages	English, Portuguese
Measurement Properties
Reliability	Internal consistency: One study reported excellent internal consistency of the JHFT (Portuguese version), and adequate to excellent internal consistency of individual items. Test-retest: One study reported adequate to excellent test-retest reliability of JHFT individual items. One study reported excellent test-retest reliability of the MJT. Intra-rater: One study reported excellent intra-rater reliability of the JHFT (Portuguese version). Inter-rater: One study reported excellent inter-rater reliability of the JHFT (Portuguese version) and individual items.
Validity	Content: No studies have examined the content validity of the JHFT. Criterion: Concurrent: Two studies reported excellent correlation between the JHFT and grip strength, pinch strength, Action Research Arm Test, Nine Hole Peg Test, and Stroke Impact Scale – Hand Domain. One study reported an excellent correlation between the MJT and the Nine Hole Peg Test and an adequate correlation with grip strength. Predictive: No studies have examined the predictive validity of the JHFT. Construct: No studies have examined the construct validity of the JHFT. One study reported no significant difference in scores on the JHFT (Portuguese version) according to education level or hand dominance.
Floor/Ceiling Effects	No studies have examined the floor or ceiling effects of the JHFT.
Sensitivity / Specificity	No studies have reported on the sensitivity or specificity of the JHFT.
Does the tool detect change in patients?	One study reported moderate responsiveness of the JHFT from 1 to 3 months post-stroke, and from 3 to 6 months post-stroke.
Acceptability	The JHFT comprises simple, familiar, and functional tasks. Consideration must be given to individuals with speech and language difficulties, who may have difficulty understanding instructions and performing the writing task.
Feasibility	The JHFT is easy to administer and does not require standardized equipment.
How to obtain the tool?	Information regarding test administration is provided in: Jebsen, R.H., Taylor, N., Trieschmann, R.B., Trotter, M.J., & Howard, L.A. (1969). An objective and standardized test of hand function. Archives of Physical Medicine and Rehabilitation, 50(6), 311 – 319. Assessment kits can be purchased from: Performance Health (https://www.performancehealth.com/jamar-hand-function-test) Mobility Smart (https://www.mobilitysmart.co.uk/jebsen-taylor-hand-function-test-kit.html) Amazon (www.amazon.com)

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the Jebsen Hand Function Test (JHFT). While studies have been conducted with other patient groups, this review specifically addresses the psychometric properties relevant to patients with stroke. At the time of publication, five studies were identified: three relating to the JHFT, and one each for the JHFT (Portuguese version) and the Modified Jebsen Hand Function Test (MJT).

Floor/Ceiling Effects

No studies have examined the floor or ceiling effects of the JHFT.

Reliability

Test-retest:
Jebsen et al. (1969) examined test-retest reliability of the JHFT in a sample of 26 patients with a range of upper limb conditions including hemiparesis from cerebral vascular disease (n=5), using Pearson’s correlation coefficient. Test-retest reliability of individual tasks was adequate to excellent (writing: r=0.67, 0.84; cards: r=0.91, 0.78; small objects: r=0.93, 0.85; simulated feeding: r=0.92, 0.60; checkers: r=0.99, 0.91; large light objects: r=0.89, 0.67; large heavy objects: r=0.89, 0.92; values are for the dominant and non-dominant hands respectively).

Bovend’Eerdt et al. (2004) examined the test-retest reliability of the Modified Jebsen Hand Function Test (MJT) in a sample of 26 individuals with neurological disorders including stroke (n=12), Multiple Sclerosis (n=7), head injury (n=4), and tumours (n=3). The mean time between retesting was 9.6 days. The study reported excellent test-retest reliability of the MJT (r = 0.95), using Pearson’s correlation coefficient.
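
For readers unfamiliar with the statistic used in these test-retest studies, the following is a minimal Python sketch of Pearson's correlation coefficient applied to two testing sessions. The function name `pearson_r` and the session times are hypothetical and invented for illustration; they are not data from Jebsen et al. (1969) or Bovend’Eerdt et al. (2004).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cross = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    ss_x = sum(a * a for a in dx)               # sum of squares for x
    ss_y = sum(b * b for b in dy)               # sum of squares for y
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Invented total times (seconds) for five clients on two occasions.
session_1 = [20.0, 35.0, 41.0, 58.0, 76.0]
session_2 = [22.0, 33.0, 44.0, 55.0, 80.0]
r = pearson_r(session_1, session_2)
print(round(r, 3))
```

Pearson's r reflects how closely the two sessions follow a straight-line relationship. It does not by itself show that scores are identical across sessions: a consistent shift, such as every client being a few seconds faster at retest, would leave r unchanged.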

Intra-rater:
Ferreiro, dos Santos, & Conforto (2010) examined intra-rater reliability of the JHFT (Portuguese version) with a sample of 40 patients with stroke and reported excellent intra-rater reliability (ICC=0.997), using intraclass correlation coefficient (ICC).

Inter-rater:
Ferreiro, dos Santos, & Conforto (2010) examined the inter-rater reliability of the JHFT (Portuguese version) with a sample of 40 patients with stroke using intraclass correlation coefficient (ICC), and reported excellent inter-rater reliability (ICC=1.0). Inter-rater reliability for individual items was also excellent (writing, ICC=0.999; card turning, ICC=0.977; small common objects, ICC=0.998; simulated feeding, ICC=0.991; checkers, ICC=0.995; large light objects, ICC=0.988; large heavy objects, ICC=0.991).

Validity

Content:

Criterion:

Concurrent:
Beebe & Lang (2009) examined the concurrent validity of the JHFT with grip and pinch strength (measured by dynamometer), the Action Research Arm Test (ARAT), Nine Hole Peg Test (NHPT), and the Stroke Impact Scale – Hand domain (SIS-Hand) in a sample of 33 patients with stroke, using Spearman’s correlation. Measures were administered at 1 month, 3 months and 6 months post-stroke. The JHFT demonstrated excellent correlations with grip strength (r=0.79-0.81), pinch strength (r=0.60-0.79), ARAT (r=0.87-0.95), NHPT (r=0.84-0.97) and SIS-Hand (r=0.61-0.83) at all time points.
Note: The study did not use the first task of the JHFT (writing a sentence) due to its dependence on hand dominance and education level.

Beebe & Lang (2007) examined the concurrent validity of the JHFT with grip and pinch strength (measured by dynamometer), Action Research Arm Test (ARAT), Nine Hole Peg Test (NHPT), and Stroke Impact Scale – Hand Function Subscale (SIS-Hand) in a sample of 32 participants with stroke, using Pearson’s product moment correlation. The JHFT demonstrated excellent correlations with ARAT (r=-0.89), grip strength (r=-0.76), pinch strength (r=-0.68), NHPT (r=-0.89), and SIS-Hand (r=-0.82).
Note: The study did not use the first task of the JHFT (writing a sentence) due to its dependence on hand dominance and education level.

Bovend’Eerdt et al. (2004) examined the concurrent validity of the Modified Jebsen Hand Function Test (MJT) with the University of Maryland Arm Questionnaire for Stroke (UMAQS), Nine Hole Peg Test (NHPT), and grip strength (measured by dynamometer) in a sample of 26 individuals with neurological disorders including stroke (n=12), Multiple Sclerosis (n=7), head injury (n=4), and tumours (n=3). Measures were administered on two occasions (T1, T2) on average 9.6 days apart. The MJT showed excellent correlation with the NHPT (r=0.86 and 0.88 on T1 and T2 respectively) and adequate correlation with grip strength (r=0.44, significant on T2 only), using Pearson’s correlation coefficient. Correlations between the MJT and UMAQS were not significant at either time point.

Construct:

No studies have examined the construct validity of the JHFT.

Known Groups:
Ferreiro et al. (2010) reported no significant difference in scores on the JHFT (Portuguese version) according to education level or hand dominance in a sample of 40 patients with stroke.

Responsiveness

Beebe & Lang (2009) measured the responsiveness of the JHFT with a sample of 33 patients with stroke, using the single population effect size method. Measures were taken at 1, 3 and 6 months post-stroke, during which time participants received conventional stroke rehabilitation. The JHFT demonstrated moderate responsiveness from 1 to 3 months post-stroke (ES=0.69) and from 3 to 6 months post-stroke (ES=0.73).
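
For readers unfamiliar with the effect size statistic, the sketch below shows one way such a value could be computed. The review does not give the formula used by Beebe & Lang (2009); the sketch assumes a common definition of the single population effect size, namely mean change between two time points divided by the standard deviation of the scores at the first time point. The function name and the scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def single_population_effect_size(first_scores, second_scores):
    """Mean change divided by the SD of the first-occasion scores.

    Assumed definition (not confirmed by the review): ES = (mean2 - mean1) / SD1.
    Both lists must hold one score per participant, in the same order.
    """
    if len(first_scores) != len(second_scores):
        raise ValueError("both occasions need the same number of participants")
    if len(first_scores) < 2:
        raise ValueError("at least two participants are required")
    sd_first = stdev(first_scores)
    if sd_first == 0:
        raise ValueError("first-occasion scores have no variability")
    return (mean(second_scores) - mean(first_scores)) / sd_first


# Hypothetical scores for five participants at two time points.
month_1 = [10, 20, 30, 40, 50]
month_3 = [20, 30, 40, 50, 60]
es = single_population_effect_size(month_1, month_3)
print(round(es, 2))  # 0.63
```

The sign of the result depends on the direction of the scale; for a timed test such as the JHFT, where lower times indicate better function, improvement would appear as a negative change unless the difference is reversed.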

References

Beebe, J.A. & Lang, C.E. (2007). Relating movement control at 9 upper extremity segments to loss of hand function in people with chronic hemiparesis. Neurorehabilitation and Neural Repair, 21(3), 279 – 291.
Beebe, J.A. & Lang, C.E. (2009). Relationships and responsiveness of six upper extremity function tests during the first six months of recovery after stroke. Journal of Neurologic Physical Therapy, 33(2), 96-103.
Bovend’Eerdt, T.J.H., Dawes, H., Johansen-Berg, H., & Wade, D.T. (2004). Evaluation of the Modified Jebsen Test of Hand Function and the University of Maryland Arm Questionnaire for Stroke. Clinical Rehabilitation, 18, 195-202.
Celnik, P., Hummel, F., Harris-Love, M., Wolk, R., & Cohen, L. (2007). Somatosensory stimulation enhances the effects of training functional hand tasks in patients with chronic stroke. Archives of Physical Medicine and Rehabilitation, 88, 1369-76.
Cook, C., McCluskey, A., & Bowman, J. (2006). Jebsen Test of Hand Function. Penrith South, NSW: University of Western Sydney. Retrieved from http://www.maa.nsw.gov.au/default.aspx?MenuID=376
Duncan, P., Richards, L., Wallace, D., Stoker-Yates, J., Pohl, P., Luchies, C., Ogle, A., & Studenski, S. (1998). A randomized, controlled pilot study of a home-based exercise program for individuals with mild and moderate stroke. Stroke, 29, 2055-2060.
Ferreiro, K.N., dos Santos, R.L., & Conforto, A.B. (2010). Psychometric properties of the Portuguese version of the Jebsen-Taylor test for adults with mild hemiparesis. Revista Brasileira de Fisioterapia (Brazilian Journal of Physiotherapy), 14(5), 377-81.
Jebsen, R.H., Taylor, N., Trieschmann, R.B., Trotter, M.J., & Howard, L.A. (1969). An objective and standardized test of hand function. Archives of Physical Medicine and Rehabilitation, 50(6), 311 – 319.
Hummel, F., Celnik, P., Giraux, P., Floel, A., Wu, W., Gerloff, C., & Cohen, L. (2005). Effects of non-invasive cortical stimulation on skilled motor function in chronic stroke. Brain, 128, 490-9.
Poole, J. (2003). Measures of Adult Hand Function: Arthritis Hand Function Test (AHFT), Grip Ability Test (GAT), Jebsen Test of Hand Function, and The Rheumatoid Hand Functional Disability Scale (The Duruöz Hand Index [DHI]). Arthritis and Rheumatism (Arthritis Care and Research), 49(5S), S59-66.
Spinal Cord Injury Rehabilitation Evidence. (2010). Jebsen Hand Function Test. Retrieved from http://www.scireproject.com/outcome-measures/jebsen-hand-function-test
Wu, C., Seo, H., & Cohen, L. (2006). Influence of electric somatosensory stimulation on paretic-hand function in chronic stroke. Archives of Physical Medicine and Rehabilitation, 87, 351-7.

See the measure

How to obtain the JHFT?

Administration instructions are published in Jebsen, R.H., Taylor, N., Trieschmann, R.B., Trotter, M.J., & Howard, L.A. (1969). An objective and standardized test of hand function. Archives of Physical Medicine and Rehabilitation, 50(6), 311 – 319.

While the JHFT does not require standardized equipment, assessment kits can be purchased from:

Performance Health (https://www.performancehealth.com/jamar-hand-function-test)
Mobility Smart (https://www.mobilitysmart.co.uk/jebsen-taylor-hand-function-test-kit.html)
Amazon (amazon.com)

Leeds Adult Spasticity Impact Scale (LASIS)

Evidence Reviewed as of before: 13-06-2012

Author(s)*: Annabel McDermott, OT

Editor(s): Nicol Korner-Bitensky, PhD OT

Purpose

The Leeds Adult Spasticity Impact Scale (LASIS) is a measure of passive arm function, suitable for patients with spasticity and little or no active movement of the upper extremity.

In-Depth Review

Purpose of the measure

The Leeds Adult Spasticity Impact Scale (LASIS) is a measure of passive arm function that is administered by semi-structured interview to the patient or carer. It consists of 12 items of low difficulty that evaluate performance of daily functional tasks in the individual’s normal environment. The LASIS is useful for patients with minimal or no active movement or function but with self-care issues of the upper extremity (Ashford et al., 2008).

Available versions

The LASIS was originally published as the Patient Disability and Carer Burden Scale by Bhakta et al. (1996), which included 8 patient items and 4 carer items (Bhakta et al., 2000). The four carer items have been excluded from the current version of the LASIS.

Features of the measure

Items:

The LASIS consists of 12 items that measure passive and low-level active function.

Passive function items:

Cleaning the palm (affected hand)*
Cutting fingernails (affected hand)*
Cleaning the affected elbow*
Cleaning the affected armpit*
Cleaning the unaffected elbow*
Putting arm through coat sleeve*
Difficulty putting on a glove
Difficulty rolling over in bed
Doing physiotherapy exercises to arm*

Active function items:

Difficulty balancing in standing*
Difficulty balancing when walking*
Hold object steady, use other hand (jar)

* Items originally included in the Patient Disability and Carer Burden Rating Scale (Bhakta et al., 2000).

Scoring:

Items are rated between 0 – 4 according to the following criteria:

0 = No difficulty
1 = Little difficulty
2 = Moderate difficulty
3 = A great deal of difficulty
4 = Inability to perform the activity

The total score is calculated as the sum of individual item scores divided by the number of questions answered. This results in a total score between 0 – 4 that represents disability or carer burden (Ashford et al., 2008).

Note: As the final score does not rely on responses to all 12 items, it may not reflect actual disability or function in the arm (Ashford et al., 2008).
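
The calculation described above can be sketched as follows. This is a minimal illustration of the arithmetic only, not an official scoring tool; the function name, the use of None to mark an unanswered item, and the example ratings are assumptions made for the example.

```python
from typing import Optional, Sequence

VALID_RATINGS = {0, 1, 2, 3, 4}


def lasis_total_score(ratings: Sequence[Optional[int]]) -> Optional[float]:
    """Sum of item ratings divided by the number of items answered.

    None marks an unanswered item (an assumption of this sketch).
    Returns None when no item was answered.
    """
    answered = [r for r in ratings if r is not None]
    for r in answered:
        if r not in VALID_RATINGS:
            raise ValueError(f"rating must be an integer from 0 to 4, got {r!r}")
    if not answered:
        return None
    return sum(answered) / len(answered)


# Hypothetical responses to the 12 items; two items were not answered.
example = [2, 3, None, 1, 0, 4, None, 2, 1, 3, 2, 2]
print(lasis_total_score(example))  # 2.0
```

Because unanswered items are dropped from the denominator, two patients with the same total score may have been rated on different sets of items, which is the limitation raised in the note above.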

Description of tasks:

The LASIS is administered through semi-structured interview with the patient or carer, with regard to the patient’s performance of tasks over the past 7 days.

Time:

The LASIS takes approximately 10 minutes to administer (Ashford et al., 2008).

Training requirements:

The LASIS should be administered by a clinician (Ashford et al., 2008).

Equipment:

Equipment such as a jar may be required to validate responses.

Alternative form of the Leeds Adult Spasticity Impact Scale (LASIS)

None reported.

Client suitability

Can be used with:

Patients with spasticity, including patients with stroke.

Should not be used with:

None reported.

Languages of the measure

No translations reported.

Summary

What does the tool measure?	Passive and low-level active function of the upper limb.
What types of clients can the tool be used for?	Patients with upper limb spasticity, including patients who have experienced a stroke.
Is this a screening or assessment tool?	Assessment tool
Time to administer	10 minutes
Versions	The LASIS was originally published as the Patient Disability and Carer Burden Scale, which included four dressing and grooming items that have been excluded from the current version of the LASIS.
Other Languages	None reported
Measurement Properties
Reliability	Internal consistency: No studies have reported on the internal consistency of the LASIS. Test-retest: No studies have reported on the test-retest reliability of the LASIS. Intra-rater: No studies have reported on the intra-rater reliability of the LASIS. Inter-rater: No studies have reported on the inter-rater reliability of the LASIS.
Validity	Content: No studies have reported on the content validity of the LASIS. Criterion: Concurrent: No studies have reported on the concurrent validity of the LASIS. Predictive: No studies have reported on the predictive validity of the LASIS. Construct: Convergent/Discriminant: No studies have reported on the convergent/discriminant validity of the LASIS. Known Groups: No studies have reported on the known-groups validity of the LASIS.
Floor/Ceiling Effects	No studies have reported on the floor or ceiling effects of the LASIS.
Does the tool detect change in patients?	No studies have reported on the sensitivity of the LASIS in patients with stroke.
Acceptability	The LASIS is useful for patients with minimal or no active movement or function of the upper extremity.
Feasibility	Administrative burden due to calculation of total score, but not complex.
How to obtain the tool?	Further information can be found here.

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the Leeds Adult Spasticity Impact Scale (LASIS). At the time of publication no studies have reported on the psychometric properties of the LASIS in the stroke population.

Floor/Ceiling Effects

While no studies have investigated the floor or ceiling effects of the LASIS when used with a stroke population, it can be anticipated that ceiling effects may exist when the LASIS is used with high-functioning patients, due to the hierarchical relationship of items (Ashford et al., 2008).

Reliability

Validity

Content:

No studies have reported on the content validity of the LASIS.

Criterion:

Predictive:
No studies have reported on the predictive validity of the LASIS.

Construct:

Convergent/Discriminant:
No studies have reported on the convergent/discriminant validity of the LASIS.

Known Group:
No studies have reported on the known-groups validity of the LASIS.

Responsiveness

No studies have reported on the responsiveness of the LASIS.

Sensitivity/Specificity:
No studies have reported on the sensitivity or the specificity of the LASIS.

References

Ashford, S., Slade, M., Malaprade, F., & Turner-Stokes, L. (2008). Evaluation of functional outcome measures for the hemiparetic upper limb: A systematic review. Journal of Rehabilitation Medicine, 40, 787-95.
Bhakta, B.B., Cozens, J.A., Chamberlain, M.A., & Bamford, J.M. (2000). Impact of botulinum toxin type A on disability and carer burden due to arm spasticity after stroke: a randomised double blind placebo controlled trial. Journal of Neurology, Neurosurgery and Psychiatry, 69, 217-21.

See the measure

How to obtain the LASIS?

Further information can be found here.

Motor Activity Log (MAL)

Evidence Reviewed as of before: 28-03-2019

Author(s)*: Annabel McDermott, OT

Content consistency: Gabriel Plumier

Purpose

The Motor Activity Log (MAL) is a subjective measure of an individual’s real life functional upper limb performance. The MAL is administered by semi-structured interview to determine (a) how much, and (b) how well the individual uses his/her upper limb in his/her own home (Ashford et al., 2008; Li et al., 2012; Simpson & Eng, 2013).

In-Depth Review

Purpose of the measure

The Motor Activity Log (MAL) was developed by Taub et al. (1993) as a subjective outcome measure of an individual’s real life functional upper limb performance. The MAL is administered by semi-structured interview to determine (a) how much (Amount of Use – AOU), and (b) how well (Quality of Movement – QOM) the individual uses his/her upper limb in his/her own home (Ashford et al., 2008; Li et al., 2012; Simpson & Eng, 2013).

Available versions

There are four versions of the original MAL-30, according to number of items.

MAL-14: Contains unilateral and simple items, to detect change in individuals with limited arm function.
MAL-26: Contains the same items as the MAL-14 as well as 11 additional items and 1 optional item chosen by the patient; this version includes some bilateral tasks.
MAL-28: Contains the same items as the MAL-14 and MAL-26, and additional items that challenge reach and strength.
MAL-12: A short version of the MAL-28 (Ashford et al., 2008).

Other adaptations of the MAL include:

Graded Motor Activity Log (Morera Silva et al., 2018)
Lower-Functioning Motor Activity Log (LF-MAL)
Lower-Extremity Motor Activity Log
Pediatric Motor Activity Log – Revised

Features of the measure

The MAL is comprised of two scales:

Amount of Use (AOU) scale – the amount the individual uses the paretic arm; and
Quality of Movement (QOM) scale – the patient’s perceived quality of movement while performing the functional activity (Ashford et al., 2008).

The MAL-QOM scale captures components of amount of arm use and has been shown to be more reliable than the MAL-AOU scale, and as such can be used independently (Uswatte & Taub, 2005).

Items:

Items of the original MAL-30:

Turn on a light with a light switch
Open drawer
Remove an item from a drawer
Pick up phone
Wipe off a kitchen counter or other surface
Get out of a car
Open refrigerator
Open a door by turning a door knob/handle
Use a TV remote control
Wash your hands
Turning water on/off with knob/lever on faucet
Dry your hands
Put on your socks
Take off your socks
Put on your shoes
Take off your shoes
Get up from a chair with armrests
Pull chair away from table before sitting down
Pull chair toward table after sitting down
Pick up a glass, bottle, drinking cup, or can
Brush your teeth
Put on makeup base, lotion, or shaving cream on face
Use a key to unlock a door
Write on paper
Carry an object in your hand
Use a fork or a spoon for eating
Comb your hair
Pick up a cup by a handle
Button a shirt
Eat half a sandwich or finger foods

Additional items for the MAL-45:

Removing bills from a wallet
Taking individual coins out of a pocket or purse
Removing keys out of a pocket or purse
Using a zipper pull
Pouring liquid from a bottle
Buckling a belt
Popping top of beverage can
Removing top from a medicine bottle
Keypad press
Use of keyboard/computer
Putting on or taking off watch band
Putting on glasses
Pumping a soap dispenser
Swiping a credit card or a card for an ATM
Adjusting a home or hotel air conditioner or heat

Items of the MAL-12:

Pick up phone
Open a door by turning a door knob
Eat half a sandwich or finger food
Turn water on/off with faucet
Pick up a glass
Pick up toothbrush and brush teeth
Use a key to open a door
Letter writing/typing
Use removable computer storage
Pick up fork or spoon, use for eating
Pick up cup by handle
Carry an object from place to place

Items of the MAL-14:

Putting arm through coat sleeve
Steady myself while standing
Carry an object from place to place
Pick up fork or spoon, use for eating
Comb hair
Pick up cup by handle
Hand craft/card playing
Hold a book for reading
Use towel to dry face or other body part
Pick up a glass
Pick up toothbrush and brush teeth
Shaving/makeup
Use a key to open a door
Letter writing/typing

The MAL-26 includes the 14 items from the MAL-14 as well as the following items:

Pour coffee/tea
Peel fruit/potatoes
Dial number on the phone
Open/close a window
Open an envelope
Take money out of a wallet or purse
Undo buttons on clothing
Buttons on clothing
Undo a zip
Do up a zip
Cut fingernails (affected hand)
Other optional activity

Items of the MAL-28:

Turn on a light with a light switch
Open a drawer
Remove item of clothing from drawer
Pick up phone
Wipe kitchen counter
Get out of car
Open refrigerator
Open a door by turning a door knob
Use a TV remote control
Wash your hands
Turn water on/off with faucet
Dry your hands
Put on your socks
Take off your socks
Put on your shoes
Take off your shoes
Get up from chair with armrests
Pull chair away from table before sitting
Pull chair toward table after sitting
Pick up a glass
Pick up toothbrush and brush teeth
Use a key to unlock a door
Steady self while standing
Carry an object from place to place
Comb hair
Pick up cup by handle
Buttons on clothing (shirt, trousers)
Eat half a sandwich or finger food

For each item, the individual is asked whether he/she attempted the activity in the past 7 days, and the relevant score is assigned according to his/her response. The examiner can verify the response by paraphrasing it back to the individual (Uswatte & Taub, 2005). The MAL can also be used with caregivers.

Scoring:

The MAL is administered by semi-structured interview and items are scored by patients according to their performance of each task over the past 7 days; the MAL-28 can also be used to score performance over the past 3 days (Ashford et al., 2008; Uswatte & Taub, 2005).

The MAL adopts a 6-point ordinal scale, although patients can attribute a half-score, resulting in 11-point Likert scales with specified anchoring definitions at 6 points (Uswatte & Taub, 2005):

Amount of Use scale scoring:

0: Never – The weaker arm was not used at all for that activity.
1: Very rarely – Occasionally used the weaker arm, but only very rarely.
2: Rarely – Sometimes used the weaker arm but did the activity most of the time with the stronger arm.
3: Half pre-stroke – Used the weaker arm about half as much as before the stroke.
4: Three quarters pre-stroke – Used the weaker arm almost as much as before the stroke.
5: Same – Used the weaker arm as often as before the stroke.

Quality of Movement scale scoring:

0: Never – The weaker arm was not used at all for that activity.
1: Very poor – The weaker arm was moved during the activity but was not very helpful.
2: Poor – The weaker arm was of some use during the activity but needed some help from the stronger arm, or moved very slowly or with difficulty.
3: Fair – The weaker arm was used for that activity, but the movements were slow or were made only with some effort.
4: Almost normal – The movements made by the weaker arm for the activity were almost normal but not quite as fast or accurate as normal.
5: Normal – The ability to use the weaker arm for that activity was as good as before the stroke.

Scale total scores (summary scores) are the mean of the item scores.
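
As a minimal sketch of the scoring rule above (not taken from the MAL manual), the summary score for one scale can be computed as the mean of the item ratings. The function name `mal_scale_score` and the choice to skip items marked `None` (treated here as not applicable) are assumptions made for illustration only.

```python
from statistics import mean

# Ratings run from 0 to 5 in half-point steps (11 possible values).
VALID_SCORES = {x / 2 for x in range(11)}


def mal_scale_score(item_scores):
    """Return the mean of the rated items for one scale (AOU or QOM).

    Items given as None are skipped; this handling of unrated items
    is an assumption of this sketch.
    """
    rated = [s for s in item_scores if s is not None]
    if not rated:
        raise ValueError("no rated items")
    for s in rated:
        if s not in VALID_SCORES:
            raise ValueError(f"invalid rating: {s}")
    return mean(rated)


# Hypothetical ratings for four items on one scale.
print(mal_scale_score([3, 2.5, 4, 0]))
```

With these hypothetical ratings (3, 2.5, 4 and 0), the summary score is 2.375.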

What to consider before beginning:

The MAL is subject to experimenter bias and depends on the patient’s ability to accurately recall upper limb use (Page & Levine, 2003; Uswatte & Taub, 2005).

Ashford et al. (2008) noted an inadequate relationship between overall/item scores and their qualitative meaning, and an unclear minimal clinically important difference.

Taub & Uswatte (2000) discuss the use of the MAL as an outcome measure in Constraint-Induced Movement Therapy (CIMT) research and recommend an upper cut-off score of 2.5 on the MAL-AOU, as the effect of stroke can impose an upper physiological limit on the amount of improvement that can be produced. The authors also note that individuals who score > 2.5 do not demonstrate learned non-use, which CIMT aims to overcome.

Time:

All versions of the MAL are administered through structured interview with the patient and/or carer and require more than 10 minutes to administer (Ashford et al., 2008).

Training requirements:

The MAL can be administered by health professionals who have reviewed the manual and literature.

Equipment:

Survey instrument and pencil.

Client suitability

Can be used with:

The MAL is suitable for use with adults and elderly adults following stroke and their caregivers. It is suitable for use in the subacute and chronic stages of stroke recovery.

Should not be used in:

Not specified.

The MAL is often used to measure outcomes following constraint-induced movement therapy (Li et al., 2012; Page, 2003). The MAL is commonly used in research in conjunction with the Wolf Motor Function Test, Fugl-Meyer Assessment or the Action Research Arm Test (Santisteban et al., 2016; Simpson & Eng, 2013).

In what languages is the measure available?

Brazilian-Portuguese (Saliba et al., 2011)
English
German (Khan et al., 2013)
Portuguese (Pereira et al., 2011)
Turkish translation and cultural adaptation (Cakar et al., 2010)

Summary

What does the tool measure?	Real life upper limb performance.
What types of clients can the tool be used for?	Individuals following stroke and their caregivers.
Is this a screening or assessment tool?	Assessment
What domain of the ICF does this measure?	Activity/participation
Time to administer	20 minutes
Versions	MAL-30; MAL-28; MAL-26; MAL-14; MAL-12; Graded Motor Activity Log; Lower-Functioning Motor Activity Log (LF-MAL); Lower-Extremity Motor Activity Log; Pediatric Motor Activity Log – Revised
Other Languages	Brazilian-Portuguese, English, German, Portuguese, Turkish.
Measurement Properties
Reliability	Internal consistency: – MAL-14: Two studies reported excellent internal consistency. – MAL: One study reported excellent internal consistency; one study reported excellent internal consistency among patients with mild-moderate hemiparesis and adequate to excellent internal consistency among patients with severe hemiparesis. – MAL-28 (Turkish): One study reported excellent internal consistency. – MAL-30 (German): One study reported excellent internal consistency. – Grade 4/5 MAL: One study reported excellent internal consistency. Test-retest: – MAL-14: One study reported excellent test-retest reliability; one study reported adequate to excellent test-retest reliability. – MAL: One study reported excellent test-retest reliability; one study reported adequate to excellent test-retest reliability. – MAL-28 (Turkish): One study reported excellent test-retest reliability. – MAL-28 (Brazilian): One study reported excellent test-retest reliability. – MAL-45: One study reported excellent test-retest reliability. – Grade 4/5 MAL: One study reported excellent test-retest reliability. Intra-rater: No studies have reported on the intra-rater reliability of the MAL. Inter-rater: – MAL-14: One study reported adequate inter-rater reliability.
Validity	Content: No studies have reported on content validity of the MAL. Criterion: Concurrent: – MAL-14: One study reported excellent correlations with accelerometry. – MAL: Three studies reported an excellent correlation with SIS – Hand function domain; adequate correlations with the BBT, ARAT, FAI; poor to adequate correlations with SIS, SS-QOL, NEADL; and poor correlations with the Nine Hole Peg Test. – MAL-30 (German): One study reported excellent negative correlations with WMFT-PT; excellent correlations with WMFT-FA and grip strength scores, CMSA – Arm and Hand scores, isometric strength. – MAL-45: One study reported excellent correlations with the ABILHAND. Predictive: No studies have reported on predictive validity of the MAL. Construct: – MAL-14: One study reported excellent correlations between QOM and AOU patient/carer change scores; one study reported an excellent correlation between AOU and QOM scales. – MAL: One study reported an excellent correlation between AOU and QOM scales; one study reported an adequate correlation between AOU and QOM scales; one study conducted item analysis and removed two items due to low item-total correlations and reliability coefficients; one study conducted item fit analysis and principal component analysis. – MAL (Brazilian): One study reported an excellent correlation between AOU and QOM scales. – MAL-30 (German): One study reported excellent correlations between AOU and QOM scales. – MAL-28 (Turkish): One study reported an excellent correlation between AOU and QOM scales. – LF-MAL: One study reported an adequate correlation between the AOU and QOM scales. Convergent/Discriminant: – MAL-14: Three studies reported excellent correlations with ARAT, accelerometry, Simple Test for Evaluating Hand Function (STEF). – MAL: Seven studies reported excellent correlations with Actual Amount of Use Test, WMFT; adequate to excellent correlations with accelerometry ratios, SIS 2.0 – Hand function scale, FMA-UE; adequate correlations with ARAT, Motor Assessment Scale – Upper Extremity, 16 Hole Peg Test, grip strength, SF-36 – Physical domain; poor to adequate correlations with accelerometry ratios of the less affected arm; poor correlations with the SIS 2.0 – Mobility scale. – MAL-28 (Turkish): One study reported excellent correlations with WMFT-FA; adequate negative correlations with the WMFT-PT. – MAL (Brazilian): One study reported adequate correlations with grip strength of the more affected arm. Known Group: – MAL: One study reported that correlations with accelerometry were stronger among patients with paresis of the dominant arm vs. the non-dominant arm.
Floor/Ceiling Effects	– Floor effects are evident when detecting change in lower level and passive functional tasks. – One study found modest floor effects when the MAL-28 was administered to patients with upper extremity motor recovery at Brunnstrom stage III and higher; and modest floor effects when the LF-MAL was administered to patients with upper extremity motor recovery at Brunnstrom stage III and lower.
Does the tool detect change in patients?	The MAL can be used to detect change
Acceptability	The MAL reflects real life functional performance. It is simple and non-invasive to administer.
Feasibility	The MAL is a free tool that requires no additional equipment. It can be administered in the clinical setting or the patient’s home. No additional training is required.
How to obtain the tool?	Click here to see the Motor Activity Log manual.

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the MAL. Twenty-six studies were identified, most of which included patients in the chronic phase of stroke recovery. This review includes different versions of the MAL – the original MAL-30, MAL-28, MAL-14, MAL-45, LF-MAL, Grade 4/5 MAL and Turkish, Brazilian and German versions.

Floor/Ceiling Effects

Chuang et al. (2017) examined floor/ceiling effects of the 30-item MAL in a sample of 403 patients with chronic stroke. The MAL was administered to patients with motor recovery of the proximal and distal upper limb at Brunnstrom stage III and higher. Results showed modest floor effects within this cohort, whereby 17.3% of participants received minimum scores on the MAL.

Chuang et al. (2017) examined floor/ceiling effects of the LF-MAL in a sample of 134 patients with chronic stroke. The LF-MAL was administered to patients with motor recovery of the proximal and distal upper limb at Brunnstrom stage III and lower. Results showed modest floor effects within this cohort, whereby 16.4% of participants received minimum scores on the LF-MAL.
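
The percentages reported above are the share of participants who obtained the minimum possible score. A minimal sketch of that calculation is given below; the function name `floor_ceiling_percentages` and the sample scores are hypothetical and are not data from Chuang et al. (2017).

```python
def floor_ceiling_percentages(scores, min_score=0.0, max_score=5.0):
    """Return (floor %, ceiling %): the share of participants at the
    minimum and at the maximum possible score."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores provided")
    at_floor = sum(1 for s in scores if s == min_score)
    at_ceiling = sum(1 for s in scores if s == max_score)
    return 100 * at_floor / n, 100 * at_ceiling / n


# Hypothetical summary scores for eight participants.
sample = [0, 0, 1.5, 3, 5, 2, 0.5, 4]
floor_pct, ceiling_pct = floor_ceiling_percentages(sample)
print(f"floor: {floor_pct:.1f}%, ceiling: {ceiling_pct:.1f}%")
```

In this made-up sample, two of eight participants score the minimum (25.0%) and one scores the maximum (12.5%).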

Reliability

Uswatte et al. (2005b) examined internal consistency of the MAL-14 in a sample of 41 patients with chronic stroke and their caregivers, using Cronbach’s alpha. Correlation among items was excellent for patients’ MAL-QOM (α = 0.87) and caregivers’ MAL-AOU and MAL-QOM (α > 0.83). The authors also examined internal consistency of the MAL-14 (QOM scale only) in a sample of 27 patients with chronic stroke. Correlation among items was excellent for the MAL-QOM (α = 0.81).
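
For readers unfamiliar with the statistic, the sketch below shows one standard way to compute Cronbach’s alpha from item-level ratings: k/(k-1) × (1 − sum of item variances / variance of total scores). It is an illustration under stated assumptions, not the analysis code used in the studies reviewed here; the function name `cronbach_alpha` and the sample ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item ratings.

    Uses sample variances (n - 1 denominator) and assumes complete data
    with at least two items and two respondents.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*rows))  # one tuple of ratings per item
    sum_item_variances = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical ratings: four respondents, two items.
ratings = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(round(cronbach_alpha(ratings), 2))
```

For the hypothetical ratings shown, the two item variances sum to 10/3 and the variance of the total scores is 16/3, giving an alpha of 0.75.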

Uswatte et al. (2006b) examined internal consistency of the MAL-28 in a sample of 222 patients with subacute/chronic stroke and their caregivers, using Cronbach’s alpha. Responses from both patient and caregiver groups showed excellent correlation among items for the MAL-AOU (patients α = 0.94; caregivers α = 0.95) and the MAL-QOM (patients α = 0.94; caregivers α = 0.95).

Huseyinsinoglu et al. (2011) examined internal consistency of the MAL-28 (Turkish version) in a sample of 30 patients with stroke, using Cronbach’s alpha. Internal consistency was excellent for the MAL-AOU (α = 0.96) and MAL-QOM (α = 0.96).

Khan et al. (2013) examined internal consistency of the MAL-30 (German version) in a sample of 42 patients with acute to chronic stroke, using Cronbach’s alpha. Measures were taken at baseline, discharge from rehabilitation and at 6-month follow-up. Internal consistency for the MAL-AOU and MAL-QOM was excellent at all timepoints (α = 0.98-0.995). The authors also calculated internal consistency based on an elimination procedure of items that scored “N/A” down to 26 items and reported that internal consistency remained high at all timepoints (α = 0.94-0.98).

Taub et al. (2013) reported on internal consistency of the Grade 4/5 MAL, referencing unpublished data from Morris (2009) that used a sample of 30 individuals with stroke, using Cronbach’s alpha. Internal consistency for the Grade 4/5 MAL was excellent (α = 0.95).

Chuang et al. (2017) examined the 6-point rating system of the MAL and found that raters had difficulty discriminating among the 6 levels of functional ability. Results showed that 15 items of the MAL-AOU and MAL-QOM displayed disordering of step difficulty. Accordingly, the 6 levels were collapsed into 4 levels to restore the reversed thresholds (0 = 0; 1-2 = 1; 3-4 = 2; 5 = 3). Using the 4-point system, 9 items still showed disordering, so the levels were further collapsed into 2 categories (0 = 0; 1 to 3 = 1), at which point all items exhibited ordering. The authors examined unidimensionality of the 30-item MAL in a sample of 403 patients with chronic stroke, using the revised scoring system. Item fit analysis of the MAL revealed that 7 items* of the MAL-AOU and MAL-QOM were a poor fit and were removed. Principal component analysis (PCA) of the remaining 23 items showed that Rasch measures accounted for 76% of the variance for both the MAL-AOU and MAL-QOM, with an eigenvalue of the first residual factor of 2.7. This indicates that the 23 items constitute unidimensional constructs. The authors examined reliability of the revised MAL (23 items, 4-point rating system), using Rasch analysis. With person separation values of 2.4 and 2.6 for the MAL-AOU and MAL-QOM respectively, the revised version was sufficiently sensitive to distinguish among 3 strata of upper limb performance. Person reliability
coefficients were 0.85 and 0.87 (respectively), suggesting good reliabilityReliability can be defined in a variety of ways. It is generally understood to be the extent to which a measure is stable or consistent and produces similar results when administered repeatedly. A more technical definition of reliability is that it is the proportion of "true" variation in scores derived from a particular measure. The total variation in any given score may be thought of as consisting of true variation (the variation of interest) and error variation (which includes random error as well as systematic error). True variation is that variation which actually reflects differences in the construct under study, e.g., the actual severity of neurological impairment. Random error refers to "noise" in the scores due to chance factors, e.g., a loud noise distracts a patient thus affecting his performance, which, in turn, affects the score. Systematic error refers to bias that influences scores in a specific direction in a fairly consistent way, e.g., one neurologist in a group tends to rate all patients as being more disabled than do other neurologists in the group. There are many variations on the measurement of reliability including alternate-forms, internal consistency , inter-rater agreement , intra-rater agreement , and test-retest .
. Results showed no Differential Item Functioning (DIF) items across age, gender or hand dominance. Item difficulty hierarchy was consistent with clinical expectation, however items were more difficult than individuals’ ability, suggesting unsuitable targeting for the participants of this sample.

* Misfit items: (6) Get out of car; (12) Dry your hands; (18) Pull a chair away from the table before sitting down; (19) Pull chair toward table after sitting down; (21) Brush your teeth; (24) Write on paper; (29) Button a shirt.
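The separation values and reliability coefficients reported above are linked by standard Rasch relationships. The sketch below assumes the conventional formulas (reliability R = G² / (1 + G²) and strata H = (4G + 1) / 3); it is not taken from Chuang et al. (2017), and the function names are hypothetical. The strata value is left unrounded because the study does not state how it rounded.

```python
def separation_to_reliability(g: float) -> float:
    """Reliability implied by a Rasch separation index G: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


def separation_to_strata(g: float) -> float:
    """Statistically distinct strata implied by G: H = (4G + 1) / 3 (unrounded)."""
    return (4.0 * g + 1.0) / 3.0


if __name__ == "__main__":
    # Separation values as reported in the text for the revised MAL.
    for scale, g in (("MAL-AOU", 2.4), ("MAL-QOM", 2.6)):
        reliability = round(separation_to_reliability(g), 2)
        strata = round(separation_to_strata(g), 2)
        print(scale, reliability, strata)
```

Under these assumptions, separation values of 2.4 and 2.6 correspond to reliability coefficients of about 0.85 and 0.87, in line with the values reported.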

Chuang et al. (2017) examined the 6-point rating system of the LF-MAL and found disordered thresholds; accordingly, the 6 levels were collapsed into 3 levels to restore the reversed thresholds (0 = 0; 1-3 = 1; 4-5 = 2); this 3-point rating system achieved step ordering. The authors examined unidimensionality of the LF-MAL in a sample of 134 patients with chronic stroke, using the revised 3-point scoring system. Item fit analysis of the LF-MAL-AOU revealed that 6 items* were out of the acceptable range; PCA of the remaining 24 items showed that the Rasch dimension explained 70.5% of the variance, with an eigenvalue of the first residual factor of 2.6. Item fit analysis of the LF-MAL-QOM revealed that 7 items* were out of the acceptable range; PCA of the remaining 23 items showed that the Rasch dimension explained 71.0% of the variance, with an eigenvalue of the first residual factor of 2.5. The authors examined reliability of the revised LF-MAL (25 items, 3-point rating system), using Rasch analysis. With person separation values of 1.9 for both the LF-MAL-AOU and LF-MAL-QOM, the revised version was sensitive enough to distinguish 2 strata of upper limb performance. Person reliability coefficients were 0.79 for both the LF-MAL-AOU and LF-MAL-QOM, indicating acceptable reliability. Results showed no DIF items across age, gender or hand dominance. Item difficulty hierarchy was consistent with clinical expectation; however, items were more difficult than individuals’ ability, suggesting unsuitable targeting for the participants of this sample.

* Misfit items: (5) Wipe off a kitchen counter or another surface; (6) Get out of a car; (7) Open a refrigerator; (19) Apply soap to your body while bathing (LF-MAL-QOM only); (21) Brush your teeth; (23) Steady yourself while standing; (24) Carry an object in your hand.
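The category collapsing described for the MAL and LF-MAL amounts to a simple recoding of each rating. The following is a minimal sketch that assumes whole-number ratings from 0 to 5; the mapping tables mirror the schemes quoted in the text, while the names `MAL_6_TO_4`, `LF_MAL_6_TO_3` and `collapse_ratings` are hypothetical.

```python
# Recoding tables based on the collapsing schemes described in the text.
# Assumption: ratings are whole numbers from 0 to 5.
MAL_6_TO_4 = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3}      # 0 = 0; 1-2 = 1; 3-4 = 2; 5 = 3
LF_MAL_6_TO_3 = {0: 0, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2}   # 0 = 0; 1-3 = 1; 4-5 = 2


def collapse_ratings(ratings, mapping):
    """Recode each original rating to its collapsed category."""
    return [mapping[r] for r in ratings]


if __name__ == "__main__":
    original = [0, 1, 2, 3, 4, 5]
    print(collapse_ratings(original, MAL_6_TO_4))
    print(collapse_ratings(original, LF_MAL_6_TO_3))
```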

Moreira Silva et al. (2018) examined internal consistency of the MAL-30 in a sample of 66 individuals with chronic stroke, using Cronbach’s alpha. Participants were classified according to upper extremity motor function using the Fugl-Meyer Assessment – Upper Extremity (FMA-UE): mild to moderate hemiparesis (FMA-UE ≥ 32, n = 49) or severe hemiparesis (FMA-UE ≤ 31, n = 17). Internal consistency of the MAL-AOU and MAL-QOM was excellent among participants with mild-moderate hemiparesis (α = 0.95), and adequate to excellent among participants with severe hemiparesis (MAL-AOU: α = 0.79; MAL-QOM: α = 0.89). Rasch analysis was used to further evaluate reliability of the MAL-30. Item calibration of the MAL-AOU and MAL-QOM revealed one misfit item (#19: Pull a chair toward table after sitting down). The item separation index of the MAL-AOU and MAL-QOM was 2.92 and 2.59 (respectively), suggesting 5 levels of difficulty for the MAL-AOU and 4 levels of difficulty for the MAL-QOM. The person separation index of the MAL-AOU and MAL-QOM was 2.62 and 2.58 (respectively), suggesting 4 ability levels for both the MAL-AOU and the MAL-QOM.
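Cronbach’s alpha, the internal consistency statistic used above, can be computed from item-level scores. The sketch below uses the standard formula α = k / (k − 1) × (1 − Σ item variances / variance of total scores); the function name and the example ratings are hypothetical and are not data from Moreira Silva et al. (2018).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per participant, one column per item)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 participants")
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each participant's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical ratings: 4 participants x 3 items.
    example = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 4, 5]]
    print(round(cronbach_alpha(example), 2))
```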

Test-retest:
Miltner et al. (1999) examined test-retest reliability of the MAL in a sample of 15 patients with chronic stroke. Measures were taken within a 2-week interval before participants began constraint-induced movement therapy. Test-retest reliability was excellent (r = 0.98).

Johnson et al. (2003) examined test-retest reliability of the MAL-45 in a sample of 12 patients with chronic stroke, using Pearson’s correlation coefficient. Measures were taken within a 3-week interval. Test-retest reliability was excellent for the MAL-AOU (r = 0.96) and MAL-QOM (r = 0.99).
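A test-retest Pearson correlation such as the one reported above is computed from paired scores on the two occasions. This is a minimal sketch of the product-moment formula; the function name and the scores in the usage example are hypothetical, not study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between paired test and retest scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical mean MAL scores for five participants on two occasions.
    test = [1.2, 2.5, 3.1, 0.8, 4.0]
    retest = [1.4, 2.4, 3.3, 1.0, 3.9]
    print(round(pearson_r(test, retest), 2))
```

Note that a high correlation shows that participants keep their relative ranking across occasions; it does not detect a systematic shift in scores between test and retest.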

van der Lee et al. (2004) examined test-retest reliability of the MAL-14 in a sample of 56 patients with chronic stroke, using the Bland and Altman method. Measures were taken within a 2-week interval before participants commenced an intervention program. Test-retest reliability was excellent for the MAL-AOU (r = 0.70 to 0.85) and the MAL-QOM (r = 0.61 to 0.71).
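The Bland and Altman method mentioned above summarizes agreement as the mean test-retest difference (bias) and its 95% limits of agreement. The sketch below assumes the conventional limits of bias ± 1.96 × SD of the differences; the function name and the example scores are hypothetical, not data from van der Lee et al. (2004).

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) for paired test and retest scores."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("scores must be paired and contain at least 2 pairs")
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)          # mean retest-minus-test difference
    sd = stdev(diffs)           # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical mean MAL scores for five participants on two occasions.
    test = [1.0, 2.0, 3.0, 2.5, 4.0]
    retest = [1.5, 2.0, 3.5, 2.0, 4.5]
    print(bland_altman(test, retest))
```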

Uswatte et al. (2005b) examined test-retest reliability of the MAL-14 in a sample of 41 patients with chronic stroke and their caregivers, using Pearson correlation coefficients. Test-retest reliability was excellent for patient MAL-QOM scores (r = 0.91) and adequate for patient MAL-AOU scores (r = 0.44) and for caregiver MAL-AOU and MAL-QOM scores (r = 0.61 and r = 0.50, respectively).

Uswatte et al. (2006b) examined 2-week test-retest reliability of the MAL-30 in a sample of 116 patients with subacute/chronic stroke and their caregivers, using intraclass correlation coefficients (ICC). Test-retest reliability for the MAL-AOU and MAL-QOM was excellent among patients (ICC = 0.79, ICC = 0.82, respectively), and adequate among caregivers (ICC = 0.66, ICC = 0.72, respectively). There was a trend toward an increase from test 1 to test 2 among both patients and caregivers (patient MAL-AOU: 0.3 ± 0.6, p = 0.04; patient MAL-QOM: 0.3 ± 0.5, p = 0.02; caregiver MAL-AOU: 0.4 ± 0.7, p = 0.05; caregiver MAL-QOM: 0.4 ± 0.7, p = 0.02), although increases were less than the minimal clinically important difference (< 0.5 points).
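An intraclass correlation coefficient for test-retest data can be derived from a two-way analysis of variance. The text does not state which ICC form was used, so the sketch below assumes the consistency form ICC(3,1) = (MSR − MSE) / (MSR + (k − 1) × MSE), where MSR is the between-participant mean square and MSE the residual mean square. The function name and example scores are hypothetical.

```python
def icc_3_1(ratings):
    """ICC(3,1) for rows of participants and columns of occasions (or raters)."""
    n = len(ratings)        # participants
    k = len(ratings[0])     # occasions or raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 participants and 2 occasions")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Partition the total sum of squares into participants, occasions and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


if __name__ == "__main__":
    # Hypothetical mean MAL scores: each row is one participant (test, retest).
    scores = [[1.0, 1.5], [2.0, 2.0], [3.0, 3.5], [2.5, 2.0], [4.0, 4.5]]
    print(round(icc_3_1(scores), 2))
```

The consistency form ignores a uniform shift between occasions; an absolute-agreement form would penalize the kind of test 1 to test 2 increase described above.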

Huseyinsinoglu et al. (2011) examined 3-day test-retest reliability of the MAL-28 (Turkish version) in a sample of 30 patients with stroke, using intraclass correlation coefficients (ICC) and Spearman correlation coefficients. Test-retest reliability was excellent for the MAL-AOU (ICC = 0.97, r = 0.94) and the MAL-QOM (ICC = 0.96, r = 0.93).

Saliba et al. (2011) examined test-retest reliability of the MAL (Brazilian version), using intra-class correlation coefficients (ICC). Test-retest reliability for the MAL-AOU and MAL-QOM was excellent (ICC = 0.98).

Taub et al. (2013) reported on test-retest reliability of the Grade 4/5 MAL, referencing unpublished data from Morris (2009) that used a sample of 10 individuals with stroke. Test-retest reliability for the Grade 4/5 MAL was excellent (r = 0.95).

Inter-rater:
Uswatte et al. (2005b) examined inter-rater reliability of the MAL-14 in a sample of 41 patients with chronic stroke and their caregivers, using intraclass correlation coefficients (ICC). Participants received Constraint-Induced Movement Therapy (CIMT) or time-matched general fitness rehabilitation for two weeks. Reliability between patient and carer pre-treatment scores was adequate (ICC = 0.52, p < 0.01); reliability between patient and carer change scores following treatment was adequate (ICC = 0.7, p < 0.0001).

Validity

Content:

No studies have reported on content validity of the MAL.

Criterion:

Concurrent:
Johnson et al. (2003) examined concurrent validity of the MAL-45 in a sample of 12 patients with chronic stroke by comparison with the ABILHAND, using Pearson correlation coefficients. Correlations with the ABILHAND were excellent for the MAL-AOU (r = 0.71, p < 0.05) and MAL-QOM (r = 0.88, p < 0.05).

Uswatte et al. (2005b) examined concurrent validity of the MAL-14 (QOM scale only) in a sample of 27 patients with chronic stroke by comparison with accelerometry of the affected arm, using Pearson correlation coefficients. Correlations between the MAL-QOM and accelerometer recordings at pre-treatment were excellent (r = 0.70, p < 0.05). Correlations between MAL-QOM change scores from pre-treatment to post-treatment and corresponding change scores on accelerometer readings were also excellent (r = 0.91, p < 0.01).

Lin et al. (2010b) examined concurrent validity of the MAL-30 by comparison with the Stroke Impact Scale 3.0 (SIS) and the Stroke-Specific Quality of Life Scale (SS-QOL), using Spearman rank correlation coefficients. Patients with chronic stroke (n = 74) were randomized to receive distributed constraint-induced movement therapy, bilateral arm training or neurodevelopmental therapy, and measures were taken at baseline and post-treatment (3 weeks). There were significant poor to adequate correlations between the MAL-AOU and most SIS domains at baseline (r = 0.24-0.58) and post-treatment (r = 0.24-0.59). There were significant excellent correlations between the MAL-QOM and the SIS – Hand function domain at baseline (r = 0.65) and post-treatment (r = 0.68), and significant poor to adequate correlations between the MAL-QOM and most other SIS domains at baseline (r = 0.26-0.52) and post-treatment (r = 0.28-0.51). There were significant correlations between the MAL-AOU and some SS-QOL domains at baseline (r = 0.25-0.37) and post-treatment (r = 0.24-0.35), and between the MAL-QOM and some SS-QOL domains at baseline (r = 0.28-0.38) and post-treatment (r = 0.26-0.39).

Wu et al. (2011) examined concurrent validity of the MAL-30 in a sample of 77 patients with chronic stroke by comparison with a modified version of the Nottingham Extended ADL Scale (NEADL) and the Frenchay Activities Index (FAI), using Spearman rank correlation coefficients. Measures were taken at pre-treatment and 3 weeks later at post-treatment. Correlations with the NEADL were poor to adequate (MAL-AOU: r = 0.3; MAL-QOM: r = 0.2-0.3). Correlations with the FAI were adequate (MAL-AOU: r = 0.3-0.4; MAL-QOM: r = 0.3).

Khan et al. (2013) examined cross-sectional concurrent validity of the MAL-30 (German version) by comparison with the Wolf Motor Function Test (WMFT) – Time and Functional ability subtests, the Chedoke McMaster Stroke Assessment (CMSA) – Arm and Hand subtests, the grip strength scale, and isometric strength measured by handheld dynamometer (mean of shoulder and elbow flexion and extension), using Spearman’s rank correlation coefficients. Patients with acute to chronic stroke (n = 42) received inpatient rehabilitation and measures were taken at baseline, discharge from hospital and 6-month follow-up. Significant negative correlations were seen with the WMFT – Time scores (MAL-AOU r = -0.747 – -0.878; MAL-QOM r = -0.770 – -0.901). Correlations were excellent at all time points with the WMFT – Functional ability (MAL-AOU r = 0.769 – 0.808; MAL-QOM r = 0.789 – 0.837), the CMSA – Arm (MAL-AOU r = 0.680 – 0.765; MAL-QOM r = 0.691 – 0.798) and CMSA – Hand (MAL-AOU r = 0.692 – 0.801; MAL-QOM r = 0.717 – 0.803), grip strength (MAL-AOU r = 0.698 – 0.716; MAL-QOM r = 0.659 – 0.733) and isometric strength (MAL-AOU r = 0.643 – 0.719; MAL-QOM r = 0.714 – 0.726).

Predictive:
No studies have examined predictive validity of the MAL.

Construct:

Uswatte et al. (2006b) conducted item analysis of the original MAL-30 using item-total correlations, reliability and proportion of missing data (with an a priori cut-off of 20%) in a sample of 222 patients with subacute/chronic stroke and their caregivers. Of the 30 items, 25 items were completed by > 80% of caregivers and 28 items were completed by > 80% of patients; analysis of these 28 items indicated item-total correlations > 0.5 for 92% of items, and reliability coefficients > 0.5 for 89% of items. The remaining 2 items (write on paper: 48% missing data; put makeup/shaving cream on face: 20% missing data) showed lower item-total correlations and reliability coefficients and were dropped accordingly.

van der Lee et al. (2004) examined construct validity of the MAL-14 in a sample of 56 patients with chronic stroke, using Spearman’s correlation coefficient. There was an excellent correlation between the MAL-AOU and MAL-QOM (r = 0.95, p < 0.001).
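
For readers unfamiliar with the statistic, the following is a minimal sketch of how a Spearman rank correlation between two subscale scores can be computed. The scores are hypothetical values invented for illustration and are not data from any study cited here; the function names (ranks, pearson, spearman) are likewise illustrative and do not represent the method used by the authors.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical mean item scores for six patients (invented for illustration).
aou_scores = [0.5, 1.2, 2.0, 2.8, 3.5, 4.1]
qom_scores = [0.7, 1.0, 2.5, 2.3, 3.9, 4.0]

print(round(spearman(aou_scores, qom_scores), 3))
```

Because the statistic works on ranks rather than raw values, it suits ordinal rating scales such as the MAL, where the distance between adjacent scale points is not assumed to be equal.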

Uswatte et al. (2005b) examined construct validity of the MAL-14 (QOM scale only) in a sample of 27 patients with chronic stroke by comparison with patient/caregiver MAL-AOU scores, using Pearson correlation coefficients. Correlations were excellent between MAL-QOM change scores from pre-treatment to post-treatment and corresponding change scores in patient MAL-AOU (r = 0.80, p < 0.01), carer MAL-AOU (r = 0.73, p < 0.01) and carer MAL-QOM (r = 0.70, p < 0.01).

Uswatte et al. (2006a) examined construct validity of the MAL-30 in a sample of 169 individuals with subacute/chronic stroke, using Pearson correlation coefficient. There was an excellent correlation between the MAL-AOU and MAL-QOM (r = 0.92, p < 0.001).

Huseyinsinoglu et al. (2011) examined construct validity of the MAL-28 (Turkish version) in a sample of 30 patients with stroke, using Spearman’s correlation coefficient. The correlation between the MAL-AOU and the MAL-QOM was excellent (r = 0.95).

Saliba et al. (2011) examined construct validity of the MAL (Brazilian version) in a sample of 77 individuals with chronic stroke, using Rasch analysis. There was an excellent correlation between the MAL-AOU and the MAL-QOM (r = 0.97, p < 0.0001).

Khan et al. (2013) examined construct validity of the MAL-30 (German version), using Spearman’s rank correlation coefficients. Patients with acute to chronic stroke (n = 42) received inpatient rehabilitation and measures were taken at baseline, discharge from hospital and at 6-month follow-up. There was an excellent correlation between the MAL-AOU and MAL-QOM at all timepoints (r = 0.994, 0.982, 0.980).

Chuang et al. (2017) examined construct validity of the MAL-30 in a sample of 403 patients with chronic stroke with motor recovery of the proximal and distal upper limb at Brunnstrom stage III and higher, using Rasch analysis. Correlation between the MAL-AOU and MAL-QOM was adequate (r = 0.603), indicating that the subscales are not highly correlated and can be perceived as different concepts.

Chuang et al. (2017) examined construct validity of the LF-MAL in a sample of 134 patients with chronic stroke with motor recovery of the proximal and distal upper limb at Brunnstrom stage III and lower, using Rasch analysis. Correlation between the LF-MAL-AOU and LF-MAL-QOM was adequate (r = 0.607), indicating that the subscales are not highly correlated and can be perceived as different concepts.

Convergent/Discriminant:
van der Lee et al. (2004) examined cross-sectional convergent validity of the MAL-14 by comparison with the Action Research Arm Test (ARAT) in a sample of 56 patients with chronic stroke, using Spearman’s correlation coefficient. There were excellent correlations between the MAL-AOU and the ARAT (r = 0.63, p < 0.001) and between the MAL-QOM and the ARAT (r = 0.63, p < 0.001).
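
Several of the studies in this section report Spearman's rank correlation. As a minimal sketch of how such a coefficient can be computed, the Python below ranks each set of scores (tied scores share their average rank) and then takes the Pearson correlation of the ranks. The function names and the score lists are hypothetical and are not taken from any cited study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (spread_x * spread_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical paired scores for five patients (illustration only).
mal_aou = [0.5, 1.2, 2.0, 3.1, 4.0]
arat = [10, 8, 30, 25, 50]
rho = spearman_rho(mal_aou, arat)
print(round(rho, 2))
```

Because only the ordering of scores matters, this coefficient suits ordinal rating scales such as the MAL, whereas Pearson's coefficient assumes a linear relationship between interval-level scores.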

Uswatte et al. (2005a) examined convergent validity of the MAL-14 in a sample of 20 patients with chronic stroke by comparison with accelerometry of the affected arm, using Spearman rank correlations. There was an excellent correlation between the MAL-14 and accelerometry (r = 0.74, p < 0.001).

Uswatte et al. (2006a) examined convergent validity of the MAL-30 (QOM scale only) in a sample of 169 patients with subacute/chronic stroke by comparison with accelerometry of the affected arm and the Actual Amount of Use Test (AAUT), using Pearson correlation coefficients. Correlations between the MAL-QOM and accelerometry ratios (ratio summary variable, impaired arm summary variable) were adequate (r = 0.52, r = 0.41 respectively, p < 0.001). The correlation between the MAL-QOM and AAUT was excellent (r = 0.94, p < 0.001).

Uswatte et al. (2006b) examined convergent validity of the MAL-30 in a sample of 222 patients with subacute/chronic stroke and their caregivers by comparison with accelerometry of the affected arm, and the SIS 2.0 – Hand function scale, using Pearson correlation coefficients. Comparison of the MAL with accelerometry ratios showed adequate to excellent correlations for patient scores (MAL-AOU: r = 0.47; MAL-QOM: r = 0.52, p < 0.01), and adequate correlations for caregiver scores (MAL-AOU: r = 0.57; MAL-QOM: r = 0.61, p < 0.01). Comparison of the MAL and SIS – Hand function scores showed excellent correlations for patient scores (MAL-AOU: r = 0.68; MAL-QOM: r = 0.72, p < 0.01), and adequate correlations for caregiver scores (MAL-AOU: r = 0.35; MAL-QOM: r = 0.40, p < 0.01).

Uswatte et al. (2006b) examined divergent validity of the MAL-30 in a sample of 222 patients with subacute/chronic stroke and their caregivers by comparison with accelerometry of the less affected arm, and the SIS 2.0 – Mobility scale, using Pearson correlation coefficients. Comparison of the MAL with accelerometry ratios of the less affected arm showed poor correlations for patient scores (MAL-AOU: r = 0.14; MAL-QOM: r = 0.14, p > 0.05), and poor to adequate correlations for caregiver scores (MAL-AOU: r = 0.25; MAL-QOM: r = 0.23, p < 0.001). Comparison of the MAL and SIS – Mobility scores showed poor correlations for patient scores (MAL-AOU: r = 0.14; MAL-QOM: r = 0.14, p > 0.05), and poor correlations for caregiver scores (MAL-AOU: r = 0.10; MAL-QOM: r = 0.07, p > 0.05).

Hammer and Lindmark (2010) examined cross-sectional convergent validity of the MAL-30 by comparison with the FMA-UE, ARAT, Motor Assessment Scale – Upper Extremity score (MAS-UE), 16-hole peg test (16HPT) and the Grippit ratio of isometric grip strength, using Spearman’s correlation coefficient. Patients with subacute stroke (n = 30) were randomized to receive forced use therapy or standard upper limb rehabilitation, and measures were taken at baseline, post-treatment (2 weeks) and follow-up (3 months). Correlations were significant and adequate with all measures: FMA-UE (r = 0.43 to 0.52); ARAT (r = 0.31 to 0.51); MAS-UE (r = 0.41 to 0.54); 16HPT (r = -0.41 to -0.67); Grippit (r = 0.41 to 0.53).

Huseyinsinoglu et al. (2011) examined convergent validity of the MAL-28 (Turkish version) by comparison with the WMFT – Performance Time (WMFT-PT) and – Functional Ability (WMFT-FA) scores in a sample of 30 patients with stroke. There were excellent correlations with the WMFT-FA (MAL-AOU: r = 0.63; MAL-QOM: r = 0.63), and adequate negative correlations with the WMFT-PT (MAL-AOU: r = -0.56; MAL-QOM: r = -0.55).

Saliba et al. (2011) examined convergent validity of the MAL (Brazilian version) by comparison with grip strength of the more severely affected upper limb in a sample of 77 individuals with chronic stroke, using Rasch analysis. There were adequate correlations between grip strength and the MAL-AOU (r = 0.51, p < 0.0001) and the MAL-QOM (r = 0.57, p < 0.0001).

Sterr et al. (2014) examined divergent validity of the MAL in a sample of 65 patients with chronic stroke by comparison with the Short Form 36 (SF-36), Stroke Impact Scale (SIS), Hospital Anxiety and Depression Scale (HADS) and Visual Analog Mood Score (VAMS), using regression analysis. Participants received four different Constraint-Induced Movement Therapy (CIMT) treatment protocols that differed in intensity and use of a constraint. Following treatment there was a significant positive association between the MAL-AOU and the SF-36 Physical domain (r = 0.38, p = 0.025) and a trend towards a moderate association with the SIS Total score (r = 0.43, p = 0.061).

Shindo et al. (2015) examined convergent validity of the MAL-14 in a sample of 34 patients with acute/subacute stroke by comparison with the Simple Test for Evaluating Hand Function (STEF), using Spearman’s correlation coefficient. There was a significant and excellent correlation between the assessments (MAL-AOU: r = 0.805; MAL-QOM: r = 0.768).

Simpson, Conroy & Beaver (2015) examined convergent validity of the MAL-28 in a sample of 9 patients with stroke, by comparison with the FMA, Wolf Motor Function Test and Stroke Impact Scale, using Spearman’s correlation coefficient. There were excellent correlations between baseline MAL-AOU and FMA (ρ = 0.6889, p < 0.0132) and MAL-QOM and FMA (ρ = 0.7276, p < 0.0073).

Moreira Silva et al. (2018) examined convergent validity of the MAL-30 in a sample of 66 individuals with chronic stroke by comparison with the FMA-UE, using Spearman’s correlation coefficient. There was a significant and excellent correlation with the FMA-UE (MAL-AOU: r = 0.87; MAL-QOM: r = 0.87).

Chen et al. (2018) examined convergent validity of the MAL in a sample of 82 patients with stroke by comparison with accelerometry of the affected arm, using Pearson’s correlation coefficient. There was an adequate correlation with accelerometry (MAL-AOU: r = 0.47; MAL-QOM: r = 0.57).

Known Group:
Uswatte et al. (2006b) examined known-group validity of the MAL in a sample of 222 patients with subacute/chronic stroke and their caregivers. Correlations between the MAL and accelerometry ratio were stronger among patients with paresis of their dominant arm (MAL-AOU: r = 0.56; MAL-QOM: r = 0.59) than among patients with paresis of the non-dominant arm (MAL-AOU: r = 0.28; MAL-QOM: r = 0.34).

Responsiveness

Taub et al. (1993) reported on effect sizes (ES) of the MAL in a sample of 9 patients with chronic stroke. Participants received two weeks of upper extremity restraint and measures were taken at baseline, post-treatment and follow-up (1 month, 2 years). Effect sizes were large from baseline to 1-month follow-up (2.80) and from baseline to 2-year follow-up (2.95).

Kunkel et al. (1999) reported on ES of the MAL in a sample of 5 patients with chronic stroke. Participants received two weeks of Constraint-Induced Movement Therapy (CIMT) and measures were taken at baseline, post-treatment and follow-up (3 months). Effect sizes were large from baseline to post-treatment (MAL-AOU: 9.57; MAL-QOM: 3.24), and from baseline to 3-month follow-up (MAL-AOU: 7.59; MAL-QOM: 1.99).

Taub et al. (1999) reported on ES of the MAL in a sample of patients with stroke who received CIMT and reported a large effect size for lower-functioning individuals (n = 11, d = 4.0) and higher-functioning individuals (n = 40, d = 3.3). The ES was larger for lower-functioning patients due to lower variability in scores from baseline to post-treatment.

Miltner et al. (1999) reported on ES of the MAL in a sample of 15 patients with chronic stroke. Participants received two weeks of CIMT and measures were taken at baseline, post-treatment and follow-up (4 weeks and 6 months). Effect sizes were large from first contact to post-treatment (MAL-AOU: 2.07; MAL-QOM: 1.33), from first contact to 4 weeks post-treatment (MAL-AOU: 2.98; MAL-QOM: 1.70), and from first contact to 6-month follow-up (MAL-AOU: 2.68; MAL-QOM: 2.14).

van der Lee et al. (1999) reported on ES of the MAL in a sample of 66 patients with chronic stroke. Participants were randomly assigned to receive either forced use therapy or bimanual training based on neurodevelopmental techniques for two weeks. A 25-item modified version of the MAL was used. There were no significant between-group differences in MAL-QOM scores following treatment. There was a significant difference in MAL-AOU scores, in favour of forced use therapy. The mean difference in gain was 0.52 points (95% CI, 0.11-0.93). Improvements exceeded the Minimal Clinically Important Difference of 0.50 within both groups. The treatment effect was clinically relevant for patients with hemineglect.

van der Lee et al. (2004) examined responsiveness and longitudinal construct validity of the MAL-14 in a sample of 56 patients with chronic stroke who were randomized to receive CIMT or bimanual training for a 2-week intervention period. Responsiveness was measured by responsiveness ratios (RR). Results showed adequate responsiveness for the MAL-AOU and MAL-QOM (RR = 1.9, 2.0 respectively). Longitudinal validity was measured by comparing MAL change scores with Action Research Arm Test (ARAT) change scores and a global change rating (GCR), using Spearman’s correlation coefficient. Correlations between change scores were weak and not significant (MAL-AOU vs. ARAT: r = 0.16, p = 0.23; MAL-QOM vs. ARAT: r = 0.16, p = 0.25; MAL-AOU vs. GCR: r = 0.20, p = 0.15; MAL-QOM vs. GCR: r = 0.22, p = 0.10).

Uswatte et al. (2005b) examined responsiveness of the MAL-14 in a sample of 41 patients with chronic stroke who received CIMT or time-matched general fitness rehabilitation, and their caregivers. Responsiveness was measured by responsiveness ratios (RR). Results showed high responsiveness for patient scores (MAL-AOU: 3.2; MAL-QOM: 4.5) and caregiver scores (MAL-AOU: 4.3; MAL-QOM: 3.0).

Uswatte et al. (2005b) examined responsiveness of the MAL-14 in a sample of 27 patients with chronic stroke who received an automated form of constraint-induced movement therapy (AutoCITE) or general fitness rehabilitation. Responsiveness was measured by responsiveness ratios; results showed high responsiveness for the MAL-AOU and MAL-QOM (RR = 3.8, 5.0, respectively).

Hammer and Lindmark (2010) examined responsiveness and longitudinal construct validity of the MAL-30 in a sample of 30 patients with subacute stroke who were randomized to receive forced use therapy or standard upper extremity rehabilitation. Responsiveness was measured according to effect size (ES), standardized response means (SRM) and responsiveness ratios (RR) from baseline to post-treatment (2 weeks), and from baseline to follow-up (3 months). Effect sizes for the MAL-AOU and MAL-QOM were moderate to large from baseline to post-treatment (MAL-AOU: 0.51; MAL-QOM: 0.54) and from baseline to follow-up (MAL-AOU: 1.02; MAL-QOM: 1.17), indicating sensitivity to change. Standardized response means were large from baseline to post-treatment (MAL-AOU: 1.28; MAL-QOM: 1.03), and from baseline to follow-up (MAL-AOU: 1.14; MAL-QOM: 1.19). The greater SRM compared to ES reflects smaller variability in change scores than in baseline scores. Responsiveness ratios were large from baseline to post-treatment (MAL-AOU: 1.22; MAL-QOM: 1.23) and from baseline to follow-up (MAL-AOU: 2.44; MAL-QOM: 2.69). Longitudinal construct validity was measured by comparison with the FMA-UE, ARAT, Motor Assessment Scale – Upper Extremity score (MAS-UE), 16-hole peg test (16HPT) and the Grippit ratio of isometric grip strength, using Spearman’s correlation coefficient. Correlations were significant and adequate with the MAS-UE from baseline to follow-up (MAL-AOU r = 0.53, MAL-QOM r = 0.47), and with the FMA-UE from baseline to post-treatment (MAL-AOU r = 0.44, MAL-QOM r = 0.67) and from baseline to follow-up (MAL-AOU r = 0.39, MAL-QOM r = 0.43).
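The three responsiveness indices named in this section differ only in the denominator used to standardize the mean change. The following is a minimal sketch in Python, assuming the conventional definitions (ES: mean change divided by the standard deviation of baseline scores; SRM: mean change divided by the standard deviation of change scores; RR: mean change divided by the standard deviation of change scores in a stable comparison group). The cited studies may define the indices slightly differently, and all function names and scores below are hypothetical illustrations, not study data.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-patient change: follow-up score minus baseline score."""
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def responsiveness_ratio(changes_treated, changes_stable):
    """RR (assumed definition): mean change in treated patients divided by
    the SD of change in patients assumed to be clinically stable."""
    return mean(changes_treated) / stdev(changes_stable)


# Hypothetical MAL scores on the 0-5 scale (not taken from any cited study).
baseline = [1.0, 2.0, 3.0, 4.0]
follow_up = [2.0, 3.5, 4.0, 5.5]
stable_changes = [0.0, 0.5, -0.5, 0.0]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
rr = responsiveness_ratio(change_scores(baseline, follow_up), stable_changes)
print(f"ES = {es:.2f}, SRM = {srm:.2f}, RR = {rr:.2f}")
```

In this hypothetical data set the change scores vary less than the baseline scores, so the SRM is larger than the ES, which is the same pattern described for the MAL above.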

Khan & Oesch (2013) examined responsiveness of the German MAL-30 in a sample of 42 patients with acute to chronic stroke, using the standardized response mean (SRM). Participants were stratified into two groups according to level of arm and hand function using the Chedoke McMaster Stroke Assessment (CMSA). Measures were taken at baseline, discharge from rehabilitation and 6-month follow-up. Change scores from the lower-function group (CMSA arm and hand score ≤ 6) revealed high responsiveness of the MAL-AOU and MAL-QOM from baseline to discharge (SRM = 0.93, 0.94 respectively) and from baseline to follow-up (SRM = 0.95, 0.98 respectively), but poor responsiveness from discharge to follow-up (SRM = 0.20, 0.42 respectively). Change scores from the high-function group (CMSA arm and hand score > 6) showed high responsiveness of the MAL-AOU and MAL-QOM from baseline to discharge (SRM = 1.43, 1.31 respectively) and from baseline to follow-up (SRM = 1.34, 1.33 respectively), but poor responsiveness from discharge to follow-up (SRM = 0.22, 0.24 respectively). The authors concluded that the MAL is a responsive measure when the intervention period is included in the measured time interval.

Simpson & Eng (2013) conducted a literature review of upper limb assessments commonly used in stroke rehabilitation, including the MAL. In studies that measured outcomes following CIMT, the observed change (i.e. patients’ perceptions of change, effect size) was 1.6-6.2 times larger than measures of functional change such as the ARAT or WMFT. Similarly, assessments which measure perceived function in the individual’s environment require larger percentage changes than laboratory-based performance measures to surpass the measurement error. Minimal Detectable Change for the MAL-AOU ranged from 72.5% to 86.7% (90% and 95% confidence levels).

Taub et al. (2013) reported on effect size (ES) of the Lower Functioning MAL (LF-MAL) in a sample of 6 individuals with chronic stroke who used orthotics/splints and adaptive equipment outside the laboratory over 6 sessions (Phase A), then received mCIMT + neurodevelopmental therapy for 15 consecutive weekdays with continued use of assistive devices (Phase B). Effect sizes were calculated from (i) baseline to pre-mCIMT; (ii) pre-mCIMT to post-mCIMT; and (iii) baseline to post-mCIMT, and were large at all timepoints (ES = 2.6, 2.1, 3.0, respectively, p < 0.002).

Sterr et al. (2014) reported on treatment effect in a sample of 65 patients with chronic stroke. Participants received one of four CIMT treatment protocols that differed in intensity and use of a constraint. Whole-group analysis showed a significant and large treatment effect from baseline to post-treatment (MAL-AOU: d = 1.19; MAL-QOM: d = 1.38); the treatment effect from post-treatment to 6-month follow-up was small but significant for the MAL-AOU only (d = 0.4). Treatment effect was not significant at 12-month follow-up. There was a significant positive association between training intensity and improvement in MAL-AOU scores.

Sensitivity & Specificity:
Chen et al. (2012) examined minimal detectable change (MDC) of the MAL. This study used data from the EXCITE trial, in which 222 patients with subacute/chronic stroke were randomized to receive constraint induced movement therapy (CIMT) for 2 weeks (n = 106) or no treatment (n = 116). MDC at the 90% confidence level was calculated from pre-post test data from the control group. The MDC of the MAL-AOU was 16.8% (standard error of measurement, SEM, 7.2%), indicating that a change in amount of use of the affected upper limb greater than 16.8% is required to be 90% certain that the change is not due to measurement error. The MDC (90% confidence) for the MAL-QOM was 15.4% (SEM 6.6%), indicating higher sensitivity than the MAL-AOU scale. After treatment, the CIMT group showed an 84.6% increase in MAL-AOU scores and a 72.2% increase in MAL-QOM scores. Both MAL scores exceeded the MDC and were sensitive to change in the context of this intervention.
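The relationship between the SEM and MDC values reported above can be checked with a short calculation. The sketch below assumes the conventional formula MDC = z × √2 × SEM with z = 1.65 for 90% confidence, and assumes that SEM denotes the standard error of measurement; neither the formula nor the z value is stated in this review, but together they reproduce the reported figures. The function names are hypothetical.

```python
import math


def minimal_detectable_change(sem, z=1.65):
    """MDC = z * sqrt(2) * SEM.

    z = 1.65 is the value commonly used for 90% confidence (an assumption
    here); sqrt(2) accounts for measurement error at both time points.
    """
    return z * math.sqrt(2) * sem


def exceeds_mdc(observed_change, sem, z=1.65):
    """True if an observed change is larger than the MDC."""
    return observed_change > minimal_detectable_change(sem, z)


# SEM values (percent) as reported above for the EXCITE control group.
mdc_aou = minimal_detectable_change(7.2)  # MAL-AOU
mdc_qom = minimal_detectable_change(6.6)  # MAL-QOM
print(f"MDC90 MAL-AOU = {mdc_aou:.1f}%, MAL-QOM = {mdc_qom:.1f}%")

# Post-treatment increases reported for the CIMT group.
print(exceeds_mdc(84.6, 7.2), exceeds_mdc(72.2, 6.6))
```

Under these assumptions the computed values round to the 16.8% and 15.4% reported for the MAL-AOU and MAL-QOM, and both post-treatment increases exceed the corresponding MDC.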

Simpson, Conroy & Bever (2015) examined sensitivity of the MAL-28 in a sample of 9 patients with stroke, by comparison with the Fugl-Meyer Assessment, the Wolf Motor Function Test (WMFT) and the Stroke Impact Scale (SIS). Measures were taken at baseline, post-treatment and follow-up, and correlations were analysed using Spearman’s correlation coefficient. Changes in MAL-AOU scores were sensitive to changes in SIS physical domain scores (ρ = 0.7342, p < 0.0243). Changes in MAL-QOM scores were sensitive to changes in WMFT Functional Ability scores (ρ = 0.6245, p < 0.0722).

References

Ashford, S., Slade, M., Malaprade, F., & Turner-Stokes, L. (2008). Evaluation of functional outcome measures for the hemiparetic upper limb: a systematic review. Journal of Rehabilitation Medicine, 40, 787-95.
DOI: 10.2340/16501977-0276
Cakar, E., Dincer, U., Zeki, M., Kilac, H., Tongur, N., & Taub, E. (2010). Turkish adaptation of Motor Activity Log-28. Turkish Journal of Physical Medicine and Rehabilitation, 56, 1-5.
http://www.ftrdergisi.com/eng/arsiv.asp
Chen, H.L., Lin, K.C., Hsieh, Y.W., Wu, C.Y., Liing, R.J., & Chen, C.L. (2018). A study of predictive validity, responsiveness, and minimal clinically important difference of arm accelerometer in real-world activity of patients with chronic stroke. Clinical Rehabilitation, 32(1), 75-83.
DOI: 10.1177/0269215517712042
Chen, S., Wolf, S.L., Zhang, Q., Thompson, P.A., & Winstein, C.J. (2012). Minimal detectable change of the Actual Amount of Use Test and the Motor Activity Log: the EXCITE trial. Neurorehabilitation and Neural Repair, 26(5), 507-14.
DOI: 10.1177/1545968311425048
Chuang, I.-C., Lin, K.-C., Wu, C.-Y., Hsieh, Y.-W., Liu, C.-T., & Chen, C.-L. (2017). Using Rasch analysis to validate the Motor Activity Log and the Lower Functioning Motor Activity Log in patients with stroke. Physical Therapy, 97(10), 1030-40.
DOI: 10.1093/pjpzs071
Dettmers, C., Teske, U., Hamzei, F., Uswatte, G., Taub, E., & Weiller, C. (2005). Distributed form of constraint-induced movement therapy improves functional outcome and quality of life after stroke. Archives of Physical Medicine and Rehabilitation, 86, 204-9.
DOI: 10.1016/j.apmr.2004.05.007
Hammer, A.M. & Lindmark, B. (2010). Responsiveness and validity of the Motor Activity Log in patients during the subacute phase after stroke. Disability and Rehabilitation, 32(14), 1184-93.
DOI: 10.3109/09638280903437253
Huseyinsinoglu, B.E., Ozdincler, A.R., Ogul, O.E., & Krespi, Y. (2011). Reliability and validity of Turkish version of Motor Activity Log-28. Turkish Journal of Neurology, 17(2), 83-9.
Johnson, A., Judkins, L., Morris, D.M., Uswatte, G., & Taub, E. (2003). The validity and reliability of the 45-item Upper Extremity Motor Activity Log. Journal of Neurologic Physical Therapy, 27(4), 172.
Khan, C.M. & Oesch, P. (2013). Validity and responsiveness of the German version of the Motor Activity Log for the assessment of self-perceived arm use in hemiplegia after stroke. NeuroRehabilitation, 33, 413-21.
DOI: 10.3233/NRE-130972
Kunkel, A., Kopp, B., Muller, G., Villringer, K., Villringer, A., Taub, E., & Flor, H. (1999). Constraint-induced movement therapy for motor recovery in chronic stroke patients. Archives of Physical Medicine & Rehabilitation, 80, 624-8.
PMID: 10378486
Li, K.-Y., Lin, K.-C., Wang, T.-N., Wu, C.-Y., Huang, Y.-H., & Ouyang, P. (2012). Ability of three motor measures to predict functional outcomes reported by stroke patients after rehabilitation. NeuroRehabilitation, 30, 267-75.
DOI: 10.3233/NRE-2012-0755
Lin, K.-C., Chuang, L.-L., Wu, C.-Y., Hsieh, Y.-W., & Chang, W.-Y. (2010a). Responsiveness and validity of three dexterous function measures in stroke rehabilitation. Journal of Rehabilitation Research & Development, 47(6), 563-72.
DOI: 10.1682/JRRD.2009.09.0155
Lin, K.-C., Fu, T., Wu, C.-Y., Hsieh, Y.-W., Chen, C.-L., & Lee, P.-C. (2010b). Psychometric comparisons of the Stroke Impact Scale 3.0 and Stroke-Specific Quality of Life Scale. Quality of Life Research, 19(3), 435-43.
DOI: 10.1007/s11136-010-9597-5
Miltner, W.H.R., Bauder, H., Sommer, M., Dettmers, C., & Taub, E. (1999). Effects of constraint-induced movement therapy on patients with chronic motor deficits after stroke: a replication. Stroke, 30(3), 586-92.
PMID: 10066856
Page, S. (2003). Forced use after TBI: promoting plasticity and function through practice. Brain Injury, 17(8), 675-84.
DOI: 10.1080/0269905031000107160
Pereira, N.D., Ovando, A.C., Michaelsen, S.M., Anjos, S.M.D., Lima, R.C.M., Nascimento, L.R., & Teixeira-Salmela, L.F. (2012). Motor Activity Log-Brazil: reliability and relationships with motor impairments in individuals with chronic stroke. Arquivos de Neuro-Psiquiatria, 70(3), 196-201.
Saliba, V.A., Magalhães, L.C., Faria, C.D., Laurentino, G.E.C., Cassiano, J.G., & Teixeira-Salmela, L.F. (2011). [Cross-cultural adaptation and analysis of the psychometric properties of the Brazilian version of the Motor Activity Log]. Revista Panamericana de Salud Pública, 30(3), 262-71.
https://www.researchgate.net/publication/266487017
Santisteban, L., Teremetz, M., Bleton, J.-P., Baron, J.-C., Maier, M.A., & Lindberg, P.G. (2016). Upper limb outcome measures used in stroke rehabilitation studies: a systematic literature review. Plos One, May 6.
DOI: 10.1371/journal.pone.0154792
Shindo, K., Oba, H., Hara, J., Ito, M., Hotta, F. & Liu, M. (2015). Psychometric properties of the simple test for evaluating hand function in patients with stroke. Brain Injury, 29(6), 772-6.
DOI: 10.3109/02699052.2015.1004740
Silva, E.S.M., Pereira, N.D., Gianlorenço, A.C.L., & Camargo, P.R. (2018). The evaluation of non-use of the upper limb in chronic hemiparesis is influenced by the level of motor impairment and difficulty of the activities – proposal of a new version of the Motor Activity Log. Physiotherapy Theory and Practice,
DOI: 10.1080/09593985.2018.1460430
Simpson, A., Conroy, S., & Bever, C. (2015). Preliminary assessment of the Motor Activity Log-28 in patients with chronic stroke. Neurology, 84(14 Supplement), P5.174.
Simpson, L.A. & Eng, J.J. (2013). Functional recovery following stroke: capturing changes in upper extremity function. Neurorehabilitation and Neural Repair, 27(3), 240-50.
DOI: 10.1177/1545968312461719
Sterr, A., O’Neill, D., Dean, P.J.A., & Herron, K.A. (2014). CI therapy is beneficial to patients with chronic low-functioning hemiparesis after stroke. Frontiers in Neurology, 5, 204.
DOI: 10.3389/fneur.2014.00204
Taub, E., Miller, N.E., Novack, T.A., Cook, E.W., Fleming, W.C., Nepomuceno, C.S., Connel, J.S., & Crago, J.E. (1993). Technique to improve chronic motor deficit after stroke. Archives of Physical Medicine and Rehabilitation, 74(4), 347-54.
PMID: 8466415
Taub, E. & Uswatte, G. (2000). Constraint-induced movement therapy and massed practice. Stroke, 31(4), 986-8.
PMID: 10754013
Taub, E., Uswatte, G., Bowman, M.H., Mark, V.W., Delgado, A., Bryson, C., Morris, D., & Bishop-McKay, S. (2013). Constraint-induced movement therapy combined with conventional neurorehabilitation techniques in chronic stroke patients with plegic hands: a case series. Archives of Physical Medicine and Rehabilitation, 94, 86-94.
DOI: 10.1016/j.apmr.2012.07.029
Taub, E., Uswatte, G., & Pidikiti, R. (1999). Constraint-induced movement therapy: a new family of techniques with broad application to physical rehabilitation – a clinical review. Journal of Rehabilitation Research & Development, 36(3), 237-51.
PMID: 10659807
Uswatte, G. & Taub, E. (2005). Implications of the learned nonuse formulation for measuring rehabilitation outcomes: lessons from constraint-induced movement therapy. Rehabilitation Psychology, 50(1), 34-42.
DOI: 10.1037/0090-5550.50.1.34
Uswatte, G., Giuliani, C., Winstein, C., Zeringue, A., Hobbs, L., & Wolf, S.L. (2006a). Validity of accelerometry for monitoring real-world arm activity in patients with subacute stroke: evidence from the extremity constraint-induced therapy evaluation trial. Archives of Physical Medicine and Rehabilitation, 87, 1340-5.
DOI: 10.1016/j.apmr.2006.06.006
Uswatte, G., Taub, E., Morris, D., Light, K., & Thompson, P.A. (2006b). The Motor Activity Log-28: assessing daily use of the hemiparetic arm after stroke. Neurology, 67(7), 1189-94.
https://www.ncbi.nlm.nih.gov/pubmed/17030751
Uswatte, G., Foo, W.L., Olmstead, H., Lopez, K., Holand, A., & Simms, L.B. (2005a). Ambulatory monitoring of arm movement using accelerometry: an objective measure of upper-extremity rehabilitation in persons with chronic stroke. Archives of Physical Medicine and Rehabilitation, 86, 1498-1501.
PMID: 16003690
Uswatte, G., Taub, E., Morris, D., Vignolo, M., & McCulloch, K. (2005b). Reliability and validity of the upper-extremity Motor Activity Log-14 for measuring real-world arm use. Stroke, 36(11), 2493-6.
DOI: 10.1161/01.STR.0000185928.90848.2e
van der Lee, J.H., Beckerman, H., Knol, D.L., de Vet, H.C.W., & Bouter, L.M. (2004). Clinimetric properties of the Motor Activity Log for the assessment of arm use in hemiparetic patients. Stroke, 35, 1410-14.
DOI: 10.1161/01.STR.0000126900.24964.7e
van der Lee, J.H., Wagenaar, R.C., Lankhorst, G.J., Vogelaar, T.W., Deville, W.L., & Bouter, L.M. (1999). Forced use of the upper extremity in chronic stroke patients: results from a single-blind randomized clinical trial. Stroke, 30, 2369-75.
Wu, C.-Y., Chuang, L.-L., Lin, K.-C., & Horng, Y.-S. (2011). Responsiveness and validity of two outcome measures of instrumental activities of daily living in stroke survivors receiving rehabilitative therapies. Clinical Rehabilitation, 25, 175-83.
DOI: 10.1177/0269215510385482

See the measure

How to obtain the Motor Activity Log?

Click here to see the Motor Activity Log manual.

Motor Evaluation Scale for Upper Extremity in Stroke Patients (MESUPES)

Evidence Reviewed as of before: 08-09-2015

Author(s)*: Annabel McDermott, OT

Editor(s): Annie Rochette, PhD OT

Expert Reviewer: Prof. Ann Van de Winckel, PhD, MSc, PT

Content consistency: Gabriel Plumier

Purpose

The MESUPES measures quality of movement performance of the hemiparetic arm and hand in stroke patients. Authors of the assessment are Perfetti & Dal Pezzo (original version of the scale) and Ann Van de Winckel, PhD, MSc, PT (final version of the scale). The original publication of the final version of the scale is by Van de Winckel et al. (2006).

In-Depth Review

Purpose of the measure

The MESUPES measures quality of movement performance of the hemiparetic arm and hand in stroke patients.

Available versions

The original version of the MESUPES comprised 22 items within three categories of arm function (10 items), hand function (9 items) and functional tasks (3 items).

Analysis of the original version with Principal Component Analysis and Rasch analysis resulted in a final 17-item version with two categories: arm function (8 items) and hand function (“range of motion”, 6 items; “orientation during functional tasks”, 3 items) (Van de Winckel et al., 2006).

Features of the measure

Items:

The original MESUPES is comprised of 22 items in three subscales:

Arm function: 10 items
Hand function: 9 items
Functional tasks: 3 items

The final version of the MESUPES is comprised of 17 items in two subscales:

MESUPES–Arm function: 8 items with 6 response categories (0-5)
MESUPES–Hand function: 9 items with 3 response categories (0-2).

During the MESUPES–Arm subset, patients are required to perform specific movements of the upper limb in three consecutive phases:

The task is performed passively
The therapist assists the patient during the movement
The patient performs the task by him/herself.

During the MESUPES–Hand subsets, patients are instructed to perform specific movements of the hand and fingers by themselves.

Scoring:

As the MESUPES adopts an ordinal scale, Rasch analysis has been performed to translate ordinal data into interval measures (logit scores) (Van de Winckel et al., 2006).

Online scoring will soon be available to enable users to input the ordinal scores and retrieve logit scores immediately (personal correspondence, Van de Winckel, 2015).

Subset 1: Arm function

The MESUPES–Arm subset evaluates ‘normal’ movement of the hemiparetic limb, which can be judged by comparison with movement of the patient’s unaffected arm. Only qualitatively ‘normal’ movements of the arm are scored.

The tasks are performed in three phases. The number of phases evaluated depends on the patient’s ability to perform the movement correctly.

Testing phase	Points achieved
1. The therapist moves the patient’s arm and hand and evaluates muscle tone first.
No adequate adaptation of tone to movement:	0 points
Adequate adaptation of tone (normal tone) to at least part of the movement:	1 point
2. If the patient exhibits normal tone, the patient participates in the movement and the therapist evaluates muscle contractions.
The patient demonstrates functionally and qualitatively correct muscle contraction in at least part of the movement:	2 points
3. If the patient exhibits normal muscle contraction, the patient performs the movement independently and the therapist assesses range of movement. A score is given for the range of motion that the patient can perform with good quality of motion.
Part of the movement is performed normally:	3 points
Total range of normal movement is done slowly or with great effort:	4 points
The patient demonstrates normal movement performance:	5 points

The patient is allowed to repeat test items with a maximum of three attempts; the patient is awarded the highest score achieved. See the measure for more scoring information.
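
The three-phase arm scoring rules above can be sketched in code. This is a minimal illustration and not part of the MESUPES manual: the function names and the labels "partial", "full_with_effort" and "normal" are hypothetical, and the sketch assumes that a patient who shows correct muscle contraction but no normal active movement keeps the 2 points from phase 2.

```python
from typing import Optional, Sequence

# Hypothetical labels for the phase 3 outcome (not taken from the manual).
PHASE3_POINTS = {"partial": 3, "full_with_effort": 4, "normal": 5}


def score_arm_attempt(normal_tone: bool,
                      correct_contraction: bool = False,
                      active_range: Optional[str] = None) -> int:
    """Score one attempt at a MESUPES-Arm item (0-5)."""
    if not normal_tone:
        return 0  # phase 1: no adequate adaptation of tone
    if not correct_contraction:
        return 1  # normal tone, but phase 2 not achieved
    if active_range is None:
        return 2  # correct contraction, no normal active movement (assumption)
    return PHASE3_POINTS[active_range]  # phase 3: 3, 4 or 5 points


def score_arm_item(attempt_scores: Sequence[int]) -> int:
    """Keep the highest score from at most three attempts."""
    if not 1 <= len(attempt_scores) <= 3:
        raise ValueError("an item allows between one and three attempts")
    return max(attempt_scores)
```

For example, a first attempt scored with normal tone only (1 point) followed by a second attempt with partial normal movement (3 points) gives an item score of 3.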

Subset 2: Hand function (Range of Motion)

Performance of movement and measurement of range of motion is not compared with the unaffected hand for this subset. Only qualitatively normal movements of the hand and fingers are scored.

Testing procedure	Points achieved
The patient performs the instructed movement actively and the therapist assesses range of movement between 0-2cm qualitatively and quantitatively.	0-2 points
no movement:	0 points
movement amplitude < 2 cm	1 point
movement amplitude ≥ 2 cm	2 points
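
The range-of-motion thresholds above translate into a simple rule. The sketch below is illustrative only: the function and parameter names are hypothetical, and treating a movement that is not qualitatively normal as 0 points is an assumption based on the statement that only qualitatively normal movements are scored.

```python
def score_hand_rom(amplitude_cm: float, normal_quality: bool = True) -> int:
    """Score one MESUPES-Hand range-of-motion item (0-2)."""
    if amplitude_cm < 0:
        raise ValueError("amplitude cannot be negative")
    if not normal_quality or amplitude_cm == 0:
        # No movement, or movement that is not qualitatively normal (assumption).
        return 0
    # Below 2 cm earns 1 point; 2 cm or more earns 2 points.
    return 1 if amplitude_cm < 2 else 2
```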

Subset 3: Hand function (Orientation during functional tasks)

Quality of movement is not compared with the unaffected hand for this subset.

Testing procedure	Points achieved
The patient manipulates materials as instructed and the therapist assesses whether the patient is able to orient the wrist and fingers to the object throughout the movement in a normal way.	0-2 points
no movement or movement with abnormal orientation of fingers and wrist towards the object:	0 points
movement with normal orientation of fingers or wrist towards the object:	1 point
whole movement correct:	2 points

The maximum achievable score is 58 (MESUPES-Arm maximum score is 40; MESUPES-Hand maximum score is 18). The patient is awarded one score for each task, and the highest score is retained. A score of 0 is awarded when the patient demonstrates inadequate tone, abnormal muscle contractions, or synergic (flexor/extensor) or mass movement patterns (Appendix 2, Instructions, Van de Winckel et al., 2006).
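
The subtest and total maxima follow from the item counts and response categories (8 arm items scored 0-5; 9 hand items scored 0-2). A minimal sketch of the ordinal totals, with hypothetical function names, could look like this; it does not perform the Rasch conversion to logit scores.

```python
from typing import Dict, Sequence

ARM_ITEMS, ARM_MAX_PER_ITEM = 8, 5    # 8 items scored 0-5, maximum 40
HAND_ITEMS, HAND_MAX_PER_ITEM = 9, 2  # 9 items scored 0-2, maximum 18


def _checked_sum(scores: Sequence[int], n_items: int,
                 max_per_item: int, label: str) -> int:
    """Validate the item scores of one subtest and return their sum."""
    if len(scores) != n_items:
        raise ValueError(f"{label} subtest needs {n_items} item scores")
    if any(not 0 <= s <= max_per_item for s in scores):
        raise ValueError(f"{label} item scores must be 0-{max_per_item}")
    return sum(scores)


def mesupes_totals(arm_scores: Sequence[int],
                   hand_scores: Sequence[int]) -> Dict[str, int]:
    """Return ordinal arm, hand and total scores (raw scores, not logits)."""
    arm = _checked_sum(arm_scores, ARM_ITEMS, ARM_MAX_PER_ITEM, "arm")
    hand = _checked_sum(hand_scores, HAND_ITEMS, HAND_MAX_PER_ITEM, "hand")
    return {"arm": arm, "hand": hand, "total": arm + hand}
```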

What to consider before beginning:

The first four items are performed in supine; all other items are performed in a sitting position with hips and knees at 90 degrees and elbows resting on the table. The patient can be provided support to maintain a sitting position if required. The patient cannot be assessed (and therefore awarded a point) if he/she is not able to sit in an upright position for a task. The therapist can reposition the patient’s upper extremity before beginning each new task, and should wait until the tone is normalized before starting a new task. If the patient is not able to achieve a relaxed starting position, he/she is awarded a score of 0 for the item.

The patient must be given clear instructions using the following steps:

The therapist explains the task verbally and demonstrates the movement
The patient is asked to perform the task with the non-affected side first to ensure he/she understands the demands of the task.

Time:

It takes approximately 10 minutes to administer the evaluation (ranging from 5 min for patients with very poor or very good motor performance to about 15 min for patients with more severe hypertonia).

Training requirements:

Instructions are given in Appendix 2 (Van de Winckel et al., 2006) and are available here online. These instructions should suffice for trained clinicians (physical therapists, occupational therapists etc).

For the original evaluation, seven raters were trained for an hour to familiarize them with the assessment protocol (Van de Winckel et al., 2006). In Johansson & Hager’s study (2012), raters underwent a 2h training session.

An instructional video will soon be made available online. In the meantime, the developer of the MESUPES (Prof. Ann Van de Winckel, avandewi@umn.edu) can be contacted to address questions concerning the use of the MESUPES.

Equipment:

Plinth or mat
Desk and chair, positioned so that the patient is sitting with hip and knees in 90 degrees flexion
Wooden or plastic block marked with 1cm and 2cm to measure range of movement during hand tasks
One larger plastic bottle (cylinder, diameter 6 cm, like a 20 fl oz or 591 ml soda or water bottle)
One smaller plastic bottle (cylinder, diameter 2.5cm, height 8cm, like a round correction fluid bottle, as shown in the figure)
Dice (1.5 x 1.5 cm)

Client suitability

Differential item functioning was performed with Rasch analysis to test the stability of item hierarchy (from easy to difficult items) on several variables.

There is no differential item functioning across subgroups of gender, age (<60 / ≥60 years), time since stroke (< 3 months / ≥ 3 months), country of residence, side of lesion and type of stroke (hemorrhagic, ischemic) (Van de Winckel et al., 2006), meaning that the hierarchy of items (from easy to difficult) is maintained across all stroke patient groups defined by the above-mentioned variables.

Can be used with:

Individuals with stroke

Should not be used with:

The measure is intended for use with adult patients with stroke; there is insufficient evidence regarding psychometric properties of the tool with other populations, including a pediatric population.

In what languages is the measure available?

Catalan (available online, Van de Winckel A, 2015)
Dutch (Flemish) (available online, Van de Winckel, A., 2015)
English (available online, Van de Winckel et al., 2006)
French (available online, Van de Winckel A, 2015)
German (available online; Van Bellingen, T., Van de Winckel, A., et al., 2009. Chapter 1: Assessment in Neurorehabilitation. In Neurology (2nd ed., pp. 192-201). Huber.)
Italian (available online, Van de Winckel A, 2015) (Perfetti & Dal Pezzo, original version)
Portuguese (available online, Van de Winckel A, 2015)
Spanish (available online, Van de Winckel A, 2015)
Swedish (available online, Johansson & Hager, 2012)

Summary

What does the tool measure?	The MESUPES measures quality of movement performance of the hemiparetic arm and hand in patients with stroke.
What types of clients can the tool be used for?	The MESUPES was developed for use with adults with stroke.
Is this a screening or assessment tool?	Assessment tool
Time to administer	10 minutes (range 5-15min)
ICF Domain	• Body function/structure • Activity
Versions	Final version (Van de Winckel et al., 2006) = 17 items (total score /58; MESUPES-arm score /40; MESUPES-hand score /18)
Languages	Available online on StrokEngine: Catalan, Dutch (Flemish), English, French, German, Italian, Portuguese, Spanish, Swedish
Measurement Properties
Reliability	Internal consistency: One study has reported on the internal consistency of the MESUPES using Principal Component Analysis and Rasch analysis. Results showed high person separation indices and unidimensionality within subtests. Test-retest: Two studies have reported on the test-retest reliability of the MESUPES in patients with subacute to chronic stroke and reported good to very good agreement over 24-48 hours. Intra-rater: No studies have reported on the intra-rater reliability of the MESUPES. Inter-rater: Two studies have reported on the inter-rater reliability of the MESUPES in patients with subacute to chronic stroke and reported good to very good agreement between raters for subtests; moderate to very high item reliability; and sufficient absolute reliability of the total score.
Validity	Content: One study investigated validity of the 17-item MESUPES and reported unidimensionality of the arm and hand scales. Criterion: Concurrent: One study examined concurrent validity of the MESUPES and reported high correlations with the Modified Motor Assessment Scale (MMAS). Predictive: No studies have reported on predictive validity of the MESUPES. Construct: Convergent/Discriminant: No studies have reported on convergent/discriminant validity of the MESUPES. Known Groups: No studies have reported on known group validity of the MESUPES.
Floor/Ceiling Effects	No studies have reported on the floor/ceiling effects of the MESUPES.
Does the tool detect change in patients?	• No studies have reported on the sensitivity or specificity of the MESUPES. • One study reported minimal detectable change (MDC) scores of 8, 7 and 5 (95%, 90% and 80% CI, respectively).
Acceptability	Administration of the MESUPES is easy and fast. The measure is inexpensive and requires minimal standard equipment.
Feasibility	The MESUPES requires no specialized training to administer. However, the MESUPES should only be administered by clinicians with knowledge of stroke and clinical assessment of tone, muscle contraction and movement.
How to obtain the tool?	See the measure

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the MESUPES. Two English studies were identified.

Floor and ceiling effect

No studies have reported on the floor or ceiling effects of the MESUPES.

Van de Winckel (personal correspondence, 2015) noted that in the study by Van de Winckel et al. (2006), in which 396 patients with low to high motor performance following stroke were assessed using the MESUPES, less than 5% of patients achieved a score of 0 on the arm items and less than 20% of participants achieved the maximum score. Approximately 42% of participants achieved a score of 0 on the hand items and less than 5% of patients achieved a maximum score on the hand items.

Reliability

Internal consistency:
Van de Winckel et al. (2006) examined internal consistency of the MESUPES in a sample of patients with stroke using Principal Component Analysis and Rasch analysis. Rasch analysis was used to determine ‘item-trait interaction’, which shows the degree of invariance across the intended dimension, and ‘person separation index’. Internal consistency was obtained when the MESUPES was divided into the MESUPES-Arm (8 items) and MESUPES-Hand (9 items) subtests. Rasch analysis and fit statistics showed that both subtests adhered to unidimensional characteristics, whereby all items in the subtests pertain to the same construct. The person separation index was 0.99 for the MESUPES-Arm and 0.97 for the MESUPES-Hand, indicating very high internal consistency.

Test-retest:
See inter-rater reliability below for results also pertaining to test-retest reliability.

Inter-rater:
Van de Winckel et al. (2006) investigated inter-rater reliability of the MESUPES in a sample of 56 patients with subacute to chronic stroke. Assessments were conducted by 2 assessors over 24 hours. Inter-rater reliability, calculated using intra-class correlation coefficients (ICCs), was excellent for the arm function total score (ICC=0.95, 95% CI 0.91-0.97) and hand function total score (ICC=0.97, 95% CI 0.95-0.98). Assessment of inter-rater reliability by weighted percentage agreement and weighted kappa confirmed item reliability for the arm function subtest (weighted kappa coefficient = 0.62-0.79; weighted percentage agreement 85.71-98.21%); scores were not derived for hand function items as more than 50% of the sample scored 0.
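
For readers unfamiliar with weighted kappa, the sketch below shows one common way to compute a linear-weighted kappa for two raters scoring the same ordinal item. It illustrates the statistic only; it is not the software or exact procedure used in the cited studies, and the function name is hypothetical.

```python
from typing import Sequence


def linear_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                          n_categories: int) -> float:
    """Cohen's kappa with linear disagreement weights |i - j|.

    Ratings are integers 0..n_categories-1 (e.g. 0-5 for an arm item).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty sample")
    if any(not 0 <= r < n_categories for r in list(rater_a) + list(rater_b)):
        raise ValueError("ratings must lie in 0..n_categories-1")
    n = len(rater_a)
    # Cross-tabulate the two raters' scores.
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_categories))
                  for j in range(n_categories)]
    cells = [(i, j) for i in range(n_categories) for j in range(n_categories)]
    # Weighted disagreement actually observed vs. expected by chance.
    observed_dis = sum(abs(i - j) * observed[i][j] for i, j in cells)
    expected_dis = sum(abs(i - j) * row_totals[i] * col_totals[j] / n
                       for i, j in cells)
    if expected_dis == 0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return 1 - observed_dis / expected_dis
```

Linear weights penalise a disagreement in proportion to how many categories apart the two ratings are, so near-misses on an ordinal scale count less than large discrepancies.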

Johansson & Hager (2012) investigated inter-rater reliability of the MESUPES in a sample of 42 patients with subacute to chronic stroke. Assessments were conducted by 2 therapists within 48 hours. Inter-rater reliability, calculated by percentage agreement using linear-weighted kappa analysis, revealed good to very good agreement between raters (kappa range 0.63-0.96). Relative and absolute reliability was measured using intra-class correlation coefficients (ICCs) and standard error of measurement (SEM): item reliability was moderate to very high (ICC=0.63-0.96); reliability of subscores and the total score was very high (ICC=0.98, 95% CI 0.96-0.99); and the total score demonstrated sufficient absolute reliability (SEM=2.68).
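
The summary table reports minimal detectable change (MDC) scores of 8, 7 and 5 at 95%, 90% and 80% confidence. A commonly used formula, MDC = z × √2 × SEM, links such values to the SEM; whether the cited study used exactly this formula and rounding is an assumption here. A minimal sketch, with a hypothetical function name:

```python
import math
from statistics import NormalDist


def minimal_detectable_change(sem: float, confidence: float = 0.95) -> float:
    """MDC = z * sqrt(2) * SEM, with z the two-sided normal quantile."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(2) * sem
```

With SEM = 2.68, this gives roughly 7.4, 6.2 and 4.9 points at 95%, 90% and 80% confidence, which round up to 8, 7 and 5.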

Validity

Content:

The original version of the MESUPES developed by Perfetti & Dal Pezzo comprised 22 items across three categories of (i) arm function (10 items); (ii) hand function (9 items); and (iii) functional tasks (3 items).

Van de Winckel et al. (2006) investigated validity and unidimensionality of the MESUPES in a sample of 396 patients with subacute to chronic stroke. Principal Component Analysis (PCA) of the original 22-item version revealed two dimensions: arm function and hand function. Rasch analysis of these two separate scales identified misfit among five items (2 arm items and 3 hand items, respectively). Following removal of these items, subsequent Rasch analysis of the remaining 17 items and fit statistics confirmed unidimensionality of both arm and hand scales:

	Person fit	Item fit	Person separation index
Arm function	-0.51±1.19	-0.65±1.07	0.99
Hand function	-0.12±0.71	0.15±1.21	0.97

Test items followed an order of increasing difficulty with no reversed thresholds and no differential item functioning (DIF) according to gender, age (<60, ≥60), side of hemiparesis, time since stroke (<3 months, ≥3 months), type of stroke or country (Van de Winckel et al., 2006).

Criterion:

Concurrent:
Johansson & Hager (2012) investigated concurrent validity of the MESUPES in a sample of 42 patients with subacute to chronic stroke by comparison with the Modified Motor Assessment Scale (MMAS), using Spearman’s rho. Correlations were high between the MESUPES total scores and the MMAS (r=0.87); MESUPES arm items and MMAS (r=0.84); and MESUPES hand items and MMAS (r=0.80).

Construct:

Convergent/Discriminant:
No studies have reported on convergent/discriminant validity of the MESUPES.

Known Group:
No studies have reported on the known group validity of the MESUPES.

Responsiveness

Johansson & Hager (2012) assessed minimal detectable change (MDC) of the MESUPES with a sample of 42 patients with subacute to chronic stroke. Patients were assessed at two time points 48 hours apart. The authors reported that change scores of 8, 7 and 5 (95%, 90% and 80% confidence intervals, respectively) were required for certainty of true change.
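The relationship between the SEM and the MDC can be illustrated with a short sketch. It assumes the commonly used formula MDC = z × √2 × SEM, takes the SEM of 2.68 reported above for the total score as input, and rounds up to whole scale points. The formula, the rounding rule and the function name are assumptions made for illustration; the review does not state how the authors derived their values.

```python
import math
from statistics import NormalDist


def minimal_detectable_change(sem: float, confidence: float) -> float:
    """Return z * sqrt(2) * SEM for a two-sided confidence level (e.g. 0.95)."""
    # Two-sided standard-normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # sqrt(2) accounts for measurement error at both time points.
    return z * math.sqrt(2.0) * sem


SEM_TOTAL = 2.68  # SEM reported for the MESUPES total score in this review

# Assumption: round up to the next whole scale point.
mdc_points = {
    level: math.ceil(minimal_detectable_change(SEM_TOTAL, level))
    for level in (0.95, 0.90, 0.80)
}
print(mdc_points)
```

Under these assumptions the sketch yields 8, 7 and 5 points at the 95%, 90% and 80% levels, which is consistent with the change scores reported above.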

References

Johansson, G.M. & Hager, C.K. (2012). Measurement properties of the Motor Evaluation Scale for Upper Extremity in Stroke Patients (MESUPES). Disability & Rehabilitation, 34(4):288-94. DOI: 10.3109/09638288.2011.606343
Van de Winckel, A., Feys, H., van der Knaap, S., Messerli, R., Baronti, F., Lehmann, R., Van Hemelrijk, B., Pante, F., Perfetti, C., & De Weerdt, W. (2006). Can quality of movement be measured? Rasch analysis and inter-rater reliability of the Motor Evaluation Scale for Upper Extremity in Stroke Patients (MESUPES). Clinical Rehabilitation, 20, 871-84.

See the measure

How to obtain the MESUPES

Click on the language below:

Please click here for an instructional video on how to use the scale.

Nine Hole Peg Test (NHPT)

Evidence Reviewed as of before: 09-06-2011

Author(s)*: Sabrina Figueiredo, BSc

Editor(s): Lisa Zeltzer, MSc OT; Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

In-Depth Review

Purpose of the measure

The Nine Hole Peg Test (NHPT) was developed to measure finger dexterity, also known as fine manual dexterity. It can be used with a wide range of populations, including clients with stroke. Additionally, the NHPT is a relatively inexpensive test and can be administered quickly.

The NHPT should be used in association with other upper extremity performance tests, in order to estimate upper limb function with more accuracy.

Available versions

The NHPT was first introduced by Kellor, Frost, Silberberg, Iversen, and Cummings in 1971. In 1985, norms for the NHPT in healthy individuals were established by Mathiowetz, Weber, Kashman, and Volland.

Features of the measure

Items:

The NHPT is composed of a square board with 9 pegs. At one end of the board are holes for the pegs to fit into, and at the other end is a shallow round dish to store the pegs. The NHPT is administered by asking the client to take the pegs from a container, one by one, and place them into the holes on the board as quickly as possible. Clients must then remove the pegs from the holes, one by one, and return them to the container. In order to practice and register baseline scores, the test should begin with the unaffected upper limb. The board should be placed at the client’s midline, with the container holding the pegs oriented towards the hand being tested. Only the hand being evaluated should perform the test. The hand not being evaluated is permitted to hold the edge of the board in order to provide stability (Mathiowetz et al., 1985; Sommerfeld, Eek, Svensson, Holmqvist, & Arbin, 2004).

Scoring:

Clients are scored based on the time taken to complete the test activity, recorded in seconds. The stopwatch should run from the moment the participant touches the first peg until the moment the last peg hits the container (Grice, Vogel, Le, Mitchell, Muniz, & Vollmer, 2003; Mathiowetz et al., 1985).

Mathiowetz et al. (1985) reported that on average, healthy male adults complete the NHPT in 19.0 seconds (SD 3.2) with the right hand, and in 20.6 seconds (SD 3.9) with the left hand. For healthy female adults, the NHPT was completed in 17.9 seconds (SD 2.8) and 19.6 seconds (SD 3.4) with the right and left hand, respectively.

Alternative scoring – the number of pegs placed in 50 or 100 seconds can be recorded. In this case, results are expressed as the number of pegs placed per second (Jacob-Lloyd, Dunn, Brain, & Lamb, 2005; Sunderland, Trinson, Bradley, & Langton-Hewer, 1989).
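The two scoring modes described above can be sketched as follows. The function names are hypothetical, and the handling of a client who finishes before the time limit (rated over the time actually used) is an assumption, since the review does not specify it.

```python
def completion_time(start_s: float, end_s: float) -> float:
    """Standard score: seconds from touching the first peg to the last peg hitting the container."""
    if end_s < start_s:
        raise ValueError("end time precedes start time")
    return end_s - start_s


def pegs_per_second(pegs_placed: int, elapsed_s: float, time_limit_s: float = 50.0) -> float:
    """Alternative score: pegs placed per second within a fixed time limit (50 or 100 s)."""
    if not 0 <= pegs_placed <= 9:
        raise ValueError("the NHPT uses 9 pegs")
    if elapsed_s <= 0 or time_limit_s <= 0:
        raise ValueError("times must be positive")
    # Assumption: a client who finishes early is rated over the time actually used.
    return pegs_placed / min(elapsed_s, time_limit_s)


print(completion_time(2.0, 21.0))   # standard score in seconds
print(pegs_per_second(6, 50.0))     # 6 pegs placed when the 50 s limit was reached
print(pegs_per_second(9, 30.0))     # all 9 pegs placed in 30 s
```

Expressing the result as a rate keeps a non-zero score for clients who place only some of the pegs, which the time-to-completion score cannot do.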

Time:

Not typically reported. The norms reported above give approximate testing times for healthy adults.

Subscales:

None

Equipment:

The standardized equipment consists of:

A board, in wood or plastic, with 9 holes (10 mm diameter, 15 mm depth), spaced 32 mm apart (Mathiowetz et al., 1985; Sommerfeld et al., 2004) or 50 mm apart (Heller, Wade, Wood, Sunderland, Hewer, & Ward, 1987).
A container for the pegs. Initially the container was a square box (100 x 100 x 10 mm) separate from the board. The most current container is a shallow round dish at the end of the board (Grice et al., 2003).
9 pegs (7 mm diameter, 32 mm length) (Mathiowetz et al., 1985).
Stopwatch.

Training:

None typically reported.

Alternative forms of the Nine Hole Peg Test

None.

Client suitability

Can be used with:

Clients with stroke.
Clients should have a satisfactory level of upper limb fine motor skills as they must be able to pick up the pegs to complete the test.

Should not be used in:

The NHPT cannot be used with clients who have severe upper extremity impairment.
The NHPT cannot be used with clients with severe cognitive impairment.
Scoring with an upper time limit of 50 or 100 seconds requires caution, especially in the acute post-stroke period, due to the possibility of floor effects (Jacob-Lloyd et al., 2005; Sunderland et al., 1989).

In what languages is the measure available?

There are no official translations of the NHPT.

Some publications from the Netherlands, Japan and Sweden have used the NHPT as an outcome measure, which indicates that it is used in languages other than English (Dekker, Van Staalduinem, Beckerman, Van der Lee, Koppe, & Zondervan, 2001; Hatanaka, Koyama, Kanematsu, Takahashi, Matsumoto, & Domen, 2007; Sommerfeld et al., 2004).

Summary

What does the tool measure?	Finger dexterity.
What types of clients can the tool be used for?	The NHPT can be used with, but is not limited to, clients with stroke. There are no restrictions when administering it to clients with chronic stroke. With clients with acute stroke, the mode of scoring should be considered carefully in order to avoid floor effects.
Is this a screening or assessment tool?	Assessment
Time to administer	The amount of time it takes to administer the NHPT has not been reported and it will vary according to the client’s impairment or the mode of scoring.
Versions	There are no alternative versions.
Other Languages	There are no official translations.
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the NHPT. Intra-rater: Three studies have examined the intra-rater reliability of the NHPT. Two reported excellent intra-rater reliability and one reported adequate intra-rater reliability using correlation coefficients. One study used Spearman rho and the two others, Pearson correlation. Inter-rater: Three studies have examined the inter-rater reliability of the NHPT and reported inter-rater reliability using correlation coefficients. One study used Spearman rho and the two others, Pearson correlation.
Validity	Criterion: Concurrent: Two studies have examined the concurrent validity of the NHPT. The first study examined the sensitivity of the NHPT, comparing it to the Frenchay Arm Test as the gold standard, and reported that the NHPT has low sensitivity, with 27% of results misclassified. The second study examined the concurrent validity of the NHPT and reported adequate to excellent correlation with the Box and Block Test (BBT) and the Action Research Arm Test (ARAT) at pre- and post-treatment. Predictive: One study has examined predictive validity and reported that the NHPT is not able to predict functional outcomes six months after stroke. Construct: Convergent: One study has examined convergent validity of the NHPT and reported excellent correlations between the NHPT and the Motricity Index using Pearson correlation coefficients.
Floor/Ceiling Effects	Two studies have examined floor effects of the NHPT. In both studies, clients were scored based on a cutoff of 50 or 100 seconds. Participants not able to complete the test within this time were scored as 0. In both studies, at earlier phases of stroke, floor effects were poor or adequate. Six months after stroke, the floor effects were adequate.
Does the tool detect change in patients?	Two studies have examined the ability to detect change of the NHPT and reported that the NHPT is able to detect change.
Acceptability	The NHPT should not be used with clients with severe upper extremity impairment or those who are not able to pick up the pegs.
Feasibility	The administration of the NHPT is quick and simple; however, it requires standardized equipment. One study has examined the feasibility of the NHPT and reported that, on average, 52% of clients with acute stroke were not able to perform the NHPT (Jacob-Lloyd et al., 2005).
How to obtain the tool?	The NHPT instructions can be obtained from the study by Mathiowetz et al. (1985). Also, a version of the measure can be obtained from the publication by Wade (1992). Davis et al. (1999) reported that the most commonly used standardized equipment for the NHPT in the United States is produced by Smith and Nephew Rehabilitation, Inc. and Sammons Preston. Standardized equipment can be obtained at the website: http://www.sammonspreston.com/Supply/Product.asp?Leaf_Id=A8515

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Nine-Hole Peg Test (NHPT) in two different populations – healthy normal subjects and individuals with stroke. We identified seven publications. The results of these suggest that the NHPT may be a reliable, valid and responsive measure in clients with stroke. In clients with acute stroke, the NHPT needs to be used carefully due to the possibility of floor effects.

In a literature review, Croarkin, Danoff, and Barnes (2004) identified the level of evidence for nine upper extremity motor function tests. The level of evidence was established based on the total number of psychometric properties addressed in studies of each test. Compared to the Action Research Arm Test (Lyle, 1981), Chedoke-McMaster Stroke Assessment (Gowland, VanHullenaar & Torresin et al., 1995), Fugl-Meyer Sensorimotor Assessment (Fugl-Meyer, Jääskö, Leyman, Olsson & Steglind, 1975), Modified Motor Assessment Chart (Lindmark & Hamrin, 1988), Motor Assessment Scale (Carr, Shepherd, Nordholm & Lynne, 1985), Motor Club Assessment (Ashburn, 1982), Motricity Index (Demeurisse, Demol & Rolaye, 1980) and Rivermead Motor Assessment (Lincoln & Leadbitter, 1979), the NHPT was found to have the greatest number of psychometric properties supported, with studies on intra-rater reliability, inter-rater reliability, convergent validity and predictive validity.

Floor/Ceiling Effects

Jacob-Lloyd, Dunn, Brain, and Lamb (2005) examined the ceiling and floor effects of the NHPT in 50 persons with stroke. Participants were assessed twice within a 6-month interval. The first assessment was at hospital discharge. In this study, participants were scored based on a cutoff of 100 seconds. Those who took more than 100 seconds to complete the test were scored as 0. At discharge, the NHPT demonstrated an adequate floor effect, with less than 20% of the participants scoring the minimal value. After 6 months, the number of participants scoring the minimal value decreased, with the NHPT still demonstrating an adequate floor effect.

Sunderland, Trinson, Bradley, and Langton-Hewer (1989) examined the presence of a floor effect in 31 participants with stroke. Assessments were performed at four points in time: admission, and 1, 3 and 6 months post-stroke. Participants were given 50 seconds to complete the test. Those who were not able to complete the test within this time limit were scored as 0. Initially, the NHPT demonstrated a poor floor effect of 65%, which decreased by the 6-month follow-up.
Note: No values were provided by the authors for the 6 month follow-up.
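The cutoff scoring and floor-effect calculation used in these two studies can be sketched as below. The completion times are hypothetical and are not data from either study, the function names are illustrative, and the 20% boundary between an adequate and a poor floor effect is an assumption inferred from the wording of this review.

```python
from typing import Optional


def cutoff_score(time_s: Optional[float], cutoff_s: float) -> float:
    """Return the completion time, or 0 if the test was not completed within the cutoff."""
    if time_s is None or time_s > cutoff_s:
        return 0.0
    return time_s


def floor_effect_percent(scores: list) -> float:
    """Percentage of participants at the minimal value (a score of 0)."""
    return 100 * sum(1 for s in scores if s == 0) / len(scores)


def rate_floor_effect(percent: float, threshold: float = 20.0) -> str:
    # Assumption: below the threshold is "adequate", otherwise "poor".
    return "adequate" if percent < threshold else "poor"


# Hypothetical completion times in seconds; None means the test was not completed.
times = [25.0, 48.0, None, 110.0, 62.0, 33.0, 95.0, 130.0, 41.0, 77.0]

for cutoff in (100.0, 50.0):
    scores = [cutoff_score(t, cutoff) for t in times]
    pct = floor_effect_percent(scores)
    print(cutoff, pct, rate_floor_effect(pct))
```

In this hypothetical sample the same participants produce a larger floor effect under the 50-second cutoff than under the 100-second cutoff, which illustrates why the choice of time limit calls for caution in the acute period.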

Reliability

Note: A number of the publications on reliability reviewed below used statistical analyses such as Pearson’s correlation coefficient that are not considered the analyses of preference for testing reliability and may artificially inflate reliability coefficients. Future studies should examine the reliability of the NHPT using ICC or Kappa statistics.
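A small sketch shows why a correlation coefficient can overstate reliability. The scores are hypothetical: the second session is exactly 5 seconds slower than the first for every participant. The ICC form used here (two-way model, absolute agreement, single measurement) is one common choice and is an assumption, since the note above does not specify an ICC model.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_absolute_agreement(x, y):
    """ICC for absolute agreement, single measurement, two sessions (two-way ANOVA)."""
    n, k = len(x), 2
    grand = mean(x + y)
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    session_means = [mean(x), mean(y)]
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in session_means)
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ms_subjects = ss_subjects / (n - 1)
    ms_sessions = ss_sessions / (k - 1)
    ms_error = (ss_total - ss_subjects - ss_sessions) / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_sessions - ms_error) / n
    )


# Hypothetical NHPT times (seconds); session 2 carries a constant 5-second bias.
session1 = [20.0, 22.0, 25.0, 30.0, 35.0]
session2 = [t + 5.0 for t in session1]

print(round(pearson_r(session1, session2), 3))
print(round(icc_absolute_agreement(session1, session2), 3))
```

Pearson's coefficient is insensitive to the systematic 5-second shift and reports a perfect correlation, whereas the agreement ICC is penalized by it and comes out noticeably lower.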

Test-retest:
No studies were identified examining the test-retest reliability of the NHPT.

Intra-rater:
Heller, Wade, Wood, Sunderland, Hewer, and Ward (1987) examined the intra-rater reliability of the NHPT, Frenchay Arm Test (Heller et al., 1987), Finger Tapping Rate (Lezak, 1983), and Grip Strength (Mathiowetz, Kashman, Volland, Weber, Dowe, & Rogers, 1985) in 10 patients with chronic stroke. Participants were re-assessed after a 2-week interval by the same rater. In this study, results describe the range of reliability of the four measures mentioned above, and values for each individual measure were not provided. The Spearman rho correlation coefficient was excellent (ranging for all four measures from r = 0.68 to 0.99).
Note: Although it is not possible to discern the exact value for the NHPT’s reliability, all values were considered excellent and statistically significant, suggesting that the NHPT may be reliable with stable stroke clients.

Mathiowetz, Weber, Kashman, and Volland (1985) examined the intra-rater reliability of the NHPT in 26 healthy young female adults. Participants were re-assessed after a 1-week interval by the same rater. The Pearson correlation coefficient showed excellent agreement (r = 0.69) for the right hand and adequate agreement (r = 0.43) for the left.

Grice et al. (2003) reproduced the Mathiowetz et al. (1985) study in order to estimate the intra-rater reliability of the NHPT after its design was slightly modified. In the Mathiowetz et al. study, the NHPT equipment consisted of a wooden board for the holes and a wooden square container for the pegs. The equipment was then modified to a plastic board with a shallow round dish at the end of the board as the container. The Pearson correlation coefficient for the new NHPT was reported as adequate (r = 0.46; r = 0.44) for the right and left hand, respectively.

Inter-rater:
Heller et al. (1987) examined the inter-rater reliability of the NHPT, Frenchay Arm Test (Heller et al., 1987), Finger Tapping Rate (Lezak, 1983), and Grip Strength (Mathiowetz et al., 1985) in 10 patients with chronic stroke. Participants were assessed twice within a week by two different raters. Spearman rho correlation coefficients were excellent (ranging for all four measures from r = 0.75 to 0.99).
Note: In this study, individual values for each measure were not provided. Although it is not possible to discern the exact value for the NHPT's reliability, all values were considered excellent.

Mathiowetz et al. (1985) examined the inter-rater reliability of the NHPT in 26 healthy young female adults. Participants were evaluated simultaneously and independently by two raters. Pearson correlation coefficients showed excellent agreement (r = 0.97; r = 0.99) for the right and left hand, respectively.

Grice et al. (2003) reproduced the Mathiowetz et al. (1985) study to estimate the inter-rater reliability of the new NHPT. Pearson correlation coefficients showed excellent agreement (r = 0.98; r = 0.99) for the right and left hand, respectively.

Validity

Content:

Not available.

Criterion:

Concurrent:
Sunderland et al. (1989) estimated the sensitivity of the NHPT, the Motor Club Assessment (Ashburn, 1982) and the Motricity Index (Demeurisse et al., 1980) by comparing them to the Frenchay Arm Test (Heller et al., 1987), as the gold standard, in 31 participants with acute stroke. The NHPT had the lowest sensitivity, with 27% of the cases incorrectly classified. The most sensitive measure, with 0% of cases misclassified, was the Motricity Index.

Lin, Chuang, Wu, Hsieh and Chang (2010) compared the concurrent validity of the NHPT, Action Research Arm Test (ARAT) and Box and Block Test (BBT) for evaluating hand dexterity in 59 patients with stroke. The Fugl-Meyer Assessment of Sensorimotor Recovery After Stroke (FMA), Motor Activity Log (MAL) and Stroke Impact Scale (SIS) were also administered to assess the concurrent validity of the NHPT, ARAT and BBT. Using Spearman rank correlation coefficients, the NHPT, ARAT and BBT were found to have adequate to excellent correlations at pre-treatment (ranging from rho=-0.55 to -0.80) and post-treatment (ranging from rho=-0.57 to -0.71). In addition, the ARAT and BBT were found to have adequate correlations with the FMA, MAL and SIS (ranging from rho=0.31 to -0.59); however, the NHPT had only poor to adequate correlations with the FMA and MAL (ranging from rho=-0.16 to -0.33) and adequate to excellent correlations with the SIS (ranging from rho=-0.58 to -0.66). When considering both the results of responsiveness and validation components of the study, the ARAT and BBT are believed to be more appropriate than the NHPT for evaluating dexterity.

Predictive:
Sunderland et al. (1989) examined whether the NHPT, Motor Club Assessment (Ashburn, 1982) and Motricity Index (Demeurisse et al., 1980) were able to predict functional outcomes at six months after stroke, as measured by the Frenchay Arm Test (Heller et al., 1987). Predictive validity of the NHPT was examined in 31 participants with acute stroke. Assessments were performed at four points in time: admission, 1, 3 and 6 months post-stroke. The NHPT administered at 1 month did not predict functional outcomes at 6 months. The best predictor of functional outcomes at 6 months was the Motricity Index.

Construct:

Convergent/Discriminant:
Parker, Wade, and Hewer (1986) tested the construct validity of the NHPT by comparing the NHPT to the Motricity Index (Demeurisse et al., 1980) in 187 persons with stroke. The correlation between the NHPT and the Motricity Index was excellent (r = 0.82).

Known groups:
No studies have examined the known groups validity of the NHPT.

Responsiveness

Jacob-Lloyd et al. (2005) examined the responsiveness of the NHPT in 50 persons with stroke. Participants were assessed twice within a 6-month interval. The first assessment was at hospital discharge. Effect sizes were calculated using the Wilcoxon signed rank test. Although the authors reported a large effect size in this study, no reference values were provided. The NHPT was more likely to detect change than the Motricity Index (Demeurisse et al., 1980).

Lin, Chuang, Wu, Hsieh and Chang (2010) evaluated the responsiveness of the NHPT, the Action Research Arm Test (ARAT) and Box and Block Test (BBT) for evaluating hand dexterity in 59 patients with subacute stroke (< 6 months) and Brunnstrom stage IV to VI for proximal and distal upper extremity function. Patients were randomly assigned to receive constraint-induced therapy, bilateral arm training or control treatment and received 2 hours of therapy, 5 days per week for 3 weeks. Assessments were performed at baseline and 3 weeks. Using the Standardized Response Mean (SRM) to calculate responsiveness, the NHPT, ARAT and BBT were all found to have moderate SRM (0.64, 0.79 and 0.74, respectively), indicating sensitivity for detecting change in hand dexterity. When considering both the results of responsiveness and validation components of the study, the ARAT and BBT are believed to be more appropriate than the NHPT for evaluating dexterity.
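The SRM used in the Lin et al. (2010) study is the mean change score divided by the standard deviation of the change scores. The sketch below illustrates that calculation only. The function name and the sample scores are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption rather than something stated in the study.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Return mean(change) / SD(change) for paired scores.

    Assumption: the SD is the sample standard deviation (n - 1).
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must have the same length")
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    if len(changes) < 2:
        raise ValueError("at least two paired scores are required")
    sd_change = stdev(changes)
    if sd_change == 0:
        raise ValueError("SRM is undefined when all change scores are equal")
    return mean(changes) / sd_change


# Made-up scores for five participants (not data from any cited study).
example_baseline = [10, 12, 8, 15, 11]
example_follow_up = [12, 15, 9, 15, 14]
example_srm = standardized_response_mean(example_baseline, example_follow_up)
print(round(example_srm, 2))
```

For timed measures such as the NHPT, where a lower time indicates better performance, the sign of the change scores would need to be interpreted accordingly.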

References

Ashburn, A. (1982). A physical assessment for stroke patients. Physiotherapy, 68, 109-113.
Carr, J.H., Shepherd, R.B., Nordholm, L., & Lynne, D. (1985). Investigation of a new motor assessment scale for stroke patients. Physical Therapy, 65, 175-180.
Croarkin, E., Danoff, J., & Barnes, C. (2004). Evidence-based rating of upper-extremity motor function tests used for people following a stroke. Physical Therapy, 84, 62-74.
Cromwell, F.S. (1965). Occupational therapists manual for basic skills assessment: primary prevocational evaluation. California, USA: Fair Oaks Printing.
Davis, J., Kayser, J., Matlin, P., Mower, S., & Tadano, P. (1999). Nine hole peg tests – are they all the same? Occupational Therapy Practice, 4, 59-61.
Dekker, C.L., Van Staalduinem, A.M., Beckerman, H., Van der Lee, J.H., Koppe, P.A., & Zondervan, R.C.J. (2001). Concurrent validity of instruments to measure upper extremity performance: the action research arm test; the nine hole peg test and the motricity index. Nederlands Tijdschrift Voor Fysiotherapie, 111(15), 110-115.
Demeurisse, G., Demol, O., & Robaye, E. (1980). Motor evaluation in vascular hemiplegia. European Neurology, 19(6), 382-389.
Desrosiers, J., Rochette, A., Hebert, R., & Bravo, G. (1997). The Minnesota manual dexterity test: reliability, validity and reference values studies with healthy elderly people. Canadian Journal of Occupational Therapy, 64(5), 270-276.
Fugl-Meyer, A.R., Jääskö, L., Leyman, I., Olsson, S., & Steglind, S. (1975). The post-stroke hemiplegic patient 1. A method for evaluation of physical performance. Scandinavian Journal of Rehabilitation Medicine, 7, 13-31.
Grice, K.O., Vogel, K.A., Le, V., Mitchell, A., Muniz, S., & Vollmer, M.A. (2003). Adult norms for a commercially available nine hole peg test for finger dexterity. American Journal of Occupational Therapy, 57, 570-573.
Gowland, C., VanHullenaar, S., Torresin, W., et al. (1995). Chedoke-McMaster Stroke Assessment: development, validation, and administration manual. Hamilton (ON), Canada: School of Rehabilitation Science, McMaster University.
Hatanaka, T., Koyama, T., Kanematsu, M., Takahashi, N., Matsumoto, K., & Domen, K. (2007). New evaluation method for upper extremity dexterity of patients with hemiparesis after stroke: the 10-second tests. International Journal of Rehabilitation Research, 30(3), 243-247.
Heller, A., Wade, D.T., Wood, V.A., Sunderland, A., Hewer, R., & Ward, E. (1987). Arm function after stroke: measurement and recovery over the first three months. Journal of Neurology, Neurosurgery & Psychiatry, 50(6), 714-719.
Jacob-Lloyd, H.A., Dunn, O.M., Brain, N.D., & Lamb, S.E. (2005). Effective measurement of the functional progress of stroke clients. British Journal of Occupational Therapy, 68(6), 253-259.
Jebsen, R.H., Taylor, N., Trieschmann, R.B., Trotter, M.J., & Howard, L.A. (1969). An objective and standardized test of hand function. Archives of Physical Medicine and Rehabilitation, 50, 311-319.
Kellor, M., Frost, J., Silberberg, N., Iversen, I., & Cummings, R. (1971). Hand strength and dexterity. American Journal of Occupational Therapy, 25, 77-83.
Lezak, M.D. (1983). Neuropsychological assessment. Oxford, England: Oxford University Press.
Lincoln, N.B. & Leadbitter, D. (1979). Assessment of motor function in stroke patients. Physiotherapy, 65, 48-51.
Lin, K-C., Chuang, L-L., Wu, C-Y., Hsieh, Y-W., & Chang, W-Y. (2010). Responsiveness and validity of three dexterous function measures in stroke rehabilitation. Journal of Rehabilitation Research and Development, 47(6), 563-572.
Lindmark, B. & Hamrin, E. (1988). Evaluation of function capacity after stroke as a basis for active intervention: Presentation of a modified chart for motor capacity assessment and its reliability. Scandinavian Journal of Rehabilitation Medicine, 20, 103-109.
Lyle, R.C. (1981). A performance test for assessment of upper limb function in physical rehabilitation treatment and research. International Journal of Rehabilitation Research, 4, 483-492.
Mathiowetz, V., Weber, K., Kashman, N., & Volland, G. (1985). Adult norms for the nine hole peg test of finger dexterity. Occupational Therapy Journal of Research, 5, 24-33.
Mathiowetz, V., Kashman, N., Volland, G., Weber, K., Dowe, M., & Rogers, S. (1985). Grip and pinch strength: normative data for adults. Archives of Physical Medicine and Rehabilitation, 66, 69-72.
Parker, V. M., Wade, D. T., & Hewer, R. (1986). Loss of arm function after stroke: measurement, frequency, and recovery. International Rehabilitation Medicine, 8(2), 69-73.
Sommerfeld, D.K., Eek, E.U.B., Svensson, A.K., Holmqvist, L.W., & Arbin, M.H. (2004). Spasticity after stroke: its occurrence and association with motor impairments and activity limitations. Stroke, 35, 134-140.
Sunderland, A., Trinson, D., Bradley, L., & Hewer, R. (1989). Arm function after stroke: an evaluation of grip strength as a measure of recovery and a prognostic indicator. Journal of Neurology, Neurosurgery & Psychiatry, 52, 1267-1272.
Tiffin, J. (1968). Purdue Pegboard Examiner Manual. Chicago, USA: Science Research Associates.
Wade, D.T. (1992). Measurement in Neurological Rehabilitation. Oxford, England: Oxford University Press.

See the measure

How to obtain the NHPT?

The NHPT instructions can be found in Mathiowetz et al. (1985) and Wade (1992).

Davis, Kayser, Matlin, Mower, and Tadano (1999) reported that the most commonly used standardized equipment for the NHPT in the United States is produced by Smith and Nephew Rehabilitation, Inc. and Sammons Preston.

Standardized equipment can be obtained at the website: http://www.sammonspreston.com/Supply/Product.asp?Leaf_Id=A8515

Purdue Pegboard Test (PPT)

Evidence Reviewed as of before: 06-09-2012

Author(s)*: Katie Marvin, MSc.PT

Editor(s): Annabel McDermott, OT; Nicol Korner-Bitensky, PhD OT

Purpose

The Purdue Pegboard Test (PPT) is a test of fingertip dexterity and gross movement of the hand, fingers and arm in patients with impairments of the upper extremity resulting from neurological and musculoskeletal conditions.

In-Depth Review

Purpose of the measure

The Purdue Pegboard Test (PPT) was developed by Joseph Tiffin in 1948. The PPT is now used widely by clinicians and researchers as a measure of (1) gross movement of the arm, hand and fingers, and (2) fingertip dexterity. The PPT is suitable for use with patients with impairments of the upper extremity resulting from neurological and musculoskeletal conditions.

Available versions

None typically reported

Features of the measure

Description of tasks:

The PPT measures:

(1) Gross movement of the fingers, hand and arm; and
(2) Fingertip dexterity

The patient should be seated comfortably at a testing table with the PPT on the table in front of him/her. The test consists of a board with 4 cups across the top and two vertical rows of 25 small holes down the center. The two outside cups contain 25 pins each; the cup to the immediate left of the center contains 40 washers and the cup to the immediate right of the center contains 20 collars.

The clinician demonstrates and then administers the following 5 subtests:

Right hand (30 seconds): Clients use their right hand to place as many pins as possible down the row within 30 seconds.
Left hand (30 seconds): Clients use their left hand to place as many pins as possible down the row within 30 seconds.
Both hands (30 seconds): Clients use both hands simultaneously to place as many pins as possible down both rows within 30 seconds.
Right + Left + Both hands: Please note that this is not an actual test; it is the mathematical sum of the three scores above.
Assembly (60 seconds): Clients use both hands simultaneously while assembling pins, washers and collars.

Specific administration instructions can be found in the instruction manual that accompanies the PPT.

Scoring and Score Interpretation:

The clinician compiles 5 separate scores from the complete test procedure, one for each of the following tasks:

Right hand (30 seconds): The total number of pins placed in the right hand column using the right hand in the allotted time.
Left hand (30 seconds): The total number of pins placed in the left hand column using the left hand in the allotted time.
Both hands (30 seconds): The total number of pairs of pins placed in both columns using both hands in the allotted time.
Right + Left + Both hands: The sum of scores for the previous three tasks (right hand + left hand + both hands).
Assembly (60 seconds): The total number of pins, washers and collars assembled in the allotted time.
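As a minimal sketch of how the five scores relate to one another, the example below records the four timed subtest scores for a single trial and derives the "Right + Left + Both hands" score as their sum. The class name, field names and sample values are hypothetical, and the sketch does not cover how scores from the three-trial method are combined, since that is specified in the test manual rather than here.

```python
from dataclasses import dataclass


@dataclass
class PPTTrialScores:
    """Scores from one trial of each timed PPT subtest (hypothetical structure)."""

    right_hand: int   # pins placed with the right hand in 30 seconds
    left_hand: int    # pins placed with the left hand in 30 seconds
    both_hands: int   # pairs of pins placed with both hands in 30 seconds
    assembly: int     # pins, washers and collars assembled in 60 seconds

    @property
    def right_left_both(self) -> int:
        # Not a separate test: the sum of the first three subtest scores.
        return self.right_hand + self.left_hand + self.both_hands


# Made-up scores for illustration only.
example_scores = PPTTrialScores(right_hand=14, left_hand=13, both_hands=11, assembly=30)
print(example_scores.right_left_both)
```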

Testing should proceed in the order outlined above unless the patient is left-handed, in which case tasks 1 and 2 should be reversed. The preferred method of administration is the three-trial method: the patient should be permitted to attempt three trials for each task after a single demonstration by the clinician. (The one-trial administration method permits the patient only one trial following demonstration by the clinician.) The test can be administered in an individual or group setting.

Desrosiers, Hebert, Bravo and Dutil (1995) developed predictive equations for Purdue Pegboard subtest scores, based on normative data resulting from their study. The normative data portion of the study involved 360 healthy participants over the age of 60 years. The following predictive equations were determined:

Purdue subtests	Females	Males
Right hand	24.0 – 0.15 x (age)	22.5 – 0.15 x (age)
Left hand	23.7 – 0.16 x (age)	24.1 – 0.18 x (age)
Both hands	19.9 – 0.14 x (age)	20.0 – 0.15 x (age)
Right + Left + Both hands	67.7 – 0.45 x (age)	66.5 – 0.48 x (age)
Assembly	59.4 – 0.45 x (age)	62.2 – 0.53 x (age)

Example: The expected score for an 80-year-old woman on the right hand task is: 24.0 – (0.15 x 80) = 12.
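The predictive equations in the table above can be expressed as a small lookup, sketched below. The coefficients are copied from the table; the function name, dictionary keys and structure are illustrative assumptions. Because the normative sample consisted of healthy adults over 60 years of age, results for younger ages should be treated with caution.

```python
# (intercept, slope) pairs from the Desrosiers et al. (1995) table:
# expected score = intercept - slope * age
PPT_PREDICTIVE_EQUATIONS = {
    "right_hand": {"female": (24.0, 0.15), "male": (22.5, 0.15)},
    "left_hand": {"female": (23.7, 0.16), "male": (24.1, 0.18)},
    "both_hands": {"female": (19.9, 0.14), "male": (20.0, 0.15)},
    "right_left_both": {"female": (67.7, 0.45), "male": (66.5, 0.48)},
    "assembly": {"female": (59.4, 0.45), "male": (62.2, 0.53)},
}


def expected_ppt_score(subtest: str, sex: str, age: float) -> float:
    """Return the expected PPT subtest score for a given sex and age.

    The equations were derived from participants over 60 years of age,
    so they are not assumed to hold for younger people.
    """
    intercept, slope = PPT_PREDICTIVE_EQUATIONS[subtest][sex]
    return intercept - slope * age


# Reproduces the worked example: an 80-year-old woman, right hand task.
print(round(expected_ppt_score("right_hand", "female", 80), 1))
```

An observed score can then be compared with the expected value for a person of the same age and sex.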

Time:

The PPT takes approximately 5 to 10 minutes to administer and score.

Training requirements:

None typically reported; however, it is recommended that the clinician be familiar with the assessment tool. The clinician should be able to demonstrate performance of the PPT to clients at an average speed.

Equipment:

Purdue Pegboard Test (Model #32020)

Instruction manual
One test board
Pins x 50, collars x 20, washers x 40
Score sheets
Testing table approximately 30 inches tall
Stopwatch or clock that reads in seconds

Alternative forms of the Purdue Pegboard Test

None typically reported.

Client suitability

Can be used with:

Clients presenting with lateral brain damage (Costa et al., 1963)
Clients with hemiplegia resulting from stroke (Ashford, Slade, Malaprade & Turner-Stokes, 2008)
Clients requiring assessment for vocational rehabilitation (Hemm and Curtis, 1980)
Clients with dyslexia (Leslie, Davidson and Batey, 1985)
Clients of all ages

Should not be used in:

None reported

In what languages is the measure available?

No formal translations of the PPT have been reported. Because of the non-verbal nature of the assessment, it can be used with non-English-speaking groups.

Summary

What does the tool measure?	Dexterity and gross movement of the upper limb
What types of clients can the tool be used for?	The PPT can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment tool
Time to administer	The PPT takes approximately 5 to 10 minutes to administer.
Versions	There are no alternative versions of the PPT.
Other Languages	None typically reported.
Measurement Properties
ReliabilityReliability can be defined in a variety of ways. It is generally understood to be the extent to which a measure is stable or consistent and produces similar results when administered repeatedly. A more technical definition of reliability is that it is the proportion of "true" variation in scores derived from a particular measure. The total variation in any given score may be thought of as consisting of true variation (the variation of interest) and error variation (which includes random error as well as systematic error). True variation is that variation which actually reflects differences in the construct under study, e.g., the actual severity of neurological impairment. Random error refers to "noise" in the scores due to chance factors, e.g., a loud noise distracts a patient thus affecting his performance, which, in turn, affects the score. Systematic error refers to bias that influences scores in a specific direction in a fairly consistent way, e.g., one neurologist in a group tends to rate all patients as being more disabled than do other neurologists in the group. There are many variations on the measurement of reliability including alternate-forms, internal consistency , inter-rater agreement , intra-rater agreement , and test-retest .	Test-retest: Several studies have investigated the test-retest reliabilityA way of estimating the reliability of a scale in which individuals are administered the same scale on two different occasions and then the two scores are assessed for consistency. This method of evaluating reliability is appropriate only if the phenomenon that the scale measures is known to be stable over the interval between assessments. If the phenomenon being measured fluctuates substantially over time, then the test-retest paradigm may significantly underestimate reliability. 
In using test-retest reliability, the investigator needs to take into account the possibility of practice effects, which can artificially inflate the estimate of reliability (National Multiple Sclerosis Society). of the PPT in healthy patients and found adequate to excellent test-retest reliabilityA way of estimating the reliability of a scale in which individuals are administered the same scale on two different occasions and then the two scores are assessed for consistency. This method of evaluating reliability is appropriate only if the phenomenon that the scale measures is known to be stable over the interval between assessments. If the phenomenon being measured fluctuates substantially over time, then the test-retest paradigm may significantly underestimate reliability. In using test-retest reliability, the investigator needs to take into account the possibility of practice effects, which can artificially inflate the estimate of reliability (National Multiple Sclerosis Society). for all subtests. A three-trial administration method has been found to be more reliable than a one-trial method.
Validity	Construct: Known groups: One study examined the known groups validity of the PPT and found that it had 70% accuracy for detecting lateralization of brain damage and 90% accuracy for detecting brain damage regardless of lateralization.
Floor/Ceiling Effects	No studies have examined the floor/ceiling effects of the PPT in clients with stroke.
Does the tool detect change in patients?	No studies have formally investigated the responsiveness of the PPT in clients with stroke.
Acceptability	The PPT has been criticized for not being reflective of real life activities of daily living (Ashford, Slade, Malaprade & Turner-Stokes, 2008). The test is quick to complete and should not produce undue fatigue for patients.
Feasibility	The PPT is short and easy to administer and score.
How to obtain the tool?	The PPT can be ordered by contacting the manufacturer directly at: Lafayette Instruments 3700 Sagamore Parkway North P.O. Box 5729 | Lafayette, IN 47903 USA Tel: 765.423.1505 | 800.428.7545 Fax: 765.423.4111 E-mail: info@lafayetteinstrument.com www.lafayetteinstrument.com

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the Purdue Pegboard Test (PPT). Several studies have been conducted; however, only one study was specific to clients with stroke.

Reliability

Test-retest:
Buddenberg and Davis (1999) examined the 1-week test-retest reliability of the PPT using the one-trial and three-trial administration procedures in 47 healthy participants. The three-trial administration method was found to have excellent test-retest reliability for all subtests (ICC=0.82, 0.89, 0.85, 0.89 and 0.81 for the right hand, left hand, both hands, R+L+B and assembly subtests respectively). The one-trial administration method was found to have poor to adequate test-retest reliability, as measured by the Intraclass Correlation Coefficient (ICC=0.37, 0.61, 0.58, 0.70 and 0.51 for the right hand, left hand, both hands, R+L+B and assembly subtests respectively).

Several studies have investigated the test-retest reliability of the one-trial administration method of the PPT in healthy participants. The following chart has been adapted from Buddenberg and Davis (1999).

Reliability Coefficients Reported for One-Trial Administrations of the Purdue Pegboard Test

Subtests	Tiffin & Asher (1948)	Bass & Stucki (1951)	Tiffin (1968)	Reddon, Gill, Gauk & Maerz (1988) (men/women)	Desrosiers, Hebert, Bravo & Dutil (1995)
Right hand	0.63	0.67	0.68	0.63/0.76	0.66
Left hand	0.60	0.66	0.65	0.64/0.79	0.66
Both hands	0.68	0.71	0.73	0.67/0.81	0.81
Right + Left + Both hands	0.71	0.79	0.71	NR	0.90
Assembly	0.68	0.72	0.67	0.81/0.83	0.84

NR=not reported

Desrosiers, Hebert, Bravo and Dutil (1995) investigated the test-retest reliability of the PPT in 35 healthy individuals aged 60-89 years with no known upper-limb impairment. Each individual completed the PPT on 2 occasions with approximately 1 week between testing. Test-retest reliability, calculated using the ICC, was found to be adequate to excellent for the 5 subtests (ICC=0.66, 0.83, 0.81, 0.90 and 0.84 for Right hand, Left hand, Both hands, Right+Left+Both hands and Assembly subtests respectively). Scores from the second administration were higher, indicating a practice effect.

Inter-rater:
No studies have examined the inter-rater reliability of the PPT in clients with stroke.

Validity

Content:

No studies have examined the content validity of the PPT in clients with stroke.

Criterion:

Construct:

Convergent/Discriminant:
No studies have examined the convergent or discriminant validity of the PPT in clients with stroke.

Known Groups:
Costa, Vaughan, Levita & Farber (1963) examined the known groups validity of Purdue Pegboard subtests (Right, Left and Both hands) in 54 clients with brain damage resulting from neoplasms, traumatic injury or degenerative, vascular or infectious diseases; and 26 clients with peripheral nervous system lesions or lesions below the level of the thoracic spine (control group). Clinical neurological examination, electroencephalography and neuroradiographic procedures were used to confirm diagnosis. The PPT accurately identified clients below the age of 60 years as having brain damage if one or more of the following were found on scoring: left score < 11; right score < 13; both hands score < 10; or left score > right score + 3; and a lesion on the left if left score > right score, and on the right if right score > left score + 3. The PPT accurately identified clients above the age of 60 years as having brain damage if one or more of the following were found on scoring: left score < 10; right score < 10; both hands score < 8; or left score > right score + 3; and a lesion on the left if left score > right score, and on the right if right score > left score + 3. If the client’s scores classified the client as having brain damage but neither left nor right lesions were identified based on the scores, the brain damage was categorized as bilateral. The above PPT cutoff scores were found to have 70 percent accuracy for lateralization and 90 percent accuracy for brain damage without regard to lateralization.
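
The cutoff rules described above can be written out as a short decision procedure. The following is only an illustrative sketch in Python, not a validated clinical tool: the function name classify_ppt is hypothetical, and because the source gives cutoffs for clients "below" and "above" 60 years, the handling of a client aged exactly 60 (grouped here with the older clients) is an assumption.

```python
def classify_ppt(left, right, both, age):
    """Apply the cutoff scores summarized above (Costa et al., 1963).

    Returns (brain_damage, lateralization), where lateralization is
    "left", "right", "bilateral", or None when no damage is flagged.
    """
    if age < 60:
        left_cut, right_cut, both_cut = 11, 13, 10
    else:
        # Assumption: age exactly 60 is treated like the older group.
        left_cut, right_cut, both_cut = 10, 10, 8

    # Brain damage is flagged if one or more of the criteria is met.
    brain_damage = (
        left < left_cut
        or right < right_cut
        or both < both_cut
        or left > right + 3
    )
    if not brain_damage:
        return False, None

    # Lateralization rules are the same for both age groups.
    if left > right:
        return True, "left"
    if right > left + 3:
        return True, "right"
    # Damage flagged but neither side identified: bilateral.
    return True, "bilateral"
```

For example, a 45-year-old client scoring 9 (left), 15 (right) and 11 (both hands) falls below the left-hand cutoff, and the right score exceeds the left score by more than 3, so the sketch returns a right-sided classification.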

Sensitivity/specificity:

No studies have examined the sensitivity/specificity of the PPT in clients with stroke.

Responsiveness

No studies have examined the responsiveness of the PPT in clients with stroke.

References

Ashford, S., Slade, M., Malaprade, F. & Turner-Stokes, L. (2008). Evaluation of functional outcome measures for the hemiparetic upper limb: A systematic review. Journal of Rehabilitation Medicine, 40, 787-795.
Buddenberg, L.A. & Davis, C. (1999). Test-retest reliability of the Purdue Pegboard Test. The American Journal of Occupational Therapy, 54(5), 555-558.
Costa, L.D., Vaughan, H.G., Levita, E. & Farber, N. (1963). Purdue Pegboard as a predictor of the presence and laterality of cerebral lesions. Journal of Consulting Psychology, 27(2), 133-137.
Desrosiers, J., Hebert, R., Bravo, G. & Dutil, E. (1995). The Purdue Pegboard Test: Normative data for people aged 60 and over. Disability and Rehabilitation, 17(5), 217-224.

See the measure

How to obtain the Purdue Pegboard Test?

The PPT can be ordered by contacting the manufacturer directly at:

Lafayette Instruments
3700 Sagamore Parkway North
P.O. Box 5729 | Lafayette, IN 47903 USA
Tel: 765.423.1505 | 800.428.7545
Fax: 765.423.4111
E-mail: info@lafayetteinstrument.com
Web: www.lafayetteinstrument.com

Stroke Arm Ladder

Evidence Reviewed as of before: 15-02-2012

Author(s)*: Katie Marvin, MSc. PT (Candidate)

Editor(s): Annabel McDermott, OT; Nicol Korner-Bitensky, PhD OT

Expert Reviewer: Johanne Higgins, PhD

Purpose

The Stroke Arm Ladder was developed from an existing bank of test items used to evaluate upper extremity function in patients with stroke. The Stroke Arm Ladder incorporates observable tests of capacity or performance and questions aimed at identifying activity and participation components of the World Health Organization’s International Classification of Functioning, Disability and Health (ICF). The measure includes items that cover a wide range of difficulty levels.

In-Depth Review

Purpose of the measure

The Stroke Arm Ladder was developed from an existing bank of test items used to evaluate upper extremity function in patients with stroke. The measure incorporates observable tests of capacity or performance and questions aimed at identifying activity and participation components of the World Health Organization’s International Classification of Functioning, Disability and Health (ICF). The measure includes items that cover a wide range of difficulty levels.

Clinicians and researchers need to use a variety of evaluation measures to assess interventions and constructs related to upper extremity function in patients following stroke. Administering a variety of tests can be time-consuming and burdensome for clients. The Stroke Arm Ladder was developed to address this issue by providing a more comprehensive interval scale measure for the evaluation and monitoring of upper extremity function.

Available versions

None yet reported

Features of the measure

Items:

The Stroke Arm Ladder comprises 34 items selected from an existing bank of 49 test items used to evaluate upper extremity function in patients with stroke. The existing bank of items reflects the domains of the World Health Organization’s International Classification of Functioning, Disability and Health (ICF) (body functions; and activity and participation), and was derived from commonly used outcome measures, such as the Chedoke McMaster Stroke Assessment, Barthel Index and the Stroke Rehabilitation Assessment of Movement.

Description of tasks:

Starting item: pistol grip, pull trigger then return.

If patient is unable to perform starting item – then proceed to EASY subtest, start with number 7.
If patient is able to perform starting item – then proceed to DIFFICULT subtest, start with number 36.

EASY subtest items:

Item	Score/100
1. Tie a scarf around one’s neck (bilateral task). The task is partially executed (more than 25%) or certain steps are executed with major difficulties necessitating repeated efforts. Part of the task may have had to be modified or needed assistance to make it achievable.	3
2. Open a jar and remove a spoonful of coffee (bilateral task). The task is partially executed (more than 25%) or certain steps are executed with major difficulties necessitating repeated efforts. Part of the task may have had to be modified or needed assistance to make it achievable.	4
3. Unlock a lock and open a pill container (bilateral task). The task is partially executed (more than 25%) or certain steps are executed with major difficulties necessitating repeated efforts. Part of the task may have had to be modified or needed assistance to make it achievable.	5
4. Feeding independently. The patient needs some assistance to feed him- or herself a meal from a tray or table when someone places the food within reach. The patient needs assistance to put on an assistive device if required, cut up food, use salt and pepper, spread butter, etc. The patient needs assistance to be able to accomplish this in a reasonable time.	23
5. Write on an envelope and stick a stamp on it (bilateral task). The task is partially executed (more than 25%) or certain steps are executed with major difficulties necessitating repeated efforts. Part of the task may have had to be modified or needed assistance to make it achievable.	29
6. Dressing and undressing. The patient needs some assistance: to put on, remove and fasten all clothing and tie shoelaces (unless it is necessary to use adaptive aids for this). This includes putting on, removing and fastening corsets or braces when they are prescribed.	33
7. Shuffle and deal playing cards (bilateral task). The task is partially executed (more than 25%) or certain steps are executed with major difficulties necessitating repeated efforts. Part of the task may have had to be modified or needed assistance to make it achievable. Able to perform item number 7: Move down until patient is unable to meet the criteria for the specific task. Unable to perform item number 7: Move up until patient is able to meet the specific criteria for the specific task.	34
8. Elbow at side, 90 degrees flexion: supination then pronation.	44
9. Finger flexion then extension.	45
10. Extends elbow in supine (starting with elbow fully flexed). Able to complete the movement in a manner that is comparable to the unaffected side.	46
11. Protract scapula in supine. Able to complete the movement in a manner that is comparable to the unaffected side.	48
12. Can the patient prepare their own meals? Cook meals independently?	49
13. Feeding independently: The patient can feed him- or herself a meal from a tray or table when someone places the food within reach. The patient is able to put on an assistive device if required, cut up food, use salt and pepper, spread butter, etc. The patient must be able to accomplish this in a reasonable time.	51
14. Hand unsupported: opposition of thumb to little finger.	51
15. Handle coins (unilateral task). The task is partially executed (more than 25%) or certain steps are executed with major difficulties necessitating repeated efforts. Part of the task may have had to be modified or needed assistance to make it achievable.	52
16. Place hand on sacrum. Able to complete the movement in a manner that is comparable to the unaffected side.	53
17. Shrug shoulders (scapular elevation). Able to complete the movement in a manner that is comparable to the unaffected side.	55
18. Can patient perform housework? Without help?	56
19. Dressing and undressing independently. Patient is able to put on, remove and fasten all clothing and tie shoelaces (unless it is necessary to use adaptive aids for this). This includes putting on, removing and fastening corsets or braces when they are prescribed.	56
20. Pick up and move small objects (unilateral task). The task is partially executed (more than 25%) or certain steps are executed with major difficulties necessitating repeated efforts. Part of the task may have had to be modified or needed assistance to make it achievable.	57
21. Write on an envelope and stick a stamp on it (bilateral task). The task is successfully completed without hesitation or difficulty, as instructed or demonstrated.	57

DIFFICULT subtest items:

Item	Score/100
22. In the past two weeks, were you able to cut your food with a knife and fork?	58
23. In the past two weeks, were you able to use your hand that was more affected by your stroke to turn a doorknob?	59
24. Pick up and move a jar (unilateral task). The task is successfully completed without hesitation or difficulty, as instructed or demonstrated.	59
25. Unlock a lock and open a pill container (bilateral task). The task is successfully completed without hesitation or difficulty, as instructed or demonstrated.	60
26. In the past two weeks, were you able to do light household tasks/chores (e.g. dust, make a bed, take out garbage, do the dishes)? Just a little or not difficult at all.	61
27. Bathing independently. The patient must be able to use a bathtub, a shower or take a complete sponge bath. The patient must be able to perform all the steps involved in any one of these tasks without another person being present.	63
28. Tie a scarf around one’s neck (bilateral task). The task is successfully completed without hesitation or difficulty, as instructed or demonstrated.	63
29. Hand from knee to forehead 5x in 5 seconds.	64
30. Arm resting at side of body: raise arm overhead with full supination.	64
31. Pronation: tap index finger 10x in 5 seconds	65
32. In the past two weeks, were you able to use your hand that was most affected by your stroke to carry heavy objects (e.g. bag of groceries)? (Men)	66
33. Open a jar and remove a spoonful of coffee (bilateral task). The task is successfully completed without hesitation or difficulty, as instructed or demonstrated.	71
34. In the past two weeks, were you able to clip your toenails?	73
35. In the past two weeks, were you able to use your hand that was most affected by your stroke to carry heavy objects (e.g. bag of groceries)? (Women) Just a little or not difficult at all.	73
36. Elbow at side, 90 degrees flexion: resisted shoulder external rotation. Able to perform item number 36: Move down until patient is unable to meet the criteria for the specific task. Unable to perform test item number 36: Move up until patient is able to meet the criteria for the specific task.	76
37. Thumb to finger tips, then reverse 3x in 12 seconds.	78
38. Number of blocks transferred in 60 seconds > 30	82
39. Clap hands overhead then behind back 3x in 5 seconds.	82
40. Bounce ball 4 times in succession then catch.	93
41. Number of blocks transferred in 60 seconds >60	100

Scoring and Score Interpretation:

The Stroke Arm Ladder is scored out of 100 and is based on completion of test items. For example, if the patient is able to perform the starting test item (pistol grip, pull trigger), they automatically start at item number 36 in the ‘DIFFICULT items subtest’; items are tested in sequential order (36, 37, 38, 39, etc.); if the patient successfully completes the next three items but is unable to complete item 40, they receive a score of 82 out of 100 (as indicated in the right-hand column beside item 39).
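
The scoring procedure can be sketched in code as follows. This is a minimal illustration in Python under stated assumptions, not an official scoring program: the names ITEM_SCORES and ladder_score are hypothetical, the 41 listed items are treated as one continuous ladder (the source does not say whether testing may cross between the EASY and DIFFICULT subtests), and the result when no item can be performed (None) is also an assumption. The scores are copied from the Score/100 columns of the item lists.

```python
# Score/100 for items 1 to 41, in order, as listed in the item tables.
ITEM_SCORES = [
    3, 4, 5, 23, 29, 33, 34, 44, 45, 46, 48, 49, 51, 51, 52, 53, 55,
    56, 56, 57, 57, 58, 59, 59, 60, 61, 63, 63, 64, 64, 65, 66, 71,
    73, 73, 76, 78, 82, 82, 93, 100,
]


def ladder_score(passed_start, can_perform):
    """Return the score out of 100, or None if no item is performed.

    passed_start: True if the patient performed the starting item
        (pistol grip, pull trigger then return).
    can_perform: callable taking an item number (1 to 41) and returning
        True if the patient meets the criteria for that item.
    """
    # Item 36 opens the DIFFICULT subtest; item 7 opens the EASY subtest.
    item = 36 if passed_start else 7

    if can_perform(item):
        # Move down the list until the patient fails an item.
        while item < len(ITEM_SCORES) and can_perform(item + 1):
            item += 1
        return ITEM_SCORES[item - 1]

    # Move up the list until the patient meets the criteria for an item.
    while item > 1:
        item -= 1
        if can_perform(item):
            return ITEM_SCORES[item - 1]
    return None  # Assumption: the source does not define this case.
```

With the worked example above (starting item performed, items 36 to 39 completed, item 40 failed), calling ladder_score(True, lambda i: i <= 39) returns 82, the score listed beside item 39.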

Information on score interpretation is not yet available.

Time:

Not reported.

Training requirements:

None reported.

Equipment:

Scarf
Jar with lid
Coffee
Pill container
Manual lock
Feeding utensils
Plate, bowl, glass, mug
Salt and pepper shakers
Envelope
Stamp
Clothing (shirt and pants with buttons)
Deck of cards
Coins
Pen or pencil
Access to a kitchen and bathroom if observation of tasks is required

Alternative forms of the assessment

None yet reported

Client suitability

Can be used with:

Clients with stroke (mild, moderate and severe) in the acute and sub-acute phase.

Should not be used with:

Patients greater than 7 months post-stroke until further validation testing is completed.

In what languages is the measure available?

English

Summary

What does the tool measure?	Upper extremity function following stroke.
What types of clients can the tool be used for?	Can be used with clients with stroke.
Is this a screening or assessment tool?	Assessment tool
Time to administer	Not yet reported.
Versions	There are no alternative versions.
Other Languages	There are no official translations.
Measurement Properties
Reliability	Internal consistency: One study examined the internal consistency of the Stroke Arm Ladder and found internal consistency to be excellent.
Validity	Content: One study examined the content validity of the Stroke Arm Ladder and confirmed the hierarchical sequencing of the items using Rasch analysis. Construct: Convergent/Discriminant: One study examined convergent validity of the Stroke Arm Ladder and reported excellent correlations between the Stroke Arm Ladder and the Stroke Rehabilitation Assessment of Movement; and poor correlation between the Stroke Arm Ladder and the mental and emotional health subsets of the Medical Outcomes Study Short Form 36. Known Groups: One study examined known groups validity and found that the Stroke Arm Ladder could differentiate between the two extremes of stroke severity: mild and severe.
Floor/Ceiling Effects	One study examined the floor and ceiling effects and found no floor or ceiling effects in a sample population of patients with stroke ranging from mild to severe. Note: The Stroke Arm Ladder has only been tested on patients up to 7 months post-stroke.
Does the tool detect change in patients?	Not yet assessed.
Acceptability	Results support preliminary validation of the psychometric properties; however, further research is needed before the tool is ready for clinical use.
Feasibility	The Stroke Arm Ladder is simple to administer. It provides a more comprehensive tool for the evaluation and monitoring of upper extremity function.
How to obtain the tool?	Information on the Stroke Arm Ladder can be obtained from the Higgins, Finch, Kopec & Mayo (2011) study.

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the Stroke Arm Ladder and revealed only the initial validation study. Results support preliminary validation of the psychometric properties; however, further research is needed before the tool is ready for clinical use.

Floor/Ceiling Effects

Higgins, Finch, Kopec and Mayo (2011) examined the floor and ceiling effects of the Stroke Arm Ladder in patients with stroke and found no floor or ceiling effects, as no patients scored below or above the easiest and hardest items (respectively).

Note: This sample only included patients up to 7 months post-stroke and thus, the Stroke Arm Ladder should not be used for patients past 7 months post-stroke until further validation testing is completed.

Reliability

Test-retest:
Test-retest reliability has not been examined.

Intra-rater:
Intra-rater reliability has not been examined.

Inter-rater:
Inter-rater reliability has not been examined.

Validity

Content:

Higgins, Finch, Kopec and Mayo (2011) investigated the content validity of the Stroke Arm Ladder in clients with stroke. In the development of the Stroke Arm Ladder, 49 items from validated tests and indices used to assess upper extremity function and movement, such as the Box and Block Test, were selected. Fifteen items were deleted for reasons such as redundancy and lack of fit to the model. When validating the 34 items selected for the final version of the measure, all patients with stroke had fit residuals between -2.0 and +2.0. The hierarchical sequencing of the items was confirmed using Rasch analysis. The results from this study suggest that all 34 items in the Stroke Arm Ladder reflect the same construct.

Criterion:

Concurrent:
Concurrent validity has not been examined.

Predictive:
Predictive validity has not been examined.

Construct:

Convergent/Discriminant:
Higgins, Finch, Kopec and Mayo (2011) investigated the convergent validity of the Stroke Arm Ladder by comparing it to the index of global functional recovery (total score on the Stroke Rehabilitation Assessment of Movement). Excellent correlation was found between the two measures (r=0.6, P<0.0001). The authors also reviewed the correlation between the Stroke Arm Ladder and the mental and emotional subsets of the Medical Outcomes Study Short Form 36 (SF-36), and found poor correlation (r=0.2, P<0.0001). Results from this study indicate that the Stroke Arm Ladder adequately measures the construct of upper extremity function, with limited ability to assess mental and emotional status following stroke, as intended by the developers.

Known Groups:
Higgins, Finch, Kopec and Mayo (2011) examined known groups validity of the Stroke Arm Ladder in patients with stroke. Patients with stroke were classified as having mild, mild-moderate, moderate or severe stroke using the Canadian Neurological Scale (CNS). Results revealed that the Stroke Arm Ladder was able to differentiate two out of four different levels of stroke severity: mild and severe. Patients classified as having either moderate or severe stroke scored similarly on the measure, as did patients classified as having mild and mild-moderate stroke. Patients classified as having moderate or severe stroke differed significantly from those classified as having mild or mild-moderate stroke, indicating the ability of the Stroke Arm Ladder to differentiate between the two extremes (mild versus severe).

Sensitivity/ Specificity

Sensitivity and specificity have not been examined.

Responsiveness

Responsiveness has not been examined.

References

Higgins, J., Finch, L.E., Kopec, J. & Mayo, N.E. (2011). Development and initial psychometric evaluation of the Stroke Arm Ladder: A measure of upper extremity function post stroke. Clinical Rehabilitation, 25(8), 740-759.

See the measure

How to obtain the Stroke Arm Ladder?

Higgins, J., Finch, L.E., Kopec, J. & Mayo, N.E. (2011). Development and initial psychometric evaluation of the Stroke Arm Ladder: A measure of upper extremity function post stroke. Clinical Rehabilitation, 25(8), 740-759.

Stroke Impact Scale (SIS)

Evidence Reviewed as of before: 29-06-2018

Author(s)*: Lisa Zeltzer, MSc OT; Katherine Salter, BA; Annabel McDermott

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Stroke Impact Scale (SIS) is a stroke-specific, self-report, health status measure. It was designed to assess multidimensional stroke outcomes, including strength, hand function, Activities of Daily Living / Instrumental Activities of Daily Living (ADL/IADL), mobility, communication, emotion, memory and thinking, and participation. The SIS can be used both in clinical and in research settings.

In-Depth Review

Purpose of the measure

The Stroke Impact Scale (SIS) is a stroke-specific, self-report, health status measure. It was designed to assess multidimensional stroke outcomes, including strength, hand function, Activities of Daily Living / Instrumental Activities of Daily Living (ADL/IADL), mobility, communication, emotion, memory and thinking, and participation. The SIS can be used both in clinical and research settings.

Available versions

The Stroke Impact Scale was developed at the Landon Center on Aging, University of Kansas Medical Center. The scale was first published as version 2.0 by Duncan, Wallace, Lai, Johnson, Embretson, and Laster in 1999. Version 2.0 of the SIS is comprised of 64 items in 8 domains (Strength, Hand function, Activities of Daily Living (ADL) / Instrumental ADL, Mobility, Communication, Emotion, Memory and thinking, Participation). Based on the results of a Rasch analysis process, 5 items were removed from version 2.0 to create the current version 3.0 (Duncan, Bode, Lai, & Perera, 2003b).

Features of the measure

Items:

The SIS version 3.0 includes 59 items and assesses 8 domains:

Strength – 4 items
Hand function – 5 items
ADL/IADL – 10 items
Mobility – 9 items
Communication – 7 items
Emotion – 9 items
Memory and thinking – 7 items
Participation/Role function – 8 items

An extra question on stroke recovery asks that the client rate on a scale from 0 – 100 how much the client feels that he/she has recovered from his/her stroke.

To see the items of the SIS, please click here.

Instructions on item administration:

Prior to administering the SIS, the purpose statement must be read as written below. It is important to tell the respondent that the information is based on his/her point of view.

Purpose statement:
“The purpose of this questionnaire is to evaluate how stroke has impacted your health and life. We want to know from your point of view how stroke has affected you. We will ask you questions about impairments and disabilities caused by your stroke, as well as how stroke has affected your quality of life. Finally, we will ask you to rate how much you think you have recovered from your stroke”.

Response sheets in large print should be provided with the instrument, so that the respondent may see, as well as hear, the choice of responses for each question. The respondent may answer an individual question with either the number or the text associated with the number (e.g. “5” or “Not difficult at all”). If the respondent uses the number, it is important for the interviewer to verify the answer by stating the corresponding text response. The interviewer should display the sheet appropriate for that particular set of questions, and after each question must read all five choices.

Questions are listed in sections, or domains, with a general description of the type of questions that will follow (e.g. “These questions are about the physical problems which may have occurred as a result of your stroke”). Each group of questions is then given a statement with a reference to a specific time period (e.g. “In the past week how would you rate the strength of your…”). The statement must be repeated before each individual question. Within the measure the time period changes from one week, to two weeks, to four weeks. It is therefore important to emphasize the change in the time period being assessed for the specific group of questions.

Scoring:

The SIS is a patient-based, self-report questionnaire. Each item is rated using a 5-point Likert scale. The patient rates his/her difficulty completing each item, where:

1 = an inability to complete the item
5 = no difficulty experienced at all.

Note: Scores for three items in the Emotion domain (3f, 3h, 3i) must be reversed before calculating the Emotion domain score (i.e. 1 » 5, 2 » 4, 3 = 3, 4 » 2, 5 » 1). The SIS scoring database (see link below) takes this change of direction into account when scoring. When scoring manually, use the following equation to compute the item score for 3f, 3h and 3i: Item score = 6 – individual’s rating.

A final single-item Recovery domain assesses the individual’s perception of his/her recovery from stroke, measured in the form of a visual analogue scale from 0-100, where:

0 = no recovery
100 = full recovery.

Domain scores range from 0-100 and are calculated using the following equation:

Domain score = [(Mean item score – 1) / (5 – 1)] x 100

Scores are interpreted by generating a summative score for each domain using an algorithm equivalent to that used in the SF-36 (Ware & Sherbourne, 1992).
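
The manual scoring steps described above (reversing items 3f, 3h and 3i, then converting the mean item score to a 0-100 domain score) can be illustrated with a minimal sketch. This is not the official SIS scoring database: the function name `domain_score`, the item labels other than 3f, 3h and 3i, and the assumption that every item in the domain has a valid rating from 1 to 5 are all illustrative choices.

```python
from statistics import mean

# Emotion-domain items that the text says must be reversed before scoring.
REVERSED_EMOTION_ITEMS = frozenset({"3f", "3h", "3i"})


def domain_score(ratings, reversed_items=frozenset()):
    """Convert 1-5 item ratings for one domain into a 0-100 domain score.

    `ratings` maps an item label to its rating. Assumption: every item in
    the domain has been answered; missing items are not handled here.
    """
    adjusted = []
    for item, rating in ratings.items():
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating for item {item} must be 1-5, got {rating}")
        # Reversed items: item score = 6 - individual's rating.
        adjusted.append(6 - rating if item in reversed_items else rating)
    # Domain score = [(mean item score - 1) / (5 - 1)] x 100
    return (mean(adjusted) - 1) / (5 - 1) * 100


# Hypothetical Emotion-domain responses (labels 3a-3e and 3g are assumed).
emotion = {"3a": 4, "3b": 4, "3c": 4, "3d": 4, "3e": 4,
           "3f": 2, "3g": 4, "3h": 2, "3i": 2}
emotion_score = domain_score(emotion, REVERSED_EMOTION_ITEMS)
```

In this hypothetical example, ratings of 2 on the three reversed items become item scores of 4, so every adjusted item score is 4 and the domain score is (4 – 1) / (5 – 1) x 100 = 75.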

See http://www.kumc.edu/school-of-medicine/preventive-medicine-and-public-health/research-and-community-engagement/stroke-impact-scale/instructions.html to download the scoring database.

Time:

The SIS is reported to take approximately 15-20 minutes to administer (Finch, Brooks, Stratford, & Mayo, 2002).

Subscales:

The SIS 3.0 is comprised of 8 subscales or ‘Domains’:

Strength
Hand function
ADL/IADL
Mobility
Communication
Emotion
Memory and thinking
Participation

A final single-item domain measures perceived recovery since stroke onset.

Equipment:

Only the scale and a pencil are needed.

Training:

The SIS 3.0 requires no formal training for administration (Mulder & Nijland, 2016). Instructions for administration of the SIS 3.0 are available online through the University of Kansas Medical Center SIS information page.

Alternative forms of the SIS

SIS-16 (Duncan et al., 2003a).

Duncan et al. (2003) developed the SIS-16 to address the lack of sensitivity to differences in physical functioning in functional measures of stroke outcome. Factor analysis of the SIS 2.0 revealed that the four physical domains (Strength, Hand function, ADL/IADL, Mobility) are highly correlated and can be summed together to create a single physical dimension score (Duncan et al., 1999; Mulder & Nijland, 2016). Accordingly, the SIS-16 consists of 16 items from the SIS 2.0:

ADL/IADL – 7 items
Mobility – 8 items
Hand Function – 1 item.

All other domains should remain separate (Duncan et al., 1999).

SF-SIS (Jenkinson et al., 2013).

Jenkinson et al. (2013) developed a modified short form of the SIS (SF-SIS) comprised of eight items. The developers selected the one item from each domain that correlated most highly with the total domain score, through three methods: initial pilot research, validation analysis and a focus group. The final choice of questions for the SF-SIS comprised those items that were chosen by methods on 2 or more occasions. The SF-SIS was evaluated for face validity and acceptability within a focus group of patients from acute and rehabilitation stroke settings and with multidisciplinary stroke healthcare staff. The SF-SIS has also been evaluated for content, convergent and discriminant validity (MacIsaac et al., 2016).

Client suitability

Can be used with:

The SIS can only be administered to patients with stroke.
The SIS 3.0 and SIS-16 can be completed by telephone, mail administration, by proxy, and by proxy mail administration (Duncan et al., 2002a; Duncan et al., 2002b; Kwon et al., 2006). Studies have shown potential proxy bias for physical domains (Mulder & Nijland, 2016). It is recommended that possible responder bias and the inherent difficulties of proxy use be weighed against the economic advantages of a mailed survey when considering these methods of administration.

Should not be used with:

The SIS version 2.0 should be used with caution in individuals with mild impairment as items in the Communication, Memory and Emotion domains are considered easy and only capture limitations in the most impaired individuals (Duncan et al., 2003).
Respondents must be able to follow a 3-step command (Sullivan, 2014).
Time taken to administer the SIS is a limitation for individuals with difficulties with concentration, attention or fatigue following stroke (MacIsaac et al., 2016).

In what languages is the measure available?

The SIS was originally developed in English.

Cultural adaptations, translations and psychometric testing have also been conducted in the following languages:

Brazilian (Carod-Artal et al., 2008)
French (Cael et al., 2015)
German (Geyh, Cieza & Stucki, 2009)
Italian (Vellone et al., 2010; Vellone et al., 2015)
Japanese (Ochi et al., 2017)
Korean (Choi et al., 2017; Lee & Song, 2015)
Nigerian (Hausa) (Hamza et al., 2012; Hamza et al., 2014)
Portuguese (Goncalves et al., 2012; Brandao et al., 2018)
Ugandan (Kamwesiga et al., 2016)
United Kingdom (Jenkinson et al., 2013)

The MAPI Research Institute has translated the SIS and/or SIS-16 into numerous languages including Afrikaans, Arabic, Bulgarian, Cantonese, Czech, Danish, Dutch, Farsi, Finnish, French, German, Greek, Hebrew, Hungarian, Icelandic, Italian, Japanese, Korean, Malay, Mandarin, Norwegian, Portuguese, Russian, Slovak, Spanish, Swedish, Tagalog, Thai and Turkish. Translations may not be validated.

Summary

What does the tool measure?	Multidimensional stroke outcomes, including strength, hand function, Activities of daily living/Instrumental activities of daily living, mobility, communication, emotion, memory, thinking and participation.
What types of clients can the tool be used for?	Patients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	The SIS takes 15-20 minutes to administer.
Versions	SIS 2.0, SIS 3.0, SIS-16, SF-SIS.
Other Languages	The SIS has been translated into several languages. Please click here to see a list of translations.
Measurement Properties
Reliability	Internal consistency: SIS 2.0: Two studies reported excellent internal consistency; one study reported excellent internal consistency for 5/8 domains and adequate internal consistency for 3/8 domains. SIS 3.0: Two studies reported excellent internal consistency; one study reported excellent internal consistency for 6/8 domains and adequate internal consistency for 2/8 domains. SIS-16: One study reported good spread of item difficulty. SF-SIS: One study reported excellent internal consistency. Test-retest: SIS 2.0: One study reported adequate to excellent test-retest reliability in all domains except for the Emotion domain.
Validity	Criterion: Concurrent: SIS 2.0: Excellent correlations with the Barthel Index, FMA, Instrumental Activities of Daily Living (IADL) Scale, Duke Mobility Scale and Geriatric Depression Scale; adequate to excellent correlations with the FIM; adequate correlations with the NIHSS and MMSE; and poor to excellent correlations with the SF-36. SIS 3.0: Excellent correlation between SIS Hand Function and MAL-QOM; excellent correlation between SIS ADL/IADL and FIM, Barthel Index, Lawton IADL Scale; excellent correlation between SIS Strength and Motricity Index; excellent correlation between SIS Mobility and Barthel Index; adequate to excellent correlation between SIS ADL/IADL and NEADL; adequate correlation between SIS Social Participation and SF-36 Social Functioning, Lawton IADL scale; adequate correlation between SIS Memory domain and MMSE; poor to adequate correlations between remaining SIS domains and FIM, NEADL, FMA, MAL-AOU, MAL-QOM, FAI. SIS-16: Excellent correlation with the Barthel Index; adequate to excellent correlations with the STREAM total and subscale scores; adequate correlation with SF-36 Physical Functioning. Predictive: SIS 2.0: Physical function, Emotion and Participation domains were statistically significant predictors of the patient’s own assessment of recovery; SIS scores were poor predictors of mean steps per day. SIS 3.0: Pre-treatment SIS scores were compared with outcome measures after 3 weeks of upper extremity rehabilitation: Hand function and ADL/IADL domains showed adequate to excellent correlations with FIM, FMA, MAL-AOU, MAL-QOM, FAI, and NEADL; other domains demonstrated poor to adequate correlations with outcome measures. SIS-16: – Admission scores show an excellent correlation with actual length of stay and an adequate correlation with predicted length of stay; there was a significant correlation with discharge destination (home/rehabilitation). – The combination of early outcomes of MAL-QOM and SIS shows high accuracy in predicting final QOL among patients with stroke. Construct: Convergent/Discriminant: SIS 2.0: Domains demonstrate adequate to excellent correlations with corresponding WHOQOL-BREF subscales and Zung’s Self-Rating Depression Scale; poor correlations between the SIS Communication domain and both WHOQOL-BREF and Zung’s Self-Rating Depression Scale; and a poor correlation
A correlation can be positive (as one variable increases, the other also increases - for example height and weight typically represent a positive correlation) or negative (as one variable increases, the other decreases - for example as the cost of gasoline goes higher, the number of miles driven decreases. There are a wide variety of methods for measuring correlation including: intraclass correlation coefficients (ICC), the Pearson product-moment correlation coefficient, and the Spearman rank-order correlation. between the SIS Physical domain and the WHOQOL Environment scores. SIS 3.0: Excellent correlationThe extent to which two or more variables are associated with one another. A correlation can be positive (as one variable increases, the other also increases - for example height and weight typically represent a positive correlation) or negative (as one variable increases, the other decreases - for example as the cost of gasoline goes higher, the number of miles driven decreases. There are a wide variety of methods for measuring correlation including: intraclass correlation coefficients (ICC), the Pearson product-moment correlation coefficient, and the Spearman rank-order correlation. with the SF-SIS, EQ-5D, mRS, BI, NIHSS, EQ-5D; moderate to excellent correlations with the EQ-VAS; and a moderate correlationThe extent to which two or more variables are associated with one another. A correlation can be positive (as one variable increases, the other also increases - for example height and weight typically represent a positive correlation) or negative (as one variable increases, the other decreases - for example as the cost of gasoline goes higher, the number of miles driven decreases. There are a wide variety of methods for measuring correlation including: intraclass correlation coefficients (ICC), the Pearson product-moment correlation coefficient, and the Spearman rank-order correlation. with the SIS-VAS. 
SIS 3.0 telephone survey: Adequate to excellent correlations with the FIM and SF-36V. SIS-16: Adequate to excellent correlations with the WHOQOL-BREF Physical domain; poor correlationThe extent to which two or more variables are associated with one another. A correlation can be positive (as one variable increases, the other also increases - for example height and weight typically represent a positive correlation) or negative (as one variable increases, the other decreases - for example as the cost of gasoline goes higher, the number of miles driven decreases. There are a wide variety of methods for measuring correlation including: intraclass correlation coefficients (ICC), the Pearson product-moment correlation coefficient, and the Spearman rank-order correlation. with the WHOQOL Social relationships domain. SF-SIS: Excellent correlations with the EQ-5D, mRS, BI, NIHSS, EQ-5D; moderate to excellent correlations with the EQ-VAS; and moderate correlationThe extent to which two or more variables are associated with one another. A correlation can be positive (as one variable increases, the other also increases - for example height and weight typically represent a positive correlation) or negative (as one variable increases, the other decreases - for example as the cost of gasoline goes higher, the number of miles driven decreases. There are a wide variety of methods for measuring correlation including: intraclass correlation coefficients (ICC), the Pearson product-moment correlation coefficient, and the Spearman rank-order correlation. with the SIS-VAS. Known groups: SIS 2.0: Most domains can differentiate between patients with varying degrees of strokeAlso called a "brain attack" and happens when brain cells die because of inadequate blood flow. 20% of cases are a hemorrhage in the brain caused by a rupture or leakage from a blood vessel. 80% of cases are also know as a "schemic stroke", or the formation of a blood clot in a vessel supplying blood to the brain. 
severity. SIS 3.0: Physical and ADL/IADL domains showed score discrimination and distribution for different degrees of stroke severity. SIS-16: Can discriminate between patients of varying degrees of strokeAlso called a "brain attack" and happens when brain cells die because of inadequate blood flow. 20% of cases are a hemorrhage in the brain caused by a rupture or leakage from a blood vessel. 80% of cases are also know as a "schemic stroke", or the formation of a blood clot in a vessel supplying blood to the brain. severity.
Floor/Ceiling Effects	Three studies have examined floor/ceiling effects of the SIS. SIS 2.0: Two studies reported the potential for floor effects in the domain of Hand function among patients with moderate stroke severity, and a potential for ceiling effects in the Communication, Memory and Emotion domains. SIS 3.0: One study reported minimal floor and ceiling effects for the Social participation domain; one study reported ceiling effects for the Hand function, Memory and thinking, Communication, Mobility and ADL/IADL domains over time. SIS-16: One study reported no floor effects and minimal ceiling effects.
Does the tool detect change in patients?	Five studies have investigated responsiveness of the SIS. SIS 2.0: One study reported significant change in patients’ recovery in the expected direction between assessments at 1 and 3 months, and at 1 and 6 months post-stroke; however, sensitivity to change was affected by stroke severity and time of post-stroke assessment. SIS 3.0: – One study determined change scores for a clinically important difference (CID) within four subscales: Strength, ADL/IADL, Mobility and Hand function. The MDC was 24.0, 17.3, 15.1 and 25.9 (respectively); the minimal CID was 9.2, 5.9, 4.5 and 17.8 (respectively). – One study reported medium responsiveness for Hand function, Stroke recovery and SIS total score; other domains showed small responsiveness. – One study found Participation and Recovery from stroke were the most responsive domains over the first year post-stroke; Strength and Hand function domains also showed high clinically meaningful positive/negative change. SIS-16: One study reported that change scores of 23.1 indicated statistically significant improvement from admission to discharge, and sensitivity to change was large.
Acceptability	– SIS 3.0 and SIS-16 are available in proxy version. The patient-centred nature of the scale’s development may enhance its relevance to patients and assessment across multiple levels may reduce patient burden. – Time taken to administer the SIS has been identified as a limitation. – The SIS 2.0 should be used with caution in individuals with mild impairment as some domains only capture limitations in the most impaired individuals.
Feasibility	– The SIS is a patient-based self-report scale that takes 15-20 minutes to administer. – The SIS can be administered in person or by proxy, by mail or telephone. – The SIS does not require any formal training. – Instructions for administration of the SIS 3.0 are available online.
How to obtain the tool?	Please click here to see a copy of the SIS.

Psychometric Properties

Overview

We conducted a literature search to identify relevant publications on the psychometric properties of the SIS. Seventeen studies were included. Studies included in this review are specific to the original English versions of the SIS version 2.0, SIS 3.0 or SIS-16.

Floor/Ceiling Effects

Duncan et al. (1999) found that SIS version 2.0 showed the potential for floor effects in the Hand function domain in the moderate stroke group (40.2%) and a possible ceiling effect in the Communication domain for both the mild (35.4%) and moderate (25.7%) stroke groups. The highest percentage of ceiling effects for the SIS was for the Communication domain (35%) compared with a 64.6% ceiling rate for the Barthel Index (Mahoney & Barthel, 1965).

Duncan et al. (2003b) conducted a Rasch analysis which confirmed the two effects observed in Duncan et al. (1999): a floor effect in the SIS Hand function domain and a ceiling effect in the Communication domain. A ceiling effect in the Memory and Emotion domains was also reported.

Lai et al. (2003) examined floor/ceiling effects of the SIS-16 and SIS Social Participation domain in a sample of 278 patients at 3 months post-stroke. The authors reported floor/ceiling effects of 0% and 4% (respectively) for the SIS-16, and 1% and 5% (respectively) for the SIS Social Participation domain.

Richardson et al. (2016) examined floor/ceiling effects of the SIS 3.0 in a sample of 164 patients with subacute stroke. Measures were taken at three timepoints: on admission to the study and at 6-month and 12-month follow-up (n=164, 108, 37 respectively). Poor ceiling effects (>20%) were seen for the Hand function domain at baseline, 6 months and 12 months (25.0%, 36.4%, 37.8%, respectively); the Memory and thinking domain at 6 months and 12 months (22.2%, 21.6%, respectively); the Communication domain at 6 months and 12 months (30.6%, 27%, respectively); the Mobility domain at 6 months (20.4%); and the ADL/IADL domain at 12 months (21.6%). There were no significant floor effects at any timepoint.
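
The floor and ceiling percentages reported in these studies come down to counting how many patients score at the lowest or highest possible value. The sketch below is illustrative only: it assumes each domain is scored from 0 to 100 (not stated in this section), uses the >20% cut-off cited above for Richardson et al. (2016) as the flag threshold, and the function names and sample scores are hypothetical rather than study data.

```python
def floor_ceiling_rates(scores, min_score=0.0, max_score=100.0):
    """Return (floor %, ceiling %) for a list of domain scores.

    min_score and max_score are assumed scale limits (0 to 100 here).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    at_floor = sum(1 for s in scores if s == min_score)
    at_ceiling = sum(1 for s in scores if s == max_score)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


def exceeds_threshold(rate_percent, threshold=20.0):
    """True when a floor or ceiling rate is above the chosen cut-off."""
    return rate_percent > threshold


# Hypothetical Hand function scores for 8 patients (not study data).
hand_function = [0, 0, 0, 25, 50, 75, 100, 100]
floor_pct, ceiling_pct = floor_ceiling_rates(hand_function)
print(f"floor: {floor_pct:.1f}%  ceiling: {ceiling_pct:.1f}%")
print("floor flagged:", exceeds_threshold(floor_pct))
print("ceiling flagged:", exceeds_threshold(ceiling_pct))
```

With this cut-off, a reported ceiling rate of 20.4% would be flagged, while a rate of exactly 20% would not.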

Reliability

Internal consistency:
Duncan et al. (1999) examined internal consistency of the SIS version 2.0 using Cronbach’s alpha coefficients and reported excellent internal consistency for each of the 8 domains (ranging from α=0.83 to 0.90).

Duncan et al. (2003b) examined reliability of the SIS version 2.0 by Rasch analysis. Item separation reliability is the ratio of the “true” (observed minus error) variance to the observed variance. The smaller the error, the higher the ratio will be. It ranges from 0.00 to 1.00 and is interpreted the same as the Cronbach’s alpha. Item separation reliability of the SIS version 2.0 ranged from 0.93 to 1.00. A separation index > 2.00 is equivalent to a Cronbach’s alpha of 0.80 or greater (excellent). In this study, 5 out of 8 domains had a separation index that exceeded 2.00 (in addition to the composite physical domain). The values for the Emotion and Communication domains were only in the adequate range because of the ceiling effect in those domains, and those for the Hand function domain were only adequate because of the floor effect in that domain.
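
The stated equivalence between a separation index of 2.00 and a reliability of 0.80 follows from the usual Rasch relationship between the two quantities, reliability = G² / (1 + G²), where G is the separation index. This formula is not given in the studies summarised here; the short sketch below assumes it, and the function names are hypothetical.

```python
import math


def separation_to_reliability(separation):
    """Convert a Rasch separation index G to reliability: G^2 / (1 + G^2)."""
    if separation < 0:
        raise ValueError("separation index must be non-negative")
    return separation ** 2 / (1.0 + separation ** 2)


def reliability_to_separation(reliability):
    """Inverse conversion: G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


# A separation index of 2.00 corresponds to a reliability of 0.80.
print(round(separation_to_reliability(2.0), 2))
# A reliability of 0.93 (lowest item separation reliability reported above).
print(round(reliability_to_separation(0.93), 2))
```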

Edwards and O’Connell (2003) administered the SIS version 2.0 to 74 patients with stroke and reported excellent internal consistency (ranging from α=0.87 for participation to α=0.95 for hand function). The percentage of item-domain correlations >0.40 was 100% for all domains except emotion and ADL/IADL. In the ADL/IADL scale, one item (cutting food) was more closely associated with hand function than ADL/IADL.

Lai et al. (2003) examined reliability of the SIS-16 and SIS Social Participation domain in a sample of 278 patients at 3 months post-stroke. Both the SIS-16 and SIS Social Participation domain showed good spread of item difficulty, with easier items that are able to measure lower levels of physical functioning in patients with severe stroke.

Jenkinson et al. (2013) examined internal consistency of the SIS 3.0 and the SF-SIS among individuals with stroke (n=73, 151 respectively), using Cronbach’s alpha. Internal consistency of the SIS 3.0 was excellent for all domains (α=0.86 to 0.96). Higher order factor analysis of the SIS 3.0 showed one factor with an eigenvalue > 1 that accounted for 68.76% of the variance. Each dimension of the SIS 3.0 loaded on this factor (eigenvalue = 5.5). Internal consistency of the SF-SIS was high (α=0.89). Factor analysis of the SF-SIS similarly showed one factor that accounted for 57.25% of the variance.

Richardson et al. (2016) examined internal consistency of the SIS 3.0 in a sample of 164 patients with subacute stroke, using Cronbach’s alpha. Internal consistency was measured at three timepoints: on admission to the study and at 6-month and 12-month follow-up. Internal consistency of all domains was excellent at all timepoints (α=0.81 to 0.97). The composite Physical Functioning score was excellent at all timepoints (α=0.95 to 0.97).

MacIsaac et al. (2016) examined internal consistency of the SIS 3.0 in a sample of 5549 individuals in an acute stroke setting and 332 individuals in a stroke rehabilitation setting, using Cronbach’s alpha. Internal consistency was excellent within both acute and rehabilitation data sets (α=0.98, 0.93 respectively). Internal consistency of individual domains was excellent for both acute and rehabilitation data sets, except for the Emotion domain (α=0.60, 0.63 respectively) and the Strength domain (α=0.77, rehabilitation data set only).
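
For readers unfamiliar with how the Cronbach’s alpha values above are obtained, the following is a minimal sketch of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. It is not taken from any of the cited studies; the function name and the sample responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a domain.

    items: list of item columns; each column holds one score per respondent,
    and all columns must have the same length.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs the same number (>= 2) of responses")

    # Total score per respondent across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    sum_item_variances = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical responses from 5 patients on a 2-item domain (not study data).
item_1 = [1, 2, 3, 4, 5]
item_2 = [2, 1, 4, 3, 5]
print(round(cronbach_alpha([item_1, item_2]), 3))
```

Items that rank patients in a similar order push the variance of the total score well above the sum of the item variances, which is what drives alpha towards 1.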

Test-retest:
Duncan et al. (1999) examined test-retest reliability of the SIS version 2.0 in 25 patients who were administered the SIS at 3 or 6 months post stroke and again one week later. Test-retest reliability was calculated using intraclass correlation coefficients (ICC), which ranged from adequate to excellent (ICC=0.7 to 0.92) with the exception of the Emotion domain, which had only a poor correlation (ICC=0.57).
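
As an illustration of how a test-retest ICC can be computed, the sketch below implements one common form, ICC(2,1) (two-way random effects, absolute agreement, single measurement). The form of ICC used by Duncan et al. (1999) is not stated in this section, so this choice is an assumption, and the function name and scores are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: one row per patient; each row holds that patient's score on
    every occasion, e.g. [test, retest].
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 patients and >= 2 equal-length occasions")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical domain scores at test and one week later (not study data).
# Every patient scores 10 points higher at retest.
scores = [[40, 50], [50, 60], [60, 70], [70, 80], [80, 90]]
print(round(icc_2_1(scores), 3))
```

Because this form measures absolute agreement, a uniform shift between occasions lowers the coefficient even though patients keep the same rank order.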

Validity

Content:

Development of the SIS was based on a study at the Landon Center on Aging, University of Kansas Medical Center (Duncan, Wallace, Studenski, Lai, & Johnson, 2001) using feedback from individual interviews with patients and focus group interviews with patients, caregivers, and health care professionals. Participants included 30 individuals with mild and moderate stroke, 23 caregivers, and 9 stroke experts. Qualitative analysis of the individual and focus group interviews generated a list of potential items. Consensus panels reviewed the potential items, established domains for the measure, developed item scales, and decided on mechanisms for administration and scoring.

Criterion:

Concurrent:
Duncan et al. (1999) examined concurrent validity of the SIS by comparison with the Barthel Index, Functional Independence Measure (FIM), Fugl-Meyer Assessment (FMA), Mini-Mental State Examination (MMSE), National Institutes of Health Stroke Scale (NIHSS), Medical Outcomes Study Short Form 36 (SF-36), Lawton Instrumental Activities of Daily Living (IADL) Scale, Duke Mobility Scale and Geriatric Depression Scale. The following results were found for each domain of the SIS:

SIS Domain	Comparative Measure	Correlation	Rating
Hand function	FMA – Upper Extremity Motor	r = 0.81	Excellent
Mobility	FIM Motor	r = 0.83	Excellent
	Barthel Index	r = 0.82	Excellent
	Duke Mobility Scale	r = 0.83	Excellent
	SF-36 Physical Functioning	r = 0.84	Excellent
Strength	NIHSS Motor	r = -0.59	Adequate
Strength	FMA Total	r = 0.72	Excellent
ADL/IADL	Barthel Index	r = 0.84	Excellent
	FIM Motor	r = 0.84	Excellent
	Lawton IADL Scale	r = 0.82	Excellent
Memory	MMSE	r = 0.58	Adequate
Communication	FIM Social/Cognition	r = 0.53	Adequate
Communication	NIHSS Language	r = -0.44	Adequate
Emotion	Geriatric Depression Scale	r = -0.77	Excellent
Emotion	SF-36 Mental Health	r = 0.74	Excellent
Participation	SF-36 Emotional Role	r = 0.28	Poor
	SF-36 Physical Role	r = 0.45	Adequate
	SF-36 Social Functioning	r = 0.70	Excellent
Physical	Barthel Index	r = 0.76	Excellent
	FIM Motor	r = 0.79	Excellent
	SF-36 Physical Functioning	r = 0.75	Excellent
	Lawton IADL Scale	r = 0.73	Excellent
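
The ratings in the table follow a consistent pattern that can be written as a simple rule on the absolute value of the coefficient. The sketch below is illustrative only: the cut-offs (0.60 and 0.31) are assumptions inferred from the values and ratings reported in this review, not thresholds stated by Duncan et al. (1999), and the function name `rate_correlation` is hypothetical.

```python
def rate_correlation(r: float) -> str:
    """Label a correlation coefficient by its strength, ignoring its sign.

    The cut-offs are assumed (inferred from the ratings in this review).
    """
    strength = abs(r)
    if strength >= 0.60:
        return "Excellent"
    if strength >= 0.31:
        return "Adequate"
    return "Poor"


# Coefficients taken from the table above.
examples = {
    "Hand function vs FMA Upper Extremity Motor": 0.81,
    "Strength vs NIHSS Motor": -0.59,
    "Participation vs SF-36 Emotional Role": 0.28,
}
for pair, r in examples.items():
    print(f"{pair}: r = {r:+.2f} -> {rate_correlation(r)}")
```

Negative coefficients (for example against the NIHSS, where higher scores indicate greater impairment) are rated on magnitude alone, which is why the sign is discarded before comparison.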

Duncan et al. (2002a) examined concurrent validity of the SIS version 3.0 and SIS-16 using Pearson correlations. The SIS was correlated with the Mini-Mental State Examination (MMSE), Barthel Index, Lawton IADL Scale and the Motricity Index. The SIS ADL/IADL domain showed an excellent correlation with the Barthel Index (r=0.72) and with the Lawton IADL Scale (r=0.77). The SIS Mobility domain showed an excellent correlation with the Barthel Index (r=0.69). The SIS Strength domain showed an excellent correlation with the Motricity Index (r=0.67). The SIS Memory domain showed an adequate correlation with the MMSE (r=0.42).

Lai et al. (2003) examined concurrent validity of the SIS-16 and SIS Social Participation domain by comparison with the SF-36 Physical Functioning and Social Functioning subscales, Barthel Index and Lawton IADL Scale, using Pearson correlation coefficients. Measures were administered to 278 patients with stroke at 3 months post-stroke. There was an excellent correlation between SIS-16 and SF-36 Physical Functioning (r=0.79), and an excellent correlation between SIS Social Participation and SF-36 Social Functioning (r=0.65). There was an excellent correlation between SIS-16 and the Barthel Index at 3 months post-stroke (r=0.75), and an adequate correlation between SIS Social Participation and Lawton IADL Scale at 3 months post-stroke (r=0.47).

Lin et al. (2010a) examined concurrent validity of the SIS version 3.0 by comparison with the Fugl-Meyer Assessment (FMA), Motor Activity Log – Amount of Use and – Quality of Movement (MAL-AOU, MAL-QOM), Functional Independence Measure (FIM), Frenchay Activities Index (FAI) and Nottingham Extended Activities of Daily Living Scale (NEADL). Concurrent validity was measured using Spearman correlation coefficients prior to and on completion of a 3-week intervention period. SIS Hand Function showed an excellent correlation with MAL-QOM at pre-treatment and post-treatment (r=0.65, 0.68, respectively, p<0.01), and adequate correlations with all other measures (FMA, MAL-AOU, FIM, FAI, NEADL). SIS ADL/IADL showed an excellent correlation with the FIM at pre-treatment and post-treatment (r=0.69, 0.75, respectively, p<0.01). Correlations between SIS ADL/IADL and the NEADL were adequate at pre-treatment (r=0.54, p<0.01) and excellent at post-treatment (r=0.62, p<0.01). Correlations between the SIS ADL/IADL and all other measures (FMA, MAL-AOU, MAL-QOM, FAI) were adequate at pre-treatment and post-treatment. Other SIS domains demonstrated poor to adequate correlations with comparison measures.
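
Several of the studies in this section report Spearman rather than Pearson coefficients. As a minimal sketch of what that statistic computes, the code below ranks each set of scores (averaging the ranks of tied values) and then takes the Pearson correlation of the ranks. The function names and the scores in the usage lines are hypothetical and are not data from Lin et al. (2010a).

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation (undefined if either input is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for six patients (illustration only).
sis_hand_function = [20, 35, 35, 50, 65, 80]
mal_qom = [0.5, 1.2, 1.0, 2.4, 2.1, 3.8]
print(round(spearman(sis_hand_function, mal_qom), 2))
```

Because only the ordering of scores matters, the coefficient is unaffected by any monotonic rescaling of either measure, which suits ordinal instruments such as the SIS.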

Ward et al. (2011) examined concurrent validity of the SIS-16 by comparison with the Stroke Rehabilitation Assessment of Movement (STREAM), using Spearman correlations. Measures were administered to 30 patients with acute stroke on admission to and discharge from an acute rehabilitation setting. Correlations between the SIS-16 and STREAM total and subscale scores were adequate to excellent on admission (STREAM total r=0.7073; STREAM subtests r=0.5992 to 0.6451, p<0.0005) and discharge (STREAM total r=0.7153; STREAM subtests r=0.5499 to 0.7985, p<0.0002).

Richardson et al. (2016) examined concurrent validity of the SIS 3.0 by comparison with the 5-level EuroQol 5D (EQ-5D-5L), using Pearson correlation coefficients. Measures were administered to patients with subacute stroke on admission to the study and at 6-month and 12-month follow-up (n=164, 108, 37, respectively). At admission correlations with the EQ-5D-5L were excellent for the ADL (r=0.663) and Hand function (r=0.618) domains and Physical composite score (r=0.71); correlations with other domains were adequate (r=0.318 to 0.588), except for the Communication domain (r=0.228). At 6-month follow-up correlations with the EQ-5D-5L were excellent for the Strength (r=0.628), ADL (r=0.684), Mobility (r=0.765), Hand function (r=0.668), Participation (r=0.740) and Recovery (r=0.601) domains and Physical composite score (r=0.772); correlations with other domains were adequate (r=0.402 to 0.562). At 12-month follow-up correlations with the EQ-5D-5L were excellent for the Strength (r=0.604), ADL (r=0.760), Mobility (r=0.683) and Participation (r=0.738) domains and the Physical composite score (r=0.756); correlations with other domains were adequate (r=0.364 to 0.592).

Predictive:
Duncan et al. (1999) examined which domain scores of the SIS version 2.0 could most accurately predict a patient’s own assessment of stroke recovery, using multiple regression analysis. The SIS domains of Physical function, Emotion, and Participation were found to be statistically significant predictors of the patient’s assessment of recovery. Forty-five percent of the variance in the patient’s assessment of percentage of recovery was explained by these factors.
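
The "variance explained" figure corresponds to the coefficient of determination of the regression model. As a reminder of what that number means (this is the standard definition, not a formula reported by Duncan et al., and it assumes the reported 45% is an unadjusted value), with y_i the patient's self-rated recovery, ŷ_i the model's prediction and ȳ the sample mean:

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = 0.45
\quad\Longrightarrow\quad
R = \sqrt{0.45} \approx 0.67
```

Under that assumption, roughly 55% of the variance in self-rated recovery is left unexplained by the three domains.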

Fulk, Reynolds, Mondal & Deutsch (2010) examined the predictive validity of the 6MWT and other widely used clinical measures (FMA-LE, self-selected gait speed, SIS and BBS) in 19 patients with stroke. The SIS was found to be a poor predictor of mean steps per day (r=0.18, p=0.471). Although gait speed and balance were related to walking activity, only the 6MWT was found to be a predictor of community ambulation in patients with stroke.

Huang et al. (2010) examined change in quality of life after distributed constraint-induced movement therapy (CIMT) in a sample of 58 patients with chronic stroke, using chi-squared automatic interaction detection (CHAID) analysis. Predictors of change included age, gender, side of lesion, time since stroke, cognitive status (measured by the MMSE), upper extremity motor impairment (measured by the FMA-UE) and independence in activities of daily living (measured by the FIM). Initial FIM scores were the strongest predictor of overall SIS score (p=0.006) and ADL/IADL domain score (p=0.004) at post-treatment. Participants with FIM scores ≤ 109 showed significantly greater improvement in overall SIS scores than participants with FIM scores > 109. There were no significant associations between other SIS domains and other predictors.

Lin et al. (2010a) examined predictive validity of the SIS version 3.0 by comparing pre-treatment SIS scores with post-treatment scores of the Fugl-Meyer Assessment (FMA), Motor Activity Log – Amount of Use and – Quality of Movement (MAL-AOU, MAL-QOM), Functional Independence Measure (FIM), Frenchay Activities Index (FAI) and Nottingham Extended Activities of Daily Living Scale (NEADL). Predictive validity was measured using Spearman correlation coefficients prior to and on completion of a 3-week intervention period. The SIS Hand Function showed excellent correlations with MAL-AOU (r=0.61, p<0.01) and MAL-QOM (r=0.66, p<0.01), and adequate correlations with all other measures (FMA, FIM, FAI, NEADL). The SIS ADL/IADL showed an excellent correlation with the FIM (r=0.70, p<0.01), and adequate correlations with all other measures (FMA, MAL-AOU, MAL-QOM, FAI, NEADL). Other SIS domains demonstrated poor to adequate correlations with comparison measures.

Ward et al. (2011) examined predictive validity of the SIS-16 and other clinical measures (STREAM, FIM) in a sample of 30 patients in an acute rehabilitation setting, using Spearman rho coefficients and Wilcoxon rank-sum tests. Results indicated excellent correlations between SIS-16 admission scores and both predicted length of stay (rho=-0.6743, p<0.001) and actual length of stay (rho=-0.7953, p<0.001). There was a significant correlation with discharge destination (p<0.05).

Lee et al. (2016) developed a computational method to predict quality of life after stroke rehabilitation, using a Particle Swarm-Optimized Support Vector Machine (PSO-SVM) classifier. A sample of 130 patients with subacute/chronic stroke received occupational therapy for 1.5-2 hours/day, 5 days/week for 3-4 weeks. Predictors of outcome included 5 personal parameters (age, gender, time since stroke onset, education, MMSE score) and 9 early functional outcomes (Fugl-Meyer Assessment, Wolf Motor Function Test, Action Research Arm Test, Functional Independence Measure, Motor Activity Log – Amount of Use (MAL-AOU) and – Quality of Movement (MAL-QOM), ABILHAND, physical function, SIS). The combination of early outcomes of MAL-QOM and SIS showed highest accuracy (70%) and highest cross-validated accuracy (81.43%) in predicting final QOL among patients with stroke. SIS alone showed high accuracy (60%) and cross-validated accuracy (81.43%).

Construct:

Duncan et al. (2003b) performed a Rasch analysis on version 2.0 of the SIS. For measures that have been developed using a conceptual hierarchy of items, the theoretical ordering can be compared with the empirical ordering produced by the Rasch analysis as evidence of the construct validity of the measure. In this study, the expectation regarding the theoretical ordering of task difficulty was consistent with the empirical ordering of the items by difficulty for each domain, providing evidence for the construct validity of the SIS.

Convergent/Discriminant:
Edwards and O’Connell (2003) examined discriminant validity of the SIS version 2.0 and SIS-16 in a sample of 74 patients with stroke, by comparison with the World Health Organization Quality of Life Bref-Scale (WHOQOL-BREF) and Zung’s Self-Rating Depression Scale (ZSRDS). There were adequate to excellent correlations between the SIS-16 and the WHOQOL-BREF Physical domain (r=0.40 to 0.63); correlations with the WHOQOL-BREF Social relationships domain were poor (r=0.13 to 0.18). There were adequate to excellent correlations between the SIS Participation domain and all WHOQOL-BREF domains (r=0.45 to 0.69). The correlation between the SIS Participation domain and the WHOQOL-BREF Physical domain was excellent (r=0.69). The SIS Participation domain demonstrated an adequate correlation with the ZSRDS (r=-0.56). There were adequate correlations between the SIS Memory and Emotion domains and the WHOQOL-BREF Psychological domain (r=0.49, 0.70, respectively) and between the SIS Memory and Emotion domains and the ZSRDS (r=-0.38, -0.62, respectively). There was a poor correlation between the SIS Physical domain and the WHOQOL-BREF Environment scores (r=0.15). Neither the ZSRDS nor the WHOQOL-BREF assesses communication; accordingly, both measures demonstrated poor correlations with the SIS Communication domain (ZSRDS: r=-0.28; WHOQOL-BREF: r=0.11 to 0.28).
Note: Some correlations are negative because a high score on the SIS indicates normal performance whereas a high score on other measures indicates impairment.

Jenkinson et al. (2013) examined convergent validity of the SIS version 3.0 and the SF-SIS in a sample of individuals with stroke (n=73, 151, respectively) by comparison with the EuroQoL EQ-5D, using Spearman’s correlation coefficient. The SIS and SF-SIS demonstrated identical excellent correlations with the EQ-5D (r=0.83).

MacIsaac et al. (2016) examined convergent validity of the SIS 3.0 and the SF-SIS in a sample of 5549 patients in an acute stroke setting and 332 patients in a stroke rehabilitation setting, using Spearman’s correlation coefficient (ρ). Convergent validity was measured by comparison with the SIS-VAS, the patient-reported outcome measures EuroQoL EQ-5D and EQ-5D-VAS, and the functional measures Barthel Index (BI), modified Rankin Score (mRS), and National Institutes of Health Stroke Scale (NIHSS). Within acute data, the SIS and SF-SIS demonstrated significant excellent correlations with the mRS (ρ=-0.87, -0.80, respectively), the BI (ρ=0.89, 0.80), the NIHSS (ρ=-0.77, -0.73), the EQ-5D (ρ=0.88, 0.82) and the EQ-VAS (ρ=0.73, 0.72). Within rehabilitation data, the SIS and SF-SIS demonstrated excellent correlations with the BI (ρ=0.72, 0.65, respectively) and the EQ-5D (ρ=0.69, 0.69), and moderate correlations with the SIS-VAS (ρ=0.56, 0.57) and the EQ-VAS (ρ=0.46, 0.40). Correlations between the SIS and SF-SIS were excellent in the acute data (ρ=0.94) and rehabilitation data (ρ=0.96).

Kwon et al. (2006) examined convergent validity of the SIS 3.0 by telephone administration in a sample of 95 patients with stroke, using Pearson coefficients. Convergent validity was measured by comparison with the Functional Independence Measure (FIM) – Motor component (FIM-M) and – Cognitive component (FIM-C), and with the Medical Outcomes Study Short Form 36 for veterans (SF-36V). Patients were administered the SIS at 12 weeks post-stroke and the FIM and SF-36 at 16 weeks post-stroke. The SIS 3.0 telephone survey showed adequate to excellent correlations with the FIM (r=0.404 to 0.858, p<0.001) and SF-36V (r=0.362 to 0.768, p<0.001).

Known groups:
Duncan et al. (1999) found that all domains of the SIS version 2.0, with the exception of the Memory/thinking and Emotion domains, were able to discriminate between patients across 4 Rankin levels of stroke severity (p<0.0001, except for the Communication domain, p=0.02). These results suggest that scores from most domains of the SIS can differentiate between patients based on stroke severity.

Lai et al. (2003) administered the SIS and SF-36 to 278 patients with stroke 90 days after stroke. The SIS-16 was able to discriminate among the Modified Rankin Scale (MRS) levels of 0 to 1, 2, 3, and 4. The SIS Participation domain was also able to discriminate across the MRS levels of 0 to 1, 2, and 3 to 4. These results suggest that the SIS can discriminate between patients of varying degrees of stroke severity.

Kwon et al. (2006) administered the SIS 3.0 by telephone to a sample of 95 patients at 12 weeks post-stroke. The MRS was administered to patients at hospital discharge. SIS 3.0 scores were reported by domains: SIS-16, SIS-Physical and SIS-ADL; all domains showed score discrimination and distribution for different degrees of stroke severity: MRS 0/1 vs. MRS 4/5; MRS 2 vs. MRS 4/5; and MRS 3 vs. MRS 4/5.

Sensitivity and Specificity:

Beninato, Portney & Sullivan (2009) examined sensitivity and specificity of the SIS-16 relative to a history of multiple falls in a sample of 27 patients with chronic stroke. Participants reported a history of no falls or one fall (n=18) vs. multiple falls (n=9), according to Tinetti’s definition of falls. A SIS-16 cut-off score of 61.7 yielded 78% sensitivity and 89% specificity. Area under the ROC curve was adequate (0.86). Likelihood ratios were used to calculate post-test probability of a history of falls, and results showed high positive (LR+ = 7.0) and low negative (LR- = 0.25) likelihood ratios. Results indicate that the SIS-16 demonstrated good overall accuracy in detecting individuals with a history of multiple falls.
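The relationship between sensitivity, specificity and likelihood ratios reported above can be illustrated with a minimal sketch. The function names are hypothetical, and the pre-test probability of 9/27 is an assumption made only for illustration (the sample proportion with multiple falls); it is not a value reported by Beninato et al. (2009). Because the sketch starts from the rounded sensitivity and specificity, its results may differ slightly from the published ratios.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Rounded values reported for the SIS-16 cut-off score of 61.7
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.78, specificity=0.89)

# Assumption for illustration only: pre-test probability = sample proportion (9 of 27)
pre_test = 9 / 27
prob_if_positive = post_test_probability(pre_test, lr_pos)
prob_if_negative = post_test_probability(pre_test, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability, positive result: {prob_if_positive:.2f}")
print(f"Post-test probability, negative result: {prob_if_negative:.2f}")
```

A larger LR+ raises the post-test probability of a history of multiple falls after a positive result, while an LR- closer to zero lowers it after a negative result.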

Responsiveness

Duncan et al. (1999) examined responsiveness of the SIS version 2.0. Significant change was observed in patients’ recovery in the expected direction between assessments at 1 and 3 months, and at 1 and 6 months post-stroke; however, sensitivity to change was affected by stroke severity and time of post-stroke assessment. All domains of the SIS showed statistically significant change from 1 to 3 months and 1 to 6 months post-stroke, but this was not observed between 3 and 6 months post-stroke for the domains of Hand function, Mobility, ADL/IADL, combined physical, and Participation among patients recovering from minor stroke. For patients with moderate stroke, statistically significant change was observed at both 1 to 3 months and 1 to 6 months post-stroke in all domains, and from 3 to 6 months for the domains of Mobility, ADL/IADL, combined physical, and Participation.

Lin et al. (2010a) examined responsiveness of the SIS version 3.0 in a sample of 74 patients with chronic stroke. Participants were randomly assigned to receive constraint-induced movement therapy (CIMT), bilateral arm training (BAT) or conventional rehabilitation over a 3-week intervention period. Responsiveness was measured according to change from pre- to post-treatment, using Wilcoxon signed rank test and Standardised Response Mean (SRM). Most SIS domains showed small responsiveness (SRM = 0.22-0.33, Wilcoxon Z = 1.78-2.72). Medium responsiveness was seen for Hand Function (SRM = 0.52, Wilcoxon Z = 4.24, P<0.05), Stroke Recovery (SRM = 0.57, Wilcoxon Z = 4.56, P<0.05) and SIS total score (SRM = 0.50, Wilcoxon Z = 3.89, P<0.05).
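As an illustration of how an SRM is obtained, the sketch below divides the mean change score by the standard deviation of the change scores. The function names and the pre/post scores are hypothetical and are not data from Lin et al. (2010a). The small/medium/large cut-offs (0.2, 0.5, 0.8) follow Cohen's commonly used convention and are an assumption here, since the review does not state the thresholds that were applied.

```python
from statistics import mean, stdev


def standardized_response_mean(pre_scores, post_scores):
    """SRM = mean of change scores / standard deviation of change scores."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre_scores and post_scores must be paired")
    changes = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return mean(changes) / stdev(changes)


def interpret_srm(srm):
    """Label the magnitude of an SRM using Cohen's conventional cut-offs (assumed)."""
    magnitude = abs(srm)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "trivial"


# Hypothetical domain scores (0-100) for five patients, before and after treatment
pre = [40, 55, 60, 35, 50]
post = [50, 55, 75, 40, 45]

srm = standardized_response_mean(pre, post)
print(f"SRM = {srm:.2f} ({interpret_srm(srm)})")
```

Under these assumed cut-offs, the SRM values reported above (0.22-0.33, 0.50-0.57) fall in the small and medium ranges respectively.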

Lin et al. (2010b) evaluated the clinically important difference (CID) within four physical domains of the SIS 3.0 (strength, ADL/IADL, mobility, hand function) in a sample of 74 patients with chronic stroke. Participants were randomly assigned to receive CIMT, BAT or conventional rehabilitation over a 3-week intervention period. The following change scores were found to indicate a true and reliable improvement (minimal detectable change, MDC): Strength subscale = 24.0; ADL/IADL subscale = 17.3; Mobility subscale = 15.1; and Hand Function subscale = 25.9. The following mean change scores were considered to represent a CID: Strength subscale = 9.2; ADL/IADL subscale = 5.9; Mobility subscale = 4.5; and Hand Function subscale = 17.8. CID values were determined by the effect-size index and from comparison with a global rating of change (defined by a score of 10-15% in patients’ perceived overall recovery from pre- to post-treatment).
Note: Lin et al. (2010b) note that CID estimates may have been influenced by the age of participants and baseline degree of severity. Younger patients needed greater change scores from pre- to post-treatment to have a clinically important improvement compared to older patients. Those with higher baseline severity of symptoms showed greater MDC values and therefore must show more change from pre- to post-treatment in order to demonstrate significant improvements. Also, the results may be limited to stroke patients who demonstrate improvement after rehabilitation therapies, Brunnstrom stage III and sufficient cognitive ability. Therefore, a larger sample size is recommended for future validation of these findings.

Ward et al. (2011) examined responsiveness of the SIS-16 and other clinical measures (STREAM, FIM) in a sample of 30 patients with acute stroke. Change scores were evaluated using Wilcoxon signed rank test and responsiveness to change was assessed using standardized response means (SRM). Measures were taken on admission to and discharge from an acute rehabilitation setting (average length of stay 23.3 days, range 7-53 days). SIS-16 change scores indicated statistically significant improvement from admission to discharge (23.1, p<0.0001) and sensitivity to change was large (SRM=1.65).

Guidetti et al. (2014) examined responsiveness of the SIS 3.0 in a sample of 204 patients with stroke who were assessed at 3 and 12 months post-stroke, using Wilcoxon’s matched pairs test. Clinically meaningful change within a domain was defined as a change of 10-15 points between timepoints. The Participation and Recovery domains were the most responsive domains over the first year post-stroke, with 27.5% and 29.4% of participants (respectively) reporting a clinically meaningful positive change, and 20% and 10.3% of participants (respectively) reporting a clinically meaningful negative change, from 3 to 12 months post-stroke. The Strength and Hand function domains also showed high clinically meaningful positive change (23%, 18.0% respectively) and negative change (14.7%, 14.2% respectively) from 3 to 12 months post-stroke. There were significant changes in scores on the Strength (p=0.045), Emotion (p=0.001) and Recovery (p<0.001) domains from 3 to 12 months post-stroke. The Strength, Hand function and Participation domains had the highest perceived impact (i.e. lowest mean scores) at 3 months and 12 months.

References

Beninato, M., Portney, L.G., & Sullivan, P.E. (2009). Using the International Classification of Functioning, Disability and Health as a framework to examine the association between falls and clinical assessment tools in people with stroke. Physical Therapy, 89(8), 816-25.
Brandao, A.D., Teixeira, N.B., Brandao, M.C., Vidotto, M.C., Jardim, J.R., & Gazzotti, M.R. (2018). Translation and cultural adaptation of the Stroke Impact Scale 2.0 (SIS): a quality-of-life scale for stroke. Sao Paulo Medical Journal, 136(2), 144-9. doi: 10.1590/1516-3180.2017.0114281017
Brott, T.G., Adams, H.P., Olinger, C.P., Marler, J.R., Barsan, W.G., Biller, J., Spilker, J., Holleran, R., Eberle, R., Hertzberg, V., Rorick, M., Moomaw, C.J., & Walker, M. (1989). Measurements of acute cerebral infarction: A clinical examination scale. Stroke, 20, 864-70.
Cael, S., Decavel, P., Binquet, C., Benaim, C., Puyraveau, M., Chotard, M., Moulin, T., Parrette, B., Bejot, Y., & Mercier, M. (2015). Stroke Impact Scale version 2: validation of the French version. Physical Therapy, 95(5), 778-90.
Carod-Artal, F.J., Coral, L.F., Trizotto, D.S., & Moreira, C.M. (2008). The Stroke Impact Scale 3.0: evaluation of acceptability, reliability, and validity of the Brazilian version. Stroke, 39, 2477-84.
Choi, S.U., Lee, H.S., Shin, J.H., Ho, S.H., Koo, M.J., Park, K.H., Yoon, J.A., Kim, D.M., Oh, J.E., Yu, S.H., & Kim, D.A. (2017). Stroke Impact Scale 3.0: reliability and validity evaluation of the Korean version. Annals of Rehabilitation Medicine, 41(3), 387-93.
Collin, C. & Wade, D. (1990). Assessing motor impairment after stroke: a pilot reliability study. Journal of Neurology, Neurosurgery, and Psychiatry, 53, 576-9.
Duncan, P. W., Bode, R. K., Lai, S. M., & Perera, S. (2003b). Rasch analysis of a new stroke-specific outcome scale: The Stroke Impact Scale. Archives of Physical Medicine and Rehabilitation, 84, 950-63.
Duncan, P. W., Lai, S. M., Tyler, D., Perera, S., Reker, D. M., & Studenski, S. (2002a). Evaluation of Proxy Responses to the Stroke Impact Scale. Stroke, 33, 2593-9.
Duncan, P.W., Reker, D.M., Horner, R.D., Samsa, G.P., Hoenig, H., LaClair, B.J., & Dudley, T.K. (2002b). Performance of a mail-administered version of a stroke-specific outcome measure: The Stroke Impact Scale. Clinical Rehabilitation, 16(5), 493-505.
Duncan, P.W., Wallace, D., Lai, S.M., Johnson, D., Embretson, S., & Laster, L.J. (1999). The Stroke Impact Scale version 2.0: Evaluation of reliability, validity, and sensitivity to change. Stroke, 30, 2131-40.
Duncan, P.W., Wallace, D., Studenski, S., Lai, S.M., & Johnson, D. (2001). Conceptualization of a new stroke-specific outcome measure: The Stroke Impact Scale. Topics in Stroke Rehabilitation, 8(2), 19-33.
Duncan, P.W., Lai, S.M., Bode, R.K., Perera, S., DeRosa, J.T., & GAIN Americas Investigators. (2003a). Stroke Impact Scale-16: A brief assessment of physical function. Neurology, 60, 291-6.
Edwards, B. & O’Connell, B. (2003). Internal consistency and validity of the Stroke Impact Scale 2.0 (SIS 2.0) and SIS-16 in an Australian sample. Quality of Life Research, 12, 1127-35.
Finch, E., Brooks, D., Stratford, P.W., & Mayo, N.E. (2002). Physical Rehabilitations Outcome Measures. A Guide to Enhanced Clinical Decision-Making (2nd ed.), Canadian Physiotherapy Association, Toronto.
Folstein, M.F., Folstein, S.E., & McHugh, P.R. (1975). “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189-98.
Fugl-Meyer, A.R., Jaasko, L., Leyman, I., Olsson, S., & Steglind, S. (1975). The post-stroke hemiplegic patient: a method for evaluation of physical performance. Scandinavian Journal of Rehabilitation Medicine, 7, 13-31.
Fulk, G.D., Reynolds, C., Mondal, S., & Deutsch, J.E. (2010). Predicting home and community walking activity in people with stroke. Archives of Physical Medicine and Rehabilitation, 91, 1582-6.
Geyh, S., Cieza, A., & Stucki, G. (2009). Evaluation of the German translation of the Stroke Impact Scale using Rasch analysis. The Clinical Neuropsychologist, 23(6), 978-95.
Goncalves, R.S., Gil, J.N., Cavalheiro, L.M., Costa, R.D., & Ferreira, P.L. (2012). Reliability and validity of the Portuguese version of the Stroke Impact Scale 2.0 (SIS 2.0). Quality of Life Research, 21(4), 691-6.
Guidetti, S., Ytterberg, C., Ekstam, L., Johansson, U., & Eriksson, G. (2014). Changes in the impact of stroke between 3 and 12 months post-stroke, assessed with the Stroke Impact Scale. Journal of Rehabilitation Medicine, 46, 963-8.
Hamilton, B.B., Granger, C.V., & Sherwin, F.S. (1987). A uniform national data system for medical rehabilitation. In: Fuhrer, M. J., ed. Rehabilitation Outcome: Analysis and Measurement. Baltimore, Md: Paul Brookes, 137-47.
Hamza, A.M., Nabilla, A.S., & Loh, S.Y. (2012). Evaluation of quality of life among stroke survivors: linguistic validation of the Stroke Impact Scale (SIS) 3.0 in Hausa language. Journal of Nigeria Soc Physiotherapy, 20, 52-9.
Hamza, A.M., Nabilla, A.-S., Yim, L.S., & Chinna, K. (2014). Reliability and validity of the Nigerian (Hausa) version of the Stroke Impact Scale (SIS) 3.0 index. BioMed Research International, 14, Article ID 302097, 7 pages. doi: 10.1155/2014/302097
Hogue, C., Studenski, S., Duncan, P.W. (1990). Assessing mobility: The first steps in preventing fall. In: Funk, SG., Tornquist, EM., Champagne, M.T., Copp, L.A., & Wiese, R.A., eds. Key Aspects of Recovery. New York, NY: Springer, 275-81.
Hsieh, F.-H., Lee, J.-D., Chang, T.-C., Yang, S.-T., Huang, C.-H., & Wu, C.-Y. (2016). Prediction of quality of life after stroke rehabilitation. Neuropsychiatry, 6(6), 369-75.
Huang, Y-h., Wu, C-y., Hsieh, Y-w., & Lin, K-c. (2010). Predictors of change in quality of life after distributed constraint-induced therapy in patients with chronic stroke. Neurorehabilitation and Neural Repair, 24(6), 559-66. doi: 10.1177/1545968309358074
Jenkinson, C., Fitzpatrick, R., Crocker, H., & Peters, M. (2013). The Stroke Impact Scale: validation in a UK setting and development of a SIS short form and SIS index. Stroke, 44, 2532-5.
Kamwesiga, J.T., von Koch, L., Kottorp, A., & Guidetti, S. (2009). Cultural adaptation and validation of Stroke Impact Scale 3.0 version in Uganda: a small-scale study. SAGE Open Medicine, 4: 2050312116671859. doi: 10.1177/2050312116671859
Kwon, S., Duncan, P., Studenski, S., Perera, S., Lai, S.M., & Reker, D. (2006). Measuring stroke impact with SIS: Construct validity of SIS telephone administration. Quality of Life Research, 15, 367-76.
Lai, S.M., Perera, S., Duncan, P.W., & Bode, R. (2003). Physical and Social Functioning After Stroke: Comparison of the Stroke Impact Scale and Short Form-36. Stroke, 34, 488-93.
Lawton, M. & Brody, E. (1969). Assessment of older people: self-maintaining and instrumental activities of daily living. Gerontologist, 9, 179-86.
Lee, H.-J. & Song, J.-M. (2015). The Korean language version of Stroke Impact Scale 3.0: cross-cultural adaptation and translation. Journal of the Korean Society of Physical Medicine, 10(3), 47-55.
Lin, K.C., Fu, T., Wu, C.Y., Hsieh, Y.W., Chen, C.L., & Lee, P.C. (2010a). Psychometric comparisons of the Stroke Impact Scale 3.0 and Stroke-Specific Quality of Life Scale. Quality of Life Research, 19(3), 435-43. doi: 10.1007/s11136-010-9597-5.
Lin K.-C., Fu T., Wu C.Y., Wang Y.-H., Wang Y-.H., Liu J.-S., Hsieh C.-J., & Lin S.-F. (2010b). Minimal detectable change and clinically important difference of the Stroke Impact Scale in stroke patients. Neurorehabilitation and Neural Repair, 24, 486-92.
MacIsaac, R., Ali, M., Peters, M., English, C., Rodgers, H., Jenkinson, C., Lees, K.R., Quinn, T.J., & VISTA Collaboration. (2016). Derivation and validation of a modified short form of the Stroke Impact Scale. Journal of the American Heart Association, 5:e003108. doi: 10.1161/JAHA.115.003108
Mahoney, F.I. & Barthel, D.W. (1965). Functional evaluation: The Barthel Index. Maryland State Medical Journal, 14, 61-5.
Mulder, M. & Nijland, R. (2016). Stroke Impact Scale. Journal of Physiotherapy, 62, 117.
Ochi, M., Ohashi, H., Hachisuka, K., & Saeki, S. (2017). The reliability and validity of the Japanese version of the Stroke Impact Scale version 3.0. Journal of UOEH, 39(3), 215-21. doi: 10.7888/juoeh.39.215
Richardson, M., Campbell, N., Allen, L., Meyer, M., & Teasell, R. (2016). The stroke impact scale: performance as a quality of life measure in a community-based stroke rehabilitation setting. Disability and Rehabilitation, 38(14), 1425-30. doi: 10.3109/09638288.2015.1102337
Sullivan, J. (2014). Measurement characteristics and clinical utility of the Stroke Impact Scale. Archives of Physical Medicine and Rehabilitation, 95, 1799-1800.
Vellone, E., Savini, S., Barbato, N., Carovillano, G., Caramia, M., & Alvaro, R. (2010). Quality of life in stroke survivors: first results from the reliability and validity of the Italian version of the Stroke Impact Scale 3.0. Annali di Igiene, 22, 469-79.
Vellone, E., Savini, S., Fida, R., Dickson, V.V., Melkus, G.D., Carod-Artal, F.J., Rocco, G., & Alvaro, R. (2015). Psychometric evaluation of the Stroke Impact Scale 3.0. Journal of Cardiovascular Nursing, 30(3), 229-41. doi: 10.1097/JCN.0000000000000145
Ward, I., Pivko, S., Brooks, G., & Parkin, K. (2011). Validity of the Stroke Rehabilitation Assessment of Movement Scale in acute rehabilitation: a comparison with the Functional Independence Measure and Stroke Impact Scale-16. Physical Medicine and Rehabilitation, 3(11), 1013-21. doi: 10.1016/j.pmrj.2011.08.537
Ware, J.E. Jr., & Sherbourne, C.D. (1992). The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Medical Care, 30, 473-83.
Yesavage, J.A., Brink, T., Rose, T.L., Lum, O., Huang, V., Adey, M., & Leirer, V.O. (1983). Development and validation of a geriatric depression screening scale: A preliminary report. Journal of Psychiatric Research, 17, 37-49.

See the measure

How to obtain the SIS?

Please click here to see a copy of the SIS.

This instrument was developed by:

Pamela Duncan, PhD, PT
Dennis Wallace, PhD
Sue Min Lai, PhD, MS, MBA
Stephanie Studenski, MD, MPH
Dallas Johnson, PhD, and
Susan Embretson, PhD.

In order to gain permission to use the SIS and its translations, please contact MAPI Research Trust: contact@mapi-trust.org

Upper Extremity Function Test (UEFT)

Evidence Reviewed as of before: 19-04-2013

Author(s)*: Katie Marvin, MSc. PT

Editor(s): Annabel McDermott; Nicol Korner-Bitensky, PhD OT

Purpose

In-Depth Review

Purpose of the measure

The Upper Extremity Function Test (UEFT) is an evaluative measure to assess upper extremity functional impairment and the severity of impairment in patients exhibiting dysfunction in the upper extremity. The test assesses function based on the assumption that complex upper extremity movements used in everyday activities are made up of certain movement patterns (e.g. supination/pronation, grasp/release, pinch grip, etc.), so that evaluation of these movement patterns can predict the patient’s ability to perform functional activities. The UEFT was designed primarily to quantify the patient’s ability to execute upper extremity activities of a general nature, and does not take into consideration factors such as skill, speed, range of motion, endurance, sensation etc. The selected list of test items is believed to represent the upper limb movements that are necessary to perform many of the major activities of daily living. The UEFT has not yet been correlated to vocational activities of the upper extremity.

Available versions

The Action Research Arm Test (ARAT) was developed by Ronald Lyle in 1981 by adapting the Upper Extremity Function Test (UEFT) (Carroll, 1965). The UEFT test administration and scoring were simplified, the time required to administer the test was shortened, and items were grouped based on the hierarchical scale (Guttman Scale) (Lang, Wagner, Dromerick, & Edwards, 2006). Due to the need for more specific and detailed instructions related to the client’s position, scoring and test administration, Yozbatiran, Der-Yeghiaian, and Cramer (2008) proposed a standardized approach to the ARAT.

Please visit our Action Research Arm Test module for further information.

Features of the measure

Items:

The UEFT consists of 33 items or tasks, detailed below.

Description of tasks:

The patient is positioned comfortably in a chair in front of the table used for testing. The patient is evaluated while performing different tasks, such as moving objects to a shelf, placing objects over a peg, writing their name, etc. The objects are of varying shapes and weights in order to evaluate the patient’s grasp, grip, pinch, placing, arm extension and elevation, pronation and supination, and functional strength.

Please note that the patient is not permitted to move from the chair during testing (unless a break is required), although weight transfer and rolling from side to side of the buttock is permitted. Each arm is tested individually. Demonstration of tasks is permitted (Carroll, 1965).

Scoring and Score Interpretation:

The UEFT uses a simple scoring method where results can be compared at different time intervals.

Scoring:

3	Performs test normally.
2	Completes test, but takes abnormally long time or has great difficulty.
1	Performs test partially. This grade is assigned when the patient is able to pick up or lift the test item from the table but is unable to place the object in its correct end position. For example, in items 27 to 29, the patient is able to lift the pitcher or glass but is unable to pour the water into the proper receptacle.
0	Can perform no part of the test. If the patient pushes objects out of their slots or around on the table a grade of 0 is assigned.

The total score is tallied. The maximum score for the dominant hand is 99 as compared to a maximum score of 96 for the non-dominant hand, because item 33 consists of writing of the patient’s name with the dominant hand.

The authors of the test concluded that a score increase or decrease of 10 points represents a meaningful gain or loss of important function, respectively.
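The tallying and change rules described above can be sketched in a few lines of Python. This is only an illustration, not part of the published test: the function and constant names are hypothetical, and treating a difference of 10 points or more (rather than exactly 10) as meaningful is an assumption.

```python
from typing import Sequence

# Values taken from the scoring description above.
VALID_GRADES = (0, 1, 2, 3)
ITEMS_DOMINANT = 33       # all items, including item 33 (writing the name)
ITEMS_NON_DOMINANT = 32   # item 33 is not scored for the non-dominant hand

# Assumption: a difference of 10 points or more counts as meaningful.
MEANINGFUL_CHANGE_POINTS = 10


def uef_total(grades: Sequence[int], dominant: bool) -> int:
    """Sum the item grades for one hand after basic validation."""
    expected = ITEMS_DOMINANT if dominant else ITEMS_NON_DOMINANT
    if len(grades) != expected:
        raise ValueError(f"expected {expected} item grades, got {len(grades)}")
    if any(g not in VALID_GRADES for g in grades):
        raise ValueError("each item grade must be 0, 1, 2 or 3")
    return sum(grades)


def is_meaningful_change(before: int, after: int) -> bool:
    """Return True when two total scores differ by at least 10 points."""
    return abs(after - before) >= MEANINGFUL_CHANGE_POINTS
```

With every item graded 3, the sketch reproduces the maximum scores stated above: 33 x 3 = 99 for the dominant hand and 32 x 3 = 96 for the non-dominant hand.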

Nearly equal scoring points have been allotted for the two functions ‘prehension’ (grasp, grip and pinch) and ‘placing’ (shoulder stability; shoulder abduction and flexion/extension; elbow flexion/extension; wrist flexion/extension and pronation/supination); as such, both functions need to be intact in order for a high score to be awarded.

Score interpretation:

0 to 25:	Trace function
26-50:	Very poor
51-75:	Poor
76-89:	Partial function
90-98:	Functional
99 (dominant hand) / 96 (non-dominant hand):	Maximal function
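As a minimal sketch, the interpretation bands above can be written as a lookup in Python. The function name is hypothetical, and one choice here is an assumption rather than something the source states: for the non-dominant hand a total of 96 is reported as "Maximal function" (following the last row of the table) even though 96 also lies inside the 90-98 band.

```python
def interpret_uef_total(score: int, dominant: bool) -> str:
    """Map a total UEFT score for one hand to its interpretation band."""
    max_score = 99 if dominant else 96
    if not 0 <= score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    # Assumption: the hand-specific maximum takes precedence over the bands.
    if score == max_score:
        return "Maximal function"
    if score <= 25:
        return "Trace function"
    if score <= 50:
        return "Very poor"
    if score <= 75:
        return "Poor"
    if score <= 89:
        return "Partial function"
    return "Functional"
```

For example, a dominant-hand total of 96 falls in the "Functional" band, whereas the same total for the non-dominant hand is that hand's maximum.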

Functional Implications of UEFT:

Basmajian et al. (1982) investigated the functional implications of UEFT scores and found the following scores to be indicative of the following patient capabilities:

0: no function
10: holding a book for reading
20: driving
30: carrying objects from place to place
40: dressing
50: feeding
60: shaving/make-up
70: hand crafts
80: fine crafts (needlework, gardening, carpentry)
90: card playing
100: letter writing/typing

Adapted from Basmajian, Gowland, Brandstater, Swanson & Trotter (1982).

Time:

The UEFT takes approximately 1 hour to administer (Lyle, 1981).

Training requirements:

None typically reported, however it is recommended that the clinician is familiar with the assessment tool.

Subscales:

None typically reported.

Equipment:

17.5 in. width x 28.5 in. length x 30.75 in. height table
3.75 in. width shelf mounted 14.75 in. from the table
Wooden cubes: 4 x 4 x 4 in. (576g); 3 x 3 x 3 in. (243g); 2 x 2 x 2 in. (72g); 1 x 1 x 1 in. (9g)
Large iron pipe: 1.625 in. O.D. x 6.125 in. (500g)
Small iron pipe: 0.87 in. O.D. x 4.125 in. (125g)
Slate: 4.125 x 1 x 0.375 in. (61g)
Wooden ball: 3 in. O.D. (100g)
Glass marble: 0.625 in. O.D. (6.3g)
Metal spheres: 0.44 in. O.D. (6.6g); 0.25 in. O.D. (1.0g); 0.16 in. O.D. (0.34g)
Steel washer: 0.16 in. thick x 1.375 in. O.D. x 0.56 in. I.D. (14.5g)
Iron 6 lb approximately
2 Plastic tumblers 8 fl. oz
Aluminum water pitcher 3 qt capacity
Pencil

*O.D. = outside diameter; I.D. = inside diameter

Please refer to Carroll (1965) for further information regarding administration set-up of the UEFT.

Alternative form of the UEFT

None typically reported.

Client suitability

Can be used with:

Clients with stroke.

Should not be used with:

When administering the UEFT to clients with upper extremity amputations, the total score should be adjusted according to the following scale.

Total UEFT Scores for people with amputations:

Wrist:	0
Three fingers:	41
Middle finger:	87
Index finger and 2nd metacarpal:	84
Thumb and metacarpal-phalangeal joint:	91
Index finger at proximal interphalangeal joint:	93

Languages of the measure?

There are no official translations of the UEFT.

Summary

What does the tool measure?	The UEFT measures specific changes in upper extremity impairment and function
What types of clients can the tool be used for?	The UEFT can be used with, but is not limited to clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	The UEFT takes approximately 1 hour to administer.
Versions	The Action Research Arm Test (ARAT) was developed by Ronald Lyle in 1981 by adapting the Upper Extremity Function Test (Carroll, 1965).
Other Languages	There are no official translations.
Measurement Properties
Reliability	Test-retest: One study investigated the test-retest reliability of the UEFT and found strong test-retest reliability in a sample of patients with chronic upper extremity impairment resulting from conditions including stroke. Inter-rater: One study investigated inter-rater reliability of the UEFT and found strong inter-rater reliability.
Validity	Predictive: One study examined the predictive validity of the UEFT and found admission UEFT scores to be predictive of discharge UEFT scores.
Floor/Ceiling Effects	No studies have examined the floor/ceiling effects of the UEFT.
Does the tool detect change in patients?	No studies have formally examined the responsiveness of the UEFT.
Acceptability	The UEFT is simple to administer and can be easily administered in a variety of settings (e.g. home or medical office settings).
Feasibility	The administration of the UEFT and the ARAT is quick and simple, but requires standardized equipment.
How to obtain the tool?	Please refer to the initial validation study by Carroll (1965) for further information on the UEFT.

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the Upper Extremity Function Test. Limited information is available on the UEFT. However, the Action Research Arm Test, developed in 1981 as an adaptation of the UEFT, is a more reliable, valid and responsive measure currently used for clients with stroke.

Floor/Ceiling Effects

No studies have examined the floor/ceiling effects of the UEFT.

Reliability

Test-retest:
Carroll (1965) examined test-retest reliability of the UEFT in a sample of 23 patients with chronic stable upper extremity impairment due to varying causes (including stroke) and 7 patients with typical upper extremity function. The UEFT was administered two times, 30 days apart. Scores for individuals with typical upper extremity function were identical on the two different testing days. Of scores attained for patients with chronic stable upper extremity impairment, 1 case was identical, 5 cases showed a 1-point difference, 7 cases showed a 3-point difference, 2 cases showed a 5-point difference, and 3 cases showed a difference of 6, 7 and 8 points. The results of this initial validation study suggest that the UEFT has strong test-retest reliability.

Intra-rater:
No studies have examined the intra-rater reliability of the UEFT.

Inter-rater:
Carroll (1965) investigated inter-rater reliability of the UEFT among clinicians who were either experienced or not experienced with the UEFT. Two raters with experience using the UEFT rated the upper extremities of 48 individuals with stroke. The two examiners rated 46% of the patients identically, 21% within 1 point, 8% within 2 points, 10% within 3 points, 8% within 4 points and 6% of patients within 5 points. Subsequently, three examiners without experience using the UEFT were educated on the grading system and were then asked to rate the performance of 15 patients with stroke. The inexperienced raters scored within 7 points of the experienced raters 97% of the time. The results of this study indicate that the UEFT has strong inter-rater reliability.

Validity

Content:

Criterion:

Predictive:
Barreca, Finlayson, Gowland & Basmajian (1999) examined the predictive validity of the UEFT and the Halstead Category Test in 16 patients with stroke. Admission UEFT and Halstead Category Test scores were found to be predictive of discharge UEFT scores (approximately 5 weeks later), even in patients with severe upper extremity disability following stroke.

Construct:

Responsiveness

Popovic, Popovic, Sinkjaer, Stefanovic & Schwirtlich (2003) investigated the effects of Functional Electrical Stimulation on upper extremity function in patients with stroke. The UEFT was used as an outcome measure and was able to detect change in upper extremity function in patients with stroke.

References

Barreca, S., Finlayson, A., Gowland, C. & Basmajian, J. (1999). Use of the Halstead Category Test as a predictor of functional recovery in the hemiplegic upper limb: A cross-validation study. The Clinical Neuropsychologist, 13(2), 171-178.
Basmajian, C., Gowland, M., Brandstater, L., Swanson, L. & Trotter, J. (1982). EMG feedback treatment of upper limb in hemiplegic stroke patients: A pilot study. Archives of Physical Medicine and Rehabilitation, 63, 614.
Carroll, D. (1965). A quantitative test of upper extremity function. Journal of Chronic Diseases, 18, 479-491.
Lang, C.E., Wagner, J.M., Dromerick, A.W., & Edwards, D.F. (2006). Measurement of upper extremity function early after stroke: properties of the action research arm test. Archives of Physical Medicine and Rehabilitation, 87, 1605-1610.
Lyle, R.C. (1981). A performance test for assessment of upper limb function in physical rehabilitation treatment and research. International Journal of Rehabilitation Research, 4(4), 483-492.
Okkema, K.A. (1998). Functional evaluation of upper extremity use following stroke: A literature review. Topics in Stroke Rehabilitation, 4(4), 54-75.
Popovic, M.B., Popovic, D.B., Sinkjaer, T., Stefanovic, A. & Schwirtlich, L. (2003). Clinical evaluation of Functional Electrical Therapy in acute hemiplegic subjects. Journal of Rehabilitation Research and Development, 40(5), 443-454.

See the measure

Further information on the UEFT can be found in the following publication:

Carroll, D. (1965). A quantitative test of upper extremity function. Journal of Chronic Diseases, 18, 479-491.

Wolf Motor Function Test (WMFT)

Evidence Reviewed as of before: 11-01-2011

Author(s)*: Sabrina Figueiredo, BSc

Editor(s): Nicol Korner-Bitensky, PhD OT

Content consistency: Gabriel Plumier

Purpose

The Wolf Motor Function Test (WMFT) quantifies upper extremity (UE) motor ability through timed and functional tasks (Wolf, Catlin, Ellis, Archer, Morgan & Piacentino, 1995).

In-Depth Review

Purpose of the measure

The Wolf Motor Function Test (WMFT) quantifies upper extremity (UE) motor ability through timed and functional tasks (Wolf, Catlin, Ellis, Archer, Morgan & Piacentino, 1995).

Available versions

The original version of the WMFT was developed by Wolf, Lecraw, Barton, and Jann in 1989 to examine the effects of constraint-induced movement therapy in clients with mild to moderate stroke and traumatic brain injury. In 1999, a graded WMFT was developed by Uswatte and Taub to assess the motor abilities of patients who were functioning at a lower level (Morris, Uswatte, Crago, Cook & Taub, 2001).

Features of the measure

Items:

The original version of the WMFT consisted of 21 items. The widely used version of the WMFT consists of 17 items. The first 6 items involve timed functional tasks, items 7 and 14 are measures of strength, and the remaining 9 items consist of analyzing movement quality when completing various tasks (Wolf et al., 1995; Whitall, Savin, Harris-Love, & Waller, 2006).

The examiner should test the less affected upper extremity first, followed by the more affected side. The following items should be performed as quickly as possible, truncated at 120 seconds (Wolf, Thompson, Morris, Rose, Winstein, Taub, et al., 2005):

Forearm to table (side): client attempts to place forearm on a table by abducting at the shoulder
Forearm to box (side): client attempts to place forearm on a box, 25.4cm tall, by abduction at the shoulder
Extended elbow (side): client attempts to reach across a table, 28cm long, by extending the elbow (to the side)
Extended elbow (to the side) with 1lb weight: client attempts to push the weight against outer wrist joint across the table by extending the elbow
Hand to table (front): client attempts to place involved hand on a table
Hand to box (front): client attempts to place hand on the box placed on the tabletop
Weight to box: client attempts to place the heaviest possible weight on the box placed on the tabletop
Reach and retrieve (front): client attempts to pull 1lb weight across the table by using elbow flexion and cupped wrist
Lift can (front): client attempts to lift a can and bring it close to his/her lips with a cylindrical grasp
Lift pencil (front): client attempts to pick up a pencil by using 3-jaw chuck grasp.
Pick-up paper clip (front): client attempts to pick up a paper clip by using a pincer grasp
Stack checkers (front): client attempts to stack checkers onto the center checker
Flip 3 cards (front): using the pincer grasp, client attempts to flip each card over
Grip strength
Turning the key in lock (front): using pincer grasp, while maintaining contact, client turns key 180 degrees to the left and right
Fold towel (front): client grasps towel, folds it lengthwise, and then uses the tested hand to fold the towel in half again
Lift basket (standing): client picks up a 3lb basket from a chair, by grasping the handles, and placing it on a bedside table

Scoring:

The items are rated on a 6-point scale as outlined below (Wolf et al., 2005):

0. “Does not attempt with UE being tested”
1. “UE being tested does not participate functionally; however, an attempt is made to use the UE. In unilateral tasks, the UE not being tested may be used to move the UE being tested”.
2. “Does attempt, but requires assistance of the UE not being tested for minor readjustments or change of position, or requires more than 2 attempts to complete, or accomplishes very slowly. In bilateral tasks, the UE being tested may serve only as a helper”.
3. “Does attempt, but movement is influenced to some degree by synergy or is performed slowly or with effort”.
4. “Does attempt; movement is similar to the non-affected side but slightly slower; may lack precision, fine coordination or fluidity”.
5. “Does attempt, movement appears to be normal”.

Lower scores are indicative of lower functioning levels.

Time:

Not reported, but since a maximum of 120 seconds is allocated to each item, it should take approximately 30 minutes with additional time for measuring grip strength (item 14).

Subscales:

None officially documented. However, many studies use the Performance Time (WMFT-PT) and Functional Capacity (WMFT-FA) scales as subtests of the WMFT.

Equipment:

Table 28 cm long (height not reported)
Chair (dimensions not reported)
Bedside table (dimensions not reported)
Box (25.4 cm tall)
Free-weights
Can
Pencil
Paperclip
Checkers
Cards
Key lock with the key
Towel
Basket
Dynamometer for measuring hand grip strength

Training:

Not reported.

Alternative form of the WMFT

The original version (21 items)
The modified version (17 items): The modified version is most widely used and allows assessment of clients with severe, moderate and mild stroke.

Client suitability

Can be used with:

Clients with stroke
Clients with upper limb functional deficits

Should not be used with:

Clients with severe upper limb spasticity, and clients with upper limb amputation

In what languages is the measure available?

French and English.

Summary

Wolf Motor Function Test (WMFT) Evaluation Summary

What does the tool measure?	The WMFT quantifies upper extremity motor ability through timed and functional tasks.
What types of clients can the tool be used for?	The WMFT can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	The WMFT takes approximately 30 minutes to administer.
Versions	The original version (21 items), and the modified version (17 items)
Other Languages	English
Measurement Properties
Reliability	Internal consistency: Two studies examined the internal consistency of the WMFT and reported excellent internal consistency using Cronbach’s alpha. Test-retest: Two studies examined the test-retest reliability of the WMFT and reported excellent reliability using Pearson and intraclass correlation coefficients (ICC). Inter-rater: Four studies examined the inter-rater reliability of the WMFT and reported excellent reliability using the ICC.
Validity	Content: No studies have reported the content validity of the WMFT. Criterion: Concurrent: Two studies examined the concurrent validity of the WMFT and reported moderate to excellent correlations with the Fugl-Meyer Assessment as the gold standard measure; one study examined the concurrent validity of the WMFT and reported excellent correlations with the Action Research Arm Test. Construct: Known groups: One study examined the known groups validity of the WMFT using the Wilcoxon test and reported that the WMFT is able to discriminate between healthy individuals and those with upper extremity impairments.
Floor/Ceiling Effects	No studies have examined floor/ceiling effects of the WMFT in clients with stroke.
Does the tool detect change in patients?	No studies have examined the responsiveness of the WMFT in clients with stroke.
Acceptability	The WMFT is widely used as an outcome measure for constraint-induced movement therapy.
Feasibility	The administration of the WMFT is quick and simple.
How to obtain the tool?	The WMFT can be found at: Wolf, S., Thompson, P., Morris, D., Rose, D., Winstein, C., Taub, E., Giuliani, C., & Pearson, S. (2005). The EXCITE Trial: Attributes of the Wolf Motor Function Test in patients with subacute stroke. Neurorehabil Neural Repair, 19, 194-205.

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Wolf Motor Function Test (WMFT) in individuals with stroke. We identified 4 studies.

Floor/Ceiling Effects

Nijland et al. (2010) investigated the psychometric properties of the WMFT and the Action Research Arm Test in 40 patients with stroke with mild to moderate hemiparesis. The WMFT showed adequate floor and ceiling effects, with only 5 to 17% of patients scoring the lowest or highest score.
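Floor and ceiling effects are usually summarized as the percentage of respondents at the lowest and highest possible scores. The sketch below shows that calculation in Python; the function name and the sample scores are hypothetical and are not taken from Nijland et al. (2010).

```python
def floor_ceiling_percentages(scores, min_score, max_score):
    """Return (floor %, ceiling %): the share of scores at the scale minimum and maximum."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_pct = 100 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == max_score) / n
    return floor_pct, ceiling_pct


# Hypothetical scores on a 0-5 scale for 20 participants (not real data).
sample_scores = [0, 0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 3, 2, 1, 4, 3]
print(floor_ceiling_percentages(sample_scores, min_score=0, max_score=5))
```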

Reliability

Nijland et al. (2010) investigated the internal consistency of the WMFT in 40 patients with stroke with mild to moderate hemiparesis. Internal consistency of the WMFT, as calculated using Cronbach’s coefficient alpha, was excellent (α = 0.98).
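For readers unfamiliar with the statistic, Cronbach’s alpha for k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores). The Python sketch below implements that standard formula with sample variances; the function name and the example data are hypothetical and do not reproduce the study’s analysis.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one per participant, k item scores each)."""
    if len(item_scores) < 2:
        raise ValueError("need at least two participants")
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("each participant needs the same number (>= 2) of item scores")
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each participant's total score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 4 participants x 3 items (not real data).
example_items = [[3, 4, 3], [1, 2, 1], [5, 5, 4], [2, 2, 2]]
print(round(cronbach_alpha(example_items), 2))
```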

Test-retest:
Morris et al. (2001) analyzed the test-retest reliability of the WMFT in 24 clients with stroke. Participants were re-assessed within a 2-week interval. Test-retest reliability, as calculated using the Pearson correlation coefficient, was excellent for both functional ability and performance time scores (r = 0.95 and 0.90, respectively).
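A test-retest Pearson coefficient correlates each participant’s score at the first session with the score at the second. The sketch below computes it from first principles in Python; the function name and the session scores are hypothetical, not data from Morris et al. (2001).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the lists has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for 5 participants at two sessions (not real data).
session_1 = [2.1, 3.4, 4.0, 1.5, 3.0]
session_2 = [2.3, 3.2, 4.1, 1.4, 3.3]
print(round(pearson_r(session_1, session_2), 2))
```

Because Pearson's r measures linear association only, a consistent shift between sessions (for example, a practice effect) would not lower it.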

Whitall, Savin, Harris-Love, and Waller (2006) examined the test-retest reliability of the WMFT in 66 clients with stroke. Participants were re-assessed within a 2-week interval by the same rater and under the same conditions. Test-retest reliability, as calculated using the Intraclass Correlation Coefficient (ICC), was found to be excellent (ICC = 0.97).
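The ICC can be computed in several forms, and the source does not say which one each study used. As an illustration only, the sketch below implements the one-way random-effects, single-measurement form, often written ICC(1,1): (MSB - MSW) / (MSB + (k - 1) MSW), where MSB and MSW are the between-participant and within-participant mean squares. The function name and the data are hypothetical.

```python
def icc_one_way(ratings):
    """One-way random-effects ICC for single measurements, ICC(1,1).

    ratings: list of rows, one per participant, each holding k scores
    (for example, k = 2 test sessions or k raters).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two participants")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("each participant needs the same number (>= 2) of scores")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-participant and within-participant mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("scores have no variance")
    return (ms_between - ms_within) / denominator


# Hypothetical scores for 4 participants at two sessions (not real data).
example_sessions = [[10, 11], [20, 19], [30, 31], [40, 40]]
print(round(icc_one_way(example_sessions), 3))
```

Unlike Pearson's r, this coefficient drops when scores shift systematically between sessions or raters, which is one reason reliability studies often report it.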

Inter-rater:
Morris et al. (2001) evaluated the inter-rater reliability of the WMFT in 24 clients with stroke. Evaluations were conducted by a physiotherapist and were videotaped. The recordings were then rated by two physiotherapists and one occupational therapist. Inter-rater reliability, as calculated using the ICC, was excellent for both functional ability and performance time scores (ICC = 0.93 and 0.99, respectively).

Wolf et al. (2001) verified the inter-rater reliability of the WMFT in 19 clients with stroke and in 19 healthy individuals. All participants were evaluated independently by 2 raters. Inter-rater reliability, as calculated using the ICC, was excellent (ICC = 0.97).

Whitall et al. (2006) estimated the inter-rater reliability of the WMFT in 10 clients with stroke. The assessment of functional ability was videotaped and rated by three different raters. Inter-rater reliability was excellent (ICC = 0.99).

Intra-rater:
Nijland et al. (2010) investigated the psychometric properties of the WMFT and the Action Research Arm Test in 40 patients with stroke with mild to moderate hemiparesis. Eighteen patients participated in the reproducibility testing of the WMFT and were assessed twice by the same observer approximately 10 days apart. Intra-rater reliability, as analyzed using the ICC, was found to be excellent (ICC = 0.94).

Validity

Content:
No studies have reported the content validity of the WMFT.

Criterion:
Concurrent:
Wolf et al. (2001) examined the concurrent validity of the WMFT by comparing it to the Upper Extremity Fugl-Meyer Assessment (UE-FMA; Fugl-Meyer, Jääskö, Leyman, Olsson, & Steglind, 1975) as the gold standard in 19 clients with stroke. Adequate correlations were found between the WMFT and the UE-FMA (r = -0.57).

Whitall et al. (2006) assessed the concurrent validity of the WMFT by comparing it to the UE-FMA as the gold standard in 66 clients with stroke. Correlations between the functional ability test of the WMFT and the UE-FMA were excellent (r = -0.88).

Nijland et al. (2010) investigated the concurrent validity of the WMFT by comparing it to the Action Research Arm Test (ARAT; Lyle, 1981) in 40 patients with stroke with mild to moderate hemiparesis. For the purpose of their investigation, the WMFT score was split into 4 variables: Functional Ability Score (FAS), median time score (s), item 7 and item 14 (strength). Correlations were calculated between the ARAT total score and the four variables. Excellent correlations were found between the ARAT total score and the WMFT FAS (r = 0.86), median time score (r = -0.89) and strength tasks (items 7 and 14) (r = 0.70).

Predictive:
No studies have reported the predictive validity of the WMFT.

Construct:
Known groups:
Wolf et al. (2001) evaluated whether the WMFT was able to distinguish individuals with impairment secondary to stroke (n=19) from those without impairment (n=19). Known groups validity, as calculated using the Wilcoxon test, showed that WMFT scores for the dominant and non-dominant hands of individuals without impairment were significantly higher than those for the most and least affected upper extremities of clients with stroke.

Responsiveness

No studies have reported the responsiveness of the WMFT.

References

Barreca, S.R., Gowland, C.K., Stratford, P.W., et al. (2004). Development of the Chedoke Arm and Hand Activity Inventory: Theoretical constructs, item generation, and selection. Topics in Stroke Rehabilitation, 11(4), 31-42.
Fugl-Meyer, A.R., Jääskö, L., Leyman, I., Olsson, S., & Steglind, S. (1975). The post-stroke hemiplegic patient 1. A method for evaluation of physical performance. Scandinavian Journal of Rehabilitation Medicine, 7, 13-31.
Lyle, R.C. (1981). A performance test for assessment of upper limb function in physical rehabilitation treatment and research. International Journal of Rehabilitation and Research, 4, 483-492.
Morris, D., Uswatte, G., Crago, J., Cook, E., & Taub, E. (2001). The reliability of the Wolf Motor Function Test for assessing upper extremity function after stroke. Arch Phys Med Rehabil, 82, 750-755.
Nijland, R., van Wegen, E., Verbunt, J., van Wijk, R., van Kordelaar, J., & Kwakkel, G. (2010). A comparison of two validated tests for upper limb function after stroke: The Wolf Motor Function Test and the Action Research Arm Test. Journal of Rehabilitation Medicine, 42, 694-696.
Whitall, J., Savin, D., Harris-Love, M., & Waller, S. (2006). Psychometric properties of a modified Wolf Motor Function Test for people with mild and moderate upper extremity hemiparesis. Arch Phys Med Rehabil, 87, 656-660.
Wolf, S., Catlin, P., Ellis, M., Archer, A., Morgan, B., & Piacentino, A. (2001). Assessing Wolf Motor Function Test as outcome measure for research in patients after stroke. Stroke, 32, 1635-1639.
Wolf, S., Thompson, P., Morris, D., Rose, D., Winstein, C., Taub, E., Giuliani, C., & Pearson, S. (2005). The EXCITE Trial: Attributes of the Wolf Motor Function Test in patients with subacute stroke. Neurorehabil Neural Repair, 19, 194-205.

See the measure

The WMFT can be obtained from the following publication:

Wolf, S., Thompson, P., Morris, D., Rose, D., Winstein, C., Taub, E., Giuliani, C., & Pearson, S. (2005). The EXCITE Trial: Attributes of the Wolf Motor Function Test in patients with subacute stroke. Neurorehabil Neural Repair, 19, 194-205.