Purpose
The MESUPES (Motor Evaluation Scale for Upper Extremity in Stroke Patients) measures quality of movement performance of the hemiparetic arm and hand in stroke patients. The original version of the scale was developed by Perfetti & Dal Pezzo; the final version was developed by Ann Van de Winckel, PhD, MSc, PT, and originally published in Van de Winckel et al. (2006).
In-Depth Review
Purpose of the measure
The MESUPES measures quality of movement performance of the hemiparetic arm and hand in stroke patients.
Available versions
The original version of the MESUPES comprised 22 items within three categories of arm function (10 items), hand function (9 items) and functional tasks (3 items).
The final version of the measure, analyzed with Principal Component Analysis and Rasch analysis, resulted in a 17-item version with two categories: arm function (8 items) and hand function ("range of motion", 6 items; "orientation during functional tasks", 3 items) (Van de Winckel et al., 2006).
Features of the measure
Items:
The original MESUPES comprises 22 items in three subscales:
- Arm function: 10 items
- Hand function: 9 items
- Functional tasks: 3 items
The final version of the MESUPES comprises 17 items in two subscales:
- MESUPES–Arm function: 8 items with 6 response categories (0-5)
- MESUPES–Hand function: 9 items with 3 response categories (0-2).
During the MESUPES–Arm subset, patients are required to perform specific movements of the upper limb in three consecutive phases:
- The task is performed passively
- The therapist assists the patient during the movement
- The patient performs the task by him/herself.
During the MESUPES–Hand subsets, patients are instructed to perform specific movements of the hand and fingers by themselves.
Scoring:
As the MESUPES adopts an ordinal scale, Rasch analysis has been performed to translate ordinal data into interval measures (logit scores) (Van de Winckel et al., 2006).
Online scoring will soon be available to enable users to input the ordinal scores and retrieve logit scores immediately (personal correspondence, Van de Winckel, 2015).
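The conversion from ordinal raw scores to interval-level logits relies on the Rasch model. As a conceptual illustration only (the MESUPES items are polytomous, so the published analysis uses an extension of this model, and a real conversion requires the calibrated item parameters), here is a minimal Python sketch of the dichotomous Rasch item response function that defines the logit metric:

```python
import math

def rasch_probability(person_ability: float, item_difficulty: float) -> float:
    """Dichotomous Rasch model: probability of succeeding on an item.

    Both arguments are on the logit scale; the difference
    (ability - difficulty) is the log-odds of success. This is a
    conceptual sketch, not the MESUPES scoring algorithm.
    """
    return 1.0 / (1.0 + math.exp(-(person_ability - item_difficulty)))

# Example: a person 1 logit above an item's difficulty succeeds ~73% of the time.
print(round(rasch_probability(person_ability=0.5, item_difficulty=-0.5), 2))  # 0.73
```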
Subset 1: Arm function
The MESUPES–Arm subset evaluates ‘normal’ movement of the hemiparetic limb, which can be judged by comparison with movement of the patient’s unaffected arm. Only qualitatively ‘normal’ movements of the arm are scored.
The tasks are performed in three phases. The number of phases evaluated depends on the patient's ability to perform the movement correctly.
Testing phase 1: The therapist moves the patient's arm and hand and evaluates muscle tone first.
- No adequate adaptation of tone to movement: 0 points
- Adequate adaptation of tone (normal tone) to at least part of the movement: 1 point
Testing phase 2: If the patient exhibits normal tone, the patient participates in the movement and the therapist evaluates muscle contractions.
- The patient demonstrates functionally and qualitatively correct muscle contraction in at least part of the movement: 2 points
Testing phase 3: If the patient exhibits normal muscle contraction, the patient performs the movement independently and the therapist assesses range of movement. A score is given for the range of motion that the patient can perform with good quality of motion.
- Part of the movement is performed normally: 3 points
- Total range of normal movement is done slowly or with great effort: 4 points
- The patient demonstrates normal movement performance: 5 points
The patient is allowed a maximum of three attempts per test item and is awarded the highest score achieved. See the measure for more scoring information.
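To make the phase-based decision rules above concrete, here is a minimal Python sketch (hypothetical function and argument names, not the official scoring algorithm) of how one arm item maps onto the 0-5 scale:

```python
def score_arm_item(adequate_tone: bool,
                   correct_contraction: bool,
                   active_range: str) -> int:
    """Illustrative 0-5 score for one MESUPES-Arm item.

    active_range is one of: "none", "partial", "full_with_effort", "normal".
    Names and structure are hypothetical; see the official instructions
    (Van de Winckel et al., 2006, Appendix 2) for the authoritative rules.
    """
    # Phase 1: the therapist moves the arm and judges adaptation of tone.
    if not adequate_tone:
        return 0
    score = 1  # adequate adaptation of tone to at least part of the movement

    # Phase 2: the patient participates; the therapist judges muscle contraction.
    if not correct_contraction:
        return score
    score = 2

    # Phase 3: the patient moves independently; the therapist judges range and quality.
    if active_range == "partial":
        score = 3
    elif active_range == "full_with_effort":
        score = 4
    elif active_range == "normal":
        score = 5
    return score
```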
Subset 2: Hand function (Range of Motion)
Performance of movement and measurement of range of motion are not compared with the unaffected hand for this subset. Only qualitatively normal movements of the hand and fingers are scored.
The patient performs the instructed movement actively and the therapist assesses the range of movement (0-2 cm) qualitatively and quantitatively (0-2 points):
- No movement: 0 points
- Movement amplitude < 2 cm: 1 point
- Movement amplitude ≥ 2 cm: 2 points
Subset 3: Hand function (Orientation during functional tasks)
Quality of movement is not compared with the unaffected hand for this subset.
The patient manipulates materials as instructed and the therapist assesses whether the patient is able to orient the wrist and fingers to the object throughout the movement in a normal way (0-2 points):
- No movement, or movement with abnormal orientation of fingers and wrist towards the object: 0 points
- Movement with normal orientation of fingers or wrist towards the object: 1 point
- Whole movement correct: 2 points
The maximum achievable score is 58 (MESUPES-Arm maximum score of 40; MESUPES-Hand maximum score of 18). The patient is awarded one score for each task, and the highest score is retained. A score of 0 is awarded when the patient demonstrates inadequate tone, abnormal muscle contractions, or synergic (flexor/extensor) or mass movement patterns (Appendix 2, Instructions, Van de Winckel et al., 2006).
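As an illustration of how the subscale maxima described above combine into the total score, here is a minimal Python sketch (hypothetical function name) that checks item scores against their ranges and sums them:

```python
def mesupes_total(arm_items: list[int], hand_items: list[int]) -> dict:
    """Illustrative total-score computation for the 17-item MESUPES.

    Assumes 8 arm items scored 0-5 (max 40) and 9 hand items scored 0-2
    (max 18), giving a maximum total of 58, as described above.
    """
    if len(arm_items) != 8 or len(hand_items) != 9:
        raise ValueError("Expected 8 arm items and 9 hand items")
    if not all(0 <= s <= 5 for s in arm_items):
        raise ValueError("Arm items must be scored 0-5")
    if not all(0 <= s <= 2 for s in hand_items):
        raise ValueError("Hand items must be scored 0-2")

    arm_score = sum(arm_items)    # out of 40
    hand_score = sum(hand_items)  # out of 18
    return {"arm": arm_score, "hand": hand_score, "total": arm_score + hand_score}  # out of 58
```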
What to consider before beginning:
The first four items are performed in supine; all other items are performed in a sitting position with hips and knees at 90 degrees and elbows resting on the table. The patient can be provided support to maintain a sitting position if required. The patient cannot be assessed (and therefore cannot be awarded points) if he/she is not able to sit in an upright position for a task. The therapist can reposition the patient's upper extremity before beginning each new task, and should wait until tone has normalized before starting a new task. If the patient is not able to achieve a relaxed starting position, he/she is awarded a score of 0 for the item.
The patient must be given clear instructions using the following steps:
- The therapist explains the task verbally and demonstrates the movement
- The patient is asked to perform the task with the non-affected side first to ensure he/she understands the demands of the task.
Time:
It takes approximately 10 minutes to administer the evaluation (about 5 minutes for patients with very poor or very good motor performance, up to about 15 minutes for patients with more severe hypertonia).
Training requirements:
Instructions are given in Appendix 2 (Van de Winckel et al., 2006) and are available online. These instructions should suffice for trained clinicians (physical therapists, occupational therapists, etc.).
For the original evaluation, seven raters were trained for an hour to familiarize them with the assessment protocol (Van de Winckel et al., 2006). In Johansson & Hager’s study (2012), raters underwent a 2h training session.
An instructional video will soon be made available online. In the meantime, the developer of the MESUPES (Prof. Ann Van de Winckel, avandewi@umn.edu) can be contacted to address questions concerning the use of the MESUPES.
Equipment:
- Plinth or mat
- Desk and chair, positioned so that the patient is sitting with hip and knees in 90 degrees flexion
- Wooden or plastic block marked with 1cm and 2cm to measure range of movement during hand tasks
- One larger plastic bottle (cylinder, diameter 6 cm, like a 20 fl oz / 591 ml soda or water bottle)
- One smaller plastic bottle (cylinder, diameter 2.5cm, height 8cm, like a round correction fluid bottle, as shown in the figure)
- Dice (1.5 x 1.5 cm)
Client suitability
Differential item functioning analysis was performed with Rasch analysis to test the stability of the item hierarchy (from easy to difficult items) across several variables.
There is no differential item functioning across subgroups of gender, age (<60 / ≥60 years), time since stroke (<3 months / ≥3 months), country of residence, side of lesion, and type of stroke (hemorrhagic, ischemic) (Van de Winckel et al., 2006), meaning that the hierarchy of items (from easy to difficult) is maintained across all subgroups of stroke patients defined by these variables.
Can be used with:
- Individuals with stroke
Should not be used with:
- The measure is intended for use with adult patients with stroke; there is insufficient evidence regarding psychometric properties of the tool with other populations, including a pediatric population.
In what languages is the measure available?
- Catalan (available online, Van de Winckel A, 2015)
- Dutch (Flemish) (available online, Van de Winckel, A., 2015)
- English (available online, Van de Winckel et al., 2006)
- French (available online, Van de Winckel A, 2015)
- German (available online, Van Bellingen, T., Van de Winckel, A., et al. 2009. Chapter 1: Assessment in Neurorehabilitation. In Neurology (2nd ed.), 192-201. Huber.)
- Italian (available online, Van de Winckel A, 2015; original version by Perfetti & Dal Pezzo)
- Portuguese (available online, Van de Winckel A, 2015)
- Spanish (available online, Van de Winckel A, 2015)
- Swedish (available online, Johansson & Hager, 2012)
Summary
What does the tool measure?
The MESUPES measures quality of movement performance of the hemiparetic arm and hand in patients with stroke.
What types of clients can the tool be used for?
The MESUPES was developed for use with adults with stroke.
Is this a screening or assessment tool?
Assessment tool
Time to administer
10 minutes (range 5-15 minutes)
ICF Domain
- Body function/structure
- Activity
Versions
Final version (Van de Winckel et al., 2006) = 17 items (total score /58; MESUPES-Arm score /40; MESUPES-Hand score /18)
Languages
Available online on StrokEngine: Catalan, Dutch (Flemish), English, French, German, Italian, Portuguese, Spanish, Swedish
Measurement Properties
Reliability
Internal consistency: One study has reported on the internal consistency of the MESUPES using Principal Component Analysis and Rasch analysis. Results showed high person separation indices and unidimensionality within subtests.
Test-retest: Two studies have reported on the test-retest reliability of the MESUPES in patients with subacute to chronic stroke and reported good to very good agreement over 24-48 hours.
Intra-rater: No studies have reported on the intra-rater reliability of the MESUPES.
Inter-rater: Two studies have reported on the inter-rater reliability of the MESUPES in patients with subacute to chronic stroke and reported good to very good agreement between raters for subtests; moderate to very high item reliability; and sufficient absolute reliability of the total score.
Validity
Content: One study investigated validity of the 17-item MESUPES and reported unidimensionality of the arm and hand scales.
Criterion:
- Concurrent: One study examined concurrent validity of the MESUPES and reported high correlations with the Modified Motor Assessment Scale (MMAS).
- Predictive: No studies have reported on predictive validity of the MESUPES.
Construct:
- Convergent/Discriminant: No studies have reported on convergent/discriminant validity of the MESUPES.
- Known Groups: No studies have reported on known group validity of the MESUPES.
Floor/Ceiling Effects
No studies have reported on the floor/ceiling effects of the MESUPES.
Does the tool detect change in patients?
- No studies have reported on the sensitivity or specificity of the MESUPES.
- One study reported MDC scores of 8, 7 and 5 (95%, 90% and 80% CI, respectively).
Acceptability
Administration of the MESUPES is easy and fast. The measure is inexpensive and requires minimal standard equipment.
Feasibility
The MESUPES requires no specialized training to administer. However, the MESUPES should only be administered by clinicians with knowledge of stroke and clinical assessment of tone, muscle contraction and movement.
How to obtain the tool?
See the measure
Psychometric Properties
Overview
A literature search was conducted to identify all relevant publications on the psychometric properties of the MESUPES. Two studies, both published in English, were identified.
Floor and ceiling effect
No studies have reported on the floor or ceiling effects of the MESUPES.
Van de Winckel (personal correspondence, 2015) noted that in the study by Van de Winckel et al. (2006), in which 396 patients with low to high motor performance following stroke were assessed using the MESUPES, fewer than 5% of patients achieved a score of 0 on the arm items and fewer than 20% of participants achieved the maximum score. Approximately 42% of participants achieved a score of 0 on the hand items and fewer than 5% of patients achieved a maximum score on the hand items.
Reliability
Internal consistency:
Van de Winckel et al. (2006) examined internal consistency of the MESUPES in a sample of patients with stroke using Principal Component Analysis and Rasch analysis. Rasch analysis was used to determine the 'item-trait interaction', which shows the degree of invariance across the intended dimension, and the 'person separation index'. Internal consistency was obtained when the MESUPES was divided into the MESUPES-Arm (8 items) and MESUPES-Hand (9 items) subtests. Rasch analysis and fit statistics showed that both subtests adhered to unidimensional characteristics, whereby all items in the subtests pertain to the same construct. The person separation index was 0.99 for the MESUPES-Arm and 0.97 for the MESUPES-Hand, indicating very high internal consistency.
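For readers unfamiliar with the person separation index, the following sketch shows one common way such a reliability index is computed from Rasch person estimates and their standard errors (true variance divided by observed variance); it is a generic illustration under that assumption, not the exact computation used by Van de Winckel et al. (2006).

```python
import numpy as np

def person_separation_index(person_logits: np.ndarray, standard_errors: np.ndarray) -> float:
    """Illustrative PSI: proportion of observed person variance that is 'true' variance.

    person_logits: Rasch person ability estimates (logits)
    standard_errors: standard error of each person estimate
    Values near 1 indicate that the scale separates persons of different
    ability well.
    """
    observed_variance = np.var(person_logits, ddof=1)
    mean_squared_error = np.mean(standard_errors ** 2)
    return (observed_variance - mean_squared_error) / observed_variance
```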
Test-retest:
See inter-rater reliability below for results also pertaining to test-retest reliability.
Intra-rater:
No studies have reported on the intra-rater reliability of the MESUPES.
Inter-rater:
Van de Winckel et al. (2006) investigated inter-rater reliability of the MESUPES in a sample of 56 patients with subacute to chronic stroke. Assessments were conducted by 2 assessors over 24 hours. Inter-rater reliability, calculated using intra-class correlation coefficients (ICCs), was excellent for the arm function total score (ICC=0.95, 95% CI 0.91-0.97) and hand function total score (ICC=0.97, 95% CI 0.95-0.98). Assessment of inter-rater reliability by weighted percentage agreement and weighted kappa confirmed item reliability for the arm function subtest (weighted kappa coefficient = 0.62-0.79; weighted percentage agreement 85.71-98.21%); scores were not derived for hand function items as more than 50% of the sample scored 0.
Johansson & Hager (2012) investigated inter-rater reliability of the MESUPES in a sample of 42 patients with subacute to chronic stroke. Assessments were conducted by 2 therapists within 48 hours. Inter-rater reliability, calculated by percentage agreement and linear-weighted kappa analysis, revealed good to very good agreement between raters (kappa range 0.63-0.96). Relative and absolute reliability were measured using intra-class correlation coefficients (ICCs) and the standard error of measurement (SEM): item reliability was moderate to very high (ICC=0.63-0.96); reliability of subscores and the total score was very high (ICC=0.98, 95% CI 0.96-0.99); and the total score demonstrated sufficient absolute reliability (SEM=2.68).
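As an illustration of the agreement statistics reported above, the sketch below computes a linearly weighted kappa between two raters' ordinal item scores using scikit-learn; the rating data are hypothetical and the snippet is a generic example, not the analysis pipeline used in either study.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal scores (0-5) given by two raters to the same arm item
# across a series of patients (invented for illustration only).
rater_a = [0, 1, 2, 3, 5, 4, 2, 1, 0, 3]
rater_b = [0, 1, 3, 3, 5, 4, 2, 2, 0, 3]

# Linear weights penalize disagreements in proportion to their distance on
# the ordinal scale, as in the weighted kappa statistics reported above.
weighted_kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")
print(f"Linearly weighted kappa: {weighted_kappa:.2f}")
```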
Validity
Content:
The original version of the MESUPES developed by Perfetti & Dal Pezzo comprised 22 items across three categories of (i) arm function (10 items); (ii) hand function (9 items); and (iii) functional tasks (3 items).
Van de Winckel et al. (2006) investigated validity and unidimensionality of the MESUPES in a sample of 396 patients with subacute to chronic stroke. Principal Component Analysis (PCA) of the original 22-item version revealed two dimensions: arm function and hand function. Rasch analysis of these two separate scales identified misfit among five items (2 arm items and 3 hand items, respectively). Following removal of these items, subsequent Rasch analysis of the remaining 17 items and fit statistics confirmed unidimensionality of both arm and hand scales:
- Arm function: person fit -0.51 ± 1.19; item fit -0.65 ± 1.07; person separation index 0.99
- Hand function: person fit -0.12 ± 0.71; item fit 0.15 ± 1.21; person separation index 0.97
Test items followed an order of increasing difficulty with no reversed thresholds and no differential item functioning (DIF) according to gender, age (<60, ≥60), side of hemiparesis, time since stroke (< 3 months, ≥ 3 months), type of stroke or country (Van de Winckel et al., 2006).
Criterion:
Concurrent:
Johansson & Hager (2012) investigated concurrent validity of the MESUPES in a sample of 42 patients with subacute to chronic stroke by comparison with the Modified Motor Assessment Scale (MMAS), using Spearman's rho. Correlations were high between the MESUPES total scores and the MMAS (r=0.87); MESUPES arm items and MMAS (r=0.84); and MESUPES hand items and MMAS (r=0.80).
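A minimal sketch of how such a concurrent-validity correlation could be computed, using scipy's Spearman rank correlation on hypothetical paired MESUPES and MMAS totals (the data below are invented for illustration only):

```python
from scipy.stats import spearmanr

# Hypothetical paired totals for a handful of patients (not study data).
mesupes_totals = [12, 25, 40, 55, 8, 33, 47]
mmas_totals    = [10, 22, 35, 48, 9, 30, 44]

rho, p_value = spearmanr(mesupes_totals, mmas_totals)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```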
Predictive:
No studies have reported on the predictive validity of the MESUPES.
Construct:
Convergent/Discriminant:
No studies have reported on convergent/discriminant validity of the MESUPES.
Known Group:
No studies have reported on the known group validity of the MESUPES.
Responsiveness
Johansson & Hager (2012) assessed the minimal detectable change (MDC) of the MESUPES in a sample of 42 patients with subacute to chronic stroke. Patients were assessed at two time points 48 hours apart. The authors reported that change scores of 8, 7 and 5 points (at the 95%, 90% and 80% confidence levels, respectively) were required for certainty of true change.
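A minimal sketch of the standard MDC calculation (MDC = z × √2 × SEM): using the SEM of 2.68 reported by Johansson & Hager (2012) and rounding up to whole points, this reproduces values consistent with the 8, 7 and 5 reported above. The formula choice and rounding are assumptions for illustration, not a statement of the authors' exact method.

```python
import math

def minimal_detectable_change(sem: float, z: float) -> float:
    """MDC for a given confidence level: z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem

SEM = 2.68  # standard error of measurement reported by Johansson & Hager (2012)
for label, z in [("95%", 1.96), ("90%", 1.645), ("80%", 1.282)]:
    mdc = minimal_detectable_change(SEM, z)
    print(f"MDC at {label} confidence: {mdc:.2f} (≈ {math.ceil(mdc)} points)")
```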
Sensitivity & Specificity:
No studies have reported on the sensitivity or specificity of the MESUPES.
References
- Johansson, G.M. & Hager, C.K. (2012). Measurement properties of the Motor Evaluation Scale for Upper Extremity in Stroke Patients (MESUPES). Disability & Rehabilitation, 34(4):288-94. DOI: 10.3109/09638288.2011.606343
- Van de Winckel, A., Feys, H., van der Knaap, S., Messerli, R., Baronti, F., Lehmann, R., Van Hemelrijk, B., Pante, F., Perfetti, C., & De Weerdt, W. (2006). Can quality of movement be measured? Rasch analysis and inter-rater reliability of the Motor Evaluation Scale for Upper Extremity in Stroke Patients (MESUPES). Clinical Rehabilitation, 20, 871-84.
See the measure
How to obtain the MESUPES
Please click here for an instructional video on how to use the scale.