Motor Evaluation Scale for Upper Extremity in Stroke Patients (MESUPES)
Purpose
The MESUPES measures quality of movement performance of the hemiparetic arm and hand in patients with stroke.
In-Depth Review
Purpose of the measure
The MESUPES measures quality of movement performance of the hemiparetic arm and hand in patients with stroke.
Available versions
The original version of the MESUPES comprised 22 items within three categories of arm function (10 items), hand function (9 items) and functional tasks (3 items).
The final version of the measure, derived using Principal Component Analysis and Rasch analysis, is a 17-item version with two categories: arm function (8 items) and hand function (“range of motion”, 6 items; and “orientation during functional tasks”, 3 items) (Van de Winckel et al., 2006).
Features of the measure
Items:
The original MESUPES comprises 22 items in three subscales:
- Arm function: 10 items
- Hand function: 9 items
- Functional tasks: 3 items
The final version of the MESUPES comprises 17 items in two subscales:
- MESUPES–Arm function: 8 items with 6 response categories (0-5)
- MESUPES–Hand function: 9 items with 3 response categories (0-2).
During the MESUPES–Arm subset, patients are required to perform specific movements of the upper limb in three consecutive phases:
- The task is performed passively
- The therapist assists the patient during the movement
- The patient performs the task by him/herself.
During the MESUPES–Hand subsets, patients are instructed to perform specific movements of the hand and fingers by themselves.
Scoring:
As the MESUPES adopts an ordinal scale, Rasch analysis has been performed to translate ordinal data into interval measures (logit scores) (Van de Winckel et al., 2006).
Rasch analysis is a statistical measurement method that allows the measurement of an attribute, such as upper limb function, independently of particular tests or indices. It creates a linear representation using many individual items, ranked by item difficulty (e.g. picking up a very small item versus a task requiring a very gross grasp) and person ability. In a well-performing Rasch model, items are placed hierarchically from simple to more difficult, and individuals with high ability should be able to perform all the items below a given level of difficulty. The Rasch model is statistically strong because it enables ordinal measures to be converted into meaningful interval measures. It also allows information from various tests or tools with different scoring systems to be applied using the Rasch model.
Online scoring will soon be available to enable users to input the ordinal scores and retrieve logit scores immediately (personal correspondence, Van de Winckel, 2015).
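To make the idea of a logit score concrete, the following is a minimal Python sketch of the simplest (dichotomous) Rasch model. It is an illustration only: MESUPES items have several ordered response categories, so the published analysis would have required a polytomous form of the model, and the function names and example values below are hypothetical. It is not the MESUPES ordinal-to-logit conversion.

```python
import math


def rasch_probability(person_ability: float, item_difficulty: float) -> float:
    """Probability of passing an item under the dichotomous Rasch model.

    Both arguments are on the logit scale. The values used with this
    function here are illustrative, not MESUPES calibrations.
    """
    return 1.0 / (1.0 + math.exp(-(person_ability - item_difficulty)))


def logit(probability: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.log(probability / (1.0 - probability))
```

In this form, a person whose ability equals the item difficulty has a 50% chance of passing the item, and the log-odds of passing equal the difference between ability and difficulty, which is why logit scores behave as interval measures.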
Subset 1: Arm function
The MESUPES–Arm subset evaluates ‘normal’ movement of the hemiparetic limb, which can be judged by comparison with movement of the patient’s unaffected arm. Only qualitatively ‘normal’ movements of the arm are scored.
The tasks are performed in three phases. The number of phases evaluated depends on the patient’s ability to perform the movement correctly.
Testing phase | Points achieved |
---|---|
1. The therapist moves the patient’s arm and hand and evaluates muscle tone first. | |
No adequate adaptation of tone to movement: | 0 points |
Adequate adaptation of tone (normal tone) to at least part of the movement: | 1 point |
2. If the patient exhibits normal tone, the patient participates in the movement and the therapist evaluates muscle contractions. | |
The patient demonstrates functionally and qualitatively correct muscle contraction in at least part of the movement: | 2 points |
3. If the patient exhibits normal muscle contraction, the patient performs the movement independently and the therapist assesses range of movement. A score is given for the range of motion that the patient can perform with good quality of motion. | |
Part of the movement is performed normally: | 3 points |
Total range of normal movement is done slowly or with great effort: | 4 points |
The patient demonstrates normal movement performance: | 5 points |
The patient is allowed to repeat test items with a maximum of three attempts; the patient is awarded the highest score achieved. See the measure for more scoring information.
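The three-phase logic for a single arm item can be sketched in Python as follows. This is an informal reading of the table above, not an official scoring tool; the function names, parameter names and the labels `"none"`, `"partial"`, `"full_with_effort"` and `"normal"` are hypothetical.

```python
def score_arm_item(tone_adequate: bool,
                   contraction_correct: bool = False,
                   active_performance: str = "none") -> int:
    """Return a 0-5 score for one MESUPES-Arm item.

    tone_adequate: phase 1, tone adapts to at least part of the passive movement.
    contraction_correct: phase 2, correct muscle contraction during assisted movement.
    active_performance: phase 3, one of "none", "partial",
        "full_with_effort", "normal" (hypothetical labels).
    """
    if not tone_adequate:
        return 0
    if not contraction_correct:
        return 1
    phase_three_points = {
        "none": 2,              # correct contraction only, no normal active movement
        "partial": 3,           # part of the movement performed normally
        "full_with_effort": 4,  # full normal range, but slow or with great effort
        "normal": 5,            # normal movement performance
    }
    if active_performance not in phase_three_points:
        raise ValueError(f"unknown active_performance: {active_performance!r}")
    return phase_three_points[active_performance]


def best_of_attempts(attempt_scores: list[int]) -> int:
    """Keep the highest score from at most three attempts."""
    if not 1 <= len(attempt_scores) <= 3:
        raise ValueError("an item allows between one and three attempts")
    return max(attempt_scores)
```

For example, a patient with normal tone and correct contraction who completes only part of the movement normally would score `score_arm_item(True, True, "partial")`, that is 3 points.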
Subset 2: Hand function (Range of Motion)
Performance of movement and measurement of range of motion is not compared with the unaffected hand for this subset. Only qualitatively normal movements of the hand and fingers are scored.
Testing procedure | Points achieved |
---|---|
The patient performs the instructed movement actively and the therapist assesses range of movement between 0-2cm qualitatively and quantitatively. | 0-2 points |
no movement: | 0 points |
movement amplitude < 2 cm | 1 point |
movement amplitude ≥ 2 cm | 2 points |
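The amplitude thresholds above translate into a small Python sketch. The function and parameter names are hypothetical, and the `quality_normal` flag is an assumption that reflects the rule that only qualitatively normal movements are scored.

```python
def score_hand_rom(amplitude_cm: float, quality_normal: bool = True) -> int:
    """Return a 0-2 score for one hand range-of-motion item.

    amplitude_cm: measured movement amplitude in centimetres.
    quality_normal: False if the movement is qualitatively abnormal,
        in which case no points are awarded (assumed reading of the rule).
    """
    if amplitude_cm < 0:
        raise ValueError("amplitude cannot be negative")
    if not quality_normal or amplitude_cm == 0:
        return 0
    return 1 if amplitude_cm < 2 else 2
```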
Subset 3: Hand function (Orientation during functional tasks)
Quality of movement is not compared with the unaffected hand for this subset.
Testing procedure | Points achieved |
---|---|
The patient manipulates materials as instructed and the therapist assesses whether the patient is able to orient the wrist and fingers to the object throughout the movement in a normal way. | 0-2 points |
no movement or movement with abnormal orientation of fingers and wrist towards the object: | 0 points |
movement with normal orientation of fingers or wrist towards the object: | 1 point |
whole movement correct: | 2 points |
The maximum achievable score is 58 (MESUPES-Arm maximum score is 40; MESUPES-Hand maximum score is 18). The patient is awarded one score for each task, and the highest score is retained. A score of 0 is awarded when the patient demonstrates inadequate tone, abnormal muscle contractions, or synergic (flexor/extensor) or mass movement patterns (Appendix 2, Instructions, Van de Winckel et al., 2006).
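The subscale and total scores follow directly from the item counts and response ranges given above (8 arm items scored 0-5, 9 hand items scored 0-2). A minimal Python sketch, with hypothetical names, that sums the raw ordinal scores and checks them against those ranges:

```python
ARM_ITEM_COUNT = 8    # MESUPES-Arm items, each scored 0-5
ARM_ITEM_MAX = 5
HAND_ITEM_COUNT = 9   # MESUPES-Hand items (6 range of motion + 3 orientation), each scored 0-2
HAND_ITEM_MAX = 2


def _check_scores(scores, expected_count, item_max, label):
    """Validate the number of items and the range of each item score."""
    if len(scores) != expected_count:
        raise ValueError(f"{label}: expected {expected_count} scores, got {len(scores)}")
    for score in scores:
        if not isinstance(score, int) or not 0 <= score <= item_max:
            raise ValueError(f"{label}: score {score!r} is outside 0-{item_max}")


def mesupes_totals(arm_scores, hand_scores):
    """Return raw (ordinal) arm, hand and total scores."""
    _check_scores(arm_scores, ARM_ITEM_COUNT, ARM_ITEM_MAX, "arm")
    _check_scores(hand_scores, HAND_ITEM_COUNT, HAND_ITEM_MAX, "hand")
    arm_total = sum(arm_scores)
    hand_total = sum(hand_scores)
    return {"arm": arm_total, "hand": hand_total, "total": arm_total + hand_total}
```

These are raw ordinal totals; converting them to logit scores requires the published Rasch calibration, which this sketch does not attempt.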
What to consider before beginning:
The first four items are performed in supine; all other items are performed in a sitting position with hips and knees at 90 degrees and elbows resting on the table. The patient can be provided with support to maintain a sitting position if required. The patient cannot be assessed on a task (and therefore cannot be awarded points for it) if he/she is not able to sit in an upright position. The therapist can reposition the patient’s upper extremity before beginning each new task, and should wait until the tone is normalized before starting a new task. If the patient is not able to achieve a relaxed starting position, he/she is awarded a score of 0 for the item.
The patient must be given clear instructions using the following steps:
- The therapist explains the task verbally and demonstrates the movement
- The patient is asked to perform the task with the non-affected side first to ensure he/she understands the demands of the task.
Time:
It takes approximately 10 minutes to administer the evaluation (from about 5 min for patients with very poor or very good motor function to about 15 min for patients with more severe hypertonia).
Training requirements:
Instructions are given in Appendix 2 (Van de Winckel et al., 2006) and are available online. These instructions should suffice for trained clinicians (physical therapists, occupational therapists, etc.).
For the original evaluation, seven raters were trained for an hour to familiarize them with the assessment protocol (Van de Winckel et al., 2006). In Johansson & Hager’s study (2012), raters underwent a 2h training session.
An instructional video will soon be made available online. In the meantime, the developer of the MESUPES (Prof. Ann Van de Winckel, avandewi@umn.edu) can be contacted to address questions concerning the use of the MESUPES.
Equipment:
- Plinth or mat
- Desk and chair, positioned so that the patient is sitting with hip and knees in 90 degrees flexion
- Wooden or plastic block marked with 1cm and 2cm to measure range of movement during hand tasks
- One larger plastic bottle (cylinder, diameter 6 cm, like a 20 fl oz or 591 ml soda or water bottle)
- One smaller plastic bottle (cylinder, diameter 2.5cm, height 8cm, like a round correction fluid bottle, as shown in the figure)
- Dice (1.5 x 1.5 cm)
Client suitability
Differential item functioning was examined with Rasch analysis to test the stability of the item hierarchy (from easy to difficult items) across several variables.
There is no differential item functioning across subgroups of gender, age (<60 / ≥60 years), time since stroke
Can be used with:
- Individuals with stroke. Stroke, also called a “brain attack”, happens when brain cells die because of inadequate blood flow. 20% of cases are a hemorrhage in the brain caused by a rupture or leakage from a blood vessel. 80% of cases are also known as an “ischemic stroke”, or the formation of a blood clot in a vessel supplying blood to the brain.
Should not be used with:
- The measure is intended for use with adult patients with stroke; there is insufficient evidence regarding psychometric properties of the tool with other populations, including a pediatric population.
In what languages is the measure available?
- Catalan (available online, Van de Winckel A, 2015)
- Dutch (Flemish) (available online, Van de Winckel, A., 2015)
- English (available online, Van de Winckel et al., 2006)
- French (available online, Van de Winckel A, 2015)
- German (available online, Van Bellingen, T., Van de Winckel, A., et al. 2009. Chapter 1: Assessment in Neurorehabilitation. In Neurology (2nd ed.) (192-201). Huber.)
- Italian – (available online, Van de Winckel A, 2015) (Perfetti & Dal Pezzo, original version)
- Portuguese (available online, Van de Winckel A, 2015)
- Spanish (available online, Van de Winckel A, 2015)
- Swedish (available online, Johansson & Hager, 2012)
Summary
What does the tool measure? | The MESUPES measures quality of movement performance of the hemiparetic arm and hand in patients with stroke |
What types of clients can the tool be used for? | The MESUPES was developed for use with adults with stroke |
Is this a screening or assessment tool? | Assessment tool |
Time to administer | 10 minutes (range 5-15min) |
ICF Domain | • Body function/structure • Activity |
Versions | Final version (Van de Winckel et al., 2006) = 17 items (total score /58; MESUPES-arm score /40; MESUPES-hand score /18) |
Languages | Available online on StrokEngine: Catalan, Dutch (Flemish), English, French, German, Italian, Portuguese, Spanish, Swedish |
Measurement Properties | |
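Reliability | Internal consistency: One study has reported on the internal consistency of the MESUPES. Results showed high person separation indices and unidimensionality within subtests. Test-retest: See inter-rater reliability. Intra-rater: No studies have reported on the intra-rater reliability of the MESUPES. Inter-rater: Two studies have reported on the inter-rater reliability of the MESUPES, with ICCs of 0.95 to 0.98 for subscale and total scores. |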
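Validity | Content: One study investigated validity of the 17-item MESUPES and reported unidimensionality of the arm and hand scales. Criterion: Concurrent: One study reported high correlations between the MESUPES and the Modified Motor Assessment Scale (r=0.80-0.87). Predictive: No studies have reported on the predictive validity of the MESUPES. Construct: Convergent/Discriminant: No studies have reported on convergent/discriminant validity of the MESUPES. Known Groups: No studies have reported on the known group validity of the MESUPES. |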
Floor/Ceiling Effects | No studies have reported on the floor/ceiling effects of the MESUPES. |
Does the tool detect change in patients? | • No studies have reported on the sensitivity or specificity of the MESUPES. • One study reported MDC scores of 8, 7 and 5 (95%, 90% and 80% CI, respectively). |
Acceptability | Administration of the MESUPES is easy and fast. The measure is inexpensive and requires minimal standard equipment. |
Feasibility | The MESUPES requires no specialized training to administer. However, the MESUPES should only be administered by clinicians with knowledge of stroke |
How to obtain the tool? | See the measure |
Psychometric Properties
Overview
A literature search was conducted to identify all relevant publications on the psychometric properties of the MESUPES. Two English-language studies were identified.
Floor and ceiling effect
No studies have reported on the floor or ceiling effects of the MESUPES.
Van de Winckel (personal correspondence, 2015) noted that in the study by Van de Winckel et al. (2006) in which 396 patients with low to high motor performance following stroke
Reliability
Internal consistency:
Internal consistency is a method of measuring reliability. It reflects the extent to which items of a test measure various aspects of the same characteristic and nothing else. Internal consistency coefficients can take on values from 0 to 1. Higher values represent higher levels of internal consistency.
Van de Winckel et al. (2006) examined internal consistency of the MESUPES in a sample of patients with stroke using Principal Component Analysis and Rasch analysis. Rasch analysis was used to determine ‘item-trait interaction’, which shows the degree of invariance across the intended dimension, and the ‘person separation index’. Internal consistency was obtained when the MESUPES was divided into the MESUPES-Arm (8 items) and MESUPES-Hand (9 items) subtests. Rasch analysis and fit statistics showed that both subtests adhered to unidimensional characteristics, whereby all items in a subtest pertain to the same construct. The person separation index was 0.99 for the MESUPES-Arm and 0.97 for the MESUPES-Hand, indicating very high internal consistency.
Test-retest:
See inter-rater reliability below for results also pertaining to test-retest reliability. Test-retest reliability is a way of estimating the reliability of a scale in which individuals are administered the same scale on two different occasions and then the two scores are assessed for consistency. This method of evaluating reliability is appropriate only if the phenomenon that the scale measures is known to be stable over the interval between assessments. If the phenomenon being measured fluctuates substantially over time, then the test-retest paradigm may significantly underestimate reliability. In using test-retest reliability, the investigator needs to take into account the possibility of practice effects, which can artificially inflate the estimate of reliability (National Multiple Sclerosis Society).
Intra-rater:
No studies have reported on the intra-rater reliability of the MESUPES. Intra-rater reliability is a type of reliability assessment in which the same assessment is completed by the same rater on two or more occasions. These different ratings are then compared, generally by means of correlation. Since the same individual is completing both assessments, the rater’s subsequent ratings are contaminated by knowledge of earlier ratings.
Inter-rater:
Van de Winckel et al. (2006) investigated inter-rater reliability of the MESUPES in a sample of 56 patients with subacute to chronic stroke. Inter-rater reliability, calculated using intra-class correlation coefficients (ICCs), was excellent for the arm function total score (ICC=0.95, 95% CI 0.91-0.97) and hand function total score (ICC=0.97, 95% CI 0.95-0.98). Assessment of inter-rater reliability by weighted percentage agreement and weighted kappa confirmed item reliability for the arm function subtest (weighted kappa coefficient = 0.62-0.79; weighted percentage agreement 85.71-98.21%); scores were not derived for hand function items as more than 50% of the sample scored 0.
Johansson & Hager (2012) investigated inter-rater reliability of the MESUPES in a sample of 42 patients with subacute to chronic stroke. Agreement between raters, assessed by percentage agreement and linear-weighted kappa analysis, was good to very good (kappa range 0.63-0.96). Relative and absolute reliability were measured using intra-class correlation coefficients (ICCs) and the standard error of measurement (SEM): item reliability was moderate to very high (ICC=0.63-0.96); reliability of subscores and the total score was very high (ICC=0.98, 95% CI 0.96-0.99); and the total score demonstrated sufficient absolute reliability (SEM=2.68).
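For readers unfamiliar with linear-weighted kappa, the following Python sketch shows how the statistic is conventionally computed for two raters scoring the same ordinal item. It is a generic textbook formulation with hypothetical names, not the software or exact procedure used in either study, and the example ratings in the usage note are invented for illustration.

```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, categories):
    """Linear-weighted Cohen's kappa for two raters on an ordinal scale.

    categories: ordered list of all possible scores, e.g. [0, 1, 2, 3, 4, 5].
    Disagreements are penalised in proportion to their distance on the scale.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same non-empty set of patients")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    position = {category: i for i, category in enumerate(categories)}
    joint = Counter(zip(rater_a, rater_b))   # observed pairs of ratings
    margin_a = Counter(rater_a)              # rater A's score distribution
    margin_b = Counter(rater_b)              # rater B's score distribution

    observed = 0.0
    expected = 0.0
    for a in categories:
        for b in categories:
            weight = 1.0 - abs(position[a] - position[b]) / (k - 1)
            observed += weight * joint[(a, b)] / n
            expected += weight * (margin_a[a] / n) * (margin_b[b] / n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

With the invented ratings `[0, 0, 1, 1]` and `[0, 1, 1, 1]` on a 0-2 scale, the weighted observed agreement is 0.875 and the weighted chance agreement is 0.75, giving a kappa of 0.5.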
Validity
Content:
The original version of the MESUPES developed by Perfetti & Dal Pezzo comprised 22 items across three categories of (i) arm function (10 items); (ii) hand function (9 items); and (iii) functional tasks (3 items).
Van de Winckel et al. (2006) investigated validity (the degree to which an assessment measures what it is supposed to measure) and unidimensionality of the MESUPES in a sample of 396 patients with subacute to chronic stroke. Principal Component Analysis (PCA) of the original 22-item version revealed two dimensions: arm function and hand function. Rasch analysis of these two separate scales identified misfit among five items (2 arm items and 3 hand items, respectively). Following removal of these items, subsequent Rasch analysis of the remaining 17 items and fit statistics confirmed unidimensionality of both arm and hand scales:
Subtest | Person fit | Item fit | Person separation index |
---|---|---|---|
Arm function | -0.51±1.19 | -0.65±1.07 | 0.99 |
Hand function | -0.12±0.71 | 0.15±1.21 | 0.97 |
Test items followed an order of increasing difficulty with no reversed thresholds and no differential item functioning (DIF) according to gender, age (<60, ≥60), side of hemiparesis, time since stroke
Criterion:
Concurrent:
Johansson & Hager (2012) investigated concurrent validity of the MESUPES in a sample of 42 patients with subacute to chronic stroke by comparison with the Modified Motor Assessment Scale (MMAS), using Spearman’s rho. Correlations were high between the MESUPES total scores and the MMAS (r=0.87); MESUPES arm items and MMAS (r=0.84); and MESUPES hand items and MMAS (r=0.80).
To establish concurrent validity, the results of a new measure are compared to the results of the gold standard obtained at approximately the same point in time (concurrently), so they both reflect the same construct. This approach is useful in situations when a new or untested tool is potentially more efficient, easier to administer, more practical, or safer than another more established method and is being proposed as an alternative instrument.
Predictive:
No studies have reported on the predictive validity of the MESUPES.
Construct:
Convergent/Discriminant:
No studies have reported on convergent/discriminant validity of the MESUPES.
Known Group:
No studies have reported on the known group validity of the MESUPES.
Responsiveness
Johansson & Hager (2012) assessed the minimal detectable change (MDC) of the MESUPES with a sample of 42 patients with subacute to chronic stroke. MDC refers to the minimal amount of change outside of error that reflects true change by a patient between two time points (rather than a variation in measurement). Patients were assessed at two time points 48 hours apart. The authors reported that change scores of 8, 7 and 5 (95%, 90% and 80% confidence intervals, respectively) were required for certainty of true change.
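The reported MDC values can be related to the standard error of measurement reported for the total score (SEM=2.68). The Python sketch below assumes the conventional formula MDC = z × √2 × SEM and rounds up to whole points; this page does not state how the authors derived their values, so both the formula and the rounding are assumptions, and the names are hypothetical.

```python
import math

# Two-sided standard normal critical values for each confidence level.
Z_VALUES = {95: 1.96, 90: 1.645, 80: 1.282}


def minimal_detectable_change(sem: float, confidence: int) -> float:
    """MDC from the standard error of measurement (assumed conventional formula)."""
    if confidence not in Z_VALUES:
        raise ValueError(f"unsupported confidence level: {confidence}")
    return Z_VALUES[confidence] * math.sqrt(2) * sem


def mdc_in_whole_points(sem: float, confidence: int) -> int:
    """Round up, since the scale is scored in whole points (assumed rounding)."""
    return math.ceil(minimal_detectable_change(sem, confidence))
```

Under these assumptions, an SEM of 2.68 gives unrounded values of about 7.43, 6.23 and 4.86 points, which round up to the reported 8, 7 and 5 points at the 95%, 90% and 80% levels.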
Sensitivity & Specificity:
Sensitivity refers to the probability that a diagnostic technique will detect a particular disease or condition when it does indeed exist in a patient (National Multiple Sclerosis Society). Specificity refers to the probability that a diagnostic technique will indicate a negative test result when the condition is absent (true negative).
No studies have reported on sensitivity/specificity of the MESUPES.
References
- Johansson, G.M. & Hager, C.K. (2012). Measurement properties of the Motor Evaluation Scale for Upper Extremity in Stroke Patients (MESUPES). Disability & Rehabilitation, 34(4), 288-94. DOI: 10.3109/09638288.2011.606343
- Van de Winckel, A., Feys, H., van der Knaap, S., Messerli, R., Baronti, F., Lehmann, R., Van Hemelrijk, B., Pante, F., Perfetti, C., & De Weerdt, W. (2006). Can quality of movement be measured? Rasch analysis and inter-rater reliability of the Motor Evaluation Scale for Upper Extremity in Stroke Patients (MESUPES). Clinical Rehabilitation, 20, 871-84.