ADL Profile

Evidence Reviewed as of before: 14-12-2012

Author(s)*: Valérie Poulin; Vanessa Barfod, BA

Editor(s): Annabel McDermott, OT.; Nicol Korner-Bitensky, PhD OT

Expert Reviewer: Carolina Bottari, erg. PhD

Purpose

The ADL Profile is a criterion-referenced measure of independence in everyday activities such as self-care, household management and community activities for individuals with a traumatic brain injury (TBI). The ADL Profile was created by Elisabeth Dutil, Carolina Bottari, Marie Vanier and Céline Gaudreault.

In-Depth Review

Purpose of the measure

The ADL Profile is a criterion-referenced measure of independence in everyday activities in consideration of the important contribution of executive functions for individuals with a traumatic brain injury (TBI) (Canadian Association of Occupational Therapists, 2012; C. Bottari, personal communication, November 6, 2012). The ADL Profile consists of both a performance-based assessment (evaluator’s direct observation of performance) and a questionnaire administered in the form of semi-structured interviews with the person and a significant other (perceptions of person and significant other of person’s functioning). The ADL Profile assesses an individual’s ability to formulate and plan goals for personal and instrumental activities of daily living (PADL and IADL) in interaction with the environment in which they live. Using a task-analysis framework, the individual’s independent performance of ADL tasks is quantitatively analyzed according to 4 executive function operations:

Formulating a goal
Planning
Carrying out the task
Verifying attainment of the initial goal (Bottari et al., 2010b).

The ADL Profile was originally developed for use with patients with traumatic brain injury as an assessment of independence in everyday activities within three environments:

Personal (self-care dimension);
Home (home dimension); and
Community (community dimension).

An exhaustive list of variables was derived from existing ADL tools and organized as activities, tasks or operations under the three dimensions of personal care, home and community environments, according to Crochart’s (1987) ergonomic model. An expert group of therapists and researchers was consulted to review the refined list of variables and to ensure that all domains related to the concept of ADLs were included in the instrument. A review of the literature on components of ADL assessments also provided support for the experts’ verdict (Dutil et al., 2005).

Available versions

There are no alternate versions of the ADL Profile.

The IADL Profile is a revised version of the ADL Profile (Bottari et al., 2010b) and as such is not included in this review.

Features of the measure

Items:

The ADL Profile consists of 20 PADL and IADL tasks in two parts:

A non-structured, performance-based assessment that comprises observation of 17 tasks; and
A semi-structured interview administered to the patient and his/her significant other that documents 3 tasks (indicated below).

The 20 items relate to (a) personal care; (b) household management; and (c) community activities:

Personal care items (6 tasks):

Bathing/showering
Grooming
Toileting
Putting on clothes and shoes
Having a meal
Following his or her diet/taking his or her medication*

Household management (5 tasks):

Preparing a light meal
Preparing a hot meal
Doing daily housecleaning
Doing weekly housecleaning
Doing laundry

Community activities (9 tasks):

Walking or moving outdoors
Using public transportation
Driving a car*
Running errands
Telephoning for information
Paying bills
Using an automatic banking machine
Making a budget
Keeping appointments*

* Three tasks are evaluated by semi-structured interview. Driving is not evaluated per se but certain information is gathered regarding this activity.

Description of tasks:

The client is asked what he/she would normally do at that time of the day and is then given the opportunity to perform the ADL task without assistance from the clinician. The non-structured evaluation enables the clinician to observe deficits relating to executive processes. Accordingly, the clinician informs the client at the outset of the test that he/she will interact with the client only minimally throughout the examination. This enables the clinician to observe the client’s ability to manage on his/her own. It is important that the clinician provides a minimum of structure and assistance during the test, even if the client makes an error, because observing the client’s ability to monitor and correct errors without assistance is crucial to establishing his/her independence in consideration of executive functions. The examiner may withhold cueing for up to 10 minutes, unless a situation is judged as dangerous. If the person is clearly unable to perform a component without help, the examiner may provide graded assistance (Bottari et al., 2010b).

If the client chooses a task that is not part of the ADL Profile the clinician may ask the client to consider another goal.

What to consider before beginning:

The ADL Profile is best administered in the client’s home and community environment (Bottari et al., 2010b).

Klein et al. (2008) reviewed standardized performance-based ADL measures developed for adult/geriatric populations using an action research study design with 10 occupational therapists working with adult/geriatric clients with physical dysfunction in a tertiary-care rehabilitation hospital, to identify which measures best matched principles of occupational therapy practice and intended outcomes. The ADL Profile achieved the highest rating for its ‘fit’ with the values, beliefs and principles that underpin occupational therapy practice when compared with 17 other ADL measures, including the Assessment of Motor and Process Skills, Rivermead ADL Assessment, the Functional Performance Measure, Nottingham ADL Scale, Barthel Index and Functional Independence Measure. The ADL Profile met four of five construct criteria:

Client-centred (score enables item relevancy for client);
Dynamic interaction (measure acknowledges dynamic interaction between the client, task and physical environment, but does not consider the financial or social environment);
Uniqueness of the individual (measure enables assessment of physical, affective and cognitive performance components); and
Uniqueness of performance (measure incorporates client determination of task process unless client safety is a factor).

Of the 18 tools analysed, none achieved a score for the fifth dimension, a holistic perspective (i.e. integration of the client’s roles, culture, resources, spiritual beliefs and values). While it was reported that the ADL Profile did not consider the social environment, it is important to note that the questionnaire is administered to the patient’s significant other.

Scoring and Score Interpretation:

Each task is scored according to independence in task performance (task score) and the manner in which the task is performed (operation score) with regards to the following four operations:

(i) formulate a goal
(ii) plan
(iii) carry out the task
(iv) verify attainment of the initial goal (Bottari et al., 2010b).

Tasks are scored using a four-level ordinal scale:

0	dependent
1	requires verbal assistance (1v) or physical assistance (1p) or verbal and physical assistance (1vp)
2	independent with difficulty
3	independent

Scores are not added across tasks or operations. The task score is determined by the lowest score on any of the four operations observed during performance of the task. Therefore, difficulty in any operation directly influences independence and task performance.
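
The lowest-score rule described above can be sketched in a few lines of Python. This is only an illustration and is not part of the ADL Profile materials: the function name `task_score`, the operation keys and the example ratings are assumptions made here, and the assistance qualifiers (1v, 1p, 1vp) are collapsed into a plain score of 1 for simplicity.

```python
# Illustrative sketch only: names and data layout are assumptions,
# not taken from the ADL Profile administration guide.

# The four operations rated for each observed task.
OPERATIONS = ("formulate_goal", "plan", "carry_out", "verify_goal")

# Four-level ordinal scale (assistance qualifiers 1v/1p/1vp omitted here).
SCALE = {
    0: "dependent",
    1: "requires verbal and/or physical assistance",
    2: "independent with difficulty",
    3: "independent",
}


def task_score(operation_scores):
    """Return the task score: the lowest of the four operation scores."""
    missing = [op for op in OPERATIONS if op not in operation_scores]
    if missing:
        raise ValueError(f"missing operation scores: {missing}")
    for op in OPERATIONS:
        if operation_scores[op] not in SCALE:
            raise ValueError(f"score for {op!r} must be 0, 1, 2 or 3")
    # Scores are never summed; the weakest operation determines the task score.
    return min(operation_scores[op] for op in OPERATIONS)


# Hypothetical ratings for one task (made-up values for illustration).
hot_meal = {"formulate_goal": 3, "plan": 1, "carry_out": 2, "verify_goal": 3}
score = task_score(hot_meal)
print(score, "-", SCALE[score])
```

In this made-up example the client formulates and verifies the goal independently, but needs assistance with planning, so the task as a whole is scored 1.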

Time:

Time to administer the ADL Profile will depend on the client’s stage of recovery and the number of tasks the clinician needs to administer. In acute care, it may take between 30 and 60 minutes as the clinician may decide to only administer self care tasks and one or two tasks from the community or home domains. When administered in preparation for discharge from a rehabilitation hospital or to community based participants to whom all tasks may be pertinent to administer, up to 7 hours may be required.

The administration time is acceptable when only components of the tool are administered, but the assessment is lengthy if administered in full. However, the authors note that the wealth of information obtained from observing the person complete various activities in his/her home and community environment should not be underestimated in terms of its contribution to treatment planning.

Training requirements:

The ADL Profile is intended for use by occupational therapists. It is recommended that clinicians complete a three-day training course to ensure correct administration and interpretation. The course provides information regarding the measure (objectives, conceptual frameworks, variables, administration procedure, scoring and interpretation), uses video to provide instruction regarding administration, and provides opportunities to practice task analysis and scoring (Bottari et al., 2010a).

Subscales:

N/A

Equipment:

The ADL Profile does not necessitate specialized equipment but requires any objects the client typically uses in his/her daily living.

Alternative forms of the ADL Profile

There are no other forms of the ADL Profile.

Client suitability

Can be used with:

Patients with stroke.
Patients with TBI throughout the continuum of care: to assist in discharge planning from an acute care hospital, in rehabilitation and for community reintegration (Bottari et al., 2006; C. Bottari, personal communication, November 6, 2012).
Patients with schizophrenia (Semkovska et al., 2004).

Should not be used with:

None reported

In what languages is the measure available?

French and English.

Summary

What does the tool measure?	The ADL Profile measures independence in everyday activities in consideration of executive function deficits related to goal setting, planning and execution.
What types of clients can the tool be used for?	Clients with traumatic brain injury and stroke.
Is this a screening or assessment tool?	Assessment tool
Time to administer	Time to administer the ADL Profile will depend on the client’s stage of recovery and the number of tasks the clinician needs to administer. In acute care, it may take between 30 and 60 minutes as the clinician may decide to only administer self care tasks and one or two tasks from the community or home domains. When administered in preparation for discharge from a rehabilitation hospital or to community based participants to whom all tasks may be pertinent to administer, up to 7 hours may be required.
Versions	ADL Profile
Other Languages	French
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the ADL Profile when used with an adult stroke population. Test-retest: No studies have examined the test-retest reliability of the ADL Profile when used with an adult stroke population. Intra-rater: No studies have examined the intra-rater reliability of the ADL Profile when used with an adult stroke population. Inter-rater: One study reported adequate inter-rater reliability for three tasks: preparing a hot meal; eating; obtaining information.
Validity	Content: The ADL Profile was established through literature reviews and consultation with expert researchers and clinicians. Criterion: No studies have reported on the criterion validity of the ADL Profile when used with an adult stroke population. Construct: Convergent: One study reported significant correlations between five ADL Profile tasks related to personal care and corresponding tasks of the FIM (Standing up, Toilet transfers, Bathtub transfers, Walking, Stair climbing).
Floor/Ceiling Effects	No studies have examined ceiling effects of the ADL Profile when used with an adult stroke population.
Sensitivity / Specificity	No studies have examined the sensitivity/specificity of the ADL Profile when used with an adult stroke population.
Does the tool detect change in patients?	No studies have reported on responsiveness of the ADL Profile when used with an adult stroke population.
Acceptability	The administration time is acceptable when only components of the tool are administered, but administration in full may take up to seven hours over several sessions. However, the wealth of information obtained from observing the person complete various activities in his/her home and community environment should not be underestimated in terms of its contribution to treatment planning.
Feasibility	The ADL Profile can be administered by an occupational therapist. It requires completion of a three-day training course.
How to obtain the tool?	Available at the Canadian Association of Occupational Therapists: www.caot.ca or Les Éditions Émersion http://www.leseditionsemersion.com/articles.php?lng=fr&pg=6.

* The tool was initially developed for a population with traumatic brain injury; its psychometric properties with this population are described in the administration guide of the tool.

Psychometric Properties

Overview

A literature search was conducted to identify all relevant publications on the psychometric properties of the ADL Profile relevant to individuals with stroke. Two studies were found.

Floor/Ceiling Effects

No studies have reported ceiling effects of the ADL Profile in clients with stroke.

Reliability

Test-retest:
No studies have reported on the test-retest reliability of the ADL Profile in patients with stroke.

Intra-rater:
No studies have reported on the intra-rater reliability of the ADL Profile in patients with stroke.

Inter-rater:
Dell’Aniello-Gauthier (1994) reported that the ADL Profile demonstrates adequate inter-rater reliability (mean kappa = 0.58-0.68) for three ADL tasks: preparing a hot meal, eating and obtaining information.

Validity

Content:

The ADL Profile was established through literature reviews and consultation with expert researchers and clinicians (Dutil et al., 2005).

Criterion:

No studies have reported on the criterion validity of the ADL Profile.

Construct:

Convergent:
Gervais (1995) found significant correlations between 5 tasks of the ADL Profile related to personal care and corresponding tasks of the Functional Independence Measure (Kendall’s tau c = 0.40-0.73; p<.001).

Responsiveness

No studies have examined the responsiveness of the ADL Profile.

References

Azegami, M., Ohira, M., Miyoshi, K., Kobayashi, C., Hongo, M., Yanagihashi, R., & Sadoyama, T. (2007). Effect of single and multi-joint lower extremity muscle strength on the functional capacity and ADL/IADL status in Japanese community-dwelling older adults. Nursing & Health Sciences, 9(3), 168-176.
Bottari, C., Dutil, C., Dassa, C., & Rainville, C. (2006). Choosing the most appropriate environment to evaluate independence in everyday activities: Home or clinic? Australian Occupational Therapy Journal, 53, 98-106.
Bottari, C., Dassa, C., Rainville, C., & Dutil, C. (2010a). A Generalizability Study of the Instrumental Activities of Daily Living Profile. Archives of Physical Medicine and Rehabilitation, 91, 734-42.
Bottari, C., Dassa, C., Rainville, C., & Dutil, C. (2010b). The IADL Profile: Development, content validity, intra- and interrater agreement. The Canadian Journal of Occupational Therapy, 77 (2), 90-101.
Canadian Association of Occupational Therapists. (2012). ADL Profile. Retrieved from: http://www.caot.ca/default.asp?pageid=1438
Crochard, K. (1987). Les activités du GESCOM en 1986. Paris: Centre national d’études des telecommunications.
Dell’Anniello-Gauthier, M. (1994). Étude métrologique du mini-profil, instrument de mesure du statut fonctionnel des personnes âgées victimes d’un accident vasculaire cérébral. Sherbrooke, Québec : Université de Sherbrooke.
Dutil, E., Bottari, C., Vanier, M., & Gaudreault, C. (2005). ADL Profile: description of the instrument. 4th ed. Montréal: Les Éditions Émersion.
Dutil, E., Bottari, C., Vanier, M. & Gaudreault, C. (2005). Profil des AVQ: Description de l’outil, 4th ed. Montréal: Les Éditions Émersion.
Fougeyrollas, P., Saint-Michel, G., & Blouin, M. (1989). Proposition d’une révision du 3e niveau de la CIDIH: le handicap. [Proposition for a revision of the 3rd level of the International Classification of Handicaps: the handicap]. Réseau International CIDIH, 2 (1), 9-32.
Gervais N. (1995). Comparaison du profil des AVQ et de la mesure d’indépendance fonctionnelle: validité de trait. Montréal: Université de Montréal.
Kielhofner, G. (1995). A Model of Human Occupation: Theory and Application. USA: Lippincott Williams & Wilkins.
Klein, S., Barlow, I., & Hollis, V. (2008). Evaluating ADL measures from an occupational therapy perspective. Canadian Journal of Occupational Therapy, 75, 69-81.
Lawton, P. (1983). Environment and other determinants of well-being in older people. The Gerontologist 23, 349-357.
Luria, A.R. (1973). The Working Brain – An Introduction to Neuropsychology. New York: Basic Books.
Semkovska, M., Bedard, M.A., Godbout, L., Limoge, F., & Stip, E. (2004). Assessment of executive dysfunction during activities of daily living in schizophrenia. Schizophrenia Research, 69, 289-300.

See the measure

How to obtain the ADL Profile:

The ADL Profile can be purchased at Les Éditions Émersion (https://www.leseditionsemersion.com).

Assessment of Motor and Process Skills (AMPS)

Evidence Reviewed as of before: 26-11-2010

Author(s)*: Lisa Zeltzer, MSc OT

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Expert Reviewer: Dianna Robertson, BSc OT, MSc OT (Thesis candidate)

Purpose

The Assessment of Motor and Process Skills (AMPS) is an observational assessment that allows for the simultaneous evaluation of motor and process skills and their effect on the ability of an individual to perform complex or instrumental and personal activities of daily living (ADL). The AMPS is comprised of 16 motor and 20 process skill items.

In-Depth Review

Purpose of the measure

The Assessment of Motor and Process Skills (AMPS) is an observational assessment that allows for the simultaneous evaluation of motor and process skills and their effect on the ability of an individual to perform complex or instrumental and personal activities of daily living (ADL). The AMPS is comprised of 16 motor and 20 process skill items.

Motor skills are the observable goal-directed actions people perform during ADL task performance in order to move themselves or the task objects (e.g. walk, transport objects, reach for and manipulate objects, position the body).

Process skills refer to the ability of an individual to logically sequence the actions of the ADL task performance over time (e.g. initiate and sequence actions, use appropriate tools and material, and accommodate actions when problems are encountered).

The AMPS is based on a ‘top down’ assessment approach. Using a ‘top down’ approach means that the AMPS assessment begins “with the ability of the individual to perform the daily life tasks that he or she wants and needs to perform to be able to fulfill his or her roles competently and with satisfaction” (Fisher, 2003). The ‘top down’ approach involves finding out more about a client’s occupational concerns and then observing the client performing that occupation. Through the observation process, the therapist is able to use clinical reasoning to identify the underlying functional deficit in order to intervene to compensate for the deficit, if this is possible.

The quality of the person’s occupational performance is assessed by rating the effort, efficiency, safety, and independence demonstrated in each of the motor and process skills that comprise the task performance.

Available versions

The AMPS was first developed by Fisher in 1990 but was not published until 1995. The AMPS manual is currently in its sixth edition.
In 2001, it was recommended that 20 new tasks be added to the AMPS (Bray, Fisher, & Duran, 2001). These tasks were added to benefit individuals at the lower or higher ends of the AMPS motor and process skill scales.
To date, there are 85 AMPS tasks to select from. A list of these tasks can be found on the AMPS International webpage at: http://www.ampsintl.com/AMPS/resources/tasks.php.

Features of the measure

Items:
There are no actual items to the AMPS. After an initial interview with the caregiver or the client, the rater selects a subset of 3-5 ADL tasks from a list of standardized tasks that are described in the AMPS manual (e.g. fetching a drink from the fridge, folding laundry, preparing a sandwich). The tasks selected must be relevant and meaningful to the client, and consist of tasks that he/she once knew how to perform. The tasks must be challenging to the client. From this subset of tasks, the client then selects 2-3 tasks to perform.

Prior to beginning task observation, the client and rater must agree on the elements of the task and the tools and materials to be used. In a clinical setting, the client can familiarize him/herself with the environment by placing the tools and materials where he/she prefers them to be stored. The client is expected to perform the designated tasks in his/her usual manner, but must also adhere to the guidelines specified in the AMPS manual. For example, if the client selects task A-1, retrieving a beverage from the refrigerator, the examiner must watch for the following specific criteria:

Obtain container of beverage from the refrigerator
Pour the beverage into cup or glass
Serve beverage
Clean up.
Although each of the tasks involves a standard procedure, some flexibility is allowed to ensure that the assessment remains semi-individualized.

To assist the examiner in preparing for and administering the AMPS, task notes are provided that outline the things to look for while the task is being completed.

A list of tasks can be found at the AMPS International website: http://www.ampsintl.com/AMPS/resources/tasks.php

One can select online which tasks are to be completed by checking the box beside the task. To find a list of the steps to look for while the task is being completed, select ‘print notes’, which will automatically generate the steps into a printable worksheet. These task notes are intended to be used in combination with the AMPS task descriptions to assist the rater, and are not intended to replace careful reading of the task descriptions in the manual.

Scoring:
The AMPS uses a 4-point Likert scale to rate the client’s performance on 16 motor and 20 process skills (see table).

Score	Interpretation
4	Competent: the patient performs the task without evidence of increased effort, decreased efficiency, or lack of safety.
3	Questionable: the examiner questions the effectiveness of the performance.
2	Ineffective: performance that disrupts or interferes with the action.
1	Markedly deficient: performance that impedes the action progression and yields an unacceptable outcome.

The raw motor and process scores are entered into the AMPS computer-scoring program and analyzed using many-faceted Rasch analysis (Linacre, 1993). Many-faceted Rasch analysis is used to allow for the calibration of: 1. skills item difficulty, 2. task challenge, 3. individual evaluator leniency, and 4. client variation in ADL ability, on the same linear scale.

The Rasch analysis enables the AMPS administrator to predict how an individual is expected to perform on any of the calibrated ADL tasks in the assessment after completion of only two or three tasks (Fisher, 1995). This analysis converts the client’s ordinal raw scores into equal-interval measures of ability (person ability measures), which are expressed in log-odds probability units (logits). The logit ability measures are placed on a linear continuum of increasing ability for each of the ADL scales (motor and process). The AMPS person ability measures represent the person’s place on the continuum and provide an indication of how challenging a task that person can manage effectively. The higher the ADL motor or ADL process ability measure, the more able the client is (Fisher, 1997).
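To make the logit unit more concrete, the following Python sketch shows the general relationship between log-odds and probability, together with a simplified additive combination of the four calibrated facets. It is an illustration only and is not the AMPS computer-scoring program: the function names and parameters are hypothetical, rater leniency is expressed here as its opposite (severity), and the rating-scale step thresholds that a full many-faceted Rasch model includes are ignored.

```python
import math


def logit_to_probability(logit: float) -> float:
    """Convert a log-odds value (logit) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


def probability_to_logit(p: float) -> float:
    """Convert a probability into log-odds; p must lie strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def simplified_log_odds(person_ability: float,
                        item_difficulty: float,
                        task_challenge: float,
                        rater_severity: float) -> float:
    """Simplified additive facet structure, with every term in logits.

    Assumption: step thresholds between rating categories are omitted,
    so this is a teaching sketch rather than the model used for scoring.
    """
    return person_ability - item_difficulty - task_challenge - rater_severity
```

Because all facets sit on the same linear scale, a more able person, an easier skill item, a less challenging task, or a more lenient rater each raises the log-odds by the same kind of unit. A value of 0 logits corresponds to even odds (a probability of 0.5).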

If the AMPS is to be used for documenting treatment efficacy, quality assurance, or research, it must be computer scored.

Motor and process cutoff measures:

The position of a person’s ability measures on the ADL motor and ADL process scales can also be evaluated relative to the motor and process cutoff measures. The cutoff measures are 2.0 logits for the ADL motor scale and 1.0 logits for the ADL process scale. These cutoff measures were developed based on the performance of 2,548 subjects in the AMPS database (Fisher, 1997). Individuals with ability measures that fall below the cutoff on either the ADL motor or ADL process scale demonstrated observable motor or process deficits that were affecting their ability to perform ADL tasks in an effective manner. They were also more likely to require assistance in performing daily life tasks.
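The comparison against the cutoff measures can be sketched in a few lines of Python. This is a minimal illustration, assuming that the person ability measures have already been produced by the AMPS computer-scoring program; the function and key names are hypothetical, and only the two cutoff values come from the text above.

```python
# Cutoff measures reported by Fisher (1997), in logits.
MOTOR_CUTOFF_LOGITS = 2.0
PROCESS_CUTOFF_LOGITS = 1.0


def flag_below_cutoffs(motor_logits: float, process_logits: float) -> dict:
    """Report whether each ability measure falls below its cutoff.

    A measure exactly at the cutoff is not flagged, since the text
    describes measures that fall below the cutoff.
    """
    return {
        "motor_below_cutoff": motor_logits < MOTOR_CUTOFF_LOGITS,
        "process_below_cutoff": process_logits < PROCESS_CUTOFF_LOGITS,
    }


# Hypothetical client: motor measure of 1.5 logits, process measure of 1.2 logits.
result = flag_below_cutoffs(1.5, 1.2)
```

A flag on either scale is only a prompt for clinical interpretation; it does not replace the rater's judgment or the guidance in the AMPS manual.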

Time:
It takes 30-40 minutes to administer the AMPS (AMPS International Website:
http://www.ampsintl.com/AMPS/resources/tasks.php).

Subscales:
The AMPS has two subscales: ADL motor skills and ADL process skills.

Equipment:
No specialized equipment is required to complete the AMPS. Only the AMPS notes, and the relevant equipment for task completion are required.

(AMPS International Website:
http://www.ampsintl.com/AMPS/resources/tasks.php).

Training:
The AMPS can be administered only by occupational therapists who have completed a 5-day training and calibration workshop. Information regarding training sessions can be found by visiting the AMPS International website: http://www.ampsintl.com/workshops.htm

The AMPS administration manual and computer scoring software are only provided to individuals who participate in AMPS training and calibration workshops.

To become an AMPS Calibrated Rater, an occupational therapy practitioner must complete the following steps:

Attend a 5-day training course
Test 10 clients who perform 2-3 AMPS tasks
Independently interview and score live clients (the use of videotapes is not acceptable). Two of the ten clients may be co-scored (two therapists observe a client at the same time but score the client’s performance independently).
Enter the data into the computer using the AMPS computer-scoring program
Email exported data to AMPS Project International within 3 months of taking the course.

Alternative forms of the AMPS

The School Version of the Assessment of Motor and Process Skills (School AMPS).
The School AMPS is an evaluation tool for measuring students’ schoolwork task performance in typical classroom settings.

Client suitability

Can be used with:

Patients with stroke.

Should not be used with:

The AMPS cannot be used to diagnose underlying mind-brain-body problems (e.g. memory, apraxia, motivation, perception).
The AMPS cannot be administered to patients who are confined to bed or who are unwilling to participate in simple daily living tasks.
The AMPS is not suitable for children under the age of 3.

Languages of the measure

To date, the AMPS has been administered to over 12,000 subjects from North America, Scandinavia, the United Kingdom, Australia, and New Zealand.

A number of studies have supported the validity of the AMPS as a cross-cultural measure (Fisher, Liu, Velozo & Pan 1992; Goldman & Fisher, 1997; Goto, Fisher & Mayberry, 1996; Magalhaes, Fisher, Bernspang & Linacre, 1996; Stauffer, Fisher & Duran, 2000). For example, Goto, Fisher and Mayberry (1996) tested the cross-cultural validity of the AMPS with six trained raters from diverse backgrounds, and found high cross-cultural validity and inter-rater reliability.

Validation of the AMPS has been established for use in Sweden (Bernspang & Fisher, 1995), Taiwan (Fisher, Liu, Velozo, & Pan, 1992), and Spain (http://www.terapia-ocupacional.com/Cursos/Curso_AMPS_Escala_Valoracion_Habilidades_Motoras_Procesamiento_Terapia_Ocupacional.htm).

Limited parts of the AMPS manual(s) and software are available in Japanese, Swedish, Dutch, French, Norwegian, Slovenian, Finnish, and Danish. AMPS International is currently working on new translations in Spanish, Italian, and German.

Summary

What does the tool measure?	Motor and process skills and their effect on the ability of an individual to perform complex or instrumental and personal activities of daily living (ADL).
What types of clients can the tool be used for?	The AMPS can be used with, but is not limited to patients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	The AMPS takes 30-40 minutes to administer.
Versions	The School Version of the Assessment of Motor and Process Skills (School AMPS)
Other Languages	Limited parts of the manual(s) and software are available in Japanese, Swedish, Dutch, French, Norwegian, Slovenian, Finnish, and Danish. AMPS International is currently working on new translations in Spanish, Italian, and German.
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the AMPS. Test-retest: Both studies that examined the test-retest reliability of the AMPS reported excellent test-retest reliability. Intra-rater: Only one study has examined the intra-rater reliability of the AMPS and reported excellent intra-rater reliability. Inter-rater: No studies have examined the inter-rater reliability of the AMPS.
Validity	Criterion: Concurrent: Excellent correlations with the Scale of Independent Behavior, the Functional Independence Measure, and the Cambridge Cognitive Examination (CAMCOG) have been reported. Predictive: The AMPS score has been found to be predictive of the need for supervision/assistance to live in the community, and home safety for individuals with psychiatric conditions associated with cognitive impairments. Construct: Known groups: AMPS can differentiate between individuals with Multiple Sclerosis and healthy controls; patients with stroke and healthy controls; older adults without disability, people with Alzheimer’s disease who need minimal assistance, and people with Alzheimer’s disease who require moderate assistance; individuals with and without psychiatric disorders.
Floor/Ceiling Effects	No studies have examined the floor or ceiling effects of the AMPS.
Does the tool detect change in patients?	One study examined the responsiveness of the AMPS in a 3-arm drug trial and reported significant differences for instrumental ADL process skills among the 3 conditions, suggesting that the AMPS may be a sensitive measure for detecting change under various study conditions in drug trials.
Acceptability	The AMPS cannot be used to diagnose underlying mind-brain-body problems (e.g. memory, apraxia, motivation, perception). The AMPS cannot be administered to patients who are confined to bed or who are unwilling to participate in simple daily living tasks, or for children under the age of 3.
Feasibility	The AMPS takes 30-40 minutes to administer, and does not require any specialized equipment. The rater selects a subset of 3-5 ADL tasks (from which the client selects 2-3 to perform) from a list of standardized tasks that are described in the AMPS manual. The AMPS is simple to score and uses a 4-point Likert scale. The scores are then analyzed using an AMPS computer-scoring program. The AMPS can be administered only by occupational therapists who have completed a 5-day training and calibration workshop.
How to obtain the tool?	The AMPS manual and software can be purchased online at http://www.ampsintl.com/

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the AMPS.

Reliability

Test-retest:
Doble, Fisk, Lewis and Rockwood (1999) examined the test-retest reliability of the AMPS in a sample of 55 elderly adults and reported excellent test-retest coefficients for both the motor and process subscores (r = 0.88 and r = 0.86, respectively).

Fisher (1995) reported that with a sample of older adults (mean age of 80), the test-retest reliability was excellent for both the AMPS motor scale (r = 0.91) and for the AMPS process scale (r = 0.90).

Intra-rater:
Fisher, Liu, Velozo and Pan (1992) reported that in a sample of Taiwanese participants without disability, the AMPS had excellent intra-rater reliability (r = 0.93).

Validity

Criterion:

Concurrent:
Bruininks, Woodcock, Weatherman, and Hill (1985) correlated the AMPS with the Scale of Independent Behavior (Neistadt, 1993) and reported excellent correlations (ranging from r = 0.62 to r = 0.85).

Robinson and Fisher (1996) examined the correlation of the AMPS with the Functional Independence Measure (Keith, Granger, Hamilton & Sherwin, 1987) (r = 0.62) as well as with the Cambridge Cognitive Examination (CAMCOG), a cognitive component of the Cambridge Mental Disorders of the Elderly Examination (an interview measure of dementia) (Roth et al., 1986) (r = 0.65).

Predictive:
Fisher (1997) reported that 84% of people with ADL motor ability measures below 2.0 logits and 93% of those with ADL process ability measures below 1.0 logits, required supervision or assistance to live in the community. The fact that a higher proportion of people with low ADL process ability measures than with low ADL motor ability measures required some assistance demonstrates that the ADL process scale is a better indicator of need for assistance to live in the community than is the ADL motor scale.

McNulty and Fisher (2001) examined whether the AMPS could predict home safety for individuals with psychiatric conditions associated with cognitive impairments. Moderate positive relationships were found between ADL motor and ADL process ability and home safety in both the clinic and the home. Home ADL process ability was the best predictor of home safety for participants who were categorized as less safe in the study.

Construct:

Known groups:
Doble, Fisk, Fisher, Ritvo and Murray (1994) examined the instrumental ADL performance of 22 community-dwelling patients with mild to moderate Multiple Sclerosis in comparison to participants without disability who were matched for age and gender. Functional competence of the patients with Multiple Sclerosis, as measured by the AMPS, was poorer than that of the control group suggesting that the AMPS can differentiate between individuals with Multiple Sclerosis and healthy controls.

Bernspang and Fisher (1995) administered the AMPS to 71 individuals with right cerebral vascular accident, 76 persons with left cerebral vascular accident, and 83 community-living healthy individuals. Both stroke groups had significantly lower IADL performance than the control participants, suggesting that the AMPS can distinguish between patients with stroke and healthy controls.

Hartman, Fisher, and Duran (1999) administered the AMPS to 329 older adults without disability, 167 people with Alzheimer’s disease who need minimal assistance, and 292 people with Alzheimer’s disease who require moderate assistance. In this study, the AMPS was able to distinguish between the three groups.

Pan and Fisher (1994) examined the hypothesis that mean AMPS scores would differ between individuals with psychiatric disorders and individuals without. Sixty participants, 30 without and 30 with psychiatric disorders, were studied. The hypothesis was supported for both the AMPS motor and process scales, suggesting that the AMPS can distinguish between individuals with and without psychiatric disorders.

Responsiveness

One pharmacological pilot study of individuals with Alzheimer’s disease examined the responsiveness of the AMPS using repeated measures ANOVA (Oakley & Sunderland, 1997). Significant differences were found for instrumental ADL process skills, but not for motor skills, among three drug conditions. The results of this study suggest that the AMPS may be a sensitive measure for detecting change under various study conditions in drug trials.

References

Bernspang, B., Fisher, A. (1995). Differences between persons with right or left cerebral vascular accident on the Assessment of Motor and Process Skills. Archives of Physical Medicine and Rehabilitation, 76, 1144-1151.
Bray, K., Fisher, A. G., Duran, L. (2001). The validity of adding new tasks to the Assessment of Motor and Process Skills. American Journal of Occupational Therapy, 55, 409-415.
Bruininks, R. H., Woodcock, R. W., Weatherman, R. F., Hill, B. K. (1985). Development and Standardization of the Scales of Independent Behavior. Allen, TX: DLM Resources.
Cooke, K, Z., Fisher, A. G., Mayberry, W., Oakley, E. (2000). Differences in activities of daily living process skills of persons with and without Alzheimer’s disease. Occupational Therapy Journal of Research, 20, 87-104.
Dickerson, A. E., Fisher, A. G. (2000). Age differences in functional performance. American Journal of Occupational Therapy, 47, 686-692.
Doble, S. E., Fisk, J. D., Fisher, A., Ritvo, P., Murray, T. (1994). Functional competence of community-dwelling persons with multiple sclerosis using the Assessment of Motor and Process Skills. Archives of Physical Medicine and Rehabilitation, 75, 843-851.
Doble, S. E., Fisk, J. D., Lewis, N., Rockwood, K. (1999). Test-retest reliability of the Assessment of Motor and Process Skills. Occupational Therapy Journal of Research, 19, 203-215.
Doble, S. E., Fisher, A. G., Fisk, J. D., MacPherson, K. M. (1992). Validation of the Assessment of Motor and Process Skills (AMPS) with Elderly Adults with Dementia. Final Report to the Alzheimer’s Association. Halifax, Nova Scotia: Dalhousie University.
Duran, L., Fisher, A. (1996). Male and female performance on the Assessment of Motor and Process Skills. Archives of Physical Medicine and Rehabilitation, 77, 1019-1024.
Fisher, A. G. (1990). Assessment of Motor and Process Skills. Research edition, R. Unpublished test manual. Chicago, IL: University of Illinois at Chicago.
Fisher, A. (1995). The Assessment of Motor and Process Skills (AMPS). Fort Collins, CO: Three Star Press.
Fisher, A. G. (1997). Assessment of Motor and Process skills, 2nd edn. Fort Collins, CO: Three Star Press.
Fisher, A. G., Liu, Y., Velozo, C., Pan, A. W. (1992). Cross-cultural assessment of process skills. American Journal of Occupational Therapy, 46, 876-885.
Fisher, A. G. (2003). AMPS: Assessment of Motor and Process Skills. Volume 1: Development, Standardisation, and Administration Manual. 5th edn. Colorado: Three Star Press Inc.
Goldman, S., Fisher, A. G. (1997). Cross-cultural validation of the Assessment of Motor and Process Skills (AMPS). British Journal of Occupational Therapy, 46, 77-85.
Goto, S., Fisher, A. G., Mayberry, W. L. (1996). Assessment of Motor and Process Skills applied cross-culturally to the Japanese. American Journal of Occupational Therapy, 50, 798-806.
Hartman, M. L., Fisher, A. G., Duran, L. (1999). Assessments of functional ability of people with Alzheimer’s disease. Scandinavian Journal of Occupational Therapy, 6, 111-118.
Keith, R., Granger, C., Hamilton, B., Sherwin, F. (1987). The Functional Independence Measure: A new tool for rehabilitation. In: N. Eisenberg & R. Grzesiak (Eds.), Advances in Clinical Rehabilitation. New York: Springer.
Linacre, J. M. (1993). Many-Facet Rasch Measurement, 2nd edn. Chicago: MESA.
Linden, A., Boschian, K., Eker, C., Schalen, W., Nordstrom, C.-H. (2005). Assessment of motor and process skills reflects brain-injured patients ability to resume independent living better than neuropsychological tests. Acta Neurol Scand, 111, 48-53.
Magalhaes, L., Fisher, A., Bernspang, B., Linacre, J. (1996). Cross-cultural assessment of functional ability. The Occupational Therapy Journal of Research, 16(1), 45-63.
McNulty, M. C., Fisher, A. G. (2001). Validity of using the Assessment of Motor and Process Skills to estimate overall home safety in persons with psychiatric conditions. Am J Occup Ther, 55(6), 649-655.
Neistadt, M. E. (1993). A meal preparation treatment protocol for adults with brain injury. Am J Occup Ther, 48, 431-438.
Oakley, F., Sunderland, T. (1997). Assessment of Motor and Process Skills as a measure of IADL functioning in pharmacologic studies of people with Alzheimer’s disease: A pilot study. International Psychogeriatrics, 9, 197-206.
Pan, A. W., Fisher, A. G. (1994). The Assessment of Motor and Process Skills of persons with psychiatric disorders. American Journal of Occupational Therapy, 48, 775-780.
Robinson, S., Fisher, A. G. (1996). A study to examine the relationship of the Assessment of Motor and Process Skills (AMPS) to other tests of cognition and function. British Journal of Occupational Therapy, 59, 260-63.
Roth, M., Mountjoy, C., Huppert, F., Hendrie, H., Verna, S., Godard, R. (1986). CAMDEX. The Cambridge Examination for Mental Disorders of the Elderly. Cambridge, UK: Cambridge University Press.
Stauffer, L. M., Fisher, A. G., Duran, L. (2000). ADL Performance of black Americans and white Americans on the Assessment of Motor and Process Skills. American Journal of Occupational Therapy, 54, 607-613.

See the measure

How to obtain the AMPS

The AMPS manual and software can be purchased online at http://www.ampsintl.com/

Barthel Index (BI)

Evidence Reviewed as of before: 07-10-2015

Author(s)*: Katie Marvin, PT; Lisa Zeltzer, MSc OT

Editor(s): Annabel McDermott, OT; Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Barthel Index (BI) measures the extent to which somebody can function independently and has mobility in their activities of daily living (ADL) i.e. feeding, bathing, grooming, dressing, bowel control, bladder control, toileting, chair transfer, ambulation and stair climbing. The index also indicates the need for assistance in care. The BI is a widely used measure of functional disability. The index was developed for use in rehabilitation patients with stroke and other neuromuscular or musculoskeletal disorders, but may also be used for oncology patients.

In-Depth Review

Purpose of the measure

The Barthel Index (BI) measures the extent to which somebody can function independently and has mobility in their activities of daily living (ADL) i.e. feeding, bathing, grooming, dressing, bowel control, bladder control, toileting, chair transfer, ambulation and stair climbing. The index also indicates the need for assistance in care.

The BI is a widely used measure of functional disability. The index was developed for use in rehabilitation patients with stroke and other neuromuscular or musculoskeletal disorders, but may also be used for oncology patients.

Available versions

The BI was first developed by Mahoney and Barthel in 1965 and later modified by Collin, Wade, Davies, and Horne in 1988.

Original 10-item version (Mahoney & Barthel, 1965). Refers to the following 10 categories: feeding, bathing, grooming, dressing, bowel control, bladder control, toileting, chair transfer, ambulation and stair climbing. Items are weighted according to the level of nursing care required and are rated in terms of whether individuals can perform activities independently, with some assistance, or are dependent (scored as 10, 5 or 0).

Features of the measure

Items:

The original 10-item form of the BI consists of 10 common ADL activities: feeding, bathing, grooming, dressing, bowel control, bladder control, toileting, chair transfer, ambulation and stair climbing. Items are rated in terms of whether individuals can perform activities independently, with some assistance, or are dependent (scored as 10, 5 or 0). Items are weighted according to the level of nursing care required.

Scoring:

The score of the BI is a summed aggregate and there is preferential weighting on mobility and continence. The scores are allotted in the following way: 0 or 5 points per item for bathing and grooming; 0, 5, or 10 points per item for feeding, dressing, bowel control, bladder control, toilet use, and stairs; 0, 5, 10, or 15 points per item for transfers and mobility. The Index yields a total score out of 100 – the higher the score, the greater the degree of functional independence (McDowell & Newell, 1996). This score is calculated by simply totaling the individual item scores, which requires simple arithmetic computation by hand.
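The summation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the BI itself: the dictionary keys and the function name `barthel_total` are hypothetical labels chosen here, while the allowed point values per item are taken from the scoring description above.

```python
# Allowed point values per item, as listed in the scoring description above.
# The item labels used as keys are illustrative, not an official coding scheme.
ALLOWED_POINTS = {
    "feeding": (0, 5, 10),
    "bathing": (0, 5),
    "grooming": (0, 5),
    "dressing": (0, 5, 10),
    "bowel control": (0, 5, 10),
    "bladder control": (0, 5, 10),
    "toilet use": (0, 5, 10),
    "transfers": (0, 5, 10, 15),
    "mobility": (0, 5, 10, 15),
    "stairs": (0, 5, 10),
}


def barthel_total(item_scores):
    """Return the summed BI score (0-100) after validating each item score."""
    missing = set(ALLOWED_POINTS) - set(item_scores)
    extra = set(item_scores) - set(ALLOWED_POINTS)
    if missing or extra:
        raise ValueError(f"missing items: {sorted(missing)}, unknown items: {sorted(extra)}")
    for item, score in item_scores.items():
        if score not in ALLOWED_POINTS[item]:
            raise ValueError(f"{item}: {score} is not one of {ALLOWED_POINTS[item]}")
    return sum(item_scores.values())


# Hypothetical patient, for illustration only.
example = {
    "feeding": 10, "bathing": 0, "grooming": 5, "dressing": 5,
    "bowel control": 10, "bladder control": 5, "toilet use": 5,
    "transfers": 10, "mobility": 10, "stairs": 0,
}
print(barthel_total(example))  # 60
```

The validation step reflects the unequal item maxima: two items top out at 5, six at 10 and two at 15, which is how the total reaches 100.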

A modified scoring system has been suggested by Shah, Vanclay, & Cooper (1989) using a 5-level ordinal scale for each item to improve sensitivity to detecting change (1=unable to perform task, 2=attempts task but unsafe, 3=moderate help required, 4=minimal help required, 5=fully independent). Shah et al. (1989) note that a score of 0-20 suggests total dependence, 21-60 severe dependence, 61-90 moderate dependence and 91-99 slight dependence.
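As a rough illustration, the interpretation bands attributed to Shah et al. (1989) above can be written as a simple lookup. The function name `shah_dependence_band` is hypothetical, and because the text above gives no label for a score of 100, the sketch returns `None` for that value rather than guessing one.

```python
def shah_dependence_band(total):
    """Map a total score (0-100) to the dependence bands quoted above."""
    if not 0 <= total <= 100:
        raise ValueError("total score must be between 0 and 100")
    if total <= 20:
        return "total dependence"
    if total <= 60:
        return "severe dependence"
    if total <= 90:
        return "moderate dependence"
    if total <= 99:
        return "slight dependence"
    # A score of 100 has no band in the text above, so none is assigned here.
    return None


print(shah_dependence_band(75))  # moderate dependence
```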

Subscales:

None typically reported.

Equipment:

To administer the BI, one only needs a pencil and the test items.

Training:

Administration of the BI does not require training and has been shown to be equally reliable when administered by skilled and unskilled individuals (Collin & Wade, 1988). The BI can also be self-administered (McGinnis, Seward, DeJong, & Osberg, 1986). However, for patients older than 75 years of age, it is not recommended that the BI be administered as a self-report measure (Sinoff & Ore, 1997). One study suggests that the scale can be administered reliably over the telephone (Korner-Bitensky & Wood-Dauphinee, 1995).

Time:

The BI can take as little as 2-5 minutes to complete by self-report and up to 20 minutes to complete by direct observation (Finch, Brooks, Stratford, & Mayo, 2002).

Alternative forms of the BI

Modified 10-item version (MBI) (Collin et al., 1988). Functional categories may be scored from 0 to 1, 0 to 2, or 0 to 3, depending on the item. Total scores range from 0 to 20.
5-item short form (Hobart & Thompson, 2001). The 5-item version refers to the following 5 categories: transfers, bathing, toilet use, stairs, and mobility. Each item is scored 0 to 1, 0 to 2, or 0 to 3, depending on the function. Total scores range from 0 to 20. Hobart & Thompson (2001) found that the 5-item BI is psychometrically equivalent to the 10-item BI (correlation with original version was r = 0.90).
The expanded 15-item version (Granger et al., 1979; Fortinsky & Granger, 1981). Added a 4-point scale of intact/limited/helper required/null. Scores range from 0 to 100. In the 15-item version, a score of 60 is commonly considered to be the threshold score for marked dependence (Granger, Sherwood, & Greer, 1977). High correlations between the expanded 15-item BI and other measures of function have been demonstrated (e.g., with the Katz Index of Activities of Daily Living, r = 0.78; with the PULSES profile (medical status, upper and lower limb function, sensory and excretory function, mental and emotional status), r = -0.74 to -0.90) (Shinar, Gross, Bronstein, Licara-Gehr, Eden, Cabrera, et al., 1987; Granger, 1985; Rockwood, Stolee & Fox, 1993). Scores were also predictive of return to independent living after 6 months (Granger, Hamilton, Gresham, & Kramer, 1989).
The extended BI (EBI) (Prosiegel, Bottger, & Schenk, 1996). The EBI consists of 16 items, 15 of which are identical to the Functional Independence Measure. Very little literature exists on the EBI; however, Jansa, Pogacnik, and Gompertz (2004) found it to be a reliable and valid measure of disability/activity levels in 33 patients with newly diagnosed acute ischemic stroke.
The 3-item BI (Ellul, Watkins, & Barer, 1988). Based on 3 items (bed-chair transfers, mobility, and bladder incontinence), it is a useful alternative to the full BI for assessing function at hospital discharge. To date, this version has only been validated in patients with stroke.
Self-rating BI (SB). The SB has good concurrent validity and correlates well with the original BI and the Functional Independence Measure. The index's test-retest reliability is sufficiently high for practical use (Hachisuka, Ogata, Ohkuma, Tanaka, & Dozono, 1997; Hachisuka, Okazaki, & Ogata, 1997; McGinnis et al., 1986).
Early Rehabilitation Barthel Index (ERI). An extension of the BI, it was developed to assess functioning of individuals with severe brain damage, who often cannot be differentiated appropriately due to floor effects that occur with increasing severity of neurological impairment. The ERI looks at the following aspects: state requiring temporary intensive medical monitoring, tracheostoma requiring special treatment (suctioning), intermittent artificial respiration, confusional state requiring special care, behavioural disturbances requiring special care, swallowing disorders requiring special care, and severe communication deficits. Schonle (1995) found that the ERI is quick, economical, and reliable when administered to 210 early rehabilitation patients and 312 patients with severe brain damage.

There is little consensus over which should be considered the definitive version of the BI (McDowell & Newell, 1996), but the original and the 10-item and 15-item modifications are the most commonly used.

Client suitability

Can be used with:

Patients with stroke.

The BI is a frequently used stroke outcome measure. It has been repeatedly shown to be a reliable and valid measure of basic Activities of Daily Living (Mahoney & Barthel, 1965; Loewen & Anderson, 1990; Gresham, Phillips & Labi, 1980; Collin et al., 1988; Roy, Tongeri, Hay, & Pentland, 1988; Wade & Hewer, 1987; Leung et al., 2007). In patients with stroke, the BI is used to assess the extent of post-stroke disability, performance of self-care activities and ability to live independently. The total score of the BI has also been found to predict length of stay in hospital (Granger, Albrecht, & Hamilton, 1979).

There are no prerequisites for completing the BI. For patients who are unable to respond to the BI independently, the BI can be completed by proxy (e.g., Duncan, Lai, Tyler, Perera, Reker, & Studenski, 2002; Wyller, Sveen, & Bautz-Holter, 1995). Further, the BI can be reliably administered over the telephone to either the patient or their proxy (Korner-Bitensky & Wood-Dauphinee, 1995).

Should not be used in:

The BI should not be used to capture significant losses in higher levels of physical function or in activities that are necessary for independence in the home and community. Patients can therefore obtain the maximum score of 100 and still experience significant impairments (Kelly-Hayes et al., 1998).
It should be used with caution in patients with mild stroke. It is responsive to change but has definite ceiling effects in persons with mild stroke (Wade & Hewer, 1987; Skilbeck, Wade, Hewer, & Wood, 1983).

In what languages is the measure available?

The BI has been translated and validated in:

Dutch (Post, van Asbeck, van Dijk, & Schrijvers, 1995)
German (Heuschmann et al., 2005; Valach, Signer, Hartmeier, Hofer, & Steck, 2003)
Turkish (Kucukdeveci, Yavuzer, Tennant, Suldur, Sonel, & Arasil, 2000)
Persian (Oveisgharan, 2006)
French (Condouret et al., 1988; Wirotius & Foucher-Berres, 1991)
Chinese (Leung, Cha, & Shah, 2007) (modified Barthel Index)

Summary

What does the tool measure?	Activities of Daily Living
What types of clients can the tool be used for?	Patients with stroke, patients with other neuromuscular or musculoskeletal disorders, oncology patients
Is this a screening or assessment tool?	Assessment
Time to administer	Self report: 2-5 minutes; Direct observation: 20 minutes, but may vary according to patient’s abilities and tolerance
Versions	Modified 10-item version (MBI); 5-item short form; The expanded 15-item version; The extended BI (EBI); The 3-item BI; Self-rating BI (SB); Early Rehabilitation Barthel Index (ERI)
Other Languages	Dutch, German, Turkish, Persian, French, Chinese
Measurement Properties
Reliability	Internal consistency: Five studies of the MBI reported excellent internal consistency. Test-retest: One study of the MBI reported excellent test-retest reliability. Inter-rater: One study of the MBI and four studies of the BI reported excellent inter-rater reliability, and one study of the BI reported adequate inter-rater reliability.
Validity	Criterion: Concurrent: One study demonstrated excellent concurrent validity between the MBI and motor-Functional Independence Measure (FIM) at admission and discharge. Predictive: The MBI predicted instrumental ADL performance at 6 months post-stroke; likelihood a patient will regain continence following stroke; risk for falls in patients with stroke; functional recovery following stroke; and acute care hospital length of stay following stroke. Construct: Excellent correlations in patients with stroke with the physical mobility dimension subscale of the Nottingham Health Profile; the Physical Functioning subscale of the SF-36; the Berg Balance Scale; the Fugl-Meyer Assessment Scale; and the Frenchay Activities Index.
Does the tool detect change in patients?	Significant ceiling effects noted for the BI, meaning that it doesn’t detect change well in highly functional individuals. The Functional Independence Measure was developed as a measure that would be better able to detect change in disability than the BI; however, little to no difference has been found. Out of 8 studies examined, 3 reported that the BI had a large ability to detect change, 3 reported an adequate ability and 2 reported a small ability.
Acceptability	The MBI/BI has been evaluated for both self-report and use with proxy respondents in addition to direct observation.
Feasibility	The MBI/BI is simple to administer. Requires training if administered by direct observation. It has been developed in many forms that can be administered in many situations and can be used for longitudinal assessment.
How to obtain the tool?	For a copy of the original BI click here.

Psychometric Properties

Overview

There is considerable psychometric data available for the BI (McDowell & Newell, 1996) and its various modified versions. For the purposes of this review, we conducted a literature search to identify all relevant publications on the psychometric properties of the original BI and the modified 10-item BI (MBI), the two most commonly used versions. We then selected articles for review from high-impact journals and from a variety of authors.

*Please note that the content of the original BI and the MBI is the same. Only the scoring values were changed in the MBI (scored 0, 1, 2 or 3 versus 0, 5 and 10 in the original version), a change that does not affect the clinimetric properties of the tool (Quinn, Langhorne and Stott, 2011). The MBI yields a score ranging from 0 to 20, whereas the original BI yields a score of 0 to 100. For the purposes of this module, the psychometric properties for both the BI and MBI will be presented together and will be referred to as either the BI or MBI.

Floor and ceiling effect

Salbach et al. (2001) examined the ceiling effects of the BI, Timed Up and Go (TUG), Berg Balance Scale (BBS), 10 meter walk test (10mWT) and 5 meter walk test (5mWT) in 50 patients with residual gait deficits after a first-time stroke. The BI demonstrated the largest ceiling effects at both 8 and 38 days post-stroke (28% and 56% respectively).
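Ceiling and floor percentages such as those reported above are usually obtained as the share of a sample scoring at the maximum or minimum possible value. The sketch below shows that arithmetic under this assumption; the function name `floor_ceiling_percent` and the sample scores are hypothetical and are not taken from any of the studies cited.

```python
def floor_ceiling_percent(scores, min_score=0, max_score=100):
    """Return (floor %, ceiling %): the share of scores at the scale limits."""
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    floor_pct = 100 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100 * sum(s == max_score for s in scores) / n
    return floor_pct, ceiling_pct


# Made-up BI totals for ten patients, for illustration only.
sample = [0, 45, 60, 100, 100, 85, 100, 70, 95, 100]
print(floor_ceiling_percent(sample))  # (10.0, 40.0)
```

For the MBI, which is scored from 0 to 20, the same function could be called with `max_score=20`.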

Dromerick, Edwards and Diringer (2003) examined the floor/ceiling effects of the BI, the Functional Independence Measure (FIM), the Modified Rankin Scale (MRS) and the International Stroke Trial Measure. The four measures were administered to 95 patients with stroke on admission to and discharge from rehabilitation. The BI demonstrated adequate floor effects at admission (5%) and poor ceiling effects at discharge (27%), whereas the FIM demonstrated excellent floor and ceiling effects (0% for both); the MRS demonstrated adequate floor effects at admission (18%) and excellent ceiling effects at discharge (0%); and the International Stroke Trial Measure demonstrated poor floor effects at admission (100%) and excellent ceiling effects at discharge (0%).

Van der Putten, Hobart, Freeman and Thompson (1999) compared the floor/ceiling effects of the MBI to those of the motor-FIM, cognitive-FIM and total FIM in 201 patients with multiple sclerosis and 82 patients with stroke undergoing inpatient neurorehabilitation. The MBI and motor-FIM demonstrated adequate floor and ceiling effects for both patients with stroke and patients with multiple sclerosis (floor effects = 1.2% (BI, stroke) and 1.2% (motor-FIM, stroke); ceiling effects = 8.5% (BI, stroke) and 1.2% (motor-FIM, stroke)). The total-FIM showed no floor or ceiling effects for either patients with stroke or patients with MS (0% for all). The cognitive-FIM demonstrated poor ceiling effects in patients with multiple sclerosis (36%) and adequate ceiling effects in patients with stroke.

Hsueh, Lin, Jeng and Hsieh (2002) compared the floor/ceiling effects of the FIM to those of the MBI and the 5-item BI (BI-5) in 118 patients with stroke undergoing treatment on an inpatient rehabilitation unit. The MBI and the motor-FIM both exhibited adequate floor effects at admission and discharge (MBI 18.2% and 4.7%; motor-FIM 5.8% and 3.5% respectively) and excellent ceiling effects at admission and discharge (0% for all). The BI-5 exhibited poor floor effects at admission (46.6%) and adequate floor effects at discharge (13.6%), and excellent ceiling effects at admission and discharge. The results of this study indicate that the MBI and motor-FIM have comparable floor/ceiling effects, with the motor-FIM performing slightly better with respect to floor effects at admission (5.8% vs. 18.2% for the MBI).
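The floor and ceiling percentages reported above are simply the share of patients who obtain the lowest or the highest possible score. The following is a minimal sketch of that calculation, not the procedure used in the cited studies; the function name `floor_ceiling_effects`, the 0-100 score range and the sample scores are all assumptions made for illustration.

```python
def floor_ceiling_effects(scores, min_score, max_score):
    """Return (floor %, ceiling %): the share of scores at each extreme."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == max_score) / n
    return floor_pct, ceiling_pct


# Hypothetical admission scores on an assumed 0-100 scale (invented data).
admission_scores = [0, 0, 15, 30, 45, 50, 60, 75, 90, 100]
floor_pct, ceiling_pct = floor_ceiling_effects(admission_scores, 0, 100)
print(f"floor = {floor_pct:.1f}%, ceiling = {ceiling_pct:.1f}%")
```

In this invented sample, two of ten patients score the minimum and one scores the maximum, so the function returns 20% and 10%.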

Reliability

Hsueh, Lin, Jeng and Hsieh (2002) compared the internal consistency of the FIM to that of the MBI and the 5-item BI (BI-5) in 118 patients with stroke undergoing treatment on an inpatient rehabilitation unit. The MBI and FIM motor subscale both demonstrated excellent internal consistency (Cronbach’s alpha coefficient ≥ 0.84), whereas the BI-5 demonstrated adequate internal consistency (Cronbach’s alpha coefficient ≥ 0.71) at admission and discharge.
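Cronbach’s alpha, the coefficient reported above, compares the sum of the individual item variances with the variance of the total score. The sketch below shows the standard formula under stated assumptions: the function name `cronbach_alpha` and the three-item ratings are hypothetical and are not data from the cited study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row of k item scores per patient)."""
    if len(item_scores) < 2:
        raise ValueError("need at least two patients")
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Transpose rows (patients) into columns (items).
    columns = list(zip(*item_scores))
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: five patients scored on three items (invented data).
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 5, 4],
    [3, 3, 3],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```

Values closer to 1 indicate that the items vary together, which is what the ratings of "adequate" and "excellent" above summarize.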

Quinn, Langhorne and Stott (2011) conducted a literature review examining the internal consistency of the MBI in studies involving patients with stroke. The internal consistency of the MBI was found to be excellent (Cronbach’s alpha ≥ 0.80) across all reviewed studies, as detailed below.

Shah et al. (1989) examined the internal consistency of the MBI in 258 patients with stroke. The internal consistency was excellent (Cronbach’s alpha 0.90).

Leung et al. (2007) examined the internal consistency of the Chinese version and the English version of the MBI and found internal consistency to be excellent for both measures (Cronbach’s alpha 0.93 and 0.92 respectively).

Hsueh, Lee and Hsieh (2001) examined the internal consistency of the MBI in 121 Taiwanese patients with stroke at four time points (14, 30, 90 and 180 days post-stroke). The internal consistency of the BI was excellent (Cronbach’s alpha 0.89-0.92).

Test-retest:
Green, Forster and Young (2001) examined the test-retest reliability of the MBI, Rivermead Mobility Index (RMI), Nottingham Extended Activities of Daily Living Scale (NEADL) and Frenchay Activities Index (FAI) in 22 patients who were at least one year post-stroke. The four measures were administered twice, with a one-week interval. The MBI and RMI were found to have the strongest test-retest reliability, with 75% and 85% agreement overall, respectively; however, there was still considerable variability in kappa statistics (BI kappa = -0.09 to 0.81; RMI kappa = 0.64 to 1.00). The NEADL and FAI demonstrated greater variability and more error (NEADL kappa = 0.14 to 0.89; FAI kappa = 0.25 to 1.00).
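Percent agreement and kappa can diverge because kappa corrects for the agreement expected by chance. The sketch below computes both for one item rated on two occasions. It is an illustration only: the function names and the ratings are hypothetical, and unweighted Cohen's kappa is assumed here rather than taken from the cited study.

```python
from collections import Counter


def percent_agreement(first, second):
    """Percentage of paired ratings that are identical."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    return 100.0 * sum(a == b for a, b in zip(first, second)) / len(first)


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa: (observed - chance) / (1 - chance)."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_1, counts_2 = Counter(first), Counter(second)
    categories = set(first) | set(second)
    chance = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    if chance == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - chance) / (1 - chance)


# Hypothetical item ratings at two visits (invented data).
visit_1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
visit_2 = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(f"agreement = {percent_agreement(visit_1, visit_2):.0f}%")
print(f"kappa = {cohen_kappa(visit_1, visit_2):.2f}")
```

In this invented example the two visits agree on 8 of 10 ratings, yet kappa is slightly negative because almost all ratings fall in one category, so chance agreement is already very high. This is a known property of kappa and may help when reading a high overall agreement alongside a low or negative item-level kappa.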

Inter-rater:
Leung et al. (2007) examined the inter-rater reliability of the Chinese and English versions of the MBI in 15 patients with stroke. The inter-rater reliability, calculated using kappa statistics, was found to be excellent for the Chinese version (kappa = 0.81 to 1.00) and adequate to excellent for the English version (kappa = 0.63 to 0.85).

Duffy, Gajree, Langhorne, Stott and Quinn (2013) conducted a systematic review and meta-analysis examining the inter-rater reliability of the BI and MBI in patients with stroke. Ten studies were included, involving assessors of differing backgrounds and experience. The BI was found to have excellent inter-rater reliability in eight of the ten studies and adequate inter-rater reliability in two of the ten studies, as calculated using intraclass correlation coefficients (ICC), kappa statistics or weighted kappa statistics (ICC ranging from 0.94 to 0.96; kappa ranging from 0.62 to 0.90; weighted kappa ranging from 0.70 to 0.99). The results from five of the ten studies are included below; the remaining five studies could not be reviewed for the purposes of this module as they were not available in English.

Loewen and Anderson (1988) examined the inter-rater and intra-rater reliability of the BI in seven patients with stroke. Inter-rater reliability and intra-rater reliability were excellent (ICC=0.96 and 0.99 respectively).

Wolfe, Taub, Woodrow and Burney (1991) compared the inter-rater and intra-rater reliability of the BI with the Rankin Scale. Inter-rater reliability was excellent for both the BI and Rankin Scale (weighted kappa = 0.88 to 0.98 and 0.75 to 0.95 respectively). Intra-rater reliability was excellent for both the BI and Rankin Scale (weighted kappa = 0.98 and 0.95 respectively).

Hsueh, Lee and Hsieh (2001) examined the inter-rater reliability of the BI in Taiwanese patients with stroke, at four time points (14, 30, 90 and 180 days post-stroke). The inter-rater reliability of individual BI items ranged from adequate (weighted kappa = 0.53) to excellent (weighted kappa = 0.94). The inter-rater reliability for the total score was excellent (ICC=0.94).

Oveisgharan et al. (2006) examined the inter-rater reliability of a Persian translation of the BI; inter-rater reliability was excellent (weighted kappa = 0.99).

Cincura et al. (2008) examined the inter-rater reliability of the National Institutes of Health Stroke Scale, Modified Rankin Scale and the BI in Brazilian patients with stroke. Inter-rater reliability was found to be adequate (kappa = 0.70).

Validity

Content:

No studies have examined the content validity of the BI in patients with stroke.

Criterion:

Concurrent:
Hsueh, Lin, Jeng and Hsieh (2002) examined the concurrent validity of the MBI and the 5-item BI (BI-5) with the motor subscale of the FIM in patients with stroke, using Spearman correlation coefficients. The three measures were administered to 118 patients with stroke at admission to and discharge from an inpatient rehabilitation unit. Concurrent validity of the MBI with the FIM motor subscale was excellent at admission and discharge (r=0.92 and 0.94 respectively), whereas the BI-5 demonstrated adequate to excellent concurrent validity with the FIM motor subscale at admission and discharge (r=0.74 and 0.92 respectively).
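The Spearman coefficient used above is a Pearson correlation computed on ranks rather than raw scores, with tied values sharing the average of their ranks. The following minimal sketch illustrates the calculation; the function names and the paired scores are hypothetical and are not data from the cited study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores on two measures for six patients (invented data).
measure_a = [20, 35, 50, 65, 80, 95]
measure_b = [18, 30, 28, 52, 70, 88]
print(f"Spearman r = {spearman(measure_a, measure_b):.2f}")
```

Because only the ordering of patients matters, the coefficient is unaffected by the two measures having different score ranges.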

Predictive:
Hsueh, Lee and Hsieh (2001) examined the predictive validity of the MBI in 121 patients with stroke by comparison with the Frenchay Activities Index (FAI), using Pearson product-moment correlation coefficients. The MBI was administered at 14, 30, 90 and 180 days post-stroke and the FAI was administered at 180 days post-stroke. The MBI scores at 14, 30 and 90 days post-stroke demonstrated adequate correlation with FAI scores at 180 days post-stroke (r=0.59, 0.66 and 0.63 respectively). Results of this study found the MBI to be an adequate predictor of instrumental ADL performance at six months following stroke onset.

Patel, Coshall, Lawrence, Rudd and Wolfe (2001) examined the ability of the MBI and Frenchay Activities Index (FAI) to predict whether a patient with post-stroke urinary incontinence would regain continence. The study involved 207 patients with stroke with new-onset urinary incontinence in the acute phase of recovery. Univariate analysis and multiple regression analysis were used to determine predictive validity. The MBI and the FAI were administered on approximately day seven post-stroke, to allow for medical stabilization, and at 3 months post-stroke. Patients scoring 15 to 18 (out of 20) on the MBI on day seven were found to be more likely to regain continence than those scoring less than 15 (odds ratio = 21.8, 95% CI = 5.95 to 79.7). At 3 months, patients with incontinence were found to have greater disability as measured by the MBI (p<0.001) and FAI (p=0.002) and greater rates of institutionalization (p<0.001).
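An odds ratio with its 95% confidence interval, as reported above, can be illustrated from a simple two-by-two table. The sketch below uses the log (Woolf) method; this is an assumption made for illustration and not the regression analysis used by the cited authors. The function name `odds_ratio_ci` and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf log method) for a 2x2 table.

    a: higher-score group, outcome present   b: higher-score group, outcome absent
    c: lower-score group, outcome present    d: lower-score group, outcome absent
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts (invented data): 20 of 30 patients in the higher-score
# group and 10 of 30 in the lower-score group have the outcome of interest.
odds_ratio, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {odds_ratio:.1f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

A confidence interval that excludes 1 indicates an association that is unlikely to be due to chance alone. Small cell counts produce wide intervals under this method.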

Sze, Wong, Leung and Woo (2001) investigated the predictors of falls in patients with stroke, using a study sample of 677 patients admitted to an inpatient rehabilitation stroke unit. Initial assessments, including the MBI, were completed on admission (three to seven days following stroke onset). For the purposes of their study, MBI scores were stratified as: ≥15 mild disability, 6-14 moderate to severe disability, and ≤5 very severe disability. Patients with moderate to severe disability (MBI scores 6-14) were found to have an increased risk for falls (odds ratio 2.59, 95% CI=1.24-5.42, P=0.0114). Dysphagia was also found to put patients at an increased risk for falls (odds ratio 1.81, 95% CI=1.03-3.17, P=0.0382).
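The stratification used by Sze et al. can be expressed as a simple lookup. The Python sketch below is illustrative only: the function name `mbi_disability_band` is hypothetical, and it assumes whole-number MBI scores on the 0 to 20 scale.

```python
def mbi_disability_band(score: int) -> str:
    """Map a 0-20 MBI score to the bands reported for Sze et al. (2001)."""
    if not 0 <= score <= 20:
        raise ValueError("MBI score must be between 0 and 20")
    if score >= 15:
        return "mild disability"
    if score >= 6:
        return "moderate to severe disability"
    return "very severe disability"
```

For example, a score of 10 falls in the 6-14 band, the group found to be at increased risk for falls.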

Tilling et al. (2001) examined the ability of the MBI to predict functional recovery following stroke. The MBI was administered to 299 patients with stroke at baseline, 2, 4, 6, and 12 months following stroke; recovery trajectories were then plotted using the MBI scores in an effort to establish a prediction model based on the normal patterns of recovery observed. Performance of the prediction model was validated using an additional group of 710 patients with stroke.
Initial MBI scores, when considered along with individual patient characteristics (such as age, sex and pre-stroke disability), were found to be predictive of future MBI scores up to 1 year following stroke. The predictive validity was found to be even stronger when the patient’s actual observed recovery was taken into consideration and the predictions of future MBI scores were adjusted accordingly. Scoring <1 point below the predicted score on the MBI was found to be predictive of death before the next assessment time point (65% sensitivity, 79% specificity). The results of this study suggest that this model can aid in establishing initial recovery predictions, developing rehabilitation goals and monitoring recovery in patients with stroke.

Chang, Tseng, Weng, Lin, Liou and Tan (2002) examined the predictors of acute care hospital length of stay in 330 patients with first-ever acute stroke. Univariate analysis and multiple regression analysis were used to determine predictive validity. MBI scores at admission (P=0.042), along with National Institutes of Health Stroke Scale (NIHSS) scores at admission (P=0.001), the quadratic term of initial NIHSS score (P=0.001), small-vessel occlusion stroke (P<0.001), gender (male) (P=0.004) and smoking (P=0.043) were found to be the most significant predictors of hospital length of stay. A one-point decrease in score on the MBI (indicating a decline in function) corresponded to an increase in length of stay by approximately one day.

Hsieh et al. (2007) investigated the minimal clinically important difference (MCID) of the modified 10-item BI in a two-part study involving patients with sub-acute to chronic stroke. In the initial part of the study, 43 patients with sub-acute stroke who demonstrated potential for improvement with regard to activities of daily living (ADL) were selected for a 4-week intensive occupational therapy program. The MBI and a 15-point Likert-type scale assessing the patients’ perceived global ratings of their ADL function were administered at baseline and at discharge (with a mean interval between assessment and discharge of 25 days). The estimated MCID was 1.85. The second part of the study involved assessing the repeatability of scores in 56 patients with chronic stroke who were thought to have stable ADL function. The estimated MCID was 1.45.
Results indicate that an improvement in total score of 1.85 points or more (on the 0 to 20 scoring scale) indicates a meaningful change beyond measurement error, and thus a change in score of less than 1.85 points may be subject to measurement error.
Note: The MCID estimated in this study is applicable only for improvement in function, not deterioration.
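As an illustration of how the reported threshold might be applied to a single patient, the Python sketch below compares a gain in total score against the MCID of 1.85 points. The constant and function names are hypothetical, and the check deliberately ignores deterioration, since the estimate applies only to improvement.

```python
MCID_IMPROVEMENT = 1.85  # points on the 0-20 scale, as reported by Hsieh et al. (2007)

def is_meaningful_improvement(baseline: float, follow_up: float) -> bool:
    """Return True when the gain in total score reaches the reported MCID."""
    gain = follow_up - baseline
    # A negative or small gain is not classified; the MCID covers improvement only.
    return gain >= MCID_IMPROVEMENT
```

Under this sketch, a change from 10 to 12 points would count as a meaningful improvement, whereas a change from 10 to 11 would not.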

Construct:

Wilkinson et al. (1997) investigated the construct validity of the MBI as a standard long-term outcome measure of patients with stroke. The Hospital Anxiety and Depression Scale (HADS), London Handicap Scale (LHS), Frenchay Activities Index (FAI), SF36, Nottingham Health Profile (NHP) and the Life Satisfaction Index (LSI) were administered alongside the MBI in a long-term study involving 106 patients with first-ever stroke (patients were followed for a mean interval of 4.9 years). Rank correlation coefficients were excellent between the MBI and SF36 Physical Functioning dimension (r=0.81), NHP Energy (r=0.605) and Physical Mobility (r=0.840) dimensions, LHS (r=0.726) and FAI (r=0.826). Rank correlation coefficients were adequate between the MBI and the SF36 Social Functioning (r=0.481), Role: Physical (r=0.415), Mental Health (r=0.332), Vitality (r=0.500), Bodily Pain (r=0.356) and General Health (r=0.438) dimensions, HADS (r=-0.563), and LSI (r=0.361). Poor correlations were found between the MBI and the SF36 Role: Emotional dimension (r=0.217) and NHP Sleep dimension (r=0.189). The results of this study suggest that the MBI should be administered alongside other measures that assess the psychosocial dimensions of health status, as the MBI fails to sufficiently assess these aspects.

Convergent/Discriminant:
Hsueh, Lee and Hsieh (2001) examined the convergent validity of the MBI, Berg Balance Scale (BBS) and the Fugl-Meyer Motor Assessment (FMA) in 121 patients with stroke, using Pearson product-moment correlation coefficients. The three measures were administered at 14, 30, 90 and 180 days post-stroke. The total MBI score had excellent correlation with the FMA and BBS scores at all four time points (MBI and FMA r=0.8, 0.81, 0.78, 0.8; MBI and BBS r=0.89, 0.94, 0.9, 0.91 respectively).

Known Groups:
No studies have examined the known groups validity of the BI in patients with stroke.

Responsiveness

Wood-Dauphinee, Williams and Shapiro (1990) compared the responsiveness of the BI to the Fugl-Meyer Assessment (FMA) in 167 patients with stroke. Patients were assessed at admission to hospital and at 5 weeks post-stroke. The correlation between mean change in FMA Upper and Lower Extremity Motor subscores and total Barthel Index scores was adequate (r = 0.57), as calculated using Pearson correlation coefficients. The FMA and BI were both found to have small effect sizes (ES = 0.24 and 0.42 respectively) from admission to 5 weeks post-stroke. The results of this study suggest that both measures have poor responsiveness, with the BI being more sensitive to detecting change than the FMA.
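For readers unfamiliar with the effect size (ES) statistic, the Python sketch below shows one common way to compute it for paired scores: the mean change divided by the standard deviation of the baseline scores. That the studies reviewed here used exactly this formula is an assumption, and the function name and sample values are hypothetical.

```python
from statistics import mean, stdev

def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores (one common ES definition)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)

# Hypothetical scores for four patients, for illustration only.
admission = [8, 10, 12, 14]
five_weeks = [10, 11, 14, 17]
es = effect_size(admission, five_weeks)
```

Because the denominator reflects the spread of scores at baseline, the same mean change yields a larger ES in a more homogeneous sample.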

Salbach et al. (2001) examined the responsiveness of the BI, Timed Up and Go (TUG), Berg Balance Scale (BBS), 10 meter walk test (10mWT) and 5 meter walk test (5mWT) in 50 patients with residual gait deficits after a first-time stroke. The BI, BBS, 5mWT and 10mWT demonstrated large effect sizes and the TUG demonstrated a moderate effect size, between 8 days and 38 days post-stroke, as calculated using standardized response means (SRM = 0.99, 1.04, 1.22, 0.92 and 0.73 respectively).

Hsueh, Lin, Jeng, and Hsieh (2002) compared the responsiveness of the BI, 5-item short form BI (BI-5) and motor-FIM in 118 patients with stroke undergoing treatment on an inpatient rehabilitation unit. The BI, BI-5 and motor-FIM all exhibited high responsiveness, as calculated using standardized response mean (SRM) (BI=1.2; BI-5=1.2; motor-FIM=1.3), indicating significant sensitivity for detecting change.
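The standardized response mean (SRM) is calculated by dividing the mean change by the standard deviation of the change scores. A minimal Python sketch follows; the function name and sample values are hypothetical and are not drawn from any of the studies reviewed.

```python
from statistics import mean, stdev

def standardized_response_mean(baseline, follow_up):
    """Mean of the paired change scores divided by their standard deviation."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)

# Hypothetical admission and discharge scores for four patients.
srm = standardized_response_mean([8, 10, 12, 14], [10, 11, 14, 17])
```

Unlike an effect size based on baseline variability, the SRM is driven by how consistent the change is across patients, so uniform improvement produces a high value even when the mean change is modest.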

Wallace, Duncan, and Lai (2002) compared the responsiveness of the BI to that of the motor-FIM for recovery following stroke. Change was measured using the Modified Rankin Scale. The BI and motor-FIM were administered to 372 patients with stroke at one and three months following stroke. The BI and motor-FIM were both found to have small effect sizes (ES = 0.31 and 0.28 respectively), indicating similar responsiveness between the measures.

Van der Putten, Hobart, Freeman and Thompson (1999) compared the responsiveness of the MBI to that of the motor and cognitive components of the FIM and the FIM total score in 201 patients with multiple sclerosis and 82 patients with stroke undergoing inpatient neuro-rehabilitation. The MBI, total-FIM and motor-FIM all demonstrated large effect sizes for patients with stroke (ES = 0.95, 0.82 and 0.91 respectively) and the cognitive-FIM demonstrated an adequate effect size (ES = 0.61). Change in scores for all scales in both disease groups was positive, indicating less disability on discharge than on admission. Effect sizes on the MBI were similar to those of the FIM in both patient groups.

Hsueh, Lee and Hsieh (2001) examined the responsiveness of the MBI in 121 patients with stroke. The MBI was administered at 14, 30, 90 and 180 days post-stroke. Standardized effect size scores were calculated for the intervals between 14-30 days, 30-90 days, 90-180 days and 14-180 days. The MBI demonstrated moderate to large effect sizes for all intervals, except for the 90-180 days post-stroke interval (ES = 0.56, 0.53, 0.11 and 1.27 respectively). The largest effect size was for the 14-180 days post-stroke interval, indicating that the MBI is most sensitive to detecting change in ADL function over longer periods of time.

Dromerick, Edwards, and Diringer (2003) examined responsiveness of the MBI and the FIM in a sample of 95 patients with stroke on admission to and discharge from a stroke rehabilitation service. The Modified Rankin Scale and the International Stroke Trial Measure were used to measure disability. The FIM was found to be more responsive to change from admission to discharge than the MBI, as calculated using the standardized response mean (SRM = 2.18 vs. 1.72). The MBI detected change in 71/95 subjects but demonstrated ceiling effects, with 27% of subjects scoring >95. The results of this study found the FIM to be the most sensitive of the four measures, detecting change in 91/95 patients, including change in 18 patients in whom the MBI detected no change.

Schepers, Ketelaar, Visser-Meily, Dekker and Lindeman (2006) investigated the responsivenessThe ability of an instrument to detect clinically important change over time.
of the MBI, FIM, Frenchay ActivitiesAs defined by the International Classification of Functioning, Disability and Health, activity is the performance of a task or action by an individual. Activity limitations are difficulties in performance of activities. These are also referred to as function.
Index (FAI), and Stroke Adapted Sickness Impact Profile 30 (SA-SIP30). The four measures were administered to 163 patients with stroke at admission to inpatient rehabilitation and at 6 months and 1 year post-stroke. The MBI and the FIM total and motor scores were found to have large effect sizes at 6 months post-stroke (ES = 0.98, 0.84 and 0.89 respectively) and moderate effect sizes at 1 year post-stroke (ES = 0.52, 0.47 and 0.51 respectively). The FIM cognitive score was found to have a moderate effect size at both 6 months and 1 year post-stroke (ES = 0.47 at both time points). The SA-SIP30 and FAI demonstrated moderate effect sizes at 1 year post-stroke (ES = 0.63 and 0.59 respectively). Results of this study indicate that the MBI and FIM (total and motor) are most apt to detect change in the subacute phase.
Note: The effect sizes for the SA-SIP30 and FAI were not calculated at 6 months post-stroke due to insufficient data. The FAI was only administered to patients who resided at home during the time of testing, as the measure pertains to function relating to daily housekeeping and activities typically performed outside of the rehabilitation or hospital environment.
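
To make the effect size (ES) values above more concrete, the following is a minimal sketch of one common way to compute a standardized effect size: the mean change from baseline divided by the standard deviation of the baseline scores. This formula is an assumption for illustration; the summary above does not state which index the study used. The function name and the scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def standardized_effect_size(baseline, follow_up):
    """Mean change from baseline divided by the SD of baseline scores.

    Assumed formula for illustration only; the reviewed study may have
    used a different responsiveness index.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    # Change score for each patient (follow-up minus baseline).
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical scores for three patients (not data from the study).
admission = [40, 50, 60]
six_months = [50, 60, 70]
print(standardized_effect_size(admission, six_months))
```

A larger value indicates that the measure registered more change relative to the spread of scores at baseline, which is why such indices are used to compare the responsiveness of different measures.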

References

Bohannon, R. & Landes, M. (2004). Reliability, validity, and responsiveness of a 3-Item Barthel for characterizing the physical function of patients hospitalized for acute stroke. Journal of Neurologic Physical Therapy, 28(3), 110-113.
Brazil, L., Thomas, R., Laing, R., Hines, F., Guerrero, D., Ashley, S., & Brada, M. (1997). Verbally administered Barthel Index as functional assessment in brain tumour patients. Journal of Neuro-Oncology, 34(2), 187-192.
Chang, K., Tseng, M., Weng, H., Lin, Y., Liou, C. & Tan, T. (2002). Prediction of length of stay of first-ever ischemic stroke. Stroke, 33(1), 2670-2674.
Cincura, C., Pontes-Neto, O., Neville, I., Mendes, H., Menezes, D., Mariano, D. et al. (2009). Validation of the National Institutes of Health Stroke Scale, Modified Rankin Scale and Barthel Index in Brazil: The role of cultural adaptation and structured interviewing. Cerebrovascular Diseases, 27, 119-122.
Collin, C., Wade, D., Davies, S., & Horne, V. (1988). The Barthel ADL Index: a reliability study. International Disability Studies, 10, 61-63.
Condouret, J., Pujol, M., Roques, C. F., Roudil, J., Soulages, X., & Bourg, V. (1988). Valeur et limites de l’indice de Barthel a propos de 115 malades hemiplegiques. In: J. Pelissier (dir.), Hemiplegie vasculaire de l’adulte et medecine de reeducation. Paris: Masson, p. 45-51.
Dromerick, A., Edwards, D. & Diringer, M. (2003). Sensitivity to changes in disability after stroke: A comparison of four scales useful in clinical trials. Journal of Rehabilitation Research and Development, 40, 1-8.
Duffy, L., Gajree, S., Langhorne, P., Stott, D. & Quinn, T. (2013). Reliability (inter-rater agreement) of the Barthel Index for assessment of stroke survivors. Stroke, 44, 462-468.
Duncan, P., Lai, S., Tyler, D., Perera, S., Reker, D. & Studenski, S. (2002). Evaluation of proxy responses to the Stroke Impact Scale. Stroke, 33, 2593.
Duncan, P., Samsa, G., Weinberger, M., Goldstein, L., Bonito, A., Witter, D., Enarson, C., & Matchar, D. (1997). Health status of individuals with mild stroke. Stroke, 28(4), 740-745.
Ellul, J., Watkins, C., & Barer, D. (1998). Estimating total Barthel scores from just three items: The European Stroke Database ‘minimum dataset’ for assessing functional status at discharge from hospital. Age and Ageing, 27(2), 115-122.
Finch, E., Brooks, D., Stratford, P. & Mayo, N. (2002). Physical Rehabilitation Outcome Measures: A Guide to Enhanced Clinical Decision-Making (2nd ed.). Toronto: Canadian Physiotherapy Association.
Fortinsky, R., Granger, C. & Seltzer, G. (1981). The use of functional assessment in understanding home care needs. Medical Care, 19, 489-497.
Granger, C., Greer, D., Liset, E., Coulombe, J., & O’Brien, E. (1975). Measurement of outcomes of care for stroke patients. Stroke, 6, 34-41.
Granger, C., Sherwood, C. & Greer, D. (1977). Functional status measures in a comprehensive stroke care program. Archives of Physical Medicine and Rehabilitation, 58, 555-561.
Granger, C. (1985). Outcome of comprehensive medical rehabilitation: an analysis based upon the impairment, disability, and handicap model. International Journal of Rehabilitation Medicine, 7, 45-50.
Granger, C., Hamilton, B., Gresham, G., & Kramer, A. (1989). The Stroke Rehabilitation Outcome Study: Part II. Relative merits of the total Barthel Index Score and a four-item subscore in predicting patient outcomes. Archives of Physical Medicine and Rehabilitation, 70, 100-103.
Green, J., Forster, N. & Young, J. (2001). A test-retest reliability study of the Barthel Index, the Rivermead Mobility Index, and the Nottingham extended Activities of Daily Living Scale and the Frenchay Activities Index in stroke patients. Disability and Rehabilitation, 23(15), 670-676.
Gresham, G., Phillips, T. & Labi, M. (1980). ADL status in stroke: relative merits of three standard indexes. Archives of Physical Medicine and Rehabilitation, 61, 355-358.
Hachisuka, K., Ogata, H., Ohkuma, H., Tanaka, S., & Dozono, K. (1997). Test-retest and inter-method reliability of the self-rating Barthel Index. Clinical Rehabilitation, 11(1), 28-35.
Hachisuka, K., Okazaki, T., & Ogata, H. Self-rating Barthel index compatible with the original Barthel index and the Functional Independence Measure motor score. Journal of University of Occupational and Environmental Health, 19(2), 107-121.
Harwood, R. & Ebrahim, S. (2000). Measuring the outcomes of day hospital attendance: a comparison of the Barthel Index and London Handicap Scale. Clinical Rehabilitation, 14, 527-531.
Heuschmann, P., Kolominsky-Rabas, P., Nolte, C., Hunermund, G., Ruf, H., Laumeier, I., Meyrer, R., Alberti, T., Rahmann, A., Hurth, T., & Berger, K. (2005). [The reliability of the german version of the barthel-index and the development of a postal and telephone version for the application on stroke patients]. Fortschritte der Neurologie-Psychiatrie und ihrer Grenzgebiete, 73(2), 74-82.
Hobart, J. & Thompson, A. (2001). The five item Barthel index. Journal of Neurology, Neurosurgery and Psychiatry, 71, 225-230.
Hocking, C., Williams, M., Broad, J., & Baskett, J. (1999). Sensitivity of Shah, Vanclay and Cooper’s Modified Barthel Index. Clinical Rehabilitation, 13, 141-147.
Hsieh, Y., Wang, C., Wu, S., Chen, P., Sheu, C. & Hsieh, C. (2007). Establishing the minimally clinically important difference of the Barthel Index in stroke patients. Neurorehabilitation and Neural Repair, 21, 233-238.
Hsueh, I., Lee, M., & Hsieh, C. (2001). Psychometric characteristics of the Barthel Activities of Daily Living index in stroke patients. Journal of the Formosan Medical Association, 100(8), 526-532.
Hsueh, I., Lin, J., Jeng, J. & Hsieh, C. (2002). Comparison of the psychometric characteristics of the functional independence measure, 5 item Barthel index, and 10 item Barthel index in patients with stroke. Journal of Neurology, Neurosurgery and Psychiatry, 73, 188-190.
Jansa, J., Pogacnik, T., & Gompertz, P. (2004). An evaluation of the Extended Barthel Index with acute ischemic stroke patients. Neurorehabilitation and Neural Repair, 18(1), 37-41.
Katz, P. (2003). Measures of Adult General Functional Status (The Barthel Index, Katz Index of Activities of Daily Living, Health Assessment Questionnaire (HAQ), MACTAR Patient Preference Disability Questionnaire, and Modified Health Assessment Questionnaire (MHAQ)). Arthritis & Rheumatism (Arthritis Care & Research), 49(5S), S15-S27.
Kelly-Hayes, M., Robertson, J., Broderick, J., Duncan, P., Hershey, L., Roth, E., Thies, W. & Trombly, C. (1998). The American Heart Association Stroke Outcome Classification. Stroke, 29, 1274-1280.
Korner-Bitensky, N. & Wood-Dauphinee, S. (1995). Barthel Index information elicited over the telephone: is it reliable? American Journal of Physical Medicine and Rehabilitation, 74, 9-18.
Kucukdeveci, A., Yavuzer, G., Tennant, A., Suldur, N., Sonel, B., & Arasil, T. (2000). Adaptation of the modified Barthel Index for use in physical medicine and rehabilitation in Turkey. Scandinavian Journal of Rehabilitation Medicine, 32(2), 87-92.
Leung, S., Chan, C. & Shah, S. (2007). Development of a Chinese version of the Modified Barthel Index: validity and reliability. Clinical Rehabilitation, 21, 912-922.
Loewen, S. & Anderson, B. (1990). Predictors of stroke outcome using objective measurement scales. Stroke, 21, 78-81.
Mahoney, F. & Barthel, D. (1965). Functional evaluation: The Barthel Index. Maryland State Medical Journal, 14, 61-5.
McDowell, I. & Newell, C. (1996). Measuring health: a guide to rating scales and questionnaires (pp. 63-67). (2nd Ed.), New York: Oxford University Press.
McGinnis, G., Seward, M., DeJong, G., & Osberg, J. (1986). Program evaluation of physical medicine and rehabilitation departments using self-report Barthel. Archives of Physical Medicine and Rehabilitation, 14, 61-65.
Oveisgharan, S. (2006, May). Barthel Index in a Middle East Country: Translation, Validity and Reliability. Poster presented at the European Stroke Conference, Brussels, Belgium.
Oveisgharan, S., Shirani, S., Ghorbani, A., Soltanzade, A., Abdolmehdi, B., Hosseini, S. et al. (2006). Barthel Index in a middle-east country: Translation, validity and reliability. Cerebrovascular Diseases, 22, 350-354.
Patel, M., Coshall, C., Lawrence, E., Rudd, A., & Wolfe, C. (2001). Recovery from poststroke urinary incontinence: Associated factors and impact on outcome. Journal of the American Geriatrics Society, 49(9), 1229-1233.
Pietra, G., Savio, K., Oddone, E., Reggiani, M., Monaco, F. & Leone, M. (2011). Validity and reliability of the Barthel Index administered by telephone. Stroke, 42, 2077-2079.
Post, M., van Asbeck, F., van Dijk, A., & Schrijvers, A. (1995). [Dutch interview version of the Barthel Index evaluated in patients with spinal cord injuries]. Nederlands Tijdschrift voor Geneeskunde, 139(27), 1376-1380.
Prosiegel, M., Bottger, S., & Schenk, T. (1996). Der Erweiterte Barthel Index (EBI) - eine neue Skala zur Erfassung von Fahigkeitsstorungen bei neurologischen Patienten. Journal of Neurologic Rehabilitation, 1, 7-13.
Rockwood, K., Stolee, P., & Fox, R. (1993). Use of goal attainment scaling in measuring clinically important change in the frail elderly. Journal of Clinical Epidemiology, 46, 1113-1118.
Roy, C., Togneri, J., Hay, E., & Pentland, B. (1988). An inter-rater reliability study of the Barthel index. International Journal of Rehabilitation Research, 11, 67-70.
Sainsbury, A., Seebass, G., Bansal, A., & Young, J. B. (2005). Reliability of the Barthel Index when used with older people. Age and Ageing, 34, 228-232.
Salbach, N., Mayo, N., Higgins, J., Ahmed, S., Finch, L., & Richards, C. (2001). Responsiveness and predictability of gait speed and other disability measures in acute stroke. Archives of Physical Medicine and Rehabilitation, 82(9), 1204-1212.
Schepers, V., Ketelaar, M., Visser-Meily, J., Dekker, J. & Lindeman, E. (2006). Responsiveness of functional health status measures frequently used in stroke research. Disability and Rehabilitation, 28, 1035-1040.
Schonle, P. (1995). [The early rehabilitation Barthel Index-an early rehabilitation-oriented extension of the Barthel Index]. Rehabilitation (Stuttg), 34(2), 69-73.
Shah, S., Vanclay, F., & Cooper, B. (1989). Improving the sensitivity of the Barthel Index for stroke rehabilitation. Journal of Clinical Epidemiology, 42, 703-709.
Shinar, D., Gross, C., Bronstein, K., Licara-Gehr, E., Eden, D., Cabrera, A., et al. (1987). Reliability of the Activities of Daily Living Scale and its use in telephone interview. Archives of Physical Medicine and Rehabilitation, 68, 723-728.
Sinoff, G. & Ore, L. (1997). The Barthel Activities of Daily Living Index: self-reporting versus actual performance in the old-old (> 75 years). Journal of American Geriatric Society, 45, 832-836.
Skilbeck, C., Wade, D., Hewer, R., & Wood, V. (1983). Recovery after stroke. Journal of Neurology, Neurosurgery and Psychiatry, 46, 5-8.
Spector, W. (1996). Functional disability scales. In: B. Spilker (Ed.), Quality of Life and Pharmacoeconomics in Clinical Trials (2nd edition, pp. 133-43). Philadelphia: Lippincott-Raven.
Stone, S., Ali, B., Auberleek, I., Thompsell, A. & Young, A. (1994). The Barthel index in clinical practice: use on a rehabilitation ward for elderly people. Journal of the Royal College of Physicians of London, 28(5), 419-423.
Sze, K., Wong, E., Leung, H. & Woo, J. (2001). Falls among Chinese stroke patients during rehabilitation. Archives of Physical Medicine and Rehabilitation, 82(9), 1219-1225.
Tilling, K., Sterne, J. A., Rudd, A., Glass, T., Wityk, R. & Wolfe, C. (2001). A new method for predicting recovery after stroke. Stroke, 32, 2867.
Valach, L., Signer, S., Hartmeier, A., Hofer, K., & Steck, G. C. (2003). Chedoke-McMaster stroke assessment and modified Barthel Index self-assessment in patients with vascular brain damage. International Journal of Rehabilitation Research, 26(2), 93-99.
Van der Putten, J., Hobart, J., Freeman, J. & Thompson, A. (1999). Measuring change in disability after inpatient rehabilitation: comparison of the responsiveness of the Barthel Index and the Functional Independence Measure. Journal of Neurology, Neurosurgery and Psychiatry, 66, 480-484.
Wade, D. & Hewer, R. (1987). Functional abilities after stroke: measurement, natural history, and prognosis. Journal of Neurology, Neurosurgery and Psychiatry, 50, 177-182.
Wallace, D., Duncan, P. & Lai, S. (2002). Comparison of the responsiveness of the Barthel Index and the motor component of the Functional Independence Measure in stroke: the impact of using different methods for measuring responsiveness. Journal of Clinical Epidemiology, 55, 922-928.
Wilkinson, P., Wolfe, C., Warburton, F., Rudd, A., Howard, R., Ross-Russell, R., et al. (1997). Longer term quality of life and outcome in stroke patients: is the Barthel index alone an adequate measure of outcome? Quality in Health Care, 6, 125-130.
Wirotius, J. & Foucher-Barres, F. (1991). L’index de Barthel. Journal de Readaptation Medicale, 11(3), 183-187.
Wolfe, C., Taub, N., Woodrow, E. & Burney, P. (1991). Assessment of scales of disability and handicap for stroke patients. Stroke, 22, 1242-1244.
Wood-Dauphinee, S., Williams, J. & Shapiro, S. (1990). Examining outcome measures in a clinical study of stroke. Stroke, 21, 731-739.
Wylie, C. (1967). Gauging the response of stroke patients to rehabilitation. Journal of American Geriatrics Society, 5, 797-805.
Wyller, T., Sveen, U., & Bautz-Holter, E. (1995). The Barthel ADL Index One Year after Stroke: Comparison between Relatives’ and Occupational Therapist’s Scores. Age and Ageing, 24, 398-401.

See the measure

For a copy of the original BI click here.

Copyright Information:
The Maryland State Medical Society (https://www.medchi.org/) holds the copyright for the Barthel Index. It may be used freely for non-commercial purposes with the following citation: Mahoney FI, Barthel D. “Functional evaluation: the Barthel Index.” Maryland State Med Journal 1965;14:56-61. Used with permission. Permission is required to modify the Barthel Index or to use it for commercial purposes.

DOC Screen

Evidence Reviewed as of before: 30-04-2019

Author(s)*: Alexandra Matteau

Editor(s): Annabel McDermott

Content consistency: Gabriel Plumier

Purpose

The DOC screen is a screening tool that can be used to identify individuals at high risk of depression, obstructive sleep apnea and cognitive impairment following a stroke.

In-Depth Review

Purpose of the measure

The DOC screen is a screening tool that identifies individuals at high risk of depression, obstructive sleep apnea and cognitive impairment following a stroke.

Available versions

The DOC screen was developed by Swartz et al. and was first published in 2013. The tool was developed by combining and modifying three existing validated brief screens: the 2-item Patient Health Questionnaire (PHQ-2), the STOP questionnaire and a 10-point version of the Montreal Cognitive Assessment (MoCA).

Features of the measure

Items:

The DOC screen comprises three screening tests:

DOC – Mood (PHQ-2)

This test comprises two items with the purpose of screening for depression. The test evaluates the degree to which an individual has experienced depressed mood and anhedonia over the past two weeks.

DOC – Apnea (STOP Questionnaire)

This test comprises four items with the purpose of screening for obstructive sleep apnea: snoring, tiredness during daytime, breathing interruption during sleep, and hypertension.

DOC – Cog (10-point version of the MoCA)

This test comprises three tasks with the purpose of screening for cognitive impairment: clock drawing, abstraction, and 5-word recall (memory).

Scoring:

Each subscale has different scoring and is interpreted independently.

DOC – Mood (total score 0-6)

The two items are scored from 0-3 whereby the respondent is asked to rate how often each symptom occurred over the last 2 weeks:

0 = not at all
1 = several days
2 = more than half of the days
3 = nearly every day.

DOC – Apnea (total score 0-4)

The four items are scored on a dichotomous scale (0 = no, 1 = yes) according to whether or not the respondent experiences each symptom.

DOC – Cog (total score 0-10)

Clock drawing task (0-3 points): 1 point each is given for (i) contour, (ii) numbers and (iii) the hands of the clock.
Abstraction task (0-2 points): 1 point is given for each item pair correctly answered.
Delayed recall task (0-5 points): 1 point is given for each word recalled without any cues.

The score for each task is summed to calculate the subscale score.

The three subscale scores are then summed to obtain a total score ranging between 0 and 20.
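
The scoring arithmetic described above can be sketched in a few lines of code. This is an illustrative sketch only, not an official scoring tool: the function name, argument names and example responses are hypothetical, and the sketch deliberately applies no cut-off scores because none are given here (interpretation is available from the DOC screen website).

```python
def score_doc_screen(mood_items, apnea_items, clock, abstraction, recall):
    """Return the three DOC screen subscale scores and the 0-20 total.

    Hypothetical helper: argument names are assumptions for illustration.
    """
    # DOC-Mood: two items, each rated 0 (not at all) to 3 (nearly every day).
    if len(mood_items) != 2 or any(not 0 <= m <= 3 for m in mood_items):
        raise ValueError("DOC-Mood needs two items scored 0-3")
    # DOC-Apnea: four items, each 0 (no) or 1 (yes).
    if len(apnea_items) != 4 or any(a not in (0, 1) for a in apnea_items):
        raise ValueError("DOC-Apnea needs four items scored 0 or 1")
    # DOC-Cog: clock drawing 0-3, abstraction 0-2, delayed recall 0-5.
    for name, value, maximum in (
        ("clock", clock, 3),
        ("abstraction", abstraction, 2),
        ("recall", recall, 5),
    ):
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}")

    mood = sum(mood_items)              # 0-6
    apnea = sum(apnea_items)            # 0-4
    cog = clock + abstraction + recall  # 0-10
    return {"mood": mood, "apnea": apnea, "cog": cog,
            "total": mood + apnea + cog}


# Hypothetical responses for one respondent.
scores = score_doc_screen(
    mood_items=(1, 2),
    apnea_items=(1, 0, 1, 0),
    clock=3,
    abstraction=1,
    recall=4,
)
print(scores)
```

Returning the subscale scores alongside the total reflects the point that each subscale is interpreted independently.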

A raw score interpretation and a regression interpretation can be obtained at http://www.docscreen.ca/.

Time:

The DOC screen takes approximately 5 minutes to complete.

Subscales:

The DOC screen is comprised of three subscales: DOC Mood, DOC Apnea and DOC Cog.

Equipment:

A pencil and the test form are needed to complete the DOC screen.

Training:

No training requirements have been reported. The DOC screen can be administered by any individual who is able to correctly follow the instructions, but must be interpreted by a qualified health professional.

Alternative forms of the DOC Screen:

An alternative version is available and uses different words for the memory and abstraction tasks. This version must be used if the patient has previously been exposed to the MoCA or DOC screen to minimize any learning effects associated with repeated administration.

The E-DOC screen is an electronic version of the tool, which is available through the DOC screen website. The E-DOC screen has not been validated.

Client suitability

Can be used with:

Patients with stroke.
The DOC screen may also be suitable for use among patients with other neurological and vascular disorders such as multiple sclerosis, Alzheimer’s disease, mild cognitive impairment, Parkinson’s disease and traumatic brain injury. However, no studies have been conducted with these populations.

Should not be used with:

While no contraindications have been reported, some considerations must be made when completing the test:

A translator, family member or caregiver can provide translation for patients who do not speak English fluently;
Provide visual aid (e.g. glasses) for patients with visual loss;
Speak loudly and clearly for patients with reduced hearing;
Motor tasks such as the clock drawing activity may be difficult for patients with motor impairments – use sound clinical judgement for this task;
Use alternative communication strategies for patients with aphasia.

In what languages is the measure available?

English

Summary

What does the tool measure?	Depression, obstructive sleep apnea and cognitive impairment following stroke.
What types of clients can the tool be used for?	Patients with stroke.
Is this a screening or assessment tool?	Screening.
Time to administer	Five minutes.
Versions	DOC screen; E-DOC screen; a second version of the DOC screen is available to minimize learning effects associated with repeated administration.
Languages	The DOC screen is only available in English.
Measurement Properties
Reliability	Internal consistency: No studies have examined internal consistency of the DOC screen.
Test-retest: No studies have examined test-retest reliability of the DOC screen. Intra-rater: No studies have examined intra-rater reliability of the DOC screen. Inter-rater: No studies have examined inter-rater reliability of the DOC screen.
Validity	Criterion: Concurrent: No studies have examined concurrent validity of the DOC screen. Predictive: No studies have examined predictive validity of the DOC screen. Construct: Convergent/Discriminant: No studies have examined convergent validity of the DOC screen. Known groups: No studies have examined known groups validity. However, one study examined the sensitivity and specificity and reported that the DOC screen is a valid measure that can reliably identify patients at high risk of depression, obstructive sleep apnea and cognitive impairment.
Floor/Ceiling Effects	No studies have examined the floor or ceiling effects of the DOC screen.
Does the tool detect change in patients?	Not reported.
Acceptability	The DOC screen is a standardized screening tool suitable for use with stroke patients.
Feasibility	The measure is brief, easy to score and requires no formal training. A study on 1503 patients showed that 89% of participants completed the screen in 5 minutes or less.
How to obtain the tool?	The DOC screen is free to use for clinical and educational purposes. The administration manual and forms are available online from the following website: http://www.docscreen.ca/

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the DOC screen in individuals with stroke. We identified only one study, which was published in part by the developers of the measure. More studies are required before definitive conclusions can be drawn regarding the reliability and validity of the DOC screen.

Floor/Ceiling Effects

No studies have examined the floor or ceiling effects of the DOC screen.

Reliability

Internal consistency:
No studies have examined the internal consistency of the DOC screen.

Test-retest:
No studies have examined the test-retest reliability of the DOC screen.

Inter-rater:
No studies have examined the inter-rater reliability of the DOC screen.

Intra-rater:
No studies have examined the intra-rater reliability of the DOC screen.

Validity

Criterion:

Concurrent:
No studies have examined the concurrent validity of the DOC screen.

Predictive:
No studies have examined the predictive validity of the DOC screen.

Construct:

Convergent/Discriminant:
No studies have examined the convergent validity of the DOC screen.

Known groups:
No studies have examined the known groups validity of the DOC screen.

Responsiveness

No studies have examined the responsiveness of the DOC screen.

Sensitivity and Specificity:

Swartz et al. (2017) examined the sensitivity and specificity of the DOC screen for detecting depression, obstructive sleep apnea and cognitive impairment using receiver operating characteristic (ROC) and area under the curve (AUC) analyses and the two-cut point approach. DOC-Mood was compared with the Structured Clinical Interview for DSM Disorders (SCID-D), and excellent sensitivity (92%) and specificity (99%) were identified for detecting depression (AUC=0.898). DOC-Apnea was compared with results on polysomnography (PSG), and excellent sensitivity (95%) and specificity (96%) were identified for detecting obstructive sleep apnea (AUC=0.660). DOC-Cog was compared to a 30-minute neuropsychological test protocol proposed by Hachinski et al. (2006), and excellent sensitivity (100%) and specificity (95%) were identified for detecting cognitive impairment (AUC=0.776).
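
For readers less familiar with these statistics, the following is a minimal sketch of how sensitivity and specificity are derived from a 2x2 table comparing a screening result with a reference standard. The function name and the counts are hypothetical and chosen only for illustration; they are not data from Swartz et al. (2017).

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 table counts.

    tp: screen positive, condition present (true positives)
    fn: screen negative, condition present (false negatives)
    tn: screen negative, condition absent (true negatives)
    fp: screen positive, condition absent (false positives)
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the condition")
    sensitivity = tp / (tp + fn)  # proportion of true cases the screen detects
    specificity = tn / (tn + fp)  # proportion of non-cases the screen clears
    return sensitivity, specificity


# Hypothetical counts (illustrative only, not from the cited study)
sens, spec = sensitivity_specificity(tp=46, fn=4, tn=95, fp=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity depends only on people who have the condition and specificity only on people who do not, which is why both are reported alongside the AUC.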

References

Hachinski, V., Iadecola, C., Petersen, R. C., Breteler, M. M., Nyenhuis, D. L., Black, S. E., … & Vinters, H. V. (2006). National Institute of Neurological Disorders and Stroke–Canadian stroke network vascular cognitive impairment harmonization standards. Stroke, 37 (9), 2220-2241.
Swartz, R. H., Cayley, M. L., Lanctôt, K. L., Murray, B. J., Cohen, A., Thorpe, K. E., … & Herrmann, N. (2017). The “DOC” screen: Feasible and valid screening for depression, Obstructive Sleep Apnea (OSA) and cognitive impairment in stroke prevention clinics. PloS one, 12 (4), e0174451.

See the measure

How to obtain the DOC Screen?

The form and manual of administration are available online from the following website: http://www.docscreen.ca/

The DOC screen is free to use for clinical and educational purposes and therefore no permissions are required.

Frenchay Activities Index (FAI)

Evidence Reviewed as of before: 19-08-2008

Author(s)*: Lisa Zeltzer, MSc OT

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Content consistency: Gabriel Plumier

Purpose

The Frenchay Activities Index (FAI) is a measure of instrumental activities of daily living (IADL) for use with patients recovering from stroke. The FAI assesses a broad range of activities associated with everyday life. The benefit of the FAI is that while activities of daily living scales tend to focus on issues related to self-care and mobility (Holbrook & Skilbeck, 1983), the FAI provides a broader measurement of actual activities patients have undertaken in the recent past (Wade, Legh-Smith, & Langton, 1985).

In-Depth Review

Purpose of the measure

The Frenchay Activities Index (FAI) is a measure of instrumental activities of daily living (IADL) for use with patients recovering from stroke. The FAI assesses a broad range of activities associated with everyday life. The benefit of the FAI is that while activities of daily living scales tend to focus on issues related to self-care and mobility (Holbrook & Skilbeck, 1983), the FAI provides a broader measurement of actual activities patients have undertaken in the recent past (Wade, Legh-Smith, & Langton, 1985).

Available versions

The FAI was published by Margaret Holbrook and Clive E. Skilbeck in 1983.

Features of the measure

Items:

The FAI contains 15 items or activities that can be separated into 3 subscales: Domestic chores, Leisure/work and Outdoor activities.

The items of the FAI are as follows:

1. Preparing main meals
Must play a substantial part in organization, preparation and cooking.
2. Washing up
Must do all or share equally, e.g. washing or wiping and putting away.
3. Washing clothes
Organization of washing and drying clothes. Sharing task equally, e.g. loading, unloading, hanging, folding.
4. Light housework
Dusting, ironing, tidying small objects. Anything heavier is included in item 5.
5. Heavy housework
Changing beds, cleaning floors, windows, vacuuming, moving chairs, etc.
6. Local shopping
Substantial role in organizing and buying groceries. Can include collection of pension or going to the Post Office.
7. Social outings
Going out to clubs, cinema, theatre, drinking, dinner with friends, etc. May be transported there, provided patient takes an active part once arrived. Includes social activities at home, initiated by the patient.
8. Walking outdoors over 15 minutes
Sustained walking for at least 15 minutes (allowed short stops for breath).
9. Pursuing active interest in hobby
Must require ‘active’ participation, e.g. caring for houseplants, knitting, reading specialist magazines or window-shopping.
10. Driving a car
Must drive a car, or get to a bus/coach and travel on it independently.
11. Outings/car rides
Train, bus, or car rides to some place for pleasure, not for a routine social outing. Must involve patient organization and decision-making. Holidays within the last 6 months are divided into days/month (e.g. a 7-day holiday = 1 or 2 days/month).
12. Gardening
Light = occasional weeding or sweeping; Moderate = regular weeding, raking, pruning; Heavy = all necessary work including heavy digging.
13. Household and/or car maintenance
Light = repairing small items, replacing lightbulb or plug; Moderate = spring cleaning, hanging a picture, routine car maintenance; Heavy = painting/decorating, most necessary household/car maintenance.
14. Reading books
Full-length books, not magazines or newspapers. Can be talking books.
15. Gainful work
Paid work, not voluntary work. The time worked should be averaged out over six months (e.g., 1 month working for 18 hours/week over the 6-month period would be scored as ‘up to 10 hours/week’).
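
The averaging described for holidays and gainful work can be illustrated with a short worked sketch. The helper names are hypothetical, and the calculation assumes months of equal length; the official scoring form should be consulted for the actual response categories.

```python
def average_days_per_month(total_days, period_months=6):
    """Spread a number of days (e.g. a holiday) evenly over the recall period."""
    return total_days / period_months


def average_weekly_hours(hours_per_week, months_worked, period_months=6):
    """Average weekly paid hours over the recall period.

    Assumes months of equal length, so the share of the period worked
    is simply months_worked / period_months.
    """
    return hours_per_week * months_worked / period_months


# A 7-day holiday in the last 6 months: a little over 1 day/month
print(round(average_days_per_month(7), 2))

# 1 month at 18 hours/week over a 6-month period: 3 hours/week on average,
# which falls within 'up to 10 hours/week'
print(average_weekly_hours(18, 1))
```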

Time:

The FAI takes approximately 5 minutes to complete when administered in an interview format (with or without the patient’s family) (Segal & Schall, 1994).

Scoring:

The frequency with which each item or activity is undertaken over the past 3 or 6 months (depending on the nature of the activity) is assigned a score of 1 – 4 where a score of 1 = lowest level of activity. The scale provides a summed score from 15 – 60.

A modified 0-3 scoring system introduced by Wade et al. (1985) yields a score of 0 – 3 for each item, and a summed score from 0 – 45.
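
As a minimal sketch of the two scoring systems described above, the summed score can be computed as follows. The function and variable names are hypothetical; the item ranges and the number of items are taken from the text, and no conversion between the two systems is assumed.

```python
N_ITEMS = 15

# Allowed per-item score range for each scoring system described in the text
SCHEMES = {
    "original": (1, 4),  # summed score 15 to 60
    "modified": (0, 3),  # Wade et al. (1985), summed score 0 to 45
}


def fai_total(item_scores, scheme="original"):
    """Validate 15 item scores against the chosen scheme and return their sum."""
    low, high = SCHEMES[scheme]
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    for position, score in enumerate(item_scores, start=1):
        if not isinstance(score, int) or not low <= score <= high:
            raise ValueError(f"item {position}: score {score!r} outside {low}-{high}")
    return sum(item_scores)


print(fai_total([1] * 15))              # lowest possible score, original system
print(fai_total([3] * 15, "modified"))  # highest possible score, modified system
```

Keeping the scheme explicit avoids mixing totals from the two systems, which are not on the same scale.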

Note: In patients with stroke, the FAI should be used to assess pre-morbid IADL at 3 and 6 months before stroke, and subsequently to record changes in IADL following stroke at specific intervals (Holbrook & Skilbeck, 1983). Studies typically examine change in post-stroke IADL by examining patients at 1 year after stroke, and looking retrospectively at the past 3 and 6 months.

Subscales:

There are 3 subscales to the FAI:

Domestic (items 1-5)
Leisure/work (items 7, 9, 11, 13, 15)
Outdoors (items 6, 8, 10, 12, 14)
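
The subscale groupings listed above can be sketched as a simple lookup. The item numbers are taken directly from the list above and refer to the order in which the 15 items are presented; the function name and the example scores are hypothetical.

```python
# Item numbers per subscale, as listed in the text (1-based)
SUBSCALES = {
    "Domestic": [1, 2, 3, 4, 5],
    "Leisure/work": [7, 9, 11, 13, 15],
    "Outdoors": [6, 8, 10, 12, 14],
}


def fai_subscale_scores(item_scores):
    """Return a dict of subscale sums from a list of 15 item scores."""
    if len(item_scores) != 15:
        raise ValueError("expected 15 item scores")
    return {
        name: sum(item_scores[i - 1] for i in items)
        for name, items in SUBSCALES.items()
    }


# Hypothetical responses using the 0-3 scoring system (items 1 to 15)
example = [3, 3, 2, 2, 1, 2, 1, 3, 0, 0, 1, 2, 1, 3, 0]
scores = fai_subscale_scores(example)
print(scores)
```

Because each item belongs to exactly one subscale, the three subscale scores add up to the summed FAI score.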

Equipment:

Only the questionnaire and a pencil are needed to complete the FAI.

Training:

No training is required to complete the FAI. The FAI is most often interview-administered.

The FAI can be used as a mailed questionnaire. Carter, Mant, Mant, Wade, and Winner (1997) reported an excellent correlation between mailed questionnaire FAI scores and face-to-face interview scores (r = 0.94).

The FAI can also be used with a proxy respondent. Proxy agreement was excellent for the FAI (intraclass correlation coefficient (ICC) = 0.85) (Segal & Schall, 1994). Holbrook and Skilbeck (1983) found that information obtained from relatives was interchangeable with information acquired from the patient. Segal and Schall (1994) reported proxy agreement for the three subscales as ranging from adequate (ICC = 0.59 for Leisure/work) to excellent (ICC = 0.77 for Domestic and Outdoors).

Alternative Forms of the FAI

FAI-18 (Miller, Deathe, & Harris, 2004).
Three items (sport/recreation and visiting in the last 3 months, and banking in the last 6 months) were added to the FAI and the reliability was examined in patients with lower limb amputation. The total score of the FAI-18 ranges from 0 to 54. Support for the concurrent validity (r = -0.46), the Prosthetic Evaluation Questionnaire-Mobility Scale (r = 0.40) and the Activities-specific Balance Confidence Scale (r = 0.52). The FAI-18 was not found to offer any advantage over the original FAI and therefore use of the original FAI is recommended to ensure results are comparable between populations and studies. Further, the FAI-18 has not been examined in patients with stroke.
Modified FAI (Tooth, McKenna, Smith, & O’Rourke, 2003).
A 13-item modified version has been developed based on the recommendations by Schuling, de Haan, Limburg, and Groenier (1993) to omit the items ‘reading books’ and ‘gainful work’. At 6 months post-stroke, the internal consistency of the 13 FAI items was excellent when scored by patients (alpha = 0.85) and when scored by proxies (alpha = 0.83). However, the internal consistency of each subscale examined separately varied widely.

Client suitability

Can be used with

Patients with stroke.
Can also be used with patients with cognitive impairment, using a proxy respondent. The focus of the FAI is on frequency of activity rather than quality of activity. This may reduce elements of subjectivity, which typically undermine the reliability of proxy assessment (Segal & Schall, 1994).

Should not be used with

When examining FAI scores, male and female scores should be considered separately as there is evidence of a gender bias in FAI scores (Holbrook & Skilbeck, 1983). Sveen, Bautz-Holter, Sodring, Wyller, and Laake (1999) reported that men had significantly higher scores in the Outdoor activities subscale, and there was a trend towards women having higher scores in the Domestic activity subscale.
Due to individual variability, the FAI should not be administered by interview and by mailed questionnaire, sequentially (Carter et al., 1997).
Use caution when examining proxy ratings at the item level, because there is less agreement than what has been observed with the total score (Wyller, Sveen, & Bautz-Holter, 1996; Tooth et al., 2003).
Be aware of the biases involved with proxy use. Tooth et al. (2003) reported that patients tend to score themselves as performing activities more frequently than proxy respondents do, especially in meal preparation, heavy housework, social outings, driving and home maintenance. In addition, male proxy respondents and respondents who are relatives (rather than spouses) tend to give higher ratings, particularly in the area of domestic activities.

In what languages is the measure available?

English
Dutch – translated (Schuling, de Haan, Limburg, & Groenier, 1993)
Chinese – translated and validated (Hsueh & Hsieh, 1997)

Summary

What does the tool measure?	Instrumental Activities of Daily Living
What types of clients can the tool be used for?	Patients with stroke
Is this a screening or assessment tool?	Assessment
Time to administer	Interview: 5 minutes (with or without the patient’s family)
Versions	FAI-18, Modified FAI
Other Languages	Chinese (translated and validated), Dutch (translated)
Measurement Properties
Reliability	Internal consistency: Out of three studies examining internal consistency, three reported excellent internal consistency. Test-retest: Out of four studies examining test-retest, three reported excellent test-retest, and one reported a range from poor to excellent depending on item examined. Inter-rater: Out of two studies examining inter-rater reliability, two studies reported excellent inter-rater reliability as measured by intraclass correlation coefficients. Using Cohen’s kappa, one study reported adequate to excellent reliability and one study reported poor to excellent reliability.
Validity	Content: Three studies examined the content validity of the FAI, suggesting the presence of a single underlying construct in that each item contributes to each of the three identified factors (Domestic; Leisure/work; Outdoors). Criterion: Excellent correlation between postal and interview FAI scores; however, individual differences in scores ranged widely between postal responses and interview responses taken 10 days later. Construct: Excellent correlations with the Rankin Scale; SF-36 (Physical Functioning subscale). Adequate to excellent correlations with the Sickness Impact Profile; Barthel Index; Functional Independence Measure (Motor subscale); Euroqol. Adequate correlations with the Stroke Adapted Sickness Impact Profile; SF-36 (Social Functioning and Vitality subscales); two-minute walk test; Timed Up and Go test; Prosthetic Evaluation Questionnaire-Mobility; Activities-specific Balance Confidence Scale. Known groups: The FAI has been found to distinguish stroke severity in male patients only, and can discriminate between patients pre-stroke and a reference group, and between patients’ pre-stroke and post-stroke levels of activity.
Does the tool detect change in patients?	One study reported an “obvious” floor effect for individuals examined at 6 months post-stroke. Out of two studies examined, one reported that the FAI had a moderate ability to detect change (in patients 6-12 months post-stroke) and one reported that the FAI changed in the expected direction from pre-stroke to 6 months post-stroke, to 1 year post-stroke.
Acceptability	The FAI is short, simple, and encourages participation of significant others or family members. It is suitable for use with proxy respondents.
Feasibility	The FAI is simple to administer and requires no training or special equipment. It has been used for longitudinal assessment.
How to obtain the tool?	A copy of the original FAI is provided in Holbrook, M., & Skilbeck, C. E. (1983). An activities index for use with stroke patients. Age and Ageing, 12(2), 166-170.

Psychometric Properties

Overview

For the purposes of this review, we conducted a literature search to identify all relevant publications on the psychometric properties of the FAI. In general, the FAI has good overall reliability; however, the strength of agreement varies considerably at the level of individual item scores (reported for both test-retest and inter-rater reliability). Further, there is little evidence regarding the responsiveness of the FAI.

Floor/Ceiling Effects

Schuling et al. (1993) examined the psychometric properties of the FAI in a group of patients with stroke and a control group of individuals from the general population aged 65 or older. No ceiling effects were reported in this study.

Similarly, Wade et al. (1985) examined the psychometric properties of the FAI using data from 976 patients with acute stroke. No ceiling effects were reported.

Pedersen et al. (1997) examined the FAI in 437 patients with stroke and reported an “obvious” floor effect at 6 months post-stroke.

Walters, Morrell and Dixon (1999) examined the psychometric properties of four generic instruments in 233 patients with venous leg ulcers. The FAI demonstrated an adequate floor effect of 2.1%. No ceiling effect was observed.
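
Floor and ceiling effects such as those reported above are usually expressed as the percentage of respondents who obtain the lowest or highest possible total score. The following is a minimal sketch of that arithmetic only. The function name, the 0-45 score range and the data are illustrative assumptions and are not taken from any of the studies cited.

```python
def floor_ceiling_percent(total_scores, min_score, max_score):
    """Return (% of respondents at min_score, % of respondents at max_score)."""
    n = len(total_scores)
    if n == 0:
        raise ValueError("need at least one score")
    at_floor = sum(1 for s in total_scores if s == min_score)
    at_ceiling = sum(1 for s in total_scores if s == max_score)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


# Made-up total scores on an assumed 0-45 scale (hypothetical data).
scores = [0, 0, 12, 18, 23, 27, 30, 31, 38, 45]
floor_pct, ceiling_pct = floor_ceiling_percent(scores, min_score=0, max_score=45)
print(f"floor: {floor_pct:.1f}%, ceiling: {ceiling_pct:.1f}%")
```

The cut-off at which a floor or ceiling percentage is judged adequate or poor is set by each review's own criteria, so it is left out of the sketch.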

Reliability

Internal consistency:

Schuling et al. (1993) examined the internal consistency of the FAI retrospectively in a group of patients with stroke and a control group of individuals from the general population aged 65 or older. They looked at the internal consistency of the FAI pre-stroke, 6 months post-stroke and in control participants. Excellent internal consistency was reported for the total score of the FAI in the control group (alpha = 0.83) and in patients post-stroke (alpha = 0.87). An adequate alpha coefficient was reported for patients pre-stroke (alpha = 0.78). When subscales were examined individually, the Domestic subscale had excellent alpha coefficients (alpha = 0.82 for control and pre-stroke; 0.88 for post-stroke). The Leisure/work subscale had poor internal consistency in all groups (control, alpha = 0.63; pre-stroke, alpha = 0.58; post-stroke, alpha = 0.61). The Outdoors subscale also had poor internal consistency in all groups (control, alpha = 0.67; pre-stroke, alpha = 0.55; post-stroke, alpha = 0.66). However, when item 14 (reading books) was deleted, alpha coefficients were adequate for control and post-stroke groups (alpha = 0.72, alpha = 0.73, respectively) and remained poor in the pre-stroke group (alpha = 0.66).

Tooth et al. (2003) examined the agreement between patients with stroke and their proxies using a modified version of the FAI (13 items). At 6 months post-stroke, the internal consistency of the 13 FAI items was excellent when scored by patients (alpha = 0.85) and when scored by proxies (alpha = 0.83). The internal consistency of each subscale examined separately varied widely. Coefficient alphas for the Domestic, Leisure, and Outdoor subscales completed by patients ranged from poor to excellent (0.83, 0.38, 0.66, respectively), as did completion by proxies (0.83, 0.59, 0.57, respectively).

Miller et al. (2004) compared the reliability of the FAI to a modified version, the FAI-18. The internal consistency of the FAI was excellent (alpha = 0.81).
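
The alpha coefficients reported in this section are Cronbach's alpha values. As a rough illustration of how such a coefficient is obtained from item-level data, the sketch below applies the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), to a small made-up data set. The function name and the data are hypothetical, and the sketch makes no claim about the software or procedures used in the cited studies.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    All respondents must have answered the same k items (k >= 2).
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("each respondent needs the same k >= 2 items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Four hypothetical respondents answering three items scored 0-3.
example = [[3, 2, 3], [1, 1, 0], [2, 3, 2], [0, 1, 1]]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.2f}")
```

Applying the same function to the items of one subscale at a time would give subscale-level coefficients of the kind reported by Schuling et al. (1993) and Tooth et al. (2003).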

Test-retest:

Wade et al. (1985) examined the test-retest reliability of the FAI and reported that the overall agreement of individual items was variable. Heavy housework, local shopping, walking outside and social outings failed to reach statistical significance, while other items demonstrated excellent agreement (r = 0.80).

Green, Forster, and Young (2001) examined the test-retest reliability of the Barthel Index (Mahoney & Barthel, 1965), the Rivermead Mobility Index (Nouri & Lincoln, 1987), the Nottingham Extended Activities of Daily Living Scale (Whiting & Lincoln, 1980), and the FAI in 22 patients > 1 year post-stroke, tested twice at an interval of 1 week. Kappa coefficients for the FAI ranged from poor (kappa = 0.25 for heavy housework) to excellent (kappa = 1.00 for preparing main meals). The results of this study indicate that basic measures of activities of daily living (as measured by the Barthel Index and Rivermead Mobility Index) may be more reliable than the measures used to assess IADL.

Turnbull, Kersten, Habib, McLellan, Mullee, and George (2000) assessed the reliability of the FAI to establish age and sex norms in people aged 16 years and over. A postal questionnaire survey was sent to 1,280 people, and 57 respondents then completed a re-test questionnaire. Test-retest reliability of the postal version of the FAI was excellent, with a correlation of r = 0.96.

Miller et al. (2004) examined the reliability of the FAI in 84 individuals with lower limb amputation. Individuals completed the FAI twice, within two weeks. The ICC for the FAI was excellent (ICC = 0.79), demonstrating the test-retest reliability of the FAI.

Inter-rater:

Piercy, Carter, Mant, and Wade (2000) examined the inter-rater reliability of the FAI in 35 patients with stroke and 24 individuals who were the main caregivers for patients with stroke. Two raters evaluated each person, 15 days apart on average. Kappa statistics showed an excellent level of agreement for 3/15 items (kappas ranging from 0.77-0.80). An adequate level of agreement was found for 10/15 items (kappas ranging from 0.42-0.73). The other 2 items showed poor agreement (social outings, 0.27; pursuing active interest in hobby, 0.35). Three items showed significant differences between the two raters (light housework, outing/car rides, household and/or car maintenance). Spearman’s correlation for FAI totals of rater B versus rater A was excellent (r = 0.93). The results of this study confirm the reliability of the FAI when administered by interview.

Post and de Witte (2003) examined the inter-rater reliability of the Dutch version of the FAI in 45 patients with stroke. The FAI was administered twice, with 3-5 days in between evaluations. The total inter-rater reliability of the FAI was excellent (ICC = 0.90). At item level, kappa coefficients ranged from adequate to excellent (kappa = 0.41-0.90).
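
Item-level agreement in the studies above is summarized with kappa statistics, which correct the observed proportion of agreement for the agreement expected by chance. The sketch below computes the unweighted Cohen's kappa for two raters scoring a single item. The function name and the ratings are hypothetical, and because the source does not state whether the cited studies used weighted or unweighted kappa, the unweighted form is an assumption made for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category ratings."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cases on which the two raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical scores (0-3) given by two raters to the same ten people on one item.
rater_a = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
rater_b = [0, 1, 2, 3, 0, 1, 2, 2, 1, 1]
kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Because kappa depends on how ratings are distributed across categories, two items with the same raw percentage agreement can yield different kappa values.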

Validity

Content:

Wade et al. (1985) examined data from 976 patients with acute stroke. A factor analysis was conducted to demonstrate levels of communality among the FAI’s items. Correlations ranged from 0.44-0.77, suggesting the presence of a single underlying construct in that each item contributes to each of the three identified factors (Domestic; Leisure/work; Outdoors) to some extent.

Pedersen, Jorgensen, Nakayama, Raaschou, and Olsen (1997) examined whether the FAI was a good supplementary assessment to the Barthel Index (Mahoney & Barthel, 1965) for measuring higher order ADL functions in 437 patients with stroke. The FAI was found to be a heterogeneous scale comprised of 3 factors, two of which may represent increased item difficulties, and the third related to activities away from the home. Items from the Barthel Index and the FAI, when analyzed together, appeared on different, orthogonal factors, suggesting that the FAI supplements the Barthel Index with minimal content overlap.

Sveen et al. (1999) examined data from 65 patients with stroke to observe how motor and cognitive impairments relate to physical activities of daily living. In this study, the 3-factor structure of the FAI was confirmed. The three subscales are Domestic chores, Outdoor activities and Hobbies.

Criterion:

Concurrent:
Carter et al. (1997) examined the agreement between postal and interview-administered versions of the FAI, and assessed the criterion validity of the postal version, using the interviewer method as the gold standard. An excellent Spearman’s correlation of r = 0.94 was found between mailed questionnaire FAI scores and face-to-face interview FAI scores. However, for individual patients, differences between FAI responses by post and responses by interview 10 days later ranged widely. At the level of individual items, kappas ranged from poor (kappa = 0.35 for travel outings/car rides) to excellent (kappa = 1.00 for gainful work). The postal version was found to be a satisfactory alternative to interview administration; however, due to poor agreement in scores for individual patients, the two approaches should not be used sequentially to monitor individual patients.

Cup, Scholte op Reimer, Thijssen, and van Kuyk-Minis (2003) administered a number of different standardized measures to 26 patients with stroke. The FAI had excellent correlations with the Barthel Index (Mahoney & Barthel, 1965) (r = 0.79), the EuroQol (EuroQol Group, 1990) (r = 0.65), and the Rankin Scale (de Haan, Limburg, Bossuyt, van der Meulen, & Aaronson, 1995) (r = -0.80). The FAI had an adequate correlation with the Stroke Adapted Sickness Impact Profile-30 (van Straten, de Haan, Limburg, Schuling, Bossuyt, & van den Bos, 1997) (r = -0.43).
Note: Some correlations are negative because a high score on the FAI indicates a high level of functioning, whereas a high score on the Stroke Adapted Sickness Impact Profile-30 and the Rankin Scale indicates less desirable health outcomes.

Segal and Schall (1994) examined the proxy agreement between 38 patients with stroke and their caregivers. Using Spearman’s rho, the FAI and the Functional Independence Measure (Keith et al., 1987) were found to have an excellent correlation (r = 0.80).

Hsueh, Lee, and Hsieh (2001) examined the psychometric properties of the Barthel Index (Mahoney & Barthel, 1965) in 121 patients with stroke. The FAI was compared to the Barthel Index at 180 days after stroke and was found to have an adequate correlation with the Barthel Index scores obtained at 14, 30, and 90 days after stroke (Pearson’s r = 0.59).

Walters et al. (1999) examined the psychometric properties of four generic instruments: the Short-Form Health Survey (SF-36) (Ware & Sherbourne, 1992), the EuroQol (EuroQol Group, 1990), the McGill Short Form Pain Questionnaire (Melzack, 1975) and the FAI in 233 patients with venous leg ulcers. Correlations were calculated using Pearson Product Moment Correlations. The FAI had an excellent correlation with the SF-36 subscale of Physical Functioning (r = 0.72). Poor correlations between the FAI and the SF-36 subscales of Role Limitations-Physical (r = 0.25), Role Limitations-Emotional (r = 0.11), Pain (r = 0.28), General Health Perceptions (r = 0.30), and Mental Health (r = 0.26) were observed. Adequate correlations were found between the FAI and the SF-36 subscales of Social Functioning (r = 0.35) and Vitality (r = 0.45). The FAI had a moderate correlation with the EuroQol Derived Single Index (r = 0.54), and a poor correlation with the McGill Pain Questionnaire Sensory (r = -0.12) and Affective (r = -0.13) subscales.
Note: Some correlations are negative because a high score on the FAI indicates a high level of functioning, whereas a high score on other measures indicates less desirable health outcomes.

Miller et al. (2004) examined the concurrent validity of the FAI in 84 individuals with lower limb amputation. As predicted, the FAI correlated adequately with the Two-minute walk test (r = 0.53), the Timed Up and Go test (Podsiadlo & Richardson, 1991) (r = -0.49), the Prosthesis Evaluation Questionnaire-Mobility Scale (Legro, Reiber, Smith, del Aguila, Larsen, & Boone, 1998) (r = 0.39), and the Activities-specific Balance Confidence Scale (Powell & Myers, 1995) (r = 0.51).
Note: Some correlations are negative because a high score on the FAI indicates a high level of functioning, whereas a high score on the Timed Up and Go test indicates less desirable health outcomes.

Construct:

Convergent/Discriminant:
Schuling et al. (1993) examined the construct validity of the FAI in patients with stroke, and in a group of unselected participants aged 65 or older. Functional status of the patients with stroke was measured at 26 weeks. Correlations between the FAI and the Sickness Impact Profile (Bergner, Bobbitt, Carter, & Gilson, 1981) subscales of Home management, Body care and movement, Mobility and Ambulation ranged from adequate to excellent (r = -0.56 to -0.73). The FAI also had an excellent correlation with the disability scores of the Barthel Index (Wade & Collin, 1988) (r = 0.66). These results provide evidence for the convergent validity of the FAI. Further, the discriminant validity of the FAI is supported by the poor correlations found between FAI scores and the Emotional Behavior and Alertness Behavior scales of the Sickness Impact Profile (r = -0.15 and -0.14).
Note: Some correlations are negative because a high score on the FAI indicates a high level of functioning, whereas a high score on the Sickness Impact Profile indicates less desirable health outcomes.

Sveen et al. (1999) examined data from 65 patients with stroke and found that Domestic chores and Outdoor activities (factors found in this study to make up the FAI) correlated adequately with Barthel Index (Mahoney & Barthel, 1965) scores (r = 0.58 and r = 0.50). Domestic chores was the factor most strongly related to arm motor function of the Barthel Index, and Outdoor activities was most strongly related to visuospatial ability. Hobbies, the third factor found in this study, did not correlate with Barthel Index scores (r = 0.11).

Tooth et al. (2003) examined the construct validity of the FAI in patients with stroke and their proxies using a modified version of the index (13 items). The total patient FAI score was found to correlate significantly with the Motor subscale of the Functional Independence Measure (Keith, Granger, Hamilton, & Sherwin, 1987) (r = 0.63) but not with the Cognitive subscale of the Functional Independence Measure (r = 0.09).

Known groups:
Holbrook and Skilbeck (1983) divided patients by stroke severity into ‘mild’ and ‘severe’ based on Rankin grade at the time of stroke. They reported that the FAI distinguished severity of stroke (by Rankin groupings) in male patients, who showed significantly poorer Domestic chores scores and Outdoor activities scores at follow up. However, stroke severity did not influence one-year follow-up for females.

Schuling et al. (1993) reported that the FAI was able to discriminate between patients in the pre-stroke group and patients in the reference group. The FAI was also discriminative of patients’ pre-stroke and post-stroke levels of activity.

Responsiveness

Schepers, Ketelaar, Visser-Meily, Dekker, and Lindeman (2006) compared the responsiveness of frequently used functional health status measures in stroke. The FAI and the Stroke Adapted Sickness Impact Profile detected the most changes and had moderate effect sizes for patients in the chronic phase (between 6 and 12 months post-stroke) of stroke rehabilitation.

Wade et al. (1985) reported that FAI scores changed in the expected direction from pre-stroke to 6 months post-stroke to 1 year post-stroke.

References

Bergner, M., Bobbitt, R. A., Carter, W. B., Gilson, B. S. (1981). The Sickness Impact Profile: development and final revision of a health status measure. Med Care, 19, 787-805.
Carter, J., Mant, F., Mant, J., Wade, D., Winner, S. (1997). Comparison of postal version of the Frenchay Activities Index with interviewer-administered version for use in people with stroke. Clin Rehabil, 11, 131-138.
Cup, E. H. C., Scholte op Reimer, W. J. M., Thijssen, M. C. E., van Kuyk-Minis, M. A. H. (2003). Reliability and validity of the Canadian Occupational Performance Measure in stroke patients. Clinical Rehabilitation, 17(4), 402-409.
de Haan R., Limburg M., Bossuyt P., van der Meulen J., Aaronson, N. (1995). The clinical meaning of Rankin ‘handicap’ grades. Stroke, 26, 2027-2030.
Green, J., Forster, A., Young, J. (2001). A test-retest reliability study of the Barthel Index, the Rivermead Mobility Index, the Nottingham Extended Activities of Daily Living Scale and the Frenchay Activities Index in stroke patients. Disabil Rehabil, 23(15), 670-676.
Hamrin, E. (1982). One year after stroke: a follow-up of an experimental study. Scand J Rehabil Med, 14, 111-116.
Holbrook, M., Skilbeck, C. E. (1983). An activities index for use with stroke patients. Age and Ageing, 12(2), 166-170.
Hsueh, I.-P., Hsieh, C.-L. (1997). A revalidation of the Frenchay Activities Index in stroke: A study in Taipei area. Formosan Med J, 6, 123-130 [in Chinese].
Keith, R. A., Granger, C. V., Hamilton, B. B., Sherwin, F. S. (1987). The functional independence measure: A new tool for rehabilitation. Adv Clin Rehabil, 1, 6-18.
Legro, M. W., Reiber, G. D., Smith, D. G., del Aguila, M., Larsen, J., Boone, D. (1998). Prosthesis Evaluation Questionnaire for persons with lower limb amputations: assessing prosthesis-related quality of life. Arch Phys Med Rehabil, 79, 931-938.
Mahoney, F. I., Barthel, D. W. (1965). Functional evaluation: The Barthel Index. Md State Med J, 14, 61-65.
Melzack, R. (1975). The McGill Pain Questionnaire: Major Properties and Scoring Methods. Pain, 1, 277-289.
Miller, W. C., Deathe, A. B., Harris, J. (2004). Measurement properties of the Frenchay Activities Index among individuals with a lower limb amputation. Clinical Rehabilitation, 18(4), 414-422.
Nouri, F. M., Lincoln, N. B. (1987). An extended activities of daily living scale for stroke patients. Clin Rehab, 1, 301-305.
Pedersen, P. M., Jorgensen, H. S., Nakayama, H., Raaschou, H. O., Olsen, T. S. (1997). Comprehensive assessment of activities of daily living in stroke. The Copenhagen Stroke Study. Arch Phys Med Rehabil, 78, 161-165.
Piercy, M., Carter, J., Mant, J., Wade, D. T. (2000). Inter-rater reliability of the Frenchay Activities Index in patients with stroke and their carers. Clinical Rehabilitation, 14, 433-440.
Podsiadlo, D., Richardson, S. (1991). The Timed ‘Up & Go’: a test of basic functional mobility for frail elderly persons. J Am Geriatr Soc, 39, 142-148.
Post, M. W. M., de Witte, L. P. (2003). Good inter-rater reliability of the Frenchay Activities Index in stroke patients. Clinical Rehabilitation, 17(5), 548-552.
Powell, L., Myers, A. (1995). The Activities-specific Balance Confidence (ABC) scale. J Gerontol, 50, M28-34.
Schepers, V. P. M., Ketelaar, M., Visser-Meily, J. M. A., Dekker, J., Lindeman, E. (2006). Responsiveness of functional health status measures frequently used in stroke research. Disability & Rehabilitation, 28(17), 1035-1040.
Schuling, J., de Haan, R., Limburg, M., Groenier, K. H. (1993). The Frenchay Activities Index. Assessment of functional status in stroke patients. Stroke, 24, 1173-1177.
Segal, M. E., Schall, R. R. (1994). Determining functional/health status and its relation to disability in stroke survivors. Stroke, 25, 2391-2397.
Sveen, U., Bautz-Holter, E., Sodring, K. M., Wyller, T. B., Laake, K. (1999). Association between impairments, self-care ability and social activities 1 year after stroke. Disability & Rehabilitation, 21(8), 372-377.
The EuroQol Group. (1990). EuroQol – a facility for the measurement of health-related quality of life. Health Policy, 16, 199-207.
Tooth, L. R., McKenna, K. T., Smith, M., O’Rourke, P. (2003). Further evidence for the agreement between patients with stroke and their proxies on the Frenchay Activities Index. Clinical Rehabilitation, 17, 656-665.
Turnbull, J. C., Kersten, P., Habib, M., McLellan, L., Mullee, M. A., George, S. (2000). Validation of the Frenchay Activities Index in a general population aged 16 years and older. Arch Phys Med Rehabil, 81(8), 1034-1038.
van Straten, A., de Haan, R. J., Limburg, M., Schuling, J., Bossuyt, P. M., van den Bos, G. A. M. (1997). A Stroke-Adapted 30-Item Version of the Sickness Impact Profile to Assess Quality of Life (SA-SIP30). Stroke, 28, 2155-2161.
Wade, D. T., Legh-Smith, J., Langton Hewer, R. (1985). Social activities after stroke: Measurement and natural history using the Frenchay Activities Index. Int Rehabil Med, 7(4), 176-181.
Ware, J. E. Jr., Sherbourne, C. D. (1992). The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care, 30, 473-483.
Whiting, S., Lincoln, N. (1980). An ADL assessment for stroke patients. Br J Occup Ther, 43, 44-46.
Wade, D. T., Collin, C. (1988). The Barthel ADL Index: a standard measure of physical disability. Int Disability Studies, 10, 64-67.
Walters, S. J., Morrell, J., Dixon, S. (1999). Measuring health-related quality of life in patients with venous leg ulcers. Quality of Life Research, 8, 327-336.
Wyller, T. B., Sveen, U., Bautz-Holter, E. (1996). The Frenchay Activities Index in stroke patients: Agreement between scores by patients and by relatives. Disabil Rehabil, 18(9), 454-459.

See the measure

How to obtain the FAI:

For a copy of the FAI with the scoring system by Wade et al. (1985), please click here.

A copy of the measure with the original scoring system is also provided in Holbrook, M., Skilbeck, C. E. (1983). An activities index for use with stroke patients. Age and Ageing, 12(2), 166-170.

Functional Independence Measure (FIM)

Evidence Reviewed as of before: 15-10-2011

Author(s)*: Lisa Zeltzer, MSc OT

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Functional Independence Measure (FIM) was developed to address the issues of sensitivity and comprehensiveness that were criticized as being problematic with the Barthel Index (another measure of functional independence). The FIM was also developed to offer a uniform system of measurement for disability based on the International Classification of Impairment, Disabilities and Handicaps for use in the medical system in the United States (McDowell & Newell, 1996). The level of a patient’s disability indicates the burden of caring for them and items are scored on the basis of how much assistance is required for the individual to carry out activities of daily living.

In-Depth Review

Purpose of the measure

The FIM assesses six areas of function (Self-care, Sphincter control, Transfers, Locomotion, Communication and Social cognition), which fall under two domains (Motor and Cognitive). It has been tested for use in patients with stroke, traumatic brain injury, spinal cord injury, multiple sclerosis, and elderly individuals undergoing inpatient rehabilitation, and has been used with children as young as 7 years old.

Available versions

The FIM was developed between 1984 and 1987 by a national task force sponsored by the American Academy of Physical Medicine and Rehabilitation and the American Congress of Rehabilitation Medicine and was published by Keith, Granger, Hamilton, and Sherwin in 1987.

Features of the measure

Items:

The FIM consists of 18 items assessing 6 areas of function. The items fall into two domains: Motor (13 items) and Cognitive (5 items). The motor items are based on the items of the Barthel Index. These domains are referred to as the Motor-FIM and the Cognitive-FIM.

The items of the FIM are listed as follows:

Motor Domain:

1. Self-care (6 items)

– Eating
– Grooming
– Bathing
– Dressing – Upper body
– Dressing – Lower body
– Toileting

2. Sphincter control (2 items)

– Bladder management
– Bowel management

3. Transfers (3 items)

– Bed/Chair/Wheelchair
– Toilet
– Tub/Shower

4. Locomotion (2 items)

– Walk/Wheelchair
– Stairs

Cognitive Domain:

5. Communication (2 items)

– Comprehension
– Expression

6. Social cognition (3 items)

– Social interaction
– Problem solving
– Memory

For the Motor-FIM, the Eating, Grooming, and Bowel management items are known to be the easiest items for patients with stroke to accomplish, whereas Tub/Shower transfers and Locomotion (Walk/Wheelchair, Stairs) are the most challenging items (Granger, Cotter, Hamilton, & Fiedler, 1993; Grimby, Gudjonsson, Rodhe, Sunnerhagen, Sundh, & Ostensson, 1996). For the Cognitive-FIM, performance of the Expression item has been found to be the easiest for patients to accomplish, and Problem solving has been found to be the most challenging (Granger et al., 1993).

Time:

The FIM is reported to take between 30 and 45 minutes to administer and score, with 7 minutes to gather demographic information.

Scoring:

Each item on the FIM is scored on a 7-point Likert scale, and the score indicates the amount of assistance required to perform each item (1 = total assistance in all areas, 7 = total independence in all areas). The ratings are based on performance rather than capacity and can be acquired by observation, patient interview, telephone interview or medical records. The developers of the FIM recommend that the scoring be derived by consensus with a multi-disciplinary team.

A final summed score is created and ranges from 18 – 126, where 18 represents complete dependence/total assistance and 126 represents complete independence. The single summed raw score may be misleading as it gives the appearance of a continuous scale. However, intervals between scores are not equal in terms of level of difficulty and cannot provide more than ordinal level information (Linacre et al., 1994). Kidd et al. (1995) suggested using the summed scores as though on an interval scale while the individual items remain ordinal. Granger, Deutsch, and Linn (1998) have applied a Rasch rating scale in order to transform the FIM’s ordinal ratings to an equal-interval rating scale so that it can be used for linear regression models.

Subscale scores for the Motor and Cognitive domains can also be calculated (Linacre, Heinemann, Wright, Granger, & Hamilton, 1994).
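
The summing described above can be illustrated with a short Python sketch. It assumes the 18 items split into 13 motor and 5 cognitive items, each rated 1 to 7; the function name and the dictionary keys are hypothetical and are not part of any official FIM software. The raw sums remain ordinal, as noted above.

```python
# Hypothetical helper for illustration only; not taken from the FIM manual.
MOTOR_ITEMS = 13      # self-care, sphincter control, transfers, locomotion
COGNITIVE_ITEMS = 5   # communication (2 items) + social cognition (3 items)


def fim_scores(motor_ratings, cognitive_ratings):
    """Return raw Motor-FIM, Cognitive-FIM and total FIM scores."""
    if len(motor_ratings) != MOTOR_ITEMS or len(cognitive_ratings) != COGNITIVE_ITEMS:
        raise ValueError("expected 13 motor and 5 cognitive item ratings")
    for rating in list(motor_ratings) + list(cognitive_ratings):
        # Each item is rated on the 7-point scale (1 = total assistance).
        if not isinstance(rating, int) or not 1 <= rating <= 7:
            raise ValueError("each item must be an integer rating from 1 to 7")
    motor = sum(motor_ratings)
    cognitive = sum(cognitive_ratings)
    return {"motor": motor, "cognitive": cognitive, "total": motor + cognitive}


lowest = fim_scores([1] * 13, [1] * 5)
highest = fim_scores([7] * 13, [7] * 5)
print(lowest["total"], highest["total"])  # 18 126
```

Under these assumptions the Motor-FIM ranges from 13 to 91 and the Cognitive-FIM from 5 to 35, which together give the 18 to 126 range of the total score.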

Equipment:

Any items that the patient uses to carry out their activities of daily living.

Subscales:

There are two subscales for the FIM: the Motor-FIM and the Cognitive-FIM.

Training:

The FIM must be administered by a trained and certified evaluator.

Grey and Kennedy (1993) found that the FIM could be completed as a self-report questionnaire in patients with spinal cord injury. Segal and Schall (1994) found that the FIM can be used reliably by in-person proxy for patients with stroke. Segal, Gillard, and Schall (1996) further established that the FIM can be used reliably by proxy over the telephone in patients with stroke (total FIM, intraclass correlation coefficient (ICC) = 0.91; Motor-FIM, ICC = 0.94; Cognitive-FIM, ICC = 0.52), with results closely resembling those obtained for in-person administration.

Alternative Forms of the Functional Independence Measure

The Functional Independence Measure for Children (WeeFIM). This measure was developed to track disability in children who are between the ages of 6 months and 7 years. The WeeFIM can be administered to children over the age of 7 if their functional abilities are below those expected of children aged 7 who do not have disabilities. It measures the impact of developmental strengths and difficulties on independence at home, in school, and in the community (Msall et al., 1994). The scale has 18 items measuring functional performance in 3 domains: Self-care, Mobility, and Cognition (Uniform Data System for Medical Rehabilitation, http://www.udsmr.org/).
Modified 5-level FIM. Gosman-Hedström and Blomstrand (2004) examined whether a 5-level FIM would be more useful than the standard 7-level version in large population studies. They used a sample of elderly stroke survivors and found that a 5-level FIM would most likely increase the reliability of the FIM without losing sensitivity.

Client suitability

Can be used with:

Patients with stroke of all ages, and can be used with patients with special conditions (e.g. aphasia or neglect).

Should not be used in:

No restrictions have been reported.

In what languages is the measure available?

The FIM has been translated into the following languages:

German
Italian
Spanish
Swedish
Finnish
Portuguese
Afrikaans
Turkish
French
Persian (Farsi)

Summary

What does the tool measure?	Activities of Daily Living
What types of clients can the tool be used for?	Patients with stroke, traumatic brain injury, spinal cord injury, multiple sclerosis, and elderly individuals undergoing inpatient rehabilitation. Can be used with children as young as 7 years old.
Is this a screening or assessment tool?	Assessment
Time to administer	The FIM is reported to take between 30-45 minutes to administer and score, with 7 minutes to gather demographic information.
Versions	WeeFIM; Modified 5-level FIM
Other Languages	German; Italian; Spanish; Swedish; Finnish; Portuguese; Afrikaans; Turkish; French; Persian (Farsi).
Measurement Properties
Reliability	Internal consistency: Out of four studies examining internal consistency, all four reported excellent internal consistency. Test-retest: Out of five studies examining test-retest reliability, all five reported excellent test-retest reliability. Inter-rater: Out of 10 studies examining inter-rater reliability, eight studies reported excellent inter-rater reliability; one reported adequate to excellent (except Social Interaction item which was poor); one reported overall poor kappa values, but excellent intraclass correlation coefficient (ICC).
Validity	Content: The FIM was created based on the results of a literature review of published and unpublished measures and expert panels and was then piloted in 11 centers. The Delphi method was applied, using rehabilitation expert opinion to establish the inclusiveness and appropriateness of the items. Criterion: Excellent correlations with the Barthel Index; Modified Rankin Scale; Disability Rating Scale. FIM scores were found to predict amount of home care required; admission scores predict FIM discharge scores; placement after discharge; functional gain; length of stay; depression; and ability to return to work following stroke or traumatic brain injury. Concurrent: The Motor-FIM was found to demonstrate an excellent correlation with the Modified Rankin Scale (MRS) and the Disability Rating Scale (DRS); and adequate to excellent correlation with the Barthel Index. The Cognitive-FIM was found to have an excellent correlation with the DRS; an adequate correlation with the Montebello Rehabilitation Factor Score (MRFS) (efficacy); and a poor correlation with the MRFS (efficiency). Construct: FIM scores discriminated between groups based on spinal cord injury and stroke severity, and the presence of comorbid illness both at admission and discharge. It has also been found to distinguish between patients with or without neglect and with or without aphasia at both admission and discharge. Convergent/Discriminant: The total FIM was found to demonstrate an excellent correlation with the Office of Population Censuses and Surveys Disability Scales disability scores; an adequate correlation with the London Handicap Scale and the Wechsler Adult Intelligence Test-verbal IQ test; and a poor correlation with the SF-36 Physical and Mental components, and the General Health Questionnaire. The Motor-FIM was found to demonstrate an excellent correlation with the Office of Population Censuses and Surveys Disability Scales disability scores; an adequate correlation with the London Handicap Scale; and a poor correlation with the Wechsler Adult Intelligence Test-verbal IQ test, SF-36 Physical and Mental components, and the General Health Questionnaire. The Cognitive-FIM was found to demonstrate an excellent correlation with the Mini-Mental State Examination (MMSE); an adequate correlation with the Lowenstein Occupational Therapy Cognitive Assessment (LOTCA), Office of Population Censuses and Surveys Disability scores, and the revised Wechsler Adult Intelligence Test-verbal IQ; and a poor correlation with the London Handicap Scale, SF-36 Physical and Mental components, and the General Health Questionnaire. Ecological: The Motor-FIM demonstrated poor correlations with the Occupational Therapy Adult Perceptual Screening Test (OT-APST). The Cognitive-FIM demonstrated adequate correlations with the OT-APST.
Does the tool detect change in patients?	A significant ceiling effect has been detected with the Cognitive domain of the FIM. Out of seven studies examined, three reported that the FIM has an excellent ability to detect change in patients with stroke; four reported poor ability to detect change in patients with stroke or multiple sclerosis.
Acceptability	The FIM is typically administered by interview. In patients with stroke, it can be reliably administered to proxy respondents.
Feasibility	Training and education of persons to administer the FIM may represent significant cost. Use of interview formats may make the FIM more feasible for longitudinal assessment.
How to obtain the tool?	A copy of the FIM can be found at the following website: http://www.va.gov/vdl/documents/Clinical/Func_Indep_Meas/fim_user_manual.pdf. See also http://www.udsmr.org

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the FIM.

Floor/Ceiling Effects

Van der Putten, Hobart, Freeman and Thompson (1999) compared the Motor-FIM and total FIM to the Barthel Index in 201 patients with multiple sclerosis and 82 post-stroke patients undergoing inpatient neurorehabilitation. The Cognitive-FIM had poor ceiling effects in patients with multiple sclerosis (36%) and adequate ceiling effects in patients with stroke. The total FIM showed no ceiling effect (0%) in both patients with stroke and patients with multiple sclerosis, as compared to 7% for the Barthel Index (1% for the Motor-FIM).

Hsueh, Lin, Jeng, and Hsieh (2002) compared the Motor-FIM, the original 10-item Barthel Index, and the 5-item short form Barthel Index in inpatients with stroke receiving rehabilitation. They reported a substantially larger floor effect for admission Barthel Index scores than for admission Motor-FIM scores (18.2% vs. 5.8% respectively).

Hobart and Thompson (2001) compared the modified Barthel Index, the FIM and the 30-item FIM plus Functional Assessment Measure (FIM + FAM) in 149 patients with various neurological disorders. No significant floor or ceiling effects were reported in this study for the total FIM, although there was a 16.1% ceiling effect noted for the Cognitive-FIM.

Brock, Goldie, and Greenwood (2002) examined the ceiling effects of the Motor-FIM and the Motor Assessment Scale in 106 rehabilitation inpatients with stroke at discharge. The ceiling effects for the Motor-FIM were adequate (16%), and 29% of the patients achieved the highest score on the hardest item of the Motor-FIM. In comparison, the Motor Assessment Scale had a ceiling effect of 25% (poor) and 35% of patients scored the highest score on the most difficult item.

Dromerick, Edwards, and Diringer (2003) assessed 95 consecutive admissions to a stroke rehabilitation service for disability on admission and discharge. No floor or ceiling effects were reported at admission to or discharge from rehabilitation with the FIM, whereas the Barthel Index demonstrated a large ceiling effect at discharge (27%).
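
The floor and ceiling percentages reported in these studies are, in the simplest case, the share of patients who obtain the lowest or highest possible score. The Python sketch below shows that calculation under this simple definition; individual studies may define the effects differently, and the function name and the sample scores are made up for illustration.

```python
def floor_ceiling_percentages(scores, min_score, max_score):
    """Percentage of patients at the lowest and highest possible score."""
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == max_score) / n
    return floor_pct, ceiling_pct


# Made-up Cognitive-FIM totals (possible range 5 to 35) for ten hypothetical patients.
cognitive_fim = [35, 35, 28, 30, 35, 22, 5, 33, 35, 31]
floor_pct, ceiling_pct = floor_ceiling_percentages(cognitive_fim, 5, 35)
print(f"floor: {floor_pct:.0f}%, ceiling: {ceiling_pct:.0f}%")  # floor: 10%, ceiling: 40%
```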

Reliability

Internal consistency:
Dodds, Martin, Stolov and Deyo (1993) examined the psychometric properties of the FIM by analyzing Uniform Data System data on 11,102 general rehabilitation inpatients. Common diagnoses were stroke (52%), orthopedic conditions (10%), and brain injury (10%). The FIM demonstrated excellent internal consistency, with a Cronbach’s alpha of 0.93 for overall admissions and 0.95 for discharges.

Hsueh, Lin, Jeng, and Hsieh (2002) examined the reliability of the FIM in 118 inpatients with stroke. Patients were administered the Motor-FIM subscale at admission to a rehabilitation ward of a hospital and before discharge from the hospital. The Motor-FIM demonstrated excellent internal consistency, with an alpha = 0.88 at admission and an alpha = 0.91 at discharge.

Hobart et al. (2001) examined the reliability of the FIM, the Barthel Index and the FIM plus Functional Assessment Measure in 149 rehabilitation inpatients with neurologic disorders. Item-to-total correlations were adequate and ranged from 0.53 to 0.87 for the FIM total, 0.60 for the Motor-FIM and 0.63 for the Cognitive-FIM. Mean inter-item correlations were also adequate, and were reported as 0.51 for the total FIM, 0.56 to 0.91 for the Motor-FIM and 0.72 to 0.80 for the Cognitive-FIM. Cronbach alpha levels were excellent for the total FIM (alpha = 0.95), the Motor-FIM (alpha = 0.95), and for the Cognitive-FIM (alpha = 0.89). The results of this study demonstrate the internal consistency of the total FIM and its Motor and Cognitive domains.

Sharrack, Hughes, Soudain, and Dunn (1999) assessed the internal consistency of the FIM in patients with multiple sclerosis. The internal consistency of the FIM was excellent, with a Cronbach’s alpha of 0.98.
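For readers unfamiliar with how a Cronbach's alpha is obtained, the sketch below applies the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the ratings are hypothetical and purely illustrative; they do not reproduce data from any of the studies above.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` is a list of per-item score lists; each inner list holds one
    score per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs one score per respondent")
    # Total score for each respondent across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical ratings: three items (1-7 scale) for five respondents.
example_items = [
    [7, 5, 3, 6, 2],
    [6, 5, 2, 7, 1],
    [7, 4, 3, 6, 2],
]
alpha = cronbach_alpha(example_items)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Population variance is used for both the items and the totals; using sample variance for both would give the same alpha because the correction factor cancels in the ratio.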

Test-retest:
Chau, Daler, Andre and Patris (1994) examined the test-retest reliability of the FIM in 254 patients under 20 years old in a rehabilitation centre. The test-retest reliability was found to be excellent (ICC = 0.93 for total FIM).

Segal, Ditunno, and Staas (1993) examined the test-retest reliability of the FIM at discharge from an acute care rehabilitation setting and again at admission to an ongoing rehabilitation setting in 57 patients with spinal cord injuries. The two ratings were performed within 6 days of each other. The total FIM demonstrated excellent test-retest reliability (r = 0.83).

Kidd et al. (1995) compared the FIM to the Barthel Index in two groups of 25 patients undergoing neurorehabilitation. Test-retest reliability was found to be excellent for the FIM (r = 0.90).

Ottenbacher, Hsu, Granger, and Fiedler (1996) examined the test-retest reliability of the FIM by examining the results of 11 studies including a total of 1,568 patients. The median test-retest reliability was excellent (r = 0.95).

Pollak, Rheault, and Stoecker (1996) assessed the test-retest reliability of the FIM in 49 individuals over the age of 80 years. Individuals were tested twice using the FIM. Excellent test-retest reliability was found for the Motor-FIM (ICC = 0.90), and for the Cognitive-FIM (ICC = 0.80).

Intra-rater:
Sharrack, Hughes, Soudain, and Dunn (1999) assessed the intra-rater reliability of the FIM (using both kappa and ICC statistics) in 35 patients with multiple sclerosis. Three raters followed patients for 9 months, with assessments every 3 months. The kappa value for the total FIM was poor (kappa = 0.28); however, the ICC was excellent (ICC = 0.94). For individual items, kappa coefficients ranged from adequate (kappa = 0.55 for Dressing-lower body) to excellent (kappa = 1.00 for both Expression and Social interaction). ICCs for individual items ranged from adequate (ICC = 0.60 for Bladder control) to excellent (ICC = 1.00 for both Expression and Social interaction).

Hobart et al. (2001) examined the intra-rater reliability of the FIM, the Barthel Index and the FIM plus Functional Assessment Measure in 56 rehabilitation inpatients with neurologic disorders. Patients were examined by the same multidisciplinary team on two occasions. Intra-rater reliability was calculated using ICC statistics. The total FIM, Motor-FIM and Cognitive-FIM were all found to have excellent intra-rater reliabilities (ICC = 0.98, 0.98 and 0.95, respectively).

Inter-rater:
Chau, Daler, Andre and Patris (1994) examined the inter-rater reliability of the FIM between educators, occupational therapists and physiotherapists in 254 patients under 20 years old in a rehabilitation centre. Inter-rater reliability for the total FIM was excellent (ICC = 0.94).

Ottenbacher, Mann, Granger, Tomita, Hurren, and Charvat (1994) examined the inter-rater reliability of the FIM and the Instrumental Activities of Daily Living Scale in 20 community-dwelling older patients. Two raters administered the tests over a short (7-10 days) or long (4-6 week) interval. The ICCs for inter-rater reliability were excellent, ranging from 0.90 to 0.99.

Ottenbacher, Hsu, Granger, and Fiedler (1996) examined the inter-rater reliability of the FIM by examining the results of 11 studies including a total of 1,568 patients. The median inter-rater reliability for the total FIM was excellent (r = 0.95).

Hamilton, Laughlin, Fiedler and Granger (1994) examined the inter-rater reliability of the FIM in 1,018 patients. The total FIM ICC was excellent (ICC = 0.96), as was the Motor-FIM domain (ICC = 0.96), and the Cognitive-FIM domain (ICC = 0.91).

Jaworski, Kult, and Boynton (1994) compared the reliability of observed and reported FIM ratings. In this study, the inter-rater reliability of the FIM was found to be excellent (ICC = 0.99).

Kidd et al. (1995) compared the FIM to the Barthel Index in two groups of 25 patients undergoing neurorehabilitation. Inter-rater reliability was found to be excellent for the FIM (r = 0.92).

Segal and Schall (1994) examined the inter-rater reliability of the FIM in 38 patients with stroke. The inter-rater reliability of the measure was found to be excellent, with an ICC of 0.96.

Brosseau and Wolfson (1994) examined the inter-rater reliability of the FIM in patients with multiple sclerosis and found that the FIM has excellent inter-rater reliability (ICC = 0.83).

Daving, Andren, Nordholm, and Grimby (2001) examined the reliability of the FIM in 63 patients with stroke, approximately 2 years after stroke onset. Two raters (drawn from three occupational therapists and one nurse) conducted independent ratings of the FIM in the patient’s home, and the interview procedure was repeated within a week by another two raters in the clinic. The kappa values during the same interview exceeded 0.40 for 17 items, demonstrating adequate to excellent inter-rater reliability; however, the Social interaction item kappa value was poor (kappa = 0.26). In comparing the two interviews, kappa values were between 0.40 and 0.60 for Self-care items (except Bathing) and Sphincter control (except Bowel management); however, most of the Transfers, Locomotion and Social cognition items had kappa values below 0.40. The two interviews were also studied using ICC statistics between all raters. ICCs ranged from adequate (0.62 for Bowel management) to excellent (0.88 for Bathing) for the 13 motor items, and were adequate (ranging from 0.60 to 0.72) for the Cognitive domain, except for the Social interaction item, which had an ICC of only 0.44. Significant differences were found between raters on the Wilcoxon test for the Dressing, Transfer Toilet, Transfer Tub/Shower, Walk/Wheelchair and the Cognitive domain. The results of this study show that the FIM demonstrates high inter-rater reliability in the same interview setting (whether at home or at the clinic); however, agreement is lower when the interview is repeated over time by different raters.

Sharrack, Hughes, Soudain, and Dunn (1999) assessed the inter-rater reliability of the FIM (using both kappa and ICC statistics) in 64 patients with multiple sclerosis. Each patient was assessed by three raters (2 neurologists, 1 neurology research nurse). The kappa value for the total FIM score was poor (kappa = 0.21); however, the ICC was excellent (ICC = 0.99). For individual items, kappa coefficients were variable and ranged from poor (kappa = 0.26 for Comprehension) to excellent (kappa = 0.88 for Stairs locomotion). ICCs for the individual items were excellent, ranging from 0.76 to 0.99 with the exception of the Comprehension item, which demonstrated adequate inter-rater reliability (ICC = 0.56).
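A poor kappa alongside an excellent ICC for the same total score can look contradictory. One plausible explanation, offered here as an assumption rather than as the authors' stated reasoning, is that an unweighted kappa counts only exact matches, so two raters whose totals differ by a single point are treated as disagreeing completely. The Python sketch below illustrates this with the two-rater unweighted Cohen's kappa; the source does not state which kappa variant was used with three raters, and the scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty sample")
    n = len(rater_a)
    # Observed proportion of exact agreement.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical total FIM scores: rater B is always exactly one point higher.
rater_a = [100, 101, 110, 120, 90]
rater_b = [101, 102, 111, 121, 91]
kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this example the two raters rank the patients identically and never differ by more than one point, yet kappa is slightly below zero because no pair of scores matches exactly. Statistics that take the size of the difference into account, such as the ICC, would rate the same data far more favourably.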

Validity

Content:

The FIM was created based on the results of a literature review of published and unpublished measures and expert panels. To establish content and face validity, the FIM was then piloted in 11 centers (including 114 clinicians from 8 different disciplines and 110 patients evaluated) (Keith & Granger, 1987). Face and content validity were both determined by applying the Delphi method, using rehabilitation expert opinion to establish the inclusiveness and appropriateness of the items (Granger, Hamilton, Keith, Zielezny, & Sherwins, 1986).

Criterion:

Concurrent:
Hsueh, Lin, Jeng, and Hsieh (2002) examined the concurrent validity of the Motor-FIM by examining its interrelations with the original 10-item Barthel Index, and the 5-item short form Barthel Index in 118 inpatients with stroke receiving rehabilitation. Concurrent validity was measured using ICC and Spearman correlations. The Motor-FIM exhibited excellent concurrent validity at admission as measured by Spearman correlation (r = 0.74) and adequate validity as measured by ICC (ICC = 0.55). The Motor-FIM exhibited excellent concurrent validity at discharge (Spearman correlation = 0.92, ICC = 0.86).

Kwon, Hartzema, Duncan and Min-Lai (2004) examined the concurrent validity of the Barthel Index, the FIM and the Modified Rankin Scale in a sample of post-stroke patients. Spearman correlation coefficients were excellent between the Barthel Index and the Motor-FIM (r = 0.95) and between the Motor-FIM and the Modified Rankin Scale (r = -0.89).
Note: This correlation is negative because a high score on the FIM indicates functional independence, whereas a high score on the Modified Rankin Scale indicates severe disability.
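
The sign of the correlation in the note above can be illustrated with a minimal sketch. The scores below are hypothetical and are not taken from Kwon et al. (2004); the `ranks` and `spearman` helpers are written here for illustration only. The sketch ranks both sets of scores and computes the Pearson correlation of the ranks, which is one standard way to obtain Spearman's coefficient.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical patients: higher Motor-FIM means more independence,
# higher Modified Rankin Scale means more disability.
motor_fim = [20, 35, 50, 65, 80, 91]
rankin = [5, 4, 4, 3, 2, 1]

rho = spearman(motor_fim, rankin)
print(f"Spearman rho = {rho:.2f}")  # negative, because the scales run in opposite directions
```

Because the two scales are scored in opposite directions, a strong agreement between them appears as a coefficient close to -1 rather than +1.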

Hall, Hamilton, Gordon, and Zasler (1993) examined the concurrent validity of the Disability Rating Scale, the FIM, and the Functional Assessment Measure. Excellent correlations were found between the Motor-FIM and Cognition-FIM and the Disability Rating Scale (r = 0.64 and 0.73, respectively).

Zwecker et al. (2002) examined the relationship between cognitive status and functional motor outcomes in 66 patients with stroke. Functional motor outcomes were measured from efficacy and efficiency of the FIM motor scores (isolated from total FIM scores) and the Montebello Rehabilitation Factor Score (MRFS). Using Pearson’s correlation, an adequate correlation was found between the FIM cognitive subtest and MRFS efficacy (r=0.34, p<0.01). A poor correlation was found between the FIM cognitive subtest and MRFS efficiency (r=0.28, p<0.05). No significant correlations were found between the FIM cognitive and FIM motor efficacy or efficiency scores.

Predictive:
For an extensive review of the predictive validity of the FIM, please see:

Timbeck, R. J., Spaulding, S. J. (2003). Ability of the Functional Independence Measure to predict rehabilitation outcomes after stroke: A review of the literature. Physical & Occupational Therapy in Geriatrics, 22(1), 63-76.

Chumney, D., Nollinger, K., Shesko, K., Skop, K., Spencer, M., Newton, R.A. (2010). Ability of Functional Independence Measure to accurately predict functional outcome of stroke-specific population: Systematic review. Journal of Rehabilitation Research and Development, 47, 17-30.

Granger, Cotter, Hamilton and Fiedler (1993) examined whether the FIM could predict the physical care needs of patients with stroke. Burden of care was measured as the minutes of assistance provided per day by another person in the home. It was found that a 1-point improvement in total FIM score predicted a 2.19-minute reduction in help from another person per day. The FIM, along with the Brief Symptom Inventory, was also found to contribute to the prediction of patients’ general life satisfaction.
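
As a worked illustration of the 2.19-minute figure reported above, the sketch below scales it by a hypothetical FIM gain. It assumes the relationship is linear across the whole score range, which the summary above does not state; the function name and the example gain are illustrative only.

```python
# Reported by Granger et al. (1993): minutes of daily help saved per 1-point FIM gain.
MINUTES_PER_FIM_POINT = 2.19


def predicted_reduction_in_help(fim_gain):
    """Estimated reduction in daily assistance (minutes), assuming linearity."""
    return MINUTES_PER_FIM_POINT * fim_gain


# Hypothetical patient who gains 10 total FIM points during rehabilitation.
minutes_saved = predicted_reduction_in_help(10)
print(f"{minutes_saved:.1f} fewer minutes of help per day")
```

Under this assumption, a 10-point gain corresponds to roughly 22 fewer minutes of assistance per day.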

Corrigan, Smith-Knapp and Granger (1997) examined the predictive validity of the FIM for patients with traumatic brain injury after discharge from acute rehabilitation. They found that the Motor-FIM predicted which patients required direct assistance with 83% accuracy, the Cognitive-FIM predicted which patients required supervision with 77% accuracy, and the Motor-FIM and Cognitive-FIM predicted which patients required any assistance with 78% accuracy. Further, the Motor-FIM score alone was the best predictor of the number of minutes of assistance needed.

Predictive validity of the FIM with discharge FIM scores, discharge destinations, length of stay, functional gain, depression, survival, and the ability to return to work following stroke or traumatic brain injury:
Inouye et al. (2000) performed a multivariate analysis on data obtained from the medical records of rehabilitation patients with stroke to identify predictors of functional outcome using total FIM scores. It was found that the total FIM admission score was the strongest predictor of the total FIM discharge score. No relationship was found between total FIM scores at discharge and gender, hospital length of stay, or the nature of the stroke.

Oczkowski and Barreca (1993) examined whether the FIM could predict prognosis in 113 patients with stroke observed from admission to discharge. It was found that the admission FIM score was predictive of placement after discharge and of outcome disability. No patients with an admission FIM score below 36 were discharged home, while all of the patients with admission FIM scores above 96 were discharged home. However, discharge destination became difficult to predict in patients with a moderate range of disability (i.e. an FIM score between 36 and 96). When individual FIM items were considered, a patient’s level of independence with bowel and bladder management was predictive of functional outcome and discharge destination.
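
The cut-offs reported by Oczkowski and Barreca (1993) can be written as a simple lookup. This is a minimal sketch that only restates what was observed in that sample; it is not a validated clinical decision rule, and the function name and return labels are illustrative. The bounds check uses the total FIM range of 18 to 126 (18 items scored 1 to 7).

```python
def discharge_outlook(admission_fim):
    """Map a total admission FIM score to the pattern seen in the 1993 sample."""
    if not 18 <= admission_fim <= 126:
        raise ValueError("total FIM scores range from 18 to 126")
    if admission_fim < 36:
        return "no patients in the sample were discharged home"
    if admission_fim > 96:
        return "all patients in the sample were discharged home"
    # Moderate disability: destination was difficult to predict from FIM alone.
    return "difficult to predict"


for score in (30, 60, 100):
    print(score, "->", discharge_outlook(score))
```

The middle band is the clinically important case: for these patients, other factors such as bowel and bladder independence were needed to inform the prediction.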

Alexander (1994) examined the predictive validity of the FIM in a sample of 520 patients with stroke admitted to a rehabilitation hospital. It was found that an admission FIM score of < 40 resulted in an acute care stay almost twice as long as any other FIM score. Patients aged < 55 years were all discharged home regardless of their initial severity. Patients with an FIM score < 40 who were > 55 years old had a 50% chance of being discharged to a long term care facility. This is in contrast to the findings of Oczkowski and Barreca (1993), who found that no patients with an admission FIM score < 36 were discharged home. Patients with an admission FIM score between 40 and 60 who were > 74 years old were at high risk for discharge to a long term care facility. Patients with an FIM score > 80 were discharged home.

Mokler, Sandstrom, Griffin, Farris, and Jones (2000) found that in the acute care phase of stroke recovery, the FIM scores for Eating, Bathing, Dressing – Lower body, Toileting, Bowel management and Social interaction predicted discharge destination with 70% accuracy. In the later phase of recovery in rehabilitation, particularly in patients with a severe stroke, scores on admission FIM items including Bladder management, Toilet transfer, and Memory, and scores on the discharge FIM items including Dressing – Upper body, Bed/Chair/Wheelchair transfers and Comprehension predicted discharge destination with up to 75% accuracy. These three admission items and three discharge items correctly predicted discharge placement in 2/3 and 3/4 of the cases, respectively.

Black, Soltis, and Bartlett (1999) examined the FIM scores of 234 patients with stroke admitted to a rehabilitation facility over a 2-year period. Patients who were discharged home were less likely to have a caregiver who worked (20%) versus patients who were discharged to long-term care (65%). The availability of a non-working family member to provide assistance and supervision was a critical factor related to discharge home. Patients with a discharge FIM score > 80 had a high probability of being discharged home when social factors (e.g. availability of family support and non-working family member) were taken into consideration. Thus, both functional status and social factors, such as family availability and support, are critical elements in predicting the discharge destination of this patient population.

Ring et al. (1997) examined 151 patients with stroke admitted to a rehabilitation centre over a 2-year period. They found that admission FIM scores and length of stay were the most significant predictors of functional gain.

Heinemann, Linacre, Wright, Hamilton, and Granger (1994) examined the extent to which functional outcome measures could predict functional status in patients with traumatic brain injury. They report that admission FIM scores were related to discharge function and length of stay. Admission Motor-FIM scores were found to be a stronger predictor of length of stay than Cognitive-FIM scores and accounted for 52% of the variance in discharge motor function. Admission Cognitive-FIM scores accounted for 46% of the variance in discharge cognitive function.

Ween, Mernoff, and Alexander (2000) examined the predictive validity of the FIM in 244 patients with stroke at an acute rehabilitation centre. It was found that patients with an admission FIM score < 50 were dependent in their self-care activities upon discharge. Patients who scored < 70, nine days post-stroke, were highly likely to remain functionally dependent at discharge. Patients who scored > 70 were not dependent at discharge and had a shorter than average length of stay. Patients who scored between 50 and 70 on the admission FIM had unpredictable outcomes. In terms of discharge destination, being < 60 years old with an admission FIM score > 70 was strongly associated with home discharge.

Stineman, Fiedler, Granger, and Maislin (1998) examined the records of 26,339 patients with stroke discharged from 252 inpatient rehabilitation facilities. They found that patients whose admission FIM scores were > 37 were able to eat, groom, dress their upper bodies and manage their bowels and bladder independently at discharge. Patients who scored > 55 were also able to bathe, dress their lower bodies and transfer onto a bed or chair and toilet. Additionally, most patients who had initial Motor-FIM scores > 62 and whose Cognitive-FIM scores were > 30 gained independence in most tasks, including transferring into the tub and climbing the stairs by the time of discharge. They also found that between 85% and 93% of patients with moderate stroke were discharged home.

Singh et al. (2000) administered the FIM to 81 patients with stroke at 1 month, 3 months, and 1 year post-stroke. Using stepwise linear regression, they found that lower total FIM scores at 1 month post-stroke were predictive of higher depression scores at 3 months post-stroke.

Cifu et al. (1997) compared 49 patients with traumatic brain injury who were employed at one-year follow-up with 83 patients who remained unemployed at one-year. They found that FIM scores at admission to rehabilitation were significantly associated with patients’ employment status one-year post head injury, such that patients who had returned to work one-year later had demonstrated significantly higher scores on the FIM at admission.

Tur, Gursel, Yavuzer, Kucukdeveci and Arasil (2003) examined the predictive validity of the FIM in 102 patients with stroke admitted to rehabilitation units. The FIM was administered within 72 hours of admission and at discharge. Using a stepwise regression analysis, FIM scores at admission were found to be excellent predictors of FIM scores at discharge (0.90; p<0.001), indicating that the FIM can be used to predict functional recovery in patients with stroke.

Whiting, Shen, Hung, Cordato & Chan (2010) examined predictors of 5-year survival in 166 patients with stroke (mean age 80 years), using the FIM. Using a logistic regression model, lower preadmission FIM scores were found to negatively predict 5-year survival of patients with stroke (OR 1.04, 95%CI 1.1-2.0, P=0.01). In addition, total FIM scores were found to remain relatively stable from baseline to 5-year follow-up in the 5-year survival group; however, FIM cognition scores were lower than baseline scores at the 5-year follow-up.

Alexander (1994) found that patients with stroke who had severe right brain damage had significantly less FIM change than patients with severe left brain damage.

Ring et al. (1997) found that patients with neglect or aphasia had significantly higher FIM gains despite lower FIM admission scores. However, these patients also had a much longer length of stay at the hospital. It was also found that 96% of patients with right brain damage without neglect and 88% of patients with right brain damage and neglect were discharged home.

Oczkowski and Barreca (1993) found that patients with any degree of hemianopsia, parietal neglect, aphasia, or cognitive impairment had significantly lower FIM scores than those without these impairments, but unlike the results of Ring et al. (1997), hemianopsia, side of lesion, neglect and aphasia were not predictive of discharge destination.

Katz et al. (2000) examined correlations between the FIM (total, motor and cognitive scores) and the Lowenstein Occupational Therapy Cognitive Assessment (LOTCA – Orientation, Perception, Visuomotor Organisation and Thinking Operations subtests) in two subgroups of adults with right hemisphere stroke (patients with unilateral spatial neglect, n=40 vs. patients without unilateral spatial neglect, n=21), using Spearman’s correlation analysis. Measures were taken on admission to and discharge from rehabilitation, and at 6-month follow-up. In the neglect group, adequate correlations were reported between FIM total and FIM motor, and LOTCA Visuomotor Organisation and Thinking Operations (range r=0.48 to -.51) at admission. Adequate to excellent correlations were reported between FIM total and FIM motor, and LOTCA Perception, Visuomotor Organisation and Thinking Operations (range r=0.48 to 0.75) at discharge. Excellent correlations were reported between FIM total and FIM motor and LOTCA Visuomotor Organisation and Thinking Operations tasks (range r=0.61 to 0.77) at follow-up. In the non-neglect group, poor to excellent correlations were reported between FIM cognitive and LOTCA scores (range r=0.05 to -.67) at admission. Moderate to excellent correlations were reported between FIM total and FIM motor, and LOTCA Visuomotor Organisation and Thinking Operations tasks at discharge and follow-up (range r=0.43 to 0.62).
Note: The FIM cognitive was not readministered at discharge or follow-up with this subgroup.

Construct:

Linacre et al. (1994) applied Rasch analysis to the admission and discharge FIM scores of 14,799 patients. Two distinct aspects of disability were found within the FIM: Motor and Cognitive function.

Cavanagh, Hogan, Gordon, and Fairfax (2000) suggested that for post-stroke patients, a simple 2-factor model of the FIM may be insufficient to describe disability and may not measure within-patient change adequately. The authors suggest that a three-dimensional FIM be applied for patients with stroke, with Self-care, Cognitive function, and Toileting as the major groupings of scales. They found that the 2-factor model accounted for only 66% of the variance, whereas a 3-factor model accounted for more (74.2%).

Convergent/Discriminant:
Hobart et al. (2001) found that the total FIM and Motor-FIM scores correlated more strongly with the Office of Population Censuses and Surveys Disability Scales disability scores (r = 0.82 and 0.84, respectively), London Handicap Scale scores (r = 0.32 and 0.35, respectively), the SF-36 Physical component scores (r = 0.26 and 0.30, respectively) and the revised Wechsler Adult Intelligence Test-verbal IQ test (r = 0.35 and 0.27, respectively), than with measures of mental health status (SF-36 Mental component, r = 0.10 and 0.10, respectively) or psychological distress (General Health Questionnaire, r = 0.13 and r = 0.15, respectively). However, the Cognitive-FIM correlated most strongly with Office of Population Censuses and Surveys Disability scores (r = 0.43) and the revised Wechsler Adult Intelligence Test-verbal IQ scores (r = 0.51) and correlated poorly with the London Handicap Scale (r = 0.11), the SF-36 Physical and Mental components (r = 0.04 and r = 0.08, respectively), and the General Health Questionnaire (r = 0.01).

Giaquinto, Giachetti, Spiridigliozzi and Nolfe (2010) examined the convergent validity of the FIM, Hospital Anxiety Depression Scale (HADS) and the World Health Organization Quality of Life scale (WHOQOL-100) in 107 patients with stroke (mean 5.6 months post-stroke). Assessments were performed at admission and discharge from a two-month rehabilitation program. As measured by Pearson’s correlation coefficients, an excellent correlation was found between FIM admission and FIM discharge scores (r=0.656, p<0.0001) and was not significantly influenced by gender. However, correlations between FIM discharge scores and HADS and WHOQOL-100 scores were influenced by gender. Among females an adequate correlation was found between FIM discharge and HADS scores (r=-0.315, p<0.02) and FIM discharge and WHOQOL-100 scores (r=0.339, p<0.01), but the correlations among males’ scores were poor (r=0.139 and r=0.147 respectively).
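As an illustration of the statistic reported in these studies, the following is a minimal Python sketch of Pearson’s correlation coefficient. The function name `pearson_r` and the score lists are hypothetical; they are not data from Giaquinto et al. (2010) or any other study cited here.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical FIM totals for five patients (illustrative values only).
fim_admission = [60, 72, 85, 90, 101]
fim_discharge = [78, 80, 99, 104, 118]
r = pearson_r(fim_admission, fim_discharge)
print(f"r = {r:.3f}")
```

The sign of r gives the direction of the association (for example, the negative r between FIM and HADS scores above indicates that higher function accompanied lower anxiety and depression scores), and its magnitude gives the strength.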

Zwecker et al. (2002) reported an adequate correlation between the FIM cognitive subtest and the Loewenstein Occupational Therapy Cognitive Assessment (LOTCA) (r=0.471, p<0.001) and an excellent correlation between the FIM cognitive subtest and the Mini Mental State Examination (MMSE) (r=0.666, p<0.001) in 66 patients with stroke, using Pearson’s correlation.

Known groups:
Dodds, Martin, Stolov and Deyo (1993) examined the construct validity of the FIM using data from 11,102 general rehabilitation inpatients (52% with stroke, 10% with orthopedic conditions, 10% with brain injury). FIM scores discriminated between groups based on spinal cord injury and stroke severity, and the presence of comorbid illness both at admission and discharge. The communication item of the FIM demonstrated most of the observed score difference.

Ring, Feder, Schwartz, and Samuels (1997) examined 151 patients with stroke admitted to a rehabilitation centre over a 2-year period. They found that the FIM was able to distinguish between patients with or without neglect and with or without aphasia at both admission and discharge.

Ecological validity:

Cooke, McKenna, Fleming & Darnell (2006) examined the ecological validity of the Occupational Therapy Adult Perceptual Screening Test (OT-APST) by comparing scores and completion time with the FIM motor and cognitive subtests in a sample of patients with stroke (n=208). Significant but poor correlations were reported between FIM motor scores and 6 of the 7 OT-APST subscales (range r=0.26 to 0.41, p<0.01). Significant adequate correlations were reported between FIM cognitive scores and all 7 OT-APST subscales (range r=0.36 to 0.50, p<0.01). Significant poor to adequate negative correlations were also reported between the time taken to complete the FIM motor and cognitive subtests and the OT-APST (r=-0.27 and -0.33 respectively, p<0.01).

Responsiveness

The FIM is often compared to the Barthel Index, because the FIM was developed to be a more comprehensive and responsive measure of disability than the Barthel Index (van der Putten et al., 1999; Hobart & Thompson, 2001; Wallace, Duncan, & Lai, 2002; Hsueh et al., 2002).

Van der Putten et al. (1999) compared the Motor-FIM and total FIM to the Barthel Index in 201 patients with multiple sclerosis and 82 post-stroke patients undergoing inpatient neurorehabilitation. The Motor-FIM and total FIM demonstrated small effect sizes in the expected direction from admission to discharge in patients with multiple sclerosis (ES = 0.34 and ES = 0.30, respectively) and large effect sizes in patients with stroke (ES = 0.91 and ES = 0.82). The effect sizes for the Cognitive-FIM were not significant (ES = 0) in patients with multiple sclerosis and moderate in patients with stroke (ES = 0.61). Change scores for all scales in both disease groups were positive, indicating less disability on discharge than admission. Effect sizes on the Barthel Index were similar to those of the FIM in both patient groups, suggesting that the FIM might not have an advantage in terms of its responsiveness to change.

Wallace et al. (2002) compared the responsiveness of the Motor-FIM to the Barthel Index for stroke recovery between 1 and 3 months. The Barthel Index and Motor-FIM exhibited similar responsiveness to change in this patient population (Motor-FIM, ES = 0.28; Standardized Response Mean (SRM) = 0.62; AUC/ROC curve = 0.675).
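The two responsiveness indices reported in this section can be sketched in a few lines of Python. The standardized response mean (SRM) is the mean change divided by the standard deviation of the change scores. The effect size (ES) is assumed here to be the mean change divided by the standard deviation of the baseline scores, which is a common convention; individual studies may compute it differently. The function names and scores below are hypothetical and are not taken from any cited study.

```python
from statistics import mean, stdev

def effect_size(baseline, followup):
    """ES: mean change divided by the SD of baseline scores (assumed convention)."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)

def standardized_response_mean(baseline, followup):
    """SRM: mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)

# Hypothetical Motor-FIM scores for six patients (illustrative values only).
admission = [40, 55, 62, 48, 70, 35]
discharge = [52, 60, 75, 50, 82, 49]

es = effect_size(admission, discharge)
srm = standardized_response_mean(admission, discharge)
print(f"ES = {es:.2f}, SRM = {srm:.2f}")
```

Because the two indices use different denominators, the same data can yield quite different ES and SRM values, which is one reason the studies in this section are not always directly comparable.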

Hsueh et al. (2002) compared the responsiveness of the Motor-FIM, the original 10-item Barthel Index, and the 5-item short form Barthel Index in inpatients with stroke receiving rehabilitation. The Barthel Index and Motor-FIM exhibited high responsiveness (SRM = 1.2), indicating significant change.

Dromerick et al. (2003) assessed 95 consecutive admissions to a stroke rehabilitation service for disability on admission and discharge. The Modified Rankin Scale and the International Stroke Trial Measure were compared with the Barthel Index and the FIM. The number of patients for whom each scale detected a clinically significant change in disability was determined. The SRM of the FIM was superior to that of the Barthel Index (2.18 versus 1.72) (change from admission to discharge from rehabilitation). The FIM was the most sensitive measure, detecting change in 91/95 subjects, including change in 18 patients in whom the Barthel Index detected no change.

Hobart and Thompson (2001) compared the responsiveness of the modified Barthel Index, the FIM and the 30-item FIM plus Functional Assessment Measure (FIM + FAM) in 149 patients with various neurological disorders. The SRMs for the Barthel Index, the FIM, and the FIM + FAM scales measuring global, motor, and cognitive disability were found to be similar, suggesting that there is no advantage in responsiveness of one measure over another (total FIM, SRM = 0.48; Motor-FIM, SRM = 0.54; Cognitive-FIM, SRM = 0.17).

Sharrack et al. (1999) examined the responsiveness of the FIM in 25 patients with multiple sclerosis. Patients were followed for 9 months, with assessments every 3 months. The total FIM demonstrated a poor sensitivity to change (ES = 0.46). A number of motor items (i.e. Eating, Grooming, Sphincter control, Bed/Chair/Wheelchair and Toilet Transfers, and Locomotion) had small to moderate responsiveness (ES ranged from 0.25 for Toilet Transfer to 0.67 for Bed/Chair/Wheelchair Transfers). None of the cognitive items were responsive to change (ES ranged from 0.00 to 0.19).

Dodds, Martin, Stolov and Deyo (1993) examined the responsiveness of the FIM by analyzing the differences between admission and discharge FIM scores from 11,102 general rehabilitation inpatients (with stroke (52%), orthopedic conditions (10%), and brain injury (10%)). Significant functional gains were detected by the FIM (33% score improvement). The authors conclude that the FIM demonstrates some responsiveness, but its ability to measure change over time needs further examination.

Hammond, Grattan, Sasser, Corrigan, Bushnik, and Zafonte (2001) examined FIM score changes over time in patients with traumatic brain injury. Significant differences in total FIM, Motor-FIM and Cognitive-FIM scores were reported between discharge from rehabilitation and follow-up at one year post-injury. Change between one and two years and one and five years was reported to be distributed across all items with most change observed in cognitive function.

Beninato, Gill-Body, Salles, Stark, Black-Schaffer and Stein (2006) defined the minimal clinically important difference (MCID) when using the FIM in a stroke population. The study included 113 patients from a rehabilitation unit at a long-term acute care hospital. The FIM was administered at admission and discharge; patient function was also assessed by attending physicians at the same time points using a 15-point integer scale where -7 indicated that a patient was “a very great deal worse”, 0 indicated “no change” and +7 indicated “a very great deal better”. Based on physicians’ ratings of clinical change made at discharge, change scores of 22, 17 and 3 for total FIM, motor FIM and cognitive FIM (respectively), were deemed to differentiate patients who demonstrated clinically important change from those who had not. Generalization of results is cautioned as the study only included patients receiving treatment at one centre and patient, caregiver or family assessments were not included in the ratings of important change.
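To show how these thresholds might be applied to an individual patient, the sketch below compares an admission-to-discharge gain against the MCID values reported by Beninato et al. (2006). The function name `meets_mcid`, the patient scores, and the choice to treat a gain exactly equal to the threshold as meeting it are assumptions made for illustration, not part of the study.

```python
# MCID thresholds reported by Beninato et al. (2006), in FIM points.
MCID = {"total": 22, "motor": 17, "cognitive": 3}

def meets_mcid(admission, discharge, scale):
    """Return True if the admission-to-discharge gain reaches the MCID.

    Assumption: a gain equal to the threshold counts as meeting it.
    """
    if scale not in MCID:
        raise ValueError(f"unknown scale: {scale!r}")
    return (discharge - admission) >= MCID[scale]

# Hypothetical patient scores (illustrative values only).
print(meets_mcid(70, 95, "total"))  # total FIM gain of 25 points
print(meets_mcid(50, 60, "motor"))  # motor FIM gain of 10 points
```

As the authors caution, these cut-offs come from a single centre and from physician ratings only, so a check of this kind is best read as a rough guide.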

References

Alexander, M. P. (1994). Stroke rehabilitation outcomes: A potential use of predictive variables to establish levels of care. Stroke, 25(1), 128-134.
Black, T. M., Soltis, T., Bartlett, C. (1999). Using the Functional Independence Measure instrument to predict stroke rehabilitation outcomes. Rehabilitation Nursing, 24(3), 109-114, 121.
Beninato, M., Gill-Body, K.M., Salles, S., Stark, P.C., Black-Schaffer, R.M. & Stein, J. (2006). Determination of the Minimal Clinically Important Difference in the FIM instrument in patients with stroke. Archives of Physical Medicine and Rehabilitation, 87, 32-39.
Brosseau, L., Wolfson, C. (1994). The inter-rater reliability and construct validity of the Functional Independence Measure for multiple sclerosis subjects. Clin Rehabil, 8, 107-115.
Brock, K. A., Goldie, P. A., Greenwood, K. M. (2002). Evaluating the effectiveness of stroke rehabilitation: Choosing a discriminative measure. Arch Phys Med Rehabil, 83(1), 92-99.
Cavanagh, S. J., Hogan, K., Gordon, V., Fairfax, J. (2000). Stroke-specific FIM models in an urban population. Journal of Neurological Nursing, 32, 17-21.
Chau, N., Dalter, S., Andre, J. M., Patris, A. (1994). Inter-rater agreement of two functional independence scales: The Functional Independence Measure (FIM) and a subjective uniform continuous scale. Disabil Rehabil, 16(2), 63-71.
Cifu, D., Keyser-Marcus, L., Lopez, E., Wehman, P., Kreutzer, J., Englander, J., High, W. (1997). Acute predictors of successful return to work 1 year after traumatic brain injury: A multicenter analysis. Archives of Physical Medicine and Rehabilitation, 78(2), 125-131.
Cooke, D. M., McKenna, K., Fleming, J. & Darnell, R. (2006). Construct and ecological validity of the Occupational Therapy Adult Perceptual Screening Test (OT-APST). Scandinavian Journal of Occupational Therapy, 13, 49-61.
Corrigan, J. D., Smith-Knapp, K., Granger, C. V. (1997). Validity of the functional independence measure for persons with traumatic brain injury. Arch Phys Med Rehabil, 78(8), 828-834.
Daving, Y., Andren, E., Nordholm, L., Grimby, G. (2001). Reliability of an interview approach to the Functional Independence Measure. Clin Rehabil, 15(3), 301-310.
Demers, L., Giroux, F. (1997). Validité de la Mesure de l’indépendance fonctionnelle (MIF) pour les personnes âgées suivies en réadaptation. Canadian Journal on Aging/La revue canadienne du vieillissement, 16(4), 626-646.
Dodds, T. A., Martin, D. P., Stolov, W. C., Deyo, R. A. (1993). A validation of the functional independence measurement and its performance among rehabilitation inpatients. Arch Phys Med Rehabil, 74(5), 531-536.
Dromerick, A. W., Edwards, D. F., Diringer, M. N. (2003). Sensitivity to changes in disability after stroke: A comparison of four scales useful in clinical trials. Journal of Rehabilitation Research and Development, 40, 1-8.
Fourn, L., Brosseau, L., Dassa, C., Dutil, E. (1994). Validation factorielle de la Mesure de l’indépendance fonctionnelle (MIF) auprès de personnes atteintes de la sclérose en plaques. Journal de Réadaptation Médicale, 14, 7-16.
Gosman-Hedstrom, G., Blomstrand, C. (2004). Evaluation of a 5-level Functional Independence Measure in a longitudinal study of elderly stroke survivors. Disability & Rehabilitation, 26(7), 410-418.
Granger, C. V., Deutsch, A., Linn, R. T. (1998). Rasch analysis of the Functional Independence Measure (FIM) Mastery Test. Arch Phys Med Rehabil, 79(1), 52-57.
Granger, C. V., Hamilton, B. B., Keith, R. A., Zielezny, M., Sherwins, F. S. (1986). Advance in functional assessment for medical rehabilitation. Top Geriatr Rehabil, 1, 59-74.
Granger, C. V., Cotter, A. C., Hamilton, B. B., Fiedler, R. C., Hens, M. M. (1990). Functional assessment scales: A study of persons with multiple sclerosis. Arch Phys Med Rehabil, 71, 870-875.
Granger, C. V., Cotter, A. C., Hamilton, B. B., Fiedler, R. C. (1993). Functional assessment scales: A study of persons with stroke. Arch Phys Med Rehabil, 74(2), 133-138.
Granger, C. V., Hamilton, B. B., Fiedler, R. C. (1992). Discharge outcomes after stroke rehabilitation. Stroke, 23(7), 978-982.
Grey, N., Kennedy, P. (1993). The Functional Independence Measure: a comparative study of clinician and self rating. Paraplegia, 31, 457-461.
Grimby, B., Gudjonsson, G., Rodhe, M., Sunnerhagen, K. S., Sundh, V., Ostensson, M. L. (1996). The Functional Independence Measure in Sweden: Experience for outcome measurement in rehabilitation medicine. Scandinavian Journal of Rehabilitation Medicine, 28, 51-62.
Hall, K. M., Hamilton, B., Gordon, W. A., Zasler, N. D. (1993). Characteristics and comparisons of functional assessment indices: Disability Rating Scale, Functional Independence Measure and Functional Assessment Measure. J Head Trauma Rehabil, 8(2), 60-74.
Hall, K. M., Mann, N., High, W., Wright, J., Kreutzer, J., Wood, D. (1996). Functional measures after traumatic brain injury: ceiling effects of FIM, FIM+FAM, DRS and CIQ. J Head Trauma Rehabil, 11(5), 27-39.
Hammond, F. M., Grattan, K. D., Sasser, H., Corrigan, J. D., Bushnik, T., Zafonte, R. D. (2001). Long-term recovery course after traumatic brain injury: A comparison of the Functional Independence Measure and Disability Rating Scale. Journal of Head Trauma, 16(4), 318-329.
Hamilton, B. B., Laughlin, J. A., Fiedler, R. C., Granger, C. V. (1995). Interrater reliability of the 7-level functional independence measurement (FIM). Scand J Rehabil Med, 27, 253-256.
Hamilton, B. B., Laughlin, J. A., Fiedler, R. C., Granger, C. V. (1994). Interrater reliability of the 7-level functional independence measure (FIM). Scand J Rehabil Med, 26, 115-119.
Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., Granger, C. (1994). Prediction of rehabilitation outcomes with disability measures. Arch Phys Med Rehabil, 75(2), 133-143.
Hobart, J. C., Lamping, D. L., Freeman, J. A., Langdon, D. W., McLellan, D. L., Greenwood, R. J., Thompson, A. J. (2001). Which disability scale for neurologic rehabilitation? Neurology, 57, 639-644.
Hobart, J. C., Thompson, A. J. (2001). The five item Barthel index. J Neurol Neurosurg Psychiatry, 71, 225-230.
Hsueh, I-P., Lin, J-H., Jeng, J-S., Hsieh, C-L. (2002). Comparison of the psychometric characteristics of the functional independence measure, 5 item Barthel index, and 10 item Barthel index in patients with stroke. Journal of Neurology Neurosurgery and Psychiatry, 73, 188-190.
Inouye, M., Kishi, K., Ikeda, Y., Takada, M., Katoh, J., Iwahasi, M., Hayakawa, M., Ishihara, K., Sawamura, S., Kazumi, T. (2000). Prediction of functional outcome after stroke rehabilitation. American Journal of Physical Medicine and Rehabilitation, 79(6), 513-518.
Jaworski, D. M., Kult, T., Boynton, P. R. (1994). The Functional Independence Measure: A pilot study comparison of observed and reported ratings. Rehabil Nur Res, 3, 141-147.
Katz, N., Hartman-Meier, A., Ring, H. & Soroker, N. (2000). Relationships of cognitive performance and daily function of clients following right hemisphere stroke: Predictive and ecological validity of the LOTCA battery. Occupational Therapy Journal of Research, 20, 3-17.
Keith, R. A., Granger, C. V., Hamilton, B. B., Sherwin, F. S. (1987). The functional independence measure: A new tool for rehabilitation. Adv Clin Rehabil, 1, 6-18.
Kidd, D., Stewart, G., Baldry, J., Johnson, J., Rossiter, D., Petruckevitch, A., Thompson, A. J. (1995). The Functional Independence Measure: A comparative validity and reliability study. Disabil Rehabil, 17(1), 10-14.
Küçükdeveci, A. A., Yavuzer, G., Elhan, A. H., Sonel, B. (2001). Adaptation of the Functional Independence Measure for use in Turkey. Clinical Rehabilitation, 15(3), 311-319.
Kwon, S., Hartzema, A. G., Duncan, P. W., Min-Lai, S. (2004). Relationship among the Barthel Index, the Functional Independence Measure, and the Modified Rankin Scale. Stroke, 35, 918-923.
Linacre, J. M., Heinemann, A. W., Wright, B. D., Granger, C. V., Hamilton, B. B. (1994). The structure and stability of the Functional Independence Measure. Arch Phys Med Rehabil, 75(2), 127-132.
McDowell, I., Newell, C. (1996). Measuring health: a guide to rating scales and questionnaires (pp. 63-67). (2nd Ed.), New York: Oxford University Press.
Mokler, P. J., Sandstrom, R., Griffin, M., Farris, L., Jones, C. (2000). Predicting discharge destination for patients with severe motor stroke: Important Functional Tasks. Neurorehabilitation and Neural Repair, 14(3), 181-185.
Msall, M. E., DiGaudio, K., Rogers, B. T., LaForest, S., Catanzaro, N. L., Campbell, J., Wilczenski, F., Duffy, L. C. (1994). The Functional Independence Measure for Children (WeeFIM). Conceptual basis and pilot use in children with developmental disabilities. Clin Pediatr (Phila), 33(7), 421-430.
Naghdi S, Ansari NN, Raji P, Shamili A, Amini M, Hasson S. Cross-cultural validation of the Persian version of the Functional Independence Measure for patients with stroke. Disabil Rehabil. 2016;38(3):289-98. doi: 10.3109/09638288.2015.1036173. https://www.ncbi.nlm.nih.gov/pubmed/25885666
Oczkowski, W. J., Barreca, S. (1993). The functional independence measure: Its use to identify rehabilitation needs in stroke survivors. Arch Phys Med Rehabil, 74(12), 1291-1294.
Ottenbacher, K. J., Hsu, Y., Granger, C. V., Fiedler, R. C. (1996). The reliability of the functional independence measure: a quantitative review. Arch Phys Med Rehabil, 77(12), 1226-1232.
Ottenbacher, K. J., Mann, W. C., Granger, C. V., Tomita, M., Hurren, D., Charvat, B. (1994). Inter-rater agreement and stability of functional assessment in the community-based elderly. Arch Phys Med Rehabil, 75(12), 1297-1301.
Pollak, N., Rheault, W., Stoecker, J. L. (1996). Reliability and validity of the FIM for persons aged 80 years and above from a multilevel continuing care retirement community. Arch Phys Med Rehabil, 77(10), 1056-1061.
Ring, H., Feder, M., Schwartz, J., Samuels, G. (1997). Functional measures of first-stroke rehabilitation inpatients: Usefulness of the Functional Independence Measure total score with a clinical rationale. Arch Phys Med Rehabil, 78(6), 630-635.
Segal, M. E., Schall, R. R. (1994). Determining functional/health status and its relation to disability in stroke survivors. Stroke, 25, 2391-2397.
Segal, M. E., Ditunno, J. F., Staas, W. E. (1993). Interinstitutional agreement of individual functional independence measure (FIM) items measured at two sites on one sample of SCI patients. Paraplegia, 31(10), 622-631.
Segal, M. E., Gillard, M., Schall R. (1996). Telephone and in-person proxy agreement between stroke patients and caregivers for the functional independence measures. Am J Phys Med Rehabil, 75(3), 208-212.
Sharrack, B., Hughes, R. A., Soudain, S., Dunn, G. (1999). The psychometric properties of clinical rating scales used in multiple sclerosis. Brain, 122(1), 141-159.
Singh, A., Black, S. E., Herrmann, N., Leibovitch, F. S., Ebert, P. L., Lawrence, J., Szalai, J. P. (2000). Functional and neuroanatomic correlations in poststroke depression. Stroke, 31, 637-644.
Stineman, M. G., Fiedler, R. C., Granger, C. V., Maislin, G. (1998). Functional task benchmarks for stroke rehabilitation. Archives of Physical Medicine and Rehabilitation, 79, 497-504.
Teasell, R., Foley, N. C., & Salter K. (2011). EBRSR: Evidence-Based Review of Stroke Rehabilitation. 13th ed. London (ON): EBRSR.
Timbeck, R. J., Spaulding, S. J. (2003). Ability of the Functional Independence Measure to predict rehabilitation outcomes after stroke: A review of the literature. Physical & Occupational Therapy in Geriatrics, 22(1), 63-76.
Tur, B.S., Gursel, Y.K., Yavuzer, G., Kucukdeveci, A. & Arasil, T. (2003). Rehabilitation outcome of Turkish stroke patients: In a team approach setting. International Journal of Rehabilitation Research, 26, 271-277.
van der Putten, J. J., Hobart, J. C., Freeman, J. A., Thompson, A. J. (1999). Measuring change in disability after inpatient rehabilitation: Comparison of the responsiveness of the Barthel Index and the Functional Independence Measure. J Neurol Neurosurg Psychiatry, 66, 480-484.
Wallace, D., Duncan, P. W., Lai, S. M. (2002). Comparison of the responsiveness of the Barthel Index and the motor component of the Functional Independence Measure in stroke: the impact of using different methods for measuring responsiveness. J Clin Epidemiol, 55, 922-928.
Ween, J. E., Mernoff, S. T., Alexander, M. P. (2000). Recovery rates after stroke and their impact on outcome prediction. Neurorehabilitation and Neural Repair, 14(3), 229-235.
Whiting, R., Shen, Q., Hung, W.T., Cordato, D. & Chan, D.K.Y. (2010). Predictors for 5-year survival in a prospective cohort of elderly stroke patients. Acta Neurologica Scandinavica, DOI: 10.1111/j.1600-0404.2010.01476.
Zwecker, M., Levenkrohn, S., Fleisig, Y., Zeilig, G., Ohry, A., & Adunsky, A. (2002). Mini-Mental State Examination, cognitive FIM instrument, and the Loewenstein Occupational Therapy Cognitive Assessment: relation to functional outcome of stroke patients. Archives of Physical Medicine and Rehabilitation, 83, 342-345.

See the measure

A copy of the FIM is available from the following website: http://www.va.gov/vdl/documents/Clinical/Func_Indep_Meas/fim_user_manual.pdf

http://www.udsmr.org

Multiple Errands Test (MET)

Evidence Reviewed as of before: 08-05-2013

Author(s)*: Valérie Poulin, OT, PhD candidate; Annabel McDermott, OT

Editor(s): Nicol Korner-Bitensky, PhD OT

Expert Reviewer: Deirdre Dawson, PhD OT

Purpose

The Multiple Errands Test (MET) evaluates the effect of executive function deficits on everyday functioning through a number of real-world tasks (e.g. purchasing specific items, collecting and writing down specific information, arriving at a stated location). Tasks are performed in a hospital or community setting within the constraints of specified rules. The participant is observed performing the test and the number and type of errors (e.g. rule breaks, omissions) are recorded.

In-Depth Review

Purpose of the measure

The Multiple Errands Test was developed by Shallice and Burgess in 1991. The measure was intended to evaluate a patient’s ability to organize performance of a number of simple unstructured tasks while following several simple rules.

See the Alternative versions section below for information regarding other versions.

Features of the measure

Items:

The original Multiple Errands Test (Shallice and Burgess, 1991) comprised 8 items: 6 simple tasks (e.g. buy a brown loaf of bread, buy a packet of throat pastilles), 1 task that is time-dependent, and 1 that comprises 4 subtasks (see Description of tasks, below). It should be noted that the MET was originally devised in an experimental context, rather than as a formal assessment.

Description of tasks:

The original Multiple Errands Test (Shallice and Burgess, 1991) was comprised of 8 written tasks to be completed in a pedestrian shopping precinct. Tasks and rules are written on a card provided to the participant before arriving at the shopping precinct. Of the 8 tasks, 6 are simple (e.g. buy a brown loaf of bread, buy a packet of throat pastilles), the 7th requires the participant to be at a particular place 15 minutes after starting the test, and the 8th is more demanding as it comprises 4 sets of information that the participant must obtain and write on a postcard:

the name of the shop most likely to have the most expensive item;
the price of a pound of tomatoes;
the name of the coldest place in Britain yesterday; and
the rate of the exchange of the French franc yesterday.

The card also includes instructions and rules, which are repeated to the participant on arrival at the shopping precinct:

“You are to spend as little money as possible (within reason) and take as little time as possible (without rushing excessively). No shop should be entered other than to buy something. Please tell one or other of us when you leave a shop what you have bought. You are not to use anything not bought on the street (other than a watch) to assist you. You may do the tasks in any order.”

Scoring:

The participant is observed performing the test and errors are recorded according to the following categorizations:

Inefficiencies: where a more effective strategy could have been applied
Rule breaks: where a specific rule (either social or explicitly mentioned in the task) is broken
Interpretation failure: where requirements of a particular task are misunderstood
Task failure: where a task is either not carried out or not completed satisfactorily.

Time taken to complete the assessment is recorded and the total number of errors is calculated.
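
The tallying step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the published MET materials: the record structure, the category labels and the example session are hypothetical names and values chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass

# The four error categories listed above (labels are hypothetical identifiers).
CATEGORIES = (
    "inefficiency",
    "rule_break",
    "interpretation_failure",
    "task_failure",
)


@dataclass
class MetRecord:
    """Hypothetical record of one observed MET session."""
    errors: list          # one category label per observed error
    minutes_taken: float  # time taken to complete the assessment


def summarize(record):
    """Count errors per category and compute the total number of errors."""
    unknown = set(record.errors) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown error categories: {sorted(unknown)}")
    counts = Counter(record.errors)
    per_category = {c: counts.get(c, 0) for c in CATEGORIES}
    return {
        "per_category": per_category,
        "total_errors": sum(per_category.values()),
        "minutes_taken": record.minutes_taken,
    }


# Made-up observations for illustration only.
session = MetRecord(
    errors=["rule_break", "inefficiency", "rule_break", "task_failure"],
    minutes_taken=32.5,
)
summary = summarize(session)
print(summary)
```

Keeping the per-category counts alongside the total preserves the qualitative picture (e.g. mostly rule breaks versus mostly task failures) that a single total score would hide.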

Alternative versions of the Multiple Errands Test

Different versions of the MET were developed for use in specific hospitals (MET – Hospital Version and Baycrest MET), a small shopping plaza (MET – Simplified Version), and a virtual reality environment (Virtual MET). For each of these versions, 12 tasks must be performed (e.g. purchasing specific items and collecting specific information) while following several rules.

MET – Hospital Version (MET-HV – Knight, Alderman & Burgess, 2002)

The MET-HV was developed for use with a wider range of participants than the original version by adopting more concrete rules and simpler tasks. Clients are provided with an instruction sheet that explicitly directs them to record designated information. Clients must complete four sets of simple tasks, with a total of 12 separate subtasks:

The client must complete six specific errands (purchase 3 items, use the internal phone, collect an envelope from reception, and send a letter to an external address).
The client must obtain and write down four items of designated information (e.g. the opening time of a shop on Saturday).
The client must meet the assessor outside the hospital reception 20 minutes after the test has begun and state the time.
The client must inform the assessor when he/she finishes the test.

The MET-HV uses 9 rules in order to reduce ambiguity and simplify task demands (Knight et al., 2002). Errors are categorized according to the same definitions as the original MET. The test is preceded by (a) an efficiency question rated using an end-point weighted 10-point Likert scale (“How efficient would you say you were with tasks like shopping, finding out information, and meeting people on time?”); and (b) a familiarity question rated using a 4-point scale (“How well would you say you know the hospital grounds?”). On completion the client answers a question rated using a 10-point scale (“How well do you think you did with the task?”).

MET – Simplified Version (MET-SV – Alderman, Burgess, Knight & Henman, 2003)

The MET-SV includes four sets of simple tasks analogous to those in the original MET; however, the MET-SV incorporates 3 main modifications to the original version:

More concrete rules to enhance task clarity and reduce likelihood of interpretation failures;
Simplification of task demands; and
Space provided on the instruction sheet for the participant to record the information they were required to collect.

The MET-SV has 9 rules that are more explicit than the original MET and are clearly presented on the instruction sheet.

Baycrest MET (BMET – Dawson, Anderson, Burgess, Cooper, Krpan & Stuss, 2009)

The BMET was developed with an identical structure to the MET-HV, except that some items, information and a meeting place are specific to the testing environment (Baycrest Center, Toronto). The BMET comprises 12 items and 8 rules. The test manual provides explicit instructions including collecting test materials, language to be used in describing the test, and a pretest section to ensure participants understand the tasks. Scoring was standardized to allow for increased usability. The score sheet allows identification of specific task errors or omissions, other inefficiencies, rule breaks and strategy use (please contact the authors for further details regarding the manual: ddawson@research.baycrest.org).

Virtual MET (VMET – Rand, Rukan, Weiss & Katz, 2009)

The VMET was developed within the Virtual Mall, a functional video-capture virtual shopping environment that consists of a large supermarket with 9 aisles. The system includes a single camera that films the user and displays his/her image within the virtual environment. The VMET is a complex shopping task that includes the same number of tasks (items to be bought and information to be obtained) as the MET-HV. However, the client is required to check the contents of the shopping cart at a particular time instead of meeting the tester at a certain time. Virtual reality enables the assessor to objectively measure the client’s behaviour in a safe, controlled and ecologically valid environment. It enables repeated learning trials and adaptability of the environment and task according to the client’s needs.

What to consider before beginning:

The MET is performed in a real-world shopping area that allows for minor unpredicted events to occur.

Time:

The BMET takes approximately 60 minutes to administer (Dawson et al., 2009).

Training requirements:

It is advised that the assessor reads the test manual and becomes familiar with the procedures for test administration and scoring.

Equipment:

Access to a shopping precinct or virtual shopping environment
Pen and paper
Instruction sheet (according to version being used)

Client suitability

Can be used with:

The MET has been tested on populations with acquired brain injury including stroke.

Should not be used with:

The MET cannot be administered to patients who are confined to bed.
Participants require sufficient language skills.
Some tasks may need to be adapted depending on the rehabilitation setting.

In what languages is the measure available?

The MET was developed in English.

Summary

What does the tool measure?	The effect of executive function deficits on everyday functioning.
What types of clients can the tool be used for?	The Multiple Errands Test can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	Baycrest MET: approximately 60 minutes (Dawson et al., 2009).
Versions	Multiple Errands Test (MET) (Shallice and Burgess, 1991); MET – Simplified Version (MET-SV) (Alderman et al., 2003); MET – Hospital Version (MET-HV) (Knight, Alderman & Burgess, 2002); Virtual MET (Rand, Rukan, Weiss & Katz, 2009); Baycrest MET (Dawson et al., 2009); Modified version of the MET-SV and MET-HV (including 3 alternate versions) (Novakovic-Agopian et al., 2011, 2012)
Other Languages	N/A
Measurement Properties
Reliability	Internal consistency: One study reported adequate internal consistency of the MET-HV in a sample of patients with chronic acquired brain injury including stroke. Test-retest: No studies have reported on the test-retest reliability of the MET with a population of patients with stroke. Intra-rater: No studies have reported on the intra-rater reliability of the MET with a population of patients with stroke. Inter-rater: – One study reported excellent inter-rater reliability of the MET-HV in a sample of patients with chronic acquired brain injury including stroke. – One study reported adequate to excellent inter-rater reliability of the BMET in a sample of patients with acquired brain injury including stroke.
Validity	Criterion: Concurrent: No studies have reported on the concurrent validity of the MET in a stroke population. Predictive: One study examined predictive validity of the MET-HV with a sample of patients with acquired brain injury including stroke and reported poor to adequate correlations between discharge MET-HV performance and community participation measured by the Mayo-Portland Adaptability Inventory (MPAI-4). Construct: Convergent/Discriminant: – Three studies* examined convergent validity of the MET-HV and reported excellent correlations with the Modified Wisconsin Card Sorting Test (MWCST), Behavioural Assessment of Dysexecutive Syndrome battery (BADS), Dysexecutive questionnaire (DEX), IADL questionnaire and FIM Cognitive score; and an adequate correlation with the Rivermead Behavioural Memory Test (RBMT). – One study* examined convergent validity of the MET-SV and reported adequate correlations with the Wechsler Adult Intelligence Scale – Revised Full Scale IQ (WAIS-R FSIQ), MWCST, BADS and Cognitive Estimates test; and poor to adequate correlations with the DEX. – One study* examined convergent validity of the BMET and reported adequate to excellent correlations with the Sickness Impact Profile and Assessment of Motor and Process Skills. – Three studies* examined convergent validity of the VMET and reported excellent correlations with the MET-HV, BADS, IADL questionnaire, Semantic Fluencies test, Tower of London test, Trail Making Test, Corsi’s supra-span test, Street’s Completion Test and the Test of Attentional Performance. Note: Correlations between the MET and other measures of everyday executive functioning and IADLs used in these studies also provide support for the ecological validity of the MET. Known Groups:* – Two studies reported that the MET-HV is able to differentiate between individuals with acquired brain injury (including stroke) vs. healthy adults, and between healthy older adults vs. healthy younger adults. – One study reported that the MET-SV is able to differentiate between clients with brain injury including stroke vs. healthy adults. – One study reported that the BMET is able to differentiate between clients with stroke vs. healthy adults. – Three studies reported that the VMET is able to differentiate between clients with stroke vs. healthy adults, and between healthy older adults vs. healthy younger adults. Sensitivity/Specificity: – One study reported 85% sensitivity and 95% specificity when using a cut-off score ≥ 7 errors on the MET-HV with clients with chronic acquired brain injury including stroke. – One study reported 82% sensitivity and 95.3% specificity when using a cut-off score ≥ 12 errors on the MET-SV with clients with brain injury including stroke.
Floor/Ceiling Effects	No studies have reported on the floor/ceiling effects of the MET.
Does the tool detect change in patients?	Responsiveness of the MET has not been formally evaluated; however: – One study used a modified version of the MET-HV and MET-SV to measure change following intervention; – One study used the MET-HV and the VMET to detect change in multi-tasking skills of clients with stroke following intervention.
Acceptability	The MET provides functional assessment of executive function as it enables clients to participate in real-world activities.
Feasibility	Administration of the MET requires access to a shopping area and so is not always feasible in a typical clinical setting. Some tasks may need to be adapted depending on the rehabilitation setting. Administration time can be lengthy. Ecological validity is supported.
How to obtain the tool?	The Baycrest MET can be obtained at https://cognitionandeverydaylifelabs.com/multiple-errands-test/

Psychometric Properties

Overview

A literature search was conducted to identify publications on the psychometric properties of the Multiple Errands Test (MET) relevant to a population of patients with stroke. Of the 10 studies reviewed, 8 included a mixed population of patients with acquired brain injury including stroke. Studies have reviewed psychometric properties of the original MET, Hospital Version (MET-HV), Simplified Version (MET-SV), Baycrest MET (BMET) and Virtual MET (VMET), as indicated in the summaries below. While research indicates that the MET demonstrates adequate validity and reliability in populations with acquired brain injury including stroke, further research regarding responsiveness of the measure is warranted.

Floor/Ceiling Effects

No studies have reported on floor/ceiling effects of the MET with a stroke population.

Reliability

Internal consistency:
Knight, Alderman & Burgess (2002) calculated internal consistency of the MET-HV in a sample of 20 patients with chronic acquired brain injury (traumatic brain injury, n=12; stroke, n=5; both TBI and stroke, n=3) and 20 healthy control subjects matched for gender, age and IQ, using Cronbach’s alpha. Internal consistency was adequate (α=0.77).
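
For readers unfamiliar with the statistic, Cronbach’s alpha can be computed from a participants-by-items score table as sketched below. This is a minimal illustration of the standard formula only; the function name and the toy scores are hypothetical, and the sketch does not reproduce the data or the exact item definitions used by Knight et al. (2002).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one row per participant, one column per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 participants and 2 items")
    # Sample variance of each item (column) across participants.
    item_var_sum = sum(variance(col) for col in zip(*scores))
    # Sample variance of each participant's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Made-up scores for 4 participants on 3 items (illustration only).
toy_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 5, 4],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 2))
```

Values closer to 1 indicate that the items vary together across participants, which is the sense in which the reported α=0.77 is interpreted above.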

Inter-rater:
Knight, Alderman & Burgess (2002) calculated inter-rater reliability of the MET-HV error categories in a sample of 20 patients with chronic acquired brain injury (traumatic brain injury, n=12; stroke, n=5; both TBI and stroke, n=3) and 20 healthy control subjects matched for gender, age and IQ, using intraclass correlation coefficients. Participants were scored by 2 assessors. Inter-rater reliability was excellent (ICC ranging from 0.81-1.00). The ‘rule breaks’ error category demonstrated the strongest inter-rater reliability (ICC=1.00).

Dawson, Anderson, Burgess, Cooper, Krpan and Stuss (2009) examined inter-rater reliability of the BMET with clients with stroke (n=14) or traumatic brain injury (n=13) and healthy matched controls (n=25), using intraclass correlation coefficients (ICC) and 2-way random effects models. Participants were scored by two assessors. Inter-rater reliability was adequate to excellent for the five summary measures used: mean number of tasks completed accurately (ICC = 0.80), mean number of rules adhered to (ICC = 0.71), mean number of total errors (ICC = 0.82), mean number of total rules broken (ICC = 0.88) and mean number of requests for help (ICC = 0.71).
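
As an illustration of how an intraclass correlation coefficient from a 2-way random effects model can be obtained, the sketch below computes the single-rater, absolute-agreement form, often written ICC(2,1). That choice of form is an assumption: the studies summarized above do not state here which form was used. The function name and the ratings are hypothetical and are not data from either study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per participant, one column per rater.
    """
    n = len(ratings)      # participants
    k = len(ratings[0])   # raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 participants and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-participant mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up total-error counts from two assessors for 5 participants.
toy_ratings = [
    [3, 3],
    [7, 8],
    [5, 5],
    [10, 9],
    [2, 2],
]
print(round(icc_2_1(toy_ratings), 2))
```

Because this form penalizes systematic differences between raters as well as random disagreement, it is a stricter index than a simple correlation between the two assessors' scores.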

Validity

Content:

Shallice & Burgess (1991) evaluated the MET in a sample of 3 patients with traumatic brain injury (TBI) who demonstrated above-average performance on measures of general ability and normal or near-normal performance on frontal lobe tests, and 9 age- and IQ-matched controls. Participants were monitored by two observers and were scored according to number of errors (inefficiencies, rule breaks, interpretation failures, task failures and total score) and qualitative observation. The patients demonstrated qualitatively and quantitatively impaired performance, particularly relating to rule breaks and inefficiencies. The most difficult subtest was the least sensitive part of the procedure and presented difficulties for both patients and control subjects.

Criterion:

Concurrent:
No studies have reported on the concurrent validity of the MET in a stroke population.

Predictive:
Maier, Krauss & Katz (2011) examined predictive validity of the MET-HV in relation to community participation with a sample of 30 patients with acquired brain injury including stroke (n=19). Community participation was measured using the Mayo-Portland Adaptability Inventory (MPAI-4) Participation Index (M2PI), completed by the participant and a significant other. The MET-HV was administered 1 week prior to discharge from rehabilitation and the M2PI was administered at 3 months post-discharge. Analyses were performed using Pearson correlation analysis and partial correlation controlling for cognitive status using FIM Cognitive scores. As expected, higher MET-HV error scores correlated with more restrictions in community participation. There were adequate correlations between participants’ and significant others’ M2PI total score and MET-HV total error score (r = 0.403, 0.510 respectively), inefficiencies (r = 0.353, 0.524 respectively) and rule breaks (r = 0.361, 0.449 respectively). The ability of the MET-HV total error score to predict the M2PI significant other score remained significant but poor following partial correlation controlling for cognitive status using FIM Cognitive scores (r = 0.212).
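The partial correlation step described above can be illustrated with a minimal sketch. It assumes the standard first-order partial correlation formula; the review does not state which software or exact procedure Maier et al. used. The function names and all scores below are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical scores for illustration only (NOT data from Maier et al., 2011).
met_errors = [12, 9, 15, 7, 11, 14, 6, 10]     # MET-HV total errors
m2pi_score = [20, 14, 25, 10, 19, 21, 12, 15]  # M2PI participation restrictions
fim_cog = [22, 27, 18, 31, 24, 20, 33, 26]     # FIM Cognitive score (covariate)

r_xy = pearson_r(met_errors, m2pi_score)
r_xz = pearson_r(met_errors, fim_cog)
r_yz = pearson_r(m2pi_score, fim_cog)

print("zero-order r:", round(r_xy, 3))
print("partial r controlling for FIM Cognitive:", round(partial_r(r_xy, r_xz, r_yz), 3))
```

When the covariate is related to both measures, the partial correlation is typically smaller than the zero-order correlation, which matches the pattern reported above.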

Construct:

Convergent/Discriminant:
Knight, Alderman & Burgess (2002)* examined convergent validity of the MET-HV by comparison with tests of IQ and cognitive functioning, traditional frontal lobe tests and ecologically sensitive executive function tests, in a sample of 20 patients with chronic acquired brain injury (traumatic brain injury, n=12; stroke, n=5; both TBI and stroke, n=3). Tests of IQ and cognitive functioning included the National Adult Reading Test – Revised Full Scale Intelligence Quotient (NART-R FSIQ), Wechsler Adult Intelligence Scale – Revised Full Scale Intelligence Quotient (WAIS-R FSIQ), Adult Memory and Information Processing Battery (AMIPB), Rivermead Behavioural Memory Test (RBMT) and Visual Objects and Space Perception battery (VOSP). Frontal lobe tests included verbal fluency, the Cognitive Estimates Test (CET), Modified Card Sorting Test (MCST), Tower of London Test (TOLT) and versions of the hand manipulation and hand alternation tests. Ecologically sensitive executive function tests included the Behavioural Assessment of the Dysexecutive Syndrome battery (BADS) and the Test of Everyday Attention (TEA) Map Search and Visual Elevator tasks. The Dysexecutive (DEX) questionnaire was also used, although proxy reports were used rather than self-reports because of the identified lack of insight of individuals with brain injury. There were excellent correlations between the MCST percentage perseverative errors and MET-HV rule breaks (r = 0.66) and MET-HV total errors (r = 0.67) following Bonferroni adjustment. There were excellent correlations between the BADS Profile score and the MET-HV task failures (r = -0.58), interpretation failures (r = 0.64) and total errors (r = -0.57). There was an excellent correlation between the DEX intentionality factor and MET-HV task failures (r = 0.70). In addition, the relationship between the MET-HV and DEX was re-evaluated to control for possible confounding effects; controlling for age, familiarity and memory function with respect to MET-HV task failures resulted in excellent correlations with the DEX total score (r = 0.79) and the DEX inhibition (r = 0.69), intentionality (r = 0.76) and executive memory (r = 0.67) factors. There was an adequate correlation between the RBMT Profile Score and the MET-HV number of task failures (r = -0.57). There were no significant correlations between the MET and other tests of IQ and cognitive functioning (MET-HV, NART-R FSIQ, WAIS-R FSIQ, AMIPB, VOSP), other frontal lobe tests (verbal fluency, CET, TOLT, hand manipulation and hand alternation tests), other ecologically sensitive executive function tests (TEA Map Search and Visual Elevator tasks) or other DEX factors (positive affect, negative affect).
Note: Initial correlations were measured using the Pearson correlation coefficient, and significance levels were subsequently adjusted by Bonferroni adjustment to account for multiple comparisons; results reported indicate significant correlations following Bonferroni adjustment.
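As a rough illustration of the Bonferroni adjustment mentioned in the note, the sketch below divides the significance level by the number of comparisons and flags which p-values remain significant. The function name and the p-values are hypothetical; the review does not report the number of comparisons or the alpha level used by Knight et al.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that stay significant after Bonferroni adjustment.

    Each p-value is compared against alpha divided by the number of tests.
    """
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values for three correlations (illustration only).
p_values = [0.001, 0.02, 0.04]
print(bonferroni_significant(p_values))  # adjusted threshold is 0.05 / 3
```

With three comparisons the adjusted threshold is about 0.0167, so only the first of these hypothetical results would still be reported as significant.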

Rand, Rukan, Weiss & Katz (2009a)* examined convergent validity of the MET-HV by comparison with measures of executive function and IADLs with a sample of 9 patients with subacute or chronic stroke, using Spearman correlation coefficients. Executive function was measured using the BADS Zoo Map test and IADLs were measured using the IADL questionnaire. There were excellent negative correlations between the BADS Zoo Map and MET-HV outcome measures of total number of mistakes (r = -0.93), partial mistakes in completing tasks (r = -0.80), non-efficiency mistakes (r = -0.86) and time to complete the MET (r = -0.79). There were excellent correlations between the IADL questionnaire and the MET-HV number of rule break mistakes (r = 0.80) and total number of mistakes (r = -0.76).

Maier, Krauss & Katz (2011)* examined convergent validity of the MET-HV by comparison with the FIM Cognitive score with a sample of 30 patients with acquired brain injury including stroke (n=19), using Pearson correlation analysis. There was an excellent negative correlation between MET-HV total errors score and FIM Cognitive score (r = -0.67).

Alderman, Burgess, Knight and Henman (2003)* examined convergent validity of the MET-SV by comparison with tests of IQ, executive function and everyday executive abilities with 50 clients with brain injury including stroke (n=9). Neuropsychological tests included the WAIS-R FSIQ, BADS, Cognitive Estimates Test, FAS verbal fluency test, a modified version of the Wisconsin Card Sorting Test (MWCST) and the DEX. There were adequate correlations between MET-SV task failure errors and WAIS-R FSIQ (r = -0.32), MWCST perseverative errors (r = 0.39), BADS profile score (r = -0.46) and the Zoo-Map (r = -0.46) and Six Element Test (r = -0.41) subtests. There were adequate negative correlations between MET-SV social rule breaks and the Cognitive Estimates Test (r = -0.33), and between MET-SV task rule breaks, social rule breaks and total rule breaks and the BADS Action Program subtest (r = -0.42, -0.40, -0.43 respectively). There were poor to adequate negative correlations between the DEX and MET-SV rule breaks (r = -0.30), task failures (r = -0.25) and total errors (r = -0.37).

In a subgroup analysis of individuals with brain injury who passed traditional executive function tests but failed the MET-SV (n=17), there were adequate to excellent correlations between MET-SV inefficiencies and DEX factors of intentionality and negative affect (r = 0.59, -0.76); MET-SV interpretation failures and DEX inhibition and total (r = -0.67, -0.57); MET-SV total and actual rule breaks and DEX inhibition (r = -0.70, 0.66), intentionality (r = 0.60, 0.64) and total (r = -0.57, 0.59); MET-SV social rule breaks and DEX positive and negative affect (r = 0.79, -0.59); MET-SV task failures and DEX inhibition and positive affect (r = -0.58, -0.52); and MET-SV total errors and DEX intentionality (r = 0.67).

Dawson et al. (2009)* examined convergent validity of the BMET by comparison with other measures of IADL and everyday function with 14 clients with stroke, using Pearson correlation. Other measures included the DEX (significant other report), Stroke Impact Profile (SIP), Assessment of Motor and Process Skills (AMPS) and Mayo Portland Adaptability Inventory (MPAI) (significant other report). There were excellent correlations between the BMET number of rules broken and the SIP – Physical (r = 0.78) and Affective behavior (r = 0.64) scores and the AMPS motor score (r = -0.75). There was an adequate correlation between the BMET time to completion and SIP physical score (r = 0.54).

Rand et al. (2009a)* examined convergent validity of the VMET by comparison with the BADS Zoo Map test and IADL questionnaire with the same sample of 9 patients with subacute or chronic stroke, using Spearman correlation coefficients. There was an excellent negative correlation between the BADS Zoo Map and VMET outcome measure of non-efficiency mistakes (r = -0.87), and between the IADL and VMET total number of mistakes (r = -0.82).

Rand et al. (2009a) also examined the relationships between the scores of the VMET and those of the MET-HV using Spearman and Pearson correlation coefficients. Among patients with stroke, there were excellent correlations between MET-HV and VMET outcomes for the total number of mistakes (r = 0.70), partial mistakes in completing tasks (r = 0.88) and non-efficiency mistakes (r = 0.73). Analysis of the whole population indicated adequate to excellent correlations between MET-HV and VMET outcomes for the total number of mistakes (r = 0.77), complete mistakes of completing a task (r = 0.63), partial mistakes in completing tasks (r = 0.80), non-efficiency mistakes (r = 0.72) and use of strategies (r = 0.44), but not for rule break mistakes.

Raspelli et al. (2010) examined convergent validity of the VMET by comparison with neuropsychological tests, with 6 clients with stroke and 14 healthy subjects. VMET outcome measures included time, searched item in the correct area, sustained attention, maintained sequence and no perseveration. Neuropsychological tests included the Trail Making Test, Corsi spatial memory supra-span test, Street’s Completion Test, Semantic Fluencies and Tower of London test. There were excellent correlations between the VMET variable ‘time’ and the Semantic Fluencies test (r = -0.87) and the Tower of London test (r = -0.82); between the VMET variable ‘searched item in the correct area’ and the Trail Making Test (r = 0.96); and between the VMET variables ‘sustained attention’, ‘maintained sequence’ and ‘no perseveration’ and Corsi’s supra-span test (r = 0.84) and Street’s Completion Test (r = -0.86).

Raspelli et al. (2012) examined convergent validity of the VMET by comparison with the Test of Attentional Performance (TEA) with 9 clients with stroke. VMET outcome measures included time, errors, inefficiencies, rule breaks, strategies, interpretation failures and partial-task failures. Authors reported excellent correlations between the VMET outcomes time, inefficiencies and total errors and TEA tests (range r = -0.67 to 0.81).
Note: Other neuropsychological tests were administered but correlations are not reported (Mini Mental Status Examination (MMSE), Beck Depression Inventory (BDI), State and Trait Anxiety Index (STAI), Behavioural Inattention Test (BIT) – Star Cancellation Test, Brief Neuropsychological Examination (ENB) – Token Test, Street’s Completion Test, Stroop Colour-Word Test, Iowa Gambling Task, DEX and ADL/IADL Tests).
*Note: The correlations between the MET and other measures of everyday executive functioning and IADLs also provide support for the ecological validity of the MET (as reported by the authors of these articles).

Known Group:
Knight, Alderman & Burgess (2002) examined known-group validity of the MET-HV in a sample of 20 patients with chronic acquired brain injury (traumatic brain injury, n=12; stroke, n=5; both TBI and stroke, n=3) and 20 healthy control subjects (hospital staff members) matched for gender, age and IQ*. Clients with brain injury made significantly more rule breaks (p=0.002) and total errors (p<0.001), and achieved significantly fewer tasks (p<0.001) than control subjects. Clients with brain injury used significantly more strategies such as looking at a map (p=0.008) and reading signs (p=0.006), although use of strategies had little effect on test performance. The test was able to discriminate between individuals with acquired brain injury and healthy controls.
*Note: IQ was measured using the National Adult Reading Test – Revised Full Scale Intelligence Quotient (NART-R FSIQ).

Rand et al. (2009a) examined known group validity of the MET-HV with 9 patients with subacute or chronic stroke, 20 healthy young adults and 20 healthy older adults, using the Kruskal-Wallis H test. Patients with stroke made more mistakes than older adults on MET-HV outcomes of total mistakes, mistakes in completing tasks, partial mistakes in completing tasks and non-efficiency mistakes, but not rule break mistakes or use of strategies mistakes. Older adults made more mistakes than younger adults on MET-HV outcomes of total mistakes, partial mistakes in completing tasks and non-efficiency mistakes, but not mistakes in completing tasks, rule break mistakes or use of strategies mistakes.
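For readers unfamiliar with the Kruskal-Wallis H test used for these three-group comparisons, the following is a minimal sketch of how the H statistic is computed from ranks. It omits the correction for ties, the function names are illustrative, and the scores are hypothetical rather than taken from Rand et al.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    pooled = [v for group in groups for v in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


# Hypothetical total-mistake counts for three groups (illustration only).
young = [1, 2, 3]
older = [4, 5, 6]
stroke = [7, 8, 9]
print(round(kruskal_wallis_h(young, older, stroke), 2))
```

H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom; a significant result is usually followed by pairwise comparisons to identify which groups differ.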

Alderman et al. (2003) examined known group validity of the MET-SV with 46 individuals with no history of neurological disease (hospital staff members) and 50 clients with brain injury including stroke (n=9), using a series of t-tests. Clients with brain injury made significantly more rule breaks (t = 4.03), task failures (t = 10.10), total errors (t = 7.18), and social rule breaks (chi-square = 4.3) than individuals with no history of neurological disease. Results regarding errors were preserved when group comparisons were repeated using age, familiarity and cognitive ability (measured by the NART-R FSIQ) as covariates (F = 11.79, 40.82, 27.92 respectively). There was a significant difference in task failures between groups after covarying for age, IQ (measured by the WAIS-R FSIQ) and familiarity with the shopping centre (F = 11.57). Clients with brain injury made approximately three times as many errors as healthy individuals. For both groups, rule breaks and task failures were the most common errors.

Dawson et al. (2009) examined known group validity of the BMET with 14 clients with stroke and 13 healthy matched controls, using a series of t-tests. Clients with stroke performed significantly worse on number of tasks completed accurately (d = 0.84, p<0.05), rule breaks (d = 0.92, p<0.05) and total failures (d = 1.05, p<0.01). The proportion of group members who completed fewer than 40% of tasks (< 5) satisfactorily was also significantly different between the two groups (28% of clients with stroke vs. 0% of healthy matched controls, p<0.05).
Note: d is the effect size; effect sizes ≥ 0.7 are considered large.
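The effect size d in the note can be illustrated with a short sketch. It assumes d is Cohen's d computed with a pooled standard deviation, which the review does not confirm for Dawson et al.; the function name and the scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # variance() is the sample variance (n - 1 in the denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical rule-break counts (illustration only, not BMET study data).
stroke_group = [2, 4, 6]
control_group = [1, 3, 5]
print(cohens_d(stroke_group, control_group))
```

Under the threshold given in the note, a value of 0.7 or more would be read as a large difference between groups.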

Rand et al. (2009a) examined known group validity of the VMET with a sample of 9 patients with subacute or chronic stroke, 20 healthy young adults and 20 healthy older adults, using the Kruskal-Wallis H test. Patients with stroke made more mistakes than older adults on all VMET outcomes except for rule break mistakes. Older adults made more mistakes than young adults on all VMET outcomes except for the use of strategies mistakes.

Raspelli et al. (2010) examined known group validity of the VMET with 6 clients with stroke and 14 healthy subjects. There were significant differences between groups in time taken to execute the task (higher for healthy subjects) and in the partial error ‘Maintained task objective to completion’.

Raspelli et al. (2012) examined known group validity of the VMET with 9 clients with stroke, 10 healthy young adults and 10 healthy older adults, using Kruskal-Wallis procedures. Results showed that clients with stroke scored lower in VMET time and errors than older adults, and that older adults scored lower in VMET time and errors than young adults.

Sensitivity / Specificity:
Knight, Alderman & Burgess (2002) investigated sensitivity and specificity of the MET-HV in a sample of 20 patients with chronic acquired brain injury (traumatic brain injury, n=12; stroke, n=5; both TBI and stroke, n=3) and 20 healthy control subjects matched for gender, age and IQ*. A cut-off score ≥ 7 errors (i.e. 5th percentile of total errors of control subjects) resulted in correct identification of 85% of participants with acquired brain injury (85% sensitivity, 95% specificity).
*Note: IQ was measured using the National Adult Reading Test – Revised Full Scale Intelligence Quotient (NART-R FSIQ).
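
To make the cut-off logic concrete, here is a minimal sketch of how sensitivity and specificity follow from an error-count cut-off. The function name and the error counts are hypothetical; they were chosen only so that 17 of 20 patients fall at or above the cut-off of 7 errors and 19 of 20 controls fall below it, which matches the 85% and 95% figures reported above. They are not the study data.

```python
def sensitivity_specificity(patient_errors, control_errors, cutoff):
    """Classify a score >= cutoff as 'impaired' and return (sensitivity, specificity)."""
    true_positives = sum(1 for e in patient_errors if e >= cutoff)
    true_negatives = sum(1 for e in control_errors if e < cutoff)
    sensitivity = true_positives / len(patient_errors)
    specificity = true_negatives / len(control_errors)
    return sensitivity, specificity


# Hypothetical total error counts (illustration only, not the published data).
patients = [9, 12, 7, 8, 15, 10, 7, 11, 13, 8, 9, 14, 7, 10, 12, 8, 9, 3, 5, 6]
controls = [2, 3, 1, 4, 5, 2, 3, 6, 1, 2, 4, 3, 5, 2, 1, 3, 4, 2, 6, 8]

sens, spec = sensitivity_specificity(patients, controls, cutoff=7)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Raising the cut-off in this sketch would trade sensitivity for specificity, which is why the choice of cut-off (here the 5th percentile of controls) matters when comparing versions of the test.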

Alderman et al. (2003) reported on sensitivity and specificity of the MET-SV with 46 individuals with no history of neurological disease and 50 clients with brain injury including stroke (n=9). Using a cutoff score ≥ 12 errors (i.e. 5th percentile of controls) resulted in 44% sensitivity (i.e. correct classification of clients with brain injury) and 95.3% specificity (i.e. correct classification of healthy individuals). The authors caution that deriving a single measure based only on number of errors fails to consider between-group qualitative differences in performance. Accordingly, error scores were recalculated to reflect “normality” of the error type, with weighting of errors according to prevalence in the healthy control group (acceptable errors seen in up to 95% of healthy controls = 1; errors demonstrated by ≥ 5% of healthy controls = 2; errors unique to the patient group = 3). Using a cutoff score ≥ 12 errors (5th percentile of controls) resulted in 82% sensitivity and 95.3% specificity. The MET-SV was more sensitive than traditional tests of executive function (Cognitive Estimates, FAS Verbal Fluency, MWCST), and MET-SV error category scores were highly predictive of ratings of executive symptoms of patients who passed traditional executive function tests but failed the MET-SV shopping task.

Responsiveness

Two studies used the MET (MET-HV, VMET and modified version of the MET-HV & MET-SV) to measure change following intervention.

Novakovic-Agopian et al. (2011) developed a modified version of the MET-HV and MET-SV to be used in local hospital settings. They developed 3 alternate forms that were used in a pilot study examining the effect of goal-oriented attentional self-regulation training with a sample of 16 patients with chronic brain injury including stroke or cerebral hemorrhage (n=3). A pseudo-random crossover design was used. During the first 5 weeks, one group (Group A) completed goal-oriented attentional self-regulation training while the other group (Group B) only received a 2-hour educational instructional session. In the subsequent phase, conditions were switched such that participants in Group B received goals training for 5 weeks while those in Group A received educational instruction. At week 5 the group that received goal training first demonstrated a significant reduction in task failures (p<0.01), whereas the group that received the educational session demonstrated no significant improvement in MET scores. From week 5 to week 10 there were no significant changes in MET scores in either group.

Rand, Weiss and Katz (2009b) used the MET-HV and VMET to detect change in multi-tasking skills of 4 clients with subacute stroke following virtual reality intervention using the VMall virtual supermarket. Clients demonstrated improved performance on both measures following 3 weeks of multi-tasking training using the VMall virtual supermarket.

References

Alderman, N., Burgess, P.W., Knight, C., & Henman, C. (2003). Ecological validity of a simplified version of the multiple errands shopping test. Journal of the International Neuropsychological Society, 9, 31-44.
Dawson, D.R., Anderson, N.D., Burgess, P., Cooper, E., Krpan, K.M., & Stuss, D.T. (2009). Further development of the Multiple Errands Test: Standardized scoring, reliability, and ecological validity for the Baycrest version. Archives of Physical Medicine and Rehabilitation, 90, S41-51.
Knight, C., Alderman, N., & Burgess, P.W. (2002). Development of a simplified version of the Multiple Errands Test for use in hospital settings. Neuropsychological Rehabilitation, 12(3), 231-255.
Maier, A., Krauss, S., & Katz, N. (2011). Ecological validity of the Multiple Errands Test (MET) on discharge from neurorehabilitation hospital. Occupational Therapy Journal of Research: Occupation, Participation and Health, 31(1) S38-46.
Novakovic-Agopian, T., Chen, A.J.W., Rome, S., Abrams, G., Castelli, H., Rossi, A., McKim, R., Hills, N., & D’Esposito, M. (2011). Rehabilitation of executive functioning with training in attention regulation applied to individually defined goals: A pilot study bridging theory, assessment, and treatment. The Journal of Head Trauma Rehabilitation, 26(5), 325-338.
Novakovic-Agopian, T., Chen, A. J., Rome, S., Rossi, A., Abrams, G., D’Esposito, M., Turner, G., McKim, R., Muir, J., Hills, N., Kennedy, C., Garfinkle, J., Murphy, M., Binder, D., & Castelli, H. (2012). Assessment of Subcomponents of Executive Functioning in Ecologically Valid Settings: The Goal Processing Scale. The Journal of Head Trauma Rehabilitation, 2012 Oct 16. [Epub ahead of print]
Rand, D., Rukan, S., Weiss, P.L., & Katz, N. (2009a). Validation of the Virtual MET as an assessment tool for executive functions. Neuropsychological Rehabilitation, 19(4), 583-602.
Rand, D., Weiss, P., & Katz, N. (2009b). Training multitasking in a virtual supermarket: A novel intervention after stroke. American Journal of Occupational Therapy, 63, 535-542.
Raspelli, S., Carelli, L., Morganti, F., Poletti, B., Corra, B., Silani, V., & Riva, G. (2010). Implementation of the Multiple Errands Test in a NeuroVR-supermarket: A possible approach. Studies in Health Technology and Informatics, 154, 115-119.
Raspelli, S., Pallavicini, F., Carelli, L., Morganti, F., Pedroli, E., Cipresso, P., Poletti, B., Corra, B., Sangalli, D., Silani, V., & Riva, G. (2012). Validating the Neuro VR-based virtual version of the Multiple Errands Test: Preliminary results. Presence, 21(1), 31-42.
Shallice, T. & Burgess, P.W. (1991). Deficits in strategy application following frontal lobe damage in man. Brain, 114, 727-741.

See the measure

How to obtain the Multiple Errands Test?

See the papers below for test instructions of the Simplified Version (MET-SV) and the Hospital Version (MET-HV):

Alderman, N., Burgess, P.W., Knight, C., & Henman, C. (2003). Ecological validity of a simplified version of the multiple errands shopping test. Journal of the International Neuropsychological Society, 9, 31-44.
Knight, C., Alderman, N., & Burgess, P.W. (2002). Development of a simplified version of the Multiple Errands Test for use in hospital settings. Neuropsychological Rehabilitation, 12(3), 231-255.

The Baycrest MET can be obtained at https://cognitionandeverydaylifelabs.com/multiple-errands-test/

Reintegration to Normal Living Index (RNLI)

Evidence Reviewed as of before: 19-08-2008

Author(s)*: Elissa Sitcoff, BA BSc

Editor(s): Nicol Korner-Bitensky, PhD OT; Lisa Zeltzer, MSc OT

Purpose

The Reintegration to Normal Living Index (RNLI) was developed to assess, quantitatively, the degree to which individuals who have experienced traumatic or incapacitating illness achieve reintegration into normal social activities (e.g. recreation, movement in the community, and interaction in family or other relationships). Reintegration to normal living was defined by the scale authors as the “reorganization of physical, psychological, and social characteristics of an individual into a harmonious whole so that one can resume well-adjusted living after incapacitating illness or trauma” (Wood-Dauphinee & Williams, 1987).

The RNLI has been tested for use with individuals with stroke, malignant tumors, degenerative heart disease, central nervous system disorders, arthritis, fractures and amputations; spinal cord injury; traumatic brain injury; rheumatoid arthritis; subarachnoid hemorrhage; hip fracture; physical disability; and community-dwelling elderly.

In-Depth Review

Purpose of the measure

The RNLI has been tested for use with individuals with stroke, malignant tumors, degenerative heart disease, central nervous system disorders, arthritis, fractures and amputations; spinal cord injury; traumatic brain injury; rheumatoid arthritis; subarachnoid hemorrhage; hip fracture; physical disability; and community-dwelling elderly.

Available versions

The RNLI was developed by Wood-Dauphinee, Opzoomer, Williams, Marchand, and Spitzer in 1988.

Features of the measure

Items:

The RNLI is made up of 11 declarative statements (e.g. I move around my living quarters as I feel necessary), covering the following domains: indoor, community, and distance mobility; self-care; daily activity (work and school); recreational and social activities; family role(s); personal relationships; presentation of self to others and general coping skills. The first 8 items represent ‘daily functioning’ and the remaining 3 items represent ‘perception of self’.

Scoring:

Each domain is accompanied by a visual analogue scale (VAS) (0 to 10 cm). The VAS is anchored by the statements “does not describe my situation” (1 or minimal integration) and “fully describes my situation” (10 or complete integration). Individual item scores are summed to provide a total score out of 110 points that is proportionally converted to create a score out of 100.
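
The conversion described above can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and it assumes each of the 11 items is recorded as a value between 0 and 10 read off the 10 cm VAS, so that the raw total has a maximum of 110.

```python
def rnli_adjusted_score(item_scores):
    """Sum 11 VAS item scores (max 110) and rescale proportionally to 0-100."""
    if len(item_scores) != 11:
        raise ValueError("The RNLI has 11 items")
    if any(not 0 <= score <= 10 for score in item_scores):
        raise ValueError("Each VAS item is assumed to lie between 0 and 10")
    raw_total = sum(item_scores)  # out of 110
    return raw_total / 110 * 100  # proportional conversion to a score out of 100


# Hypothetical respondent who marks the midpoint of every scale.
print(rnli_adjusted_score([5] * 11))
```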

Three- and 4-point categorical scoring systems were also developed (Wood-Dauphinee, Opzoomer, Williams, Marchand, and Spitzer, 1988), and the 3-point categorical system has been used in the evaluation of stroke patients (Mayo et al., 2000; Mayo et al., 2002). In the 3-point system, an additional category is inserted between the two anchor points (“partially describes my situation”) and the respondent selects the most applicable of the three categories. This option yields total scale scores from 0 to 22, with higher scores indicating poorer reintegration (Mayo et al., 2000; Mayo et al., 2002).
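
One way the 3-point system could produce totals between 0 and 22 is sketched below. The point values are an assumption inferred from the stated range and direction (11 items, higher total meaning poorer reintegration); the source does not give the item coding, and the dictionary keys and function name are hypothetical.

```python
# Assumed coding (not stated in the source): 0 = fully describes my situation,
# 1 = partially describes my situation, 2 = does not describe my situation.
CATEGORY_POINTS = {"fully": 0, "partially": 1, "does_not": 2}


def rnli_categorical_total(responses):
    """Total score on the 3-point system; higher totals mean poorer reintegration."""
    if len(responses) != 11:
        raise ValueError("The RNLI has 11 items")
    return sum(CATEGORY_POINTS[response] for response in responses)


# Hypothetical respondent: 8 items fully, 2 partially, 1 not at all.
example = ["fully"] * 8 + ["partially"] * 2 + ["does_not"]
print(rnli_categorical_total(example))
```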

Time:

The time to administer depends on the mode of administration (e.g. self-administration, interviewer-administration, proxy, postal, etc.) and the participant’s abilities, but typically takes less than 10 minutes to complete.

Subscales:

There are two subscales to the RNLI: Daily Functioning (indoor, community, and distance mobility; self-care; daily activity (work and school); recreational and social activities; general coping skills) and Perception of Self (family role(s); personal relationships; and presentation of self to others).

Equipment:

Only the test and a pencil are required to complete the RNLI.

Training:

The RNLI requires no training to administer.

Alternative forms of the Reintegration to Normal Living Index

Reintegration to Normal Living Index – Postal Version (RNLI-P) was developed by Daneski, Coshall, Tilling, and Wolfe in 2003.
This measure modified the phrasing and scoring of the original RNLI for postal use with stroke patients. The RNLI-P uses an agree/disagree format (0 = disagree, 1 = agree).
There are also versions created with minor modifications in wording to the original RNLI for: individuals who use adaptive devices, motor aids or human assistance, where the use of equipment and resources is clarified; use by health care professionals; and use by significant others (Wood-Dauphinee, Opzoomer, Williams, Marchand, and Spitzer, 1988).

Client suitability

Can be used with:

Patients with stroke.

Should not be used with:

The use of a visual analogue scale may not be appropriate for the assessment of some stroke patients (i.e. those with attentional deficits or visual impairments or difficulty comprehending the meaning of a VAS). Instead, the use of the 3- or 4-point categorical scoring system is recommended.

In what languages is the measure available?

The RNLI is available in Canadian English and Canadian French (Wood-Dauphinee & Williams, 1987; Wood-Dauphinee, Opzoomer, Williams, Marchand, & Spitzer, 1988).

Please click here to access the French language version. A research version is also available.

Summary

What does the tool measure?	The degree to which individuals who have experienced traumatic or incapacitating illness achieve reintegration into normal social activities.
What types of clients can the tool be used for?	The RNLI has been tested for use with individuals with stroke, malignant tumors, degenerative heart disease, central nervous system disorders, arthritis, fractures and amputations; spinal cord injury; traumatic brain injury; rheumatoid arthritis; subarachnoid hemorrhage; hip fracture; physical disability; and community-dwelling elderly.
Is this a screening or assessment tool?	Assessment
Time to administer	The amount of time it takes to administer the RNLI is dependent upon mode of administration and participant’s abilities but should take approximately 10 minutes.
Versions	Reintegration to Normal Living Index (RNLI); Reintegration to Normal Living Index – Postal Version (RNLI-P). There are also versions created with minor modifications in wording to the original RNLI for: individuals who use adaptive devices, motor aids or human assistance, where the use of equipment and resources is clarified; use by health care professionals; and use by significant others. The original RNLI is made up of 11 declarative statements. Three- and 4-point categorical scoring systems are also available.
Other Languages	Canadian French (Please click here to access the French language version. A research version is also available.)
Measurement Properties
Reliability	Internal consistency: Six studies have examined the internal consistency of the RNLI. Four reported excellent Cronbach’s alphas.
One reported excellent Cronbach’s alphas for the total RNLI patient and significant other score as well as for the patient score on the Perception of Self subscale, and adequate Cronbach’s alphas for the Daily Functioning subscale for both patient and significant other score, as well as for the significant other score on the Perception of Self subscale. One study reported adequate to excellent Cronbach’s alphas.
Test-retest: Three studies have examined the test-retest reliability of the RNLI and reported adequate test-retest agreement between items using kappa statistics, and excellent test-retest reliability on the global score using correlation coefficients. Intra-rater: No studies have examined the intra-rater reliability of the RNLI. Inter-rater: No studies have examined the inter-rater reliability of the RNLI.
Validity	Construct: Convergent/Discriminant:
– Excellent correlations between the total score of the RNLI-P and the Frenchay Activities Index (FAI), the Short Form 36 Health Survey (SF-36) and the Hospital Anxiety and Depression Scale-Depression subscale (HADS). Excellent correlations between the Daily Functioning subscale of the RNLI-P and the FAI and the SF-36. Poor correlations between the RNLI-P Daily Functioning subscale and the HADS-Anxiety subscale as well as between the Perception of Self subscale and both the FAI and the Barthel Index.
– Excellent correlation between the RNLI and the Quality of Life Index (QL) and with a measure of psychological wellbeing. Excellent correlation between the Daily Functioning subscale and the Quality of Life Index items Activity and Daily Living. Adequate correlations between the Perception of Self subscale and the Support and Outlook items from the Quality of Life Index. Strong correlation between the RNLI and the Participation Survey/Mobility (PARTS/M). A positive relationship between the Health Options Scale and the RNLI for stroke survivors, as well as a positive relationship between the Herth Hope Index and the RNLI for both stroke survivors and their spouses.
– Adequate correlation between the RNLI and items on the subscale related to physical performance of the Prosthetic Profile of the Amputee (PPA), with the exception of the item “active use of the prosthesis indoors”, which was poor. No correlation between items of the Perception of Self subscale of the RNLI and items on the subscale related to physical performance of the PPA, with the exception of prosthetic wear, which was adequate. Adequate to excellent correlations between items of the total RNLI with items in the subscaleMany measurement instruments are multidimensional and are designed to measure more than one construct or more than one domain of a single construct. In such instances subscales can be constructed in which the various items from a scale are grouped into subscales. 
Although a subscale could consist of a single item, in most cases subscales consist of multiple individual items that have been combined into a composite score (National Multiple Sclerosis Society). related to Physical performance of the PPA with the exception of items “Active use indoors” and “Active use outdoors” which had non-significant correlations. – Poor to adequate correlations between items of the total RNLI, and its two subscales with items on the PPA subscaleMany measurement instruments are multidimensional and are designed to measure more than one construct or more than one domain of a single construct. In such instances subscales can be constructed in which the various items from a scale are grouped into subscales. Although a subscale could consist of a single item, in most cases subscales consist of multiple individual items that have been combined into a composite score (National Multiple Sclerosis Society). related to acceptance of amputation and prosthesis. – Significant correlations between the RNLI and the Functional Independence Measure (FIM). – Adequate to excellent correlations between scores the total RNLI and the Daily Functioning subscaleMany measurement instruments are multidimensional and are designed to measure more than one construct or more than one domain of a single construct. In such instances subscales can be constructed in which the various items from a scale are grouped into subscales. Although a subscale could consist of a single item, in most cases subscales consist of multiple individual items that have been combined into a composite score (National Multiple Sclerosis Society). with patient (with Rheumatoid arthritis) age, number of affected joints, the Functional Independence Measure (FIM), the Lee Index (pain, fatigue, and stiffness), and the American Rheumatism Association Classification. The total RNLI was also adequately correlated to disease duration.
Acceptability	The use of the 3- or 4-point categorical scoring system may be more appropriate for the assessment of some stroke patients than the visual analogue scale.
Feasibility	The RNLI is quick and simple to administer and requires no training. The RNLI is made up of 11 declarative statements representing the domains ‘daily functioning’ (indoor, community, and distance mobility; self-care; daily activity (work and school); recreational and social activities; family role(s); personal relationships) and ‘perception of self’ (presentation of self to others; general coping skills). Each domain is accompanied by a visual analogue scale (VAS) (0 to 10 cm). The VAS is anchored by the statements “does not describe my situation” (1 or minimal integration) and “fully describes my situation” (10 or complete integration). Individual item scores are summed to provide a total score out of 110 points that is proportionally converted to create a score out of 100.
How to obtain the tool?	The RNLI is available by clicking here.
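
The scoring arithmetic described under Feasibility can be sketched in a few lines of Python. This is a minimal illustration rather than an official scoring tool: the function name `rnli_adjusted_score` and the input checks are assumptions, and each item is taken to have a maximum of 10 points, as implied by the 110-point total.

```python
def rnli_adjusted_score(item_scores):
    """Convert 11 RNLI item scores into a total score out of 100.

    Hypothetical helper: the name and the validation rules are
    illustrative assumptions, not part of the published instrument.
    """
    if len(item_scores) != 11:
        raise ValueError("the RNLI has 11 items")
    if any(score < 0 or score > 10 for score in item_scores):
        raise ValueError("each item score must lie between 0 and 10")
    raw_total = sum(item_scores)   # raw total out of 110
    return raw_total * 100 / 110   # proportional conversion to 100


# A respondent who marks 7 on every item has a raw total of 77/110.
print(rnli_adjusted_score([7] * 11))  # 70.0
```

Multiplying before dividing keeps the conversion exact for whole-number totals such as 77 or 55.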

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Reintegration to Normal Living Index (RNLI).

Floor/Ceiling Effects

Not yet examined.

Reliability

Internal consistency:
Wood-Dauphinee, Opzoomer, Williams, Marchand, and Spitzer (1988) administered the RNLI to three samples of patients with varied diagnoses to determine internal consistency. The RNLI was completed by patients, significant others, and health care professionals. The Cronbach’s alphas were excellent for patients, significant others, and health care professionals (alpha = 0.90, 0.92, and 0.95, respectively). Corrected item to total correlations ranged from 0.39 (patient assessment of “comfort with self-care needs”) to 0.75 for patients, 0.61 to 0.87 for significant others, and 0.70 to 0.90 for health professionals.

Tooth, McKenna, Smith, and O’Rourke (2003) administered the RNLI to 57 pairs of patients and significant others six months after stroke rehabilitation. Cronbach’s alphas were excellent for the total RNLI patient and significant other scores (alpha = 0.80 and 0.81, respectively). For the Daily Functioning subscale, adequate Cronbach’s alphas were found for both patient and significant other scores (alpha = 0.71 and 0.73, respectively). For the Perception of Self subscale, Cronbach’s alpha was excellent for patient scores (alpha = 0.84) and adequate for significant other scores (alpha = 0.76).

Steiner et al. (1996) examined the internal consistency of the RNLI in two samples of community-dwelling persons aged 75 and over (n=414, n=50). Cronbach’s alphas were adequate (0.76) to excellent (0.83).

Daneski, Coshall, Tilling, and Wolfe (2002) examined the internal consistency of a postal version of the RNLI (the RNLI-P) administered to 76 patients with stroke (at one year). The Cronbach’s alpha was excellent (0.84).

Stark, Edwards, Hollingsworth, and Gray (2005) administered the RNLI to 604 people between the ages of 18 and 80 years who had a mobility limitation (including patients with spinal cord injury, Multiple Sclerosis, stroke, cerebral palsy, and polio), lived in the community, and had been discharged from rehabilitation for at least 1 year. The Cronbach’s alpha for this sample was excellent (0.91).

Bluvol and Ford-Gilboe (2004) administered the RNLI to both spouses in 40 families in which one of the partners had experienced a stroke with moderate to severe functional impairments (6 months to 5 years post-stroke). The internal consistency of the measure was excellent for both the partners with stroke (alpha = 0.92) and their spouses (alpha = 0.85).
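
For readers unfamiliar with the statistic reported in the studies above, the following is a minimal sketch of how Cronbach’s alpha is commonly computed from item-level scores. It uses the standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores); the function name `cronbach_alpha` and the example data are hypothetical and are not taken from any of the cited studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item-score lists.

    `responses` holds one list per respondent, each with one score per item.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 respondents answering 3 items.
example = [[8, 7, 9], [5, 6, 5], [9, 9, 8], [3, 4, 4]]
print(round(cronbach_alpha(example), 2))
```

The sketch assumes complete data and at least some variation in total scores; if every respondent had the same total, the denominator would be zero.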

Test-retest:
Steiner et al. (1996) examined the test-retest reliability of the RNLI in 50 community-dwelling persons aged 75 and over interviewed twice, by the same interviewer, with 7 to 14 days between interviews. Test-retest reliability for the total sample of community-dwelling elderly was excellent (r = 0.83). When examined by age group, correlations were excellent for the 75 to 79 age group (r = 0.82), the 80 to 84 age group (r = 0.93), and the 85+ age group (r = 0.76).

Daneski, Coshall, Tilling and Wolfe (2002) examined the test-retest reliability of a postal version of the RNLI (the RNLI-P) in 26 patients with stroke (3-12 months post-stroke) who completed the test twice within a 2-week interval. All 11 items demonstrated agreement between the two occasions above that expected by chance. Kappa values indicated poor to excellent agreement (kappa = 0.38 for the item “embarrassed when with others”, to 0.92 for the item “getting around outside”).

Korner-Bitensky, Wood-Dauphinee, Siemiatycki, Shapiro, and Becker (1994) examined the test-retest reliability of the RNLI in 366 patients with a diagnosis of stroke or orthopedic condition discharged from a rehabilitation hospital. The test was administered twice: once by face-to-face interview and once by a structured telephone interview to either a self or proxy respondent. The intraclass correlation coefficient (ICC) for the RNLI was 0.80, indicating excellent agreement between the two modes of interview. However, for the self-respondents, poor community reintegration was reported more often during the home interview than during the telephone interview.
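
Item-level agreement statistics such as the kappa values reported by Daneski et al. above correct the observed agreement between two occasions for the agreement expected by chance. The sketch below shows the unweighted form of Cohen’s kappa; whether the cited study used a weighted or unweighted variant is not stated here, so this is an assumption, and the function name `cohen_kappa` and the example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two sets of categorical ratings.

    `first` and `second` hold the category chosen by each respondent
    on the first and second administration, in the same order.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    # Proportion of respondents giving the same category both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal category counts.
    count_1, count_2 = Counter(first), Counter(second)
    expected = sum(count_1[c] * count_2[c] for c in count_1) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical item: 4 respondents, categories 0/1, one changed answer.
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

The sketch is undefined when every rating on both occasions falls in a single category, because chance agreement is then 1.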

Type of rater:
Korner-Bitensky, Wood-Dauphinee, Shapiro, and Becker (1994) analyzed the reliability of RNLI scores of 366 participants (with stroke or an orthopedic condition, post discharge from a rehabilitation hospital) who completed both a home interview (conducted by a health professional only) and a telephone interview (conducted by either a lay person or a health professional). Results revealed no significant differences in kappa scores between patients interviewed by lay interviewers and those interviewed by health professionals. When a dichotomized score of 40 was used (0-40 = no disability; >40 = disability), the group interviewed by phone by a layperson was significantly more likely to report difficulties in community reintegration than when interviewed face-to-face.

Wood-Dauphinee, Opzoomer, Williams, Marchand, and Spitzer (1988) analyzed the reliability of RNLI scores between patients and relatives and between patients and health professionals. Using Pearson’s correlation coefficient to measure reliability, they reported adequate significant other to patient correlations of r = 0.62 and r = 0.65 in two different patient/significant other samples. They also reported poor to adequate health professional to patient correlations of r = 0.39 and r = 0.43. Based on these results, the authors stated that patients or significant others could complete the RNLI but that the use of health professionals as proxies should be avoided.

Trombly, Radomski, and Davis (1998) administered the RNLI to 16 adults with traumatic brain injury and their significant others. At admission to a treatment program, patients’ and proxies’ scores did not differ significantly; however, at discharge and follow-up, they differed significantly.

Tooth, McKenna, Smith, and O’Rourke (2003) examined patient-proxy reliability of RNLI scores in 57 subacute patients paired with a significant other 6 months post stroke rehabilitation. Intraclass correlation coefficients were poor for the total RNLI score (0.36) and the Daily Functioning subscale (0.24). Adequate reliability was found for the Perception of Self subscale (0.55).

Validity

Content:

The RNLI was developed based on literature reviews, incorporation of experiences of investigators, and open- and closed-ended questionnaires given to patients with myocardial infarction, cancer, and other chronic diseases; health professionals (physicians, social workers, physical and occupational therapists, psychologists); significant others of patients; and clergy and other lay people.

Construct:

Convergent/Discriminant:
Daneski, Coshall, Tilling and Wolfe (2003) examined the construct validity of a postal version of the RNLI (RNLI-P) with other similar measures in 76 patients with stroke. Excellent correlations were found between the total score on the RNLI-P and the Frenchay Activities Index (FAI – Holbrook & Skilbeck, 1983) (r = 0.69), the Short Form 36 Health Survey (SF-36 – Ware, Snow, Kosinski & Gandek, 1993) (r = 0.74), and the Hospital Anxiety and Depression Scale-Depression subscale (HADS – Zigmond & Snaith, 1983) (r = -0.61). Excellent correlations were reported between the Daily Functioning subscale of the RNLI-P and the FAI (r = 0.74) and the SF-36 (r = 0.73). The RNLI-P Daily Functioning subscale correlated poorly with the HADS-Anxiety subscale (r = -0.30). The Perceptions of Self subscale correlated poorly with the FAI (r = 0.26) and with the Barthel Index (Mahoney & Barthel, 1965) (r = 0.06).
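
Convergent validity figures like those above are correlation coefficients computed on paired scores from two measures completed by the same respondents. The sketch below shows the Pearson product-moment coefficient as one common choice; whether the cited study used Pearson’s or a rank-based coefficient is not stated here, so this is an assumption, and the function name `pearson_r` and the example scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (2 or more)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each measure.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical paired totals on two measures for 5 respondents.
measure_a = [62, 75, 48, 90, 81]
measure_b = [20, 28, 15, 35, 27]
print(round(pearson_r(measure_a, measure_b), 2))
```

A negative coefficient, such as the one reported against the HADS depression subscale, simply means that higher scores on one measure go with lower scores on the other.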

Wood-Dauphinee, Opzoomer, Williams, Marchand, and Spitzer (1988) administered the RNLI to 70 patients with myocardial infarction or cancer and reported an excellent correlation with scores on the Quality of Life (QL) Index (Spitzer, Dobson, Hall, Chesterman, Levi, Shepherd, Battista & Catchlove, 1981) (r = 0.68), as well as correlations with a measure of psychological well-being (r = 0.32 for positive wellbeing, -0.41 for negative wellbeing, and 0.41 overall). Daily Functioning subscale scores showed excellent correlation with the QL Index items Activity and Daily Living (r = 0.67), while Perceptions of Self scores correlated adequately with Support and Outlook from the QL Index (r = 0.36). Items on the QL Index that reflected dimensions not included on the RNLI correlated less strongly (r < 0.20).

In a study describing the development and psychometric properties of the Participation Survey/Mobility (PARTS/M), Gray, Hollingsworth, Stark and Morgan (2006) administered the RNLI to 604 people with mobility limitations due to a diagnosis of spinal cord injury, Multiple Sclerosis, cerebral palsy, stroke or post poliomyelitis and reported a strong correlation between the two indices (canonical correlation = 0.71).

Bluvol and Ford-Gilboe (2004) administered the Herth Hope Index (measure of hope – Herth, 1992), the Health Options Scale (measure of health work – Ford-Gilboe, 1997, 2002b) and the RNLI to both spouses in 40 families in which one of the partners had experienced a stroke with moderate to severe functional impairments (6 months to 5 years post-stroke). They found a positive relationship between the Health Options Scale (health work) and the RNLI for stroke survivors (r = 0.50) but not for their spouses (r = 0.06), as well as a positive relationship between the Herth Hope Index (hope) and the RNLI (quality of life) for both stroke survivors (r = 0.59) and spouses (r = 0.32).
Note: Health work is defined as “an active process through which families learn ways of coping and developing that are conducive to healthy living over time” (Ford-Gilboe, 2002a).

Gauthier-Gagnon and Grise (1994) administered the RNLI and the Prosthetic Profile of the Amputee (PPA) questionnaire (Grise & Gauthier-Gagnon, 1993) to 89 people with a lower limb amputation. Items on the Daily Activities subscale of the RNLI correlated adequately (r = 0.36 to 0.56) with items on the subscale related to physical performance of the PPA, with the exception of the item “active use of the prosthesis indoors”, for which the correlation was poor (r = 0.28).

In this same study, items of the Perception of Self subscale of the RNLI failed to correlate with items on the subscale related to physical performance of the PPA, with the exception of prosthetic wear, for which the correlation was adequate (r = 0.32).

Items of the total RNLI had adequate to excellent correlations (r = 0.36 to 0.53) with items in the subscale related to Physical performance of the PPA with the exception of items “Active use indoors” and “Active use outdoors” which had non-significant correlations.

Items of the total RNLI and its two subscales revealed poor to adequate correlations (r = 0.30 to 0.53) with items on the subscale related to acceptance of amputation and prosthesis.

Daverat, Petit, Kemoun, Dartigues, and Barat (1995) conducted a longitudinal study of 149 individuals with long-standing spinal cord injury. The univariate analysis showed that the RNLI significantly correlated with the Functional Independence Measure (FIM) (Hamilton, Granger, & Sherwin, 1987). The multivariate analysis determined that seven significant independent variables contributed to 72% of the RNLI variance: the FIM, the Yale Scale Score (Chehrazi, Wagner, Collins, & Freedman, 1981), the Centre for Epidemiological Studies Depression Scale (CES-D – Radloff, 1977), living conditions, relationship, sexual life and age.

Calmels, Pereira, Domenach, Pallot-Prades, Alexandre, and Minaire (1994) administered the RNLI to 57 individuals with rheumatoid arthritis, with a mean disease duration of 15 years. In this study, scores on the total RNLI and the Daily Function subscale had adequate to excellent correlations with patient age, number of affected joints, the Functional Independence Measure (FIM), the Lee Index (pain, fatigue, and stiffness), and the American Rheumatism Association Classification (r = 0.38 to 0.84). The total RNLI was also adequately correlated to disease duration (r = 0.31).

McColl, Paterson, Davies, Doubt, and Law (2000) administered the RNLI to 61 community-dwelling individuals with a disability and found that RNLI scores were adequately correlated with the Satisfaction subscale of the Canadian Occupational Performance Measure (COPM – Law et al., 1991, 1994, 1998) (r = 0.38) but only poorly correlated with the Performance subscale (r = 0.22). The RNLI had excellent correlations with the Life Satisfaction Scale (Michalos, 1980) (r = 0.71) and with the Satisfaction with Performance Scaled Questionnaire (Yerxa, Burnett-Beaulieu, Stocking & Azen, 1988) (r = 0.72).

Steiner et al. (1996) evaluated the performance of the RNLI in an elderly community-based population (n = 414). The RNLI demonstrated adequate positive correlations with an instrumental activities of daily living scale (Lawton, Moss, Fulcomer & Kleban, 1982) (r = 0.47) and with perceived health (r = 0.45). Poor to adequate negative correlations were reported for living alone (r = -0.14) and for the number of both bed days (r = -0.16) and chronic conditions (r = -0.32). There was an unpredicted negative correlation between age and RNLI (r = -0.11).

May and Warren (2002) examined the external and structural components of validity of the spinal cord injury version of the Ferrans and Powers Quality of Life Index (Ferrans & Powers, 1992) in a sample of 98 individuals with spinal cord injury living in the community and reported an excellent correlation with the RNLI (r = -0.65).

Patrick, Perugini, and Leclerc (2002) studied 48 consecutive referrals for neuropsychological evaluation following admission to a geriatric rehabilitation inpatient service (for various diagnoses including orthopedic injury, stroke, functional deconditioning, Parkinson’s disease and other medical conditions). They reported that the RNLI was significantly correlated with the number of falls sustained and with functional status at 6 months. Partial correlation coefficients revealed significant relationships between the RNLI and the California Verbal Learning Test (CVLT – measures memory functioning) and the Hooper Visual Organization Test (HVOT – measures spatial skills).
Note: The authors did not report the actual r scores.

Known groups:
Clarke, Black, Badley, Lawrence & Williams (1999) divided subjects at 3 months and 1 year post-stroke by level of impairment (mild-moderate-severe according to Adam’s Hemispheric Stroke Scale), by the presence or absence of depression (Zung Self-Rating Depression scale), and by levels of physical disability (independent-moderately dependent-dependent according to the Functional Independence Measure). RNLI scores for these known groups demonstrated expected gradients and were significantly different as analyzed by analysis of variance. The difference in mean RNLI scores between categories in these analyses ranged from 12% to 62%.

Responsiveness

Wood-Dauphinee, Opzoomer, Williams, Marchand, and Spitzer (1988) administered the RNLI to a sample of 70 patients to determine the responsiveness of the RNLI. They concluded that the scale is sensitive to change, but that the use of subscales provides a more accurate reflection, as change (improvement or worsening) in specific domains could be hidden within the total score.

References

Bluvol, A., Ford-Gilboe, M. (2004). Hope, health work and quality of life in families of stroke survivors. Journal of Advanced Nursing, 48(4) 322-332.
Calmels, P., Pereira, A., Domenach, M., Pallot-Prades, B., Alexandre, C., Minaire, P. (1994). Functional ability and quality of life in rheumatoid arthritis: Evaluation using the Functional Independence Measure and the Reintegration to Normal Living Index. Revue Du Rhumatisme, 61(11), 723-731.
Clarke, P. A., Black, S. E., Badley, E. M., Lawrence, J. M., Williams, J. L. (1999). Handicap in stroke survivors. Disability and Rehabilitation, 21(3), 116-123.
Daneski, K., Coshall, C., Tilling, K., Wolfe, C.D.A. (2003). Reliability and validity of a postal version of the Reintegration to Normal Living Index, modified for use with stroke patients. Clinical Rehabilitation, 17, 835-839.
Daverat, P., Petit, H., Kemoun, G., Dartigues, J. F., Barat, M. (1995). The long term outcome in 149 patients with spinal cord injury. Paraplegia, 33, 665-668.
Dawson, D. R., Levine, B., Schwartz, M., Stuss, D. T. (2000). Quality of life following traumatic brain injury: A prospective study. Brain and Cognition, 44, 35-39.
Friedland, J. F., Dawson, D. R. (2001). Function after motor vehicle accidents: A prospective study of mild head injury and posttraumatic stress. The Journal of Nervous and Mental Disease, 189(7), 426-434.
Gauthier-Gagnon, C., Grise, M-C. (1994). Prosthetic Profile of the Amputee Questionnaire: Validity and reliability. Archives of Physical Medicine and Rehabilitation, 75, 1309-1314.
Gray, D. B., Hollingsworth, H. H., Stark, S. L., Morgan, K. A. (2006). Participation Survey/Mobility: Psychometric properties of a measure of participation for people with mobility impairments and limitations. Archives of Physical Medicine and Rehabilitation, 87(2), 189-197.
Korner-Bitensky, N., Wood-Dauphinee, S., Shapiro, S., Becker, R. (1994). Eliciting health status information by telephone after discharge from hospital: Health professionals versus trained lay persons. Canadian Journal of Rehabilitation, 8(1), 23-34.
Korner-Bitensky, N., Wood-Dauphinee, S., Siemiatycki, J., Shapiro, S., Becker, R. (1994). Health related information postdischarge: Telephone versus face-to-face interviewing. Archives of Physical Medicine and Rehabilitation, 75, 1287-1296.
May, L. A, Warren, S. (2002). Measuring quality of life of persons with spinal cord injury: external and structural validity. Spinal Cord, 40, 341-350.
Mayo, N. E., Wood-Dauphinee S., Cote, R., Gayton, D., Carlton, J., Buttery, J., Tamblyn, R. (2000). There is no place like home: An evaluation of early supported discharge for stroke. Stroke, 31, 1016-1023.
Mayo N., Wood-Dauphinee S., Cote R., Durcan L., Carlton J. (2002). Activity, participation & quality of life 6 months post-stroke. Archives of Physical Medicine & Rehabilitation, 83, 1035-1042.
McColl, M. A., Paterson, M., Davies, D., Doubt, L., Law, M. (2000). Validity and community utility of the Canadian Occupational Performance Measure. Canadian Journal of Occupational Therapy, 67(1), 22-33.
Patrick, L., Perugini, M., Leclerc, C. (2002). Neuropsychological assessment and competency for independent living among geriatric patients. Topics in Geriatric Rehabilitation, 14(4), 65-77.
Stark, D. L., Edwards, D. F., Hollingsworth, H., Gray, D. B. (2005). Validation of the Reintegration to Normal Living Index in a population of community-dwelling people with mobility limitations. Archives of Physical Medicine & Rehabilitation, 86(2), 344-345.
Steiner, A., Raube, K., Stuck, A. E., Aronow, H. U., Draper, D., Rubenstein, L. Z., Beck, J. C. (1996). Measuring psychosocial aspects of well-being in older community residents: Performance of four short scales. The Gerontologist, 36(1), 54-62.
Tooth, L. R., McKenna, K. T., Smith, M., O’Rourke, P. K. (2003). Reliability of scores between stroke patients and significant others on the Reintegration to Normal Living (RNL) Index. Disability and Rehabilitation, 25(9), 433-440.
Trombly, C. A., Radomski, M. V., Davis, E. S. (1998). Achievement of self identified goals by adults with traumatic brain injury: Phase 1. The American Journal of Occupational Therapy, 52(10), 810-818.
Wood-Dauphinee, S. L., Opzoomer, M. A., Williams, J. I., Marchand, B., Spitzer, W. O. (1988). Assessment of global function: The Reintegration to Normal Living Index. Archives of Physical Medicine and Rehabilitation, 69, 583-590.
Wood-Dauphinee, S., Williams, J. I. (1987). Reintegration to normal living as a proxy to quality of life. Journal of Chronic Diseases, 40(6), 491-499.

See the measure

You can obtain the RNLI here.

Please click here to access the French language version. A research version is also available.

Screening for Self-Medication Safety Post-Stroke Scale (S-5)

Evidence Reviewed as of before: 07-01-2011

Author(s)*: Annabel McDermott, OT

Editor(s): Nicol Korner-Bitensky, PhD OT

Purpose

The Screening for Self-Medication Safety Post-Stroke Scale (S-5) is a screen for clinicians to identify patients’ self-medication safety and readiness following stroke. The tool can also be used by health professionals to make recommendations to improve self-medication skills of patients post-stroke (Kaizer, Kim, Van & Korner-Bitensky, 2010).

In-Depth Review

Purpose of the measure

The Screening for Self-Medication Safety Post-Stroke Scale (S-5) is a screen for clinicians to identify patients’ self-medication safety and readiness following stroke (Kaizer, Kim, Van & Korner-Bitensky, 2010). It is a quick, inexpensive test that uses a checklist-style interview format.

Available versions

There is only one version of the Screening for Self-Medication Safety Post-Stroke Scale (S-5), which was developed by Kaizer, Kim, Van and Korner-Bitensky in 2010.

Features of the measure

Items of the measure:

The S-5 consists of 16 items that assess five domains:

Cognition (orientation; immediate and delayed memory recall)
Communication (comprehension; reading)
Motor function
Visual-perception
Judgement/executive functions/self-efficacy.

The patient must be able to correctly answer 2 of the first 3 questions regarding orientation to time and space in order to progress with the screen.

Scoring and Score Interpretation:

Each item is scored according to a yes/no response. There is no cumulative score. A score of “no” on any 1 item indicates the need for further assessment regarding this domain, or can be used to guide intervention planning to address this area of difficulty.

Each item also has a “concern” box, where the clinician can identify any concerns regarding the particular item. A summary “Concerns and Recommendations” section at the end of the tool also enables the clinician to document specific concerns and suggestions.
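The screening logic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the S-5 or of the Kaizer et al. (2010) publication: the names `ItemResponse`, `can_proceed` and `domains_needing_follow_up` are assumptions, and the item labels in the usage example are hypothetical placeholders rather than the wording of the actual S-5 items.

```python
from dataclasses import dataclass


@dataclass
class ItemResponse:
    item: str          # hypothetical item label, not the actual S-5 wording
    domain: str        # one of the five S-5 domains
    passed: bool       # True for a "yes" response, False for a "no"
    concern: str = ""  # free-text note from the item's "concern" box


def can_proceed(orientation_answers):
    """Entry rule: at least 2 of the first 3 orientation questions are correct."""
    first_three = list(orientation_answers)[:3]
    return sum(bool(answer) for answer in first_three) >= 2


def domains_needing_follow_up(responses):
    """List each domain with at least one "no"; no cumulative score is computed."""
    flagged = []
    for response in responses:
        if not response.passed and response.domain not in flagged:
            flagged.append(response.domain)
    return flagged


# Usage with made-up responses (placeholder item labels):
responses = [
    ItemResponse("opens childproof cap", "Motor function", False, "weak grip"),
    ItemResponse("reads pharmacy label", "Communication", True),
    ItemResponse("opens push-and-turn cover", "Motor function", False),
]
if can_proceed([True, False, True]):
    print(domains_needing_follow_up(responses))
```

Because the S-5 has no cumulative score, the sketch returns the list of domains to follow up rather than a number; the clinician's concerns and recommendations remain free text.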

Equipment:

Pill bottle with childproof cap
Pill bottle without childproof cap
Pill bottle with a pharmacy label: must include the information commonly found on a label (medication name, dosage, frequency, time of day to take medication and the name of a person)
Liquid bottle with “push and turn” cover and a medicine cup
1 syringe without needle
8 disc-shaped white pills (e.g. shape of a vitamin C)
1 oval-shaped blue or green gel-capsule pill
1 oval shaped orange pill
1 small and 1 larger disc-shaped white pill
Three objects: pen, coin & a key

Time:

The S-5 takes approximately 10 minutes to administer.

Training requirements:

No training requirements specified.

Subscales:

None.

Alternative forms of the S-5

Not applicable

Client suitability

Can be used with:

Clients following stroke.

Should not be used in:

Not specified.

In what languages is the measure available?

English.

Summary

What does the tool measure?	Self-medication safety.
What types of clients can the tool be used for?	Patients with stroke.
Is this a screening or assessment tool?	Screening tool.
Time to administer	Approximately 10 minutes.
Versions	There are no alternative versions.
Other Languages	There are no official translations.
Measurement Properties
Reliability	Test-retest: The test-retest reliability of the S-5 is currently under study.
Validity	Content: This tool is not intended as a comprehensive assessment of self-medication safety. Some daily self-medication tasks were intentionally not included due to its intended use as a screen only. Accordingly, content validity of this tool was reported as satisfactory. Criterion: Concurrent: The concurrent validity has not been examined as there is currently no gold standard for assessing self-medication safety with this population. Construct: Known groups: The known groups validity of the S-5 is currently under study.
Floor/Ceiling Effects	Not yet examined.
Does the tool detect change in patients?	Not yet examined.
Acceptability	The S-5 is a quick and simple test to administer, with minimal equipment requirements and specific instructions for the assessor to follow.
Feasibility	Administration of the S-5 is quick and easy, and can be performed by any member of the multidisciplinary team. Feedback from expert clinicians and patients indicates acceptable administration time, effort and complexity.
How to obtain the tool?	Click here to see a copy of the S-5.

Psychometric Properties

Overview

Please refer to the article by Kaizer et al. (2010) for information regarding the psychometric properties of the S-5.

References

Kaizer, F., Kim, A., Van, M. T., & Korner-Bitensky, N. (2010). Creation and preliminary validation of the Screening for Self-Medication Safety Post-Stroke Scale (S-5). Journal of Rehabilitation Medicine, 42, 239-245.

See the measure

How to obtain the Assessment?

Click here to see a copy of the S-5

Stroke Impact Scale (SIS)

Evidence Reviewed as of before: 29-06-2018

Author(s)*: Lisa Zeltzer, MSc OT; Katherine Salter, BA; Annabel McDermott

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Stroke Impact Scale (SIS) is a stroke-specific, self-report, health status measure. It was designed to assess multidimensional stroke outcomes, including strength, hand function, Activities of Daily Living / Instrumental Activities of Daily Living (ADL/IADL), mobility, communication, emotion, memory and thinking, and participation. The SIS can be used both in clinical and in research settings.

In-Depth Review

Purpose of the measure

The Stroke Impact Scale (SIS) is a stroke-specific, self-report, health status measure. It was designed to assess multidimensional stroke outcomes, including strength, hand function, Activities of Daily Living / Instrumental Activities of Daily Living (ADL/IADL), mobility, communication, emotion, memory and thinking, and participation. The SIS can be used both in clinical and research settings.

Available versions

The Stroke Impact Scale was developed at the Landon Center on Aging, University of Kansas Medical Center. The scale was first published as version 2.0 by Duncan, Wallace, Lai, Johnson, Embretson, and Laster in 1999. Version 2.0 of the SIS comprises 64 items in 8 domains (Strength, Hand function, Activities of Daily Living (ADL) / Instrumental ADL, Mobility, Communication, Emotion, Memory and thinking, Participation). Based on the results of a Rasch analysis process, 5 items were removed from version 2.0 to create the current version 3.0 (Duncan, Bode, Lai, & Perera, 2003b).

Features of the measure

Items:

The SIS version 3.0 includes 59 items and assesses 8 domains:

Strength – 4 items
Hand function – 5 items
ADL/IADL – 10 items
Mobility – 9 items
Communication – 7 items
Emotion – 9 items
Memory and thinking – 7 items
Participation/Role function – 8 items

An extra question on stroke recovery asks the client to rate, on a scale from 0 to 100, how much he/she feels he/she has recovered from the stroke.

To see the items of the SIS, please click here.

Instructions on item administration:

Prior to administering the SIS, the purpose statement must be read as written below. It is important to tell the respondent that the information is based on his/her point of view.

Purpose statement:
“The purpose of this questionnaire is to evaluate how stroke has impacted your health and life. We want to know from your point of view how stroke has affected you. We will ask you questions about impairments and disabilities caused by your stroke, as well as how stroke has affected your quality of life. Finally, we will ask you to rate how much you think you have recovered from your stroke.”

Response sheets in large print should be provided with the instrument, so that the respondent may see, as well as hear, the choice of responses for each question. The respondent may either answer with the number or the text associated with the number (e.g. “5” or “Not difficult at all”) for an individual question. If the respondent uses the number, it is important for the interviewer to verify the answer by stating the corresponding text response. The interviewer should display the sheet appropriate for that particular set of questions, and after each question must read all five choices.

Questions are listed in sections, or domains, with a general description of the type of questions that will follow (e.g. “These questions are about the physical problems which may have occurred as a result of your stroke”). Each group of questions is then given a statement with a reference to a specific time period (e.g. “In the past week how would you rate the strength of your…”). The statement must be repeated before each individual question. Within the measure the time period changes from one week, to two weeks, to four weeks. It is therefore important to emphasize the change in the time period being assessed for the specific group of questions.

Scoring:

The SIS is a patient-based, self-report questionnaire. Each item is rated using a 5-point Likert scale. The patient rates his/her difficulty completing each item, where:

1 = an inability to complete the item
5 = no difficulty experienced at all.

Note: Scores for three items in the Emotion domain (3f, 3h, 3i) must be reversed before calculating the Emotion domain score (i.e. 1 » 5, 2 » 4, 3 = 3, 4 » 2, 5 » 1). The SIS scoring database (see link below) takes this change of direction into account when scoring. When scoring manually, use the following equation to compute the item score for 3f, 3h and 3i: Item score = 6 – individual’s rating.
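
The manual reversal rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the official SIS scoring database: the function names and the dictionary keyed by item label (e.g. "3a") are assumptions made for the example.

```python
from __future__ import annotations

# Emotion-domain items that are reverse-scored, as listed in the text above.
REVERSED_EMOTION_ITEMS = {"3f", "3h", "3i"}


def reverse_item(rating: int) -> int:
    """Apply 'Item score = 6 - rating' to a rating on the 1-5 scale."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    return 6 - rating


def score_emotion_items(ratings: dict[str, int]) -> dict[str, int]:
    """Return item scores with 3f, 3h and 3i reversed; other items are unchanged.

    The item labels used as keys are a hypothetical input format.
    """
    return {
        item: reverse_item(rating) if item in REVERSED_EMOTION_ITEMS else rating
        for item, rating in ratings.items()
    }


if __name__ == "__main__":
    # Made-up ratings for illustration only.
    print(score_emotion_items({"3a": 2, "3f": 2, "3h": 5, "3i": 1}))
```

With this rule a rating of 1 becomes 5, 2 becomes 4, and 3 stays 3, matching the mapping given in the note.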

A final single-item Recovery domain assesses the individual’s perception of his/her recovery from stroke, measured in the form of a visual analogue scale from 0-100, where:

0 = no recovery
100 = full recovery.

Domain scores range from 0-100 and are calculated using the following equation:

Domain score = [(Mean item score – 1) / (5 – 1)] x 100

Scores are interpreted by generating a summative score for each domain using an algorithm equivalent to that used in the SF-36 (Ware & Sherbourne, 1992).
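
As a worked illustration of the domain score equation, the following Python sketch transforms a set of item scores into the 0-100 range. It assumes that every item in the domain has been answered and that the Emotion items 3f, 3h and 3i have already been reversed. The function name is hypothetical, and handling of missing responses is not covered here.

```python
from statistics import mean


def domain_score(item_scores):
    """Transform 1-5 item scores into a 0-100 domain score.

    Implements: [(mean item score - 1) / (5 - 1)] x 100.
    Assumes all items were answered and any reverse-scored
    items have already been converted.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    for score in item_scores:
        if not 1 <= score <= 5:
            raise ValueError(f"item scores must lie between 1 and 5, got {score!r}")
    return (mean(item_scores) - 1) / (5 - 1) * 100


if __name__ == "__main__":
    # Made-up ratings for a four-item domain, for illustration only.
    print(domain_score([3, 3, 3, 3]))
```

For example, a mean item score of 3 gives (3 – 1) / (5 – 1) x 100 = 50, while all ratings of 1 give 0 and all ratings of 5 give 100.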

See http://www.kumc.edu/school-of-medicine/preventive-medicine-and-public-health/research-and-community-engagement/stroke-impact-scale/instructions.html to download the scoring database.

Time:

The SIS is reported to take approximately 15-20 minutes to administer (Finch, Brooks, Stratford, & Mayo, 2002).

Subscales:

The SIS 3.0 is comprised of 8 subscales or ‘Domains’:

Strength
Hand function
ADL/IADL
Mobility
Communication
Emotion
Memory and thinking
Participation

A final single-item domain measures perceived recovery since stroke onset.

Equipment:

Only the scale and a pencil are needed.

Training:

The SIS 3.0 requires no formal training for administration (Mulder & Nijland, 2016). Instructions for administration of the SIS 3.0 are available online through the University of Kansas Medical Center SIS information page.

Alternative forms of the SIS

SIS-16 (Duncan et al., 2003a).

Duncan et al. (2003a) developed the SIS-16 to address the lack of sensitivity to differences in physical functioning in functional measures of stroke outcome. Factor analysis of the SIS 2.0 revealed that the four physical domains (Strength, Hand function, ADL/IADL, Mobility) are highly correlated and can be summed together to create a single physical dimension score (Duncan et al., 1999; Mulder & Nijland, 2016). Accordingly, the SIS-16 consists of 16 items from the SIS 2.0:

ADL/IADL – 7 items
Mobility – 8 items
Hand Function – 1 item.

All other domains should remain separate (Duncan et al., 1999).

SF-SIS (Jenkinson et al., 2013).

Jenkinson et al. (2013) developed a modified short form of the SIS (SF-SIS) comprising eight items. The developers selected the one item from each domain that correlated most highly with the total domain score, through three methods: initial pilot research, validation analysis and a focus group. The final choice of questions for the SF-SIS comprised those items that were selected by at least two of the three methods. The SF-SIS was evaluated for face validity and acceptability within a focus group of patients from acute and rehabilitation stroke settings and with multidisciplinary stroke healthcare staff. The SF-SIS has also been evaluated for content, convergent and discriminant validity (MacIsaac et al., 2016).

Client suitability

Can be used with:

The SIS can only be administered to patients with stroke.
The SIS 3.0 and SIS-16 can be completed by telephone, mail administration, by proxy, and by proxy mail administration (Duncan et al., 2002a; Duncan et al., 2002b; Kwon et al., 2006). Studies have shown potential proxy bias for physical domains (Mulder & Nijland, 2016). It is recommended that possible responder bias and the inherent difficulties of proxy use be weighed against the economic advantages of a mailed survey when considering these methods of administration.

Should not be used with:

The SIS version 2.0 should be used with caution in individuals with mild impairment as items in the Communication, Memory and Emotion domains are considered easy and only capture limitations in the most impaired individuals (Duncan et al., 2003).
Respondents must be able to follow a 3-step command (Sullivan, 2014).
Time taken to administer the SIS is a limitation for individuals with difficulties with concentration, attention or fatigue following stroke (MacIsaac et al., 2016).

In what languages is the measure available?

The SIS was originally developed in English.

Cultural adaptations, translations and psychometric testing have also been conducted in the following languages:

Brazilian (Carod-Artal et al., 2008)
French (Cael et al., 2015)
German (Geyh, Cieza & Stucki, 2009)
Italian (Vellone et al., 2010; Vellone et al., 2015)
Japanese (Ochi et al., 2017)
Korean (Choi et al., 2017; Lee & Song, 2015)
Nigerian (Hausa) (Hamza et al., 2012; Hamza et al., 2014)
Portuguese (Goncalves et al., 2012; Brandao et al., 2018)
Ugandan (Kamwesiga et al., 2016)
United Kingdom (Jenkinson et al., 2013)

The MAPI Research Institute has translated the SIS and/or SIS-16 into numerous languages including Afrikaans, Arabic, Bulgarian, Cantonese, Czech, Danish, Dutch, Farsi, Finnish, French, German, Greek, Hebrew, Hungarian, Icelandic, Italian, Japanese, Korean, Malay, Mandarin, Norwegian, Portuguese, Russian, Slovak, Spanish, Swedish, Tagalog, Thai and Turkish. Translations may not be validated.

Summary

What does the tool measure?	Multidimensional stroke outcomes, including strength, hand function, activities of daily living/instrumental activities of daily living, mobility, communication, emotion, memory and thinking, and participation.
What types of clients can the tool be used for?	Patients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	The SIS takes 15-20 minutes to administer.
Versions	SIS 2.0, SIS 3.0, SIS-16, SF-SIS.
Other Languages	The SIS has been translated into several languages. Please click here to see a list of translations.
Measurement Properties
Reliability	Internal consistency: SIS 2.0: Two studies reported excellent internal consistency; one study reported excellent internal consistency for 5/8 domains and adequate internal consistency for 3/8 domains. SIS 3.0: Two studies reported excellent internal consistency; one study reported excellent internal consistency for 6/8 domains and adequate internal consistency for 2/8 domains. SIS-16: One study reported good spread of item difficulty. SF-SIS: One study reported excellent internal consistency. Test-retest: SIS 2.0: One study reported adequate to excellent test-retest reliability in all domains except for the Emotion domain.
Validity	Criterion: Concurrent: SIS 2.0: Excellent correlations with the Barthel Index, FMA, Instrumental Activities of Daily Living (IADL) Scale, Duke Mobility Scale and Geriatric Depression Scale; adequate to excellent correlations with the FIM; adequate correlations with the NIHSS and MMSE; and poor to excellent correlations with the SF-36. SIS 3.0: Excellent correlation between SIS Hand Function and MAL-QOM; excellent correlation between SIS ADL/IADL and FIM, Barthel Index, Lawton IADL Scale; excellent correlation between SIS Strength and Motricity Index; excellent correlation between SIS Mobility and Barthel Index; adequate to excellent correlation between SIS ADL/IADL and NEADL; adequate correlation between SIS Social Participation and SF-36 Social Functioning, Lawton IADL scale; adequate correlation between SIS Memory domain and MMSE; poor to adequate correlations between remaining SIS domains and FIM, NEADL, FMA, MAL-AOU, MAL-QOM, FAI. SIS-16: Excellent correlation with the Barthel Index; adequate to excellent correlations with the STREAM total and subscale scores; adequate correlation with SF-36 Physical Functioning. Predictive: SIS 2.0: Physical function, Emotion and Participation domains were statistically significant predictors of the patient’s own assessment of recovery; SIS scores were poor predictors of mean steps per day. SIS 3.0: Pre-treatment SIS scores were compared with outcome measures after 3 weeks of upper extremity rehabilitation: Hand function and ADL/IADL domains showed adequate to excellent correlations with FIM, FMA, MAL-AOU, MAL-QOM, FAI, and NEADL; other domains demonstrated poor to adequate correlations with outcome measures. SIS-16: – Admission scores show an excellent correlation with actual length of stay and an adequate correlation with predicted length of stay; there was a significant correlation with discharge destination (home/rehabilitation). – The combination of early outcomes of MAL-QOM and SIS shows high accuracy in predicting final QOL among patients with stroke. Construct: Convergent/Discriminant: SIS 2.0: Domains demonstrate adequate to excellent correlations with corresponding WHOQOL-BREF subscales and Zung’s Self-Rating Depression Scale; poor correlations between the SIS Communication domain and both WHOQOL-BREF and Zung’s Self-Rating Depression Scale; and a poor correlation between the SIS Physical domain and the WHOQOL Environment scores. SIS 3.0: Excellent correlation with the SF-SIS, EQ-5D, mRS, BI, NIHSS; moderate to excellent correlations with the EQ-VAS; and a moderate correlation with the SIS-VAS. SIS 3.0 telephone survey: Adequate to excellent correlations with the FIM and SF-36V. SIS-16: Adequate to excellent correlations with the WHOQOL-BREF Physical domain; poor correlation with the WHOQOL Social relationships domain. SF-SIS: Excellent correlations with the EQ-5D, mRS, BI, NIHSS; moderate to excellent correlations with the EQ-VAS; and moderate correlation with the SIS-VAS. Known groups: SIS 2.0: Most domains can differentiate between patients with varying degrees of stroke severity. SIS 3.0: Physical and ADL/IADL domains showed score discrimination and distribution for different degrees of stroke severity. SIS-16: Can discriminate between patients of varying degrees of stroke severity.
Floor/Ceiling Effects	Three studies have examined floor/ceiling effects of the SIS. SIS 2.0: Two studies reported the potential for floor effects in the domain of Hand function among patients with moderate stroke severity, and a potential for ceiling effects in the Communication, Memory and Emotion domains. SIS 3.0: One study reported minimal floor and ceiling effects for the Social participation domain; one study reported ceiling effects for the Hand function, Memory and thinking, Communication, Mobility and ADL/IADL domains over time. SIS-16: One study reported no floor effects and minimal ceiling effects.
Does the tool detect change in patients?	Five studies have investigated responsiveness of the SIS. SIS 2.0: One study reported significant change in patients’ recovery in the expected direction between assessments at 1 and 3 months, and at 1 and 6 months post-stroke; however, sensitivity to change was affected by stroke severity and time of post-stroke assessment. SIS 3.0: – One study determined change scores for a clinically important difference (CID) within four subscales: Strength, ADL/IADL, Mobility and Hand function. The minimal detectable change (MDC) was 24.0, 17.3, 15.1 and 25.9 (respectively); minimal CID was 9.2, 5.9, 4.5 and 17.8 (respectively). – One study reported medium responsiveness for Hand function, Stroke recovery and SIS total score; other domains showed small responsiveness.
– One study found Participation and Recovery from stroke were the most responsive domains over the first year post-stroke; Strength and Hand function domains also showed high clinically meaningful positive/negative change. SIS-16: One study reported that change scores of 23.1 indicated statistically significant improvement from admission to discharge, and sensitivity to change was large.
Acceptability	– SIS 3.0 and SIS-16 are available in proxy version. The patient-centred nature of the scale’s development may enhance its relevance to patients and assessment across multiple levels may reduce patient burden. – Time taken to administer the SIS has been identified as a limitation. – The SIS 2.0 should be used with caution in individuals with mild impairment as some domains only capture limitations in the most impaired individuals.
Feasibility	– The SIS is a patient-based self-report scale that takes 15-20 minutes to administer. – The SIS can be administered in person or by proxy, by mail or telephone. – The SIS does not require any formal training. – Instructions for administration of the SIS 3.0 are available online.
How to obtain the tool?	Please click here to see a copy of the SIS.

Psychometric Properties

Overview

We conducted a literature search to identify relevant publications on the psychometric properties of the SIS. Seventeen studies were included. Studies included in this review are specific to the original English versions of the SIS version 2.0, SIS 3.0 or SIS-16.

Floor/Ceiling Effects

Duncan et al. (1999) found that SIS version 2.0 showed the potential for floor effects in the Hand function domain in the moderate stroke group (40.2%) and a possible ceiling effect in the Communication domain for both the mild (35.4%) and moderate (25.7%) stroke groups. The highest percentage of ceiling effects for the SIS was for the Communication domain (35%) compared with a 64.6% ceiling rate for the Barthel Index (Mahoney & Barthel, 1965).

Duncan et al. (2003b) conducted a Rasch analysis which confirmed the two effects observed in Duncan et al. (1999): a floor effect in the SIS Hand function domain and a ceiling effect in the Communication domain. A ceiling effect in the Memory and Emotion domains was also reported.

Lai et al. (2003) examined floor/ceiling effects of the SIS-16 and SIS Social Participation domain in a sample of 278 patients at 3 months post-stroke. The authors reported floor/ceiling effects of 0% and 4% (respectively) for the SIS-16, and 1% and 5% (respectively) for the SIS Social Participation domain.

Richardson et al. (2016) examined floor/ceiling effects of the SIS 3.0 in a sample of 164 patients with subacute stroke. Measures were taken at three timepoints: on admission to the study and at 6-month and 12-month follow-up (n=164, 108, 37 respectively). Poor ceiling effects (>20%) were seen for the Hand function domain at baseline, 6 months and 12 months (25.0%, 36.4%, 37.8%, respectively); the Memory and thinking domain at 6 months and 12 months (22.2%, 21.6%, respectively); the Communication domain at 6 months and 12 months (30.6%, 27%, respectively); the Mobility domain at 6 months (20.4%); and the ADL/IADL domain at 12 months (21.6%). There were no significant floor effects at any timepoint.
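
As an illustration of how floor and ceiling rates like those above are typically derived, the following minimal Python sketch counts the share of respondents at the lowest and highest possible score of a domain. The 0 to 100 score range, the function names and the sample scores are assumptions made for this example and are not taken from the studies reviewed; the 20% cut-off mirrors the criterion quoted above.

```python
def floor_ceiling_rates(scores, min_score=0.0, max_score=100.0):
    """Return (floor %, ceiling %) for a list of domain scores.

    min_score and max_score are the assumed lowest and highest
    possible scores of the domain (hypothetical defaults).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    at_floor = sum(1 for s in scores if s <= min_score)
    at_ceiling = sum(1 for s in scores if s >= max_score)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


def is_poor(rate_pct, threshold_pct=20.0):
    """Flag a floor or ceiling rate above the threshold (>20% by default)."""
    return rate_pct > threshold_pct


# Hypothetical domain scores for ten respondents (not study data).
sample_scores = [100, 100, 100, 85, 70, 55, 40, 25, 10, 0]
floor_pct, ceiling_pct = floor_ceiling_rates(sample_scores)
print(f"floor: {floor_pct:.1f}%  poor: {is_poor(floor_pct)}")
print(f"ceiling: {ceiling_pct:.1f}%  poor: {is_poor(ceiling_pct)}")
```

With these made-up scores, one respondent in ten sits at the floor (10%) and three in ten at the ceiling (30%), so only the ceiling rate would be flagged.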

Reliability

Internal consistency:
Duncan et al. (1999) examined internal consistency of the SIS version 2.0 using Cronbach’s alpha coefficients and reported excellent internal consistency for each of the 8 domains (ranging from α=0.83 to 0.90).
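
For readers unfamiliar with the statistic, the sketch below shows one common way to compute Cronbach’s alpha from item-level responses, using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). It is an illustrative sketch only: the function name and the response data are hypothetical, and the studies reviewed here may have used different software or handling of missing data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for items and total scores.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same k items")

    item_columns = list(zip(*responses))           # one tuple per item
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale.
demo_responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
]
demo_alpha = cronbach_alpha(demo_responses)
print(f"alpha = {demo_alpha:.2f}")
```

Values closer to 1 indicate that the items of a domain vary together, which is how the coefficients reported in this section are interpreted.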

Duncan et al. (2003b) examined reliability of the SIS version 2.0 by Rasch analysis. Item separation reliability is the ratio of the “true” (observed minus error) variance to the obtained variation. The smaller the error, the higher the ratio will be. It ranges from 0.00 to 1.00 and is interpreted the same as the Cronbach’s alpha. Item separation reliability of the SIS version 2.0 ranged from 0.93-1.00. A separation index > 2.00 is equivalent to a Cronbach’s alpha of 0.80 or greater (excellent). In this study, 5 out of 8 domains had a separation index that exceeded 2.00 (in addition to the composite physical domain). The values for the Emotion and Communication domains were only in the adequate range because of the ceiling effect in those domains and those for the Hand function domain were only adequate because of the floor effect in that domain.
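
The stated equivalence between a separation index of 2.00 and a reliability of 0.80 follows from the relation commonly used in Rasch measurement, reliability = G² / (1 + G²), where G is the separation index. The short sketch below works through that arithmetic; the function names are our own, and we assume (rather than know) that this is the conversion the authors relied on.

```python
import math


def separation_to_reliability(separation):
    """Convert a Rasch separation index G to reliability: G^2 / (1 + G^2)."""
    if separation < 0:
        raise ValueError("separation index must be non-negative")
    g_squared = separation ** 2
    return g_squared / (1.0 + g_squared)


def reliability_to_separation(reliability):
    """Inverse conversion: G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


# A separation index of 2.00 corresponds to a reliability of 0.80,
# and an index of 3.00 to a reliability of 0.90.
for g in (2.0, 3.0):
    print(f"G = {g:.2f} -> reliability = {separation_to_reliability(g):.2f}")
```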

Edwards and O’Connell (2003) administered the SIS version 2.0 to 74 patients with stroke and reported excellent internal consistency (ranging from α=0.87 for participation to α=0.95 for hand function). The percentage of item-domain correlations >0.40 was 100% for all domains except emotion and ADL/IADL. In the ADL/IADL scale, one item (cutting food) was more closely associated with hand function than ADL/IADL.

Lai et al. (2003) examined reliability of the SIS-16 and SIS Social Participation domain in a sample of 278 patients at 3 months post-stroke. Both the SIS-16 and SIS Social Participation domain showed good spread of item difficulty, with easier items that are able to measure lower levels of physical functioning in patients with severe stroke.

Jenkinson et al. (2013) examined internal consistency of the SIS 3.0 and the SF-SIS among individuals with stroke (n=73, 151 respectively), using Cronbach’s alpha. Internal consistency of the SIS 3.0 was excellent for all domains (α=0.86 to 0.96). Higher order factor analysis of the SIS 3.0 showed one factor with an eigenvalue > 1 that accounted for 68.76% of the variance. Each dimension of the SIS 3.0 loaded on this factor (eigenvalue = 5.5). Internal consistency of the SF-SIS was high (α=0.89). Factor analysis of the SF-SIS similarly showed one factor that accounted for 57.25% of the variance.

Richardson et al. (2016) examined internal consistency of the SIS 3.0 in a sample of 164 patients with subacute stroke, using Cronbach’s alpha. Internal consistency was measured at three timepoints: on admission to the study and at 6-month and 12-month follow-up. Internal consistency of all domains was excellent at all timepoints (α=0.81 to 0.97). The composite Physical Functioning score was excellent at all timepoints (α=0.95 to 0.97).

MacIsaac et al. (2016) examined internal consistency of the SIS 3.0 in a sample of 5549 individuals in an acute stroke setting and 332 individuals in a stroke rehabilitation setting, using Cronbach’s alpha. Internal consistency was excellent within both acute and rehabilitation data sets (α=0.98, 0.93 respectively). Internal consistency of individual domains was excellent for both acute and rehabilitation data sets, except for the Emotion domain (α=0.60, 0.63 respectively) and the Strength domain (α=0.77, rehabilitation data set only).

Test-retest:
Duncan et al. (1999) examined test-retest reliability of the SIS version 2.0 in 25 patients who were administered the SIS at 3 or 6 months post stroke and again one week later. Test-retest was calculated using intraclass correlation coefficients (ICC), which ranged from adequate to excellent (ICC=0.7 to 0.92) with the exception of the Emotion domain, which had only a poor correlation (ICC=0.57).

Validity

Content:

Development of the SIS was based on a study at the Landon Center on Aging, University of Kansas Medical Center (Duncan, Wallace, Studenski, Lai, & Johnson, 2001) using feedback from individual interviews with patients and focus group interviews with patients, caregivers, and health care professionals. Participants included 30 individuals with mild and moderate stroke, 23 caregivers, and 9 stroke experts. Qualitative analysis of the individual and focus group interviews generated a list of potential items. Consensus panels reviewed the potential items, established domains for the measure, developed item scales, and decided on mechanisms for administration and scoring.

Criterion:

Concurrent:
Duncan et al. (1999) examined concurrent validity of the SIS by comparison with the Barthel Index, Functional Independence Measure (FIM), Fugl-Meyer Assessment (FMA), Mini-Mental State Examination (MMSE), National Institutes of Health Stroke Scale (NIHSS), Medical Outcomes Study Short Form 36 (SF-36), Lawton Instrumental Activities of Daily Living (IADL) Scale, Duke Mobility Scale and Geriatric Depression Scale. The following results were found for each domain of the SIS:

SIS Domain	Comparative Measure	Correlation	Rating
Hand function	FMA – Upper Extremity Motor	r = 0.81	Excellent
Mobility	FIM Motor	r = 0.83	Excellent
	Barthel Index	r = 0.82	Excellent
	Duke Mobility Scale	r = 0.83	Excellent
	SF-36 Physical Functioning	r = 0.84	Excellent
Strength	NIHSS Motor	r = -0.59	Adequate
Strength	FMA Total	r = 0.72	Excellent
ADL/IADL	Barthel Index	r = 0.84	Excellent
	FIM Motor	r = 0.84	Excellent
	Lawton IADL Scale	r = 0.82	Excellent
Memory	MMSE	r = 0.58	Adequate
Communication	FIM Social/Cognition	r = 0.53	Adequate
Communication	NIHSS Language	r = -0.44	Adequate
Emotion	Geriatric Depression Scale	r = -0.77	Excellent
Emotion	SF-36 Mental Health	r = 0.74	Excellent
Participation	SF-36 Emotional Role	r = 0.28	Poor
	SF-36 Physical Role	r = 0.45	Adequate
	SF-36 Social Functioning	r = 0.70	Excellent
Physical	Barthel Index	r = 0.76	Excellent
	FIM Motor	r = 0.79	Excellent
	SF-36 Physical Functioning	r = 0.75	Excellent
	Lawton IADL Scale	r = 0.73	Excellent
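
The ratings in the table above appear to follow fixed cut-offs applied to the absolute value of the coefficient. The following is a minimal sketch of that rule. The thresholds (0.60 and 0.30) are an assumption inferred from the ratings reported in this review, not a rule quoted from it, and the function name `rate_correlation` is hypothetical.

```python
def rate_correlation(r: float) -> str:
    """Rate a correlation coefficient by its absolute value.

    Assumed cut-offs (inferred from the ratings in this review):
    |r| >= 0.60 excellent, 0.31 to 0.59 adequate, |r| <= 0.30 poor.
    """
    magnitude = abs(r)  # the sign only reflects the direction of scoring
    if magnitude >= 0.60:
        return "Excellent"
    if magnitude > 0.30:
        return "Adequate"
    return "Poor"


# Coefficients taken from the table above.
examples = {
    "Hand function vs FMA Upper Extremity Motor": 0.81,
    "Strength vs NIHSS Motor": -0.59,
    "Emotion vs Geriatric Depression Scale": -0.77,
    "Participation vs SF-36 Emotional Role": 0.28,
}
for pair, r in examples.items():
    print(f"{pair}: r = {r:+.2f} -> {rate_correlation(r)}")
```

Negative coefficients arise when the two measures are scored in opposite directions (for example, higher NIHSS scores indicate greater impairment), so the rating depends on magnitude only.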

Duncan et al. (2002a) examined concurrent validity of the SIS version 3.0 and SIS-16 using Pearson correlations. The SIS was correlated with the Mini-Mental State Examination (MMSE), Barthel Index, Lawton IADL Scale and the Motricity Index. The SIS ADL/IADL domain showed an excellent correlation with the Barthel Index (r=0.72) and with the Lawton IADL Scale (r=0.77). The SIS Mobility domain showed an excellent correlation with the Barthel Index (r=0.69). The SIS Strength domain showed an excellent correlation with the Motricity Index (r=0.67). The SIS Memory domain showed an adequate correlation with the MMSE (r=0.42).

Lai et al. (2003) examined concurrent validity of the SIS-16 and SIS Social Participation domain by comparison with the SF-36 Physical Functioning and Social Functioning subscales, Barthel Index and Lawton IADL Scale, using Pearson correlation coefficients. Measures were administered to 278 patients with stroke at 3 months post-stroke. There was an excellent correlation between SIS-16 and SF-36 Physical Functioning (r=0.79), and an excellent correlation between SIS Social Participation and SF-36 Social Functioning (r=0.65). There was an excellent correlation between SIS-16 and the Barthel Index (r=0.75), and an adequate correlation between SIS Social Participation and the Lawton IADL Scale (r=0.47).

Lin et al. (2010a) examined concurrent validity of the SIS version 3.0 by comparison with the Fugl-Meyer Assessment (FMA), Motor Activity Log – Amount of Use and – Quality of Movement (MAL-AOU, MAL-QOM), Functional Independence Measure (FIM), Frenchay Activities Index (FAI) and Nottingham Extended Activities of Daily Living Scale (NEADL). Concurrent validity was measured using Spearman correlation coefficients prior to and on completion of a 3-week intervention period. SIS Hand Function showed an excellent correlation with MAL-QOM at pre-treatment and post-treatment (r=0.65, 0.68, respectively, p<0.01), and adequate correlations with all other measures (FMA, MAL-AOU, FIM, FAI, NEADL). SIS ADL/IADL showed an excellent correlation with the FIM at pre-treatment and post-treatment (r=0.69, 0.75, respectively, p<0.01). Correlations between SIS ADL/IADL and the NEADL were adequate at pre-treatment (r=0.54, p<0.01) and excellent at post-treatment (r=0.62, p<0.01). Correlations between SIS ADL/IADL and all other measures (FMA, MAL-AOU, MAL-QOM, FAI) were adequate at pre-treatment and post-treatment. Other SIS domains demonstrated poor to adequate correlations with comparison measures.

Ward et al. (2011) examined concurrent validity of the SIS-16 by comparison with the Stroke Rehabilitation Assessment of Movement (STREAM), using Spearman correlations. Measures were administered to 30 patients with acute stroke on admission to and discharge from an acute rehabilitation setting. Correlations between the SIS-16 and STREAM total and subscale scores were adequate to excellent on admission (STREAM total r=0.7073; STREAM subtests r=0.5992 to 0.6451, p<0.0005) and discharge (STREAM total r=0.7153; STREAM subtests r=0.5499 to 0.7985, p<0.0002).
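
The studies in this section report either Pearson or Spearman coefficients. As a rough illustration of how the two differ, the sketch below computes both in plain Python: Pearson works on the raw scores, whereas Spearman applies the same formula to their ranks. The helper names and the example scores are hypothetical and invented for illustration only; they are not data from any of the studies cited.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores, invented for illustration (not study data).
scale_a = [10, 20, 30, 40, 50]
scale_b = [1, 2, 4, 8, 40]  # rises with scale_a, but not linearly

print(round(pearson(scale_a, scale_b), 3))   # below 1: relation is not linear
print(round(spearman(scale_a, scale_b), 3))  # 1.0: the rank orders agree exactly
```

Spearman coefficients are commonly preferred for ordinal scores or small samples, because they depend only on the rank order of the scores.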

Richardson et al. (2016) examined concurrent validity of the SIS 3.0 by comparison with the 5-level EuroQol 5D (EQ-5D-5L), using Pearson correlation coefficients. Measures were administered to patients with subacute stroke on admission to the study and at 6-month and 12-month follow-up (n=164, 108, 37, respectively). At admission, correlations with the EQ-5D-5L were excellent for the ADL (r=0.663) and Hand function (r=0.618) domains and Physical composite score (r=0.71); correlations with other domains were adequate (r=0.318 to 0.588), except for the Communication domain, which was poor (r=0.228). At 6-month follow-up, correlations with the EQ-5D-5L were excellent for the Strength (r=0.628), ADL (r=0.684), Mobility (r=0.765), Hand function (r=0.668), Participation (r=0.740) and Recovery (r=0.601) domains and Physical composite score (r=0.772); correlations with other domains were adequate (r=0.402 to 0.562). At 12-month follow-up, correlations with the EQ-5D-5L were excellent for the Strength (r=0.604), ADL (r=0.760), Mobility (r=0.683) and Participation (r=0.738) domains and the Physical composite score (r=0.756); correlations with other domains were adequate (r=0.364 to 0.592).

Predictive:
Duncan et al. (1999) examined which domain scores of the SIS version 2.0 could most accurately predict a patient’s own assessment of stroke recovery, using multiple regression analysis. The SIS domains of Physical function, Emotion, and Participation were found to be statistically significant predictors of the patient’s assessment of recovery. Forty-five percent of the variance in the patient’s assessment of percentage of recovery was explained by these factors.
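
For orientation, a proportion of variance explained (R²) can be converted to a multiple correlation coefficient by taking its square root. The short sketch below applies this to the 45% figure reported above; the variable names are illustrative.

```python
import math

variance_explained = 0.45  # R^2 reported by Duncan et al. (1999)
multiple_r = math.sqrt(variance_explained)  # multiple correlation coefficient R

print(f"R^2 = {variance_explained:.2f} -> R = {multiple_r:.2f}")
```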

Fulk, Reynolds, Mondal & Deutsch (2010) examined the predictive validity of the 6MWT and other widely used clinical measures (FMA-LE, self-selected gait-speed, SIS and BBS) in 19 patients with stroke. The SIS was found to be a poor predictor of mean steps per day (r=0.18, p=0.471). Although gait speed and balance were related to walking activity, only the 6MWT was found to be a predictor of community ambulation in patients with stroke.

Huang et al. (2010) examined change in quality of life after distributed constraint-induced movement therapy (CIMT) in a sample of 58 patients with chronic stroke, using CHAID analysis. Predictors of change included age, gender, side of lesion, time since stroke, cognitive status (measured by the MMSE), upper extremity motor impairment (measured by the FMA-UE) and independence in activities of daily living (measured by the FIM). Initial FIM scores were the strongest predictor of overall SIS score (p=0.006) and ADL/IADL domain score (p=0.004) at post-treatment. Participants with FIM scores ≤ 109 showed significantly greater improvement in overall SIS scores than participants with FIM scores > 109. There were no significant associations between other SIS domains and other predictors.

Lin et al. (2010a) examined predictive validity of the SIS version 3.0 by comparing pre-treatment SIS scores with post-treatment scores of the Fugl-Meyer Assessment (FMA), Motor Activity Log – Amount of Use and – Quality of Movement (MAL-AOU, MAL-QOM), Functional Independence Measure (FIM), Frenchay Activities Index (FAI) and Nottingham Extended Activities of Daily Living Scale (NEADL). Predictive validity was measured using Spearman correlation coefficients prior to and on completion of a 3-week intervention period. The SIS Hand Function showed excellent correlations with MAL-AOU (r=0.61, p<0.01) and MAL-QOM (r=0.66, p<0.01), and adequate correlations with all other measures (FMA, FIM, FAI, NEADL). The SIS ADL/IADL showed an excellent correlation with the FIM (r=0.70, p<0.01), and adequate correlations with all other measures (FMA, MAL-AOU, MAL-QOM, FAI, NEADL). Other SIS domains demonstrated poor to adequate correlations with comparison measures.

Ward et al. (2011) examined predictive validity of the SIS-16 and other clinical measures (STREAM, FIM) in a sample of 30 patients in an acute rehabilitation setting, using Spearman rho coefficients and Wilcoxon rank-sum tests. Results indicated excellent correlations between SIS-16 admission scores and predicted length of stay (rho=-0.6743, p<0.001) and between SIS-16 admission scores and actual length of stay (rho=-0.7953, p<0.001). There was a significant correlation with discharge destination (p<0.05).

Lee et al. (2016) developed a computational method to predict quality of life after stroke rehabilitation, using a Particle Swarm-Optimized Support Vector Machine (PSO-SVM) classifier. A sample of 130 patients with subacute/chronic stroke received occupational therapy for 1.5-2 hours/day, 5 days/week for 3-4 weeks. Predictors of outcome included 5 personal parameters (age, gender, time since stroke onset, education, MMSE score) and 9 early functional outcomes (Fugl-Meyer Assessment, Wolf Motor Function Test, Action Research Arm Test, Functional Independence Measure, Motor Activity Log – Amount of Use (MAL-AOU) and – Quality of Movement (MAL-QOM), ABILHAND, physical function, SIS). The combination of early outcomes of MAL-QOM and SIS showed highest accuracy (70%) and highest cross-validated accuracy (81.43%) in predicting final QOL among patients with stroke. The SIS alone showed high accuracy (60%) and cross-validated accuracy (81.43%).

Construct:

Duncan et al. (2003b) performed a Rasch analysis on version 2.0 of the SIS. For measures that have been developed using a conceptual hierarchy of items, the theoretical ordering can be compared with the empirical ordering produced by the Rasch analysis as evidence of the construct validity of the measure. In this study, the expectation regarding the theoretical ordering of task difficulty was consistent with the empirical ordering of the items by difficulty for each domain, providing evidence for the construct validity of the SIS.

Convergent/Discriminant:
Edwards and O’Connell (2003) examined discriminant validity of the SIS version 2.0 and SIS-16 in a sample of 74 patients with stroke, by comparison with the World Health Organization Quality of Life Bref-Scale (WHOQOL-BREF) and Zung’s Self-Rating Depression Scale (ZSRDS). There were adequate to excellent correlations between the SIS-16 and the WHOQOL-BREF Physical domain (r=0.40 to 0.63); correlations with the WHOQOL-BREF Social relationships domain were poor (r=0.13 to 0.18). There were adequate to excellent correlations between the SIS Participation domain and all WHOQOL-BREF domains (r=0.45 to 0.69). The correlation between the SIS Participation domain and the WHOQOL-BREF Physical domain was excellent (r=0.69). The SIS Participation domain demonstrated an adequate correlation with the ZSRDS (r=-0.56). There were adequate correlations between the SIS Memory and Emotion domains and the WHOQOL-BREF Psychological domain (r=0.49, 0.70, respectively) and between the SIS Memory and Emotion domains and the ZSRDS (r=-0.38, -0.62, respectively). There was a poor correlation between the SIS Physical domain and the WHOQOL-BREF Environment scores (r=0.15). Neither the ZSRDS nor the WHOQOL-BREF assesses communication; accordingly, both measures demonstrated poor correlations with the SIS Communication domain (ZSRDS: r=-0.28; WHOQOL-BREF: r=0.11 to 0.28).
Note: Some correlations are negative because a high score on the SIS indicates normal performance whereas a high score on other measures indicates impairment.

Jenkinson et al. (2013) examined convergent validity of the SIS version 3.0 and the SF-SIS in a sample of individuals with stroke (n=73, 151, respectively) by comparison with the EuroQoL EQ-5D, using Spearman’s correlation coefficient. The SIS and SF-SIS demonstrated identical excellent correlations with the EQ-5D (r=0.83).

MacIsaac et al. (2016) examined convergent validity of the SIS 3.0 and the SF-SIS in a sample of 5549 patients in an acute stroke setting and 332 patients in a stroke rehabilitation setting, using Spearman’s correlation coefficient. Convergent validity was measured by comparison with the SIS-VAS, patient-reported outcome measures the EuroQoL EQ-5D and EQ-5D-VAS, and functional measures the Barthel Index (BI), modified Rankin Scale (mRS), and the National Institutes of Health Stroke Scale (NIHSS). Within acute data, the SIS and SF-SIS demonstrated significant excellent correlations with the mRS (ρ=-0.87, -0.80, respectively), the BI (ρ=0.89, 0.80), the NIHSS (ρ=-0.77, -0.73), the EQ-5D (ρ=0.88, 0.82) and the EQ-VAS (ρ=0.73, 0.72). Within rehabilitation data, the SIS and SF-SIS demonstrated excellent correlations with the BI (ρ=0.72, 0.65, respectively) and the EQ-5D (ρ=0.69, 0.69), and moderate correlations with the SIS-VAS (ρ=0.56, 0.57) and the EQ-VAS (ρ=0.46, 0.40). Correlations between the SIS and SF-SIS were excellent in the acute data (ρ=0.94) and rehabilitation data (ρ=0.96).

Kwon et al. (2006) examined convergent validity of the SIS 3.0 by telephone administration in a sample of 95 patients with stroke, using Pearson coefficients. Convergent validity was measured by comparison with the Functional Independence Measure (FIM) – Motor component (FIM-M) and – Cognitive component (FIM-C), and with the Medical Outcomes Study Short Form 36 for veterans (SF-36V). Patients were administered the SIS at 12 weeks post-stroke and the FIM and SF-36V at 16 weeks post-stroke. The SIS 3.0 telephone survey showed adequate to excellent correlations with the FIM (r=0.404 to 0.858, p<0.001) and SF-36V (r=0.362 to 0.768, p<0.001).

Known groups:
Duncan et al. (1999) found that all domains of the SIS version 2.0, with the exception of the Memory/thinking and Emotion domains, were able to discriminate between patients across 4 Rankin levels of stroke severity (p<0.0001, except for the Communication domain, p=0.02). These results suggest that scores from most domains of the SIS can differentiate between patients based on stroke severity.

Lai et al. (2003) administered the SIS and SF-36 to 278 patients with stroke 90 days after stroke. The SIS-16 was able to discriminate among the Modified Rankin Scale (MRS) levels of 0 to 1, 2, 3, and 4. The SIS Participation domain was also able to discriminate across the MRS levels of 0 to 1, 2, and 3 to 4. These results suggest that the SIS can discriminate between patients of varying degrees of stroke severity.

Kwon et al. (2006) administered the SIS 3.0 by telephone to a sample of 95 patients at 12 weeks post-stroke. The MRS was administered to patients at hospital discharge. SIS 3.0 scores were reported by domains: SIS-16, SIS-Physical and SIS-ADL; all domains showed score discrimination and distribution for different degrees of stroke severity: MRS 0/1 vs. MRS 4/5; MRS 2 vs. MRS 4/5; and MRS 3 vs. MRS 4/5.

Sensitivity and Specificity:

Beninato, Portney & Sullivan (2009) examined sensitivity and specificity of the SIS-16 relative to a history of multiple falls in a sample of 27 patients with chronic stroke. Participants reported a history of no falls or one fall (n=18) vs. multiple falls (n=9), according to Tinetti’s definition of falls. An SIS-16 cut-off score of 61.7 yielded 78% sensitivity and 89% specificity. Area under the ROC curve was adequate (0.86). Likelihood ratios were used to calculate post-test probability of a history of falls, and results showed high positive (LR+ = 7.0) and low negative (LR- = 0.25) likelihood ratios. Results indicate that the SIS-16 demonstrated good overall accuracy in detecting individuals with a history of multiple falls.
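
As an illustration of how likelihood ratios and post-test probabilities follow from sensitivity and specificity, the sketch below uses the rounded values reported above (78% and 89%). The function names are hypothetical, and the pre-test probability is assumed here to equal the sample prevalence of multiple falls (9 of 27); the original authors may have used a different value. Because the inputs are rounded percentages, the computed LR+ differs slightly from the reported 7.0.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Reported values at the SIS-16 cut-off score of 61.7.
lr_pos, lr_neg = likelihood_ratios(0.78, 0.89)

# Assumption: pre-test probability taken as the sample prevalence (9 of 27).
pre_test = 9 / 27
prob_if_positive = post_test_probability(pre_test, lr_pos)
prob_if_negative = post_test_probability(pre_test, lr_neg)

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability (positive test): {prob_if_positive:.2f}")
print(f"Post-test probability (negative test): {prob_if_negative:.2f}")
```

A positive test raises the probability of a history of multiple falls well above the assumed pre-test value, and a negative test lowers it, which is the pattern expected from a high LR+ and a low LR-.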

Responsiveness

Duncan et al. (1999) examined responsiveness of the SIS version 2.0. Significant change was observed in patients’ recovery in the expected direction between assessments at 1 and 3 months, and at 1 and 6 months post-stroke; however, sensitivity to change was affected by stroke severity and time of post-stroke assessment. All domains of the SIS showed statistically significant change from 1 to 3 months and 1 to 6 months post-stroke, but this was not observed between 3 and 6 months post-stroke for the domains of Hand function, Mobility, ADL/IADL, combined physical, and Participation among patients recovering from minor stroke. For patients with moderate stroke, statistically significant change was observed at both 1 to 3 months and 1 to 6 months post-stroke in all domains, and from 3 to 6 months for the domains of Mobility, ADL/IADL, combined physical, and Participation.

Lin et al. (2010a) examined responsiveness of the SIS version 3.0 in a sample of 74 patients with chronic stroke. Participants were randomly assigned to receive constraint-induced movement therapy (CIMT), bilateral arm training (BAT) or conventional rehabilitation over a 3-week intervention period. Responsiveness was measured according to change from pre- to post-treatment, using Wilcoxon signed rank test and Standardised Response Mean (SRM). Most SIS domains showed small responsiveness (SRM = 0.22-0.33, Wilcoxon Z = 1.78-2.72). Medium responsiveness was seen for Hand Function (SRM = 0.52, Wilcoxon Z = 4.24, P<0.05), Stroke Recovery (SRM = 0.57, Wilcoxon Z = 4.56, P<0.05) and SIS total score (SRM=0.50, Wilcoxon Z = 3.89, P<0.05).
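
The SRM is conventionally the mean change score divided by the standard deviation of the change scores. The sketch below shows that calculation on made-up pre- and post-treatment scores; the data, the function names and the 0.2/0.5/0.8 benchmarks (Cohen-style thresholds, which are consistent with the small, medium and large labels used in this section) are assumptions for illustration and are not taken from the studies reviewed.

```python
from statistics import mean, stdev


def standardized_response_mean(pre_scores, post_scores):
    """SRM = mean of paired change scores / SD of paired change scores."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre and post scores must be paired")
    changes = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return mean(changes) / stdev(changes)


def srm_magnitude(srm, small=0.2, medium=0.5, large=0.8):
    """Label |SRM| using assumed Cohen-style benchmarks."""
    size = abs(srm)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "trivial"


# Hypothetical domain scores for five patients (not study data).
pre = [40, 55, 60, 35, 50]
post = [45, 55, 70, 40, 52]

srm = standardized_response_mean(pre, post)
print(f"SRM = {srm:.2f} ({srm_magnitude(srm)})")
```

With these benchmarks, the reported Hand Function SRM of 0.52 would be labelled medium and an SRM of 0.22 would be labelled small.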

Lin et al. (2010b) evaluated the clinically important difference (CID) within four physical domains of the SIS 3.0 (strength, ADL/IADL, mobility, hand function) in a sample of 74 patients with chronic stroke. Participants were randomly assigned to receive CIMT, BAT or conventional rehabilitation over a 3-week intervention period. The following change scores were found to indicate a true and reliable improvement (minimal detectable change, MDC): Strength subscale = 24.0; ADL/IADL subscale = 17.3; Mobility subscale = 15.1; and Hand Function subscale = 25.9. The following mean change scores were considered to represent a CID: Strength subscale = 9.2; ADL/IADL subscale = 5.9; Mobility subscale = 4.5; and Hand Function subscale = 17.8. CID values were determined by the effect-size index and from comparison with a global rating of change (defined by a score of 10-15% in patients’ perceived overall recovery from pre- to post-treatment).
Note: Lin et al. (2010b) note that CID estimates may have been influenced by the age of participants and baseline degree of severity. Younger patients needed greater change scores from pre- to post-treatment to have a clinically important improvement compared to older patients. Those with higher baseline severity of symptoms showed greater MDC values and therefore must show more change from pre- to post-treatment in order to demonstrate significant improvements. Also, the results may be limited to stroke patients who demonstrate improvement after rehabilitation therapies, Brunnstrom stage III and sufficient cognitive ability. Therefore, a larger sample size is recommended for future validation of these findings.
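
To show how the MDC and CID values above might be used when interpreting an individual change score, the following sketch compares a change against both thresholds. It also includes one common formulation of the MDC (z × √2 × SEM, with SEM = SD × √(1 − reliability)); this formula, the z value of 1.96 and all function names are assumptions for illustration, and the source does not state that Lin et al. (2010b) computed the MDC in exactly this way.

```python
from math import sqrt

# Thresholds as reported above for Lin et al. (2010b).
REPORTED = {
    "Strength": {"mdc": 24.0, "cid": 9.2},
    "ADL/IADL": {"mdc": 17.3, "cid": 5.9},
    "Mobility": {"mdc": 15.1, "cid": 4.5},
    "Hand Function": {"mdc": 25.9, "cid": 17.8},
}


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1.0 - reliability)


def minimal_detectable_change(sd, reliability, z=1.96):
    """Assumed formulation: MDC = z * sqrt(2) * SEM (z=1.96 for 95% confidence)."""
    return z * sqrt(2.0) * standard_error_of_measurement(sd, reliability)


def interpret_change(subscale, change):
    """Flag whether a change score reaches the reported CID and MDC."""
    thresholds = REPORTED[subscale]
    return {
        "exceeds_cid": change >= thresholds["cid"],
        "exceeds_mdc": change >= thresholds["mdc"],
    }


# Hypothetical inputs: baseline SD of 20 points, reliability of 0.80.
example_mdc = minimal_detectable_change(sd=20.0, reliability=0.80)
print(f"Example MDC = {example_mdc:.1f}")

# A hypothetical 10-point gain on the Strength subscale.
print(interpret_change("Strength", 10.0))
```

In this example a 10-point Strength gain reaches the reported CID (9.2) but not the reported MDC (24.0), so it could be clinically important on average yet still fall within measurement error for an individual patient.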

Ward et al. (2011) examined responsiveness of the SIS-16 and other clinical measures (STREAM, FIM) in a sample of 30 patients with acute stroke. Change scores were evaluated using Wilcoxon signed rank test and responsiveness to change was assessed using standardized response means (SRM). Measures were taken on admission to and discharge from an acute rehabilitation setting (average length of stay 23.3 days, range 7-53 days). SIS-16 change scores indicated statistically significant improvement from admission to discharge (23.1, p<0.0001) and sensitivity to change was large (SRM=1.65).

Guidetti et al. (2014) examined responsiveness of the SIS 3.0 in a sample of 204 patients with stroke who were assessed at 3 and 12 months post-stroke, using Wilcoxon’s matched pairs test. Clinically meaningful change within a domain was defined as a change of 10-15 points between timepoints. The Participation and Recovery domains were the most responsive domains over the first year post-stroke, with 27.5% and 29.4% of participants (respectively) reporting a clinically meaningful positive change, and 20% and 10.3% of participants (respectively) reporting a clinically meaningful negative change, from 3 to 12 months post-stroke. The Strength and Hand function domains also showed high clinically meaningful positive change (23%, 18.0% respectively) and negative change (14.7%, 14.2% respectively) from 3 to 12 months post-stroke. There were significant changes in scores on the Strength (p=0.045), Emotion (p=0.001) and Recovery (p<0.001) domains from 3 to 12 months post-stroke. The Strength, Hand function and Participation domains had the highest perceived impact (i.e. lowest mean scores) at 3 months and 12 months.

References

Beninato, M., Portney, L.G., & Sullivan, P.E. (2009). Using the International Classification of Functioning, Disability and Health as a framework to examine the association between falls and clinical assessment tools in people with stroke. Physical Therapy, 89(8), 816-25.
Brandao, A.D., Teixeira, N.B., Brandao, M.C., Vidotto, M.C., Jardim, J.R., & Gazzotti, M.R. (2018). Translation and cultural adaptation of the Stroke Impact Scale 2.0 (SIS): a quality-of-life scale for stroke. Sao Paulo Medical Journal, 136(2), 144-9. doi: 10.1590/1516-3180.2017.0114281017
Brott, T.G., Adams, H.P., Olinger, C.P., Marler, J.R., Barsan, W.G., Biller, J., Spilker, J., Holleran, R., Eberle, R., Hertzberg, V., Rorick, M., Moomaw, C.J., & Walker, M. (1989). Measurements of acute cerebral infarction: A clinical examination scale. Stroke, 20, 864-70.
Cael, S., Decavel, P., Binquet, C., Benaim, C., Puyraveau, M., Chotard, M., Moulin, T., Parrette, B., Bejot, Y., & Mercier, M. (2015). Stroke Impact Scale version 2: validation of the French version. Physical Therapy, 95(5), 778-90.
Carod-Artal, F.J., Coral, L.F., Trizotto, D.S., & Moreira, C.M. (2008). The Stroke Impact Scale 3.0: evaluation of acceptability, reliability, and validity of the Brazilian version. Stroke, 39, 2477-84.
Choi, S.U., Lee, H.S., Shin, J.H., Ho, S.H., Koo, M.J., Park, K.H., Yoon, J.A., Kim, D.M., Oh, J.E., Yu, S.H., & Kim, D.A. (2017). Stroke Impact Scale 3.0: reliability and validity evaluation of the Korean version. Annals of Rehabilitation Medicine, 41(3), 387-93.
Collin, C. & Wade, D. (1990). Assessing motor impairment after stroke: a pilot reliability study. Journal of Neurology, Neurosurgery, and Psychiatry, 53, 576-9.
Duncan, P. W., Bode, R. K., Lai, S. M., & Perera, S. (2003b). Rasch analysis of a new stroke-specific outcome scale: The Stroke Impact Scale. Archives of Physical Medicine and Rehabilitation, 84, 950-63.
Duncan, P. W., Lai, S. M., Tyler, D., Perera, S., Reker, D. M., & Studenski, S. (2002a). Evaluation of Proxy Responses to the Stroke Impact Scale. Stroke, 33, 2593-9.
Duncan, P.W., Reker, D.M., Horner, R.D., Samsa, G.P., Hoenig, H., LaClair, B.J., & Dudley, T.K. (2002b). Performance of a mail-administered version of a stroke-specific outcome measure: The Stroke Impact Scale. Clinical Rehabilitation, 16(5), 493-505.
Duncan, P.W., Wallace, D., Lai, S.M., Johnson, D., Embretson, S., & Laster, L.J. (1999). The Stroke Impact Scale version 2.0: Evaluation of reliability, validity, and sensitivity to change. Stroke, 30, 2131-40.
Duncan, P.W., Wallace, D., Studenski, S., Lai, S.M., & Johnson, D. (2001). Conceptualization of a new stroke-specific outcome measure: The Stroke Impact Scale. Topics in Stroke Rehabilitation, 8(2), 19-33.
Duncan, P.W., Lai, S.M., Bode, R.K., Perera, S., DeRosa, J.T., & GAIN Americas Investigators. (2003a). Stroke Impact Scale-16: A brief assessment of physical function. Neurology, 60, 291-6.
Edwards, B. & O’Connell, B. (2003). Internal consistency and validity of the Stroke Impact Scale 2.0 (SIS 2.0) and SIS-16 in an Australian sample. Quality of Life Research, 12, 1127-35.
Finch, E., Brooks, D., Stratford, P.W., & Mayo, N.E. (2002). Physical Rehabilitation Outcome Measures: A Guide to Enhanced Clinical Decision-Making (2nd ed.), Canadian Physiotherapy Association, Toronto.
Folstein, M.F., Folstein, S.E., & McHugh, P.R. (1975). “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189-98.
Fugl-Meyer, A.R., Jaasko, L., Leyman, I., Olsson, S., & Steglind, S. (1975). The post-stroke hemiplegic patient: a method for evaluation of physical performance. Scandinavian Journal of Rehabilitation Medicine, 7, 13-31.
Fulk, G.D., Reynolds, C., Mondal, S., & Deutsch, J.E. (2010). Predicting home and community walking activity in people with stroke. Archives of Physical Medicine and Rehabilitation, 91, 1582-6.
Geyh, S., Cieza, A., & Stucki, G. (2009). Evaluation of the German translation of the Stroke Impact Scale using Rasch analysis. The Clinical Neuropsychologist, 23(6), 978-95.
Goncalves, R.S., Gil, J.N., Cavalheiro, L.M., Costa, R.D., & Ferreira, P.L. (2012). Reliability and validity of the Portuguese version of the Stroke Impact Scale 2.0 (SIS 2.0). Quality of Life Research, 21(4), 691-6.
Guidetti, S., Ytterberg, C., Ekstam, L., Johansson, U., & Eriksson, G. (2014). Changes in the impact of stroke between 3 and 12 months post-stroke, assessed with the Stroke Impact Scale. Journal of Rehabilitation Medicine, 46, 963-8.
Hamilton, B.B., Granger, C.V., & Sherwin, F.S. (1987). A uniform national data system for medical rehabilitation. In: Fuhrer, M. J., ed. Rehabilitation Outcome: Analysis and Measurement. Baltimore, Md: Paul Brookes, 137-47.
Hamza, A.M., Nabilla, A.S., & Loh, S.Y. (2012). Evaluation of quality of life among stroke survivors: linguistic validation of the Stroke Impact Scale (SIS) 3.0 in Hausa language. Journal of Nigeria Soc Physiotherapy, 20, 52-9.
Hamza, A.M., Nabilla, A.-S., Yim, L.S., & Chinna, K. (2014). Reliability and validity of the Nigerian (Hausa) version of the Stroke Impact Scale (SIS) 3.0 index. BioMed Research International, 14, Article ID 302097, 7 pages. doi: 10.1155/2014/302097
Hogue, C., Studenski, S., & Duncan, P.W. (1990). Assessing mobility: The first steps in preventing falls. In: Funk, S.G., Tornquist, E.M., Champagne, M.T., Copp, L.A., & Wiese, R.A., eds. Key Aspects of Recovery. New York, NY: Springer, 275-81.
Hsieh, F.-H., Lee, J.-D., Chang, T.-C., Yang, S.-T., Huang, C.-H., & Wu, C.-Y. (2016). Prediction of quality of life after stroke rehabilitation. Neuropsychiatry, 6(6), 369-75.
Huang, Y-h., Wu, C-y., Hsieh, Y-w., & Lin, K-c. (2010). Predictors of change in quality of life after distributed constraint-induced therapy in patients with chronic stroke. Neurorehabilitation and Neural Repair, 24(6), 559-66. doi: 10.1177/1545968309358074
Jenkinson, C., Fitzpatrick, R., Crocker, H., & Peters, M. (2013). The Stroke Impact Scale: validation in a UK setting and development of a SIS short form and SIS index. Stroke, 44, 2532-5.
Kamwesiga, J.T., von Koch, L., Kottorp, A., & Guidetti, S. (2009). Cultural adaptation and validation of Stroke Impact Scale 3.0 version in Uganda: a small-scale study. SAGE Open Medicine, 4: 2050312116671859. doi: 10.1177/2050312116671859
Kwon, S., Duncan, P., Studenski, S., Perera, S., Lai, S.M., & Reker, D. (2006). Measuring stroke impact with SIS: Construct validity of SIS telephone administration. Quality of Life Research, 15, 367-76.
Lai, S.M., Perera, S., Duncan, P.W., & Bode, R. (2003). Physical and Social Functioning After Stroke: Comparison of the Stroke Impact Scale and Short Form-36. Stroke, 34, 488-93.
Lawton, M. & Brody, E. (1969). Assessment of older people: self-maintaining and instrumental activities of daily living. Gerontologist, 9, 179-86.
Lee, H.-J. & Song, J.-M. (2015). The Korean language version of Stroke Impact Scale 3.0: cross-cultural adaptation and translation. Journal of the Korean Society of Physical Medicine, 10(3), 47-55.
Lin, K.C., Fu, T., Wu, C.Y., Hsieh, Y.W., Chen, C.L., & Lee, P.C. (2010a). Psychometric comparisons of the Stroke Impact Scale 3.0 and Stroke-Specific Quality of Life Scale. Quality of Life Research, 19(3), 435-43. doi: 10.1007/s11136-010-9597-5.
Lin, K.-C., Fu, T., Wu, C.-Y., Wang, Y.-H., Liu, J.-S., Hsieh, C.-J., & Lin, S.-F. (2010b). Minimal detectable change and clinically important difference of the Stroke Impact Scale in stroke patients. Neurorehabilitation and Neural Repair, 24, 486-92.
MacIsaac, R., Ali, M., Peters, M., English, C., Rodgers, H., Jenkinson, C., Lees, K.R., Quinn, T.J., & VISTA Collaboration. (2016). Derivation and validation of a modified short form of the Stroke Impact Scale. Journal of the American Heart Association, 5:e003108. doi: 10.1161/JAHA.115.003108
Mahoney, F.I. & Barthel, D.W. (1965). Functional evaluation: The Barthel Index. Maryland State Medical Journal, 14, 61-5.
Mulder, M. & Nijland, R. (2016). Stroke Impact Scale. Journal of Physiotherapy, 62, 117.
Ochi, M., Ohashi, H., Hachisuka, K., & Saeki, S. (2017). The reliability and validity of the Japanese version of the Stroke Impact Scale version 3.0. Journal of UOEH, 39(3), 215-21. doi: 10.7888/juoeh.39.215
Richardson, M., Campbell, N., Allen, L., Meyer, M., & Teasell, R. (2016). The stroke impact scale: performance as a quality of life measure in a community-based stroke rehabilitation setting. Disability and Rehabilitation, 38(14), 1425-30. doi: 10.3109/09638288.2015.1102337
Sullivan, J. (2014). Measurement characteristics and clinical utility of the Stroke Impact Scale. Archives of Physical Medicine and Rehabilitation, 95, 1799-1800.
Vellone, E., Savini, S., Barbato, N., Carovillano, G., Caramia, M., & Alvaro, R. (2010). Quality of life in stroke survivors: first results from the reliability and validity of the Italian version of the Stroke Impact Scale 3.0. Annali di Igiene, 22, 469-79.
Vellone, E., Savini, S., Fida, R., Dickson, V.V., Melkus, G.D., Carod-Artal, F.J., Rocco, G., & Alvaro, R. (2015). Psychometric evaluation of the Stroke Impact Scale 3.0. Journal of Cardiovascular Nursing, 30(3), 229-41. doi: 10.1097/JCN.0000000000000145
Ward, I., Pivko, S., Brooks, G., & Parkin, K. (2011). Validity of the Stroke Rehabilitation Assessment of Movement Scale in acute rehabilitation: a comparison with the Functional Independence Measure and Stroke Impact Scale-16. Physical Medicine and Rehabilitation, 3(11), 1013-21. doi: 10.1016/j.pmrj.2011.08.537
Ware, J.E. Jr., & Sherbourne, C.D. (1992). The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Medical Care, 30, 473-83.
Yesavage, J.A., Brink, T., Rose, T.L., Lum, O., Huang, V., Adey, M., & Leirer, V.O. (1983). Development and validation of a geriatric depression screening scale: A preliminary report. Journal of Psychiatric Research, 17, 37-49.

See the measure

How to obtain the SIS?

Please click here to see a copy of the SIS.

This instrument was developed by:

Pamela Duncan, PhD, PT
Dennis Wallace, PhD
Sue Min Lai, PhD, MS, MBA
Stephanie Studenski, MD, MPH
Dallas Johnson, PhD, and
Susan Embretson, PhD.

In order to gain permission to use the SIS and its translations, please contact MAPI Research Trust: contact@mapi-trust.org

Evidence Reviewed as of before: 18-02-2019

Author(s)*: Annabel McDermott, OT

Expert Reviewer: Trixie Reichardt, MHSc, RD, Rosemary Martino, PhD

Content consistency: Gabriel Plumier

Purpose

The Toronto Bedside Swallowing Screening Test (TOR-BSST©) is a screening tool which identifies patients at risk for dysphagia following stroke.

In-Depth Review

Purpose of the measure

The Toronto Bedside Swallowing Screening Test (TOR-BSST©) is a screening tool administered at the bedside by trained screeners which identifies patients at risk for dysphagia following stroke.

Available versions

Features of the measure

Items:

Baseline vocal quality
Tongue movement
50mL water test
Cup sip
Final judgment of vocal quality

Scoring:

The TOR-BSST© uses binary scoring (i.e. abnormal/normal) for each item. Failure on any item discontinues the screen and prompts referral to a Speech-Language Pathologist dysphagia expert.
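
The stop-on-first-failure rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the official TOR-BSST© form or software: the names `ScreenResult` and `run_screen` are hypothetical, and the order of administration is assumed to follow the item list given under "Items".

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Item names are taken from the "Items" list; the administration order
# is an assumption made for this illustration.
TOR_BSST_ITEMS = [
    "Baseline vocal quality",
    "Tongue movement",
    "50mL water test",
    "Cup sip",
    "Final judgment of vocal quality",
]


@dataclass
class ScreenResult:
    passed: bool                 # True only if every item was scored normal
    failed_item: Optional[str]   # first abnormal item, if any
    items_administered: int      # number of items scored before stopping


def run_screen(observations: Dict[str, bool]) -> ScreenResult:
    """Apply binary scoring: True = normal, False = abnormal.

    The screen stops at the first abnormal item, so items after the
    failure do not need to be present in `observations`.
    """
    for count, item in enumerate(TOR_BSST_ITEMS, start=1):
        if not observations[item]:
            # Any failure discontinues the screen and prompts referral.
            return ScreenResult(False, item, count)
    return ScreenResult(True, None, len(TOR_BSST_ITEMS))


result = run_screen({"Baseline vocal quality": True, "Tongue movement": False})
print(result.passed, result.failed_item, result.items_administered)
```

In this example the screen stops at the second item, so the remaining three items are never scored and the outcome is a referral.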

What to consider before beginning:

The TOR-BSST© should only be used with patients who are alert, able to sit upright at 90 degrees, and are able to follow simple instructions. Patients who do not meet these guidelines should not be screened but, instead, be referred to a Speech-Language Pathologist for assessment.

International best practice guidelines advise that, following stroke, patients should undergo screening for swallowing difficulties before oral intake of food, fluids or oral medication. Screening should be performed by specially trained personnel, using a validated screening tool. Swallowing should be screened as soon as possible after admission provided that the patient is able to participate. Patients who fail the swallowing screening should be referred to a Speech-Language Pathologist for comprehensive swallowing assessment. Patients who are confirmed at high risk of aspiration and/or dysphagia should undergo an instrumental assessment such as a videofluoroscopic swallowing study (VFS) and/or fibreoptic endoscopic evaluation of swallowing (FEES).

Time:

The TOR-BSST© takes less than 10 minutes to administer and score. Administration ceases immediately on failure of an item.

Training requirements:

The TOR-BSST© can be administered by health professionals who have undergone the requisite 4-hour didactic standardized training program. Didactic training is followed by individual training/competency observations. Training is provided by Speech-Language Pathologists who have completed the “TOR-BSST© Training for the SLP Dysphagia Expert” trainers course.

See The Swallowing Lab (https://swallowinglab.com/tor-bsst/) for details.

Equipment:

Client suitability

Can be used with:

The TOR-BSST© is suitable for use with individuals with stroke across the recovery continuum (Martino et al., 2009).
The TOR-BSST© is being validated for use with critically ill patients who have undergone prolonged intubation and may be at risk of swallowing problems.

Should not be used in:

Following stroke, patients should be assessed and managed according to best practice guidelines for dysphagia. The TOR-BSST© should not be used with individuals with decreased alertness or cognition, or those who are being tube-fed. Patients who are being tube-fed have already been identified to have dysphagia and therefore should be referred to a Speech-Language Pathologist for a comprehensive assessment and management.

In what languages is the screening tool available?

English
French
Chinese
German
Italian
Portuguese (Brazil)

Summary

What does the tool measure?	Risk for dysphagia following stroke.
What types of clients can the tool be used for?	The TOR-BSST© was developed for patients with stroke across the recovery continuum.
Is this a screening or assessment tool?	Screening tool
Time to administer	Less than 10 minutes.
Versions	There is one version of the TOR-BSST©.
Languages	Chinese, English, French, German, Italian, Portuguese (Brazil)
Measurement Properties
Reliability	Internal consistency: No studies have reported on the internal consistency of the TOR-BSST©. Test-retest: No studies have reported on the test-retest reliability of the TOR-BSST©. Intra-rater: No studies have reported on the intra-rater reliability of the TOR-BSST©. Inter-rater: Two studies have reported excellent inter-rater reliability of the TOR-BSST©.
Validity	Content: Development of the TOR-BSST© involved item generation from systematic review and subsequent item reduction, in combination with consultation with expert Speech-Language Pathologists. Criterion: Concurrent: No studies have reported on the concurrent validity of the TOR-BSST©. Predictive: One study has conducted a randomized controlled diagnostic study of the TOR-BSST© by comparison with videofluoroscopy. Construct: Convergent/Discriminant: No studies have reported on the convergent or discriminant validity of the TOR-BSST©. Known Groups: No studies have reported on the known-group validity of the TOR-BSST©.
Floor/Ceiling Effects	Not applicable
Does the tool detect change in patients?	The TOR-BSST© is designed as a screening test and scored using binary responses, so is not intended to detect change.
Acceptability	– The TOR-BSST© is quick to administer. – The TOR-BSST© requires specialised training.
Feasibility	The TOR-BSST© is suitable for administration across acute and rehabilitation settings. The screening tool is easily portable and is quick to administer, score and interpret.
How to obtain the tool?	Click here for information regarding the TOR-BSST©.

Psychometric Properties

Overview

The TOR-BSST© was developed and validated by Dr. Martino of The Swallowing Lab, University Health Network, University of Toronto.

A literature search was conducted to identify all relevant publications on the psychometric properties of the TOR-BSST©. Four studies were identified.

Floor/Ceiling Effects

The TOR-BSST© is a 5-item screening test to determine risk of dysphagia. The screening should be discontinued as soon as an individual fails an item.

Reliability

Inter-rater:
Martino et al. (2009) established inter-rater reliability of the TOR-BSST© in the first 50 patients with stroke enrolled, using intraclass correlation coefficient (ICC) and 95% confidence intervals (CI). Results indicated excellent inter-rater reliability (ICC=0.92; CI, 0.85 to 0.96).

Martino et al. (2006) examined 24-hour inter-rater reliability of the TOR-BSST© item and total screen scores in a sample of 286 patients with stroke (acute, n=78; subacute/chronic, n=208), using kappa statistics. Results indicated moderate reliability for the total score, with higher reliability early after training (k=0.90). Item reliability ranged from poor to adequate; the item ‘water swallowing’, which included both the 50 mL water test and the cup sip, achieved the highest item reliability (k=0.82; CI, 0.66-0.98).

Validity

Content:

Initial item generation for the TOR-BSST© resulted from systematic review of the accuracy and benefit of non-invasive bedside dysphagia screening tests with patients with stroke (see Martino, Pron & Diamant, 2000). Two measures were shown to be accurate predictors of dysphagia by videofluoroscopic assessment (VFS) of aspiration, and a further two were considered to show promising (although inconsistent) predictive ability:

Dysphonia/coughing during the 50mL Kidd water swallow test
Impaired pharyngeal sensation
Impaired tongue movement
General dysphonia – voice before or voice after water intake

The final measure, general dysphonia, was defined as two sub-items (voice before, voice after).

Item reduction was then performed, whereby positive results across the 5 items were compared with the total score. The item ‘water swallow’ contributed 25% to the total positive score, indicating that this item was the most frequent single item to identify dysphagia. The item ‘tongue movements’ contributed 8% to the total positive score. The remaining items contributed less than 5% each to the total positive score, and so were considered for elimination on review of practical application as determined by expert Speech-Language Pathologists. These expert clinicians considered the item ‘pharyngeal sensation’ to be impractical due to difficulty differentiating from a gag reflex in the clinical setting.

Martino et al. (2014) conducted item descriptive analysis in the original sample of 311 patients with stroke from acute and rehabilitation settings. The TOR-BSST© was administered by trained nurses. Items were eliminated individually to evaluate the impact of each item on the total score. Results showed that the ‘water swallow’ item contributed most significantly to identification of dysphagia, identifying 42.7% of patients in the acute setting and 29.0% of patients in the rehabilitation setting.

Criterion:

Predictive:
Martino et al. (2009) examined predictive validity of the TOR-BSST© by comparison with gold standard VFS assessment identifying any abnormal swallow physiology, of any severity. The randomized controlled diagnostic study design included four blinded Speech-Language Pathologists and 68 patients with stroke in acute and rehabilitation settings. Nine participants were excluded because the TOR-BSST© and VFS assessments were performed more than 24 hours apart, as per a priori criteria for patient flow. VFS assessment was used to confirm findings obtained by TOR-BSST© screening; clinicians rated the VFS images using three standardized scales: (1) Penetration Aspiration Scale; (2) Mann Assessment of Swallowing Ability (MASA) dysphagia subscore; and (3) MASA aspiration subscore. Across the entire sample of acute and rehabilitation patients, results showed that 61% (n=36) of patients were confirmed by experts to have no dysphagia vs. 39% (n=23) with dysphagia. These results indicate high accuracy to predict dysphagia using the TOR-BSST©, where dysphagia is defined by aspiration and/or physiological abnormality on VFS.

Construct:

Known Group:
No studies have reported on the known-group validity of the TOR-BSST©.

Sensitivity & Specificity:

Martino et al. (2009) examined sensitivity and specificity of the TOR-BSST© by comparison with VFS assessment, in a sample of 68 patients with stroke in acute and rehabilitation settings. Nine patients were excluded because the TOR-BSST© and VFS assessments were performed more than 24 hours apart. The TOR-BSST© showed 91.3% sensitivity (CI, 71.9 – 98.7) and 66.7% specificity (CI, 49.0 – 81.4) among all patients. Sensitivity and specificity were 96.3% and 63.6% (respectively) among patients in an acute setting, and 80.0% and 68.0% (respectively) among patients in rehabilitation settings. The TOR-BSST© showed high negative predictive values of 93.3% and 89.5% among participants in acute and rehabilitation stroke settings, respectively.
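To make the arithmetic behind these figures concrete, the following is a minimal sketch of how sensitivity and specificity are derived from a 2x2 table of screening result against the VFS reference standard. The cell counts are an assumption: they are back-calculated from the reported overall percentages and the group sizes given above (n=23 with dysphagia, n=36 without), and are not quoted from Martino et al. (2009). The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of a screening result against a reference standard."""
    true_pos: int   # screen positive, dysphagia confirmed on VFS
    false_neg: int  # screen negative, dysphagia confirmed on VFS
    true_neg: int   # screen negative, no dysphagia on VFS
    false_pos: int  # screen positive, no dysphagia on VFS


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of people with the condition who screen positive."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of people without the condition who screen negative."""
    return c.true_neg / (c.true_neg + c.false_pos)


# ASSUMPTION: counts back-calculated from the reported 91.3% sensitivity
# (n=23 with dysphagia) and 66.7% specificity (n=36 without dysphagia).
# They are illustrative and not taken from the published tables.
ALL_PATIENTS = ConfusionCounts(true_pos=21, false_neg=2, true_neg=24, false_pos=12)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(ALL_PATIENTS):.1%}")
    print(f"specificity = {specificity(ALL_PATIENTS):.1%}")
```

Under these assumed counts, 21 of 23 patients with dysphagia screened positive and 24 of 36 patients without dysphagia screened negative, which reproduces the overall percentages reported above.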

Martino et al. (2014) conducted a sensitivity analysis of the TOR-BSST© in the original sample of 311 patients with stroke from acute and rehabilitation settings. The TOR-BSST© was administered by trained nurses using the standard 10 teaspoons plus a sip of water. Positive screening occurred in 59.2% of patients in the acute setting (n=103) and 38.5% of patients in the rehabilitation setting (n=208).

Martino et al. (2014) further examined sensitivity of the TOR-BSST© when modifying administration according to water volume intake. Using the original sample from Martino et al. (2009), sensitivity was examined on administration of 1 to 10 teaspoons of water to determine the acceptable cut-point to identify dysphagia. Among all participants (n=311), sensitivity ranged from moderate to excellent for 5, 8 and 10 teaspoons of water (79%, 92% and 96% respectively). Among patients in the acute and rehabilitation settings, sensitivities were 84% and 75% (respectively) for 5 teaspoons of water, 93% and 92% (respectively) for 8 teaspoons, and 95% and 97% (respectively) for 10 teaspoons. Results indicate greater accuracy on administration of 10 x 5 mL teaspoons of water, as per the original assessment guidelines.

References

Martino, R., Maki, E., & Diamant, N. (2014). Identification of dysphagia using the Toronto Bedside Swallowing Screening Test (TOR-BSST©): are 10 teaspoons of water necessary? International Journal of Speech-Language Pathology, 16(3), 193-8. https://www.ncbi.nlm.nih.gov/pubmed/24833425
Martino, R., Nicholson, G., Bayley, M., Teasell, R., Silver, F., & Diamant, N. (2006). Interrater reliability of the Toronto Bedside Swallowing Screening Test (TOR-BSST©) [Abstract]. Dysphagia, 21(4), 287-334. https://doi.org/10.1007/s00455-006-9044-5
Martino, R., Pron, G., & Diamant, N. (2000). Screening for oropharyngeal dysphagia in stroke: insufficient evidence for guidelines. Dysphagia, 15, 19-30. https://www.ncbi.nlm.nih.gov/pubmed/10594255
Martino, R., Silver, F., Teasell, R., Bayley, M., Nicholson, G., Streiner, D.L., & Diamant, N.E. (2009). The Toronto Bedside Swallowing Screening Test (TOR-BSST): Development and validation of a dysphagia screening tool for patients with stroke. Stroke, 40, 555-61. https://www.ncbi.nlm.nih.gov/pubmed/19074483

See the measure

Other measures of dysphagia:

Instrumental Assessments:

Videofluoroscopy swallowing study (gold standard)
Fiberoptic endoscopic examination of swallowing
Rosenbek’s Penetration Aspiration Scale

Clinical Bedside Assessments:

The Modified Mann Assessment of Swallowing Ability (Modified MASA)

Screening Tools:

Massey Bedside Swallowing Screen
Volume-Viscosity Swallowing Test (Clave et al., 2008)
The Gugging Swallowing Screen (GUSS) (Trapl et al., 2007)