Canadian Neurological Scale (CNS)

Evidence Reviewed as of before: 09-01-2012

Author(s)*: Katie Marvin, MSc. PT (Candidate)

Editor(s): Annabel McDermott, OT; Nicol Korner-Bitensky, PhD OT

Expert Reviewer: Dr. Robert Cote, MD

Purpose

The Canadian Neurological Scale (CNS) was developed as a simple tool to be used in the evaluation and monitoring of neurological status of patients with stroke in the acute phase (Cote, Hachinski, Shurvell, Norris & Wolfson, 1986).

In-Depth Review

Purpose of the measure

The Canadian Neurological Scale (CNS) was developed as a simple tool to be used in the evaluation and monitoring of neurological status of patients with stroke in the acute phase (Cote, Hachinski, Shurvell, Norris & Wolfson, 1986). The CNS evaluates 10 clinical domains, including mentation (level of consciousness, orientation and speech) and motor function (face, arm and leg).

Features of the measure

Items :

The CNS comprises 8 items measuring level of consciousness, orientation, speech, motor function and facial weakness.

If patient is alert or drowsy: monitor with CNS (sections A1 and A2)
If patient is stuporous or comatose: monitor with Glasgow Coma Scale

MENTATION

Level of Consciousness

Alert 3.0
Spontaneous eye opening, normal level of consciousness
Drowsy 1.5
When stimulated verbally patient remains awake and alert but tends to doze

Orientation

Oriented 1.0
1. Where are you? (City and Hospital)
2. What is the month and year?
Speech can be slurred but must be intelligible.
Disoriented 0.0
Patient cannot state both place and time or cannot express answers in words or intelligible speech.

It is acceptable for patient to write answer to questions of orientation

Speech

Receptive deficit 0.0
Ask pt. 1) to close eyes; 2) to point to ceiling; 3) Does a stone sink in water? If pt. does not complete the above 3, go to Section A2.
Expressive deficit 0.5
Normal Speech 1.0

Adapted from Canadian Neurological Scale Cheat Sheet by Brown, M. and Li, J., available from: http://www.heartandstroke.on.ca/site/c.pvI3IeNWJwE/b.5385163/k.5CDC/HCP__Canadian_Neurological_Scale_CNS.htm

SECTION A1 – No Comprehension Deficit

Face:	Ask pt. to smile:
None 0.5	No weakness – 0.5
Present 0.0	Weakness – 0.0 (Record L or R)
Arm: Proximal	Ask pt. to lift arms to shoulder level and apply resistance above elbows bilaterally
None 1.5	No weakness – 1.5
Mild 1.0	Movement to 90°, unable to oppose pressure – 1.0
Significant 0.5	Movement < 90° – 0.5
Total 0.0	Absence of motion – 0.0
Arm: Distal	Ask pt. to bend wrist back. Apply pressure on back of the hand.
None 1.5	No weakness – 1.5
Mild 1.0	Can bend wrist, unable to oppose pressure – 1.0
Significant 0.5	Some movement of fingers – 0.5
Total 0.0	Absence of movement – 0.0
Leg: Proximal	Ask pt. to flex knee to 90°. Push down on each thigh one at a time.
None 1.5	No weakness – 1.5
Mild 1.0	Can lift leg, unable to oppose pressure – 1.0
Significant 0.5	Lateral movement but no power to lift leg – 0.5
Total 0.0	Absence of movement – 0.0
Leg: Distal	Ask pt. to point toes and feet upward. Push down on each foot one at a time.
None 1.5	No weakness – 1.5
Mild 1.0	Can point foot & toes upward, unable to oppose pressure – 1.0
Significant 0.5	Some movement of toes, but cannot lift toes or foot – 0.5
Total 0.0	Absence of movement – 0.0

SECTION A2 – Comprehension Deficit

Face:	Ask pt. to mimic your grin (if unable, apply pressure to sternum).
Symmetrical 0.5	Symmetrical – 0.5
Asymmetrical 0.0	Asymmetrical – 0.0
Arms:	Demonstrate/place pt. arms in front of pt. at 90° (if unable, apply finger nail bed pressure bilaterally and compare response)
Equal 1.5	Equal motor response – 1.5
Unequal 0.0	Unequal motor response – 0.0 (record L or R)
Legs:	Thighs flexed to 90° (if unable, apply toenail bed pressure bilaterally and compare response)
Equal 1.5	Maintain position or withdraw equally – 1.5

Scoring and Score Interpretation:

Mentation: comprises evaluation of level of consciousness, orientation and speech.
Motor function evaluations are separated into sections A1 and A2. A1 is administered if the patient is able to understand and follow instructions. A2 is administered in the presence of comprehension deficits (Cote et al., 1986, 1989). Each motor item is rated for severity and each rating is weighted “according to the relative importance of a particular neurological deficit” (Cote et al., 1989).
It should be noted that assessment using the CNS focuses on limb weakness over other possible neurological impairments (Muir, Weir, Murray, Povey & Lees, 1996).
The CNS scores only the motor strength of the weakest limb. For patients with a comprehension deficit, asymmetry in strength is scored. Therefore, in addition to using the CNS, clinicians may wish to further evaluate and document the upper and lower extremity strength and power in patients with a comprehension deficit (O’Farrell & Yong Zou, 2008).
Scores from each section are summed to provide a total score out of a possible 11.5. Lower scores are representative of increasing severity.
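As an illustration of the summation described above, the following minimal Python sketch adds the item points for a patient assessed with Section A1. The dictionary keys and the function name `cns_total_a1` are hypothetical labels chosen for this example; the point values are those listed in the item tables in this review. It is a sketch for checking arithmetic, not a substitute for the published scoring form.

```python
# Illustrative sketch only. Item and rating names are hypothetical labels;
# point values are taken from the Mentation and Section A1 tables above.
MENTATION_POINTS = {
    "consciousness": {"alert": 3.0, "drowsy": 1.5},
    "orientation": {"oriented": 1.0, "disoriented": 0.0},
    # A receptive deficit (0.0) routes the patient to Section A2,
    # so it is deliberately not accepted by this A1-only sketch.
    "speech": {"normal": 1.0, "expressive deficit": 0.5},
}
FACE_POINTS = {"none": 0.5, "present": 0.0}
LIMB_POINTS = {"none": 1.5, "mild": 1.0, "significant": 0.5, "total": 0.0}
LIMB_ITEMS = ("arm_proximal", "arm_distal", "leg_proximal", "leg_distal")


def cns_total_a1(ratings):
    """Sum mentation and Section A1 motor points for one assessment."""
    total = sum(
        MENTATION_POINTS[item][ratings[item]] for item in MENTATION_POINTS
    )
    total += FACE_POINTS[ratings["face"]]
    total += sum(LIMB_POINTS[ratings[item]] for item in LIMB_ITEMS)
    return total


no_deficit = {
    "consciousness": "alert", "orientation": "oriented", "speech": "normal",
    "face": "none", "arm_proximal": "none", "arm_distal": "none",
    "leg_proximal": "none", "leg_distal": "none",
}
print(cns_total_a1(no_deficit))  # maximum possible score: 11.5
```

An unrecognised rating raises a `KeyError`, which is a simple way of flagging a data-entry slip rather than silently scoring it as zero.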

Nilanont et al. (2010) developed and validated a conversion model that allows clinicians and researchers to predict NIHSS scores for patients based on their CNS score in order to allow for comparability between the two scales. CNS scores can be reliably converted into NIHSS scores using the following conversion: NIHSS = 23 – (2 x CNS score).
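The conversion reported by Nilanont et al. (2010) is simple arithmetic and can be sketched as follows. The function name `cns_to_nihss` and the range check are assumptions made for this example; only the formula itself comes from the text above.

```python
def cns_to_nihss(cns_score):
    """Predict an NIHSS score from a CNS total using NIHSS = 23 - (2 x CNS).

    The 0 to 11.5 range check is an assumption added for this sketch,
    based on the maximum CNS total stated in this review.
    """
    if not 0.0 <= cns_score <= 11.5:
        raise ValueError("CNS total must lie between 0 and 11.5")
    return 23 - 2 * cns_score


print(cns_to_nihss(11.5))  # 0.0 (no deficit on the CNS)
print(cns_to_nihss(7.5))   # 8.0
```

Because lower CNS scores indicate greater severity while higher NIHSS scores do, the formula reverses direction as well as rescaling.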

Time:

The CNS takes approximately 5 to 10 minutes to complete (Cote et al., 1986, 1989; O’Farrell & Yong Zou, 2008).

Training requirements:

It is advised that the CNS be completed by a healthcare professional trained in its administration. The CNS does not need to be completed by a neurologist.

A trained observer rates the patient’s ability to answer questions and perform activities. Training is minimal and is available through participation in a 2-hour workshop or a self-directed learning package and review video. For more details on training requirements please visit the following website: http://www.heartandstroke.on.ca/site/c.pvI3IeNWJwE/b.5385163/k.5CDC/HCP__Canadian_Neurological_Scale_CNS.htm

Subscales:

The subscale items encompass level of consciousness, orientation, speech, motor function and facial weakness.

Equipment:

None typically reported.

Alternative forms of the assessment

None typically reported

Client suitability

Can be used with:

Patients in the acute phase of stroke who are either alert or drowsy (Cote et al., 1986).

Should not be used with:

As the CNS was designed as an observational scale, measurement by self-report or by telephone is not possible.

Languages of the measure

None reported.

Summary

What does the tool measure?	The CNS measures neurological status.
What types of clients can the tool be used for?	Patients with stroke in the acute phase who are either alert or drowsy.
Is this a screening or assessment tool?	Assessment
Time to administer	Approximately 5 to 10 minutes.
Versions	There are no alternative versions reported.
Other Languages	No information reported.
Measurement Properties
Reliability	Internal consistency: One study examined the internal consistency of the CNS and reported excellent internal consistency. Inter-rater: Two studies investigated the inter-rater reliability and found adequate to excellent inter-rater reliability.
Validity	Criterion: Concurrent: One study evaluated the concurrent validity of the CNS and found excellent concurrent validity between the CNS and global neurological examination. Predictive: Two studies evaluated the predictive validity of the CNS and found initial CNS scores to predict death within 3 to 6 months, morbidity, and recovery of ADL within 3 to 5 months. Construct: Discriminant: One study evaluated the discriminant validity of the CNS and found excellent correlation between total CNS scores and standard neurological examinations.
Floor/Ceiling Effects	Floor/ceiling effects have not yet been examined.
Does the tool detect change in patients?	The CNS can be used to monitor change in neurological status in patients with acute stroke.
Acceptability	The CNS is short and simple. Patient burden associated with its use should be minimal.
Feasibility	A trained healthcare professional should administer the CNS. It may be used both prospectively and retrospectively.
How to obtain the tool?	The CNS is available from strokecenter.org. A full version of the measure can be found in the following article: Cote, R., Hachinski, V., Shurvell, B., Norris, J. & Wolfson, C. (1986). The Canadian Neurological Scale: A preliminary study in acute stroke. Stroke, 17(4), 731-737.

Psychometric Properties

Overview

For the purposes of this review, we conducted a literature search to identify all relevant publications on the psychometric properties of the CNS.

Floor/Ceiling Effects

Floor/ceiling effects have not yet been examined.

Reliability

Internal consistency:
Cote et al. (1986) examined the internal consistency of the CNS in 34 patients with acute stroke. Four raters (one neurologist, one resident in neurology and two nurses) evaluated the patients. Internal consistency, as calculated using Cronbach’s alpha, was excellent for all domains (leg weakness 0.896; facial weakness 0.934; distal arm weakness 0.969; orientation 0.979; proximal arm weakness 0.98; and speech 1.00). No differences between professionals were found.
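For readers unfamiliar with the statistic, Cronbach’s alpha can be sketched with the textbook formula alpha = k/(k-1) × (1 - sum of column variances / variance of row totals), where k is the number of columns. The code below is a generic illustration with made-up numbers. It is not a reconstruction of the analysis in Cote et al. (1986), and the function name `cronbach_alpha` is an assumption for this example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of scores.

    rows: one list per subject, one column per item (or rater) whose
    consistency is being assessed. Uses sample variances throughout.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two columns are required")
    column_variance_sum = sum(variance(col) for col in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all row totals are equal")
    return (k / (k - 1)) * (1 - column_variance_sum / total_variance)


# Hypothetical data: three subjects scored identically in every column,
# so the columns are perfectly consistent and alpha is 1 up to rounding.
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(round(cronbach_alpha(toy_scores), 6))
```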

Test-retest:
Test-retest reliability has not been reviewed.

Intra-rater:
Intra-rater reliability has not been examined.

Inter-rater:
Cote et al. (1986) examined the inter-rater reliability of the CNS in 34 patients with acute stroke. Four raters (one neurologist, one resident in neurology and two nurses) evaluated the patients within two to four hours of each other using the CNS. Inter-rater reliability, as calculated using kappa statistics, was adequate to excellent for all domains (leg weakness 0.722-0.842; facial weakness 0.535-1.00; distal arm weakness 0.758-0.974; orientation 0.744-1.00; proximal arm weakness 0.788-1.00; and speech 0.934-1.00).
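A kappa statistic corrects the raw agreement between raters for the agreement expected by chance. The sketch below computes Cohen’s kappa for one pair of raters. Treating the reported ranges as pairwise kappas is an assumption, the ratings shown are hypothetical, and `cohen_kappa` is a name chosen for this example.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters rating the same subjects."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set")
    n = len(rater_a)
    # Proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical weakness ratings for four patients by two raters.
neurologist = ["none", "mild", "none", "total"]
nurse = ["none", "mild", "mild", "total"]
print(round(cohen_kappa(neurologist, nurse), 3))  # 0.636
```

In this toy case the raters agree on 3 of 4 patients (0.75), chance agreement is 5/16, and kappa is therefore 7/11.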

Bushnell, Johnston and Goldstein (2001) looked at the inter-rater reliability of retrospective scoring of both the CNS and the National Institutes of Health Stroke Scale (NIHSS). They compared data from academic medical centers to data from community hospitals with neurologists and community hospitals without neurologists. Inter-rater reliability for the CNS, as calculated using the intraclass correlation coefficient (ICC), was found to be excellent for all charts reviewed (academic medical center ICC=0.97; community hospital with neurologists 0.88; community hospital without neurologists 0.78). The inter-rater reliability for the NIHSS was excellent for the charts reviewed from the academic medical center and the community hospital with a neurologist (ICC=0.93 and 0.89 respectively); however, only adequate agreement was found for charts reviewed from the community hospital without a neurologist (ICC=0.48). More data were missing for the NIHSS than for the CNS, likely because the NIHSS requires a more detailed neurological examination. These results suggest that scoring the CNS retrospectively is reliable regardless of whether the medical record contains evaluation material from a neurologist.

Validity

Content :

Content validity has not been reviewed.

Criterion :

Concurrent:
Cote et al. (1989) evaluated the concurrent validity of the CNS in the original validation study involving 157 patients with acute stroke. Patients were evaluated by staff neurologists or neurology residents upon admission to the hospital and were classified as having no, mild, moderate or severe deficit resulting from acute stroke. Nurses then evaluated the patients using the Glasgow Coma Scale (GCS) and the CNS. An average interval of 3.71 hours occurred between assessments. Concurrent validity was evaluated by correlating each CNS item with the appropriate components of the neurological examination, and the total score of the CNS with the global assessment on the neurological examination (no, mild, moderate or severe). Concurrent validity, as measured by Spearman rank correlation, was found to be excellent between the global neurological examination and the total CNS score (0.775). The concurrent validity between the neurological components and CNS was found to be excellent for orientation (0.716), speech (0.691) and weakness (0.767); and adequate for level of consciousness (0.574).
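Spearman rank correlation is the Pearson correlation of the ranks of two variables, which makes it suitable for an ordinal global rating such as no, mild, moderate or severe. The sketch below uses only the Python standard library. The data are hypothetical, the numeric coding of the global rating is an assumption, and the function names are chosen for this example.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)


# Hypothetical data: CNS totals and a global rating coded
# 0 = no, 1 = mild, 2 = moderate, 3 = severe deficit.
cns_totals = [11.5, 9.0, 7.5, 4.0]
global_rating = [0, 1, 2, 3]
print(round(spearman_rho(cns_totals, global_rating), 3))  # -1.0
```

With this coding the sign is negative because lower CNS totals correspond to more severe global ratings; the magnitude is what indicates the strength of association.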

Predictive:
Cote et al. (1989) evaluated the predictive validity of the CNS in 157 patients with acute stroke. Three outcomes were evaluated: 1) death at 6 months; 2) any vascular event within 6 months (for example, MI, CVA or vascular death); and 3) independence in ADL at 5 months or beyond. Initial CNS scores were found to significantly predict death within 6 months, morbidity, and recovery of ADL within 5 months. Among patients with initial scores of ≥ 11, only 2.1% had died at 6 months, 2.1% experienced another vascular event, and approximately 90% were independent in ADLs at 5 months or beyond. Among those who initially scored < 9, 13.2% had died at 6 months, 20.6% experienced another vascular event, and < 50% were independent in ADLs at 5 months or beyond.

Muir et al. (1996) compared the CNS, the National Institutes of Health Stroke Scale (NIHSS) and the Middle Cerebral Artery Neurological Score (MCANS) to see which scale best predicted good (alive at home) or poor (alive in care, or dead) outcome at 3 months in 373 patients with acute stroke. Predictive accuracy of the variables was compared by ROC curves and stepwise logistic regression. Logistic regression showed that the NIHSS added significantly to the predictive value of all other scores. The overall accuracy for the CNS, NIHSS and MCANS as stand-alone measures was adequate (0.79, 0.79 and 0.83 respectively).

Sensitivity/Specificity:
Sensitivity and specificity have not been examined.

Construct :

Convergent/Discriminant:
Cote et al. (1989) evaluated the discriminant validity of the CNS and the Glasgow Coma Scale (GCS) by comparing the results of each scale with a standard neurological examination in 157 patients with acute stroke. Excellent correlation was found between the total CNS score and the standard neurological examination (r² = 0.769), whereas only adequate correlation was found between the GCS and the standard neurological examination (r² = 0.563). These results suggest that the CNS may better discriminate neurological deficit.

Known Groups:
The known groups validity has not been examined.

Responsiveness

Cote et al. (1989) evaluated the responsiveness of the CNS in 79 patients with acute stroke. The CNS was administered on admission and throughout the first 48 hours. Patients were classified as either 1) remaining stable over the first 48 hours or 2) having a change in status over the first 48 hours. A change in score ≥ 1 yielded the highest negative predictive value (0.969), with a sensitivity of 0.933 and a specificity of 0.508. The results of this study suggest that the CNS can be used to monitor clinically significant changes in neurological status.
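
For readers less familiar with these statistics, the following minimal Python sketch shows how sensitivity, specificity and negative predictive value are derived from a 2x2 table of threshold results against observed change in status. The function name and the counts are hypothetical and chosen for illustration only; they are not data from Cote et al. (1989).

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, negative predictive value).

    tp: patients whose status changed and who met the score-change threshold
    fn: patients whose status changed but who did not meet the threshold
    fp: stable patients who met the threshold
    tn: stable patients who did not meet the threshold
    """
    sensitivity = tp / (tp + fn)   # proportion of true changes detected
    specificity = tn / (tn + fp)   # proportion of stable patients not flagged
    npv = tn / (tn + fn)           # probability of stability given no flag
    return sensitivity, specificity, npv


# Hypothetical counts for illustration only (not taken from the study).
sens, spec, npv = diagnostic_metrics(tp=9, fp=20, fn=1, tn=20)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} NPV={npv:.3f}")
```

A pattern like the one reported above (high sensitivity and negative predictive value with modest specificity) means that a score change below the threshold makes a true change in status unlikely, while a change at or above it still needs clinical confirmation.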

Hagen, Bugge, and Alexander (2003) examined the responsiveness of the CNS and other commonly used outcome measures in 136 patients in the early post-stroke period. The outcome measures were administered at 1, 3 and 6 months after stroke onset. The sensitivity of the CNS to detect change from 1 to 3 months and from 3 to 6 months, as calculated using the Standardized Response Mean (SRM), was small (SRM = 0.2860 and 0.2849 respectively). These results suggest that the CNS has some ability to detect change in patients with stroke in the subacute phase of recovery.
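
The standardized response mean is the mean change score divided by the standard deviation of the change scores. A minimal Python sketch of that calculation follows; the function name and the paired scores are hypothetical and are not data from Hagen et al. (2003).

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of the paired change scores divided by their sample standard deviation."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical paired scores for five patients (illustration only).
baseline = [6.0, 7.5, 8.0, 9.0, 10.0]
follow_up = [7.0, 7.5, 9.5, 9.0, 11.5]
srm = standardized_response_mean(baseline, follow_up)
print(f"SRM = {srm:.3f}")
```

Because the denominator is the variability of the change itself, a measure can show a small SRM either because patients change little on average or because their changes vary widely.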

References

Bushnell, C.D., Johnston, D.C.C. & Goldstein, L. B. (2001). Retrospective assessment of the initial stroke severity: Comparison of the NIH Stroke Scale and the Canadian Neurological Scale. Stroke, 32, 656-660.
Cote, R., Battista, R.N., Wolfson, C., Boucher, J., Adam, J., Hachinski, V. (1989). The Canadian Neurological Scale: Validation and reliability assessment. Neurology, 39, 638-643.
Cote, R., Hachinski, V., Shurvell, B., Norris, J. & Wolfson, C. (1986). The Canadian Neurological Scale: A preliminary study in acute stroke. Stroke, 17(4), 731-737.
Cuspineda, E., Machado, C., Aubert, E., Galan, L., Liopis, F., Avila, Y. (2003). Predicting outcome in acute stroke: A comparison between QEEG and the Canadian Neurological Scale. Clinical Electroencephalography, 34(1), 1-4.
Muir, K.W., Weir, C.J., Murray, G.D., Povey, C., Lees, K.R. (1996). Comparison of neurological scales and scoring systems for acute stroke prognosis. Stroke, 27, 1817-1820.
Nilanont, Y., Komoitri, C., Saposnik, G., Cote, R., Di Legge, S., Jin, Y. et al. (2010). The Canadian Neurological Scale and the NIHSS: Development and validation of a simple conversion model. Cerebrovascular Disease, 30(2), 120-126.
Shinar, D., Gross, C.R., Mohr, J.P., Caplan, L.R., Price, T.R., Wolf, P.A. et al. (1985). Interobserver variability in the assessment of neurologic history and examination in the stroke data bank. Archives of Neurology, 42, 557-565.

See the measure

How to obtain the CNS?

The CNS is available from strokecenter.org.

A full version of the measure can be found in the following article: Cote, R., Hachinski, V., Shurvell, B., Norris, J. & Wolfson, C. (1986). The Canadian Neurological Scale: A preliminary study in acute stroke. Stroke, 17(4), 731-737.

A training video has been produced by the Heart & Stroke Foundation of Canada using the Canadian Neurological Scale.

Charlson Comorbidity Index (CCI)

Evidence Reviewed as of before: 03-02-2009

Author(s)*: Sabrina Figueiredo, BSc

Editor(s): Lisa Zeltzer, MSc OT; Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Charlson Comorbidity Index (CCI) assesses comorbidity level by taking into account both the number and severity of 19 pre-defined comorbid conditions. It provides a weighted score of a client’s comorbidities which can be used to predict short-term and long-term outcomes such as function, hospital length of stay and mortality rates. The CCI is the most widely used scoring system for comorbidities among researchers and clinicians (Charlson, Pompei, Ales, & Mackenzie, 1987; Elixhauser, Steiner, Harris, & Coffey, 1998).

In-Depth Review

Purpose of the measure

Available versions

The CCI was published by Charlson, Pompei, Ales, and Mackenzie in 1987.

Features of the measure

Items:
The CCI is comprised of 19 comorbid conditions: myocardial infarct, congestive heart failure, peripheral vascular disease, cerebrovascular disease, dementia, chronic pulmonary disease, connective tissue disease, ulcer disease, mild liver disease, diabetes, hemiplegia, moderate or severe renal disease, diabetes with end organ damage, any tumor, leukemia, lymphoma, moderate or severe liver disease, metastatic solid tumor, AIDS. Each disease is given a different weight based on the strength of its association with 1-year mortality as follows (Charlson et al., 1987):

Assigned weights for diseases	Comorbid Conditions
1	Myocardial infarct, congestive heart failure, peripheral vascular disease, cerebrovascular disease, dementia, chronic pulmonary disease, connective tissue disease, ulcer disease, mild liver disease, diabetes
2	Hemiplegia, moderate or severe renal disease, diabetes with end organ damage, any tumor, leukemia, lymphoma
3	Moderate or severe liver disease
6	Metastatic solid tumor, AIDS

The CCI can be completed from medical records, administrative databases, or interview-based questionnaires (Bravo, Dubois, Hebert, De Wals, & Messier, 2002).

Scoring:
The total score in the CCI is derived by summing the assigned weights of all comorbid conditions presented by the client. Higher scores indicate a more severe condition and consequently, a worse prognosis (Charlson, Szatrowski, Peterson, & Gold, 1994).
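
As an illustration of the scoring rule, the following minimal Python sketch sums the assigned weights listed above for the conditions a client presents. The dictionary keys, the function name and the example client are hypothetical conveniences for this sketch; only the weights come from the table (Charlson et al., 1987).

```python
# Assigned weights for the 19 comorbid conditions (Charlson et al., 1987).
CCI_WEIGHTS = {
    "myocardial infarct": 1,
    "congestive heart failure": 1,
    "peripheral vascular disease": 1,
    "cerebrovascular disease": 1,
    "dementia": 1,
    "chronic pulmonary disease": 1,
    "connective tissue disease": 1,
    "ulcer disease": 1,
    "mild liver disease": 1,
    "diabetes": 1,
    "hemiplegia": 2,
    "moderate or severe renal disease": 2,
    "diabetes with end organ damage": 2,
    "any tumor": 2,
    "leukemia": 2,
    "lymphoma": 2,
    "moderate or severe liver disease": 3,
    "metastatic solid tumor": 6,
    "AIDS": 6,
}


def cci_score(conditions):
    """Sum the weights of the client's conditions, counting each condition once.

    Raises KeyError if a condition is not one of the 19 listed above.
    """
    return sum(CCI_WEIGHTS[condition] for condition in set(conditions))


# Hypothetical client with three recorded comorbid conditions.
example = ["cerebrovascular disease", "diabetes", "hemiplegia"]
print(cci_score(example))  # 1 + 1 + 2 = 4
```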

Time:
Not reported

Subscales:
None

Equipment:
Not applicable.

Training:
No specific training is available.

Alternative forms of the CCI

The CCI has a weighted age version, two adaptations to be used with ICD-9 databases, and one version to be used with clients with amputations (Charlson et al., 1994; Deyo, Cherkin, & Ciol, 1992; Melchiore, Findley, & Boda, 1996; Romano, Roos, & Jollis, 1993).

Client suitability

Can be used with:

Clients with stroke.
The CCI is a general scoring system allowing for its use with a variety of populations (Groot, Beckerman, Lankhorst, & Bouter, 2003).

Should not be used in:

To date, there is no information on restrictions for using the CCI.

In what languages is the measure available?

Not applicable

Summary

What does the tool measure?	The CCI measures comorbidity level.
What types of clients can the tool be used for?	The CCI can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Screening.
Time to administer	Not reported.
Versions	Age CCI; ICD-9-CM; CCI for clients with amputations.
Other Languages	Not applicable
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the CCI.
Test-retest: One study has examined the test-retest reliability of the CCI and reported excellent test-retest reliability using Intraclass Correlation Coefficient (ICC) and Spearman’s Rank Correlation.
Intra-rater: No studies have examined the intra-rater reliability of the CCI.
Inter-rater: One study examined the inter-rater reliability of the CCI and reported adequate inter-rater reliability using ICC.
Validity	Content: One study examined the content validity of the CCI by reporting the steps for generating the weighted comorbidity index.
Criterion: Concurrent: No studies have examined the concurrent validity of the CCI.
Predictive: Four studies have examined the predictive validity of the CCI and reported that the CCI was able to predict function at 3 months post-stroke, poor outcomes on the modified Rankin Scale at discharge, and mortality after 1 year. In contrast, the CCI was not able to predict length of stay, Functional Independence Measure scores, or modified Rankin Scale scores at 4 months post-stroke.
Construct: Convergent: Three studies examined the convergent validity of the CCI and reported excellent correlations between the CCI and the Functional Comorbidity Index, and poor to adequate correlations between the CCI and total number of medications consumed, number of hospitalizations, length of stay, total costs, laboratory studies, therapeutic interventions, consultations and days of interruption of the rehabilitation program, using Spearman rank correlation.
Known Groups: No studies have examined the known groups validity of the CCI.
Floor/Ceiling Effects	No studies have examined floor/ceiling effects of the CCI.
Sensitivity/Specificity	No studies have examined the sensitivity/specificity of the CCI.
Does the tool detect change in patients?	No studies have examined the responsiveness of the CCI.
Acceptability	The CCI is the most widely used index to assess comorbidity.
Feasibility	The CCI can be completed from medical records, administrative databases, or interview-based questionnaires.
How to obtain the tool?	The CCI can be obtained from its original publication: (Charlson, Pompei, Ales, & Mackenzie, 1987)

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Charlson Comorbidity Index (CCI) in individuals with stroke. We identified 6 studies.

Reliability

Test-retest:
Katz, Chang, Sangha, Fossel, and Bates (1996) evaluated the test-retest reliability of the questionnaire format of the CCI in 25 inpatients with different diagnoses including stroke. Participants were evaluated by the same rater twice within 24 hours. Test-retest reliability was excellent as calculated using Intraclass Correlation Coefficient (ICC = 0.92) and Spearman’s Rank Correlation (rho = 0.94).

Inter-rater:
Liu, Domen and Chino (1997) assessed the inter-rater reliability of the CCI in 10 clients with stroke. The CCI was administered by two examiners blinded to each other’s scores. Inter-rater reliability, as calculated using Intraclass Correlation Coefficient, was adequate (ICC = 0.67).

Validity

Content:
Charlson et al. (1987) identified the comorbid conditions of 559 inpatients with breast cancer. They then calculated the relationship of potential prognostically important variables to survival using Cox’s regression analysis. Finally, the adjusted relative risk was estimated for each comorbid condition.

Criterion:
Concurrent:
No gold standard exists against which to compare the CCI.

Predictive:
Liu et al. (1997) estimated the ability of the CCI at hospital admission to predict length of stay and the Functional Independence Measure (FIM) score (Keith, Granger, Hamilton, & Sherwin, 1987) at discharge. Predictive validity was calculated in 106 clients with Spearman’s Rank Correlation. Correlation between the CCI and the FIM was poor (rho = -0.19), as was the correlation between the CCI and length of stay (rho = 0.16). These results suggest that the CCI measured at hospital admission may not be predictive of length of stay or the FIM at discharge.

Goldstein, Samsa, Matchar, and Horner (2004) examined, in 960 clients with acute stroke, whether the CCI measured at admission was able to predict the modified Rankin Scale (mRS) (Rankin, 1957) score at hospital discharge and 1-year mortality rates. Predictive validity was analyzed using logistic regression. The CCI was dichotomized into low comorbidity (scores < 2) and high comorbidity (scores ≥ 2) and the mRS into good outcomes (scores < 2) and poor outcomes (scores ≥ 2). Higher CCI scores were associated with 36% increased odds of having poor outcomes on the mRS and 72% greater odds of death at 1 year post-stroke.

Fischer, Arnold, Nedeltchev, Schoenenberger, Kappeler, Hollinger et al. (2006) verified in 259 clients the ability of the CCI, as measured at admission to a stroke unit, to predict poor outcomes on the modified Rankin Scale (mRS; Rankin, 1957) at 4 months after hospital discharge. The mRS was dichotomized into good outcomes (scores ≤ 2) and poor outcomes (scores > 2). Logistic regression analyses revealed that the CCI was not able to predict poor outcomes on the mRS. In this study, the predictors of the mRS score at 4 months post-stroke were stroke severity, atrial fibrillation, coronary artery disease and diabetes.

Tessier, Finch, Daskalopoulou, and Mayo (2008) examined, in 672 participants, the ability of the CCI, the Functional Comorbidity Index (Groll, Bombardier, & Wright, 2005), and a stroke-specific comorbidity index (developed by the same authors) to predict function 3 months post-stroke. Predictive validity was assessed with the c-statistic, which is the area under the Receiver Operating Characteristic (ROC) curve (AUC). The ability of the CCI (AUC = 0.76), the Functional Comorbidity Index (AUC = 0.71) and the stroke-specific comorbidity index (AUC = 0.71) to predict function at 3 months post-stroke was adequate in each case. These results suggest that the CCI discriminates between patients with good and poor function at 3 months post-stroke slightly better than the other two comorbidity measures.
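
The c-statistic can be read as the probability that a randomly chosen client with the outcome has a higher index score than a randomly chosen client without it. The following minimal Python sketch computes it by comparing every such pair directly; the function name, scores and outcomes are hypothetical and are not data from Tessier et al. (2008), and the pairwise method is used here for clarity rather than because the authors used it.

```python
def c_statistic(scores, outcomes):
    """Area under the ROC curve computed by pairwise comparison.

    scores:   index score for each client
    outcomes: 1 if the client had the outcome of interest, 0 otherwise
    Each (outcome, no-outcome) pair counts 1 if the client with the outcome
    has the higher score, 0.5 if the scores are tied, and 0 otherwise.
    """
    with_outcome = [s for s, y in zip(scores, outcomes) if y == 1]
    without_outcome = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for p in with_outcome:
        for n in without_outcome:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(with_outcome) * len(without_outcome))


# Hypothetical comorbidity scores and outcomes for six clients.
scores = [0, 1, 1, 2, 3, 4]
outcomes = [0, 0, 1, 0, 1, 1]
auc = c_statistic(scores, outcomes)
print(f"AUC = {auc:.2f}")
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.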

Construct:
Convergent/Discriminant:

Katz et al. (1996) tested the convergent validity of the CCI by comparing it to self-reported number of prescription medications consumed, number of hospitalizations, length of stay and total financial costs in 170 hospital inpatients, including those with stroke. Correlations, as calculated using Spearman’s Rank Correlation, were all poor between the CCI and self-reported number of prescription medications (rho = 0.06), number of hospitalizations (rho = 0.22), length of stay (rho = 0.20) and total costs (rho = 0.26).

Liu et al. (1997) measured the convergent validity of the CCI in 106 clients with stroke, by comparing it to the number of medications consumed, laboratory studies, therapeutic interventions, number of consultations during hospital stay, and days of interruption of participation in rehabilitation due to complications. Adequate correlations were found between the CCI and the total number of medications consumed (rho = 0.48) and poor correlations were found between the CCI and laboratory studies (rho = 0.28), therapeutic interventions (rho = 0.19), consultations (rho = 0.25), and days of interruption of rehabilitation participation (rho = 0.22).

Tessier et al. (2008) analyzed the convergent validity of the CCI by comparing it to the Functional Comorbidity Index (Groll et al., 2005) in 437 clients with stroke. Correlations were found to be excellent (rho = 0.62).

Known groups:
No studies have examined known groups validity of the CCI.

Responsiveness

No studies have examined the responsiveness of the CCI.

References

Bravo, G., Dubois, M.F., Hebert, R., De Wals, P., & Messier, L. (2002). A prospective evaluation of the Charlson Comorbidity Index for use in long-term care patients. JAGS, 50, 740-745.
Charlson, M., Pompei, P., Ales, M.L., & Mackenzie, C.R. (1987). A new method of classifying comorbidity in longitudinal studies: Development and validation. J Chronic Dis, 40, 373-393.
Charlson, M., Szatrowski, T.P., Peterson, J., & Gold, J. (1994). Validation of a Combined Comorbidity Index. Journal of Clinical Epidemiology, 47(11), 1245-1251.
De Groot, V., Beckerman, H., Lankhorst, G.J., & Bouter, L.M. (2003). How to measure comorbidity: A critical review of available methods. Journal of Clinical Epidemiology, 56, 221-229.
Deyo, R.A., Cherkin, D.C., & Ciol, M.A. (1992). Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. Journal of Clinical Epidemiology, 45, 613-619.
Elixhauser, A., Steiner, C., Harris, D.R., & Coffey, R.M. (1998). Comorbidity measures for use with administrative data. Medical Care, 36(1), 8-27.
Fischer, U., Nedeltchev, K., Schoenenberger, R.A., Kappeler, L., Hollinger, P., Schroth, G. et al. (2006). Impact of comorbidity on ischemic stroke outcome. Acta Neurol Scand, 113, 108-113.
Goldstein, L.B., Samsa, G.P., Matchar, D.B., & Horner, R.D. (2004). Charlson Index Comorbidity Adjustment for Ischemic Stroke Outcome Studies. Stroke, 35, 1941-1945.
Groll, D., Bombardier, C., & Wright, J. (2005). The development of a comorbidity index with physical function as the outcome. Journal of Clinical Epidemiology, 58, 595-602.
Hall, W. H., Ramachandran, R., Narayan, S., Jani, A. B., & Vijayakumar, S. (2004). An electronic application for rapidly calculating Charlson comorbidity score. BMC Cancer, 4, 94.
Katz, J., Chang, L., Sangha, O., Fossel, A., & Bates, D. (1996). Can comorbidity be measured by questionnaire rather than medical record review? Medical Care, 34(1), 73-84.
Keith, R.A., Granger, C.V., Hamilton, B.B., & Sherwin, F.S. (1987). The functional independence measure: A new tool for rehabilitation. Adv Clin Rehabil, 1, 6-18.
Liu, M., Domen, K., & Chino, N. (1997). Comorbidity measures for stroke outcome research: A preliminary study. Arch Phys Med Rehabil, 78, 166-172.
Melchiore, P.J., Findley, T., & Boda, W. (1996). Functional outcome and comorbidity indexes in the rehabilitation of the traumatic versus the vascular unilateral lower limb amputee. Am J Phys Med Rehabil, 75, 9-14.
Rankin, J. (1957). Cerebral vascular accidents in patients over the age of 60. Scott Med J, 2, 200-215.
Romano, P.S., Roos, L.L., & Jollis, J.G. (1993). Adapting a clinical comorbidity index for use with ICD-9-CM administrative data: differing perspectives. Journal of Clinical Epidemiology, 46(10), 1075-1079.
Tessier, A., Finch, L., Daskalopoulou, S.S., & Mayo, N.E. (2008). Validation of the Charlson Comorbidity Index for Predicting Functional Outcome of Stroke. Arch Phys Med Rehabil, 89, 1276-1283.

See the measure

How to obtain the CCI

An electronic application for rapidly calculating Charlson Comorbidity Index score

The following link will allow you to download an Excel spreadsheet calculator for the Charlson Comorbidity Index: Excel calculator Charlson Index

Glasgow Coma Scale (GCS)

Evidence Reviewed as of before: 29-09-2008

Author(s)*: Lisa Zeltzer, MSc OT

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Glasgow Coma Scale (GCS) was developed to describe the depth and duration of impaired consciousness or coma. In this measure, three aspects of behaviour are independently measured: motor responsiveness, verbal performance, and eye opening. The GCS can be used with individuals with traumatic brain injury, stroke, non-traumatic coma, cardiac arrest, and toxic ingestions.

In-Depth Review

Purpose of the measure

The Glasgow Coma Scale (GCS) was developed to describe the depth and duration of impaired consciousness or coma. In this measure, three aspects of behaviour are independently measured: motor responsiveness, verbal performance, and eye opening. The GCS can be used with individuals with traumatic brain injury, stroke, non-traumatic coma, cardiac arrest, and toxic ingestions.

Available versions

The GCS was published in 1974 by Graham Teasdale and Bryan J. Jennett. In 1976, Teasdale and Jennett distinguished between “normal” and “abnormal” flexion, which increased the “best motor response” item by one point.

Features of the measure

Items:

The GCS is comprised of three components: 1) Best eye response, which is believed to indicate whether the arousal mechanisms in the brainstem are active; 2) Best verbal response, which is believed to be the most common definition of the end of a coma, or the recovery of consciousness; and 3) Best motor response, which is thought to be associated with central nervous system functioning. Each component has a number of grades starting with the most severe. Best eye response has 4 grades; Best verbal response has 5 grades; Best motor response has 6 grades.

Best eye response (E)

1. No eye opening
2. Eye opening in response to pain (patient responds to pressure on the patient’s fingernail bed; if this does not elicit a response, supraorbital and sternal pressure or rub may be used).
3. Eye opening to speech (not to be confused with the awakening of a sleeping person; such patients receive a score of 4, not 3).
4. Eyes opening spontaneously

Best verbal response (V)

1. No verbal response
2. Incomprehensible sounds (moaning but no words).
3. Inappropriate words (random or exclamatory articulated speech, but no conversational exchange).
4. Confused (the patient responds to questions coherently but there is some disorientation and confusion).
5. Oriented (patient responds coherently and appropriately to questions such as the patient’s name and age, where they are and why, the year, month, etc.).

Best motor response (M)

1. No motor response
2. Extension to pain (decerebrate response: rigid adduction and extension of the arms, legs stiffly extended, downward pointing of the toes, backward arching of the head, wrists pronated and fingers flexed).
3. Flexion in response to pain (decorticate response: arms flexed, or bent inward on the chest, the hands are clenched into fists, and the legs extended).
4. Withdraws from pain (pulls part of body away when pinched; normal flexion)
5. Localizes to pain (purposeful movements towards changing painful stimuli; e.g. hand crosses mid-line and gets above clavicle when supra-orbital pressure applied).
6. Obeys commands (the patient does simple things as asked).

Scoring:

In the GCS, each of the component scores as well as the sum of the components are considered. The total score is out of 15 points, with lower scores indicating more severe impairment. The lowest possible GCS total score is 3, indicating deep coma or death, and the highest possible score is 15, indicating a fully awake individual. The total score of the GCS is calculated by summing E + V + M.

The score is expressed in the form GCS (total score) = score on E + score on V + score on M. For example, GCS 9 = E2 V4 M3 indicates a total score of 9, a score of 2 on Best eye response (E), a score of 4 on Best verbal response (V), and a score of 3 on Best motor response (M). Note: For a patient who is intubated, the V is expressed as V intubated.

Interpretation of the GCS total score is as follows:

Minor head injury = 13-15
Moderate head injury = 9-12
Severe head injury (coma) = 8 or less
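
The scoring and interpretation rules above can be sketched in a few lines of code. The function names below are hypothetical and chosen for illustration only; the grade ranges (E 1-4, V 1-5, M 1-6) and the cutoffs come from the text above. The sketch does not handle the "V intubated" case, which has no numeric verbal score.

```python
def gcs_total(eye, verbal, motor):
    """Sum the three component scores after checking their ranges."""
    if not 1 <= eye <= 4:
        raise ValueError("Best eye response must be 1-4")
    if not 1 <= verbal <= 5:
        raise ValueError("Best verbal response must be 1-5")
    if not 1 <= motor <= 6:
        raise ValueError("Best motor response must be 1-6")
    return eye + verbal + motor


def gcs_notation(eye, verbal, motor):
    """Express the score in the form 'GCS total = E# V# M#'."""
    total = gcs_total(eye, verbal, motor)
    return f"GCS {total} = E{eye} V{verbal} M{motor}"


def gcs_interpretation(total):
    """Map a total score (3-15) to the interpretation bands listed above."""
    if not 3 <= total <= 15:
        raise ValueError("GCS total must be 3-15")
    if total >= 13:
        return "Minor head injury"
    if total >= 9:
        return "Moderate head injury"
    return "Severe head injury (coma)"


print(gcs_notation(2, 4, 3))                   # GCS 9 = E2 V4 M3
print(gcs_interpretation(gcs_total(2, 4, 3)))  # Moderate head injury
```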

Time:

Not reported.

Subscales:

The GCS has 3 subscales: Best eye response, Best motor response, and Best verbal performance.

Equipment:

Only a pencil and the test are needed.

Training of administrator:

Although no training is required to administer the GCS, one study examined whether the GCS can be used reliably and accurately by inexperienced examiners. It found that experienced medical personnel can use the measure with extremely high levels of accuracy and reliability, but inexperienced examiners may make significant errors, especially in the intermediate levels of consciousness, when the detection of neurologic changes is critical to patient monitoring (Rowley & Fielding, 1991). Thus, it is recommended that the inexperienced examiner be supervised by an expert when completing the GCS.

Alternative form of the GCS:

The GCS cannot be used with children, especially below the age of 36 months. This is due to the verbal performance component which is likely to be poor in even a healthy child. Thus, the Pediatric Glasgow Coma Scale (Reilly, Simpson, Sprod, & Thomas, 1988) was developed as an alternative to the GCS.

The Pediatric Glasgow Coma Scale can be obtained at the following website:

https://www.mdcalc.com/pediatric-glasgow-coma-scale-pgcs

Client suitability

Can be used with:

The GCS can be used with clients with stroke. However, the National Institutes of Health Stroke Scale (NIHSS), developed specifically for use with a stroke population, may be a more useful assessment of consciousness in this population.

Should not be used with:

Clients with dysphasia or aphasia, and clients who are intubated, will have a reduced score on the verbal response scale, resulting in a reduced total GCS score even when their level of consciousness is normal. This should be taken into account when interpreting the GCS results in these individuals. Although it has been suggested that the verbal score be omitted in these clients, and an 8-level (3 to 10) modified GCS be used (Prasad, 1996; Prasad & Menon, 1998), the results of a larger study suggest that the verbal subscale should still be included because it adds important prognostic information (Weir, Bradford, & Lees, 2003).
In clients with hemiparesis, ensure that the motor scale is being applied to the less affected arm so that a “best” response can be obtained.
The GCS should be administered prior to administration of a sedative or paralytic agent, or after these drugs have been metabolized. Airway, breathing, and circulation should be assessed and stabilized prior to administering the GCS.

In what languages is the measure available?

The GCS has been translated into Chinese and is available online at the following website: http://www.coma.ulg.ac.be/medical/acute.html

Summary

What does the tool measure?	To describe the depth and duration of impaired consciousness or coma.
What types of clients can the tool be used for?	The GCS can be used with individuals with traumatic brain injury, cerebrovascular events, nontraumatic coma, cardiac arrest, and toxic ingestions.
Is this a screening or assessment tool?	Assessment
Time to administer	Not reported.
Versions	Pediatric Glasgow Coma Scale
Other Languages	Chinese
Measurement Properties
Reliability	Internal consistency: One study examined the internal consistency of the GCS and reported excellent internal consistency. Inter-rater: Two studies examined the inter-rater reliability of the GCS using kappa statistics. One study reported adequate inter-rater reliability and the other reported excellent inter-rater reliability for all three GCS subscales.
Validity	Criterion: Predictive: Three studies examined the predictive validity of the GCS. One reported that the GCS adequately predicted stroke mortality. One study reported that the total GCS score predicted acute mortality with 87% accuracy using just the Best eye response and Best motor response subscales, and with 88% accuracy using all three subscales. One study reported that the GCS was able to predict both 2-week mortality and 3-month recovery from stroke. Construct: Convergent: One study examined the convergent validity of the GCS with the 60-Second Test (SST) and reported an excellent correlation using Spearman’s rho.
Floor/Ceiling Effects	No studies have examined floor/ceiling effects of the GCS in patients with stroke.
Sensitivity/Specificity	Not reported.
Does the tool detect change in patients?	One study examined the responsiveness of the GCS and reported that it had poor sensitivity to change.
Acceptability	Caution should be used in interpreting the scores of intubated clients, or clients with dysphasia or aphasia.
Feasibility	The GCS is a short measure that requires no additional equipment. Although training is not required, it is recommended as the measure is more reliable when completed by an experienced clinician. The scale is simple to score and cutoffs are well established in this measure.
How to obtain the tool?	The GCS is available free online from the following website: http://www.strokecenter.org/professionals/stroke-diagnosis/stroke-assessment-scales/

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the GCS in individuals with stroke. We identified only three studies specifically examining the GCS in stroke (Weir et al., 2003; Prasad & Menon, 1998; Weingarten, Bolus, Riedinger, Maldonado, Stein, & Ellrodt, 1990). Thus, in this review we will present psychometric data from studies examining neurological patients that include patients with stroke.

Floor/Ceiling Effects

Not reported.

Reliability

Internal consistency:
Mayer, Dennis, Peery, Fitsimmons, Du, Bernardini, Commichau, et al. (2003) examined the internal consistency of the GCS in 171 patients in the neurointensive care unit. Cronbach’s alpha was found to be excellent (alpha = 0.83).
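
For readers unfamiliar with Cronbach’s alpha, the following minimal sketch shows the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), applied to the three GCS components treated as items. The function name and the patient scores are hypothetical and do not reproduce the data of Mayer et al. (2003).

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items: one list of scores per item, with patients in the same order
           in every list (at least 2 items and 2 patients).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs one score per patient")

    totals = [sum(patient) for patient in zip(*items)]   # total score per patient
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance(totals)                    # must be non-zero
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical scores for five patients on E, V and M.
eye = [4, 3, 2, 1, 4]
verbal = [5, 4, 2, 1, 4]
motor = [6, 5, 3, 1, 6]
print(round(cronbach_alpha([eye, verbal, motor]), 2))
```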

Test-retest:
Not reported.

Inter-rater:
Gill, Reiley, and Green (2004) examined the inter-rater reliability of the GCS in 116 emergency department patients with various diagnoses (10 clients with stroke, 9% of the sample). Two attending emergency physicians independently assessed the GCS within 5 minutes of each other while blinded to each other’s scores. Kappa statistics were calculated for each of the GCS subtests and the total score. Best eye response had adequate inter-rater reliability (weighted k = 0.72) as did Best verbal response (weighted k = 0.48) and Best motor response (weighted k = 0.40). The agreement percentage for total GCS was 32% (Kendall’s tau-b = 0.74; Spearman rho = 0.86; Spearman rho² = 75%). Agreement percentage for GCS Best eye response was 74% (tau-b = 0.72; Spearman rho = 0.76; Spearman rho² = 57%), verbal 55% (tau-b = 0.59; Spearman rho = 0.67; Spearman rho² = 44%), and motor 72% (tau-b = 0.74; Spearman rho = 0.81; Spearman rho² = 65%).

Mayer et al. (2003) examined the inter-rater reliability of the GCS in 64 patients in the neurointensive care unit. The GCS was administered by 2 or 3 examiners within 5 to 10 minutes of each other. Examiners were blinded to each other’s scores. Inter-rater reliability was excellent for the total GCS (weighted k = 0.91), Best eye response (weighted k = 0.86), Best motor response (weighted k = 0.91) and Best verbal response (weighted k = 0.76).
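
The agreement statistics reported in these studies can be illustrated with a short sketch that computes percentage agreement and a weighted kappa for two raters scoring the same patients. Linear disagreement weights are an assumption here; the studies above do not state in this review which weighting scheme was used. The function names and ratings are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of patients given the same grade by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty sample")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def weighted_kappa(rater_a, rater_b, categories):
    """Weighted kappa with linear disagreement weights (an assumption).

    categories: ordered list of possible grades, e.g. [1, 2, 3, 4].
    Undefined (division by zero) if expected disagreement is zero,
    e.g. when both raters use a single grade for every patient.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty sample")
    k = len(categories)
    n = len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # Observed proportion of patients in each (rater A grade, rater B grade) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)   # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical Best eye response grades from two raters for four patients.
rater_1 = [1, 2, 3, 4]
rater_2 = [1, 2, 3, 3]
print(percent_agreement(rater_1, rater_2))                      # 0.75
print(round(weighted_kappa(rater_1, rater_2, [1, 2, 3, 4]), 2)) # 0.78
```

Weighted kappa penalizes large disagreements more than near misses, which is why it can differ noticeably from raw percentage agreement on an ordinal scale such as the GCS.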

Validity

Content:

Not available.

Criterion:

Concurrent:
No gold standard exists against which to compare the GCS.

Predictive:
Weingarten et al. (1990) examined whether the GCS was as accurate in predicting stroke mortality as APACHE II (Knaus, Draper, Wagner, & Zimmerman, 1985), a scale that consists of the GCS score plus 11 other physiological variables, age, and a chronic health evaluation. The study sample comprised 246 patients hospitalized with stroke, including 49 oversampled mortalities. The GCS was found to adequately predict stroke mortality, and was found to be as accurate as the APACHE II score in predicting stroke mortality with the oversampled mortalities (r = -0.50 and r = 0.50, respectively) and after excluding the oversampled mortalities (r = -0.40 and r = 0.39, respectively).

Prasad and Menon (1998) compared the predictive accuracy of three alternative strategies for verbal scoring in 275 patients with acute stroke who were either intubated or had dysphasia. The GCS predicted acute mortality with 87% accuracy using only the Best eye response and Best motor response subscales, versus 88% accuracy with all three subscales. Thus, the authors concluded that the verbal subscale could be excluded from the total GCS score without loss of predictive value in clients with stroke.

Weir, Bradford, and Lees (2003) examined the ability of the GCS to predict 2-week mortality and 3-month recovery (survival, living at home) in a large cohort of individuals with acute stroke. The results of 1217 patients with stroke (including 349 patients with dysphasia) were analyzed. The authors used the area under the receiver operating characteristic curve (AUC) to compare versions of the GCS. The total GCS score had a greater AUC than the GCS without the verbal score for predicting 2-week mortality. This was apparent for all participants together (AUC = 0.78 for the total GCS score; 0.76 for the GCS without the verbal score) and for only the participants with dysphasia (AUC = 0.72 for the total GCS score; 0.71 for the GCS without the verbal score). Similarly, the total GCS score was also better than the GCS without the verbal score for predicting 3-month recovery in all participants (AUC = 0.71 for the total GCS score; 0.67 for the GCS without the verbal score) and in participants with dysphasia only (AUC = 0.74 for the total GCS score; 0.70 for the GCS without the verbal score). These results suggest that, in contrast to the findings by Prasad and Menon (1998), the verbal subscale should not be excluded in clients with dysphasia since it adds important prognostic information. These results also suggest that the total GCS score can predict early mortality and 3-month recovery and that the GCS better predicted the outcome of early mortality than the outcome of 3-month recovery.
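
The AUC values reported above can be read as the probability that a randomly chosen patient who died has a lower GCS score than a randomly chosen patient who survived. The following is a minimal sketch of that calculation, not the analysis performed by the study authors. The function name `auc` and the two score lists are hypothetical and were invented for illustration only.

```python
def auc(scores_event, scores_no_event):
    """Area under the ROC curve, computed by pairwise comparison.

    Counts the share of (event, no-event) pairs in which the patient with
    the event has the LOWER score; ties count as one half. A lower GCS
    score is assumed to indicate a worse neurological state.
    """
    if not scores_event or not scores_no_event:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for e in scores_event:
        for n in scores_no_event:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(scores_event) * len(scores_no_event))


# Hypothetical total GCS scores (range 3-15); NOT data from any study.
died = [6, 8, 9, 11, 14]
survived = [9, 12, 13, 14, 15, 15]

print(f"AUC = {auc(died, survived):.2f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups, which is why the small differences between 0.76 and 0.78 reported above reflect only a modest gain from including the verbal score.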

Construct:

Convergent/Discriminant:
Mayer et al. (2003) examined the convergent validity of the GCS with the 60-Second Test (SST) in 171 patients in the neurointensive care unit using Spearman’s rho. The GCS and SST had an excellent correlation (Spearman’s rho = 0.72).

Known groups:
Not examined.

Responsiveness

Mayer et al. (2003) examined the responsiveness of the GCS in 36 patients in the neurointensive care unit. Patients underwent baseline testing, followed by 1-13 follow-up encounters performed every 12-24 hours. The neurologist performed a brief standardized examination and provided a global clinical impression of change in level of consciousness (better, the same, or worse) compared with the prior encounter. According to the neurologist's global impression, patients improved in 24% and worsened in 26% of the 187 follow-up examinations. Sensitivity of the GCS to these changes in level of consciousness was poor (46%).

References

Gill, M. R., Reiley, D. G., & Green, S. M. (2004). Interrater Reliability of Glasgow Coma Scale Scores in the Emergency Department. Ann Emerg Med, 43, 215-223.
Knaus, W. A., Draper, E. A., Wagner, D. P., & Zimmerman, J. E. (1985). APACHE II: A severity of disease classification system. Crit Care Med, 13, 818-829.
Lenfant, F., Sobraques, P., Nicolas, F., Combes, J. C., Honnart, D., & Freysz, M. (1997). Use of Glasgow Coma Scale by anesthesia and intensive care internists in brain injured patients. Ann Fr Anesth Reanim, 16, 239-243.
Mayer, S. A., Dennis, L. J., Peery, S., Fitsimmons, B. -F., Bernardini, G. L., Commichau, C., Eldaief. (2003). Quantification of lethargy in the neuro-ICU: The 60-Second Test. Neurology, 61(4), 543-545.
Prasad, K. (1996). The Glasgow Coma Scale: A critical appraisal of its clinimetric properties. Journal of Clinical Epidemiology, 49(7), 755-763.
Prasad, K. & Menon, G. R. (1998). Comparison of three strategies of verbal scoring of the Glasgow Coma Scale in patients with stroke. Cerebrovasc Dis, 8, 79-85.
Reilly, P., Simpson, D., Sprod, R., & Thomas, L. (1988). Assessing the conscious level in infants and young children: A paediatric version of the Glasgow Coma Scale. Child’s Nerv Syst, 4(1), 30-33.
Rowley, G., & Fielding, R. (1991). Reliability and accuracy of the Glasgow Coma Scale with experienced and inexperienced users. Lancet, 337, 535-538.
Teasdale, G., & Jennett, B. (1974). Assessment of coma and impaired consciousness. A practical scale. The Lancet, 2(7872), 81-84.
Teasdale, G. M., & Jennett, B. (1976). Assessment and prognosis of coma after head injury. Acta Neurochir (Wien), 34, 45-55.
Warlow, C. P., Dennis, M. S., van Gijn, D., Hankey, G. J., Sandercock, P., Bamford, J. M., et al. (2001). Stroke: A Practical Guide to Management (2nd ed.). Malden, MA: Blackwell Publishing.
Weir, C. J., Bradford, A. P., & Lees, K. R. (2003). The prognostic value of the components of the Glasgow Coma Scale following acute stroke. Q J Med, 96, 67-74.

See the measure

How to obtain the GCS:

The GCS is available at the following website:
http://www.strokecenter.org/trials/scales/glasgow_coma.html

Modified Rankin Scale (MRS)

Evidence Reviewed as of before: 19-08-2008

Author(s)*: Lisa Zeltzer, MSc OT

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc; Sabrina Figueiredo, BSc

Purpose

The Modified Rankin Scale (MRS) is a single item, global outcomes rating scale for patients post-stroke. It is used to categorize level of functional independence with reference to pre-stroke activities rather than on observed performance of a specific task.

In-Depth Review

Purpose of the measure

Available versions

The original Rankin Scale was developed in Scotland in 1957 and was used to assess disability in patients with acute stroke (Rankin, 1957). It consisted of a single item, with five grades representing no, slight, moderate, moderately severe, and severe disability. The Rankin Scale was modified in 1988 as part of a study of aspirin in stroke prevention (UK-TIA Study Group, 1988) and renamed the MRS. This modification was not reported in the aspirin study, but was described subsequently by van Swieten, Koudstaal, Visser, Schouten, and van Gijn (1988). An additional grade was included (grade 0 = no symptoms at all) because of reported concerns about a lack of grading comprehensiveness. The wording of the definitions for grades 1 and 2 was also altered because of concerns about ambiguity (Bamford, Sandercock, Warlow, & Slattery, 1989). The changes were reportedly also made to accommodate language disorders and cognitive defects, to allow comparison between patients with different kinds of neurological deficits and to add a further dimension by referring to previous activities (van Swieten et al., 1988).

Features of the measure

Items:

The MRS is a single item scale.

The conventional method of administration for the MRS is a guided interview process. The assessment is carried out by asking the patient about their activities of daily living, including outdoor activities. Information about the patient’s neurological deficits on examination, including aphasia and intellectual deficits, should be obtained. All aspects of the patient’s physical, mental performance, and speech should be combined in the choice of a single MRS grade.

The categories within the MRS have been criticized as being broad and poorly defined, left open to the interpretation of the individual rater (Wilson et al., 2002). A structured interview format for the administration of the MRS is available (see section Alternative forms of the Modified Rankin Scale – MRS).

Scoring:

A single MRS grade should be assigned based on the following criteria (Dromerick, Edwards, & Diringer, 2003):

Rankin Grade	Description
0	No symptoms
1	No significant disability despite symptoms; able to carry out all usual duties and activities
2	Slight disability: unable to carry out all previous activities but able to look after own affairs without assistance
3	Moderate disability: requiring some help, but *able to walk without assistance
4	Moderately severe disability: unable to walk without assistance, and unable to attend to own bodily needs without assistance
5	Severe disability: bedridden, incontinent, and requiring constant nursing care and attention

* It is unclear whether the term ‘without assistance’ allows for aids or modifications, or whether it refers only to assistance from another person.

Some studies have examined the ability of MRS scores to be dichotomized. de Haan, Limburg, Bossuyt, van der Meulen, and Aaronson (1995) suggested that MRS scores be dichotomized for the purposes of comparison in evaluating the effectiveness of an intervention. They suggested that a score of 0-3 indicates mild to moderate disability, and a score of 4-5 indicates severe disability. Currently, there is no standardized or consistent method of dichotomization (Sulter, Steen, & de Keyser, 1999), as there is a lack of consensus regarding what constitutes a favorable vs. an unfavorable outcome in terms of Rankin score. Dichotomization has also been criticized as being associated with a loss of information when determining the benefits derived from a particular rehabilitation intervention. For example, Lai and Duncan (2001) reported that 62% of patients included in their study experienced recovery represented by a shift of 1 or more Rankin grades in the first 3 months following stroke. If these shifts were between grades 0 and 1 or between 4 and 5, for example, no change would be reported using a dichotomized system of outcomes where favorable outcome was defined as MRS = 0, 1, and 2 and unfavorable as MRS = 3, 4 or 5. In a study by Weisscher, Vermeulen, Roos, and de Haan (2008), 15% of patients were classified as having a favorable outcome when it was defined as MRS = 0-1. Among these patients, 84% were able to perform outdoor activities. When favorable outcome was defined as MRS = 0-2, 37% were classified as having a favorable outcome. However, among this group, only 56% were able to perform outdoor activities. Lai and Duncan (2001) have suggested that transition in Rankin grades may be more appropriate in the assessment of intervention benefit. Weisscher et al. (2008) stated that defining favorable and unfavorable outcomes is an arbitrary decision.

The authors suggested that if favorable outcome is expressed by the ability to perform outdoor activities then a score of 0-1 should be chosen. However, if complex ADL are considered as the main outcome, then a score of 0-2 on the MRS should be considered the best dichotomization option. Sulter et al. (1999) suggest that an appropriate definition may be that poor outcome exists if any of the following occur: death, institutionalization due to stroke, MRS score >3, or Barthel Index score <60.
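
The loss of information described above can be illustrated with a short sketch. It is not taken from any of the cited studies: the function names, the default cutoff of 2 (favorable outcome = MRS 0-2) and the example grade transitions are assumptions chosen to mirror the text, and the last function simply encodes the poor-outcome definition attributed to Sulter et al. (1999) as summarized here.

```python
def is_favorable(mrs_grade, cutoff=2):
    """Dichotomize an MRS grade: favorable if grade <= cutoff (assumed 0-2)."""
    if not 0 <= mrs_grade <= 5:
        raise ValueError("MRS grade must be between 0 and 5")
    return mrs_grade <= cutoff


def change_detected(before, after, cutoff=2):
    """True only if the shift in grade crosses the dichotomization cutoff."""
    return is_favorable(before, cutoff) != is_favorable(after, cutoff)


def poor_outcome(died, institutionalized, mrs_grade, barthel_index):
    """Poor outcome if ANY criterion holds, as summarized in the text."""
    return died or institutionalized or mrs_grade > 3 or barthel_index < 60


# Hypothetical one-grade improvements (grade before, grade after).
transitions = [(1, 0), (5, 4), (3, 2)]
for before, after in transitions:
    print(before, "->", after, "change detected:", change_detected(before, after))
```

With the 0-2 cutoff, only the shift from grade 3 to grade 2 registers as a change; the shifts from 1 to 0 and from 5 to 4 are invisible, which is the criticism raised by Lai and Duncan (2001). Passing `cutoff=1` reproduces the alternative 0-1 definition discussed by Weisscher et al. (2008).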

Time:

5-15 minutes (New & Buchbinder, 2006)

Subscales:

There are no subscales to the MRS.

Equipment:

Administration of the MRS does not require any specialized equipment.

Training:

No formal training is required to administer the MRS.

Alternative form of the Modified Rankin Scale (MRS)

Modified Rankin Scale-Structured Interview (MRS-SI) (Wilson et al., 2002).
Wilson et al. (2002) developed a structured interview to improve the inter-rater reliability of the MRS. The structured interview differs from the conventional guided interview for the MRS by defining specific questions to grade each category. The structured interview developed for the study consisted of 5 sections: (1) constant care (e.g. does the person require constant care?), (2) basic ADL (e.g. is assistance essential for eating, using the toilet, daily hygiene, or walking?), (3) instrumental ADL (e.g. is assistance essential for preparing a simple meal, doing household chores, looking after money, shopping, or traveling locally?), (4) limitations in participation in usual social roles (e.g. has there been a change in the person’s ability to participate in previous social and leisure activities?), and (5) checklist for the presence of common stroke symptoms (e.g. does the person have difficulty reading/writing, speaking or finding the right word, problems with balance/coordination, visual problems, numbness, difficulty with swallowing, or other symptom resulting from stroke?). Inter-rater reliability improved significantly after training in the structured interview (Wilson et al., 2005). Furthermore, the extent of disagreement between raters on the MRS-SI was less than what has been observed with the MRS.

Client suitability

Can be used with:

Patients with stroke.

Should not be used with:

The MRS has not been evaluated for use with proxy respondents.

In what languages is the measure available?

The MRS is available in:

German (Berger et al., 1999)
Persian (Oveisgharan et al., 2006)
Dutch (e.g. Hop, Rinkel, Algra, & van Gijn, 1998).

Summary

What does the tool measure?	Level of post-stroke functional independence.
What types of clients can the tool be used for?	Patients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	The MRS takes 5-15 minutes to administer.
Versions	Original Rankin Scale (RS), Modified Rankin Scale-Structured Interview (MRS-SI)
Other Languages	German, Persian, Dutch
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the MRS. Test-retest: Two studies have examined the test-retest reliability of the MRS and reported excellent test-retest reliability. Intra-rater: Only one study has examined the intra-rater reliability of the MRS and reported excellent intra-rater reliability. Inter-rater: Six studies have examined the inter-rater reliability of the MRS. Two reported adequate to excellent, three reported excellent inter-rater reliability (note: one study used an expanded guidance scheme in a guided interview format, and two reported systematic differences between raters using ANOVA), and one reported poor inter-rater reliability.
Validity	Criterion: Concurrent: Excellent correlations with the Barthel Index, Frenchay Activities Index, the motor component of the Functional Independence Measure, Short Form-36 Physical Functioning subscale and the Euroqol 5D. Adequate correlations with the Stroke-Adapted Sickness Impact Profile-30 and the Glasgow Coma Scale as well as adequate to excellent correlations with Magnetic Resonance Imaging (MRI) findings. Predictive: The most relevant predictors were MRS scores before the stroke event, the presence of diabetes, and severity of left arm weakness. Construct: Convergent/Discriminant: One study reported that the MRS was closely related to the Glasgow Outcome Scale, the NIH Stroke Scale, and the Barthel Index. One study reported an excellent correlation between the MRS and the Barthel Index. One study reported adequate to excellent correlations between the MRS and five impairment scales (the Orgogozo Scale, the NIH Stroke Scale, the Canadian Neurological Scale, the Mathew scale, and the Scandinavian Stroke Scale). Finally, one study reported a weak correlation between the MRS and the Sickness Impact Profile subscales of Cognitive Alertness and Social Interaction.
Floor/Ceiling Effects	One study examined the floor effects of the MRS and reported an adequate floor effect.
Does the tool detect change in patients?	One study examined the responsiveness of the MRS when administered to stroke rehabilitation inpatients at admission and discharge and reported that the MRS was poor at detecting change.
Acceptability	The MRS has not been evaluated for use with proxy respondents.
Feasibility	The MRS is a single item, global outcomes rating scale that takes 5-15 minutes to administer and does not require any formal training or specialized equipment. The categories of the MRS have been criticized for being broad, poorly defined and left open to rater interpretation. The MRS-Structured Interview (MRS-SI) differs from the conventional guided interview format of the MRS by defining specific questions to grade each category. Inter-rater reliability of the MRS has been shown to improve with the use of this structured interview format.
How to obtain the tool?	Please click here to obtain a copy of the MRS.

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the MRS.

Floor/Ceiling Effects

Dromerick, Edwards, and Diringer (2003) administered the MRS to 95 stroke rehabilitation inpatients and reported that the MRS displayed an adequate floor effect (18%) at admission to rehabilitation.

Reliability

Test-retest:
Wolfe, Taub, Woodrow, and Burney (1991) examined the test-retest reliability of the MRS in 50 patients with stroke of varying severity. Two out of three research nurses interviewed patients on two occasions that were 2-3 weeks apart. The test-retest reliability, measured using the weighted kappa statistic, was excellent (kappa w = 0.95).

Wilson et al. (2005) examined the test-retest reliability of the MRS in patients at least 6 months post-stroke, using two raters who performed repeat assessments with a mean test-retest interval of 7 days. Agreement was measured using the kappa statistic. Comparison of Rankin grades showed excellent agreement between the first and second assessments: grades matched in 85% of cases for rater 1 (kappa = 0.81; kappa w = 0.94) and in 96% of cases for rater 2 (kappa = 0.95; kappa w = 0.99).

Intra-rater:
Wolfe et al. (1991) examined the intra-rater reliability of the MRS in a sample of 14 patients who were assessed twice by the same observer within a 2-week period at least 3 months post-stroke. Exact agreement was reported in 86% of observations (kappa w = 0.95). The intra-rater reliability of the MRS as reported in this study is considered to be excellent.

Inter-rater:
van Swieten et al. (1988) examined the inter-rater reliability of the MRS in 100 patients who were each interviewed by two physicians; agreement was measured using kappa statistics. The physicians agreed on the degree of handicap for 65% of the patients, differed by one Rankin grade in 32% of the patients and differed by two grades in 3% of the patients. The kappa for all pairwise observations was adequate (kappa = 0.56; kappa w = 0.91). For the outpatient group, the kappa was excellent (kappa = 0.82). For the inpatient group, the kappa was adequate (kappa = 0.51).
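The gap between unweighted and weighted kappa reported above arises because weighted kappa gives partial credit when raters differ by only one grade. The following is a minimal sketch of that idea, not the method used in any of the cited studies: the ratings are hypothetical, the function name `cohen_kappa` is illustrative, and quadratic disagreement weights are an assumption (the studies do not state their weighting scheme here).

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b, n_grades=6, weighted=False):
    """Cohen's kappa for two raters scoring the same patients on grades 0..n_grades-1.

    weighted=False: every disagreement counts equally.
    weighted=True:  disagreement is penalised by squared grade distance
                    (quadratic weights; an assumption for this sketch).
    """
    n = len(rater_a)

    def penalty(i, j):
        if weighted:
            return (i - j) ** 2
        return 0 if i == j else 1

    pairs = Counter(zip(rater_a, rater_b))   # observed joint counts
    marg_a = Counter(rater_a)                # marginal counts, rater A
    marg_b = Counter(rater_b)                # marginal counts, rater B

    observed = sum(penalty(i, j) * c for (i, j), c in pairs.items()) / n
    expected = sum(
        penalty(i, j) * marg_a[i] * marg_b[j]
        for i in range(n_grades)
        for j in range(n_grades)
    ) / (n * n)
    # Undefined (division by zero) if both raters use one identical grade throughout.
    return 1.0 - observed / expected

# Hypothetical MRS grades for 10 patients; the raters differ by one grade in 4 cases.
rater_1 = [0, 1, 2, 3, 4, 5, 1, 2, 3, 4]
rater_2 = [0, 1, 2, 3, 4, 5, 2, 3, 4, 3]

kappa = cohen_kappa(rater_1, rater_2)
kappa_w = cohen_kappa(rater_1, rater_2, weighted=True)
print(f"kappa = {kappa:.2f}, weighted kappa = {kappa_w:.2f}")
```

With these made-up ratings, exact agreement is only 60%, yet the weighted value is much higher than the unweighted one because every disagreement is a single grade apart.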

Wolfe et al. (1991) examined the inter-rater reliability of the MRS in 50 patients with stroke of varying severity. Two out of three research nurses interviewed patients. The kappa coefficients were excellent and ranged from kappa = 0.75 to kappa = 0.96. However, analysis of variance revealed that there was evidence of a systematic difference between the raters (F 2,48 = 6.02, p = 0.005), with raters 1 and 3 estimating the grade 0.42 and 0.33 points higher than rater 2.

Wilson et al. (2002) examined the inter-rater reliability of the MRS in 63 patients with stroke. The MRS was administered by two raters. Inter-rater reliability was measured with the kappa statistic and was found to be excellent (kappa w = 0.78). However, overall agreement between the 2 raters was only 57%, and one rater assigned significantly lower grades than the other (p = 0.048).

Wilson et al. (2005) examined the inter-rater reliability of the MRS in patients at least 6 months post-stroke. Fifteen raters were recruited for the study and pairs of raters assessed a total of 113 patients on the MRS. Agreement between raters was observed in only 43% of cases (kappa = 0.25, kappa w = 0.71).

Shinohara, Minematsu, Amano, and Ohashi (2006) examined the inter-rater reliability of the MRS when an expanded guidance scheme (a guided interview format) and corresponding questionnaire were used. Twenty raters (neurologists and nurses) watched videotaped interviews of 30 patients and scored each patient. Inter-rater reliability was calculated using the intraclass correlation coefficient (ICC). In this study, inter-rater reliability was excellent (ICC = 0.95 for neurologists and ICC = 0.96 for nurses).

Quinn, Dawson, Walters, and Lees (2008) assessed the inter-rater reliability of the MRS among 2942 evaluators from 30 different countries. The evaluators rated 5 non-scripted videotaped interviews. Inter-rater reliability was calculated using kappa statistics. The overall inter-rater reliability of the MRS was adequate (kappa = 0.67). The agreement level at each grade of the MRS was poor for a score of 1 (kappa = 0.19), adequate for a score of 2 (kappa = 0.48) and 3 (kappa = 0.74), and excellent for a score of 4 (kappa = 0.95). The agreement level for scores of 0 and 5 was not reported since the videotaped interviews did not include clients with a full range of disabilities. The inter-rater reliability by country was poor for Italy (kappa = 0.34), adequate for Belgium (kappa = 0.73), Czech Republic (kappa = 0.68), France (kappa = 0.64), Hungary (kappa = 0.70), Netherlands (kappa = 0.50), South Korea (kappa = 0.67), Sweden (kappa = 0.65), United States (kappa = 0.73) and the United Kingdom (kappa = 0.69) and excellent for Australia (kappa = 0.77), Germany (kappa = 0.78), Portugal (kappa = 0.80), Slovakia (kappa = 0.75) and Spain (kappa = 0.84). The agreement level was excellent for both native and non-native English speakers (kappa = 0.77; kappa = 0.76). Among assessors from the United Kingdom the inter-rater reliability was adequate for all professional backgrounds: general medicine (kappa = 0.66), geriatrics (kappa = 0.54), neurology (kappa = 0.56), and research assistants (kappa = 0.65).
Note: The inter-rater reliability by country was calculated only for countries with more than 50 certified evaluators.

Validity

Criterion:

Concurrent:
Cup, Scholte op Reimer, Thijssen, and van Kuyk-Minis (2003) examined the concurrent validity of the MRS with the Canadian Occupational Performance Measure (COPM), the Barthel Index (BI), the Frenchay Activities Index (FAI), the Stroke-Adapted Sickness Impact Profile-30 (SA-SIP30), and the Euroqol 5D (EQ-5D) in 26 patients post-stroke at their place of residence. The MRS had a statistically significant correlation with the BI, FAI, SA-SIP30 and the EQ-5D. Spearman’s rho correlation coefficients were excellent for the BI, FAI and EQ-5D (r = -0.81, -0.80, and 0.68, respectively). An adequate correlation was found between the MRS and the SA-SIP30 (r = 0.47).
Note: Some correlations are negative because a high score on the MRS indicates increased impairment whereas a low score on other measures indicates increased impairment.
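The sign convention described in the note can be seen in a small worked example. This is a sketch under stated assumptions: the scores are invented for illustration (they are not data from any cited study), and the helper names `average_ranks` and `spearman_rho` are hypothetical. Spearman's rho is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical patients: higher MRS grade = worse, higher Barthel Index = better.
mrs_grades = [0, 1, 2, 3, 4, 5]
barthel = [100, 95, 80, 60, 35, 10]

rho = spearman_rho(mrs_grades, barthel)
# Re-expressing the Barthel Index as "points lost" flips the sign only.
rho_flipped = spearman_rho(mrs_grades, [100 - b for b in barthel])
print(rho, rho_flipped)
```

The strength of the association is the same in both cases; only the direction of scoring determines whether the coefficient is negative or positive.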

Kwon, Harzema, Duncan, and Min-Lai (2004) examined the concurrent validity of the Barthel Index (BI), the motor component of the Functional Independence Measure (M-FIM), and the MRS using Spearman correlation coefficients. Excellent correlations were observed between the MRS and the BI (r = -0.89) and between the M-FIM and the MRS (r = -0.89).

Weimar et al. (2002) examined the concurrent validity of the MRS from a sample of 4,264 patients with acute ischemic stroke from 30 hospitals in Germany during a 1-year period. The patients were administered the Barthel Index (BI), the MRS, the Short Form-36 Physical Functioning subscale (SF-36 PF), and the Center for Epidemiologic Studies-Depression short form (CES-D). The MRS had an excellent correlation with the SF-36 PF (r = 0.84) and with the BI (r = 0.82).

Schaefer, Huisman, Sorensen, Gonzalez, and Schwamm (2004) examined whether diffusion-weighted Magnetic Resonance Imaging (MRI) findings (thought to demonstrate lesions that are not visualized with conventional MRI sequences) and conventional MRI findings correlate with discharge MRS and Glasgow Coma Scale scores in 26 patients with diffuse axonal injury. Using Spearman rank correlation coefficients, the results of this study showed that the strongest correlation was between signal-intensity abnormality volume on diffusion-weighted images and MRS score, which was excellent (r = 0.77). For lesion number, the strongest correlation was between lesion number on images acquired and all sequences and MRS score, which was also excellent (r = 0.66). For lesion location, the strongest correlation was between lesion location in the corpus callosum and MRS score, which was adequate (r = 0.51). There was an adequate correlation between the MRS and the Glasgow Coma Scale.

Predictive:
Weimar et al. (2002) identified the most important predictors of adverse outcomes on the Barthel Index (BI) and MRS following stroke. The most relevant predictors were MRS scores before the stroke event, the presence of diabetes, and severity of left arm weakness.
Note: Although an MRS score > 3 was an inclusion criterion in this study, the study did not specify how the MRS scores were obtained before the stroke event.

Construct:

Convergent/Discriminant :
Tilley et al. (1996) found that the MRS was closely related to the Glasgow Outcome Scale (94% agreement; Φ = 0.88), to impairment measured by the NIH Stroke Scale (86% agreement; Φ = 0.67) and to the Barthel Index (87% agreement; Φ = 0.76). These results raise concern about the construct validity of the MRS. The results of this study lend support to the assertion that the MRS is closer to a disability scale than a handicap scale.

de Haan, Horn, Limburg, van Der Meulen, and Bossuyt (1993) evaluated 87 patients who had a stroke 6 months prior to evaluation. Impairments were scored on five stroke scales: the Orgogozo Scale, the National Institutes of Health Stroke Scale, the Canadian Neurological Scale, the Mathew scale, and the Scandinavian Stroke Scale. Disability was assessed with the Barthel Index, handicap with the MRS, and quality of life with the Sickness Impact Profile. The correlations between the MRS and the 5 impairment scales using Pearson’s coefficients ranged from adequate to excellent (r = -0.56 to r = -0.71).
Note: Some correlations are negative because a high score on the MRS indicates increased impairment whereas a low score on other measures indicates increased impairment.

de Haan, Limburg, Bossuyt, van der Meulen, and Aaronson (1995) reported a strong relationship (using Somers’ D) between the MRS and activities of daily living as measured by the Barthel Index (0.73), and between the MRS and the Sickness Impact Profile subscales of Instrumental activities of daily living (0.65), Mobility (0.60) and Living arrangements (0.74). The weakest associations reported were between the MRS and the Sickness Impact Profile subscales of Cognitive Alertness (0.34) and Social Interaction (0.37).

Wolfe et al. (1991) administered the MRS and the Barthel Index (which assesses disability) to 50 patients post-stroke. Agreement between the MRS and the Barthel Index was measured using kappa statistics. Agreement between the two scales was excellent (kappa = 0.72; weighted kappa = 0.91), which lends support to the assertion that the MRS is closer to a disability scale than a handicap scale.

Responsiveness

Dromerick et al. (2003) examined the responsiveness of the MRS in comparison to 3 other disability scales: the International Stroke Trial Measure, the Barthel Index (BI) and the Functional Independence Measure (FIM). The MRS was administered to 95 stroke rehabilitation inpatients at admission and at discharge. The MRS was poor at detecting change. When compared to the FIM, the receiver operating characteristics analysis showed that the MRS (C-statistic C = 0.59) was much less sensitive to change than the BI (C-statistic C = 0.82), indicating a correspondingly lower specificity for the MRS. The MRS detected change in 55 subjects, including all who changed on the International Stroke Trial Measure. The BI detected change in 71 patients and the FIM detected change in 91 patients. The results of this study suggest that the global scales (MRS and the International Stroke Trial Measure) are much less sensitive to changes in disability than the activities of daily living scales (the BI and the FIM).

References

Bamford, J. M., Sandercock, A. G., Warlow, C. P., Slattery, J. (1989). Interobserver agreement for the assessment of handicap in stroke patients (letter). Stroke, 20, 828.
Bamford, J. M., Vessey, M., Fowler, G., Molyneux, A., Hughes, T., Burn, J., et al. (1988). A prospective study of acute cerebrovascular disease in the community: The Oxfordshire Community Stroke Project 1981-1986. 1. Methodology, demography and incident cases of first-ever stroke. J Neurol Neurosurg Psychiatry, 51, 1373-1380.
Berger, K., Weltermann, B., Kolominsky-Rabas, P., Meves, S., Heuschmann, P., Bohner, J., Neundorfer, B., Hense, H. W., Buttner, T. (1999). The reliability of stroke scales. The German version of the NIHSS, ESS and Rankin scales. Fortschr Neurol Psychiatr, 67(2), 81-93.
Cup, E. H. C., Scholte op Reimer, W. J. M., Thijssen, M. C. E., van Kuyk-Minis, M. A. H. (2003). Reliability and validity of the Canadian Occupational Performance Measure in stroke patients. Clin Rehabil, 17, 402-409.
de Haan, R., Horn, J., Limburg, M., van Der Meulen, J., Bossuyt, P. (1993). A comparison of five stroke scales with measurement of disability, handicap and quality of life. Stroke, 24, 1178-1181.
de Haan, R., Limburg, M., Bossuyt, P., van der Meulen, J., Aaronson, N. (1995). The clinical meaning of Rankin ‘handicap’ grades after stroke. Stroke, 26, 2027-2030.
Dromerick, A. W., Edwards, D. F., Diringer, M. N. (2003). Sensitivity to changes in disability after stroke: A comparison of four scales useful in clinical trials. Journal of Rehabilitation Research and Development, 40(1), 1-8.
Hop, J., Rinkel, G. J. E., Algra, A., van Gijn, J. (1998). Quality of life in patients and partners after aneurysmal subarachnoid hemorrhage. Stroke, 29, 798-804.
Kwon, S., Hartzema, A. G., Duncan, P. W., Min-Lai, S. (2004). Disability measures in stroke: Relationship among the Barthel Index, the Functional Independence Measure, and the Modified Rankin Scale. Stroke, 35, 918-923.
Lai, S. M., Duncan, P. W. (2001). Stroke recovery profile and the Modified Rankin Assessment. Neuroepidemiology, 20, 26-30.
New, P. W., Buchbinder, R. (2006). Critical appraisal and review of the Rankin Scale and its derivatives. Neuroepidemiology, 26, 4-15.
Oveisgharan, S., Shirani, S., Ghorbani, A., Soltanzade, A., Baghaei, A., Hosseini, S., Sarrafzadegan, N. (2006). Barthel Index in a middle-east country: Translation, validity and reliability. Cerebrovascular Diseases, 22, 350-354.
Quinn, T.J., Dawson, J., Walters, M.R., Lees, K.R. (2008). Variability in Modified Rankin Score across a large cohort of international observers. Stroke, 39, 2975-2979.
Rankin, J. (1957). Cerebral vascular accidents in patients over the age of 60. Scott Med J, 2, 200-215.
Schaefer, P. W., Huisman, T., Sorensen, G., Gonzalez, G., Schwamm, L. (2004). Diffusion-weighted MR Imaging in closed head injury: High correlation with initial Glasgow Coma Scale score and score on Modified Rankin Scale at discharge. Neuroradiology, 233, 58-66.
Shinohara, Y., Minematsu, K., Amano, T., Ohashi, Y. (2006). Modified Rankin Scale with expanded guidance scheme and interview questionnaire: interrater Agreement and Reproducibility of Assessment. Cerebrovasc Dis, 21, 271-278.
Sulter, G., Steen, C., De Keyser, J. (1999). Use of the Barthel index and modified Rankin scale in acute stroke trials. Stroke, 30, 1538-1541.
Tilley, B. C., Marler, J., Geller, N. L., Lu, M., Legler, J., Brott, T., et al. (1996). Use of a global test for multiple outcomes in stroke trials with application to the National Institute of Neurological Disorders and Stroke t-PA stroke trial. Stroke, 27, 2136-2142.
UK-TIA Study Group. (1988). The UK-TIA aspirin trial: Interim results. Br Med J, 296, 316-320.
van Swieten, J. C., Koudstaal, P. J., Visser, M. C., Schouten, H. J., van Gijn, J. (1988). Interobserver agreement for the assessment of handicap in stroke patients. Stroke, 19, 604-607.
Weimar, C., Kurth, T., Kraywinkel, K., Wagner, M., Busse, O., Ludwig, R., Diener, H-C. (2002). Assessment of functioning and disability after ischemic stroke. Stroke, 33, 2053-2059.
Weisscher, N., Vermeulen, M., Roos, Y.B., de Haan, R.J. (2008). What should be defined as good outcome in stroke trials; a modified Rankin score of 0-1 or 0-2? J Neurol, 255, 867-874.
Wilson, L. J. T., Hareendran, A., Hendry, A., Potter, J., Bone, I., Muir, K. W. (2005). Reliability of the Modified Rankin Scale across multiple raters: Benefits of a structured interview. Stroke, 36, 777-781.
Wilson, L. J. T., Hareendran, A., Grant, M., Baird, T., Schultz, U. G. R., Muir, K. W., Bone, I. (2002). Improving the assessment of outcomes in stroke: Use of a structured interview to assign grades on the Modified Rankin Scale. Stroke, 33, 2243-2246.
Wolfe, C. D., Taub, N. A., Woodrow, E.J., Burney, P. G. (1991). Assessment of scales of disability and handicap for stroke patients. Stroke, 22, 1242-1244.

See the measure

How to obtain the MRS?

Please click here to obtain a copy of the MRS and the MRS-SI.

The MRS is also available in New, P. W., Buchbinder, R. (2006). Critical appraisal and review of the Rankin Scale and its derivatives. Neuroepidemiology, 26, 4-15.

The MRS-SI can be found in Wilson, L. J. T., Hareendran, A., Grant, M., Baird, T., Schultz, U. G. R., Muir, K. W., Bone, I. (2002). Improving the assessment of outcomes in stroke: Use of a structured interview to assign grades on the Modified Rankin Scale. Stroke, 33, 2243-2246.

On-line training can be obtained at http://www.rankinscale.org/. The training modules comprise an introductory description of the mRS followed by 4 brief patient interviews. These interviews should be scored anonymously for practice purposes before optional group discussion. Correct scores and their justification follow each case (20 minutes). A transcript of the interviews is available. Certification of successful training will depend on correct completion of 5 further scenarios under ‘test’ conditions. Certification lasts for one year, after which re-certification is recommended.

Link to MRS training program: http://www.rankinscale.org/

National Institutes of Health Stroke Scale (NIHSS)

Evidence Reviewed as of before: 19-08-2008

Author(s)*: Lisa Zeltzer, MSc OT

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Content consistency: Gabriel Plumier

Purpose

The National Institutes of Health Stroke Scale (NIHSS) is a 15-item impairment scale, intended to evaluate neurologic outcome and degree of recovery for patients with stroke. The scale assesses level of consciousness, extraocular movements, visual fields, facial muscle function, extremity strength, sensory function, coordination (ataxia), language (aphasia), speech (dysarthria), and hemi-inattention (neglect) (Lyden, Lu, & Jackson, 1999; Lyden, Lu, & Levine, 2001). The NIHSS was designed to assess differences in interventions in clinical trials, although its use is increasing in patient care as an initial assessment tool and in planning postacute care disposition (Schlegel et al., 2003; Schlegel, Tanne, Demchuk, Levine, & Kasner, 2004).

In-Depth Review

Purpose of the measure

The NIHSS is a 15-item impairment scale, intended to evaluate neurologic outcome and degree of recovery for patients with stroke. The scale assesses level of consciousness, extraocular movements, visual fields, facial muscle function, extremity strength, sensory function, coordination (ataxia), language (aphasia), speech (dysarthria), and hemi-inattention (neglect) (Lyden, Lu, & Jackson, 1999; Lyden, Lu, & Levine, 2001). The NIHSS was designed to assess differences in interventions in clinical trials, although its use is increasing in patient care as an initial assessment tool and in planning postacute care disposition (Schlegel et al., 2003; Schlegel, Tanne, Demchuk, Levine, & Kasner, 2004).

Available versions

Original version: Brott, Adams, Olinger, Marler, Barsan, Biller, Spilker, Holleran, Eberle, Hertzberg, Rorick, Moomaw, and Walker (1989).

Features of the measure

Items:
Items of the NIHSS are based on three previously used scales: the Toronto Stroke Scale, the Oxbury Initial Severity Scale and the Cincinnati Stroke Scale (Brott et al., 1989).

The scale has 15 items in total which assess the following:

Level of consciousness

Responsiveness of the patient (rated from 0 – 3).
Questions: Patients are asked to state the month and their age (rated from 0 – 2).
Commands: The patient is asked to open and close the eyes and then to grip and release the non-paretic hand (hand not affected by partial motor paralysis) (rated from 0 – 2).

Best gaze

Horizontal eye movements of patient (rated from 0 – 2).

Visual

To assess the presence of hemianopia (rated from 0 – 3).

Facial palsy

Patients are asked to show their teeth or raise their eyebrows and close their eyes. Look for symmetry (rated from 0 – 3).

Motor arm

Left arm: Arm is extended (palms down) 90 degrees (if sitting) or 45 degrees (if supine). Drift is scored if the arm falls before 10 seconds (rated from 0 – 4, or UN if amputation or joint fusion).
Right arm: Same as for the left arm.

Motor leg

Left leg: Leg is raised at 30 degrees (supine). Drift is scored if the leg falls before 5 seconds (rated from 0 – 4, or UN if amputation or joint fusion).
Right leg: Same as for the left leg.

Limb ataxia

Finger-to-nose and heel-to-shin test (rated from 0 – 2, or UN if amputation or joint fusion).

Sensory function

If level of consciousness is impaired, score if a grimace or an asymmetric withdrawal is observed (rated from 0 – 2).

Best language (aphasia)

Standard pictures are named (rated from 0 – 3).

Dysarthria

Patient is asked to read or repeat words from a list (rated from 0 – 2, or UN if intubated or other physical barrier).

Extinction and inattention (formerly called neglect)

Sufficient information to detect neglect may be obtained from prior testing (rated from 0 – 2).

An additional item that measures distal motor function has been used in a few drug trials, but is not widely used in ongoing research or in clinical practice.

Time:
The examination requires less than 10 minutes to complete.

Scoring:
Each item is scored from 0 – 2, 0 – 3, or 0 – 4, and untestable items are scored as “UN”. A score of 0 indicates normal performance. Total scores on the NIHSS range from 0 – 42, with higher values reflecting more severe cerebral infarcts. Stroke severity is further stratified in the following way:

(Source: Brott et al., 1989)

≥ 25 – Very severe neurological impairment

5-14 – Mild to moderately severe neurological impairment

< 5 – Mild impairment

The predictive value of the scale can also aid in planning a patient’s rehabilitation or long-term care needs, even as early as the day of admission. NIHSS scores can be interpreted in the following way:

(Source: Schlegel et al., 2003; Rundek et al., 2000; Goldstein & Samsa, 1997; DeGraba, Hallenbeck, Pettigrew, Dutha, & Kelly, 1999)

≥ 14 – Severe: Long-term care in nursing facility required

6-13 – Moderate: Acute inpatient rehabilitation required

≤ 5 – Mild: 80% with this score are discharged home
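
The scoring rules and discharge bands above can be sketched in code. This is a minimal illustration only, not an official NIHSS calculator and not a substitute for trained, certified administration. The item keys, the function names, and the decision to let an untestable ("UN") item add nothing to the total are assumptions made for this sketch; the item maxima and the cut-offs come from the text above.

```python
# Maximum rating per item, as listed under "Items" above (15 items in total).
# The dictionary keys are hypothetical labels chosen for this sketch.
ITEM_MAX = {
    "loc_responsiveness": 3,
    "loc_questions": 2,
    "loc_commands": 2,
    "best_gaze": 2,
    "visual": 3,
    "facial_palsy": 3,
    "motor_arm_left": 4,
    "motor_arm_right": 4,
    "motor_leg_left": 4,
    "motor_leg_right": 4,
    "limb_ataxia": 2,
    "sensory": 2,
    "best_language": 3,
    "dysarthria": 2,
    "extinction_inattention": 2,
}

# Items for which the text allows an "UN" (untestable) rating.
UN_ALLOWED = {
    "motor_arm_left", "motor_arm_right",
    "motor_leg_left", "motor_leg_right",
    "limb_ataxia", "dysarthria",
}


def nihss_total(ratings):
    """Return the total score for a dict of item ratings.

    Assumption of this sketch: an item rated "UN" contributes 0 points.
    """
    total = 0
    for item, max_score in ITEM_MAX.items():
        value = ratings[item]
        if value == "UN":
            if item not in UN_ALLOWED:
                raise ValueError(f"{item} cannot be rated UN")
            continue
        if not isinstance(value, int) or not 0 <= value <= max_score:
            raise ValueError(f"{item} must be an integer from 0 to {max_score}")
        total += value
    return total


def discharge_band(total):
    """Map a total score to the care need listed in the interpretation above."""
    if not 0 <= total <= 42:
        raise ValueError("total must be between 0 and 42")
    if total >= 14:
        return "long-term care in nursing facility"
    if total >= 6:
        return "acute inpatient rehabilitation"
    return "likely discharge home"
```

The item maxima sum to 42, which matches the stated range of total scores. A caller would pass one rating per item, for example `nihss_total({**{k: 0 for k in ITEM_MAX}, "best_language": 3})`, and then pass the result to `discharge_band`.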

The NIHSS can be completed and scored automatically at the following link:
http://sitemaker.umich.edu/chant/yale_nihss_calculator

Equipment:
None typically reported.

Subscales:
The subscale items encompass level of consciousness, vision, extraocular movements, facial palsy, limb strength, ataxia, sensation, and speech and language.

Training:
A trained observer rates the patient’s ability to answer questions and perform activities. Training is minimal and is available through instructional videos: a 45-minute training program tape, and two certification tapes (Lyden et al., 1994), or alternatively one can be trained and certified online at the following website: http://www.nihstrokescale.org/. A new training DVD is now available and has established reliability (Lyden et al., 2005).

It is important to note that one must be both trained and certified in order to administer the NIHSS.

As the NIHSS was designed as an observational scale, measurement by self-report or by telephone is not possible. However, measurement by video telemedicine appears to be reliable and could offer a method for remote assessment (Meyer et al., 2005; Shafqat, Kvedar, Guanci, Chang, & Schwamm, 1999). This method of administration would require slightly more time to complete.

To see video clips of the NIHSS items being administered by telemedicine, visit the following link: https://telestroke.massgeneral.org/about.asp

Schmülling, Grond, Rudolf, and Kiencke (1998) examined whether the NIHSS could be reliably administered without any formal training program. The results of this study suggest that good inter-rater reliability of the NIHSS depends on adequate training of the raters. Inter-rater reliability among untrained raters was poor (kappa = 0.33).
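
For readers unfamiliar with the kappa statistic reported above, the following minimal sketch computes unweighted Cohen's kappa for two raters. It is an illustration of the general statistic only: the function name and the example ratings are made up, and the study may have used a different variant (for example, a weighted kappa).

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of patients")
    n = len(rater_a)
    # Proportion of patients on whom the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of one NIHSS item (0 - 2) for eight patients.
rater_1 = [0, 1, 2, 1, 0, 2, 1, 0]
rater_2 = [0, 1, 1, 1, 0, 2, 2, 1]
kappa = cohen_kappa(rater_1, rater_2)
```

Kappa corrects raw agreement for the agreement expected by chance, so a value of 1 means perfect agreement and a value of 0 means agreement no better than chance.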

Alternative forms of NIHSS

11-item modified NIHSS (mNIHSS).
Developed by deleting poorly reproducible or redundant items (level of consciousness, face weakness, ataxia, and dysarthria) and collapsing the sensory item from 3 into 2 responses (Lyden, Lu, Levine, Brott, & Broderick, 2001). The mNIHSS consists of ten items with excellent reliability and one item with adequate reliability (Meyer, Hemmen, Jackson, & Lyden, 2002). The maximum total score for the mNIHSS is 31.
5-item NIHSS (sNIHSS-5) and 8-item NIHSS (sNIHSS-8).
For pre-hospital assessment of stroke severity, an 8-item and a 5-item NIHSS have undergone preliminary evaluation. The 8 items that were most predictive of “good outcome” three months after stroke were: right leg, left leg, gaze, visual fields, language, level of consciousness, facial palsy, and dysarthria. The sNIHSS-8 comprises all 8 of these items and the sNIHSS-5 contains only the first 5. In the validation models, receiver operating characteristic (ROC) values for the sNIHSS-8 and sNIHSS-5 were adequate (ROC = 0.77 and 0.76, respectively). Furthermore, no significant difference between the sNIHSS-8 and the sNIHSS-5 was observed. The sNIHSS-5 retained much of the predictive performance of the full NIHSS (Tirschwell et al., 2002).
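
The alternative forms described above can be expressed as subsets of the full item set. The sketch below is illustrative only: the item keys and function name are hypothetical, and it assumes that "level of consciousness" in both the mNIHSS deletions and the sNIHSS-8 list refers to the responsiveness item (rated 0 – 3), which the text does not state explicitly.

```python
# Maximum rating per item of the full 15-item NIHSS (keys are hypothetical labels).
NIHSS_MAX = {
    "loc_responsiveness": 3, "loc_questions": 2, "loc_commands": 2,
    "best_gaze": 2, "visual": 3, "facial_palsy": 3,
    "motor_arm_left": 4, "motor_arm_right": 4,
    "motor_leg_left": 4, "motor_leg_right": 4,
    "limb_ataxia": 2, "sensory": 2, "best_language": 3,
    "dysarthria": 2, "extinction_inattention": 2,
}

# mNIHSS: four items deleted (assumption: the deleted level of consciousness
# item is the responsiveness item).
MNIHSS_DROPPED = {"loc_responsiveness", "facial_palsy", "limb_ataxia", "dysarthria"}
MNIHSS_MAX = {k: v for k, v in NIHSS_MAX.items() if k not in MNIHSS_DROPPED}
# Sensory item collapsed from 3 responses (0 - 2) to 2 responses (0 - 1).
MNIHSS_MAX["sensory"] = 1

# sNIHSS-8 items in the order given in the text; sNIHSS-5 keeps the first five.
SNIHSS_8 = [
    "motor_leg_right", "motor_leg_left", "best_gaze", "visual",
    "best_language", "loc_responsiveness", "facial_palsy", "dysarthria",
]
SNIHSS_5 = SNIHSS_8[:5]


def subset_score(ratings, items):
    """Sum the numeric ratings of the given items from a full set of ratings."""
    return sum(ratings[item] for item in items)
```

Under these assumptions the mNIHSS has 11 items and a maximum of 31 points (`sum(MNIHSS_MAX.values())`), which agrees with the figures quoted above.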

Client suitability

Can be used with:

Patients with stroke.

The NIHSS is designed so that virtually any stroke will register some abnormality on the scale.

Should not be used in:

The NIHSS can be administered to virtually any patient with stroke; however, a potential flaw with the NIHSS is that there may be a ceiling effect below the theoretical limit in patients with very severe stroke because many scale items cannot be tested in these patients (Muir, Weir, Murray, Povey, & Lees, 1996).
Can be estimated retrospectively from the admission neurological examination (Bushnell, Johnston, & Goldstein, 2001; Kasner et al., 1999; Williams, Yilmaz, & Lopez-Yunez, 2000), although actual testing is preferable.

In what languages is the measure available?

The NIHSS has been translated into the following languages: (http://www.proqolid.org/)

Cantonese for Hong-Kong
Estonian
Hindi
Hungarian
Italian
Marathi
Portuguese
Telugu

The NIHSS has been translated and validated in the following languages:

Chinese (Sun, Chiu, Yeh, & Chang, 2006)
German (Berger et al., 1999)
Spanish (Dominguez et al., 2006)

Summary

What does the tool measure?	Neurologic outcome and degree of recovery for patients with stroke.
What types of clients can the tool be used for?	Patients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	It takes less than 10 minutes to complete the NIHSS.
Versions	11-item modified NIHSS (mNIHSS); 5-item NIHSS (sNIHSS-5); 8-item NIHSS (sNIHSS-8).
Other Languages	Translated in Cantonese for Hong-Kong; Estonian; Hindi; Hungarian; Italian; Marathi; Portuguese; Telugu. Translated and validated in Chinese; German; Spanish.
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the NIHSS. Test-retest: Only one study has examined the test-retest reliability of the original NIHSS and reported adequate to excellent test-retest. Intra-rater: Only one study has examined the intra-rater reliability of the original NIHSS and reported excellent intra-rater. Inter-rater: Out of 11 studies examining the inter-rater reliability of the original NIHSS, six reported excellent inter-rater, one reported adequate inter-rater, three reported adequate to excellent inter-rater, and one reported poor to excellent inter-rater. Out of three studies examining the inter-rater reliability of the mNIHSS, two studies reported excellent inter-rater, and one study reported that inter-rater was improved with the mNIHSS in comparison to the original NIHSS.
Validity	Construct: Modified NIHSS: The correlation between the original NIHSS and mNIHSS was excellent. Criterion: Concurrent: Original NIHSS: Poor correlations between NIHSS and the Modified Rankin Scale and the Barthel Index; adequate to excellent correlations with infarct volumes using computed tomography and excellent correlations using MRI. Concurrent: Modified NIHSS: Excellent correlations between mNIHSS and the Modified Rankin Scale, the Barthel Index, and the Glasgow Outcome Scale were reported in a retrospective analysis; however, in a prospective analysis the mNIHSS had poor concurrent validity with the Barthel Index and the modified Rankin Scale. Adequate to excellent correlations have been reported with infarct volumes using computed tomography and excellent correlations using MRI. Predictive: The NIHSS was found to predict Barthel Index, Rankin Scale, and Glasgow Outcome Scale scores at 3-month outcome; administered in the first 24 hours after stroke onset, the NIHSS can retrospectively predict the next level of care after acute hospitalization; NIHSS also predicted clinical outcome; recovery; the likelihood of a patient’s recovery after stroke; discharge destination; 3-month mortality; presence and location of a vessel occlusion.
Floor/Ceiling Effects	A significant ceiling effect has been detected with the NIHSS.
Does the tool detect change in patients?	One study assessed the responsiveness of the original NIHSS by comparing the scale scores on patients with stroke to the patients’ infarction size as measured by computed tomography at 1 week. Although most patients improved clinically, 4/15 items changed only minimally.
Acceptability	The NIHSS can be administered to virtually any patient with stroke. However, a potential flaw with the NIHSS is that there may be a ceiling effect below the theoretical limit in patients with very severe stroke because many scale items cannot be tested in these patients (Muir, Weir, Murray, Povey, & Lees, 1996). The scale cannot be completed by proxy or by self-report as it is an observational scale. However, measurement by video telemedicine appears to be reliable and could offer a method for remote assessment.
Feasibility	It is important to note that one must be both trained and certified in order to administer the NIHSS. Training and certification can be obtained online at the following website: http://www.nihstrokescale.org/ No specialized equipment is required and relatively little space is needed to administer the NIHSS.
How to obtain the tool?	This measurement tool is available in the following article: https://www.ahajournals.org/doi/10.1161/STROKEAHA.116.015434

Psychometric Properties

Overview

The NIHSS has established reliability and validity for use in prospective clinical research, and predictive validity for long-term stroke outcome (Adams et al., 1999; Brott et al., 1989; Lyden et al., 1994). For the purposes of this review, we conducted a literature search to identify all relevant publications on the psychometric properties of the NIHSS.

Reliability

Original NIHSS:
Brott et al. (1989) designed the NIHSS and assessed the scale’s reliability in 24 patients with stroke. Inter-rater reliability for the scale was adequate (mean kappa = 0.69). Agreement was excellent for six items, including pupillary response (kappa = 0.95), best motor arm performance (kappa = 0.85), best motor leg performance (kappa = 0.83), best gaze (kappa = 0.82), and level of consciousness questions (kappa = 0.80). The lowest agreement was for the qualitative assessment of level of consciousness (kappa = 0.49). Of the 15 test items, the most inter-rater reliable item was pupillary response. Less reliable items were upper or lower extremity motor function. Test-retest reliability was adequate to excellent (mean kappa = 0.66 to 0.77). The correlation between the first examination scores and the second examination scores (within 24 hours) was excellent (r = 0.98). Test-retest reliability did not differ significantly when administered by different health care professionals, such that the correlation of one examiner’s score for the first examination with a different examiner’s score for the second examination was excellent; for example, a first examination of an individual patient by the neurologist correlated with a second examination of that patient by the emergency department nurse with Spearman’s correlation = 0.98. These results suggest that the NIHSS can be reliably administered to patients with stroke.
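
Most of the item-level agreement figures reported in this section are kappa statistics. As a minimal sketch of how an unweighted Cohen's kappa is obtained for two raters scoring the same item, the example below compares observed agreement with the agreement expected by chance. The function name `cohen_kappa` and the ratings are hypothetical illustrations and are not data from Brott et al. (1989) or any other cited study; the studies reviewed here may have used weighted kappa or other variants.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of patients given identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every score category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical scores (0-2) on a single item for ten patients.
neurologist = [0, 1, 2, 0, 1, 2, 0, 0, 1, 2]
nurse = [0, 1, 2, 0, 1, 1, 0, 0, 2, 2]

kappa = cohen_kappa(neurologist, nurse)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.70"
```

In this made-up example the raters agree on 8 of 10 patients (0.80), chance agreement is 0.34, and kappa is therefore (0.80 - 0.34) / (1 - 0.34), or about 0.70.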

Meyer et al. (2002) examined the inter-rater reliability of the NIHSS and the mNIHSS in 45 patients with a history of stroke. Two neurologists tested each patient. Dysarthria was the only item of the NIHSS found to have poor inter-rater reliability (kappa = 0.289), and four items were found to have adequate reliability. Ten items were found to have excellent inter-rater reliability. Kappa scores ranged from 0.289 to 0.975. The kappa value for the total NIHSS score was excellent (kappa = 0.969). The results of this study suggest that the NIHSS has high inter-rater reliability.

Similarly, Goldstein, Bertels and Davis (1989) examined the inter-rater reliability of the NIHSS in 20 patients with stroke. A pair of clinical stroke fellows rated each patient. Inter-rater reliability ranged from adequate to excellent for 9 out of 13 items.

Goldstein and Samsa (1997) examined the reliability of the NIHSS when administered by non-neurologists in the setting of a clinical trial. Thirty physician investigators (30% non-neurologists) and 29 non-physician study coordinators were trained to administer the NIHSS. Four patients were rated and, after 3 months had elapsed, the same four patients were re-rated to provide a measure of intra-rater reliability. Four new patients were also rated after 3 months and compared with the initial four ratings to assess inter-rater reliability. The intraclass correlation coefficients (ICCs) were excellent for the initial four cases (ICC = 0.94) and for the four new cases rated 3 months later (ICC = 0.92). The overall ICC based on the ratings of these 8 cases was excellent (ICC = 0.95), suggesting that NIHSS administration by non-neurologists has a high level of inter-rater reliability. The ICC was also excellent for the cases rated during the initial training session and re-rated after 3 months had elapsed (ICC = 0.93), suggesting that NIHSS administration by non-neurologists also has a high level of intra-rater reliability.

Lyden et al. (1994) trained raters to administer the NIHSS to 11 patients using a training video. The inter-rater reliability of this method was then calculated. Moderate to excellent agreement was established on most NIHSS items (unweighted kappa > 0.60). Only two items, ataxia and facial paresis, showed poor agreement (unweighted kappa < 0.40). The results of this study demonstrate the strong reliability of the NIHSS when raters are trained by a standardized video.

Shafqat et al. (1999) evaluated the reliability of administering the NIHSS remotely (by telemedicine link) by obtaining one bedside and one remote NIHSS score independently for 20 patients with stroke. Kappa coefficients were calculated for inter-rater reliability between bedside and remote administration scores. Excellent agreement was achieved for four items (orientation, kappa = 0.75; motor arm, kappa = 0.82; motor leg, kappa = 0.83; neglect, kappa = 0.77). Six items displayed adequate agreement (language, kappa = 0.65; dysarthria, kappa = 0.55; sensation, kappa = 0.48; visual fields, kappa = 0.60; facial palsy, kappa = 0.40; gaze, kappa = 0.41). Two items achieved poor agreement (commands, kappa = 0.29; ataxia, kappa = -0.07). Total NIHSS scores obtained by bedside and remote methods of administration were highly correlated (r = 0.97). These results suggest that the NIHSS can be reliably administered by telemedicine.

Similar to the study by Shafqat et al. (1999), Meyer et al. (2005) also examined the reliability of NIHSS administration by wireless and site-independent telemedicine in 25 patients with stroke. Patients were evaluated by both remote and bedside examination. Inter-rater reliability between remote and bedside examiners for the NIHSS was found to be poor for two items (facial palsy, kappa = 0.22; limb ataxia, kappa = 0.34) and adequate for three items (left leg motor, kappa = 0.74; language, kappa = 0.73; dysarthria, kappa = 0.61). Ten items showed excellent agreement (kappas ranged from 0.80 to 1.00). The ICC was excellent for the total NIHSS score (ICC = 0.94). Taken together with the results by Shafqat et al. (1999), these findings suggest that the NIHSS can be reliably administered by wireless and site-independent telemedicine.

Dewey et al. (1999) examined the reliability of the NIHSS in a community-based sample of 31 patients with stroke. Two neurologists and one of two research nurses assessed the patients. Inter-rater reliability was excellent, as there was a high level of agreement for total scores between the two neurologists (ICC = 0.95) and between each neurologist and research nurse (ICC = 0.92 and 0.96). While there was adequate to excellent agreement between the neurologists and the research nurse (weighted kappa > 0.4) for the majority of the NIHSS items, there was poor agreement for the ‘limb ataxia’ item. The results of this study suggest that the NIHSS can be reliably administered to a community-based sample.

Schmülling et al. (1998) examined the reliability of the NIHSS when administered by untrained raters in 22 patients with stroke. All diagnoses were confirmed by computed tomography. Four neurologists assessed the patients. Two raters were video trained and experienced in administering the NIHSS, and the other two were inexperienced and were given no training in administering the NIHSS. Excellent inter-rater reliability (kappa = 0.61) was achieved among the trained raters; however, only adequate inter-rater reliability (kappa = 0.33) was achieved among the untrained raters. Between trained and untrained raters, the unweighted kappa was adequate (kappa = 0.45). The reliability of individual items also differed between trained and untrained raters. Among trained raters, only two items had adequate agreement (ataxia, kappa = 0.34; neglect, kappa = 0.32), and the rest were excellent. Among the untrained raters, 6 items had adequate reliability, and 4 items had poor reliability
(ataxia, kappa = -0.03; gaze, kappa = 0.06; visual fields, kappa = -0.02; dysarthria, kappa = 0.18). The results of this study suggest that the NIHSS has excellent inter-rater reliabilityA method of measuring reliability . Inter-rater reliability determines the extent to which two or more raters obtain the same result when using the same instrument to measure a concept.
only when raters are trained and knowledgeable on how to correctly administer the NIHSS.
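To illustrate how a kappa value such as those reported above is derived, the following is a minimal Python sketch of unweighted Cohen's kappa for two raters. The function name `cohen_kappa` and the ratings are hypothetical; they are not data from any of the cited studies.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of patients")
    n = len(rater_a)
    # Observed agreement: proportion of patients given identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical category for every patient
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical scores (0-2) on a single NIHSS item for 10 patients.
rater_1 = [0, 1, 2, 0, 1, 0, 2, 1, 0, 0]
rater_2 = [0, 1, 1, 0, 1, 0, 2, 0, 0, 1]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 2))  # 0.51
```

In this made-up example the raters agree on 7 of 10 patients (70%), but because 39% agreement would be expected by chance alone, kappa is considerably lower than the raw agreement.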

Kasner et al. (1999) examined whether NIHSS scores could be retrospectively estimated from medical records. NIHSS scores of 39 patients with acute stroke were estimated from medical record notes by 6 raters. These scores were compared to the patients’ actual NIHSS scores, to which the raters had been blinded. Overall inter-rater reliability was excellent (ICC = 0.82). Agreement between pairs of raters ranged from good to excellent (ICCs ranged from 0.70 to 0.89). Over 90% of the estimated NIHSS scores were within 5 points at both admission and discharge for all pairs of raters. The results of this study suggest that the NIHSS can be reliably abstracted from medical records for retrospective studies on acute stroke outcome.

Williams et al. (2000) developed an algorithm for retrospective NIHSS scoring from chart documentation. One investigator prospectively scored the admission NIHSS in 32 patients with stroke. Two raters retrospectively scored the NIHSS by applying the algorithm to photocopied admission notes. Linear regression was used to assess inter-rater reliability and agreement between prospective and retrospective NIHSS scores. Weighted kappa statistics were calculated to assess the level of agreement of individual NIHSS items. Inter-rater reliability was excellent (r = 0.98), as was agreement between prospective and retrospective NIHSS scores (r = 0.94). Agreement for individual items ranged from adequate (response to commands, kappa = 0.54; visual, kappa = 0.64; ataxia, kappa = 0.66; sensory, kappa = 0.60; dysarthria, kappa = 0.69; extinction/inattention, kappa = 0.57) to excellent (response to questions, kappa = 0.87; best gaze, kappa = 0.94; facial palsy, kappa = 0.76; left arm, kappa = 0.85; left leg, kappa = 0.87; right arm, kappa = 0.79; right leg, kappa = 0.75; best language, kappa = 0.80). Only one item, level of consciousness, had poor agreement (kappa = -0.10). The results of this study suggest that retrospective NIHSS scoring with the developed algorithm is reliable and unbiased even if information is missing from chart documentation.
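Weighted kappa differs from unweighted kappa in that near-misses on an ordinal item are penalized less than large disagreements. The sketch below shows one common variant, using linear weights; the weighting scheme is an assumption made for illustration (the source does not state which weights Williams et al. used), and the function name and ratings are hypothetical.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted kappa for integer scores in the range 0..n_categories-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of patients")
    if n_categories < 2:
        raise ValueError("at least two score categories are required")
    n = len(rater_a)
    k = n_categories
    # Cross-tabulate the two raters' scores.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        table[a][b] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # assumed linear disagreement weight
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected == 0:
        return 1.0  # no disagreement is possible by chance
    return 1 - observed / expected


# Hypothetical scores (0-2) on a single NIHSS item for 10 patients.
prospective = [0, 1, 2, 0, 1, 0, 2, 1, 0, 0]
retrospective = [0, 1, 1, 0, 1, 0, 2, 0, 0, 1]
kappa_w = linear_weighted_kappa(prospective, retrospective, n_categories=3)
print(round(kappa_w, 2))
```

Because every disagreement in this made-up example is only one scale point apart, the weighted coefficient comes out higher than the unweighted kappa would for the same scores.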

Bushnell et al. (2001) looked at the retrospective scoring of both the Canadian Neurological Scale and the NIHSS. They compared data from academic medical centers to community hospitals with neurologists and community hospitals without neurologists. More data were missing for the NIHSS than for the Canadian Neurological Scale. Almost perfect levels of inter-rater agreement were found for retrospective NIHSS scores at the academic medical centers (ICC = 0.93) and at community hospitals with neurologists (ICC = 0.89); however, only adequate agreement was found at community hospitals without neurologists (ICC = 0.48). These results suggest that scoring the NIHSS retrospectively may not be reliable unless the medical record contains evaluation material from a neurologist.

Modified NIHSS:
Lyden et al. (2001) developed the mNIHSS and assessed the scale’s reliability using the certification data originally collected to assess the reliability of investigators in the National Institute of Neurological Disorders and Stroke rtPA (recombinant tissue plasminogen activator) Trial. Inter-rater reliability was improved with the mNIHSS in comparison to the original NIHSS. The number of scale items with poor kappa coefficients decreased from 8 (20%) to 3 (14%): level of consciousness commands, gaze, and language. The mNIHSS remains to be tested prospectively, as the original NIHSS may be more appropriate for clinical monitoring of patients.

Meyer et al. (2002) also examined the reliability of the mNIHSS in 45 patients with a history of stroke. Two neurologists tested each patient. Ten out of eleven mNIHSS kappa scores showed excellent inter-rater reliability (ranging from kappa = 0.841 to kappa = 0.975). Only gaze had an adequate kappa score of 0.661. The total mNIHSS kappa was excellent (kappa = 0.988). In this study, the mNIHSS was found to be more reliable than the original NIHSS.

Meyer et al. (2005) examined the reliability of mNIHSS administration by wireless and site-independent telemedicine in 25 patients with stroke. Patients were evaluated by both remote and bedside examination. Inter-rater reliability between remote and bedside examiners for the mNIHSS was found to be adequate for two items (left leg motor, kappa = 0.74; language, kappa = 0.69). Nine items showed excellent inter-rater reliability (kappas ranged from 0.80 to 1.00). The ICC was excellent for the total mNIHSS score (ICC = 0.95). The results of this study suggest that the mNIHSS can be reliably administered by wireless and site-independent telemedicine.

Validity

Construct:
Original NIHSS:
N/A

Modified NIHSS:
Meyer et al. (2002) tested the construct validity of the NIHSS and mNIHSS in 45 patients with a history of stroke. Two neurologists tested each patient. The Spearman correlation coefficient between NIHSS and mNIHSS (for both examiners) was excellent (r = 0.947 and r = 0.941), with an overall average correlation of r = 0.944. Construct validity of the mNIHSS was demonstrated in this study as the scale was found to perform similarly to the NIHSS.

Criterion:
Concurrent:
Original NIHSS:
Meyer et al. (2002) examined the concurrent validity of the NIHSS and mNIHSS by comparing the scales with the Barthel Index and the Modified Rankin Scale. The coefficients for the examiners combined for NIHSS versus Barthel Index and Modified Rankin Scale were -0.165 (the correlation is negative because a high score on the NIHSS indicates severe neurological impairment, whereas a high score on the Barthel Index indicates functional independence) and 0.219, respectively. The authors suggest that the poor relationships observed may be due to the fact that patients in this study had only mild deficits, rendering it difficult to determine concurrent validity, especially at the higher end of the scale.
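The sign convention described above can be illustrated with a short Python sketch of the Spearman rank-order correlation. The helper names and the scores are hypothetical and are not data from Meyer et al. (2002); the example only shows why two scales scored in opposite directions yield a negative coefficient.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the rank-transformed scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical patients: higher NIHSS = more impairment, higher Barthel Index = more independence.
nihss_scores = [2, 5, 9, 14, 20]
barthel_scores = [95, 85, 90, 40, 10]
rho = spearman_rho(nihss_scores, barthel_scores)
print(round(rho, 2))  # -0.9
```

With these made-up scores the coefficient is -0.9; the negative sign reflects only the opposite scoring directions of the two scales, not a lack of association.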

Brott et al. (1989) assessed the concurrent validity of the NIHSS by comparing the scale scores obtained prospectively on 65 patients with acute stroke to the patients’ infarction size as measured by computed tomography at 1 week. The Spearman’s correlation between the total NIHSS score at 7 days and the computed tomography scan lesion volume at 7 days was excellent (r = 0.74). The patients’ initial neurologic deficit as measured by the scale also correlated with the 7-10 day computed tomography lesion volume (r = 0.78). The scale-computed tomography correlation at 7 days for patients with left hemisphere infarctions was 0.72, while this correlation for patients with right hemisphere infarctions was 0.74. The results of this study demonstrate that the NIHSS has excellent concurrent validity with infarct volumes using computed tomography.

Schiemanck, Post, Witkamp, Kappelle and Prevo (2005) examined the concurrent validity of infarct volumes in 94 patients with stroke as assessed by magnetic resonance imaging (MRI) with stroke severity as measured by the NIHSS at 2 weeks post-stroke. A strong correlation between lesion volume and NIHSS score was found (r = 0.61), suggesting that the NIHSS has excellent concurrent validity with infarct volumes using MRI.

However, Saver et al. (1999) also investigated the concurrent validity of infarct volumes with 3-month NIHSS scores in 191 patients with acute stroke. In this study, computed tomography scans at days 6 to 11 were only adequately correlated with 3-month NIHSS scores (r = 0.54).

Similarly, Lyden, Claesson, Havstad, Ashwood, and Lu (2004) examined the concurrent validity of baseline NIHSS scores with 30-day infarct volumes using computed tomography in patients with acute stroke seen within 12 hours of stroke onset. Baseline NIHSS scores and lesion volumes were also found to be only adequately correlated (r = 0.37).

Derex et al. (2004) examined the concurrent validity of the NIHSS with lesion volumes in 49 patients with stroke. Patients underwent MRI prior to thrombolysis and were then administered the NIHSS at day one. Baseline NIHSS scores were highly correlated with baseline diffusion-weighted imaging lesion volumes (r = 0.71), and correlated adequately with perfusion-weighted imaging abnormality volumes (r = 0.58) and time to peak delays (r = 0.41). The NIHSS score also correlated with the site of arterial occlusion.

Fink et al. (2002) examined the concurrent validity of the NIHSS with lesion volumes measured by diffusion-weighted imaging within 24 hours of stroke in 153 patients with acute stroke. Acute NIHSS scores were adequately correlated with acute diffusion-weighted imaging lesion volumes (r = 0.48, right; r = 0.58, left) and with perfusion-weighted imaging hypoperfusion volumes (r = 0.62, right; r = 0.60, left). However, a difference was observed between left- and right-sided stroke. Among patients with diffusion-weighted imaging lesions larger than the median volume, 8/37 with right-sided stroke had an NIHSS score of 0 – 5 compared with 1/39 patients with left-sided stroke. Multiple linear regression analysis revealed a significantly lower acute NIHSS on the right compared with the left side when adjusted for stroke volume, suggesting that patients with a right-sided stroke may have a low NIHSS score despite substantial lesion volume.

Woo et al. (1999) reported findings consistent with those of Fink et al. (2002). They used the placebo arm of the National Institute of Neurological Disorders and Stroke rtPA (recombinant tissue plasminogen activator) Trial to examine whether the total volume of cerebral infarction in patients with right hemisphere strokes would be greater than that in patients with left hemisphere strokes who have similar NIHSS scores. The results suggested that the volume for right hemisphere strokes was statistically greater than the volume for left hemisphere strokes when adjusted for baseline NIHSS score. For each 5-point category of the NIHSS score (e.g., from 16 – 20), the median volume of right hemisphere strokes was approximately double the median volume of left hemisphere strokes. The Spearman rank correlation between the 24-hour NIHSS score and 3-month lesion volume was 0.72 for patients with left hemisphere stroke and 0.71 for patients with right hemisphere stroke. The results of this study show that for a given NIHSS score, the median volume of right hemisphere strokes is consistently larger than the median volume of left hemisphere strokes. Therefore, care must be taken when infarction size is being predicted from NIHSS score.

Modified NIHSS:
In a retrospective analysis, Lyden et al. (2001) measured the concurrent validity of the mNIHSS by comparing the correlation of the mNIHSS with other neurological scales (the Barthel Index, the Modified Rankin Scale, and the Glasgow Outcome Scale) measured at 3 months. The mNIHSS showed an excellent correlation with these scales at all time points, with correlations being strongest at 90 days (r = -0.82 for Barthel Index; r = 0.83 for Modified Rankin Scale; r = 0.82 for Glasgow Outcome Scale). The correlation with the Barthel Index is negative because a high score on the Barthel Index indicates functional independence whereas a high score on the mNIHSS indicates neurological deficit.
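The sign convention described above can be illustrated with a minimal sketch. The scores below are hypothetical values invented for illustration (they are not data from any of the studies reviewed), and the helper functions `ranks`, `pearson` and `spearman` are illustrative names, not part of any published scoring tool.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for six patients (illustration only, not study data).
deficit = [2, 5, 9, 14, 18, 24]          # higher = more neurological deficit
independence = [100, 90, 70, 45, 30, 5]  # higher = more functional independence
disability = [0, 1, 2, 3, 4, 5]          # higher = more disability

print(spearman(deficit, independence))  # negative: the scales run in opposite directions
print(spearman(deficit, disability))    # positive: the scales run in the same direction
```

When comparing the strength of association across scales that run in opposite directions, it is the absolute value of the coefficient that matters, not its sign.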

In a prospective analysis, Meyer et al. (2002) found that the mNIHSS demonstrated poor concurrent validity with the Barthel Index and the Modified Rankin Scale. The coefficients for mNIHSS versus Barthel Index and Modified Rankin Scale were -0.238 (the correlation is negative because a high score on the NIHSS indicates severe neurological impairment, whereas a high score on the Barthel Index indicates functional independence) and 0.296, respectively. The absolute Spearman correlations were higher with the mNIHSS than with the original NIHSS; however, values were not statistically significant. The weak relationships observed between the mNIHSS and the other scales may be due to the fact that patients in this study had only mild deficits, rendering it difficult to determine concurrent validity, especially at the higher end of the scale.

Predictive:
Original NIHSS:
Lyden et al. (1999) used data from the National Institute of Neurological Disorders and Stroke (NINDS) tPA Stroke Trial to determine whether the NIHSS was valid in patients treated with tissue plasminogen activator. To assess the predictive validity of the NIHSS, the scale was compared over time with the 3-month outcome of the Barthel Index, the Rankin Scale, and the Glasgow Outcome Scale. The correlations between the NIHSS and the other clinical outcomes were significant but adequate at baseline (Placebo group: Barthel Index, r = -0.48; Rankin Scale, r = 0.51; Glasgow Outcome Scale, r = 0.49; Treatment group: Barthel Index, r = -0.51; Rankin Scale, r = 0.56; Glasgow Outcome Scale, r = 0.56) and at 2 hours (Placebo group: Barthel Index, r = -0.58; Rankin Scale, r = 0.61; Glasgow Outcome Scale, r = 0.60; Treatment group: Barthel Index, r = -0.65; Rankin Scale, r = 0.70; Glasgow Outcome Scale, r = 0.68) after stroke. The correlations were greater for the measurements later in time (24 hours, 7-10 days, 90 days post-stroke), which suggests that after 2 hours from stroke, the NIHSS may have greater predictive validity in terms of the 3-month outcome.

Schlegel et al. (2003) tested whether the NIHSS in the first 24 hours after stroke onset could predict the next level of care after acute hospitalization in a retrospective study of 94 patients with stroke. From medical records it was determined that 59% of patients were discharged home, 30% to rehabilitation, and 11% to a long-term nursing facility. For each 1-point increase in NIHSS score, the likelihood of going home was significantly reduced (OR = 0.79). The category of NIHSS score also predicted the next level of care. An NIHSS score ≤ 5 was strongly associated with discharge home. When compared with patients with an NIHSS ≤ 5, patients with a score from 6 to 13 were nearly 5 times more likely to be discharged to rehabilitation (OR = 4.8). Patients who scored > 13 were nearly 10 times more likely to require rehabilitation (OR = 9.5) and more than 100-fold more likely to be placed in a long-term nursing facility (OR = 310). The results of this retrospective study suggest that the NIHSS, administered in the first 24 hours after stroke onset, can predict the next level of care after acute hospitalization.
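To make the per-point odds ratio more concrete, the following is a minimal sketch of how such a ratio compounds over several points. It assumes the odds ratio comes from a logistic model in which the NIHSS score enters as a linear term, so that the effect multiplies with each additional point. The baseline probability used here is a hypothetical value chosen for illustration, and the function names are illustrative only.

```python
def odds_ratio_for_increase(or_per_point: float, points: int) -> float:
    """Odds ratio for a multi-point increase, assuming a log-linear effect."""
    return or_per_point ** points


def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Convert a probability to odds, scale by the odds ratio, convert back."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratio of 0.79 per 1-point increase, compounded over 5 points.
or_5_points = odds_ratio_for_increase(0.79, 5)

# Hypothetical baseline probability of discharge home (illustration only).
hypothetical_baseline = 0.60

print(round(or_5_points, 3))
print(round(apply_odds_ratio(hypothetical_baseline, or_5_points), 3))
```

Under these assumptions, a 5-point increase corresponds to an odds ratio of about 0.31. An odds ratio acts on odds rather than on probabilities, so the resulting change in probability depends on the baseline chosen.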

Schlegel et al. (2004) examined whether the NIHSS could predict the next level of care in 46 patients with acute stroke treated with thrombolysis (recombinant tissue plasminogen activator). In a multinomial regression analysis, increasing NIHSS score was a strong independent predictor of discharge to rehabilitation or nursing facilities, with the odds roughly doubling for each 5-point increment (score 6 – 10: rehabilitation OR = 1.78, nursing facility OR = 2.31; score 11 – 15: rehabilitation OR = 2.66, nursing facility OR = 5.05; score 16 – 20: rehabilitation OR = 5.31, nursing facility OR = 16.30; score > 20: rehabilitation OR = 8.36, nursing facility OR = 27.40). The results of this study suggest that stroke severity as determined by the admission NIHSS score is a major independent predictor of the next level of care following hospitalization and treatment with thrombolysis for acute stroke.

Demchuk et al. (2001) examined factors that were independently predictive of good outcome among 1,205 patients with acute stroke who were treated with alteplase (a type of thrombolytic therapy). Using multivariable logistic regression modeling, the most important predictor of outcome was baseline stroke severity as measured by the NIHSS score. The higher the NIHSS score, the worse the odds were of having a good outcome (OR good outcome = 1.00 for NIHSS score ≤ 5; OR good outcome = 0.05 for NIHSS > 20).

Muir et al. (1996) compared the NIHSS, the Canadian Neurological Scale, and the Middle Cerebral Artery Neurological Score to see which scale best predicted good (alive at home) or poor (alive in care or dead) outcome in 408 patients with acute stroke. Predictive accuracy of the variables was compared by ROC curves and stepwise logistic regression. Logistic regression showed that the NIHSS added significantly to the predictive value of all other scores. The NIHSS overall accuracy was excellent (0.83). A cutoff point of 13 on the NIHSS best predicted 3-month outcome.

Adams et al. (1999) found that the NIHSS strongly predicts the likelihood of a patient's recovery after stroke in a post-hoc analysis by stroke subtype of 1,268 patients enrolled in an acute stroke trial. NIHSS scores were taken at baseline, 7 days, and 3 months after stroke. A score of ≥ 16 forecasted a high probability of death or severe disability whereas a score of ≤ 6 forecasted a good recovery. The baseline NIHSS score strongly predicted outcome at 7 days and at 3 months. By 7 days, 2/3 of the patients scoring ≤ 3 at baseline had an excellent outcome. One additional point on the NIHSS decreased the likelihood of excellent outcomes at 7 days by 24% and at 3 months by 17%. Patients with lacunar infarcts had significantly higher likelihood of an excellent outcome at 7 days and 3 months than did patients with non-lacunar strokes, but odds were poorer compared with patients with other types of stroke when scores were 10 or more. At 3 months, excellent outcomes were noted in 46% of patients with NIHSS scores of 7 – 10 and in 23% of patients with scores of 11 – 15. Very few patients with baseline scores of > 15 had excellent outcomes after 3 months.

Albers, Bates, Clark, Bell, Verro, and Hamilton (2000) examined 389 patients administered intravenous tissue-type plasminogen activator for treatment of acute stroke. A multivariate analysis found that a less severe baseline NIHSS score (≤ 10) was a predictor of favorable outcome. For every 5-point increase in baseline NIHSS score, patients had a 22% decrease in the odds of recovery (OR = 0.78), and patients with baseline NIHSS scores greater than 10 had a 75% decrease in the odds of recovery (OR = 0.25).
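The percentages quoted above follow directly from the odds ratios, as the short sketch below shows. The function name `percent_change_in_odds` is illustrative only.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio (negative = decrease)."""
    return round((odds_ratio - 1.0) * 100.0, 1)


# Odds ratios as reported in the paragraph above.
for label, value in [("per 5-point increase", 0.78), ("baseline score > 10", 0.25)]:
    print(label, percent_change_in_odds(value))
```

An odds ratio of 0.78 therefore corresponds to a 22% decrease in the odds, and 0.25 to a 75% decrease. These are changes in odds, not in the probability of recovery.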

DeGraba et al. (1999) administered the NIHSS serially to 127 patients with stroke for the first 48 hours of admission to the neuroscience intensive care unit and found that a 3-point or greater increase on the NIHSS indicated stroke progression. A significant cutoff that allowed for the greatest likelihood of predicting patient progression occurred when NIHSS scores were stratified as ≤ 7 and > 7. Patients with an initial NIHSS score of ≤ 7 experienced a 14.8% worsening rate and were more likely to be functionally normal (45% were functionally normal at 48 hours). Patients with an initial NIHSS score of > 7 had a 65.9% worsening rate and were less likely to be functionally normal at 48 hours (only 2.4% were functionally normal). These results demonstrate the predictive validity of the NIHSS.

Frankel et al. (2000) examined whether a practical method for predicting a poor outcome after acute ischemic stroke could be developed. Using data from the placebo arm of Parts 1 and 2 of the National Institute of Neurological Disorders and Stroke rt-PA (recombinant tissue plasminogen activator) Stroke Trial, they found that an NIHSS score > 17 together with atrial fibrillation yielded a positive predictive value of 96%. At 24 hours, the best predictor was an NIHSS score > 22, yielding a positive predictive value of 98%. At 7 – 10 days, the best predictor was an NIHSS score > 16, yielding a positive predictive value of 92%. The results of this study suggest that patients with a severe neurologic deficit after acute ischemic stroke, as measured by the NIHSS, have a poor prognosis and that during the first week after acute stroke, it is possible to identify a subset of patients who are highly likely to have a poor outcome.
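For readers less familiar with positive predictive value, the sketch below shows how it is calculated: the proportion of patients flagged by a cutoff who go on to have the predicted outcome. The counts are hypothetical and chosen only to produce a round figure (they are not taken from the study), and the function name is illustrative.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Proportion of patients flagged by the cutoff who actually had the outcome."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("No patients were flagged by the cutoff.")
    return true_positives / flagged


# Hypothetical counts (illustration only): 50 patients score above the cutoff,
# and 48 of them go on to have a poor outcome.
ppv = positive_predictive_value(true_positives=48, false_positives=2)
print(f"{ppv:.0%}")
```

Positive predictive value depends on how common the outcome is in the sample, so a value observed in one trial population may not transfer directly to a population with a different case mix.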

Rundek et al. (2000) examined predictors of discharge destinations following acute care hospitalization in 893 patients who survived acute care hospitalization for a first stroke, followed prospectively. Polytomous logistic regression was used to determine predictors for rehabilitation and nursing home placement versus returning home. Among the survivors of acute stroke care hospitalization, 611 patients were discharged to their homes, 168 to rehabilitation, and 114 to nursing homes. Patients with moderate neurological deficits (NIHSS score from 6 – 13; rehabilitation OR = 8.0, nursing home OR = 3.8) and severe neurological deficits (NIHSS score ≥ 14; rehabilitation OR = 17.9, nursing home OR = 27.9) had more than a threefold increased risk of being sent to a nursing home and more than an eightfold increased risk of being sent to rehabilitation, demonstrating the clinical predictive validity of the NIHSS.

Bohannon, Lee, and Maljanian (2002) examined which variables predicted three hospital outcomes (hospital length of stay, hospital charges, and hospital discharge destination). NIHSS scores and Barthel Index scores correlated with all three outcomes. The correlations between NIHSS scores and hospital length of stay and hospital charges (ranging from r = 0.276 to r = 0.381) were positive, indicating that patients with more severe strokes had a longer hospital length of stay and higher hospital charges. The correlations between NIHSS scores and discharge destination were negative (r = -0.344 and r = -0.355), meaning that patients with more severe strokes were less likely to be discharged home. Regression analysis showed that once post-admission Barthel Index scores were accounted for, no other variable added to the prediction of hospital length of stay or discharge destination; however, the NIHSS score added to the explanation of hospital charges provided by post-admission Barthel Index scores.

Derex et al. (2003) examined whether pre-treatment MRI parameters predicted clinical outcome in 49 patients with acute stroke treated by intravenous recombinant tissue plasminogen activator. Univariate and multivariate logistic regression analyses were used to identify the predictors of clinical outcome. The results of these analyses suggested that baseline NIHSS score was the best independent predictor of clinical outcome at day 60 (OR = 1.28).

Baird et al. (2001) used logistic regression to develop a 3-item scale for predicting good stroke recovery, which was tested in 63 patients. By combining the NIHSS with the time from onset and lesion volume (as detected by diffusion-weighted imaging), a score could be obtained to accurately predict stroke recovery. Scores of 0 to 2 indicate low probability of recovery, 3 to 4 medium, and 5 to 7 high. This score can help early decision-making regarding aggressiveness of care, discharge planning, and rehabilitation options.
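
The score bands reported for the 3-item scale can be sketched as a small lookup. This is only an illustration of the cut-offs quoted above (0 to 2 low, 3 to 4 medium, 5 to 7 high); the function name `recovery_band` is hypothetical, and the way the total is derived from NIHSS, time from onset and lesion volume is not described here, so the sketch takes the total score as given.

```python
def recovery_band(score: int) -> str:
    """Map a total score on the 3-item scale (0-7) to the reported
    probability-of-recovery band. Illustrative only."""
    if not 0 <= score <= 7:
        raise ValueError("score must be between 0 and 7")
    if score <= 2:
        return "low"      # scores 0 to 2
    if score <= 4:
        return "medium"   # scores 3 to 4
    return "high"         # scores 5 to 7


print(recovery_band(3))
```

The bands are kept as plain comparisons so that each cut-off can be checked directly against the text.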

Briggs, Felberg, Malkoff, Bratina, and Grotta (2001) examined the NIHSS scores of 138 patients admitted within 24 hours of stroke to help determine if patients with mild stroke fared better by admission to a general ward or to the intensive care unit. They found a general positive correlation between baseline NIHSS score and discharge Rankin score in patients with moderate stroke regardless of whether they were admitted to the intensive care unit or the ward (R² = 0.273 and 0.09, respectively). Patients with mild stroke (NIHSS score < 8) admitted to a general ward had fewer complications and more favorable discharge Rankin Scale scores than similar patients admitted to the intensive care unit. There was no obvious cutoff baseline NIHSS score that was predictive of better outcome (lower Rankin) in intensive care unit patients. There was no statistical difference in length of stay. Routinely admitting patients with NIHSS scores < 8 to intensive care appears to have no cost or outcomes benefit.

Di Legge, Saposnik, Nilanont, and Hachinski (2006) identified a subset of variables that were independently associated with major neurological improvement at 24 hours and good outcome at 3 months after treatment for 219 patients with stroke who received intravenous recombinant tissue plasminogen activator in the emergency department. Using logistic regression, the results of this study suggested that among other predictors, pre-treatment NIHSS score was an excellent negative predictor of good outcome at 3 months (OR = 0.83).

Chang, Tseng, Tan, and Liou (2006) examined factors related to 3-month mortality at admission in 360 patients with first-ever acute stroke. Multivariate logistic regression analysis was used to identify the main predictors of 3-month stroke-related mortality. Admission NIHSS score (OR = 1.17), history of cardiac disease (OR = 2.73), and posterior circulation stroke (OR = 5.25) were significant risk factors for 3-month mortality.

Fischer et al. (2005) examined the admission NIHSS scores of 226 patients with stroke who underwent arteriography. NIHSS scores ≥ 10 had positive predictive values for arterial occlusion of 97% in carotid and 96% in vertebrobasilar strokes. With an NIHSS score ≥ 12, the positive predictive value to find a central occlusion was 91%. In a multivariate analysis, NIHSS subitems such as level of consciousness questions (OR = 4.0), gaze (OR = 2.9), motor leg (OR = 4.2), and neglect (OR = 3.2) were predictors of central occlusions. There was a significant association between NIHSS scores and the presence and location of a vessel occlusion. With an NIHSS score ≥ 10, a vessel occlusion would likely be seen on arteriography, and with a score ≥ 12, its location would probably be central.
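
The two admission thresholds reported by Fischer et al. (2005) can be written out as a short sketch. It only restates the cut-offs quoted above (≥ 10 and ≥ 12); the function name `describe_admission_nihss` and the wording of the returned labels are hypothetical, and the sketch is not a decision rule from the study itself.

```python
OCCLUSION_LIKELY_CUTOFF = 10   # score >= 10: occlusion likely on arteriography
CENTRAL_LIKELY_CUTOFF = 12     # score >= 12: occlusion probably central


def describe_admission_nihss(score: int) -> str:
    """Return a text label for an admission NIHSS score using the two
    reported cut-offs. Illustrative only."""
    if score < 0:
        raise ValueError("score cannot be negative")
    if score >= CENTRAL_LIKELY_CUTOFF:
        return "occlusion likely, probably central"
    if score >= OCCLUSION_LIKELY_CUTOFF:
        return "occlusion likely"
    return "below reported cut-offs"


print(describe_admission_nihss(11))
```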

Modified NIHSS:
Lyden et al. (2001) examined the predictive validity of the mNIHSS using the outcome results of the National Institute of Neurological Disorders and Stroke recombinant tissue plasminogen activator Stroke Trial. Using the mNIHSS to test for treatment effect on improvement at 24 hours and treatment effect on minimal or no disability at 3 months after stroke, the scale scores differentiated the two treatment groups at 24 hours and at 3 months. The proportion of patients who improved ≥ 4 points within 24 hours after treatment was significantly increased by recombinant tissue plasminogen activator (OR = 1.3). Likewise, the odds ratio for complete/nearly complete resolution of stroke symptoms 3 months after treatment was significant (OR = 1.7) with the mNIHSS.

Content :
Original NIHSS:
Lyden et al. (1999) used data from the National Institute of Neurological Disorders and Stroke recombinant tissue plasminogen activator Trial to determine whether the NIHSS was valid in patients treated with tissue plasminogen activator. To assess the content validity of the scale, an exploratory factor analysis of NIHSS data was performed within the first 24 hours after stroke, to derive an underlying factor structure. The results from this analysis suggested that there were two factors, representing left and right brain function, underlying the NIHSS. The internal scale structure remained consistent in placebo and treated groups and when administered successively over time, confirming the content validity of the scale.

Modified NIHSS:
Lyden et al. (2001) developed and assessed the validity of the mNIHSS. Content validity was determined using factor analysis, and the goodness of fit was recalculated on the basis of a 4-factor solution restricted to the 11 NIHSS items involved in the mNIHSS. To prevent the confounding effects of time or treatment, the goodness of fit was calculated for data collected at 2 hours, 24 hours, 7 to 10 days, and 3 months after recombinant tissue plasminogen activator or placebo treatment. The results suggested that the internal structure of the mNIHSS was identical to that of the NIHSS. The goodness of fit (comparative fit index = 0.96) was equal to that of the NIHSS. When used over time, and in placebo-treated versus active-treated groups, the goodness-of-fit values for the mNIHSS ranged from 0.93 to 0.96 and were as strong as those of the NIHSS.

Responsiveness

Original NIHSS:
Brott et al. (1989) assessed the responsiveness of the NIHSS by comparing the scale scores obtained prospectively on 65 patients with acute stroke to the patients' infarction size as measured by computed tomography at 1 week. Although most patients improved clinically, 4 of the 15 items changed only minimally: facial palsy (-2% improvement for item score at 1 week), plantar reflex (7% improvement for item score at 1 week), dysarthria (-1% improvement for item score at 1 week), and language (6% improvement for item score at 1 week). Also, change in limb ataxia (59% improvement) and best gaze (52% improvement) may have been overstated, based on infarction size observed. The other 10 items changed an average of 25% over 7 days. Raters in this study also had to conclude whether patients changed neurologically from the previous examination and from baseline. This was defined as "Same" (a change of 0-1 scale point), "Better" (an improvement of ≥ 2 scale points), and "Worse" (a deterioration of ≥ 2 scale points). Based on these definitions, from baseline to 7-10 days, the quantitative criteria for patient change agreed with the investigator's judgment of patient change for 40 of the 63 patients surviving at 7-10 days (63%). The results of this study demonstrate that the NIHSS is responsive to change.
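
The quantitative change criteria used by Brott et al. (1989) can be sketched as follows. The function name `classify_change` is hypothetical, and the sketch assumes that a lower NIHSS total means less neurological deficit, so an improvement is a decrease in score.

```python
def classify_change(previous_score: int, current_score: int) -> str:
    """Classify change between two NIHSS totals using the reported
    definitions: "Same" (0-1 point), "Better" (improvement of >= 2
    points), "Worse" (deterioration of >= 2 points).

    Assumption: lower scores indicate less deficit, so a drop in
    score counts as improvement."""
    delta = current_score - previous_score
    if delta <= -2:
        return "Better"
    if delta >= 2:
        return "Worse"
    return "Same"


print(classify_change(10, 7))
```

The category produced this way is what was compared against the investigator's own judgment of change to obtain the reported agreement figure.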

Modified NIHSS:
Lyden et al. (2001) examined the responsiveness of the mNIHSS in a retrospective analysis. The mNIHSS mirrored the original NIHSS in the predictive models, which can be taken as an indicator of responsiveness. That is, the mNIHSS tends to predict response of patients to recombinant tissue plasminogen activator as well as the original scale, when used in the multivariable model. Likewise, the mNIHSS predicts likelihood of hemorrhage after recombinant tissue plasminogen activator treatment as well as the original in the multivariable model of symptomatic hemorrhage. Further, the power to detect a 4-point or greater improvement by 24 hours was increased from 24% with the NIHSS to 51% with the mNIHSS. Within-patient responsiveness could not be assessed in this study.

Floor and Ceiling Effects

Muir et al. (1996) suggested that a potential shortcoming of the NIHSS is that, because many scale items cannot be tested in patients with very severe stroke, there may be a ceiling effect below the theoretical limit.

Williams, Weinberger, Harris, Clark, and Biller (1999) administered the NIHSS to patients 1 and 3 months post-stroke. A ceiling effect of the NIHSS was observed in the upper extremity domain: although 62% of patients reported upper extremity dysfunction 1 month after stroke, only 11% had an NIHSS arm score > 1.

Pickard, Johnson, and Feeny (2005) compared five health-related quality of life measures administered at baseline and at 6 months. A notable ceiling effect was observed with the NIHSS at 6 months (20% of patients).

References

Adams, H. P., Davis, P. H., Leira, E. C., Chang, K-C., Bendixen, B. H., Clarke, W. R., Woolson, R. F., Hansen, M. D. (1999). Baseline NIH Stroke Scale score strongly predicts outcome after stroke: a report of the Trial of Org 10172 in Acute Stroke Treatment (TOAST). Neurology, 53, 126-31.
Albanese, M. A., Clarke, W. R., Adams, H. P., Woolson, R. F., and TOAST Investigators. (1994). Ensuring reliability of outcome measures in multicenter clinical trials of treatments for acute ischemic stroke. Stroke, 25, 1746-1751.
Albers, G. W., Bates, V. E., Clark, W. M., Bell, R., Verro, P., Hamilton, S. A. (2000). Intravenous tissue-type plasminogen activator for treatment of acute stroke: the Standard Treatment with Alteplase to Reverse Stroke (STARS) study. JAMA, 283, 1145-1150.
Baird, A. E., Dambrosia, J., Janket, S., Eichbaum, Q., Chaves, C., Silver, B., Barber, P., Parsons, M., Darby, D., Davis, S. (2001). A three-item scale for the early prediction of stroke recovery. Lancet, 357, 2095-2099.
Berger, K., Weltermann, B., Kolominsky-Rabas, P., Meves, S., Heuschmann, P., Bohner, J., Neundorfer, B., Hense, H. W., Buttner, T. (1999). The reliability of stroke scales. The German version of NIHSS, ESS and Rankin scales [German]. Fortschr Neurol Psychiatr, 67(2), 81-93.
Bohannon, R. W., Lee, N., Maljanian, R. (2002). Postadmission function best predicts acute hospital outcomes after stroke. Am J Phys Med Rehabil, 81, 726-730.
Briggs, D. E., Felberg, R. A., Malkoff, M. D., Bratina, P., Grotta, J. C. (2001). Should mild or moderate stroke patients be admitted to an intensive care unit? Stroke, 32, 871-876.
Brott, T. G., Haley, Jr., E. C., Levy, D. E., Barsan, W., Broderick, J., Sheppard, G. L., Spilker, J., Kongable, G. L., Massey, S., Reed, R. (1992). Urgent therapy for stroke I: Pilot study of tissue plasminogen activator administered within 90 minutes. Stroke, 23, 632-640.
Brott, T. G., Adams, H. P., Olinger, C. P., Marler, J. R., Barsan, W. G., Biller, J., Spilker, J., Holleran, R., Eberle, R., Hertzberg, V., Rorick, M., Moomaw, C. J., Walker, M. (1989). Measurements of acute cerebral infarction: a clinical examination scale. Stroke, 20, 864-70.
Bushnell, C. D., Johnston, D. C. C., Goldstein, L. B. (2001). Retrospective assessment of initial stroke severity: comparison of the NIH Stroke Scale and the Canadian Neurological Scale. Stroke, 32, 656-60.
Chang, K-C., Tseng, M-C., Tan, T-Y., Liou, C-W. (2006). Predicting 3-month mortality among patients hospitalized for first-ever acute ischemic stroke. Journal of the Formosan Medical Association, 105(4), 310-7.
DeGraba, T. J., Hallenbeck, J. M., Pettigrew, K. D., Dutka, A. J., Kelly, B. J. (1999). Progression in acute stroke. Value of initial NIH Stroke Scale on patient stratification in future trials. Stroke, 30, 1208-1212.
Demchuk, A. M., Tanne, D., Hill, M. D., Kasner, S. E., Hanson, S., Grond, M., Levine, S. R., The Multicentre tPA Stroke Survey Group. (2001). Predictors of good outcome after intravenous tPA for acute ischemic stroke. Neurology, 57, 474-480.
Dewey, H. M., Donnan, G. A., Freeman, E. J., Sharples, C. M., Macdonell, R. A. L., McNeil, J. J., Thrift, A. G. (1999). Interrater Reliability of the National Institutes of Health Stroke Scale: Rating by Neurologists and Nurses in a Community-Based Stroke Incidence Study. Cerebrovascular Diseases, 9, 323-327.
Dominguez, R., Vila, J. F., Augustovski, F., Irazola, V., Castillo, P. R., Escalante, R. R., Brott, T. G., Meschia, J. F. (2006). Spanish cross-cultural adaptation and validation of the National Institutes of Health Stroke Scale.
Derex, L., Nighoghossian, N., Hermier, M., Adeleine, P., Berthezene, Y., Philippeau, F., Honnorat, J., Froment, J. C., Trouillas, P. (2004). Influence of pretreatment MRI parameters on clinical outcome, recanalization and infarct size in 49 stroke patients treated by intravenous tissue plasminogen activator. J Neurol Sci, 225, 3-9.
Fink, J. N., Selim, M. H., Kumar, S., Silver, B., Linfante, I., Caplan, L. R., Schlaug, G. (2002). Is the association of National Institutes of Health Stroke Scale scores and acute magnetic resonance imaging stroke volume equal for patients with right- and left-hemisphere ischemic stroke? Stroke, 33, 954-958.
Fischer, U., Arnold, M., Nedeltchev, K., Brekenfeld, C., Ballinari, P., Remonda, L., Schroth, G., Mattle, H. (2005). NIHSS score and arteriographic findings in acute ischemic stroke. Stroke, 36, 2121-2125.
Goldstein, L., Bertels, C., Davis, J. (1989). Interrater reliability of the NIH Stroke Scale. Arch Neurol, 46, 660-662.
Goldstein, L. B., Samsa, G. P. (1997). Reliability of the National Institutes of Health stroke scale: Extension to non-neurologists in the context of a clinical trial. Stroke, 28, 307-310.
Haley, E. C., Levy, D. E., Brott, T. G., Sheppard, G. L., Wong, M. C., Kongable, G. L., Torner, J. C., Marler, J. R. (1992). Urgent therapy for stroke. II: Pilot study of tissue plasminogen activator administered 91-180 minutes from onset. Stroke, 23, 641-645.
Kasner, S. E., Chalela, J. A., Luciano, J. M., Cucchiara, B. L., Raps, E. C., McGarvey, M. L., Conroy, M. B., Localio, A. R. (1999). Reliability and validity of estimating the NIH Stroke Scale score from medical records. Stroke, 30, 1534-37.
Kasner, S. E., Cucchiara, B. L., McGarvey, M. L., Luciano, J. M., Liebeskind, D. S., Chalela, J. A. (2003). Modified National Institutes of Health Stroke Scale can be estimated from medical records. Stroke, 34, 568-70.
Lai, S. M., Duncan, P. W., Keighley, J. (1998). Prediction of functional outcome after stroke. Comparison of the Orpington Prognostic Scale and the NIH Stroke Scale. Stroke, 29, 1838-1842.
Lyden, P., Raman, R., Liu, L., Grotta, J., Broderick, J., Olson, S., Shaw, S., Spilker, J., Meyer, B., Emr, M., Warren, M., Marler, J. (2005). NIHSS training and certification using a new digital video disk is reliable. Stroke, 36, 2446-2449.
Lyden, P., Lau, G. T. (1991). A critical appraisal of stroke evaluation and rating scales. Stroke, 22, 1345-1352.
Lyden, P., Brott, T., Tilley, B., Welch, K. M., Mascha, E., Levine, S., Haley, E. C., Grotta, J., Marler, J. (1994). Improved reliability of the NIH Stroke Scale using video training. Stroke, 25, 2220-2226.
Lyden, P., Lu, M., Jackson, C., Marler, J., Kothari, R., Brott, T., Zivin, J. (1999). Underlying structure of the National Institutes of Health Stroke Scale: Results of a factor analysis. Stroke, 30, 2347-2354.
Lyden, P. D., Lu, M., Levine, S. R., Brott, T. G., Broderick, J., NINDS rtPA Stroke Study Group. (2001). A modified National Institutes of Health Stroke Scale for use in stroke clinical trials: preliminary reliability and validity. Stroke, 32, 1310-17.
Lyden, P., Claesson, L., Havstad, S., Ashwood, T., Lu, M. (2004). Factor analysis of the National Institutes of Health Stroke Scale in patients with large strokes. Arch Neurol, 61, 1677-80.
Meyer, B. C., Lyden, P. D., Al-Khoury, L., Cheng, Y., Raman, R., Fellman, R., Beer, J., Rao, R., Zivin, J. A. (2005). Prospective reliability of the STRokE DOC wireless/site independent telemedicine system. Neurology, 64, 1058-60.
Meyer, B. C., Hemmen, T. M., Jackson, C. M., Lyden, P. D. (2002). Modified National Institutes of Health stroke scale for use in stroke clinical trials: prospective reliability and validity. Stroke, 33, 1261-66.
Muir, K. W., Weir, C. J., Murray, G. D., Povey, C., Lees, K. R. (1996). Comparison of neurological scales and scoring systems for acute stroke prognosis. Stroke, 27, 1817-1820.
National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group (1995). Tissue Plasminogen activator for acute ischemic stroke. N Engl J Med, 333, 1581-1587.
Olinger, C. P., Adams, H. P., Brott, T. G., Biller, J., Barsan, W. G., Toffol, G. J., Eberle, R. W., Marler, J. R. (1990). High-dose intravenous naloxone for the treatment of acute ischemic stroke. Stroke, 21, 721-725.
Pickard, A. S., Johnson, J. A., Feeny, D. H. (2005). Responsiveness of generic health-related quality of life measures in stroke. Qual Life Res, 14, 207-219.
Rundek, T., Mast, H., Hartmann, A., Boden-Albala, B., Lennihan, L., Lin, I.-F., Paik, M. C., Sacco, R. L. (2000). Predictors of resource use after acute hospitalization: the Northern Manhattan Stroke Study. Neurology, 55, 1180-87.
Saver, J. L., Johnston, K. C., Homer, D., et al. (1999). Infarct volume as a surrogate or auxiliary outcome measure in ischemic stroke clinical trials. Stroke, 30, 293-98.
Schiemanck, S. K., Post, M. W. M., Witkamp, T. D., Kappelle, L. J., Prevo, A. J. H. (2005). Relationship between ischemic lesion volume and functional status in the 2nd week after middle cerebral artery stroke. Neurorehabil Neural Repair, 19, 133-38.
Schlegel, D., Kolb, S. J., Luciano, J. M., Tovar, J. M., Cucchiara, B. L., Liebeskind, D. S., Kasner, S. E. (2003). Utility of the NIH Stroke Scale as a predictor of hospital disposition. Stroke, 34, 134-37.
Schlegel, D. J., Tanne, D., Demchuk, A. M., Levine, S. R., Kasner, S. E., Multicenter rt-PA Stroke Survey Group. (2004). Prediction of hospital disposition after thrombolysis for acute ischemic stroke using the National Institutes of Health Stroke Scale. Arch Neurol, 61, 1061-64.
Schmülling, S., Grond, M., Rudolf, J., Kiencke, P. (1998). Training as a prerequisite for reliable use of NIH Stroke Scale [letter]. Stroke,

See the measure

How to obtain the NIHSS:

This measurement tool is available in the following article: https://www.ahajournals.org/doi/10.1161/STROKEAHA.116.015434

Evidence Reviewed as of before: 18-02-2019

Author(s)*: Annabel McDermott, OT

Expert Reviewer: Trixie Reichardt, MHSc, RD; Rosemary Martino, PhD

Content consistency: Gabriel Plumier

Purpose

The Toronto Bedside Swallowing Screening Test (TOR-BSST©) is a screening tool which identifies patients at risk for dysphagia following stroke.

In-Depth Review

Purpose of the measure

The Toronto Bedside Swallowing Screening Test (TOR-BSST©) is a screening tool, administered at the bedside by trained screeners, that identifies patients at risk for dysphagia following stroke.

Available versions

Features of the measure

Items:

Baseline vocal quality
Tongue movement
50mL water test
Cup sip
Final judgment of vocal quality

Scoring:

The TOR-BSST© uses binary scoring (i.e. abnormal/normal) for each item. Failure on any item discontinues the screen and prompts referral to a Speech-Language Pathologist dysphagia expert.
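
The stop-on-first-failure scoring rule can be sketched as below. This is a minimal illustration, not the official TOR-BSST© form or training material: the item order follows the list above, and the names `ITEMS`, `run_screen` and the `assess` callback (which returns True when an item is judged normal) are hypothetical.

```python
from typing import Callable, Optional, Tuple

# Items in the order listed above (assumed to be the administration order).
ITEMS = [
    "Baseline vocal quality",
    "Tongue movement",
    "50mL water test",
    "Cup sip",
    "Final judgment of vocal quality",
]


def run_screen(assess: Callable[[str], bool]) -> Tuple[str, Optional[str]]:
    """Apply binary scoring item by item. The screen stops at the first
    abnormal item and returns ("refer", item); if every item is normal
    it returns ("pass", None)."""
    for item in ITEMS:
        if not assess(item):      # abnormal result: discontinue the screen
            return "refer", item  # prompts referral to an SLP
    return "pass", None


print(run_screen(lambda item: item != "Cup sip"))
```

Passing the judgment in as a callback means later items are never assessed once an earlier item has failed, which matches the rule that administration ceases on failure.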

What to consider before beginning:

The TOR-BSST© should only be used with patients who are alert, able to sit upright at 90 degrees, and are able to follow simple instructions. Patients who do not meet these guidelines should not be screened but, instead, be referred to a Speech-Language Pathologist for assessment.

International best practice guidelines advise that, following stroke, patients should undergo screening for swallowing difficulties before oral intake of food, fluids or oral medication. Screening should be performed by specially trained personnel, using a validated screening tool. Swallowing should be screened as soon as possible after admission provided that the patient is able to participate. Patients who fail the swallowing screening should be referred to a Speech-Language Pathologist for comprehensive swallowing assessment. Patients who are confirmed at high risk of aspiration and/or dysphagia should undergo an instrumental assessment such as videofluoroscopy swallowing study (VFS) and/or fibreoptic evaluation of swallowing (FEES).

Time:

The TOR-BSST© takes less than 10 minutes to administer and score. Administration ceases immediately on failure of an item.

Training requirements:

The TOR-BSST© can be administered by health professionals who have undergone the requisite 4-hour didactic standardized training program. Didactic training is followed by individual training/competency observations. Training is provided by Speech-Language Pathologists who have completed the “TOR-BSST© Training for the SLP Dysphagia Expert” trainers course.

See The Swallowing Lab (https://swallowinglab.com/tor-bsst/) for details.

Equipment:

Client suitability

Can be used with:

The TOR-BSST© is being validated for use with critically ill patients who have undergone prolonged intubation and may be at risk of swallowing problems.

Should not be used in:

Following stroke, patients should be assessed and managed according to best practice guidelines for dysphagia. The TOR-BSST© should not be used with individuals with decreased alertness or cognition, or those who are being tube-fed. Patients who are being tube-fed have already been identified to have dysphagia and therefore should be referred to a Speech-Language Pathologist for a comprehensive assessment and management.

In what languages is the screening tool available?

English
French
Chinese
German
Italian
Portuguese (Brazil)

Summary

What does the tool measure?	Risk for dysphagia following stroke.
What types of clients can the tool be used for?	The TOR-BSST© was developed for patients with stroke across the recovery continuum.
Is this a screening or assessment tool?	Screening tool
Time to administer	Less than 10 minutes.
Versions	There is one version of the TOR-BSST©.
Languages	Chinese, English, French, German, Italian, Portuguese (Brazil)
Measurement Properties
Reliability	Internal consistency: No studies have reported on the internal consistency of the TOR-BSST©. Test-retest: No studies have reported on the test-retest reliability of the TOR-BSST©. Intra-rater: No studies have reported on the intra-rater reliability of the TOR-BSST©. Inter-rater: Two studies have reported excellent inter-rater reliability of the TOR-BSST©.
Validity	Content: Development of the TOR-BSST© involved item generation from systematic review and subsequent item reduction, in combination with consultation with expert Speech-Language Pathologists. Criterion: Concurrent: No studies have reported on the concurrent validity of the TOR-BSST©. Predictive: One randomized controlled diagnostic study has compared the TOR-BSST© with videofluoroscopy. Construct: Convergent/Discriminant: No studies have reported on the convergent or discriminant validity of the TOR-BSST©. Known Groups: No studies have reported on the known-group validity of the TOR-BSST©.
Floor/Ceiling Effects	Not applicable
Does the tool detect change in patients?	The TOR-BSST© is designed as a screening test and scored using binary responses, so is not intended to detect change.
Acceptability	– The TOR-BSST© is quick to administer. – The TOR-BSST© requires specialised training.
Feasibility	The TOR-BSST© is suitable for administration across acute and rehabilitation settings. The screening tool is easily portable and is quick to administer, score and interpret.
How to obtain the tool?	See The Swallowing Lab (https://swallowinglab.com/tor-bsst/) for information regarding the TOR-BSST©.

Psychometric Properties

Overview

The TOR-BSST© was developed and validated by Dr. Martino of The Swallowing Lab, University Health Network, University of Toronto.

A literature search was conducted to identify all relevant publications on the psychometric properties of the TOR-BSST©. Four studies were identified.

Floor/Ceiling Effects

The TOR-BSST© is a 5-item screening test to determine risk of dysphagia. The screening should be discontinued as soon as an individual fails an item.
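The stop rule described above (binary items, with screening discontinued at the first failed item) can be sketched in Python. This is only an illustration of the scoring logic, not the TOR-BSST© form itself: the `screen` function, the `make_item` helper and the item labels `item_1` to `item_3` are hypothetical placeholders, and the real items and their order are defined in the official training materials.

```python
from typing import Callable, List, Optional, Tuple

# (label, check) pairs; check() returns True when the item is passed.
# Labels are placeholders, not the items on the official form.
Item = Tuple[str, Callable[[], bool]]


def screen(items: List[Item]) -> Tuple[str, Optional[str]]:
    """Administer items in order and stop at the first failure.

    Returns ("fail", label_of_failed_item) or ("pass", None).
    """
    for label, check in items:
        if not check():
            # One failed item fails the whole screen; the remaining
            # items are not administered.
            return "fail", label
    return "pass", None


administered: List[str] = []  # records which items were actually run


def make_item(label: str, result: bool) -> Item:
    """Build a hypothetical item with a fixed pass/fail result."""
    def check() -> bool:
        administered.append(label)
        return result
    return label, check


example = [
    make_item("item_1", True),
    make_item("item_2", False),  # screening stops here
    make_item("item_3", True),   # never administered
]
outcome = screen(example)
```

In this sketch `outcome` is a fail on the second item and the third item is never run, which mirrors why a binary, stop-on-failure screen produces a pass/fail result rather than a graded score.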

Reliability

Internal consistency:
No studies have reported on the internal consistency of the TOR-BSST©.

Test-retest:
No studies have reported on the test-retest reliability of the TOR-BSST©.

Intra-rater:
No studies have reported on the intra-rater reliability of the TOR-BSST©.

Inter-rater:
Martino et al. (2009) established inter-rater reliability of the TOR-BSST© in the first 50 patients with stroke enrolled, using the intraclass correlation coefficient (ICC) and 95% confidence intervals (CI). Results indicated excellent inter-rater reliability (ICC=0.92; CI, 0.85 to 0.96).

Martino et al. (2006) examined 24-hour inter-rater reliability of the TOR-BSST© item and total screen scores in a sample of 286 patients with stroke (acute, n=78; subacute/chronic, n=208), using kappa statistics. Results indicated moderate reliability for the total score, with higher reliability early after training (k=0.90). Item reliability ranged from poor to adequate; the item ‘water swallowing’ (including both the 50 mL and sip components) achieved the highest item reliability (k=0.82; CI, 0.66 to 0.98).
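For readers unfamiliar with the kappa statistic used in this study, the following Python sketch shows how Cohen's kappa corrects the observed agreement between two raters for agreement expected by chance. The `cohens_kappa` function and the pass/fail ratings are made up for illustration; they are not data from Martino et al. (2006), and the study may have used a different kappa variant.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters scoring the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of patients given the same rating.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of each
    # rater's marginal proportion for that category.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    p_chance = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    if p_chance == 1:
        return 1.0  # both raters used a single identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Made-up screening results for 10 patients (illustration only).
rater_a = ["pass", "pass", "fail", "fail", "pass",
           "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "pass", "fail", "pass", "pass",
           "fail", "pass", "pass", "fail", "pass"]
kappa = cohens_kappa(rater_a, rater_b)
```

With these toy ratings the raters agree on 9 of 10 patients (90%), but because 54% agreement would be expected by chance, kappa is approximately 0.78 rather than 0.90.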

Validity

Content:

Initial item generation for the TOR-BSST© resulted from systematic review of the accuracy and benefit of non-invasive bedside dysphagia screening tests with patients with stroke (see Martino, Pron & Diamant, 2000). Two measures were shown to be accurate predictors of dysphagia by videofluoroscopic assessment (VFS) of aspiration, and a further two were considered to show promising (although inconsistent) predictive ability:

Dysphonia/coughing during the 50mL Kidd water swallow test
Impaired pharyngeal sensation
Impaired tongue movement
General dysphonia – voice before or voice after water intake

The final measure, general dysphonia, was defined as two sub-items (voice before, voice after).

Item reduction was then performed, whereby positive results across the 5 items were compared with the total score. The item ‘water swallow’ contributed 25% to the total positive score, indicating that this item was the most frequent single item to identify dysphagia. The item ‘tongue movements’ contributed 8% to the total positive score. The remaining items contributed less than 5% each to the total positive score, and so were considered for elimination on review of practical application as determined by expert Speech-Language Pathologists. These expert clinicians considered the item ‘pharyngeal sensation’ to be impractical due to difficulty differentiating it from a gag reflex in the clinical setting.

Martino et al. (2014) conducted item descriptive analysis in the original sample of 311 patients with stroke from acute and rehabilitation settings. The TOR-BSST© was administered by trained nurses. Items were eliminated individually to evaluate the impact of each item on the total score. Results showed that the ‘water swallow’ item contributed most significantly to identification of dysphagia, identifying 42.7% of patients in the acute setting and 29.0% of patients in the rehabilitation setting.

Criterion:

Predictive:
Martino et al. (2009) examined predictive validity of the TOR-BSST© by comparison with gold standard VFS assessment identifying any abnormal swallow physiology, including all levels of severity. The randomized controlled diagnostic study design included four blinded Speech-Language Pathologists and 68 patients with stroke in acute and rehabilitation settings. Nine participants were eliminated because the TOR-BSST© and VFS assessments were performed more than 24 hours apart, as per a priori criteria for patient flow. VFS assessment was used to confirm findings obtained by TOR-BSST© screening; clinicians rated the VFS images using three standardized scales: (1) Penetration Aspiration Scale; (2) Mann Assessment of Swallowing Ability (MASA) dysphagia subscore; and (3) MASA aspiration subscore. Across the entire sample of acute and rehabilitation patients, results showed that 61% (n=36) of patients were confirmed by experts to have no dysphagia vs. 39% (n=23) with dysphagia. These results indicate high accuracy to predict dysphagia using the TOR-BSST©, where dysphagia is defined by aspiration and/or physiological abnormality on VFS.

Construct:

Convergent/Discriminant:
No studies have reported on the convergent/discriminant validity of the TOR-BSST©.

Known Group:
No studies have reported on the known-group validity of the TOR-BSST©.

Sensitivity & Specificity:

Martino et al. (2009) examined sensitivity of the TOR-BSST© by comparison with VFS assessment, in a sample of 68 patients with stroke in acute and rehabilitation settings. Nine patients were eliminated because the TOR-BSST© and VFS assessments were performed more than 24 hours apart. The TOR-BSST© showed 91.3% sensitivity (CI, 71.9 to 98.7) and 66.7% specificity (CI, 49.0 to 81.4) among all patients. Sensitivity and specificity were 96.3% and 63.6% (respectively) among patients in an acute setting, and 80.0% and 68.0% (respectively) among patients in rehabilitation settings. The TOR-BSST© showed high negative predictive value of 93.3% and 89.5% in participants in acute and rehabilitation stroke settings, respectively.
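As a rough check on how such figures arise, the Python sketch below computes sensitivity, specificity and negative predictive value from a 2×2 table. The counts are an assumption: they are back-calculated from the overall percentages and the 23 patients with and 36 patients without dysphagia on VFS reported in this review, and are not copied from the paper. The `diagnostic_accuracy` function is a hypothetical helper.

```python
from typing import Dict


def diagnostic_accuracy(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Return screening accuracy measures as percentages.

    tp/fn: patients with dysphagia on VFS who failed/passed the screen.
    tn/fp: patients without dysphagia on VFS who passed/failed the screen.
    """
    return {
        # Proportion of patients with dysphagia who fail the screen.
        "sensitivity": 100 * tp / (tp + fn),
        # Proportion of patients without dysphagia who pass the screen.
        "specificity": 100 * tn / (tn + fp),
        # Proportion of patients passing the screen who have no dysphagia.
        "npv": 100 * tn / (tn + fn),
    }


# ASSUMPTION: counts inferred from the reported percentages
# (21/23 = 91.3% sensitivity, 24/36 = 66.7% specificity).
result = diagnostic_accuracy(tp=21, fn=2, tn=24, fp=12)
for name, value in result.items():
    print(f"{name}: {value:.1f}%")
```

The overall negative predictive value produced by this sketch follows from the assumed counts only and is not a result reported by the study, which gives negative predictive values separately for acute and rehabilitation settings.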

Martino et al. (2014) conducted sensitivity analysis of the TOR-BSST© in the original sample of 311 patients with stroke from acute and rehabilitation settings. The TOR-BSST© was administered by trained nurses using the standard 10 teaspoons plus a sip of water. Positive screening occurred in 59.2% of patients in the acute setting (n=103) and 38.5% of patients in the rehabilitation setting (n=208).

Martino et al. (2014) further examined sensitivity of the TOR-BSST© when modifying administration according to water volume intake. Using the original sample from Martino et al. (2009), sensitivity was examined on administration of 1 to 10 teaspoons of water to determine the acceptable cut-point to identify dysphagia. Among all participants (n=311), sensitivity ranged from moderate to excellent for 5, 8 and 10 teaspoons of water (79%, 92% and 96%, respectively). Among patients in the acute and rehabilitation settings, sensitivities were 84% and 75% (respectively) for 5 teaspoons of water, 93% and 92% (respectively) for 8 teaspoons, and 95% and 97% (respectively) for 10 teaspoons. Results indicate greater accuracy on administration of 10 × 5 mL teaspoons of water, as per the original assessment guidelines.
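The cut-point reasoning can be illustrated with a short Python sketch that looks up the smallest water volume meeting a chosen sensitivity target, using the overall sensitivities reported above for 5, 8 and 10 teaspoons. The `smallest_volume_meeting` function and the target values are assumptions for illustration; the study does not state a numeric threshold in this review.

```python
from typing import Dict, Optional

# Overall sensitivities (%) reported above for the full sample,
# keyed by number of teaspoons of water administered.
sensitivity_by_teaspoons: Dict[int, float] = {5: 79, 8: 92, 10: 96}


def smallest_volume_meeting(target: float,
                            table: Dict[int, float]) -> Optional[int]:
    """Return the fewest teaspoons whose sensitivity is >= target, else None."""
    for teaspoons in sorted(table):
        if table[teaspoons] >= target:
            return teaspoons
    return None


# ASSUMPTION: 95% is an arbitrary illustrative target,
# not a threshold taken from the study.
chosen = smallest_volume_meeting(95, sensitivity_by_teaspoons)
```

Under this illustrative 95% target only the full 10 teaspoons qualifies, whereas a looser target of 90% would already be met at 8 teaspoons.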

References

Martino, R., Maki, E., & Diamant, N. (2014). Identification of dysphagia using the Toronto Bedside Swallowing Screening Test (TOR-BSST©): are 10 teaspoons of water necessary? International Journal of Speech-Language Pathology, 16(3), 193-8. https://www.ncbi.nlm.nih.gov/pubmed/24833425
Martino, R., Nicholson, G., Bayley, M., Teasell, R., Silver, F., & Diamant, N. (2006). Interrater reliability of the Toronto Bedside Swallowing Screening Test (TOR-BSST©) [Abstract]. Dysphagia, 21(4), 287-334. https://doi.org/10.1007/s00455-006-9044-5
Martino, R., Pron, G., & Diamant, N. (2000). Screening for oropharyngeal dysphagia in stroke: insufficient evidence for guidelines. Dysphagia, 15, 19-30. https://www.ncbi.nlm.nih.gov/pubmed/10594255
Martino, R., Silver, F., Teasell, R., Bayley, M., Nicholson, G., Streiner, D.L., & Diamant, N.E. (2009). The Toronto Bedside Swallowing Screening Test (TOR-BSST): Development and validation of a dysphagia screening tool for patients with stroke. Stroke, 40, 555-61. https://www.ncbi.nlm.nih.gov/pubmed/19074483

See the measure

Other measures of dysphagia:

Instrumental Assessments:

Videofluoroscopy swallowing study (gold standard)
Fiberoptic endoscopic examination of swallowing
Rosenbek’s Penetration Aspiration Scale

Clinical Bedside Assessments:

The Modified Mann Assessment of Swallowing Ability (Modified MASA)

Screening Tools:

Massey Bedside Swallowing Screen
Volume-Viscosity Swallowing Test (Clave et al., 2008)
The Gugging Swallowing Screen (GUSS) (Trapl et al., 2007)