Medical Outcomes Study Short Form 36 (SF-36)

Evidence Reviewed as of before: 19-08-2008

Author(s)*: Lisa Zeltzer, MSc OT

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc; Maxim Ben Yakov, BSc PT

In-Depth Review

Purpose of the measure

The Medical Outcomes Study 36-item Short-Form Health Survey (SF-36) is a widely used, generic, patient-report measure created to assess health-related quality of life (HRQOL) in the general population. It was developed as part of the Medical Outcomes Study, a two-year study of patients with chronic conditions (Ware & Sherbourne, 1992). Today, the SF-36 is the most commonly used generic instrument for measuring quality of life (de Haan, 2002). The SF-36 can be used with, but is not limited to, persons with stroke.

Available versions

The SF-36 was published in 1992 by Ware and Sherbourne and was further developed and validated in 1993 and 1994 (Ware & Sherbourne, 1992; McHorney, Ware & Raczek, 1993; McHorney, Ware, Lu & Sherbourne, 1994). In 1996, Version 2.0 of the SF-36 (SF-36v2) was introduced to correct deficiencies identified in the original version. Changes include a few wording alterations; for example, “downhearted and blue” in a question on mental health symptoms is now “downhearted and depressed”. SF-36v2 is now considered “the international version” of the SF-36 (Andresen & Meyers, 2000). The original SF-36 questions had variable numbers and formats of response categories; these have been increased and/or standardized across scales and questions. Role Functioning items now have five levels of response rather than two, which may increase the responsiveness of the scales. Early reports of tests of this new version have been positive (Jenkinson, Stewart-Brown, Petersen & Paice, 1999). Versions 1.0 and 2.0 of the SF-36 are available with two recall periods: the standard 4-week recall and the acute 1-week recall.

Features of the measure

Items:

Items of the SF-36 are divided into eight different domains:

Physical component:

Physical functioning (10 items)
Role limitations due to physical problems (4 items)
Bodily pain (2 items)
General health perceptions (5 items)

Mental component:

Social functioning (2 items)
General mental health (5 items)
Role limitations due to emotional problems (3 items)
Vitality (4 items)

Other

Health transition (1 question): The respondent is asked to rate their current health status compared to their health status one year ago. This question remains separate from the 8 subscales and is not scored.

There are 11 questions in the SF-36, comprising 36 items in total. With the exception of the health transition question, respondents are asked to answer with reference to the past 4 weeks. An acute version of the SF-36 refers to problems in the past week only (McDowell & Newell, 1996).
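As a quick arithmetic check of the item counts listed above, the sketch below tallies the eight domains (21 physical-component items plus 14 mental-component items) and adds the unscored health transition item to reach 36. The dictionary names and the physical/mental grouping simply mirror the list in this section; they are illustrative and not taken from any official scoring file.

```python
# Item counts per SF-36 domain, copied from the list above.
# Illustrative only: these names are hypothetical, not an official scoring file.
PHYSICAL_DOMAINS = {
    "Physical functioning": 10,
    "Role limitations due to physical problems": 4,
    "Bodily pain": 2,
    "General health perceptions": 5,
}
MENTAL_DOMAINS = {
    "Social functioning": 2,
    "General mental health": 5,
    "Role limitations due to emotional problems": 3,
    "Vitality": 4,
}
HEALTH_TRANSITION_ITEMS = 1  # asked, but not part of any of the 8 scored domains


def scored_items() -> int:
    """Number of items that contribute to the 8 domain scores."""
    return sum(PHYSICAL_DOMAINS.values()) + sum(MENTAL_DOMAINS.values())


def total_items() -> int:
    """Total SF-36 items: scored domain items plus the health transition item."""
    return scored_items() + HEALTH_TRANSITION_ITEMS


if __name__ == "__main__":
    print(sum(PHYSICAL_DOMAINS.values()))  # 21
    print(sum(MENTAL_DOMAINS.values()))    # 14
    print(total_items())                   # 36
```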

Scoring:

The SF-36 does not lend itself to the generation of an overall summary score, because information within the individual responses is lost in a total score (the same total can be reached through many different combinations of item responses) (Dorman et al., 1999). The recommended scoring system for the SF-36 is a weighted Likert system for each item. Items within each subscale are totaled to provide a summed score for that subscale or dimension. Each of the 8 summed scores is then linearly transformed onto a scale from 0 (negative health) to 100 (positive health) to provide a score for each subscale. A physical component score (PCS) and a mental component score (MCS) can be derived from the scale items; however, these summary scores should be interpreted with caution. Hobart et al. (2002) examined this two-dimensional model and found that the two summary scales accounted for only 60% of the variance in SF-36 scores, suggesting a significant loss of information when the two-dimensional model is used.
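The linear transformation described above can be sketched as (raw score minus lowest possible raw score) divided by the possible raw score range, multiplied by 100. The example below applies it to Physical Functioning under the assumption that its 10 items are each coded 1 ("limited a lot") to 3 ("not limited at all"), giving a raw range of 10 to 30. It assumes items have already been recoded so that higher values mean better health, and it does not implement item weighting, missing-data rules, or the PCS/MCS summary scores, which require published norms and coefficients.

```python
def transform_to_0_100(raw: float, lowest: float, highest: float) -> float:
    """Linearly map a summed raw subscale score onto 0 (worst) to 100 (best).

    Assumes items are already recoded so that higher raw values mean better health.
    """
    if highest <= lowest:
        raise ValueError("highest possible raw score must exceed lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score is outside the possible range")
    return (raw - lowest) * 100.0 / (highest - lowest)


def physical_functioning_score(item_responses: list[int]) -> float:
    """Hypothetical helper: 10 items, each assumed to be coded 1 to 3."""
    if len(item_responses) != 10:
        raise ValueError("expected 10 Physical Functioning items")
    if any(r not in (1, 2, 3) for r in item_responses):
        raise ValueError("each item response must be 1, 2 or 3")
    raw = sum(item_responses)  # possible raw range: 10 to 30
    return transform_to_0_100(raw, lowest=10, highest=30)


if __name__ == "__main__":
    responses = [3] * 5 + [2] * 3 + [1] * 2  # raw score 23
    print(physical_functioning_score(responses))  # 65.0
```

Because the transformation is linear, it preserves the ordering of raw scores; it only rescales each subscale so that the 8 scores can be read on a common 0 to 100 metric.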

Subscales:

The SF-36 has 8 subscales:

Physical Functioning,
Role Limitations due to Physical Problems,
Bodily Pain,
General Health Perceptions,
Vitality,
Social Functioning,
Role Limitations due to Emotional Problems,
General Mental Health.

Equipment:

Only the test and a pencil are required. Computer administered and telephone voice recognition interactive systems of administration of the SF-36 are currently being evaluated (SF-36 Health Survey Update: John E. Ware, Jr.).

Training:

No training is required for administration of the SF-36. The SF-36 is suitable for self-administration, computerized administration, or administration by a trained interviewer in person or by telephone, to persons age 14 and older (Ware & Sherbourne, 1992).

Time:

The SF-36 is considered simple to administer and takes an average of 10 minutes to complete (Andresen & Meyers, 2000). The SF-36 has been studied for use by a proxy; however, administration by proxy is not recommended for patients with stroke, as agreement has been found to be poor in this patient population (Segal & Schall, 1994; Dorman, Slattery, Farrell, & Dennis, 1998). Instead, a stroke-specific quality of life measure such as the Stroke Impact Scale, which has been evaluated successfully for use by proxy respondents, may be a more appropriate measure to administer by proxy. Another reliable measure of health status for stroke patients by proxy is the Health Utilities Index (HUI), which has been reported to have adequate to excellent agreement between patients with stroke and their proxies (Mathias, Bates, Pasta, Cisternas, Feeny & Patrick, 1997).

The SF-36 can also be completed as a mail survey. As a self-completed, mailed questionnaire, it has been shown to have reasonably high response rates (83%: Brazier et al., 1992; O’Mahoney, Rodgers, Thomson, Dobson, & James, 1998; 75% to 83%: Dorman et al., 1998; 85%: Dorman et al., 1999; 82% overall and 69% for those over age 85: Walters et al., 2001). Data are, however, typically more complete when the survey is administered by an interviewer. Low completion rates may not be limited to self-completion or postal administration: Andresen et al. (1999) administered the SF-36 to nursing home residents by face-to-face interview and reported that only 1 in 5 residents was able to complete it. Data completeness may indicate whether respondents accept and understand the survey as relevant to them (O’Mahoney et al., 1998; Andresen et al., 1999). Hayes et al. (1995) found that the items most commonly left missing on the self-completed questionnaire referred to work or to vigorous activity; older respondents regarded these questions as relevant to much younger people and not pertinent to their own situation. The authors suggested modifications to some of the questions that may increase acceptability to older populations.

Alternative forms of the SF-36

SF-12 (Ware, Kosinski, & Keller, 1996)

The SF-12 was developed as an abbreviated version of the SF-36 for use in large surveys of general and specific populations as well as large longitudinal studies of health outcomes. It can be self-administered, or administered via interview, telephone, or computer. The SF-12 takes 5 minutes or less to complete (Nemeth, 2006). The SF-12v2 was later developed to correspond to the SF-36v2 and has demonstrated the same improvements as observed with the SF-36v2 (Ware, Kosinski, Turner-Bowker & Gandek, 2002). Versions 1.0 and 2.0 of the SF-12 are available with two recall periods: the standard 4-week recall, and the acute 1-week recall period.

SF-8 (QualityMetric, Incorporated)

The SF-8, a new generic eight-item assessment, generates a health profile consisting of eight scales and two summary measures describing HRQOL. The SF-8 uses one question to measure each of the eight SF-36 domains. The development, validation and norming of the new SF-8, including standard (4-week recall), acute (1-week recall), and 24-hour recall versions, is documented in the SF-8 manual, “How to Score and Interpret Single-Item Health Status Measures: A Manual for Users of the SF-8 Health Survey” (Ware, Kosinski, Dewey & Gandek, 2001). The SF-8 Health Survey can be self-administered, computer-administered, or given by a trained interviewer in person or by telephone to persons aged 14 and older. It takes approximately 1-2 minutes to complete and has been translated and validated for use in more than 30 countries (a list of these countries was available online; accessed July 12, 2006).

SF-6D (Brazier, Usherwood, Harper, & Thomas, 1998; Brazier, Roberts, & Deverill, 2002)

The SF-6D is a preference-based scoring system that uses six dimensions derived from the SF-36 to allow utilities to be calculated from SF-36 and SF-36v2 responses. The eight SF-36 dimensions were reduced to six by omitting General Health Perceptions and combining Role Limitations-Physical and Role Limitations-Emotional. Good reliability and validity have been reported for the SF-6D (Petrou & Hockley, 2005; Brazier, Roberts, Tsuchiya & Busschbach, 2004).
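The reduction from eight SF-36 dimensions to six SF-6D dimensions can be expressed as a simple mapping. The sketch below only reproduces the structural change described above (drop General Health Perceptions, merge the two role-limitation dimensions); it does not compute SF-6D health states or utilities, which depend on published preference weights not given here. The dimension labels and the combined label "Role Limitations" are illustrative.

```python
SF36_DOMAINS = [
    "Physical Functioning",
    "Role Limitations-Physical",
    "Bodily Pain",
    "General Health Perceptions",
    "Vitality",
    "Social Functioning",
    "Role Limitations-Emotional",
    "General Mental Health",
]

ROLE_DOMAINS = ("Role Limitations-Physical", "Role Limitations-Emotional")


def sf6d_dimensions(sf36_domains: list[str]) -> list[str]:
    """Derive the six SF-6D dimensions from the eight SF-36 domains (structure only)."""
    dims: list[str] = []
    for domain in sf36_domains:
        if domain == "General Health Perceptions":
            continue  # omitted in the SF-6D
        if domain in ROLE_DOMAINS:
            if "Role Limitations" not in dims:
                dims.append("Role Limitations")  # the two role domains are combined
            continue
        dims.append(domain)
    return dims


if __name__ == "__main__":
    for dim in sf6d_dimensions(SF36_DOMAINS):
        print(dim)
```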

For a fee, all versions of the SF Health Survey can be scored online via QualityMetric’s website (accessed July 12, 2006).

Client suitability

Can be used with:

Individuals with stroke.

The SF-36 is the most widely used measure of HRQOL in patients with stroke; however, its suitability in this patient population has been contentious:

Hobart, Williams, Moran, and Thompson (2002) reported that, in their sample of 177 post-stroke patients, five of the eight SF-36 subscales had limited validity as outcome measures, and that the reporting of physical and mental summary scores was not supported. The authors questioned the use of the SF-36 in patients with stroke.
de Haan (2002) reported that when the results of the relatively small study of Hobart et al. (2002) were taken in conjunction with the findings of previous research, there was insufficient evidence to question the reliability and validity of the SF-36 subscales in stroke.

Should not be used in:

Patients who cannot understand written or spoken language. Make sure the patient is fluent in the language used in the survey.
More severely affected stroke survivors who need a proxy to complete the survey (Dorman et al., 1998). Instead, a stroke-specific quality of life measure such as the Stroke Impact Scale, which has been evaluated successfully for use by proxy respondents, may be a more appropriate measure to administer by proxy. Another, more reliable measure of health status for stroke patients by proxy is the Health Utilities Index (HUI), which has been reported to have moderate to high inter-rater agreement between stroke patients and proxies (Mathias et al., 1997).
Patients with aphasia. For patients with aphasia, a quality of life measure developed specifically for stroke survivors with aphasia, such as the Stroke and Aphasia Quality of Life Scale (SAQOL-39), should be used (Hilari, Byng, Lamping, & Smith, 2003).
The SF-36 should not be used to document individual patient change. Dorman, Slattery, Farrell, Dennis, and Sandercock (1998) found that although the SF-36 can function effectively as a discriminative measure for assessing health-related quality-of-life outcomes in groups of patients after stroke, it may not be adequate for serial assessments of individual patients unless large differences over time are expected. Thus, the SF-36 should be used for large group comparisons only.
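Why individual change needs to be large can be illustrated with a generic classical test theory calculation; this is not the method used by Dorman et al. The standard error of measurement is SEM = SD × √(1 − r), and the smallest individual change exceeding measurement error with about 95% confidence is MDC95 = 1.96 × √2 × SEM. The SD of 20 points and test-retest reliability of 0.80 below are hypothetical values, not SF-36 results. The group-level figure assumes independent measurement errors across patients and is only a rough illustration of why averaging over many patients makes smaller changes detectable.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), from classical test theory."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def mdc95_individual(sd: float, reliability: float) -> float:
    """Minimal detectable change for one person at ~95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, reliability)


def mdc95_group_mean(sd: float, reliability: float, n: int) -> float:
    """Rough analogue for a mean change over n patients, assuming independent errors."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return mdc95_individual(sd, reliability) / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical inputs: subscale SD of 20 points (0-100 metric), test-retest r = 0.80
    print(round(mdc95_individual(20, 0.80), 1))       # about 24.8 points
    print(round(mdc95_group_mean(20, 0.80, 100), 1))  # about 2.5 points
```

Under these assumed values, one patient's score would need to change by roughly a quarter of the 0 to 100 range before the change could be distinguished from measurement error, whereas a group mean change of a few points could be detected.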

In what languages is the measure available?

The SF-36 is available in a number of languages. In 1991, the International Quality of Life Assessment launched a project aimed at translating, validating and norming the SF-36 health survey. The project, which is based at the Health Assessment Lab in Boston, has sponsored investigators from 14 countries: Australia, Belgium, Canada, Denmark, France, Germany, Italy, Japan, The Netherlands, Norway, Spain, Sweden, the United Kingdom (English version), and the United States (English and Spanish versions). In addition, the SF-36 has been translated for use in more than 40 other countries, including: Argentina, Armenia, Austria, Bangladesh, Brazil, Bulgaria, Cambodia, Chile, China, Colombia, Costa Rica, Croatia, Czech Republic, Finland, Greece, Guatemala, Honduras, Hong Kong, Hungary, Iceland, Israel, Korea, Latvia, Lithuania, Mexico, New Zealand, Peru, Poland, Portugal, Romania, Russia, Singapore, Slovak Republic, South Africa, Switzerland, Taiwan, Tanzania, Turkey, the United Kingdom (Welsh), the United States (Chinese, Japanese, Vietnamese), Uruguay, Venezuela, and Yugoslavia. There are more than 500 publications that use translations or English-language adaptations of the SF-36. For information about the availability of SF-36 translations, visit https://www.qualitymetric.com/health-surveys-old/the-sf-36v2-health-survey/.

Summary

What does the tool measure?	Health related quality of life
What types of clients can the tool be used for?	The SF-36 is a generic measure that can be used with, but is not limited to, persons with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	The SF-36 is considered simple to administer and takes an average of 10 minutes to complete.
Versions	SF-12; SF-8; SF-6D
Other Languages	The SF-36 is available in a number of languages. There are more than 500 publications that use translations or English-language adaptations of the SF-36. For information about the availability of SF-36 translations, visit www.sf-36.org
Measurement Properties
Reliability	Internal consistency: Out of 10 studies examining the internal consistency of the SF-36, five reported excellent internal consistency (except for the subscales of Social Functioning in three studies and General Health in one study, which were considered adequate). Two studies reported adequate to excellent internal consistency. Three studies reported poor to excellent internal consistency. Test-retest: Out of the five studies examining test-retest reliability of the SF-36, three reported adequate to excellent test-retest reliability. One study reported adequate test-retest reliability. One reported poor to excellent test-retest reliability. Inter-rater: No studies have examined the inter-rater reliability of the SF-36.
Validity	Criterion: Predictive: Subscales of the SF-36 have been found to be predictive of death, hospitalizations, physician visits, and the burden of depression among depressed elderly persons. Construct: Convergent: Adequate correlations were found between the SF-36 Physical Health subscale and the Activities of Daily Living Index; the SF-36 Social Functioning subscale and social isolation on the Nottingham Health Profile; the General Health subscale and the EuroQol overall HRQOL rating; the SF-36 Bodily Pain subscale and all EuroQol domains; and the Role Functioning-Emotional subscale and the EuroQol psychological domain. Excellent correlations were found between the Physical Health scores from the SF-36 and the Geriatric Depression Scale; the Vitality subscale on the SF-36 and the energy subscale on the Nottingham Health Profile; and the Bodily Pain subscale on the SF-36 and the EuroQol pain domain. Known groups: SF-36 scores discriminated between patients diagnosed with one or more chronic physical problems and healthy age-matched controls; individuals older and younger than 75; groups based on setting (general practice versus hospital outpatients); migraine sufferers and controls; groups based on recent visits to their family doctor, hospital inpatient stays and longstanding illness; and patients with stroke and their age- and gender-matched controls.
Floor/Ceiling Effects	Of the 8 studies examined, 6 reported that the SF-36 had significant floor and ceiling effects, 1 reported significant ceiling effects only, and 1 reported significant floor effects only.
Does the tool detect change in patients?	Out of 3 studies examined, 1 reported that the SF-36 had a large ability to detect change; 1 reported a moderate to large ability to detect change (except for the Social Functioning and Mental Health dimensions, which both had small effect sizes); and 1 reported a small (Role Limitations-Emotional, Mental component summary score) to large (Bodily Pain, Physical component summary score) ability to detect change. To our knowledge, no studies have examined the ability of the SF-36 to detect change in patients with stroke.
Acceptability	The SF-36 cannot be used with patients who cannot understand written or spoken language, severely affected patients who need a proxy to complete it, or patients with aphasia.
Feasibility	The SF-36 is simple to administer and requires no training or special equipment. It is suitable for self-administration, computerized administration, or administration by a trained interviewer in person or by telephone, to persons age 14 and older.
How to obtain the tool?	All versions of the SF-36 can be viewed by visiting the website: www.qualitymetric.com
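The summary above grades the SF-36's ability to detect change as small, moderate or large. The exact formula each cited study used is not given here; a common definition, assumed in this sketch, divides the mean change by the baseline standard deviation and applies Cohen's conventional benchmarks of 0.2, 0.5 and 0.8. The function names and scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the baseline standard deviation (assumed definition)."""
    if len(baseline) < 2 or len(baseline) != len(follow_up):
        raise ValueError("need at least two paired scores")
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def label_effect(es):
    """Classify |es| with Cohen's conventional 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "trivial"


# Hypothetical 0-100 subscale scores for five patients before and after treatment.
baseline = [40, 45, 50, 55, 60]
follow_up = [50, 55, 60, 65, 70]
es = effect_size(baseline, follow_up)
print(round(es, 2), label_effect(es))  # 1.26 large
```

Other studies report a standardized response mean instead (mean change divided by the standard deviation of the change scores), which can give a different grade for the same data.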

Psychometric Properties

Overview

Extensive psychometric testing has been conducted on the SF-36; however, little research has examined its properties specifically in a post-stroke population. For the purposes of this review, we conducted a literature search to identify all relevant publications on the psychometric properties of the SF-36. We then chose to review articles from high-impact journals and from a variety of authors. Although the creators of the SF-36 performed many of the existing psychometric studies of the survey, we preferentially reviewed studies carried out by authors who were not involved in its development.

Floor and Ceiling Effects

Lai, Perera, Duncan, and Bode (2003) administered the Stroke Impact Scale and the SF-36 to 278 stroke subjects approximately 90 days after stroke. In comparison to the Stroke Impact Scale-16 (which characterizes physical functioning), the SF-36 Physical Functioning subscale had major floor effects (floor effects of 37% and 100% were observed for patients with a modified Rankin Scale grade of 4 or 5, respectively). Further, in contrast to the Stroke Impact Scale-Participation (which characterizes social functioning), the SF-36 Social Functioning subscale had major ceiling effects (up to 60% for patients with a modified Rankin Scale grade of 0).

Anderson et al. (1996) examined the SF-36 in a cohort of 90 long-term (1-year) stroke survivors. The validity of the SF-36 was assessed by comparing patients’ scores on the SF-36 with those obtained on the Barthel Index, the 28-item General Health Questionnaire, and the Adelaide Activities Profile. Large ceiling effects were reported for the SF-36 Role Limitations-Physical (53%), Bodily Pain (43%), Social Functioning (67%) and Role Limitations-Emotional (72%) subscales. No floor effect on the SF-36 exceeded 7%, and scores on the SF-36 Physical Functioning subscale were more uniformly distributed than Barthel Index scores, suggesting that the SF-36 has lower floor and ceiling effects than the Barthel Index.

Brazier et al. (1996) tested the psychometric properties of the SF-36 and the EuroQol on an elderly female population (n=380) aged 75 and older, and compared these scales to the Office of Population Census and Surveys Disability Survey. Patients were administered the scales at baseline and again six months later. Major floor effects (in excess of 25%) were reported for the Role Limitations-Physical and Role Limitations-Emotional subscales.

Hobart et al. (2002) examined SF-36 data from 177 people after stroke. Notable floor effects were observed for the Role Limitations-Physical (59.1%), Role Limitations-Emotional (63.1%), Social Functioning (29.9%), and Bodily Pain (25.6%) subscales. Notable ceiling effects were also observed for the Role Limitations-Emotional (63.1%), Social Functioning (29.9%) and Bodily Pain (25.6%) subscales.

O’Mahoney et al. (1998) examined the suitability of the SF-36 for assessing quality of life in older patients with stroke. Floor effects were high for the Role Limitations-Physical (54%), Role Limitations-Emotional (35%), Social Functioning (17%) and Physical Functioning (18%) subscales. Ceiling effects were also substantial for the Role Limitations-Physical (16%), Role Limitations-Emotional (51%), Social Functioning (18%) and Bodily Pain (25%) subscales.

Weinberger, Oddone, Samsa and Landsman (1996) administered the SF-36 three times over a 4-week period to 172 veterans receiving care in a General Medicine Clinic, comparing telephone, face-to-face, and self-administration modes. For face-to-face administration, notable floor effects were observed for the Role Limitations-Physical (43.8%) and Role Limitations-Emotional (30.3%) subscales, and notable ceiling effects for the Social Functioning (31.5%), Role Limitations-Physical (14.6%), and Role Limitations-Emotional (47.2%) subscales. For telephone administration, significant floor effects were observed for the Role Limitations-Physical (53.2%) and Role Limitations-Emotional (34.0%) subscales, and significant ceiling effects for the Role Limitations-Emotional (36.2%) subscale only. Self-administration resulted in significant floor effects for the Role Limitations-Physical (47.1%) and Role Limitations-Emotional (25.0%) subscales, and notable ceiling effects for the Social Functioning (27.8%), Role Limitations-Physical (14.7%), and Role Limitations-Emotional (52.8%) subscales.

Walters, Munro and Brazier (2001) administered the SF-36 to a community-dwelling population over the age of 65. Substantial floor and ceiling effects were observed across all age groupings (65-69, 70-74, 75-79, 80-84, and 85+) for the Role Functioning-Physical (floor effects: 30.9%-60%; ceiling effects: 11.7%-38.6%) and Role Functioning-Emotional (floor effects: 25.6%-50.4%; ceiling effects: 32.2%-53.2%) subscales. Substantial ceiling effects were also noted for the Social Functioning (15%-46.7%) and Bodily Pain (14.1%-21.1%) subscales.

Andresen, Gwendell, Gravitt, Aydelotte, and Podgorski (1999) administered the SF-36 to 97 nursing home residents and reported substantial floor effects of 26.8% and 29.5% for the Physical Functioning and Role Limitations-Physical subscales, respectively. Substantial ceiling effects of 36.1%, 49.5% and 21.6% were reported for the Social Functioning, Role Limitations-Emotional, and Bodily Pain subscales, respectively.
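Each study above reports floor and ceiling effects as the percentage of respondents who obtain the lowest or highest possible score on a subscale. A minimal sketch of that calculation follows, assuming subscale scores already transformed to the usual 0-100 SF-36 metric. The function name and example scores are hypothetical, and because the cut-off for calling an effect "substantial" differs between the studies (for example, 25% in Brazier et al., 1996), no threshold is built in.

```python
def floor_ceiling(scores, minimum=0, maximum=100):
    """Return (% of respondents at the floor, % at the ceiling) for one subscale."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    at_floor = sum(1 for s in scores if s == minimum)
    at_ceiling = sum(1 for s in scores if s == maximum)
    return 100 * at_floor / n, 100 * at_ceiling / n


# Hypothetical Role Limitations-Physical scores (0-100 metric) for ten respondents.
role_physical = [0, 0, 0, 25, 50, 100, 100, 0, 75, 100]
floor_pct, ceiling_pct = floor_ceiling(role_physical)
print(f"floor {floor_pct:.0f}%, ceiling {ceiling_pct:.0f}%")  # floor 40%, ceiling 30%
```

Subscales with few, coarse response levels (such as the original two-level Role Limitations items) can only take a handful of values, which is one reason they pile up at the extremes.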

Reliability

Studies have demonstrated excellent internal consistency for the SF-36, with Cronbach’s alpha generally exceeding 0.80 for all scales except Social Functioning. Alpha for Social Functioning may be lower because this subscale has only 2 items (Ware, Snow, Kosinski & Gandek, 1993; Brazier et al., 1992; Lyons, Perry, & Littlepage, 1994; McHorney, Ware, Lu, & Sherbourne, 1994; Ruta, Garratt, Wardlaw, & Russell, 1994). Test-retest reliability evaluations have also suggested that SF-36 scores can generally be reproduced (Brazier et al., 1992; Beaton, Hogg-Johnson, & Bombardier, 1997).
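Cronbach's alpha, the internal consistency statistic reported throughout this section, compares the sum of the item variances with the variance of the subscale total: alpha = k/(k-1) × (1 - sum of item variances / variance of the total), where k is the number of items. A minimal sketch of that standard formula is shown below; the function name is hypothetical, and the input layout (one list of scores per item, with respondents in the same order) is an assumption.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha from a list of items, each a list of respondent scores."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")
    # Subscale total for each respondent.
    totals = [sum(item[i] for item in items) for i in range(n)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have no variance")
    item_var_sum = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 3-item subscale answered by five respondents (1-5 response scale).
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(items), 2))  # 0.95
```

Population variances are used for both the items and the total; because the same convention appears in numerator and denominator, using sample variances throughout would give the same alpha.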

Brazier et al. (1992) found considerable evidence for the reliability of the SF-36. Cronbach’s alpha for the internal consistency of the SF-36 was excellent, exceeding 0.85, and reliability coefficients exceeded 0.75 for all dimensions of the scale except the Social Functioning subscale (alpha = 0.73). To examine test-retest reliability, Brazier et al. (1992) calculated correlation coefficients, which ranged from adequate (0.60 for Social Functioning) to excellent (0.81 for Physical Functioning).

Jenkinson, Coulter and Wright (1993) mailed the SF-36 to a large community sample to explore the questionnaire’s internal consistency and validity. Cronbach’s alpha values for all subscales of the SF-36 were excellent, exceeding 0.80, with the exception of the Social Functioning subscale, which had adequate internal consistency (alpha = 0.76). For the Social Functioning dimension, this result was considered acceptable given its small number of items (2 items, each using a 5-point scale).

Jenkinson, Wright and Coulter (1994) mailed the SF-36 to 13,042 randomly selected subjects aged 16-64 years. The internal consistency of the SF-36 was found to range from adequate to excellent (alpha ranged from 0.76 for Social Functioning to 0.90 for Physical Functioning). Internal consistency was then calculated separately for five subgroups of overall self-rated general health (poor, fair, good, very good, excellent). All alpha values were adequate, exceeding 0.70, except for the Social Functioning subscale, which was poor (although alpha still exceeded 0.50). Given the small number of items in this domain, this result is considered acceptable.
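The recurring explanation that Social Functioning's lower alpha reflects its small number of items can be made concrete with the standardized form of alpha, alpha = k·r / (1 + (k - 1)·r), where k is the number of items and r is the mean inter-item correlation. The sketch below assumes, purely for illustration, the same hypothetical r = 0.4 for a 2-item subscale (like Social Functioning) and a 10-item subscale (like Physical Functioning); it is not based on any SF-36 data, and the function name is hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    if k < 2:
        raise ValueError("need at least two items")
    return k * mean_r / (1 + (k - 1) * mean_r)


# Same hypothetical mean inter-item correlation, different subscale lengths.
for n_items in (2, 10):
    print(n_items, round(standardized_alpha(n_items, 0.4), 2))
# 2 0.57
# 10 0.87
```

With equally inter-correlated items, the shorter subscale alone drops alpha from the excellent to the poor range, which is why a modest alpha for a 2-item subscale is often judged acceptable.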

Brazier et al. (1996) calculated the reliability of the SF-36 in 380 women over the age of 75. For participants who reported that their health had not changed between the initial assessment and the first follow-up, Spearman’s rank correlation coefficients between the two sets of scores ranged from poor (r = 0.28 for Social Functioning) to adequate (r = 0.70 for Vitality) over a retest period of 6 months. These results suggest that the SF-36 has at most adequate test-retest reliability in the elderly. Brazier et al. (1996) also examined the internal consistency of the SF-36 and reported excellent internal consistency (alpha ≥ 0.80) for all subscales except Social Functioning (alpha = 0.56) and General Health (alpha = 0.66), which showed poor internal consistency.
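Spearman’s rank correlation, used by Brazier et al. for test-retest reliability, is the ordinary Pearson correlation computed on the ranks of the scores rather than on the raw scores. The sketch below assumes tied scores receive the average of the ranks they span (common, since many SF-36 subscale scores share values); the function names and the baseline/follow-up scores are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks.

    Undefined (division by zero) if either list has all-identical values.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical 0-100 subscale scores for six people reporting no change.
baseline = [50, 75, 100, 25, 50, 75]
follow_up = [62.5, 75, 87.5, 37.5, 50, 100]
print(round(spearman_rho(baseline, follow_up), 2))
```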

Andresen et al. (1999) administered the SF-36 to 97 nursing home residents and then re-administered it after 1 week. Test-retest intraclass correlation coefficients (ICCs) ranged from adequate to excellent (0.55 to 0.82). Further, the ICCs for both the physical summary and mental summary scores were excellent (ICC = 0.82 and 0.79, respectively).
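Unlike a rank correlation, an intraclass correlation coefficient also penalizes systematic shifts between occasions. This section does not say which ICC form Andresen et al. or Dorman et al. used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form often labelled ICC(2,1), computed from ANOVA mean squares; the function name and data are hypothetical.

```python
def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1) for an n-subjects x k-occasions matrix (assumed form).

    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    MSR: between-subjects mean square, MSC: between-occasions mean square,
    MSE: residual mean square.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical test and 1-week retest scores for three residents.
print(round(icc_2_1([[1, 2], [3, 3], [5, 6]]), 3))
```

Identical scores on both occasions give an ICC of 1; any disagreement, whether random or a uniform shift at retest, lowers it.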

Essink-Bot, Krabbe, Bonsel, and Aaronson (1997) administered the SF-36, the Nottingham Health Profile, the COOP/WONCA charts (The Dartmouth Primary Care Cooperative Information Project/World Organization of National Colleges, Academies, and Academic Associations of General Practices/Family Physicians), and the EuroQol to migraine sufferers. The scales of the SF-36 yielded internal consistency estimates ranging from adequate (alpha = 0.76 for General Health) to excellent (alpha = 0.91 for Physical Functioning). The mean alpha coefficient was considered excellent (alpha = 0.84). The internal consistency of the SF-36 subscales exceeded that of the Nottingham Health Profile scales.

Walters, Munro and Brazier (2001) reported excellent internal consistency (Cronbach’s alpha ≥ 0.80) for all subscales of the SF-36 except for the Social Functioning subscale (alpha = 0.79) when the survey was administered by mail to a sample of 9,897 subjects aged 65-104 years.

McHorney, Ware and Sherbourne (1994) evaluated data from 3,445 patients in the Medical Outcomes Study (MOS) and replicated the analyses across 24 subgroups differing in socio-demographic characteristics, diagnosis, and disease severity. Across patient groups, the scales passed tests of item-internal consistency (97% of tests passed). Reliability coefficients ranged from a low of 0.65 to a high of 0.94 across scales (median = 0.85) and varied somewhat across patient subgroups.

Weinberger et al. (1996) tested whether SF-36 results are influenced by the method of administration (face-to-face interview, self-administration, and telephone interview) in 172 veterans receiving care at a General Medical Clinic. All patients were asked to complete the SF-36 three times over a 4-week period. Cronbach’s alpha coefficients indicated that items in all eight SF-36 domains were highly internally consistent regardless of the mode of administration; however, scores showed large variation over short intervals. Specifically, of 24 computed Cronbach’s alphas (i.e., eight scales times three modes of administration), only one was below 0.70 (Social Function via telephone administration), whereas 17 exceeded 0.80. Cronbach’s alphas did not differ significantly by method of administration. Test-retest correlations ranged from r = 0.55 (Physical Role Function by telephone administration) to r = 0.94 (Physical Function by self-administration).

Hagen, Bugge, and Alexander (2003) examined the reliability of the SF-36 in patients in the early post-stroke period. The SF-36 was administered at 1, 3 and 6 months after stroke onset. The internal consistency of the eight subscales was good at all three time points, except for Vitality at 1 month (alpha = 0.68) and General Health at 3 months (alpha = 0.67), which were considered poor.

Dorman et al. (1998) assessed the test-retest reliability and the internal consistency of the SF-36 in 2,253 patients with stroke. ICCs ranged from poor (0.28 for Mental Health) to excellent (0.80 for Social Functioning). Internal consistency of the SF-36 was excellent (ranging from 0.81 for Social Functioning to 0.96 for Emotional Role Functioning). Dorman et al. concluded that although the SF-36 can function effectively as a discriminative measure for assessing health-related quality-of-life outcomes in groups of patients after stroke, the level of test-retest reliability reported in stroke populations indicates that the SF-36 may not be adequate for serial assessments of individual patients unless large differences over time are expected. Thus, the SF-36 should be used for large group comparisons only.

Furthermore, test-retest reliability was negatively affected by the use of proxy respondents in this study. While the use of a proxy may be the only means of including data from more severely affected stroke survivors, the subjective nature of the SF-36 may make proxy use difficult or even inadvisable.
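One way to see why modest test-retest coefficients limit serial assessment of individual patients is through the standard error of measurement (SEM) and the minimal detectable change (MDC) derived from it. These are standard formulas but are not reported by Dorman et al. in this section; only the two ICCs (0.28 for Mental Health, 0.80 for Social Functioning) come from the text, and the standard deviation is a hypothetical value assumed for illustration.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def mdc95(sd: float, reliability: float) -> float:
    """Minimal detectable change at ~95% confidence for one person measured
    twice: 1.96 * sqrt(2) * SEM (sqrt(2) because both scores carry error)."""
    return 1.96 * math.sqrt(2) * sem(sd, reliability)


HYPOTHETICAL_SD = 20.0  # assumed between-patient SD on a 0-100 subscale
for label, icc in [("Mental Health", 0.28), ("Social Functioning", 0.80)]:
    print(f"{label}: ICC = {icc:.2f}, MDC95 = {mdc95(HYPOTHETICAL_SD, icc):.1f} points")
```

Under these assumptions, an individual’s score on a low-reliability subscale would have to change by a very large amount before the change could be distinguished from measurement error, whereas group means average much of that error away.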

Hobart, Williams, Moran and Thompson (2002) argue that the SF-36 has limited reliability, as the General Health Perceptions and Social Functioning scales generate low reliability scores and have limited convergent and discriminant validity. However, de Haan (2002) argues that Hobart et al.’s conclusions can be challenged: the reliability of only one scale (General Health Perceptions) fell marginally below (Cronbach’s alpha = 0.68) the authors’ predefined criterion of alpha = 0.70. Although it is often recommended that coefficients be above 0.80, de Haan points out that coefficients above 0.70 are generally regarded as acceptable for scales when outcomes are assessed at the group level.

Anderson, Laubscher and Burns (1996) administered the Australian version of the SF-36 to 90 individuals at one year post-stroke. The authors concluded that the SF-36 has satisfactory internal consistency; however, alphas ranged from 0.60 for the Vitality scale (poor internal consistency) to 0.90 for Physical Functioning, Bodily Pain and Role Limitations-Emotional (excellent internal consistency). The Cronbach’s alphas of four SF-36 subscales fell below 0.80 (General Health, Vitality, Social Functioning and Mental Health).

Validity

Criterion:

Predictive:
McHorney (1996) examined data from the Medical Outcomes Study. The General Health Perceptions subscale was the most predictive of death, followed by Physical Functioning: the death rate of patients scoring in the lowest quartile of the SF-36 General Health scale was three times higher than that of patients scoring in the highest quartile. Baseline Physical Functioning, Role Limitations-Physical, and Bodily Pain subscales were most predictive of hospitalizations. Moreover, the Bodily Pain, General Health and Vitality subscales were most predictive of physician visits.

Beusterien, Steinwald, & Ware (1996) found that the SF-36 Mental Health subscale and mental component summary measure were strongly associated with severity of depression in cross-sectional analyses. These results suggest that the SF-36 is useful for estimating the burden of depression among depressed elderly persons.

Rumsfeld et al. (1999) tested whether the physical and mental component summary scores from the preoperative SF-36 predicted mortality in 3,956 patients following coronary artery bypass graft (CABG) surgery. The preoperative physical component summary score was a statistically significant risk factor for 6-month mortality following CABG surgery: in multivariate analysis, a 10-point lower physical component summary score was associated with an odds ratio (OR) of 1.39 for mortality. The mental component summary score was not associated with 6-month mortality in multivariate analyses (OR = 1.09). Thus, patients’ preoperative self-report on the physical component of the SF-36 may be helpful for risk stratification and clinical decision-making in patients undergoing CABG surgery.
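Because the odds ratio is reported per 10-point difference, it can be rescaled to other differences only under the assumption, standard for logistic regression, that the log-odds of death change linearly with the physical component summary score. The sketch below makes that assumption explicit; only the value 1.39 comes from Rumsfeld et al. (1999), and the function name is hypothetical.

```python
import math


def rescale_odds_ratio(or_per_unit: float, unit: float, difference: float) -> float:
    """Rescale an odds ratio reported per `unit` points to `difference` points.

    Assumes log-odds are linear in the score (logistic model), so log(OR)
    scales proportionally: OR_d = exp(log(OR_unit) * d / unit).
    """
    return math.exp(math.log(or_per_unit) * difference / unit)


OR_PER_10_LOWER = 1.39  # reported OR per 10-point lower PCS score
per_point = rescale_odds_ratio(OR_PER_10_LOWER, 10, 1)
per_20 = rescale_odds_ratio(OR_PER_10_LOWER, 10, 20)
print(f"per 1 point lower: {per_point:.3f}; per 20 points lower: {per_20:.3f}")
```

Odds ratios compound multiplicatively, so under this assumption a 20-point lower score corresponds to 1.39² ≈ 1.93, not 2 × 1.39.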

Construct:

Walters et al. (2001) reported significant relationships in the expected directions, supporting construct validity among older adults. Scores on all scales decreased as age increased. Women reported worse health than men on all scales, even after adjusting for age. Respondents who had recently visited their physician reported poorer health on all scales, and people living alone had lower scores on all scales except General Health.

Ware, Kosinski, and Keller (1994) examined the construct validity of the eight subscales of the SF-36. Physical Functioning was shown to be the best overall measure of physical health (r = 0.85), and Mental Health was the most valid measure of mental health (r = 0.87). Interestingly, Mental Health was one of the poorest measures of the physical component (r = 0.17), and Physical Functioning was the poorest measure of the mental component (r = 0.12). The Vitality (r = 0.47 for the physical health component and r = 0.65 for the mental health component) and General Health (r = 0.69 for the physical health component and r = 0.37 for the mental health component) subscales had adequate to excellent validity for both components.

Construct (in patients with stroke):

Wilkinson et al. (1997) interviewed 106 people younger than 75 years, and their caregivers, following a first-ever stroke. Rank correlation coefficients between the Barthel Index and the SF-36 subscales ranged from poor (r = 0.22 for the Role Limitation-Emotional subscale) to excellent (r = 0.81 for the Physical Functioning subscale).

Convergent/Discriminant:
Convergent validity of the SF-36 is generally strongly supported in comparison with similar domains of condition-specific measures (Fielder, Denholm, Lyons, & Fielder, 1996; Nortvedt, Riise, Myhr, & Nyland, 1999; The Counseling Versus Antidepressants in Primary Care Study Group, 1999; Benninger, Ahuja, Gardner, & Grywalski, 1998; Buchwald et al., 1996; Anderson, Laubscher, & Burns, 1996) and other generic HRQOL measures (Andresen et al., 1999; Andresen, Rothenberg, & Kaplan, 1998; Rothwell, McDowell, Wong, & Dorman, 1997). Discriminant validity is usually rated highly for the SF-36 (e.g., Andresen et al., 1999; The Canadian Burden of Illness Study Group, 1998; Buchwald, Pearlman, Umali, Schmaling, & Katon, 1996; Komaroff et al., 1996; O’Neill & Kelly, 1996), although some studies disagree (e.g., Colantonio, Dawson, & McLellan, 1998; Lalonde, Clarke, Joseph, Mackenzie, & Grover, 1999; Myers & Wilks, 1999).

Andresen et al. (1999) administered the SF-36, the Geriatric Depression Scale and the Mini-Mental State Examination to 97 nursing home residents. Activities of daily living and medication intake data were also recorded. Convergent validity between the SF-36 Physical Health subscale and the Activities of Daily Living Index was adequate (r ranged from -0.37 to -0.43). These correlations are negative because a high score on the SF-36 indicates positive health status, whereas a high score on the Activities of Daily Living Index indicates dependence. Physical health scores from the SF-36 correlated more strongly with Geriatric Depression Scale scores than with Activities of Daily Living Index scores (-0.63 vs. 0.01). The Role Limitations-Physical subscale also correlated more strongly with Geriatric Depression Scale scores than with Activities of Daily Living scores. The Social Functioning, Role Limitations-Emotional, Vitality and Mental Health subscales all correlated more strongly with Geriatric Depression Scale scores than with Activities of Daily Living scores.

Brazier et al. (1992) reported correlations of -0.41 (Social Functioning vs. social isolation) to -0.68 (Vitality vs. energy) between similar scales on the SF-36 and the Nottingham Health Profile. Correlations between less clearly related dimensions ranged from -0.18 (Physical Functioning vs. emotional reactions) to -0.53 (Social Functioning vs. emotional reactions). These correlations are negative because a high score on the SF-36 indicates positive health status, whereas a high score on the Nottingham Health Profile indicates poorer perceived health status.

Dorman et al. (1999) reported that the SF-36 Physical Functioning subscale correlated most closely with the mobility, self-care and activities domains of the EuroQol (r = 0.57, 0.65 and 0.63, respectively) and less strongly with the EuroQol psychological domain (r = 0.34). The SF-36 Bodily Pain subscale correlated with the EuroQol pain domain (r = 0.66) and correlated adequately with all EuroQol domains. The Role Functioning-Emotional subscale correlated most closely with the EuroQol psychological domain (r = 0.43) and least with the EuroQol self-care domain (r = 0.24). The SF-36 Mental Health subscale was not closely related to the EuroQol psychological domain (r = 0.21) or to the physical EuroQol domains (r = 0.06 to 0.10). The SF-36 General Health subscale correlated adequately with the EuroQol overall HRQOL rating (r = 0.66).

Known Groups:
Patients diagnosed with ≥ 1 chronic physical problem had lower scores on all dimensions of the SF-36 except Mental Health, in comparison to healthy age-matched controls. The SF-36 scores were distributed as expected for sex, age, social class and use of health services (Brazier et al., 1992).

The SF-36 discriminated between age groups (<75 years versus 75+) on the Physical Functioning, Vitality and Change in Health subscales, and between groups based on setting (general practice versus hospital outpatients) on the Physical Functioning and Role Functioning-Physical subscales (Hayes et al., 1995).

Essink-Bot et al. (1997) reported that the SF-36 discriminated between migraine sufferers and controls on all subscales, although discrimination was poor (area under the ROC curve [AUC] = 0.54 to 0.67). The SF-36 also discriminated between groups of migraine sufferers based on absence from work (0 vs. ≥ 0.5 days), with AUCs ranging from 0.61 (poor) to 0.79 (adequate).
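An AUC can be read as the probability that a randomly chosen member of one group scores higher than a randomly chosen member of the other, so 0.5 means no discrimination and 1.0 means perfect separation. The sketch below computes it this way (the Mann-Whitney formulation, counting ties as one half). The function name and the scores are hypothetical and are not taken from Essink-Bot et al.

```python
def auc(higher_group, lower_group):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Returns the probability that a random member of `higher_group`
    scores above a random member of `lower_group`; ties count 0.5.
    """
    if not higher_group or not lower_group:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for a in higher_group:
        for b in lower_group:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(higher_group) * len(lower_group))


# Hypothetical SF-36 subscale scores (higher = better health).
controls = [70, 80, 85, 90]
migraine = [60, 75, 80, 95]
print(f"AUC = {auc(controls, migraine):.2f}")
```

With these made-up scores the two groups overlap heavily and the AUC is just under 0.6, the kind of weak separation described as poor discrimination.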

Brazier et al. (1996) reported that SF-36 scores distinguished groups based on recent visits to their family doctor, hospital inpatient stays and longstanding illness.

Known Groups (in patients with stroke):
Anderson et al. (1996) administered the Australian version of the SF-36 to 90 stroke survivors 1 year post-stroke. Validity was assessed by comparing patients’ scores on the SF-36 with those obtained on the Barthel Index, the 28-item General Health Questionnaire, and the Adelaide Activities Profile, an instrument developed from the Frenchay Activities Index. Construct validity was demonstrated by significant differences across all eight SF-36 scales for patients with identified health problems. For patients dependent in activities of daily living, the difference in mean scores was greatest for the Physical Functioning and General Health scales, whereas for patients with emotional health problems, the strongest associations were with the Social Functioning, Role Limitations-Emotional and Mental Health subscales.

Mayo et al. (2002) interviewed persons with first-ever stroke and a population-based sample of community-dwelling individuals without stroke by telephone at 6-month intervals over 2 years of follow-up. SF-36 scores successfully discriminated those with stroke from their age- and gender-matched controls.

Cross-diagnostic:

Dallmeijer et al. (2007) used Rasch analysis to examine the unidimensionality and differential item functioning of the SF-36 Physical Functioning subscale in patients with stroke, multiple sclerosis, and amyotrophic lateral sclerosis (ALS). All items of the Physical Functioning subscale, except one item for the ALS group (bathing/dressing), formed a unidimensional scale, supporting the use of a sum score as a measure of Physical Functioning within these diagnostic groups. In the pooled analysis, the ‘walking several hundred meters’ item showed inadequate fit to the Rasch model; of the other 9 items, 5 showed differential item functioning for stroke vs. multiple sclerosis and ALS, while no differential item functioning was found between multiple sclerosis and ALS. Thus, when comparing data from patients with stroke with data from patients with multiple sclerosis and/or ALS, adjustments for differential item functioning are necessary.
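For context, the conventional Physical Functioning sum score is a linear rescaling of the raw item total; unlike a Rasch measure, it does not convert ordinal responses to an interval scale or adjust for differential item functioning. The sketch assumes the usual version 1 coding of the 10 items (1 = limited a lot, 2 = limited a little, 3 = not limited at all) and the standard transformation, 100 × (raw score minus lowest possible raw score) / possible raw score range. Missing-item rules are deliberately omitted, so the official scoring manual should be checked before any real use; `pf_score` is a hypothetical helper name.

```python
PF_ITEM_COUNT = 10
VALID_CODES = (1, 2, 3)  # 1 = limited a lot, 2 = limited a little, 3 = not limited


def pf_score(responses):
    """Transform 10 Physical Functioning item codes to a 0-100 score.

    Uses 100 * (raw - lowest possible raw) / (possible raw range).
    Missing-item handling is not implemented (assumption of this sketch).
    """
    if len(responses) != PF_ITEM_COUNT:
        raise ValueError("expected exactly 10 item responses")
    if any(code not in VALID_CODES for code in responses):
        raise ValueError("item codes must be 1, 2 or 3")
    raw = sum(responses)
    lowest = PF_ITEM_COUNT * min(VALID_CODES)                                # 10
    possible_range = PF_ITEM_COUNT * (max(VALID_CODES) - min(VALID_CODES))   # 20
    return 100 * (raw - lowest) / possible_range


# Hypothetical respondent: limited a lot on 2 items, a little on 3, not at all on 5.
print(pf_score([1, 1, 2, 2, 2, 3, 3, 3, 3, 3]))
```

Because differential item functioning means an item behaves differently across diagnoses, the same transformed score may not represent the same level of physical functioning in a stroke group and an ALS group.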

Responsiveness

Harwood and Ebrahim (2000) examined the sensitivity to change of the SF-36 in 81 patients before and after hip replacement. Eighty-nine percent of patients reported improvements 3 months after surgery. The largest changes were seen on the SF-36 Bodily Pain scale (large effect sizes of 1.2 at 3 months and 1.5 at 6-12 months), the Physical Functioning scale (large effect sizes of 1.1 at 3 months and 1.3 at 6-12 months) and the Role Limitations-Physical scale (large effect sizes of 0.8 at 3 months and 1.2 at 6-12 months), suggesting that some SF-36 dimensions are very sensitive to change.
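Effect sizes of this kind are commonly computed as the mean change score divided by the standard deviation of the baseline scores. The source does not state which formula Harwood and Ebrahim used, so the sketch below assumes this common definition; the function name and the example scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores (one common definition)."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical Bodily Pain scores (0-100) before and after surgery.
before = [20, 30, 40, 50]
after = [45, 50, 60, 70]
print(f"ES = {effect_size(before, after):.2f}")
```

Dividing by the baseline spread expresses change in units of between-patient variability, so the same raw change looks larger in a homogeneous sample than in a heterogeneous one.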

Brazier, Walters, Nicholl and Kohler (1996) tested the sensitivity of the SF-36, the EuroQol and the Office of Population Censuses and Surveys Disability Survey in an elderly female population. The measures were administered by interview in a hospital clinic at baseline, and a random subsample of respondents was retested six months later. Sensitivity of the instruments was quantified by estimating effect sizes for hypothesized changes in health status. There was some evidence that the SF-36 was more sensitive to lower levels of morbidity. A hypothesized change from having a longstanding illness to having no longstanding illness was associated with moderate to large effect sizes across the dimensions of all three instruments, except the Social Functioning (ES = 0.41) and Mental Health (ES = 0.31) dimensions of the SF-36, which both had small effect sizes. The effect sizes for differences in instrument scores between the age groups were small (range 0.00 to 0.50), with the highest for Physical Functioning. The SF-36 was rated as more sensitive to change than the EuroQol for older women.

In a study by Mossberg and McFarland (2001), six outpatient rehabilitation clinics incorporated the SF-36 into everyday practice. Ninety patients completed the SF-36 before initiating treatment and again at discharge; only nonsurgical patients without comorbidities were enrolled. Effect sizes from admission to discharge ranged from small (0.48 for Role Limitations-Emotional) to large (1.38 for Bodily Pain). The physical component summary score effect size was large (ES = 0.80) and the mental component summary score effect size was small (ES = 0.45).
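The small, moderate and large labels used in this section are consistent with Cohen's conventional benchmarks of 0.2, 0.5 and 0.8. The sketch below applies those cut-offs to the effect sizes reported for Mossberg and McFarland (2001). Treating values below 0.2 as "trivial" is an assumption of the sketch, and such benchmarks are rules of thumb rather than thresholds for clinically important change.

```python
def cohen_label(es):
    """Label |es| with Cohen's conventional cut-offs (0.2, 0.5, 0.8)."""
    size = abs(es)
    if size < 0.2:
        return "trivial"  # label below 0.2 is an assumption of this sketch
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


# Effect sizes as reported for Mossberg and McFarland (2001).
reported = {
    "Role Limitations-Emotional": 0.48,
    "Bodily Pain": 1.38,
    "Physical component summary": 0.80,
    "Mental component summary": 0.45,
}
for name, es in reported.items():
    print(f"{name}: ES = {es:.2f} -> {cohen_label(es)}")
```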

The SF-36 is increasingly being used in stroke studies (Anderson, Laubscher, & Burns, 1996; Duncan et al., 1997) and in stroke clinical trials. However, the psychometric properties of the SF-36 soon after stroke are not well known, as most of the current data come from patients one year or more after stroke (e.g., Anderson et al., 1996; Duncan et al., 1997). We did not identify any studies on the responsiveness of the SF-36 in patients with stroke.

Muller-Nordhorn et al. (2004) examined the responsiveness to change of the SF-12 in patients with stroke or transient ischemic attack. Patients (n = 558) were administered the SF-12 at baseline (referring to their status before the event) and after 12 months. In patients with stroke, standardized response means (SRMs) were small for the SF-12 physical component summary scale (SRM = 0.49) and moderate for the SF-12 mental component summary scale (SRM = 0.52). In patients with transient ischemic attack, SRMs were below 0.2 for the physical component summary scale and small for the mental component summary scale (SRM = 0.34). SRMs increased with stroke severity as measured by the National Institutes of Health Stroke Scale score.
Thus, the SF-12 summary scales show small to moderate responsiveness to change in patients after stroke, and responsiveness to change was higher in patients with greater stroke severity.
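The standardized response mean divides the mean change by the standard deviation of the individual change scores, rather than by the baseline standard deviation used for the effect size. Below is a minimal sketch of this usual definition; the function name and data are hypothetical, and the exact computation used by Muller-Nordhorn et al. is not described here.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the individual change scores."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical summary scores at baseline and 12 months.
baseline = [40, 50, 60]
twelve_months = [45, 60, 75]
print(f"SRM = {standardized_response_mean(baseline, twelve_months):.2f}")
```

Here changes of 5, 10 and 15 points give an SRM of 2.0, whereas dividing the same mean change (10) by the baseline standard deviation (10) would give an effect size of 1.0, so the two indices are not interchangeable when comparing studies.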

The observation that patients with stroke had scores similar to those of patients with transient ischemic attack raises questions about the ability of the SF-36 to discriminate and to respond to clinical change in patients with stroke (Duncan et al., 1997). Currently, no evaluative stroke-specific HRQOL instrument is available, and it remains to be seen whether generic HRQOL instruments such as the SF-36 are sufficiently responsive to be useful in clinical trials. More information regarding the responsiveness of the SF-36 will be available once a number of ongoing stroke trials are completed (Williams, 1998).

References

Aaronson, N. K., Muller, M., Cohen, P. D. A., Essink-Bot, M. L., Fekkes, M., Sanderman, R., Sprangers, M. A., Velder, A., Verrips, E. (1998). Translation, validation and norming of the Dutch language version of the SF-36 health survey in community and chronic disease populations. J Clin Epidemiol, 51, 1055-1068.
Anderson, C., Laubscher, S., Burns, R. (1996). Validation of the Short Form 36 (SF-36) Health Survey Questionnaire among stroke patients. Stroke, 27(10), 1812-1816.
Andresen, E. M., Meyers, A. R. (2000). Health-related quality of life outcomes measures. Arch Phys Med Rehabil, 81(12), S30-45.
Andresen, E. M., Gravitt, G. W., Aydelotte, M. E., Podgorski, C. A. (1999). Limitations of the SF-36 in a sample of nursing home residents. Age and Ageing, 28, 562-566.
Andresen, E. M., Fouts, B. S., Romeis, J. C., Brownson, C. A. (1999). Performance of health-related quality-of-life instruments in a spinal cord injured population. Arch Phys Med Rehabil, 80, 877-884.
Andresen, E. M., Rothenberg, B. M., Kaplan, R. M. (1998). Performance of a self-administered mailed version of the Quality of Well-Being (QWB-SA) questionnaire among older adults. Med Care, 36, 1349-1360.
Beaton, D. E., Hogg-Johnson, S., Bombardier, C. (1997). Evaluating changes in health status: Reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol, 50(1), 79-93.
Benninger, M. S., Ahuja, A. S., Gardner, G., Grywalski, C. (1998). Assessing outcomes for dysphonic patients. J Voice, 12, 540-550.
Beusterien, K. M., Steinwald, B., Ware, J. E. (1996). Usefulness of the SF-36 Health Survey in measuring health outcomes in the depressed elderly. J Geriatr Psychiatry Neurol, 9(1), 13-21.
Beck, A. T., Rial, W. Y., Rickels, K. (1974). Short form of Depression Inventory: Cross-validation. Psychological Reports, 34(3), 1184-1186.
Brazier, J., Roberts, J., Tsuchiya, A., Busschbach, J. (2004). A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ, 13, 873-884.
Brazier, J., Usherwood, T., Harper, R., Thomas, K. (1998). Deriving a preference-based single index from the UK SF-36 Health Survey. J Clin Epidemiol, 51, 1115-1128.
Brazier, J. E., Walters, S. J., Nicholl, J. P., Kohler, B. (1996). Using the SF-36 and EuroQol on an Elderly Population. Quality of Life Research, 5, 195-204.
Brazier, J., Roberts, J., Deverill, M. (2002). The estimation of a preference-based measure of health from the SF-36. J Health Econ, 21, 271-292.
Brazier, J. E., Harper, R., Jones, N. M. B. et al. (1992). Validating the SF-36 health survey questionnaire: new outcome measure for primary care. BMJ, 305, 160-164.
Buchwald, D., Pearlman, T., Umali, J., Schmaling, K., Katon, W. (1996). Functional status in patients with chronic fatigue syndrome, other fatiguing illnesses, and healthy individuals. Am J Med, 101, 364-370.
Ciconelli, R. M. (1997). Translation and validation to the Portuguese of the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) [doctoral thesis]. Federal University of São Paulo, São Paulo, Brazil.
Colantonio, A., Dawson, D. R., McLellan, B. A. (1998). Head injury in young adults: long-term outcome. Arch Phys Med Rehabil, 79, 550-558.
Dallmeijer, A. J., de Groot, V., Roorda, L. D., Schepers, V. P. M., Lindeman, E., van den Berg, L. H., Beelen, A., Dekker, J. (2007). Cross-diagnostic validity of the SF-36 physical functioning scale in patients with stroke, multiple sclerosis and amyotrophic lateral sclerosis: A study using Rasch analysis. J Rehabil Med, 39, 163-169.
de Haan, R. J. (2002). Measuring quality of life after stroke using the SF-36. Stroke, 33, 1176-1177.
Dorman, P., Slattery, J., Farrell, B., Dennis, M., Sandercock, P. (1998). Qualitative comparison of the reliability of health status assessments with the EuroQol and SF-36 Questionnaires After Stroke. Stroke, 29, 63-68.
Dorman, P. J., Dennis, M., Sandercock, P. (1999). How do scores on the EuroQol relate to scores on the SF-36 after stroke? Stroke, 30(10), 2146-2151.
Duncan, P. W., Samsa, G. P., Weinberger, M., Goldstein, L. B., Bonito, A., Witter, D. M., Enarson, C., Matchar, D. (1997). Health status of individuals with mild stroke. Stroke, 28, 740-745.
Essink-Bot, M. A., Krabbe, P. F., Bonsel, G. J., Aaronson, N. K. (1997). An empirical comparison of four generic health status measures: The Nottingham Health Profile, the Medical Outcomes Study 36-Item Short-Form Health Survey, the COOP/WONCA Charts, and the EuroQol Instrument. Med Care, 35(5), 522-537.
Fielder, H., Denholm, S. W., Lyons, R. A., Fielder, C. P. (1996). Measurement of health status in patients with vertigo. Clin Otolaryngol, 21, 124-126.
Fukuhara, S., Ware, J. E., Kosinski, M., Wada, S., Gandek, B. (1998). Psychometric and Clinical Tests of Validity of the Japanese SF-36 Health Survey. J Clin Epidemiol, 51, 1045-1053.
Hagen, S., Bugge, C., Alexander, H. (2003). Psychometric properties of the SF-36 in the early post-stroke phase. Journal of Advanced Nursing, 44(5), 461-468.
Harwood, R. H., Ebrahim, S. (2000). A comparison of the responsiveness of the Nottingham extended activities of daily living scale, London handicap scale, and SF-36. Disability & Rehabilitation, 22(17), 786-793.
Hayes, V., Morris, J., Wolfe, C., Morgan, M. (1995). The SF-36 Health Survey Questionnaire: Is it suitable for use with older adults? Age and Ageing, 24, 120-125.
Hilari, K., Byng, S., Lamping, D. L., Smith, S. C. (2003). Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39): Evaluation of acceptability, reliability, and validity. Stroke, 34, 1944-1950.
Hobart, J. C., Williams, L. S., Moran, K., Thompson, A. J. (2002). Quality of life measurement after stroke: Uses and abuses of the SF-36. Stroke, 33, 1348-1356.
Jenkinson, C., Coulter, A., Wright, L. (1993). Short form 36 (SF36) health survey questionnaire: Normative data for adults of working age. BMJ, 306(6890), 1437-1440.
Jenkinson, C., Wright, L., Coulter, A. (1994). Criterion validity and reliability of the SF-36 in a population sample. Quality of Life Research, 3(1), 7-12.
Jenkinson, C., Stewart-Brown, S., Petersen, S., Paice, C. (1999). Assessment of the SF-36 version 2 in the United Kingdom. J Epidemiol Community Health, 53(1), 46-50.
Komaroff, A. L., Fagioli, L. R., Doolittle, T. H., Gandek, B., Gleit, M. A., Gueriero, R. T., et al. (1996). Health status in patients with chronic fatigue syndrome and in general population and disease comparison groups. Am J Med, 101, 281-290.
Lai, S-M., Perera, S., Duncan, P. W., Bode, R. (2003). Physical and social functioning after stroke: Comparison of the Stroke Impact Scale and Short Form-36. Stroke, 34, 488-493.
Lalonde, L., Clarke, A. E., Joseph, L., Mackenzie, T., Grover, S. A. (1999). Comparing the psychometric properties of preference-based and nonpreference-based health-related quality of life in coronary heart disease. Qual Life Res, 8, 399-409.
Lyons, R. A., Perry, H. M., Littlepage, B. N. C. (1994). Evidence for the validity of the Short-Form 36 Questionnaire (SF-36) in an elderly population. Age and Ageing, 23, 182-184.
Mathias, S. D., Bates, M. M., Pasta, D. J., Cisternas, M. G., Feeny, D., Patrick, D. L. (1997). Use of the Health Utilities Index with stroke patients and their caregivers. Stroke, 28, 1888-1894.
Mayo, N. E., Wood-Dauphinee, S., Cote, R., Durcan, L., Carlton, J. (2002). Activity, Participation, and Quality of Life 6 Months Poststroke. Arch Phys Med Rehabil, 83, 1035-1042.
McDowell, I., Newell, C. (1996). Measuring Health. A Guide to Rating Scales and Questionnaires. 2nd ed. New York: Oxford University Press.
McHorney, C. A. (1996). Measuring and monitoring general health status in elderly persons: Practical and methodological issues in using the SF-36 health survey. The Gerontologist, 36(5), 571-583.
McHorney, C. A., Ware, J. E. Jr., Raczek, A. E. (1993). The MOS 36-Item Short-Form Health Survey (SF-36): II Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care, 31, 247-263.
McHorney, C. A., Ware, J. E. Jr., Lu, J. F., Sherbourne, C. D. (1994). The MOS 36-item Short-Form Health Survey (SF-36): III Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care, 32, 40-66.
Mossberg, K., McFarland, C. (2001). A patient-oriented health status measure in outpatient rehabilitation. Am J Phys Med Rehabil, 80(12), 896-902.
Muller-Nordhorn, J., Nolte, C. H., Rossnagel, K., Jungehulsing, G. J., Reich, A., Roll, S., Villringer, A., Willich, S. N. (2004). Responsiveness to change of the SF-12 in patients with cerebrovascular disease. Biometrical Journal, 46(S1), 50.
Myers, C., Wilks, D. (1999). Comparison of Euroqol EQ-5D and SF-36 in patients with chronic fatigue syndrome. Qual Life Res, 8, 9-16.
Nemeth, G. (2006). Health related quality of life outcome instruments. European Spine Journal, 15(1), S44-S51.
Nortvedt, M. W., Riise, T., Myhr, K. M., Nyland, H. I. (1999). Quality of life in multiple sclerosis: measuring the disease effects more broadly. Neurology, 53, 1098-1103.
O’Mahony, P. G., Rodgers, H., Thomson, R. G., Dobson, R., James, O. F. W. (1998). Is the SF-36 suitable for assessing health status of older stroke patients? Age and Ageing, 27, 19-22.
O’Neill, P., Kelly, P. (1996). Postal questionnaire study of disability in the community associated with psoriasis. Br Med J, 313, 919-921.
Petrou, S., Hockley, C. (2005). An investigation into the empirical validity of the EQ-5D and SF-6D based on hypothetical preferences in a general population. Health Econ, 14, 1169-1189.
Ren, X. S., Amick, B., Zhou, L., et al. (1998). Translation and Psychometric Evaluation of a Chinese Version of the SF-36 Health Survey in the U.S. J Clin Epidemiol, 51(11), 1129.
Rothwell, P. M., McDowell, Z., Wong, C. K., Dorman, P. J. (1997). Doctors and patients don’t agree: cross sectional study of patients’ and doctors’ perceptions and assessments of disability in multiple sclerosis. British Med J, 314, 1580-1583.
Rumsfeld, J. S., MaWhinney, S., McCarthy, M., Shroyer, A. L., VillaNueva, C. B., O’Brien, M., Moritz, T. E., Henderson, W. G., Grover, F. L., Sethi, G. K., Hammermeister, K. E. (1999). Health-related quality of life as a predictor of mortality following coronary artery bypass graft surgery. Participants of the Department of Veterans Affairs Cooperative Study Group on Processes, Structures, and Outcomes of Care in Cardiac Surgery. JAMA, 281(14), 1298-1303.
Ruta, D. A., Garratt, A. M., Wardlaw, D., Russell, I. T. (1994). Developing a valid and reliable measure of health outcome for patients with low back pain. Spine, 19, 1887-1896.
Segal, M. E., Schall, R. R. (1994). Determining functional/health status and its relation to disability in stroke survivors. Stroke, 25, 2391-2397.
The Canadian Burden of Illness Study Group. (1998). Burden of illness of multiple sclerosis: part II: quality of life. Can J Neurol Sci, 25, 31-38.
The Counselling Versus Antidepressants in Primary Care Study Group. (1999). How disabling is depression? Evidence from a primary care sample. Br J Gen Pract, 49(439), 95-98.
Walters, S. J., Munro, J. F., Brazier, J. E. (2001). Using the SF-36 with older adults: A cross-sectional community-based survey. Age and Ageing, 30, 337-343.
Ware, J. E., Kosinski, M., Dewey, J. E., Gandek, B. (2001). How to Score and Interpret Single-Item Health Status Measures: A Manual for Users of the SF-8 Health Survey. Lincoln RI: QualityMetric Incorporated.
Ware, J. E., Kosinski, M., Keller, S. D. (1994). SF-36 Physical and Mental Health Summary Scales: A User’s Manual. Boston, MA: The Health Institute.
Ware, J. E. Jr., Sherbourne, C. D. (1992). The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care, 30, 473-483.
Ware, J. Jr., Kosinski, M., Keller, S. D. (1996). A 12-item short-form health survey: Construction of scales and preliminary tests of reliability and validity. Med Care, 34(3), 220-233.
Ware, J. E., Snow, K. K., Kosinski, M., Gandek, B. (1993). SF-36® Health Survey Manual and Interpretation Guide. Boston, MA: New England Medical Center, The Health Institute.
Ware, J. E., Kosinski, M., Turner-Bowker, D. M., Gandek, B. (2002). SF-12v2: How to score version 2 of the SF-12 Health Survey. Lincoln RI: QualityMetric Incorporated.
Weinberger, M., Oddone, E. Z., Samsa, G. P., Landsman, P. B. (1996). Are health-related quality-of-life measures affected by the mode of administration? J Clin Epidemiol, 49(2), 135-140.
Wilkinson, P. R., Wolfe, C. D., Warburton, F. G., Rudd, A. G., Howard, R. S., Ross-Russell, R. W., Beech, R. (1997). Longer term quality of life and outcome in stroke patients: Is the Barthel Index alone an adequate measure of outcome? Quality in Health Care, 6, 125-130.
Williams, L. S. (1998). Health-Related Quality of Life Outcomes in Stroke. Neuroepidemiology, 17, 116-120.

See the measure

How to obtain the SF-36

Permission to use the SF-36 should be obtained from the Medical Outcomes Trust, which oversees the standardized administration of the SF-36 and provides updates on administration and scoring (McDowell & Newell, 1996). Various computer applications are available to assist in scoring the SF-36, including free Excel templates that can be downloaded from the Internet.

All versions of the SF-36 can be viewed by visiting the website www.qualitymetric.com.

Samples of the various versions of the SF-36 are also available on this website.

Stroke Specific Quality of Life Scale (SS-QOL)

Evidence Reviewed as of before: 19-08-2008

Author(s)*: Lisa Zeltzer, MSc OT

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Stroke-Specific Quality of Life Scale (SS-QOL) is a patient-centered outcome measure intended to provide an assessment of health-related quality of life (HRQOL) specific to patients with stroke.

In-Depth Review

Purpose of the measure

Available versions

The SS-QOL was published and validated in 1999 by Williams, Weinberger, Harris, and Clark.

Features of the measure

Items:
Scale domains and items were derived from a series of interviews with post-stroke patients (Williams et al. 1999a).

Patients must respond to each question of the SS-QOL with reference to the past week. It is a self-report scale containing 49 items in 12 domains:

Mobility (6 items)
Energy (3 items)
Upper extremity function (5 items)
Work/productivity (3 items)
Mood (5 items)
Self-care (5 items)
Social roles (5 items)
Family roles (3 items)
Vision (3 items)
Language (5 items)
Thinking (3 items)
Personality (3 items)

Subscales:
Mobility, Energy, Upper extremity function, Work/productivity, Mood, Self-care, Social roles, Family roles, Vision, Language, Thinking, and Personality.

Equipment:
Only a pencil and the test are needed.

Training:
No training is required, as the SS-QOL is intended to be self-administered. One study suggests that the scale can be reliably administered to patients with stroke over the telephone (Williams, Redmon, Saul & Weinberger, 2000).

Time:
It takes approximately 10-15 minutes to complete the SS-QOL scale.

Scoring:
Items are rated on a 5-point Likert scale. There are 3 different response sets (see table below). Patients must respond to each item using the corresponding response set as indicated on the scale (Williams et al. 1999a). For example, the item “did you have any trouble doing daily work around the house?” requires response set 2, which ranges from “couldn’t do it at all” to “no trouble at all”.

Response Sets:

Set 1	1. Total help	2. A lot of help	3. Some help	4. A little help	5. No help needed
Set 2	1. Couldn’t do it at all	2. A lot of trouble	3. Some trouble	4. A little trouble	5. No trouble at all
Set 3	1. Strongly agree	2. Moderately agree	3. Neither agree nor disagree	4. Moderately disagree	5. Strongly disagree

Higher scores indicate better functioning. The SS-QOL yields both domain scores and an overall SS-QOL summary score. The domain scores are unweighted averages of the associated items while the summary score is an unweighted average of all twelve domain scores (Williams et al. 1999b).
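The scoring rule above (domain score = unweighted mean of its items; summary score = unweighted mean of the twelve domain scores) can be sketched in Python. This is a minimal illustration, not an official SS-QOL scoring program. It assumes every item has been answered and that each response is already recorded as the 1 to 5 code of its response set, so that 5 always means the best functioning. The names `SSQOL_DOMAINS` and `score_ssqol` are hypothetical, and handling of missing items is not covered.

```python
from statistics import mean

# Hypothetical lookup table: number of items per SS-QOL domain,
# taken from the domain list above (49 items in total).
SSQOL_DOMAINS = {
    "Mobility": 6,
    "Energy": 3,
    "Upper extremity function": 5,
    "Work/productivity": 3,
    "Mood": 5,
    "Self-care": 5,
    "Social roles": 5,
    "Family roles": 3,
    "Vision": 3,
    "Language": 5,
    "Thinking": 3,
    "Personality": 3,
}


def score_ssqol(responses: dict[str, list[int]]) -> tuple[dict[str, float], float]:
    """Return (domain scores, summary score) for one patient.

    Assumes complete data, coded 1-5 with 5 = best functioning.
    """
    domain_scores = {}
    for domain, n_items in SSQOL_DOMAINS.items():
        items = responses[domain]
        if len(items) != n_items:
            raise ValueError(f"{domain}: expected {n_items} items, got {len(items)}")
        if any(not 1 <= x <= 5 for x in items):
            raise ValueError(f"{domain}: responses must be coded 1 to 5")
        # Domain score: unweighted average of the domain's items.
        domain_scores[domain] = mean(items)
    # Summary score: unweighted average of the 12 domain scores,
    # not of the 49 items.
    summary = mean(domain_scores.values())
    return domain_scores, summary


# Example: best possible answers everywhere except the Energy domain.
responses = {domain: [5] * n for domain, n in SSQOL_DOMAINS.items()}
responses["Energy"] = [1, 1, 1]
domains, summary = score_ssqol(responses)
print(round(summary, 2))  # 4.67
```

In this example the summary score is 56/12 (about 4.67), whereas a plain average of the 49 items would be 233/49 (about 4.76). Averaging domain scores rather than items gives the three-item Energy domain the same weight as the six-item Mobility domain.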

Alternative forms of SS-QOL

The Stroke and Aphasia Quality of Life Scale (SAQOL-39; Hilari, Byng, Lamping, & Smith, 2003). Developed from the SS-QOL for use in patients with long-term aphasia, the SAQOL-39 has four subdomains (Physical, Psychosocial, Communication, and Energy). It is an interview-administered self-report scale comprising items from the SS-QOL that have been modified to ensure they are appropriate for individuals with aphasia. The SAQOL-39 also has four additional items that were added to increase the content validity of the scale with this population. These four items focus on difficulties with understanding speech, issues with decision-making, and the impact of language difficulties on family and social life.

Hilari et al. (2003) reported that the SAQOL-39 has good acceptability, adequate to excellent internal consistency (Cronbach’s alphas ranging from 0.74 to 0.94), excellent test-retest reliability (intraclass correlation coefficients = 0.89 to 0.98), and poor to excellent construct validity (corrected domain-total correlations, r = 0.38 to 0.58; convergent validity, r = 0.55 to 0.67; discriminant validity, r = 0.02 to 0.27). Further research is needed to confirm its psychometric properties and to determine its appropriateness as a clinical outcome measure.
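Cronbach’s alpha, the internal consistency statistic reported above, compares the variances of the individual items with the variance of their total score: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), for k items. The sketch below implements that standard formula in Python for illustration only; it is not the analysis code used by Hilari et al. (2003), and the function name `cronbach_alpha` and the one-list-per-item data layout are assumptions.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[j][i] is respondent i's score on item j."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs a score for each of >= 2 respondents")
    # Each respondent's total score across the k items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    # Sample variances are used throughout; the ratio does not depend on
    # whether sample or population variance is chosen, as long as it is consistent.
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Two items answered by four respondents.
print(round(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]), 2))  # 0.75
```

Items that always move together push alpha toward 1, while items that vary independently of one another pull it toward 0.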

Client suitability

Can be used with:

Individuals with mild or moderate stroke.

Should not be used in:

Patients without stroke. The SS-QOL was developed and validated specifically for individuals with stroke and has been examined in this population only.
Severe stroke populations. The SS-QOL has not yet been tested among patients with severe stroke.
Should be used with caution in patients with aphasia. Although the modified version of the scale, the SAQOL-39, has been validated for use in patients with long-term aphasia, it is a relatively new measure that requires further psychometric testing.
Patients who require a proxy to complete the scale. A study by Williams et al. (2006) compared proxy ratings of the SS-QOL with patient self-administration in 225 patient-proxy pairs. Proxies rated all SS-QOL domains lower than the patients did. The intraclass correlation coefficient (ICC) for each domain ranged from poor (ICC = 0.30 for role function) to adequate (ICC = 0.59 for physical function). Proxies also rated the overall SS-QOL score lower than patients did (3.4 for proxies versus 3.7 for patients), with an ICC of 0.41. It is recommended that information obtained from proxy respondents be treated as supplementary rather than substantive, and that proxies be restricted to individuals either living with or in daily contact with the patient (Snow, Cook, Lin, Morgan & Magaziner, 2005; Muus, Petzold & Ringsberg, 2009).
For patients who require a proxy, the Stroke Impact Scale is a more reliable and valid measure of HRQOL (Duncan, Lai, Tyler, Perera, Reker, & Studenski, 2002).
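Williams et al. (2006) summarized patient-proxy agreement with intraclass correlation coefficients, but the form of ICC they used is not stated in this review. Purely as an illustration, the Python sketch below computes a one-way random-effects, single-measure ICC (often written ICC(1,1)) for patient-proxy pairs; the choice of this ICC form and the function name `icc_oneway` are assumptions. Because the within-pair term counts any systematic difference between the two raters, a proxy who consistently rates lower than the patient reduces the ICC even when both sets of ratings rise and fall together.

```python
from statistics import mean


def icc_oneway(pairs: list[tuple[float, float]]) -> float:
    """One-way random-effects, single-measure ICC for (patient, proxy) pairs."""
    n, k = len(pairs), 2  # n patients, k = 2 ratings per patient
    if n < 2:
        raise ValueError("need at least two patient-proxy pairs")
    grand = mean(x for pair in pairs for x in pair)
    pair_means = [mean(pair) for pair in pairs]
    # Between-patient mean square: spread of the pair means around the grand mean.
    msb = k * sum((m - grand) ** 2 for m in pair_means) / (n - 1)
    # Within-patient mean square: disagreement inside each pair,
    # including any constant patient-proxy offset.
    msw = sum(
        (x - m) ** 2 for pair, m in zip(pairs, pair_means) for x in pair
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Proxies rate every patient exactly 0.3 lower than the patient does.
print(round(icc_oneway([(3.0, 2.7), (4.0, 3.7), (5.0, 4.7)]), 2))  # 0.96
```

In the example a Pearson correlation between patient and proxy ratings would be 1.0, because the offset is constant, yet the ICC falls below 1. This is why an ICC, rather than a correlation, is used to describe agreement.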

In what languages is the measure available?

Danish (SS-QOL-DK): translated by Muus & Ringsberg (2005) and validated by Muus, Williams & Ringsberg (2007).
German: translated by Ewart & Stucki (2007), who also completed an initial validation study. That study supported the validity of the total German SS-QOL score; however, some subscales (Energy, Mood and Thinking) were not validated. Further research is required.

Summary

What does the tool measure?	Health related quality of life
What types of clients can the tool be used for?	The SS-QOL was developed for use in patients with stroke.
Is this a screening or assessment tool?	Assessment.
Time to administer	Approximately 10-15 minutes to complete.
Versions	The Stroke and Aphasia Quality of Life Scale (SAQOL-39)
Other Languages	Translated and validated in Danish. Translated into German.
Measurement Properties
ReliabilityReliability can be defined in a variety of ways. It is generally understood to be the extent to which a measure is stable or consistent and produces similar results when administered repeatedly. A more technical definition of reliability is that it is the proportion of "true" variation in scores derived from a particular measure. The total variation in any given score may be thought of as consisting of true variation (the variation of interest) and error variation (which includes random error as well as systematic error). True variation is that variation which actually reflects differences in the construct under study, e.g., the actual severity of neurological impairment. Random error refers to "noise" in the scores due to chance factors, e.g., a loud noise distracts a patient thus affecting his performance, which, in turn, affects the score. Systematic error refers to bias that influences scores in a specific direction in a fairly consistent way, e.g., one neurologist in a group tends to rate all patients as being more disabled than do other neurologists in the group. There are many variations on the measurement of reliability including alternate-forms, internal consistency , inter-rater agreement , intra-rater agreement , and test-retest .	Internal consistencyA method of measuring reliability . Internal consistency reflects the extent to which items of a test measure various aspects of the same characteristic and nothing else. Internal consistency coefficients can take on values from 0 to 1. Higher values represent higher levels of internal consistency.: One study examined the internal consistencyA method of measuring reliability . Internal consistency reflects the extent to which items of a test measure various aspects of the same characteristic and nothing else. Internal consistency coefficients can take on values from 0 to 1. Higher values represent higher levels of internal consistency. 
of the SS-QOL and found that the internal consistencyA method of measuring reliability . Internal consistency reflects the extent to which items of a test measure various aspects of the same characteristic and nothing else. Internal consistency coefficients can take on values from 0 to 1. Higher values represent higher levels of internal consistency. ranged from adequate (for work/productivity subscaleMany measurement instruments are multidimensional and are designed to measure more than one construct or more than one domain of a single construct. In such instances subscales can be constructed in which the various items from a scale are grouped into subscales. Although a subscale could consist of a single item, in most cases subscales consist of multiple individual items that have been combined into a composite score (National Multiple Sclerosis Society). ) to excellent (for self-care). Test-retest: One study examined the test-retest reliabilityA way of estimating the reliability of a scale in which individuals are administered the same scale on two different occasions and then the two scores are assessed for consistency. This method of evaluating reliability is appropriate only if the phenomenon that the scale measures is known to be stable over the interval between assessments. If the phenomenon being measured fluctuates substantially over time, then the test-retest paradigm may significantly underestimate reliability. In using test-retest reliability, the investigator needs to take into account the possibility of practice effects, which can artificially inflate the estimate of reliability (National Multiple Sclerosis Society). of the SS-QOL and found excellent test-retest. Inter-rater: One study examined the inter-rater reliabilityA method of measuring reliability . Inter-rater reliability determines the extent to which two or more raters obtain the same result when using the same instrument to measure a concept. of the SS-QOL and found excellent inter-rater.
ValidityThe degree to which an assessment measures what it is supposed to measure.	Criterion: Predictive: The SS-QOL summary score significantly predicted overall post-stroke health-related quality of life. Construct: Convergent: Most domains of the SS-QOL correlate with the Barthel Index, the Beck DepressionIllness involving the body, mood, and thoughts, that affects the way a person eats and sleeps, the way one feels about oneself, and the way one thinks about things. A depressive disorder is not the same as a passing blue mood or a sign of personal weakness or a condition that can be wished away. People with a depressive disease cannot merely "pull themselves together" and get better. Without treatment, symptoms can last for weeks, months, or years. Appropriate treatment, however, can help most people with depression. Inventory, and subscales of the SF-36.
Floor/Ceiling Effects	One study reported ceiling effects exceeding 20% in 10 out of 12 domains of the SS-QOL, and a floor effectThe floor effect is when data cannot take on a value lower than some particular number. Thus, it represents a subsample for whom clinical decline may not register as a change in score, even if there is worsening of function/behavior etc. because there are no items or scaling within the test that measure decline from the lowest possible score. See also "ceiling effect." of 24% in the Energy domain. Floor or ceiling effectA ceiling effect occurs when test items aren't challenging enough for a group of individuals. Thus, the test score will not increase for a subsample of people who may have clinically improved because they have already reached the highest score that can be achieved on that test. In other words, because the test has a limited number of difficult items, the most highly functioning individuals will score at the highest possible score. This becomes a measurement problem when you are trying to identify changes - the person may continue to improve but the test does not capture that improvement. Example: A memory test that assesses how many words a participant can recall has a total of five words that each participant is asked to remember. Because most individuals can remember all five words, this measure has a ceiling effect. See also "floor effect." exceeding 20% are typically considered poor.
Does the tool detect change in patients?	One study found that the SS-QOL had only a moderate ability to detect change in patients between 1 and 3 months post-stroke. A subsequent study of an alternative language version of the SS-QOL found a small to moderate ability to detect change between 3 and 12 months post-stroke. In a later study, the minimal clinically important difference for the mobility, self-care and upper extremity function subscales was defined as a mean change in score of at least 1.5, 1.2 and 1.2, respectively.
Acceptability	Further investigation of the reliability, validity and sensitivity of the SS-QOL with larger samples is required. The measure has not been tested in patients with severe stroke. For patients with aphasia, the SAQOL-39 is a more suitable version of the measure; however, it is relatively new and requires further psychometric testing. The scale is not suitable for use by proxy.
Feasibility	No training is required for the SS-QOL, as the measure is intended to be completed by self-report. The measure is simple to score and uses a 5-point Likert scale.
How to obtain the tool?	Click here to find a copy of the SS-QOL.

Psychometric Properties

Overview

The Stroke Specific Quality of Life Scale (SS-QOL) is a new scale and has not been well studied. It has not been tested in populations with severe stroke. To our knowledge, the creators of the SS-QOL have gathered most of the psychometric data currently published on the scale. Further investigation of the reliability, validity and sensitivity of the SS-QOL with larger samples is required.

Floor and Ceiling Effects

Czechowsky and Hill (2002) examined the SS-QOL and reported ceiling effects exceeding 20% in 10 of the 12 domains and a floor effect of 24% in the Energy domain. Floor or ceiling effects exceeding 20% are typically considered poor.
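The 20% threshold can be checked directly: a floor or ceiling effect is the percentage of respondents who obtain the lowest or highest possible score. A minimal Python sketch, assuming domain scores range from 1 to 5; the range and the sample scores are hypothetical and are not data from Czechowsky and Hill (2002).

```python
from typing import Sequence


def floor_ceiling_percent(
    scores: Sequence[float], min_score: float, max_score: float
) -> tuple[float, float]:
    """Return (% at the lowest possible score, % at the highest possible score)."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    at_floor = sum(1 for s in scores if s == min_score)
    at_ceiling = sum(1 for s in scores if s == max_score)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


def exceeds_threshold(percent: float, threshold: float = 20.0) -> bool:
    """Flag a floor or ceiling effect above the conventional 20% threshold."""
    return percent > threshold


# Hypothetical scores for one domain scored from 1 (worst) to 5 (best).
domain_scores = [5, 5, 4, 5, 3, 5, 2, 5, 4, 1]
floor_pct, ceiling_pct = floor_ceiling_percent(domain_scores, 1, 5)
print(floor_pct, ceiling_pct)          # 10.0 50.0
print(exceeds_threshold(ceiling_pct))  # True
```

A high ceiling percentage means that patients already at the top of the scale cannot register further improvement, which is why such domains are judged poorly.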

Reliability

Test-retest:
In a study by Williams et al. (2000), the SS-QOL was administered by a trained interviewer to 47 stroke survivors at baseline and again within 2 hours of the initial interview. SS-QOL scores were highly correlated (r = 0.92), showing excellent test-retest reliability.
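The reliability coefficients in this section are correlations between two sets of scores. The text does not name the exact coefficient, so the sketch below assumes a Pearson product-moment correlation between two administrations; the paired scores are hypothetical. Python 3.10+ also provides `statistics.correlation`, which computes the same quantity.

```python
import math
from typing import Sequence


def pearson_r(first: Sequence[float], second: Sequence[float]) -> float:
    """Pearson correlation between two administrations of the same scale."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length sequences of at least 2 scores")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / math.sqrt(ss_1 * ss_2)


# Hypothetical total scores for five patients, interviewed twice.
time_1 = [3.2, 4.1, 2.8, 3.9, 4.5]
time_2 = [3.4, 4.0, 2.9, 3.7, 4.6]
print(round(pearson_r(time_1, time_2), 2))
```

A high r shows that patients keep their rank order between sessions, not that their scores are identical: adding a constant to every second-session score leaves r unchanged. This is why intraclass correlation coefficients are often preferred when absolute agreement matters.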

Inter-rater:
The SS-QOL was also administered by a trained interviewer to 24 stroke survivors and then a second trained interviewer re-administered the SS-QOL within 2 hours of the first interview. SS-QOL scores were highly correlated (r = 0.92), demonstrating excellent inter-rater reliability of the SS-QOL.

Validity

Criterion:
Predictive:
Williams et al. (1999b) administered the SS-QOL to a total of 71 patients 1 month after ischemic stroke. In multivariate analysis, the SS-QOL summary score significantly predicted overall post-stroke health-related quality of life (HRQOL) (OR = 2.97). At the domain level, however, only the Family Roles domain differed significantly between patients with better and worse overall HRQOL, with higher scores among those with better overall HRQOL.

Construct:
Convergent:
Williams et al. (1999a) examined the validity of the SS-QOL in 34 stroke survivors and reported that most domains of the SS-QOL correlated with the Barthel Index, the Beck Depression Inventory, and subscales of the SF-36. The Energy, Family Roles, Mobility and Work/Productivity domains were significantly associated with the corresponding SF-36 subscales. The total SS-QOL score showed an excellent correlation with the overall SF-36 health status rating (r = 0.65). The Self-Care domain was adequately correlated with the Barthel Index (r = 0.45). The Upper Extremity Function domain showed a positive but poor relationship with the Barthel Index and the National Institutes of Health Stroke Scale Upper Extremity score (r = 0.18).

However, in this study, a few domains did not show a significant relationship with their corresponding measures. Scores in the Language and Thinking domains were not associated with selected items from the National Institutes of Health Stroke Scale (r = 0.00 and r = 0.10, respectively). This was most likely because patients with language and cognitive deficits were excluded, i.e., no patients scored > 1 on these items. Furthermore, the SS-QOL Social Roles domain was not associated with the SF-36 Social Functioning subscale score (r = 0.01). Finally, the Vision domain of the SS-QOL did not correlate with the National Institutes of Health Stroke Scale Visual Field and Ocular Movement scores (r = 0.11).

Responsiveness

Williams et al. (1999a) examined standardized effect sizes for the interval between 1 and 3 months post-stroke in 34 individuals with stroke. Effect sizes ranged from small (ES = 0.20 for the Personality domain) to large (ES = 0.83 for the Social Roles domain). Half of the SS-QOL domains demonstrated less than moderate effect sizes, and the ‘amount of help’ response set appeared to lack responsiveness. These results indicate that the SS-QOL has only adequate responsiveness.
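A standardized effect size expresses the mean change between two assessments in standard-deviation units. The sketch below assumes one common definition (mean change divided by the standard deviation of baseline scores) and Cohen's conventional benchmarks of 0.2, 0.5 and 0.8 for small, moderate and large effects; whether Williams et al. (1999a) and Muus et al. (2011) used exactly this denominator is an assumption. The scores are hypothetical.

```python
import statistics
from typing import Sequence


def standardized_effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean within-patient change divided by the SD of baseline scores."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired baseline and follow-up scores for at least 2 patients")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    sd_baseline = statistics.stdev(baseline)  # sample standard deviation
    if sd_baseline == 0:
        raise ValueError("baseline scores have zero variance")
    return statistics.mean(changes) / sd_baseline


def describe_magnitude(effect_size: float) -> str:
    """Label the absolute effect size with Cohen's conventional cut-points."""
    magnitude = abs(effect_size)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "below small"


# Hypothetical domain scores at 1 and 3 months post-stroke.
one_month = [2.0, 3.0, 4.0]
three_months = [3.0, 4.0, 5.0]
es = standardized_effect_size(one_month, three_months)
print(es, describe_magnitude(es))  # 1.0 large
```

Classifying by absolute value means a negative effect size, such as the -0.53 reported for the Social Roles domain of the Danish version, is still labelled moderate; the sign only indicates the direction of change.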

Muus et al. (2011) investigated the responsiveness of the Danish version of the SS-QOL (SS-QOL-DK). Patients were assessed at 3 and 12 months after stroke. Standardized effect sizes were small for all domains (ES = -0.03 to 0.40) except the Social Roles domain, which showed a moderate standardized effect size (ES = -0.53).

Lin, Fu, Wu & Hsieh (2011) examined the minimal clinically important difference (MCID) of the Mobility, Self-Care and Upper Extremity Function subscales of the SS-QOL. The study included 74 patients with stroke receiving rehabilitation, and the SS-QOL was administered at baseline and at 3 weeks. The MCID ranges for the Mobility, Self-Care and Upper Extremity Function subscales were 1.5 to 2.4, 1.2 to 1.9 and 1.2 to 1.8, respectively. These results indicate that the mean change in score on these subscales should reach at least 1.5, 1.2 and 1.2, respectively, for the change to be interpreted as clinically meaningful.
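The decision rule from Lin et al. (2011) can be written as a threshold check. The sketch uses the lower bound of each reported MCID range; the subscale keys are hypothetical identifiers, and it assumes that a positive change means improvement (higher SS-QOL scores indicating better function).

```python
# Lower bound of each MCID range reported by Lin et al. (2011).
# Keys are illustrative identifiers, not official SS-QOL labels.
MCID_LOWER_BOUND = {
    "mobility": 1.5,
    "self_care": 1.2,
    "upper_extremity": 1.2,
}


def is_clinically_meaningful(subscale: str, mean_change: float) -> bool:
    """Return True if the mean change reaches the subscale's MCID lower bound.

    Assumes positive change = improvement (higher SS-QOL scores = better function).
    """
    try:
        threshold = MCID_LOWER_BOUND[subscale]
    except KeyError:
        raise ValueError(f"no MCID reported for subscale {subscale!r}") from None
    return mean_change >= threshold


print(is_clinically_meaningful("mobility", 1.6))   # True
print(is_clinically_meaningful("self_care", 1.0))  # False
```

Using the lower bound is the least conservative reading of the ranges; a stricter analysis could use the upper bounds (2.4, 1.9 and 1.8) instead.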

References

Czechowsky, D., & Hill, M. D. (2002). Neurological outcome and quality of life after stroke due to vertebral artery dissection. Cerebrovascular Diseases, 13, 192-197.
Duncan, P. W., Lai, S. M., Tyler, D., Perera, S., Reker, D. M., & Studenski, S. (2002). Evaluation of proxy responses to the Stroke Impact Scale. Stroke, 33, 2593-2599.
Ewert, T., & Stucki, G. (2007). Validity of the SS-QOL in Germany and in survivors of hemorrhagic or ischemic stroke. Neurorehabilitation and Neural Repair, 21, 161-168.
Hilari, K., Byng, S., Lamping, D. L., & Smith, S. C. (2003). Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39): Evaluation of acceptability, reliability, and validity. Stroke, 34, 1944-1950.
Lin, K-C., Fu, T., Wu, C-Y., & Hsieh, C-J. (2011). Assessing the stroke-specific quality of life for outcomes measurement in stroke rehabilitation: Minimal detectable change and clinically important difference. Health and Quality of Life Outcomes, 9, 5. Retrieved April 25, 2012 from Sage Journals database.
Muus, I., Christensen, D., Petzold, M., Harder, I., Johnsen, S. P., Kirkevold, M., & Ringsberg, K. C. (2011). Responsiveness and sensitivity of the Stroke Specific Quality of Life Danish version. Disability and Rehabilitation, 33(25-26), 2425-2433.
Muus, I., Petzold, M., & Ringsberg, K. C. (2009). Health-related quality of life after stroke: Reliability of proxy responses. Clinical Nursing Research, 18(2), 103-118.
Muus, I., & Ringsberg, K. C. (2005). Stroke Specific Quality of Life Scale: Danish adaptation and a pilot study for testing psychometric properties. Scandinavian Journal of Caring Sciences, 19, 140-147.
Muus, I., Williams, L. S., & Ringsberg, K. C. (2007). Validation of the Stroke Specific Quality of Life Scale (SS-QOL): Test of reliability and validity of the Danish version (SS-QOL-DK). Clinical Rehabilitation, 21, 620-627.
Snow, A. L., Cook, K. F., Lin, P. S., Morgan, R. O., & Magaziner, J. (2005). Proxies and other external raters: Methodological considerations. Health Services Research, 40(5), 1676-1693.
Williams, L. S., Weinberger, M., Harris, L. E., Clark, D. O., & Biller, J. (1999a). Development of a stroke-specific quality of life scale. Stroke, 30(7), 1362-1369.
Williams, L. S., Weinberger, M., Harris, L. E., & Biller, J. (1999b). Measuring quality of life in a way that is meaningful to stroke patients. Neurology, 53, 1839-1843.
Williams, L. S., Redmon, G., Saul, D. C., & Weinberger, M. (2000). Reliability and telephone validity of the Stroke-Specific Quality of Life (SS-QOL) scale. Stroke, 32, 339-b.
Williams, L. S., Bakas, T., Brizendine, E., Plue, L., Tu, W., Hendrie, H., & Kroenke, K. (2006). How valid are family proxy assessments of stroke patients’ health-related quality of life? Stroke, 37, 2081-2085.

See the measure

Please click here for a copy of the Stroke-Specific-Quality-of-Life-Scale (SS-QOL).

Stroke-Adapted Sickness Impact Profile (SA-SIP30)

Evidence Reviewed as of before: 19-08-2008

Author(s)*: Lisa Zeltzer, MSc OT

Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

In-Depth Review

Purpose of the measure

The Stroke-Adapted Sickness Impact Profile (SA-SIP30; van Straten, de Haan, Limburg, Schuling, Bossuyt, & van den Bos, 1997) was developed from the original 136-item Sickness Impact Profile (SIP-136) and assesses quality of life in patients who have had a stroke. The scale was developed specifically for stroke outcome research to overcome the main drawback of the SIP-136: its length (Finch, Brooks, Stratford, & Mayo, 2002).

Available versions

The SA-SIP30 was adapted from the original SIP-136, first published in 1976 by Bergner, Bobbitt, Pollard, Martin, and Gilson and revised in 1981 by Bergner, Bobbitt, Carter, and Gilson.

Features of the measure

Items:

van Straten et al. (1997) followed a three-stage process to eliminate from the original SIP the items and subscales that were least relevant to stroke survivors (i.e., those applying to fewer than 10% of patients), as well as those with the lowest reliability (van Straten et al., 1997; Golomb, Vickrey, & Hays, 2001).

A criticism of the SA-SIP30 is that no attempt was made to supplement the scale with items or domains of potential importance to stroke. Thus, the SA-SIP30 does not assess pain, recreation, energy, general health perceptions, overall quality of life or stroke symptoms (Golomb, Vickrey, & Hays, 2001).

The SA-SIP30 contains 30 items. Each item is a statement describing a change in behavior that reflects the impact of illness on some aspect of daily life. Patients mark the items that best describe them on a given day, and all responses are “yes” or “no”. Items are weighted to reflect their relative importance to health status, using the same weights as the SIP-136. Because the SA-SIP30 retains much of the subscale structure of the SIP-136 and uses the same weights, its results can be compared with those of studies that used the original SIP-136.

Scoring:

Items, subscales, dimensions and the total score are scored in the same way as in the original SIP. The weights of the marked items in each subscale are summed and expressed as a percentage of the maximum possible weighted score for that subscale, giving a score from 0 to 100%. Higher scores indicate worse health outcomes (van Straten et al., 1997; van Straten, de Haan, Limburg, & van den Bos, 2000; Finch et al., 2002; Cup, Scholte op Reimer, Thijssen, & van Kuyk-Minis, 2003). Regression weights have also been provided to allow SIP-136 scores to be estimated from SA-SIP30 scores.
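The percentage scoring can be sketched as follows. The item names and weights are placeholders, not the published SIP-136 weights, which must be taken from the official scoring materials. The same ratio can be computed over the pooled items of a dimension or of the whole scale, which is how the original SIP derives its dimension and total scores.

```python
from typing import Iterable, Mapping


def weighted_percent_score(weights: Mapping[str, float], marked: Iterable[str]) -> float:
    """Sum the weights of marked items as a percentage of the maximum possible sum.

    `weights` maps each item in the subscale (or dimension, or whole scale) to its
    weight; `marked` lists the items the patient endorsed ("yes").
    Higher results indicate worse health.
    """
    marked_set = set(marked)
    unknown = marked_set - set(weights)
    if unknown:
        raise ValueError(f"marked items not in this subscale: {sorted(unknown)}")
    max_possible = sum(weights.values())
    if max_possible <= 0:
        raise ValueError("weights must sum to a positive number")
    return 100.0 * sum(weights[item] for item in marked_set) / max_possible


# Placeholder weights for a hypothetical three-item subscale.
example_weights = {"item_1": 50.0, "item_2": 30.0, "item_3": 20.0}
print(weighted_percent_score(example_weights, ["item_1", "item_3"]))  # 70.0
```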

Cut-off scores representing poor health have been defined as follows: patients with total scores > 33 were found to be impaired in activities of daily living and unable to live independently, and to have difficulty with self-care, mobility and performing their main activity. Similar profiles were observed for Physical dimension scores > 40, but no cut-off value could be defined for the Psychosocial dimension (van Straten et al., 2000).
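The cut-offs reported by van Straten et al. (2000) translate into a simple screening rule. This is a sketch only: the function name and output format are hypothetical, scores are assumed to be on the 0 to 100 scale described above, and no psychosocial cut-off is applied because none could be defined.

```python
TOTAL_CUTOFF = 33.0     # total score above which poor health was observed
PHYSICAL_CUTOFF = 40.0  # Physical dimension score with a similar profile


def flag_poor_health(total: float, physical: float) -> dict[str, bool]:
    """Flag SA-SIP30 scores above the published cut-offs (higher = worse)."""
    for name, value in (("total", total), ("physical", physical)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} score must be between 0 and 100, got {value}")
    return {
        "total_above_cutoff": total > TOTAL_CUTOFF,
        "physical_above_cutoff": physical > PHYSICAL_CUTOFF,
    }


print(flag_poor_health(total=35.0, physical=38.5))
# {'total_above_cutoff': True, 'physical_above_cutoff': False}
```

Because the rule uses a strict inequality, a total score of exactly 33 is not flagged, matching the "> 33" wording.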

Subscales:

There are 8 subscales:

Body Care and Movement (5 items)
Social Interaction (5 items)
Mobility (3 items)
Communication (3 items)
Emotional Behavior (4 items)
Household Management (4 items)
Alertness Behavior (3 items)
Ambulation (3 items)

Subscales can be combined to form 2 dimensions:

Physical: includes the subscales Body care and movement, Ambulation, Household management and Mobility (15 items)
Psychosocial: includes the subscales Alertness behavior, Communication, Social interaction and Emotional behavior (15 items)
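The subscale and dimension structure can be captured in a small lookup table, which also makes it easy to check that the item counts add up (15 items per dimension, 30 in total). The dictionary and function names are hypothetical; the counts come from the list of subscales.

```python
# Item counts per subscale, as listed for the SA-SIP30.
SUBSCALE_ITEMS = {
    "Body Care and Movement": 5,
    "Social Interaction": 5,
    "Mobility": 3,
    "Communication": 3,
    "Emotional Behavior": 4,
    "Household Management": 4,
    "Alertness Behavior": 3,
    "Ambulation": 3,
}

# Subscales grouped into the two dimensions.
DIMENSIONS = {
    "Physical": ("Body Care and Movement", "Ambulation", "Household Management", "Mobility"),
    "Psychosocial": ("Alertness Behavior", "Communication", "Social Interaction", "Emotional Behavior"),
}


def items_per_dimension() -> dict[str, int]:
    """Total number of items in each dimension."""
    return {dim: sum(SUBSCALE_ITEMS[s] for s in subs) for dim, subs in DIMENSIONS.items()}


def check_structure() -> None:
    """Verify every subscale belongs to exactly one dimension and counts total 30."""
    assigned = [s for subs in DIMENSIONS.values() for s in subs]
    if sorted(assigned) != sorted(SUBSCALE_ITEMS):
        raise AssertionError("each subscale must appear in exactly one dimension")
    if sum(SUBSCALE_ITEMS.values()) != 30:
        raise AssertionError("the SA-SIP30 should contain 30 items")


check_structure()
print(items_per_dimension())  # {'Physical': 15, 'Psychosocial': 15}
```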

Equipment:

No special equipment is required to administer the SA-SIP30.

Training:

The scale can be self-administered or administered by interview (Buck, Jacoby, Massey, & Ford, 2000). No special training is necessary; however, a user’s manual and a trainer’s manual are available for the original SIP (McDowell & Newell, 1996). There is not yet any evidence that the SA-SIP30 can be administered by proxy, although the original SIP-136 can be used in this way (Sneeuw, Aaronson, de Haan, & Limburg, 1997).

Time:

The average completion time has not been reported; however, the SA-SIP30 is shorter than the original SIP, which takes 30 minutes on average to administer.

Alternative forms of the SA-SIP30

None.

Client suitability

Can be used with:

Patients with stroke.

Should not be used in:

The SA-SIP30 should be administered with caution to patients who have had a severe stroke. van Straten et al. (1997) noted that the SA-SIP30 might be less effective for patients with severe stroke because most of the higher-weighted items, which described more severe health status, were removed when the SA-SIP30 was developed. Evidence of this came from the observation that agreement between scores obtained with the original SIP-136 and the SA-SIP30 was lower among more severely affected patients with stroke than among healthier patients (van Straten et al., 1997). However, in a subsequent study by van de Port et al. (2004), this trend was observed only on the Physical dimension of the SA-SIP30, and even then it was less pronounced than on the SIP-68 (a short version of the original SIP-136).
The SA-SIP30 should be administered with caution to patients who have a major physical disability. van Straten et al. (2000) found that total SA-SIP30 scores were largely explained by the Physical dimension (66% for the Physical dimension subscales versus 25% for the Psychosocial dimension subscales). As a result, any patient with a serious physical disability may automatically be classified by the scale as having poor health-related quality of life.
Patients who require a proxy to complete the scale. Although the original SIP has been validated for proxy use, proxy use has not been examined for the SA-SIP30. For patients with stroke who require a proxy, the Stroke Impact Scale is known to be a reliable and valid measure of quality of life (Duncan, Lai, Tyler, Perera, Reker, & Studenski, 2002).
Patients with aphasia. The SA-SIP30 has not been validated for use in patients with aphasia. A French questionnaire, the SIP-65, has been validated to assess quality of life in patients with aphasia; however, this scale is not available in English (Benaim et al., 2003). The Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39) is another quality-of-life measure, developed specifically for use in patients with aphasia. This scale has been found to be an acceptable, reliable, and valid measure in patients with long-term aphasia (Hilari, Byng, Lamping, & Smith, 2003).

In what languages is the measure available?

English (van Straten et al., 1997)

Summary

What does the tool measure?	Health-related quality of life
What types of clients can the tool be used for?	The SA-SIP30 was developed for use in patients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	The average completion time has not been reported; however, the SA-SIP30 is shorter than the original SIP, which takes 30 minutes on average to administer.
Versions	The SA-SIP30 was adapted from the original SIP-136
Other Languages	No translations of the SA-SIP30 have been conducted to date.
Measurement Properties
Reliability	Internal consistency: Two studies examined the internal consistency of the SA-SIP30, and both reported excellent internal consistency for the total score. Test-retest: No studies have examined the test-retest reliability of the SA-SIP30. Inter-rater: No studies have examined the inter-rater reliability of the SA-SIP30.
Validity	Content: Items least relevant to patients with stroke were eliminated. Items with a skewed response pattern or relevant to < 10% of patients were dropped. Linear regression was used to assess the relevance of the remaining items. Item selection for each subscale was completed when the items in the model explained 80% of the variance in the original total subscale score. The least relevant subscales were excluded using a stepwise linear regression with forward inclusion; when adding another subscale to the model did not increase the explained variance by more than 1%, the process was stopped. Unreliable items were excluded, as long as at least 3 items remained in each subscale. Construct: Convergent: Excellent correlations were found between the SA-SIP30 and the SIP-136 total score and subscales; the SIP-68 (a shortened version of the SIP-136); and the global functional health score on the Rankin Scale. Adequate correlations were found with the disability score on the Barthel Index; the total Rankin Scale; the EuroQol; and the Frenchay Activities Index. Discriminant: Poor correlation between the SA-SIP30 and the Canadian Occupational Performance Measure. Known groups: The SA-SIP30 was able to distinguish clients with lacunar infarctions from those with cortical or subcortical lesions. One study reported that, using appropriate cut-off scores, the SA-SIP30 could identify patients who were dependent in their activities of daily living, patients able to live independently, and patients with poor health-related quality of life.
Floor/Ceiling Effects	None.
Does the tool detect change in patients?	One study found that the SA-SIP30 had only a moderate ability to detect change in patients from 6 months to 1 year post-stroke.
Acceptability	The SA-SIP30 is shorter and simpler than the original SIP-136. The original SIP has been tested for use with proxy respondents; however, the SA-SIP30 has not yet been tested for use with proxy respondents. The SA-SIP30 should not be administered to patients with aphasia, and should be used with caution in patients with a major physical disability or who have had a severe stroke.
Feasibility	This shorter, simpler version of the SIP should impose less administrative burden and can be more easily included in both research and clinical settings. The scale is intended to be self-administered or administered by interview. No special training is necessary. A user’s manual and trainer’s manual are available for the original SIP only. The SA-SIP30 is fairly simple to score: weights are applied to the marked items, summed for each subscale, and expressed as a percentage for each subscale ranging from 0 to 100%. Higher scores indicate less desirable health outcomes.
How to obtain the tool?	A copy of the SA-SIP30 can be found in van Straten et al. (1997).
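The scoring rule summarized in the Feasibility row can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published scoring procedure: the item names and weights below are hypothetical placeholders (the actual SA-SIP30 weights are given in van Straten et al., 1997), and the subscale percentage is assumed to be the sum of the weights of the marked items divided by the sum of all item weights in that subscale.

```python
# Hypothetical weights for the items of ONE subscale. These are placeholders,
# not the published SA-SIP30 weights (see van Straten et al., 1997).
HYPOTHETICAL_WEIGHTS = {
    "item_a": 88.0,
    "item_b": 62.5,
    "item_c": 45.0,
}


def subscale_percent(marked: set, weights: dict) -> float:
    """Assumed scoring rule: sum the weights of the marked items and express
    that sum as a percentage of the maximum (every item marked).

    Returns a value from 0 (no item marked) to 100 (all items marked);
    higher scores indicate less desirable health outcomes.
    """
    unknown = set(marked) - set(weights)
    if unknown:
        raise ValueError(f"unknown items: {sorted(unknown)}")
    maximum = sum(weights.values())
    endorsed = sum(weights[item] for item in marked)
    return 100.0 * endorsed / maximum


# A respondent who marked item_a and item_c on this hypothetical subscale.
print(round(subscale_percent({"item_a", "item_c"}, HYPOTHETICAL_WEIGHTS), 1))  # 68.0
```

Under the same assumption, a total score would apply the same calculation to all 30 item weights rather than to one subscale's items.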

Psychometric Properties

Overview

To date, only a few studies have examined the psychometric properties of the Stroke-Adapted Sickness Impact Profile (SA-SIP30). For this reason, we have included for review all of the publications that we could identify on the scale. The SA-SIP30 was originally validated by its authors (van Straten et al., 1997; van Straten et al., 2000) and was later evaluated by van de Port et al. (2004).

Reliability

Internal consistency:
van Straten et al. (1997) developed the SA-SIP30 and examined its reliability in 319 patients post-stroke. The total SA-SIP30 (alpha = 0.85) and the Physical dimension (alpha = 0.82) demonstrated excellent internal consistency, and the Psychosocial dimension demonstrated adequate internal consistency (alpha = 0.78). All subscales had adequate internal consistency with the exception of the Emotional Behavior (alpha = 0.57) and Ambulation (alpha = 0.54) subscales, which were poor. With the exception of the Communication subscale, the internal consistency of the SIP-136 was found to be slightly higher than that of the SA-SIP30 on all subscales.

van de Port, Ketelaar, Schepers, van den Bos, and Lindeman (2004) also examined the internal consistency of the SA-SIP30 in 122 patients with stroke and found excellent reliability for the total score (alpha = 0.82) and adequate reliability for the Physical dimension (alpha = 0.76). However, unlike the results of van Straten et al. (1997), the internal consistency of the Psychosocial dimension was found to be poor (alpha = 0.68).
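The alpha values reported above are Cronbach's alpha coefficients. As a rough illustration of how such a coefficient is computed from a respondents-by-items matrix, the Python sketch below applies the standard formula; the five-respondent, three-item response matrix is hypothetical and is not drawn from either study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix (list of rows).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need a rectangular matrix with at least two items")
    item_columns = list(zip(*responses))          # one tuple per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical yes/no (1/0) answers: 5 respondents x 3 items.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(cronbach_alpha(answers), 2))  # 0.79
```

Because alpha tends to rise with the number of items, a shortened scale may show somewhat lower values than its longer parent, which is consistent with the slightly higher SIP-136 values noted above.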

Inter-rater:
Not reported.

Test-retest:
Not reported.

Validity

Criterion:

None.

Content:

van Straten et al. (1997) eliminated the items of the SIP-136 least relevant to patients with stroke. Items that had a skewed response pattern were dropped, as were items relevant to less than 10% of all patients. Linear regression with a forward selection strategy was used to assess the relevance of the remaining items, using the F statistic with p = 0.5 as the criterion level for selection. Item selection for each subscale was completed when the items in the regression model explained 80% of the variance in the original total subscale score. The least relevant subscales were then excluded by applying a stepwise linear regression with forward inclusion to explain the variation of the original total SIP score with the shortened subscales; when adding another subscale to the model did not increase the explained variance by more than 1%, the process was stopped. Finally, unreliable items were excluded, while ensuring that at least three items remained in each subscale.
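The two stopping rules described above (an 80% explained-variance target for item selection and a 1% minimum gain for subscale selection) can be sketched as a greedy forward selection in plain Python. This is an illustrative sketch, not van Straten et al.'s procedure or code: it ranks candidates by the gain in R² rather than by the F statistic used in the original item-selection step, and the function names and toy data are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Optional, Tuple


def r_squared(columns: List[List[float]], y: List[float]) -> float:
    """R^2 of an ordinary least-squares fit of y on the predictor columns
    (an intercept is always included)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        if abs(A[c][c]) < 1e-9:
            raise ValueError("predictors are collinear")
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(candidates: Dict[str, List[float]], y: List[float],
                   target_r2: Optional[float] = None,
                   min_gain: Optional[float] = None) -> Tuple[List[str], float]:
    """Greedy forward selection: repeatedly add the candidate that raises R^2
    the most. Stop when R^2 reaches target_r2 (e.g. 0.80, the item rule) or
    when the best addition gains no more than min_gain (e.g. 0.01, the
    subscale rule)."""
    chosen: List[str] = []
    current = 0.0
    while len(chosen) < len(candidates):
        if target_r2 is not None and current >= target_r2:
            break
        best_name, best_r2 = None, float("-inf")
        for name, column in candidates.items():
            if name in chosen:
                continue
            try:
                r2 = r_squared([candidates[c] for c in chosen] + [column], y)
            except ValueError:  # skip candidates collinear with those chosen
                continue
            if r2 > best_r2:
                best_name, best_r2 = name, r2
        if best_name is None:
            break
        if min_gain is not None and best_r2 - current <= min_gain:
            break
        chosen.append(best_name)
        current = best_r2
    return chosen, current


# Hypothetical data: the outcome depends only on x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2.0 * v + 1.0 for v in x1]
chosen, r2 = forward_select({"x1": x1, "x2": x2}, y, min_gain=0.01)
print(chosen, round(r2, 3))  # ['x1'] 1.0
```

Passing `target_r2=0.80` mimics the item-level rule, while `min_gain=0.01` mimics the subscale-level rule; a statistics package would normally be used for the regression itself.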

Construct:

A principal component analysis supported two dimensions (Physical and Psychosocial), indicating that the original dimension structure of the SIP-136 was retained in the SA-SIP30 (van Straten et al., 1997). Of the variance in SA-SIP30 scores, 20% could be attributed to the Physical dimension and 11% to the Psychosocial dimension (van Straten et al., 1997).

Convergent:
van Straten et al. (1997) examined the convergent validity of the scale by comparing the scores of the SA-SIP30 with the scores on the 136-item version in 319 patients post-stroke. The SA-SIP30 total score explained 91% of the variance in SIP-136 scores. Furthermore, the SA-SIP30 explained 87% of the variance in the original Physical dimension scores and 88% of the variance in the Psychosocial dimension scores. For the different subscales, the percentages of explained variance ranged from 69% (Social Interaction) to 84% (Emotional Behavior). The Spearman rank correlation coefficient between the SA-SIP30 and the SIP-136 total scores was excellent (r = 0.96). Subscale correlations were also excellent, ranging from r = 0.75 (Emotional Behavior) to r = 0.90 (Body Care and Movement).
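The Spearman rank correlation used above is the Pearson correlation computed on ranks, with tied scores given the average of the ranks they span. A minimal plain-Python sketch follows; the helper names and example scores are hypothetical, and in practice a statistics package would normally be used.

```python
import math
from statistics import fmean


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired total scores (NOT study data): short form vs long form.
short_form = [10.0, 22.5, 22.5, 40.0, 55.0]
long_form = [12.0, 20.0, 25.0, 38.0, 61.0]
print(round(spearman(short_form, long_form), 3))  # 0.975
```

Working on ranks makes the coefficient depend only on the ordering of scores, so it does not assume a linear relationship between the two versions of the scale.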

Also in this study by van Straten et al., the SA-SIP30 was correlated with the Barthel Index and the Rankin Scale. As expected, the SA-SIP30 correlated adequately with the disability score on the Barthel Index (r = 0.50) and had an excellent correlation with the global functional health score on the Rankin Scale (r = 0.68), further demonstrating the convergent validity of the SA-SIP30.

van de Port, Ketelaar, Schepers, van den Bos, and Lindeman (2004) examined the convergent validity of the SA-SIP30 in 122 patients with stroke. The correlation between the SA-SIP30 and total SIP-68 (a shortened version of the SIP-136) scores was excellent (r = 0.98). Similar associations were reported for the Physical (r = 0.89) and Psychosocial (r = 0.84) dimension scores.

Cup et al. (2003) found that the SA-SIP30 correlated adequately with the Barthel Index (r = -0.517), the Rankin Scale (r = 0.468), the EuroQol (r = -0.483), and the Frenchay Activities Index (r = -0.426). The correlations between the SA-SIP30 and the EuroQol, Barthel Index, and Frenchay Activities Index are negative because a high score on the SA-SIP30 indicates poor health outcomes, whereas a high score on these other scales indicates positive health outcomes. The correlation with the Rankin Scale is positive because, as on the SA-SIP30, higher Rankin scores indicate greater disability. The results of this study demonstrate the convergent validity of the SA-SIP30 with other frequently used standardized functional measures in stroke.
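The sign reversal described above is a general property: reversing the direction of one scale (subtracting each score from a constant) leaves the strength of a Pearson correlation unchanged and flips its sign. The sketch below illustrates this with hypothetical scores for five patients, assuming a 0 to 100 Barthel Index range for the reflection; the values are invented for illustration and are not study data.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five patients (NOT study data).
sa_sip30 = [12.0, 25.0, 31.0, 48.0, 60.0]  # higher = poorer health outcome
barthel = [95.0, 90.0, 70.0, 65.0, 40.0]   # higher = greater independence

r = pearson(sa_sip30, barthel)
# Reflect the Barthel scores (assumed 0-100 range) so higher = more dependent.
barthel_reversed = [100.0 - b for b in barthel]
r_reversed = pearson(sa_sip30, barthel_reversed)

print(r < 0, math.isclose(r_reversed, -r))  # True True
```

This is why the magnitude of a correlation, not its sign, is what indicates how strongly the SA-SIP30 relates to a measure scored in the opposite direction.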

van Straten et al. (2000) conducted a linear regression analysis and found that common measures of physical disability were closely associated with SA-SIP30 scores: the Barthel Index accounted for 36% of the variance in total SA-SIP30 scores, the Rankin Scale for 53%, and the EuroQol index score for 44%. These results further confirm the convergent validity of the SA-SIP30 with other frequently used standardized functional measures in stroke.
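In a simple (single-predictor) linear regression, the proportion of variance explained (R²) equals the square of the Pearson correlation, so the percentages above can be converted to an implied correlation magnitude. The sketch below assumes that each measure was entered as the sole predictor in its own model, which the summary above does not state; if the models included covariates, this conversion does not hold. The function name is illustrative only.

```python
import math

# Proportion of variance in total SA-SIP30 scores explained by each measure,
# as reported by van Straten et al. (2000).
variance_explained = {
    "Barthel Index": 0.36,
    "Rankin Scale": 0.53,
    "EuroQol index": 0.44,
}


def implied_abs_r(r_squared: float) -> float:
    """Return |r| implied by R^2; valid only for a single-predictor linear model."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    return math.sqrt(r_squared)


for measure, r2 in variance_explained.items():
    print(f"{measure}: R^2 = {r2:.2f} -> |r| = {implied_abs_r(r2):.2f}")
```

Under that assumption, the implied correlation magnitudes are about 0.60, 0.73 and 0.66 for the Barthel Index, Rankin Scale and EuroQol, respectively; the sign follows the scale directions described above.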

Discriminant:
Cup et al. (2003) examined the discriminant validity of the Canadian Occupational Performance Measure in 26 patients with stroke. As predicted, the correlation between scores on the Canadian Occupational Performance Measure and the SA-SIP30 was poor (r = 0.102). This was expected because the Canadian Occupational Performance Measure was developed to examine issues specific to the individual, whereas the SA-SIP30 focuses on a societal perspective of independence.

Known groups:
van Straten et al. (1997) found that the SA-SIP30 was unable to distinguish between clients with supratentorial and infratentorial strokes, a distinction that had been possible with the SIP-136 (de Haan, Limburg, & van der Meulen, 1995). However, the SA-SIP30 was able to distinguish clients with lacunar infarcts from those with cortical or subcortical lesions. Clients with lacunar infarcts reported better functional health than those with cortical or subcortical lesions on the Psychosocial dimension of the scale, on the total SA-SIP30 score, and on all subscales except Emotional Behavior and Mobility.

van Straten et al. (2000) identified cut-off scores for poor health outcomes using receiver operating characteristic (ROC) curves and the area under the curve (AUC). Using a cut-off total SA-SIP30 score > 28, the percentage of patients correctly classified as dependent in activities of daily living, as assessed using the Barthel Index, was adequate, 77% (AUC = 0.84). Using a cut-off > 40 for the Physical dimension alone, the percentage of patients correctly classified as dependent in activities of daily living was excellent, 84% (AUC = 0.90). Using a cut-off total score > 25, the percentage of patients correctly classified as unable to live independently, as measured by the Rankin Scale, was excellent, 80% (AUC = 0.90). Using a cut-off > 36 for the Physical dimension alone, the percentage of patients correctly classified was excellent, 83% (AUC = 0.90). Using a cut-off total score > 33, the percentage of patients correctly classified as having poor health-related quality of life, as assessed by the EuroQol, was adequate, 80% (AUC = 0.80). Using a cut-off > 40 for the Physical dimension alone, the percentage of patients correctly classified was also adequate, 79% (AUC = 0.86).
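To illustrate how a cut-off score, the percentage correctly classified, and the AUC relate, the sketch below applies the total-score cut-offs reported by van Straten et al. (2000) to a small set of hypothetical scores with hypothetical reference-standard labels. The AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the probability that a randomly chosen patient with the poor outcome scores higher than a randomly chosen patient without it, counting ties as one half. The data, dictionary keys and function names are illustrative assumptions, not part of the published analysis.

```python
from typing import Sequence

# Total-score cut-offs reported by van Straten et al. (2000); a score above the
# cut-off is classified as a poor outcome on the reference measure.
TOTAL_CUTOFFS = {
    "ADL dependent (Barthel Index)": 28,
    "unable to live independently (Rankin Scale)": 25,
    "poor HRQOL (EuroQol)": 33,
}


def percent_correct(scores: Sequence[float], poor: Sequence[bool], cutoff: float) -> float:
    """Percentage of patients whose cut-off classification matches the reference label."""
    if len(scores) != len(poor) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = sum((s > cutoff) == p for s, p in zip(scores, poor))
    return 100.0 * hits / len(scores)


def auc(scores: Sequence[float], poor: Sequence[bool]) -> float:
    """Rank-based AUC: P(poor-outcome patient outscores a patient without it), ties = 0.5."""
    pos = [s for s, p in zip(scores, poor) if p]
    neg = [s for s, p in zip(scores, poor) if not p]
    if not pos or not neg:
        raise ValueError("need at least one patient in each group")
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical total SA-SIP30 scores and Barthel-based dependency labels.
scores = [12, 20, 31, 45, 27, 38, 15, 52]
dependent = [False, False, True, True, False, True, True, True]

cutoff = TOTAL_CUTOFFS["ADL dependent (Barthel Index)"]
print(f"correctly classified: {percent_correct(scores, dependent, cutoff):.1f}%")
print(f"AUC: {auc(scores, dependent):.2f}")
```

Note that the percentage correctly classified depends on the chosen cut-off, whereas the AUC summarizes discrimination across all possible cut-offs; this is why the two figures are reported separately above.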

Responsiveness

van de Port et al. (2004) found that the SA-SIP30 demonstrated moderate responsiveness in a longitudinal study. Effect sizes from 6 months to 1 year post-stroke were 0.60 for the total SA-SIP30 score, and 0.56 and 0.65 for the Physical and Psychosocial dimensions, respectively.
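Effect sizes of this kind are commonly computed as the mean change in scores divided by the standard deviation of the baseline scores, and values of about 0.5 are conventionally read as moderate under Cohen's benchmarks (0.2 small, 0.5 moderate, 0.8 large). The summary above does not state which formula van de Port et al. (2004) used, so the sketch below is an assumption, applied to hypothetical scores. Because a high SA-SIP30 score indicates poor health, change is taken as baseline minus follow-up so that improvement yields a positive effect size.

```python
import statistics
from typing import Sequence


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean improvement divided by the sample SD of baseline scores.

    Change is baseline minus follow-up because lower SA-SIP30 scores mean better health.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired observations")
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(baseline)


def cohen_label(es: float) -> str:
    """Label an effect size using Cohen's conventional thresholds."""
    magnitude = abs(es)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "trivial"


# Hypothetical total SA-SIP30 scores at 6 months and 1 year post-stroke.
six_months = [30, 22, 41, 18, 35, 27]
one_year = [24, 19, 33, 16, 30, 24]

es = effect_size(six_months, one_year)
print(f"effect size = {es:.2f} ({cohen_label(es)})")
```

Other responsiveness indices (for example, the standardized response mean, which divides by the SD of the change scores instead) can give different values for the same data, so effect sizes are best compared only when computed the same way.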

References

Benaim, C., Pelissier, J., Petiot, S., Bareil, M., Ferrat, E., Royer, E., Milhau, D., Herisson, C. (2003). A French questionnaire to assess quality of life of the aphasic patient: The SIP-65. [French]. Ann Readapt Med Phys, 46(1), 2-11.
Bergner, M., Bobbitt, R. A., Pollard, W. E., Martin, D. P., Gilson, B. S. (1976). The sickness impact profile: Validation of a health status measure. Med Care, 14(1), 57-67.
Bergner, M., Bobbitt, R. A., Carter, W. B., Gilson, B. S. (1981). The Sickness Impact Profile: Development and final revision of a health status measure. Med Care, 19, 787-805.
Buck, D., Jacoby, A., Massey, A., Ford, G. (2000). Evaluation of measures used to assess quality of life after stroke. Stroke, 31, 2004-2010.
Coons, S. J., Rao, S., Keininger, D. L., Hays, R. D. (2000). A comparative review of generic quality-of-life instruments. Pharmacoeconomics, 17, 13-35.
Cup, E. H. C., Scholte op Reimer, W. J. M., Thijssen, M. C. E., van Kuyk-Minis, M. A. H. (2003). Reliability and validity of the Canadian Occupational Performance Measure in stroke patients. Clinical Rehabilitation, 17(4), 402-409.
de Haan, R. J., Limburg, M., van der Meulen, J. H. P. (1995). Quality of life after stroke. Stroke, 26, 402-408.
Duncan, P. W., Lai, S. M., Tyler, D., Perera, S., Reker, D. M., Studenski, S. (2002). Evaluation of proxy responses to the Stroke Impact Scale. Stroke, 33, 2593-2599.
Finch, E., Brooks, D., Stratford, P. W., Mayo, N. E. (2002). Physical Rehabilitation Outcome Measures. A Guide to Enhanced Clinical Decision-Making (2nd ed.), Toronto: Canadian Physiotherapy Association.
Golomb, B. A., Vickrey, B. G., Hays, R. D. (2001). A review of health-related quality-of-life measures in stroke. Pharmacoeconomics, 19(2), 155-185.
Hilari, K., Byng, S., Lamping, D. L., Smith, S. C. (2003). Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39): Evaluation of acceptability, reliability, and validity. Stroke, 34, 1944-1950.
Lurie, J. (2000). A review of generic health status measures in patients with low back pain. Spine, 25, 3125-3129.
McDowell, I., Newell, C. (1996). Measuring Health. A Guide to Rating Scales and Questionnaires (2nd ed.), New York: Oxford University Press.
Sneeuw, K. C. A., Aaronson, N. K., de Haan, R. J., Limburg, M. (1997). Assessing quality of life after stroke. The value and limitations of proxy ratings. Stroke, 28, 1541-1549.
van Straten, A., de Haan, R. J., Limburg, M., Schuling, J., Bossuyt, P. M., van den Bos, G. A. M. (1997). A Stroke-Adapted 30-Item Version of the Sickness Impact Profile to Assess Quality of Life (SA-SIP30). Stroke, 28, 2155-2161.
van Straten, A., de Haan, R. J., Limburg, M., van den Bos, G. A. M. (2000). Clinical Meaning of the Stroke-Adapted Sickness Impact Profile-30 and the Sickness Impact Profile-136. Stroke, 31, 2610-2615.
van de Port, I. G. L., Ketelaar, M., Schepers, V. P. M., van den Bos, G. A. M., Lindeman, E. (2004). Monitoring the functional health status of stroke patients: the value of the Stroke-Adapted Sickness Impact Profile-30. Disability and Rehabilitation, 26(11), 635-640.

See the measure

How to obtain a copy of the SA-SIP30?

The measure is provided in van Straten et al. (1997).