Medical Outcomes Study Short Form 36 (SF-36)
Purpose
The Medical Outcomes Study 36-item Short-Form Health Survey is a widely used, generic, patient-report measure created to assess health-related quality of life (HRQOL) in the general population. It was developed as part of the Medical Outcomes Study (a two-year study of patients with chronic conditions) (Ware & Sherbourne, 1992). Today, the SF-36 is the most commonly used generic instrument for measuring quality of life (de Haan, 2002). The SF-36 can be used with, but is not limited to, persons with stroke.
In-Depth Review
Available versions
The SF-36 was published in 1992 by Ware and Sherbourne, and further developed and validated in 1993 and 1994 respectively (Ware & Sherbourne, 1992; McHorney, Ware & Raczek, 1993; McHorney, Ware, Lu & Sherbourne, 1994). In 1996, Version 2.0 of the SF-36 (SF-36v2) was introduced to correct deficiencies identified in the original version. Changes include a few wording alterations; for example, “downhearted and blue” in a question on mental health symptoms is now “downhearted and depressed”. SF-36v2 is now considered “the international version” of the SF-36 (Andresen & Meyers, 2000). The original SF-36 questions had variable numbers and formats for response categories, and these have been increased and/or standardized among scales and questions. Role Functioning items now have five levels of responses rather than two. This may increase the responsiveness of the scales. Early reports of tests of this new version have been positive (Jenkinson, Stewart-Brown, Petersen & Paice, 1999). Versions 1.0 and 2.0 of the SF-36 are available with two recall periods: the standard 4-week recall, and the acute 1-week recall period.
Features of the measure
Items:
Items of the SF-36 are divided into eight different domains:
Physical component:
- Physical functioning (10 items)
- Role limitations due to physical problems (4 items)
- Bodily pain (2 items)
- General health perceptions (5 items)
Mental component:
- Social functioning (2 items)
- General mental health (5 items)
- Role limitations due to emotional problems (3 items)
- Vitality (4 items)
Other
- Health transition (1 question): The respondent is asked to rate their current health status compared to their health status one year ago. This question remains separate from the 8 subscales and is not scored.
There are 11 questions in the SF-36, with 36 items in total. With the exception of the general change in health status question, subjects are asked to respond with reference to the past 4 weeks. An acute version of the SF-36 refers to problems in the past week only (McDowell & Newell, 1996).
Scoring:
The SF-36 does not lend itself to the generation of an overall summary score. This is because information within the individual responses is lost in the total scale score (since the total score can be achieved in a variety of ways from individual item responses) (Dorman et al., 1999). The recommended scoring system for the SF-36 is a weighted Likert system for each item. Items within subscales are totaled to provide a summed score for each subscale or dimension. Each of the 8 summed scores is linearly transformed onto a scale from 0 (negative health) to 100 (positive health) to provide a score for each subscale. A physical component score (PCS) and mental component score (MCS) can be derived from the scale items. However, these summary scores should be interpreted with caution. Hobart et al. (2002) examined the use of this two-dimensional model and found that these two scales accounted for only 60% of the variance in SF-36 scores. This finding suggests that there is a significant loss of information when this two-dimensional model is used.
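The linear transformation of a summed subscale score onto the 0 to 100 range can be sketched as follows. This is a minimal illustration only: the function name and the example item range (ten items each scored 1 to 3) are assumptions made for the example, and the sketch does not reproduce the item recoding or weighting of the official scoring manual.

```python
def transform_scale(raw_sum, lowest_possible, highest_possible):
    """Linearly rescale a summed subscale score onto 0 (negative health) to 100 (positive health).

    lowest_possible and highest_possible are the smallest and largest
    summed scores the subscale can produce (assumed known by the caller).
    """
    if highest_possible <= lowest_possible:
        raise ValueError("highest_possible must exceed lowest_possible")
    if not lowest_possible <= raw_sum <= highest_possible:
        raise ValueError("raw_sum lies outside the possible range")
    return (raw_sum - lowest_possible) / (highest_possible - lowest_possible) * 100.0


# Hypothetical subscale: ten items, each scored 1 to 3, so raw sums run from 10 to 30.
score = transform_scale(raw_sum=20, lowest_possible=10, highest_possible=30)
print(score)  # 50.0
```

Because the transformation is linear, it changes only the units of the subscale score, not the ordering of respondents.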
Subscales:
The SF-36 has 8 subscales:
- Physical Functioning,
- Role Limitations due to Physical Problems,
- Bodily Pain,
- General Health Perceptions,
- Vitality,
- Social Functioning,
- Role Limitations due to Emotional Problems,
- General Mental Health.
The Health Transition question is separate from the 8 subscales and is not scored.
Equipment:
Only the test and a pencil are required. Computer administered and telephone voice recognition interactive systems of administration of the SF-36 are currently being evaluated (SF-36 Health Survey Update: John E. Ware, Jr.).
Training:
No training is required for administration of the SF-36. The SF-36 is suitable for self-administration, computerized administration, or administration by a trained interviewer in person or by telephone, to persons age 14 and older (Ware & Sherbourne, 1992).
Time:
The SF-36 is considered simple to administer and takes an average of 10 minutes to complete (Andresen & Meyers, 2000). The SF-36 has been studied for use by a proxy; however, administration by proxy is not recommended for patients with stroke.
The SF-36 can also be completed as a mail survey. As a self-completed, mailed questionnaire, it has been shown to have reasonably high response rates (83%: Brazier et al., 1992; O’Mahoney, Rodgers, Thomson, Dobson, & James, 1998; 75% to 83%: Dorman et al., 1998; 85%: Dorman et al., 1999; 82% overall and 69% for those over age 85: Walters et al., 2001). However, data are typically more complete when interviewer administration is used. Even so, low completion rates may not be limited to self-completion or postal administration. Andresen et al. (1999) administered the SF-36 to nursing home residents by face-to-face interview and reported that only 1 in 5 residents were able to complete it. It is possible that data completeness is indicative of respondent acceptance and understanding of the survey as relevant to them (O’Mahoney et al., 1998; Andresen et al., 1999). Hayes et al. (1995) identified that the most common items missing on the self-completed questionnaire referred to work or to vigorous activity. Older respondents recognized these questions as relevant to much younger people and not pertinent to their own situation. The authors suggested modifications to some of the questions, which may increase acceptability to older populations.
Alternative forms of the SF-36
SF-12 (Ware, Kosinski, & Keller, 1996)
The SF-12 was developed as an abbreviated version of the SF-36 for use in large surveys of general and specific populations as well as large longitudinal studies of health outcomes. It can be self-administered, or administered via interview, telephone, or computer. The SF-12 takes 5 minutes or less to complete (Nemeth, 2006). The SF-12v2 was later developed to correspond to the SF-36v2 and has demonstrated the same improvements as observed with the SF-36v2 (Ware, Kosinski, Turner-Bowker & Gandek, 2002). Versions 1.0 and 2.0 of the SF-12 are available with two recall periods: the standard 4-week recall, and the acute 1-week recall period.
SF-8 (QualityMetric, Incorporated)
The SF-8, a new generic eight-item assessment, generates a health profile consisting of eight scales and two summary measures describing HRQOL. The SF-8 uses one question to measure each of the eight SF-36 domains. The development, validation and norming of the new SF-8, including standard (4-week recall), acute (1-week recall), and 24-hour recall versions, is documented in the SF-8 manual, “How to Score and Interpret Single-Item Health Status Measures: A Manual for Users of the SF-8 Health Survey” (Ware, Kosinski, Dewey & Gandek, 2001). The SF-8 Health Survey can be self-administered, computer-administered, or given by a trained interviewer in person or by telephone to persons aged 14 and older. It takes approximately 1-2 minutes to complete and it has been translated and validated for use in more than 30 countries (accessed July 12, 2006).
SF-6D (Brazier, Usherwood, Harper, & Thomas, 1998; Brazier, Roberts, & Deverill, 2002)
The SF-6D is a preference-based scoring system that uses six subscales from the SF-36 to allow for calculations of utilities from SF-36 and SF-36v2 responses. The eight dimensions from the SF-36 were reduced to six by omitting General Health Perceptions and combining Role Limitations-Physical and Role Limitations-Emotional. Good reliability and validity have been reported for the SF-6D (Petrou & Hockley, 2005; Brazier, Roberts, Tsuchiya & Busschbach, 2004).
For a fee, all versions of the SF Health Survey can be scored online via Quality Metric’s website (accessed July 12, 2006).
Client suitability
Can be used with:
- Individuals with stroke.
The SF-36 is the most widely used measure to assess HRQOL in patients with stroke; however, its suitability in this patient population has been contentious:
- Hobart, Williams, Moran, and Thompson (2002) reported that, in their sample of 177 post-stroke patients, five of the eight SF-36 subscales had limited validity as outcome measures, and that the reporting of physical and mental summary scores was not supported. The authors questioned the use of the SF-36 in patients with stroke.
- de Haan (2002) reported that when the results of the relatively small study of Hobart et al. (2002) were taken in conjunction with the findings of previous research, there was insufficient evidence to question the reliability and validity of the SF-36 subscales in stroke.
Should not be used in:
- Patients who cannot understand written or spoken language. Make sure the patient is fluent in the language used in the survey.
- More severely affected stroke survivors who need a proxy to complete the survey (Dorman et al., 1998). Instead, a stroke-specific quality of life measure such as the Stroke Impact Scale, which has been evaluated successfully for use by proxy respondents, may be a more appropriate measure to be administered by proxy. Another more reliable measure of health status for stroke patients by proxy is the Health Utilities Index (HUI), which has been reported to have moderate to high interrater agreement between stroke patients and proxies (Mathias et al., 1997).
- Patients with aphasia. For patients with aphasia, a stroke-specific quality of life measure developed specifically for this group, such as the Stroke and Aphasia Quality Of Life Scale (SAQOL-39), should be used (Hilari, Byng, Lamping, & Smith, 2003).
- The SF-36 should not be used to document individual patient change. Dorman, Slattery, Farrell, Dennis, and Sandercock (1998) found that although the SF-36 can function effectively as a discriminatory measure for assessing health-related quality-of-life outcomes in groups of patients after stroke, the SF-36 may not be adequate for serial assessments of individual patients, unless large differences over time are expected. Thus, the SF-36 should be used for large group comparisons only.
In what languages is the measure available?
The SF-36 is available in a number of languages. In 1991, the International Quality of Life Assessment launched a project aimed at translating, validating and norming the SF-36 health survey. The project, which is based at the Health Assessment Lab in Boston, has sponsored investigators from 14 countries: Australia, Belgium, Canada, Denmark, France, Germany, Italy, Japan, The Netherlands, Norway, Spain, Sweden, the United Kingdom (English version), and the United States (English and Spanish versions). In addition, the SF-36 has been translated for use in more than 40 other countries, including: Argentina, Armenia, Austria, Bangladesh, Brazil, Bulgaria, Cambodia, Chile, China, Colombia, Costa Rica, Croatia, Czech Republic, Finland, Greece, Guatemala, Honduras, Hong Kong, Hungary, Iceland, Israel, Korea, Latvia, Lithuania, Mexico, New Zealand, Peru, Poland, Portugal, Romania, Russia, Singapore, Slovak Republic, South Africa, Switzerland, Taiwan, Tanzania, Turkey, the United Kingdom (Welsh), the United States (Chinese, Japanese, Vietnamese), Uruguay, Venezuela, and Yugoslavia. There are more than 500 publications that use translations or English-language adaptations of the SF-36. For information about the availability of SF-36 translations, visit https://www.qualitymetric.com/health-surveys-old/the-sf-36v2-health-survey/.
Summary
What does the tool measure? | Health-related quality of life |
What types of clients can the tool be used for? | The SF-36 is a generic measure that can be used with, but is not limited to, persons with stroke. |
Is this a screening or assessment tool? | Assessment |
Time to administer | The SF-36 is considered simple to administer and takes an average of 10 minutes to complete. |
Versions | SF-12; SF-8; SF-6D |
Other Languages | The SF-36 is available in a number of languages. There are more than 500 publications that use translations or English-language adaptations of the SF-36. For information about the availability of SF-36 translations, visit www.sf-36.org |
Measurement Properties | |
Reliability | Internal consistency Out of 10 studies examining the internal consistency Test-retest: Inter-rater: |
Validity | Criterion: Predictive: Subscales of the SF-36 have been found to be predictive of death, hospitalizations, physician visits, and the burden of depression among depressed elderly persons. Construct: Known groups: |
Floor/Ceiling Effects | Of the 8 studies examined, 6 reported that the SF-36 had significant floor and ceiling effects, 1 reported significant ceiling effects only, and 1 reported significant floor effects only. |
Does the tool detect change in patients? | Out of 3 studies examined, 1 reported that the SF-36 had a large ability to detect change; 1 reported moderate to large ability to detect change (except for the Social Functioning and Mental Health dimensions, which both had small effect sizes); and 1 reported small (Role Limitations-Emotional, Mental component summary score) to large (Bodily Pain, Physical component summary score) ability to detect change. To our knowledge, no studies have examined the ability of the SF-36 to detect change in patients with stroke. |
Acceptability | The SF-36 cannot be used with patients who cannot understand written or spoken language, severely affected patients who need a proxy to complete, or patients with aphasia. To our knowledge, no studies have examined the ability of the SF-36 to detect change in patients with stroke. |
Feasibility | The SF-36 is simple to administer and requires no training or special equipment. It is suitable for self-administration, computerized administration, or administration by a trained interviewer in person or by telephone, to persons age 14 and older. |
How to obtain the tool? | All versions of the SF-36 can be viewed by visiting the website: www.qualitymetric.com |
Psychometric Properties
Overview
Extensive psychometric testing has been conducted on the SF-36. However, little research has been conducted specifically in a post-stroke population. For the purposes of this review, we conducted a literature search to identify all relevant publications on the psychometric properties of the SF-36. We then selected articles for review from high-impact journals and from a variety of authors. The creators of the SF-36 have performed many of the psychometric studies that exist on the survey; however, we preferentially reviewed studies carried out by other authors who were not involved in the development of the SF-36.
Floor and Ceiling Effects
Lai, Perera, Duncan, and Bode (2003) administered the Stroke
had major floor effects (floor effects of 37% and 100% were observed for patients with a modified Rankin scale grade 4 or 5, respectively). Further, in contrast to the Stroke
had major ceiling effects (ceiling effects up to 60% for modified Rankin scale grade 0).
Anderson et al. (1996) examined the SF-36 in a cohort of 90 long-term (1-year) stroke survivors. The validity of the SF-36 was assessed by comparing patients’ scores on the SF-36 with those obtained for the Barthel Index, the 28-item General Health Questionnaire, and the Adelaide Activities Profile. Large ceiling effects were reported for the SF-36 Role Limitations-Physical (53%), Bodily Pain (43%), Social Functioning (67%) and Role Limitations-Emotional (72%) subscales. No floor effects exceeding 7% were reported for the SF-36, and scores for the SF-36 Physical Functioning subscale were more uniformly distributed than Barthel Index scores, suggesting the SF-36 has lower floor and ceiling effects than the Barthel Index.
Brazier et al. (1996) tested the psychometric properties of the SF-36 and the EuroQol on an elderly female population (n=380) aged 75 and older, and compared these scales to the Office of Population Census and Surveys Disability Survey. Patients were administered the scales at baseline and again six months later. Major floor effects (in excess of 25%) were reported for the Role Limitations-Physical and Role Limitations-Emotional subscales.
Hobart et al. (2002) examined SF-36 data from 177 people after stroke
O’Mahoney et al. (1998) examined the suitability of the SF-36 for assessing quality of life in older patients with stroke
Weinberger, Oddone, Samsa and Landsman (1996) administered the SF-36 three times over a 4-week period to 172 veterans receiving care in a General Medicine Clinic. Telephone, face-to-face, and self-administration modes of administering the SF-36 were compared. For face-to-face administration of the SF-36, notable floor effects were observed for the Role Limitations-Physical (43.8%) and Role Limitations-Emotional (30.3%) subscales. Notable ceiling effects were observed for the Social Functioning (31.5%), Role Limitations-Physical (14.6%), and Role Limitations-Emotional (47.2%) subscales. For telephone administration, significant floor effects were observed for the Role Limitations-Physical (53.2%) and Role Limitations-Emotional (34.0%) subscales. Significant ceiling effects were observed for the Role Limitations-Emotional (36.2%) subscale only. Self-administration of the SF-36 resulted in significant floor effects for the Role Limitations-Physical (47.1%) and Role Limitations-Emotional (25.0%) subscales. Further, notable ceiling effects were observed for the Social Functioning (27.8%), Role Limitations-Physical (14.7%), and Role Limitations-Emotional (52.8%) subscales.
Walters, Munro and Brazier (2001) administered the SF-36 to a community-dwelling population over the age of 65. Substantial floor (30.9-61%) and ceiling effects across all age groupings (65-69, 70-74, 75-79, 80-84, and 85+) were observed for the Role Functioning-Physical (floor effects: 30.9%-60% and ceiling effects: 11.7%-38.6%) and Role Functioning-Emotional (floor effects: 25.6%-50.4% and ceiling effects: 32.2% – 53.2%) subscales. Substantial ceiling effects were also noted for the Social Functioning and Bodily Pain subscales (15%-46.7% and 14.1%-21.1%, respectively).
Andresen, Gwendell, Gravitt, Aydelotte, and Podgorski (1999) administered the SF-36 to 97 nursing home residents and reported substantial floor effects of 26.8% and 29.5% for the Physical Functioning and Role Limitations-Physical subscales, respectively. Substantial ceiling effects of 36.1%, 49.5% and 21.6% were reported for the Social Functioning, Role Limitations-Emotional, and Bodily Pain subscales, respectively.
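The floor and ceiling percentages reported in the studies above are conventionally the share of respondents who obtain the lowest or highest possible subscale score. A minimal sketch of that calculation follows, assuming subscale scores already transformed to the 0 to 100 range; the function name and the sample scores are invented for illustration and are not taken from any of the cited studies.

```python
def floor_ceiling_effects(scores, minimum=0.0, maximum=100.0):
    """Return (floor %, ceiling %) for a list of subscale scores.

    Floor effect: percentage of respondents at the lowest possible score.
    Ceiling effect: percentage of respondents at the highest possible score.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct


# Made-up scores for ten respondents on one subscale.
example_scores = [0, 0, 25, 50, 100, 100, 100, 75, 0, 100]
floor_pct, ceiling_pct = floor_ceiling_effects(example_scores)
print(floor_pct, ceiling_pct)  # 30.0 40.0
```

The threshold at which such a percentage counts as a "substantial" or "major" effect differs between studies, so the sketch reports the percentages without classifying them.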
Reliability
Studies have demonstrated excellent internal consistency (Ware, Snow, Kosinski & Gandek, 1993; Brazier et al., 1992; Lyons, Perry, & Littlepage, 1994; McHorney, Ware, Lu, & Sherbourne, 1994; Ruta, Garratt, Wardlaw, & Russell, 1994). Test-retest reliability evaluations have also suggested that the SF-36 scores can generally be reproduced (Brazier et al., 1992; Beaton, Hogg-Johnson, & Bombardier, 1997).
Brazier et al. (1992) found considerable evidence for the reliability of the SF-36. For internal consistency, coefficients exceeded 0.75 for all dimensions of the scale with the exception of the Social Functioning subscale (alpha = 0.73). To identify the test-retest reliability, Brazier et al. (1992) calculated correlation coefficients and found coefficients ranging from adequate (0.60 for Social Functioning) to excellent (0.81 for Physical Functioning).
Jenkinson, Coulter and Wright (1993) mailed the SF-36 to a large community sample to explore the questionnaire’s internal consistency. Cronbach’s alphas on all subscales of the SF-36 were excellent, exceeding 0.80, with the exception of the Social Functioning subscale, which was of adequate internal consistency.
Jenkinson, Wright and Coulter (1994) mailed the SF-36 to 13,042 randomly selected subjects between the ages of 16-64 years. The internal consistency
, which was poor (exceeded 0.50). Due to the small number of items in this domain this result is considered acceptable.
Brazier et al. (1996) calculated the reliability of the SF-36 in 380 women over the age of 75. Spearman’s rank correlation coefficients were calculated for respondents who reported that their health had not changed between initial assessment and first follow-up; coefficients ranged from poor (r = 0.28 for Social Functioning) to adequate (r = 0.70 for Vitality) over a retest period of 6 months. These results suggest that the SF-36 has only adequate test-retest reliability in the elderly. Brazier et al. (1996) also examined the internal consistency
Andresen et al. (1999) administered the SF-36 to 97 nursing home residents and then re-administered the SF-36 after 1 week. Test-retest intraclass correlation coefficients (ICC) ranged from adequate to excellent (from 0.55 to 0.82). Further, the ICCs for both the physical summary and mental summary scores were excellent (ICC = 0.82 and 0.79 respectively).
Essink-Bot, Krabbe, Bonsel, and Aaronson (1997) administered the SF-36, The Nottingham Health Profile, the COOP/WONCA charts (The Dartmouth Primary Care Cooperative Information Project/World Organization of National Colleges, Academies, and Academic Associations of General Practices/Family Physicians), and the EuroQol to migraine sufferers. The scales of the SF-36 yielded internal consistency
Walters, Munro and Brazier (2001) reported excellent internal consistency (alpha = 0.79) when the survey was administered by mail to a sample of 9,897 subjects aged 65-104 years.
McHorney, Ware and Sherbourne (1994) evaluated data from 3,445 patients from the Medical Outcomes Study (MOS) and replicated data across 24 subgroups differing in socio-demographic characteristics, diagnosis, and disease severity. Across patient groups, all scales passed tests for item- internal consistency
coefficients ranged from a low of 0.65 to a high of 0.94 across scales (median = 0.85) and varied somewhat across patient subgroups.
Weinberger et al. (1996) tested whether the SF-36 is influenced by method of administration (face-to-face interview, self-administration and telephone interview) in 172 veterans receiving care at a General Medicine Clinic. All patients were asked to complete the SF-36 three times over a 4-week period. Cronbach’s alpha coefficients indicated that items in all eight SF-36 domains were highly internally consistent, regardless of the mode of administration; however, scores showed large variation over short intervals. Specifically, of 24 computed Cronbach’s alphas (i.e., eight scales times three modes of administration), only one was below 0.70 (Social Functioning via telephone administration), whereas 17 exceeded 0.80. Cronbach’s alphas did not differ significantly by method of administration. Test-retest correlations ranged from r = 0.55 (Role Limitations-Physical by telephone administration) to r = 0.94 (Physical Functioning by self-administration).
Hagen, Bugge, and Alexander (2003) examined the reliability of the SF-36 in patients in the early post-stroke period. The SF-36 was administered at 1, 3 and 6 months after stroke
Dorman et al. (1998) assessed the test-retest reliability
and the internal consistency
reported in stroke
Furthermore, test-retest reliability was negatively affected by the use of proxy respondents in this study. While the use of a proxy may be the only means by which to include data from more severely affected stroke
Hobart, Williams, Moran and Thompson (2002) argue that the SF-36 has limited reliability, as the General Health Perceptions and Social Functioning scales generate low reliability scores and have limited convergent and discriminant validity. However, de Haan (2002) argues that Hobart et al.’s conclusions can be challenged. The reliability of only one scale (General Health Perceptions) was marginally lower (Cronbach’s alpha = 0.68) than the authors’ predefined criterion of alpha = 0.70. Although it is often recommended that coefficient values should be above 0.80, de Haan points out that coefficients above 0.70 are generally regarded as acceptable for scales when assessing outcome on a group level.
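The alpha values discussed above can be reproduced from item-level data. The following is a minimal sketch of the standard Cronbach's alpha formula, not the procedure used by any of the cited authors. The function names and the response data are hypothetical, and the 0.70 default simply mirrors the group-level criterion mentioned in the text.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: one list per respondent, each holding that respondent's k item scores.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Variance of the summed scale score across respondents.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def meets_group_level_criterion(alpha, threshold=0.70):
    """True if alpha reaches the criterion (0.70 by default, as in the text)."""
    return alpha >= threshold


# Hypothetical responses from five people to a three-item scale.
responses = [
    [1, 2, 2],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), meets_group_level_criterion(alpha))
# The General Health Perceptions value reported above (0.68) falls just short.
print(meets_group_level_criterion(0.68))
```

Because alpha is a ratio of variances, using sample or population variance gives the same result as long as the choice is consistent.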
Anderson, Laubscher and Burns (1996) administered the Australian version of the SF-36 to 90 individuals at one-year post-stroke. The authors concluded that the SF-36 has satisfactory internal consistency
Validity
Criterion:
Predictive:
McHorney (1996) examined data from the Medical Outcomes Study. The General Health Perceptions subscale was found to be most predictive of death, followed by Physical Functioning (the death rate of patients in the lowest quartile of the SF-36 General Health scale was three times greater than for patients with scores in the highest quartile). Baseline Physical Functioning, Role Limitations-Physical, and Pain subscales were most predictive of hospitalizations. Moreover, Pain, General Health and Vitality subscales were most predictive of physician visits.
Beusterien, Steinwald, & Ware (1996) found that the SF-36 Mental Health subscale and mental component summary measure were strongly associated with severity of depression in cross-sectional analyses. These results suggest that the SF-36 is useful for estimating the burden of depression among depressed elderly persons.
Rumsfeld et al. (1999) tested whether the physical and mental component summary scores from the preoperative SF-36 health status survey predicted mortality in 3,956 patients following coronary artery bypass graft surgery (CABG). The physical component summary of the preoperative SF-36 was found to be a statistically significant risk factor for 6-month mortality following CABG surgery. In multivariate analysis, a 10-point lower SF-36 physical component summary score had an odds ratio (OR) of 1.39 for predicting mortality. The SF-36 mental component summary score was not associated with 6-month mortality in multivariate analyses (OR = 1.09). Thus, preoperative patient self-report of the physical component of the SF-36 health status may be helpful for risk stratification and clinical decision making for patients undergoing CABG surgery.
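To read the reported odds ratio on other score differences, one can rescale it. The sketch below assumes that log-odds change linearly with the physical component summary score, as in an ordinary logistic regression. That is an assumption of this illustration, and the study itself is only described here as reporting the 10-point figure. The function name is hypothetical.

```python
import math


def odds_ratio_for_difference(or_per_block, block_size, difference):
    """Odds ratio implied for `difference` points, given the odds ratio per
    `block_size` points, ASSUMING log-odds are linear in the score."""
    if or_per_block <= 0 or block_size <= 0:
        raise ValueError("odds ratio and block size must be positive")
    return math.exp(math.log(or_per_block) * difference / block_size)


# Reported value: OR = 1.39 per 10-point lower physical component summary score.
for points_lower in (5, 10, 20):
    implied = odds_ratio_for_difference(1.39, 10, points_lower)
    print(points_lower, round(implied, 2))
```

Under this assumption the odds ratio compounds multiplicatively, so a 20-point difference corresponds to 1.39 squared rather than 2 × 1.39.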
Construct:
Walters et al. (2001) reported significant relationships in expected directions to support construct validity among older adults. Scores in all scales were reported to decrease as age increased. Women reported worse health than men on all scales even after adjusting for age. Respondents who had recently visited their physician reported poorer health on all scales, and people living alone had lower scores except on general health.
Ware, Kosinski, and Keller (1994) examined the construct validity of the 8 subscales of the SF-36. Physical Functioning was shown to be the best all-around measure of physical health (r = 0.85), and Mental Health was the most valid measure of mental health (r = 0.87). Interestingly, Mental Health was one of the poorest measures of the physical component (r = 0.17) and Physical Functioning was the poorest measure of the mental component (r = 0.12). The Vitality (r = 0.47 for the physical health component and r = 0.65 for the mental health component) and General Health (r = 0.69 for the physical health component and r = 0.37 for the mental health component) subscales had excellent or adequate validity for both components.
Construct (in patients with stroke):
Wilkinson et al. (1997) interviewed 106 people less than 75 years old and their caregivers following a first-ever stroke. Rank correlation coefficients of the Barthel Index with the SF-36 subscales in first-ever stroke patients ranged from poor (r = 0.22 for Role Limitation-Emotional subscale) to excellent (0.81 for Physical Functioning subscale).
Convergent/Discriminant:
Convergent validity of the SF-36 is generally strongly supported in comparison to similar domains of condition-specific measures (Fielder, Denholm, Lyons, & Fielder, 1996; Nortvedt, Riise, Myhr, & Nyland, 1999; The Counseling Versus Antidepressants in Primary Care Study Group, 1999; Benninger, Ahuja, Gardner, & Grywalski, 1998; Buchwald et al., 1996; Anderson, Laubscher, & Burns, 1996) and other generic HRQOL measures (Andresen et al., 1999; Andresen, Rothenberg, & Kaplan, 1998; Rothwell, McDowell, Wong, & Dorman, 1997). Discriminant validity is usually rated highly for the SF-36 (e.g. Andresen et al., 1999; The Canadian Burden of Illness Study Group, 1998; Buchwald, Pearlman, Umali, Schmaling, & Katon, 1996; Komaroff et al., 1996; O’Neill & Kelly, 1996), although some studies disagree (e.g. Colantonio, Dawson, & McLellan, 1998; Lalonde, Clarke, Joseph, Mackenzie, & Grover, 1999; Myers & Wilks, 1999).
Andresen et al. (1999) administered the SF-36, the Geriatric Depression Scale and the Mini-Mental State Examination to 97 nursing home residents. Activities of daily living and medication intake data were recorded. Convergent validity between the SF-36 Physical Health subscale and the Activities of Daily Living Index was adequate (r ranged from -0.37 to -0.43). These correlations are negative because a high score on the SF-36 indicates positive health status, whereas a high score on the Activities of Daily Living Index indicates dependence. Physical health scores from the SF-36 correlated more strongly with Geriatric Depression Scale scores than with Activities of Daily Living Index scores (-0.63 vs. 0.01). However, the Role Limitations-Physical subscale correlated more strongly with Geriatric Depression Scale scores than with Activities of Daily Living scores. Social Functioning, Role Limitations-Emotional, Vitality and Mental Health subscales all correlated more strongly with Geriatric Depression Scale scores than with Activities of Daily Living scores.
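The sign convention described above (negative correlations when the two instruments are scored in opposite directions) can be illustrated with a small sketch. The data are invented, the variable names are hypothetical, and a Pearson coefficient is used only for illustration. The text does not state which coefficient Andresen et al. computed.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / (spread_x * spread_y)


# Hypothetical scores: higher SF-36 = better health, higher ADL = more dependence.
sf36_physical = [85, 70, 60, 45, 30, 25]
adl_dependence = [1, 3, 2, 5, 4, 6]

# Reverse-code the ADL scale so that higher values mean more independence.
max_dependence = 6  # hypothetical scale maximum
adl_independence = [max_dependence - d for d in adl_dependence]

print(round(pearson_r(sf36_physical, adl_dependence), 2))    # negative
print(round(pearson_r(sf36_physical, adl_independence), 2))  # same size, positive
```

Reverse-coding one scale changes only the sign of the coefficient, so the strength of the relationship should be judged from its absolute value.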
Brazier et al. (1992) reported correlations of -0.41 (Social Functioning vs. social isolation) to -0.68 (Vitality vs. energy) between similar scales on the SF-36 and Nottingham Health Profile. Correlations between dimensions less clearly related ranged from -0.18 (Physical Functioning vs. emotional reactions) to -0.53 (Social Functioning vs. emotional reactions). These correlations are negative because a high score on the SF-36 indicates positive health status, whereas a high score on the Nottingham Health Profile indicates poorer perceived health status.
Dorman et al. (1999) reported that the SF-36 Physical Functioning subscale correlated most closely with mobility, self-care and activities domains of EuroQol (r = 0.57, 0.65 and 0.63, respectively) and less strongly with the EuroQol psychological domain (r = 0.34). SF-36 Bodily Pain subscale correlated with the EuroQol pain domain (r = 0.66) and adequately correlated with all EuroQol domains. Role Functioning-Emotional correlated most closely with the EuroQol psychological domain (r = 0.43), and correlated least with the EuroQol self-care domain (r = 0.24). The SF-36 Mental Health subscale was not closely related to the psychological domain (r = 0.21) or to the physical EuroQol domains (r = 0.06 to 0.10). The SF-36 General Health subscale correlated adequately with EuroQol overall HRQOL rating (r = 0.66).
Known Groups:
Patients diagnosed with ≥ 1 chronic physical problem had lower scores on all dimensions of the SF-36 except Mental Health, in comparison to healthy age-matched controls. The SF-36 scores were distributed as expected for sex, age, social class and use of health services (Brazier et al., 1992).
The SF-36 was found to discriminate between age groups (<75 years versus 75+) on Physical Functioning, Vitality and Change in Health subscales and between groups based on setting (general practice versus hospital outpatients) on the Physical Functioning and Role Functioning-Physical subscales (Hayes et al., 1995).
Essink-Bot et al. (1997) reported that the SF-36 was able to discriminate between migraine sufferers and controls on all subscales, although discrimination was poor (ROC/AUC = 0.54 to 0.67). The SF-36 was also able to discriminate between groups of migraine sufferers based on absence from work (0 vs. ≥ 0.5 days; ROC/AUC ranged from poor, 0.61, to adequate, 0.79).
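The area under the ROC curve quoted above can be read as the probability that a randomly chosen control scores higher on a subscale than a randomly chosen case, with ties counted as one half. A minimal sketch of that pairwise calculation follows. The function name and all scores are hypothetical, and it is not the authors' analysis.

```python
def auc_controls_over_cases(case_scores, control_scores):
    """Area under the ROC curve, computed as the share of (case, control)
    pairs in which the control has the higher score; ties count as 0.5.

    Assumes higher scores mean better health, as on the SF-36.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if control > case:
                wins += 1.0
            elif control == case:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical subscale scores for three cases and three controls.
cases = [40, 55, 60]
controls = [50, 70, 80]
print(round(auc_controls_over_cases(cases, controls), 2))
```

A value of 0.5 means the subscale separates the groups no better than chance, which is why values in the 0.54 to 0.67 range are described as poor.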
Brazier et al. (1996) reported that SF-36 scores distinguished groups based on recent visits to their family doctor, hospital inpatient stays and longstanding illness.
Known Groups (in patients with stroke):
Anderson et al. (1996) administered the Australian version of the SF-36 to 90 stroke
was assessed by comparing patients’ scores on the SF-36 with those obtained for the Barthel Index, the 28-item General Health Questionnaire, and the Adelaide Activities Profile, an instrument developed from the Frenchay Activities Index. Construct validity was demonstrated by significant differences across all eight SF-36 scales for patients with identified health problems. For patients dependent in activities of daily living, the difference in mean scores was greatest for the Physical Functioning and General Health scales, whereas for patients with emotional health problems, the strongest associations were with the Social Functioning, Role Limitations-Emotional, and Mental Health subscales.
Mayo et al. (2002) interviewed persons with first-ever stroke
Cross-diagnostic:
Dallmeijer et al. (2007) examined the unidimensionality and differential item functioning of the Physical Functioning subscale of the SF-36 using Rasch analysis in patients with stroke
, except one for the ALS group (bathing/dressing item), formed a unidimensional scale, supporting the use of a sum score as a measure of Physical Functioning within these diagnostic groups. The pooled analysis showed inadequate fit to the Rasch model for the ‘walking several hundred meters’ item. Of the other 9 items, 5 showed differential item functioning for stroke
Responsiveness
Harwood and Ebrahim (2000) examined the sensitivity to change of the SF-36 in 81 patients before and after hip replacement. Eighty-nine percent of patients reported improvements three months after surgery. The largest changes were seen on the SF-36 Pain scale (large effect sizes of 1.2 at 3 months and 1.5 at 6-12 months), Physical Function (large effect sizes of 1.1 at 3 months and 1.3 at 6-12 months) and Role Limitation-Physical (large effect sizes of 0.8 at 3 months and 1.2 at 6-12 months) scales, suggesting that some of the SF-36 dimensions are very sensitive to change.
Brazier, Walters, Nicholl and Kohler (1996) tested the sensitivity of the SF-36, EuroQol and the Office of Population Census and Surveys Disability Survey in an elderly female population. These measures were administered by interview in a hospital clinic at baseline. A random subsample of respondents was retested six months later. Sensitivity of the instruments was quantified by estimating effect sizes for hypothesized changes in health status. There was some evidence of greater sensitivity to lower levels of morbidity in the SF-36. A hypothesized change from having a long-standing illness to having no long-standing illness was associated with moderate to large effect sizes across dimensions of the three instruments, except the Social Functioning (ES = 0.41) and Mental Health (ES = 0.31) dimensions of the SF-36, which both had small effect sizes. The effect sizes for differences in instrument scores between the age groups were small (in the range 0.00-0.50), with the highest for Physical Functioning. The SF-36 was rated as more sensitive to change than the EuroQol for older adult women.
In a study by Mossberg and McFarland (2001), 6 outpatient rehabilitation clinics incorporated the SF-36 into everyday practice. Ninety patients completed the SF-36 health status questionnaire before initiating treatment and again at discharge. Only nonsurgical patients without comorbidities were enrolled. Effect sizes for the SF-36 (admission to outpatient rehabilitation to discharge) ranged from small (0.48 for Role Limitations-Emotional) to large (1.38 for Bodily Pain). The physical component summary score effect size was large (ES = 0.80) and the mental component summary score effect size was small (ES = 0.45).
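As an illustration of how a pre/post effect size of this kind can be computed, the sketch below divides the mean change by the standard deviation of the baseline scores. This is one common definition and is assumed here. The text does not state which formula Mossberg and McFarland used. The size labels are inferred from how values are described in this review (0.48 small, 0.80 large), and all names and data are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    baseline_sd = stdev(baseline)
    if baseline_sd == 0:
        raise ValueError("baseline scores do not vary; effect size is undefined")
    return mean(changes) / baseline_sd


def describe_effect_size(es):
    """Label an effect size using cut-offs ASSUMED from the usage in the text."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    return "small"


# Hypothetical admission and discharge scores for three patients.
admission = [40, 50, 60]
discharge = [50, 60, 70]
es = effect_size(admission, discharge)
print(es, describe_effect_size(es))
```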
The SF-36 is increasingly being used in stroke studies (Anderson, Laubscher & Burns, 1996; Duncan et al., 1997) and in stroke clinical trials. However, the psychometric properties of the SF-36 soon after stroke are not well known, as most of the current data are from patients one year or more after the stroke (e.g. Anderson et al., 1996; Duncan et al., 1997). We did not identify any studies on the responsiveness of the SF-36 in patients with stroke.
Muller-Nordhorn et al. (2004) examined the responsiveness to change of the SF-12 in patients with stroke or transitory ischemic attack. Patients (n=558) were administered the SF-12 at baseline (referring to status prior to the event) and after 12 months. In patients with stroke, standardized response means (SRMs) were small for the physical component summary scale of the SF-12 (SRM 0.49) and moderate for the mental component summary scale of the SF-12 (SRM 0.52). In patients with transitory ischemic attack, SRMs were below 0.2 for the physical component summary scale of the SF-12 and small for the mental component summary scale of the SF-12 (SRM 0.34). SRMs increased with stroke severity as indicated by the National Institutes of Health Stroke Scale score.
Thus, the SF-12 summary scales show a small to moderate responsiveness to change in patients after stroke. Responsiveness to change was higher in patients with greater stroke severity.
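The standardized response mean differs from a baseline-standardized effect size only in its denominator: the mean change is divided by the standard deviation of the individual change scores. A minimal sketch of that standard definition, with a hypothetical function name and invented data rather than values from the study:

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    change_sd = stdev(changes)
    if change_sd == 0:
        raise ValueError("all patients changed equally; SRM is undefined")
    return mean(changes) / change_sd


# Hypothetical summary scores at baseline and at 12 months.
baseline_scores = [50, 50, 50, 50]
follow_up_scores = [52, 56, 54, 58]
print(round(standardized_response_mean(baseline_scores, follow_up_scores), 2))
```

Because the denominator reflects how consistently patients change, a modest average change can still yield a large SRM when nearly everyone changes by a similar amount.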
The observation that patients with stroke had scores similar to patients with transient ischemic attacks raises questions about the ability of the SF-36 to discriminate and to be responsive to clinical changes in patients with stroke (Duncan et al., 1997). Currently, no evaluative stroke-specific HRQOL instrument is available, and it remains to be seen whether generic HRQOL instruments such as the SF-36 are sufficiently responsive to be useful in clinical trials. More information regarding the responsiveness of the SF-36 will be known when a number of ongoing stroke trials are completed (Williams, 1998).
References
- Aaronson, N. K., Muller, M., Cohen, P. D. A., Essink-Bot, M. L., Fekkes, M., Sanderman, R., Sprangers, M. A., Velder, A., Verrips, E. (1998). Translation, validation and norming of the Dutch language version of the SF-36 health survey in community and chronic disease populations. J Clin Epidemiol, 51, 1055-1068.
- Anderson, C., Laubscher, S., Burns, R. (1996). Validation of the Short Form 36 (SF-36) Health Survey Questionnaire among stroke patients. Stroke, 27(10), 1812-1816.
- Andresen, E. M., Meyers, A. R. (2000). Health-related quality of life outcomes measures. Arch Phys Med Rehabil, 81(12), S30-45.
- Andresen, E. M., Gwendell, W., Gravitt, G. W., Aydelotte, M. E., Podgorski, C. A. (1999). Limitations of the SF-36 in a sample of nursing home residents. Age and Ageing, 28, 562-566.
- Andresen, E. M., Fouts, B. S., Romeis, J. C., Brownson, C. A. (1999). Performance of health-related quality-of-life instruments in a spinal cord injured population. Arch Phys Med Rehabil, 80, 877-884.
- Andresen, E. M., Rothenberg, B. M., Kaplan, R. M. (1998). Performance of a self-administered mailed version of the Quality of Well-Being (QWB-SA) questionnaire among older adults. Med Care, 36, 1349-1360.
- Beaton, D. E., Hogg-Johnson, S., Bombardier, C. (1997). Evaluating changes in health status: Reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol, 50(1), 79-93.
- Benninger, M. S., Ahuja, A. S., Gardner, G., Grywalski, C. (1998). Assessing outcomes for dysphonic patients. J Voice, 12, 540-550.
- Beusterien, K. M., Steinwald, B., Ware, J. E. (1996). Usefulness of the SF-36 Health Survey in measuring health outcomes in the depressed elderly. J Geriatr Psychiatry Neurol, 9(1), 13-21.
- Beck, A. T., Rial, W. Y., Rickels, K. (1974). Short form of Depression Inventory: Cross-validation. Psychological Reports, 34(3), 1184-1186.
- Brazier, J., Roberts, J., Tsuchiya, A., Busschbach, J. (2004). A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ, 13, 873-884.
- Brazier, J., Usherwood, T., Harper, R., Thomas, K. (1998). Deriving a preference-based single index from the UK SF-36 Health Survey. J Clin Epidemiol, 51, 1115-1128.
- Brazier, J. E., Walters, S. J., Nicholl, J. P., Kohler, B. (1996). Using the SF-36 and EuroQol on an elderly population. Quality of Life Research, 5, 195-204.
- Brazier, J., Roberts, J., Deverill, M. (2002). The estimation of a preference-based measure of health from the SF-36. J Health Econ, 21, 271-292.
- Brazier, J. E., Harper, R., Jones, N. M. B. et al. (1992). Validating the SF-36 health survey questionnaire: new outcome measure for primary care. BMJ, 305, 160-164.
- Buchwald, D., Pearlman, T., Umali, J., Schmaling, K., Katon, W. (1996). Functional status in patients with chronic fatigue syndrome, other fatiguing illnesses, and healthy individuals. Am J Med, 101, 364-370.
- Ciconelli, R. M. (1997). Translation and validation to the Portuguese of the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) [doctoral thesis]. Federal University of São Paulo, São Paulo, Brazil.
- Colantonio, A., Dawson, D. R., McLellan, B. A. (1998). Head injury in young adults: long-term outcome. Arch Phys Med Rehabil, 79, 550-558.
- Dallmeijer, A. J., de Groot, V., Roorda, L. D., Schepers, V. P. M., Lindeman, E., van den Berg, L. H., Beelen, A., Dekker, J. (2007). Cross-diagnostic validity of the SF-36 physical functioning scale in patients with stroke, multiple sclerosis and amyotrophic lateral sclerosis: A study using Rasch analysis. J Rehabil Med, 39, 163-169.
- de Haan, R. J. (2002). Measuring quality of life after stroke using the SF-36. Stroke, 33, 1176-1177.
- Dorman, P., Slattery, J., Farrell, B., Dennis, M., Sandercock, P. (1998). Qualitative comparison of the reliability of health status assessments with the EuroQol and SF-36 Questionnaires After Stroke. Stroke, 29, 63-68.
- Dorman, P. J., Dennis, M., Sandercock, P. (1999). How do scores on the EuroQol relate to scores on the SF-36 after stroke? Stroke, 30(10), 2146-2151.
- Duncan, P. W., Samsa, G. P., Weinberger, M., Goldstein, L. B., Bonito, A., Witter, D. M., Enarson, C., Matchar, D. (1997). Health status of individuals with mild stroke. Stroke, 28, 740-745.
- Essink-Bot, M. L., Krabbe, P. F., Bonsel, G. J., Aaronson, N. K. (1997). An empirical comparison of four generic health status measures: The Nottingham Health Profile, the Medical Outcomes Study 36-Item Short-Form Health Survey, the COOP/WONCA Charts, and the EuroQol Instrument. Med Care, 35(5), 522-537.
- Fielder, H., Denholm, S. W., Lyons, R. A., Fielder, C. P. (1996). Measurement of health status in patients with vertigo. Clin Otolaryngol, 21, 124-126.
- Fukuhara, S., Ware, J. E., Kosinski, M., Wada, S., Gandek, B. (1998). Psychometric and clinical tests of validity of the Japanese SF-36 Health Survey. J Clin Epidemiol, 51, 1045-1053.
- Hagen, S., Bugge, C., Alexander, H. (2003). Psychometric properties of the SF-36 in the early post-stroke phase. Journal of Advanced Nursing, 44(5), 461-468.
- Harwood, R. H., Ebrahim, S. (2000). A comparison of the responsiveness of the Nottingham extended activities of daily living scale, London handicap scale, and SF-36. Disability & Rehabilitation, 22(17), 786-793.
- Hayes, V., Morris, J., Wolfe, C., Morgan, M. (1995). The SF-36 Health Survey Questionnaire: Is it suitable for use with older adults? Age and Ageing, 24, 120-125.
- Hilari, K., Byng, S., Lamping, D. L., Smith, S. C. (2003). Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39): Evaluation of acceptability, reliability, and validity. Stroke, 34, 1944-1950.
- Hobart, J. C., Williams, L. S., Moran, K., Thompson, A. J. (2002). Quality of life measurement after stroke: Uses and abuses of the SF-36. Stroke, 33, 1348-1356.
- Jenkinson, C., Coulter, A., Wright, L. (1993). Short form 36 (SF36) health survey questionnaire: Normative data for adults of working age. BMJ, 306(6890), 1437-1440.
- Jenkinson, C., Wright, L., Coulter, A. (1994). Criterion validity and reliability of the SF-36 in a population sample. Quality of Life Research, 3(1), 7-12.
- Jenkinson, C., Stewart-Brown, S., Petersen, S., Paice, C. (1999). Assessment of the SF-36 version 2 in the United Kingdom. J Epidemiol Community Health, 53(1), 46-50.
- Komaroff, A. L., Fagioli, L. R., Doolittle, T. H., Gandek, B., Gleit, M. A., Gueriero, R. T., et al. (1996). Health status in patients with chronic fatigue syndrome and in general population and disease comparison groups. Am J Med, 101, 281-290.
- Lai, S-M., Perera, S., Duncan, P. W., Bode, R. (2003). Physical and social functioning after stroke: Comparison of the Stroke Impact Scale and Short Form-36. Stroke, 34, 488-493.
- Lalonde, L., Clarke, A. E., Joseph, L., Mackenzie, T., Grover, S. A. (1999). Comparing the psychometric properties of preference-based and nonpreference-based health-related quality of life in coronary heart disease. Qual Life Res, 8, 399-409.
- Lyons, R. A., Perry, H. M., Littlepage, B. N. C. (1994). Evidence for the validity of the Short-Form 36 Questionnaire (SF-36) in an elderly population. Age and Ageing, 23, 182-184.
- Mathias, S. D., Bates, M. M., Pasta, D. J., Cisternas, M. G., Feeny, D., Patrick, D. L. (1997). Use of the Health Utilities Index with stroke patients and their caregivers. Stroke, 28, 1888-1894.
- Mayo, N. E., Wood-Dauphinee, S., Cote, R., Durcan, L., Carlton, J. (2002). Activity, Participation, and Quality of Life 6 Months Poststroke. Arch Phys Med Rehabil, 83, 1035-1042.
- McDowell, I., Newell, C. (1996). Measuring Health: A Guide to Rating Scales and Questionnaires. 2nd ed. New York: Oxford University Press.
- McHorney, C. A. (1996). Measuring and monitoring general health status in elderly persons: Practical and methodological issues in using the SF-36 health survey. The Gerontologist, 36(5), 571-583.
- McHorney, C. A., Ware, J. E. Jr., Raczek, A. E. (1993). The MOS 36-Item Short-Form Health Survey (SF-36): II Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care, 31, 247-263.
- McHorney, C. A., Ware, J. E. Jr., Lu, J. F., Sherbourne, C. D. (1994). The MOS 36-item Short-Form Health Survey (SF-36): III Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care, 32, 40-66.
- Mossberg, K., McFarland, C. (2001). A patient-oriented health status measure in outpatient rehabilitation. Am J Phys Med Rehabil, 80(12), 896-902.
- Muller-Nordhorn, J., Nolte, C. H., Rossnagel, K., Jungehulsing, G. J., Reich, A., Roll, S., Villringer, A., Willich, S. N. (2004). Responsiveness to change of the SF-12 in patients with cerebrovascular disease. Biometrical Journal, 46(S1), 50.
- Myers, C., Wilks, D. (1999). Comparison of Euroqol EQ-5D and SF-36 in patients with chronic fatigue syndrome. Qual Life Res, 8, 9-16.
- Nemeth, G. (2006). Health related quality of life outcome instruments. European Spine Journal, 15(1), S44-S51.
- Nortvedt, M. W., Riise, T., Myhr, K. M., Nyland, H. I. (1999). Quality of life in multiple sclerosis: measuring the disease effects more broadly. Neurology, 53, 1098-1103.
- O’Mahony, P. G., Rodgers, H., Thomson, R. G., Dobson, R., James, O. F. W. (1998). Is the SF-36 suitable for assessing health status of older stroke patients? Age and Ageing, 27, 19-22.
- O’Neill, P., Kelly, P. (1996). Postal questionnaire study of disability in the community associated with psoriasis. Br Med J, 313, 919-921.
- Petrou, S., Hockley, C. (2005). An investigation into the empirical validity of the EQ-5D and SF-6D based on hypothetical preferences in a general population. Health Econ, 14, 1169-1189.
- Ren, X. S., Amick, B., Zhou, L., et al. (1998). Translation and Psychometric Evaluation of a Chinese Version of the SF-36 Health Survey in the U.S. J Clin Epidemiol, 51(11), 1129.
- Rothwell, P. M., McDowell, Z., Wong, C. K., Dorman, P. J. (1997). Doctors and patients don’t agree: cross sectional study of patients’ and doctors’ perceptions and assessments of disability in multiple sclerosis. British Med J, 314, 1580-1583.
- Rumsfeld, J. S., MaWhinney, S., McCarthy, M., Shroyer, A. L., VillaNueva, C. B., O’Brien, M., Moritz, T. E., Henderson, W. G., Grover, F. L., Sethi, G. K., Hammermeister, K. E. (1999). Health-related quality of life as a predictor of mortality following coronary artery bypass graft surgery. Participants of the Department of Veterans Affairs Cooperative Study Group on Processes, Structures, and Outcomes of Care in Cardiac Surgery. JAMA, 281(14), 1298-1303.
- Ruta, D. A., Garratt, A. M., Wardlaw, D., Russell, I. T. (1994). Developing a valid and reliable measure of health outcome for patients with low back pain. Spine, 19, 1887-1896.
- Segal, M. E., Schall, R. R. (1994). Determining functional/health status and its relation to disability in stroke survivors. Stroke, 25, 2391-2397.
- The Canadian Burden of Illness Study Group. (1998). Burden of illness of multiple sclerosis: part II: quality of life. Can J Neurol Sci, 25, 31-38.
- The Counselling Versus Antidepressants in Primary Care Study Group. (1999). How disabling is depression? Evidence from a primary care sample. Br J Gen Pract, 49(439), 95-98.
- Walters, S. J., Munro, J. F., Brazier, J. E. (2001). Using the SF-36 with older adults: A cross-sectional community-based survey. Age and Ageing, 30, 337-343.
- Ware, J. E., Kosinski, M., Dewey, J. E., Gandek, B. (2001). How to Score and Interpret Single-Item Health Status Measures: A Manual for Users of the SF-8 Health Survey. Lincoln RI: QualityMetric Incorporated.
- Ware, J. E., Kosinski, M., Keller, S. D. (1994). SF-36 Physical and Mental Health Summary Scales: A User’s Manual. Boston, MA: The Health Institute.
- Ware, J. E. Jr., Sherbourne, C. D. (1992). The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care, 30, 473-483.
- Ware, J. Jr., Kosinski, M., Keller, S. D. (1996). A 12-item short-form health survey: Construction of scales and preliminary tests of reliability and validity. Med Care, 34(3), 220-233.
- Ware, J. E., Snow, K. K., Kosinski, M., Gandek, B. (1993). SF-36® Health Survey Manual and Interpretation Guide. Boston, MA: New England Medical Center, The Health Institute.
- Ware, J. E., Kosinski, M., Turner-Bowker, D. M., Gandek, B. (2002). SF-12v2: How to score version 2 of the SF-12 Health Survey. Lincoln RI: QualityMetric Incorporated.
- Weinberger, M., Oddone, E. Z., Samsa, G. P., Landsman, P. B. (1996). Are health-related quality-of-life measures affected by the mode of administration? J Clin Epidemiol, 49(2), 135-140.
- Wilkinson, P. R., Wolfe, C. D., Warburton, F. G., Rudd, A. G., Howard, R. S., Ross-Russell, R. W., Beech, R. (1997). Longer term quality of life and outcome in stroke patients: Is the Barthel Index alone an adequate measure of outcome? Quality in Health Care, 6, 125-130.
- Williams, L. S. (1998). Health-related quality of life outcomes in stroke. Neuroepidemiology, 17, 116-120.
See the measure
How to obtain the SF-36
Permission to use the SF-36 should be obtained from the Medical Outcomes Trust, which oversees the standardized administration of the SF-36 and provides updates on administration and scoring (McDowell & Newell, 1996). Various computer applications are available to assist in scoring the SF-36, including free Excel templates that can be downloaded from the Internet.
All versions of the SF-36 can be viewed by visiting the website www.qualitymetric.com.
Samples of the various versions of the SF-36 are also available on this website.