Medical Outcomes Study Short Form 36 (SF-36)

Evidence Reviewed as of before: 19-08-2008
Author(s)*: Lisa Zeltzer, MSc OT
Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc; Maxim Ben Yakov, BSc PT

Purpose

The Medical Outcomes Study 36-item Short-Form Health Survey (SF-36) is a widely used, generic, patient-report measure created to assess health-related quality of life (HRQOL) in the general population. It was developed as part of the Medical Outcomes Study (a two-year study of patients with chronic conditions) (Ware & Sherbourne, 1992). Today, the SF-36 is the most commonly used generic instrument for measuring quality of life (de Haan, 2002). The SF-36 can be used with, but is not limited to, persons with stroke.

In-Depth Review

Purpose of the measure

The Medical Outcomes Study 36-item Short-Form Health Survey (SF-36) is a widely used, generic, patient-report measure created to assess health-related quality of life (HRQOL) in the general population. It was developed as part of the Medical Outcomes Study (a two-year study of patients with chronic conditions) (Ware & Sherbourne, 1992). Today, the SF-36 is the most commonly used generic instrument for measuring quality of life (de Haan, 2002). The SF-36 can be used with, but is not limited to, persons with stroke.

Available versions

The SF-36 was published in 1992 by Ware and Sherbourne, and further developed and validated in 1993 and 1994 respectively (Ware & Sherbourne, 1992; McHorney, Ware & Raczek, 1993; McHorney, Ware, Lu & Sherbourne, 1994). In 1996, Version 2.0 of the SF-36 (SF-36v2) was introduced, to correct for deficiencies identified in the original version. Changes include a few wording alterations, for example, “downhearted and blue” in a question on mental health symptoms is now “downhearted and depressed”. SF-36v2 is now considered “the international version” of the SF-36 (Andresen & Meyers, 2000). The original SF-36 questions had variable numbers and formats for response categories, and these have been increased and/or standardized among scales and questions. Role Functioning items now have five levels of responses rather than two. This may increase the responsiveness of the scales. Early reports of tests of this new version have been positive (Jenkinson, Stewart-Brown, Petersen & Paice, 1999). Versions 1.0 and 2.0 of the SF-36 are available with two recall periods: the standard 4-week recall, and the acute 1-week recall period.

Features of the measure

Items:

Items of the SF-36 are divided into eight different domains:

Physical component:

  • Physical functioning (10 items)
  • Role limitations due to physical problems (4 items)
  • Bodily pain (2 items)
  • General health perceptions (5 items)

Mental component

  • Social functioning (2 items)
  • General mental health (5 items)
  • Role limitations due to emotional problems (3 items)
  • Vitality (4 items)

Other

  • Health transition (1 question): The respondent is asked to rate their current health status compared to their health status one year ago. This question remains separate from the 8 subscales and is not scored.

There are 11 questions in the SF-36, with 36 items in total. With the exception of the question on general change in health status, subjects are asked to respond with reference to the past 4 weeks. An acute version of the SF-36 refers to problems in the past week only (McDowell & Newell, 1996).

Scoring:

The SF-36 does not lend itself to the generation of an overall summary score. This is because information within the individual responses is lost in the total scale score (since the total score can be achieved in a variety of ways from individual item responses) (Dorman et al., 1999). The recommended scoring system for the SF-36 is a weighted Likert system for each item. Items within subscales are totaled to provide a summed score for each subscale or dimension. Each of the 8 summed scores is linearly transformed onto a scale from 0 (negative health) to 100 (positive health) to provide a score for each subscale. A physical component score (PCS) and mental component score (MCS) can be derived from the scale items. However, these summary scores should be interpreted with caution. Hobart et al. (2002) examined the use of this two-dimensional model and found that these two scales accounted for only 60% of the variance in SF-36 scores. This finding suggests that there is a significant loss of information when this two-dimensional model is used.
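The linear transformation described above can be sketched as follows. This is a minimal illustration, not the official scoring algorithm: the function name and the example item range are assumptions made for this sketch, and any item recoding or weighting required before summing is not shown.

```python
def transform_scale_score(raw_score, lowest_possible, highest_possible):
    """Linearly rescale a summed raw subscale score onto the 0-100 range.

    0 corresponds to the lowest possible raw score (negative health) and
    100 to the highest possible raw score (positive health).
    """
    if highest_possible <= lowest_possible:
        raise ValueError("highest_possible must exceed lowest_possible")
    if not lowest_possible <= raw_score <= highest_possible:
        raise ValueError("raw_score lies outside the possible range")
    score_range = highest_possible - lowest_possible
    return 100.0 * (raw_score - lowest_possible) / score_range


# Hypothetical example: a 10-item subscale, each item assumed to be scored
# 1 to 3, so the summed raw score can range from 10 to 30.
print(transform_scale_score(21, 10, 30))  # 55.0
```

Because every subscale is rescaled to the same 0 to 100 range, subscales with different numbers of items can be compared directly.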

Subscales:

The SF-36 has 8 subscales:

  • Physical Functioning,
  • Role Limitations due to Physical Problems,
  • Bodily Pain,
  • General Health Perceptions,
  • Vitality,
  • Social Functioning,
  • Role Limitations due to Emotional Problems,
  • General Mental Health.

The single Health Transition question is reported separately and is not one of the 8 subscales.

Equipment:

Only the test and a pencil are required. Computer-administered and interactive telephone voice-recognition systems for administering the SF-36 are currently being evaluated (SF-36 Health Survey Update: John E. Ware, Jr.).

Training:

No training is required for administration of the SF-36. The SF-36 is suitable for self-administration, computerized administration, or administration by a trained interviewer in person or by telephone, to persons age 14 and older (Ware & Sherbourne, 1992).

Time:

The SF-36 is considered simple to administer and takes an average of 10 minutes to complete (Andresen & Meyers, 2000). The SF-36 has been studied for use by a proxy; however, administration by proxy is not recommended for patients with stroke, as agreement has been found to be poor in this patient population (Segal & Schall, 1994; Dorman, Slattery, Farrell, & Dennis, 1998). Instead, a stroke-specific quality of life measure such as the Stroke Impact Scale, which has been evaluated successfully for use by proxy respondents, may be a more appropriate measure to administer by proxy. Another reliable measure of health status for stroke patients by proxy is the Health Utilities Index (HUI), which has been reported to have adequate to excellent agreement between patients with stroke and their proxies (Mathias, Bates, Pasta, Cisternas, Feeny & Patrick, 1997).

The SF-36 can also be completed as a mail survey. As a self-completed, mailed questionnaire, it has been shown to have reasonably high response rates (83%: Brazier et al., 1992, O’Mahoney, Rodgers, Thomson, Dobson, & James, 1998; 75% to 83%: Dorman et al., 1998; 85%: Dorman et al., 1999; 82% overall and 69% for those over age 85: Walters et al., 2001). Data are typically more complete when interviewer administration is used, but low completion rates are not limited to self-completion or postal administration. Andresen et al. (1999) administered the SF-36 to nursing home residents by face-to-face interview and reported that only 1 in 5 residents were able to complete it. It is possible that data completeness is indicative of respondent acceptance and understanding of the survey as relevant to them (O’Mahoney et al., 1998; Andresen et al., 1999). Hayes et al. (1995) identified that the most common items missing on the self-completed questionnaire referred to work or to vigorous activity. Older respondents recognized these questions as relevant to much younger people and not pertinent to their own situation. The authors suggested modifications to some of the questions, which may increase acceptability to older populations.

Alternative forms of the SF-36

SF-12 (Ware, Kosinski, & Keller, 1996)

The SF-12 was developed as an abbreviated version of the SF-36 for use in large surveys of general and specific populations as well as large longitudinal studies of health outcomes. It can be self-administered, or administered via interview, telephone, or computer. The SF-12 takes 5 minutes or less to complete (Nemeth, 2006). The SF-12v2 was later developed to correspond to the SF-36v2 and has demonstrated the same improvements as observed with the SF-36v2 (Ware, Kosinski, Turner-Bowker & Gandek, 2002). Versions 1.0 and 2.0 of the SF-12 are available with two recall periods: the standard 4-week recall, and the acute 1-week recall period.

SF-8 (QualityMetric, Incorporated)

The SF-8, a new generic eight-item assessment, generates a health profile consisting of eight scales and two summary measures describing HRQOL. The SF-8 uses one question to measure each of the eight SF-36 domains. The development, validation and norming of the new SF-8, including standard (4-week recall), acute (1-week recall), and 24-hour recall versions, is documented in the SF-8 manual, “How to Score and Interpret Single-Item Health Status Measures: A Manual for Users of the SF-8 Health Survey” (Ware, Kosinski, Dewey & Gandek, 2001). The SF-8 Health Survey can be self-administered, computer-administered, or given by a trained interviewer in person or by telephone to persons aged 14 and older. It takes approximately 1-2 minutes to complete and it has been translated and validated for use in more than 30 countries (accessed July 12, 2006).

SF-6D (Brazier, Usherwood, Harper, & Thomas, 1998; Brazier, Roberts, & Deverill, 2002)

The SF-6D is a preference-based scoring system that uses six subscales from the SF-36 to allow utilities to be calculated from SF-36 and SF-36v2 responses. The eight dimensions from the SF-36 were reduced to six by omitting General Health Perceptions and combining Role Limitations-Physical and Role Limitations-Emotional. Good reliability and validity have been reported for the SF-6D (Petrou & Hockley, 2005; Brazier, Roberts, Tsuchiya & Busschbach, 2004).

For a fee, all versions of the SF Health Survey can be scored online via Quality Metric’s website (accessed July 12, 2006).

Client suitability

Can be used with:

  • Individuals with stroke.

The SF-36 is the most widely used measure to assess HRQOL in patients with stroke; however, its suitability in this patient population has been contentious:

  • Hobart, Williams, Moran, and Thompson (2002) reported that, in their sample of 177 post-stroke patients, five of the eight SF-36 subscales had limited validity as outcome measures, and that the reporting of physical and mental summary scores was not supported. The authors questioned the use of the SF-36 in patients with stroke.
  • de Haan (2002) reported that when the results of the relatively small study of Hobart et al. (2002) were taken in conjunction with the findings of previous research, there was insufficient evidence to question the reliability and validity of the SF-36 subscales in stroke.

Should not be used in:

  • Patients who cannot understand written or spoken language. Make sure the patient is fluent in the language used in the survey.
  • More severely affected stroke survivors who need a proxy to complete (Dorman et al., 1998). Instead, a stroke-specific quality of life measure such as the Stroke Impact Scale, which has been evaluated successfully for use by proxy respondents, may be a more appropriate measure to administer by proxy. Another more reliable measure of health status for stroke patients by proxy is the Health Utilities Index (HUI), which has been reported to have moderate to high agreement between stroke patients and proxies (Mathias et al., 1997).
  • Patients with aphasia. For patients with aphasia, a stroke-specific quality of life measure developed specifically for patients with aphasia, such as the Stroke and Aphasia Quality Of Life Scale (SAQOL-39), should be used (Hilari, Byng, Lamping, & Smith, 2003).
  • The SF-36 should not be used to document individual patient change. Dorman, Slattery, Farrell, Dennis, and Sandercock (1998) found that although the SF-36 can function effectively as a discriminatory measure for assessing health-related quality-of-life outcomes in groups of patients after stroke, the SF-36 may not be adequate for serial assessments of individual patients, unless large differences over time are expected. Thus, the SF-36 should be used for large group comparisons only.

In what languages is the measure available?

The SF-36 is available in a number of languages. In 1991, the International Quality of Life Assessment launched a project aimed at translating, validating and norming the SF-36 health survey. The project, which is based at the Health Assessment Lab in Boston, has sponsored investigators from 14 countries: Australia, Belgium, Canada, Denmark, France, Germany, Italy, Japan, The Netherlands, Norway, Spain, Sweden, the United Kingdom (English version), and the United States (English and Spanish versions). In addition, the SF-36 has been translated for use in more than 40 other countries, including: Argentina, Armenia, Austria, Bangladesh, Brazil, Bulgaria, Cambodia, Chile, China, Colombia, Costa Rica, Croatia, Czech Republic, Finland, Greece, Guatemala, Honduras, Hong Kong, Hungary, Iceland, Israel, Korea, Latvia, Lithuania, Mexico, New Zealand, Peru, Poland, Portugal, Romania, Russia, Singapore, Slovak Republic, South Africa, Switzerland, Taiwan, Tanzania, Turkey, the United Kingdom (Welsh), the United States (Chinese, Japanese, Vietnamese), Uruguay, Venezuela, and Yugoslavia. There are more than 500 publications that use translations or English-language adaptations of the SF-36. For information about the availability of SF-36 translations, visit https://www.qualitymetric.com/health-surveys-old/the-sf-36v2-health-survey/.

Summary

What does the tool measure? Health related quality of life
What types of clients can the tool be used for? The SF-36 is a generic measure that can be used, but is not limited to, persons with stroke.
Is this a screening or assessment tool? Assessment
Time to administer: The SF-36 is considered simple to administer and takes an average of 10 minutes to complete.
Versions: SF-12; SF-8; SF-6D
Other Languages: The SF-36 is available in a number of languages. There are more than 500 publications that use translations or English-language adaptations of the SF-36. For information about the availability of SF-36 translations, visit www.sf-36.org
Measurement Properties
Reliability
Internal consistency:
Out of 10 studies examining the internal consistency of the SF-36, five reported excellent internal consistency (except for the subscales of Social Functioning in three studies and General Health in one study, which were considered adequate). Two studies reported adequate to excellent internal consistency. Three studies reported poor to excellent internal consistency.

Test-retest:
Out of the five studies examining test-retest reliability of the SF-36, three reported adequate to excellent test-retest reliability. One study reported adequate test-retest reliability. One reported poor to excellent test-retest reliability.

Inter-rater:
No studies have examined the inter-rater reliability of the SF-36.

Validity
Criterion:
Predictive:
Subscales of the SF-36 have been found to be predictive of death, hospitalizations, physician visits, and the burden of depression among depressed elderly persons.

Construct:
Convergent:
Adequate correlations between the SF-36 Physical Health subscale and the Activities of Daily Living Index; the SF-36 Social Functioning subscale and social isolation on the Nottingham Health Profile; the General Health subscale and the EuroQol overall HRQOL rating; the SF-36 Bodily Pain subscale and all EuroQol domains; and the Role Functioning-Emotional subscale with the EuroQol psychological domain. Excellent correlation between the Physical Health scores from the SF-36 and the Geriatric Depression Scale; the Vitality subscale on the SF-36 and energy subscale on the Nottingham Health Profile; and the Bodily Pain subscale on the SF-36 with the EuroQol pain domain.

Known groups:
SF-36 scores discriminated between patients diagnosed with one or more chronic physical problems and healthy age-matched controls; individuals older than 75 and younger than 75; groups based on setting (general practice versus hospital outpatients); migraine sufferers and controls; groups based on recent visits to their family doctor, hospital inpatient stays and longstanding illness; patients with stroke and their age and gender matched controls.

Floor/Ceiling Effects: Of the 8 studies examined, 6 reported that the SF-36 had significant floor and ceiling effects, 1 reported significant ceiling effects only, and 1 reported significant floor effects only.
Does the tool detect change in patients?

Out of 3 studies examined, 1 reported that the SF-36 had a large ability to detect change, 1 reported moderate to large ability to detect change, (except for the Social Functioning and Mental Health dimensions which both had small effect sizes); 1 reported small (Role Limitations-Emotional, Mental component summary score) to large (Bodily Pain, Physical component summary score) ability to detect change. To our knowledge, no studies have examined the ability of the SF-36 to detect change in patients with stroke.

Acceptability: The SF-36 cannot be used with patients who cannot understand written or spoken language, severely affected patients who need a proxy to complete, or patients with aphasia.
Feasibility: The SF-36 is simple to administer and requires no training or special equipment. It is suitable for self-administration, computerized administration, or administration by a trained interviewer in person or by telephone, to persons age 14 and older.
How to obtain the tool? All versions of the SF-36 can be viewed by visiting the website: www.qualitymetric.com

Psychometric Properties

Overview

Extensive psychometric testing has been conducted on the SF-36. However, little research has been conducted specifically in a post-stroke population. For the purposes of this review, we conducted a literature search to identify all relevant publications on the psychometric properties of the SF-36. We then selected articles for review from high-impact journals and from a variety of authors. The creators of the SF-36 have performed many of the psychometric studies that exist on the survey; however, we preferentially reviewed studies carried out by authors who were not involved in the development of the SF-36.

Floor and Ceiling Effects

Lai, Perera, Duncan, and Bode (2003) administered the Stroke Impact Scale and the SF-36 to 278 stroke subjects approximately 90 days after stroke. In comparison to the Stroke Impact Scale-16 (characterizes physical functioning), the SF-36 Physical Functioning subscale had major floor effects (floor effects of 37% and 100% were observed for patients with a modified Rankin scale grade 4 or 5, respectively). Further, in contrast to the Stroke Impact Scale-Participation (characterizes social functioning), the SF-36 Social Functioning subscale had major ceiling effects (ceiling effects up to 60% for modified Rankin scale grade 0).

Anderson et al. (1996) examined the SF-36 in a cohort of 90 long-term (1-year) stroke survivors. The validity of the SF-36 was assessed by comparing patients’ scores on the SF-36 with those obtained for the Barthel Index, the 28-item General Health Questionnaire, and the Adelaide Activities Profile. Large ceiling effects were reported for the SF-36 Role Limitations-Physical (53%), Bodily Pain (43%), Social Functioning (67%) and Role Limitations-Emotional (72%) subscales. No floor effects exceeding 7% were reported for the SF-36, and scores for the SF-36 Physical Functioning subscale were more uniformly distributed than Barthel Index scores suggesting the SF-36 has lower floor and ceiling effects than the Barthel Index.

Brazier et al. (1996) tested the psychometric properties of the SF-36 and the EuroQol on an elderly female population (n=380) aged 75 and older, and compared these scales to the Office of Population Census and Surveys Disability Survey. Patients were administered the scales at baseline and again six months later. Major floor effects (in excess of 25%) were reported for the Role Limitations-Physical and Role Limitations-Emotional subscales.

Hobart et al. (2002) examined SF-36 data from 177 people after stroke. Notable floor effects were observed for the Role Limitations-Physical (59.1%), Role Limitations-Emotional (63.1%), Social Functioning (29.9%), and Bodily Pain (25.6%) subscales. Notable ceiling effects were also observed for the Role Limitations-Emotional (63.1%), Social Functioning (29.9%) and Bodily Pain (25.6%) subscales.

O’Mahoney et al. (1998) examined the suitability of the SF-36 for assessing quality of life in older patients with stroke. Floor effects were high for the Role Limitations-Physical (54%) and Role Limitations-Emotional (35%) subscales and for the Social Functioning (17%) and Physical Functioning (18%) subscales. Ceiling effects were also substantial for the Role Limitations-Physical (16%), Role Limitations-Emotional (51%), Social Functioning (18%) and Bodily Pain (25%) subscales.

Weinberger, Oddone, Samsa and Landsman (1996) administered the SF-36 three times over a 4-week period to 172 veterans receiving care in a General Medicine Clinic. Telephone, face-to-face, and self-administration modes of administering the SF-36 were compared. For face-to-face administration of the SF-36, notable floor effects were observed for the Role Limitations-Physical (43.8%) and Role Limitations-Emotional (30.3%) subscales. Notable ceiling effects were observed for the Social Functioning (31.5%), Role Limitations-Physical (14.6%), and Role Limitations-Emotional (47.2%) subscales. For telephone administration, significant floor effects were observed for the Role Limitations-Physical (53.2%) and Role Limitations-Emotional (34.0%) subscales. Significant ceiling effects were observed for the Role Limitations-Emotional (36.2%) subscale only. Self-administration of the SF-36 resulted in significant floor effects for the Role Limitations-Physical (47.1%) and Role Limitations-Emotional (25.0%) subscales. Further, notable ceiling effects were observed for the Social Functioning (27.8%), Role Limitations-Physical (14.7%), and Role Limitations-Emotional (52.8%) subscales.

Walters, Munro and Brazier (2001) administered the SF-36 to a community-dwelling population over the age of 65. Substantial floor (30.9-61%) and ceiling effects across all age groupings (65-69, 70-74, 75-79, 80-84, and 85+) were observed for the Role Functioning-Physical (floor effects: 30.9%-60% and ceiling effects: 11.7%-38.6%) and Role Functioning-Emotional (floor effects: 25.6%-50.4% and ceiling effects: 32.2% – 53.2%) subscales. Substantial ceiling effects were also noted for the Social Functioning and Bodily Pain subscales (15%-46.7% and 14.1%-21.1%, respectively).

Andresen, Gwendell, Gravitt, Aydelotte, and Podgorski (1999) administered the SF-36 to 97 nursing home residents and reported substantial floor effects of 26.8% and 29.5% for the Physical Functioning and Role Limitations-Physical subscales, respectively. Substantial ceiling effects of 36.1%, 49.5% and 21.6% were reported for the Social Functioning, Role Limitations-Emotional, and Bodily Pain subscales, respectively.
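The floor and ceiling percentages reported in the studies above are the share of respondents who obtain the lowest or highest possible score on a subscale. A minimal sketch of that calculation is given below; the function name and the example scores are made up for illustration and are not data from any of the cited studies.

```python
def floor_ceiling_effects(scores, minimum=0.0, maximum=100.0):
    """Return (floor %, ceiling %) for a list of transformed subscale scores.

    The floor effect is the percentage of respondents at the minimum score;
    the ceiling effect is the percentage at the maximum score.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct


# Made-up scores for ten hypothetical respondents (not study data)
example_scores = [0, 0, 0, 25, 50, 50, 75, 100, 100, 100]
floor_pct, ceiling_pct = floor_ceiling_effects(example_scores)
print(f"floor: {floor_pct:.1f}%, ceiling: {ceiling_pct:.1f}%")
# floor: 30.0%, ceiling: 30.0%
```

Subscales with few items and few response levels have only a handful of possible scores, which is one reason a large share of respondents can end up at either extreme.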

Reliability

Reliability studies have demonstrated excellent internal consistency, with Cronbach’s alpha generally exceeding 0.80 for all scales except Social Functioning. Alpha for Social Functioning may sometimes be lower because the subscale has only 2 items (Ware, Snow, Kosinski & Gandek, 1993; Brazier et al., 1992; Lyons, Perry, & Littlepage, 1994; McHorney, Ware, Lu, & Sherbourne, 1994; Ruta, Garratt, Wardlaw, & Russell, 1994). Test-retest reliability evaluations have also suggested that the SF-36 scores can generally be reproduced (Brazier et al., 1992; Beaton, Hogg-Johnson, & Bombardier, 1997).
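For readers unfamiliar with the statistic, Cronbach’s alpha for a subscale of k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The sketch below implements that standard formula with sample variances; the function name and the example responses are illustrative assumptions and are not taken from any of the studies cited here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one subscale.

    item_scores: one list per respondent, each holding that respondent's
    score on each of the k items of the subscale.
    """
    n_respondents = len(item_scores)
    k = len(item_scores[0]) if item_scores else 0
    if n_respondents < 2 or k < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    # Sample variance of each item (column) across respondents
    sum_item_variances = sum(variance(col) for col in zip(*item_scores))
    # Sample variance of each respondent's summed score
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Made-up responses from four hypothetical respondents on a 2-item subscale
responses = [[1, 2], [2, 2], [4, 5], [5, 4]]
print(round(cronbach_alpha(responses), 2))
```

With the pattern of correlation between items held constant, alpha rises as the number of items increases, which is consistent with the lower values reported for the 2-item Social Functioning subscale.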

Brazier et al. (1992) found considerable evidence for the reliability of the SF-36. For the internal consistency of the SF-36, Cronbach’s alpha was found to be excellent, exceeding 0.85, and reliability coefficients exceeded 0.75 for all dimensions of the scale with the exception of the Social Functioning subscale (alpha = 0.73). To identify the test-retest reliability, Brazier et al. (1992) calculated correlation coefficients and found coefficients ranging from adequate (0.60 for Social Functioning) to excellent (0.81 for Physical Functioning).

Jenkinson, Coulter and Wright (1993) mailed the SF-36 in a large community sample to explore the questionnaire’s internal consistency and validity. Cronbach’s alpha on all subscales of the SF-36 were excellent, exceeding 0.80, with the exception being the Social Functioning subscale, which was of adequate internal consistency (alpha = 0.76). In the case of the Social Functioning dimension, the results were considered acceptable due to the small number of items (2 items using a 5-point scale).

Jenkinson, Wright and Coulter (1994) mailed the SF-36 to 13,042 randomly selected subjects between the ages of 16 and 64 years. The internal consistency of the SF-36 was found to range from adequate to excellent (alpha ranged from 0.76 for Social Functioning to 0.90 for Physical Functioning). The internal consistency was then calculated by breaking the data down into five subgroups of overall self-rated general health (poor, fair, good, very good, excellent). All alpha values were adequate, exceeding 0.70, except for the Social Functioning subscale, which was poor (exceeded 0.50). Due to the small number of items in this domain, this result is considered acceptable.

Brazier et al. (1996) calculated the reliability of the SF-36 in 380 women over the age of 75. Spearman’s rank correlation coefficients were calculated between initial assessment and first follow-up scores for respondents who reported that their health had not changed. Over a retest period of 6 months, coefficients ranged from poor (r = 0.28 for Social Functioning) to adequate (r = 0.70 for Vitality). These results suggest that the SF-36 has only adequate test-retest reliability in the elderly. Brazier et al. (1996) also examined the internal consistency of the SF-36 and reported excellent internal consistency (alpha ≥ 0.80) for all subscales except Social Functioning (0.56) and General Health (0.66), which had poor internal consistency.

Andresen et al. (1999) administered the SF-36 to 97 nursing home residents and then re-administered the SF-36 after 1 week. Test-retest intraclass correlation coefficients (ICC) ranged from adequate to excellent (from 0.55 to 0.82). Further, the ICCs for both the physical summary and mental summary scores were excellent (ICC = 0.82 and 0.79 respectively).

Essink-Bot, Krabbe, Bonsel, and Aaronson (1997) administered the SF-36, The Nottingham Health Profile, the COOP/WONCA charts (The Dartmouth Primary Care Cooperative Information Project/World Organization of National Colleges, Academies, and Academic Associations of General Practices/Family Physicians), and the EuroQol to migraine sufferers. The scales of the SF-36 yielded internal consistency estimates ranging from adequate (alpha = 0.76 for General Health) to excellent (0.91 for Physical Functioning). The mean alpha coefficient was considered excellent (alpha = 0.84). The internal consistency of the SF-36 subscales exceeded that of the Nottingham Health Profile scales.

Walters, Munro and Brazier (2001) reported excellent internal consistency (Cronbach’s alpha ≥ 0.80) for all subscales of the SF-36 except for the Social Functioning subscale (alpha = 0.79) when the survey was administered by mail to a sample of 9,897 subjects aged 65-104 years.

McHorney, Ware, Lu and Sherbourne (1994) evaluated data from 3,445 patients from the Medical Outcomes Study (MOS) and replicated data across 24 subgroups differing in socio-demographic characteristics, diagnosis, and disease severity. Across patient groups, all scales passed tests for item-internal consistency (97% passed). Reliability coefficients ranged from a low of 0.65 to a high of 0.94 across scales (median = 0.85) and varied somewhat across patient subgroups.

Weinberger et al. (1996) tested whether the SF-36 is influenced by method of administration (face-to-face interview, self-administration and telephone interview) in 172 veterans receiving care at a General Medical Clinic. All patients were asked to complete the SF-36 three times over a 4-week period. Cronbach’s alpha coefficients indicated that items in all eight SF-36 domains were highly internally consistent, regardless of the mode of administration; however, scores showed large variation over short intervals. Specifically, of 24 computed Cronbach’s alphas (i.e., eight scales times three modes of administration), only one was below 0.70 (Social Function via telephone administration), whereas 17 exceeded 0.80. Cronbach’s alphas did not differ significantly by method of administration. Test-retest correlations ranged from r = 0.55 (Physical Role Function by telephone administration) to r = 0.94 (Physical Function by self-administration).

Hagen, Bugge, and Alexander (2003) examined the reliability of the SF-36 in patients in the early post-stroke period. The SF-36 was administered at 1, 3 and 6 months after stroke onset. The internal consistency of the eight subscales at all three time-points was good except for 1-month Vitality (alpha = 0.68) and 3-month General Health (alpha = 0.67), which were considered poor.

Dorman et al. (1998) assessed the test-retest reliability and the internal consistency of the SF-36 in 2,253 patients with stroke. ICCs ranged from poor (0.28 for Mental Health) to excellent (0.80 for Social Functioning). Internal consistency of the SF-36 was excellent (ranging from 0.81 for Social Functioning to 0.96 for Emotional Role Functioning). Dorman et al. concluded that although the SF-36 can function effectively as a discriminatory measure for assessing health-related quality-of-life outcomes in groups of patients after stroke, the level of test-retest reliability reported in stroke populations indicates that the SF-36 may not be adequate for serial assessments of individual patients, unless large differences over time are expected. Thus, the SF-36 should be used for large group comparisons only.

Furthermore, test-retest reliability was negatively affected by the use of proxy respondents in this study. While the use of a proxy may be the only means by which to include data from more severely affected stroke survivors, the subjective nature of the SF-36 may make proxy use difficult or even inadvisable.

Hobart, Williams, Moran and Thompson (2002) argue that the SF-36 has limited reliability as the General Health Perceptions and Social Functioning scales generate low reliability scores and have limited convergent and discriminant validity. However, de Haan (2002) argues that Hobart et al.’s conclusions can be challenged. The reliability of only one scale (General Health Perceptions) was marginally less (Cronbach’s alpha = 0.68) than the authors’ predefined criteria of alpha = 0.70. Although it is often recommended that coefficient values should be above 0.80, de Haan points out that coefficients above 0.70 are generally regarded as acceptable for scales when assessing outcome on a group level.

Anderson, Laubscher, and Burns (1996) administered the Australian version of the SF-36 to 90 individuals at one year post-stroke. The authors concluded that the SF-36 has satisfactory internal consistency; however, alphas ranged from 0.60 for the Vitality scale (indicating poor internal consistency) to 0.90 for Physical Functioning, Bodily Pain and Role Limitations-Emotional (excellent internal consistency). The Cronbach’s alphas of four subscales of the SF-36 fell below 0.80 (General Health, Vitality, Social Functioning and Mental Health).

Validity

Criterion:

Predictive:
McHorney (1996) examined data from the Medical Outcomes Study. The General Health Perceptions subscale was found to be most predictive of death, followed by Physical Functioning: the death rate of patients in the lowest quartile of the SF-36 General Health scale was three times greater than that of patients in the highest quartile. Baseline Physical Functioning, Role Limitations-Physical, and Pain subscales were most predictive of hospitalizations. Moreover, Pain, General Health and Vitality subscales were most predictive of physician visits.

Beusterien, Steinwald, and Ware (1996) found that the SF-36 Mental Health subscale and mental component summary measure were strongly associated with severity of depression in cross-sectional analyses. These results suggest that the SF-36 is useful for estimating the burden of depression among depressed elderly persons.

Rumsfeld et al. (1999) tested whether the physical and mental component summary scores from the preoperative SF-36 health status survey predicted mortality in 3,956 patients following coronary artery bypass graft surgery (CABG). The physical component summary of the preoperative SF-36 was found to be a statistically significant risk factor for 6-month mortality following CABG surgery. In multivariate analysis, a 10-point lower SF-36 physical component summary score had an odds ratio (OR) of 1.39 for predicting mortality. The SF-36 mental component summary score was not associated with 6-month mortality in multivariate analyses (OR = 1.09). Thus, preoperative patient self-report of the physical component of the SF-36 health status may be helpful for risk stratification and clinical decision making for patients undergoing CABG surgery.
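To make the reported odds ratio easier to interpret, the sketch below rescales an odds ratio quoted per 10 points to other score differences. It assumes that the log-odds of mortality change linearly with the physical component summary score, as in a standard logistic regression with a continuous predictor; the source does not state this, and the function name is hypothetical.

```python
import math


def rescale_odds_ratio(odds_ratio, per_points, new_points):
    """Convert an OR reported per `per_points` into an OR per `new_points`.

    Assumes the log-odds change linearly with the score (an assumption,
    not something the source states).
    """
    log_or_per_point = math.log(odds_ratio) / per_points
    return math.exp(log_or_per_point * new_points)


# Reported figure: OR = 1.39 per 10-point lower physical component summary score.
for points in (1, 10, 20):
    print(points, round(rescale_odds_ratio(1.39, 10, points), 3))
```

Under that assumption, a 20-point lower score corresponds to the 10-point odds ratio squared, not doubled.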

Construct:

Walters et al. (2001) reported significant relationships in expected directions to support construct validity among older adults. Scores in all scales were reported to decrease as age increased. Women reported worse health than men on all scales even after adjusting for age. Respondents who had recently visited their physician reported poorer health on all scales and people living alone had lower scores except on general health.

Ware, Kosinski, and Keller (1994) examined the construct validity of the 8 subscales of the SF-36. Physical Functioning was shown to be the best all-around measure of physical health (r = 0.85), and Mental Health was the most valid measure of mental health (r = 0.87). Interestingly, Mental Health was one of the poorest measures of the physical component (r = 0.17) and Physical Functioning was the poorest measure of the mental component (r = 0.12). The Vitality (r = 0.47 for the physical health component and r = 0.65 for the mental health component) and General Health (r = 0.69 for the physical health component and r = 0.37 for the mental health component) subscales had excellent or adequate validity for both components.

Construct (in patients with stroke):

Wilkinson et al. (1997) interviewed 106 people less than 75 years old and their caregivers following a first-ever stroke. Rank correlation coefficients of the Barthel Index with the SF-36 subscales in first-ever stroke patients ranged from poor (r = 0.22 for the Role Limitations-Emotional subscale) to excellent (r = 0.81 for the Physical Functioning subscale).

Convergent/Discriminant:
Convergent validity of the SF-36 is generally strongly supported in comparison to similar domains of condition-specific measures (Fielder, Denholm, Lyons, & Fielder, 1996; Nortvedt, Riise, Myhr, & Nyland, 1999; The Counselling Versus Antidepressants in Primary Care Study Group, 1999; Benninger, Ahuja, Gardner, & Grywalski, 1998; Buchwald et al., 1996; Anderson, Laubscher, & Burns, 1996) and other generic HRQOL measures (Andresen et al., 1999; Andresen, Rothenberg, & Kaplan, 1998; Rothwell, McDowell, Wong, & Dorman, 1997). Discriminant validity is usually rated highly for the SF-36 (e.g. Andresen et al., 1999; The Canadian Burden of Illness Study Group, 1998; Buchwald, Pearlman, Umali, Schmaling, & Katon, 1996; Komaroff et al., 1996; O’Neill & Kelly, 1996), although some studies disagree (e.g. Colantonio, Dawson, & McLellan, 1998; Lalonde, Clarke, Joseph, Mackenzie, & Grover, 1999; Myers & Wilks, 1999).

Andresen et al. (1999) administered the SF-36, the Geriatric Depression Scale and the Mini-Mental State Examination to 97 nursing home residents. Activities of daily living and medication intake data were recorded. Convergent validity between the SF-36 Physical Health subscale and the Activities of Daily Living Index was adequate (r ranged from -0.37 to -0.43). These correlations are negative because a high score on the SF-36 indicates positive health status, whereas a high score on the Activities of Daily Living Index indicates dependence. Physical health scores from the SF-36 correlated more strongly with Geriatric Depression Scale scores than with Activities of Daily Living Index scores (-0.63 vs. 0.01). However, the Role Limitations-Physical subscale correlated more strongly with Geriatric Depression Scale scores than with Activities of Daily Living scores. Social Functioning, Role Limitations-Emotional, Vitality and Mental Health subscales all correlated more strongly with Geriatric Depression Scale scores than with Activities of Daily Living scores.

Brazier et al. (1992) reported correlations of -0.41 (Social Functioning vs. social isolation) to -0.68 (Vitality vs. energy) between similar scales on the SF-36 and Nottingham Health Profile. Correlations between dimensions less clearly related ranged from -0.18 (Physical Functioning vs. emotional reactions) to -0.53 (Social Functioning vs. emotional reactions). These correlations are negative because a high score on the SF-36 indicates positive health status, whereas a high score on the Nottingham Health Profile indicates poorer perceived health status.
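The sign convention described above can be illustrated with a short Python sketch. It computes a Pearson correlation between two hypothetical sets of scores scored in opposite directions, then reverses one scale (assuming a 0-100 range) to show that only the sign changes, not the strength of the relationship. The function name and all values are illustrative and are not data from Brazier et al.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five respondents (illustrative only).
sf36_vitality = [80, 65, 50, 30, 20]   # higher = better health
nhp_energy = [10, 25, 30, 60, 70]      # higher = poorer perceived health

r_raw = pearson_r(sf36_vitality, nhp_energy)
# Reverse the second scale so that higher also means better (assumes 0-100 scoring).
r_reversed = pearson_r(sf36_vitality, [100 - s for s in nhp_energy])
print(f"raw r = {r_raw:.2f}, after reversing one scale r = {r_reversed:.2f}")
```

A negative coefficient between two instruments scored in opposite directions therefore supports convergent validity in the same way a positive coefficient would if both were scored in the same direction.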

Dorman et al. (1999) reported that the SF-36 Physical Functioning subscale correlated most closely with the mobility, self-care and activities domains of EuroQol (r = 0.57, 0.65 and 0.63, respectively) and less strongly with the EuroQol psychological domain (r = 0.34). The SF-36 Bodily Pain subscale correlated with the EuroQol pain domain (r = 0.66) and adequately correlated with all EuroQol domains. Role Functioning-Emotional correlated most closely with the EuroQol psychological domain (r = 0.43), and correlated least with the EuroQol self-care domain (r = 0.24). The SF-36 Mental Health subscale was not closely related to the psychological domain (r = 0.21) or to the physical EuroQol domains (r = 0.06 to 0.10). The SF-36 General Health subscale correlated adequately with the EuroQol overall HRQOL rating (r = 0.66).

Known Groups:
Patients diagnosed with ≥ 1 chronic physical problem had lower scores on all dimensions of the SF-36 except Mental Health, in comparison to healthy age-matched controls. The SF-36 scores were distributed as expected for sex, age, social class and use of health services (Brazier et al., 1992).

The SF-36 was found to discriminate between age groups (<75 years versus 75+) on the Physical Functioning, Vitality and Change in Health subscales and between groups based on setting (general practice versus hospital outpatients) on the Physical Functioning and Role Functioning-Physical subscales (Hayes et al., 1995).

Essink-Bot et al. (1997) reported that the SF-36 was able to discriminate between migraine sufferers and controls on all subscales, although discrimination was poor (ROC/AUC = 0.54 – 0.67). The SF-36 was also able to discriminate between groups of migraine sufferers based on absence from work (0 vs. ≥ 0.5 days; ROC/AUC ranged from poor, 0.61, to adequate, 0.79).
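The area under the ROC curve (AUC) quoted above can be read as the probability that a randomly chosen member of the affected group scores lower on a subscale than a randomly chosen control, with 0.5 meaning no discrimination and 1.0 meaning perfect separation. The following Python sketch computes it by direct pairwise comparison; the function name and scores are hypothetical, and this is not necessarily the procedure used by Essink-Bot et al.

```python
def auc_lower_in_cases(case_scores, control_scores):
    """AUC as the share of (case, control) pairs where the case scores lower.

    Ties count as half a pair. Lower SF-36 scores indicate poorer health,
    so cases are expected to score lower than controls.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case < control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical subscale scores (illustrative only).
cases = [45, 60, 55, 70, 50]
controls = [65, 80, 55, 75, 90]
print(f"AUC = {auc_lower_in_cases(cases, controls):.2f}")
```

This pairwise form is equivalent to the rank-based (Mann-Whitney) estimate of the AUC and is practical only for small samples, since it compares every case with every control.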

Brazier et al. (1996) reported that SF-36 scores distinguished groups based on recent visits to their family doctor, hospital inpatient stays and longstanding illness.

Known Groups (in patients with stroke):
Anderson et al. (1996) administered the Australian version of the SF-36 to 90 stroke survivors (1 year post-stroke). Validity was assessed by comparing patients’ scores on the SF-36 with those obtained for the Barthel Index, the 28-item General Health Questionnaire, and the Adelaide Activities Profile, an instrument developed from the Frenchay Activities Index. Construct validity was demonstrated by significant differences across all eight SF-36 scales for patients with identified health problems. For patients dependent in activities of daily living, the difference in mean scores was greatest for the Physical Functioning and General Health scales, whereas for patients with emotional health problems, the strongest associations were with the Social Functioning, Role Limitations-Emotional, and Mental Health subscales.

Mayo et al. (2002) interviewed persons with first-ever stroke and a population-based sample of community-dwelling individuals without stroke by telephone at 6-month intervals for 2 years of follow-up. SF-36 scores successfully discriminated those with stroke from their age- and gender-matched controls.

Cross-diagnostic:

Dallmeijer et al. (2007) examined the unidimensionality and differential item functioning of the Physical Functioning subscale of the SF-36 using Rasch analysis in patients with stroke, multiple sclerosis, and amyotrophic lateral sclerosis (ALS). All items of the Physical Functioning subscale, except one for the ALS group (bathing/dressing item), formed a unidimensional scale, supporting the use of a sum score as a measure of Physical Functioning within these diagnostic groups. The pooled analysis showed inadequate fit to the Rasch model for the ‘walking several hundred meters’ item. Of the other 9 items, 5 showed differential item functioning for stroke vs. multiple sclerosis and ALS, while no differential item functioning was found between multiple sclerosis and ALS. Thus, when comparing the data of patients with stroke with that of patients with multiple sclerosis and/or ALS, adjustments are necessary for differential item functioning.

Responsiveness

Harwood and Ebrahim (2000) examined the sensitivity to change of the SF-36 in 81 patients before and after hip replacement. Eighty-nine percent of patients reported improvements three months after surgery. The largest changes were seen on the SF-36 Pain scale (large effect sizes of 1.2 at three months and 1.5 at 6-12 months), Physical Function (large effect sizes of 1.1 at 3 months and 1.3 at 6-12 months) and Role Limitation-Physical (large effect sizes of 0.8 at 3 months and 1.2 at 6-12 months) scales, suggesting that some of the SF-36 dimensions are very sensitive to change.

Brazier, Walters, Nicholl and Kohler (1996) tested the sensitivity of the SF-36, EuroQol and the Office of Population Censuses and Surveys Disability Survey in an elderly female population. These measures were administered by interview in a hospital clinic at baseline. A random subsample of respondents was retested six months later. Sensitivity of the instruments was quantified by estimating effect sizes for hypothesized changes in health status. There was some evidence of greater sensitivity to lower levels of morbidity in the SF-36. A hypothesized change from having a long-standing illness to having no long-standing illness was associated with moderate to large effect sizes across dimensions of the three instruments, except the Social Functioning (ES = 0.41) and Mental Health (ES = 0.31) dimensions of the SF-36, which both had small effect sizes. The effect sizes for differences in instrument scores between the age groups were small (in the range 0.00-0.50), with the highest for Physical Functioning. The SF-36 was rated as more sensitive to change than the EuroQol for older adult women.

In a study by Mossberg and McFarland (2001), 6 outpatient rehabilitation clinics incorporated the SF-36 into everyday practice. Ninety patients completed the SF-36 health status questionnaire before initiating treatment and again at discharge. Only nonsurgical patients without comorbidities were enrolled. Effect sizes for the SF-36 (admission to outpatient rehabilitation to discharge) ranged from small (0.48 for Role Limitations-Emotional) to large (1.38 for Bodily Pain). The physical component summary score effect size was large (ES = 0.80) and the mental component summary score effect size was small (ES = 0.45).

The SF-36 is increasingly being used in stroke studies (Anderson, Laubscher & Burns, 1996; Duncan et al., 1997) and in stroke clinical trials. However, the psychometric properties of the SF-36 soon after stroke are not well known, as most of the current data are from patients one year or more after the stroke (e.g. Anderson et al., 1996; Duncan et al., 1997). We did not identify any studies on the responsiveness of the SF-36 in patients with stroke.

Muller-Nordhorn et al. (2004) examined the responsiveness to change of the SF-12 in patients with stroke or transitory ischemic attack. Patients (n=558) were administered the SF-12 at baseline (referring to status prior to the event) and after 12 months. In patients with stroke, standardized response means (SRMs) were small for the physical component summary scale of the SF-12 (SRM 0.49) and moderate for the mental component summary scale of the SF-12 (SRM 0.52). In patients with transitory ischemic attack, SRMs were below 0.2 for the physical component summary scale of the SF-12 and small for the mental component summary scale of the SF-12 (SRM 0.34). SRMs increased with stroke severity as indicated by the National Institutes of Health Stroke Scale score. Thus, the SF-12 summary scales show a small to moderate responsiveness to change in patients after stroke. Responsiveness to change was higher in patients with greater stroke severity.
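Two responsiveness statistics appear in this section: the effect size (ES) and the standardized response mean (SRM). The Python sketch below shows the commonly used definitions, in which the ES divides the mean change by the standard deviation of baseline scores and the SRM divides it by the standard deviation of the individual change scores. The cited studies may have used variants of these formulas, and the function names and scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical paired scores for six patients (illustrative only).
baseline = [35, 42, 50, 38, 45, 40]
follow_up = [40, 45, 58, 37, 52, 46]
print(f"ES  = {effect_size(baseline, follow_up):.2f}")
print(f"SRM = {standardized_response_mean(baseline, follow_up):.2f}")
```

Because the two statistics use different denominators, the same data can yield different values, so ES and SRM figures from different studies are not directly interchangeable.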

The observation that patients with stroke had scores similar to patients with transient ischemic attacks raises questions about the ability of the SF-36 to discriminate and to be responsive to clinical changes in patients with stroke (Duncan et al., 1997). Currently, no evaluative stroke-specific HRQOL instrument is available, and it remains to be seen whether generic HRQOL instruments such as the SF-36 are sufficiently responsive to be useful in clinical trials. More information regarding the responsiveness of the SF-36 will be known when a number of ongoing stroke trials are completed (Williams, 1998).

References

  • Aaronson, N. K., Muller, M., Cohen, P. D. A., Essink-Bot, M. L., Fekkes, M., Sanderman, R., Sprangers, M. A., Velder, A., Verrips, E. (1998). Translation, validation and norming of the Dutch language version of the SF-36 health survey in community and chronic disease populations. J Clin Epidemiol, 51, 1055-1068.
  • Anderson, C., Laubscher, S., Burns, R. (1996). Validation of the Short Form 36 (SF-36) Health Survey Questionnaire among stroke patients. Stroke, 27(10), 1812-1816.
  • Andresen, E. M., Meyers, A. R. (2000). Health-related quality of life outcomes measures. Arch Phys Med Rehabil, 81(12), S30-45.
  • Andresen, E. M., Gwendell, W., Gravitt, G. W., Aydelotte, M. E., Podgorski, C. A. (1999). Limitations of the SF-36 in a sample of nursing home residents. Age and Ageing, 28, 562-566.
  • Andresen, E. M., Fouts, B. S., Romeis, J. C., Brownson, C. A. (1999). Performance of health-related quality-of-life instruments in a spinal cord injured population. Arch Phys Med Rehabil, 80, 877-884.
  • Andresen, E. M., Rothenberg, B. M., Kaplan, R. M. (1998). Performance of a self-administered mailed version of the Quality of Well-Being (QWB-SA) questionnaire among older adults. Med Care, 36, 1349-1360.
  • Beaton, D. E., Hogg-Johnson, S., Bombardier, C. (1997). Evaluating changes in health status: Reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol, 50(1), 79-93.
  • Benninger, M. S., Ahuja, A. S., Gardner, G., Grywalski, C. (1998). Assessing outcomes for dysphonic patients. J Voice, 12, 540-550.
  • Beusterien, K. M., Steinwald, B., Ware, J. E. (1996). Usefulness of the SF-36 Health Survey in measuring health outcomes in the depressed elderly. J Geriatr Psychiatry Neurol, 9(1), 13-21.
  • Beck, A. T., Rial, W. Y., Rickels, K. (1974). Short form of Depression Inventory: Cross-validation. Psychological Reports, 34(3), 1184-1186.
  • Brazier, J., Roberts, J., Tsuchiya, A., Busschbach, J. (2004). A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ, 13, 873-884.
  • Brazier, J., Usherwood, T., Harper, R., Thomas, K. (1998). Deriving a preference-based single index from the UK SF-36 Health Survey. J Clin Epidemiol, 51, 1115-1128.
  • Brazier, J.E., Walters, S.J., Nicholl, J.P. & Kohler, B. (1996). Using the SF-36 and EuroQol on an Elderly Population. Quality of Life Research, 5, 195-204.
  • Brazier, J., Roberts, J., Deverill, M. (2002). The estimation of a preference-based measure of health from the SF-36. J Health Econ, 21, 271-292.
  • Brazier, J. E., Harper, R., Jones, N. M. B. et al. (1992). Validating the SF-36 health survey questionnaire: new outcome measure for primary care. BMJ, 305, 160-164.
  • Buchwald, D., Pearlman, T., Umali, J., Schmaling, K., Katon, W. (1996). Functional status in patients with chronic fatigue syndrome, other fatiguing illnesses, and healthy individuals. Am J Med, 101, 364-370.
  • Ciconelli, R. M. (1997). Translation and validation to the Portuguese of the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) [doctoral thesis]. Federal University of São Paulo, São Paulo, Brazil.
  • Colantonio, A., Dawson, D. R., McLellan, B. A. (1998). Head injury in young adults: long-term outcome. Arch Phys Med Rehabil, 79, 550-558.
  • Dallmeijer, A. J., de Groot, V., Roorda, L. D., Schepers, V. P. M., Lindeman, E., van den Berg, L. H., Beelen, A., Dekker, J. (2007). Cross-diagnostic validity of the SF-36 physical functioning scale in patients with stroke, multiple sclerosis and amyotrophic lateral sclerosis: A study using Rasch analysis. J Rehabil Med, 39, 163-169.
  • de Haan, R. J. (2002). Measuring quality of life after stroke using the SF-36. Stroke, 33, 1176-1177.
  • Dorman, P., Slattery, J., Farrell, B., Dennis, M., Sandercock, P. (1998). Qualitative comparison of the reliability of health status assessments with the EuroQol and SF-36 Questionnaires After Stroke. Stroke, 29, 63-68.
  • Dorman, P. J., Dennis, M., Sandercock, P. (1999). How do scores on the EuroQol relate to scores on the SF-36 after stroke? Stroke, 30(10), 2146-2151.
  • Duncan, P. W., Samsa, G. P., Weinberger, M., Goldstein, L. B., Bonito, A., Witter, D. M., Enarson, C., Matchar, D. (1997). Health status of individuals with mild stroke. Stroke, 28, 740-745.
  • Essink-Bot, M. L., Krabbe, P. F., Bonsel, G. J., Aaronson, N. K. (1997). An empirical comparison of four generic health status measures: The Nottingham Health Profile, the Medical Outcomes Study 36-Item Short-Form Health Survey, the COOP/WONCA Charts, and the EuroQol Instrument. Med Care, 35(5), 522-537.
  • Fielder, H., Denholm, S. W., Lyons, R. A., Fielder, C. P. (1996). Measurement of health status in patients with vertigo. Clin Otolaryngol, 21, 124-126.
  • Fukuhara, S., Ware, J. E., Kosinski, M., Wada, S., Gandek, B. (1998). Psychometric and Clinical Tests of Validity of the Japanese SF-36 Health Survey. J Clin Epidemiol, 51, 1045-1053.
  • Hagen, S., Bugge, C., Alexander, H. (2003). Psychometric properties of the SF-36 in the early post-stroke phase. Journal of Advanced Nursing, 44(5), 461-468.
  • Harwood, R. H., Ebrahim, S. (2000). A comparison of the responsiveness of the Nottingham extended activities of daily living scale, London handicap scale, and SF-36. Disability & Rehabilitation, 22(17), 786-793.
  • Hayes, V., Morris, J., Wolfe, C., Morgan, M. (1995). The SF-36 Health Survey Questionnaire: Is it suitable for use with older adults? Age and Ageing, 24, 120-125.
  • Hilari, K., Byng, S., Lamping, D. L., Smith, S. C. (2003). Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39): Evaluation of acceptability, reliability, and validity. Stroke, 34, 1944-1950.
  • Hobart, J. C., Williams, L. S., Moran, K., Thompson, A. J. (2002). Quality of life measurement after stroke: Uses and abuses of the SF-36. Stroke, 33, 1348-1356.
  • Jenkinson, C., Coulter, A., Wright, L. (1993). Short form 36 (SF36) health survey questionnaire: Normative data for adults of working age. BMJ, 306(6890), 1437-1440.
  • Jenkinson, C., Wright, L., Coulter, A. (1994). Criterion validity and reliability of the SF-36 in a population sample. Quality of Life Research, 3(1), 7-12.
  • Jenkinson, C., Stewart-Brown, S., Petersen, S., Paice, C. (1999). Assessment of the SF-36 version 2 in the United Kingdom. J Epidemiol Community Health, 53(1), 46-50.
  • Komaroff, A. L., Fagioli, L. R., Doolittle, T. H., Gandek, B., Gleit, M. A., Gueriero, R. T., et al. (1996). Health status in patients with chronic fatigue syndrome and in general population and disease comparison groups. Am J Med, 101, 281-290.
  • Lai, S-M., Perera, S., Duncan, P. W., Bode, R. (2003). Physical and social functioning after stroke: Comparison of the Stroke Impact Scale and Short Form-36. Stroke, 34, 488-493.
  • Lalonde, L., Clarke, A. E., Joseph, L., Mackenzie, T., Grover, S. A. (1999). Comparing the psychometric properties of preference-based and nonpreference-based health-related quality of life in coronary heart disease. Qual Life Res, 8, 399-409.
  • Lyons, R. A., Perry, H. M., Littlepage, B. N. C. (1994). Evidence for the validity of the Short-Form 36 Questionnaire (SF-36) in an elderly population. Age Ageing, 23, 182-184.
  • Mathias, S. D., Bates, M. M., Pasta, D. J., Cisternas, M. G., Feeny, D., Patrick, D. L. (1997). Use of the Health Utilities Index with stroke patients and their caregivers. Stroke, 28, 1888-1894.
  • Mayo, N. E., Wood-Dauphinee, S., Cote, R., Durcan, L., Carlton, J. (2002). Activity, Participation, and Quality of Life 6 Months Poststroke. Arch Phys Med Rehabil, 83, 1035-1042.
  • McDowell, I., Newell, C. (1996). Measuring Health. A Guide to Rating Scales and Questionnaires. 2nd ed. New York: Oxford University Press.
  • McHorney, C. A. (1996). Measuring and monitoring general health status in elderly persons: Practical and methodological issues in using the SF-36 health survey. The Gerontologist, 36(5), 571-583.
  • McHorney, C. A., Ware, J. E. Jr., Raczek, A. E. (1993). The MOS 36-Item Short-Form Health Survey (SF-36): II Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care, 31, 247-263.
  • McHorney, C. A., Ware, J. E. Jr., Lu, J. F., Sherbourne, C. D. (1994). The MOS 36-item Short-Form Health Survey (SF-36): III Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care, 32, 40-66.
  • Mossberg, K., McFarland, C. (2001). A patient-oriented health status measure in outpatient rehabilitation. Am J Phys Med Rehabil, 80(12), 896-902.
  • Muller-Nordhorn, J., Nolte, C. H., Rossnagel, K., Jungehulsing, G. J., Reich, A., Roll, S., Villringer, A., Willich, S. N. (2004). Responsiveness to change of the SF-12 in patients with cerebrovascular disease. Biometrical Journal, 46(S1), 50.
  • Myers, C., Wilks, D. (1999). Comparison of Euroqol EQ-5D and SF-36 in patients with chronic fatigue syndrome. Qual Life Res, 8, 9-16.
  • Nemeth, G. (2006). Health related quality of life outcome instruments. European Spine Journal, 15(1), S44-S51.
  • Nortvedt, M. W., Riise, T., Myhr, K. M., Nyland, H. I. (1999). Quality of life in multiple sclerosis: measuring the disease effects more broadly. Neurology, 53, 1098-1103.
  • O’Mahony, P. G., Rodgers, H., Thomson, R. G., Dobson, R., James, O. F. W. (1998). Is the SF-36 suitable for assessing health status of older stroke patients? Age and Ageing, 27, 19-22.
  • O’Neill, P., Kelly, P. (1996). Postal questionnaire study of disability in the community associated with psoriasis. Br Med J, 313, 919-921.
  • Petrou, S., Hockley, C. (2005). An investigation into the empirical validity of the EQ-5D and SF-6D based on hypothetical preferences in a general population. Health Econ, 14, 1169-1189.
  • Ren, X. S., Amick, B., Zhou, L., et al. (1998). Translation and Psychometric Evaluation of a Chinese Version of the SF-36 Health Survey in the U.S. J Clin Epidemiol, 51(11), 1129.
  • Rothwell, P. M., McDowell, Z., Wong, C. K., Dorman, P. J. (1997). Doctors and patients don’t agree: cross sectional study of patients’ and doctors’ perceptions and assessments of disability in multiple sclerosis. British Med J, 314, 1580-1583.
  • Rumsfeld, J. S., MaWhinney, S., McCarthy, M., Shroyer, A. L., VillaNueva, C. B., O’Brien, M., Moritz, T. E., Henderson, W. G., Grover, F. L., Sethi, G. K., Hammermeister, K. E. (1999). Health-related quality of life as a predictor of mortality following coronary artery bypass graft surgery. Participants of the Department of Veterans Affairs Cooperative Study Group on Processes, Structures, and Outcomes of Care in Cardiac Surgery. JAMA, 281(14), 1298-1303.
  • Ruta, D. A., Garratt, A. M., Wardlaw, D., Russell, I. T. (1994). Developing a valid and reliable measure of health outcome for patients with low back pain. Spine, 19, 1887-1896.
  • Segal, M. E., Schall, R. R. (1994). Determining functional/health status and its relation to disability in stroke survivors. Stroke, 25, 2391-2397.
  • The Canadian Burden of Illness Study Group. (1998). Burden of illness of multiple sclerosis: part II: quality of life. Can J Neurol Sci, 25, 31-38.
  • The Counselling Versus Antidepressants in Primary Care Study Group. (1999). How disabling is depression? Evidence from a primary care sample. Br J Gen Pract, 49(439), 95-98.
  • Walters, S. J., Munro, J. F., Brazier, J. E. (2001). Using the SF-36 with older adults: A cross-sectional community-based survey. Age and Ageing, 30, 337-343.
  • Ware, J. E., Kosinski, M., Dewey, J. E., Gandek, B. (2001). How to Score and Interpret Single-Item Health Status Measures: A Manual for Users of the SF-8 Health Survey. Lincoln RI: QualityMetric Incorporated.
  • Ware, J. E., Kosinski, M., Keller, S. D. (1994). SF-36 Physical and Mental Health Summary Scales: A User’s Manual. Boston, MA: The Health Institute.
  • Ware, J. E. Jr., Sherbourne, C. D. (1992). The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care, 30, 473-483.
  • Ware, J. Jr., Kosinski, M., Keller, S. D. (1996). A 12-item short-form health survey: Construction of scales and preliminary tests of reliability and validity. Med Care, 34(3), 220-233.
  • Ware, J. E., Snow, K. K., Kosinski, M., Gandek, B. (1993). SF-36® Health Survey Manual and Interpretation Guide. Boston, MA: New England Medical Center, The Health Institute.
  • Ware, J. E., Kosinski, M., Turner-Bowker, D. M., Gandek, B. (2002). SF-12v2: How to score version 2 of the SF-12 Health Survey. Lincoln RI: QualityMetric Incorporated.
  • Weinberger, M., Oddone, E. Z., Samsa, G. P., Landsman, P. B. (1996). Are health-related quality-of-life measures affected by the mode of administration? J Clin Epidemiol, 49(2), 135-140.
  • Wilkinson, P. R., Wolfe, C. D., Warburton, F. G., Rudd, A. G., Howard, R. S., Ross-Russell, R. W., Beech, R. (1997). Longer term quality of life and outcome in stroke patients: Is the Barthel Index alone an adequate measure of outcome? Quality in Health Care, 6, 125-130.
  • Williams, L. S. (1998). Health-Related Quality of Life Outcomes in Stroke. Neuroepidemiology, 17, 116-120.

See the measure

How to obtain the SF-36

Permission to use the SF-36 should be obtained from the Medical Outcomes Trust, which oversees the standardized administration of the SF-36 and will provide updates on administration and scoring (McDowell & Newell, 1996). Various computer applications are available to assist in scoring the SF-36, including free Excel templates that can be downloaded from the Internet.

All versions of the SF-36 can be viewed by visiting the website www.qualitymetric.com

Samples of the various versions of the SF-36 are also available on this website.
