Medical Outcomes Study Short Form 36 (SF-36)

Evidence Reviewed as of before: 19-08-2008
Author(s)*: Lisa Zeltzer, MSc OT
Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc; Maxim Ben Yakov, BSc PT

Purpose

The Medical Outcomes Study 36-item Short-Form Health Survey is a widely used, generic, patient-report measure created to assess health-related quality of life (HRQOL) in the general population. It was developed as part of the Medical Outcomes Study (a two-year study of patients with chronic conditions) (Ware & Sherbourne, 1992). Today, the SF-36 is the most commonly used generic instrument for measuring quality of life (de Haan, 2002). The SF-36 can be used with, but is not limited to, persons with stroke.

In-Depth Review

Purpose of the measure

The Medical Outcomes Study 36-item Short-Form Health Survey is a widely used, generic, patient-report measure created to assess health-related quality of life (HRQOL) in the general population. It was developed as part of the Medical Outcomes Study (a two-year study of patients with chronic conditions) (Ware & Sherbourne, 1992). Today, the SF-36 is the most commonly used generic instrument for measuring quality of life (de Haan, 2002). The SF-36 can be used with, but is not limited to, persons with stroke.

Available versions

The SF-36 was published in 1992 by Ware and Sherbourne, and further developed and validated in 1993 and 1994 respectively (Ware & Sherbourne, 1992; McHorney, Ware & Raczek, 1993; McHorney, Ware, Lu & Sherbourne, 1994). In 1996, Version 2.0 of the SF-36 (SF-36v2) was introduced, to correct for deficiencies identified in the original version. Changes include a few wording alterations, for example, “downhearted and blue” in a question on mental health symptoms is now “downhearted and depressed”. SF-36v2 is now considered “the international version” of the SF-36 (Andresen & Meyers, 2000). The original SF-36 questions had variable numbers and formats for response categories, and these have been increased and/or standardized among scales and questions. Role Functioning items now have five levels of responses rather than two. This may increase the responsiveness of the scales. Early reports of tests of this new version have been positive (Jenkinson, Stewart-Brown, Petersen & Paice, 1999). The virtual elimination of the www.sf-36.org, accessed July 12, 2006). Versions 1.0 and 2.0 of the SF-36 are available with two recall periods: the standard 4-week recall, and the acute 1-week recall period.

Features of the measure

Items:

Items of the SF-36 are divided into eight different domains:

Physical component:

  • Physical functioning (10 items)
  • Role limitations due to physical problems (4 items)
  • Bodily pain (2 items)
  • General health perceptions (5 items)

Mental component

  • Social functioning (2 items)
  • General mental health (5 items)
  • Role limitations due to emotional problems (3 items)
  • Vitality (4 items)

Other

  • Health transition (1 item): The respondent is asked to rate their current health status compared to their health status one year ago. This item remains separate from the 8 subscales and is not scored.

There are 11 questions in the SF-36, with 36 items in total. With the exception of the general change in health status question, subjects are asked to respond with reference to the past 4 weeks. An acute version of the SF-36 refers to problems in the past week only (McDowell & Newell, 1996).

Scoring:

The SF-36 does not lend itself to the generation of an overall summary score. This is because information within the individual responses is lost in the total scale score (since the total score can be achieved in a variety of ways from individual item responses) (Dorman et al., 1999). The recommended scoring system for the SF-36 is a weighted Likert system for each item. Items within subscales are totaled to provide a summed score for each subscale or dimension. Each of the 8 summed scores is linearly transformed onto a scale from 0 (negative health) to 100 (positive health) to provide a score for each subscale. A physical component score (PCS) and mental component score (MCS) can be derived from the scale items. However, these summary scores should be interpreted with caution. Hobart et al. (2002) examined the use of this two-dimensional model and found that these two scales accounted for only 60% of the variance in SF-36 scores. This finding suggests that there is a significant loss of information when this two-dimensional model is used.
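
The linear transformation of a summed subscale score onto the 0 to 100 range can be sketched as follows. This is a minimal illustration only, not the official scoring algorithm: the function name `transform_scale` and the example raw-score range are assumptions chosen for illustration, and item recoding and weighting are not shown.

```python
def transform_scale(raw_score, lowest_possible, highest_possible):
    """Linearly rescale a summed raw subscale score to the 0-100 range."""
    if highest_possible <= lowest_possible:
        raise ValueError("highest_possible must exceed lowest_possible")
    if not lowest_possible <= raw_score <= highest_possible:
        raise ValueError("raw_score is outside the possible range")
    score_range = highest_possible - lowest_possible
    # 0 corresponds to the lowest possible raw score, 100 to the highest.
    return (raw_score - lowest_possible) * 100.0 / score_range


# Hypothetical example: a 10-item subscale with each item scored 1-3,
# so the summed raw score can range from 10 to 30.
print(transform_scale(21, 10, 30))  # 55.0
```

Because every subscale is placed on the same 0 to 100 range, scores can be compared across subscales that differ in their number of items and response levels.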

Subscales:

The SF-36 has 8 subscales:

  • Physical Functioning,
  • Role Limitations due to Physical Problems,
  • Bodily Pain,
  • General Health Perceptions,
  • Vitality,
  • Social Functioning,
  • Role Limitations due to Emotional Problems,
  • General Mental Health.

Equipment:

Only the test and a pencil are required. Computer-administered and interactive telephone voice-recognition versions of the SF-36 are currently being evaluated (SF-36 Health Survey Update: John E. Ware, Jr.).

Training:

No training is required for administration of the SF-36. The SF-36 is suitable for self-administration, computerized administration, or administration by a trained interviewer in person or by telephone, to persons age 14 and older (Ware & Sherbourne, 1992).

Time:

The SF-36 is considered simple to administer and takes an average of 10 minutes to complete (Andresen & Meyers, 2000). The SF-36 has been studied for use by a proxy; however, administration by proxy is not recommended for patients with stroke, as agreement has been found to be poor in this patient population (Segal & Schall, 1994; Dorman, Slattery, Farrell, & Dennis, 1998). Instead, a stroke-specific quality of life measure such as the Stroke Impact Scale, which has been evaluated successfully for use by proxy respondents, may be a more appropriate measure to be administered by proxy. Another reliable measure of health status for stroke patients by proxy is the Health Utilities Index (HUI), which has been reported to have adequate to excellent agreement between patients with stroke and their proxies (Mathias, Bates, Pasta, Cisternas, Feeny & Patrick, 1997).

The SF-36 can also be completed as a mail survey. As a self-completed, mailed questionnaire, it has been shown to have reasonably high response rates (83% – Brazier et al., 1992, O’Mahoney, Rodgers, Thomson, Dobson, & James, 1998; 75% – 83% Dorman et al., 1998; 85% – Dorman et al., 1999; 82% overall and 69% for those over age 85 – Walters et al., 2001). However, data are typically more complete when interviewer administration is used, and low completion rates may not be limited to self-completion or postal administration. Andresen et al. (1999) administered the SF-36 to nursing home residents by face-to-face interview and reported that only 1 in 5 residents was able to complete it. It is possible that data completeness is indicative of respondent acceptance and understanding of the survey as relevant to them (O’Mahoney et al., 1998; Andresen et al., 1999). Hayes et al. (1995) identified that the most common items missing on the self-completed questionnaire referred to work or to vigorous activity. Older respondents recognized these questions as relevant to much younger people and not pertinent to their own situation. The authors suggested modifications to some of the questions, which may increase acceptability to older populations.

Alternative forms of the SF-36

SF-12 (Ware, Kosinski, & Keller, 1996)

The SF-12 was developed as an abbreviated version of the SF-36 for use in large surveys of general and specific populations as well as large longitudinal studies of health outcomes. It can be self-administered, or administered via interview, telephone, or computer. The SF-12 takes 5 minutes or less to complete (Nemeth, 2006). The SF-12v2 was later developed to correspond to the SF-36v2 and has demonstrated the same improvements as observed with the SF-36v2 (Ware, Kosinski, Turner-Bowker & Gandek, 2002). Versions 1.0 and 2.0 of the SF-12 are available with two recall periods: the standard 4-week recall, and the acute 1-week recall period.

SF-8 (QualityMetric, Incorporated)

The SF-8, a new generic eight-item assessment, generates a health profile consisting of eight scales and two summary measures describing HRQOL. The SF-8 uses one question to measure each of the eight SF-36 domains. The development, validation and norming of the new SF-8, including standard (4-week recall), acute (1-week recall), and 24-hour recall versions, is documented in the SF-8 manual, “How to Score and Interpret Single-Item Health Status Measures: A Manual for Users of the SF-8 Health Survey” (Ware, Kosinski, Dewey & Gandek, 2001). The SF-8 Health Survey can be self-administered, computer-administered, or given by a trained interviewer in person or by telephone to persons aged 14 and older. It takes approximately 1-2 minutes to complete and has been translated and validated for use in more than 30 countries (list of countries accessed July 12, 2006).

SF-6D (Brazier, Usherwood, Harper, & Thomas, 1998; Brazier, Roberts, & Deverill, 2002)

The SF-6D is a preference-based scoring system that uses six subscales from the SF-36 to allow for calculations of utilities from SF-36 and SF-36v2 responses. The eight dimensions from the SF-36 were reduced to six by omitting General Health Perceptions and combining Role Limitations-Physical and Role Limitations-Emotional. Good reliability and validity have been reported for the SF-6D (Petrou & Hockley, 2005; Brazier, Roberts, Tsuchiya & Busschbach, 2004).

For a fee, all versions of the SF Health Survey can be scored online via Quality Metric’s website (accessed July 12, 2006).

Client suitability

Can be used with:

  • Individuals with stroke.

The SF-36 is the most widely used measure to assess HRQOL in patients with stroke; however, its suitability in this patient population has been contentious:

  • Hobart, Williams, Moran, and Thompson (2002) reported that, in their sample of 177 post-stroke patients, five of the eight SF-36 subscales had limited validity as outcome measures, and that the reporting of physical and mental summary scores was not supported. The authors questioned the use of the SF-36 in patients with stroke.
  • de Haan (2002) reported that when the results of the relatively small study of Hobart et al. (2002) were taken in conjunction with the findings of previous research, there was insufficient evidence to question the reliability and validity of the SF-36 subscales in stroke.

Should not be used in:

  • Patients who cannot understand written or spoken language. Make sure the patient is fluent in the language used in the survey.
  • More severely affected stroke survivors who need a proxy to complete the survey (Dorman et al., 1998). Instead, a stroke-specific quality of life measure such as the Stroke Impact Scale, which has been evaluated successfully for use by proxy respondents, may be a more appropriate measure to be administered by proxy. Another more reliable measure of health status for stroke patients by proxy is the Health Utilities Index (HUI), which has been reported to have moderate to high agreement between stroke patients and proxies (Mathias et al., 1997).
  • Patients with aphasia. For patients with aphasia, a stroke-specific quality of life measure developed specifically for patients with aphasia, such as the Stroke and Aphasia Quality Of Life Scale (SAQOL-39), should be used (Hilari, Byng, Lamping, & Smith, 2003).
  • The SF-36 should not be used to document individual patient change. Dorman, Slattery, Farrell, Dennis, and Sandercock (1998) found that although the SF-36 can function effectively as a discriminatory measure for assessing health-related quality-of-life outcomes in groups of patients after stroke, the SF-36 may not be adequate for serial assessments of individual patients, unless large differences over time are expected. Thus, the SF-36 should be used for large group comparisons only.

In what languages is the measure available?

The SF-36 is available in a number of languages. In 1991, the International Quality of Life Assessment launched a project aimed at translating, validating and norming the SF-36 health survey. The project, which is based at the Health Assessment Lab in Boston, has sponsored investigators from 14 countries: Australia, Belgium, Canada, Denmark, France, Germany, Italy, Japan, The Netherlands, Norway, Spain, Sweden, the United Kingdom (English version), and the United States (English and Spanish versions). In addition, the SF-36 has been translated for use in more than 40 other countries, including: Argentina, Armenia, Austria, Bangladesh, Brazil, Bulgaria, Cambodia, Chile, China, Colombia, Costa Rica, Croatia, Czech Republic, Finland, Greece, Guatemala, Honduras, Hong Kong, Hungary, Iceland, Israel, Korea, Latvia, Lithuania, Mexico, New Zealand, Peru, Poland, Portugal, Romania, Russia, Singapore, Slovak Republic, South Africa, Switzerland, Taiwan, Tanzania, Turkey, the United Kingdom (Welsh), the United States (Chinese, Japanese, Vietnamese), Uruguay, Venezuela, and Yugoslavia. There are more than 500 publications that use translations or English-language adaptations of the SF-36. For information about the availability of SF-36 translations, visit www.SF-36.org.

Summary

What does the tool measure? Health related quality of life
What types of clients can the tool be used for? The SF-36 is a generic measure that can be used with, but is not limited to, persons with stroke.
Is this a screening or assessment tool? Assessment
Time to administer The SF-36 is considered simple to administer and takes an average of 10 minutes to complete.
Versions SF-12; SF-8; SF-6D
Other Languages The SF-36 is available in a number of languages. There are more than 500 publications that use translations or English-language adaptations of the SF-36. For information about the availability of SF-36 translations, visit www.sf-36.org
Measurement Properties
Reliability Internal consistency:
Out of 10 studies examining the internal consistency of the SF-36, five reported excellent internal consistency (except for the subscales of Social Functioning in three studies and General Health in one study, which were considered adequate). Two studies reported adequate to excellent internal consistency. Three studies reported poor to excellent internal consistency.

Test-retest:
Out of the five studies examining test-retest reliability of the SF-36, three reported adequate to excellent test-retest reliability. One study reported adequate test-retest reliability. One reported poor to excellent test-retest reliability.

Inter-rater:
No studies have examined the inter-rater reliability of the SF-36.

Validity Criterion:
Predictive:
Subscales of the SF-36 have been found to be predictive of death, hospitalizations, physician visits, and the burden of depression among depressed elderly persons.

Construct:
Convergent:
Adequate correlations between the SF-36 Physical Health subscale and the Activities of Daily Living Index; the SF-36 Social Functioning subscale and social isolation on the Nottingham Health Profile; the General Health subscale and the EuroQol overall HRQOL rating; the SF-36 Bodily Pain subscale and all EuroQol domains; and the Role Functioning-Emotional subscale with the EuroQol psychological domain. Excellent correlation between the Physical Health scores from the SF-36 and the Geriatric Depression Scale; the Vitality subscale on the SF-36 and energy subscale on the Nottingham Health Profile; and the Bodily Pain subscale on the SF-36 with the EuroQol pain domain.

Known groups:
SF-36 scores discriminated between patients diagnosed with one or more chronic physical problems and healthy age-matched controls; individuals older than 75 and younger than 75; groups based on setting (general practice versus hospital outpatients); migraine sufferers and controls; groups based on recent visits to their family doctor, hospital inpatient stays and longstanding illness; patients with stroke and their age and gender matched controls.

Floor/Ceiling Effects Of the 8 studies examined, 6 reported that the SF-36 had significant floor and ceiling effects, 1 reported significant ceiling effects only, and 1 reported significant floor effects only.
Does the tool detect change in patients?

Out of 3 studies examined, 1 reported that the SF-36 had a large ability to detect change; 1 reported moderate to large ability to detect change (except for the Social Functioning and Mental Health dimensions, which both had small effect sizes); and 1 reported small (Role Limitations-Emotional, Mental component summary score) to large (Bodily Pain, Physical component summary score) ability to detect change. To our knowledge, no studies have examined the ability of the SF-36 to detect change in patients with stroke.

Acceptability The SF-36 cannot be used with patients who cannot understand written or spoken language, severely affected patients who need a proxy to complete, or patients with aphasia.
Feasibility The SF-36 is simple to administer and requires no training or special equipment. It is suitable for self-administration, computerized administration, or administration by a trained interviewer in person or by telephone, to persons age 14 and older.
How to obtain the tool? All versions of the SF-36 can be viewed by visiting the website: www.qualitymetric.com

Psychometric Properties

Overview

Extensive psychometric testing has been conducted on the SF-36. However, little research has been conducted specifically in a post-stroke population. For the purposes of this review, we conducted a literature search to identify all relevant publications on the psychometric properties of the SF-36. We then selected articles for review from high impact journals and from a variety of authors. The creators of the SF-36 have performed many of the psychometric studies that exist on the survey; however, we preferentially reviewed studies carried out by other authors who were not involved in the development of the SF-36.

Floor and Ceiling Effects

Lai, Perera, Duncan, and Bode (2003) administered the Stroke Impact Scale and the SF-36 to 278 stroke subjects approximately 90 days after stroke. In comparison to the Stroke Impact Scale-16 (characterizes physical functioning), the SF-36 Physical Functioning subscale had major floor effects (floor effects of 37% and 100% were observed for patients with a modified Rankin scale grade 4 or 5, respectively). Further, in contrast to the Stroke Impact Scale-Participation (characterizes social functioning), the SF-36 Social Functioning subscale had major ceiling effects (ceiling effects up to 60% for modified Rankin scale grade 0).

Anderson et al. (1996) examined the SF-36 in a cohort of 90 long-term (1-year) stroke survivors. The validity of the SF-36 was assessed by comparing patients’ scores on the SF-36 with those obtained for the Barthel Index, the 28-item General Health Questionnaire, and the Adelaide Activities Profile. Large ceiling effects were reported for the SF-36 Role Limitations-Physical (53%), Bodily Pain (43%), Social Functioning (67%) and Role Limitations-Emotional (72%) subscales. No floor effects exceeding 7% were reported for the SF-36, and scores for the SF-36 Physical Functioning subscale were more uniformly distributed than Barthel Index scores suggesting the SF-36 has lower floor and ceiling effects than the Barthel Index.

Brazier et al. (1996) tested the psychometric properties of the SF-36 and the EuroQol on an elderly female population (n=380) aged 75 and older, and compared these scales to the Office of Population Census and Surveys Disability Survey. Patients were administered the scales at baseline and again six months later. Major floor effects (in excess of 25%) were reported for the Role Limitations-Physical and Role Limitations-Emotional subscales.

Hobart et al. (2002) examined SF-36 data from 177 people after stroke. Notable floor effects were observed for the Role Limitations-Physical (59.1%), Role Limitations-Emotional (63.1%), Social Functioning (29.9%), and Bodily Pain (25.6%) subscales. Notable ceiling effects were also observed for the Role Limitations-Emotional (63.1%), Social Functioning (29.9%) and Bodily Pain (25.6%) subscales.

O’Mahoney et al. (1998) examined the suitability of the SF-36 for assessing quality of life in older patients with stroke. Floor effects were high for the Role Limitations-Physical (54%) and Role Limitations-Emotional (35%) subscales and for the Social Functioning (17%) and Physical Functioning (18%) subscales. Ceiling effects were also substantial for the Role Limitations-Physical (16%), Role Limitations-Emotional (51%), Social Functioning (18%) and Bodily Pain (25%) subscales.

Weinberger, Oddone, Samsa and Landsman (1996) administered the SF-36 three times over a 4-week period to 172 veterans receiving care in a General Medicine Clinic. Telephone, face-to-face, and self-administration modes of administering the SF-36 were compared. For face-to-face administration of the SF-36, notable floor effects were observed for the Role Limitations-Physical (43.8%) and Role Limitations-Emotional (30.3%) subscales. Notable ceiling effects were observed for the Social Functioning (31.5%), Role Limitations-Physical (14.6%), and Role Limitations-Emotional (47.2%) subscales. For telephone administration, significant floor effects were observed for the Role Limitations-Physical (53.2%) and Role Limitations-Emotional (34.0%) subscales. Significant ceiling effects were observed for the Role Limitations-Emotional (36.2%) subscale only. Self-administration of the SF-36 resulted in significant floor effects for the Role Limitations-Physical (47.1%) and Role Limitations-Emotional (25.0%) subscales. Further, notable ceiling effects were observed for the Social Functioning (27.8%), Role Limitations-Physical (14.7%), and Role Limitations-Emotional (52.8%) subscales.

Walters, Munro and Brazier (2001) administered the SF-36 to a community-dwelling population over the age of 65. Substantial floor (30.9-61%) and ceiling effects across all age groupings (65-69, 70-74, 75-79, 80-84, and 85+) were observed for the Role Functioning-Physical (floor effects: 30.9%-60% and ceiling effects: 11.7%-38.6%) and Role Functioning-Emotional (floor effects: 25.6%-50.4% and ceiling effects: 32.2% – 53.2%) subscales. Substantial ceiling effects were also noted for the Social Functioning and Bodily Pain subscales (15%-46.7% and 14.1%-21.1%, respectively).

Andresen, Gwendell, Gravitt, Aydelotte, and Podgorski (1999) administered the SF-36 to 97 nursing home residents and reported substantial floor effects of 26.8% and 29.5% for the Physical Functioning and Role Limitations-Physical subscales, respectively. Substantial ceiling effects of 36.1%, 49.5% and 21.6% were reported for the Social Functioning, Role Limitations-Emotional, and Bodily Pain subscales, respectively.
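
The floor and ceiling percentages reported in the studies above can be illustrated with a short sketch. It assumes the conventional definition (the share of respondents scoring the lowest or highest possible subscale score), which the individual studies may have operationalized differently. The function name `floor_ceiling_effects` and the example scores are hypothetical and are not data from any cited study.

```python
def floor_ceiling_effects(scores, minimum=0.0, maximum=100.0):
    """Return (floor %, ceiling %) for a list of transformed subscale scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    # Floor effect: share of respondents at the lowest possible score.
    floor_pct = 100.0 * sum(1 for s in scores if s == minimum) / n
    # Ceiling effect: share of respondents at the highest possible score.
    ceiling_pct = 100.0 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct


# Made-up scores for illustration only.
example_scores = [0, 0, 25, 50, 50, 75, 100, 100, 100, 100]
floor_pct, ceiling_pct = floor_ceiling_effects(example_scores)
print(f"floor: {floor_pct:.1f}%, ceiling: {ceiling_pct:.1f}%")
# floor: 20.0%, ceiling: 40.0%
```

Subscales with few items and few response levels have only a handful of possible scores, which is one reason respondents can cluster at the extremes.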

Reliability

Reliability studies have demonstrated excellent internal consistency, with Cronbach’s alpha generally exceeding 0.80 for all scales except Social Functioning. Social Functioning may sometimes be lower due to the fact that there are fewer items (only 2 items) in the subscale (Ware, Snow, Kosinski & Gandek, 1993; Brazier et al., 1992; Lyons, Perry, & Littlepage, 1994; McHorney, Ware, Lu, & Sherbourne, 1994; Ruta, Garratt, Wardlaw, & Russell, 1994). Test-retest reliability evaluations have also suggested that the SF-36 scores can generally be reproduced (Brazier et al. 1992; Beaton, Hogg-Johnson, & Bombardier, 1997).

Brazier et al. (1992) found considerable evidence for the reliability of the SF-36. For the internal consistency of the SF-36, Cronbach’s alpha was found to be excellent, exceeding 0.85, and reliability coefficients exceeded 0.75 for all dimensions of the scale with the exception of the Social Functioning subscale (alpha = 0.73). To identify the test-retest reliability, Brazier et al. (1992) calculated correlation coefficients and found coefficients ranging from adequate (0.60 for Social Functioning) to excellent (0.81 for Physical Functioning).
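
For readers unfamiliar with Cronbach’s alpha, the following sketch shows the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. It is an illustration only: the function name `cronbach_alpha` and the two-item example data are made up and do not come from any of the studies reviewed here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of respondent scores per item."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("each item needs the same number (>= 2) of respondents")
    # Total score per respondent, summed across items.
    totals = [sum(respondent) for respondent in zip(*item_scores)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up responses from four respondents to a hypothetical two-item scale.
item_1 = [1, 2, 3, 4]
item_2 = [2, 1, 4, 3]
print(f"{cronbach_alpha([item_1, item_2]):.2f}")  # 0.75
```

For a given level of correlation between items, alpha tends to be lower when a scale has fewer items, which is consistent with the lower values often reported for the two-item Social Functioning subscale.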

Jenkinson, Coulter and Wright (1993) mailed the SF-36 to a large community sample to explore the questionnaire’s internal consistency and validity. Cronbach’s alphas for all subscales of the SF-36 were excellent, exceeding 0.80, with the exception of the Social Functioning subscale, which had adequate internal consistency (alpha = 0.76). In the case of the Social Functioning dimension, the results were considered acceptable due to the small number of items (2 items using a 5-point scale).

Jenkinson, Wright and Coulter (1994) mailed the SF-36 to 13,042 randomly selected subjects between the ages of 16-64 years. The internal consistency of the SF-36 was found to range from adequate to excellent (alpha ranged from 0.76 for Social Functioning to 0.90 for Physical Functioning). The internal consistency was then calculated by breaking the data down into five subgroups of overall self-rated general health (poor, fair, good, very good, excellent). All alpha values were adequate, exceeding 0.70, except for the Social Functioning subscale, which was poor (exceeded 0.50). Due to the small number of items in this domain, this result is considered acceptable.

Brazier et al. (1996) calculated the reliability of the SF-36 in 380 women over the age of 75. Spearman’s rank correlation coefficients were calculated between initial assessment and first follow-up scores for those who said their health had not changed over the 6-month retest period. Coefficients ranged from poor (r = 0.28 for Social Functioning) to adequate (0.70 for Vitality). These results suggest that the SF-36 has only adequate test-retest reliability in the elderly. Brazier et al. (1996) also examined the internal consistency of the SF-36 and reported excellent internal consistency (alpha ≥ 0.80) for all subscales except Social Functioning (0.56) and General Health (0.66), which had poor internal consistency.

Andresen et al. (1999) administered the SF-36 to 97 nursing home residents and then re-administered the SF-36 after 1 week. Test-retest intraclass correlation coefficients (ICC) ranged from adequate to excellent (from 0.55 to 0.82). Further, the ICCs for both the physical summary and mental summary scores were excellent (ICC = 0.82 and 0.79 respectively).

Essink-Bot, Krabbe, Bonsel, and Aaronson (1997) administered the SF-36, The Nottingham Health Profile, the COOP/WONCA charts (The Dartmouth Primary Care Cooperative Information Project/World Organization of National Colleges, Academies, and Academic Associations of General Practices/Family Physicians), and the EuroQol to migraine sufferers. The scales of the SF-36 yielded internal consistency estimates ranging from adequate (alpha = 0.76 for General Health) to excellent (0.91 for Physical Functioning). The mean alpha coefficient was considered excellent (alpha = 0.84). The internal consistency of the SF-36 subscales exceeded that of the Nottingham Health Profile scales.

Walters, Munro and Brazier (2001) reported excellent internal consistency (Cronbach’s alpha ≥ 0.80) for all subscales of the SF-36 except for the Social Functioning subscale (alpha = 0.79) when the survey was administered by mail to a sample of 9,897 subjects aged 65-104 years.

McHorney, Ware, Lu and Sherbourne (1994) evaluated data from 3,445 patients from the Medical Outcomes Study (MOS) and replicated data across 24 subgroups differing in socio-demographic characteristics, diagnosis, and disease severity. Across patient groups, all scales passed tests for item-internal consistency (97% passed). Reliability coefficients ranged from a low of 0.65 to a high of 0.94 across scales (median = 0.85) and varied somewhat across patient subgroups.

Weinberger et al. (1996) tested whether the SF-36 is influenced by method of administration (face-to-face interview, self-administration and telephone interview) in 172 veterans receiving care at a General Medical Clinic. All patients were asked to complete the SF-36 three times over a 4-week period. Cronbach’s alpha coefficients indicated that items in all eight SF-36 domains were highly internally consistent, regardless of the mode of administration; however, scores showed large variation over short intervals. Specifically, of 24 computed Cronbach’s alphas (i.e., eight scales times three modes of administration), only one was below 0.70 (Social Function via telephone administration), whereas 17 exceeded 0.80. Cronbach’s alphas did not differ significantly by method of administration. Test-retest correlations ranged from r = 0.55 (Physical Role Function by telephone administration) to r = 0.94 (Physical Function by self-administration).

Hagen, Bugge, and Alexander (2003) examined the reliability of the SF-36 in patients in the early post-stroke period. The SF-36 was administered at 1, 3 and 6 months after stroke onset. The internal consistency of the eight subscales at all three time-points was good except for 1-month Vitality (alpha = 0.68) and 3-month General Health (alpha = 0.67), which were considered poor.

Dorman et al. (1998) assessed the test-retest reliability and the internal consistency of the SF-36 in 2,253 patients with stroke. ICCs ranged from poor (0.28 for Mental Health) to excellent (0.80 for Social Functioning). Internal consistency of the SF-36 was excellent (ranging from 0.81 for Social Functioning to 0.96 for Emotional Role Functioning). Dorman et al. concluded that although the SF-36 can function effectively as a discriminatory measure for assessing health-related quality-of-life outcomes in groups of patients after stroke, the level of test-retest reliability reported in stroke populations indicates that the SF-36 may not be adequate for serial assessments of individual patients, unless large differences over time are expected. Thus, the SF-36 should be used for large group comparisons only.

Furthermore, test-retest reliability was negatively affected by the use of proxy respondents in this study. While the use of a proxy may be the only means by which to include data from more severely affected stroke survivors, the subjective nature of the SF-36 may make proxy use difficult or even inadvisable.

Hobart, Williams, Moran and Thompson (2002) argue that the SF-36 has limited reliability, as the General Health Perceptions and Social Functioning scales generate low reliability scores and have limited convergent and discriminant validity. However, de Haan (2002) argues that Hobart et al.’s conclusions can be challenged. The reliability of only one scale (General Health Perceptions) was marginally lower (Cronbach’s alpha = 0.68) than the authors’ predefined criterion of alpha = 0.70. Although it is often recommended that coefficient values should be above 0.80, de Haan points out that coefficients above 0.70 are generally regarded as acceptable for scales when assessing outcome on a group level.

Anderson, Laubscher and Burns (1996) administered the Australian version of the SF-36 to 90 individuals at one year post-stroke. The authors concluded that the SF-36 has satisfactory internal consistency; however, alphas ranged from 0.60 for the Vitality scale (indicating poor internal consistency) to 0.90 for Physical Functioning, Bodily Pain and Role Limitations-Emotional (excellent internal consistency). The Cronbach’s alphas of four subscales of the SF-36 fell below 0.80 (General Health, Vitality, Social Functioning and Mental Health).

Validity

Criterion:

Predictive:
McHorney (1996) examined data from the Medical Outcomes Study. The General Health Perceptions subscale was found to be most predictive of death, followed by Physical Functioning (the death rate of patients in the lowest quartile for the SF-36 General Health scale was three times greater than for patients with scores in the highest quartile). Baseline Physical Functioning, Role Limitations-Physical, and Pain subscales were most predictive of hospitalizations. Moreover, Pain, General Health and Vitality subscales were most predictive of physician visits.

Beusterien, Steinwald, and Ware (1996) found that the SF-36 Mental Health subscale and mental component summary measure were strongly associated with severity of depression in cross-sectional analyses. These results suggest that the SF-36 is useful for estimating the burden of depression among depressed elderly persons.

Rumsfeld et al. (1999) tested whether the physical and mental component summary scores from the preoperative SF-36 health status survey predicted mortality in 3,956 patients following coronary artery bypass graft surgery (CABG). The physical component summary of the preoperative SF-36 was found to be a statistically significant risk factor for 6-month mortality following CABG surgery. In multivariate analysis, a 10-point lower SF-36 physical component summary score had an odds ratio (OR) of 1.39 for predicting mortality. The SF-36 mental component summary score was not associated with 6-month mortality in multivariate analyses (OR = 1.09). Thus, preoperative patient self-report of the physical component of the SF-36 health status may be helpful for risk stratification and clinical decision making for patients undergoing CABG surgery.

Construct:

Walters et al. (2001) reported significant relationships in expected directions to support construct validity among older adults. Scores in all scales were reported to decrease as age increased. Women reported worse health than men on all scales even after adjusting for age. Respondents who had recently visited their physician reported poorer health on all scales and people living alone had lower scores except on general health.

Ware, Kosinski, and Keller (1994) examined the construct validity of the 8 subscales of the SF-36. Physical Functioning was shown to be the best all-around measure of physical health (r = 0.85), and Mental Health was the most valid measure of mental health (r = 0.87). Interestingly, Mental Health was one of the poorest measures of the physical component (r = 0.17) and Physical Functioning was the poorest measure of the mental component (r = 0.12). The Vitality (r = 0.47 for the physical health component and r = 0.65 for the mental health component) and General Health (r = 0.69 for the physical health component and r = 0.37 for the mental health component) subscales had excellent or adequate validity for both components.

Construct (in patients with stroke):

Wilkinson et al. (1997) interviewed 106 people less than 75 years old and their caregivers following a first-ever stroke. Rank correlation coefficients of the Barthel Index with the SF-36 subscales in first-ever stroke patients ranged from poor (r = 0.22 for Role Limitation-Emotional subscale) to excellent (0.81 for Physical Functioning subscale).

Convergent/Discriminant:
Convergent validity of the SF-36 is generally strongly supported in comparison to similar domains of condition-specific measures (Fielder, Denholm, Lyons, & Fielder, 1996; Nortvedt, Riise, Myhr, & Nyland, 1999; The Counselling Versus Antidepressants in Primary Care Study Group, 1999; Benninger, Ahuja, Gardner, & Grywalski, 1998; Buchwald et al., 1996; Anderson, Laubscher, & Burns, 1996) and other generic HRQOL measures (Andresen et al., 1999; Andresen, Rothenberg, & Kaplan, 1998; Rothwell, McDowell, Wong, & Dorman, 1997). Discriminant validity is usually rated highly for the SF-36 (e.g. Andresen et al., 1999; The Canadian Burden of Illness Study Group, 1998; Buchwald, Pearlman, Umali, Schmaling, & Katon, 1996; Komaroff et al., 1996; O’Neill & Kelly, 1996), although some studies disagree (e.g. Colantonio, Dawson, & McLellan, 1998; Lalonde, Clarke, Joseph, Mackenzie, & Grover, 1999; Myers & Wilks, 1999).

Andresen et al (1999) administered the SF-36, the Geriatric Depression Scale and the Mini-Mental State Examination to 97 nursing home residents. Activities of daily living and medication intake data were recorded. Convergent validity between the SF-36 Physical Health subscale and the Activities of Daily Living Index was adequate (r ranged from -0.37 to -0.43). These correlations are negative because a high score on the SF-36 indicates positive health status, whereas a high score on the Activities of Daily Living index indicates dependence. Physical health scores from the SF-36 correlated more strongly with Geriatric Depression Scale scores than Activities of Daily Living Index scores (-0.63 vs. 0.01). However, the Role Limitations-Physical subscale correlated more strongly with Geriatric Depression Scale scores than Activities of Daily Living scores. Social Functioning, Role Limitations-Emotional, Vitality and Mental Health subscales all correlated more strongly with Geriatric Depression Scale scores than Activities of Daily Living scores.

Brazier et al. (1992) reported correlations of -0.41 (Social Functioning vs. social isolation) to -0.68 (Vitality vs. energy) between similar scales on the SF-36 and Nottingham Health Profile. Correlations between dimensions less clearly related ranged from -0.18 (Physical Functioning vs. emotional reactions) to -0.53 (Social Functioning vs. emotional reactions). These correlations are negative because a high score on the SF-36 indicates positive health status, whereas a high score on the Nottingham Health Profile indicates poorer perceived health status.

Dorman et al. (1999) reported that the SF-36 Physical Functioning subscale correlated most closely with the mobility, self-care and activities domains of EuroQol (r = 0.57, 0.65 and 0.63, respectively) and less strongly with the EuroQol psychological domain (r = 0.34). The SF-36 Bodily Pain subscale correlated with the EuroQol pain domain (r = 0.66) and adequately correlated with all EuroQol domains. Role Functioning-Emotional correlated most closely with the EuroQol psychological domain (r = 0.43), and correlated least with the EuroQol self-care domain (r = 0.24). The SF-36 Mental Health subscale was not closely related to the psychological domain (r = 0.21) or to the physical EuroQol domains (r = 0.06 to 0.10). The SF-36 General Health subscale correlated adequately with EuroQol overall HRQOL rating (r = 0.66).

Known Groups:
Patients diagnosed with ≥ 1 chronic physical problem had lower scores on all dimensions of the SF-36 except Mental Health, in comparison to healthy age-matched controls. The SF-36 scores were distributed as expected for sex, age, social class and use of health services (Brazier et al., 1992).

The SF-36 was found to discriminate between age groups (<75 years versus 75+) on Physical Functioning, Vitality and Change in Health subscales and between groups based on setting (general practice versus hospital outpatients) on the Physical Functioning and Role Functioning-Physical subscales (Hayes et al., 1995).

Essink-Bot et al. (1997) reported that the SF-36 was able to discriminate between migraine sufferers and controls on all subscales (ROC/AUC = 0.54 – 0.67) although this relationship was poor. The SF-36 was also able to discriminate between groups of migraine sufferers based on absence from work (0 vs. ≥ 0.5 days, ROC/AUC ranged from poor, 0.61 to adequate, 0.79).
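The area under the ROC curve (AUC) reported above has a simple interpretation: it is the probability that a randomly chosen member of one group scores higher than a randomly chosen member of the other, with 0.5 meaning no discrimination. A minimal sketch under that definition follows. The function name and the scores are hypothetical and are not data from Essink-Bot et al.

```python
def auc(higher_group, lower_group):
    """Probability that a random score from higher_group exceeds a random
    score from lower_group, counting ties as half (pairwise AUC)."""
    if not higher_group or not lower_group:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for a in higher_group:
        for b in lower_group:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(higher_group) * len(lower_group))


# Hypothetical subscale scores (0-100); controls are expected to score higher.
controls = [85, 90, 70, 95, 80]
cases = [75, 60, 90, 65, 80]
print(f"AUC = {auc(controls, cases):.2f}")
```

On this reading, the reported range of 0.54 to 0.67 means that a control outscored a migraine sufferer only somewhat more often than chance, which is consistent with the text describing the discrimination as poor.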

Brazier et al. (1996) reported that SF-36 scores distinguished groups based on recent visits to their family doctor, hospital inpatient stays and longstanding illness.

Known Groups (in patients with stroke):
Anderson et al. (1996) administered the Australian version of the SF-36 to 90 stroke survivors (1-year post-stroke). Validity was assessed by comparing patients’ scores on the SF-36 with those obtained for the Barthel Index, the 28-item General Health Questionnaire, and the Adelaide Activities Profile, an instrument developed from the Frenchay Activities Index. Construct validity was demonstrated by significant differences across all eight SF-36 scales for patients with identified health problems. For patients dependent in activities of daily living, the difference in mean scores was greatest for the physical functioning and general health scales, whereas for patients with emotional health problems, the strongest associations were with the Social Functioning, Role Limitations-Emotional, and Mental Health subscales.

Mayo et al. (2002) interviewed persons with first-ever stroke and a population-based sample of community-dwelling individuals without stroke by telephone at 6-month intervals for 2 years of follow-up. SF-36 scores successfully discriminated those with stroke from their age and gender-matched controls.

Cross-diagnostic:

Dallmeijer et al. (2007) examined the unidimensionality and differential item functioning of the Physical Functioning subscale of the SF-36 using Rasch analysis in patients with stroke, multiple sclerosis, and amyotrophic lateral sclerosis (ALS). All items of the Physical Functioning subscale, except one for the ALS group (bathing/dressing item), formed a unidimensional scale, supporting the use of a sum score as a measure of Physical Functioning within these diagnostic groups. The pooled analysis showed inadequate fit to the Rasch model for the ‘walking several hundred meters’ item. Of the other 9 items, 5 showed differential item functioning for stroke vs. multiple sclerosis and ALS, while no differential item functioning was found between multiple sclerosis and ALS. Thus, when comparing the data of patients with stroke with that of patients with multiple sclerosis and/or ALS, adjustments are necessary for differential item functioning.

Responsiveness

Harwood and Ebrahim (2000) examined the sensitivity to change of the SF-36 in 81 patients before and after hip replacement. Eighty-nine percent of patients reported improvements three months after surgery. The largest changes were seen on the SF-36 Pain scale (large effect sizes of 1.2 at three months and 1.5 at 6-12 months), Physical Function (large effect sizes of 1.1 at 3 months and 1.3 at 6-12 months) and Role Limitation-Physical (large effect sizes of 0.8 at 3 months and 1.2 at 6-12 months) scales, suggesting that some of the SF-36 dimensions are very sensitive to change.

Brazier, Walters, Nicholl and Kohler (1996) tested the sensitivity of the SF-36, EuroQol and the Office of Population Census and Surveys Disability Survey in an elderly female population. These measures were administered by interview in a hospital clinic at baseline. A random subsample of respondents was retested six months later. Sensitivity of the instruments was quantified by estimating effect sizes for hypothesized changes in health status. There was some evidence of greater sensitivity to lower levels of morbidity in the SF-36. A hypothesized change from having a long-standing illness to having no long-standing illness was associated with moderate to large effect sizes across dimensions of the three instruments, except the Social Functioning (ES = 0.41) and Mental Health (ES = 0.31) dimensions of the SF-36, which both had small effect sizes. The effect sizes for differences in instrument scores between the age groups were small (in the range 0.00-0.50), with the highest for Physical Functioning. The SF-36 was rated as more sensitive to change than the EuroQol for older adult women.

In a study by Mossberg and McFarland (2001), 6 outpatient rehabilitation clinics incorporated the SF-36 into everyday practice. Ninety patients completed the SF-36 health status questionnaire before initiating treatment and again at discharge. Only nonsurgical patients without comorbidities were enrolled. Effect sizes for the SF-36 (admission to outpatient rehabilitation to discharge) ranged from small (0.48 for Role Limitations-Emotional) to large (1.38 for Bodily Pain). The physical component summary score effect size was large (ES = 0.80) and the mental component summary score effect size was small (ES = 0.45).
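For readers unfamiliar with effect sizes, the sketch below uses one common definition: mean change divided by the standard deviation of baseline scores. The studies cited here may have used a different variant, so this is an illustration rather than their method. The function names and data are hypothetical, and the cut-offs (below 0.5 small, 0.5 to below 0.8 moderate, 0.8 and above large) are an assumption inferred from how values are labelled in this review.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores (one common definition)."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def label_effect_size(es):
    """Label an effect size using cut-offs assumed from this review's wording."""
    magnitude = abs(es)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    return "small"


# Hypothetical Bodily Pain scores at admission and discharge (0-100 scale).
admission = [40, 50, 60, 45, 55]
discharge = [55, 60, 70, 50, 65]
es = effect_size(admission, discharge)
print(f"ES = {es:.2f} ({label_effect_size(es)})")
```

Because the denominator is the spread of baseline scores, the same mean improvement produces a larger effect size in a homogeneous sample than in a heterogeneous one.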

The SF-36 is increasingly being used in stroke studies (Anderson, Laubscher & Burns, 1996; Duncan et al. 1997) and in stroke clinical trials. However, the psychometric properties of the SF-36 soon after stroke are not well known, as most of the current data are from patients one year or more after the stroke (e.g. Anderson et al., 1996; Duncan et al., 1997). We did not identify any studies on the responsiveness of the SF-36 in patients with stroke.

Muller-Nordhorn et al. (2004) examined the responsiveness to change of the SF-12 in patients with stroke or transitory ischemic attack. Patients (n=558) were administered the SF-12 at baseline (referring to status prior to the event) and after 12 months. In patients with stroke, standardized response means (SRMs) were small for the physical component summary scale of the SF-12 (SRM 0.49) and moderate for the mental component summary scale of the SF-12 (SRM 0.52). In patients with transitory ischemic attack, SRMs were below 0.2 for the physical component summary scale of the SF-12 and small for the mental component summary scale of the SF-12 (SRM 0.34). SRMs increased with stroke severity as indicated by the National Institutes of Health Stroke Scale score. Thus, the SF-12 summary scales show a small to moderate responsiveness to change in patients after stroke. Responsiveness to change was higher in patients with greater stroke severity.
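The standardized response mean (SRM) is conventionally defined as the mean change score divided by the standard deviation of the change scores. The sketch below assumes that definition. The function name and the scores are hypothetical and are not data from Muller-Nordhorn et al.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of paired change scores divided by the SD of those change scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical physical component summary scores for four patients.
before = [40, 50, 60, 50]
after = [45, 60, 65, 50]
print(f"SRM = {standardized_response_mean(before, after):.2f}")
```

Unlike a baseline-standardized effect size, the SRM depends on how consistently patients change: a uniform shift across patients gives a large SRM even when the mean change is modest.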

The observation that patients with stroke had scores similar to patients with transient ischemic attacks raises questions about the ability of the SF-36 to discriminate and to be responsive to clinical changes in patients with stroke (Duncan et al., 1997). Currently, no evaluative stroke-specific HRQOL instrument is available, and it remains to be seen whether generic HRQOL instruments such as the SF-36 are sufficiently responsive to be useful in clinical trials. More information regarding the responsiveness of the SF-36 will be known when a number of ongoing stroke trials are completed (Williams, 1998).

References

  • Aaronson, N. K., Muller, M., Cohen, P. D. A., Essink-Bot, M. L., Fekkes, M., Sanderman, R., Sprangers, M. A., Velder, A., Verrips, E. (1998). Translation, validation and norming of the Dutch language version of the SF-36 health survey in community and chronic disease populations. J Clin Epidemiol, 51, 1055-1068.
  • Anderson, C., Laubscher, S., Burns, R. (1996). Validation of the Short Form 36 (SF-36) Health Survey Questionnaire among stroke patients. Stroke, 27(10), 1812-1816.
  • Andresen, E. M., Meyers, A. R. (2000). Health-related quality of life outcomes measures. Arch Phys Med Rehabil, 81(12), S30-45.
  • Andresen, E. M., Gwendell, W., Gravitt, G. W., Aydelotte, M. E., Podgorski, C. A. (1999). Limitations of the SF-36 in a sample of nursing home residents. Age and Ageing, 28, 562-566.
  • Andresen, E. M., Fouts, B. S., Romeis, J. C., Brownson, C. A. (1999). Performance of health-related quality-of-life instruments in a spinal cord injured population. Arch Phys Med Rehabil, 80, 877-884.
  • Andresen, E. M., Rothenberg, B. M., Kaplan, R. M. (1998). Performance of a self-administered mailed version of the Quality of Well-Being (QWB-SA) questionnaire among older adults. Med Care, 36, 1349-1360.
  • Beaton, D. E., Hogg-Johnson, S., Bombardier, C. (1997). Evaluating changes in health status: Reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol, 50(1), 79-93.
  • Benninger, M. S., Ahuja, A. S., Gardner, G., Grywalski, C. (1998). Assessing outcomes for dysphonic patients. J Voice, 12, 540-550.
  • Beusterien, K. M., Steinwald, B., Ware, J. E. (1996). Usefulness of the SF-36 Health Survey in measuring health outcomes in the depressed elderly. J Geriatr Psychiatry Neurol, 9(1), 13-21.
  • Beck, A. T., Rial, W. Y., Rickels, K. (1974). Short form of Depression Inventory: Cross-validation. Psychological Reports, 34(3), 1184-1186.
  • Brazier, J., Roberts, J., Tsuchiya, A., Busschbach, J. (2004). A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ, 13, 873-884.
  • Brazier, J., Usherwood, T., Harper, R., Thomas, K. (1998). Deriving a preference-based single index from the UK SF-36 Health Survey. J Clin Epidemiol, 51, 1115-1128.
  • Brazier, J.E., Walters, S.J., Nicholl, J.P. & Kohler, B. (1996). Using the SF-36 and EuroQol on an Elderly Population. Quality of Life Research, 5, 195-204.
  • Brazier, J., Roberts, J., Deverill, M. (2002). The estimation of a preference-based measure of health from the SF-36. J Health Econ, 21, 271-292.
  • Brazier, J. E., Harper, R., Jones, N. M. B. et al. (1992). Validating the SF-36 health survey questionnaire: new outcome measure for primary care. BMJ, 305, 160-164.
  • Buchwald, D., Pearlman, T., Umali, J., Schmaling, K., Katon, W. (1996). Functional status in patients with chronic fatigue syndrome, other fatiguing illnesses, and healthy individuals. Am J Med, 101, 364-370.
  • Ciconelli, R. M. (1997). Translation and validation to the Portuguese of the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) [doctoral thesis]. Federal University of São Paulo, São Paulo, Brazil.
  • Colantonio, A., Dawson, D. R., McLellan, B. A. (1998). Head injury in young adults: long-term outcome. Arch Phys Med Rehabil, 79, 550-558.
  • Dallmeijer, A. J., de Groot, V., Roorda, L. D., Schepers, V. P. M., Lindeman, E., van den Berg, L. H., Beelen, A., Dekker, J. (2007). Cross-diagnostic validity of the SF-36 physical functioning scale in patients with stroke, multiple sclerosis and amyotrophic lateral sclerosis: A study using Rasch analysis. J Rehabil Med, 39, 163-169.
  • de Haan, R. J. (2002). Measuring quality of life after stroke using the SF-36. Stroke, 33, 1176-1177.
  • Dorman, P., Slattery, J., Farrell, B., Dennis, M., Sandercock, P. (1998). Qualitative comparison of the reliability of health status assessments with the EuroQol and SF-36 Questionnaires After Stroke. Stroke, 29, 63-68.
  • Dorman, P. J., Dennis, M., Sandercock, P. (1999). How do scores on the EuroQol relate to scores on the SF-36 after stroke? Stroke, 30(10), 2146-2151.
  • Duncan, P. W., Samsa, G. P., Weinberger, M., Goldstein, L. B., Bonito, A., Witter, D. M., Enarson, C., Matchar, D. (1997). Health status of individuals with mild stroke. Stroke, 28, 740-745.
  • Essink-Bot, M. A., Krabbe, P. F., Bonsel, G. J., Aaronson, N. K. (1997). An empirical comparison of four generic health status measures: The Nottingham Health Profile, the Medical Outcomes Study 36-Item Short-Form Health Survey, the COOP/WONCA Charts, and the EuroQol Instrument. Med Care, 35(5), 522-537.
  • Fielder, H., Denholm, S. W., Lyons, R. A., Fielder, C. P. (1996). Measurement of health status in patients with vertigo. Clin Otolaryngol, 21, 124-126.
  • Fukuhara, S., Ware, J. E., Kosinski, M., Wada, S., Gandek, B. (1998). Psychometric and Clinical Tests of Validity of the Japanese SF-36 Health Survey. J Clin Epidemiol, 51, 1045-1053.
  • Hagen, S., Bugge, C., Alexander, H. (2003). Psychometric properties of the SF-36 in the early post-stroke phase. Journal of Advanced Nursing, 44(5), 461-468.
  • Harwood, R. H., Ebrahim, S. (2000). A comparison of the responsiveness of the Nottingham extended activities of daily living scale, London handicap scale, and SF-36. Disability & Rehabilitation, 22(17), 786-793.
  • Hayes, V., Morris, J., Wolfe, C., Morgan, M. (1995). The SF-36 Health Survey Questionnaire: Is it suitable for use with older adults? Age and Ageing, 24, 120-125.
  • Hilari, K., Byng, S., Lamping, D. L., Smith, S. C. (2003). Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39): Evaluation of acceptability, reliability, and validity. Stroke, 34, 1944-1950.
  • Hobart, J. C., Williams, L. S., Moran, K., Thompson, A. J. (2002). Quality of life measurement after stroke: Uses and abuses of the SF-36. Stroke, 33, 1348-1356.
  • Jenkinson, C., Coulter, A., Wright, L. (1993). Short form 36 (SF36) health survey questionnaire: Normative data for adults of working age. BMJ, 306(6890), 1437-1440.
  • Jenkinson, C., Wright, L., Coulter, A. (1994). Criterion validity and reliability of the SF-36 in a population sample. Quality of Life Research, 3(1), 7-12.
  • Jenkinson, C., Stewart-Brown, S., Petersen, S., Paice, C. (1999). Assessment of the SF-36 version 2 in the United Kingdom. J Epidemiol Community Health, 53(1), 46-50.
  • Komaroff, A. L., Fagioli, L. R., Doolittle, T. H., Gandek, B., Gleit, M. A., Gueriero, R. T., et al. (1996). Health status in patients with chronic fatigue syndrome and in general population and disease comparison groups. Am J Med, 101, 281-290.
  • Lai, S-M., Perera, S., Duncan, P. W., Bode, R. (2003). Physical and social functioning after stroke: Comparison of the Stroke Impact Scale and Short Form-36. Stroke, 34, 488-493.
  • Lalonde, L., Clarke, A. E., Joseph, L., Mackenzie, T., Grover, S. A. (1999). Comparing the psychometric properties of preference-based and nonpreference-based health-related quality of life in coronary heart disease. Qual Life Res, 8, 399-409.
  • Lyons, R. A., Perry, H. M., Littlepage, B. N. C. (1994). Evidence for the validity of the Short-Form 36 Questionnaire (SF-36) in an elderly population. Age and Ageing, 23, 182-184.
  • Mathias, S. D., Bates, M. M., Pasta, D. J., Cisternas, M. G., Feeny, D., Patrick, D. L. (1997). Use of the Health Utilities Index with stroke patients and their caregivers. Stroke, 28, 1888-1894.
  • Mayo, N. E., Wood-Dauphinee, S., Cote, R., Durcan, L., Carlton, J. (2002). Activity, Participation, and Quality of Life 6 Months Poststroke. Arch Phys Med Rehabil, 83, 1035-1042.
  • McDowell, I., Newell, C. (1996). Measuring Health. A Guide to Rating Scales and Questionnaires. 2nd ed. New York: Oxford University Press.
  • McHorney, C. A. (1996). Measuring and monitoring general health status in elderly persons: Practical and methodological issues in using the SF-36 health survey. The Gerontologist, 36(5), 571-583.
  • McHorney, C. A., Ware, J. E. Jr., Raczek, A. E. (1993). The MOS 36-Item Short-Form Health Survey (SF-36): II Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care, 31, 247-263.
  • McHorney, C. A., Ware, J. E. Jr., Lu, J. F., Sherbourne, C. D. (1994). The MOS 36-item Short-Form Health Survey (SF-36): III Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care, 32, 40-66.
  • Mossberg, K., McFarland, C. (2001). A patient-oriented health status measure in outpatient rehabilitation. Am J Phys Med Rehabil, 80(12), 896-902.
  • Muller-Nordhorn, J., Nolte, C. H., Rossnagel, K., Jungehulsing, G. J., Reich, A., Roll, S., Villringer, A., Willich, S. N. (2004). Responsiveness to change of the SF-12 in patients with cerebrovascular disease. Biometrical Journal, 46(S1), 50.
  • Myers, C., Wilks, D. (1999). Comparison of Euroqol EQ-5D and SF-36 in patients with chronic fatigue syndrome. Qual Life Res, 8, 9-16.
  • Nemeth, G. (2006). Health related quality of life outcome instruments. European Spine Journal, 15(1), S44-S51.
  • Nortvedt, M. W., Riise, T., Myhr, K. M., Nyland, H. I. (1999). Quality of life in multiple sclerosis: measuring the disease effects more broadly. Neurology, 53, 1098-1103.
  • O’Mahony, P. G., Rodgers, H., Thomson, R. G., Dobson, R., James, O. F. W. (1998). Is the SF-36 suitable for assessing health status of older stroke patients? Age and Ageing, 27, 19-22.
  • O’Neill, P., Kelly, P. (1996). Postal questionnaire study of disability in the community associated with psoriasis. Br Med J, 313, 919-921.
  • Petrou, S., Hockley, C. (2005). An investigation into the empirical validity of the EQ-5D and SF-6D based on hypothetical preferences in a general population. Health Econ, 14, 1169-1189.
  • Ren, X. S., Amick, B., Zhou, L., et al. (1998). Translation and Psychometric Evaluation of a Chinese Version of the SF-36 Health Survey in the U.S. J Clin Epidemiol, 51(11), 1129.
  • Rothwell, P. M., McDowell, Z., Wong, C. K., Dorman, P. J. (1997). Doctors and patients don’t agree: cross sectional study of patients’ and doctors’ perceptions and assessments of disability in multiple sclerosis. British Med J, 314, 1580-1583.
  • Rumsfeld, J. S., MaWhinney, S., McCarthy, M., Shroyer, A. L., VillaNueva, C. B., O’Brien, M., Moritz, T. E., Henderson, W. G., Grover, F. L., Sethi, G. K., Hammermeister, K. E. (1999). Health-related quality of life as a predictor of mortality following coronary artery bypass graft surgery. Participants of the Department of Veterans Affairs Cooperative Study Group on Processes, Structures, and Outcomes of Care in Cardiac Surgery. JAMA, 281(14), 1298-1303.
  • Ruta, D. A., Garratt, A. M., Wardlaw, D., Russell, I. T. (1994). Developing a valid and reliable measure of health outcome for patients with low back pain. Spine, 19, 1887-1896.
  • Segal, M. E., Schall, R. R. (1994). Determining functional/health status and its relation to disability in stroke survivors. Stroke, 25, 2391-2397.
  • The Canadian Burden of Illness Study Group. (1998). Burden of illness of multiple sclerosis: part II: quality of life. Can J Neurol Sci, 25, 31-38.
  • The Counselling Versus Antidepressants in Primary Care Study Group. (1999). How disabling is depression? Evidence from a primary care sample. Br J Gen Pract, 49(439), 95-98.
  • Walters, S. J., Munro, J. F., Brazier, J. E. (2001). Using the SF-36 with older adults: A cross-sectional community-based survey. Age and Ageing, 30, 337-343.
  • Ware, J. E., Kosinski, M., Dewey, J. E., Gandek, B. (2001). How to Score and Interpret Single-Item Health Status Measures: A Manual for Users of the SF-8 Health Survey. Lincoln RI: QualityMetric Incorporated.
  • Ware, J. E., Kosinski, M., Keller, S. D. (1994). SF-36 Physical and Mental Health Summary Scales: A User’s Manual. Boston, MA: The Health Institute.
  • Ware, J. E. Jr., Sherbourne, C. D. (1992). The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care, 30, 473-483.
  • Ware, J. Jr., Kosinski, M., Keller, S. D. (1996). A 12-item short-form health survey: Construction of scales and preliminary tests of reliability and validity. Med Care, 34(3), 220-233.
  • Ware, J. E., Snow, K. K., Kosinski, M., Gandek, B. (1993). SF-36® Health Survey Manual and Interpretation Guide. Boston, MA: New England Medical Center, The Health Institute.
  • Ware, J. E., Kosinski, M., Turner-Bowker, D. M., Gandek, B. (2002). SF-12v2: How to score version 2 of the SF-12 Health Survey. Lincoln RI: QualityMetric Incorporated.
  • Weinberger, M., Oddone, E. Z., Samsa, G. P., Landsman, P. B. (1996). Are health-related quality-of-life measures affected by the mode of administration? J Clin Epidemiol, 49(2), 135-140.
  • Wilkinson, P. R., Wolfe, C. D., Warburton, F. G., Rudd, A. G., Howard, R. S., Ross-Russell, R. W., Beech, R. (1997). Longer term quality of life and outcome in stroke patients: Is the Barthel Index alone an adequate measure of outcome? Quality in Health Care, 6, 125-130.
  • Williams, L. S. (1998). Health-Related Quality of Life Outcomes in Stroke. Neuroepidemiology, 17, 116-120.

See the measure

How to obtain the SF-36

Permission to use the SF-36 should be obtained from the Medical Outcomes Trust, which oversees the standardized administration of the SF-36 and will provide updates on administration and scoring (McDowell & Newell, 1996). Various computer applications are available to assist in scoring the SF-36, including free Excel templates that can be downloaded from the Internet.

All versions of the SF-36 can be viewed by visiting the website www.qualitymetric.com

Samples of the various versions of the SF-36 are also available on this website.

Stroke Specific Quality of Life Scale (SS-QOL)

Evidence Reviewed as of before: 19-08-2008
Author(s)*: Lisa Zeltzer, MSc OT
Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Stroke Specific Quality Of Life scale (SS-QOL) is a patient-centered outcome measure intended to provide an assessment of health-related quality of life (HRQOL) specific to patients with stroke.

In-Depth Review

Purpose of the measure

The Stroke Specific Quality Of Life scale (SS-QOL) is a patient-centered outcome measure intended to provide an assessment of health-related quality of life specific to patients with stroke.

Available versions

The SS-QOL was published and validated in 1999 by Williams, Weinberger, Harris, and Clark.

Features of the measure

Items:
Scale domains and items were derived from a series of interviews with post-stroke patients (Williams et al. 1999a).

Patients must respond to each question of the SS-QOL with reference to the past week. It is a self-report scale containing 49 items in 12 domains:

  • Mobility (6 items)
  • Energy (3 items)
  • Upper extremity function (5 items)
  • Work/productivity (3 items)
  • Mood (5 items)
  • Self-care (5 items)
  • Social roles (5 items)
  • Family roles (3 items)
  • Vision (3 items)
  • Language (5 items)
  • Thinking (3 items)
  • Personality (3 items)

Subscales:
Mobility, Energy, Upper extremity function, Work/productivity, Mood, Self-care, Social roles, Family roles, Vision, Language, Thinking, and Personality.

Equipment:
Only a pencil and the test are needed.

Training:
No training is required, as the SS-QOL is intended to be self-administered. One study suggests that the scale can be administered to patients with stroke reliably over the telephone (Williams, Redmon, Saul & Weinberger, 2000).

Time:
It takes approximately 10-15 minutes to complete the SS-QOL scale.

Scoring:
Items are rated on a 5-point Likert scale. There are 3 different response sets (listed below). Patients must respond to each item using the corresponding response set as indicated on the scale (Williams et al. 1999a). For example, the item “did you have any trouble doing daily work around the house?” requires response set 2, which ranges from “couldn’t do it at all” to “no trouble at all”.

Response Sets:

  • Response set 1: (1) Total help; (2) A lot of help; (3) Some help; (4) A little help; (5) No help needed
  • Response set 2: (1) Couldn’t do it at all; (2) A lot of trouble; (3) Some trouble; (4) A little trouble; (5) No trouble at all
  • Response set 3: (1) Strongly agree; (2) Moderately agree; (3) Neither agree nor disagree; (4) Moderately disagree; (5) Strongly disagree

Higher scores indicate better functioning. The SS-QOL yields both domain scores and an overall SS-QOL summary score. The domain scores are unweighted averages of the associated items while the summary score is an unweighted average of all twelve domain scores (Williams et al. 1999b).
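The scoring rule described above can be sketched as follows. This is a minimal illustration, not an official scoring program: the function and variable names are hypothetical, the item counts are taken from the domain list in this review, and rejecting incomplete questionnaires is an assumption, since the handling of missing items is not described here.

```python
from statistics import mean

# Number of items per domain, as listed in this review (49 items in total).
DOMAIN_ITEM_COUNTS = {
    "Mobility": 6, "Energy": 3, "Upper extremity function": 5,
    "Work/productivity": 3, "Mood": 5, "Self-care": 5, "Social roles": 5,
    "Family roles": 3, "Vision": 3, "Language": 5, "Thinking": 3,
    "Personality": 3,
}


def score_ss_qol(responses):
    """Return (domain_scores, summary_score) from item ratings of 1-5.

    responses maps each domain name to the list of ratings for its items.
    """
    domain_scores = {}
    for domain, n_items in DOMAIN_ITEM_COUNTS.items():
        ratings = responses[domain]
        if len(ratings) != n_items:
            # Assumption: incomplete domains are rejected rather than imputed.
            raise ValueError(f"{domain}: expected {n_items} ratings")
        if any(r not in (1, 2, 3, 4, 5) for r in ratings):
            raise ValueError(f"{domain}: ratings must be integers from 1 to 5")
        # Domain score: unweighted average of the domain's items.
        domain_scores[domain] = mean(ratings)
    # Summary score: unweighted average of the twelve domain scores.
    summary_score = mean(list(domain_scores.values()))
    return domain_scores, summary_score


# Hypothetical respondent: "5" on every Mobility item, "3" on all other items.
example = {d: [5 if d == "Mobility" else 3] * n for d, n in DOMAIN_ITEM_COUNTS.items()}
domains, summary = score_ss_qol(example)
print(f"Mobility = {domains['Mobility']}, summary = {summary:.2f}")
```

Because the summary averages domain scores rather than items, each domain carries equal weight regardless of how many items it contains, so the result can differ from the mean of all 49 item ratings.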

Alternative forms of SS-QOL

  • The Stroke and Aphasia Quality Of Life Scale (SAQOL-39 – Hilari, Byng, Lamping, & Smith, 2003). Developed from the SS-QOL for use in patients with long-term aphasia, the SAQOL-39 has four subdomains (Physical, Psychosocial, Communication, and Energy). It is an interview-administered self-report scale. It is comprised of items from the SS-QOL that have been modified to ensure they are appropriate for use in individuals with aphasia. The SAQOL-39 has four additional items that were added to increase the content validity of the scale with this population. These four items focus on the difficulties with understanding speech, issues with decision-making, and the impact of language difficulties on family and social life.

Hilari et al. (2003) reported that the SAQOL-39 has good acceptability, adequate to excellent internal consistency (Cronbach’s alphas ranging from 0.74 to 0.94), excellent test-retest reliability (intraclass correlation coefficient = 0.89 to 0.98), and poor to excellent construct validity (corrected domain-total correlations, r = 0.38 to 0.58; convergent, r = 0.55 to 0.67; discriminant, r = 0.02 to 0.27 validity). Further research is needed to confirm its psychometric properties and to determine its appropriateness as a clinical outcome measure.

Client suitability

Can be used with:

  • Individuals with mild or moderate stroke.

Should not be used in:

  • Patients without stroke. The SS-QOL was developed and validated specifically for individuals with stroke and has been examined for use in this population only.
  • Severe stroke populations. The SS-QOL has not yet been tested among patients with severe stroke.
  • Should be used with caution in patients with aphasia. Although the modified version of the scale, the SAQOL-39, has been validated for use in patients with long-term aphasia, it is a relatively new measure that requires further psychometric testing.
  • Patients who require a proxy to complete the measure. A study by Williams et al. (2006) compared proxy ratings of the SS-QOL to patient self-administration in 225 patient-proxy pairs. Proxies rated all domains of the SS-QOL lower than the patients did. The intraclass correlation coefficient (ICC) for each domain ranged from poor (ICC = 0.30 for role function) to adequate (ICC = 0.59 for physical function). The overall SS-QOL score was also lower when rated by proxies than by patients (3.4 versus 3.7), with an ICC of 0.41. It is recommended that information obtained from proxy respondents be treated as supplementary rather than substantive and that use of proxies be restricted to individuals either living with or in daily contact with the patient (Snow, Cook, Lin, Morgan & Magaziner, 2005; Muus, Petzold & Ringsberg, 2009).
  • For patients who require a proxy, the Stroke Impact Scale is a more reliable and valid measure of HRQOL (Duncan, Lai, Tyler, Perera, Reker, & Studenski, 2002).

In what languages is the measure available?

  • Danish (SS-QOL-DK): translated by Muus & Ringsberg (2005) and validated by Muus, Williams & Ringsberg (2007).
  • German: translated by Ewart & Stucki (2007), who also completed an initial validation study. That study supported the validity of the total SS-QOL German score; however, some subscales (Energy, Mood and Thinking) were not validated. Further research is required.

Summary

What does the tool measure? Health-related quality of life
What types of clients can the tool be used for? The SS-QOL was developed for use in patients with stroke.
Is this a screening or assessment tool? Assessment.
Time to administer Approximately 10-15 minutes to complete.
Versions The Stroke and Aphasia Quality Of Life Scale (SAQOL-39)
Other Languages Translated and validated in Danish. Translated into German.
Measurement Properties
Reliability Internal consistency:
One study examined the internal consistency of the SS-QOL and found that the internal consistency ranged from adequate (for work/productivity subscale) to excellent (for self-care).

Test-retest:
One study examined the test-retest reliability of the SS-QOL and found excellent test-retest reliability.

Inter-rater:
One study examined the inter-rater reliability of the SS-QOL and found excellent inter-rater reliability.

Validity Criterion:
Predictive:
The SS-QOL summary score significantly predicted overall post-stroke health-related quality of life.

Construct:
Convergent:
Most domains of the SS-QOL correlate with the Barthel Index, the Beck Depression Inventory, and subscales of the SF-36.

Floor/Ceiling Effects One study reported ceiling effects exceeding 20% in 10 out of 12 domains of the SS-QOL, and a floor effect of 24% in the Energy domain. Floor or ceiling effects exceeding 20% are typically considered poor.
Does the tool detect change in patients? One study found that the SS-QOL had only a moderate ability to detect change in patients between 1 and 3 months post-stroke. A subsequent study of an alternative language version of the SS-QOL found a small to moderate ability to detect change in patients between 3 and 12 months post-stroke. In a later study, the minimal clinically important difference for the mobility, self-care and upper extremity function subscales was defined as a mean change in score of at least 1.5, 1.2 and 1.2 respectively.
Acceptability Further investigation of the reliability, validity, and sensitivity of the SS-QOL is required with larger numbers of subjects. This measure has not been tested in severely affected patients with stroke. For patients with aphasia, the SAQOL-39 is a more suitable version of the measure; however, it is a relatively new measure that requires further psychometric testing. The scale is not suitable for use by proxy.
Feasibility No training is required for the SS-QOL as the measure is intended to be completed by self-report. The measure is simple to score and is based on a 5-point Likert scale.
How to obtain the tool?

Click here to find a copy of the SS-QOL.

Psychometric Properties

Overview

The Stroke Specific Quality of Life Scale (SS-QOL) is a new scale and has not been well studied. It has not been tested among severe stroke populations. To our knowledge, the creators of the SS-QOL have personally gathered the majority of psychometric data that are currently published on the scale. Further investigation on the reliability, validity, and sensitivity of the SS-QOL is required with larger numbers of subjects.

Floor and Ceiling Effects

Czechowsky and Hill (2002) examined the SS-QOL and reported ceiling effects exceeding 20% in 10 out of 12 domains of the SS-QOL, and a floor effect of 24% in the Energy domain. Floor or ceiling effects exceeding 20% are typically considered poor.

Reliability

Internal consistency:
Williams et al. (1999a) examined the internal consistency of the SS-QOL in 34 individuals with stroke and found that Cronbach’s alpha ranged from adequate (alpha = 0.75 for work/productivity subscale) to excellent (alpha = 0.89 for self-care), suggesting that the SS-QOL has a strong internal consistency.
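For readers unfamiliar with the statistic, Cronbach’s alpha for a subscale can be computed from item-level ratings using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The sketch below is a generic illustration of that formula, not the analysis code used by Williams et al.; the function name and the example data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item ratings per respondent."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least 2 items, 2 respondents, equal row lengths")
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: three respondents answering a three-item subscale.
ratings = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(ratings)
```

In this toy data set the items move together perfectly, so alpha reaches its maximum of 1; real subscales such as those reported above fall below that.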

Test-retest:
In a study by Williams et al. (2000), the SS-QOL was administered by a trained interviewer to 47 stroke survivors at baseline and again within 2 hours of the initial interview. SS-QOL scores were highly correlated (r = 0.92), showing excellent test-retest reliability.

Inter-rater:
The SS-QOL was also administered by a trained interviewer to 24 stroke survivors and then a second trained interviewer re-administered the SS-QOL within 2 hours of the first interview. SS-QOL scores were highly correlated (r = 0.92), demonstrating excellent inter-rater reliability of the SS-QOL.

Validity

Criterion:
Predictive:
Williams et al. (1999b) administered the SS-QOL to a total of 71 patients 1-month post-ischemic stroke. Multivariate analysis showed that the SS-QOL summary score significantly predicted overall post-stroke health-related quality of life (HRQOL) (OR = 2.97). When scores were examined on the domain level, however, only one domain, Family Roles, was significantly different between groups, with higher scores in those patients with better overall HRQOL.

Construct:
Convergent:
Williams et al. (1999a) examined the validity of the SS-QOL in 34 survivors of stroke and reported that most domains of the SS-QOL correlated with the Barthel Index, Beck Depression Inventory, and subscales of the SF-36. The Energy, Family Roles, Mobility and Work/Productivity domains were significantly associated with corresponding subscales on the SF-36. Total SS-QOL score correlated excellently with the overall SF-36 health status rating (r = 0.65). The self-care domain was adequately correlated with the Barthel Index (r = 0.45). Upper Extremity Function showed a positive but poor relationship with the Barthel Index and the National Institutes of Health Stroke Scale Upper Extremity score (r = 0.18).

However, in this study, a few domains did not show a significant relationship with their corresponding measures. Scores in the Language and Thinking domains were not associated with selected items from the National Institutes of Health Stroke Scale (r = 0.00 and r = 0.10 respectively). This most likely occurred because patients with language and cognitive deficits were excluded, i.e., there were no patients with a score > 1 on these items. Furthermore, the SS-QOL Social Roles domain was not associated with the SF-36 Social Functioning subscale score (r = 0.01). Finally, the Vision domain of the SS-QOL did not correlate with the National Institutes of Health Stroke Scale Visual Field and Ocular Movement scores (r = 0.11).

Responsiveness

Williams et al. (1999a) examined the standardized effect size scores for the interval between 1 and 3 months post-stroke in 34 individuals with stroke. Effect sizes ranged from small (ES = 0.20 for the personality domain) to large (ES = 0.83 for the social roles domain). One half of the SS-QOL domains demonstrated less than moderate effect sizes. The ‘amount of help’ response set appeared to lack responsiveness. The results of this study demonstrate that the SS-QOL has only adequate responsiveness.

Muus et al. (2011) investigated the responsiveness of the Danish language version of the SS-QOL (SS-QOL-DK). Patients were assessed at 3 and 12 months following stroke. Small standardized effect sizes were found for all domains (-0.03 to 0.40), except the social roles domain, which demonstrated a moderate standardized effect size (-0.53).
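As a rough illustration of how a standardized effect size of this kind is obtained, the sketch below divides the mean change between two assessments by the standard deviation of the baseline scores. That definition is the common one and is assumed here; the studies above may have used a variant. The cut-points of 0.5 and 0.8 for labelling magnitude are also an assumption (they follow the usual Cohen-style convention), and the scores are made up.

```python
from statistics import mean, stdev

def standardized_effect_size(baseline, follow_up):
    """Mean change from baseline divided by the baseline standard deviation."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    sd = stdev(baseline)
    if sd == 0:
        raise ValueError("baseline scores do not vary")
    change = mean([f - b for b, f in zip(baseline, follow_up)])
    return change / sd

def magnitude(es):
    """Label the absolute effect size (assumed cut-points: 0.5 and 0.8)."""
    size = abs(es)
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"

# Hypothetical domain scores for four patients at two time points.
baseline = [3.0, 3.5, 2.5, 4.0]
follow_up = [3.5, 4.0, 3.0, 4.5]
es = standardized_effect_size(baseline, follow_up)
label = magnitude(es)
```

The sign of the effect size only shows the direction of change, which is why the label is based on the absolute value.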

Lin, Fu, Wu & Hsieh (2011) examined the minimal clinically important difference (MCID) of the mobility, self-care and upper extremity function subscales of the SS-QOL. The study included 74 patients with stroke receiving rehabilitation, and the SS-QOL was administered at baseline and at 3 weeks. The MCID ranges for the mobility, self-care and upper extremity function subscales were 1.5 – 2.4, 1.2 – 1.9, and 1.2 – 1.8 respectively. The results of the study indicate that the mean change in score on the mobility, self-care and upper extremity function subscales should reach 1.5, 1.2 and 1.2, respectively, in order for change to be interpreted as clinically meaningful.

References

  • Czechowsky, D., Hill, M. D. (2002). Neurological Outcome and Quality of Life after Stroke due to Vertebral Artery Dissection. Cerebrovascular Diseases, 13, 192-197.
  • Duncan, P. W., Lai, S. M., Tyler, D., Perera, S., Reker, D. M., Studenski, S. (2002). Evaluation of proxy responses to the Stroke Impact Scale. Stroke, 33, 2593-2599.
  • Ewart, T. & Stucki, G. (2007). Validity of the SS-QOL in Germany and in survivors of hemorrhagic or ischemic stroke. Neurorehabilitation and Neural Repair, 21, 161-168.
  • Hilari, K., Byng, S., Lamping, D. L., Smith, S. C. (2003). Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39): Evaluation of acceptability, reliability, and validity. Stroke, 34, 1944-1950.
  • Lin, K-C., Fu, T., Wu, C-Y. & Hsieh, C-J. (2011). Assessing the stroke-specific quality of life for outcomes measurement in stroke rehabilitation: Minimal detectable change and clinically important difference. Health and Quality of Life Outcomes, 9, 5. Retrieved April 25, 2012 from Sage Journals database.
  • Muus, I., Christensen, D., Petzold, M., Harder, I., Johnsen, S.P., Kirkevold, M., Ringsberg, K.C. (2011). Responsiveness and sensitivity of the Stroke Specific Quality of Life Danish version. Disability and Rehabilitation, 33(25-26), 2425-2433.
  • Muus, I., Petzold, M. & Ringsberg, K.C. (2009). Health-related quality of life after stroke: Reliability of proxy responses. Clinical Nursing Research, 18(2), 103-118.
  • Muus, I., Ringsberg, K. C. (2005). Stroke Specific Quality of Life Scale: Danish adaptation and a pilot study for testing psychometric properties. Scand J Caring Sci, 19, 140-147.
  • Muus, I., Williams, L.S. & Ringsberg, K.C. (2007). Validation of the Stroke Specific Quality of Life Scale (SS-QOL): Test of reliability and validity of the Danish version (SS-QOL-DK). Clinical Rehabilitation, 21, 620-627.
  • Snow, A.L., Cook, K.F., Lin, P.S., Morgan, R.O. & Magaziner, J. (2005). Proxies and other external raters: Methodological considerations. Health Services Research, 40(5), 1676-1693.
  • Williams, L. S., Weinberger, M., Harris, L. E., Clark, D. O., Biller, J. (1999a). Development of a stroke-specific quality of life scale. Stroke, 30(7), 1362-1369.
  • Williams, L. S., Weinberger, M., Harris, L. E., Biller, J. (1999b). Measuring quality of life in a way that is meaningful to stroke patients. Neurology, 53, 1839-1843.
  • Williams, L. S., Redmon, G., Saul, D. C., Weinberger, M. (2000). Reliability and telephone validity of the Stroke-specific Quality of Life (SS-QOL) scale. Stroke, 32, 339-b.
  • Williams, L. S., Bakas, T., Brizendine, E., Plue, L., Tu, W., Hendrie, H., Kroenke, K. (2006). How valid are family proxy assessments of stroke patients’ health-related quality of life? Stroke, 37, 2081-2085.

See the measure

Please click here for a copy of the Stroke-Specific-Quality-of-Life-Scale (SS-QOL).

Stroke-Adapted Sickness Impact Profile (SA-SIP30)

Evidence Reviewed as of before: 19-08-2008
Author(s)*: Lisa Zeltzer, MSc OT
Editor(s): Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Stroke-Adapted Sickness Impact Profile (SA-SIP30 – van Straten, de Haan, Limburg, Schuling, Bossuyt, & van den Bos, 1997) was developed from the original 136-item Sickness Impact Profile (SIP-136), and assesses quality of life in patients who have sustained a stroke. The scale was developed specifically for use in stroke outcome research in order to overcome the major problem observed with the SIP-136, its length (Finch, Brooks, Stratford, & Mayo, 2002).

In-Depth Review

Purpose of the measure

The Stroke-Adapted Sickness Impact Profile (SA-SIP30 – van Straten, de Haan, Limburg, Schuling, Bossuyt, & van den Bos, 1997) was developed from the original 136-item Sickness Impact Profile (SIP-136), and assesses quality of life in patients who have sustained a stroke. The scale was developed specifically for use in stroke outcome research in order to overcome the major problem observed with the SIP-136, its length (Finch, Brooks, Stratford, & Mayo, 2002).

Available versions

The SA-SIP30 was adapted from the original SIP-136, first published in 1976 by Bergner, Bobbitt, Pollard, Martin, and Gilson, and later revised in 1981 by Bergner, Bobbitt, Carter and Gilson.

Features of the measure

Items:

van Straten et al. (1997) followed a three-stage process to eliminate items and subscales that were least relevant to stroke survivors (i.e. those applying to fewer than 10% of patients) as well as those with the lowest levels of reliability from the original SIP (van Straten et al. 1997; Golomb, Vickrey, & Hays, 2001).

A criticism of the SA-SIP30 is that no attempt has been made to enhance the scale with items or domains of potential importance to stroke. Thus, the SA-SIP30 does not assess pain, recreation, energy, general health perceptions, overall quality of life or stroke symptoms (Golomb, Vickrey, & Hays, 2001).

The SA-SIP30 contains 30 items. Each item takes the form of a statement describing changes in behavior that reflect the impact of illness on some aspect of daily life. Patients are asked to mark items most descriptive of themselves on a given day. All responses are “yes” or “no”. Scale items are weighted to reflect the relative importance of the item to health status and are the same as the weights used in the SIP-136. In addition to maintaining much of the original subscale structure of the SIP-136, these weights help facilitate comparisons with studies using the original SIP-136.

Scoring:

The scoring of items, subscales, dimensions and total score is the same as for the original SIP. To score the scale, weights are applied to marked items, summed for each subscale and expressed as a percentage for each subscale ranging from 0 to 100%. Higher scores indicate less desirable health outcomes (van Straten et al., 1997; van Straten, de Haan, Limburg, & van den Bos, 2000; Finch et al., 2002; Cup, Scholte op Reimer, Thijssen, & van Kuyk-Minis, 2003). Regression weights have also been provided to allow for a calculation of estimated SIP-136 scores from SA-SIP30 scores.
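The weighted scoring described above can be sketched as follows. The sketch assumes the usual SIP convention in which the percentage is the sum of the weights of the endorsed ("yes") items divided by the sum of the weights of all items in the subscale, multiplied by 100. The item identifiers and weights below are invented for illustration and are not the published SA-SIP30 weights.

```python
def subscale_percent(endorsed, weights):
    """Weighted percentage score for one subscale, dimension, or the total scale.

    endorsed: identifiers of the items the patient marked "yes"
    weights:  mapping of every item identifier in the scale to its weight
    """
    endorsed = set(endorsed)
    unknown = endorsed - set(weights)
    if unknown:
        raise ValueError(f"items not in this scale: {sorted(unknown)}")
    max_total = sum(weights.values())
    if max_total <= 0:
        raise ValueError("weights must sum to a positive value")
    return 100.0 * sum(weights[item] for item in endorsed) / max_total

# Hypothetical three-item subscale with made-up weights.
ambulation_weights = {"amb1": 10.0, "amb2": 20.0, "amb3": 30.0}
score = subscale_percent({"amb2"}, ambulation_weights)
```

The same function can be reused for a dimension or for the total score by passing the weights of all items that belong to it, so that higher percentages always indicate a less desirable health outcome.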

Cut-off scores representative of poor health have been defined as the following: patients with scores > 33 are known to be impaired in activities of daily living, unable to live independently, experience difficulties in self care, mobility and in performing their main activity. Similar profiles have been observed for Physical dimension scores > 40, but no cut-off values could be defined using the Psychosocial dimension (van Straten et al., 2000).

Subscales:

There are 8 subscales:

  • Body Care and Movement (5 items)
  • Social Interaction (5 items)
  • Mobility (3 items)
  • Communication (3 items)
  • Emotional Behavior (4 items)
  • Household Management (4 items)
  • Alertness Behavior (3 items)
  • Ambulation (3 items)

Subscales can be combined to form 2 dimensions:

  • Physical: includes the subscales Body care and movement, Ambulation, Household management and Mobility (15 items)
  • Psychosocial: includes the subscales Alertness behavior, Communication, Social interaction and Emotional behavior (15 items)

Equipment:

No special equipment is required to administer the SA-SIP30.

Training:

The scale is intended for self-administration or administration by interview (Buck, Jacoby, Massey, & Ford, 2000). No special training is necessary; however, a user’s manual and trainer’s manual are available for the original SIP (McDowell & Newell, 1996). There is not yet any evidence that the SA-SIP30 can be administered by proxy; however, the original SIP-136 can be used in this fashion (Sneeuw, Aaronson, de Haan, & Limburg, 1997).

Time:

The average scale completion time has not been reported, however, the SA-SIP30 is known to be a shorter scale than the original SIP, which takes 30 minutes on average to administer.

Alternative forms of the SA-SIP30

None.

Client suitability

Can be used with:

  • Patients with stroke.

Should not be used in:

  • The SA-SIP30 should be administered with caution to patients who have experienced a severe stroke. van Straten et al. (1997) noted that the SA-SIP30 might be less effective for patients with severe stroke because, in developing the SA-SIP30, higher item weights were mostly associated with items that were removed, and these had been descriptive of more severe health status. Evidence of this came from the observation that agreement between scores obtained with the original SIP-136 and the SA-SIP30 was lower among more severely ill patients with stroke than among healthier patients (van Straten et al., 1997). However, it is important to note that in a subsequent study by van de Port et al. (2004), this trend was only observed on the Physical dimension of the SA-SIP30 and even then, the trend was less notable than on the SIP-68 (a short version of the original SIP-136).
  • The SA-SIP30 should be administered with caution to patients who have a major physical disability. van Straten et al. (2000) found that the total scores of the SA-SIP30 were largely explained by the Physical dimension of the scale (66% for the subscales of the Physical dimension versus 25% for the subscales of the Psychosocial dimension). This might result in any patient with a serious physical disability being automatically detected by the scale as having poor health-related quality of life.
  • Patients who require a proxy to complete. Although the original SIP has been validated for proxy use, proxy use has not been examined using the SA-SIP30. For patients who have had a stroke and who require a proxy, the Stroke Impact Scale is known to be a reliable and valid measure of quality of life (Duncan, Lai, Tyler, Perera, Reker, & Studenski, 2002).
  • Patients with aphasia. The SA-SIP30 has not been validated for use in patients with aphasia. A French questionnaire, the SIP-65, has been validated to assess quality of life in patients with aphasia, however this scale is not available in English (Benaim et al., 2003). The Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39) is another measure that assesses quality of life and was developed specifically for use in patients with aphasia. This scale has been found to be an acceptable, reliable, and valid measure in patients with long-term aphasia (Hilari, Byng, Lamping, & Smith, 2003).

In what languages is the measure available?

English (van Straten et al., 1997)

Summary

What does the tool measure? Health-related quality of life
What types of clients can the tool be used for? The SA-SIP30 was developed for use in patients with stroke.
Is this a screening or assessment tool? Assessment
Time to administer The average scale completion time has not been reported, however, the SA-SIP30 is known to be a shorter scale than the original SIP, which takes 30 minutes on average to administer.
Versions The SA-SIP30 was adapted from the original SIP-136
Other Languages No translations of the SA-SIP30 have been conducted to date.
Measurement Properties
Reliability Internal consistency:
Two studies examined the internal consistency of the SA-SIP30, and both reported excellent internal consistency.

Test-retest:
No studies have examined the test-retest reliability of the SA-SIP30.

Inter-rater:
No studies have examined the inter-rater reliability of the SA-SIP30.

Validity Content:
Items least relevant to patients with stroke were eliminated. Items with a skewed response pattern or those relevant to < 10% of patients were dropped. Linear regression was used to assess the relevance of remaining items. Item selection for each subscale was completed when items in the model explained 80% of the variance in score of the original total subscale. Least relevant subscales were excluded using a stepwise linear regression with forward inclusion. When adding another subscale to the model did not increase the percentage of explained variance by more than 1%, the process was stopped. Unreliable items were excluded, as long as at least 3 items remained in each subscale.

Construct:
Convergent:
Excellent correlations were found between the SA-SIP30 and the SIP-136 total score and subscales; the SIP-68 (shortened version of the SIP-136); and the global functional health score on the Rankin Scale. Adequate correlations were found with the disability score on the Barthel Index; total Rankin Scale; EuroQol; and the Frenchay Activities Index.

Discriminant:
Poor correlation between the SA-SIP30 and the Canadian Occupational Performance Measure.

Known groups:
The SA-SIP30 was able to distinguish clients with lacunar infarctions from those with cortical or subcortical lesions. One study reported that when using appropriate SA-SIP30 cut-off scores, the SA-SIP30 could classify patients as dependent in their activities of daily living; patients able to live independently; and patients having poor health-related quality of life.

Floor/Ceiling Effects None.
Does the tool detect change in patients?

One study found that the SA-SIP30 had only a moderate ability to detect change in patients from 6 months to 1 year post-stroke.

Acceptability The SA-SIP30 is shorter and simpler than the original SIP-136. The original SIP has been tested for use with proxy respondents, however the SA-SIP30 has not yet been tested for use by proxy respondent. The SA-SIP30 should not be administered to patients with aphasia, and should be used with caution in patients with a major physical disability or who have suffered a severe stroke.
Feasibility This shorter, simpler version of the SIP should represent less administrative burden and can be more easily included in both research and clinical settings. The scale is intended for self-administration or administration by interview. No special training is necessary. A user’s manual and trainer’s manual are available for the original SIP only. The SA-SIP30 is fairly simple to score and is based on weights that are applied to marked items, which are then summed for each subscale and expressed as a % for each subscale ranging from 0 to 100%. Higher scores indicate less desirable health outcomes.
How to obtain the tool? Click here to find a copy of the SA-SIP30. The SA-SIP30 can also be found in van Straten et al. (1997).

Psychometric Properties

Overview

To date, only a few studies have examined the psychometric properties of the Stroke-Adapted Sickness Impact Profile (SA-SIP30). For this reason, we have included for review all of the publications that we could identify on the scale. The SA-SIP30 was originally validated by its authors (van Straten et al., 1997; van Straten et al., 2000) and was later evaluated by van de Port et al. (2004).

Reliability

Internal consistency:
van Straten et al. (1997) developed and examined the reliability of the SA-SIP30 in 319 patients post-stroke. The total SA-SIP30 demonstrated excellent internal consistency (alpha = 0.85), as did the Psychosocial (alpha = 0.78) and Physical dimensions (alpha = 0.82). All subscales had adequate internal consistency with the exception of the Emotional Behavior (alpha = 0.57), and Ambulation (alpha = 0.54) subscales, which were poor. With the exception of the Communication subscale, the internal consistency of the SIP-136 was found to be slightly higher on all items than the internal consistency of the SA-SIP30.

van de Port, Ketelaar, Schepers, van den Bos, and Lindeman (2004) also examined the internal consistency of the SA-SIP30 in 122 patients with stroke and found excellent reliability for the total score (alpha = 0.82), and moderate reliability for the Physical dimension (alpha = 0.76). However, unlike the results of van Straten et al. (1997), the internal consistency of the Psychosocial dimension was found to be poor (alpha = 0.68).

Inter-rater:
Not reported.

Test-retest:
Not reported.

Validity

Criterion:

None.

Content:

van Straten et al. (1997) eliminated the least relevant items for patients with stroke from the SIP-136. Items that had a skewed response pattern were dropped, as were items relevant to less than 10% of all patients. Linear regression was used to assess the relevance of the remaining items with a forward selection strategy, using the F statistic with p = 0.5 as the criterion level for selection. The item selection for each subscale was completed when the items in the regression model explained 80% of the variance in score of the original total subscale. The least relevant subscales were excluded by applying a stepwise linear regression with forward inclusion to explain the variation of the original total SIP score with the shortened subscales. When adding another subscale to the model did not result in an increase of more than 1% in the percentage of explained variance, the process was stopped. Finally, unreliable items were excluded, while ensuring that at least three items remained in each subscale.

Construct:

A principal component analysis supported two dimensions (Physical and Psychosocial), which is evidence that the original dimension structure of the SIP-136 was retained with the SA-SIP30 (van Straten et al., 1997). Twenty percent of the SA-SIP30-explained score variance could be attributed to the Physical dimension and 11% to the Psychosocial dimension (van Straten et al., 1997).

Convergent:
van Straten et al. (1997) examined the convergent validity of the scale by comparing the scores of the SA-SIP30 with the scores on the 136-item version in 319 patients post-stroke. The SA-SIP30 total score explained 91% of the variance in SIP-136 scores. Furthermore, 87% of the original Physical dimension scores and 88% of the Psychosocial dimension scores could be explained by the SA-SIP30. For the different subscales, the percentages of explained variance ranged from 69% (Social Interaction) to 84% (Emotional Behavior). The Spearman rank correlation coefficient between the SA-SIP30 and the SIP-136 total scores was excellent (r = 0.96). Subscale correlations were also excellent, ranging from r = 0.75 (Emotional Behavior) to r = 0.90 (Body Care and Movement).

Also in this study by van Straten et al., the SA-SIP30 was correlated with the Barthel Index and the Rankin Scale. As expected, SA-SIP30 correlated moderately with the disability score on the Barthel Index (r = 0.50) and had an excellent correlation with the global functional health score on the Rankin Scale (r = 0.68), further demonstrating the convergent validity of the SA-SIP30.

van de Port, Ketelaar, Schepers, van den Bos, and Lindeman (2004) examined the convergent validity of the SA-SIP30 in 122 patients with stroke. The correlation between the SA-SIP30 and total SIP-68 (a shortened version of the SIP-136) scores was excellent (r = 0.98). Similar associations were reported for the Physical (r = 0.89) and Psychosocial (r = 0.84) dimension scores.

Cup et al. (2003) found that the SA-SIP30 correlated adequately with the Barthel Index (r = -0.517), the Rankin Scale (r = 0.468), the EuroQol (r = -0.483), and the Frenchay Activities Index (r = -0.426). The correlations among the SA-SIP30 and the EuroQol, Barthel Index, and Frenchay Activities Index are negative because a high score on the SA-SIP30 indicates poor health outcomes, whereas a high score on these other scales indicates positive health outcomes. The results of this study demonstrate the convergent validity of the SA-SIP30 with other frequently used standardized functional measures in stroke.

van Straten et al. (2000) conducted a linear regression analysis and found that common measures of physical disability were closely associated with SA-SIP30 scores. The Barthel Index accounted for 36% of the variance in total SA-SIP30 scores, the Rankin scale accounted for 53%, and the Euroqol index score accounted for 44%. The results of this study also confirm the convergent validity of the SA-SIP30 with other frequently used standardized functional measures in stroke.

Discriminant:
Cup et al. (2003) examined the discriminant validity of the Canadian Occupational Performance Measure in 26 patients with stroke. As predicted, the correlation between the scores on the Canadian Occupational Performance Measure and the SA-SIP30 was poor (r = 0.102). This was to be expected because the Canadian Occupational Performance Measure was developed to examine issues specific to the individual, whereas the SA-SIP30 is focused on a societal perspective of independence.

Known groups:
van Straten et al. (1997) found that the SA-SIP30 was unable to distinguish between clients with supratentorial and infratentorial strokes, as has been possible with the SIP-136 (de Haan, Limburg, & van der Meulen, 1995). However, the SA-SIP30 was able to distinguish clients with lacunar infarctions from those with cortical or subcortical lesions. Further, clients with lacunar infarcts reported better functional health than those with cortical or subcortical lesions on the Psychosocial dimension of the scale, the total SA-SIP30 score, and on all subscales with the exception of Emotional Behavior and Mobility.

van Straten et al. (2000) identified the cut-off scores for poor health outcomes by examining the area under the ROC curve (AUC). When using a cut-off SA-SIP30 score > 28, the percentage of patients correctly classified as dependent in their activities of daily living, as assessed using the Barthel Index, was adequate, 77% (AUC = 0.84). When using a cut-off score > 40 for the Physical dimension alone, the percentage of patients correctly classified as dependent in their activities of daily living was excellent, 84% (AUC = 0.90). When using a cut-off SA-SIP30 score > 25, the percentage of patients correctly classified as unable to live independently, as measured by the Rankin Scale, was 80% for the total score (AUC = 0.90). When using a cut-off of > 36 for the Physical dimension alone, the percentage of patients correctly classified was excellent, 83% (AUC = 0.90). When using a cut-off of > 33 for the total score, the percentage of patients correctly classified as having poor health-related quality of life, as assessed by the EuroQol, was adequate, 80% (AUC = 0.80). When using a cut-off > 40 for the Physical dimension alone, the percentage of patients correctly classified was also adequate, 79% (AUC = 0.86).
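To make the "percentage correctly classified" figures concrete, the sketch below applies a cut-off to a set of scores and counts how often the result agrees with a reference standard. It does not compute the AUC and is not the authors' analysis. The function name and the patient data are hypothetical; only the cut-off of > 28 is taken from the text above.

```python
def percent_correctly_classified(scores, reference_positive, cutoff):
    """Percentage of patients for whom 'score > cutoff' matches the reference.

    scores:             SA-SIP30 scores, one per patient
    reference_positive: True where the reference measure indicates a poor outcome
    cutoff:             scores strictly above this value are classified as poor
    """
    if not scores or len(scores) != len(reference_positive):
        raise ValueError("need one reference classification per score")
    correct = sum(
        (score > cutoff) == bool(ref)
        for score, ref in zip(scores, reference_positive)
    )
    return 100.0 * correct / len(scores)

# Hypothetical data: five patients, with dependence judged by a reference measure.
sa_sip30_scores = [10, 30, 25, 35, 50]
dependent = [False, False, True, True, True]
accuracy = percent_correctly_classified(sa_sip30_scores, dependent, cutoff=28)
```

Repeating the calculation over a range of candidate cut-offs is one simple way to see how the choice of threshold trades off the two kinds of misclassification.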

Responsiveness

van de Port et al. (2004) found that the SA-SIP30 demonstrated moderate responsiveness in a longitudinal study. Effect sizes from 6 months to 1 year post-stroke were 0.60 for the total SA-SIP30 scores, and 0.56 and 0.65 for the Physical and Psychosocial dimensions, respectively.
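
The text above does not state which effect size formula was used. As a sketch under that caveat, one common definition is the standardized effect size: the mean change between two time points divided by the standard deviation of the scores at the first time point. The function name and the scores below are hypothetical.

```python
from statistics import mean, stdev


def standardized_effect_size(first_scores, second_scores):
    """Mean change (second minus first) divided by the SD of the first assessment.

    This is one common definition only; it is an assumption here, since the
    source does not specify the formula.
    """
    if len(first_scores) != len(second_scores):
        raise ValueError("both assessments must cover the same patients")
    mean_change = mean(second - first for first, second in zip(first_scores, second_scores))
    return mean_change / stdev(first_scores)


# Hypothetical scores for five patients at two assessments.
# On the Sickness Impact Profile family of measures, lower scores indicate better health,
# so a negative value here reflects improvement.
scores_first = [30, 40, 50, 20, 60]
scores_second = [24, 31, 44, 15, 51]

effect_size = standardized_effect_size(scores_first, scores_second)
print(f"Standardized effect size: {effect_size:.2f}")
```

Because the sign depends on the direction of scoring, effect sizes are often reported as absolute values.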

References

  • Benaim, C., Pelissier, J., Petiot, S., Bareil, M., Ferrat, E., Royer, E., Milhau, D., Herisson, C. (2003). A French questionnaire to assess quality of life of the aphasic patient: The SIP-65. [French]. Ann Readapt Med Phys, 46(1), 2-11.

  • Bergner, M., Bobbitt, R. A., Pollard, W. E., Martin, D. P., Gilson, B. S. (1976). The sickness impact profile: Validation of a health status measure. Med Care, 14(1), 57-67. 

  • Bergner, M., Bobbitt, R. A., Carter, W. B., Gilson, B. S. (1981). The Sickness Impact Profile: Development and final revision of a health status measure. Med Care, 19, 787-805.

  • Buck, D., Jacoby, A., Massey, A., Ford, G. (2000). Evaluation of measures used to assess quality of life after stroke. Stroke, 31, 2004-2010.

  • Coons, S. J., Rao, S., Keininger, D. L., Hays, R. D. (2000). A comparative review of generic quality-of-life instruments. Pharmacoeconomics, 17, 13-35.

  • Cup, E. H. C., Scholte op Reimer, W. J. M., Thijssen, M. C. E., van Kuyk-Minis, M. A. H. (2003). Reliability and validity of the Canadian Occupational Performance Measure in stroke patients. Clinical Rehabilitation, 17(4), 402-409.

  • de Haan, R. J., Limburg, M., van der Meulen, J. H. P. (1995). Quality of life after stroke. Stroke, 26, 402-408.

  • Duncan, P. W., Lai, S. M., Tyler, D., Perera, S., Reker, D. M., Studenski, S. (2002). Evaluation of proxy responses to the Stroke Impact Scale. Stroke, 33, 2593-2599.

  • Finch, E., Brooks, D., Stratford, P. W., Mayo, N. E. (2002). Physical Rehabilitation Outcome Measures. A Guide to Enhanced Clinical Decision-Making (2nd ed.), Toronto: Canadian Physiotherapy Association.

  • Golomb, B. A., Vickrey, B. G., Hays, R. D. (2001). A review of health-related quality-of-life measures in stroke. Pharmacoeconomics, 19(2), 155-185.

  • Hilari, K., Byng, S., Lamping, D. L., Smith, S. C. (2003). Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39): Evaluation of acceptability, reliability, and validity. Stroke, 34, 1944-1950.

  • Lurie, J. (2000). A review of generic health status measures in patients with low back pain. Spine, 25, 3125-3129.

  • McDowell, I., Newell, C. (1996). Measuring Health. A Guide to Rating Scales and Questionnaires (2nd ed.), New York: Oxford University Press.

  • Sneeuw, K. C. A., Aaronson, N. K., de Haan, R. J., Limburg, M. (1997). Assessing quality of life after stroke. The value and limitations of proxy ratings. Stroke, 28, 1541-1549.

  • van Straten, A., de Haan, R. J., Limburg, M., Schuling, J., Bossuyt, P. M., van den Bos, G. A. M. (1997). A Stroke-Adapted 30-Item Version of the Sickness Impact Profile to Assess Quality of Life (SA-SIP30). Stroke, 28, 2155-2161.

  • van Straten, A., de Haan, R. J., Limburg, M., van den Bos, G. A. M. (2000). Clinical Meaning of the Stroke-Adapted Sickness Impact Profile-30 and the Sickness Impact Profile-136. Stroke, 31, 2610-2615.

  • van de Port, I. G. L., Ketelaar, M., Schepers, V. P. M., van den Bos, G. A. M., Lindeman, E. (2004). Monitoring the functional health status of stroke patients: the value of the Stroke-Adapted Sickness Impact Profile-30. Disability and Rehabilitation, 26(11), 635-640.

See the measure

How to obtain a copy of the SA-SIP30?

The measure is provided in van Straten et al. (1997).
