Patient Health Questionnaire (PHQ-9)
Purpose
The Patient Health Questionnaire (PHQ-9) is a brief tool used to diagnose and measure severity of depression. The PHQ-9 is shorter than many of the other depression screening instruments and can be self-administered. Adapted from the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV), the PHQ-9 is comprised of the same diagnostic symptom criteria used in the DSM-IV:
- Two cardinal signs of depression (anhedonia and depressed mood);
- Cognitions (e.g. guilt/worthlessness and suicidality/thoughts of death); and
- Physical symptoms (e.g. change in appetite, difficulty sleeping and concentrating, feeling tired/slowed down or restless).
In-Depth Review
Purpose of the measure
The PHQ-9 is a brief tool used to diagnose and measure severity of depression. The PHQ-9 is shorter than many of the other depression screening instruments and can be self-administered. Adapted from the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV), the PHQ-9 includes all 9 diagnostic symptom criteria used in the DSM-IV, including the two cardinal signs of depression: anhedonia and depressed mood. The PHQ-9 is widely used by clinicians and can be used with patients with stroke.
Available versions
The PHQ-9 was developed by Drs. Robert L. Spitzer, Janet W.B. Williams and Kurt Kroenke in 1999.
Features of the measure
Items:
PHQ-9: Contains the 9 items from the DSM-IV used in the diagnosis of depression. The respondent must recall how often they have experienced the following symptoms over the last two weeks:
- Little interest or pleasure in doing things;
- Feeling down, depressed or hopeless;
- Trouble falling asleep, staying asleep, or sleeping too much;
- Feeling tired or having little energy;
- Poor appetite or overeating;
- Feeling bad about yourself, or that you’re a failure or have let yourself or your family down;
- Trouble concentrating on things, such as reading the newspaper or watching television;
- Moving or speaking so slowly that other people could have noticed. Or the opposite – being so fidgety or restless that you have been moving around a lot more than usual;
- Thoughts that you would be better off dead or of hurting yourself in some way; and
- If you indicated any problems, how difficult have those problems made it for you to do your work, take care of things at home, or get along with other people?
- Not difficult at all
- Somewhat difficult
- Very difficult
- Extremely difficult
What to consider before beginning:
As with other depression screening tools, prior to administration the clinician must rule out physical causes for depression, typical bereavement processes and a history of manic episodes.
Scoring and Score Interpretation:
Each item is evaluated on a severity scale ranging from 0 to 3, where the respondent is asked to rate how often each symptom occurred over the last 2 weeks (0 = not at all; 1 = several days; 2 = more than half of the days; 3 = nearly every day), yielding a total score ranging from 0 to 27. The respondent is also asked how the identified problems have interfered with work, home and/or social life; however, responses to this item are not scored or included in the total score.
Score interpretation:
- 1-4 minimal depression;
- 5-9 mild depression;
- 10-14 moderate depression;
- 15-19 moderately severe depression; and
- 20-27 severe depression.
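The scoring rule and the severity bands above can be sketched in a few lines of Python. This is a minimal illustration, not an official scoring tool: the function names `phq9_total` and `phq9_severity` are hypothetical, and the handling of a total of 0 (which falls below every band listed above) is an assumption.

```python
from typing import Sequence

# Severity bands exactly as listed above: (lowest total, highest total, label).
SEVERITY_BANDS = [
    (1, 4, "minimal depression"),
    (5, 9, "mild depression"),
    (10, 14, "moderate depression"),
    (15, 19, "moderately severe depression"),
    (20, 27, "severe depression"),
]


def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum the nine symptom items (each rated 0-3).

    The functional-difficulty question is not scored, so it is not passed in.
    """
    if len(item_scores) != 9:
        raise ValueError("expected exactly 9 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2 or 3")
    return sum(item_scores)


def phq9_severity(total: int) -> str:
    """Map a total score (0-27) to the severity band listed above."""
    if not 0 <= total <= 27:
        raise ValueError("total score must be between 0 and 27")
    for low, high, label in SEVERITY_BANDS:
        if low <= total <= high:
            return label
    # Assumption: a total of 0 is below every band listed above.
    return "no band listed (total of 0)"


# Example with made-up responses to the nine items.
example_items = [1, 2, 0, 3, 1, 0, 2, 1, 0]
example_total = phq9_total(example_items)
print(example_total, phq9_severity(example_total))
```

Keeping the bands in a single table makes it easy to check the cut-offs against the list above.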
Time:
The PHQ-9 takes approximately 2-5 minutes to administer.
Training requirements:
The PHQ-9 can be self-administered or clinician-administered. No formal training is required to use the measure.
Equipment:
Only a pencil and the test are needed if the tool is self-administered.
Alternative forms of the PHQ-9
The PHQ-2 is an abbreviated version of the PHQ-9, comprising the first two questions of the PHQ-9, in which the respondent is asked to rate how often they experience the two cardinal symptoms of depression: anhedonia and depressed mood. PHQ-2 scores range from 0 to 6. Results from a study by Arroll et al. (2010) suggested that the complete PHQ-9 be administered to respondents scoring ≥ 2 on the PHQ-2; and Williams et al. (2005) suggested the complete PHQ-9 be administered to patients with stroke.
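The two-step use of the PHQ-2 described above can be sketched as follows. The function names are hypothetical, and the default cut-off of 2 is simply the value attributed to Arroll et al. (2010) in the text; other settings may use a different threshold.

```python
def phq2_score(anhedonia: int, depressed_mood: int) -> int:
    """Sum the first two PHQ-9 items (each rated 0-3), giving a total of 0-6."""
    for score in (anhedonia, depressed_mood):
        if score not in (0, 1, 2, 3):
            raise ValueError("each item must be scored 0, 1, 2 or 3")
    return anhedonia + depressed_mood


def needs_full_phq9(phq2_total: int, cutoff: int = 2) -> bool:
    """Return True when the PHQ-2 total meets the cut-off for giving the full PHQ-9.

    Assumption: cutoff=2 follows the Arroll et al. (2010) suggestion quoted above.
    """
    if not 0 <= phq2_total <= 6:
        raise ValueError("PHQ-2 total must be between 0 and 6")
    return phq2_total >= cutoff


# Example with made-up responses.
total = phq2_score(anhedonia=1, depressed_mood=1)
print(total, needs_full_phq9(total))
```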
Client suitability
Can be used with:
- Patients with stroke.
- The PHQ-9 has also been validated for use with geriatric patients; patients with TBI; in primary care and obstetrics-gynecology settings; and in the general population.
Should not be used with:
- If self-administered, completion of the PHQ-9 requires that the client have adequate reading comprehension and visual ability. However, in the case of illiteracy or poor vision, the items and possible responses may be read to the respondent.
In what languages is the measure available?
The PHQ-9 has been translated into Afrikaans, Arabic, Assamese, Bangla, Bengali, Cantonese, Creole, Czech, Danish, Dutch, Finnish, French, German, Gujarati, Hebrew, Hindi, Hungarian, Italian, Korean, Malayalam, Malaysian Mandarin, Mandarin, Norwegian, Oriya, Polish, Portuguese, Punjabi, Russian, Somali, Spanish, Swedish, Telugu, Tingrinian, Turkish, Urdu and Vietnamese.
These translations can be found at the following website: http://www.phqscreeners.com/
Summary
What does the tool measure? | The PHQ-9 measures the severity of depression. |
What types of clients can the tool be used for? | Can be used with, but is not limited to, patients with stroke. |
Is this a screening or assessment tool? | The PHQ-9 has been referred to as both a screening tool and an assessment tool. |
Time to administer | Approximately 2-5 minutes |
Versions | The PHQ-9 was developed by Drs. Robert L. Spitzer, Janet W.B. Williams and Kurt Kroenke in 1999. It is intended to be self-administered but can be administered by interview in person or over the telephone. |
Other Languages | The PHQ-9 has been translated but not necessarily validated in over 35 languages (see PHQ-9 module for complete list). |
Measurement Properties | |
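Reliability | Internal consistency: Two studies examined the internal consistency of the PHQ-9 and reported excellent internal consistency. Test-retest: One study reported excellent test-retest reliability. Intra-rater: Not yet examined in a stroke population. Inter-rater: Not yet examined in a stroke population. |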
Validity | Construct: Convergent: One study reported that the PHQ-9 has excellent correlation with the Beck Depression Inventory (BDI); adequate correlation with the General Health Questionnaire (GHQ-12) and the European Quality of Life Questionnaire (EuroQOL); and adequate to excellent correlation with subscales of the Medical Outcomes Study Short Form Health Survey (SF-36). |
Floor/Ceiling Effects | No studies have examined the floor or ceiling effects of the PHQ-9. |
Does the tool detect change in patients? | No studies have examined the responsiveness of the PHQ-9 in patients with stroke. |
Acceptability | PHQ-9 is typically self-administered, however it can be interview-administered in person or by telephone for clients who are unable to self-administer the measure. |
Feasibility | The measure is brief, simple to score and only the PHQ-9 test sheet and a pencil are required to complete the measure. |
How to obtain the tool? |
The PHQ-9 can be obtained from: |
Psychometric Properties
Overview
There is an abundance of research on the psychometric properties of the nine-item Patient Health Questionnaire (PHQ-9). However, little research has been conducted specifically in patients with stroke, so studies from other populations are also summarized and presented here.
Reliability
The reliability of the PHQ-9 has not been examined in a stroke population.
Internal Consistency:
Kroenke, Spitzer and Williams (2001) investigated the internal reliability of the PHQ-9 in two large studies involving 6,000 participants from primary care and obstetrics-gynecology clinics. Using Cronbach’s alpha, excellent reliability was found in both studies (0.89 and 0.86 respectively).
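For readers unfamiliar with Cronbach’s alpha, the statistic can be sketched as below. This is a generic illustration of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and not the analysis code used by Kroenke, Spitzer and Williams (2001); the function name and the sample data are invented for the example.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a table of responses.

    responses: one row per respondent, one column per questionnaire item.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same number of items")

    # Sample variance of each item (column) and of the per-respondent totals.
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Invented data: four respondents, three items that move together perfectly.
sample = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(sample))
```

Values closer to 1 indicate that the items vary together; the reported 0.89 and 0.86 would be computed from the respondents' nine item scores in the same way.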
Test-retest:
Kroenke, Spitzer and Williams (2001) investigated the test-retest reliability of the PHQ-9 in primary care clinics. Excellent test-retest reliability (0.84) was found when the PHQ-9 was administered in clinic and then over the telephone 48 hours later.
Intra-rater:
Not yet examined in a stroke population.
Inter-rater:
Not yet examined in a stroke population.
Validity
To our knowledge, the study by Williams et al. (2005) is the only study to date that has examined the validity of the PHQ-9 in individuals with stroke.
Criterion:
Concurrent:
Not yet examined in a stroke population.
Predictive:
Not yet examined in a stroke population.
Construct:
Convergent/Discriminant:
Martin, Rief, Klaiberg and Braehler (2005) examined the convergent validity of the Brief Beck Depression Inventory (BDI), General Health Questionnaire (GHQ-12), European Quality of Life (EuroQOL) Questionnaire and Medical Outcomes Study Short Form Health Survey (SF-36) with an alternative language version of the PHQ-9, in 2060 participants from the general population. The relationships between the measures were compared using the Welch test. The BDI had excellent correlation with the PHQ-9 (r=.73) and the GHQ-12 and EuroQOL had adequate correlation (r=.59 and r=.50 respectively). The subscales of the SF-36 had adequate to excellent correlation (ranging from r=-.45 to r=-.71).
Known groups:
Not yet examined in a stroke population.
Sensitivity / Specificity:
Kroenke, Spitzer and Williams (2001) examined the sensitivity and specificity of the PHQ-9 in participants from primary care settings. Mental health professionals (Clinical Psychologists and Psychiatric Social Workers) conducted telephone interviews using the Structured Clinical Interview for Depression (SCID) and PRIME-MD to confirm diagnosis of depression. A PHQ-9 score of ≥10 had excellent sensitivity and specificity for detecting major depression (88% and 88% respectively).
Williams et al. (2005) examined the sensitivity and specificity of the PHQ-9 and PHQ-2 in 316 patients with stroke. Diagnosis of depression was confirmed using the Structured Clinical Interview for Depression (SCID). A PHQ-9 score of ≥10 was found to have excellent sensitivity and specificity for detecting any severity of depression (78% and 96% respectively) and major depression (91% and 89% respectively). Based on these results, the PHQ-9 should be used as a brief screening measure for assessing depression in patients with stroke.
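As a reminder of how figures such as these are derived, the sketch below computes sensitivity and specificity for a PHQ-9 cut-off against a reference diagnosis. The function name and the sample data are invented for illustration and do not reproduce either study's data; only the cut-off of ≥10 comes from the text.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[int],
    has_depression: Sequence[bool],
    cutoff: int = 10,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of 'score >= cutoff' against a reference diagnosis."""
    if len(scores) != len(has_depression):
        raise ValueError("scores and diagnoses must have the same length")

    true_pos = false_neg = true_neg = false_pos = 0
    for score, diagnosed in zip(scores, has_depression):
        screen_positive = score >= cutoff
        if diagnosed and screen_positive:
            true_pos += 1
        elif diagnosed:
            false_neg += 1
        elif screen_positive:
            false_pos += 1
        else:
            true_neg += 1

    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one diagnosed and one non-diagnosed case")

    sensitivity = true_pos / (true_pos + false_neg)  # diagnosed cases correctly flagged
    specificity = true_neg / (true_neg + false_pos)  # non-cases correctly not flagged
    return sensitivity, specificity


# Invented example: six patients, PHQ-9 totals and interview-confirmed diagnosis.
totals = [12, 8, 15, 3, 10, 9]
diagnosis = [True, True, True, False, False, False]
print(sensitivity_specificity(totals, diagnosis))
```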
Responsiveness
To date, the responsiveness of the PHQ-9 has not been examined in a stroke population but has been examined in a group of patients receiving treatment for depression in primary care settings.
Lowe, Unutzer, Callahan, Perkins and Kroenke (2004) examined the responsiveness of the PHQ-9 and the depression scale from the Hopkins Symptom Checklist (SCL-20) in 434 patients receiving treatment for depression in primary care settings (mean age 70.9 years). Standardized effect size scores for the intervals between baseline and 3 months and between baseline and 6 months were calculated. Large effect sizes were found for both the PHQ-9 and the SCL-20; however, the results of this study indicate that the PHQ-9 is more responsive. Standardized effect sizes of -1.3 at the 3-month interval and -1.3 at the 6-month interval were found for the PHQ-9, compared with -0.9 at the 3-month interval and -1.2 at the 6-month interval for the SCL-20.
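A standardized effect size of this kind can be sketched as follows. The sketch assumes one common definition (mean change from baseline divided by the standard deviation of the baseline scores); the text does not state which formula Lowe et al. (2004) used, so the definition, the function name and the data are all assumptions for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def standardized_effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change (follow-up minus baseline) divided by the SD of baseline scores.

    Assumption: this is one common definition of a standardized effect size;
    negative values indicate that scores fell (symptoms improved) over the interval.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    if len(baseline) < 2:
        raise ValueError("need at least two patients")
    baseline_sd = stdev(baseline)
    if baseline_sd == 0:
        raise ValueError("baseline scores have zero variance")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / baseline_sd


# Invented PHQ-9 totals for three patients at baseline and at follow-up.
print(standardized_effect_size([10, 14, 18], [6, 10, 14]))
```

Under this definition, the reported value of -1.3 would mean that scores dropped by 1.3 baseline standard deviations on average.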
References
- Arroll, B., Goodyear-Smith, F., Crengle, S., Gunn, J., Kerse, N., Fishman, T. et al. (2010). Validation of PHQ-2 and PHQ-9 to screen for major depression in the primary care setting. Annals of Family Medicine, 8, 348-353.
- Kroenke, K., Spitzer, R.L. & Williams, J.B.W. (2001). Validity of a brief depression severity measure. Journal of General Internal Medicine, 16, 606-613.
- Martin, A., Rief, W., Klaiberg, A. & Braehler, E. (2005). Validity of the brief Patient Health Questionnaire Mood Scale (PHQ-9) in the general population. General Hospital Psychiatry, 28, 71-77.
- Williams, L.S., Brizendine, J., Plue, L., Bakas, T., Tu, W., Hendrie, H. et al. (2005). Performance of the PHQ-9 as a screening tool for depression after stroke. Journal of the American Heart Association, 36, 635-638.
See the measure
How to obtain the PHQ-9?
The PHQ-9 is available for free for educational and clinical purposes at:
http://www.phqscreeners.com/