Box and Block Test (BBT)

Evidence Reviewed as of before: 09-06-2011

Author(s)*: Sabrina Figueiredo, BSc

Editor(s): Lisa Zeltzer, MSc OT; Nicol Korner-Bitensky, PhD OT; Elissa Sitcoff, BA BSc

Purpose

The Box and Block Test (BBT) measures unilateral gross manual dexterity. It is a quick, simple and inexpensive test. It can be used with a wide range of populations, including clients with stroke.

In-Depth Review

Purpose of the measure

The Box and Block Test (BBT) measures unilateral gross manual dexterity. It is a quick, simple and inexpensive test. It can be used with a wide range of populations, including clients with stroke.

Available versions

The original version of the BBT was developed in 1957 by Jean Hyres and Patricia Buhler. This version was modified into the current one by E. Fuchs and P. Buhler (Cromwell, 1976). In 1985, normative data for the BBT were established by Mathiowetz, Volland, Kashman, and Weber.

Features of the measure

Items:

The BBT is composed of a wooden box divided into two compartments by a partition, and 150 blocks. Administration consists of asking the client to move, one by one, the maximum number of blocks from one compartment of the box to the other compartment of equal size, within 60 seconds. The box should be oriented lengthwise and placed at the client’s midline, with the compartment holding the blocks oriented towards the hand being tested. In order to practice and register baseline scores, the test should begin with the unaffected upper limb. Additionally, a 15-second trial period is permitted before testing each side. Before the trial, after the standardized instructions are given, clients should be advised that their fingertips must cross the partition when transferring the blocks, and that they do not need to pick up blocks that fall outside of the box (Mathiowetz, Volland, Kashman, & Weber, 1985-1).

Scoring:

Clients are scored based on the number of blocks transferred from one compartment to the other in 60 seconds (Mathiowetz et al., 1985-1). Higher scores are indicative of better manual dexterity. During the BBT, the evaluator should check that the client’s fingertips cross the partition on each transfer. Blocks should be counted only when this condition is met. Furthermore, if two blocks are transferred at once, only one block is counted. Blocks that fall outside the box after crossing the partition should be counted, even if they do not make it into the other compartment.
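The counting rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Transfer` record and the `score_bbt` function are hypothetical names introduced here, not part of the standardized BBT materials, and the sketch assumes the evaluator has already logged each reach made during the 60 seconds.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """One reach across the partition, as logged by the evaluator (hypothetical record)."""
    fingertips_crossed: bool      # did the fingertips cross the partition?
    blocks_carried: int = 1       # number of blocks moved in this single reach
    landed_in_box: bool = True    # False if the block fell outside the box


def score_bbt(transfers):
    """Count blocks following the scoring rules described in the text."""
    score = 0
    for t in transfers:
        if t.blocks_carried < 1:
            continue              # empty reach, nothing to count
        if not t.fingertips_crossed:
            continue              # fingertips must cross the partition
        # Two or more blocks carried at once count as a single block.
        # A block that falls outside the box after crossing still counts,
        # so landed_in_box does not change the score.
        score += 1
    return score


trial = [
    Transfer(fingertips_crossed=True),
    Transfer(fingertips_crossed=False),                      # not counted
    Transfer(fingertips_crossed=True, blocks_carried=2),     # counted once
    Transfer(fingertips_crossed=True, landed_in_box=False),  # still counted
]
print(score_bbt(trial))
```

The `landed_in_box` field is kept only to make explicit that it has no effect on the count.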

Mathiowetz et al. (1985-1) reported that healthy male adults, aged 20 to 80 years, transfer an average of 77 blocks (SD ±11.6) with the right hand and 75 blocks (SD ±11.4) with the left hand within the 60 second limit. Scores for normal healthy men, aged 60 years old or more ranged from 61 to 70 blocks. Healthy female adults, aged 20 to 80 years, transfer an average of 78 blocks (SD ±10.4) with the right hand and 76 blocks (SD ±9.5) with the left hand. Scores for normal healthy women, aged 60 years old or more, ranged from 63 to 76 blocks. The score on the BBT and age are inversely correlated, meaning that average scores on the BBT decrease with older age.
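As a rough illustration of how the pooled means and standard deviations above might be used, the sketch below expresses a client's score as a z-score relative to healthy adults of the same sex and hand. The `NORMS` table and the `bbt_z_score` function are hypothetical names introduced here. The values are the pooled 20 to 80 year figures quoted above, so this ignores the age effect; age-specific norms would be needed for a proper comparison.

```python
# Pooled norms quoted in the text: (mean blocks, SD) for adults aged 20 to 80.
NORMS = {
    ("male", "right"): (77.0, 11.6),
    ("male", "left"): (75.0, 11.4),
    ("female", "right"): (78.0, 10.4),
    ("female", "left"): (76.0, 9.5),
}


def bbt_z_score(score, sex, hand):
    """Return how many SDs a score lies above (+) or below (-) the pooled mean."""
    mean, sd = NORMS[(sex, hand)]
    return (score - mean) / sd


# Example: a man transferring 54 blocks with the right hand.
z = bbt_z_score(54, "male", "right")
print(f"z = {z:.2f}")
```

A negative z-score indicates fewer blocks than the pooled healthy mean; how far below the mean counts as clinically meaningful is not specified in the source.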

Time:

The BBT requires 2 to 5 minutes to administer (Finch, Brooks, Stratford, & Mayo, 2002; Mathiowetz et al., 1985-1).

Subscales:

None.

Equipment:

The standardized equipment consists of:
A wooden box measuring 53.7 cm x 25.4 cm x 8.5 cm. The partition should be placed at the middle of the box, dividing it into two containers of 25.4 cm each (Mathiowetz et al., 1985-1).
150 wooden cubes, 2.5 cm in size (Mathiowetz et al., 1985-1).
Stopwatch.

Training of administrator:

None typically reported.

Alternative forms of the Box and Block Test

None.

Client suitability

Can be used with:

Clients with stroke.

Should not be used in:

The BBT cannot be used with clients who have severe upper extremity impairment.
The BBT cannot be used with clients with severe cognitive impairment.

In what languages is the measure available?

There are no official translations of the BBT. The specific instructions provided to the client are in English. Clinicians and researchers may be using “home-grown” translations of the instructions, as evidenced by peer-reviewed publications from Sweden, French Canada, Italy and Germany that have used the BBT as an outcome measure (Broeren, Rydmark, Bjorkdahl, & Sunnerhagen, 2007; Dannenbaun, Michalsen, Desrosiers, & Levin, 2002; Mercier & Bourbonnais, 2004; Platz, Pinkowski, Kim, di Bella, & Johnson, 2005; Schneider, Schonle, Altenmuller, & Munte, 2007).

Summary

What does the tool measure?	Unilateral gross manual dexterity.
What types of clients can the tool be used for?	The BBT can be used with, but is not limited to, clients with stroke.
Is this a screening or assessment tool?	Assessment
Time to administer	From 2 to 5 minutes.
Versions	There are no alternative versions.
Other Languages	There are no official translations.
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the BBT.
Test-retest: Two studies have examined the test-retest reliability of the BBT. Both reported excellent test-retest reliability using ICCs.
Inter-rater: Two studies have examined the inter-rater reliability of the BBT and reported excellent inter-rater reliability using correlation coefficients and ICC. One study used Pearson correlation and the other used ICC and Spearman rho correlation.
Validity	Criterion:
Concurrent: One study has examined the concurrent validity of the BBT and reported adequate to excellent correlations with the Action Research Arm Test (ARAT) and the Nine-Hole Peg Test (NHPT) at pre- and post-treatment.
Predictive: One study has examined predictive validity and reported that the BBT, compared to the NHPT, the Frenchay Arm Test, Grip Strength and the Stroke Rehabilitation Assessment of Movement (STREAM), was the best predictor of upper limb function 5 weeks post-stroke.
Construct:
Convergent validity: Three studies have examined convergent validity of the BBT and reported excellent correlations between the BBT and the Minnesota Rate of Manipulation Test, the ARAT, the Hemispheric Stroke Scale and the motor function score of the Fugl-Meyer Assessment (FMA). Adequate correlations were reported between the BBT and the SMAF, the Ashworth scale and the Passive Joint Motion/Joint Pain subscore of the FMA. Poor correlations were reported between the BBT and the Sensation subscore of the FMA and the Modified Barthel Index.
Floor/Ceiling Effects	No studies have examined floor/ceiling effects of the BBT.
Sensitivity / Specificity	No studies have examined the sensitivity/specificity of the BBT.
Does the tool detect change in patients?	Two studies have examined the responsiveness of the BBT and reported that the BBT has a moderate to large Standardized Response Mean and is therefore able to detect change in clients with stroke.
Acceptability	The BBT should not be used with clients with severe upper extremity impairment or severe cognitive impairment.
Feasibility	The administration of the BBT is quick and simple; however, it requires standardized equipment.
How to obtain the tool?	The BBT instructions can be obtained from the study by Mathiowetz et al. (1985). Standardized equipment can be obtained at the website: http://www.sammonspreston.com/Supply/Product.asp?Leaf_Id=7531

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Box and Block Test (BBT) in healthy individuals and individuals with stroke. We identified four studies. The BBT appears to be responsive in clients with stroke.

Floor/Ceiling Effects

No studies have examined floor/ceiling effects of the BBT.

Reliability

Test-retest:
Desrosiers, Bravo, Hebert, Dutil, and Mercier (1994) examined test-retest reliability of the BBT in 34 elderly individuals with upper limb sensorimotor impairments from stroke (n=13) and other conditions. Participants were re-assessed after a 1-week interval by the same rater and under the same conditions. The test-retest reliability for the BBT was reported as excellent (ICC = 0.97; ICC = 0.96) for the right and left hand, respectively.

Platz, Pinkowski, van Wijck, Kim, di Bella, and Johnson (2005) estimated test-retest reliability of the BBT, the Action Research Arm Test (Lyle, 1981), and the Fugl-Meyer Assessment (FMA) upper extremity items, including items from the motor function, sensation and passive joint motion/joint pain sub-scores (Fugl-Meyer, Jääskö, Leyman, Olsson, & Steglind, 1975), in 23 participants with upper extremity paresis from stroke, multiple sclerosis, or traumatic brain injury. Each participant’s most affected arm was re-assessed after a 1-week interval by the same rater. The test-retest reliability of the BBT, as calculated using ICCs and Spearman rho correlation, was excellent (ICC = 0.96 and r = 0.97).
Note: This result applies only to the most affected upper limb.

Inter-rater:
Mathiowetz, Volland, Kashman, and Weber (1985-1) assessed the inter-rater reliability of the BBT in 26 healthy young females. Participants were evaluated simultaneously and independently by two raters. Pearson correlation coefficients showed excellent agreement (r = 1.00; r = 0.99) for the right and left hand, respectively.
Note: The Pearson correlation coefficient is not the statistical analysis of choice for assessing inter-rater reliability, as it may artificially inflate agreement.
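The point of the note can be illustrated with a small sketch. The scores below are invented for illustration and are not data from any study cited here. Rater B consistently counts 10 more blocks than rater A, so the two raters never agree, yet the Pearson coefficient is perfect. An intraclass correlation for absolute agreement is lower because it penalizes the systematic offset. The ICC form used here, a two-way random-effects, single-measure, absolute-agreement ICC often written ICC(2,1), is an assumption; the source does not state which ICC form the cited studies used.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_absolute_agreement(x, y):
    """ICC(2,1) for two raters: two-way random effects, absolute agreement."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]   # one mean per participant
    col_means = [sum(x) / n, sum(y) / n]              # one mean per rater
    ms_rows = k * sum((r - grand) ** 2 for r in row_means) / (n - 1)
    ms_cols = n * sum((c - grand) ** 2 for c in col_means) / (k - 1)
    ss_err = 0.0
    for a, b, r in zip(x, y, row_means):
        ss_err += (a - r - col_means[0] + grand) ** 2
        ss_err += (b - r - col_means[1] + grand) ** 2
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Invented scores: rater B always counts 10 more blocks than rater A.
rater_a = [40, 50, 60, 70, 80]
rater_b = [50, 60, 70, 80, 90]

r = pearson_r(rater_a, rater_b)
icc = icc_absolute_agreement(rater_a, rater_b)
print(f"Pearson r = {r:.2f}, ICC = {icc:.2f}")
```

In this constructed case the Pearson coefficient reports perfect association while the ICC is visibly lower, which is the kind of inflation the note refers to.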

Platz et al. (2005), as described earlier, also analyzed inter-rater reliability of the BBT, the Action Research Arm Test (Lyle, 1981), and the FMA upper extremity items, including items from the motor function, sensation and passive joint motion/joint pain sub-scores (Fugl-Meyer et al., 1975), in 44 individuals with upper limb paresis from stroke, multiple sclerosis, or traumatic brain injury. Each participant’s most affected arm was videotaped and scored independently by two raters. Inter-rater reliability for the BBT, as calculated using the ICC and Spearman rho correlation, was excellent (ICC = 0.99 and r = 0.99).
Note: This result applies only to the most affected upper limb.

Validity

Content:

Not available.

Criterion:

Concurrent:
No gold standard exists against which to compare the BBT.

Lin, Chuang, Wu, Hsieh and Chang (2010) compared the concurrent validity of the BBT, Action Research Arm Test (ARAT) and Nine-Hole Peg Test (NHPT) for evaluating hand dexterity in 59 patients with stroke. The Fugl-Meyer Assessment (FMA), Motor Activity Log (MAL) and Stroke Impact Scale (SIS) were also administered to assess the concurrent validity of the BBT, ARAT and NHPT. Using Spearman rank correlation coefficients, the BBT, ARAT and NHPT were found to have adequate to excellent correlations with one another at pre-treatment (ranging from rho=-0.55 to -0.80) and post-treatment (ranging from rho=-0.57 to -0.71). In addition, the BBT and ARAT were found to have adequate correlations with the FMA, MAL and SIS (ranging from rho=0.31 to 0.59); however, the NHPT had only poor to adequate correlations with the FMA and MAL (ranging from rho=-0.16 to -0.33) and adequate to excellent correlations with the SIS (ranging from rho=-0.58 to -0.66). When considering the results of both the responsiveness and validation components of the study, the BBT and ARAT are believed to be more appropriate than the NHPT for evaluating dexterity.

Predictive:
Higgins, Mayo, Desrosiers, Salbach and Ahmed (2005) estimated whether the BBT, Nine-Hole Peg Test (Kellor, Frost, Silberberg, Iversen, & Cummings, 1971; Mathiowetz, Weber, Kashman, & Volland, 1985-2), Frenchay Arm Test (Heller, Wade, Wood, Sunderland, Hewer, & Ward, 1987), Grip Strength (Mathiowetz, Kashman, Volland, Weber, Dowe, & Rogers, 1985-3), and Stroke Rehabilitation Assessment of Movement (STREAM – Daley, Mayo, Wood-Dauphine, Danys, & Cabot, 1997) were able to predict upper limb function, measured by the BBT, at 5 weeks post-stroke. Predictive validity of the BBT was measured in 55 participants with acute stroke. Assessments were performed at two points in time: one and five weeks post-stroke. Compared to the other upper limb performance tests, the BBT, when performed at one week post-stroke, was the best predictor of upper limb function at five weeks post-stroke, followed by the STREAM.

Construct:

Convergent/Discriminant:
Cromwell (1976) examined the convergent validity of the BBT by comparing it to the Minnesota Rate of Manipulation Test (American Guidance Service, 1969) in an unspecified population. The correlation between the BBT and the Minnesota Rate of Manipulation Test was excellent (r = 0.91).

Desrosiers et al. (1994) assessed the convergent validity of the BBT by comparing it to the Functional Autonomy Measurement System (FAMS), known as the SMAF in French (Hebert, Carries, & Bilodeau, 1988), and to the Action Research Arm Test (ARAT – Lyle, 1981) in 104 elderly individuals with upper limb impairments secondary to stroke (n=53), amongst other conditions. Excellent correlations (r = 0.80) were found between the BBT and the ARAT. Adequate Pearson correlations were found between the BBT and the FAMS (r = 0.47; r = 0.51) for the right and left hand, respectively.

Platz et al. (2005) tested the convergent validity of the BBT by comparing it to the Action Research Arm Test (ARAT – Lyle, 1981) and to the Fugl-Meyer Assessment (FMA) upper extremity items, including items from the motor function, sensation and passive joint motion/joint pain sub-scores (Fugl-Meyer et al., 1975), using Spearman correlations in 56 participants with upper extremity paresis either from stroke (n=37) or other conditions. Excellent correlations were found between the BBT and the ARAT (r = 0.95) and the Motor Function sub-score of the FMA (r = 0.92). Furthermore, the BBT was compared with more general measures of impairment and activity limitation, such as the Ashworth Scale (Ashworth, 1964), the Hemispheric Stroke Scale (Adams, Meador, Sethi, Grotta, & Thomson, 1986) and the Modified Barthel Index (Collin, Wade, Davies, & Horne, 1988). An excellent correlation was found between the BBT and the Hemispheric Stroke Scale (r = -0.67). Adequate correlations were found between the BBT and the passive joint motion/joint pain sub-score of the FMA (r = 0.43) and the Ashworth Scale (r = -0.38). Poor correlations were found between the BBT and the sensation sub-score of the FMA (r = 0.28) and the Modified Barthel Index (r = 0.04).
Note: Negative correlations are observed because a high score on the BBT indicates better performance, whereas a low score on the Hemispheric Stroke Scale or the Ashworth Scale indicates better performance.

Known groups:
No studies have examined known groups validity of the BBT.

Responsiveness

Higgins et al. (2005) evaluated the responsiveness of the BBT, the Frenchay Arm Test (Heller et al., 1987), grip strength (Mathiowetz et al., 1985-3) and the Stroke Rehabilitation Assessment of Movement (STREAM – Daley et al., 1997) in 50 participants with acute stroke. Participants were assessed one and four weeks post-stroke. The Standardized Response Mean (SRM), calculated by dividing the mean change by the standard deviation of the change scores, was used to quantify responsiveness. Amongst these upper extremity performance tests, the BBT was the most sensitive to change, with a large SRM of 0.8.
Note: The SRM is a variant of effect size, and higher values indicate better responsiveness.
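
As an illustration of the SRM calculation (mean change divided by the standard deviation of the change scores), the following is a minimal sketch in Python. The function name and the scores are hypothetical and are not taken from any of the studies reviewed here; the use of the sample standard deviation is also an assumption, since the studies do not state which form they used.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Return the SRM: mean of the change scores divided by their standard deviation.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must have the same length")
    # One change score per participant (follow-up minus baseline)
    changes = [after - before for before, after in zip(baseline, follow_up)]
    if len(changes) < 2:
        raise ValueError("at least two participants are needed")
    sd_change = stdev(changes)
    if sd_change == 0:
        raise ValueError("SRM is undefined when all change scores are identical")
    return mean(changes) / sd_change


# Hypothetical BBT scores (blocks moved in 60 seconds) for five clients.
# These values are invented for illustration only.
baseline = [20, 25, 30, 22, 28]
follow_up = [26, 29, 38, 25, 37]

srm = standardized_response_mean(baseline, follow_up)
print(f"SRM = {srm:.2f}")
```

Because the SRM divides by the variability of the change itself, a measure yields a high SRM when most clients change by a similar amount, which is why it is used to compare how well different tests detect change in the same sample.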

Lin, Chuang, Wu, Hsieh and Chang (2010) evaluated the responsiveness of the BBT, the Action Research Arm Test (ARAT) and the Nine-Hole Peg Test (NHPT) for evaluating hand dexterity in 59 patients with subacute stroke (< 6 months) and Brunnstrom stage IV to VI for proximal and distal upper extremity function. Patients were randomly assigned to receive constraint-induced therapy, bilateral arm training or control treatment, and received 2 hours of therapy, 5 days per week for 3 weeks. Assessments were performed at baseline and at 3 weeks. Using the Standardized Response Mean (SRM) to quantify responsiveness, the BBT, ARAT and NHPT were all found to have moderate SRMs (0.74, 0.64 and 0.79, respectively), indicating sensitivity to change in hand dexterity. Considering both the responsiveness and the validation components of the study, the authors judged the BBT and ARAT to be more appropriate than the NHPT for evaluating dexterity.

References

American Guidance Service. The Minnesota Rate Manipulative Tests. Examiner’s manual. Circle Pines, (MN): Author; 1969.
Adams, R.J., Meador, K.J., Sethi, K.D., Grotta, J.C., & Thomson, D.S. (1986). Graded neurologic scale for the use in acute hemispheric stroke treatment protocols. Stroke 18, 665-669.
Ashworth, B. (1964). Preliminary trial of carisoprodol in multiple sclerosis. Practitioner, 192, 540-542.
Broeren, J., Rydmark, M., Bjorkdahl, A., & Sunnerhagen, K.S. (2007). Assessment and training in a 3-dimensional virtual environment with haptics: a report on 5 cases of motor rehabilitation in the chronic stage after stroke. Neurorehabilitation & Neural Repair, 21(2), 180-189.
Collin, C., Wade, D.T., Davies, S., & Horne, V. (1988). The Barthel ADL Index: a reliability study. International Disability Studies, 10, 61-63.
Cromwell, F.S (1965). Occupational therapists manual for basic skills assessment: primary prevocational evaluation. Pasadena, (CA): Fair Oaks Printing; 29-31.
Daley, K., Mayo, N.E., Wood-Dauphinee, S., Danys, I., & Cabot, R. (1997). Verification of the Stroke Rehabilitation Assessment of Movement (STREAM). Physiotherapy Canada, 49, 269-278.
Dannenbaum, R.M., Michaelsen, S.M., Desrosiers, J., & Levin, M.F. (2002). Development and validation of two new sensory tests of the hand for patients with stroke. Clinical Rehabilitation, 16(6), 630-639.
Desrosiers, J., Bravo, G., Hébert, R., Dutil, É., & Mercier, L. (1994). Validation of the box and block test as a measure of dexterity of elderly people: reliability, validity and norms studies. Archives of Physical Medicine and Rehabilitation, 75, 751-755.
Desrosiers, J., Rochette, A., Hébert, R., & Bravo, G. (1997). The Minnesota manual dexterity test: reliability, validity and reference values studies with healthy elderly people. Canadian Journal of Occupational Therapy, 64(5), 270-276.
Finch, E., Brooks, D., Stratford, P.W., & Mayo, N.E. (2002). Physical Outcome Measures: A guide to enhance physical outcome measures. Ontario, Canada: Lippincott, Williams & Wilkins.
Fugl-Meyer, A.R., Jääskö, L., Leyman, I., Olsson, S., & Steglind, S. (1975). The post-stroke hemiplegic patient 1. A method for evaluation of physical performance. Scandinavian Journal of Rehabilitation Medicine, 7, 13-31.
Hébert, R., Carrier, R., & Bilodeau, A. (1988). The functional autonomy measurement system (SMAF): description and validation of an instrument for the measurement of handicaps. Age Ageing, 17, 293-302.
Heller, A., Wade, D.T., Wood, V.A., Sunderland, A., Hewer, R., & Ward, E. (1987). Arm function after stroke: measurement and recovery over the first three months. Journal of Neurology, Neurosurgery & Psychiatry, 50(6), 714-719.
Higgins, J., Mayo, N.E., Desrosiers, J., Salbach, N.M., & Ahmed, S. (2005). Upper-limb function and recovery in the acute phase poststroke. Journal of Rehabilitation Research & Development, 42(1), 65-76.
Jebsen, R.H., Taylor, N., Trieschmann, R.B., Trotter, M.J., & Howard, L.A. (1969). An objective and standardized test of hand function. Archives of Physical Medicine and Rehabilitation, 50, 311-319.
Kellor, M., Frost, J., Silberberg, N., Iversen, I., & Cummings R. (1971). Hand strength and dexterity. American Journal of Occupational Therapy, 25, 77-83.
Lin, K-C., Chuang, L-L., Wu, C-Y., Hsieh, Y-W., & Chang, W-Y. (2010). Responsiveness and validity of three dexterous function measures in stroke rehabilitation. Journal of Rehabilitation Research and Development, 47(6), 563-572.
Lyle, R.C. (1981). A performance test for assessment of upper limb function in physical rehabilitation treatment and research. International Journal of Rehabilitation and Research, 4, 483-492.
Mathiowetz, V., Volland, G., Kashman, N., & Weber, K. (1985-1). Adult norms for the box and block test of manual dexterity. American Journal of Occupational Therapy, 39, 386-391.
Mathiowetz, V., Weber, K., Kashman, N., & Volland, G. (1985-2). Adult norms for the nine hole peg test of finger dexterity. Occupational Therapy Journal of Research, 5, 24-33.
Mathiowetz, V., Kashman, N., Volland, G., Weber, K., Dowe, M., & Rogers, S. (1985-3). Grip and pinch strength: normative data for adults. Archives of Physical Medicine and Rehabilitation, 66, 69-72.
Mercier, C. & Bourbonnais, D. (2004). Relative shoulder flexor and handgrip strength is related to upper limb function after stroke. Clinical Rehabilitation, 18(2), 215-221.
Platz, T., Pinkowski, C., van Wijck, F., Kim, I.H., di Bella, P., & Johnson, G. (2005). Reliability and validity of arm function assessment with standardized guidelines for the Fugl-Meyer Test, Action Research Arm Test and Box and Block Test: a multicentre study. Clinical Rehabilitation, 19(4), 404-411.
Schneider, S., Schonle, P.W., Altenmuller, E., & Munte, T.F. (2007). Using musical instruments to improve motor skill recovery following a stroke. Journal of Neurology, 254(10), 1339-1346.
Tiffin, J. (1968). Purdue Pegboard Examiner Manual. Chicago, USA: Science Research Associates.

See the measure

How to obtain the BBT

The BBT instructions can be obtained in the study by Mathiowetz et al. (1985-1).

Standardized equipment can be obtained at the website:
http://www.sammonspreston.com/Supply/Product.asp?Leaf_Id=7531

By clicking here, you can access a video showing how to administer the assessment.