Disparate Impact Reference Trilogy for Statistics
By Richard E. Biddle
(Labor Law Journal, November 1995)
© Copyright 1995 Richard E. Biddle
The author is with Biddle and Associates, Inc. of Sacramento, California, an Equal Employment Opportunity consulting and training firm.
In the Equal Employment Opportunity (EEO) field, "disparate impact" refers to one possible means of proving employment discrimination against an individual or a class of individuals. In a disparate impact case, the burden of proof is passed back and forth like a baton, with each party getting a turn or two to lead the course. Setting the exchange in motion, the plaintiff charges that facially neutral practices, procedures or tests are discriminatory in effect, regardless of whether or not discrimination is intentional. The complaining party must present a prima facie statistical case of disparate impact, showing a group protected by law to be disproportionately impacted. This constitutes the initial burden.
Assuming the complaining party succeeds in this regard, the burden shifts to the defendant, who must prove job-relatedness or business necessity in order to justify the practice, procedure or test in question. If the defense carries its burden successfully, the baton passes back, and the plaintiff must again satisfy a burden of proof. At this point, the plaintiff's aim is to demonstrate that an alternate selection device, or an alternate use of a preexisting selection device, could serve the employer's legitimate business purposes with lesser adverse impact.
Disparate impact cases follow a deliberately structured and well-worn course of burden-sharing and turn-taking, but no single, comprehensive rule book or reference manual for charting the course exists. Instead, EEO practitioners rely on a trilogy of primary reference sources in order to evaluate a situation in the employment field for disparate impact. The trilogy consists of: (1) the Uniform Guidelines on Employee Selection Procedures, 43 Federal Register, no. 166, 38290-38315 (1978), followed by the Questions and Answers to Clarify and Provide a Common Interpretation of the Uniform Guidelines on Employee Selection Procedures, 44 Federal Register, 11996-12009 (1979); (2) the U.S. Supreme Court decisions in Wards Cove Packing Co. v. Atonio, 109 S. Ct. 2115 (1989), Hazelwood School District v. United States, 433 U.S. 299, 97 S. Ct. 2736 (1977), and Connecticut v. Teal, 457 U.S. 440, 102 S. Ct. 2525 (1982); and (3) the 1991 Civil Rights Act. Each of these reference sources describes specific requirements for determining disparate impact. Unfortunately, the requirements are not the same.
This paper will identify the statistical requirements from each reference source, describe the consistencies between the sources on the statistical requirements, and delineate the differences.
What the Guidelines Require
The Uniform Guidelines require evidence of both statistical and practical significance in order to identify adverse impact--the initial burden in a disparate impact case. Section 4D introduces a rule of thumb measurement for adverse impact known as the 80 Percent Rule: "A selection rate for any race, sex or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded...as evidence of adverse impact...."
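As a minimal sketch (the function names are illustrative, not drawn from the Guidelines), the four-fifths computation divides each group's selection rate by the highest group's rate and flags any ratio under .80. The figures come from the hiring example given later in this article: 12 of 60 white applicants and 3 of 20 black applicants hired. A flag here is only the rule-of-thumb signal; the exceptions the Guidelines attach to it still apply.

```python
def impact_ratios(selected: dict, applicants: dict) -> dict:
    """Each group's selection rate divided by the highest group's rate."""
    rates = {group: selected[group] / applicants[group] for group in applicants}
    top = max(rates.values())
    if top == 0:
        raise ValueError("no group has any selections; ratios are undefined")
    return {group: rate / top for group, rate in rates.items()}


def below_four_fifths(selected: dict, applicants: dict, threshold: float = 0.8) -> list:
    """Groups whose impact ratio is under four-fifths (0.8) of the top rate."""
    ratios = impact_ratios(selected, applicants)
    return [group for group, ratio in ratios.items() if ratio < threshold]


applicants = {"white": 60, "black": 20}
selected = {"white": 12, "black": 3}
print(impact_ratios(selected, applicants))      # black ratio is about 0.75
print(below_four_fifths(selected, applicants))  # ['black']
```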
However, the Guidelines immediately describe circumstances in which the so-called 80 Percent Rule of Thumb is inadequate: "Smaller differences in selection rate [i.e., differences within the 80 percent limit, such as .81 or .95, etc.] may nevertheless constitute adverse impact, where they are significant in both statistical and practical terms or where a user's actions have discouraged applicants disproportionately on grounds of race, sex, or ethnic group. Greater differences in selection rate [i.e., differences outside the 80 percent limit, such as .79 or .60, etc.] may not constitute adverse impact where the differences are based on small numbers and are not statistically significant, or where special recruiting or other programs cause the pool of candidates to be atypical of the normal pool of applicants from that group."
In the above passage, the Guidelines describe situations where: (1) there is no violation according to the 80 Percent Rule of Thumb, yet adverse impact may still exist; and (2) there is a definite violation according to the 80 Percent Rule of Thumb, yet adverse impact may not exist. These potential exceptions arise largely from the weight that statistical and practical significance carry in the final determination. Note, however, that discouraged applicants and an atypical applicant pool are conditions that need to be evaluated as well.
In order to find statistical and practical significance, the Guidelines require a method of statistical analysis called a rate comparison, which calls for the hypergeometric statistical approach. This method compares, for example, the selection rate of one group to the selection rate of another group (e.g., the rate of women hired versus the rate of men hired, or the rate of blacks receiving promotions versus the rate of whites receiving promotions, etc.). The groups called for in the Guidelines for rate comparison analyses are specified in Section 4B as: men, women, blacks, American Indians, Asians, Hispanics and whites.
Finally, the Guidelines require an initial analysis of the overall selection process, rather than an individual evaluation of each distinct practice, procedure or test. Section 4C of the Guidelines maintains that if "the total selection process does not have an adverse impact, the Federal enforcement agencies...will not expect a user to evaluate the individual components for adverse impact, or to validate such individual components...."
However, the Guidelines expect individual components of a selection process to be scrutinized for adverse impact under the following conditions: "(1) where the selection procedure is a significant factor in the continuation of patterns of assignments of incumbent employees caused by prior discriminatory employment practices, (2) where the weight of court decisions or administrative interpretations hold that a specific procedure (such as height or weight requirements or no-arrest records) is not job related in the same or similar circumstances. In unusual circumstances, other than those listed in (1) and (2) above, the Federal enforcement agencies may request a user to evaluate the individual components for adverse impact and may, where appropriate, take enforcement action with respect to the individual component."
Absent these conditions, the analysis is to be made of the overall selection process. According to the Guidelines, if this overall analysis shows no adverse impact, a bottom-line defense of no adverse impact should hold water in this portion of a disparate impact case. However, the Guidelines do not provide the last word.
In 1982, in Connecticut v. Teal, the United States Supreme Court removed the bottom-line balance in hiring or promotion activities as a defense under Title VII. The Court allowed plaintiffs to isolate a specific testing procedure, and thereby focus the adverse impact analysis, even though overall adverse impact was not shown.
Where the Guidelines are Silent
When the Uniform Guidelines were published in 1978, PCs were a twinkle in the eye of engineers and scientists. The authors who developed and specified the statistical requirements for the Guidelines performed their statistical work with hand-held calculators or with minicomputers and mainframe computers. While such devices could easily calculate standard deviations, practitioners relied on tables to translate these into corresponding probabilities. Direct calculation of probabilities with hand-held calculators is arduous, and imposing such a statistical requirement nearly twenty years ago would have been unreasonable. Today, EEO practitioners contend with a variety of available statistical approaches and lack comprehensive, up-to-date guidance from a primary reference source to help make sense of them.
Conspicuously absent from the Uniform Guidelines is any specification of whether standard deviations or probabilities should be used in determining statistical significance, and of which formula should be used. For example, should the chi-square be used to estimate the probability, or should the probability be calculated directly? If the chi-square is used, should a statistical correction be applied, since the uncorrected chi-square tends to find statistical significance prematurely? If a correction is applied, which one should be used, Yates' or Cochran's? And which of Cochran's corrections, if that is chosen?
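To make the choice concrete, here is a minimal sketch of the 2x2 chi-square with and without Yates' continuity correction (Cochran's corrections are not shown). The function name is hypothetical, and the example reuses the hiring figures from later in this article (12 of 60 versus 3 of 20). Because the corrected statistic is never larger than the uncorrected one, it reaches significance less readily.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are selected / not selected.
    With yates=True, Yates' continuity correction is applied.
    Returns (chi_square, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty row or column: no test is possible
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / denom
    # The upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


for use_yates in (False, True):
    chi2, p = chi_square_2x2(12, 48, 3, 17, yates=use_yates)
    print(f"yates={use_yates}: chi2={chi2:.3f}, p={p:.3f}")
```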
What level of statistical significance should be used, the .05 level or the .01 level? Should a one-tailed or a two-tailed hypothesis be used (i.e., should we be concerned only when a protected group has a statistically significantly lower rate than the comparison group, or whenever there is a statistically significant difference)? At the .05 level, a one-tailed test allows a conclusion of statistical significance when 1.645 standard deviations are calculated, whereas a two-tailed test requires 1.96 standard deviations.
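The cut-offs quoted above follow directly from the standard normal distribution. This short sketch uses Python's standard-library `statistics.NormalDist`; the helper names are hypothetical. It also converts standard deviations back to two-tailed probabilities: 2 standard deviations correspond to roughly .046, and 3 to roughly .003.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_sd(alpha: float = 0.05, two_tailed: bool = True) -> float:
    """Standard deviations needed to reach significance at level alpha."""
    tail = alpha / 2 if two_tailed else alpha
    return STANDARD_NORMAL.inv_cdf(1 - tail)


def two_tailed_p(sd: float) -> float:
    """Two-tailed probability of a difference of at least `sd` standard deviations."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(sd)))


print(round(critical_sd(0.05, two_tailed=False), 3))  # 1.645
print(round(critical_sd(0.05, two_tailed=True), 2))   # 1.96
print(round(critical_sd(0.01, two_tailed=True), 2))   # 2.58
for sd in (2.0, 3.0):
    print(sd, round(two_tailed_p(sd), 4))
```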
What practical significance test was contemplated by the authors of the Uniform Guidelines during the late 1970's? (The subject of practical significance is discussed in detail below.)
One year before the publication of the Uniform Guidelines, the U.S. Supreme Court decided two cases, Castaneda v. Partida and Hazelwood, that specifically addressed the topics of standard deviations versus probabilities, one-tailed or two-tailed hypotheses, and measurements of statistical significance. The Court concluded that, as a general rule, if the difference between the expected value and the observed number is greater than 2 to 3 standard deviations, the difference is greater than can be attributed to chance. At that point, statistical significance is concluded.
Practical significance is concerned with the effect small number changes have on statistical conclusions, and with an extensive, contextual and practical assessment of the utility of those conclusions. Practical significance offers EEO practitioners a means to ask: What happens when the results are just barely significant? What is the impact of adding a few people in a hypothetical way?
Practical significance is a presently evolving concept; statistical courses in universities often do not address it. The earliest reference to practical significance in the EEO field probably occurred in the mid-to-late 1960's when fair employment agencies, like the U.S. Department of Labor, the Equal Employment Opportunity Commission, as well as state agencies, first began drafting their regulatory guidelines. Upon publication, the Uniform Guidelines formally incorporated the concept of practical significance into the language of the disparate impact section (4D), but did not specify the acceptable test to conduct in order to determine its parameters. Published a year later, the Questions and Answers companion to the Guidelines mentions practical significance as part of its illustration that the 80 Percent Rule "is not intended to be controlling in all circumstances." One year after the publication of Questions and Answers, the concept of practical significance was referenced in Baldus and Cole's Statistical Proof of Discrimination, McGraw-Hill, 1980, but again without much guidance for specific test methods.
Since the late 1970's, practical significance has received wider recognition in the social sciences and court cases in general, with researchers addressing the issue more regularly. A computerized search for the term "practical significance" in professional literature revealed 137 articles concerned with the topic. Nearly all of these references post-date the publication of the Guidelines.
Practical Significance Tests
In the EEO field, there are at least four different procedures for evaluating practical significance. Three of these evaluate the effect that small number changes have on statistical inferences, while the fourth takes a more pragmatic look at the size of rate differences between groups. To be considered "disproportionately different," discrepancies must be both statistically and practically significant.
One test of practical significance moves two people in the plaintiff group from an unfavorable status (e.g., not passing a test) to a favorable status (e.g., passing a test). When hypothetically moving one or two people in this way changes a previous finding of statistical significance to one that is not statistically significant, the results are of no practical significance.
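Here is a minimal sketch of this first test. It assumes the two-tailed Fisher exact test at the .05 level, which is the procedure recommended in the summary of the Guidelines requirements below, as the significance test. The function names and the data are hypothetical. In the example, the hypothetical result stays significant after two people are moved but not after three.

```python
from math import comb


def fisher_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher exact probability for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def practically_significant(a: int, b: int, c: int, d: int,
                            alpha: float = 0.05, max_shift: int = 2) -> bool:
    """Row 1 is the comparison group, row 2 the plaintiff group; columns are
    favorable / unfavorable. The result counts as practically significant only
    if it is statistically significant and stays so after moving 1 .. max_shift
    plaintiff-group members from unfavorable to favorable status."""
    if fisher_two_tailed(a, b, c, d) > alpha:
        return False  # not statistically significant to begin with
    for k in range(1, min(max_shift, d) + 1):
        if fisher_two_tailed(a, b, c + k, d - k) > alpha:
            return False  # a small hypothetical change erases significance
    return True


# Hypothetical: 10 of 10 comparison-group members pass, 3 of 10 plaintiff-group members pass.
print(practically_significant(10, 0, 3, 7))               # True
print(practically_significant(10, 0, 3, 7, max_shift=3))  # False
```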
A second practical significance test evaluates the effect of small number changes on the 80 Percent Rule of Thumb conclusion. This conclusion rests on two selection rates, each calculated as the number of people in a group who receive favorable status divided by the total number of people in that group. The lower rate is then divided by the higher. For example, assume that 60 whites apply and 12 are hired, so the selection rate for this group is 20 percent. Out of 20 black applicants, 3 are hired, for a selection rate of 15 percent. Dividing the lower rate by the higher (.15 divided by .20) gives an impact ratio of .75, or 75 percent. According to the 80 Percent Rule, a result of less than .80 constitutes a violation. However, if adding no more than three people to those of favorable status in the plaintiff group raises the ratio to .80 or more, the data set is judged to be too small to have any practical significance. That is the case in this example: a single additional hire raises the black selection rate to 4/20, or 20 percent, for a ratio of 1.0.
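Here is a sketch of the second test. It assumes that "adding" a person means moving one plaintiff-group member from unfavorable to favorable status, with the group total unchanged. The function name is hypothetical. The code returns the smallest number of such moves, up to three, that lifts the plaintiff group's rate to four-fifths of the comparison group's rate.

```python
def shift_to_four_fifths(comp_selected: int, comp_total: int,
                         plaintiff_selected: int, plaintiff_total: int,
                         max_added: int = 3, threshold: float = 0.8):
    """Smallest number (1..max_added) of plaintiff-group members whose hypothetical
    move to favorable status lifts the plaintiff rate to at least `threshold`
    times the comparison rate; None if no such number exists."""
    comp_rate = comp_selected / comp_total
    room = plaintiff_total - plaintiff_selected  # people who could be moved
    for k in range(1, min(max_added, room) + 1):
        if (plaintiff_selected + k) / plaintiff_total >= threshold * comp_rate:
            return k
    return None


# Example from the text: whites 12 of 60, blacks 3 of 20.
print(shift_to_four_fifths(12, 60, 3, 20))  # 1
```

Comparing the plaintiff rate with `threshold * comp_rate` avoids dividing by a rate and still gives the right answer if the plaintiff group overtakes the comparison group.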
Another practical significance test adds up to four people to the plaintiff group in a favorable way to see how close the selection rates come to each other. When the addition of up to four people to the plaintiff group in a favorable way brings the selection rates of the two groups "close" to one another, the sample is too small to merit any practical significance. Being "close" in this situation means selection rates are within 2.1 percent of each other, following the addition of four people to the plaintiff group's total of those in a favorable standing.
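The third test can be sketched the same way. The text does not say whether "within 2.1 percent" means percentage points or a relative difference. This sketch assumes percentage points (an absolute gap of .021), and the function name and figures are hypothetical. A `True` result means the sample is too small to carry practical significance under this test.

```python
def gap_closes(comp_selected: int, comp_total: int,
               plaintiff_selected: int, plaintiff_total: int,
               max_added: int = 4, tolerance: float = 0.021) -> bool:
    """True if moving up to `max_added` plaintiff-group members to favorable
    status brings the two selection rates within `tolerance` of each other
    (assumed here to mean 2.1 percentage points)."""
    comp_rate = comp_selected / comp_total
    room = plaintiff_total - plaintiff_selected
    for k in range(1, min(max_added, room) + 1):
        if abs(comp_rate - (plaintiff_selected + k) / plaintiff_total) <= tolerance:
            return True
    return False


# Hypothetical: comparison group 250/1000 (25.0%), plaintiff group 40/200 (20.0%).
print(gap_closes(250, 1000, 40, 200))  # False: four more hires give 22.0%, still 3 points apart
```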
The fourth practical significance test takes a look at the difference in rates between two groups without altering the plaintiff group. This test recognizes that in some samples slight differences in rates between groups can be statistically significant, while the differences might be insubstantial from a practical significance point of view. Differences in group rates between approximately 4.5 percent and 7.1 percent are too small to be considered practically significant, in some situations. Each of the four practical significance tests can be applied to rate comparisons (i.e., hiring rate, promotion rate, layoff rate, retention rate, etc.). The first test in those listed, which adds two people from an unfavorable status (e.g., not getting hired) in the disadvantaged group, to a favorable status (e.g., getting hired), can be applied to a pools analysis as well as to a rate analysis, using the binomial statistic. (More on pools comparisons and the binomial to follow.)
Summary of Guidelines Requirements
According to the Uniform Guidelines, if an overall selection process shows adverse impact, then the separate practices, procedures or tests that cause the adverse impact are to be explored for adverse impact. If the overall adverse impact analysis shows no adverse impact, the Guidelines generally impose no further requirement. However, the U.S. Supreme Court has allowed plaintiffs to focus on a practice, procedure or test for adverse impact analysis independent of the overall finding of no adverse impact. Therefore, under the Uniform Guidelines, the safest approach is to analyze the impact of the entire selection process and each of the practices, procedures or tests, as well.
The Uniform Guidelines are silent on many issues that are critical in calculating adverse impact. With fast 586-type PCs now readily available, the most accurate way of calculating statistical significance for rate differences is the Fisher Exact Probability Test. The test is applied as a two-tailed test and concludes statistical significance when the probability is .05 or less. Next, the probability is inversely transformed into a standard deviation, the key measurement for the EEO field. Practical significance testing is appropriately applied after statistical significance has been found and before adverse impact is inferred.
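Here is a minimal sketch of that recommendation in plain Python. The names are hypothetical, and in practice a statistics package would normally be used. The "inverse transform" is assumed to mean the normal deviate whose two-tailed probability equals the Fisher probability, so p = .05 maps to about 1.96 standard deviations.

```python
from math import comb
from statistics import NormalDist


def fisher_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher exact probability for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def probability_to_sd(p: float) -> float:
    """Express a two-tailed probability as an equivalent number of standard deviations."""
    p = min(max(p, 1e-300), 1.0)  # keep p / 2 inside inv_cdf's open interval (0, 1)
    return abs(NormalDist().inv_cdf(p / 2))


def rate_comparison(sel1: int, total1: int, sel2: int, total2: int,
                    alpha: float = 0.05) -> dict:
    """Compare two groups' selection rates with the two-tailed Fisher exact test."""
    p = fisher_two_tailed(sel1, total1 - sel1, sel2, total2 - sel2)
    return {"p": p, "sd": probability_to_sd(p), "significant": p <= alpha}


print(rate_comparison(12, 60, 3, 20))  # ... 'significant': False}
```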
U.S. Supreme Court Decisions
In 1989, after the U.S. Supreme Court decided Wards Cove, the burden-sharing approach to disparate impact cases described in the introduction to this paper was redefined. The Supreme Court called for the plaintiffs to maintain the burdens throughout the case. The 1990 Congress did not secure the necessary votes to reinstate burden-sharing as it had existed prior to Wards Cove, but the 1991 Congress did. With the 1991 Civil Rights Act, burdens between plaintiff and defendant moved back to the way they were understood on the day prior to the Wards Cove decision.
The 1991 Civil Rights Act revised Wards Cove in regard to burdens and specifically defined "disparate impact" in a section of the law. What the Act did not address was the procedure for proving disparate impact described in Wards Cove, which comprises three distinct tests: the "threshold" analysis (also called the initial inquiry), the "barriers" analysis and the "selection" analysis.
Two kinds of statistical distributions are used to measure the differences between observed data and data expected by chance in the above analyses. The hypergeometric distribution has already been discussed in relation to the Uniform Guidelines and the "selection" test. A second distribution, known as the binomial, applies to all three of the above tests described in Wards Cove.
While the hypergeometric statistic compares the rate of one group to the rate of another group, the binomial statistic compares one group in a selected pool to the same group in the population pool (e.g., Hispanics selected versus Hispanics available). In doing so, the binomial considers "availability" as one of the pools. Availability data can be external census occupation data gathered for the relevant labor market or it can be an internal pool of employees eligible for selection, under some circumstances.
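The pool comparison can be sketched as follows. The standard-deviation formula, (observed - expected) / sqrt(n * p * (1 - p)), is the binomial measure the Supreme Court applied in Castaneda and Hazelwood. The exact lower-tail probability is added for comparison and computed in log space so large pools do not overflow. The function names and the example figures are hypothetical.

```python
from math import exp, lgamma, log, sqrt


def binomial_sd(observed: int, n: int, availability: float) -> float:
    """Standard deviations between the observed and expected number of group
    members, using the binomial standard deviation sqrt(n * p * (1 - p))."""
    expected = n * availability
    return (observed - expected) / sqrt(n * availability * (1 - availability))


def binomial_lower_tail(observed: int, n: int, availability: float) -> float:
    """Exact probability of `observed` or fewer group members among n selections
    drawn from a pool with the given availability (0 < availability < 1)."""
    log_p, log_q = log(availability), log(1 - availability)
    total = 0.0
    for k in range(observed + 1):
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += exp(log_term)
    return min(total, 1.0)


# Hypothetical: 10 Hispanics among 100 selections, 20 percent Hispanic availability.
print(binomial_sd(10, 100, 0.20))  # -2.5
print(binomial_lower_tail(10, 100, 0.20))
```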
Analysis of Disparate Impact from Wards Cove
Wards Cove calls for an initial inquiry or threshold test within the disparate impact analysis: "It is such a comparison--between the racial composition of the qualified persons in the labor market and the persons holding at-issue jobs--that generally forms the proper basis for the initial inquiry in a disparate impact case."
The threshold analysis conducted with Affirmative Action Plans is called a "utilization analysis." It is a comparison between the percentages of various groups in the relevant labor market and those on the job or jobs, and requires the binomial statistic.
Wards Cove calls for a "barriers" analysis to be conducted prior to the "selection" analysis to determine whether a "proxy" group should be used as the comparison group, or whether the actual applicants can be used. Actual applicants are preferable, "as long as there are no barriers or practices deterring qualified nonwhites from applying for...positions."
A "barriers" analysis tests whether the makeup of the applicant pool can be attributed to chance by comparing applicant pool percentages with those available in the relevant labor market pool. This comparison relies on the binomial statistic.
If a barrier is found to exist that disproportionately restricts a protected group of applicants--outside of the realm of chance--then a "proxy" group must be used as a comparison group in the subsequent analysis (i.e., the "selection" analysis). If no barrier is found, the actual applicants can be used in the "selection" analysis. A showing that a barrier exists is adequate evidence, in itself, for a disparate impact finding.
Wards Cove describes the "selection" analysis as follows: "if the percentage of selected applicants who are nonwhite is not significantly less than the percentage of qualified applicants who are nonwhite, the employer's selection mechanism probably does not operate with a disparate impact on minorities."
In order to compare the percentage of selected applicants of a particular group to the percentage of qualified applicants of that group, one must apply the binomial statistic.
The binomial statistic was used by the Supreme Court as early as 1977, one year before the Guidelines were published, in Hazelwood School District v. United States. The Supreme Court accepted the analysis of comparing the percentage of protected group members in the selected pool of teachers to the percentage of protected group members in the qualified labor market. In this case, the protected group was "blacks," and the relevant labor market consisted of elementary and secondary school teachers in the occupational census count for St. Louis. Hazelwood used an overall selection analysis that parallels the Wards Cove selection analysis, applying it to one protected group and a variety of teaching positions.
Summary of Supreme Court Requirements
The U.S. Supreme Court has allowed disparate impact analyses to be performed on individual practices, even when the overall process that contains the individual practice shows no disparate impact (Connecticut v. Teal). In Hazelwood, the Court accepted an analysis of the overall process.
The initial inquiry or "threshold" analysis called for by Wards Cove compares the utilization of a protected group on the job to the availability for that protected group in the relevant labor market. This indicates whether the protected group has a lower percentage on the job than is expected by chance. The next analysis from Wards Cove, the "selection" analysis, requires a pre-analysis to determine the base comparison pool to use. The "barriers" analysis compares the applicant pool to the percentages in the relevant labor market. If barriers have had the impact of deterring qualified, protected group members, then a "proxy" group must be used in the "selection" analysis. If barriers are not found to exist, then the actual applicants can be used in the "selection" analysis.
The "selection" analysis compares the percentage of the selected pool who are in a group to the percentage of that group in the applicant (or proxy) pool. It is an overall selection process analysis.
All three Wards Cove disparate impact analyses use the binomial statistic. Two earlier U.S. Supreme Court decisions (i.e., Castaneda and Hazelwood) also used the binomial statistic.
An important part of the Wards Cove analyses is the analysis of the group called "nonwhite." Title VII defines protected groups by race, color, religion, sex, and national origin, and the Uniform Guidelines do the same. However, the Uniform Guidelines define the racial and ethnic groups as whites, blacks, Hispanics, Asians, and American Indians. Aggregating all minorities into a single "nonwhite" analysis group increases the size of the sample that can be used in the analysis.
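A short, invented example shows why aggregation matters. In both cases below, the group is selected at half its availability. When one group is analyzed alone, the shortfall falls short of 2 standard deviations. When the groups are combined into a larger "nonwhite" group, the same proportional shortfall exceeds 3 standard deviations. All figures are hypothetical.

```python
from math import sqrt


def binomial_sd(observed: int, n: int, availability: float) -> float:
    """(observed - expected) / sqrt(n * p * (1 - p)), as in the pool comparison."""
    expected = n * availability
    return (observed - expected) / sqrt(n * availability * (1 - availability))


# Hypothetical: 200 selections; each group selected at half its availability.
single_group = binomial_sd(5, 200, 0.05)  # one minority group, 5% available
aggregated = binomial_sd(20, 200, 0.20)   # combined "nonwhite" group, 20% available
print(round(single_group, 2), round(aggregated, 2))  # -1.62 -3.54
```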
1991 Civil Rights Act Statistical Requirements
The 1991 Civil Rights Act requires the focus of disparate impact analysis to rest not on an overall analysis of the selection process, but rather on specific practices, procedures or tests, when this is feasible. Section 105 reads:
"An unlawful employment practice based on disparate impact is established under this title only if...a complaining party demonstrates that...a particular employment practice...causes a disparate impact on the basis of race, color, religion, sex, or national origin.
"The complaining party shall demonstrate that each particular challenged employment practice causes a disparate impact, except that if the complaining party can demonstrate to the court that the elements of a respondent's decision making process are not capable of separation for analysis, the decision making process may be analyzed as one employment process."
The requirements of the 1991 Civil Rights Act are not lengthy. The focus is on the individual practices, procedures or tests, when feasible. Only when the individual components are not capable of separation for analysis is the overall selection process analyzed as one.
Consistencies and Differences Within the Trilogy
The Uniform Guidelines specify individual groups to compare for analysis, identifying whites, blacks, Hispanics, Asians and American Indians, with definitions of each. Wards Cove permits combining minority groups into a group called "nonwhite." The 1991 Civil Rights Act identifies groups by race, color, religion, sex, and national origin. Apparently, allowing protected groups to be aggregated for analysis purposes does not constitute an inconsistency among the trilogy sources.
All three references in the trilogy are silent on the issues of one-tailed or two-tailed hypotheses, standard deviations versus probabilities, and the specific level of statistical significance necessary to substantiate a relevant conclusion. However, two 1977 U.S. Supreme Court decisions (i.e., Castaneda and Hazelwood) established a precedent for these requirements by concluding that, as a general rule, if the difference between an expected value and the observed number is greater than 2 to 3 standard deviations, the difference is greater than that which can be attributed to chance. (The minimum level of 2 standard deviations is very close to 1.96 standard deviations--a two-tailed hypothesis at the .05 level of significance.) Applying this range to requirements outlined in the trilogy presents no apparent inconsistency.
Although practical significance is a requirement of the Uniform Guidelines, neither Wards Cove nor the 1991 Civil Rights Act makes reference to the concept. Nonetheless, numerous court decisions have imposed and continue to impose the requirement of practical significance. (See the practical significance cases cited above.) Therefore, no inconsistency occurs when practical significance requirements accompany statistical significance requirements in order to infer disparate impact.
The Uniform Guidelines require a comparison of rate differences, necessitating the hypergeometric statistical procedure. Wards Cove and two prior Supreme Court decisions (i.e., Castaneda and Hazelwood) call for pool comparisons, necessitating the binomial procedure. The 1991 Civil Rights Act is silent on the appropriate statistical procedure. The inconsistency in this case is not only due to the various statistical requirements called for in the trilogy, but the difference between the statistical approaches, themselves: the hypergeometric statistic will find evidence of statistical significance more readily than the binomial. In some situations, such as labor market comparisons, the hypergeometric is inappropriate. In situations where both the hypergeometric and the binomial can be used, EEO practitioners are wise to apply both.
Although the 1991 Civil Rights Act is silent on the "barriers" analysis requirement, its silence does not appear to be inconsistent with the "barriers" analysis required by both the Uniform Guidelines and Wards Cove. A "barriers" analysis is needed to establish the proper comparison pool (actual applicants or a proxy group) for the subsequent "selection" analysis.
The Uniform Guidelines focus first on the overall impact of a selection process. If an overall adverse impact is found, then the Guidelines require an analysis of the component parts of the selection procedure to find out the source of the impact. Wards Cove focuses first on a method of initial inquiry, called a "threshold" test or utilization analysis. The Guidelines also focus on an initial inquiry, the 80 Percent Rule of Thumb. Neither of these initial inquiries stands as dispositive evidence by itself.
Disparate impact can exist whether or not the 80 Percent Rule of Thumb is violated, and whether or not the "threshold" test shows underutilization. The 1991 Civil Rights Act focuses on the individual practices, procedures or tests of a selection process. Only when the individual practices, procedures or tests are not capable of separate analysis does the focus shift to the overall selection procedure. This does, in fact, appear to present an inconsistency. In practice, most selection procedures combine several practices, procedures or, sometimes, tests in some way during the decision-making process. Additionally, since the 1982 case Connecticut v. Teal, the plaintiff has been permitted to evaluate the individual parts of a selection procedure, regardless of the overall selection process analysis. EEO practitioners are therefore prudent to evaluate the overall selection process as well as each component of the selection process, using the following steps:
1. Conduct a "threshold" test (also called an initial inquiry or utilization analysis). Compare the percentage of the protected group in the at-issue job(s) with its percentage in the feeder pool (i.e., either internal jobs in the employer's organization that provide qualified candidates for the at-issue job(s), or occupational data from the relevant labor area, whichever is higher). Regardless of the outcome of the "threshold" test, continue to the "barriers" test.
2. Conduct a "barriers" test. If applicant data is available, compare it to occupational census data from the relevant labor area with the binomial formula, to determine whether applicants have been discouraged disproportionately on the basis of race, sex or ethnic origin. In other words, to see whether a barrier exists, run a "selection"-style test that treats the applicants as the selected group and the occupational census data from the relevant labor area as the proxy pool.
3. If the barriers test shows a barrier exists which has disproportionately discouraged applicants on the grounds of race, sex or national origin, or if applicant data was not available or was incomplete, use the binomial statistic with the proxy group percentages to compare against those selected as well as those passing individual tests, practices or procedures.
4. If a barrier is not found and applicant data is available, use applicant data. Perform both the hypergeometric statistic to address the Guidelines and the binomial statistic to address the Wards Cove selection test. Use the hypergeometric statistic to evaluate the individual practices, procedures, and tests that can be evaluated separately.
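The four steps can be combined into a short sketch. Several parts of it are illustrative assumptions, not a prescribed procedure:

- the function and parameter names;
- a cut-off of 1.96 standard deviations (two-tailed, .05 level) for both the barriers and selection tests;
- handling missing applicant data the same way as a barrier.

The hypergeometric rate comparison of step 4 is only flagged here, not computed. Individual practices, procedures or tests would each be run through the same selection comparison.

```python
from math import sqrt
from typing import Optional, Tuple

CRITICAL_SD = 1.96  # assumed cut-off: two-tailed test at the .05 level


def binomial_sd(observed: int, n: int, share: float) -> float:
    """Standard deviations between observed and expected counts (binomial, 0 < share < 1)."""
    return (observed - n * share) / sqrt(n * share * (1 - share))


def screen(incumbents: Tuple[int, int], internal_feeder_share: float,
           census_share: float, selected: Tuple[int, int],
           applicants: Optional[Tuple[int, int]] = None) -> dict:
    """Hypothetical sketch of steps 1-4 for one protected group.

    Each tuple is (protected-group count, total); shares are proportions.
    A result below -CRITICAL_SD is treated as a shortfall beyond chance.
    """
    result = {}
    # Step 1: threshold (utilization) test against the higher feeder-pool share.
    feeder_share = max(internal_feeder_share, census_share)
    result["threshold_sd"] = binomial_sd(*incumbents, feeder_share)
    # Step 2: barriers test, applicants compared with census availability.
    barrier_or_no_data = True  # missing applicant data is handled like a barrier
    if applicants is not None:
        result["barriers_sd"] = binomial_sd(*applicants, census_share)
        barrier_or_no_data = result["barriers_sd"] < -CRITICAL_SD
    result["barrier_or_no_data"] = barrier_or_no_data
    # Step 3: barrier (or no applicant data) -> census share is the proxy pool.
    # Step 4: no barrier -> the actual applicant share is the comparison pool.
    if barrier_or_no_data:
        comparison_share = census_share
    else:
        comparison_share = applicants[0] / applicants[1]
        result["also_run_hypergeometric"] = True  # Guidelines rate comparison
    result["selection_sd"] = binomial_sd(*selected, comparison_share)
    result["selection_shortfall"] = result["selection_sd"] < -CRITICAL_SD
    return result


# Hypothetical figures for one protected group.
print(screen(incumbents=(6, 100), internal_feeder_share=0.12, census_share=0.15,
             selected=(6, 60), applicants=(40, 300)))
```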
DISPARATE IMPACT REFERENCE TRILOGY SUMMARY
|Uniform Guidelines||U.S. Supreme Court||1991 C.R.A.|
80% RULE OF THUMB
|applied||not applied||not applied|
|group rates||pool percentages||not specified|
|not required||required||not required|
IF NO BARRIERS
|if feasible||if feasible|
* The Guidelines have a barrier requirement. Section 4D, sentence two, states that adverse impact may exist when the employer's actions have discouraged applicants disproportionately on the grounds of race, sex or ethnic group.
** Estimate of applicant pool as though absent of the barrier (i.e. percent of the protected group in occupational category or categories of census for the relevant labor market.)
*Richard Biddle is President of Biddle & Associates, Inc., a Sacramento-based EEO consulting and software firm. Mr. Biddle's practice concentrates on litigation support in the areas of statistical analyses, job analysis, validation of practices, procedures, and tests, affirmative action, and expert witness work. He has been involved in over 100 cases. Richard Biddle can be contacted at RichardBiddle@biddle.com