Disparate Impact Analysis With Small Samples

By Richard E. Biddle

(California Labor & Employment Law Quarterly, Fall 1995)

© Copyright 1995 R.E. Biddle

To prove employment discrimination in a disparate impact case, plaintiffs must first show that a practice, procedure or test adversely impacts a protected group. When the number of hires, promotions or terminations available for analysis is small, the small numbers often prevent plaintiffs from using disparate impact theory at all. Samples may be small for many reasons; for example, employers may not have collected or retained applicant-flow data, or data on applicants' sex and ethnicity. Whatever the reason, small numbers often discourage plaintiffs or give defendants unwarranted encouragement. Statistically speaking, sample size plays a key role in findings of adverse impact.

One way to enlarge a small sample is to combine, or aggregate, data. An overview of the ways federal guidelines allow aggregation, and the ways courts have accepted it in past cases, can help EEO practitioners better advise their clients and give attorneys information for preliminary strategic planning. First, however, to show how and why sample size plays a key role in findings of adverse impact, it is necessary to introduce the statistical approaches commonly used in such cases.

The Two Basic Statistical Approaches

The EEO field allows two ways of determining adverse impact. One method compares rates: for example, the selection rate (selections divided by applicants) of one group with that of another group. The other method compares pools, such as the availability pool with the pool of those selected.

A rates analysis is a two-sample approach and relies on the hypergeometric distribution. A pools analysis is a one-sample approach and relies on the binomial distribution.

Chart 1 shows how each method makes its comparisons in a disparate impact analysis. In this hypothetical, adverse impact is evaluated for blacks. The rates method uses the numbers a, b, c and d; the pools method uses c, k and n.

CHART 1

Group         Number Applied   Number Selected   Number Rejected   Percentage Availability
whites        a + b            a                 b                 m
blacks        c + d            c                 d                 n
Hispanics     e + f            e                 f                 o
Asians        g + h            g                 h                 p
A. Indians    i + j            i                 j                 q
Total         k + l            k                 l                 100.0

In Chart 1, a rates analysis compares the selection rate for whites, a/(a+b), with the selection rate for blacks, c/(c+d). A pools analysis compares the percentage of blacks in the availability pool, n, with the share of blacks among those selected, c/k.
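The arithmetic behind the two comparisons is simple. The Python sketch below is an illustration only: it plugs the full-sample whites and blacks counts from Chart 2 into the Chart 1 formulas. Because Chart 2 contains no labor market availability figure, the sketch assumes, purely for illustration, that the blacks' share of all applicants stands in for the availability percentage n.

```python
# Full-sample counts for whites and blacks, taken from Chart 2.
whites_applied, whites_selected = 5214, 858   # a + b and a in Chart 1
blacks_applied, blacks_selected = 355, 37     # c + d and c in Chart 1
total_applied, total_selected = 7059, 1077    # k + l and k in Chart 1

# Rates analysis: compare a/(a+b) with c/(c+d).
white_rate = whites_selected / whites_applied
black_rate = blacks_selected / blacks_applied

# Pools analysis: compare n with c/k.
# ASSUMPTION (illustration only): Chart 2 has no availability data, so the
# blacks' share of all applicants stands in for n here.
availability = blacks_applied / total_applied
selected_share = blacks_selected / total_selected

print(f"Rates: whites {white_rate:.1%}, blacks {black_rate:.1%}")
print(f"Pools: availability {availability:.1%}, among selected {selected_share:.1%}")
```

It prints selection rates of 16.5% for whites and 10.4% for blacks, and a stand-in availability of 5.0% against a 3.4% share of selections.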

A rates analysis is easier and much faster to compute than a pools analysis, because it does not require identifying or computing labor market availability, and its calculations use numbers that are often readily available. However, a pools analysis is the only feasible technique when the employer has not retained applicant data or has not recorded applicants' sex, race and ethnicity, because this approach does not use applicant data. Similarly, if applicants have been "discouraged" from applying, or a "barrier" has artificially restricted the applicant pool so that it does not represent the availability pool, a pools analysis is appropriate.

EEO practitioners apply the hypergeometric or binomial distribution to express a disparity in standard deviations, or equivalently as a probability, which ultimately shows whether a finding is statistically significant. (Statisticians commonly treat 1.96 standard deviations, which corresponds to the 5% level of significance in a two-tailed test, as the threshold; the Supreme Court has described it more loosely as two or three standard deviations.) Once findings are considered both statistically and practically significant, adverse impact is said to exist. This is where sample size comes into play. A comparison of two rates or two pools may reveal a disparity, but the smaller the sample, the harder it is to rule out chance as the explanation for that disparity.
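The article does not spell out its formulas, so the following Python sketch is only one reasonable way to compute such standard deviations. It assumes normal approximations to the hypergeometric (rates) and binomial (pools) distributions, whites as the comparison group, and, for the pools analysis, the blacks' share of all applicants as a stand-in for availability. The function names are hypothetical. A positive result means the focal group was selected less often than expected, and 1.96 is used as the significance threshold.

```python
import math

def hypergeometric_sd(ref_applied, ref_selected, focal_applied, focal_selected):
    """Rates analysis: standard deviations by which the focal group's selections
    fall short of expectation (normal approximation to the hypergeometric)."""
    applied = ref_applied + focal_applied
    selected = ref_selected + focal_selected
    expected = selected * focal_applied / applied
    variance = (selected * (focal_applied / applied) * (ref_applied / applied)
                * (applied - selected) / (applied - 1))
    return (expected - focal_selected) / math.sqrt(variance)

def binomial_sd(availability, total_selected, focal_selected):
    """Pools analysis: standard deviations by which the focal group's selections
    fall short of its availability (normal approximation to the binomial)."""
    expected = total_selected * availability
    variance = total_selected * availability * (1 - availability)
    return (expected - focal_selected) / math.sqrt(variance)

# Blacks compared with whites, full sample from Chart 2.
z_rates = hypergeometric_sd(5214, 858, 355, 37)
# ASSUMPTION: blacks' share of all applicants stands in for availability.
z_pools = binomial_sd(355 / 7059, 1077, 37)

for label, z in (("Rates (hypergeometric)", z_rates), ("Pools (binomial)", z_pools)):
    verdict = "significant" if z >= 1.96 else "not significant"
    print(f"{label}: {z:.2f} SD, {verdict} at the 1.96 threshold")
```

Run on the full-sample figures for blacks, it prints about 2.99 and 2.39 standard deviations, both significant and close to the 3.00 and 2.44 reported in Chart 2. The small gaps may reflect a different computational method, such as exact probabilities rather than a normal approximation.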

Sample Size and Standard Deviations

To show how strongly sample size affects hypergeometric and binomial standard deviations, the experiment in Chart 2 starts with an original applicant pool (the population pool) and selection pool (the success pool), halves both, and then reduces both to ten percent of their original sizes. The proportion of each group in each pool is held constant, and the standard deviations are recalculated at each reduction.

Although the proportions by group are maintained in both pools, reducing the samples by 50% in this example reduces the standard deviations by about 30%. Reducing the pools to 10% of their original size reduces the standard deviations by roughly two-thirds to four-fifths. In every case, shrinking the samples reduces the binomial standard deviations proportionally more than the hypergeometric standard deviations.

The influence of sample size on both hypergeometric and binomial standard deviations is so strong that a difference between groups that is statistically significant with large samples can cease to be statistically significant with smaller samples, even though the corresponding rate and pool differences are identical. In Chart 2, for example, the disparity for blacks is significant at the full sample size under both methods but not at the 10% size.
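The roughly 30% reduction has a simple explanation under a normal approximation (an assumption; the article does not give its formulas). With group proportions held fixed, the gap between expected and actual selections grows in proportion to the sample size, while its standard error grows only with the square root of the sample size. Scaling every count by a factor f therefore scales the number of standard deviations by roughly the square root of f: about 0.71 for a 50% sample (a cut of about 29%) and about 0.32 for a 10% sample (a cut of about 68%). The Python sketch below, built around a hypothetical helper function, recomputes the blacks-versus-whites rates comparison at the three sample sizes in Chart 2.

```python
import math

def hypergeometric_sd(ref_applied, ref_selected, focal_applied, focal_selected):
    # Normal approximation to the hypergeometric distribution (rates analysis);
    # positive values mean the focal group was selected less often than expected.
    applied = ref_applied + focal_applied
    selected = ref_selected + focal_selected
    expected = selected * focal_applied / applied
    variance = (selected * (focal_applied / applied) * (ref_applied / applied)
                * (applied - selected) / (applied - 1))
    return (expected - focal_selected) / math.sqrt(variance)

# (whites applied, whites selected, blacks applied, blacks selected), Chart 2.
samples = {
    "Full": (5214, 858, 355, 37),
    "50%": (2607, 429, 178, 18),
    "10%": (521, 86, 36, 4),
}

results = {label: hypergeometric_sd(*counts) for label, counts in samples.items()}
for label, z in results.items():
    print(f"{label:>4}: {z:.2f} SD, significant at 1.96: {z >= 1.96}")
```

It gives roughly 2.99, 2.23 and 0.85 standard deviations, so the disparity is significant at the full and 50% sizes but not at 10%, the same pattern Chart 2 shows. The 10% figure is furthest from the chart's 0.69; with counts this small, a normal approximation is least reliable.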

Naturally, the larger the sample, the more confidence one can place in the results. For example, if you flipped a coin once and it came up heads, you would think nothing of it. If you flipped it five times and it landed heads each time, the result might seem a bit unexpected. If, after twenty flips, the coin still came up heads every time, you would strongly suspect that it had a head on both sides. A larger sample offers more confidence that a result departs from what would be expected by chance, and chance alone.
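The coin example can be made exact. The Python sketch below uses the standard library's fractions module to compute the probability that a fair coin comes up heads on every flip and, for a two-tailed comparison like the one behind the 1.96 threshold, the probability of all heads or all tails.

```python
from fractions import Fraction

# Exact probabilities for a fair coin.
probabilities = {}
for flips in (1, 5, 20):
    all_heads = Fraction(1, 2) ** flips   # heads on every flip
    all_same = 2 * all_heads              # all heads or all tails (two-tailed)
    probabilities[flips] = (all_heads, all_same)
    print(f"{flips:>2} flips: P(all heads) = {all_heads}, "
          f"P(all heads or all tails) = {all_same}")
```

Five straight heads has probability 1/32, about 3.1%, but five identical results in either direction has probability 1/16, or 6.25%, which does not reach the 5% level: surprising, yet not statistically significant. Twenty straight heads has probability 1/1,048,576, roughly one in a million.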

CHART 2

Applicants and selections at the original sample size, at 50% of that size and at 10% of that size

               Original Sample      50% Sample          10% Sample
Group          Applied  Selected    Applied  Selected   Applied  Selected
Men               5294       827       2647       413       529        83
Women             1765       250        883       125       177        25
Total             7059      1077       3530       538       706       108

whites            5214       858       2607       429       521        86
blacks             355        37        178        18        36         4
Hispanics          870        98        435        49        87        10
Asians             583        75        291        37        58         7
Am. Indians         37         9         19         5         4         1
Total             7059      1077       3530       538       706       108

Standard Deviations

               Hypergeometric, by sample size    Binomial, by sample size
Group          Full     50%      10%             Full     50%      10%
Men            N/A      N/A      N/A             N/A      N/A      N/A
Women          1.45     1.03     0.48            1.34     0.90     0.44

whites         N/A      N/A      N/A             N/A      N/A      N/A
blacks         3.00*    2.22*    0.69            2.44*    1.78     0.44
Hispanics      3.87*    2.71*    1.11            3.29*    2.30*    0.87
Asians         2.20*    1.60     0.75            1.60     1.10     0.52
Am. Indians    N/A      N/A      N/A             N/A      N/A      N/A

* Statistically significant at the 5% level of chance.


One way to increase the sample size is to aggregate data. The following techniques have been used successfully in disparate impact cases or are supported by federal regulations and guidelines:

Aggregating data for those selected into different jobs that fall within the same census code.

Aggregating data for those selected into different, but closely related jobs within more than one census code.

Aggregating data for the same test given over several years for different jobs.

Aggregating data for similar tests given over several years for the same job.

Aggregating several years of data for one job.

Aggregating data for more than one minority group.
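Each technique above rests on the same mechanical step: the counts for each group are summed across the units being combined (jobs, tests or years), and the totals are analyzed as a single sample. The Python sketch below illustrates that step for a rates analysis. The yearly counts are hypothetical, invented for illustration and not drawn from any case, and the normal approximation to the hypergeometric distribution is an assumption, since the article does not give its formulas.

```python
import math

def hypergeometric_sd(ref_applied, ref_selected, focal_applied, focal_selected):
    # Normal approximation to the hypergeometric distribution (rates analysis);
    # positive values mean the focal group was selected less often than expected.
    applied = ref_applied + focal_applied
    selected = ref_selected + focal_selected
    expected = selected * focal_applied / applied
    variance = (selected * (focal_applied / applied) * (ref_applied / applied)
                * (applied - selected) / (applied - 1))
    return (expected - focal_selected) / math.sqrt(variance)

# HYPOTHETICAL yearly counts for one job (not from the article):
# (reference applied, reference selected, minority applied, minority selected)
years = {
    "Year 1": (130, 26, 20, 1),
    "Year 2": (110, 20, 15, 1),
    "Year 3": (140, 30, 25, 2),
}

yearly = {year: hypergeometric_sd(*counts) for year, counts in years.items()}

# Aggregate by summing each count across years, then test once.
pooled_counts = [sum(column) for column in zip(*years.values())]
pooled = hypergeometric_sd(*pooled_counts)

for year, z in yearly.items():
    print(f"{year}: {z:.2f} SD")
print(f"Aggregated: {pooled:.2f} SD")
```

With these hypothetical counts, no single year reaches 1.96 standard deviations (about 1.62, 1.11 and 1.56), but the aggregated data do (about 2.49). Simple summing treats the years as one pool; when selection rates differ sharply across the units being combined, pooling can distort the comparison, which is one reason statisticians sometimes prefer stratified methods such as the Mantel-Haenszel test.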

Aggregating Data for Different Jobs Within the Same Census Code

In Paige v. the State of California, No. 94 0083 CBM (Ctx), the job titles of police sergeant, police lieutenant and police captain were combined to provide aggregated data. Because the jobs shared census occupation code 414, they also shared a combined availability figure.

Aggregating Data for Similar Jobs From More Than One Census Code

In Hazelwood School District v. United States, 97 S.Ct. 2736 (1977), the Supreme Court recognized that teacher applicants had applied for a variety of teaching positions falling into two different census occupation codes: 144 (Secondary School Teachers) and 142 (Elementary School Teachers). Data for applicants selected into teaching positions under both codes were combined and compared against the corresponding availability for the purposes of a disparate impact analysis.

Aggregating Data for the Same Test for Different Jobs

The Questions and Answers supplement to the Uniform Guidelines, 44 Fed. Reg. 12000, question #27 (1979), specifically allows data for a test to be aggregated when the test is used for different jobs. "If the test is administered and used in the same fashion for a variety of jobs, the impact of that test can be assessed in the aggregate."

Aggregating Data for Similar Tests for the Same Job

In Bouman v. Block, 940 F.2d 1211, 1226 (9th Cir. 1991), cert. denied, 502 U.S. 1005 (1991), the Federal District Court allowed data for similar sergeant tests to be aggregated, and the Ninth Circuit Court of Appeals upheld that decision, because the tests appeared to measure similar knowledge, skills and abilities.

The Civil Rights Act of 1991 requires plaintiffs to identify the specific practice, procedure or test causing adverse impact, if feasible. In practice, a combination of practices, procedures and tests may contribute to a decision-making process, and their contributions may not be separable for analysis purposes. In that situation, the Act allows the decision-making process to be analyzed as a single practice, so data can be aggregated for an appropriate combination of practices, procedures and tests, depending on the circumstances.

Aggregating Selection Data for the Same Job Over Several Years

In Eldredge v. Carpenters 46 Northern California Counties (JATC), 833 F.2d 1334, 1340 n.8 (9th Cir. 1987), cert. denied, 487 U.S. 1210 (1988), the court compared a year-by-year analysis of selection data with an analysis of the aggregated data for 1976 through 1984. The Ninth Circuit stated that "aggregated data is the most probative in this case."

Section 4D of the Uniform Guidelines on Employee Selection Procedures, 43 Fed. Reg. 38297 (1978), also supports aggregating data over several years when the available data indicate adverse impact but are too few to be reliable: "Where...evidence concerning the impact of a selection procedure indicates adverse impact but is based upon numbers which are too small to be reliable, evidence concerning the impact of the procedure over a longer period of time...may be considered in determining adverse impact."

Aggregating Data for Different Race/Ethnic Groups

The United States Supreme Court allowed an adverse impact analysis of a class called "nonwhites" in Wards Cove Packing Co., Inc. v. Atonio, 490 U.S. 642 (1989). This indicates that separate minority groups can be aggregated for disparate impact purposes. Note, however, that combining data for all minority groups can mask a problem for some groups if one minority group performs very well and the others do not.
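The Chart 2 data can illustrate both the benefit of this kind of aggregation and the caveat. The Python sketch below compares each minority group with whites, then compares a single aggregated "nonwhites" class with whites. The normal approximation to the hypergeometric distribution and the choice of whites as the comparison group are assumptions, and the helper function is hypothetical.

```python
import math

def hypergeometric_sd(ref_applied, ref_selected, focal_applied, focal_selected):
    # Normal approximation to the hypergeometric distribution (rates analysis);
    # positive values mean the focal group was selected less often than expected.
    applied = ref_applied + focal_applied
    selected = ref_selected + focal_selected
    expected = selected * focal_applied / applied
    variance = (selected * (focal_applied / applied) * (ref_applied / applied)
                * (applied - selected) / (applied - 1))
    return (expected - focal_selected) / math.sqrt(variance)

whites = (5214, 858)  # applied, selected (Chart 2, full sample)
minorities = {
    "blacks": (355, 37),
    "Hispanics": (870, 98),
    "Asians": (583, 75),
    "Am. Indians": (37, 9),
}

per_group = {name: hypergeometric_sd(*whites, *counts)
             for name, counts in minorities.items()}

# Aggregate all minority groups into a single "nonwhites" class.
nonwhite_applied = sum(applied for applied, _ in minorities.values())
nonwhite_selected = sum(selected for _, selected in minorities.values())
combined = hypergeometric_sd(*whites, nonwhite_applied, nonwhite_selected)

for name, z in per_group.items():
    print(f"{name:>11}: {z:+.2f} SD")
print(f"{'nonwhites':>11}: {combined:+.2f} SD")
```

It prints roughly +2.99 for blacks, +3.89 for Hispanics, +2.24 for Asians, -1.28 for Am. Indians and +4.71 for nonwhites. Pooling three under-selected groups produces a much larger sample, which more than offsets the Am. Indian applicants' higher-than-white selection rate (9 of 37). Chart 2 reports N/A for Am. Indians; with only 37 applicants, that group's figure indicates direction only.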

Who Can Do the Grouping?

Anyone can aggregate data if the court allows it. However, in two cases reviewed by the Ninth Circuit Court of Appeals, defendants attempted to group data when doing so was clearly to their advantage. The plaintiffs resisted, and they prevailed on that point each time.

In Contreras v. City of Los Angeles, 656 F.2d 1267 (9th Cir. 1981), the City combined "the distribution of scores on the auditor examination with the results of a separate, senior auditor examination, on which Spanish-surnamed applicants performed better than any other ethnic group." Aggregating the senior auditor exam data with the auditor exam results benefited the City's position, not the plaintiffs'. The Ninth Circuit observed that the "senior auditor results were taken from a small sample, since only 9 Spanish-surnamed applicants took the [senior auditor] test." The Ninth Circuit stated that "it was clear error for the district court to conclude that these statistically insignificant results of the senior auditor examination permitted disregard of the statistical results of the auditor examination, particularly when there was no evidence that the Spanish-surnamed senior auditor applicants performed well on the same questions that the Spanish-surnamed auditor applicants failed."

In Bouman v. Block, 940 F.2d 1211 (9th Cir. 1991), cert. denied, 502 U.S. 1005 (1991), the Federal District Court and the Ninth Circuit allowed the plaintiffs to aggregate data in the way that favored them: combining the 1975 and 1977 sergeant exam data and excluding the 1980 sergeant exam data from the aggregation. The 1980 exam results were favorable to the defendants. The Ninth Circuit concluded that because "the 1980 examination was administered after Bouman brought suit...inclusion of the 1980 data not only would have failed to improve the reliability of the 1975 and 1977 data, but would have improperly obscured the discriminatory effects of the earlier examinations."

Summary

When analyzing part or all of a selection procedure for adverse impact, EEO practitioners are often hampered by small samples. With small samples, it can be difficult to establish statistical significance, that is, a difference beyond what is expected by chance, and chance alone. Statistical significance is the first in a series of requisites leading to a disparate impact conclusion.

The EEO field allows two ways of determining adverse impact: one compares rates, such as hires or promotions relative to applicants, by group; the other compares pools, such as availability with those selected. Besides knowing how data may permissibly be combined, it helps to understand how sample size translates into the statistics that lead to adverse impact findings. Aggregating data is often an acceptable means of increasing the samples on which EEO practitioners rely.


*Richard Biddle is President of Biddle & Associates, Inc., a Sacramento-based EEO consulting and software firm. Mr. Biddle's practice concentrates on litigation support in the areas of statistical analyses, job analysis, validation of practices, procedures, and tests, affirmative action, and expert witness work. He has been involved in over 100 cases. Richard Biddle can be emailed at RichardBiddle@biddle.com