No. 13-MD-02420 YGR (DMR)
02-24-2015
ORDER ON JOINT DISCOVERY LETTER [DOCKET NO. 633]
Plaintiffs and Defendants filed a joint letter brief on January 16, 2015 regarding a disputed provision in the parties' proposed Search Term Protocol. [Docket No. 633 (Joint Letter).] The court held a hearing on February 19, 2015. Having considered the parties' arguments and for the reasons set forth below, the court enters the following order.
I. Background
On December 3, 2014, the court ordered the parties to meet and confer to negotiate a protocol for the use of search terms. [Docket No. 592.] The parties have agreed upon an iterative process for the development and testing of search terms, summarized as follows: 1. the producing/responding party will develop an initial list of proposed search terms and provide those terms to the requesting party (Joint Letter Ex. A (Proposed Search Term Protocol) § B2); 2. within 30 days, the requesting party may propose modifications to the list of terms or provide additional terms (up to 125 additional terms or modifications) (Proposed Search Term Protocol § B3); and 3. upon receipt of any additional terms or modifications, the producing/responding party will evaluate the terms, and
a. run all additional/modified terms upon which the parties can agree and review the results of those searches for responsiveness, privilege, and necessary redactions (Proposed Search Term Protocol § B4), or
b. for those additional/modified terms to which the producing/responding party objects on the basis of overbreadth or identification of a disproportionate number of irrelevant documents, that party will provide the requesting party with certain quantitative metrics and meet and confer to determine whether the parties can agree on modifications to such terms. Among other things, the quantitative metrics include the number of documents returned by a search term and the nature and type of irrelevant documents that the search term returns. In the event the parties are unable to reach agreement regarding additional/modified search terms, the parties may file a joint letter regarding the dispute. (Proposed Search Term Protocol §§ B5, B7.)
The parties now request the court's guidance on a single remaining issue regarding their search term protocol. Plaintiffs propose that if disputed search terms remain after the quantitative metrics evaluation, the parties would then conduct a randomized qualitative sampling. Such sampling "shall be done formally, by means of a random number generator, which will generate a statistically valid number of ordinal positions of the identified documents," and the "randomly selected documents can be viewed by the Requesting Party immediately after the appropriate privilege check." (Proposed Search Term Protocol § B6.) Plaintiffs argue that qualitative sampling will provide insight into why a seemingly relevant search term may be returning disproportionate numbers of irrelevant documents. This process could lead to search adjustments, which would improve precision in identifying relevant documents.
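The random draw described in Plaintiffs' proposal can be illustrated with a short sketch. It is an illustration only, not the parties' actual procedure: the function name, the 1-based numbering of the list of identified documents, the seed parameter, and the sample size passed in are all assumptions, since the proposal does not state how the "statistically valid number" is to be computed.

```python
import random


def sample_ordinal_positions(num_identified, sample_size, seed=None):
    """Draw ordinal positions (1-based) from a list of identified documents.

    Positions are drawn without replacement, so no document is picked twice.
    The seed is a hypothetical addition that makes the draw repeatable.
    """
    if not 0 <= sample_size <= num_identified:
        raise ValueError("sample_size must be between 0 and num_identified")
    rng = random.Random(seed)
    positions = rng.sample(range(1, num_identified + 1), sample_size)
    return sorted(positions)


# Example: 25 positions out of 1,000 documents returned by a disputed term.
positions = sample_ordinal_positions(1000, 25, seed=7)
```

Recording the seed alongside the hit list would let either side regenerate the same positions and confirm that the sample was not hand-picked.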
Plaintiffs' proposal is similar to the protocol proposed by the plaintiffs and ordered by Magistrate Judge Joseph C. Spero in In re Optical Disk Drive Products Antitrust Litigation ("ODD"), No. 3:10-md-02143 RS (JCS) (Docket Nos. 708 (order), 660-1 (plaintiffs' proposal)). That protocol provided "[f]or any of the Disputed Terms where the parties cannot reach agreement, each defendant will pull a random sample of documents with unique hits, and provide the sample to plaintiffs (after removing any privileged documents), so that the parties can better discuss the utility of adding that particular term. The parties will, in good faith, attempt to reach agreement on the appropriate sample size." [Docket No. 660-1.] Judge Spero supplemented the protocol and ordered that the "sample size for all Disputed Terms taken together is limited to a total of 250 documents per defendant with sufficient safeguards to ensure that those documents are randomly selected and not privileged." [Docket No. 708.]
II. Legal Standard
Federal Rule of Civil Procedure 26 provides that a party may obtain discovery "regarding any nonprivileged matter that is relevant to any party's claim or defense." Fed. R. Civ. P. 26(b)(1). All discovery is subject to the limitations imposed by Rule 26(b)(2)(C), which requires the court to limit the frequency or extent of discovery otherwise allowed if it determines that "the discovery sought is unreasonably cumulative or duplicative, or can be obtained from some other source that is more convenient, less burdensome, or less expensive." Fed. R. Civ. P. 26(b)(1), 26(b)(2)(C)(i). Pursuant to Rule 34, a party may request, within the scope of Rule 26(b), that any other party "produce and permit the requesting party . . . to inspect, copy, test, or sample" materials sought under the rule, including documents or electronically stored information. Fed. R. Civ. P. 34(a)(1).
III. Discussion
Defendants object to the proposed sampling provision solely on the grounds that it will provide Plaintiffs with access to non-responsive, irrelevant documents that will be generated through the procedure. Defendants argue that the proposed protocol runs counter to the Federal Rules, the practices of this Court, and the conduct of discovery, in that it seeks to impose a requirement that Defendants produce non-responsive documents that are not subject to discovery and need not be produced.
Defendants did not object on the grounds that producing the qualitative samples could become burdensome.
According to Defendants, the provision is unnecessary due to the detailed quantitative information that they have already agreed to produce regarding disputed search terms. They assert that these provisions satisfy their obligation to cooperate with Plaintiffs and allow Plaintiffs adequate insight into the process of developing and applying search terms. Finally, they argue that the provision is unnecessary as there has been no showing that any Defendant's production is incomplete. See Han v. Futurewei Techs., Inc., No. 11-CV-831-JM (JMA), 2011 WL 4344301, at *5 (S.D. Cal. Sept. 15, 2011) (denying mirror imaging of plaintiff's personal computers and storage devices on grounds that it was premature and would lead to inspection of non-responsive, irrelevant, privileged info; "[a] requesting party . . . must rely on the representations of the producing party . . . that it is producing all responsive, relevant, and non-privileged discovery.").
In response, Plaintiffs argue that the proposed provision incorporates ESI best practices, including those embodied in materials developed by this Court. Plaintiffs do not dispute that irrelevant documents are outside the scope of discovery. In fact, they assert that irrelevant documents are useless and burdensome to Plaintiffs because they are inadmissible and any time spent reviewing them is wasted. Instead, Plaintiffs argue that both parties will benefit from this sampling procedure because it will provide insight into why a particular search term fails to return an appropriate set of documents, enabling the parties to focus computer searches on relevant, discoverable material. Plaintiffs contend that the best way to refine searches and eliminate unhelpful search terms is to analyze a random sample of documents, including irrelevant ones, to modify the search in an effort to improve precision.
See, e.g., United States District Court, Northern District of California, "Guidelines for the Discovery of Electronically Stored Information," Guideline 2.02(f) ("[At the Rule 26(f) meet and confer conference, the parties should consider discussing] [o]pportunities to reduce costs and increase efficiency and speed, such as by conferring about the methods and technology for searching ESI to help identify the relevant information and sampling methods to validate the search for relevant information . . . ."); United States District Court, Northern District of California, "Checklist for Rule 26(f) Meet and Confer Regarding Electronically Stored Information," Section V ("Search") ("The search method(s), including specific words or phrases or other methodology, that will be used to identify discoverable ESI and filter out ESI that is not subject to discovery"; "The quality control method(s) the producing party will use to evaluate whether a production is missing relevant ESI or contains substantial amounts of irrelevant ESI").
The court agrees. The point of random sampling is to eliminate irrelevant documents from the group identified by a computerized search and focus the parties' search on relevant documents only. As the court noted in Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012), a problem with keywords "is that they often are overinclusive, that is, they find responsive documents but also large numbers of irrelevant documents." Id. at 191. In Moore, the court ordered the defendants to produce a random sample of privilege-checked irrelevant documents "to allow calculation of the approximate degree of recall and precision of the search and review process used." Id. at 202. Although that case dealt with predictive coding, the principles guiding the court are applicable here. The goal of "quality control test[ing]" is "to assure accuracy in retrieval and elimination of 'false positives.'" Id. at 191 (citing William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co., 256 F.R.D. 134, 136 (S.D.N.Y. 2009)). As Plaintiffs point out, a search whose random sample shows a high proportion of irrelevant documents is a bad search and needs to be modified to improve its precision in identifying relevant documents. The proposed sampling procedure is designed to prevent irrelevant documents from being reviewed or produced in the litigation, and will obviate, or at least clarify, motion practice over the search terms themselves.
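As a rough illustration of how a reviewed random sample can indicate a search term's precision, the sketch below computes the share of relevant documents in the sample together with an approximate 95 percent interval. The function name and the normal-approximation interval are assumptions chosen for simplicity; neither the order nor the proposed protocol prescribes a particular statistical method.

```python
import math


def estimate_precision(relevant_in_sample, sample_size, z=1.96):
    """Estimate a search term's precision from a reviewed random sample.

    Returns (point_estimate, lower_bound, upper_bound). The bounds use a
    simple normal approximation and are clipped to the range [0, 1].
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= relevant_in_sample <= sample_size:
        raise ValueError("relevant_in_sample must be between 0 and sample_size")
    p = relevant_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Example: reviewers mark 30 of 250 sampled documents as relevant.
precision, low, high = estimate_precision(30, 250)
```

A low estimate, such as 30 relevant documents out of 250, would suggest the term is overinclusive and is a candidate for narrowing.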
Defendants raise a valid concern that the sampling protocol will result in the production of irrelevant information to which Plaintiffs have no right. This concern can be easily mitigated. At the hearing, Plaintiffs agreed that Defendants may review the random qualitative sample and remove any irrelevant document(s) from the sample for any reason, provided that they replace the document(s) with an equal number of randomly generated document(s). In addition, the parties agreed that the procedure for qualitative sampling shall apply only after exhaustion of the quantitative evaluation process. Irrelevant documents in the sample shall be used only for the purpose of resolving disputes regarding search terms in this action, and for no other purpose in this litigation or in any other litigation; those irrelevant documents, as well as any attorney notes regarding the sample, shall be destroyed within fourteen days of resolution of the search term dispute, with such destruction confirmed in an affidavit by counsel. In addition, the court held that access to the random sample shall be limited to one attorney from each law firm designated co-lead class counsel for Direct Purchaser Plaintiffs and Indirect Purchaser Plaintiffs (total of six attorneys). The court further held that Plaintiffs may invoke the random sampling process with respect to no more than five search terms per defendant group. The parties indicated that under this protocol, a defendant family would run one combined search for up to five disputed terms, rather than creating separate samples for each disputed term. The parties were ordered to meet and confer regarding the sample size, as well as the overall limit on the number of sample documents generated per defendant family.
At the hearing, the court suggested a sample size of 100 + one-half percent (instead of Plaintiffs' proposed one percent) of the number of identified documents, with a total ceiling of 2,500 sample documents per defendant family.
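The arithmetic of the suggested sample size can be sketched as follows. This is a minimal illustration under stated assumptions: the court only suggested these figures and left the final numbers to the parties' meet and confer, and the rounding rule (fractions round up) and the cap at the number of identified documents are assumptions that do not come from the order.

```python
def suggested_sample_size(num_identified, base=100, ceiling=2500):
    """Sample size of 100 plus one-half percent of identified documents.

    One-half percent is 1/200, computed in integer arithmetic and rounded
    up (an assumption). The result never exceeds the ceiling or the number
    of documents actually identified (the latter is also an assumption).
    """
    if num_identified < 0:
        raise ValueError("num_identified must be non-negative")
    half_percent = (num_identified + 199) // 200  # ceiling of n / 200
    return min(base + half_percent, ceiling, num_identified)


print(suggested_sample_size(20_000))     # 100 + 100 = 200
print(suggested_sample_size(1_000_000))  # 5,100 exceeds the ceiling, so 2500
```

Under these assumptions the ceiling takes effect once the combined search returns more than 480,000 documents.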
IV. Conclusion
For the foregoing reasons, the court grants Plaintiffs' request to incorporate a provision regarding qualitative sampling of documents returned by disputed search terms into the parties' Search Term Protocol. The parties shall meet and confer in accordance with this order. By no later than February 26, 2015, the parties shall submit a revised Proposed Search Term Protocol consistent with this order that also incorporates the parties' agreements regarding sample size and ceiling.
Dated: February 24, 2015
/s/_________
DONNA M. RYU
United States Magistrate Judge