Is predictive coding the answer to reducing the costs of eDisclosure?

James Kent

Dr James Kent, Global Head of Investigations and CEO EMEA, Nuix, believes predictive coding must be combined with other technologies and investigative workflows to address the runaway costs of legal review.

In the age of big data, organisations facing litigation, regulatory disputes and audits are spending too much time and money on data discovery processes. Quite simply, organisations must deal with more information created and stored in more places. Paper documents make the occasional appearance, but the critical evidence is most often found in electronic documents, email, instant messages, text messages or even blogs or social media.

For each custodian, legal counsel must examine multiple devices including company-owned and personal computers, email and collaboration servers, file shares, archives, smartphones, tablet devices, USB hard drives and flash memory cards. A case may also involve data stored in cloud services such as Gmail or Dropbox. Critical evidence may also be stored in file formats that are difficult and complex to access, including email archives, compliance storage repositories and legacy platforms.

In addition, surveys of corporate counsel and business executives consistently show they expect to face a greater burden of litigation and regulatory scrutiny in the future. When each case requires a human being to review every page of data from each source, the costs quickly become unsustainable.

Can predictive coding solve the problem?

A consensus is emerging in the legal industry that it has become prohibitively expensive – and often impossible – for human beings to review every page of evidence. Some industry experts believe predictive coding technology will solve this problem. Predictive coding is a way of teaching a computer to automatically classify electronic evidence based on statistical analysis and machine learning techniques.

However, many legal practitioners are still not comfortable handing such important decisions over to computers. This is especially the case for ‘black box’ predictive coding solutions that do not clearly explain how their engines classify documents. If lawyers can’t understand how the technology works, they almost certainly can’t explain it to a judge.
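To make the idea less of a black box, the following is a minimal sketch of the kind of statistical classification that predictive coding relies on. It assumes a simple naive Bayes model over word counts, trained on a handful of documents a reviewer has already coded; the function names, labels and sample texts are all hypothetical, and this is not a description of how any particular product works.

```python
import math
from collections import Counter

def tokenize(text):
    # Deliberately simple: lower-case and split on whitespace.
    return text.lower().split()

def train(labelled_docs):
    """Build word counts per label from (text, label) pairs coded by a reviewer."""
    word_counts = {}        # label -> Counter of words seen under that label
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so an unseen word does not zero out a label.
            score += math.log((word_counts[label][word] + 1) /
                              (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical seed set coded by a human reviewer.
seed = [
    ("pricing agreement with competitor confidential", "relevant"),
    ("discuss pricing agreement before the meeting", "relevant"),
    ("office lunch menu for friday", "irrelevant"),
    ("friday social lunch reminder", "irrelevant"),
]
model = train(seed)
print(classify("competitor pricing meeting", model))  # relevant
print(classify("friday lunch", model))                # irrelevant
```

Because every score is a sum of per-word contributions, a reviewer can see which words pushed a document towards one label, which is the sort of explanation a lawyer would need to give a judge.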

I believe predictive coding alone will not greatly reduce discovery costs because it occurs at the wrong end of the process – once a case has already progressed to full litigation. eDisclosure requires a multifaceted approach that combines technologies across the Electronic Discovery Reference Model (EDRM) process. These technologies include:

  • Light metadata scans
  • Legal hold
  • Targeted collection from multiple sources
  • Efficient collection from email archives
  • Rapid data indexing
  • Clustering
  • Visualisation
  • Investigative search
  • Deduplication
  • Predictive coding

This approach gives legal counsel access to all the facts sooner and makes it possible to drive a proactive, winning strategy.

Five ways to reduce costs across the discovery process

By applying the right technologies across all stages of the EDRM process, legal professionals can minimise costs and use their superior knowledge of the facts to guide their strategy and maintain a leading edge.

Thoroughly gather all the facts

Although the ultimate aim is to minimise the number of documents handed over to legal advisors, a vital first step is to start with all the facts of the case, gathered from all available custodians and data sources. This requires a rigorous investigative workflow.

  • Interview the first key players that come to light and ask them to identify any documents and data sources that might be relevant to the case.
  • Notify these custodians, and anyone else who manages that content, of their duty to preserve evidence, and retain a record that they have received and acknowledged this notification. If such a facility exists, place all the relevant content under legal hold to prevent automated or inadvertent deletion.
  • Map the data sources identified and gain a high-level understanding of their content.
  • Perform targeted data collection, focusing on the relevant data sources, custodians, dates and document types.
  • Fully index all collected data and analyse it using search, clustering and visualisation to examine the relationships between custodians and evidence.
  • If this analysis identifies information gaps or other people who may have been involved, repeat the fact-gathering process with these new custodians or lines of enquiry.
  • Once an organisation is satisfied it has all the relevant information, use techniques such as deduplication and predictive coding to cull the data down to a small number of critical documents, and pass these on to its legal advisors to begin setting strategy.
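The targeted collection step in the workflow above can be sketched in a few lines. This example assumes that an earlier mapping step has produced a simple list of file metadata records; the `FileRecord` fields, the custodian names and the `select_for_collection` function are hypothetical and serve only to illustrate filtering by custodian, date range and document type.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class FileRecord:
    # Hypothetical metadata captured when mapping a data source.
    path: str
    custodian: str
    modified: date
    extension: str

def select_for_collection(records, custodians, start, end, extensions):
    """Keep only records matching the custodians, date range and file types."""
    wanted = {ext.lower() for ext in extensions}
    return [
        r for r in records
        if r.custodian in custodians
        and start <= r.modified <= end
        and r.extension.lower() in wanted
    ]

records = [
    FileRecord("/share/contracts/deal.docx", "a.smith", date(2012, 3, 14), "docx"),
    FileRecord("/share/photos/party.jpg", "a.smith", date(2012, 3, 20), "jpg"),
    FileRecord("/share/contracts/old.docx", "b.jones", date(2009, 1, 5), "docx"),
]

targets = select_for_collection(
    records,
    custodians={"a.smith", "b.jones"},
    start=date(2012, 1, 1),
    end=date(2012, 12, 31),
    extensions={"docx", "msg", "pdf"},
)
print([r.path for r in targets])  # ['/share/contracts/deal.docx']
```

If later analysis reveals new custodians or lines of enquiry, the same filter can be rerun with a wider set of criteria rather than collecting everything up front.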

Being extremely thorough at this stage can unearth the custodians, documents and facts that traditional approaches would miss, and avoid nasty surprises further down the track.

Speed is also essential. At every stage of the process, organisations must avoid bottlenecks such as technologies that cannot quickly map, collect and analyse data.

Leverage superior knowledge to settle…

By rigorously chasing up all relevant custodians and data sources, then culling down to the most relevant documents, an organisation can confidently make an informed decision about its chances of winning a lawsuit or regulatory dispute.

Knowing more than the opposition allows an organisation to set a settlement strategy that works to its advantage. This in turn avoids the costs and business disruption of litigation, the wrong kind of publicity and the potential disclosure of sensitive information.

… or maintain an advantage throughout litigation

Even with a well-planned settlement strategy, it may not be possible to avoid litigation if the opposing side believes it has a strong case. Having rapid access to the full corpus of evidence – and powerful investigative search capabilities – can help a litigant maintain a strategic lead over the opposition through multiple rounds of depositions and discovery.

This begins with the ability to make a very strong and comprehensive claim based on a thorough understanding of the case and all the available evidence. Unless the other side can make an equally compelling claim, the case should be over very quickly.

If the case progresses, having already indexed and investigated all the evidence sources will confer ongoing advantages. As witnesses or documents emerge, the legal team can quickly search and analyse the case evidence. If new evidence sources come to light, these can be added to the corpus, indexed and analysed quickly.

Deeply investigate the other side’s disclosures

With superior technology, a litigant can also deeply investigate the other side’s disclosures to reveal information gaps. Analytics technologies such as clustering can reveal differences between the two sides’ document sets. This can quickly reveal what one side knows that the other does not, or may point to critical evidence the opposition has failed to disclose.
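As a much simpler stand-in for clustering, the sketch below compares two document sets by content fingerprint to show which documents appear on only one side. It assumes both sets are available as plain text keyed by a document identifier; the identifiers, sample texts and function names are hypothetical, and a real comparison would also need to handle formats, attachments and near-duplicates.

```python
import hashlib

def fingerprint(text):
    """Hash of case- and whitespace-normalised text (an illustrative choice)."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def compare_productions(ours, theirs):
    """Return (ids only in ours, ids only in theirs), each sorted."""
    our_prints = {fingerprint(text): doc_id for doc_id, text in ours.items()}
    their_prints = {fingerprint(text): doc_id for doc_id, text in theirs.items()}
    only_ours = sorted(our_prints[h] for h in our_prints.keys() - their_prints.keys())
    only_theirs = sorted(their_prints[h] for h in their_prints.keys() - our_prints.keys())
    return only_ours, only_theirs

ours = {"A1": "Board minutes  March", "A2": "Supplier invoice 1001"}
theirs = {"B1": "board minutes march", "B2": "Internal memo on recall"}

only_ours, only_theirs = compare_productions(ours, theirs)
print(only_ours)    # ['A2']
print(only_theirs)  # ['B2']
```

Documents that appear only in the other side's set show what they know that you do not, while documents you hold that they should also have produced are candidates for a disclosure gap.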

Cases that involve native format production provide an opportunity to forensically investigate an opponent’s disclosures. This can often uncover secrets they are trying to conceal, for example through ‘track changes’ mark-ups in documents or hidden files.

Minimise document sets and the risk of revealing privileged information

Predictive coding can help sort relevant documents from irrelevant ones during the review process, but it is far from the only useful technology. In addition, review is only one stage of the eDiscovery process where it is helpful to minimise the number of documents in the evidence base.

  • Before collection, conducting a light metadata scan makes it possible to target collection efforts to the data sources, custodians, document types and date ranges that are most likely to be relevant.
  • After processing or before production, deduplication can eliminate a large proportion of documents – up to half in many cases.
  • During review, organisations can use many automated techniques to identify responsive, privileged or irrelevant documents. In addition to predictive coding, they can use near-deduplication, clustering, advanced searching and data analysis. What’s more, it can be helpful to use multiple techniques and compare the results, as a form of quality assurance.
  • A final run of predictive coding and other analytical techniques over production sets can provide a safeguard against revealing privileged information.
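Two of the techniques listed above, deduplication and near-deduplication, can be illustrated briefly. This sketch assumes exact duplicates are detected by hashing document text and near-duplicates by the Jaccard similarity of three-word shingles; both are common approaches rather than a description of any specific product, and the function names and sample documents are hypothetical.

```python
import hashlib

def deduplicate(docs):
    """Keep the first document for each distinct content hash.

    docs maps a document id to its text. Returns (unique, duplicates),
    where duplicates maps each dropped id to the id of the copy kept.
    """
    seen = {}
    unique = {}
    duplicates = {}
    for doc_id, text in docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates[doc_id] = seen[digest]
        else:
            seen[digest] = doc_id
            unique[doc_id] = text
    return unique, duplicates

def shingles(text, k=3):
    """Set of overlapping k-word sequences from the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Similarity between 0.0 and 1.0 based on shared shingles."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

docs = {
    "a": "Please review the attached contract",
    "b": "Please review the attached contract",
    "c": "Please review the attached contract today",
}

unique, duplicates = deduplicate(docs)
print(sorted(unique))                  # ['a', 'c']
print(duplicates)                      # {'b': 'a'}
print(jaccard(docs["a"], docs["c"]))   # 0.75
```

Exact hashing removes identical copies outright, while a similarity threshold chosen by the review team decides which near-duplicates can be grouped and reviewed together.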

Combine technologies to reduce costs

It is now virtually impossible, and certainly prohibitively expensive, for a human being to review every document that emerges through the discovery process.

Many in the legal profession are looking to one technology – predictive coding – as the answer to this dilemma. By prioritising or minimising manual reviews, predictive coding can take out a major component of discovery costs.

However, predictive coding is only one of many ways organisations can make eDiscovery faster, cheaper and more strategic. By applying multiple advanced technologies, litigants can quickly decrease the volume and prioritise the relevance of the evidence they hold to gain a clear and detailed picture of the entire case faster than other parties. They can exploit this information to set an advantageous settlement strategy which is more likely to avoid the costs of litigation and the need to disclose sensitive information.

Nuix offers a suite of powerful, integrated tools and workflows to streamline the entire eDiscovery process, including legal hold, collection, processing and search, and review. The end result is lower costs throughout litigation, from collection to court.

About Jim Kent, CEO, EMEA

Dr. (h.c) HND/Eng

Jim Kent has more than 15 years' experience as a pioneering digital forensics investigator, eDiscovery consultant and high-level advisor within law enforcement, government, financial and commercial sectors. He is a contributing author of the ACPO guidelines for Digital Evidence, the de facto standard used within law courts. Previously Managing Director of 7Safe’s Digital Forensics and eDiscovery consulting units, Jim led the company to its position as one of the leading consultancies in that market today.