Technology assisted review in the age of big data

Epiq Systems’ Saida Joseph, international director of document review services, and Celeste Kemper, director of document review services Asia, reveal how the latest technology assisted review (or predictive coding) is providing a critical advantage in the search and analysis of vast data volumes.

We now live in an age of ‘Big Data’, a period defined and controlled by the exchange of massive amounts of electronic information. Data is erupting from email accounts, smart phones, tablets, social networks and search engines; it crosses borders, takes new forms, and is housed in virtual clouds. A typical Fortune 500 company can produce several petabytes of electronic information every year. According to a 2009 survey, the approximately 800 exabytes (800 billion gigabytes) of information created in 2009 will have grown to 35 trillion gigabytes by 2020. That is nearly 44 times the 2009 total, which was itself estimated at roughly 24 DVDs’ worth of information for every man, woman, and child on the planet.
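The growth figures above can be checked with simple arithmetic. The short Python sketch below assumes decimal (SI) units, in which one exabyte is 10^9 gigabytes; it reproduces only the unit conversion and the growth ratio, not the underlying survey data.

```python
# Assumes decimal (SI) units: 1 exabyte = 10**9 gigabytes.
GB_PER_EXABYTE = 10**9

created_2009_gb = 800 * GB_PER_EXABYTE  # 800 exabytes, expressed in gigabytes
projected_2020_gb = 35 * 10**12         # 35 trillion gigabytes

# Ratio of projected 2020 volume to the 2009 volume.
growth_factor = projected_2020_gb / created_2009_gb

print(f"2009: {created_2009_gb:,} GB")
print(f"Growth 2009 to 2020: {growth_factor:.2f}x")
```

The ratio comes out at 43.75, which is where “nearly 44 times” comes from.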

Preparing for and responding to legal requests for discovery and disclosure in this digital age requires new technologies for searching and analysing large volumes of electronically stored information (ESI). Typically, teams of lawyers review documents for relevancy, privilege, confidentiality, fact development, and early case assessment. Technology assisted review (TAR) is the latest revolution in ESI technology, helping to reduce the volume of data that must be reviewed by hand and to analyse content intelligently.

TAR refers to a type of machine-learning technology that combines input from a human reviewer with analytics to help identify responsive or important documents. Using this technology, a case expert reviews a sample of documents and codes each one as either relevant or not relevant. The software applies a principle known as statistical learning theory to recognise complex patterns in the data and actively learns from the reviewer’s coding decisions. Once trained, the software can predict the likely relevancy of the remaining documents in the collection.
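To illustrate the train-then-predict cycle described above, here is a minimal Python sketch that learns from a handful of reviewer-coded documents and ranks unreviewed ones by likely relevance. It uses multinomial Naive Bayes, a simple classifier chosen here purely for illustration; commercial TAR platforms use their own, often proprietary, algorithms. The function names (tokenize, train, relevance_score) and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens; a deliberately simple stand-in for real text processing."""
    return re.findall(r"[a-z']+", text.lower())

def train(coded_sample):
    """Fit a multinomial Naive Bayes model from (text, is_relevant) pairs coded by a reviewer."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in coded_sample:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_score(model, text):
    """Log-odds that a document is relevant; higher means more likely relevant."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_odds = 0.0
    for label, sign in ((True, 1), (False, -1)):
        # Add-one (Laplace) smoothing so unseen words or classes never give log(0).
        log_p = math.log((doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                log_p += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_odds += sign * log_p
    return log_odds

# Hypothetical sample coded by a case expert (True = relevant).
coded_sample = [
    ("pricing agreement with competitor to fix rates", True),
    ("meeting notes on rate fixing with rival firm", True),
    ("office party catering menu", False),
    ("reminder to submit holiday requests", False),
]
model = train(coded_sample)

# Rank documents the reviewer has not yet seen, most likely relevant first.
unreviewed = [
    "catering order for the holiday party",
    "draft agreement to fix prices with rival",
]
ranked = sorted(unreviewed, key=lambda doc: relevance_score(model, doc), reverse=True)
for doc in ranked:
    print(f"{relevance_score(model, doc):+.2f}  {doc}")
```

In this toy run the price-fixing draft ranks above the catering order. In a real matter, the reviewer would typically code the top-ranked documents, feed those decisions back into training, and repeat, so that the model keeps learning from the reviewer’s coding.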

TAR offers several advantages over traditional approaches to document review. It provides metrics about a document population that a hit list from keyword searching does not. This can be extremely valuable for early case assessment, developing case strategy, and designing a more efficient and cost-effective review workflow. TAR also reduces the human bias inherent in keyword searches, whose terms rest on initial assumptions about the facts and evidence that often change throughout the disclosure process. The software can also be used to review document collections containing multiple languages consistently.
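As an example of the population metrics mentioned above, two measures commonly reported in TAR workflows are prevalence (the share of the collection that is relevant) and recall (the share of relevant documents the model actually finds). The Python sketch below estimates both from randomly sampled, human-coded documents. All counts are hypothetical, the function names are illustrative, and the normal-approximation confidence interval is a simplifying assumption; real validation protocols vary by case and by platform.

```python
import math

def estimate_proportion(hits, sample_size, z=1.96):
    """Point estimate and normal-approximation interval (about 95% for z=1.96)."""
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

def recall_and_precision(validation):
    """validation: (model_says_relevant, reviewer_says_relevant) pairs."""
    tp = sum(1 for pred, actual in validation if pred and actual)
    fn = sum(1 for pred, actual in validation if not pred and actual)
    fp = sum(1 for pred, actual in validation if pred and not actual)
    recall = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return recall, precision

# Hypothetical figures: a 1,000,000-document collection, and a random
# sample of 1,500 documents in which reviewers coded 150 as relevant.
collection_size = 1_000_000
prevalence, low, high = estimate_proportion(150, 1_500)
print(f"Prevalence {prevalence:.1%} (about {low:.1%} to {high:.1%})")
print(f"Estimated relevant documents: {prevalence * collection_size:,.0f}")

# Hypothetical validation sample comparing model predictions with reviewer coding.
validation = ([(True, True)] * 80 + [(False, True)] * 20
              + [(True, False)] * 30 + [(False, False)] * 370)
recall, precision = recall_and_precision(validation)
print(f"Recall {recall:.0%}, precision {precision:.0%}")
```

A keyword hit list reports only how many documents matched; figures like these give an estimate, with a margin of error, of how much relevant material exists and how much of it a given review is likely to capture.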

One of the biggest myths about TAR is that the technology threatens legal practice because machines are replacing lawyers. In reality, TAR is about bringing augmented intelligence into the legal process, with humans and machines working together. With the volume of data growing exponentially, linear human review of documents is difficult to achieve in legal cases without extreme cost, undue burden and lengthy timelines. But machines alone are not the answer: the usefulness and value of the output depend entirely on intelligent input and training from a human expert.

Understanding the technological tools available for analysing and reviewing large volumes of data is critical for survival in the age of Big Data. The sheer volume of data, and the variety of ways in which that data can now be transferred and received, add a layer of complexity to the review process that challenges traditional practices. While TAR may not be the appropriate tool in every case, knowing how and when to use TAR can provide a competitive advantage in this digital age.