What’s next for predictive coding?

Those of us in the eDiscovery industry recall that 2012 was declared the ‘year of predictive coding’. With a similar prediction made for 2013, many of us assumed predictive coding would continue to be the focus for corporate clients and become further ingrained in legal culture. However, even though adoption rates seem to have levelled off and widespread usage remains relatively low, several software companies are continuing to develop the next iteration of predictive coding technology for the discovery process. As it stands, even though we are many years away from artificial intelligence replacing virtually all of the human labour involved in the review process, it is likely that 2014 will mark another defining moment in the evolution of predictive coding.

Cost considerations aside, the perceived difficulty of use is a legitimate barrier in the market. Even though there is strong support for predictive technology in some legal circles, many of these lawyer-advocates already have a good understanding of technology and are outliers, constituting a discrete minority in the profession. Attendees of eDiscovery conferences will note that the audience is often very homogeneous. This is no mere coincidence; it reflects the reality that eDiscovery remains a niche practice, tangential to the merits of the case, and that interest in the topic among the Bar in general is limited. Naturally, eDiscovery advocates will argue that this is a myopic view on the part of our colleagues, as the impact of eDiscovery on the litigation process (and litigation budgets) cannot be overstated. Nonetheless, if predictive coding is to become standard practice, it must be more accessible to the majority of professionals who aren’t interested in calculating recall and precision or don’t know their confidence interval from their correlation coefficient.

One way to achieve this greater accessibility is to prove predictive coding’s worth beyond the mere culling of documents. There are many ways to reduce the volume of data subject to review, so this benefit alone is not enough to take predictive coding out of the province of massive cases and put the technology to work in everyday cases. Yet culling seems to be the focus of too many enterprises shopping for a predictive coding solution. After all, most of the predictive coding platforms on the market openly tout the elimination of first-pass review as their primary benefit. This is an excellent achievement, but it is also one that minimises the power of predictive coding by treating it simply as a strategy to defensibly delete sensitive information. There are additional ways to use the technology, but they require a rethinking of what predictive coding has come to mean in the first place. Instead of just being used to reduce the number of documents subject to review, predictive coding can be used in conjunction with a variety of advanced analytics to prioritise the review process, helping review teams understand the nuances of the case by giving them time to learn the facts behind it. That prioritisation also lets legal teams spend their time (and client resources) looking at the data that will actually impact the merits of the case. Predictive coding strategies should be wrapped in engaging visualisations to allow for advanced internal investigations that can quickly turn up areas of concern. These are the types of uses that corporate clients should expect from eDiscovery vendors if they want to get more out of their predictive coding technology.

To take predictive coding beyond a mere culling tool, it is critical that it be combined with other technologies and search methodologies so that it becomes more intuitive and, consequently, more useful to the user. Of course, for this approach to succeed, it must first defeat a point of view that is widespread in the market: the idea that random sampling is the only way to generate a training set for predictive coding.

The argument for using only random sampling to generate the initial seed set for predictive coding is often based on an assumption that the best results are unbiased. However, by treating the discovery process like a science experiment, litigators and vendors ignore the bias that occurs naturally, even if it does not arise through traditional processes. By basing a search on pre-existing information, rather than randomly looking through all the information, users are able to find what they’re looking for more efficiently, cutting down the time and cost of the process. This is not to say that a degree of randomness in the search process is unhelpful, but relying on nothing but random sampling only creates a false sense of statistical certainty that can belie the complexity of a data management process influenced by outside factors. Outside counsel and legal departments must stop buying into vendor claims that are based on lab work and instead favour solutions that reflect real-life needs and scenarios.

Additionally, predictive coding can achieve wider adoption by looking for additional use cases that convey its relevance to professionals besides general counsel and IT. One of these use cases is information governance. Recent survey data shows that information governance, although an important issue for corporations, has yet to become widespread. This is understandable given the time and financial commitment that comprehensive information governance programs require. However, their inherent value cannot be overstated; predictive coding-based workflows allow users to take what they learned through eDiscovery and use it for proactive information management. By merging predictive coding and information governance, “predictive governance” might be the most likely vehicle to take information governance out of the experimental area and make it more of a reality. After all, any governance program that relies strictly on custodians as the primary method of classification is unlikely to succeed in the long run.

Whatever the future may hold for predictive coding and the vendors selling these solutions, it is clear that the market for predictive coding is in the early stages of its evolution. Better user interfaces and experiences, more integrated technology suites, and automated workflows are all critical components of that evolution. In 2014, these technologies will be essential for companies looking to successfully integrate predictive coding as a larger part of their business, and for vendors wanting to expand their reach from the legal department to the boardroom.