ZyLAB, a leading provider of eDiscovery and information risk management solutions, today announced that the US Patent and Trademark Office (USPTO) has granted the company a patent for the exact and near de-duplication technology used in its eDiscovery solutions and Information Risk Management platforms.
Locating and removing exact and near-duplicates drastically reduces the volume of electronically stored documents in Big Data collections that humans have to search and review during an electronic discovery (e-discovery), regulatory audit or fraud investigation. It also decreases the risk that an outdated version of a document is found and used.
The patented technology covers the detection of duplicate and near-duplicate emails (properties, email body and attachments), electronic documents and other electronic content (all referred to as objects), the tagging of these potential duplicate and near-duplicate objects, and the visualization of these objects for the end user.
“Exact duplicates can be detected with ‘hashing techniques’, which identify documents that are exactly the same or have exactly the same document properties. This technique, however, cannot be used to identify near-duplicates, because even the smallest difference in a document produces a completely different hash code. Conversely, two almost identical hash values do not at all guarantee that the underlying documents are similar,” says Johannes Scholtes, Chief Strategy Officer of ZyLAB.
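As a rough illustration of the hashing approach described above, the Python sketch below groups documents whose full text produces the same SHA-256 digest. Changing a single character ("4%" to "5%") yields an unrelated digest, which is why hashing alone cannot find near-duplicates. The function names `content_hash` and `group_exact_duplicates` are hypothetical and not part of any ZyLAB product; a production system would presumably also hash attachments and document properties.

```python
import hashlib
from collections import defaultdict


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of the document text (UTF-8 encoded)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def group_exact_duplicates(docs: dict[str, str]) -> list[list[str]]:
    """Group document IDs whose text hashes to the same digest (hypothetical helper)."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        buckets[content_hash(text)].append(doc_id)
    # Only buckets with more than one member are duplicate groups
    return [sorted(ids) for ids in buckets.values() if len(ids) > 1]


docs = {
    "a": "Quarterly report: revenue rose 4%.",
    "b": "Quarterly report: revenue rose 4%.",
    "c": "Quarterly report: revenue rose 5%.",  # one character differs from "a"
}
print(group_exact_duplicates(docs))  # [['a', 'b']]
# "a" and "c" differ by one character, yet their digests share no visible structure
print(content_hash(docs["a"])[:16], content_hash(docs["c"])[:16])
```

Document "c" is not reported even though it is nearly identical to "a", which is exactly the limitation the quote describes.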
Common near-duplicate detection algorithms and methods require huge computational resources, because the required memory and number of calculations increase quadratically with the number of documents involved. Since email and hard-disk collections can include millions of documents, such de-duplication processing is computationally prohibitive.
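To make the quadratic cost concrete, here is a minimal sketch of a naive near-duplicate check. It assumes word 3-gram "shingles" and Jaccard similarity as the similarity measure, a common textbook choice and not necessarily what ZyLAB uses; all function names are hypothetical. Because every pair is compared, n documents need n(n-1)/2 comparisons, roughly 5 × 10^11 for one million documents.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Word k-grams ("shingles") of the lower-cased, whitespace-split text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|: 1.0 for identical shingle sets, 0.0 for disjoint ones."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def naive_near_duplicates(docs: dict[str, str], threshold: float = 0.5):
    """Compare every pair of documents; cost grows as n * (n - 1) / 2."""
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    pairs, comparisons = [], 0
    for x, y in combinations(sorted(sets), 2):
        comparisons += 1
        if jaccard(sets[x], sets[y]) >= threshold:
            pairs.append((x, y))
    return pairs, comparisons


docs = {
    "a": "the board approved the merger on friday after a long review",
    "b": "the board approved the merger on monday after a long review",
    "c": "lunch menu for the staff cafeteria this week",
}
print(naive_near_duplicates(docs, threshold=0.4))  # ([('a', 'b')], 3)
```

The `threshold` parameter plays the role of a user-settable similarity measure: "a" and "b" share 6 of 12 distinct shingles (similarity 0.5), so they are flagged at 0.4 but would not be at 0.6.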
Johannes Scholtes: “Under the new patent, ZyLAB’s R&D team developed a novel near de-duplication method with ‘computational linear behavior’ that bypasses the need to compare every single document with every other document. This makes the method and system much faster than conventional systems and methods.”
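The press release does not disclose how ZyLAB's patented method achieves linear behavior. For illustration only, the sketch below swaps in a different, well-known technique: MinHash signatures combined with locality-sensitive hashing (LSH) banding. Each document is processed once to build a short signature, and only documents that land in the same bucket are paired. Its cost grows roughly linearly when few documents share buckets, but it is probabilistic and can still degrade toward quadratic if many documents collide. `NUM_PERM`, `BANDS`, `ROWS` and all function names are assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# Illustrative MinHash + LSH sketch (NOT ZyLAB's patented method).
NUM_PERM = 64        # MinHash values per document (assumed)
BANDS, ROWS = 16, 4  # LSH banding; BANDS * ROWS must equal NUM_PERM


def shingles(text: str, k: int = 3) -> set[str]:
    """Word k-grams of lower-cased text, joined into strings."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(shingle_set: set[str]) -> list[int]:
    """Signature: the minimum hash value of the set under each seeded hash function."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(NUM_PERM)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching MinHash values estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(docs: dict[str, str]) -> set[tuple[str, str]]:
    """Return ID pairs that share at least one LSH bucket."""
    buckets: dict[tuple, list[str]] = defaultdict(list)
    for doc_id, text in docs.items():  # a single pass over the collection
        sig = minhash(shingles(text))
        for b in range(BANDS):
            band = tuple(sig[b * ROWS:(b + 1) * ROWS])
            buckets[(b, band)].append(doc_id)  # band index keeps bands separate
    pairs: set[tuple[str, str]] = set()
    for ids in buckets.values():  # only documents in the same bucket are paired
        pairs.update(combinations(sorted(ids), 2))
    return pairs


docs = {
    "a": "the board of directors approved the proposed merger with the supplier "
         "on friday after a long and detailed review of the contract",
    "b": "the board of directors approved the proposed merger with the supplier "
         "on friday after a long and detailed review of the contracts",
    "c": "lunch menu for the staff cafeteria this week includes soup and salad",
}
print(sorted(lsh_candidates(docs)))
```

With 16 bands of 4 rows, two documents with Jaccard similarity s become candidates with probability 1 - (1 - s^4)^16: about 64% at s = 0.5 and nearly certain above 0.9. Candidate pairs would then typically be verified with `estimated_jaccard` or an exact similarity check.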
Additional advantages of the new method are that exact and near-duplicates are recognized based on the full text, so users can set an understandable similarity measure to determine near- and exact-duplicates. The methodology is language- and domain-independent and also works well with text that is linguistically imperfect or contains errors.
ZyLAB’s industry-leading, modular eDiscovery and enterprise information management solutions enable organizations to manage boundless amounts of enterprise data in any format and language, to mitigate risk, reduce costs, investigate matters and elicit business productivity and intelligence.
The ZyLAB eDiscovery system is directly aligned with the Electronic Discovery Reference Model (EDRM), and the company’s products and services are used on an enterprise level by corporations, government agencies, courts and law firms, as well as on specific projects for legal services, auditing and accounting providers. ZyLAB systems are also available in a Software-as-a-Service (SaaS) model.
ZyLAB is positioned by Gartner, Inc. as one of the strongest “Visionaries” in the 2013 Magic Quadrant for eDiscovery Software and has received numerous other industry accolades over the last 3 decades.
Headquartered in McLean, Virginia, and Amsterdam, the Netherlands, ZyLAB also serves local markets from regional offices in New York, Barcelona, Frankfurt, London, Paris and Singapore. To learn more about ZyLAB, visit www.zylab.com.