How to investigate a firm with 60 million documents
Imagine having to search through all the documents, emails and messages of a huge multinational company.
Yousr Khalil does not have to imagine. The forensic accountant was part of a team that had to ferret out proof of wrongdoing at the aerospace giant Airbus after it admitted paying bribes via middlemen.
"Airbus was like a tower block with 900 apartments in it. We had to decide which ones we were going to go into and investigate," she says.
Ms Khalil works for FRA, a forensic investigation business that supports legal cases across the globe.
But this was a case apart and FRA's largest ever job.
In order to qualify for a Deferred Prosecution Agreement (DPA), Airbus opened up its operations to intense scrutiny in 2016.
The four-year project to root out corrupt practices helped Airbus reach the agreement with regulators in the UK, the US and France under which it paid €3.6bn (£3bn) of fines in recognition of acts of fraud and bribery.
Ms Khalil and a 70-strong team faced an ocean of files, transaction data and emails spanning worldwide activities, most of them entirely innocuous.
So how did they plot a course through?
Artificial intelligence (AI) and a bespoke computer unlike any PC you have ever worked on played a big part in this epic data trawl.
A daunting collection of 500 million documents and transactions had to be whittled down.
As data volumes grow exponentially, AI is being used more frequently in such investigations.
After duplicates and other irrelevant material were eliminated, the investigators were left with 60 million documents for review. AI searched these for patterns and spotted snippets that were out of place, such as a sports sponsorship deal for $100m.
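The article does not say how duplicates were removed. One common way to do it is to hash each document's content and keep only the first copy of each hash. The sketch below is illustrative only: the function name, the sample documents and the whitespace and case normalisation are all assumptions, not a description of FRA's tooling.

```python
import hashlib


def deduplicate(documents):
    """Keep the first document seen for each distinct content hash.

    `documents` is an iterable of (doc_id, text) pairs. Both the pair layout
    and the normalisation step are assumptions made for this sketch.
    """
    seen = set()
    unique = []
    for doc_id, text in documents:
        # Collapse whitespace and lower-case so trivially different copies match.
        normalised = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((doc_id, text))
    return unique


# Invented sample data: a2 is a near-identical copy of a1.
sample = [
    ("a1", "Quarterly sales report"),
    ("a2", "quarterly   sales report"),
    ("a3", "Sponsorship agreement draft"),
]
unique_docs = deduplicate(sample)
```

Exact hashing like this only catches copies that are identical after normalisation. Near-duplicates, such as the same email forwarded with a new signature, would need a fuzzier comparison.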
How were relationships with Airbus staff while all of this was going on? "No business is ever really ready for a full forensic investigation," Ms Khalil says, but her co-workers from Airbus were very responsive. "When the regulator pushed for a quick response on something they moved on it."
As if 60 million items were not enough of a challenge, 800 Airbus employees around the world were legally assigned as custodians of those documents.
"You might have information spread across different items of media, such as laptops, storage devices, USB drives etc. We had to identify who was the custodian of that data," says Greg Mason, founding partner and co-head of data analytics at FRA.
Seven secure investigation sites were set up. These allowed documents to be examined in complete security, a crucial point for Airbus. It is a vast business enmeshed with major European military aircraft projects. So the investigation had to devise a way to keep material that was nationally sensitive out of the picture.
Specialised software allowed the collection of information without seeing the entire document it came from, thus preserving secret defence information from prying eyes.
In addition, bespoke $100,000 computers, running multiple disks and with no connection to the internet, were used.
This is called air gapping, and it provides a definite divide between sensitive data and the outside world of the internet.
Processing a mountain of data gets easier and faster if it's treated as just that - data. FRA extracted the metadata, the information underlying every electronic document that defines what it is, and used this to index material so that irrelevant files could be stripped out.
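Filtering on metadata means deciding whether a file is in scope from its descriptive fields alone, without opening it. The sketch below shows the idea. The field names, file types, custodian labels and date range are hypothetical, chosen only for illustration, and do not reflect the actual scope of the Airbus review.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileMetadata:
    # Hypothetical fields; a real index would carry many more.
    path: str
    file_type: str
    custodian: str
    modified: date


def in_scope(meta, wanted_types, start, end):
    """Decide from metadata alone whether a file is worth keeping."""
    return meta.file_type in wanted_types and start <= meta.modified <= end


# Invented sample index.
index = [
    FileMetadata("mail/0001.eml", "email", "custodian-17", date(2012, 3, 4)),
    FileMetadata("sys/driver.dll", "binary", "custodian-17", date(2012, 3, 4)),
    FileMetadata("mail/0002.eml", "email", "custodian-42", date(1998, 1, 9)),
]

# Arbitrary example scope: emails and spreadsheets within a date window.
kept = [
    m for m in index
    if in_scope(m, {"email", "spreadsheet"}, date(2008, 1, 1), date(2016, 12, 31))
]
```

Here the system file is dropped because of its type and the 1998 email because of its date, so only one record is left for content review.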
AI formed the basis for this Technology Assisted Review (TAR).
AI was trained to search unstructured data such as emails. These are tough to scan, unlike structured data contained in forms and columns.
Using the principle of machine learning, whereby the AI software sees multiple examples of a particular type of message and begins to spot which category they belong to, FRA was able to extract relevant documents at pace. "The AI program looked for the context of messages, context is all," Mr Mason observes.
The software was hunting for bribes that were arranged via codes, such as a doctor prescribing a medicine. By running examples of this kind of hidden message the software acquired the concept of medicine and then the concept of prescription. This meant it could wade through unstructured data and spot corrupt practices.
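The idea of learning a category from labelled examples can be shown with a toy text classifier. The sketch below uses a simple naive Bayes word-count model, which is a stand-in: the article does not say which algorithm FRA used, and a real system would capture context far beyond individual words. The training messages and labels are invented.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per label from (text, label) training pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data, loosely echoing the "doctor and medicine" code.
model = train([
    ("the doctor will prescribe the medicine next week", "suspicious"),
    ("please confirm the prescription dose with the doctor", "suspicious"),
    ("agenda for the engineering meeting on monday", "routine"),
    ("updated delivery schedule for the wing assembly", "routine"),
])

coded = classify("doctor confirmed the medicine dose", model)
normal = classify("delivery schedule for the meeting", model)
```

With these four examples, the first test message is labelled suspicious and the second routine. Adding more labelled examples changes the word counts, which is the sense in which such a model keeps learning as reviewers confirm new cases.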
"As you identify more and more examples of covert payment the AI learns on the fly. That's the beauty and the magic of AI," says Mr Mason. A scoring system was set up, with points added for certain attributes. Any score above a certain number was deemed worthy of further investigation. The machine-learning technology became better and better as it progressed.
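A points-based scoring scheme of the kind described can be sketched in a few lines. The attribute names, point values and threshold below are invented for illustration, because the article does not disclose the real ones.

```python
# Hypothetical attributes and weights; the real scheme is not public.
ATTRIBUTE_POINTS = {
    "coded_language": 3,
    "payment_via_intermediary": 3,
    "round_sum_amount": 2,
    "sent_outside_office_hours": 1,
}

# Hypothetical cut-off: only scores strictly above this are escalated.
REVIEW_THRESHOLD = 5


def score(attributes):
    """Sum the points for every recognised attribute on a document."""
    return sum(ATTRIBUTE_POINTS.get(name, 0) for name in attributes)


def needs_review(attributes, threshold=REVIEW_THRESHOLD):
    """Flag a document for human review when its score exceeds the threshold."""
    return score(attributes) > threshold


flagged = needs_review({"coded_language", "payment_via_intermediary"})
not_flagged = needs_review({"coded_language", "round_sum_amount"})
```

The first document scores 6 and is escalated. The second scores exactly 5 and is not, because the rule is "above" the threshold rather than "at or above".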
Mr Mason reckons only about 5% of the documents set aside were checked by people, but that still amounts to three million files. "AI is not a panacea, but it is pretty extraordinary how it learns."
A statistician by training, he is impressed by how AI technology makes short work of big numbers. "Even a small case today comes with an enormous volume of data."
He had to sell the novel concept of the TAR to regulators such as the UK Serious Fraud Office (SFO) and get approval for what was not a traditional approach to an investigation. "This was the most complex investigation I had ever set up."
A four-year investigation sounds exhausting. But unmasking fraud with an AI assistant gave the team a lot of personal satisfaction.
And their labours received a legal seal of approval.
Dame Victoria Sharp, one of the most senior civil court judges in England and Wales, summed up the far-reaching impact of this investigation with its prominent role for AI.
Speaking for the British end of the tri-national case in January 2020, she declared that Airbus "truly turned out its pockets and is now a changed company to that which existed when the wrongdoing occurred".