It’s hard to think of JPMorgan Chase Bank — a behemoth with more than $3 trillion in total assets and a $415 billion market cap, the fifth biggest bank in the world — as a hapless victim deserving our sympathy. But here’s what happened, according to allegations from the DOJ, the SEC, and the bank itself.
In 2021, JPMorgan agreed to acquire college financial planning platform Frank for $175 million. It was buying user data — names, email addresses, and phone numbers — for 4.25 million real people, or so it believed.
The bank insisted on verifying Frank’s user list during due diligence. Frank’s 31-year-old founder and CEO, Charlie Javice, objected to disclosing it, citing user privacy.
Eventually the two sides compromised. Javice agreed to provide Frank’s user list to a third-party “validator” who would report its results directly to JPMorgan. The validator — Nasdaq-listed Acxiom, a big player in the data industry with more than 2,000 employees — reviewed the list. It reported to JPMorgan that “100 percent of the 4,265,085 entries it reviewed had data in the first name, last name, email address, and phone fields.”
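To see why that kind of report proves so little, it helps to spell out what a field-completeness check amounts to. The sketch below is a hypothetical illustration, not Acxiom’s actual process; the file name and column names are assumptions. All it confirms is that every row has something in each of the four fields, a bar that synthetic records clear just as easily as real ones.

```python
import csv

def completeness_report(path):
    """Count rows where all four identity fields are non-empty.

    Note: this checks only that data is *present* in each field.
    It says nothing about whether the people behind the rows exist.
    """
    total = complete = 0
    required = ("first_name", "last_name", "email", "phone")
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            if all((row.get(field) or "").strip() for field in required):
                complete += 1
    return total, complete

# Hypothetical file name; any well-formed list of synthetic records passes this test.
total, complete = completeness_report("user_list.csv")
print(f"{complete / total:.0%} of {total:,} entries had all four fields populated")
```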
What JPMorgan didn’t know was that Frank’s user list came from a data science professor Javice had hired. The professor used Frank’s list of fewer than 300,000 real users as a sample. From it he created synthetic data, and that’s mostly what the validator reviewed.
With the validator’s report in hand, JPMorgan closed the deal. Javice’s share of Frank’s $175 million sale price was supposed to be about $41 million.
Soon after closing, JPMorgan sent 400,000 emails to test the user list. The emails had dismal delivery and open rates, far below what JPMorgan typically saw in its other email campaigns.
JPMorgan shut Frank down in January 2023.
——
What is synthetic data? Unlike forged art or counterfeit money, it is produced and marketed by legitimate companies and used by reputable customers for lawful purposes.
Many companies need data in vast amounts to test algorithms, scalability, and machine-learning systems. When they can’t use real data — because it doesn’t exist yet, or there are privacy restrictions or proprietary concerns — they use synthetic data instead.
Telecoms and retailers use synthetic data to test customer-level systems. Health care companies use it to debug patient-tracking programs. Payment processors need synthetic data to dry-run anti-fraud protections before they go live. And so on.
Here’s the tricky part about synthetic data: quality matters. To be useful to legitimate buyers, synthetic data must mimic real data. It’s meant to fool the world’s smartest machines. As a result, the best synthetic data is extremely hard to differentiate from the real thing.
As MIT computer scientist Kalyan Veeramachaneni puts it: “You can take a phone number and break it down. When you resynthesize it, you’re generating a completely random number that doesn’t exist. The result is a data set that contains the general patterns and properties of the original — which can number in the billions — along with enough ‘noise’ to mask the data itself.”
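Veeramachaneni is describing the general idea, and a toy version of it fits in a few lines. The sketch below is my own illustration, not anything from the Frank case or from MIT’s tooling: it takes a tiny invented sample, keeps surface patterns such as the area-code mix and name frequencies, and emits records that match the format of real data while describing no actual person.

```python
import random

# A tiny invented "real" sample, standing in for the seed data a generator learns from.
SAMPLE = [
    {"first": "Dana", "last": "Ortiz", "phone": "212-555-0143"},
    {"first": "Miguel", "last": "Chen", "phone": "415-555-0198"},
    {"first": "Priya", "last": "Kaur", "phone": "212-555-0111"},
]

def synthesize(n):
    """Emit n records that mimic the sample's patterns but describe no real person."""
    area_codes = [r["phone"].split("-")[0] for r in SAMPLE]   # keep the sample's area-code mix
    firsts = [r["first"] for r in SAMPLE]
    lasts = [r["last"] for r in SAMPLE]
    records = []
    for _ in range(n):
        first, last = random.choice(firsts), random.choice(lasts)
        records.append({
            "first_name": first,
            "last_name": last,
            "email": f"{first}.{last}{random.randint(1, 999)}@example.com".lower(),
            "phone": f"{random.choice(area_codes)}-555-{random.randint(0, 9999):04d}",
        })
    return records

for row in synthesize(3):
    print(row)
```

Records like these would sail through the completeness check sketched earlier, which is exactly the trouble.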
——
Here the due diligence problem becomes apparent. Even high-level data analytics may not detect that a data set is synthetic. And anyway, how deep into verification can potential buyers go? Pre-acquisition reviews often run into data-protection and privacy concerns, legal restrictions like GDPR, and reluctance to expose proprietary data.
What’s the answer? Should all synthetic data producers tag their products with disclaimers? Is that feasible or would electronic watermarks render synthetic data unfit for purpose? What about government regulation and oversight — an FDA for synthetic data? Would that keep synthetic data out of unscrupulous hands, or would it drive more criminality underground?
And this: How many more times will synthetic data confound due diligence? Good question. The emerging synthetic data industry is more competitive and sophisticated every day, and we’re still at the start of the journey. No one knows where it’s all going but there are bound to be surprises.
——
Earlier this month, the DOJ charged Charlie Javice with four fraud counts, three punishable by up to 30 years in prison and one by up to 20. The SEC added civil fraud charges and is asking for disgorgement and civil penalties.
JPMorgan filed a civil lawsuit against Javice in January.
(Neither the data science professor Javice allegedly hired to produce the synthetic user list nor the validator of the list, Acxiom, is named as a defendant by the DOJ, the SEC, or JPMorgan.)
Javice — who appeared on the Forbes “30 under 30” list in 2019 — counter-sued JPMorgan. She says the bank “manufactured” reasons to fire her and avoid paying money she’s due from the sale of Frank.
Javice’s lawyer, Alex Spiro, cast her as the victim. He told the Wall Street Journal in January: “After JPM rushed to acquire Charlie’s rocketship business, JPM realized they couldn’t work around existing student privacy laws, committed misconduct, and then tried to retrade the deal. Charlie blew the whistle and then sued.”
At publication time, Javice — who’s presumed innocent unless convicted in a court of law — hasn’t entered a plea to the DOJ’s criminal charges and is free on $2 million bond. She hasn’t responded in court yet to the SEC’s civil fraud complaint.
In the lawsuit with JPMorgan, she asked the judge to dismiss the case and compel arbitration of both parties’ claims under provisions of her agreements with the bank.