David Tracy: Unlocking the power of synthetic data for good
David Tracy discusses the potential of synthetic data generation in overcoming data access, privacy, and compliance issues, and fostering innovation in the financial sector, with a focus on the role of Smart Data Foundry’s synthetic data engine, aizle.
Good quality data is critical to financial innovation, but we know that data access, permissions and security are the biggest factors holding innovators back.
The use of real-world data comes with huge responsibility and regulation, which can create complexities when problem solving. This is where we believe synthetic data generation can really make a difference and help tackle some of the biggest issues facing the financial sector.
It is also an area gaining increasing attention, as rapid progress in machine learning and AI, combined with ever-increasing computing power, means it’s possible to create higher quality artificial data than ever before.
Whilst real-world data contains information about real people, entities, real events and real interactions, synthetic data provides the same accurate and reliable insight, except some or all of the people, entities, events, or interactions are artificial.
This makes synthetic data more accessible than real-world data, because it removes the complexities around privacy and compliance. It can facilitate use cases that would otherwise be problematic, enable applications that would otherwise be impossible, or augment real-world data to enable analyses that would otherwise be more difficult.
This could include, for example, developing a rapid prototype, training an AI model, or running scenarios on the strategic impact of a new initiative.
At Smart Data Foundry, we think of three factors when assessing how ‘good’ synthetic data is: fidelity (how similar the synthetic dataset is to the ‘real-world’ data), privacy (the risk of identifying real people) and utility (the ‘usefulness’ of the synthetic data).
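To make the three factors concrete, here is a minimal sketch of how each might be scored on toy data. It is an illustration only and not how aizle or Smart Data Foundry measure quality: the function names (`fidelity_score`, `exact_match_rate`, `utility_gap`) and the choice of metrics are assumptions made for this example, and each metric is a deliberately crude proxy.

```python
from collections import Counter


def fidelity_score(real_values, synthetic_values):
    """Fidelity proxy: 1 minus the total variation distance between two
    categorical distributions (1.0 = identical, 0.0 = no overlap)."""
    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)
    categories = set(real_counts) | set(synth_counts)
    tvd = 0.5 * sum(
        abs(real_counts[c] / len(real_values) - synth_counts[c] / len(synthetic_values))
        for c in categories
    )
    return 1.0 - tvd


def exact_match_rate(real_records, synthetic_records):
    """Privacy proxy: share of synthetic records that are identical to a
    real record (lower suggests less risk under this crude measure)."""
    real_set = set(real_records)
    return sum(record in real_set for record in synthetic_records) / len(synthetic_records)


def utility_gap(real_amounts, synthetic_amounts):
    """Utility proxy: relative error of an analysis (here, just the mean)
    when it is run on synthetic data instead of real data."""
    real_mean = sum(real_amounts) / len(real_amounts)
    synth_mean = sum(synthetic_amounts) / len(synthetic_amounts)
    return abs(synth_mean - real_mean) / abs(real_mean)


if __name__ == "__main__":
    # Hypothetical toy data: payment channel and amount per record.
    real = [("card", 20), ("card", 35), ("transfer", 500), ("cash", 10)]
    synthetic = [("card", 22), ("card", 35), ("transfer", 480), ("transfer", 15)]

    print(fidelity_score([r[0] for r in real], [s[0] for s in synthetic]))
    print(exact_match_rate(real, synthetic))
    print(utility_gap([r[1] for r in real], [s[1] for s in synthetic]))
```

In practice the three factors pull against each other: copying real records would score perfectly on fidelity and utility here while failing badly on privacy, which is why they are assessed together.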
Our synthetic data engine, aizle, produces customised synthetic data sets that contain the important and meaningful features of real-world data without requiring any real-world input data to generate them, removing privacy and other data risks.
Based on this approach, the engine can provide data where no real data is available or can provide alternative data when available data is inadequate or has bias, enabling innovation in areas previously thought too difficult or expensive to explore.
So how does this work in practice? Towards the end of 2022, we provided customised datasets to drive such innovation, working with the Financial Conduct Authority (FCA) and Payment Systems Regulator (PSR) to innovate in the area of Authorised Push Payment Fraud, one of the fastest growing financial crimes in the UK.
The data environment around this type of FinCrime is complex, combining banking systems, telecommunications systems and criminal data around the outcomes of scam reporting. In reality, the many regulations in play, from organisations including the FCA, the Information Commissioner’s Office, the PSR and Ofcom, would limit access to the data needed to innovate.
This is an example of where synthetic data shines. We used our aizle agent-based simulation approach to build a world in which these scams manifest, and then created synthetic criminals to interact with that world. This generated a detailed, customised dataset that was worked on by financial service providers, innovators, academics and regulators during a three-day TechSprint.
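For readers unfamiliar with agent-based simulation, the general idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not aizle’s implementation: the `Customer` and `Scammer` classes, the `simulate` function and every probability and amount range below are hypothetical values invented for the example. No real-world data goes in; the dataset is produced entirely by simulated agents acting on simple rules.

```python
import random
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    balance: float
    susceptibility: float  # assumed chance of falling for a single scam attempt


@dataclass
class Scammer:
    scammer_id: str
    attempts_per_day: int


def simulate(n_customers=50, n_scammers=3, n_days=5, seed=0):
    """Run a toy world of customers and scammers; return labelled transactions."""
    rng = random.Random(seed)  # seeded so the same inputs give the same dataset
    customers = [
        Customer(f"C{i:03d}", round(rng.uniform(100, 5000), 2), rng.uniform(0.0, 0.2))
        for i in range(n_customers)
    ]
    scammers = [Scammer(f"S{i:02d}", rng.randint(1, 5)) for i in range(n_scammers)]
    transactions = []

    for day in range(n_days):
        # Ordinary behaviour: some customers pay another customer each day.
        for payer in customers:
            if payer.balance > 10 and rng.random() < 0.5:
                payee = rng.choice([c for c in customers if c is not payer])
                amount = round(rng.uniform(1, min(200, payer.balance)), 2)
                payer.balance -= amount
                payee.balance += amount
                transactions.append({
                    "day": day, "payer": payer.customer_id,
                    "payee": payee.customer_id, "amount": amount, "is_scam": False,
                })

        # Criminal behaviour: each scammer targets random customers; a
        # successful attempt results in a payment the victim authorises.
        for scammer in scammers:
            for _ in range(scammer.attempts_per_day):
                target = rng.choice(customers)
                if target.balance > 10 and rng.random() < target.susceptibility:
                    amount = round(rng.uniform(0.3, 0.9) * target.balance, 2)
                    target.balance -= amount
                    transactions.append({
                        "day": day, "payer": target.customer_id,
                        "payee": scammer.scammer_id, "amount": amount, "is_scam": True,
                    })

    return transactions


if __name__ == "__main__":
    data = simulate()
    print(len(data), "transactions,", sum(t["is_scam"] for t in data), "labelled as scams")
```

Because the scam label comes from the simulation itself, every fraudulent payment in the output is known with certainty, something that is rarely true of real fraud data.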
We’re only starting to see the potential of synthetic data being realised, and its rise is reflected in market growth. Gartner estimates that by 2024, 60 per cent of data for AI applications will be synthetic, and the total of publicly-known funding for synthetic data companies reached $328 million in October 2022, $275 million more than in 2020.
In terms of open banking and the fintech community, helping organisations that have real issues around data availability take those first steps to creating specific synthetic data sets could be a real game changer, particularly when we start to see synthetic data driving new propositions from innovators that will benefit customers, businesses and wider society.
David Tracy is head of data products at Smart Data Foundry