By Dr. MuckAI Girish
(MuckAI Girish is the co-founder & CEO of the Generative AI company, Rockfish Data, which is developing a state-of-the-art Synthetic Data Workbench Software for Data Scientists. The views in this column are his own)
Artificial Intelligence (AI) is at its prime. According to Forrester, “Firms that actively harness generative AI (genAI) to enhance experiences, offerings, and productivity will realize outsized growth and will outpace their competition.”
A majority of enterprises are now working on genAI projects, albeit at varying levels. AI practitioners realize that extracting real value from their models requires creating scenarios, training the models across them, and measuring the outcomes to tune the models. One of the biggest hurdles in scenario analysis is the lack of access to relevant data.
For example, the historical financial transactions available to a credit card company may not reflect the full range of fraud levels that are possible across different customer segments. However, to train a fraud detection and prediction model effectively, it is imperative to have a variety of scenarios play out during training. Reducing false negatives, especially among certain customer segments, and reducing false positives in general would make such a model more robust and accurate.
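To make the two error types concrete, here is a minimal Python sketch that computes false negative and false positive rates per customer segment. The record layout (a segment label, the true fraud flag, and the model's prediction) and the function name are assumptions made for illustration; they do not come from any particular fraud model or platform.

```python
from collections import defaultdict


def error_rates_by_segment(records):
    """Compute per-segment error rates.

    records: iterable of (segment, is_fraud, predicted_fraud) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for segment, is_fraud, predicted_fraud in records:
        if is_fraud:
            key = "tp" if predicted_fraud else "fn"  # missed fraud is a false negative
        else:
            key = "fp" if predicted_fraud else "tn"  # flagged legitimate is a false positive
        counts[segment][key] += 1

    rates = {}
    for segment, c in counts.items():
        fraud_total = c["tp"] + c["fn"]
        legit_total = c["fp"] + c["tn"]
        rates[segment] = {
            # None means the segment has no examples of that class,
            # so the rate cannot be measured from this data.
            "false_negative_rate": c["fn"] / fraud_total if fraud_total else None,
            "false_positive_rate": c["fp"] / legit_total if legit_total else None,
        }
    return rates
```

A rate of None for a segment signals exactly the data gap described above: with no fraud examples in that segment, the historical data cannot say how often the model would miss fraud there.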
Amplifying the fraud level for certain customer segments is a non-trivial task, especially if all the other dependencies and statistical characteristics have to be maintained. A versatile synthetic data platform can be used to make this happen.
The first step is to ingest the historical data into the platform. If needed, PII (personally identifiable information) can be redacted during this phase. The synthetic data platform then trains its model using this data.
Then, this model can be used to generate synthetic data, in virtually unlimited quantity. During the generation phase, a comprehensive synthetic data platform would allow the user to specify conditions to create various scenarios. For example, the user may want the fraud level to be 10x for customers with, say, FICO scores of 760 and above, or the number of transactions to be 0.5x for customers during a specific weekend. The specific conditions depend on the scenario analysis the user is planning, which in turn is used to experiment with the model or the outcome.
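As a rough illustration of what such a condition means, the Python sketch below raises the fraud rate within one segment by a chosen factor while keeping the segment's size unchanged. It is a deliberately naive stand-in that resamples existing rows rather than using a trained generative model, so it does not preserve the other dependencies and statistical characteristics the way a synthetic data platform is meant to. The field names `fico` and `is_fraud` and the function name are hypothetical.

```python
import random


def amplify_fraud_rate(rows, in_segment, factor, seed=0):
    """Return a new list of rows in which the fraud rate inside the chosen
    segment is multiplied by `factor` (capped at 100%).

    rows: list of dicts with a boolean "is_fraud" field (assumed layout).
    in_segment: function that returns True for rows in the target segment.
    Naive resampling with replacement, not a generative model.
    """
    rng = random.Random(seed)
    segment = [r for r in rows if in_segment(r)]
    others = [r for r in rows if not in_segment(r)]
    fraud = [r for r in segment if r["is_fraud"]]
    legit = [r for r in segment if not r["is_fraud"]]

    # Without both classes in the segment there is nothing to rebalance.
    if not fraud or not legit:
        return list(rows)

    target_rate = min(1.0, factor * len(fraud) / len(segment))
    n_fraud = round(target_rate * len(segment))
    n_legit = len(segment) - n_fraud

    # Draw with replacement so the segment keeps its original size.
    resampled = rng.choices(fraud, k=n_fraud) + rng.choices(legit, k=n_legit)
    return others + resampled


def high_fico(row):
    """Example condition from the text: FICO score of 760 and above."""
    return row["fico"] >= 760
```

Calling `amplify_fraud_rate(rows, high_fico, factor=10)` would give the 10x scenario from the example above for this toy setup. Rows outside the segment are passed through untouched, and the resampled rows are repeated references to the originals, not new records.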
Let us look at another example, in which a financial lender is caught between the need to lend more, lend fairly, and positively impact underserved markets, all while maximizing profits and minimizing risk. This is a non-trivial problem to solve. However, with effective scenario analysis, the lender may be able to find the sweet spots that meet its objectives without being shackled by the constraints. Generating relevant and realistic synthetic data to represent scenarios of interest gives the lender the ability to identify the boundaries and the variables, and to determine the characteristics that matter. Synthetic data again comes to the rescue by creating the relevant datasets for scenario analysis.
According to Ian Goodfellow, “Generative models are a key enabler of machine creativity, allowing machines to go beyond what they’ve seen before and create something new.”
By augmenting existing data with carefully selected and generated synthetic data, enterprises can explore beyond what they have seen before, study scenarios and tune and train their models effectively. By unleashing the true power of synthetic data, we can accelerate our journey through the genAI curve.