Mostly AI launches open-source toolkit for synthetic data

Thu, 6th Feb 2025

Austrian synthetic data firm MOSTLY AI has released the world's first industry-grade open-source toolkit designed to produce synthetic data for AI training purposes.

The toolkit gives developers, businesses, and enterprises the means to generate synthetic data from their existing data in-house, without needing to enlist services from MOSTLY AI. With data privacy concerns increasingly inhibiting AI training, the toolkit is available at no cost and is intended to support further AI product development.

Elon Musk recently highlighted that the AI sector faces a shortage of data for training large models, while privacy and security issues around customer data remain significant barriers to AI advancement within enterprises. MOSTLY AI has responded by making its state-of-the-art technology widely accessible.

Founded in 2017, MOSTLY AI is known for its expertise in privacy-preserving synthetic data and has secured US$31 million in funding, including a US$25 million Series B round in 2022. The firm's clientele includes major global institutions such as Citi Bank, the U.S. Department of Homeland Security, and Telefonica.

By 2026, 75% of businesses are projected to use generative AI to create synthetic customer data, up from 5% in 2023. Reflecting this growing demand, the UK's AI Opportunities Action Plan encourages the exploration of synthetic data as a way to construct privacy-preserving datasets.

The synthetic data produced by MOSTLY AI preserves the statistical and analytical value of the real data it is modelled on, yet poses no privacy risks as it contains no personal identifiers. This allows the data to be shared freely within and between organisations, facilitating new AI developments.

MOSTLY AI believes that synthetic data offers a solution to the constraints currently affecting AI innovation.

Alexandra Ebert, Chief AI and Data Democratization Officer at MOSTLY AI, described the challenge facing many enterprises: "Enterprises across the world are stuck between a rock and a hard place. They know they need to rapidly innovate their AI capabilities to stay ahead of the curve – but they're forced to lock up the customer data needed to do that for fear of breaking data privacy regulations."

She further commented on the broader impact on both corporate and societal levels: "Both among C-suite executives and society more widely, the huge potential of AI to move the dial on some of the world's most intractable issues is being held back by the inability of organisations to use their proprietary, sensitive data for AI training and development."

Explaining the motivation behind this open-source release, Ebert said: "Our mission has always been to empower every business and every individual with safe access to data. With the open-source release of our industry-proven synthetic data toolkit, we can unlock the ability of all businesses to harness the full power of their proprietary data with zero compromises on privacy."
