Sentra Claims New AI Engine Classifies Unstructured Data with 99% Accuracy, Cuts Costs 10x
There is a quiet crisis unfolding in the AI boom. In the push to wire GenAI into every workflow, companies are quietly giving these systems access to massive stores of unstructured data, most of which they barely remember creating: slide decks, old contracts, customer support tickets, internal emails, and more. Much of it is fed straight into AI systems, and once it is in the loop, it is difficult to pull back out.
That’s the problem Sentra is stepping into with its latest release. The company has developed a new unstructured data classifier built on Small Language Models (SLMs): compact, domain-aware models. Sentra claims the classifier can analyze petabytes of enterprise data with 99% accuracy, at far lower compute cost than traditional LLMs. This goes beyond optimization; it suggests a structural shift in how organizations can secure the most overlooked, most dangerous part of their AI stack.
We know that unstructured data has become a threat multiplier. It now makes up nearly 80% of the data created inside modern enterprises, and most of it is generated in ways that never pass through formal oversight. It spreads through an organization faster than any team can meaningfully track. Even the security controls designed for structured systems simply don’t reach this part of the environment.
GenAI raises the stakes even more. Models don’t separate polished documents from stray thoughts buried in old directories; they treat the entire 80% as usable material. That’s how internal discussions, sensitive project details, or early-stage ideas can begin influencing outputs without anyone realizing how they entered the system. When the whole unstructured layer slips into the AI pipeline, the risk becomes too big to ignore.
Legacy Data Security Posture Management (DSPM) tools were built to help companies find and protect sensitive data across their cloud environments. To be fair, they have done a decent job with structured data such as databases and spreadsheets, where the contents are predictable and easy to read; for that kind of data, the legacy tools usually work well enough.
Unstructured data, however, is a different beast, and most older DSPM tools simply aren’t built to deal with it. They lean on pattern matching or keyword rules, which either miss the important material or drown teams in noise. Some vendors have tried plugging in large language models to fix this, but those models are expensive to run and too broad to really understand what sensitive data looks like inside a specific company.
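To see why rule-based scanning breaks down, here is a minimal sketch of the approach, assuming a generic regex rule set rather than any vendor's actual detection logic:

```python
import re

# A simplified keyword/regex rule set of the kind legacy DSPM scanners
# rely on (illustrative only, not any vendor's real rules).
RULES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "confidential": re.compile(r"\bconfidential\b", re.IGNORECASE),
}

def classify(text: str) -> list[str]:
    """Flag a document if any pattern matches; context plays no role."""
    return [name for name, rule in RULES.items() if rule.search(text)]

# False positive: a shipment reference that merely looks like an SSN.
print(classify("Shipment ref 123-45-6789 delayed at customs"))     # ['ssn']

# False negative: genuinely sensitive content with no trigger keyword.
print(classify("Draft terms for the Project Falcon acquisition"))  # []
```

Both failure modes compound at petabyte scale: the first buries security teams in alerts, while the second lets sensitive material flow into AI pipelines unflagged.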
According to Sentra, the core difference lies in how its system interprets context. Instead of relying on general-purpose LLMs or brittle regex rules, the company says its platform uses a set of SLMs that lets it go beyond surface-level matching and infer the role and sensitivity of a document from its actual content and usage.
That includes segmenting data by department, geography, and ownership, and detecting proprietary information like internal product names or early-stage research drafts. Over time, the company says the models adapt to each organization’s data environment by learning new patterns and refining classification without requiring repeated full scans.
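Sentra has not published implementation details, so the following is only a rough sketch of what content-based classification can look like in principle, using an off-the-shelf zero-shot model as a stand-in for a domain-tuned SLM (the model choice and label set are illustrative assumptions, not Sentra's):

```python
from transformers import pipeline

# Off-the-shelf zero-shot classifier standing in for a domain-tuned SLM.
# Sentra has not disclosed its actual models, labels, or training data.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

labels = [
    "internal financial projection",  # hypothetical label set; a real
    "legal contract",                 # deployment would derive these from
    "customer support ticket",        # the organization's own taxonomy
    "public marketing copy",
]

doc = "Q3 revenue outlook for the unannounced Falcon product line."
result = classifier(doc, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 2))
# A context-aware model can rank this snippet as internal financial
# material even though it contains no regex-style trigger pattern.
```

In this framing, "learning new patterns" would amount to periodically refining such models on newly observed document types rather than rescanning everything from scratch.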
Sentra positions this as a way to bring structure and policy enforcement to unstructured data without introducing significant compute overhead. The engine runs without moving or copying data, supports over 70 languages, and is designed to improve classification precision over time — assuming the company’s claims hold up in practice.
The company also cites performance benchmarks to support those claims. It reports combined false positive and false negative rates under 1%, compared with Cyera’s reported 93% accuracy (an error rate above 5%). Scanning 100 petabytes costs around $40,000 on Sentra’s system, versus $400,000 or more for LLM-based platforms. In one example, Sentra says it processed 9 PB in under 72 hours, while competing tools took more than 600 hours or failed to finish at all.
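Taking the vendor's figures at face value, the headline numbers are straightforward to work out:

```python
# All inputs below are Sentra's own reported figures, not independent tests.
sentra_cost, llm_cost = 40_000, 400_000   # USD to scan 100 PB
print(llm_cost / sentra_cost)             # 10.0  -> the claimed 10x savings
print(sentra_cost / 100)                  # 400.0 -> about $400 per petabyte

petabytes, hours = 9, 72
print(petabytes * 1024 / hours)           # 128.0 -> ~128 TB/hour throughput
```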
Sentra’s approach reflects a broader shift in data governance: moving away from static rule sets and toward adaptive and context-aware systems that can scale with how businesses actually work. Whether that vision delivers across complex real-world environments remains to be seen. But early adopters appear optimistic.
“What impressed us most is how Sentra’s AI classification engine continuously learns from our environment. It not only identifies sensitive financial data at scale but evolves with our business — giving us confidence to safely integrate GenAI into more of our workflows,” said Zachary Schulze, Sr. Staff Application Security Engineer, SoFi.
From the company’s perspective, the goal is to provide a technical foundation for more secure and scalable AI integration. “Generative AI offers incredible potential, but it also introduces new data risks if sensitive or proprietary information isn’t properly governed,” said Yair Cohen, VP of Product and Co-Founder of Sentra. “Our AI classification engine provides unmatched scale, accuracy, context and adaptability — empowering enterprises to innovate with GenAI safely, responsibly and cost-effectively.”
With unstructured data increasingly flowing into AI pipelines, how, and how well, organizations manage that risk may prove to be one of the defining challenges of enterprise AI in the years ahead.
