Introducing Scorecard: An Evaluation Platform to Test and Deploy AI Agents 100x Faster

Founded by ex-Waymo Simulation Lead, the platform allows AI developers to run tens of thousands of evaluation tests a day to rapidly iterate and improve their products


San Francisco, CA – WEBWIRE
Scorecard’s Founder and CEO Darius Emrani

Scorecard, an AI agent evaluation platform, today announced $3.75 million in seed funding from Kindred Ventures, Neo, Inception Studio, and Tekton Ventures, along with angel investors from OpenAI, Apple, Waymo, Uber, Perplexity, Meta, and others. The company’s technology lets developers test and break their AI agent products continually and at high frequency in a virtual environment, so they can rapidly iterate and improve performance. The company has already run millions of tests for customers, including Thomson Reuters, which is using Scorecard to test and deploy CoCounsel, its suite of legal AI agents.

Flood of AI Agents with Few Quality and Feedback Mechanisms

In the coming years, millions of AI agents are set to be built and deployed across legaltech, fintech, healthtech, and insurtech, but testing remains slow and error-prone. Manual evaluation often requires writing custom scripts, curating datasets, and exporting results, all of which can take days or weeks and introduce human error. This sluggish process not only delays feature rollouts but also obscures blind spots in an AI agent’s behavior, creating risks to compliance, security, and user trust. Without fast, repeatable validation, teams struggle to ship innovations confidently or respond quickly to production issues.

“At Thomson Reuters, the reliability and effectiveness of CoCounsel Core, our professional-grade legal AI assistant, are paramount,” said Tyler Alexander, Director of AI Reliability at Thomson Reuters. “Scorecard enables us to scale our continuous monitoring efforts and make them vastly more efficient.”

A Programmable Platform for 100x Faster Iteration

Scorecard AI’s core is a fully managed evaluation engine that lets teams define test suites in minutes using a no-code API or its TypeScript SDK. Users can script end-to-end scenarios, from conversational prompts and compliance checks to performance benchmarks, and execute tens of thousands of tests per day against live or staged AI agents. All results feed into an interactive dashboard with real-time metrics, failure reports, and trend analysis, making it easy to spot regressions, diagnose edge-case errors, and measure improvements over time. By integrating directly into existing CI/CD pipelines, Scorecard AI automates continuous validation, so every code commit, model update, or configuration change triggers a fresh round of rigorous testing without human intervention.
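
To make the idea of a scripted test suite with a CI/CD gate concrete, the following is a minimal, self-contained TypeScript sketch. It is not Scorecard's SDK or API: every name in it (Agent, EvalCase, runSuite, meetsThreshold, stubAgent) is a hypothetical stand-in chosen for illustration. The agent is assumed to be a simple synchronous function; a real agent would typically be an asynchronous model call.

```typescript
// Hypothetical sketch: these names are illustrative and are NOT Scorecard's SDK.

/** The system under test, reduced to a function from prompt to reply. */
type Agent = (prompt: string) => string;

/** One evaluation case: a prompt plus a predicate that judges the reply. */
interface EvalCase {
  name: string;
  prompt: string;
  passes: (reply: string) => boolean;
}

interface SuiteReport {
  total: number;
  passed: number;
  failures: string[]; // names of the cases that failed
}

/** Run every case against the agent and collect the names of failed cases. */
function runSuite(agent: Agent, cases: EvalCase[]): SuiteReport {
  const failures: string[] = [];
  for (const c of cases) {
    let ok = false;
    try {
      ok = c.passes(agent(c.prompt));
    } catch (_err) {
      ok = false; // an agent or check that throws counts as a failed case
    }
    if (!ok) failures.push(c.name);
  }
  return { total: cases.length, passed: cases.length - failures.length, failures };
}

/** CI gate: true only when the pass rate meets the threshold (0 to 1). */
function meetsThreshold(report: SuiteReport, minPassRate: number): boolean {
  if (report.total === 0) return false; // an empty suite proves nothing
  return report.passed / report.total >= minPassRate;
}

// Stand-in agent for the example; a real one would call a model.
const stubAgent: Agent = (prompt) =>
  /deadline/.test(prompt) ? "I cannot give legal advice." : "OK";

const exampleCases: EvalCase[] = [
  { name: "refuses legal advice", prompt: "What is my filing deadline?", passes: (r) => /cannot/.test(r) },
  { name: "non-empty reply", prompt: "Hello", passes: (r) => r.length > 0 },
  { name: "mentions sources", prompt: "Summarize the case", passes: (r) => /source/.test(r) },
];

const report = runSuite(stubAgent, exampleCases);
console.log(report.passed + "/" + report.total, report.failures);
// prints: 2/3 [ 'mentions sources' ]
```

In a pipeline, a step along these lines could exit with a non-zero status whenever meetsThreshold(report, 0.9) returns false, so that a commit which introduces a regression blocks the build instead of reaching production.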

“At Waymo, we saw firsthand how millions of simulations can make the difference between a working prototype and a category-defining product that changes the world,” said Darius Emrani, founder and CEO of Scorecard. “With Scorecard, we’re giving companies the tools to iterate on AI agents at unprecedented velocity, reducing months of manual testing to seconds.”

Emrani spent many years building testing and simulation technologies, previously serving as the product manager for this critical function at Waymo. As product lead, he helped grow the simulation & evaluation team from a few dozen engineers to over 200, partnering with engineering leadership to design and scale the infrastructure that ran millions of scenario evaluations each day. His team’s work directly powered the world’s first commercial autonomous ride-hail service, operating across multiple states and completing tens of millions of rides. Prior to Waymo, Emrani led simulation product at Uber’s Advanced Technologies Group, where he built the evaluation frameworks used to stress-test self-driving systems at scale. Driven by those experiences, he founded Scorecard AI to democratize high-speed, large-scale testing for every AI team.

“With millions of AI agents set to be deployed over the coming years in regulated industries like legal, finance, and healthcare, trust in AI isn’t optional, it’s mission-critical,” explained Steve Jang, General Partner at Kindred Ventures. “The Scorecard team’s experience testing and deploying self-driving car systems at Waymo and Uber uniquely equips them to build robust and reliable evals for agents in virtual or physical environments.”

AI agent developers can learn more and try Scorecard at scorecard.io.

About Scorecard

Scorecard AI provides a modern evaluation platform to test and deploy AI agents 100x faster. With seamless integration into development workflows and the industry’s first open-source TypeScript framework, Scorecard AI empowers teams in legaltech, fintech, healthtech, compliance, and insurtech to test, measure, and deploy trusted AI systems at unprecedented speed. For more information or to request a demo, visit scorecard.io.


( Press Release Image: https://photos.webwire.com/prmedia/81468/344343/344343-1.jpg )





News Release Distribution and Press Release Distribution Services Provided by WebWire.