In April 2025, MLtwist and Vectara hosted the AI Data Readiness Forum—a focused, in-person event in Palo Alto that brought together leaders at the forefront of generative AI data strategy.
In this panel, practitioners from Overstock, Google, and MLtwist share how their teams are tackling the messy realities of AI data preparation, from building scalable pipelines to managing RAG sprawl and aligning infrastructure with business goals. Whether you’re refining your data stack or just starting your AI journey, the insights shared here offer a grounded look at what it really takes to operationalize AI today.
Title: AI Data Readiness Forum: AI Data Preparation Insights
Hosts: Vectara / MLtwist
Panelists:
Ravi Shankar, Manager of Machine Learning, Overstock
Aditya Gupta, Machine Learning Engineer, Google
Audrey Smith, COO, MLtwist
This session centers on the evolving challenges and strategies in evaluating and labeling data for machine learning (ML) systems, particularly in the context of generative AI, foundation models, and agentic workflows. The panel explores synthetic data, KPI trade-offs, labeling automation, and the complexities of delivering high-quality, scalable data in real-world applications. Topics include:
– The rise of synthetic data for training and evaluation.
– The shift from manual to AI-assisted data labeling.
– Balancing quality, cost, and timeline in large-scale ML projects.
– Best practices in evaluation frameworks and pipeline scalability.
On evaluation, the panel stressed that:
– Creating useful evaluation (eval) sets requires tight alignment with use cases and customer needs (a minimal sketch follows this list).
– Enterprises now need comprehensive, scalable evals to support complex agentic workflows.
– Maintaining and iterating on these evals is crucial as benchmarks and models evolve.
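As an illustration of what use-case alignment and versioning can look like in practice, here is a minimal sketch of a versioned eval set keyed by use case. The schema, field names, and example cases are assumptions for illustration, not something the panel prescribed.

```python
from dataclasses import dataclass, field

@dataclass
class EvalCase:
    """One evaluation example, tagged with the use case it covers."""
    case_id: str
    use_case: str   # e.g. "order-status lookup", "policy Q&A"
    prompt: str
    expected: str   # reference answer or rubric summary

@dataclass
class EvalSet:
    """A versioned collection of cases, so results stay comparable as models evolve."""
    version: str
    cases: list[EvalCase] = field(default_factory=list)

    def coverage(self) -> dict[str, int]:
        """Count cases per use case to spot gaps before trusting the scores."""
        counts: dict[str, int] = {}
        for case in self.cases:
            counts[case.use_case] = counts.get(case.use_case, 0) + 1
        return counts

evals = EvalSet(version="2025-04-01", cases=[
    EvalCase("c1", "order-status lookup", "Where is order 1234?", "In transit, ETA Friday"),
    EvalCase("c2", "policy Q&A", "What is the return window?", "30 days from delivery"),
])
print(evals.coverage())  # {'order-status lookup': 1, 'policy Q&A': 1}
```

Tracking coverage per use case makes gaps visible before anyone trusts the aggregate score.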
Three KPIs shape every evaluation effort, and the panel kept returning to the trade-offs among them:
Data Quality: Always the top priority.
Cost: Caps the volume and feasibility of evaluations (see the back-of-the-envelope sketch below).
Timeline: Constrains how thorough evaluations can be, especially under production deadlines.
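To make the cost lever concrete, the toy calculation below shows how per-label cost caps the volume a fixed budget can buy. All prices here are hypothetical; real per-label costs vary widely by task, domain, and vendor.

```python
# Hypothetical per-label prices; real costs vary widely by task and vendor.
BUDGET_USD = 10_000
COST_HUMAN_ONLY = 0.08    # fully manual labeling, per item
COST_AI_ASSISTED = 0.02   # model pre-labels, humans review a sample

for name, unit_cost in [("human-only", COST_HUMAN_ONLY),
                        ("AI-assisted", COST_AI_ASSISTED)]:
    volume = BUDGET_USD / unit_cost
    print(f"{name}: ~{volume:,.0f} labeled items for ${BUDGET_USD:,}")
# human-only: ~125,000 labeled items for $10,000
# AI-assisted: ~500,000 labeled items for $10,000
```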
Synthetic data helps fill gaps in real-world datasets, especially for rare events, underrepresented languages, and instruction tuning. It is used heavily in RAG (retrieval-augmented generation) systems to reduce hallucinations and improve alignment. The challenges are data balancing and engineering risk: overrepresenting edge cases can distort model behavior.
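One common pattern is to synthesize question-answer pairs directly from the document corpus a RAG system retrieves over, so the eval exercises content the retriever must actually find. The sketch below is a toy version: generate_qa_pair stands in for an LLM call and uses a template so the example stays runnable; every name here is an illustrative assumption.

```python
# Toy corpus; in practice these would be chunks from the RAG index.
corpus = {
    "doc1": "Returns are accepted within 30 days of delivery.",
    "doc2": "Standard shipping takes 5 to 7 business days.",
}

def generate_qa_pair(doc_id: str, passage: str) -> dict:
    """Stand-in for an LLM call that writes a question answerable only
    from `passage`; a template keeps the sketch runnable without model access."""
    return {
        "question": f"According to our docs, what is the policy stated in: '{passage}'?",
        "expected_answer": passage,
        "source_doc": doc_id,  # lets the eval score retrieval separately from generation
    }

synthetic_evals = [generate_qa_pair(d, p) for d, p in corpus.items()]
for ex in synthetic_evals:
    print(ex["source_doc"], "->", ex["question"])
```

Because each pair records its source document, the same set can check whether the retriever returned the right chunk and, separately, whether the generated answer stayed grounded in it, which is where hallucinations surface.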
– Most teams now use hybrid approaches: AI-assisted labeling plus human validation (a minimal routing sketch follows this list).
– Tools and models have improved, but complex or noisy data (e.g., drone videos, geospatial imagery) still calls for a human-first approach.
– Common pitfalls include overestimating model reliability and underinvesting in the quality of the instructions given to human labelers.
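The hybrid pattern can be as simple as a confidence-threshold router: the model pre-labels everything, and anything it is unsure about lands in a human-review queue. The threshold, the stub model, and the field names below are illustrative assumptions, not a prescription from the panel.

```python
def route_labels(items, model_predict, confidence_threshold=0.9):
    """Split items into auto-accepted labels and a human-review queue.

    `model_predict` returns (label, confidence); the 0.9 threshold is an
    illustrative starting point, to be tuned against human audit results.
    """
    auto_accepted, needs_review = [], []
    for item in items:
        label, confidence = model_predict(item)
        record = {"item": item, "label": label, "confidence": confidence}
        (auto_accepted if confidence >= confidence_threshold else needs_review).append(record)
    return auto_accepted, needs_review

# Stub model for demonstration: short texts get low confidence.
def toy_model(text):
    return ("positive", 0.95 if len(text) > 20 else 0.6)

auto, review = route_labels(["great product, fast shipping!", "meh"], toy_model)
print(len(auto), "auto-labeled;", len(review), "sent to human review")
```

In practice the threshold is typically tuned by having humans audit a sample of the auto-accepted labels and tightening it until the audit error rate is acceptable.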
The session concluded with a Q&A that emphasized:
– There’s no “set-and-forget” solution—labeling and eval remain iterative, judgment-driven processes.
– Synthetic and AI-assisted methods are powerful but not yet replacements for human oversight.
– High-quality data starts with clear instructions, domain understanding, and strong feedback loops.
– The future points toward more automation, but for now, expert involvement is critical—especially in high-stakes domains.