Born in Silicon Valley: Revolutionizing Data Preparation for Machine Learning
Transcript: Full Interview with David Smith, CEO & Co-founder of MLtwist
Host: Jake Villarreal – Match Relevant
Guest: David Smith, CEO & Co-founder, MLtwist
[Opening – 0:00]
Jake: I’m excited to welcome David Smith, CEO of MLtwist, to the show. David, welcome!
David: Thanks, Jake. Great to be here.
Jake: David’s background spans tech giants like Google, Oracle, and DoubleClick. He’s been through four acquisitions and has deep experience in strategic data for AI. Let’s start there—where are you calling from today?
David: I’m in San Jose, California. Born and raised in Silicon Valley.
[Journey into Tech – 2:30]
Jake: Tell us how you got into tech and startups.
David: I was a Dungeons & Dragons kid, into computers and Magic: The Gathering. I earned a computer engineering degree from UC Davis, but I graduated during the tech bust and couldn’t find a job right away. I ended up bartending in Europe, then took a sales engineering role with an early NLP company, which sparked everything. Eventually, I joined a company that was acquired by DoubleClick, which was in turn acquired by Google.
[Why Start MLtwist – 4:40]
Jake: You’ve worked in data across your career. What led to founding MLtwist?
David: I noticed traditional ETL tools weren’t fit for AI. They didn’t address questions like: Where did this data come from? Who touched it? What rights do you have? I wanted to build a solution focused on preparing data specifically for AI—especially for people who aren’t engineers.
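David’s three questions (where the data came from, who touched it, and what rights you have) map naturally onto a per-record provenance entry. The sketch below is our own illustration, not MLtwist’s implementation. The `ProvenanceRecord` class, its fields, the `"ml-training"` rights label, and the example source path are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical lineage entry: origin, handlers, and usage rights for one data record."""
    record_id: str
    source: str                 # origin, e.g. a storage path or vendor name (placeholder below)
    allowed_uses: set[str]      # uses permitted by the data's license or contract
    history: list[tuple[str, str, str]] = field(default_factory=list)  # (UTC timestamp, actor, action)

    def touch(self, actor: str, action: str) -> None:
        """Log who handled the record and what they did."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append((stamp, actor, action))

    def actors(self) -> list[str]:
        """Distinct handlers, in the order they first touched the record."""
        seen: list[str] = []
        for _, actor, _ in self.history:
            if actor not in seen:
                seen.append(actor)
        return seen

    def can_use_for(self, purpose: str) -> bool:
        """True only if the purpose appears among the recorded rights."""
        return purpose in self.allowed_uses


rec = ProvenanceRecord("doc-001", "s3://example-bucket/contracts/", {"ml-training"})
rec.touch("prelabel-model", "pre-labeled")
rec.touch("reviewer@example.com", "corrected label")
rec.touch("prelabel-model", "re-scored")

print(rec.actors())                    # ['prelabel-model', 'reviewer@example.com']
print(rec.can_use_for("ml-training"))  # True
print(rec.can_use_for("resale"))       # False
```

Keeping rights alongside the handling log means a single lookup can answer both “who touched this?” and “may we train on it?” before the record enters a dataset.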
[Strategic Data & Use Cases – 6:00]
Jake: What problem are you solving for companies already working with LLMs?
David: Many of them have built initial models but now realize they need to improve them. Whether they’re identifying manufacturing defects, classifying references in articles, or parsing financial transcripts, they hit performance walls and conclude: “We need better data.” That’s where we come in.
[MLtwist in Action – 8:00]
Jake: How does MLtwist help when things start to break down?
David: Our platform handles everything between raw data and model input:
Pre-labeling with AI
Routing to internal/external experts
Quality control
Audit trails
Format conversion for model ingestion
It’s built for data operations people—not just engineers or scientists. Think lawyers reviewing documents or TSA agents identifying threats.
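The stages David lists (pre-labeling, routing to experts, quality control, audit trails, and format conversion) can be illustrated as a chain of plain functions. This is a minimal sketch that assumes details the interview does not give: a keyword rule standing in for the AI pre-labeler, a hypothetical confidence threshold of 0.8 for sending items to a human expert, a simple allowed-label quality check, an in-memory audit log, and JSON Lines as the output format. None of these names or values come from MLtwist’s product.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    item_id: str
    text: str
    label: str = ""
    confidence: float = 0.0


AUDIT_LOG: list[dict] = []          # hypothetical in-memory audit trail
REVIEW_THRESHOLD = 0.8              # assumed cut-off for routing to a human expert
ALLOWED_LABELS = {"defect", "ok"}   # assumed label schema for the QC rule


def audit(item: Item, stage: str, actor: str) -> None:
    """Record which stage touched the item, who did it, and the label at that point."""
    AUDIT_LOG.append({"id": item.item_id, "stage": stage, "actor": actor, "label": item.label})


def prelabel(item: Item) -> Item:
    """Stub pre-labeler: a keyword rule standing in for an AI model."""
    if "crack" in item.text:
        item.label, item.confidence = "defect", 0.95
    else:
        item.label, item.confidence = "ok", 0.6
    audit(item, "prelabel", "model")
    return item


def route(item: Item, expert_review: Callable[[Item], str]) -> Item:
    """Send low-confidence items to an (internal or external) expert."""
    if item.confidence < REVIEW_THRESHOLD:
        item.label = expert_review(item)
        item.confidence = 1.0
        audit(item, "review", "expert")
    return item


def quality_check(item: Item) -> bool:
    """Reject items whose label falls outside the allowed schema."""
    passed = item.label in ALLOWED_LABELS
    audit(item, "qc-pass" if passed else "qc-fail", "qc-rule")
    return passed


def to_jsonl(items: list[Item]) -> str:
    """Format conversion: one JSON object per line, a format many training tools accept."""
    return "\n".join(json.dumps({"text": i.text, "label": i.label}) for i in items)


def fake_expert(item: Item) -> str:
    return "ok"  # stand-in for a human reviewer's decision


raw = [Item("1", "hairline crack on weld"), Item("2", "surface looks uniform")]
labeled = [route(prelabel(i), fake_expert) for i in raw]
clean = [i for i in labeled if quality_check(i)]
print(to_jsonl(clean))
# {"text": "hairline crack on weld", "label": "defect"}
# {"text": "surface looks uniform", "label": "ok"}
```

Only the second item falls below the threshold, so it alone picks up a “review” entry in the audit log. That entry is the record a data operations reviewer would consult later.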
[No-Code Interface – 10:00]
Jake: Walk us through the user experience.
David: You log in, point to unstructured data, select a model for pre-labeling, assign experts, and receive a Human Readable Report (HRR) for