I have been in Data Operations (DOps) for the past 8 years at various companies, from startups to Fortune 500 organizations, working on AI projects in almost every industry you can think of. I truly feel that, most of the time, my role as a Data Ops person is underrated. Yet it struck me that DOps people are at the forefront of what it means to deliver a successful AI product. We have a specific expertise, an operational mindset that adds another dimension to a very technical team by asking the right questions and making sure the product being developed will speak to actual users.
How did I fall into DOps, and why did I stay in this field?
I am French and studied law in France. I then became an in-house lawyer in the staffing industry in Paris. I had always wanted to gain international experience, and the opportunity arose to move to London in 2009, where I spent the following 6 years working in compliance. In 2015, my family moved to the US, more precisely to the Bay Area. Since I could not continue my legal career there, I decided to start fresh for the third time, in my mid-thirties. My first job in the US was as an ML data localization analyst: I had to listen to Belgian French speakers talking to a famous virtual assistant and check whether the transcription software had correctly transcribed what was said. You can hear some crazy stuff when you do that type of job! I have to say, starting by labeling and correcting data was a career-changing event for me. It was hard in every way, and it was especially hard to relate to the final product I was actually helping to perform better. I had to dig into what I was really doing to understand the bigger picture. This is when I realized how exciting it could be to be part of the AI space, especially for someone like me with very little technical knowledge. Back in Europe, I had always looked at the tech industry as something really cool that I could never be a part of.
I started applying around and landed a contractor position at Google, where I ran quality control checks, made workflows more efficient and wrote policies for a variety of projects, from enforcing GDPR compliance and ensuring that internal advertising policies were applied, to improving the user experience for online shoppers… I had a blast contributing to such a wide variety of projects!
I then joined Amazon for almost 4 years. Amazon is at the forefront of the AI revolution, and I was lucky to get a manager with a PhD in computer vision who believed in me and wanted to share as much knowledge as he could with his team. Being a program manager in a Data Operations team was exciting, as I was contributing to many different projects, from robotics and image/text recognition to search relevance, 3D and even audio tasks. I worked with all data types and created projects from scratch, from drafting the ontology and the guidelines to testing them internally and training the labelers on them. I learned how to manage data vendors and negotiate budgets and SLAs. I also improved my quality control techniques and discovered many different approaches that increase accuracy by integrating feedback loops into a workflow.
That does not mean it was an easy ride. For the first 3 months, I was invited to meetings with only data scientists and ML engineers, and I could not understand a thing. It took me a little while to feel like I had a real impact. I had to dive deep and learn a lot of technical vocabulary on the fly: how an AI model was trained, how the data was ingested, how the quality of the data could have a disastrous impact on the model performance… Yes! Data quality, garbage in, garbage out: this is what a Data Ops person focuses on every day. For Data Ops people, the Data-Centric AI approach has been a reality for many years, not just a current trend.
Looking back, it was the best school I could have asked for.
From there, I joined Labelbox, a training data software startup, where I launched and managed the Data Operations team for 2 years, worked with very smart people, and learned a ton. It was also a great school, as being a Data Ops person in a startup brings a whole different set of challenges and opportunities compared to working at one of the tech giants.
One year ago, I joined MLtwist. This decision came after I had noticed, over many years, that more and more issues were surfacing with the pre- and post-processing of data for ML projects. By a commonly cited estimate, as much as 80% of data scientists' time is spent on that alone. One of the main reasons is that the current ML data ecosystem is overcrowded and complicated to understand, and therefore difficult for many AI companies to leverage efficiently. MLtwist connects the dots through strong partnerships with a lot of great tools to offer a seamless experience for any AI project.
Finally, if we want AI adoption to grow bigger and faster, at lower cost and more efficiently, we need more people without a traditional computer engineering background on board, to broaden the mindset of those currently in charge of building AI. If I may say so, educational diversity should be represented in every ML team to improve the outcome in all respects, including product quality. The DOps field is still very new and unknown to a lot of ML teams, many of whom still think that a career in AI requires a technical education. That is why I feel I need to share my own story and offer a different perspective, to open the doors to more people like me.
If you want to know more about the Data Ops field, think about joining our community “Data Ops for AI” on LinkedIn to learn about this relatively new career path, share knowledge, relevant news and job posts, and read how some of us got to where we are now.