TWIML AI: Managing Data Labeling Ops for Success

What exactly makes for good data labeling operations, and how can we talk about a field that is less than 10 years old? Data labeling has exploded into the mainstream over the past few years and businesses are racing to catch up. What do they need to consider when starting the huge task of data labeling with internal or external teams? I was honored to talk about these topics, and more, with Sam Charrington in this episode of the TWIML AI podcast.

Our conversation delves into the challenges and strategies associated with ensuring the quality of labeled data in machine learning projects. We talk about the importance of regular quality control to maintain accuracy, highlighting the need for a feedback loop with labelers.

We also cover various approaches to quality assurance, including internal spot checks and the balance between cost, speed, and quality. We explore the ethical considerations of outsourcing data labeling, particularly in low-income regions, and talk about fair trade labeling practices.

The episode concludes with insights into the future of data labeling operations, foreseeing a more unified ecosystem and an increasing role for non-technical professionals in the field. Thanks again for the invite, Sam!

TRANSCRIPT

SC: Hello everyone, welcome to another episode of the TWIML AI podcast. I’m, of course, your host Sam Charrington, and today I’m joined by Audrey Smith, the Chief Operating Officer at MLtwist.

Before we get into today’s conversation, be sure to take a moment to head over to Apple Podcasts or your listening platform of choice, and if you enjoy the show, please leave us a 5-star rating and review. Audrey, welcome to the podcast.

AS: Thank you, Sam. Thanks for having me.

SC: Absolutely! I am really looking forward to our conversation. If you’ve been listening to the show recently, you know that we’ve been digging deep into data-centric AI, and this conversation continues that theme. I’m super excited to have Audrey on. She’s got a ton of experience with labeling, label ops, and so much more in this space. So, Audrey, once again, welcome. Let’s start with a little bit about your background and how you came to work in machine learning.

AS: Sure, so I actually studied law. I am a lawyer; I studied in France. You can hear my accent, of course, and I was a lawyer in France for three years before I decided that I wanted an international experience. So I went to the UK and worked there for five years, and when I moved to the US in 2014, I was really looking forward to finding my place in the tech industry. Until then, I had no technical background and didn’t know anything about machine learning, so I didn’t know exactly how I would start, but I applied for jobs where you needed French-speaking skills. My third job was listening to Siri, especially to French speakers talking to Siri. That’s really the start of my data labeling journey. That’s when I learned about machine learning and how important data labeling is to it. I wanted to dig into it a little more, so I went to Google and worked on more projects linked to GDPR compliance, user experience, and ads policy, all around data labeling. And I got hooked. Then I went to Amazon and stayed there for four years, also working in labeling operations, helping the Amazon team with their machine learning projects. I was lucky to work on a lot of different projects involving different formats like video, image, and text. After four years, I went to Labelbox when they were still at Series A. I was the director of labeling operations there for a couple of years, and then I joined MLtwist a few months ago as the Chief Operating Officer.

SC: Awesome. Why don’t you give us a quick summary of MLtwist?

AS: MLtwist comes from the idea that there is a gap in the data labeling space for machine learning. There are so many different players on the market, all offering different solutions, and many of them are newcomers. If you think about it, there are over 80 data labeling platforms out there, and many more workforce labeling companies. And now you have the newcomers: synthetic data platforms, augmented data platforms, and so on. All this technology is great, but the tools don’t connect to each other easily. It’s pretty siloed, and a lot of them specialize in certain verticals or certain formats. The idea behind MLtwist is to connect this whole ecosystem and give companies the choice to use the right tool for the right use case.

SC: Got it. So kind of like middleware for your labeling software and systems?

AS: Yeah, we’re called middleware, and I think that’s okay. Really, the idea is to be the glue and connect all the different tooling, so that a data science team doesn’t spend time looking for the right tool, the right workforce, the right everything, and can easily connect all the pieces of the puzzle to get their machine learning model trained and performing at the right level.

SC: Let’s maybe start by talking about some of the commonalities you’ve experienced as you’ve tackled labeling across, you know, many different companies and customers. What’s the typical journey for an organization getting started with labeling?

AS: Yeah, that’s a great question. Whether it’s a small startup starting its ML journey or a big company that wants to enter the AI world, the journey is very similar. As I mentioned earlier, the space is very crowded, with a lot of great tools on the market. The thing about these companies that are just starting their journey is that they don’t have a data labeling operations person in-house.

Usually, data scientists, machine learning engineers, or product managers are the ones trying to get all the right pieces in place. Without any knowledge or background, it’s quite overwhelming. You have to find the right data labeling tool. There are many on the market, so which one is best for your use case? Then you need the right workforce, and depending on what you want to label, each workforce is going to have its own strengths and weaknesses. You have to find them, assess them, and then make your selection. That takes time, probably a few months.

Once you get to that place, there is also the formatting issue, right? You have your data in-house in a certain format, and you need to connect it to the data labeling platform you’re going to be using. So data scientists constantly have to change the format to be able to plug into the data labeling tool. That’s another issue that needs to be tackled. So, yeah, you definitely need to think about all the pieces. And once you find those pieces and connect them, you have to create your task, put it on the platform, train the workforce on the task, and do quality control. And quality is not only about reaching a high level for your data; it’s also about maintaining it as you go through all your rounds of data labeling. That can be very challenging.
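The format mismatch described here usually comes down to a small conversion layer between the in-house data and whatever import format the labeling tool expects. The Python sketch below is illustrative only: the record fields (`image_id`, `s3_path`, `product_type`), the task layout, the JSON Lines output, and the `label` field in returned results are hypothetical assumptions, not any particular platform’s API.

```python
import json
from typing import Any, Dict, List

# Hypothetical schemas for illustration only:
#   in-house record: {"image_id": int, "s3_path": str, "product_type": str (optional)}
#   labeling task:   {"external_id": str, "data": {"image_url": str}, "metadata": {...}}
#   returned label:  {"external_id": str, "label": ...}
# Replace all three with your own data layout and your platform's documented formats.


def to_labeling_task(record: Dict[str, Any], url_prefix: str) -> Dict[str, Any]:
    """Map one in-house record onto one task in the (hypothetical) platform format."""
    return {
        # A stable key lets labels be joined back to the source record later.
        "external_id": str(record["image_id"]),
        "data": {"image_url": url_prefix + record["s3_path"]},
        "metadata": {"product_type": record.get("product_type", "unknown")},
    }


def export_tasks(records: List[Dict[str, Any]], url_prefix: str) -> str:
    """Serialize all tasks as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(to_labeling_task(r, url_prefix)) for r in records)


def labels_by_id(jsonl: str) -> Dict[str, Any]:
    """Index returned labels by external_id so they can be joined to in-house records."""
    out: Dict[str, Any] = {}
    for line in jsonl.splitlines():
        if line.strip():  # skip blank lines
            row = json.loads(line)
            out[row["external_id"]] = row["label"]
    return out
```

For example, exporting `{"image_id": 18, "s3_path": "dresses/18.jpg"}` produces a task whose `metadata.product_type` falls back to `"unknown"`. Keeping `external_id` stable in both directions is what makes the round trip back into the in-house format straightforward.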

SC: In the early days of labeling, when folks would use commoditized crowdsourcing platforms like Mechanical Turk, data quality was, and continues to be, a huge issue. And folks would have multiple labelers and try to kind of abstract away.

AS: That’s very true, and I think that’s really the core of the data labeling operations role: we are in the middle. We can be at the start of it, but there is always a product that’s going to be released at the end. What that means is that, as a data labeling operations person, you are dealing with humans. You are dealing with a labeling team. You are dealing with product managers. You are dealing with so many people who have their own requirements, but also their own limitations. You have to juggle all of that to make sure the product is released on time. That’s true whether you are an internal data labeling operations team or the company is outsourcing that whole piece. This is central to the success of a product release, for sure.

SC: You’ve mentioned data labeling operations a few times. How ubiquitous and mature is that as a role, a job title, a function? Do most large organizations that have a significant investment in ML labeling have that team in place?

AS: That’s a really good question. I was really there at the early stage of data labeling operations when I started. I mean, I got lucky, right? Because there was no established knowledge or degree in data labeling operations, they were opening the doors to anyone who wanted to give it a try. As the years go by, you now get people who have been in the space for a while. I’ve been in the space for the past seven years. That’s the type of person you want if you’re going to lead a data labeling operations team at a big company.

Additionally, the complexity has grown exponentially over the years, not only in the ecosystem (as I mentioned earlier, there are so many players on the market right now) but also in the labeling tasks themselves. You need people who are specialized in the domain to make the right decisions, and to make them very fast. That’s still not completely understood, but we’re getting there. I see more and more people with a lot of experience getting hired at other places, so there is now this idea that you look for people with experience in data labeling. But when you’re a startup, or when you’re only starting your journey, you don’t have that in-house. It takes a bit of time to realize that this is a niche and essential role in the entire AI loop.

SC: You mentioned that when you started, there was no degree program. Is there now, or have you seen certifications or that kind of…?

AS: No, not yet. But I think it’s going to happen in the next few years. I think that it has to happen. That has become its own specialty. And I really truly believe that it’s going to happen soon.

SC: So if you’re in this role, if you’re the data labeling ops team, and you’re at a larger company that has multiple projects, and you’re approached with a new project, can you walk us through the steps of onboarding a new project or initiative or customer, however you would think about it? What are the things you’re thinking of? What are the things you’re asking them for? And just kind of how do you think about spinning up a new effort?

AS: Basically, you need to talk to the machine learning engineers. You need to talk to data scientists and understand their needs. What are they looking to accomplish with that labeling task? What is their model about? What do they want to recognize or predict with that model that they want to get the data for? That’s really what’s going to help make sure that you’re going to get the right task to the labeler, so that you can get the right data labeled, and then you’re going to be able to feed your model and train it. So really, there is this discussion on where they want to go.

Once you frame it, you talk about the task, and either they already have an idea of what the task should be or they don’t. They can just tell you, “I have 50,000 images of dresses, and I want to recognize all the attributes on the dress. Do it. Good luck. I need that in three months’ time.” And then you go about it. The idea is to show them the task you’re going to create, the guidelines that will go with it, and also the examples you’re going to give, because the best way to train a labeling team is to show them examples of how to label a specific task. What I used to do, even at Amazon, for instance, was create my own task and then work on it myself, labeling a few images to see if it made sense and if I was covering all the different use cases. Once that was done, I submitted it to the machine learning team, and they told me whether they liked it and whether it worked for them. From there, I would train the team. That goes back to what I was talking about, the feedback loop. You tell the machine learning engineers, “Hey, you’re going to have to do some quality control after a first pass to see if what you’re getting is what you want or not. Give feedback.” And so on. The idea is to have a great relationship with the technical team and to set expectations in advance: it’s not going to be a one-off thing. It’s a project that goes on and on, and it’s going to be very repetitive, but we will be there to help them get high-quality annotations.

SC: In a lot of ways, it sounds like a product management type of role. Like, you’re the labeling product manager.

AS: Yeah, yeah, I like that idea. A lot of people in labeling operations also end up becoming product managers for machine learning products.

SC: You’ve developed this task, and you’ve got this feedback loop. You know, it sounds like the task will often evolve quite a bit from what you originally thought it should be to what you do at scale. Can you talk a little bit about that evolution?

AS: Yeah, I think it’s very important to keep in mind that there will be a feedback loop. That’s what I was trying to say earlier: you’re going to have two types of feedback loops, basically. The first is about training the labelers, making sure they understand your requirements and that they are reaching the right quality. Once you get there, you get your data back. Then you feed your model with that data, and that’s when you find out whether your model is responding well to everything.

I talk to people who say, “Well, I don’t want anything less than 99 or 100.” That’s a bit complicated to reach, but the idea is to be able to check the quality, and you have to do it regularly. What does checking the quality of the labelers’ work mean? It means you take a sample of your data and look at it.

Say you look at a hundred images, and 95 of them have been labeled correctly. So you have a 95% accuracy rate, and you’re happy with that. Right? Now, how are you going to maintain that accuracy level week after week? Again, it goes back to the feedback loop. You want to make sure you keep an eye on the quality. You’re working with people who are labeling, even if they are using high-performing tooling. In the end, if you don’t have a good labeling workforce doing the labeling correctly, the quality is not going to be good. So the idea is to do quality control regularly, and if you see that something is dropping, to address it as soon as possible so that the labelers can retrain, perform well moving forward, and even correct the labels that were done incorrectly in the past. So yeah, it goes back to this cycle of quality control.
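The arithmetic is simple: 95 correct out of 100 reviewed is 95% accuracy. Tracking that number batch after batch is what turns a one-off check into a regular quality-control cycle. A minimal Python sketch, assuming reviewers record how many sampled labels they checked and how many were correct; the `target` and `max_drop` thresholds are hypothetical defaults, not values from the episode, and would be agreed with the ML team.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpotCheck:
    batch: str     # any batch identifier, e.g. a review date
    reviewed: int  # number of sampled labels a reviewer checked
    correct: int   # number the reviewer judged to be labeled correctly

    @property
    def accuracy(self) -> float:
        return self.correct / self.reviewed


def batches_to_review(
    checks: List[SpotCheck], target: float = 0.95, max_drop: float = 0.03
) -> List[str]:
    """Flag batches below the target accuracy, or that fell sharply versus the previous check.

    Both thresholds are hypothetical defaults for illustration.
    """
    flagged: List[str] = []
    previous: Optional[float] = None
    for check in checks:
        acc = check.accuracy
        below_target = acc < target
        sharp_drop = previous is not None and previous - acc > max_drop
        if below_target or sharp_drop:
            flagged.append(check.batch)  # candidate for feedback and retraining
        previous = acc
    return flagged
```

With three weekly checks of 95, 96, and 91 correct out of 100, only the third week is flagged, which is the point where labelers would get feedback, retrain, and revisit the labels from that batch.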

SC: What is that quality control measured against? What’s the process? Is it having internal folks spot check? Where does that come from?

AS: Yeah, everyone approaches it in a different way, but ideally, as you mentioned earlier, the internal labelers in a company are going to be the ones with the most knowledge. Ideally, you would use those people for QA: to check the quality of the work that has been outsourced to make sure it’s maintained, but also, more than anything else, to give feedback and retrain. Cost-wise, it probably makes more sense to do it that way than to have an army of people in-house doing all the labeling and all the QA.

SC: And are there established norms about what percentage of labels you want to spot-check or that kind of thing?

AS: There are a lot of opinions on that. What I’ve been taught over the years is that if you look at a hundred labels every other day, you’re going to have a good idea of what’s going on, even if 10,000 images have been labeled. That’s still going to give you a good picture of your dataset.

SC: Okay. So in some ways, it sounds like it’s less about establishing the statistical significance of the sample and more about having a feel for how it’s going.

AS: Yeah, it has to be done regularly. So, obviously, if you do that, you know, once a month, you’re not going to get a good idea about it. But if you do it like every other day, you’re going to have a really good idea about what’s going on.
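The “hundred every other day” rule of thumb can be paired with a quick sense of how much a single sample can tell you. The Python sketch below draws 100 items at random from a labeled pool and attaches an approximate 95% confidence interval to the reviewer’s result. The Wilson score interval is a standard statistical technique swapped in here for illustration; it was not discussed in the episode, and the function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple


def draw_spot_check(item_ids: List[str], k: int = 100, seed: Optional[int] = None) -> List[str]:
    """Pick k labeled items uniformly at random, without replacement, for review."""
    rng = random.Random(seed)  # seed only to make a review batch reproducible
    return rng.sample(item_ids, min(k, len(item_ids)))


def wilson_interval(correct: int, reviewed: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the pool's true accuracy.

    Treats the sample as binomial, which is a reasonable approximation when the
    sample is small relative to the pool (e.g. 100 out of 10,000).
    """
    p = correct / reviewed
    z2 = z * z
    denom = 1 + z2 / reviewed
    center = (p + z2 / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z2 / (4 * reviewed * reviewed)) / denom
    return center - half, center + half
```

For 95 correct out of 100, the interval runs from roughly 0.89 to 0.98, so a single sample cannot separate a 95% labeler from a 98% one. That is consistent with the point above: the value comes from sampling regularly and watching the trend over time.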

SC: I’d love to get your take on the various ethical considerations around labeling, particularly since many labeling workforces are remote and are used to much lower incomes than in Western countries. There have been some recent articles, including an MIT Technology Review article by Karen Hao, about how one of the labeling companies was taking advantage of economic turmoil in Venezuela. And Facebook was being sued in Kenya in a case involving a labeling company. How have you thought about and managed the various ethical considerations?

AS: Before answering your question, I would like to raise two questions. The first one is how outsourcing works in general, not only for data labeling. If you think about recycling, we are already doing that in other domains, in other verticals. Are we comfortable with it? Do we need to revisit it? But that’s a bigger, broader discussion that we should probably have as well.

The second one: you’re probably talking about data labeling in general, but I think the article also mentioned content moderation, where you can have some disturbing content that might affect some people psychologically. Is it needed? Do we need content moderation? As a parent myself, I want to feel that the internet is a bit safer for my kids, and I want to have the choice to protect them if I can. So, in my view, it’s very important that we keep doing content moderation.

AS: One thing I would like to add is that, yes, content moderation has been outsourced, but it’s not entirely outsourced. I know for a fact that content moderation is currently done in the United States and also in Europe. When I worked my first job, I was not doing content moderation, but I came across some pretty disturbing things while doing the job. So, unfortunately, it’s part of the low-income jobs in the data labeling space.

SC: To jump in, I think you’re already raising several important issues.

One, I think, is that there are a multitude of potential issues. It’s not just one issue, and it’s not solely outsourced versus in-sourced. There’s a lot of complexity to it. But I also thought the first point you made was interesting: the questions around labeling, and outsourced labeling in particular, are the same as for other kinds of outsourcing and, even more broadly, other types of commerce. The first thing I thought of was coffee. We’ve got fair trade, so maybe the future is fair trade labeling or something like that.

AS: Absolutely, absolutely. I think that’s a great point. I think that’s the future, and it’s what I was going to talk about: yes, content moderation is needed, but how you do it is what needs to be improved. Over the years, I’ve seen companies take action in that area. For instance, some have a therapist on-site, so that people who feel they need to talk to someone can go and talk to that person straight away. People can also do content moderation on a voluntary basis only. You’re not going to lose your job if you don’t want to do it. You have the choice.

And these people also work fewer hours than regular labelers on regular labeling tasks, as a way to recognize that they are doing a very, very difficult job and don’t need to work as many hours as people on other tasks. I think it’s the beginning, and every company has its own vision of what it should look like. But I’ve seen a lot of companies, especially FANG, decide to work only with companies that have made that kind of effort for their workers. And you’re right. I believe that in the future there will be some sort of committee that dictates the rules for fair data labeling: how it should work, how to protect the workers, and so on. That’s definitely something I would like to see happen.

SC: If anything, the unifying thread between the various aspects of this conversation is recognizing the humanity of the folks that are doing the labeling and the implications of that both from your process as well as the, you know, now the ethical considerations we’re discussing.

AS: Yeah, definitely. One thing I would like to add is that a lot of these people don’t have a formal education or degrees, and this is also their way to start getting an education in IT: how to use a computer, how to label. A lot of the companies I’ve worked with over the years have promoted these people internally, from labeling to quality control to team lead to program manager. So this new industry is going to generate educated people who will be able to earn good salaries and grow in their careers. I was one of them, even though I was doing it in the United States. I started at that level and worked my way up. So I think that’s also very important to talk about.

SC: What do you see as the future of data labeling operations?

AS: I think the ecosystem needs to get unified. There are so many players that it’s going to be hard for a company going into machine learning to find its way, even though there are incredible tools on the market right now. So that’s one: unifying the ecosystem is very important. For the role itself, data labeling operations, I think we’re going to see more and more data labeling operations people hired, even in smaller companies or companies just starting the journey. Instead of having only machine learning engineers and thinking, “Okay, we’re good, we can start our journey,” there will be this idea that non-technical people are also really important to the journey.

SC: Well, Audrey, thanks so much for joining us and sharing a bit of your wealth of experience in labeling.

AS: Well, thank you. I was very honored to be on your podcast. Thank you, Sam. Thank you.
