13 Aug

The cleantech industry is turning to AI to improve waste sorting and recycling, but training machine learning models requires vast amounts of high-quality labeled data. Barcelona-based cleantech company Candam Technologies, developer of an AI-powered waste identification and recycling system, faced a major hurdle: manually labeling a large number of data files every week to train its models.
During the recycling process, Candam acquires data through sensors as recycled objects slide down a ramp. Accurately identifying the material of each object requires structuring and labeling every data segment.
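The segmentation step can be pictured as splitting a one-dimensional sensor trace into event windows. The sketch below is purely illustrative: the threshold-based detection, the function name, and the sample values are assumptions, not Candam's actual pipeline.

```python
# Hypothetical sketch: split a 1-D sensor signal into event segments
# ready for labeling. Threshold-based detection is an assumption here.

def segment_events(signal, threshold=0.5, min_gap=3):
    """Return (start, end) index pairs where |signal| exceeds threshold,
    merging bursts separated by fewer than min_gap quiet samples."""
    segments = []
    start = None
    quiet = 0
    for i, x in enumerate(signal):
        if abs(x) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        segments.append((start, len(signal) - quiet))
    return segments

# Two bursts of activity separated by a quiet stretch:
trace = [0.0, 0.9, 1.2, 0.1, 0.0, 0.0, 0.0, 0.8, 0.7, 0.0]
print(segment_events(trace))  # [(1, 3), (7, 9)]
```

Each returned pair marks one candidate recycling event, which a labeler (or a model) can then tag with a material type.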
When Candam began working with MLtwist, the company was searching for a scalable solution that could adapt to its specific technical requirements. With MLtwist's automated, high-precision, and scalable labeling pipelines, Candam was able to spend 3x more time on quality validation and AI model training.
THE CHALLENGE
Candam’s patented RecySmart technology takes the humble recycling bin to a new level with integrated sensors that can identify the type and quantity of waste deposited into a bin. Users receive rewards for recycling, while municipalities manage waste more effectively with a fine-grained overview of which items are being recycled where, and how full the bins are.
This innovative AI-powered waste identification system needs a daily supply of high-quality labeled data to keep its models up to date. Candam’s solution analyzes data acquired from sensors and identifies it with AI models trained on self-generated data, all of which must first be labeled.
“To review the type of event, a visual representation of the data is helpful. All types of events can be mapped and validated easily using this technique,” said Mario Gutierrez, Technology Director at Candam. “Having both is essential to verify the label quality, and it takes a lot of time and rigor to accurately process this data.”
Candam faced key challenges in preparing its data for AI model training:
High Volume & Manual Bottlenecks: Labeling many data files per week was labor-intensive and slowed model training.
Precision Requirements: Identifying the different dimensions of a recycling event (like sliding, impact, and material type) required precise segmentation and labeling.
Scalability: Existing solutions couldn’t handle Candam’s dataset needs, which grew on a weekly basis.
Versioning: As data was rejected, approved, renamed, and changed, Candam needed a platform that tracked each file throughout its whole life cycle and could roll back or merge forward when required.
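The life-cycle tracking described above can be sketched as a minimal per-file version history with rollback. This is a toy illustration; the class, states, and API are assumptions, not MLtwist's actual design.

```python
# Toy sketch of per-file label life-cycle tracking with rollback.
# States and method names are illustrative assumptions.

class LabelHistory:
    def __init__(self):
        self._versions = []  # list of (state, payload) snapshots

    def commit(self, state, payload):
        """Record a new state, e.g. 'ingested', 'labeled', 'approved', 'rejected'."""
        self._versions.append((state, payload))

    def current(self):
        return self._versions[-1]

    def rollback(self, steps=1):
        """Drop the most recent snapshots, restoring an earlier version."""
        del self._versions[-steps:]
        return self.current()

h = LabelHistory()
h.commit("ingested", {"file": "event_001.wav"})
h.commit("labeled", {"file": "event_001.wav", "label": "glass"})
h.commit("rejected", {"file": "event_001.wav", "label": "glass"})
print(h.rollback())  # back to the last 'labeled' snapshot
```

Keeping every snapshot (rather than overwriting in place) is what makes "roll back or merge forward" possible at any point in the data's life cycle.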
THE SOLUTION
MLtwist provided an automated data processing pipeline that transformed how Candam handled labeling. Novel features included:
Automated Preprocessing: A streamlined workflow that ingested and prepared large datasets for labeling.
Precision Labeling Tools: A customized interface allowed labelers to visually segment and annotate acoustic data accurately.
Scalable Infrastructure: The system processed large datasets every week and was able to scale up quickly with no delays or interruption in service.
Automated JSON Quality Control: Consistent anomaly detection and reporting ensured teams had the information they needed to make judgment calls on quality.
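Automated QC of this kind boils down to validating each label record and surfacing anomalies for human review. A minimal sketch follows; the schema (fields like `segments`, `label`, `start`, `end`) and the allowed label set are assumptions, not MLtwist's actual format.

```python
# Hypothetical sketch of automated QC on label JSON records: flag anomalies
# so reviewers can make judgment calls. Schema and labels are assumptions.
import json

ALLOWED_LABELS = {"glass", "plastic", "metal", "other"}

def qc_report(record):
    """Return a list of human-readable anomaly strings for one labeled file."""
    issues = []
    for i, seg in enumerate(record.get("segments", [])):
        if seg.get("label") not in ALLOWED_LABELS:
            issues.append(f"segment {i}: unknown label {seg.get('label')!r}")
        if seg.get("start", 0) >= seg.get("end", 0):
            issues.append(f"segment {i}: empty or inverted time range")
    return issues

raw = ('{"segments": [{"label": "glass", "start": 0.1, "end": 0.4},'
       ' {"label": "wood", "start": 0.5, "end": 0.5}]}')
print(qc_report(json.loads(raw)))
```

Running checks like these on every delivery means only flagged records need human attention, rather than the whole batch.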
THE RESULTS
Using MLtwist allowed Candam to improve its data labeling and model training in three key aspects:
Faster Turnaround: Automated workflows reduced processing time, ensuring datasets were validated and returned on a daily delivery schedule.
High-Quality Labels: The company’s internal QA team had 3x more time to look at data quality and ensure the labeled data met their strict accuracy standards.
Efficient AI Model Training: Reliable labeled datasets accelerated Candam’s AI development, doubling the time the team could devote to AI work and improving waste identification accuracy.
By automating a complex and highly specific labeling process, MLtwist enabled Candam to focus on innovation rather than janitorial data preparation. With a scalable, AI-powered approach, Candam can now process massive datasets with confidence, losing no time or efficiency as datasets grow ever larger, and bring smarter recycling solutions to market faster.