DagsHub is building a GitHub-like environment for machine learning, using purely open source tools. It started out as a central place for data versioned with DVC, much as GitHub acts as a central place for Git-managed source code, and later added MLflow for training reproducibility.
With today's announcement, DagsHub adds another great open source tool, this time to cover the task of data labeling. Label Studio is already used by thousands of data scientists to label data and, for DagsHub, having this capability completes the data loop. What does that mean? Essentially, by having versioned data with labeling and reproducible training pipelines, we can have a workflow that looks like this:
- Collect data
- Label data
- Use the data to train a model
- Deploy the model
- Collect additional data
- Add to the dataset (to be labeled), and repeat the cycle.
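To make the cycle above concrete, here is a minimal sketch in plain Python. It does not use DagsHub, DVC, MLflow, or Label Studio at all: `Dataset`, `collect`, `label`, `train`, and `run_cycles` are hypothetical names invented for illustration, the "model" is just a majority-label predictor, and the version counter only stands in for what a real data versioning tool would track.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy stand-in for a versioned dataset (hypothetical, not a DagsHub or DVC type)."""
    labeled: list = field(default_factory=list)    # (sample, label) pairs
    unlabeled: list = field(default_factory=list)  # samples waiting for a label
    version: int = 0                               # bumped each time labels are added


def collect(dataset, samples):
    """Collect data: new samples arrive unlabeled."""
    dataset.unlabeled.extend(samples)


def label(dataset, annotate):
    """Label data: annotate every pending sample and record a new data version."""
    dataset.labeled.extend((s, annotate(s)) for s in dataset.unlabeled)
    dataset.unlabeled.clear()
    dataset.version += 1


def train(dataset):
    """Train a model: here, a trivial predictor that returns the majority label."""
    counts = {}
    for _, lab in dataset.labeled:
        counts[lab] = counts.get(lab, 0) + 1
    majority = max(counts, key=counts.get)
    # Remember which data version produced the model, so training can be traced back.
    return {"data_version": dataset.version, "predict": lambda sample: majority}


def run_cycles(batches, annotate):
    """Run one collect -> label -> train -> deploy pass per incoming batch."""
    dataset = Dataset()
    deployed = []  # "deployment" is just keeping the model around in this sketch
    for batch in batches:
        collect(dataset, batch)
        label(dataset, annotate)
        deployed.append(train(dataset))
    return dataset, deployed


if __name__ == "__main__":
    data, models = run_cycles(
        [[1, 2, 3], [4, 5]],
        lambda x: "even" if x % 2 == 0 else "odd",
    )
    print(data.version, len(data.labeled), models[-1]["data_version"])
```

Each pass through the loop adds newly collected samples to the same dataset, which is the "repeat the cycle" step: every deployed model is tied to the data version it was trained on.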
The full article for this announcement is on the DagsHub blog: https://dagshub.com/blog/launching-dagshub-2-0