continues to be our tool of choice for managing experiments in data science projects. The fact that it's Git-based makes it a known turf for developers to bring engineering practices to the data science ecosystem. DVC's opinionated view of a model checkpoint carefully encapsulates a training data set, a test data set, model hyperparameters and the code. By making reproducibility a first-class concern, it allows the team to time travel across various versions of the model. Our teams have successfully used DVC in production to enable continuous delivery for ML (CD4ML); it can be plugged in with any type of storage (including AWS S3, Google Cloud Storage, MinIO and Google Drive). However, with data sets getting bigger, file system¨Cbased snapshotting could become particularly expensive. When the underlying data is changing rapidly, DVC on top of a good versioned storage allows tracking model drifts over a period of time. Our teams have effectively used DVC on top of data storage formats like Delta Lake which optimizes versioning (). A majority of our data science teams set up DVC as a day zero task while they bootstrap a project; for this reason we're happy to move it to Adopt.
In 2018 we mentioned in conjunction with the versioning data for reproducible analytics. Since then it has become a favorite tool for managing experiments in machine learning (ML) projects. Since it's based on Git, DVC is a familiar environment for software developers to bring their engineering practices to ML practice. Because it versions the code that processes data along with the data itself and tracks stages in a pipeline, it helps bring order to the modeling activities without interrupting the analysts¡¯ flow.

