Course
Introduction to Data Versioning with DVC
Course Description
Exploring DVC features
You will learn the motivations behind data versioning, the machine learning lifecycle, and DVC's distinct features and use cases. You will then set up DVC, covering installation, repository initialization, and the .dvcignore file. Next, you will explore the DVC cache and staging files: adding and removing files, managing caches, and understanding the underlying mechanisms. Finally, you will work with DVC remotes: how they differ from Git remotes, how to add, list, and modify them, and how to push and pull data, check out specific versions, and fetch data to the cache.
Automate and evaluate
You will learn why automating ML pipelines matters, with an emphasis on modularizing code and creating a configuration file. You will be introduced to DVC pipelines as directed acyclic graphs and get hands-on experience adding stages with their inputs and outputs. You will practice executing these pipelines efficiently to support different use cases in machine learning model training. The course concludes with evaluation, showing how metrics and plots are tracked in DVC.
Prerequisites
Supervised Learning with scikit-learn
Introduction to Git

Introduction to DVC
DVC Configuration and Data Management
Pipelines in DVC
Complete
Earn Statement of Accomplishment
Add this credential to your LinkedIn profile, resume, or CV.
Share it on social media and in your performance review.
FAQs
How does DVC differ from Git for version control?
Git tracks code changes, while DVC handles large data files and ML models that are too big for Git. DVC uses MD5 hashes and a separate cache to manage data versions alongside your Git repository.
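The idea of hash-based data versioning can be illustrated with a small, standard-library-only sketch. This is a toy model and not DVC's implementation: the function names and the cache layout (first two hex characters of the hash as a subdirectory) are assumptions made for the example. It shows how a file's MD5 digest can act as a small pointer that Git could track while the data itself lives in a separate cache.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_dir: Path, md5_hex: str) -> Path:
    """Assumed toy layout: <cache>/<first 2 hex chars>/<remaining chars>."""
    return cache_dir / md5_hex[:2] / md5_hex[2:]


def add_to_cache(data_file: Path, cache_dir: Path) -> str:
    """Copy a file into the toy cache and return its hash (the 'pointer')."""
    md5_hex = md5_of_file(data_file)
    target = cache_path(cache_dir, md5_hex)
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data_file.read_bytes())
    return md5_hex


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    data = workdir / "data.csv"  # hypothetical data file
    data.write_bytes(b"hello")
    pointer = add_to_cache(data, workdir / "cache")
    print(pointer)  # 5d41402abc4b2a76b9719d911017c592
```

Because the cache key depends only on file content, adding the same data twice stores it once, and any change to the data produces a new hash and therefore a new version.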
Does this course cover ML pipeline automation?
Yes. Chapter 3 focuses on automating ML pipelines with DVC, including creating configuration files, visualizing pipelines as directed acyclic graphs, and executing them locally.
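To make the directed-acyclic-graph idea concrete, here is a minimal sketch that derives an execution order from stage inputs and outputs using Python's standard `graphlib` module (Python 3.9+). The stage names and file names are hypothetical, and the code only mimics the concept; it does not read or write real DVC pipeline files.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each lists the files it reads (deps) and writes (outs).
stages = {
    "preprocess": {"deps": ["raw.csv"], "outs": ["clean.csv"]},
    "train": {"deps": ["clean.csv"], "outs": ["model.pkl"]},
    "evaluate": {"deps": ["model.pkl", "clean.csv"], "outs": ["metrics.json"]},
}


def execution_order(stages: dict) -> list:
    """Order stages so that every stage runs after the stages producing its inputs."""
    # Map each output file to the stage that produces it.
    producer = {
        out: name for name, stage in stages.items() for out in stage["outs"]
    }
    # For each stage, collect the upstream stages it depends on.
    # Files with no producer (such as raw.csv) are external inputs.
    graph = {
        name: {producer[dep] for dep in stage["deps"] if dep in producer}
        for name, stage in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(stages)
print(order)  # ['preprocess', 'train', 'evaluate']
```

A cycle between stages would make `TopologicalSorter` raise `graphlib.CycleError`, which is why a pipeline has to be acyclic to be executable.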
What tools and libraries should I know before starting?
You need familiarity with Python, pandas, Git basics, introductory statistics, and scikit-learn. The course builds on those foundations to teach DVC-specific workflows.
Does this course teach how to set up DVC remotes for team collaboration?
Yes. Chapter 2 covers adding, listing, and modifying DVC remotes, as well as pushing and pulling data, checking out specific versions, and fetching data to the cache.
Can I track model metrics and compare experiments with DVC?
Yes. The pipeline chapter teaches you to print metrics, create plot files, and compare metrics and plots across different pipeline stages to evaluate model performance.
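As a rough illustration of what tracking and comparing metrics involves, the sketch below writes two sets of metrics to JSON files and computes the change between them. The file names, metric names, and values are invented for the example, and the `diff_metrics` helper is hypothetical; DVC provides its own commands for this (`dvc metrics show` and `dvc metrics diff`), which this code does not call.

```python
import json
import tempfile
from pathlib import Path


def write_metrics(path: Path, metrics: dict) -> None:
    """Save metrics as JSON, a plain-text format that is easy to version."""
    path.write_text(json.dumps(metrics, indent=2))


def diff_metrics(old: dict, new: dict) -> dict:
    """Return new minus old for every metric present in both runs."""
    return {
        key: round(new[key] - old[key], 4)
        for key in sorted(old.keys() & new.keys())
    }


with tempfile.TemporaryDirectory() as tmp:
    baseline_file = Path(tmp) / "metrics_baseline.json"  # hypothetical names
    candidate_file = Path(tmp) / "metrics_candidate.json"

    # Invented example values for two training runs.
    write_metrics(baseline_file, {"accuracy": 0.90, "f1": 0.80})
    write_metrics(candidate_file, {"accuracy": 0.92, "f1": 0.85})

    baseline = json.loads(baseline_file.read_text())
    candidate = json.loads(candidate_file.read_text())
    changes = diff_metrics(baseline, candidate)
    print(changes)
```

Keeping metrics in small text files means each version of the pipeline can carry its own results, so comparing two versions reduces to comparing two files.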