How to build a great data science portfolio (with examples)
Data science is one of the most exciting and fastest-growing fields of the last 10 years. As a result, many university programs, bootcamps, and online courses are available for anyone looking to break into the field. These programs are an excellent way to learn the skills required, but when it comes to actually landing a role, it’s important to demonstrate you have the right skillset, as many employers look for hands-on experience. An effective portfolio will allow you to show, rather than tell, your potential employer that you have the proficiency to succeed in a data science role.
Most data scientists today have a portfolio but very few stand out. If your portfolio looks too generic, doesn’t contain interesting projects, or lacks explanations, it can be hard for your readers to follow along and stay interested. In order to make sure your hard work is fully appreciated by your audience, here are some simple tips on how to turn a good portfolio into an exceptional one.
Why invest in a data science portfolio?
As an aspiring data scientist, there is an obvious ‘why’ when it comes to investing in a portfolio: to help you land a role by demonstrating your skills, even before a hiring manager puts you through a technical test. However, finding a new role is an external reward. Finding internal motivation is crucial so that the satisfaction derived from developing a portfolio is dependent on you, rather than an interview process you can’t control. It will also help your portfolio feel more genuine and will motivate you to present the best work you can.
Here are some key reasons why building a high-quality portfolio is worth your time.
Landing your dream role
A portfolio is often a key tool in the data science hiring process. Technical hiring managers and data scientists who interview you will look through it to gauge your skills, experience, and interests, and may ask you questions about it.
Giving you essential hands-on experience
While learning the theory behind a machine learning algorithm is an essential step in breaking into data science, the real test is applying the skills you learn to a use case. Working through a project in its entirety will cement what you’ve learned and ensure you can talk about it with confidence.
Getting to know the data community
Data scientists like to look at what other data scientists have done. There isn’t one ‘right’ way to do everything, and promoting and discussing your project with the community is a great way to develop interesting solutions to a problem.
Your own enjoyment
This is one of the most important reasons to create a portfolio; after all, data science is fun! If you genuinely enjoy the project you're working on, it'll motivate you to try your best, and others are more likely to resonate with your passion.
What are the different types of portfolio projects?
Portfolio projects aren’t just about technical skills like demonstrating your ability to code. Content-based projects are also a fantastic way to showcase your understanding of a topic and demonstrate your communication skills, which are key attributes interviewers look for. In fact, every technical portfolio project should have a clear explanation aimed at a non-technical audience. Below are some examples of the two main types of portfolio projects: code-based and content-based. Having a combination of both in your portfolio is essential to demonstrating the multi-faceted skillset data science roles often require.
Code-based projects are the most common type of portfolio project. In a nutshell, they replicate real-world data science projects by taking a dataset and solving a problem around it. Examples of code-based projects include:
- Scraping a dataset and performing some analysis or training a model
- Building a dashboard around a specific dataset or topic
- Creating a website or app that someone can interact with
- Analyzing data on a trending topic such as a popular TV show or news story
Content-based projects are less common in portfolios but are extremely effective at showing your communication and writing skills. Examples of content-based projects include:
- Blog posts and coding tutorials explaining concepts for other data scientists, or non-technical audiences
- Video tutorials showcasing how a particular tool works
- Participating in a podcast, or hosting your own where you interview data scientists and practitioners
7 ways to craft an outstanding data portfolio
1. Be authentic and pursue your passion
The best portfolio projects aren’t those that use the latest or most complex tools and models. Instead, portfolio projects which capture the most attention are those that come from a place of authentic passion. If you’ve painstakingly scraped a dataset for a specific task, written a compelling story, or created something that tells the world about your passion, people will recognize this. Nick Singh, the co-author of Acing the Data Science Interview, goes one step further in this episode of DataFramed and suggests that passion for your own work can be so infectious that it will make hiring managers believe you are passionate about everything data science-related, including their company and the role you are applying for.
Data science portfolio projects aren’t easy to finish. You’ll hit multiple walls, you’ll have to juggle other commitments, and completing the last 10% can feel like doing the whole project again. Working on something you are passionate about will help you push past your struggles and ensure you create a project you’re proud of.
2. Tell a story
Pouring time and passion into a project can make you an expert, but it’s important to ensure your readers will be able to follow your journey from start to end with the content you have made available. Remember, many people will be looking through your portfolio without prior knowledge of your projects or the time to do extra research. Because of this, a concise but captivating story is essential in a portfolio project. Whether you publish it on the README page of a GitHub repo or underneath the title of a dashboard, make sure you spell out why the reader should be interested in your project, your motivation for doing it, and the core question it answers. This also serves as a way to capture readers’ attention and draw them to your notebook, model, or dashboard.
A compelling story is one of the most important parts of a portfolio since it shows off your genuine empathy, curiosity, and passion. Taking readers on an engaging journey will make your projects stand out.
3. Show off your technical skills—but avoid scope creep
A good portfolio project demonstrates your technical skills, but that doesn’t mean you need to apply every technical skill you have. For example, if you’ve spent hours developing an advanced scraping tool, you don’t have to expand the scope of your project even more to accommodate state-of-the-art modeling techniques.
A good approach is to center your project around one technical domain and apply the fundamentals throughout the rest of your project. If the point of your project is to demonstrate your data cleaning and collection abilities, for instance, it’s okay if you don’t produce the best prediction accuracy possible using the most cutting-edge models. Limiting the scope of your project is a great way to tell a concise but interesting story that clearly demonstrates different aspects of your technical skillset.
Another great way to show off your technical skills is to make sure your code is readable and well-documented. Give notebooks titles and explanations, and go through your code to add comments to functions. People who take the time to look through a notebook will take note of comments and clear variable names.
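As a rough illustration of what readable, documented code can look like, here is a minimal sketch in Python. The function name, the team names, and the record layout are all hypothetical and chosen only for the example; the point is the docstring, the comments, and the descriptive variable names rather than the analysis itself.

```python
from collections import Counter


def win_rate_by_team(match_results):
    """Return each team's share of wins across the matches it played.

    Parameters
    ----------
    match_results : list of (str, str, str)
        Hypothetical records of (home_team, away_team, winner).
        `winner` is one of the two team names, or "draw".

    Returns
    -------
    dict of str to float
        Team name mapped to wins divided by matches played.
    """
    matches_played = Counter()
    matches_won = Counter()

    for home_team, away_team, winner in match_results:
        # Both sides played, whatever the outcome.
        matches_played[home_team] += 1
        matches_played[away_team] += 1
        # A draw adds no win to either side.
        if winner != "draw":
            matches_won[winner] += 1

    return {
        team: matches_won[team] / played
        for team, played in matches_played.items()
    }


# Made-up sample data, for illustration only.
sample_results = [
    ("Lions", "Tigers", "Lions"),
    ("Tigers", "Bears", "draw"),
    ("Bears", "Lions", "Lions"),
]
win_rates = win_rate_by_team(sample_results)
print(win_rates)
```

A reader skimming this can tell what goes in, what comes out, and why each step exists without running anything, which is the impression you want a notebook to leave.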
4. Avoid cookie-cutter projects
Datasets like the Titanic, MNIST, or Iris should be avoided if possible. These are great datasets to learn from and to test models out on, but they’re extensively used by beginner data scientists and online courses, to the point where recruiters and hiring managers may assume you are much earlier on in your data science journey than you actually are. Moreover, they don’t help you show your passion for data science and the type of projects you’d be genuinely interested in.
Showcasing a commonly-done project in your portfolio is risky. Many people looking at your portfolio may have done the project themselves, which could cause them to lose interest—especially since there are many publicly available tutorials around these datasets.
5. Don’t neglect your soft skills
Great storytelling isn’t the only ‘soft skill’ you should try to convey in a portfolio project. Explaining a complex problem simply and concisely is an important skill for any workplace, and one that should be highlighted in your portfolio projects. Plus, your portfolio can be an opportunity to contribute to the data science community and teach your readers new skills. Curiosity is another essential attribute in data science, and one that employers seek out, since finding a solution to a specific problem often requires digging through papers or blog posts.
Generating insights from novel datasets, and explaining how you solved the unique challenges that you encountered in your portfolio project, are fantastic ways to demonstrate your curiosity and creativity, skills that are difficult to learn.
6. Design for your readers
The user experience of your readers is as important to your portfolio as it is to any app or website. It is essential to guide readers to relevant information without overwhelming them, while also providing the opportunity to delve deeper if they want to.
Don’t skimp on the design of your project: An eye-catching aesthetic will hold the reader's interest and help your portfolio to stand out. A clean-looking portfolio can even help readers who are unfamiliar with technical terminology to follow your story. Furthermore, you can adapt your project’s design style as a template for future projects and link between them to seamlessly guide users to more of your content.
7. Market your personal brand
Your portfolio isn’t the only information people can find about you. A simple Google search will likely bring up your LinkedIn profile, website, blog, GitHub, and other social media. You want to ensure your image, writing style, and content are consistent across these channels and that they all link to each other. Make sure you include links to your portfolio in your email signature and on your CV or resume. A strong personal brand helps you stand out as an individual. A good personal brand should highlight your key skills and achievements and show people what you do.
Examples of a great data science portfolio
If you are interested in seeing how these principles play out in practice, here’s a list of top-tier data science portfolios and projects to get inspired by:
Nikolaos Christoforidis: Passion for Sports
Nikolaos’ project is code-heavy from the start and clearly shows his high proficiency with Pandas and Scikit-learn. He also does a great job of creating an engaging notebook by working on a dataset that’s familiar to general audiences. Many of us are passionate about sports and nowadays there is a wealth of interesting datasets available. A project about a sport can easily draw like-minded people into reading to the end, particularly if it combines an interesting dataset and question with cool visualizations that capture elements of the sport itself. This is also a great way to ensure your passion comes through clearly in your work, which can even attract the interest of people who don’t follow the sport!
Yan Holtz: Fantastic Design
If you’re looking for inspiration for the design of your portfolio, look no further. The design of Yan’s portfolio oozes both passion and slickness, particularly the animation at the top of the page which reacts to your mouse pointer. It is impossible not to keep scrolling until you get to some of the projects themselves. Each project has a unique visualization that draws you in further, while clicking brings up a succinct explanation.
Samuel Verevis: Personality
Although wine datasets are common in portfolios, Samuel brings something completely new to the story through humorous section titles and exceptional visualizations. The charts here combine coolness with clarity, showcasing a clear understanding of how to tell a story and keep a reader interested. This is a great way to showcase authenticity, skills, and passion, even on a dataset that’s often used in other portfolio projects.
Philipp Schöttler: Going Viral
Bitcoin data and engaging portfolio projects: a match made in heaven. Philipp is clearly passionate and knowledgeable about the topic and has been able to produce a thorough, interesting read on Bitcoin while also showcasing his deep understanding of financial markets. This is a great demonstration of how to create content on a topic that is popular while still providing value for a range of audiences such as other data scientists, investors, blockchain enthusiasts, and people looking to learn.
While looking at examples is great for inspiration, and reading guides like this one can help steer you on the right path, the most important thing to keep in mind is that your portfolio should reflect you—your skills, your interests, and your personality. After all, this is your data science journey. You can find more resources to guide you through your journey below:
- Subscribe to the DataFramed Podcast
- Check out our certifications
- Watch this webinar on building a data science portfolio