
19 Computer Vision Projects From Beginner to Advanced

Explore our list of the top portfolio-worthy computer vision projects from beginner to advanced. Showcase your skills today!
Jul 24, 2024  · 15 min read

Due to the unprecedented amount of image and video data in today’s surveillance and social media world, computer vision engineers are in constant demand. They build everything from your iPhone’s Face ID to models that classify stars in outer space.

But before you can reach those levels, you have to practice and get your hands dirty. The best way to do that is by completing computer vision projects that resemble real-world problems. In this article, we will list 19 such project ideas, divided by complexity level, and the tools you need to make each one a success.

Beginner Computer Vision Projects

Let’s explore some project ideas, starting with the beginner level. At this level, most projects are related to classification or detection techniques, such as face emotion recognition or determining whether an object is in the image or not.

1. Face Mask Detection

Three women wearing masks; example for a face mask detection computer vision project.

Image source: Kaggle

The first project we have is developing a computer vision system for detecting face masks. This project is an excellent fit because it addresses a recent real-world problem (remember COVID?), showing your ability to adapt CV technologies to current issues. It lets you work on two popular subdomains of CV: object detection and facial analysis.

If you develop a real-time detection system, it will be a huge bonus to the project as it demonstrates your skills in performance optimization.

Dataset to use: Face Mask Detection Dataset on Kaggle

High-level implementation steps:

  1. Load and preprocess the dataset
  2. Build a CNN model using TensorFlow or PyTorch
  3. Train the model on the dataset
  4. Implement real-time detection using OpenCV

2. Traffic Signs Recognition

A sample image from the German Traffic Signs Recognition benchmark (GTSRB) dataset for a traffic signs recognition computer vision project.

Image source: Kaggle

The next project is classifying traffic signs using a standard benchmark dataset. This project is valuable as it has direct applications in autonomous driving, a cutting-edge field. It also shows your image classification skills, which is a fundamental CV task.

You can get started on this project with a bit of guidance through this DataLab project.

Dataset to use: German Traffic Signs Recognition Benchmark (GTSRB) Dataset on Kaggle

High-level implementation steps:

  1. Load and preprocess the GTSRB dataset
  2. Design a CNN architecture
  3. Train and validate the model
  4. Create a simple UI for testing with new images
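
When designing the CNN in step 2, it helps to check how each layer changes the feature-map size before you write the model. The sketch below applies the standard output-size formula, floor((n + 2p - k) / s) + 1, to a small layer stack. The stack itself and the 32x32 input size are illustrative assumptions, not something the dataset prescribes (GTSRB images vary in size and are usually resized first).

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical stack: (name, kernel, stride, padding, output channels)
LAYERS = [
    ("conv1", 3, 1, 1, 32),
    ("pool1", 2, 2, 0, 32),
    ("conv2", 3, 1, 1, 64),
    ("pool2", 2, 2, 0, 64),
]

def trace_shapes(input_size=32, layers=LAYERS):
    """Return [(name, channels, side_length), ...] for a square input."""
    size, shapes = input_size, []
    for name, kernel, stride, padding, channels in layers:
        size = conv_out(size, kernel, stride, padding)
        shapes.append((name, channels, size))
    return shapes

shapes = trace_shapes()
_, channels, size = shapes[-1]
flattened = channels * size * size  # input width of the first dense layer
NUM_CLASSES = 43                    # GTSRB has 43 traffic sign classes
print(shapes, flattened, NUM_CLASSES)
```

The flattened size tells you how wide the first fully connected layer must be, and the final layer needs one output per class.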

3. Plant Disease Detection

Four samples from the Plant Village dataset for a plant disease detection computer vision project

Image source: Kaggle

Next, we have another multi-class classification project. This time, you should develop a CV application for detecting diseased plants based on images of their leaves. It is recommended to use a pre-trained model like ResNet to improve the accuracy of your solution. This also demonstrates your transfer learning abilities, which are crucial in many CV tasks.

Dataset to use: Plant Village Dataset on Kaggle

High-level implementation steps:

  1. Load and augment the dataset
  2. Use transfer learning with a pre-trained model like ResNet
  3. Fine-tune the model on the plant disease dataset
  4. Build a web application for plant disease diagnosis
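
The augmentation in step 1 can be as simple as random flips and crops. Here is a minimal NumPy sketch of that idea; the crop fraction, the flip probability, and the random stand-in image are assumptions for illustration, and in practice you would likely use the augmentation utilities of your deep learning framework.

```python
import numpy as np

def augment(image, rng, crop_fraction=0.9):
    """Random horizontal flip followed by a random crop of an HxWxC image."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]              # mirror left-right
    h, w = image.shape[:2]
    ch, cw = int(h * crop_fraction), int(w * crop_fraction)
    top = rng.integers(0, h - ch + 1)          # random crop origin
    left = rng.integers(0, w - cw + 1)
    return image[top:top + ch, left:left + cw, :]

rng = np.random.default_rng(0)
# Random pixels standing in for a leaf photo
leaf = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
out = augment(leaf, rng)
print(out.shape)
```

Remember to resize the crops back to the input size your pre-trained model expects.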

4. Optical Character Recognition (OCR) for Handwritten Text

A handwritten chunk of text for an OCR computer vision project

Image source: Kaggle

Even though our world is becoming more and more digitized, there are still many handwritten texts. That’s why this project would be an excellent addition to your portfolio once finalized. 

In this project, you combine CV with natural language processing to showcase your interdisciplinary skills. In addition to CNNs, you can demonstrate your understanding of sequence models (LSTMs).

The computer vision project will challenge you to work with unstructured data (both image and text) and variable data (handwriting). As the project has real-world business applications, it may attract potential employers.

Dataset to use: IAM Handwritten Forms Dataset on Kaggle

High-level implementation steps:

  1. Preprocess and segment the handwritten text images
  2. Implement a CNN-LSTM architecture
  3. Train the model on the IAM dataset
  4. Create a simple application for recognizing handwritten text from images
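
The steps above do not say how the per-frame outputs of the CNN-LSTM become text. One common choice, assumed here, is to train with a CTC loss and decode greedily: take the most likely label per frame, collapse repeats, and drop the blank symbol. The label map below is hypothetical.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated labels, then drop blanks."""
    decoded, previous = [], None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded

ALPHABET = {1: "c", 2: "a", 3: "t", 4: "o"}   # hypothetical label map
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]        # argmax label for each time step
word = "".join(ALPHABET[i] for i in ctc_greedy_decode(frames))
print(word)
```

The blank symbol is what lets the model emit genuine double letters: `[4, 0, 4]` decodes to two characters, while `[4, 4]` decodes to one.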

5. Facial Emotion Recognition

Different faces with their emotions labeled from a facial features dataset for a computer vision project.

Image source: Kaggle

The facial emotion recognition project is a strong choice as it showcases your skills in facial analysis, a popular and ever-growing field in computer vision. It has applications in areas like human-computer interaction and market research.

The project can later be expanded to more complex emotion analysis tasks.

Dataset to use: FER-2013 dataset

High-level implementation steps:

  1. Preprocess the FER-2013 dataset
  2. Design a CNN for emotion classification
  3. Train and optimize the model
  4. Implement real-time emotion recognition using a webcam feed
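
For step 1, the usual CSV release of FER-2013 stores each face as a string of 48x48 space-separated grayscale values next to an integer emotion label. A minimal sketch of parsing one row could look like this; the function name and the fake row are made up for illustration, so check the column layout of the copy you download.

```python
import numpy as np

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def parse_row(emotion, pixels, size=48):
    """Turn one CSV row into a normalized (size, size) image and a label name."""
    values = np.array([int(v) for v in pixels.split()], dtype=np.float32)
    if values.size != size * size:
        raise ValueError(f"expected {size * size} pixels, got {values.size}")
    image = values.reshape(size, size) / 255.0   # scale to [0, 1]
    return image, EMOTIONS[int(emotion)]

# A fake uniform-gray row standing in for a real record
fake_pixels = " ".join(["128"] * (48 * 48))
image, label = parse_row("3", fake_pixels)
print(image.shape, label)
```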

6. Honey Bee Detection

A sample image from a honey bee vs. bumblebee dataset for a detection computer vision project

Honey bees are one of the most critical players in our food chain. However, with so many species of bees, it can be challenging to identify which ones are honey bees, especially for computers. Therefore, this honey bee versus bumblebee classification project is an excellent starter for building a large-scale bee species detection solution.

You can get started on the project immediately through this DataLab project.

7. Clothing Classifier

An image of multiple items of clothing for a clothing classifier project

Image source

I have a lot of trouble buying clothes for women as I can’t distinguish between different types of women’s clothing. If you’ve ever found yourself in a similar situation, you might have thought about building a clothing items classifier.

Well, this project can be an excellent starter. By using the Fashion-MNIST dataset, you can build a classifier to recognize 10 different types of clothing. The classifier might not hold up in a fashion show, but it is a good starting point.

Start building the classifier right away through this DataLab computer vision project.

8. Food Image Classification

An image of spaghetti to represent a food image classification

Image source: DataCamp

If you thought naming women’s clothing was hard, try naming different types of food. With thousands of recipes from around the world, you might get overwhelmed by not knowing their names or ingredients when you travel abroad.

You can build a food classification model, but that requires a vast image dataset. However, you can always start small with this DataLab project that uses Hugging Face.

Intermediate Computer Vision Projects

After you build up fundamental skills like classification, detection, and building simple user interfaces, it is time to tackle more serious problems. Below, we will list some intermediate-level projects that would look excellent on your portfolio.

9. Multi-object Tracking in Video

An image with multiple objects annotated from a MOT benchmark dataset

Image source: Papers With Code

Object detection problems come in many flavors. For example, in this project, you must build a system for tracking multiple fast-moving objects in short video clips. Developing a working solution would make you a highly desirable candidate in fields such as surveillance, sports analytics, and autonomous driving.

However, be aware that the real challenge in this project is deploying a solution that can handle real-time video.

Dataset to use: Multiple Object Tracking (MOT) Benchmark Challenge Dataset

High-level implementation steps:

  1. Implement object detection using YOLO or Faster R-CNN
  2. Apply a tracking algorithm like SORT or DeepSORT
  3. Optimize for real-time performance
  4. Visualize tracking results on video streams
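
At the heart of step 2 is data association: deciding which new detection belongs to which existing track. The sketch below pairs boxes by intersection over union (IoU). Note that it uses a greedy matcher for brevity, which is a simplification; SORT itself solves the assignment with the Hungarian algorithm and predicts track positions with a Kalman filter. The 0.3 threshold is an assumption.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, threshold=0.3):
    """Greedily pair track boxes with detection boxes by descending IoU."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_detections = {}, set(), set()
    for score, ti, di in pairs:
        if score < threshold:
            break                      # remaining pairs overlap too little
        if ti not in used_tracks and di not in used_detections:
            matches[ti] = di
            used_tracks.add(ti)
            used_detections.add(di)
    return matches                     # {track index: detection index}

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 0, 11, 10)]
matches = associate(tracks, detections)
print(matches)
```

Unmatched detections would start new tracks, and tracks that stay unmatched for several frames would be dropped.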

10. Image Captioning

Image from the COCO dataset homepage

Image source: COCO Homepage

Image captioning is one of the best projects that combine CV and NLP. A working solution would demonstrate your ability to work with complex, multi-modal architectures. The skills you gain could be applicable in many scenarios, such as accessibility technology and content management.

After working on this problem, you will gain a practical understanding of feature extraction techniques and transformer-like architectures.

Dataset to use: Common Objects in Context (COCO) Dataset

High-level implementation steps:

  1. Use a pre-trained CNN (e.g., ResNet) for image feature extraction
  2. Implement an LSTM or Transformer for caption generation
  3. Train the model end-to-end on the COCO dataset
  4. Create a web interface for uploading and captioning new images

11. 3D Object Reconstruction From Multiple Views

Various objects from different angles from the ShapeNet dataset for a 3D object reconstruction project

Image source: Papers With Code

3D computer vision skills are highly complex and, thus, in high demand. Therefore, this is one of the most challenging projects on the list, but it offers high rewards.

In this project, you are tasked with reconstructing objects in 3D using images of the same object from multiple views. The process involves complex mathematical concepts, providing an excellent opportunity to showcase the depth of your technical knowledge. Additionally, you will work with non-standard data representations, giving your portfolio an edge over candidates who can only work with 2D data.

In the end, you will build something useful in many domains, such as AR/VR, robotics, and digital twin technology.

Dataset to use: ShapeNet Dataset

High-level implementation steps:

  1. Implement a multi-view stereo algorithm
  2. Use a 3D convolutional network for volumetric reconstruction
  3. Train and optimize the model on ShapeNet
  4. Develop a tool for reconstructing 3D objects from uploaded images
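
To get a feel for the geometry behind step 1, here is a small building block rather than a full multi-view stereo pipeline: linear (DLT) triangulation of a single 3D point from two views with known camera matrices. The intrinsics, the camera placement, and the test point are made-up values.

```python
import numpy as np

def triangulate(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])   # each view contributes two equations
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]                         # null-space vector (homogeneous point)
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 camera matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical intrinsics and two cameras one unit apart along the x-axis
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, -0.2, 4.0])
X_est = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
print(np.round(X_est, 4))
```

With noise-free observations the point is recovered almost exactly; with real images, you would feed in matched keypoints and expect some error.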

12. Gesture Recognition For Human-Computer Interaction

The main challenge in this project is collecting your own data. While there are many open-source datasets, such as the ASL (American Sign Language) dataset and the Hand Gestures dataset, most of the images are too preprocessed and cleaned to represent real-world scenarios.

To build this project, you must collect your own dataset and annotate it. Data collection and annotation might sound tedious, but you might end up spending most of your time on these tasks in a real job, as custom datasets aren’t available for all business problems.

Gesture recognition has direct applications in gaming, VR, and accessible technology.

Dataset to use: Collect your own using a depth camera (e.g., Kinect)

High-level implementation steps:

  1. Collect and annotate a custom gesture dataset
  2. Implement skeleton extraction from depth data
  3. Design an LSTM or GRU network for gesture classification
  4. Create a demo application controlling a computer interface with gestures
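
Once you have skeletons from step 2, it usually pays to normalize them before they reach the LSTM or GRU, so that the classifier doesn't have to learn where the hand is or how far it is from the camera. A minimal sketch, assuming a 21-joint hand layout with the wrist at index 0 and a finger base at index 9 (both indices are assumptions that depend on your skeleton extractor):

```python
import numpy as np

def normalize_hand(keypoints, root=0, reference=9):
    """Center keypoints on the root joint and scale by root-to-reference distance."""
    keypoints = np.asarray(keypoints, dtype=np.float64)
    centered = keypoints - keypoints[root]
    scale = np.linalg.norm(centered[reference])
    if scale == 0:
        raise ValueError("root and reference joints coincide")
    return centered / scale

rng = np.random.default_rng(1)
hand = rng.normal(size=(21, 3))                        # fake 3D keypoints
moved = hand * 2.5 + np.array([10.0, -3.0, 0.7])       # same pose, farther and shifted
same_pose = np.allclose(normalize_hand(hand), normalize_hand(moved))
print(same_pose)
```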

13. Visual Question Answering (VQA)

A sample image from the Visual Question Answering Dataset (VQA)

This is another fun and satisfying project at the intersection of CV and NLP. To make the project a success, you must have the skills to work with multi-modal data (images and text) and to design and train complex neural network architectures.

The project has applications in AI assistants and information retrieval systems.

Dataset to use: Visual Question Answering (VQA) Dataset

High-level implementation steps:

  1. Implement image feature extraction using a pre-trained CNN
  2. Design a text processing pipeline for questions
  3. Create a fusion network combining image and text features
  4. Train on the VQA dataset and build a demo interface
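
The fusion network in step 3 can take many forms. One simple variant, sketched here with untrained random weights, projects image and question features into a shared space, multiplies them element-wise, and classifies over a fixed answer vocabulary. All dimensions and the class name are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)      # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class FusionHead:
    """Project both modalities to a shared space, multiply, then classify."""

    def __init__(self, image_dim, text_dim, hidden_dim, num_answers, rng):
        self.w_img = rng.normal(0, 0.01, (image_dim, hidden_dim))
        self.w_txt = rng.normal(0, 0.01, (text_dim, hidden_dim))
        self.w_out = rng.normal(0, 0.01, (hidden_dim, num_answers))

    def forward(self, image_feat, text_feat):
        joint = np.tanh(image_feat @ self.w_img) * np.tanh(text_feat @ self.w_txt)
        return softmax(joint @ self.w_out)     # probabilities over answers

rng = np.random.default_rng(0)
head = FusionHead(image_dim=2048, text_dim=512, hidden_dim=256,
                  num_answers=1000, rng=rng)
# Random features standing in for CNN and question-encoder outputs
probs = head.forward(rng.normal(size=(4, 2048)), rng.normal(size=(4, 512)))
print(probs.shape)
```

In a real project you would build this in TensorFlow or PyTorch so the weights can be trained; the NumPy version only shows the data flow.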

14. Insurance Code Extraction

A team of data entry specialists as a cover image for an Insurance Code Extraction computer vision project

Image source: DataCamp projects

This is another project where your skills in working with multi-modal data are put to the test. By using images of scanned insurance documents and their associated insurance types, you are tasked with retrieving the documents’ primary and secondary IDs.

This project is excellent as digitizing historical documents is a common task in many fields. Get started on the problem immediately through this DataLab project.

Dataset to use: Implementing Multi-input OCR System Project

Advanced Computer Vision Projects

Once you’ve mastered some of the intermediate techniques and challenged yourself with some suitable projects, it’s time to turn your attention to some of the more advanced projects using computer vision. Here are some ideas: 

15. Image Deblurring

A blurred image as a cover image for an image deblurring vision project

Image source: Kaggle

Despite the prevalence of high-precision cameras, the world is full of low-quality, blurry images. Learning to improve image quality by removing blur and noise is a skill applicable to almost any computer vision project. It can be particularly useful in fields such as photography, medical imaging, and satellite imagery.

This project can be an excellent addition to your portfolio as it showcases your ability to handle real-world image degradation problems.

Dataset to use: A Curated List of Image Deblurring Datasets

High-level implementation steps:

  1. Prepare and preprocess the data
  2. Develop a multi-scale CNN or GAN model
  3. Implement evaluation metrics such as Peak Signal-to-Noise Ratio (PSNR)
  4. Optimize the model for inference speed, then create and deploy a user-friendly web application
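
PSNR compares a restored image against its sharp reference: it is 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error. A minimal NumPy sketch, assuming 8-bit images:

```python
import numpy as np

def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in decibels; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=np.float64)   # avoid uint8 wrap-around
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")                               # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

sharp = np.full((8, 8), 100, dtype=np.uint8)
restored = sharp.copy()
restored[0, 0] = 116            # one pixel off by 16, so MSE = 256 / 64 = 4
score = psnr(sharp, restored)
print(round(score, 2))
```

PSNR is easy to compute but doesn't always match perceived quality, so it is worth reporting it alongside at least one other metric.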

16. Video Summarization

Samples from the SumMe dataset for a video summarization project

Image source

Has anyone ever shared a YouTube video with you, and you felt bad because you would never watch it due to the video’s length? Well, if you build this project correctly, you can easily escape that awkward situation.

Video summarization is another CV + NLP project, but it also tests your video processing skills. Handling large-scale temporal data is a rare skill, as it involves many sub-tasks, such as:

  • Shot detection
  • Feature extraction
  • Image processing
  • Video analytics

On top of helping you in your social interactions, the project has applications in content management and video analytics.

Dataset to use: SumMe Dataset

High-level implementation steps:

  1. Implement shot boundary detection
  2. Design a feature extraction pipeline for video frames
  3. Create a sequence-to-sequence model for frame importance scoring
  4. Develop a user interface for uploading videos and generating summaries
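
Step 1 is a good place to start with a classic baseline before trying learned methods. The sketch below flags a shot boundary whenever the grayscale histograms of two consecutive frames differ by more than a threshold. The bin count, the threshold, and the synthetic frames are assumptions; real footage with fades or fast motion needs more care.

```python
import numpy as np

def shot_boundaries(frames, bins=16, threshold=0.5):
    """Return indices i where frame i starts a new shot (histogram distance test)."""
    boundaries, previous = [], None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()                           # normalize to sum to 1
        # Half the L1 distance lies in [0, 1]: 0 = identical, 1 = disjoint
        if previous is not None and 0.5 * np.abs(hist - previous).sum() > threshold:
            boundaries.append(i)
        previous = hist
    return boundaries

# Synthetic clip: five dark frames followed by four bright frames
dark = [np.full((32, 32), 20, dtype=np.uint8)] * 5
bright = [np.full((32, 32), 220, dtype=np.uint8)] * 4
cuts = shot_boundaries(dark + bright)
print(cuts)
```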

17. Face De-Aging/Aging

An image from a paper on age prediction based on facial features.

Image source: DEX paper

In this project, you have a dataset of human faces annotated with their ages. Your goal is to build a generative network that can age and de-age a person using the information provided in the dataset. A complete solution can have applications in entertainment, forensics, and privacy protection.

The project involves using some advanced skills, such as generative modeling, building complex GAN architectures, handling subtle and intricate image transformations, and deploying complex models as interfaces.

Dataset to use: IMDB-WIKI dataset

High-level implementation steps:

  1. Preprocess and clean the IMDB-WIKI dataset
  2. Implement a cycle-consistent GAN architecture
  3. Train the model to perform age transformation
  4. Create a web application for uploading and aging/de-aging faces
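
What makes the architecture in step 2 "cycle-consistent" is an extra loss term: a face that is aged and then de-aged should come back unchanged, and vice versa. CycleGAN-style training adds this term to the usual adversarial losses. The sketch below computes only that term, using toy functions in place of the two generator networks.

```python
import numpy as np

def cycle_consistency_loss(young, old, to_old, to_young):
    """L1 error after sending each batch through both generators and back."""
    young_round_trip = to_young(to_old(young))
    old_round_trip = to_old(to_young(old))
    return (np.mean(np.abs(young_round_trip - young))
            + np.mean(np.abs(old_round_trip - old)))

# Toy stand-ins for the two generators (real ones would be neural networks)
def to_old(x):
    return 0.8 * x + 0.1

def to_young(x):
    return (x - 0.1) / 0.8             # exact inverse of to_old

def lossy_to_young(x):
    return x - 0.1                     # not a true inverse of to_old

rng = np.random.default_rng(0)
young = rng.random((2, 64, 64, 3))     # fake image batches with values in [0, 1)
old = rng.random((2, 64, 64, 3))

consistent = cycle_consistency_loss(young, old, to_old, to_young)
inconsistent = cycle_consistency_loss(young, old, to_old, lossy_to_young)
print(consistent, inconsistent)
```

The loss is close to zero when the two mappings undo each other and grows when they don't, which is what pushes the generators to preserve identity while changing age.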

18. Human Pose Estimation And Action Recognition in Crowded Scenes

A GIF that shows how human pose estimation works

Image source: PoseTrack.net

Another sub-domain that has fascinated CV engineers for many years is human pose estimation. The attention this problem receives is highly justifiable, as it has applications in high-stakes fields such as surveillance, sports analytics, and behavioral studies.

Building this project will teach you techniques in both spatial (pose) and temporal (action) analysis. A successful solution would be a powerful addition to your portfolio, as you would need to use state-of-the-art CV techniques.

Dataset to use: PoseTrack dataset

High-level implementation steps:

  1. Implement multi-person pose estimation (e.g., OpenPose)
  2. Design a temporal convolutional network for action recognition
  3. Train and optimize the model on PoseTrack
  4. Develop a system for real-time pose estimation and action recognition in videos

19. Unsupervised Anomaly Detection in Industrial Inspection

Images of good and faulty products from a manufacturing process for an unsupervised anomaly detection project

Image source: Kaggle

The last project on our list is an excellent fit because it has direct applications in manufacturing and quality control, two fields that direly need good CV solutions.

The real challenge of this project is that the training data contains only defect-free images, so there are no defect annotations to learn from, making this an unsupervised anomaly detection project. Additionally, the dataset is relatively small, containing just over 5000 high-resolution images, so you would have to think carefully about data augmentation strategies.

The fact that this is an unsupervised problem and involves working with specialized industrial datasets makes the project a highly desirable addition to your portfolio.

Dataset to use: MVTec Anomaly Detection Dataset

High-level implementation steps:

  1. Implement an autoencoder architecture for normal sample reconstruction
  2. Train the model on normal samples only
  3. Develop an anomaly scoring mechanism based on reconstruction error
  4. Create a demo for uploading industrial images and highlighting anomalies
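
Steps 1 to 3 boil down to: learn to reconstruct normal samples, then flag anything that reconstructs badly. The sketch below shows that scoring logic, with PCA standing in as a linear "autoencoder" so the example runs with NumPy alone. The synthetic data, the component count, and the 99th-percentile threshold are all assumptions; a real solution would use a convolutional autoencoder on image patches.

```python
import numpy as np

class LinearAutoencoder:
    """PCA used as a linear stand-in for a trained autoencoder."""

    def fit(self, normal, n_components):
        self.mean = normal.mean(axis=0)
        _, _, vt = np.linalg.svd(normal - self.mean, full_matrices=False)
        self.components = vt[:n_components]
        return self

    def reconstruct(self, x):
        codes = (x - self.mean) @ self.components.T       # encode
        return codes @ self.components + self.mean        # decode

    def score(self, x):
        """Anomaly score: mean squared reconstruction error per sample."""
        return np.mean((x - self.reconstruct(x)) ** 2, axis=1)

rng = np.random.default_rng(0)
# Synthetic "normal" samples that live near a 4-dimensional subspace
basis = rng.normal(size=(4, 64))
normal = rng.normal(size=(200, 4)) @ basis + 0.01 * rng.normal(size=(200, 64))

model = LinearAutoencoder().fit(normal, n_components=4)   # train on normal only
threshold = np.quantile(model.score(normal), 0.99)

defect = normal[:5].copy()
defect[:, 10:20] += 3.0                                   # inject a local defect
flags = model.score(defect) > threshold
print(flags)
```

For the demo in step 4, the per-pixel reconstruction error can be shown as a heatmap to highlight where the anomaly is.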

Components of a Good Computer Vision Project

A good portfolio-worthy computer vision project that can capture recruiters’ attention typically has these three components in common:

  • Technical depth and complexity
  • Real-world applicability
  • End-to-end implementation

Let’s elaborate on each of these components.

1. Technical depth

In a vision project, you must demonstrate a strong understanding of CV concepts and techniques. These include:

  • Algorithms: Implementations of classic to state-of-the-art algorithms for solving problems.
  • Model architecture: Design and implementation of neural network architectures and correct use of custom layers or loss functions.
  • Data processing: Adequate data preprocessing, image augmentation and handling techniques.
  • Performance optimization: Techniques for improving model accuracy, reducing computational complexity, or enhancing inference speed.
  • Handling challenges: Addressing common CV challenges such as variations in lighting, scale, or occlusion.

The depth of your technical skills must be evident in the code, documentation, and project write-up, showcasing your professional approach to solving real-world problems.

2. Real-world applicability

This component is key because it demonstrates the practical value of your skills. A project with clear real-world use shows that you can bridge the gap between knowledge gained in courses and industry needs. Here are some important aspects:

  • Solving a painful need or problem in a specific industry or domain
  • Using large-scale real-world datasets or collecting your own
  • Considering practical constraints such as computational costs, budget limits, and real-time processing requirements

For example, faulty product detection on a conveyor belt in a plant or a medical image analysis tool for early disease detection would have clear real-world applicability.

3. End-to-end implementation

Finally, the most important aspect of a CV project is whether it is a complete, functional solution. This means you can’t upload a Jupyter notebook with a trained model to GitHub and call it a day. The project repository must contain the following important parts:

1. Data pipeline

2. Model development

3. Deployment and interface

4. Documentation and presentation

  • Clear explanation of the problem and solution approach
  • Documentation of the codebase
  • Analysis of results and performance
  • Discussion of limitations and potential improvements

5. Version control and reproducibility

The ability to deliver a complete, usable solution is a highly valuable trait in the industry. So, ensure any future or existing projects meet the above-mentioned requirements.

How to Find Good Datasets For Computer Vision Projects

The success of computer vision projects largely depends on the dataset used. Therefore, your chosen dataset must align with the three core components of CV projects. With that said, there are dozens of places you can look to find good open-source datasets. Here are some established sources:

1. Public Dataset Repositories:

2. Domain-Specific Repositories:

3. Academic Sources:

  • Look for datasets mentioned in recent research papers in your area of interest
  • Check conference websites (e.g., CVPR, ICCV, ECCV) for dataset challenges

4. Government and Non-Profit Organizations:

5. Creating Custom Datasets:

  • Web scraping (ensure you comply with legal and ethical guidelines)
  • Data collection using sensors or cameras
  • Synthetic data generation using tools like Unity or Blender

Remember, your chosen dataset must:

  • Be relevant to your project idea
  • Be large enough to train a robust model
  • Be diverse to represent various scenarios and conditions
  • Have a suitable license for your intended use (commercial, research)
  • Be up-to-date
  • Be well-documented

By considering these factors, you ensure the final delivered solution is robust and reliable.

Conclusion and Further Resources

In this article, we have listed 19 computer vision projects categorized based on their difficulty. To make these projects successful, we have discussed three core components of good vision projects: technical depth, applicability, and end-to-end implementation. We have also shared some established open resources where you can find high-quality datasets.

If you want to see more ideas for portfolio projects, check out the following articles:

For technical resources, consider the following:


Author
Bex Tuychiev

I am a data science content creator with over 2 years of experience and one of the largest followings on Medium. I like to write detailed articles on AI and ML with a bit of a sarcastic style because you've got to do something to make them a bit less dull. I have produced over 130 articles and a DataCamp course to boot, with another one in the making. My content has been seen by over 5 million pairs of eyes, 20k of whom became followers on both Medium and LinkedIn.
