Datasets from Images

This tutorial will demonstrate how you can make datasets in CSV format from images and use them for Data Science, on your laptop.
Aug 2018  · 4 min read

In machine learning, deep learning, and data science, the most commonly used data files are JSON and CSV; here we will learn about CSV and use it to make a dataset. CSV stands for Comma Separated Values. A CSV file holds database fields exported into a plain-text format in which each line is one record and commas separate the fields within that record. Files with the .csv extension are similar to plain text files. This allows individuals who do not run the same database applications to share database files with one another.
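To make the record-per-line layout concrete, here is a minimal sketch that parses a tiny CSV with Python's standard csv module. The column names path and label and the sample rows are made up for illustration; they only mirror the kind of file built later in this tutorial.

```python
import csv
import io

# Hypothetical CSV text: a header line, then one line per record.
csv_text = "path,label\n./india/plate1.jpg,India\n./usa/plate1.jpg,USA\n"

# DictReader uses the header line as field names and yields one dict per record.
records = list(csv.DictReader(io.StringIO(csv_text)))

for record in records:
    print(record["path"], "->", record["label"])
```

Each comma-separated value on a line becomes one field of the record, which is why any program that understands plain text can read the file.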

For this tutorial we have a few requirements:

1->Microsoft Excel
2->A few images
3->Notepad++ (I suggest it because it is easy to use; WordPad, Notepad, or any other text editor also works)

With this, let’s get started.

Now let's install Notepad++. Visit https://notepad-plus-plus.org/download/v7.5.6.html and install the 32-bit or 64-bit version, whichever matches your device.

Now download a few images.

Make a new folder (I named it dataset), create a few subfolders in it, and fill those subfolders with images. I downloaded car number plates from a few parts of the world and stored them in folders.

Open a terminal/Command Prompt in the current directory, i.e., in the folder dataset, and run the commands below. For Windows users:

Command to get a list of folders and files in your directory:-- dir /b /s

Command to get file names and save them to a text file:-- dir /b /s *.jpg > "filename.txt"

For Linux users (Ubuntu):

Command to get a list of folders and files in your directory:-- ls -LR

Command to get file names and save them to a text file:-- find . -name "*.jpg" > files.txt

(ls -LR *.jpg only matches names in the current folder, because the shell expands *.jpg before ls runs, so find is used here to search the subfolders as well. The paths that find prints already start with ./ and use forward slashes.)

For Mac OSX: macOS is POSIX compliant, so it contains the usual command line utilities found in Unix environments, and the Linux commands apply. The Windows-style /b /s switches do not work here.

Command to get a list of folders and files in your directory:-- ls -R

Command to get file names and save them to a text file:-- find . -name "*.jpg" > filename.txt

Here I named the file 'filename'; you can name it anything you wish, such as 'names.txt'. It will be stored in the directory where you ran the command. I wanted only images with the .jpg extension, so I used *.jpg to select them; you can use *.jpeg, *.xml, or any other pattern that matches the extension of your files.
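If you prefer one script that behaves the same on Windows, Linux, and macOS, the listing step can be sketched in Python with pathlib. This is an alternative to the commands above, not part of the original workflow; the function name list_images and the demo folder names are hypothetical.

```python
import tempfile
from pathlib import Path

def list_images(root, extension=".jpg"):
    """Return sorted './sub/file.jpg' style paths of matching files under root."""
    root = Path(root)
    matches = [
        p for p in root.rglob("*")  # rglob walks all subfolders
        if p.is_file() and p.suffix.lower() == extension.lower()
    ]
    # as_posix() gives forward slashes on every operating system.
    return sorted("./" + p.relative_to(root).as_posix() for p in matches)

# Demo on a throwaway folder so the sketch runs anywhere (made-up file names).
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp)
    (dataset / "india").mkdir()
    (dataset / "india" / "plate1.jpg").touch()
    (dataset / "usa").mkdir()
    (dataset / "usa" / "plate1.JPG").touch()
    (dataset / "notes.txt").touch()
    listing = list_images(dataset)

print("\n".join(listing))
```

To use it on a real folder, call list_images with the path of your dataset folder and write the returned lines to a text file. The extension comparison ignores case, so .jpg and .JPG are both picked up.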

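Now open the text file in Notepad++ and press Ctrl+H to open the Replace dialog, then remove the main directory portion from every path. For Excel to reference the images, each entry must give the subdirectory and filename starting with "./". After the main directory is removed, each line looks like \subfolder\image.jpg, so replace the first backslash \ with ./ and the second one with /, giving ./subfolder/image.jpg.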

After making changes to the file, save the text file.
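The same find-and-replace can be sketched in Python, which helps when the listing is long. The root folder C:\Users\me\dataset and the sample lines are assumptions standing in for your own dir /b /s output.

```python
from pathlib import PureWindowsPath

def to_relative(line, root):
    """Turn an absolute Windows path into './sub/file.jpg' relative to root."""
    relative = PureWindowsPath(line.strip()).relative_to(PureWindowsPath(root))
    return "./" + relative.as_posix()  # as_posix() swaps backslashes for /

# Hypothetical root folder and listing lines.
root = r"C:\Users\me\dataset"
lines = [
    r"C:\Users\me\dataset\india\plate1.jpg",
    r"C:\Users\me\dataset\usa\plate2.jpg",
]

converted = [to_relative(line, root) for line in lines]
print(converted)
```

PureWindowsPath only manipulates the text of the path, so this runs on any operating system and does not require the files to exist.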

Now open Microsoft Excel, copy all the names from the text file, and paste them into an Excel sheet.

If you want to label the images, add another column named label and fill it in according to how you want to label them. Here I labeled them by country.

Now save the sheet as type CSV (Comma delimited) in the folder (dataset) that holds the image folders.
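For larger collections, the labeling and saving steps can be sketched in Python instead of Excel. This sketch assumes that each top-level subfolder name is the label you want (for example, one folder per country); the column names path and label and the sample paths are illustrative.

```python
import csv
import io

def label_rows(relative_paths):
    """Pair each './folder/file.jpg' path with its top-level folder as the label."""
    rows = []
    for path in relative_paths:
        parts = path.split("/")  # e.g. ['.', 'india', 'plate1.jpg']
        label = parts[1] if len(parts) > 2 else ""  # no label for files at the root
        rows.append((path, label))
    return rows

def to_csv_text(rows):
    """Serialize (path, label) rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["path", "label"])
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical relative paths, as produced by the earlier steps.
paths = ["./india/plate1.jpg", "./usa/plate2.jpg"]
dataset_csv = to_csv_text(label_rows(paths))
print(dataset_csv)
```

Writing dataset_csv to a file inside the dataset folder gives a result comparable to the Excel export. If your labels do not follow the folder names, fill the second column by hand as described above.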

Now remove the text file from that folder and compress the folder we named dataset into a zip file.
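The zipping step can also be done from Python with shutil.make_archive. This is a minimal sketch run against a throwaway folder; the file names dataset.csv and plate1.jpg are made up, and the archive keeps dataset as its top-level folder, much as zipping the folder from a file manager would.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def zip_dataset(folder):
    """Create '<folder>.zip' next to folder, with folder as the top-level entry."""
    folder = Path(folder)
    return shutil.make_archive(
        str(folder),                  # archive path without the .zip suffix
        "zip",
        root_dir=str(folder.parent),  # paths in the archive are relative to this
        base_dir=folder.name,         # only this folder is archived
    )

# Demo on a temporary dataset folder (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "dataset"
    (dataset / "india").mkdir(parents=True)
    (dataset / "india" / "plate1.jpg").touch()
    (dataset / "dataset.csv").write_text("path,label\n./india/plate1.jpg,india\n")
    archive_path = zip_dataset(dataset)
    with zipfile.ZipFile(archive_path) as archive:
        names = sorted(archive.namelist())

print(names)
```

Because the CSV sits inside the same folder as the image subfolders, the ./ relative paths remain valid after the archive is extracted elsewhere.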

You're Done!!!

You have successfully made a dataset in CSV format.
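Before sharing the dataset, it can be worth checking that every path in the CSV points at a real file. The sketch below is one way to do that, assuming the CSV has a column named path (a name chosen here for illustration) holding paths relative to the CSV's own folder.

```python
import csv
import tempfile
from pathlib import Path

def missing_images(csv_path):
    """Return the paths listed in the CSV that do not exist next to it."""
    csv_path = Path(csv_path)
    # utf-8-sig also copes with the byte-order mark some Excel exports add.
    with csv_path.open(newline="", encoding="utf-8-sig") as handle:
        rows = list(csv.DictReader(handle))
    return [
        row["path"] for row in rows
        if not (csv_path.parent / row["path"]).is_file()
    ]

# Demo: one listed image exists, the other does not (made-up names).
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp)
    (dataset / "india").mkdir()
    (dataset / "india" / "plate1.jpg").touch()
    csv_file = dataset / "dataset.csv"
    csv_file.write_text(
        "path,label\n./india/plate1.jpg,india\n./usa/plate2.jpg,usa\n",
        encoding="utf-8",
    )
    missing = missing_images(csv_file)

print(missing)
```

An empty list means every row has a matching image; any path that is printed usually points to a typo made during the find-and-replace step.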

Conclusion

This tutorial provides a quick guide on how to make datasets in CSV format from images for data science. I hope you find this tutorial useful when you want to make a dataset. Hurray!!! You have completed this tutorial. If you have any questions or thoughts on the tutorial, feel free to reach out in the comments below.
