In machine learning, deep learning, and data science, the most commonly used data files are JSON and CSV. Here we will learn about CSV and use it to make a dataset. CSV stands for Comma Separated Values. A CSV file holds data exported from a database or spreadsheet as plain text: each record sits on its own line, and commas separate the fields within a record. Files with the .csv extension are similar to plain text files, which lets people who do not run the same database applications share data with one another.
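To make the format concrete, here is a minimal Python sketch that reads a tiny CSV document from a string. The column names (path, label) and the file names are made up for illustration; they are only an assumption about what an image dataset might contain.

```python
import csv
import io

# A tiny CSV document: the first line is the header, and every later
# line is one record. The paths and labels are invented examples.
csv_text = "path,label\n./india/plate1.jpg,india\n./usa/plate2.jpg,usa\n"

# csv.DictReader maps the fields of each record to the header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["path"], "->", row["label"])
```

Each line becomes one record, and each comma-separated value becomes one field of that record.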
For this tutorial we have a few requirements:
1->Microsoft Excel
2->A few images
3->Notepad++ (I suggest it because it is easy to use, but you can use WordPad, Notepad, or any other text editor)
With this, let’s get started.
Now let's install Notepad++. Visit https://notepad-plus-plus.org/download/v7.5.6.html and install the version that works on your device.
Now download a few images.
Make a new folder (I named mine dataset), create a few subfolders in it, and fill those subfolders with images. I downloaded car number plates from a few parts of the world and stored them in folders.
Open a terminal/Command Prompt in the current directory, i.e., in the dataset folder, and run the commands given below. First, the commands for Windows users:
Command to get a list of folders and files in your directory:-- dir /b /s
Command to get file names and save them to a text file:-- dir /b /s *.jpg > "filename.txt"
For Linux users (Ubuntu):
Command to get a list of folders and files in your directory:-- ls -LR
Command to get file names and save them to a text file:-- find . -name "*.jpg" > filename.txt
The find command searches every subfolder and prints each path starting with ./
For macOS: macOS is POSIX compliant, so it includes the usual command line utilities found in Unix environments, such as ls and find.
Command to get a list of folders and files in your directory:-- ls -R
Command to get file names and save them to a text file:-- find . -name "*.jpg" > filename.txt
Here I named the file 'filename', but you can use any name you like, such as 'names.txt'. It will be stored in the directory where you ran the command. I wanted only images with the .jpg extension, so I used *.jpg to select them; you can use *.jpeg, *.xml, or any other pattern, depending on the extension of your files.
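If you would rather not depend on a particular operating system's commands, the same listing can be sketched in Python. This is only one possible approach, not part of the original workflow: the function names list_images and write_listing are my own, and the output file name filename.txt simply mirrors the one used above.

```python
from pathlib import Path


def list_images(root, extension=".jpg"):
    """Return sorted relative paths ('./sub/file.jpg') of images under root."""
    root = Path(root)
    paths = []
    for p in root.rglob("*"):
        # Compare suffixes case-insensitively so .JPG and .jpg both match.
        if p.is_file() and p.suffix.lower() == extension.lower():
            paths.append("./" + p.relative_to(root).as_posix())
    return sorted(paths)


def write_listing(root, out_name="filename.txt", extension=".jpg"):
    """Write one image path per line to out_name inside root."""
    paths = list_images(root, extension)
    out_path = Path(root) / out_name
    out_path.write_text("".join(p + "\n" for p in paths), encoding="utf-8")
    return paths
```

Calling write_listing(".") from inside the dataset folder should produce a text file with one relative path per line, already written with forward slashes and a leading ./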
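Now open the text file in Notepad++, press Ctrl+F, and use the Replace tab to remove the main directory details. For Excel to pull the images in, each entry needs the subdirectory and filename, starting with " ./ ". So after removing the main directory, replace the first backslash \ with ./ and the second one with /. For example, a path such as C:\Users\me\dataset\india\plate1.jpg becomes ./india/plate1.jpg (the folder and file names here are only examples).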
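The find-and-replace step can also be sketched in Python, which helps when the listing is long. The helper name to_relative and the example paths are hypothetical; substitute the location of your own dataset folder.

```python
from pathlib import PureWindowsPath


def to_relative(full_path, dataset_root):
    """Turn an absolute Windows path into './subfolder/file.jpg' form."""
    rel = PureWindowsPath(full_path).relative_to(PureWindowsPath(dataset_root))
    # as_posix() swaps the backslashes for forward slashes.
    return "./" + rel.as_posix()


# Hypothetical example paths, not taken from a real machine.
example = to_relative(
    r"C:\Users\me\dataset\india\plate1.jpg",
    r"C:\Users\me\dataset",
)
print(example)  # ./india/plate1.jpg
```

PureWindowsPath only manipulates the text of the path, so this sketch runs on any operating system and never touches the disk.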
After making changes to the file, save the text file.
Now open Microsoft Excel, copy all the names from the text file, and paste them into an Excel sheet.
If you want to label the images, add another column named label and fill it in according to how you want to label them. Here I labeled them by country.
Now save the sheet in CSV (Comma delimited) format in the folder (dataset) that contains your image folders.
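As an alternative to filling the sheet by hand, here is a rough Python sketch that writes the same two columns. It assumes each image sits in a subfolder whose name is the label you want (for example, a country name); the column names path and label, and the helper names build_rows and write_csv, are my own choices.

```python
import csv
from pathlib import PurePosixPath


def build_rows(rel_paths):
    """Pair each './folder/file.jpg' path with its first folder as the label."""
    rows = []
    for rel in rel_paths:
        parts = PurePosixPath(rel).parts  # the leading './' is dropped
        # Files directly in the dataset folder get an empty label.
        label = parts[0] if len(parts) > 1 else ""
        rows.append({"path": rel, "label": label})
    return rows


def write_csv(rel_paths, csv_file):
    """Write the path and label columns to csv_file with a header row."""
    with open(csv_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "label"])
        writer.writeheader()
        writer.writerows(build_rows(rel_paths))


# Invented example paths, just to show the shape of the rows.
print(build_rows(["./india/plate1.jpg", "./usa/plate2.jpg"]))
```

If your labels do not match the folder names, you would still need to edit the label column afterwards, in Excel or in code.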
Now remove the text file from that folder and compress the dataset folder into a zip file.
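The zipping step can be done from Python as well. This is a minimal sketch using the standard library's shutil.make_archive; the function name zip_dataset is hypothetical, and it assumes you pass it the path to your dataset folder.

```python
import shutil
from pathlib import Path


def zip_dataset(folder):
    """Create <folder>.zip next to the folder and return the archive path."""
    folder = Path(folder).resolve()
    return shutil.make_archive(
        base_name=str(folder),        # archive path without the .zip suffix
        format="zip",
        root_dir=str(folder.parent),  # zip relative to the parent directory
        base_dir=folder.name,         # keep the folder name inside the zip
    )
```

For example, zip_dataset("dataset") run from the directory that contains the dataset folder should create dataset.zip beside it, with the images and the CSV file inside.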
You have successfully made a dataset in CSV format.
This tutorial provided a quick guide to making datasets in CSV format from images for data science. I hope you find it useful when you want to make a dataset. Hurray! You have completed this tutorial. If you have any questions or thoughts on the tutorial, feel free to reach out in the comments below.