Working With Zip Files In Python

In this tutorial, you will learn how to work with Zip files in Python using the zipfile module, which is part of Python's standard library.

Prerequisites To Work With Zip Files

  • You should know basic file handling in Python to follow Zip file handling. If you don't, head over to the W3Schools File Handling section first.
  • OOP concepts in Python
  • Python concepts like conditionals, loops, functions, and classes

Open this link to download all of the Zip files used in the upcoming sections.

What is a Zip File?

Zip is an archive file format that supports lossless data compression. A Zip file is a single file containing one or more files, which are usually compressed.

Uses of Zip Files

  • Zip files help you to put all related files in one place.
  • Zip files help to reduce the data size.
  • A single Zip file often transfers faster than many individual files over most connections.
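The size reduction mentioned above can be checked with a small sketch. It assumes an in-memory archive and a made-up member name (notes.txt); note that ZipFile stores members uncompressed (ZIP_STORED) unless you ask for a compression method such as ZIP_DEFLATED.

```python
import io
import zipfile

def archive_size(data, compression):
    """Return the size in bytes of an in-memory Zip holding `data` as one member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', compression=compression) as archive:
        # 'notes.txt' is a hypothetical member name used only for this sketch
        archive.writestr('notes.txt', data)
    return len(buffer.getvalue())

if __name__ == '__main__':
    text = b'Zip files help to reduce the data size.\n' * 1000
    print('Original size :', len(text))
    print('ZIP_STORED    :', archive_size(text, zipfile.ZIP_STORED))
    print('ZIP_DEFLATED  :', archive_size(text, zipfile.ZIP_DEFLATED))
```

Repetitive text like this compresses very well; already-compressed data such as JPEG images will shrink far less.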

zipfile Module

Explore all the classes and functions of the zipfile module using the dir() function. The code below lists every name the module defines.

import zipfile # importing the 'zipfile' module

print(dir(zipfile))
['BZIP2_VERSION', 'BadZipFile', 'BadZipfile', 'DEFAULT_VERSION', 'LZMACompressor', 'LZMADecompressor', 'LZMA_VERSION', 'LargeZipFile', 'MAX_EXTRACT_VERSION', 'PyZipFile', 'ZIP64_LIMIT', 'ZIP64_VERSION', 'ZIP_BZIP2', 'ZIP_DEFLATED', 'ZIP_FILECOUNT_LIMIT', 'ZIP_LZMA', 'ZIP_MAX_COMMENT', 'ZIP_STORED', 'ZipExtFile', 'ZipFile', 'ZipInfo', '_CD64_CREATE_VERSION', '_CD64_DIRECTORY_RECSIZE', '_CD64_DIRECTORY_SIZE', '_CD64_DISK_NUMBER', '_CD64_DISK_NUMBER_START', '_CD64_EXTRACT_VERSION', '_CD64_NUMBER_ENTRIES_THIS_DISK', '_CD64_NUMBER_ENTRIES_TOTAL', '_CD64_OFFSET_START_CENTDIR', '_CD64_SIGNATURE', '_CD_COMMENT_LENGTH', '_CD_COMPRESSED_SIZE', '_CD_COMPRESS_TYPE', '_CD_CRC', '_CD_CREATE_SYSTEM', '_CD_CREATE_VERSION', '_CD_DATE', '_CD_DISK_NUMBER_START', '_CD_EXTERNAL_FILE_ATTRIBUTES', '_CD_EXTRACT_SYSTEM', '_CD_EXTRACT_VERSION', '_CD_EXTRA_FIELD_LENGTH', '_CD_FILENAME_LENGTH', '_CD_FLAG_BITS', '_CD_INTERNAL_FILE_ATTRIBUTES', '_CD_LOCAL_HEADER_OFFSET', '_CD_SIGNATURE', '_CD_TIME', '_CD_UNCOMPRESSED_SIZE', '_ECD_COMMENT', '_ECD_COMMENT_SIZE', '_ECD_DISK_NUMBER', '_ECD_DISK_START', '_ECD_ENTRIES_THIS_DISK', '_ECD_ENTRIES_TOTAL', '_ECD_LOCATION', '_ECD_OFFSET', '_ECD_SIGNATURE', '_ECD_SIZE', '_EndRecData', '_EndRecData64', '_FH_COMPRESSED_SIZE', '_FH_COMPRESSION_METHOD', '_FH_CRC', '_FH_EXTRACT_SYSTEM', '_FH_EXTRACT_VERSION', '_FH_EXTRA_FIELD_LENGTH', '_FH_FILENAME_LENGTH', '_FH_GENERAL_PURPOSE_FLAG_BITS', '_FH_LAST_MOD_DATE', '_FH_LAST_MOD_TIME', '_FH_SIGNATURE', '_FH_UNCOMPRESSED_SIZE', '_SharedFile', '_Tellable', '_ZipDecrypter', '_ZipWriteFile', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_check_compression', '_check_zipfile', '_get_compressor', '_get_decompressor', 'binascii', 'bz2', 'compressor_names', 'crc32', 'error', 'importlib', 'io', 'is_zipfile', 'lzma', 'main', 'os', 're', 'shutil', 'sizeCentralDir', 'sizeEndCentDir', 'sizeEndCentDir64', 'sizeEndCentDir64Locator', 'sizeFileHeader', 'stat', 
'stringCentralDir', 'stringEndArchive', 'stringEndArchive64', 'stringEndArchive64Locator', 'stringFileHeader', 'struct', 'structCentralDir', 'structEndArchive', 'structEndArchive64', 'structEndArchive64Locator', 'structFileHeader', 'sys', 'threading', 'time', 'zlib']

That is a long list of classes and functions, but you are not going to learn all of them. You will learn only the ones you need to work with Zip files.

Let's see some useful exceptions, classes, and functions with brief explanations.

Exceptions

An exception is an object that signals an error while a program runs. In Python, you use the try, except, and finally keywords for error handling.

If you are not familiar with error handling, go to Python's error handling documentation to learn it.

Let's see the exceptions in the zipfile module.

zipfile.BadZipFile

zipfile.BadZipFile is an exception in the zipfile module. It is raised when a file is not a valid Zip archive or is corrupted. See the example below.

## zipfile.BadZipFile
import zipfile

def main():
    try:
        with zipfile.ZipFile('sample_file.zip') as file: # opening the zip file using 'zipfile.ZipFile' class
            print("Ok")
    except zipfile.BadZipFile: # if the zip file has any errors then it prints the error message which you wrote under the 'except' block
        print('Error: Zip file is corrupted')

if __name__ == '__main__': main()

## Output when 'sample_file.zip' is a valid archive; a corrupted file prints the error message instead
Ok
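To see BadZipFile actually being raised without needing a damaged file on disk, here is a minimal sketch. The helper name check_archive is hypothetical, and the "bad" file is simply some plain text held in memory.

```python
import io
import zipfile

def check_archive(file_obj):
    """Return 'Ok' if file_obj is a readable Zip, otherwise an error message."""
    try:
        with zipfile.ZipFile(file_obj) as archive:
            # testzip() returns the name of the first damaged member, or None
            bad_member = archive.testzip()
    except zipfile.BadZipFile:
        return 'Error: Zip file is corrupted'
    if bad_member is not None:
        return 'Error: {} is damaged'.format(bad_member)
    return 'Ok'

if __name__ == '__main__':
    # plain text is not a Zip archive, so opening it raises BadZipFile
    not_a_zip = io.BytesIO(b'this is plain text, not a Zip archive')
    print(check_archive(not_a_zip))
```

ZipFile accepts a path or a file-like object, so the same helper works with a file name such as 'sample_file.zip'.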

zipfile.LargeZipFile

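zipfile.LargeZipFile is raised when a Zip file would need the ZIP64 extensions (for example, an archive larger than 4 GiB or one with too many members) but ZIP64 is not enabled. The allowZip64 argument of ZipFile controls this. It defaults to True since Python 3.4, so you will only see LargeZipFile if you pass allowZip64 = False. See the example.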

## zipfile.LargeZipFile
## Without passing 'allowZip64' (it defaults to True since Python 3.4)
import zipfile

def main():
    try:
        with zipfile.ZipFile('sample_file.zip') as file:
            print('File size is compatible')
    except zipfile.LargeZipFile: # raised only if the archive needs 'ZIP64' and it is not enabled
        print('Error: File size is too large')

if __name__ == '__main__': main()
File size is compatible

## zipfile.LargeZipFile
## With enabling 'ZIP64'
import zipfile

def main():
    try:
        with zipfile.ZipFile('sample_file.zip', mode = 'r', allowZip64 = True) as file: # here enabling the 'Zip64'
            print('File size is compatible')
    except zipfile.LargeZipFile:
        print('Error: File size is too large') # printed only if the archive needs 'ZIP64' and it is not enabled

if __name__ == '__main__': main()
File size is compatible

Try running these programs with a Zip file that triggers each exception to get a clear idea of how they behave.
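Producing a multi-gigabyte file just to trigger LargeZipFile is impractical, so the sketch below uses the other ZIP64 limit: the number of members. It is an illustration under the assumption that writing more than zipfile.ZIP_FILECOUNT_LIMIT members with allowZip64 = False is rejected; the helper name write_many_members is made up.

```python
import io
import zipfile

def write_many_members(count, allow_zip64):
    """Write `count` empty members to an in-memory Zip and report what happened."""
    buffer = io.BytesIO()
    try:
        with zipfile.ZipFile(buffer, 'w', allowZip64=allow_zip64) as archive:
            for index in range(count):
                archive.writestr('member_{}.txt'.format(index), b'')
    except zipfile.LargeZipFile as error:
        return 'LargeZipFile: {}'.format(error)
    return 'Wrote {} members'.format(count)

if __name__ == '__main__':
    too_many = zipfile.ZIP_FILECOUNT_LIMIT + 1
    # with ZIP64 disabled the member count limit cannot be exceeded
    print(write_many_members(too_many, allow_zip64=False))
    # a small archive is fine either way
    print(write_many_members(10, allow_zip64=False))
```

Writing tens of thousands of members takes a few seconds, so expect a short pause when you run it.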

Classes

In simple words, a class is a set of methods and attributes. You can use them wherever you want by creating instances of the class.

Let's see some classes of the zipfile module.

zipfile.ZipFile

The most commonly used class for working with Zip files is the ZipFile class.

zipfile.ZipFile is used to read and write Zip files. It has methods for handling them.

Now, explore the methods of the ZipFile class using the dir() function. See the code.

import zipfile

print(dir(zipfile.ZipFile)) # accessing the 'ZipFile' class
['_RealGetContents', '__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_extract_member', '_fpclose', '_open_to_write', '_sanitize_windows_name', '_windows_illegal_name_trans_table', '_write_end_record', '_writecheck', 'close', 'comment', 'extract', 'extractall', 'fp', 'getinfo', 'infolist', 'namelist', 'open', 'printdir', 'read', 'setpassword', 'testzip', 'write', 'writestr']

You've already used the zipfile.ZipFile class to read Zip files in previous examples.

zipfile.ZipFile contains many methods like extract, open, getinfo, setpassword, etc., to work with Zip files.

Let's see some methods of the ZipFile class.

## zipfile.ZipFile
import zipfile

def main():
    with zipfile.ZipFile('sample_file.zip') as file:

        # ZipFile.infolist() returns a list of ZipInfo objects, one per member of the archive
        print(file.infolist())

        # ZipFile.namelist() returns a list with the names of all members of the archive
        print(file.namelist())

        # ZipFile.getinfo(name) returns the ZipInfo object for one member of the Zip file.
        # It raises a KeyError if the archive doesn't contain the named member
        print(file.getinfo(file.namelist()[-1]))

        # ZipFile.open(name, mode = mode_type, pwd = password) opens a member of the archive
        # 'pwd' is optional -> pass it only if the member is password-protected
        text_file = file.open(name = file.namelist()[-1], mode = 'r')

        # 'read()' returns the whole content of the member as bytes, as in regular file handling
        print(text_file.read())

        # You must close the file yourself if you don't open it using the 'with' keyword
        # 'close()' method is used to close the file
        text_file.close()

        # ZipFile.extractall(path = directory, pwd = password) extracts all the members;
        # without 'path' they go to the current working directory
        file.extractall()
        # after executing check the directory to see extracted files


if __name__ == '__main__': main()
[<ZipInfo filename='extra_file.txt' filemode='-rw-rw-rw-' file_size=59>, <ZipInfo filename='READ ME.txt' filemode='-rw-rw-rw-' file_size=59>, <ZipInfo filename='even_odd.py' filemode='-rw-rw-rw-' file_size=129>]
['extra_file.txt', 'READ ME.txt', 'even_odd.py']
<ZipInfo filename='even_odd.py' filemode='-rw-rw-rw-' file_size=129>
b"num = int(input('Enter a Number:- '))\r\nif num % 2 == 0:\r\n\tprint('{} is Even'.fromat(num))\r\nelse:\r\n\tprint('{} is Odd'.fromat(num))"

If you want to learn more about a method of the ZipFile class, call the help() function on it.

Or go to the official Python documentation.

zipfile.ZipInfo

The zipfile.ZipInfo class represents a member of a Zip archive.

First, explore all the attributes and methods of the zipfile.ZipInfo class using the dir() function. See the code below.

## zipfile.ZipInfo
import zipfile

print(dir(zipfile.ZipInfo))
['CRC', 'FileHeader', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_decodeExtra', '_encodeFilenameFlags', '_raw_time', 'comment', 'compress_size', 'compress_type', 'create_system', 'create_version', 'date_time', 'external_attr', 'extra', 'extract_version', 'file_size', 'filename', 'flag_bits', 'from_file', 'header_offset', 'internal_attr', 'is_dir', 'orig_filename', 'reserved', 'volume']

Now, you are going to see some attributes and methods of the zipfile.ZipInfo class.

## zipfile.ZipInfo
import zipfile

def main():
    with zipfile.ZipFile('sample_file.zip') as file:

        # 'infolist()' is a method of the 'ZipFile' class
        # 'infolist()' returns a list of 'ZipInfo' objects, one for each folder and file in the zip
        # assigning last element of the list to a variable to test the attributes and methods of 'ZipInfo'
        archive = file.infolist()
        read_me_file = archive[-1]

        # 'ZipInfo' attributes and methods

        # ZipInfo_object.filename returns the name of the file
        print("Name of the file:- {}".format(read_me_file.filename))

        # ZipInfo_object.file_size returns the size of the file
        print("Size of the file:- {}".format(read_me_file.file_size))

        # ZipInfo_object.is_dir() returns True if it's directory otherwise False
        print("Is directory:- {}".format(read_me_file.is_dir()))

        # ZipInfo_object.date_time is an attribute (not a method) holding the last modification date & time of the file
        print("File last modified date & time:- {}".format(read_me_file.date_time))

if __name__ == '__main__': main()
Name of the file:- sample_file/READ ME.txt
Size of the file:- 59
Is directory:- False
File last modified date & time:- (2018, 10, 4, 11, 32, 22)

See the ZipInfo section of the official documentation if you want to learn more about ZipInfo objects.
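The date_time value is a plain tuple, which is awkward to compare or format. As a small sketch, the hypothetical helper describe_members below converts it into a datetime object; the sample archive is built in memory and reuses the timestamp from the output above.

```python
import datetime
import io
import zipfile

def build_sample():
    """Build an in-memory Zip with one member that has a fixed timestamp."""
    buffer = io.BytesIO()
    info = zipfile.ZipInfo('READ ME.txt', date_time=(2018, 10, 4, 11, 32, 22))
    with zipfile.ZipFile(buffer, 'w') as archive:
        archive.writestr(info, 'hello')
    return buffer

def describe_members(file_obj):
    """Return (name, size, is_dir, modified) tuples for every member of a Zip."""
    rows = []
    with zipfile.ZipFile(file_obj) as archive:
        for info in archive.infolist():
            # date_time is a 6-tuple: (year, month, day, hour, minute, second)
            modified = datetime.datetime(*info.date_time)
            rows.append((info.filename, info.file_size, info.is_dir(), modified))
    return rows

if __name__ == '__main__':
    for name, size, is_dir, modified in describe_members(build_sample()):
        print(name, size, is_dir, modified.isoformat())
```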

Methods

Methods here are the module-level functions of zipfile: blocks of code that perform a specific task. For example, if you want to find the absolute value of a number, you can use Python's built-in function abs.

You can use it anywhere you want. Let's see a function of the zipfile module.

zipfile.is_zipfile()

The is_zipfile(filename) function of the zipfile module returns True if the file is a valid Zip; otherwise, it returns False.

Let's see an example.

## zipfile.is_zipfile(filename)
import zipfile

def main():
    print(zipfile.is_zipfile('sample_file.zip')) # it returns True

if __name__ == '__main__': main()
True
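The example above only shows the True case. The sketch below, which assumes in-memory data and a file name that does not exist on your machine (no_such_file_for_this_demo.zip), shows the False cases as well. is_zipfile accepts file-like objects in addition to file names.

```python
import io
import zipfile

def make_zip_bytes():
    """Return the raw bytes of a tiny valid Zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as archive:
        archive.writestr('hello.txt', 'hello')
    return buffer.getvalue()

if __name__ == '__main__':
    # a real archive
    print(zipfile.is_zipfile(io.BytesIO(make_zip_bytes())))
    # plain text is not an archive
    print(zipfile.is_zipfile(io.BytesIO(b'just some text')))
    # a missing file does not raise an error, it is simply not a Zip
    print(zipfile.is_zipfile('no_such_file_for_this_demo.zip'))
```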

Handling Zip Files

In this section, you will learn how to handle Zip files: opening, extracting, writing, and so on.

Extracting A Zip File

You can extract all the files of a Zip to the current directory using the extractall method.

## extracting zip file
import zipfile

def main():

    # assigning filename to a variable
    file_name = 'sample_file.zip'

    # opening Zip using 'with' keyword in read mode
    with zipfile.ZipFile(file_name, 'r') as file:
        # printing a table of the archive contents using the 'printdir' method
        # 'printdir' prints the table itself and returns None, so it doesn't need 'print'
        file.printdir()

        # extracting the files using 'extractall' method
        print('Extracting all files...')
        file.extractall()
        print('Done!') # check your directory of zip file to see the extracted files

if __name__ == '__main__': main()
File Name                                             Modified             Size
sample_file/                                   2018-10-04 11:33:22            0
sample_file/even_odd.py                        2018-06-29 23:35:54          129
sample_file/READ ME.txt                        2018-10-04 11:32:22           59
Extracting all files...
Done!
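extractall also takes a path argument for the target directory and a members argument to extract only some of the files. The following is a minimal sketch; the helper names and the temporary sample archive are assumptions made for illustration.

```python
import os
import tempfile
import zipfile

def build_sample_zip(directory):
    """Create a small sample archive inside `directory` and return its path."""
    zip_path = os.path.join(directory, 'sample_file.zip')
    with zipfile.ZipFile(zip_path, 'w') as archive:
        archive.writestr('sample_file/even_odd.py', 'print("even or odd")')
        archive.writestr('sample_file/READ ME.txt', 'read me')
    return zip_path

def extract_python_files(zip_path, target_dir):
    """Extract only the .py members of zip_path into target_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        wanted = [name for name in archive.namelist() if name.endswith('.py')]
        # 'path' is the destination directory, 'members' limits what is extracted
        archive.extractall(path=target_dir, members=wanted)
    return wanted

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as work_dir:
        zip_path = build_sample_zip(work_dir)
        target = os.path.join(work_dir, 'extracted')
        print(extract_python_files(zip_path, target))
        print(os.listdir(os.path.join(target, 'sample_file')))
```

Extracting into a dedicated directory keeps the archive contents from mixing with the files already in your working directory.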

Extracting A Zip With Password

To extract a Zip with a password, pass a value to the pwd keyword argument of the extract(pwd = password) or extractall(pwd = password) methods.

The password must be passed as bytes. To convert a str to bytes, use Python's built-in bytes function with the utf-8 encoding.

Let's see an example.

## extracting zip with password
import zipfile

def main():
    file_name = 'pswd_file.zip'
    pswd = 'datacamp'

    with zipfile.ZipFile(file_name) as file:
        # the password must be passed as 'bytes', so convert the 'str' into 'bytes'
        file.extractall(pwd = bytes(pswd, 'utf-8'))

if __name__ == '__main__': main()

You can also set the password once using the setpassword(pwd = password) method of the ZipFile class and then extract the files. See the example below.

## extracting zip with password
import zipfile

def main():
    file_name = 'pswd_file.zip'
    pswd = 'datacamp'

    with zipfile.ZipFile(file_name) as file:
        # 'setpassword' sets the default password used to extract encrypted files from the 'Zip'
        file.setpassword(pwd = bytes(pswd, 'utf-8'))
        file.extractall()

if __name__ == '__main__': main()
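Neither example above handles a wrong password. The sketch below wraps the extraction in a hypothetical helper, extract_with_password, and catches RuntimeError, which zipfile raises for a missing password and usually for a wrong one. Because the zipfile module cannot create encrypted archives, the demo runs against an unencrypted in-memory archive, where pwd is simply ignored.

```python
import io
import tempfile
import zipfile

def make_plain_zip():
    """Return an unencrypted in-memory Zip (zipfile cannot create encrypted ones)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as archive:
        archive.writestr('secret.txt', 'nothing secret here')
    return buffer

def extract_with_password(zip_file, password, target_dir='.'):
    """Extract zip_file; return True on success, False on a password error."""
    try:
        with zipfile.ZipFile(zip_file) as archive:
            # str.encode('utf-8') gives the same bytes as bytes(password, 'utf-8')
            archive.extractall(path=target_dir, pwd=password.encode('utf-8'))
    except RuntimeError as error:
        print('Could not extract:', error)
        return False
    return True

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        print(extract_with_password(make_plain_zip(), 'datacamp', folder))
```

With a real encrypted archive such as 'pswd_file.zip', pass its file name instead of the in-memory object.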

Creating Zip Files

To create a Zip file, you don't need any extra methods. Just pass the name and the write mode ('w') to the ZipFile class, and it will create an archive in the current directory.

See the example below.

## Creating Zip file
import zipfile

def main():

    archive_name = 'example_file.zip'
    # below one line of code will create a 'Zip' in the current working directory
    with zipfile.ZipFile(archive_name, 'w') as file:
        print("{} is created.".format(archive_name))

if __name__ == '__main__': main()
example_file.zip is created.
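Mode 'w' silently replaces an archive that already exists. If that is a concern, one option is the exclusive-creation mode 'x' (available since Python 3.5), sketched below with a made-up helper name and a temporary directory.

```python
import os
import tempfile
import zipfile

def create_empty_archive(archive_path):
    """Create an empty Zip, refusing to overwrite an existing file."""
    try:
        # mode 'x' creates the file and raises FileExistsError if it already exists
        with zipfile.ZipFile(archive_path, 'x'):
            pass
    except FileExistsError:
        return False
    return True

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'example_file.zip')
        print(create_empty_archive(path))   # first call creates the archive
        print(create_empty_archive(path))   # second call finds it already there
        print(zipfile.is_zipfile(path))     # an empty archive is still a valid Zip
```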

Writing To Zip Files

You have to open the Zip file in write mode to write files to the archive. Write mode overwrites the archive, so all the existing files in the Zip are lost.

Let's see an example.

## Writing files to zip
import zipfile

def main():

    file_name = 'sample_file.zip'

    # Opening the 'Zip' in writing mode
    with zipfile.ZipFile(file_name, 'w') as file:
        # write mode overwrites all the existing files in the 'Zip.'
        # the file you want to write to the 'Zip' must already exist on disk
        file.write('extra_file.txt')
        print('File overrides the existing files')

    # opening the 'Zip' in reading mode to check
    with zipfile.ZipFile(file_name, 'r') as file:
        print(file.namelist())

if __name__ == '__main__': main()
File overrides the existing files
['extra_file.txt']

Appending Files To Zip

You have to open the Zip in append ('a') mode in order to append files to it. Append mode doesn't overwrite the existing files.

Let's see an example.

## Appending files to zip
import zipfile

def main():

    file_name = 'sample_file.zip'

    # opening the 'Zip' in append mode
    with zipfile.ZipFile(file_name, 'a') as file:
        # append mode adds files to the 'Zip'
        # the files you want to add to the 'Zip' must already exist on disk
        file.write('READ ME.txt')
        file.write('even_odd.py')
        print('Files added to the Zip')

    # opening the 'Zip' in reading mode to check
    with zipfile.ZipFile(file_name, 'r') as file:
        print(file.namelist())

if __name__ == '__main__': main()
Files added to the Zip
['extra_file.txt', 'READ ME.txt', 'even_odd.py']
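The difference between write and append mode can be compared side by side with a self-contained sketch. It uses writestr instead of write so that no files need to exist on disk beforehand; the helper name names_after is hypothetical.

```python
import os
import tempfile
import zipfile

def names_after(mode, archive_path, new_member):
    """Open archive_path in `mode`, add new_member, and return the member names."""
    with zipfile.ZipFile(archive_path, mode) as archive:
        archive.writestr(new_member, 'placeholder text')
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'sample_file.zip')
        print(names_after('w', path, 'extra_file.txt'))  # creates the archive
        print(names_after('a', path, 'READ ME.txt'))     # keeps the earlier member
        print(names_after('w', path, 'even_odd.py'))     # starts over from scratch
```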

So far, you have learned how to handle Zip files. You are now able to open, create, write, append to, and extract Zip files. Next, you're going to write a simple program.

Let's see what it is.

Extracting Multiple Sub Zips With Password Using Loops And zipfile

  • You have a Zip that contains nested sub Zip files, one inside another. Each Zip file has a password, which is its own name. The challenge is to unzip every Zip until you reach the end.

Steps To Solve The Problem

  • Extract the parent file using its name as the password.
  • Get the first child name using the namelist() method. Store it in a variable.
  • Run a loop indefinitely.

    • Check whether the file is Zip or not using is_zipfile() method. If yes do the following.

      • Open the zip with the name variable.

      • Get the password of the Zip from the name variable.

      • Extract the Zip.

      • Get and store the next Zip name in the name variable.

    • else

      • Break the loop using break.

I have created the above procedure. If you want, you can change it according to the arrangement of your files.

## Solution
import zipfile

def main():

    # storing the parent name
    parent_file_name = '000.zip'

    with zipfile.ZipFile(parent_file_name, 'r') as parent_file:

        # extracting the parent file
        pswd = bytes(parent_file_name.split('.')[0], 'utf-8')
        parent_file.extractall(pwd = pswd)

        # getting the first child
        next_zip_name = parent_file.namelist()[0]

        # looping through the sub zips until the next file is not a 'Zip' file
        while True:

            if zipfile.is_zipfile(next_zip_name):
                # opening the zip
                with zipfile.ZipFile(next_zip_name, 'r') as child_file:

                    # getting password from the zip name
                    pswd = bytes(next_zip_name.split('.')[0], 'utf-8')

                    # extracting the zip
                    child_file.extractall(pwd = pswd)

                    # getting the child zip name
                    next_zip_name = child_file.namelist()[0]
            else:
                break

if __name__ == '__main__': main()

After executing the above program, you'll see all the sub zips are extracted to the current directory.
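If you don't have the nested archives at hand, the sketch below builds a small chain (000.zip containing 001.zip, and so on) in a temporary folder and then unwraps it with the same loop idea. The helper names are made up, and the archives are unencrypted because zipfile cannot create password-protected files, so the pwd value is passed but ignored here.

```python
import os
import tempfile
import zipfile

def build_nested_zips(folder, depth):
    """Create 000.zip holding 001.zip holding ... holding final.txt."""
    inner_name = 'final.txt'
    with open(os.path.join(folder, inner_name), 'w') as handle:
        handle.write('You reached the end!')
    for level in reversed(range(depth)):
        zip_name = '{:03d}.zip'.format(level)
        with zipfile.ZipFile(os.path.join(folder, zip_name), 'w') as archive:
            archive.write(os.path.join(folder, inner_name), arcname=inner_name)
        os.remove(os.path.join(folder, inner_name))  # keep only the outer file
        inner_name = zip_name
    return os.path.join(folder, inner_name)

def unwrap(zip_path, folder):
    """Extract nested Zips one level at a time; return the archive names visited."""
    visited = []
    next_path = zip_path
    while zipfile.is_zipfile(next_path):
        base_name = os.path.basename(next_path)
        visited.append(base_name)
        with zipfile.ZipFile(next_path) as archive:
            # the password is the file name without its extension
            pswd = os.path.splitext(base_name)[0].encode('utf-8')
            archive.extractall(path=folder, pwd=pswd)
            names = archive.namelist()
        if not names:  # an empty archive has no child to continue with
            break
        next_path = os.path.join(folder, names[0])
    return visited

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        top = build_nested_zips(folder, 3)
        print(unwrap(top, folder))
        print(sorted(os.listdir(folder)))
```

Using is_zipfile as the loop condition replaces the while True and break pair, and the empty-archive check avoids an IndexError on namelist()[0].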

EndNote

Congratulations on completing the tutorial!

Hope you enjoyed the tutorial. This article should help you a lot when you are working with Zip files.

If you have any questions regarding the article, ask me in the comment section. I will reply as soon as possible.

Again, if you are new to Python, take DataCamp's free Intro to Python for Data Science course to learn the Python language or read Python's official documentation.

Happy Coding!
