Rating: 1 Star2 Stars3 Stars4 Stars5 Stars

How to read zip archives in Python

You must have run under space issues when Zip is the name of a popular file compression algorithm, which lets you both combine multiple files into a single archive and store them on disk using less space. We’ll show you how you can open and read zip files in your Python scripts.

kijiji canada goose mystique, Canada Goose Jacket, Canada Goose Jackets, Canada Goose Outlet,canada goose outlet berry chilliwack bomber., Canada Goose coats, Canada Goose, Canada Goose Outlet

Python’s “batteries included” philosophy regarding libraries is well known. What it means to you is that the functions that perform the majority of basic operations you’ll need in your scripts are included in the standard library that comes with the Python distribution. Working with zip archives is no exception — functionality is included in the zipfile module.

If you’re going to be working more than just casually with zip archives, you won’t get too far without knowing the specifics of how files are arranged within archives. The best resource for this is the ZIP Application Note distributed by PKWARE, the company responsible for developing the compression algorithm. The application note contains a full file format specification for .zip archives.

For this article we’ll be working with a zip file called “test.zip” which contains two files: “file1.txt” and “file2.txt”. If you’re following along, you can download the file here.

First things first, we need to load the file and create a ZipFile instance:

NB: It’s important that if you modify a zip file, you always close the archive. Even though we won’t be changing the file in this article, closing the archive is a good habit to be in. The following code will close a ZipFile:

Now that we’ve got a ZipFile, what can we do with it? If all you need is to pull out the contents of the archive, then you can use the read method:

There are two classes defined by the module you must use to read zip archives. The first, ZipFile, we’ve already dealt with. ZipFile deals with methods relating to the archive as a whole: opening and closing the archive, reading and writing from it and the list of files contained.

We’ve demonstrated opening, closing and reading from archives already — we’ll leave writing for another day. Let’s take a look at accessing the file list. The following is a short python script that accepts a number of zip archive names as arguments and then prints the contents:

When we run it with our test archive as a parameter:

The important function here is the namelist function, which simply returns the filenames of the contents of the archives. You can combine this with read to dump the full contents of the archive to screen:

The second class defined by the zipfile module is ZipInfo, and are used to store information about each single file contained within the archive. If you need more information about the archive’s contents than just the filenames then you need to inspect the ZipInfo file.

You can get an info file for any member by using the ZipFile.getinfo(name) method, or if you want the whole list, you can use the ZipFile.infolist() method. The following program prints the name, size and last modification time of each file in the archives given as command line arguments:

Then when we run the program:

In Python, working with zip files is as easy as that. Building zip functionality into your own scripts allows you to access and store resources while using reduced disk space.

Leave me a comment and let me hear your opinion. If you’ve got any thoughts, comments or suggestions for things we could add, leave a comment! Also please Subscribe to our RSS for latest tips, tricks and examples on cutting edge stuff.