Extract a zip file stored as an Azure blob with this simple method

Have you ever had a scenario where zip files arrive in Azure Blob Storage and you are asked to implement a listener that processes the individual files inside each zip file?

If you already work with Azure PaaS services, I don’t need to explain how to implement a listener or a blob trigger using an Azure Function App. What I want to focus on in this post is extracting the zip file’s contents into another blob container and processing the individual files in it.

Let’s get started…

Set the stage:

By the end of this section you’ll have a Storage Account connection string. If you already have one, you can skip this section.

If you’re following along with me, you will need an Azure subscription for this sample. If you don’t have an active subscription, you can get a free trial one here.

Once you have the subscription, go to the Azure portal and create a general-purpose Storage Account. We need it to create two blob containers for our sample: one for storing zip files (we will just upload a sample zip file here, but in a real-world application some other process would upload it) and another one to hold the contents of that zip file blob (one single blob) extracted as individual files (one blob per file).

As per our scenario, we need a container with a zip file in it, ready for extraction and processing. To keep the focus on extracting the file, let’s create a blob container named “zip-file-container” and upload a sample zip file (“samplelargefile.zip”) into it as a block blob.

You don’t need to create the other container at this point, as it can be created from code as well, which I will show in a moment.

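Copy the connection string from the Storage Account’s Settings > Access keys > Connection string for later use. The code in the next section reads it with CloudConfigurationManager.GetSetting("StorageConnectionString"), which in a console application reads the value from appSettings in App.config, so one way to supply it is an appSettings entry with the key StorageConnectionString.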

Perform:

In this section, you’ll connect to Azure Storage and extract the zip file into another blob container.

Now that you have the connection string and the zip file ready in Azure Blob Storage, let’s create a console application to extract it and process the individual files. You can reuse the same logic in your actual listener implementation.

Install the following three packages from the NuGet Package Manager Console:

PM > Install-Package WindowsAzure.Storage -Version 8.1.4

PM > Install-Package System.IO.Compression -Version 4.3.0

PM > Install-Package Microsoft.WindowsAzure.ConfigurationManager -Version 3.2.3

Now let me walk through the logic for today’s main topic step by step, with a short explanation for each step.

With these packages installed, we have the Azure Storage client SDK to connect to and work with Azure Storage services. The snippets below use await, so they need to run inside an async method.

Reference the following namespaces:

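 using System.IO;                    // MemoryStream
 using System.IO.Compression;        // ZipArchive
 using System.Threading.Tasks;       // async Task methods that host the awaits
 using Microsoft.Azure;              // CloudConfigurationManager
 using Microsoft.WindowsAzure.Storage;
 using Microsoft.WindowsAzure.Storage.Blob;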

Retrieve the storage account reference using the connection string we copied at the end of the previous section, and create a CloudBlobClient instance.

// Retrieve storage account from connection string.
 CloudStorageAccount storageAccount =
 CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));

// Create the blob client.
 CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

Get a reference to the container that holds our zip file, as well as a reference to the zip file itself.

// Retrieve reference to a zip file container.
 CloudBlobContainer container = blobClient.GetContainerReference("zip-file-container");

// Retrieve a reference to the blob (the zip file) that we want to extract
 CloudBlockBlob blockBlob = container.GetBlockBlobReference("samplelargefile.zip");

Next, get a reference to the container you want to extract the files into, and create that container in the Storage Account if it does not already exist.

// Retrieve a reference to the container where you want to extract the zip file's contents.
 var extractcontainer = blockBlob.ServiceClient.GetContainerReference("file-extract-container");
 await extractcontainer.CreateIfNotExistsAsync();
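CreateIfNotExistsAsync passes the container name to the service as-is, so the name has to follow Azure’s container naming rules: 3 to 63 characters, only lowercase letters, numbers and hyphens, starting with a letter or number, with every hyphen immediately preceded and followed by a letter or number. Both names used in this post qualify. If the name comes from configuration instead of being hard-coded, a small check like the one below can catch a bad name before the request is rejected. ContainerNames.IsValid is a hypothetical helper of my own, not part of the Storage SDK, and it does not handle special containers such as $root.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the Azure Storage SDK): checks a name against
// the blob container naming rules before calling CreateIfNotExistsAsync.
public static class ContainerNames
{
    // First character: lowercase letter or digit. Then 2 to 62 more characters,
    // each either a lowercase letter/digit or a hyphen that is followed by a
    // letter/digit, which rules out consecutive hyphens and a trailing hyphen.
    // \A and \z anchor the whole string (unlike $, \z does not accept a trailing newline).
    private static readonly Regex Pattern =
        new Regex(@"\A[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}\z");

    public static bool IsValid(string name)
    {
        return name != null && Pattern.IsMatch(name);
    }
}
```

For example, ContainerNames.IsValid("file-extract-container") is true, while "File-Extract" (uppercase) and "zip--file" (consecutive hyphens) are rejected.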

With both the source and target container references set up, let’s download the zip file blob into a memory stream and pass it to the ZipArchive class from the System.IO.Compression namespace.

ZipArchive takes this memory stream as input and exposes an Entries collection, where each entry represents an individual file (or folder) in the archive.
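If you want to see what ZipArchive exposes before wiring it to Blob Storage, here is a standalone sketch with no Azure calls. It builds a tiny zip in memory (the entry names and contents are made-up sample data), rewinds the stream just as the blob download code does with Position = 0, and lists each entry’s FullName and Length. ZipEntryListing and its methods are hypothetical names for illustration only.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration: an in-memory zip instead of a downloaded blob.
public static class ZipEntryListing
{
    // Builds a small zip in memory: one folder entry and two files (made-up sample data).
    public static MemoryStream CreateSampleZip()
    {
        var buffer = new MemoryStream();
        // leaveOpen: true keeps the MemoryStream usable after the archive is disposed,
        // which is when the zip's central directory gets written.
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            archive.CreateEntry("docs/"); // a folder entry: FullName ends with '/'
            AddTextEntry(archive, "docs/readme.txt", "hello");
            AddTextEntry(archive, "data.csv", "a,b\n1,2\n");
        }
        // Rewind before reading, like zipBlobFileStream.Position = 0 in the blob code.
        buffer.Position = 0;
        return buffer;
    }

    private static void AddTextEntry(ZipArchive archive, string name, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text); // GetBytes does not add a BOM
        using (Stream entryStream = archive.CreateEntry(name).Open())
        {
            entryStream.Write(bytes, 0, bytes.Length);
        }
    }

    // Reads the archive and describes each entry as "FullName (Length bytes)".
    public static List<string> ListEntries(Stream zipStream)
    {
        var result = new List<string>();
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                result.Add($"{entry.FullName} ({entry.Length} bytes)");
            }
        }
        return result;
    }
}
```

Calling ZipEntryListing.ListEntries(ZipEntryListing.CreateSampleZip()) returns "docs/ (0 bytes)", "docs/readme.txt (5 bytes)" and "data.csv (8 bytes)". The zero-length folder entry is the reason the extraction loop has to tell folders and files apart.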

// Download the blob (zip file) contents into a memory stream.
 using (var zipBlobFileStream = new MemoryStream())
 {
    await blockBlob.DownloadToStreamAsync(zipBlobFileStream);
    await zipBlobFileStream.FlushAsync();
    // Rewind the stream so ZipArchive reads it from the beginning
    zipBlobFileStream.Position = 0;
    // Use ZipArchive from System.IO.Compression to read all the entries of the zip file
    using (var zip = new ZipArchive(zipBlobFileStream))
    {
       // Each entry here represents an individual file or a folder
       foreach (var entry in zip.Entries)
       {
          // Folder entries have an empty Name (their FullName ends with '/'), so skip them.
          // Checking entry.Length > 0 instead would also skip zero-byte files.
          if (string.IsNullOrEmpty(entry.Name))
             continue;

          // Get a reference to a block blob with the same name (path) as the entry;
          // nothing is created in the container until the upload below
          var blob = extractcontainer.GetBlockBlobReference(entry.FullName);
          using (var stream = entry.Open())
          {
             // Upload the entry's content as the blob's content
             await blob.UploadFromStreamAsync(stream);
          }

          // TO-DO: Process the file (blob) here, or write another process later
          // that works through all the files (blobs) extracted to the other container.
       }
    }
 }

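With the zip.Entries property we can enumerate through each entry and create a new blob in the target container with the same name as the entry, uploading the entry’s stream as its content. You can even process the file at this point, using the stream that each entry’s Open() method returns. Keep in mind that this approach holds the whole zip file in memory, so the machine or Function App running it needs enough memory for the largest archive you expect.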
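As an illustration of processing each entry’s stream directly instead of (or before) uploading it, here is a sketch that counts the lines in every file of an archive. Line counting is only a stand-in for whatever your real processing is, and ZipEntryProcessing.CountLinesPerFile is a hypothetical name. Like the extraction loop, it recognises folder entries by their empty Name rather than by Length, so zero-byte files are still processed.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical example of per-entry processing; counting lines stands in for real work.
public static class ZipEntryProcessing
{
    // Returns entry FullName -> number of lines, for every file entry in the archive.
    // Note: the ZipArchive is created without leaveOpen, so it disposes zipStream.
    public static Dictionary<string, int> CountLinesPerFile(Stream zipStream)
    {
        var counts = new Dictionary<string, int>();
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // Folder entries have an empty Name; there is nothing to process.
                if (string.IsNullOrEmpty(entry.Name))
                    continue;

                // entry.Open() gives a read-only stream of the decompressed content.
                using (var reader = new StreamReader(entry.Open()))
                {
                    int lines = 0;
                    while (reader.ReadLine() != null)
                        lines++;
                    counts[entry.FullName] = lines;
                }
            }
        }
        return counts;
    }
}
```

In the blob scenario you could pass it the downloaded MemoryStream after setting Position back to 0; since the archive disposes the stream, do this as the last step or copy the stream first.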

Another library for extracting zip files that I feel is worth a try is DotNetZip.

Let me know if you come across any other simple ways of doing it.

Hope this helped. Happy Coding!!!
