A while ago I was working on a little tool to manage my media files – TV shows and movies that I’d ripped from my DVD collection that I wanted to make available through XBMC on my home theater system. To display things like plot, characters, runtime, etc., XBMC reads NFO files – which are just XML files – one for each media file. Entering this information by hand is not a fun afternoon activity, so I looked for a way to automate it. I found a couple of different sites that provided the data I needed via APIs and eventually settled on TheTVDB.com for TV series and themoviedb.org for movies. So all I needed to do was read the data using the APIs and write it to the NFO files. Simple!
For each TV series, TheTVDB provides a ZIP file containing several XML files, which splits the information into manageable pieces and allows for easier internationalization. A typical ZIP from TheTVDB contains files such as actors.xml, banners.xml, and en.xml. The actors.xml file contains a list of actors, the roles they play, a link to an image, etc. The banners.xml file provides links to fan art and thumbnails which XBMC can use for display. The en.xml file provides the core data: a description of the TV series and a list of all the episodes in the series, including plot, runtime, air date, etc. for each of them.
So what’s the best way to download and access this zipped data? One solution – I think the most obvious – is to download the ZIP file to a temp directory, unzip it using either a system command or a built-in ZIP implementation, then read the XML file you’re interested in from the disk. One disadvantage of this solution is speed. Why unzip the whole archive if you just want to read some data from one of the files in it? That just results in a bunch of unnecessary I/O. So my solution was to download the ZIP file to a temp directory, then simply read the XML file I wanted directly into a data structure without unpacking it to disk.
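What makes this possible is the ZIP layout itself: an archive ends with a central directory that lists every entry's name, size, and offset, so a reader can jump straight to one entry. minizip takes care of this for you. The sketch below only illustrates the idea and is not how asmZip or minizip is implemented: readStoredEntry() and makeStoredZip() are made-up names, only uncompressed ("stored") entries are handled, zip64 is ignored, and the demo archive leaves the CRC-32 fields at zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Read an n-byte little-endian integer at pos (ZIP stores numbers little-endian).
static std::uint32_t readLE(const Bytes &b, std::size_t pos, int n) {
    assert(pos + n <= b.size());
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i)
        v |= static_cast<std::uint32_t>(b[pos + i]) << (8 * i);
    return v;
}

// Append an n-byte little-endian integer (used only to build the demo archive).
static void writeLE(Bytes &b, std::size_t v, int n) {
    for (int i = 0; i < n; ++i)
        b.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Fields shared by local and central headers: version needed, flags, method 0
// (stored), DOS time/date, CRC-32 (left at 0), both sizes, name and extra length.
static void writeCommon(Bytes &b, std::size_t size, std::size_t nameLen) {
    writeLE(b, 20, 2); writeLE(b, 0, 2); writeLE(b, 0, 2);
    writeLE(b, 0, 4); writeLE(b, 0, 4);
    writeLE(b, size, 4); writeLE(b, size, 4);
    writeLE(b, nameLen, 2); writeLE(b, 0, 2);
}

// Demo fixture (hypothetical helper): a ZIP whose entries are stored uncompressed.
Bytes makeStoredZip(const std::vector<std::pair<std::string, std::string>> &files) {
    Bytes zip, dir;
    for (const auto &f : files) {
        const std::size_t offset = zip.size();
        writeLE(zip, 0x04034b50u, 4);            // local file header
        writeCommon(zip, f.second.size(), f.first.size());
        zip.insert(zip.end(), f.first.begin(), f.first.end());
        zip.insert(zip.end(), f.second.begin(), f.second.end());
        writeLE(dir, 0x02014b50u, 4);            // central directory header
        writeLE(dir, 20, 2);                     // version made by
        writeCommon(dir, f.second.size(), f.first.size());
        writeLE(dir, 0, 2); writeLE(dir, 0, 2);  // comment length, disk number
        writeLE(dir, 0, 2); writeLE(dir, 0, 4);  // internal, external attributes
        writeLE(dir, offset, 4);                 // where the local header lives
        dir.insert(dir.end(), f.first.begin(), f.first.end());
    }
    const std::size_t dirOffset = zip.size();
    zip.insert(zip.end(), dir.begin(), dir.end());
    writeLE(zip, 0x06054b50u, 4);                // end of central directory
    writeLE(zip, 0, 4);                          // disk numbers
    writeLE(zip, files.size(), 2); writeLE(zip, files.size(), 2);
    writeLE(zip, dir.size(), 4); writeLE(zip, dirOffset, 4);
    writeLE(zip, 0, 2);                          // archive comment length
    return zip;
}

// Copy one stored entry into out; returns false if it is missing or compressed.
bool readStoredEntry(const Bytes &zip, const std::string &name, std::string &out) {
    if (zip.size() < 22) return false;
    // 1. Scan backwards for the end-of-central-directory record.
    std::size_t eocd = zip.size() - 22;
    while (readLE(zip, eocd, 4) != 0x06054b50u) {
        if (eocd == 0) return false;
        --eocd;
    }
    const std::uint32_t count = readLE(zip, eocd + 10, 2);
    std::size_t pos = readLE(zip, eocd + 16, 4);
    // 2. Walk the directory: only names and offsets are read, no file data.
    for (std::uint32_t i = 0; i < count; ++i) {
        if (pos + 46 > zip.size() || readLE(zip, pos, 4) != 0x02014b50u) return false;
        const std::uint32_t method = readLE(zip, pos + 10, 2);
        const std::uint32_t size = readLE(zip, pos + 20, 4);
        const std::uint32_t nameLen = readLE(zip, pos + 28, 2);
        const std::uint32_t skip = readLE(zip, pos + 30, 2) + readLE(zip, pos + 32, 2);
        const std::size_t local = readLE(zip, pos + 42, 4);
        if (pos + 46 + nameLen > zip.size()) return false;
        const std::string entry(zip.begin() + pos + 46, zip.begin() + pos + 46 + nameLen);
        if (entry == name) {
            if (method != 0 || local + 30 > zip.size()) return false;
            // 3. Jump to the local header and skip its name and extra field.
            const std::size_t data =
                local + 30 + readLE(zip, local + 26, 2) + readLE(zip, local + 28, 2);
            if (data + size > zip.size()) return false;
            out.assign(zip.begin() + data, zip.begin() + data + size);
            return true;
        }
        pos += 46 + nameLen + skip;
    }
    return false;
}
```

Calling readStoredEntry( zip, "banners.xml", out ) on an archive built with makeStoredZip() fills out with that entry's bytes and never touches the data of the other entries.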
I created a little class called asmZip to read a ZIP file and access a specific file within it. It is used like this:
    #include <QByteArray>
    #include <QString>

    #include "asmZip.h"

    int main( int argc, char *argv[] )
    {
        Q_UNUSED( argc );
        Q_UNUSED( argv );

        asmZip zipFile( "en.zip" );

        if ( zipFile.isValid() )
        {
            zipFile.listFiles();

            QByteArray data;

            zipFile.extractFile( "banners.xml", data );

            // "data" now contains the contents of "banners.xml"
        }

        return 0;
    }
I’d used minizip in a project years ago, so I decided to use it for unzipping. I downloaded v1.1 with zip64 support and unpacked it into the same directory as my source.
Next I added the files I needed (ioapi.[hc] and unzip.[hc]) to my qmake .pro file:
    QT += core

    TARGET = asmZip
    TEMPLATE = app

    DEFINES += NOCRYPT USE_FILE32API

    HEADERS += asmZip.h

    SOURCES += main.cpp asmZip.cpp

    # unzip stuff
    INCLUDEPATH += unzip $$(QTDIR)/src/3rdparty/zlib

    HEADERS += unzip/ioapi.h unzip/unzip.h

    SOURCES += unzip/ioapi.c unzip/unzip.c

    mac {
        LIBS += -lz
    }
Here’s the header file for the asmZip class:
    #ifndef __ASM_ZIP_H__
    #define __ASM_ZIP_H__

    /*
        Access ZIP files
            - List files contained in ZIP
            - Unzip specific files in memory and put the contents in a QByteArray

        Uses minizip code from here:
            http://www.winimage.com/zLibDll/minizip.html
    */

    #include <QStringList>   // needed for the sOutputListLine() declaration below

    #include "unzip.h"

    class QByteArray;
    class QString;
    class QTextStream;

    class asmZip
    {
    public:
        // enum for case sensitivity
        // [see unzStringFileNameCompare() in unzip.c]
        enum eCaseSensitivity {
            CASE_OS_DEFAULT = 0,
            CASE_SENSITIVE,
            CASE_INSENSITIVE
        };

        // Wrap the minizip error defines from unzip.h in a nice enum and add our own
        enum eErrCode {
            NO_ERR = UNZ_OK,

            ERR_END_OF_LIST_OF_FILE = UNZ_END_OF_LIST_OF_FILE,
            ERR_ERRNO = UNZ_ERRNO,
            ERR_EOF = UNZ_EOF,
            ERR_PARAMERROR = UNZ_PARAMERROR,
            ERR_BADZIPFILE = UNZ_BADZIPFILE,
            ERR_INTERNALERROR = UNZ_INTERNALERROR,
            ERR_CRCERROR = UNZ_CRCERROR,

            ERR_FILE_NOT_FOUND_IN_ZIP = -1000
        };

    public:
        asmZip( const QString &inFileName );
        ~asmZip();

        // Is this ZIP file readable and valid?
        bool isValid() const { return mFile != NULL; }

        // Extract a file from the ZIP file and put the contents into a QByteArray
        eErrCode extractFile( const QString &inFileName, QByteArray &outData,
                              eCaseSensitivity inCaseSensitive = CASE_OS_DEFAULT ) const;

        // List the contents of the ZIP file
        // Modified version of the do_list() function in miniunz.c
        eErrCode listFiles() const;

    private:
        static void sOutputListLine( QTextStream &out, const QStringList &inStrings );

        unzFile mFile;
    };

    #endif
This is a pretty simple, straightforward class: there are only three public functions and a constructor. It could be expanded to do all sorts of interesting things with ZIP files, but for my project I only needed to read a file from a ZIP into memory. I also included a method to list the files in the ZIP, which is a modified version of the minizip code that uses Qt classes.
    #include <QStringList>
    #include <QTextStream>
    #include <QDebug>

    #include "asmZip.h"

    asmZip::asmZip( const QString &inFileName )
    {
        mFile = unzOpen64( qPrintable( inFileName ) );

        if ( mFile == NULL )
        {
            qDebug() << "asmZip: could not open file" << inFileName;
        }
    }

    asmZip::~asmZip()
    {
        // nothing to close if the open failed
        if ( mFile != NULL )
            unzClose( mFile );
    }
The constructor and destructor handle the opening and closing of the ZIP file using the minizip functions. Note that the ZIP file remains open for the lifetime of the asmZip instance, so if you don’t want that you can move the open and close code into their own functions.
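Here is a rough sketch of what that split could look like. It is not the asmZip code: LazyArchive is a made-up name, and std::FILE* with fopen()/fclose() stands in for unzFile with unzOpen64()/unzClose() so that the snippet compiles without minizip.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical variant of the class with explicit open()/close().
// std::FILE* is a stand-in for unzFile so the sketch has no minizip dependency.
class LazyArchive {
public:
    explicit LazyArchive(std::string fileName) : mFileName(std::move(fileName)) {}

    // Still close on destruction so a forgotten close() does not leak the handle.
    ~LazyArchive() { close(); }

    // The handle has a single owner, so copying is disabled.
    LazyArchive(const LazyArchive &) = delete;
    LazyArchive &operator=(const LazyArchive &) = delete;

    // Open on demand; calling it again while open is a no-op.
    bool open() {
        if (mFile == nullptr)
            mFile = std::fopen(mFileName.c_str(), "rb");  // unzOpen64() in asmZip
        return isValid();
    }

    // Release the handle early; safe to call when nothing is open.
    void close() {
        if (mFile != nullptr) {
            std::fclose(mFile);                           // unzClose() in asmZip
            mFile = nullptr;
        }
    }

    bool isValid() const { return mFile != nullptr; }

private:
    std::string mFileName;
    std::FILE *mFile = nullptr;
};
```

With this shape the archive is only held open between open() and close(), and isValid() reports whether it is currently open rather than whether the constructor succeeded.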
    asmZip::eErrCode asmZip::extractFile( const QString &inFileName, QByteArray &outData,
                                          eCaseSensitivity inCaseSensitive ) const
    {
        if ( !isValid() )
            return ERR_BADZIPFILE;

        if ( unzLocateFile( mFile, qPrintable( inFileName ), inCaseSensitive ) != UNZ_OK )
        {
            qDebug() << "ERROR: asmZip::extractFile() - file not found in the zipfile" << inFileName;

            return ERR_FILE_NOT_FOUND_IN_ZIP;
        }

        unz_file_info64 file_info;

        int err = unzGetCurrentFileInfo64( mFile, &file_info, NULL, 0, NULL, 0, NULL, 0 );

        if ( err != NO_ERR )
        {
            qDebug() << "ERROR: asmZip::extractFile() - unzGetCurrentFileInfo64" << err;

            return static_cast<eErrCode>( err );
        }

        err = unzOpenCurrentFile( mFile );

        if ( err != NO_ERR )
        {
            qDebug() << "ERROR: asmZip::extractFile() - unzOpenCurrentFile" << err;

            return static_cast<eErrCode>( err );
        }

        outData.fill( 0, file_info.uncompressed_size + 1 );

        qDebug() << "asmZip::extractFile() - Extracting" << inFileName << "buffer size" << outData.size();

        // unzReadCurrentFile() returns the number of bytes read, or a negative error code
        int bytesRead = unzReadCurrentFile( mFile, outData.data(), outData.size() );

        if ( bytesRead < 0 )
        {
            qDebug() << "ERROR: asmZip::extractFile() - unzReadCurrentFile" << bytesRead;

            unzCloseCurrentFile( mFile );

            return static_cast<eErrCode>( bytesRead );
        }

        // drop the spare byte so outData holds exactly the file contents
        outData.truncate( bytesRead );

        // closing the entry also reports a CRC mismatch in the data just read
        err = unzCloseCurrentFile( mFile );

        return static_cast<eErrCode>( err );
    }
This is the code which finds and extracts a file from a ZIP into a QByteArray. I’ve added debug statements to provide a bit of info on error, but you’ll want to handle this your own way. It also outputs a message to qDebug() when you extract a file:
    asmZip::extractFile() - Extracting "banners.xml" buffer size 19621
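If you would rather report failures your own way than through qDebug(), one small step is to map the codes returned by extractFile() to readable text. The sketch below is an illustration, not part of the class: errorText() is a made-up helper, and the numeric values are copied from the UNZ_* defines as I remember them from minizip 1.1's unzip.h, so check them against your copy.

```cpp
#include <string>

// Stand-alone copy of asmZip::eErrCode with the numeric values written out.
// ASSUMPTION: these match the UNZ_* defines in your unzip.h.
// ERR_EOF is left out because UNZ_EOF and UNZ_OK are both 0, so it cannot
// be told apart from NO_ERR.
enum eErrCode {
    NO_ERR                    = 0,
    ERR_ERRNO                 = -1,    // UNZ_ERRNO, which is zlib's Z_ERRNO
    ERR_END_OF_LIST_OF_FILE   = -100,
    ERR_PARAMERROR            = -102,
    ERR_BADZIPFILE            = -103,
    ERR_INTERNALERROR         = -104,
    ERR_CRCERROR              = -105,
    ERR_FILE_NOT_FOUND_IN_ZIP = -1000  // asmZip's own addition
};

// Hypothetical helper: turn a return code into a message for logs or dialogs.
std::string errorText(int code) {
    switch (code) {
    case NO_ERR:                    return "no error";
    case ERR_ERRNO:                 return "I/O error (see errno)";
    case ERR_END_OF_LIST_OF_FILE:   return "no more files in the ZIP";
    case ERR_PARAMERROR:            return "bad parameter";
    case ERR_BADZIPFILE:            return "not a valid ZIP file";
    case ERR_INTERNALERROR:         return "internal minizip error";
    case ERR_CRCERROR:              return "CRC check failed";
    case ERR_FILE_NOT_FOUND_IN_ZIP: return "file not found in the ZIP";
    default:                        return "unknown error (" + std::to_string(code) + ")";
    }
}
```

A caller could then show errorText( zipFile.extractFile( "banners.xml", data ) ) to the user instead of relying on the debug output.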
Finally, we have a couple of functions that list the contents of a ZIP file. They are a quick-and-dirty translation of the minizip listing code to Qt.
    void asmZip::sOutputListLine( QTextStream &out, const QStringList &inStrings )
    {
        if ( inStrings.count() != 8 )
            return;

        out << qSetFieldWidth( 8 ) << inStrings[0];
        out << qSetFieldWidth( 8 ) << inStrings[1];
        out << qSetFieldWidth( 9 ) << inStrings[2];
        out << qSetFieldWidth( 6 ) << inStrings[3];
        out << qSetFieldWidth( 10 ) << inStrings[4];
        out << qSetFieldWidth( 7 ) << inStrings[5];
        out << qSetFieldWidth( 9 ) << inStrings[6];
        out << qSetFieldWidth( 20 ) << inStrings[7];

        // reset the width so the newline itself is not padded
        out << qSetFieldWidth( 0 ) << endl;
    }

    asmZip::eErrCode asmZip::listFiles() const
    {
        unz_global_info64 gi;

        int err = unzGetGlobalInfo64( mFile, &gi );

        if ( err != NO_ERR )
        {
            // QString::arg() replaces %1, not printf-style %d
            qDebug() << QString( "asmZip::list - error %1 with zipfile in unzGetGlobalInfo" ).arg( err );

            return static_cast<eErrCode>( err );
        }

        QTextStream out( stdout );

        out << right;

        QStringList strList;

        strList << "Length" << "Method" << "Size" << "Ratio" << "Date" << "Time" << "CRC-32" << "Name";
        sOutputListLine( out, strList );

        strList.clear();
        strList << "------" << "------" << "----" << "-----" << "----" << "----" << "------" << "----";
        sOutputListLine( out, strList );

        err = unzGoToFirstFile( mFile );

        if ( err != NO_ERR )
        {
            qDebug() << QString( "asmZip::list - error %1 with zipfile in unzGoToFirstFile" ).arg( err );

            return static_cast<eErrCode>( err );
        }

        for ( unsigned int i = 0; i < gi.number_entry; i++ )
        {
            unz_file_info64 file_info;
            QByteArray filename_inzip;

            filename_inzip.resize( 1024 );

            err = unzGetCurrentFileInfo64( mFile, &file_info,
                                           filename_inzip.data(), filename_inzip.size(),
                                           NULL, 0, NULL, 0 );

            if ( err != NO_ERR )
            {
                qDebug() << QString( "asmZip::list - error %1 with zipfile in unzGetCurrentFileInfo" ).arg( err );
                break;
            }

            int ratio = 0;

            if ( file_info.uncompressed_size > 0 )
                ratio = ((file_info.compressed_size * 100) / file_info.uncompressed_size);

            QString method;

            if ( file_info.compression_method == 0 )
            {
                method = "Stored";
            }
            else if ( file_info.compression_method == Z_DEFLATED )
            {
                int iLevel = (file_info.flag & 0x6) / 2;

                if ( iLevel == 0 )
                    method = "Defl:N";
                else if ( iLevel == 1 )
                    method = "Defl:X";
                else if ( (iLevel == 2) || (iLevel == 3) )
                    method = "Defl:F"; /* 2: fast, 3: extra fast */
            }
            else if ( file_info.compression_method == Z_BZIP2ED )
            {
                method = "BZip2";
            }
            else
            {
                method = "Unkn.";
            }

            // add a '*' if the file is encrypted
            if ( (file_info.flag & 1) != 0 )
                method += '*';

            strList.clear();

            strList << QString::number( file_info.uncompressed_size );
            strList << method;
            strList << QString::number( file_info.compressed_size );
            strList << QString( "%1%" ).arg( ratio );
            strList << QString( "%1-%2-%3" ).arg( file_info.tmu_date.tm_mon + 1, 2, 10, QChar( '0' ) )
                                            .arg( file_info.tmu_date.tm_mday, 2, 10, QChar( '0' ) )
                                            .arg( file_info.tmu_date.tm_year % 100, 2, 10, QChar( '0' ) );
            strList << QString( "%1:%2" ).arg( file_info.tmu_date.tm_hour, 2, 10, QChar( '0' ) )
                                         .arg( file_info.tmu_date.tm_min, 2, 10, QChar( '0' ) );
            strList << QString::number( file_info.crc, 16 );
            strList << filename_inzip;

            sOutputListLine( out, strList );

            if ( (i + 1) < gi.number_entry )
            {
                err = unzGoToNextFile( mFile );

                if ( err != NO_ERR )
                {
                    qDebug() << QString( "asmZip::list - error %1 with zipfile in unzGoToNextFile" ).arg( err );
                    break;
                }
            }
        }

        // report the last error if the loop stopped early, NO_ERR otherwise
        return static_cast<eErrCode>( err );
    }
Executing this member function will give you a list of files in the ZIP. Something like this:
      Length  Method     Size Ratio      Date   Time   CRC-32                Name
      ------  ------     ---- -----      ----   ----   ------                ----
      143107  Defl:N    29246   20%  12-22-11  14:02 8266991c              en.xml
       19620  Defl:N     1486    7%  12-22-11  14:02 cc2b778f         banners.xml
        3839  Defl:N      823   21%  12-22-11  14:02 9faa00bf          actors.xml
I didn’t really need this for my project, but it can be useful to see what is in the ZIP, what other info you have access to, and how to access it.
That’s it! A simple little class using Qt to access files in-memory from the ZIP file instead of unpacking them.
I hope someone, somewhere finds it useful…
Hi,
How would this all work with password-protected zip archives?
Thanks
Rajiv
Been a long time since I looked at this, but I think each file within the archive is password protected. So I would change the call to unzOpenCurrentFile() to unzOpenCurrentFilePassword().
Inserting NULL for filename_compare_func worked for me.
Yes – that’s what Steven did too. That will make it default to strcmp(). He was missing zlib, which is why he had link issues.
After changing the code to minizip from GitHub, asm_zip.cpp gives the casting error below.
I guess they changed the signature of the function to make the comparison more generic.
One option would be to write two unzFileNameComparer functions and pick the right one based on eCaseSensitivity inCaseSensitive.
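A rough sketch of that idea follows. It assumes the newer minizip declares the comparer as a function that takes the archive handle and two names and returns 0 on a match, the way strcmp() does. The typedefs are stand-ins so the snippet compiles on its own, the enum is copied out of the asmZip class, and compareCaseSensitive(), compareCaseInsensitive(), and pickComparer() are made-up names. Check unzip.h in your copy for the real signature.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// ASSUMPTION: stand-ins for the declarations in the newer minizip's unzip.h.
typedef void *unzFile;
typedef int (*unzFileNameComparer)(unzFile file, const char *filename1, const char *filename2);

// Copied from the asmZip class so the sketch is self-contained.
enum eCaseSensitivity { CASE_OS_DEFAULT = 0, CASE_SENSITIVE, CASE_INSENSITIVE };

// Exact match, same result as strcmp().
int compareCaseSensitive(unzFile, const char *a, const char *b) {
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b);
}

// Match ignoring ASCII case; returns 0 when the names are equal.
int compareCaseInsensitive(unzFile, const char *a, const char *b) {
    assert(a != nullptr && b != nullptr);
    for (;; ++a, ++b) {
        const int ca = std::tolower(static_cast<unsigned char>(*a));
        const int cb = std::tolower(static_cast<unsigned char>(*b));
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0;  // both strings ended at the same place
    }
}

// Pick a comparer for the requested mode. Mapping CASE_OS_DEFAULT to the
// case-sensitive version is an arbitrary choice made for this sketch.
unzFileNameComparer pickComparer(eCaseSensitivity inCaseSensitive) {
    return inCaseSensitive == CASE_INSENSITIVE ? compareCaseInsensitive
                                               : compareCaseSensitive;
}
```

In extractFile() the third argument to unzLocateFile() would then become pickComparer( inCaseSensitive ), assuming the function now accepts a comparer there.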
I tried looking in the header file (unzip.h) and just found
In the unzip.c file, I found
so I changed extractFile in asm_zip.cpp to
but now I get the link error below
I don’t know where those are defined. You might want to post an issue in the github for that project.
I think all of those functions are zlib functions. zlib is installed on Mac OS X by default (and linked in the .pro with -lz on the Mac), but you’re on Windows.
So you’ll need to download, install, and link your project to zlib.
Qt Creator says it can’t find ‘file_info’ in ‘asmZip.cpp’. Did I miss something in the ‘.pro’ file?
Two things to check:
1) Did you download the minizip code and add the files (ioapi.[hc] and unzip.[hc]) to your .pro?
2) This is rather old – perhaps minizip has changed or it might be a Qt5 thing (this was written with Qt4)?
Are there any other clues? Maybe you can paste in the actual error?
I am using Qt 5.5. Here are some of the error messages.
and my “.pro” looks like this
It looks like you have the wrong minizip code. You want “v1.1 with zip64 support”.
I don’t see the package on that page anymore, but you can try this version of the code:
https://github.com/nmoinvaz/minizip
Hi, thanks for sharing. You can access the method’s name via __FUNCTION__ macro for the debug output.
Thanks. The __FUNCTION__ macro is non-standard, so not all compilers support it. The C99 standard provides __func__, but I don’t know the state of adoption.
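For anyone who wants to try it, here is a small sketch using __func__, which C++11 adopted as well. withCaller(), ASM_TRACE, and extractFileDemo() are made-up names, and the exact text __func__ holds is left to the compiler in C++, although the common compilers give the plain function name.

```cpp
#include <string>

// Hypothetical helper: prefix a message with the name of the calling function.
inline std::string withCaller(const char *caller, const std::string &message) {
    return std::string(caller) + "() - " + message;
}

// __func__ is evaluated where the macro is expanded, so it names the function
// that uses ASM_TRACE rather than withCaller() itself.
#define ASM_TRACE(message) withCaller(__func__, (message))

// Stand-in for a member function such as asmZip::extractFile().
std::string extractFileDemo() {
    return ASM_TRACE("file not found in the zipfile");
}
```

The returned string could be handed to qDebug() in place of the hand-written "asmZip::extractFile() - ..." prefixes, so the prefix follows the function if it is ever renamed.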