Calculate CRC32, MD5 and SHA1 of zip content without decompression in Python


I need to calculate the CRC32, MD5 and SHA1 of the content of zip files without decompressing them.

So far I have found out how to calculate these for the zip file itself, e.g.:

CRC32:

import zlib

zip_name = "test.zip"


def Crc32Hasher(file_path):
    buf_size = 65536
    crc32 = 0

    with open(file_path, 'rb') as f:
        while True:
            data = f.read(buf_size)
            if not data:
                break
            # feed the running CRC back in so it accumulates across chunks
            crc32 = zlib.crc32(data, crc32)

    return format(crc32 & 0xFFFFFFFF, '08x')


print(Crc32Hasher(zip_name))

SHA1 (MD5 similarly):

import hashlib

zip_name = "test.zip"


def Sha1Hasher(file_path):
    buf_size = 65536
    sha1 = hashlib.sha1()

    with open(file_path, 'rb') as f:
        while True:
            data = f.read(buf_size)
            if not data:
                break
            sha1.update(data)

    # hexdigest() already returns a string, so no format() call is needed
    return sha1.hexdigest()


print(Sha1Hasher(zip_name))

For the content of the zip file, I can read the CRC32 directly from the archive without calculating it, as follows:

Read CRC32 of zip content:

import zipfile

zip_name = "test.zip"

if zip_name.lower().endswith('.zip'):
    with zipfile.ZipFile(zip_name, "r") as z:
        for info in z.infolist():
            # CRC32 of the uncompressed entry, as stored in the archive
            print(info.filename, format(info.CRC & 0xFFFFFFFF, '08x'))

But I couldn't figure out how to calculate the SHA1 (or MD5) of the content of zip files without decompressing them first.
Is that somehow possible?

asked May 22 '17 at 3:55

paradadf

python hash md5 sha1 crc32

1 Answer

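It is not possible. The CRC32 is available only because it was precalculated when the archive was created and stored in the zip metadata (it is used for integrity checks). Any other checksum or hash has to be calculated from scratch, which requires at least streaming the archive content through a decompressor, i.e. unpacking it, although the data can be processed in memory chunk by chunk without writing anything to disk.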
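To illustrate the difference, here is a minimal sketch, assuming a small throwaway archive built in memory (the entry name `example.txt` and the payload are made up for illustration). It shows that the stored `ZipInfo.CRC` equals `zlib.crc32` over the decompressed bytes, while an MD5 is not stored in the archive and has to be computed from the decompressed data.

```python
import hashlib
import io
import zipfile
import zlib

# Build a tiny archive in memory; entry name and payload are hypothetical.
payload = b"hello zip " * 1000
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("example.txt", payload)

with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("example.txt")
    # The CRC32 comes straight from the archive metadata: nothing is decompressed.
    stored_crc = info.CRC
    # Any other digest needs the decompressed bytes of the entry.
    with archive.open(info) as entry:
        data = entry.read()

crc_matches = stored_crc == (zlib.crc32(data) & 0xFFFFFFFF)
md5_hex = hashlib.md5(data).hexdigest()
print(crc_matches, md5_hex)
```

Reading the whole entry with `entry.read()` is only reasonable here because the example payload is tiny; for real archives a chunked read is the safer choice.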

UPD: Possible implementations

libarchive: extra dependency, supports many archive formats

import hashlib

import libarchive.public as libarchive

fname = "test.zip"  # path to the archive

with libarchive.file_reader(fname) as archive:
    for entry in archive:
        md5 = hashlib.md5()
        # get_blocks() yields the decompressed entry piece by piece
        for block in entry.get_blocks():
            md5.update(block)
        print(str(entry), md5.hexdigest())

Native zipfile: no dependencies, zip only

import hashlib
import zipfile

fname = "test.zip"  # path to the archive
blocksize = 1024**2  # read in 1 MiB chunks

with zipfile.ZipFile(fname) as archive:
    for name in archive.namelist():
        md5 = hashlib.md5()
        # entries are decompressed on the fly, one chunk at a time
        with archive.open(name) as entry:
            while True:
                block = entry.read(blocksize)
                if not block:
                    break
                md5.update(block)
        print(name, md5.hexdigest())
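Building on the native zipfile loop above, all three values from the question can be computed in one pass over each entry. This is a sketch rather than a definitive implementation: the function name `hash_zip_entries` and its `blocksize` parameter are made up for illustration, and directory entries are skipped on the assumption that only file content is of interest.

```python
import hashlib
import zipfile
import zlib


def hash_zip_entries(zip_path, blocksize=1024 ** 2):
    """Yield (name, crc32, md5, sha1) as hex strings for each file entry."""
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.filename.endswith("/"):
                continue  # directory entry: no content to hash
            crc32 = 0
            md5 = hashlib.md5()
            sha1 = hashlib.sha1()
            with archive.open(info) as entry:
                # iter() with a sentinel stops at the first empty read (EOF)
                for block in iter(lambda: entry.read(blocksize), b""):
                    crc32 = zlib.crc32(block, crc32)
                    md5.update(block)
                    sha1.update(block)
            yield (info.filename,
                   format(crc32 & 0xFFFFFFFF, "08x"),
                   md5.hexdigest(),
                   sha1.hexdigest())
```

Iterating over `hash_zip_entries("test.zip")` gives one tuple per file. Because each entry is read in chunks of `blocksize` bytes, the amount of decompressed data held in memory at once stays at roughly one chunk, whatever the size of the entry.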

edited 6 hours ago

Steve Barnes


answered May 22 '17 at 4:04

Marat


Thanks for answering. What would be the most memory-efficient way of doing that?

– paradadf
May 23 '17 at 16:17


@paradadf updated the answer

– Marat
May 25 '17 at 18:43
