Calculate CRC32, MD5 and SHA1 of zip content without decompression in Python



I need to calculate the CRC32, MD5 and SHA1 of the content of zip files without decompressing them.



So far I have found out how to calculate these for the zip file itself, e.g.:



CRC32:



import zlib


zip_name = "test.zip"


def Crc32Hasher(file_path):

    buf_size = 65536
    crc32 = 0

    with open(file_path, 'rb') as f:
        while True:
            data = f.read(buf_size)
            if not data:
                break
            crc32 = zlib.crc32(data, crc32)

    return format(crc32 & 0xFFFFFFFF, '08x')


print(Crc32Hasher(zip_name))


SHA1: (MD5 similarly)



import hashlib


zip_name = "test.zip"


def Sha1Hasher(file_path):

    buf_size = 65536
    sha1 = hashlib.sha1()

    with open(file_path, 'rb') as f:
        while True:
            data = f.read(buf_size)
            if not data:
                break
            sha1.update(data)

    # hexdigest() already returns a string, so no format() call is needed
    return sha1.hexdigest()


print(Sha1Hasher(zip_name))
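
Since MD5 works the same way, the two hashers can be folded into one function. A minimal sketch, assuming a hypothetical helper name `file_digest_hex` (not part of the original post) and using `hashlib.new` to pick the algorithm by name:

```python
import hashlib


def file_digest_hex(file_path, algorithm="sha1", buf_size=65536):
    """Hash a file in fixed-size chunks; `algorithm` is any name hashlib.new accepts."""
    hasher = hashlib.new(algorithm)
    with open(file_path, "rb") as f:
        # iter() with a sentinel stops at the first empty read (end of file)
        for chunk in iter(lambda: f.read(buf_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Calling `file_digest_hex("test.zip", "md5")` would then hash the zip file itself, not its content.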


For the content of the zip file, I can read the CRC32 directly from the zip without having to calculate it, as follows:



Read CRC32 of zip content:



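import zipfile

zip_name = "test.zip"

if zip_name.lower().endswith('.zip'):
    # The context manager closes the archive when done
    with zipfile.ZipFile(zip_name, "r") as z:
        for info in z.infolist():
            # The CRC is stored in the archive's metadata; nothing is decompressed
            print(info.filename,
                  format(info.CRC & 0xFFFFFFFF, '08x'))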


But I couldn't figure out how to calculate the SHA1 (or MD5) of the content of zip files without decompressing them first.
Is that somehow possible?










      python hash md5 sha1 crc32






asked May 22 '17 at 3:55 by paradadf






















          1 Answer
It is not possible. You can read the CRC32 because it was precalculated when the archive was created (it is used for integrity checks). Any other checksum or hash has to be calculated from scratch, which requires at least streaming the archive content, i.e. decompressing it.
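
To illustrate the difference, here is a minimal sketch (the helper name `stored_vs_computed_crc` and the in-memory demo archive are assumptions for illustration). The stored CRC comes from the archive's metadata, while recomputing it means streaming the decompressed bytes, just as any other hash would:

```python
import io
import zipfile
import zlib


def stored_vs_computed_crc(archive, name, blocksize=65536):
    """Return (CRC read from the zip metadata, CRC recomputed by streaming the entry)."""
    stored = archive.getinfo(name).CRC  # read from the directory, no decompression
    computed = 0
    with archive.open(name) as entry:   # decompresses on the fly, block by block
        for block in iter(lambda: entry.read(blocksize), b""):
            computed = zlib.crc32(block, computed)
    return stored, computed & 0xFFFFFFFF


# Build a small demo archive in memory (illustrative data only).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hello.txt", b"hello world\n" * 1000)

with zipfile.ZipFile(buf) as zf:
    stored, computed = stored_vs_computed_crc(zf, "hello.txt")
    print(format(stored, "08x"), format(computed, "08x"))
```

Both values should match; only the second one costs a pass over the data.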



UPD: Possible implementations



libarchive: extra dependencies, supports many archive formats



import hashlib

import libarchive.public as libarchive

# fname is the path to the archive
with libarchive.file_reader(fname) as archive:
    for entry in archive:
        md5 = hashlib.md5()
        # get_blocks() yields the decompressed content piece by piece
        for block in entry.get_blocks():
            md5.update(block)
        print(str(entry), md5.hexdigest())


Native zipfile: no dependencies, zip only



import hashlib
import zipfile

blocksize = 1024**2  # 1 MiB chunks

# fname is the path to the archive
with zipfile.ZipFile(fname) as archive:
    for name in archive.namelist():
        md5 = hashlib.md5()
        # archive.open() returns a stream that decompresses as it is read
        with archive.open(name) as entry:
            while True:
                block = entry.read(blocksize)
                if not block:
                    break
                md5.update(block)
        print(name, md5.hexdigest())
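
For the original question, the same loop can feed several hashers at once. A minimal sketch that streams each entry a single time and updates CRC32, MD5 and SHA1 from the same blocks; the function name `zip_entry_digests` and the returned dict layout are assumptions for illustration, not part of the answer above:

```python
import hashlib
import zipfile
import zlib


def zip_entry_digests(zip_path, blocksize=1024 ** 2):
    """Map each file entry to its CRC32, MD5 and SHA1, reading every entry once."""
    results = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():  # directory entries carry no content
                continue
            crc, md5, sha1 = 0, hashlib.md5(), hashlib.sha1()
            with archive.open(info) as entry:
                # Each block is decompressed in memory and discarded after use
                for block in iter(lambda: entry.read(blocksize), b""):
                    crc = zlib.crc32(block, crc)
                    md5.update(block)
                    sha1.update(block)
            results[info.filename] = {
                "crc32": format(crc & 0xFFFFFFFF, "08x"),
                "md5": md5.hexdigest(),
                "sha1": sha1.hexdigest(),
            }
    return results
```

Nothing is written to disk, and memory use is governed mainly by `blocksize` rather than by the size of the entries.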





edited 6 hours ago by Steve Barnes

answered May 22 '17 at 4:04 by Marat












• Thanks for answering. What would be the most memory-efficient way of doing that?
  – paradadf, May 23 '17 at 16:17

• @paradadf updated the answer
  – Marat, May 25 '17 at 18:43
















