Inspired by how rsync works, I sliced and diced a bunch of music files (plus some jpg and mpeg files) into 64-byte pieces and hashed each piece to see how much hash commonality they have.
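As a rough sketch of the kind of measurement described here: the code below reads each file in fixed 64-byte blocks, hashes every block, and counts how many hashes were already seen. The hash function (MD5 via `hashlib`), the non-overlapping block layout, and the function name `block_hash_stats` are all assumptions for illustration; the original tool and hash are not stated on this page.

```python
import hashlib

BLOCK_SIZE = 64  # bytes per hashed block, matching "every 64 bytes"


def block_hash_stats(paths, block_size=BLOCK_SIZE):
    """Return (total_hashes, unique_hashes, matches) across all given files.

    Assumption: blocks are non-overlapping and MD5 is the hash; a short
    trailing block at the end of a file is hashed as-is.
    """
    seen = set()   # digests observed so far, across every file
    total = 0      # number of blocks hashed
    for path in paths:
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                seen.add(hashlib.md5(block).digest())
                total += 1
    unique = len(seen)
    # A "match" is any block whose hash had already been seen.
    return total, unique, total - unique
```

It could be called with any iterable of file paths, for example `block_hash_stats(pathlib.Path("music").rglob("*.mp3"))`, where the `music` directory is hypothetical. Note that rsync itself slides a rolling checksum across every byte offset, so this fixed-offset count would only see duplicates that happen to be block-aligned.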

sample 1 stats (mp3 '2001'):

  • 1.875 GB of files
  • 29937374 hashes taken (every 64 bytes)
  • unique hashes: 29863214
  • matches: 74160 (.2%)

sample 2 stats (mp3 'classical'):

  • 107 MB of files
  • 1702615 hashes taken (every 64 bytes)
  • unique hashes: 1699769
  • matches: 2846 (.2%)

sample 3 stats (jpg & mpeg):

  • 148 MB of files
  • 2358603 hashes taken (every 64 bytes)
  • unique hashes: 2345314
  • matches: 13289 (.6%)

sample 4 stats (just jpg):

  • MB of files
  • 172160 hashes taken (every 64 bytes)
  • unique hashes: 171970
  • matches: 190 (.1%)
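The "matches" figures in these samples appear to be the total number of hashes minus the number of unique hashes, expressed as a percentage of the total. That relationship is inferred from the numbers rather than stated on the page. A small sketch, with the hypothetical helper name `match_stats`, using the sample 1 figures:

```python
def match_stats(total_hashes, unique_hashes):
    """Return (matches, percent_of_total) under the inferred definition:
    matches = total hashes - unique hashes."""
    matches = total_hashes - unique_hashes
    percent = 100.0 * matches / total_hashes
    return matches, percent


# Sample 1 (mp3 '2001'): 29937374 hashes taken, 29863214 unique.
matches, percent = match_stats(29937374, 29863214)
print(matches, f"{percent:.2f}%")
```

The same subtraction reproduces the match counts listed for the other three samples.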

-- MattWalsh - 07 Dec 2005

Topic revision: r1 - 07 Dec 2005 - 21:13:30 - MattWalsh
 