Inspired by how rsync works, I sliced and diced a bunch of music files (and some jpg and mpeg files) to see how many block hashes they have in common.
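A minimal sketch of this kind of experiment, not the original script. It assumes that "every 64 bytes" means fixed, non-overlapping 64-byte blocks and that duplicates are counted across all files in a sample. The hash function is not stated here, so MD5 is used purely as a stand-in, and the function and variable names are illustrative.

```python
import hashlib

BLOCK_SIZE = 64  # bytes per hashed block, matching "every 64 bytes"


def block_hash_stats(paths, block_size=BLOCK_SIZE):
    """Hash every fixed-size block of each file and count repeated hashes.

    Returns (hashes_taken, unique_hashes, matches).
    """
    seen = set()  # digests seen so far, across all files
    taken = 0     # total number of block hashes computed
    for path in paths:
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                taken += 1
                # MD5 is an assumption; the hash actually used is not stated.
                seen.add(hashlib.md5(block).digest())
    unique = len(seen)
    matches = taken - unique  # hashes that duplicated an earlier block
    return taken, unique, matches
```

It could be run over a directory with something like `block_hash_stats(pathlib.Path("music").rglob("*.mp3"))`, where `music` is a hypothetical folder. Note that a short final block in a file is hashed as-is. Unlike rsync, which slides a rolling checksum over every byte offset, this only compares blocks at fixed offsets.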
sample 1 stats (mp3 '2001'):
- 1.875 GB of files
- 29937374 hashes taken (every 64 bytes)
- unique hashes: 29863214
- matches: 74160 (.2%)
sample 2 stats (mp3 'classical'):
- 107 MB of files
- 1702615 hashes taken (every 64 bytes)
- unique hashes: 1699769
- matches: 2846 (.2%)
sample 3 stats (jpg & mpeg):
- 148 MB of files
- 2358603 hashes taken (every 64 bytes)
- unique hashes: 2345314
- matches: 13289 (.6%)
sample 4 stats (just jpg):
- MB of files
- 172160 hashes taken (every 64 bytes)
- unique hashes: 171970
- matches: 190 (.1%)
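In each sample the match count is the number of hashes taken minus the number of unique hashes, and the percentage is matches divided by hashes taken. A small check of that arithmetic using the figures reported above (the dictionary layout and the `match_rate` name are illustrative):

```python
# (hashes taken, unique hashes) as reported for each sample
samples = {
    "mp3 '2001'": (29937374, 29863214),
    "mp3 'classical'": (1702615, 1699769),
    "jpg & mpeg": (2358603, 2345314),
    "just jpg": (172160, 171970),
}


def match_rate(taken, unique):
    """Return (matches, matches as a percentage of hashes taken)."""
    matches = taken - unique
    return matches, 100.0 * matches / taken


for name, (taken, unique) in samples.items():
    matches, pct = match_rate(taken, unique)
    print(f"{name}: {matches} matches ({pct:.2f}%)")
```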
-- MattWalsh - 07 Dec 2005