[Voiceglue] Hashing filenames

emiliano esposito emiespo at tiscali.it
Wed Jan 14 05:34:12 EST 2009


Andrew Cumming wrote:
>
> I found that some long text (and some not so long) would create very 
> long filenames. I think this also caused some problems when the 
> filenames got exceedingly long
>

I was working on something similar, though I tried it with MD5. The problem 
is collisions: you would need a second hash to disambiguate when two 
different texts produce the same digest.

Since the quickest way to do that was to keep a counter and append it to 
the name, I dropped the MD5 part and simply used a single Perl hash (tied 
to a file on disk to keep memory usage down), with the text as key and a 
counter as value; the counter is also the filename. This way there are no 
duplicates (i.e. collisions), and the hashing logic is handled by Perl 
itself. The only limit is the size of the counter, but even with "just" 
32 bits you can render over 4 billion texts (2^32 = 4,294,967,296)... I 
think that's more than enough :-)
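
A minimal sketch of the counter idea described above, not the actual 
Voiceglue code. The helper name filename_for, the reserved counter key, 
and the "tts_" prefix and ".wav" suffix are all hypothetical names chosen 
for illustration. The helper takes a reference to any hash, so the same 
code works with a plain in-memory hash or with one tied to disk.

```perl
use strict;
use warnings;

# Reserved key that holds the last counter value handed out.
# Assumption: no text to be rendered is exactly this string.
my $COUNTER_KEY = "\0counter";

# Hypothetical helper: return a short, unique file name for $text.
# $cache is a reference to a (possibly tied) hash mapping text => counter.
sub filename_for {
    my ($cache, $text) = @_;
    unless (exists $cache->{$text}) {
        # First time this text is seen: hand out the next counter value.
        my $next = ($cache->{$COUNTER_KEY} || 0) + 1;
        $cache->{$COUNTER_KEY} = $next;
        $cache->{$text}        = $next;
    }
    # Ten digits are enough for any 32-bit counter value.
    return sprintf('tts_%010d.wav', $cache->{$text});
}

# In-memory demonstration. To keep the mapping on disk instead, the hash
# could be tied before use, for example (assuming DB_File is installed):
#   use Fcntl; use DB_File;
#   tie my %cache, 'DB_File', 'tts_cache.db', O_RDWR|O_CREAT, 0644, $DB_HASH
#       or die "Cannot tie tts_cache.db: $!";
my %cache;
print filename_for(\%cache, "Welcome to the voice menu"), "\n";  # tts_0000000001.wav
print filename_for(\%cache, "Press one for sales"), "\n";        # tts_0000000002.wav
print filename_for(\%cache, "Welcome to the voice menu"), "\n";  # tts_0000000001.wav
```

File names stay a fixed length no matter how long the text is, and the 
same text always maps back to the same file, so an already rendered 
prompt can be reused.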

I hope I was clear...

