[Voiceglue] Voiceglue Digest, Vol 18, Issue 2
PodGlo Info
info at podglo.com
Thu Jan 15 19:09:07 EST 2009
Why not use md5 or sha1? md5 has been compromised for a while now, so it is no longer used for security purposes, at least not by anyone who knows better. Doesn't voiceglue use these audio files effectively as cached files so it doesn't have to recreate them every time? If so, hashing would only have a noticeable cost if you created new content each time, and even then the tts would have a much larger impact on performance. md5 is used in web apps all the time and scales nicely. So my suggestion is to just use md5 to avoid collisions and be done with it.
doug
-----Original Message-----
From: voiceglue-request at voiceglue.org
Sent: Thursday, January 15, 2009 12:00 PM
To: voiceglue at voiceglue.org
Subject: Voiceglue Digest, Vol 18, Issue 2
Send Voiceglue mailing list submissions to
voiceglue at voiceglue.org
To subscribe or unsubscribe via the World Wide Web, visit
http://www.voiceglue.org/mailman/listinfo/voiceglue
or, via email, send a message with subject or body 'help' to
voiceglue-request at voiceglue.org
You can reach the person managing the list at
voiceglue-owner at voiceglue.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Voiceglue digest..."
Today's Topics:
1. Re: Hashing filenames (Doug Campbell)
2. Re: Hashing filenames (Archie Cobbs)
3. Re: Hashing filenames (Doug Campbell)
4. Using text2wave from Festival as tts (Carlos Alarcón)
5. Re: Using text2wave from Festival as tts (emiliano)
6. Re: Hashing filenames (emiliano esposito)
7. Re: Using text2wave from Festival as tts (Carlos Alarcón)
----------------------------------------------------------------------
Message: 1
Date: Wed, 14 Jan 2009 16:28:30 -0500
From: Doug Campbell <voiceglue at campbellcastle.com>
Subject: Re: [Voiceglue] Hashing filenames
To: General discussion about voiceglue <voiceglue at voiceglue.org>
Message-ID: <496E58FE.1070803 at campbellcastle.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Andrew Cumming wrote:
>> I found that some long text (and some not so long) would create very
>> long filenames. I think this also caused some problems when the
>> filenames got exceedingly long
This is fixed in the 0.9 release, which will be coming out
"any day now". I used a simpler hash than Digest::SHA1, but
your suggestion of Digest::SHA1 will make me go back and
consider using that instead.
emiliano esposito wrote:
> I was working on a similar thing, though I tried with MD5. The problem
> comes into existence with collisions. You should have two hashes to
> disambiguate when two texts produce the same SHA-1 digest.
Voiceglue almost certainly doesn't need the cryptographically sound
MD5 solution with its associated CPU cost.
The issue of collisions is a valid one, though. My guess is that
collisions will probably never happen, but I'm still pondering a
multi-entry-bucket structure just in case they do.
Doug Campbell
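
A minimal sketch of the multi-entry-bucket idea discussed above. It is not Voiceglue's actual code. It assumes each cached audio file has a sidecar ".txt" file holding the original prompt text; the function name bucket_path, the "<hash>-N" naming, and the sidecar layout are all hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Return the base path (without extension) of the cache slot for $text
# inside $dir. Slot N of a bucket is named "<hash>-N". A slot belongs to
# $text only if its sidecar "<hash>-N.txt" holds exactly the same text.
# Hypothetical layout, assumed for illustration only.
sub bucket_path {
    my ($dir, $text) = @_;
    my $bytes = encode_utf8($text);   # md5_hex needs bytes, not wide characters
    my $hash  = md5_hex($bytes);
    for (my $n = 0; ; $n++) {
        my $base    = "$dir/$hash-$n";
        my $sidecar = "$base.txt";
        if (!-e $sidecar) {
            # Free slot: claim it by recording the text.
            open(my $out, ">:raw", $sidecar) or die "Cannot write $sidecar: $!";
            print $out $bytes;
            close($out);
            return $base;
        }
        # Occupied slot: reuse it only if the stored text matches.
        open(my $in, "<:raw", $sidecar) or die "Cannot read $sidecar: $!";
        local $/;                     # slurp the whole sidecar file
        my $stored = <$in>;
        close($in);
        return $base if defined $stored && $stored eq $bytes;
    }
}
```

Two different texts with the same hash end up in slots "-0" and "-1" of the same bucket, so neither overwrites the other. The check-then-create step is not atomic, so this sketch would need locking if several generators ran at once.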
------------------------------
Message: 2
Date: Wed, 14 Jan 2009 17:03:06 -0600
From: Archie Cobbs <archie.cobbs at gmail.com>
Subject: Re: [Voiceglue] Hashing filenames
To: General discussion about voiceglue <voiceglue at voiceglue.org>
Message-ID:
<3bc8237c0901141503n5bcd0887ua95f666dfd54e526 at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
On Wed, Jan 14, 2009 at 3:28 PM, Doug Campbell <voiceglue at campbellcastle.com> wrote:
> Voiceglue almost certainly doesn't need the cryptographically sound
> MD5 solution with its associated CPU cost.
>
> The issue of collisions is a valid one, though. My guess is that
> collisions will probably never happen, but I'm still pondering a
> multi-entry-bucket structure just in case they do.
>
With MD5 or SHA1 you will never see accidental collisions.
Collisions do exist but have to be specially engineered, e.g. about 2^69
hash computations for SHA1.
http://www.cryptography.com/cnews/hash.html
-Archie
--
Archie L. Cobbs
------------------------------
Message: 3
Date: Wed, 14 Jan 2009 21:04:50 -0500
From: Doug Campbell <voiceglue at campbellcastle.com>
Subject: Re: [Voiceglue] Hashing filenames
To: General discussion about voiceglue <voiceglue at voiceglue.org>
Message-ID: <496E99C2.5050601 at campbellcastle.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > The issue of collisions is a valid one, though. My guess is that
> > collisions will probably never happen, but I'm still pondering a
> > multi-entry-bucket structure just in case they do.
>
> With MD5 or SHA1 you will never see accidental collisions.
With reasonable hash length that is true, but the main point
of this exercise is to reduce file name lengths. I'm thinking
about, say, a 32-character hash, which would have a very small
possibility of collision.
Doug Campbell
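
A rough sketch of the 32-character idea, assuming the 32 characters are the hexadecimal form of a 128-bit MD5 digest. The names prompt_filename and collision_estimate and the ".wav" extension are made up for illustration; the estimate is the usual birthday-bound approximation, not a measurement.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Map prompt text of any length to a fixed 32-character hex name.
sub prompt_filename {
    my ($text) = @_;
    return md5_hex(encode_utf8($text)) . ".wav";   # ".wav" is an assumed extension
}

# Birthday-bound estimate of the chance that any two of $n distinct
# prompts share a hash of $bits bits: roughly n^2 / 2^(bits+1).
sub collision_estimate {
    my ($n, $bits) = @_;
    return ($n ** 2) / (2 ** ($bits + 1));
}

# A very long prompt still yields a short, fixed-length name.
printf "%s\n", prompt_filename("Welcome to the demo. " x 50);
# Estimated collision chance for one million cached prompts.
printf "%.2e\n", collision_estimate(1_000_000, 128);
```

For one million distinct prompts and 128 bits the estimate is on the order of 1e-27, which supports the "very small possibility" intuition. Truncating the hash to fewer bits raises the estimate quickly, since every bit removed doubles it.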
------------------------------
Message: 4
Date: Thu, 15 Jan 2009 10:34:56 +0100
From: Carlos Alarcón <carlos.alarcon at tyven.com>
Subject: [Voiceglue] Using text2wave from Festival as tts
To: voiceglue at voiceglue.org
Message-ID: <496F0340.8010601 at tyven.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi,
I joined the voiceglue users recently and needed it to work with
text2wave from Festival TTS, so I made some modifications to
/usr/bin/voiceglue_tts_gen. This is my final version:
#!/usr/bin/perl -- -*-CPerl-*-
use strict;
use warnings;

# Timestamp for the log entries (day-month-year - hour:min:sec)
my ($sec, $min, $hour, $mday, $mon, $year) = localtime(time);
my $stamp = join("-", $mday, $mon + 1, $year + 1900) . " - "
          . join(":", $hour, $min, $sec);

# The last argument is the audio file to generate
my $file = $ARGV[$#ARGV];

# Log the arguments received and the command about to be run
if (open(my $logger, ">>", "/var/log/voiceglue/tts.log")) {
    print $logger "${stamp}-------> @ARGV\n";
    print $logger "${stamp}-------> /usr/bin/text2wave -F 8000 -o $file /tmp/tts.txt\n";
    close($logger);
}

# Write the text to be spoken (second argument) to a temporary file
open(my $salida, ">", "/tmp/tts.txt") or die "Cannot write /tmp/tts.txt: $!";
print $salida $ARGV[1];
close($salida);

# Pass each option as its own list element: system() in list form does not
# split strings on spaces. For mu-law output, add "-otype", "ulaw".
system("/usr/bin/text2wave", "-F", "8000", "-o", $file, "/tmp/tts.txt");
My knowledge of Perl is almost nil, so I guess there are many different
and better ways of doing this. I don't know whether there was already a
way to do it; I haven't found one so far.
I just wanted to make it available in case it is useful to other
people.
Regards.
------------------------------
Message: 5
Date: Thu, 15 Jan 2009 11:02:10 +0100
From: emiliano <emiespo at tiscali.it>
Subject: Re: [Voiceglue] Using text2wave from Festival as tts
To: General discussion about voiceglue <voiceglue at voiceglue.org>
Message-ID: <496F09A2.1010604 at tiscali.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Carlos Alarcón wrote:
> open(my $salida, ">", "/tmp/tts.txt") or die "Cannot write /tmp/tts.txt: $!";
> print $salida $ARGV[1];
> close($salida);
> system("/usr/bin/text2wave", "-F", "8000", "-o", $file, "/tmp/tts.txt");
>
> My knowledge of Perl is almost nil, so I guess there are many different
> and better ways of doing this. I don't know whether there was already a
> way to do it; I haven't found one so far.
> I just wanted to make it available in case it is useful to other
> people.
I had a similar problem (I also had to deal with non-standard ASCII
characters), but using just one file (i.e. /tmp/tts.txt) for the text
stopped working as soon as multiple prompts got rendered. This happened
because voiceglue calls the tts_gen script asynchronously.
Prompts were getting overwritten, so I had to use something like
$file.".txt" for the text file.
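
The fix described above might look like the following sketch. It assumes the audio file path is unique per prompt, so appending ".txt" to it gives a text file that concurrent invocations do not share. The helper names write_prompt_text, text2wave_command, and render_prompt are hypothetical; the text2wave options are the ones used in the script earlier in this thread.

```perl
use strict;
use warnings;

# Write the prompt text next to its audio file so that concurrent
# invocations never share a text file. Returns the text file's path.
sub write_prompt_text {
    my ($audio_file, $text) = @_;
    my $text_file = $audio_file . ".txt";
    open(my $out, ">", $text_file) or die "Cannot write $text_file: $!";
    print $out $text;
    close($out) or die "Cannot close $text_file: $!";
    return $text_file;
}

# Build the text2wave argument list, one list element per argument,
# because system() in list form does not split strings on spaces.
sub text2wave_command {
    my ($audio_file, $text_file) = @_;
    return ("/usr/bin/text2wave", "-F", "8000", "-o", $audio_file, $text_file);
}

# Render one prompt and remove the temporary text file afterwards.
# Returns true if text2wave exited successfully.
sub render_prompt {
    my ($audio_file, $text) = @_;
    my $text_file = write_prompt_text($audio_file, $text);
    my $status    = system(text2wave_command($audio_file, $text_file));
    unlink($text_file);
    return $status == 0;
}
```

In a script like the one posted, the call would be something like render_prompt($file, $ARGV[1]). A temporary file created with File::Temp would work as well; deriving the name from $file simply follows the suggestion in the message.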
------------------------------
Message: 6
Date: Thu, 15 Jan 2009 11:05:09 +0100
From: emiliano esposito <emiespo at tiscali.it>
Subject: Re: [Voiceglue] Hashing filenames
To: General discussion about voiceglue <voiceglue at voiceglue.org>
Message-ID: <496F0A55.90506 at tiscali.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Doug Campbell wrote:
>>> The issue of collisions is a valid one, though. My guess is that
>>> collisions will probably never happen, but I'm still pondering a
>>> multi-entry-bucket structure just in case they do.
>>>
>> With MD5 or SHA1 you will never see accidental collisions.
>>
>
> With reasonable hash length that is true, but the main point
> of this exercise is to reduce file name lengths. I'm thinking
> about, say, a 32-character hash, which would have a very small
> possibility of collision.
>
Thanks for your observations.
Where in the code does VG check if the file has already been rendered
(thus skipping a new tts command)? I can't figure it out.
------------------------------
Message: 7
Date: Thu, 15 Jan 2009 11:29:26 +0100
From: Carlos Alarcón <carlos.alarcon at tyven.com>
Subject: Re: [Voiceglue] Using text2wave from Festival as tts
To: General discussion about voiceglue <voiceglue at voiceglue.org>
Message-ID: <496F1006.5040903 at tyven.com>
Content-Type: text/plain; charset="iso-8859-1"
Thanks for the suggestion. I haven't had that problem yet, since my system
is only a test case to check whether an existing vxml application works
with voiceglue, so I haven't done much testing so far and haven't
stressed the application at all (just some single calls to check the
application flow).
I will do something similar to what you proposed.
Thanks.
emiliano wrote:
> Carlos Alarcón wrote:
>
>> open(my $salida, ">", "/tmp/tts.txt") or die "Cannot write /tmp/tts.txt: $!";
>> print $salida $ARGV[1];
>> close($salida);
>> system("/usr/bin/text2wave", "-F", "8000", "-o", $file, "/tmp/tts.txt");
>>
>> My knowledge of Perl is almost nil, so I guess there are many different
>> and better ways of doing this. I don't know whether there was already a
>> way to do it; I haven't found one so far.
>> I just wanted to make it available in case it is useful to other
>> people.
>>
>
>
> I had a similar problem (I also had to deal with non-standard ASCII
> characters), but using just one file (i.e. /tmp/tts.txt) for the text
> stopped working as soon as multiple prompts got rendered. This happened
> because voiceglue calls the tts_gen script asynchronously.
>
> Prompts were getting overwritten, so I had to use something like
> $file.".txt" for the text file.
------------------------------
End of Voiceglue Digest, Vol 18, Issue 2
****************************************