[FRIAM] Whisper, a speech-to-text program based on GPT-3

Barry MacKichan barry.mackichan at mackichan.com
Fri Feb 10 14:10:59 EST 2023


I downloaded Whisper and tried it out. I fed it a 20-minute screencast I 
did about 10 years ago. You can choose among about four levels that
trade accuracy for speed. I tried the recommended level, one down from
the highest. After a substantial wait, I got the results.

First, the input did not contain any hints about punctuation and
sentences, of the kind you have to supply with most other speech-to-text
programs. It decided where to end sentences and place commas, and put
them where I thought they should be.

I was quite surprised that it correctly understood “LaTeX”. I was also
surprised when it understood “Skim” (a PDF reader on the Mac) as
“SCIM”. This is evidence of nerd hands in the selection of the
training set. But it did not correctly get “pdfTeX”; it came out as
“PDF Tech”. Unsurprisingly, it did not capitalize the names of our
products that I mentioned in the screencast, such as “Scientific
WorkPlace”.

It accepts, as an input option, a string of text that the program
consumes before it attempts to transcribe the input file. For the next
run, I used the option

	--initial_prompt "pdfTeX, LaTeX, skim, Scientific WorkPlace, Scientific Word"

and the result was flawless but for one error: it still failed to 
recognize “pdfTeX”.
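
For anyone who wants to repeat this, here is a minimal sketch in Python
of how such a command line could be assembled. The file name
"screencast.mp4" and the helper name whisper_command are hypothetical,
and "medium" is only my assumption about which model corresponds to
"one down from the highest". The sketch only builds and prints the
command; it does not run Whisper.

```python
import shlex


def whisper_command(audio_path, prompt_terms, model="medium"):
    """Build the argument list for a Whisper CLI run with an initial prompt.

    The model name "medium" is an assumption, not taken from the original run.
    """
    # The prompt is a single string; the terms are joined with ", ".
    prompt = ", ".join(prompt_terms)
    return ["whisper", audio_path, "--model", model, "--initial_prompt", prompt]


# Hypothetical input file; the terms are the ones used in the run above.
cmd = whisper_command(
    "screencast.mp4",
    ["pdfTeX", "LaTeX", "skim", "Scientific WorkPlace", "Scientific Word"],
)

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would start the
transcription, provided the whisper command is installed.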

It appears that it does not try to break the text into paragraphs, but I 
may not have given it enough text to test that.

They claim to support 99 languages and to translate speech in any of
these languages into English.

I ran a shorter test after disconnecting from all networks, and it
succeeded. This suggests that the model data is stored on my computer.
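
One way to check this directly is to look for the downloaded model
files. The following is a minimal sketch, assuming the models are
cached as .pt files under ~/.cache/whisper (or under $XDG_CACHE_HOME
if that is set); I have not confirmed that location on this
installation, and the helper name cached_models is hypothetical.

```python
import os
from pathlib import Path


def cached_models(cache_dir=None):
    """Return (file name, size in MB) for each .pt file in the cache directory.

    The default location is an assumption: <cache home>/whisper.
    """
    if cache_dir is None:
        cache_home = os.getenv("XDG_CACHE_HOME", str(Path.home() / ".cache"))
        cache_dir = Path(cache_home) / "whisper"
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        # Nothing has been downloaded to this location.
        return []
    return sorted(
        (p.name, round(p.stat().st_size / 1e6, 1)) for p in cache_dir.glob("*.pt")
    )


for name, size_mb in cached_models():
    print(f"{name}: {size_mb} MB")
```

If the list is empty, the models may simply be stored somewhere else.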

—Barry