Kicking the tires on Adobe CS4 speech transcription

Adobe’s CS4 Production Premium suite of applications was packed full of big new features. There’s mocha for After Effects, native RED .R3D editing in Premiere Pro and After Effects, XML import of Final Cut Pro projects, a unified interface across the applications, better dynamic linking and a lot more. There is also the ability to automatically transcribe video and audio clips. For editors doing documentary work or a lot of talking-head interviews, this could be a killer feature. Automated transcription almost seems too good to be true … and you know what they say about something that is too good to be true…

The transcribe feature is available in both Premiere Pro CS4 and Soundbooth CS4, and in both applications the transcription function is part of the metadata browser. The CS4 applications are metadata kings, with more tabs and titles than anyone will probably ever need. Since the transcribed text is metadata, it is saved with the file itself: transcribe a clip in Soundbooth and that transcription will show up in Premiere Pro as well. The metadata pane includes a Speech Transcript window.

This is where the action starts. The little yellow triangle alerts you if your source media has changed and you need to transcribe again to update the text. When you are ready, hit the Transcribe… button. The Speech Transcription Options dialog lets you select the language and dialect, the quality of the transcription and whether there is more than one speaker. Hit OK and off it goes to transcribe.

The clip I tested with features an English speaker with a Canadian accent. The clip runs 1:08, and Soundbooth took 2:00 to transcribe it at High quality. If you’ve ever used other automated transcription software, you know its accuracy can be way off, so I figured I would start with the best quality it can produce. How did it do? See for yourself:


Adobe CS4 speech transcription test clip from Scott Simmons on Vimeo.

This is the transcription from Soundbooth:

let’s see everybody’s dollar missing channels like it was Brandon Man you know what its hands on me having Harwood who I think is the ideology is at you know what to expect and seventy but you know what is a ready response so concerns in nicely it was so much understeer than you can imagine stars in the mid one year actually the thing about it there’s a little bit in technology have ended here to keep it from me in a moron you knew it definitely sounds like a Trans-Am car this is you know guards not like the M for a new moon but it was definitely true in August I shouldn’t of late sixties trends and cars I want to get too hard on that call him now will roll in on the straightaway

Premiere Pro CS4 does transcription as well, handing the job off to Adobe Media Encoder, a separate application. This is nice because you can keep working in PPro while it transcribes, and you can set up a batch. This is the transcription from Premiere Pro, this time on the Medium quality setting. It was definitely faster, taking only about 30 seconds to transcribe the same piece of video:

see everybody’s dollar missing channels like it was Brandon Man you know what it’s had some nice having car with RU I thank the ideology is at and what that is expected and seventy but you know what is a ready response so concerns in nicely it was so much understeer there and then the engine starts to come in one year I actually think Jonah the technology happening here to keep it from me in a moron RU you it definitely sounds like a Trans-Am car this is you know guards not like the M3 new moon but it was definitely true in August I shouldn’t of late sixties trends and cars a lot to get too hard on that call him now will roll in on the straightaway

So that’s almost comical. I admit this might not be the most pristine footage to transcribe; according to the Adobe user manual, “Accurate speech transcripts require good audio quality. Background noise significantly reduces accuracy. To remove such noise, use the tools and processes in Soundbooth.” This is commentary from a moving car, so there is some car noise, but overall it’s not far off from what you might record on a documentary-style shoot or an interview where you didn’t have total control of the surroundings. And let’s be honest, I wouldn’t expect any software to transcribe a phrase like “bags of ya-ya juice” … but it did get “Trans-Am car,” so go figure. I also tried transcription on a controlled talking-head interview and it did better, by my estimate roughly 30 to 40 percent more accurate.

Thankfully, you can go into the Speech Transcript pane and correct the words the transcription missed. Right-clicking in the Speech Transcript window brings up a number of options for inserting and merging words.

Correcting hours and hours of footage could take a long time, but it might be a good task for an intern to take on! I think the best thing to say about this feature is that it now exists and can be built upon and improved in future versions. Obviously, the clearer the speech, the better the speaker’s enunciation and the less background noise, the better the results. Sometimes you just can’t control those things, so the post team will have to decide whether the feature is worth the time, money and man-hours.

My first thought when I heard about speech transcription was how great it would be to take a bunch of interviews into Soundbooth and produce a transcription I could print out for an edit in Avid or Final Cut Pro. I was hoping to transcribe video or audio and then take that transcription to a text editor with timecode intact. While you can copy and paste the transcription into a text editor, no timecode comes along with it. That’s bad if you want to use these apps purely as a transcription system. But if you are staying in Premiere Pro for your edit, you now have an amazing new way to navigate clips. Click a word in the speech transcript and the playhead immediately jumps to that word, where you can play, mark IN and OUT points or do whatever you normally do. There is also a search field at the top of the metadata window that lets you search for a single word.
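For the curious, here is a minimal sketch of the kind of timecoded printout I was hoping for. It assumes you already have each word paired with its start time in seconds, which copy/paste from CS4 does not give you. The word list, the 30 fps non-drop-frame rate and the function names are all hypothetical, chosen for illustration; none of this is something Premiere Pro or Soundbooth provides.

```python
from typing import List, Tuple

# Hypothetical input: (word, start time in seconds) pairs.
Word = Tuple[str, float]


def to_timecode(seconds: float, fps: int = 30) -> str:
    """Convert seconds to non-drop-frame HH:MM:SS:FF at an assumed frame rate."""
    total_frames = int(round(seconds * fps))
    frames = total_frames % fps
    total_secs = total_frames // fps
    hours, rem = divmod(total_secs, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


def timecoded_lines(words: List[Word], words_per_line: int = 8, fps: int = 30) -> List[str]:
    """Group words into printable lines, each stamped with its first word's timecode."""
    lines = []
    for i in range(0, len(words), words_per_line):
        chunk = words[i:i + words_per_line]
        start = chunk[0][1]
        text = " ".join(word for word, _ in chunk)
        lines.append(f"{to_timecode(start, fps)}  {text}")
    return lines


if __name__ == "__main__":
    # Made-up timings for illustration only, not measured from the test clip.
    sample = [("it", 61.2), ("definitely", 61.4), ("sounds", 61.9), ("like", 62.2),
              ("a", 62.3), ("Trans-Am", 62.4), ("car", 62.9)]
    for line in timecoded_lines(sample, words_per_line=4):
        print(line)
```

Stamping each line with its first word’s timecode keeps the printout readable for a paper edit while still getting you within a few seconds of any word.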

Searching could be vastly improved if you could search for a whole phrase instead of just a single word. I drool at the thought of how handy this could be with a very long clip that has been transcribed, corrected and printed, with a paper edit built from the printout. Soundbooth actually has an Export > Speech Transcription option that exports an XML file. The terribly bad Adobe CS4 help files say this is for exporting words as cue points, but I wonder whether some smart XML expert could do something more with the file.
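As a rough idea of what phrase search could look like, here is a sketch that scans a list of (word, start time) pairs for a multi-word phrase and returns the start time of each match, which is what you would need to jump a playhead there. It assumes the word timings have already been pulled out of the transcript somehow; extracting them from the Soundbooth XML export would depend on that file’s actual structure, which isn’t covered here. The names and sample timings are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical input: (word, start time in seconds) pairs.
Word = Tuple[str, float]


def normalize(token: str) -> str:
    """Lowercase and drop punctuation other than apostrophes and hyphens."""
    return re.sub(r"[^\w'-]", "", token.lower())


def find_phrase(words: List[Word], phrase: str) -> List[float]:
    """Return the start time of every occurrence of phrase in the word list."""
    target = [normalize(t) for t in phrase.split()]
    tokens = [normalize(word) for word, _ in words]
    n = len(target)
    hits: List[float] = []
    if n == 0:
        return hits
    # Slide a window of n words across the transcript and compare.
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            hits.append(words[i][1])
    return hits


if __name__ == "__main__":
    # Made-up timings for illustration only.
    sample = [("it", 61.2), ("definitely", 61.4), ("sounds", 61.9), ("like", 62.2),
              ("a", 62.3), ("Trans-Am,", 62.4), ("car", 62.9)]
    print(find_phrase(sample, "Trans-Am car"))
```

Normalizing case and punctuation means a search for “trans-am car” still matches “Trans-Am, car” in the transcript; a fuzzier match would be needed to cope with misrecognized words like the ones above.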

Overall, I would say speech transcription is off to a great start in Adobe CS4 and is one more compelling reason to add Adobe Creative Suite 4 Production Premium to your editing toolkit. With good-quality audio and a few version upgrades, this might be one feature editors will wonder how they ever lived without.