
    Speech to Text in Isadora!

    Showcase
    osc python
      liminal_andy

      A user over on the Facebook Group posted a video of actors sitting on a couch in front of a screen, with a projection of their words seemingly falling from the sky as they were spoken. As many of the comments pointed out, an easy way to achieve this effect in Isadora is to prebake the asset, or even to draw each word individually with Text Draw, because in a scripted performance the language is (theoretically) always the same. However, it got my wheels turning: is there a way to do this live, with improvised language? It turns out the answer is yes!

      There is a well-known speech recognition library for Python, and I've become comfortable programming OSC into Python scripts over the years, so I figured there would be a way to combine the two. It turns out the folks over at Programming for People beat me to it: a few years ago they put out a YouTube tutorial walking through the code, and they sell the source code along with a Max interface for a few dollars on their website. I purchased the source code and began to tinker, as it needed some updates to be compatible with current Python versions. I received a friendly email from the developer, and after a brief exchange we agreed that I could redistribute my fork of their work free of charge, though I would encourage those of you who end up using it to send a few dollars their way in goodwill.

      As a result, here is OSCTranscribe! For those of you who want to tinker with the source code, you can follow that link to the git repository where I have the program hosted. Otherwise, you can grab the release for Windows 10 here.

      When you run the program, it prints a list of all the audio devices it can bind to, and you select the one you want to use. Then you choose the OSC network settings (the defaults are compatible with Isadora). Finally, use Isadora to send the following OSC commands from whatever triggers you like:

      OSC API for Controlling OSCTranscribe:

      /OSCTranscribe/calibrate {int thresh}: Setup, establishes the amount of quiet space before processing

      /OSCTranscribe/startListening: Setup, begins listening to the mic and sending to the learning system

      /OSCTranscribe/stopListening: Stops listening to the mic and stops sending to the learning system
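      Isadora sends these messages for you, but when testing OSCTranscribe without a patch it can help to see what actually goes over the wire. Below is a minimal, stdlib-only Python sketch that encodes OSC 1.0 messages (int32 and string arguments only) and sends the control commands above over UDP. It assumes OSCTranscribe is running on the same machine (127.0.0.1) on its default port 7070; the calibrate threshold is a placeholder, since the post does not say what units or range `thresh` uses.

```python
import socket
import struct


def _osc_string(s: str) -> bytes:
    """Encode an OSC string: ASCII bytes plus a NUL terminator, padded to a multiple of 4 bytes."""
    data = s.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str, *args) -> bytes:
    """Build an OSC 1.0 message from an address and int32/string arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)  # int32, big-endian
        elif isinstance(arg, str):
            tags += "s"
            payload += _osc_string(arg)
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return _osc_string(address) + _osc_string(tags) + payload


def send(sock: socket.socket, address: str, *args, host: str = "127.0.0.1", port: int = 7070) -> None:
    """Send one OSC message as a single UDP datagram (7070 is OSCTranscribe's default port)."""
    sock.sendto(osc_message(address, *args), (host, port))


# Placeholder value: the units/range of `thresh` are not documented in the post.
CALIBRATE_THRESH = 1

if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        send(sock, "/OSCTranscribe/calibrate", CALIBRATE_THRESH)
        send(sock, "/OSCTranscribe/startListening")
```

      Swap `startListening` for `stopListening` to end a session. A library such as python-osc would do the same job; hand-rolling the encoder just keeps the sketch dependency-free and shows the 4-byte padding OSC requires.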

      By default, you send these commands to port 7070. Then, you should listen for the following OSC message:

      /OSCTranscribe/data {string text}: The words recognized from the audio stream in between pauses
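      On the receiving side, Isadora's OSC Multi Listener does the decoding for you. If you want to check what OSCTranscribe is emitting before wiring up a patch, here is a small stdlib-only Python sketch that decodes single OSC messages (int32, float32 and string arguments) and prints the text of each `/OSCTranscribe/data` message. Assumptions: the listening port is whatever you configured OSCTranscribe to send to, OSC bundles are skipped rather than unpacked, and strings are decoded leniently as UTF-8 (the OSC 1.0 spec itself only promises ASCII).

```python
import socket
import struct


def _read_string(packet: bytes, offset: int):
    """Read a NUL-terminated, 4-byte-padded OSC string; return (text, next_offset)."""
    end = packet.index(b"\x00", offset)
    text = packet[offset:end].decode("utf-8", errors="replace")
    next_offset = end + 1
    next_offset += -next_offset % 4  # skip padding up to the next 4-byte boundary
    return text, next_offset


def parse_osc_message(packet: bytes):
    """Decode one OSC message into (address, [args])."""
    address, offset = _read_string(packet, 0)
    tags, offset = _read_string(packet, offset)
    if not tags.startswith(","):
        raise ValueError("missing OSC type tag string")
    args = []
    for tag in tags[1:]:
        if tag == "i":
            args.append(struct.unpack_from(">i", packet, offset)[0])
            offset += 4
        elif tag == "f":
            args.append(struct.unpack_from(">f", packet, offset)[0])
            offset += 4
        elif tag == "s":
            text, offset = _read_string(packet, offset)
            args.append(text)
        else:
            raise ValueError(f"unsupported OSC type tag: {tag!r}")
    return address, args


def listen(port: int) -> None:
    """Print every phrase OSCTranscribe sends to this UDP port (runs until interrupted)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        while True:
            packet, _sender = sock.recvfrom(65535)
            if packet.startswith(b"#bundle"):
                continue  # bundles are not handled in this sketch
            address, args = parse_osc_message(packet)
            if address == "/OSCTranscribe/data" and args:
                print(args[0])
```

      Call `listen(1234)` (1234 is just an example; use the port from your OSCTranscribe settings). Only one program can bind a given UDP port at a time, so run this instead of Isadora on that port, not alongside it.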

      Now Isadora's OSC Multi Listener actor, set to Text mode, will deliver the spoken words as text that you can pass to Text Draw, Text Formatter, JavaScript actors, and so on.

      Here's a quick demo:

      The recognition from the speech API is pretty good! Visually, a lot more tinkering could be done within Isadora to make this look more pleasing, but I'll leave that to all of you. Let me know what you think 🙂

      Andy Carluccio
      Zoom Video Communications, Inc.
      www.liminalet.com

      [R9 3900X, RTX 2080, 64GB DDR4 3600, Win 10, Izzy 3.0.8]
      [...also a bunch of hackintoshes...]
