Turning text into voice: Freedom for your content!

github.com/W01fw00d/text-to-voice

  • Post header illustration by @gelabert.art 🙇🏻‍♂️

Some context

I used to manage and write in a collaborative narrative writing group ✒ (mostly in Spanish): I even gave a 5-minute talk about it! At some point, we started recording audio 🎙 whenever we met to read a finished book together. Later, I decided to do some editing and turn those recordings into a podcast.

That was cool enough 😎 but… what about the first books we wrote, before we had a good mic to record them? Did I really want to use my eyes to enjoy them again, after an 8-hour session of computer work 🖥?

What I needed

  • A task that could read through every chapter of a story and generate an audio file with a voice narrating it 🔈
  • Dynamically add the chapter number at the beginning 🔢
  • Dynamically change the narrator voice when reading the chapter title and dialogues (I wanted at least two voices for dialogues, to indicate two different speakers 👩🏻‍🤝‍👩🏼)
  • Dynamically add the opening and ending songs 🎼 assigned to that book to every one of its chapters
  • Support both Spanish and English 🤝🏻
  • Output in .mp3 so I just have to upload it to my podcast platform of choice: ivoox (free 💸)
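
To make the list above more concrete, here is a minimal sketch of how one chapter could be described as an ordered plan of steps before any audio is generated. Everything in it is an assumption made for illustration: the function name buildChapterPlan, the shape of the book and chapter objects, and the file names do not come from the repository.

```javascript
// Hypothetical sketch: none of these names come from the text-to-voice repository.
// Word used to announce the chapter number in each supported language.
const CHAPTER_WORD = { es: 'Capítulo', en: 'Chapter' };

/**
 * Builds the ordered list of steps needed to produce one chapter's audio.
 * Each step is either a pre-recorded song file or a piece of text to synthesize.
 */
function buildChapterPlan(book, chapter) {
  const word = CHAPTER_WORD[book.lang];
  if (!word) {
    throw new Error(`Unsupported language: ${book.lang}`);
  }
  return [
    { type: 'song', file: book.openingSong },
    { type: 'speech', voice: 'title', text: `${word} ${chapter.number}` },
    { type: 'speech', voice: 'title', text: chapter.title },
    // Paragraphs default to the narrator voice in this simplified version.
    ...chapter.paragraphs.map((text) => ({ type: 'speech', voice: 'narrator', text })),
    { type: 'song', file: book.endingSong },
  ];
}

// Example input (invented for this sketch).
const plan = buildChapterPlan(
  { lang: 'en', openingSong: 'opening.mp3', endingSong: 'ending.mp3' },
  { number: 3, title: 'The Storm', paragraphs: ['It was raining.', 'Nobody came.'] }
);
console.log(plan.map((step) => step.type).join(' > '));
```

Keeping the plan as plain data means the text-to-speech calls and the audio concatenation can be swapped out later without touching the chapter logic.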

How it went

  • Issue 🈲: The library node-gtts is cool and useful, but it didn’t let me change voices using markdown or anything fancy like that 🌟, and it didn’t offer more than 2 different Spanish voices
  • Solution 👌🏽: I wrote my own loop logic that calls the library once per paragraph, indicating which voice to use for each one. To get more voices, I had to use Portuguese and Italian, which are phonetically quite similar to Spanish 👩‍👩‍👧‍👦
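
The per-paragraph loop described above could look roughly like the sketch below. It is not the code from the repository: the assignVoices function, the VOICES table and the language codes in it are assumptions (check which codes your text-to-speech library actually accepts), and alternating voices on consecutive dialogue lines is a simplification I chose for the example.

```javascript
// Hypothetical configuration: language codes stand in for distinct voices.
const VOICES = {
  narrator: 'es-es',
  title: 'es-us',
  dialogue: ['pt', 'it'], // borrowed languages that sound close to Spanish
};

/**
 * Decides which voice reads each paragraph.
 * Input paragraphs are already tagged with a kind: 'title', 'narration' or 'dialogue'.
 * Assumption: consecutive dialogue lines belong to alternating speakers.
 */
function assignVoices(paragraphs, voices = VOICES) {
  let speaker = 0; // index of the next dialogue voice to use
  return paragraphs.map(({ kind, text }) => {
    if (kind === 'dialogue') {
      const voice = voices.dialogue[speaker % voices.dialogue.length];
      speaker += 1;
      return { voice, text };
    }
    // Unknown kinds fall back to the narrator voice.
    return { voice: voices[kind] || voices.narrator, text };
  });
}

// Example input (invented for this sketch).
const segments = assignVoices([
  { kind: 'title', text: 'La tormenta' },
  { kind: 'narration', text: 'Llovía desde el amanecer.' },
  { kind: 'dialogue', text: '¿Vienes?' },
  { kind: 'dialogue', text: 'Ahora no.' },
]);

for (const { voice, text } of segments) {
  // In a real run, the text-to-speech library would be called here, once per segment.
  console.log(`${voice}: ${text}`);
}
```

Each segment ends up as its own small audio file, so the pieces still need to be joined in order afterwards.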

Lessons learned

  • Parsing narrative text can be expensive to develop 🕙: every writer uses different conventions for marking dialogues, scene changes… You need to design a system that is quite flexible and "open", because you’ll have to change it whenever you find an unexpected chapter format! I even ended up using some regex 😵...

  • It’s a good idea to spend some time investigating a library before developing your solution. And sometimes, even if a library seems to be the perfect fit for your need, it’s better to use a library with more powerful features or just write the needed code yourself 👨🏿‍💻

  • Audio editing takes a lot of effort for someone like me who doesn’t really know about sampleRates and similar audio file attributes 🎧. For this kind of feature, it's better to delegate to some program or library
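
As an illustration of the first lesson, one way to keep the parser "open" is to store each writing convention as a rule in a table, so that an unexpected chapter format only means adding a rule. The sketch below is an assumption about how that could be done, not the project's actual parser: classifyParagraph, DEFAULT_RULES and the three patterns are all made up for the example.

```javascript
// Hypothetical rule table: each convention is data, so new ones are easy to add.
// Rules are checked in order and the first match wins.
const DEFAULT_RULES = [
  // A line made only of asterisks, such as "***" or "* * *".
  { kind: 'sceneBreak', pattern: /^\s*(\*\s*){3,}$/ },
  // A line opened by a dash (em dash, en dash or hyphen), common in Spanish dialogue.
  { kind: 'dialogue', pattern: /^\s*[\u2014\u2013-]\s*\S/ },
  // A line opened by a straight quote, a curly quote or a guillemet.
  { kind: 'dialogue', pattern: /^\s*["\u201C\u00AB]/ },
];

/** Returns the kind of the first matching rule, or 'narration' if none match. */
function classifyParagraph(text, rules = DEFAULT_RULES) {
  const rule = rules.find(({ pattern }) => pattern.test(text));
  return rule ? rule.kind : 'narration';
}

// Example input (invented for this sketch).
const chapter = [
  'The rain had not stopped.',
  '"Are you coming?" she asked.',
  '\u2014No, todav\u00EDa no.',
  '* * *',
];
const kinds = chapter.map((paragraph) => classifyParagraph(paragraph));
console.log(kinds);

// Supporting a new convention only requires a new rule, placed before the defaults.
const customRules = [{ kind: 'dialogue', pattern: /^>>/ }, ...DEFAULT_RULES];
console.log(classifyParagraph('>> Hello there', customRules));
```

The hyphen rule will also catch plain bulleted lines, which is the kind of edge case that tends to show up only when a new chapter format arrives.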

The fruits of hard work

You can access the code on my GitHub and check the current status of the project in the issues section 🤗!

You can even check an example of the final output! It’s a specific chapter that I wrote in English 🤠

Is there a future for this project?

While I finish converting all the old books 📚, I’ll extract some features (dialogue interpretation, adding opening and ending songs...) for more general use (something like a podcast-chapter-generator).

I would like to remove the burden of having to manually upload every chapter to ivoox (they don't offer an API, so I'm thinking of using Cypress or other similar automation tools 🤖 in order to, as always, make my life a bit easier...)
