
Working with quoted text and RegEx

DJugs
2020/04/21 09:50:40


Greetings.
 
I have an Ivona voice installed. It is the only TTS system I've encountered that lets me use SSML to change the tonal characteristics of the voice (pitch, rate, etc.). I've taken advantage of this by using "Edit speech" to replace opening and closing quotation marks with <prosody> tags, which alters the pitch of the voice when the TTS engine renders the quote. I do this to differentiate the quoted speaker's voice from the author's voice and to add some dynamism to the otherwise monotonous nature of TTS readings. It was exciting to get this working.
 
I've run into a recurring problem, however. The hack works only if the quoted text is a single phrase or sentence, because text is sent to the TTS engine (and hence the RegEx engine) one sentence at a time. If a quote spans multiple sentences, the <prosody> tag that replaces the opening quotation mark applies only to the first sentence, which is spoken at a higher pitch; all subsequent sentences are spoken at the default pitch because they are sent without <prosody> tags. I have no way of capturing these sentences to wrap them in <prosody> tags. I have devised a workaround for quotes of exactly two sentences by capturing the opening and closing quotation marks separately with two "Edit speech" entries, but if there are more than two sentences, I lose the sentences in the middle.
 
So, my question is: could you add an option to send an extended quote to the engine as a single block, instead of splitting it into individual sentences? I'm hardly a programmer, but it appears that the current delimiter for the chunks sent to the TTS engine is the sentence-ending period (.). Setting the delimiter to quotation mark pairs seems like it would be simple enough to code. Or, maybe even better, let users specify the delimiters for the chunks sent to the TTS engine ourselves via RegEx.
 
I've seen articles where quoted text uses the same quotation marks twice, e.g. “The quick “brown” fox jumped...” instead of what I believe is the correct form, “The quick ‘brown’ fox jumped...”. The former would likely cause a problem if all characters between (“) and (”) were sent to the engine, since that string would then be “The quick “brown”. I should also mention that some articles use straight quotes (" ') rather than curled quotes (“ ‘). Letting users set our own delimiters with RegEx is the only solution I can see, unless you can devise a RegEx that covers all of these cases and give us the option to activate it as a delimiter.
 
Please let me know what you think of this.
 
Best regards!
post edited by DJugs - 2020/04/21 09:57:37


    Admin
    Re: Working with quoted text and RegEx 2020/04/21 11:51:37
    The chunks sent to the TTS engine must be of limited length. Some TTS engines will not accept anything longer than about 500 characters; a few may allow up to 2000 characters, but not much beyond that. The app must use the smallest reasonable limit. It could still apply RegEx to longer chunks of text (e.g. have the RegEx replacements work on paragraphs, not sentences), but it would then take even longer to process each fragment of text before sending it to the TTS engine (first get the paragraph, do the replacements in it, then split it back into sentences, and so on). What you are trying to do would probably be best done in a separate text-processing program that reads one file, adds the speech commands to it, and writes the result to a new file, which is then opened for reading aloud...
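
A minimal Python sketch of such a separate preprocessing step follows. Everything in it is an assumption for illustration: the names `tag_quotes` and `process_file`, the pitch value "+15%", the simple sentence splitter, and above all the premise that the reading app passes <prosody> tags found in the text through to the engine. The idea is to wrap every sentence of a quote in its own tag pair, so the markup survives even when the text is later sent one sentence at a time.

```python
import re

# Assumed patterns: a curled or straight double-quoted span, and a naive
# sentence boundary (whitespace that follows ., ! or ?).
QUOTE_RE = re.compile(r'“([^“”]*)”|"([^"]*)"')
SENTENCE_SPLIT_RE = re.compile(r'(?<=[.!?])\s+')


def tag_quotes(text, pitch="+15%"):
    """Replace each quoted span with per-sentence <prosody> elements."""
    def wrap(match):
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        sentences = [s for s in SENTENCE_SPLIT_RE.split(inner) if s]
        return " ".join(
            f'<prosody pitch="{pitch}">{s}</prosody>' for s in sentences
        )
    return QUOTE_RE.sub(wrap, text)


def process_file(src_path, dst_path, pitch="+15%"):
    """Read one text file, tag its quotes, and write the result to a new file."""
    with open(src_path, encoding="utf-8") as src:
        tagged = tag_quotes(src.read(), pitch)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(tagged)
```

For example, tag_quotes('She said “Stop. Wait! Go.” and left.') gives each of the three quoted sentences its own <prosody> pair and leaves the surrounding narration untouched. The quotation marks themselves are dropped, as in the "Edit speech" substitution described in the first post.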
     
    I may one day attempt to add paragraph-level speech replacements, but I don't know if the results will be good. And frankly, I'm running out of time, patience and energy for all this work, particularly in light of all the bad reviews and hostility I get on Google Play.
     
    Greg