Could you guide me what should I add Edit speechs?

Author
Sorawut
User
  • Total Posts : 0
  • Reward points: 0
  • Joined: 2019/01/21 07:50:55
  • Status: offline
2019/05/20 01:32:25 (permalink)

Hi,
 
I would like to split these Thai words into individual characters.
Could you tell me which Edit speech entry I should add?
 
Source
ผู้สมัคร สส กทม 1 คน

 
My Edit speech was
*"(^|\D\s)([ก-ฮ])([ก-ฮ])([ก-ฮ]?)(\s|$)" "$1 $2 $3 $4 $5"

 
Result
ผู้สมัคร ส ส กทม 1 คน

 
What I want
ผู้สมัคร ส ส ก ท ม 1 คน

 
Thank you very much.

    Admin
    Administrator
    • Total Posts : 275
    • Reward points: 0
    • Joined: 2010/11/22 00:00:00
    • Location: USA
    • Status: offline
    Re: Could you guide me what should I add Edit speechs? 2019/05/20 04:52:13 (permalink)
    The RegEx parser creates groups, which can be referred to as $1, $2, etc., when you enclose the corresponding parts of the pattern in parentheses (). I would use something like this (I use Latin letters here; replace a and z with Thai letters):
     
    Pattern: (\s|^)([a-z])([a-z])\s+([a-z])([a-z])([a-z])(\s|$)
    Replace: $1$2 $3 $4 $5 $6$7
     
    The above would replace any two-letter word followed by a three-letter word with its separated letters, e.g. "ab cde" becomes "a b c d e". I'm not sure if this is really what you want, though.
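For readers who want to try the group mechanism outside @Voice, here is a minimal sketch using Python's `re` module as a stand-in. This is an assumption: the engine inside @Voice may behave differently, and Python writes group references as `\1` rather than `$1`. The helper name `apply_rule` is hypothetical. The sketch also shows that groups 1 and 7 capture the whitespace around the two words, so a replacement that omits them removes that whitespace.

```python
import re

# The pattern from the reply, unchanged: a 2-letter word followed by a 3-letter word.
PATTERN = re.compile(r"(\s|^)([a-z])([a-z])\s+([a-z])([a-z])([a-z])(\s|$)")


def apply_rule(text: str, replacement: str) -> str:
    """Apply one pattern/replacement pair (hypothetical helper)."""
    return PATTERN.sub(replacement, text)


# Groups 2-6 hold the five letters.
letters_only = r"\2 \3 \4 \5 \6"
# Groups 1 and 7 hold the whitespace (or nothing) around the two words.
with_boundaries = r"\1\2 \3 \4 \5 \6\7"

print(apply_rule("ab cde", letters_only))
print(apply_rule("x ab cde y", letters_only))
print(apply_rule("x ab cde y", with_boundaries))
```

On "ab cde" both replacements give "a b c d e". On "x ab cde y" the letters-only replacement gives "xa b c d ey", while the one that keeps the boundary groups gives "x a b c d e y".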
    Sorawut
    User
    • Total Posts : 0
    • Reward points: 0
    • Joined: 2019/01/21 07:50:55
    • Status: offline
    Re: Could you guide me what should I add Edit speechs? 2019/05/20 05:22:27 (permalink)
    I am sorry. I did not explain clearly.
     
    My example (สส กทม) consists of two Thai abbreviations. I found that many Thai abbreviations are spoken as a single word, which is wrong, so I am trying to detect them and insert spaces to correct them all.
     
    So the abbreviations may appear in any order and with any length.
    Example: กกต คสช สว มาจากกลุ่ม -> ก ก ต ค ส ช ส ว มาจากกลุ่ม
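One way to handle abbreviations in any order and of any length is to check the word boundaries without consuming them, using lookbehind and lookahead. The sketch below uses Python's `re` module. Whether the engine in @Voice supports lookarounds is not confirmed in this thread, so treat it as an illustration of the idea only. The function name `split_abbreviations` and the limit of 2 to 3 consonants are assumptions. The `\D` check is carried over from the original Edit speech, so a word that follows a digit, such as คน in "1 คน", is left alone.

```python
import re

# A standalone run of 2-3 Thai consonants (ก-ฮ), treated here as an abbreviation.
# (?:^|(?<=\D\s)) : at the start of the text, or after "non-digit + whitespace"
# (?=\s|$)        : followed by whitespace or the end of the text
# Both boundaries are only inspected, never consumed, so adjacent
# abbreviations can each be matched.
ABBREVIATION = re.compile(r"(?:^|(?<=\D\s))[ก-ฮ]{2,3}(?=\s|$)")


def split_abbreviations(text: str) -> str:
    """Insert a space between the letters of each matched abbreviation."""
    return ABBREVIATION.sub(lambda m: " ".join(m.group(0)), text)


print(split_abbreviations("กกต คสช สว มาจากกลุ่ม"))
print(split_abbreviations("ผู้สมัคร สส กทม 1 คน"))
```

Note that any standalone run of 2 to 3 consonants is treated as an abbreviation, so an ordinary word written with consonants only (คน, for instance) would also be split unless it follows a digit.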
    Sorawut
    User
    • Total Posts : 0
    • Reward points: 0
    • Joined: 2019/01/21 07:50:55
    • Status: offline
    Re: Could you guide me what should I add Edit speechs? 2019/05/20 05:41:27 (permalink)
    From what I understand after testing in @Voice, each source character can be matched by only a single Edit speech match.
     
    For example:
    Source
    "AB ABC ABCD AB ABC"

    Edit speech
    *"(\s|^)([A-Z])([A-Z])([A-Z]?)([A-Z]?)(\s|$)" "$1$2 $3 $4 $5$6"

    Expected
    "A B A B C A B C D A B A B C"

    Actual
    "A B ABC A B C D AB A B C"
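The gap between the expected and actual output can be reproduced outside @Voice. The sketch below uses Python's `re` module as a stand-in (an assumption; the @Voice engine is not specified in this thread) and lists what each match consumes. The trailing `(\s|$)` group takes the space after a word, so the next word has no leading space left for `(\s|^)` and is skipped. A lookaround variant, shown second, checks the boundaries without consuming them. Whether @Voice accepts lookarounds is untested here.

```python
import re

SOURCE = "AB ABC ABCD AB ABC"

# The pattern from the post; Python writes $1 as \1 in the replacement.
consuming = re.compile(r"(\s|^)([A-Z])([A-Z])([A-Z]?)([A-Z]?)(\s|$)")

# Each match swallows its trailing space, so the following word is skipped.
matched = [m.group(0) for m in consuming.finditer(SOURCE)]
actual = consuming.sub(r"\1\2 \3 \4 \5\6", SOURCE)
actual = " ".join(actual.split())  # collapse extra spaces left by empty groups

# Variant (assumed, not from the thread): boundaries are inspected, not consumed.
lookaround = re.compile(r"(?:^|(?<=\s))[A-Z]{2,4}(?=\s|$)")
expected = lookaround.sub(lambda m: " ".join(m.group(0)), SOURCE)

print(matched)
print(actual)
print(expected)
```

The list `matched` contains only "AB ", " ABCD " and " ABC" (with their spaces), which is why the second and fourth words stay unsplit.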

    post edited by Sorawut - 2019/05/20 07:01:45
    Admin
    Administrator
    • Total Posts : 275
    • Reward points: 0
    • Joined: 2010/11/22 00:00:00
    • Location: USA
    • Status: offline
    Re: Could you guide me what should I add Edit speechs? 2019/05/20 06:36:23 (permalink)
    I don't know if there is a question here. I'm not really an expert on regular expressions, and they are not my invention. I use some open source code to handle them, and each time I face a problem I experiment and read the references until I find a solution. If the RegEx is not working as you expect, try to fix it until it does...
    I entered your test regex into the "Regex Match Tracer" tool, and it tells me that your regex only matches the "AB " part of the test string (found twice in the test string), nothing more.
    Sorawut
    User
    • Total Posts : 0
    • Reward points: 0
    • Joined: 2019/01/21 07:50:55
    • Status: offline
    Re: Could you guide me what should I add Edit speechs? 2019/05/20 07:32:44 (permalink)
    Ok.