
Using RegEx to match and remove random text within PDF

Author
Morden
User
  • Total Posts : 1
  • Reward points: 0
  • Joined: 2015/07/15 01:00:56
2015/07/15 01:34:36


Hi there,
I have a PDF of a book that has text boxes inset within the main text. When I use @Voice (on Android) with this PDF, it reads the text boxes as well, even though their content is already part of the main text and does not need to be read again. As a result, the sentence being read is interrupted and no longer makes sense (unless you're watching the highlighted text as it is read out).
Below is an example of the kind of text I am referring to:
 
"In order to correctly use this program it is recommended that you first read the entire manual.  This is particularly a must for new users!  I cannot stress this enough!! If you've read this manual before, you can however, use it as a reference<NO PERIOD HERE>
<CR>
This is particularly a must for new users!  I cannot stress this enough!!   <<< Repeated text I'd like not to be read
<CR>
and skip to any section you like."  <<< continued text before it was interrupted.
 
In the example above, please ignore the comments in <>: they only mark the carriage returns <CR> and show that no period ends the original sentence before the first <CR>.
 
Is it possible to handle this with RegEx (in Settings/Edit speech)? The software seems to read one sentence at a time, and it treats a carriage return/newline as the end of a sentence even when there is no period (as in the example above).
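As an illustration of a non-RegEx approach, here is a minimal sketch that works on text already extracted from the PDF, outside @Voice. It assumes that each text box comes out as its own line(s) with the same wording as in the main text. It drops any line that repeats earlier text, then rejoins lines that were broken mid-sentence. The function name `clean_passage` and the `min_len=20` threshold are hypothetical choices made for this example, not @Voice settings.

```python
import re

# Sentence-ending punctuation, optionally followed by a closing quote/bracket.
_SENTENCE_END = re.compile(r"""[.!?]["')\]]?$""")


def _normalize(s: str) -> str:
    # Collapse whitespace runs so "users!  I" and "users! I" compare equal.
    return " ".join(s.split())


def clean_passage(text: str, min_len: int = 20) -> str:
    """Drop lines that repeat earlier text, then rejoin broken sentences.

    Hypothetical helper. Assumes each text box is extracted as its own
    line(s) with the same wording as the main text. min_len is an arbitrary
    threshold that stops short lines (e.g. "Yes.") counting as repeats.
    """
    kept: list[str] = []
    seen = ""  # normalized text of every kept line, space-separated
    for line in text.splitlines():
        norm = _normalize(line)
        if not norm:
            continue  # blank lines are dropped in this sketch
        if len(norm) >= min_len and norm in seen:
            continue  # already read once: likely a text-box copy
        kept.append(norm)
        seen += " " + norm

    # A line with no closing punctuation is treated as mid-sentence, so the
    # next line is appended to it with a space instead of a line break.
    out: list[str] = []
    for line in kept:
        if out and not _SENTENCE_END.search(out[-1]):
            out[-1] += " " + line
        else:
            out.append(line)
    return "\n".join(out)
```

On the three lines of the example above, the repeated sentence is kept only once and the result ends with "...use it as a reference and skip to any section you like." as a single line. Getting the cleaned text back into @Voice (for example as a plain-text file) is an assumption about the workflow, not something described in this thread.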
 
Many thanks for your help!
 
Kind regards,
Morden
 


    Admin
    Administrator
    • Total Posts : 275
    • Reward points: 0
    • Joined: 2010/11/22 00:00:00
    • Location: USA
    Re: Using RegEx to match and remove random text within PDF 2015/07/15 15:52:26
    @Voice has another kind of filter that can operate on longer fragments of text, e.g. to remove unwanted parts. However, identifying the repeated parts would be very difficult; I cannot think of a RegEx that would match such a case. If the repeated parts were enclosed in some kind of text markers, matching them would be easy, but otherwise I really don't know.
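To show why text markers would make the match easy, here is a small sketch in Python's `re`. The markers `[[BOX]]` and `[[/BOX]]` are purely hypothetical placeholders; the thread does not say what markers, if any, a PDF's extracted text would contain, and whether @Voice's filters accept this exact syntax is also an assumption.

```python
import re

# "[[BOX]]" / "[[/BOX]]" are hypothetical markers used only for illustration.
# \s* on both sides also swallows the line breaks around the box.
_BOX = re.compile(r"\s*\[\[BOX\]\].*?\[\[/BOX\]\]\s*", re.DOTALL)


def strip_marked_boxes(text: str) -> str:
    # .*? is non-greedy, so each box ends at its own closing marker instead
    # of swallowing the main text between two boxes. Each removed box
    # (plus surrounding whitespace) is replaced by a single space.
    return _BOX.sub(" ", text).strip()
```

For example, `"use it as a reference\n[[BOX]]This is a must!\n[[/BOX]]\nand skip ahead."` becomes `"use it as a reference and skip ahead."`. Without such markers there is nothing fixed for the pattern to anchor on, which is the difficulty described above.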
     
    If you can, send me the original PDF as an email attachment and indicate exactly where the problem happens. When I see the original file, I may think of another way to find and remove such text, but I can't promise anything.
     
    Greg
    Sss
    User
    • Total Posts : 0
    • Reward points: 100
    • Joined: 2020/02/12 05:29:17
    Re: Using RegEx to match and remove random text within PDF 2020/03/30 22:49:06
    Is there a way to remove page numbers that appear all across the pages? They have a fixed format, for example:
    -1-
    -2-
    -3-
     
    I tried a hard-coded RegEx replacement and it worked well. Is there any way to make it dynamic, so that one pattern matches every page number? Something like "-TEXT-" would work, I think.
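One way to make the pattern "dynamic" is to replace the hard-coded number with `\d+` (one or more digits). The sketch below uses Python's `re` to remove lines that contain nothing but a hyphen-wrapped page number. Whether @Voice's RegEx replace uses the same syntax and line anchoring is an assumption; `\d+` itself is supported by most regex flavors, including Java's.

```python
import re

# A line holding nothing but a hyphen-wrapped page number: "-1-", "-42-", " -7- ".
# \d+ matches any number of digits, so one pattern covers every page.
# The trailing (?:\r?\n|$) also removes the line break, leaving no blank line.
PAGE_NUMBER_LINE = re.compile(r"^[ \t]*-\d+-[ \t]*(?:\r?\n|$)", re.MULTILINE)


def remove_page_numbers(text: str) -> str:
    return PAGE_NUMBER_LINE.sub("", text)
```

Anchoring with `^` keeps the pattern from touching hyphenated numbers inside a sentence, such as "see -3- here".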
     
    Thanks
    Admin
    Administrator
    • Total Posts : 275
    • Reward points: 0
    • Joined: 2010/11/22 00:00:00
    • Location: USA
    Re: Using RegEx to match and remove random text within PDF 2020/03/31 04:52:31
    If it's in a PDF file, the best way is to choose the "Manually crop pages" option when you open the PDF. When the actual PDF pages are displayed, move the top edge down or the bottom edge up to "shade" the page numbers and other unwanted parts of the page. Then copy the same "crop" to all other pages, if they have a similar layout. On the "Manually crop pages" screen, also press the menu button (three vertical dots at the top right) and press Help to read more about how to use this feature.