Silencing Footnotes in Scientific Books and Papers - RegEx Pattern

Author
cuibono
User
  • Total Posts : 0
  • Reward points: 0
  • Joined: 2017/09/20 07:24:18
  • Status: offline
2017/09/21 09:36:50 (permalink)

Hi, first of all, thank you so much for this great app. This is probably the best and most useful app in the whole Play Store.
 
As I see from the FAQs, silencing or replacing the parts of scientific publications that are uninteresting to listen to is a known topic. I've tried to understand RegEx fully, but failed. The instruction website is good, but I would need more examples of how to implement it in your app to fully understand the key characters and so on. I hope you can help me with my questions:
 
I have 4 cases which I can't get to work myself:

1. Every sentence starting with a superscript number from 1 to 9999 (with or without a space in between), normally found at the bottom of most pages, should be exchanged
Example: "²Smith, The Book, 1973" or "² Smith, The Book, 1973, p. 12"

2. Every sentence containing "Vgl." or "vgl." should be exchanged
Example: "² Vgl. Smith, 1973" or "²Vgl. Smith, 1973"

3. Superscript footnote numbers from 1 to 9999 that directly follow a "." should be exchanged
Example: "...as shown in his publication.²"

4. Every text in parentheses "()" containing a year (1900 to 2050) should be exchanged
Example: "(Smith 1973)" or "(Smith, 1973, p. 12)"
Note: This one should already be working with your newest FAQ thread. I'll try that. Thanks.
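
For illustration only, cases 1, 3 and 4 could be sketched with Python's re module as below. This is a sketch under assumptions, not @Voice's own rules and not the pattern from the FAQ thread: it assumes the footnote numbers arrive as Unicode superscript characters (PDF or HTML sources may encode them differently), and the names SUP, FOOTNOTE_LINE, FOOTNOTE_MARK, YEAR_PARENS and silence are made up for this example. The regex engine inside the app may also differ from Python's in details.

```python
import re

# Unicode superscript digits: U+2070, U+00B9, U+00B2, U+00B3, U+2074 to U+2079.
SUP = "[\u2070\u00b9\u00b2\u00b3\u2074-\u2079]"

# Case 1: a line that starts with a 1-4 digit superscript number,
# followed by an optional space or tab and the rest of the line.
FOOTNOTE_LINE = re.compile(rf"^{SUP}{{1,4}}[ \t]?.*$", re.MULTILINE)

# Case 3: a 1-4 digit superscript number directly after a period.
# The lookbehind keeps the period itself in the text.
FOOTNOTE_MARK = re.compile(rf"(?<=\.){SUP}{{1,4}}")

# Case 4: a parenthesised group containing a year from 1900 to 2050.
YEAR_PARENS = re.compile(r"\([^()]*\b(?:19\d\d|20[0-4]\d|2050)\b[^()]*\)")


def silence(text: str) -> str:
    """Remove footnote lines, footnote marks and year citations from text."""
    text = FOOTNOTE_LINE.sub("", text)
    text = FOOTNOTE_MARK.sub("", text)
    text = YEAR_PARENS.sub("", text)
    return text


if __name__ == "__main__":
    page = "...as shown in his publication.² See (Smith 1973).\n² Smith, The Book, 1973, p. 12"
    print(silence(page))
```

The order matters in this sketch: whole footnote lines are removed first, so the in-text marker rule only sees the body text. Removing a citation can leave a doubled space behind, which should not matter for listening.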
 
Thanks in advance and all the best
cuibono
 
PS: I have already added many speech edit rules for German and English scientific publications; if there is interest, I can upload my export file here (rules for things like: p. = Page; S. = Seite; etc. = et cetera; ...).


    Admin
    Administrator
    • Total Posts : 275
    • Reward points: 0
    • Joined: 2010/11/22 00:00:00
    • Location: USA
    • Status: offline
    Re: Silencing Footnotes in Scientific Books and Papers - RegEx Pattern 2017/09/21 09:51:49 (permalink)
    Thank you for your interesting post! Please let me know if your number 4. works correctly with the RegEx I provided at the top of this thread. For examples 1., 2. and 3. I would need sample articles/texts where they occur. Could you possibly email them to me as files, or provide exact links where I could open them? In cases like superscript, we would have to look at the HTML code of the sentences that contain them.
     
    For removing the references at the bottom of PDF pages, maybe another approach would work better - our PDF Crop Plugin, where you could raise the bottom margin to exclude such references from each page. But it's manual work to browse through all the pages and adjust the margin... Anyway, I should be able to tell more when I have sample articles.
    cuibono
    Re: Silencing Footnotes in Scientific Books and Papers - RegEx Pattern 2017/09/21 11:01:22 (permalink)
    Wow, thanks for your fast answer. Yes, I will try your solution for number 4 in a few days (when I'm back at the office) and will answer here.
     
    For the other points: I did use the PDF crop already (which is awesome!), but the publications have a different number of references on every page (as you suggest), and I wanted a better solution for all the publications I want to listen to (often 200-600 pages, which is too much work). Ideally a universal solution, of course.
     
    I have PDF files, but most of the time I use them as TXT in @Voice, because I thought it would be more efficient with your app (CPU usage, errors, ...). In TXT files the superscript numbers are changed to normal integer numbers, but I thought that if there is a universal solution for superscript in general, I would try RTF/DOC/HTML or something else (recommendations?). The PDFs, on the other hand, could differ in their handling of superscript, but a universal solution for them would be best. I'll send you a small collection of examples (just a few pages) in the next few days.
     
    Thanks a lot in advance!
     
    PS: My biggest problem with RegEx is how to make an if-condition (a word) and combine it with a then-condition to remove the whole sentence. Only one line of code is possible. But I'll send you the examples, and I hope that from the first working pattern I can understand the rest and make every needed modification myself.
    Admin
    Re: Silencing Footnotes in Scientific Books and Papers - RegEx Pattern 2017/09/21 11:56:27 (permalink)
    OK, I'll look at the examples once I have them to come up with a solution. RegEx is not like a programming language: there are no if-then-else statements, just patterns to match and possibly replace. For example, for "2. Every sentence containing "Vgl." or "vgl." should be exchanged" the RegEx pattern could be:
     
    .*[Vv]gl\.\s+.*
     
    and leave "Replace with" empty to silence such a sentence, or enter some word or phrase to replace it with. If @Voice stops the sentence at the "Vgl." abbreviation (the yellow highlight ends on it), it should be added to the known abbreviations for the German language. I'm updating the German abbreviations in my standard distribution to account for both "Vgl." and "vgl."
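
To see what the pattern above does outside the app, here is a minimal sketch using Python's re module. It only illustrates the match-and-replace idea: the function name silence_vgl and the sample text are made up for this example, and the regex engine inside @Voice may behave differently in details. The "condition" is simply whether the pattern matches; the "then" part is whatever is put into the replacement.

```python
import re

# The pattern from the reply above. Without re.DOTALL, "." does not cross
# line breaks, so each match covers at most one line of text.
VGL_PATTERN = re.compile(r".*[Vv]gl\.\s+.*")


def silence_vgl(text: str, replacement: str = "") -> str:
    """Replace every line containing 'Vgl. ' or 'vgl. ' with `replacement`."""
    return VGL_PATTERN.sub(replacement, text)


if __name__ == "__main__":
    sample = "Main text stays.\n² Vgl. Smith, 1973\nMore main text."
    print(silence_vgl(sample))
```

Because the pattern requires whitespace after "gl.", an occurrence glued directly to the next character is left alone. A line that does not contain the abbreviation is not matched and stays unchanged.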