
[FAQ]@Voice "does not support" Hindi PDF files (or other Indic languages PDFs)

Author
Admin
Administrator
2019/04/10 16:20:28

@Voice can extract text from any PDF file that contains actual text (rather than only images of scanned pages, which require OCR to recognize the text) and that also contains a valid translation table mapping the font codes used in the PDF to Unicode. Unfortunately, PDF files created in Hindi and other Indic languages very often do not include such translation tables at all. If you open such a PDF in @Voice and see gibberish or random characters instead of valid text, read on.
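For readers who want to check extracted text outside the app, here is a rough sketch of how the "gibberish" symptom could be detected for Hindi. It is not how @Voice works internally; the function names and the 0.5 threshold are assumptions made for illustration. The idea is that correctly decoded Hindi text consists mostly of characters from the Unicode Devanagari block (U+0900 to U+097F), while text from a PDF with no Unicode translation table usually comes out as Latin letters and symbols.

```python
def devanagari_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the Devanagari block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if "\u0900" <= c <= "\u097f")
    return hits / len(chars)


def looks_like_missing_unicode_table(text: str, threshold: float = 0.5) -> bool:
    """Heuristic only: True when too little of the text is Devanagari.

    The threshold is an arbitrary illustrative value. Empty text also
    returns True, since a PDF with no extractable text needs OCR as well.
    """
    return devanagari_ratio(text) < threshold
```

A file that fails this check is a candidate for the OCR route described in this post. Other Indic scripts use different Unicode blocks, so the range would need to be changed for them.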
 
You can open and read aloud such files in @Voice as follows. Open the original PDF file again in @Voice using the "Open" button (folder icon) at the top of the screen. On the next screen, titled "PDF Text Import Settings", turn on the OCR option and choose the correct language for your file, then proceed to extract the text. OCR processing takes much longer than opening a correctly encoded PDF, but the result will read aloud fine if the text quality on the PDF pages is good.
 
If it is a long file that you will need to open several times to continue reading, it is best to open the file with the extracted text next time, instead of opening the original PDF again. The extracted text will be in one of the following folders under the @Voice home folder, depending on which extraction format you selected:
 
- for Plain Text extraction: in the PdfText folder; the file name is the same as the original PDF, with the .pdf.txt extension
- for HTML extraction: in the eBooks folder; the file name is the same as the original PDF, with the .pdf.epub extension
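The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the format keys "text" and "html", and the example home folder path are assumptions; only the folder names and extensions come from the list above.

```python
from pathlib import PurePosixPath

# Folder and appended suffix for each extraction format, as listed above.
# The keys "text" and "html" are hypothetical labels for this sketch.
_LAYOUT = {
    "text": ("PdfText", ".txt"),
    "html": ("eBooks", ".epub"),
}


def extracted_file_path(home: PurePosixPath, pdf_name: str, fmt: str) -> PurePosixPath:
    """Return where the extracted file should be for a given original PDF.

    The original file name, including ".pdf", is kept and the new
    suffix is appended, giving ".pdf.txt" or ".pdf.epub".
    """
    folder, suffix = _LAYOUT[fmt]
    return home / folder / (PurePosixPath(pdf_name).name + suffix)
```

For example, with an assumed home folder of /sdcard/@Voice, a file named story.pdf extracted as Plain Text would be expected at /sdcard/@Voice/PdfText/story.pdf.txt. The actual location of the @Voice home folder depends on your device and settings.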
 
 
Greg
post edited by Admin - 2021/05/13 13:48:53
