I have OCR'ed all the volumes (text recognition). NO changes were made to the images so the text looks EXACTLY the same.
But now you can copy the text and search.
Drama CD included
I know this sounds dumb, but did you run any kind of test to make sure there were no OCR mistakes? Especially where there is furigana.
Thanks nonetheless
@ Killua69100
Hey, firstly, I ran OCR without modification of the text, so you will be able to read the original text without any errors.
Secondly, I randomly checked different pages and some dense-looking phrases in each volume, and the OCR was really good.
I do not remember any furigana errors, but there were a couple of kanji that were either simplified or recognized incorrectly (but you will be able to read them and look them up anyway).
I think you will not be disappointed.
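For anyone who wants to repeat this kind of random spot check, here is a minimal sketch in Python. It only picks which page numbers to look at; the function name `pick_spot_check_pages` and the page counts are hypothetical, not something from the upload itself.

```python
import random
from typing import List, Optional


def pick_spot_check_pages(total_pages: int, sample_size: int,
                          seed: Optional[int] = None) -> List[int]:
    """Return a sorted list of distinct 1-based page numbers to check by eye.

    Hypothetical helper: total_pages and sample_size are whatever fits
    the volume being checked.
    """
    rng = random.Random(seed)            # a fixed seed makes the pick repeatable
    k = min(sample_size, total_pages)    # never ask for more pages than exist
    return sorted(rng.sample(range(1, total_pages + 1), k))


if __name__ == "__main__":
    # Example: choose 10 pages from an assumed 200-page volume
    print(pick_spot_check_pages(200, 10))
```

Open the listed pages, copy a few lines, and compare them with what the image shows.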
>I ran OCR without modification of the text
@hong_hua
I'm not too knowledgeable about this stuff so I'm sorry if the question sounds stupid but, you mean that you kept the original "images" and added an OCR text on "a layer behind it" or something along those lines? Am I getting it right?
Again, thank you :)
@ Killua69100
The OCR has different settings:
What I did:
- Exact searchable image > the images look the same as in the original file, but you can copy and search the text, because the text is recognized and added to the data of the document. Safest method.
And these options, which I obviously avoided:
- Searchable image > might change the font a little bit and might increase the error rate.
- Editable text and images > basically allows you to type and edit the text. Most dangerous, because it can make dramatic changes to the text and introduce many errors.
@makumaku31
Hey, I use Adobe Acrobat Pro DC.
First, I drop the raw images into the program (Combine Files into a PDF function). Then I save the file and run OCR (Scan & OCR function) with the following settings (Recognize Text function):
- Language Japanese
- Output: Searchable Image EXACT
- Downsample to 600dpi
Make sure not to use the Enhance features. They might change the text significantly, and they are usually not needed.
Hi, there are spaces between each character in the OCR text, on most of the pages I've looked through so far. It doesn't look like there's good support for vertical text. Do you have any idea how to fix this?
@ IllIIIllI
Hey, I checked on several devices, and the OCR works just fine.
Maybe you have not installed Japanese language support on your PC; that would be a possible cause.
https://helpx.adobe.com/acrobat/kb/windows-font-packs-32-bit-reader.html
If you read on a tablet, maybe having a Japanese keyboard installed will help.
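If the spaces still show up in copied text, one workaround is to strip them after copying. This is only a sketch in Python, assuming the extra characters are plain spaces or tabs sitting between Japanese characters; the function name `strip_cjk_spaces` is hypothetical, and it does not change the PDF itself.

```python
import re

# Character ranges treated as Japanese text here (an assumption of this sketch):
# CJK punctuation, hiragana, katakana, common kanji, full-width forms.
_CJK = r"\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff\uff00-\uffef"

# A run of spaces/tabs that has a Japanese character on both sides.
_GAP = re.compile(rf"(?<=[{_CJK}])[ \t]+(?=[{_CJK}])")


def strip_cjk_spaces(text: str) -> str:
    """Remove spaces between Japanese characters, leaving other spacing alone."""
    return _GAP.sub("", text)


if __name__ == "__main__":
    print(strip_cjk_spaces("日 本 語 の テ キ ス ト"))
```

Spaces next to Latin letters or digits are left as they are, so mixed-language lines keep their normal word spacing.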
Comments - 11
Killua69100
hong_hua (uploader)
Killua69100
hong_hua (uploader)
Killua69100
JDO_27
makumaku31
hong_hua (uploader)
IllIIIllI
hong_hua (uploader)
sifadil