Situation: I have a scanned book in Chinese that I’d like to read, but no translation is available. I really want to read it, because it would probably help a lot with my university project.
What I tried: I ran OCR on the scan to get a text layer so I could use a translation tool on it, but found no way to make the OCR text visible. I also tried this tool, but the OCR didn’t work for me, and I found no way to use it with a local model.
Have any of you ever done a similar task? I’d appreciate any kind of suggestions and tips.
I did this with a Chinese book, but I’d have to check what I used.
The translation was entirely readable.
I think I used Tesseract.
No, gImageReader!
That was it.
Tesseract was also very straightforward, but gImageReader has a GUI, and all I had to do was import the file and click export, and it did the whole thing.
I used Tesseract, but the output PDF didn’t have visible text, and I found no way to change that. Maybe I don’t know how to use it properly, or it’s not intended to keep formatting.
Try gImageReader.
It’s a frontend to Tesseract and is easier to work with thanks to its GUI and option menus.
Load the file, then run the program.
That’s all I had to do for a successful OCR.
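If you’d rather stay on the command line: as far as I know, Tesseract’s pdf output is a searchable PDF, meaning the page image with an invisible text layer on top, so the text not being visible is expected. Asking for txt output instead gives a plain text file you can paste into a translator. Here is a minimal Python sketch of that, assuming the pages are already exported as PNG images (Tesseract does not read PDFs directly) and that the chi_sim language data is installed (chi_tra for Traditional Chinese). The function names and folder layout are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image_path, out_base, lang="chi_sim", output="txt"):
    """Return the tesseract command line as a list of arguments.

    output="txt" writes <out_base>.txt (plain, visible text);
    output="pdf" writes <out_base>.pdf (page image plus a hidden text layer).
    """
    return ["tesseract", str(image_path), str(out_base), "-l", lang, output]


def ocr_pages(image_dir, out_dir, lang="chi_sim"):
    """OCR every PNG in image_dir to a .txt file in out_dir; return the paths written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(Path(image_dir).glob("*.png")):
        out_base = out_dir / image.stem
        # tesseract appends the .txt extension to out_base itself
        subprocess.run(build_tesseract_cmd(image, out_base, lang), check=True)
        written.append(Path(str(out_base) + ".txt"))
    return written
```

Calling ocr_pages("pages", "text") would write one .txt file per page image into the text folder, which you can then feed to whatever translation tool you like.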