Situation: I got a scanned book that I’d like to read that is in chinese and has no available translation. I really want to read it, because it would probably help a lot with my university project.
What I tried: tried creating a version with ocr to get a text layer and use some translation tool on it, but found no way to make the ocr text visible. I also tried this tool, but the ocr didn’t work for me, and I found no way to use it with some local model
Have any of you ever done a similar task? I’d appreciate any kind of suggestions and tips.
Be wary that their docs are so and so. Nanonets OCR, Mistral OCR and MinerU will also extract formulas and images.
One other model I forgot to mention is Docling. This one is quite quick to set up in a docker container, and will have a web interface ready to go where you can upload documents. This sort of follows the PaddleOCR pipeline, but also allows you to use vLMs.
Good luck!