Are there any tools I can use for translating a ~400-page scanned book?

morto@piefed.social · 20 hours ago

morto@piefed.social · 14 hours ago

That PaddleOCR looks very interesting. It even extracts images and formulas and somewhat preserves formatting in the output! I will try this one, even if it takes more than a day to process with my low-end CPU. Thank you for the suggestion!

andrew0@lemmy.dbzer0.com · 10 hours ago

Be aware that their docs are so-so. Nanonets OCR, Mistral OCR and MinerU will also extract formulas and images.

One other tool I forgot to mention is Docling. It is quite quick to set up in a Docker container and comes with a ready-to-go web interface where you can upload documents. It roughly follows the PaddleOCR pipeline, but also lets you use VLMs (vision-language models).
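If you'd rather script it than use the web UI, Docling also has a Python API. Below is a minimal sketch, assuming `pip install docling` and a scanned PDF at a hypothetical path. It converts the book to Markdown, then splits that Markdown into paragraph-aligned chunks so each piece fits into whatever translation model you use. Docling's default PDF pipeline should run OCR on scanned pages, but check its pipeline options to be sure. The helper names `pdf_to_markdown` and `chunk_markdown` and the `max_chars` limit are my own hypothetical choices, not part of Docling.

```python
def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a (scanned) PDF to Markdown with Docling's DocumentConverter."""
    # Imported lazily so chunk_markdown() below works without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(pdf_path)
    return result.document.export_to_markdown()


def chunk_markdown(markdown: str, max_chars: int = 4000) -> list[str]:
    """Split Markdown on blank lines into chunks of at most max_chars.

    Hypothetical helper: paragraphs are never split, so a single paragraph
    longer than max_chars ends up as its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0  # length of "\n\n".join(current)
    for para in (p.strip() for p in markdown.split("\n\n")):
        if not para:
            continue
        # +2 accounts for the blank-line separator when joining.
        extra = len(para) + (2 if current else 0)
        if current and size + extra > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
            extra = len(para)
        current.append(para)
        size += extra
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Usage would be something like `chunks = chunk_markdown(pdf_to_markdown("book.pdf"))`, then sending each chunk to your translator and joining the results with blank lines. Splitting on paragraphs rather than at a fixed character count keeps sentences, formulas and image references from being cut in half.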

Good luck!