Situation: I have a scanned book in Chinese that I’d like to read, but there is no translation available. I really want to read it, because it would probably help a lot with my university project.

What I tried: I ran OCR on the scan to add a text layer, planning to use a translation tool on it, but I found no way to make the OCR text visible. I also tried this tool, but the OCR didn’t work for me, and I found no way to use it with a local model.

Have any of you ever done a similar task? I’d appreciate any kind of suggestions and tips.

  • morto@piefed.social (OP)
    14 hours ago

    PaddleOCR looks very interesting. It will even extract images and formulas and somewhat preserve formatting in the output! I will try this one, even if it takes more than a day to process the book on my low-end CPU. Thank you for the suggestion!

    • andrew0@lemmy.dbzer0.com
      10 hours ago

      Be aware that their docs are so-so. Nanonets OCR, Mistral OCR and MinerU will also extract formulas and images.

      One other option I forgot to mention is Docling. It is quite quick to set up in a Docker container, and it comes with a web interface where you can upload documents. It roughly follows the PaddleOCR pipeline, but also lets you use VLMs.

      Good luck!
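
      For anyone who would rather script this than use the web interface, here is a rough sketch of the OCR-then-translate workflow discussed above. It assumes Docling's Python package is installed (not the Docker setup), and that its default OCR settings cope with Chinese, which may need extra configuration not shown here. The function names `ocr_to_markdown` and `split_into_chunks` and the 1500-character limit are made up for illustration; the translation step itself is left out because it depends on which local model you use.

```python
def ocr_to_markdown(pdf_path: str) -> str:
    """Convert a scanned PDF to Markdown text using Docling.

    Assumption: Docling's default pipeline is good enough for the scan;
    OCR language options for Chinese are not configured here.
    """
    # Imported lazily so split_into_chunks works without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(pdf_path)
    return result.document.export_to_markdown()


def split_into_chunks(text: str, max_chars: int = 1500) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    Hypothetical helper: keeps each request to a translation model small.
    A single paragraph longer than max_chars is kept whole, not cut.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip empty paragraphs left over from extra blank lines
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

      You would call `split_into_chunks(ocr_to_markdown("book.pdf"))` (the file name is a placeholder) and feed each chunk to your translator of choice. Splitting on paragraph boundaries keeps sentences intact, which tends to matter more for translation quality than hitting an exact chunk size.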