Situation: I have a scanned book in Chinese that I’d like to read, and no translation of it is available. I really want to read it because it would probably help a lot with my university project.

What I tried: I ran OCR on the scan to get a text layer that I could feed to a translation tool, but I found no way to make the OCR text visible. I also tried this tool, but its OCR didn’t work for me, and I found no way to use it with a local model.

Have any of you ever done a similar task? I’d appreciate any kind of suggestions and tips.

  • bitofarambler@crazypeople.online
    20 hours ago

    I did this with a Chinese book, but I have to check what I used.

    The translation was entirely readable.

    I think I used Tesseract.

    No, gImageReader!

    that was it.

    Tesseract was also very straightforward, but gImageReader has a GUI, and all I had to do was import the file and click export, and it did the whole thing.

    • morto@piefed.social (OP)
      14 hours ago

      I used Tesseract, but the output PDF didn’t have visible text, and I found no way to change that. Maybe I don’t know how to use it properly, or it’s not intended to keep formatting.

      • bitofarambler@crazypeople.online
        11 hours ago

        Try gImageReader.

        It’s a frontend to Tesseract and is easier to work with thanks to its GUI and option menus.

        Load the file, execute the program.

        That’s all I had to do for a successful OCR.
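
For anyone who prefers the command line over the GUI, here is a rough sketch of the same OCR step in Python. It is not what either poster ran. It assumes the `tesseract` binary and the `chi_sim` (Simplified Chinese) language data are installed (`chi_tra` would be the Traditional equivalent), and that the book's pages have already been exported as image files. The function names and the `page_001.png` style file names are made up for illustration. As far as I know, Tesseract's `pdf` output puts the recognized text in an invisible layer over the page image by design, so the sketch asks for `txt` output instead, which gives plain text that can be pasted into a translation tool.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image, out_base, lang="chi_sim", fmt="txt"):
    """Build the CLI call: tesseract <image> <out_base> -l <lang> <fmt>.

    fmt="txt" writes <out_base>.txt (plain, visible text);
    fmt="pdf" writes <out_base>.pdf (page image plus a hidden text layer).
    """
    if fmt not in ("txt", "pdf"):
        raise ValueError("fmt must be 'txt' or 'pdf'")
    return ["tesseract", str(image), str(out_base), "-l", lang, fmt]


def ocr_pages(page_images, out_dir, lang="chi_sim"):
    """OCR each page image to text and return all pages joined together."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    texts = []
    for image in page_images:
        out_base = out_dir / Path(image).stem
        subprocess.run(tesseract_command(image, out_base, lang), check=True)
        # Tesseract appends ".txt" to the output base name itself.
        texts.append(Path(str(out_base) + ".txt").read_text(encoding="utf-8"))
    return "\n\n".join(texts)


if __name__ == "__main__":
    # Hypothetical layout: scanned pages exported as pages/page_001.png, ...
    pages = sorted(Path("pages").glob("*.png"))
    Path("book.txt").write_text(ocr_pages(pages, "ocr_out"), encoding="utf-8")
```

The resulting `book.txt` loses the original page layout, but the text is visible and can go straight into whatever translator you like, including a local model.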