• apftwb@lemmy.worldOP
    English
    arrow-up
    16
    ·
    edit-2
    9 hours ago

    Are you having as much trouble with OCR as the article author? I would have thought OCR was a solved problem in 2026 even with poor font selection.

    • kescusay@lemmy.world
      English
      arrow-up
      11
      ·
      6 hours ago

      I’m not having trouble with it as such, it’s just a slow and painstaking process. The source is crappy enough that an enormous number of characters need to be checked manually, and it’s ridiculously time-consuming.

    • floofloof@lemmy.ca
      English
      arrow-up
      4
      ·
      7 hours ago

      I wonder if they have considered crowdsourcing this: having many people type in small chunks of the data by hand, doing their own character recognition. Get enough people involved, with enough overlap, and the process would have some built-in error correction.
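A rough sketch of how that built-in error correction could work, assuming each chunk is typed by several people and every transcription of a chunk has the same length. The function name `majority_vote` and the sample strings are made up for illustration; nothing here comes from the article.

```python
from collections import Counter


def majority_vote(transcriptions):
    """Merge several transcriptions of the same chunk, character by character.

    Returns the consensus string and the positions where no character
    won a strict majority, so a human can recheck them.
    """
    # Assumption: all volunteers typed the same number of characters.
    if len({len(t) for t in transcriptions}) != 1:
        raise ValueError("transcriptions must all have the same length")

    consensus = []
    disputed = []
    for index, column in enumerate(zip(*transcriptions)):
        best_char, best_count = Counter(column).most_common(1)[0]
        consensus.append(best_char)
        # Flag the position if the winner has half the votes or fewer.
        if best_count * 2 <= len(column):
            disputed.append(index)
    return "".join(consensus), disputed


text, disputed = majority_vote(["aGVsbG8", "aGVIbG8", "aGVsbG8"])
print(text, disputed)
```

With three typists, one person misreading an `l` as an `I` gets outvoted. Positions where everyone disagrees are flagged instead of being silently guessed.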

        • Kevlar21@piefed.social
          English
          arrow-up
          7
          ·
          edit-2
          6 hours ago

          Not an expert at all, but I’m genuinely curious how long it would take to check all possibilities for each I or 1. Is that the full length of the hash or whatever? So in this example image we have 2^8 = 256 different possibilities to check? Seems like that would be easy enough for a computer.

          Edit: actually read the article. It’s much more complicated than this. This isn’t really the only issue and the base64 in the example was 76 pages long.
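For the 2^8 = 256 arithmetic above, here is a minimal sketch of the brute-force idea, assuming the only ambiguity is between `I` and `1`. The helper name `candidates` and the sample string are hypothetical. How a candidate would be checked for correctness is left out, since that depends on details from the article that are not quoted here.

```python
from itertools import product


def candidates(text, confusable="I1"):
    """Yield every string obtained by trying each confusable character
    at every position where one of them appears in `text`."""
    # Each ambiguous position becomes a slot with all options;
    # every other position is a slot with a single fixed character.
    slots = [confusable if ch in confusable else ch for ch in text]
    for combo in product(*slots):
        yield "".join(combo)


sample = "aI1bI1cI1dI1"  # 8 ambiguous characters
print(sum(1 for _ in candidates(sample)))  # 2**8 candidates
```

The count doubles with every ambiguous character, so this is only practical for short spans. Across a long base64 block like the 76-page one, it would need some way to validate small pieces independently.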