Why You Should Never Use Pixelation To Hide Sensitive Text

KarlHeinzSchwuke@feddit.org · 25 days ago

cmnybo@discuss.tchncs.de · 25 days ago

Yes, except now you can just use an AI to unpixelate something without doing all that work. It's even more effective on video than on a still picture.
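The "work" being skipped is usually a guess-and-compare attack. Pixelation is deterministic, so you render each candidate string in the same font, pixelate it with the same block size and alignment, and keep the candidates whose mosaic matches the published one. Below is a minimal sketch that assumes a hypothetical 3x5 digit font, a known block size of 2, and a grid aligned to the text's top-left corner. Real tools also have to guess the font, size and grid offset, and they have to tolerate compression noise.

```python
from itertools import product

# Hypothetical 3x5 bitmap font for digits (1 = ink); a real attack renders the actual font.
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}

def render(text):
    """Draw text as rows of 0/255 pixels, glyphs placed side by side."""
    return [[255 if bit == "1" else 0 for ch in text for bit in FONT[ch][row]]
            for row in range(5)]

def pixelate(img, block):
    """Mosaic: replace each block x block cell with its mean (edge cells may be smaller)."""
    h, w = len(img), len(img[0])
    cells = []
    for top in range(0, h, block):
        for left in range(0, w, block):
            vals = [img[y][x]
                    for y in range(top, min(top + block, h))
                    for x in range(left, min(left + block, w))]
            cells.append(sum(vals) / len(vals))
    return cells

def crack(mosaic, length, block):
    """Render every candidate, pixelate it the same way, keep the ones that match."""
    matches = []
    for digits in product("0123456789", repeat=length):
        guess = "".join(digits)
        if pixelate(render(guess), block) == mosaic:
            matches.append(guess)
    return matches

secret = "407"
published = pixelate(render(secret), block=2)  # what the redacted screenshot shows
print(crack(published, length=3, block=2))
```

More than one candidate can survive when different strings average out the same way. In that case the attack narrows the search rather than producing a unique answer. Longer text and smaller blocks generally leave fewer collisions.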

When redacting something, make sure you're not accidentally making it semi-transparent. Use a box fill or a hard-edged brush with the opacity set to 100%. I've seen a lot of pictures posted where someone scribbled over the info with a soft-edged brush, and you can still see through it if you adjust the contrast.
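Here is why low opacity fails. Compositing a black layer at opacity `a` scales every covered pixel by `(1 - a)`, so the text strokes and the background keep their relative difference, and a contrast stretch brings that difference back. Near its edges, a soft brush behaves like an opacity below 1. The sketch below works on one row of 8-bit grayscale pixels, and the pixel values are made up for illustration:

```python
def paint_black(pixels, opacity):
    """Alpha-composite a black layer (value 0) at the given opacity over 8-bit gray pixels."""
    # "Over" blend with a black top layer: opacity * 0 + (1 - opacity) * original
    return [round((1 - opacity) * p) for p in pixels]

def stretch_contrast(pixels):
    """Map the darkest..brightest value in the region linearly onto 0..255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)  # a perfectly flat region: nothing left to recover
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]

# One row through dark text strokes (20) on a light background (230).
row = [230, 230, 20, 20, 230, 20, 230, 230]

for opacity in (0.9, 1.0):
    covered = paint_black(row, opacity)
    print(f"opacity {opacity}: covered={covered} stretched={stretch_contrast(covered)}")
```

At 100% opacity every covered pixel becomes 0, and there is nothing left for the stretch to amplify.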

JohnEdwa@sopuli.xyz · 25 days ago

If the image is moving it gets really trivial to uncensor.

Here’s a quick three-minute video about it from Level 2 Jeff.
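One way to see why motion helps: in any single frame the mosaic reports only one average per block, but if the text slides under a fixed grid, later frames average different windows of the same pixels. The sketch below applies that idea to one scanline under idealized assumptions: one-pixel shifts per frame, known motion, exact averages, no compression, and a strip of plain background next to the text. The function names are hypothetical.

```python
def window_means(row, block):
    """All block-wide window means an observer collects over `block` frames.

    Idealized assumption: the content slides exactly one pixel per frame under a fixed
    mosaic grid, so each frame exposes a different subset of windows and, together,
    the frames cover every window position.
    """
    return [sum(row[i:i + block]) / block for i in range(len(row) - block + 1)]

def recover(means, block, known_prefix):
    """Rebuild the row from consecutive window means plus `block` known background pixels."""
    row = list(known_prefix)
    for i in range(1, len(means)):
        # Window i drops row[i - 1] and adds row[i + block - 1]; solve for the new pixel.
        row.append(round((means[i] - means[i - 1]) * block + row[i - 1]))
    return row

block = 4
background = [0] * block                   # e.g. the plain margin left of the text
secret = [255, 0, 255, 255, 0, 0, 255, 0]  # one scanline through the hidden text
scanline = background + secret

means = window_means(scanline, block)
print(recover(means, block, scanline[:block]) == scanline)
```

Real footage needs motion estimation and has to cope with rounding and compression noise. A practical attack would have to be far more tolerant of error than this exact reconstruction.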

2910000@lemmy.world · 25 days ago

I wonder if, hypothetically, AI could do the same with a box over text, even if it was 100% opaque. For example, if the data from the layer containing the text was part of the image data passed to an image compression algorithm, and that data was somehow reflected in the output.

Lee@retrolemmy.com · 23 days ago

I think you're on to something, but sort of accidentally. A couple of replies to you are saying it's not possible, but I think they're making an assumption that is not correct in many cases.

The replies are saying it's not possible because the layers are flattened before being passed to the compression, so the uncensored/unredacted data is not part of the compression's input and therefore cannot have any impact on its output. This is true assuming you are starting with an uncompressed image.

Here's a scenario where the uncensored/unredacted parts of the image could influence the final image: someone takes a photo of their ID, credit card, etc. It's saved in a lossy compressed format (e.g. JPEG), specifically not a lossless format. They open it in an image editing tool to black out some portion at 100% opacity, then save it again (the format doesn't actually matter). I feel like someone is going to think I'm misunderstanding if I don't explain the different output scenarios.

First, the trivial case: a multilayer output with the uncensored/unredacted data as its own layer. In this case it's trivial to get the uncensored/unredacted data, since it is simply present and visible if you use a tool that can show the individual layers. The general assumption, though, is that this is not the case: the output is a single-layer image, which leaves two scenarios.

Second case: lossy compressed original, lossless censored. Suppose the censored/redacted image is flattened and saved in a lossless format such as PNG. This save will certainly add no compression artifacts of the uncensored/unredacted data, both because PNG is lossless (it adds no artifacts) and because the image was flattened before being passed to PNG. However, the artifacts produced by the original, uncensored content remain in the parts of the image that were not blacked out. These were introduced by the compression applied before the censoring (e.g. the JPEG compression of the pre-censored image). I suspect this is actually a common case.
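Here is a toy model of the second case. It treats a 1-D, 8-pixel row as the compression "block" and uses "keep only the four lowest-frequency DCT coefficients" as a crude stand-in for JPEG's quantization. Real JPEG quantizes 8x8 blocks with per-coefficient tables and usually works in YCbCr. Two photos share an identical left half but have different secrets in the right half. Each is lossy-saved, then a 100%-opaque box covers the right half. In this model the leak happens only because the box cuts through a compression block.

```python
import math

N = 8     # JPEG uses 8x8 blocks; one 8-pixel row keeps the toy small
KEEP = 4  # toy "compression": keep only the 4 lowest-frequency coefficients

def _scale(k):
    # Orthonormal DCT scaling factor
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(row):
    """Orthonormal DCT-II of one 8-pixel row."""
    return [_scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n, x in enumerate(row))
            for k in range(N)]

def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    return [sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k, c in enumerate(coeffs))
            for n in range(N)]

def lossy_save(row):
    """Crude stand-in for JPEG: drop high-frequency detail, round back to 8-bit."""
    coeffs = dct(row)
    kept = coeffs[:KEEP] + [0.0] * (N - KEEP)
    return [min(255, max(0, round(v))) for v in idct(kept)]

def black_box(row):
    """A 100%-opaque box over the right half, applied after the lossy save."""
    return row[:N // 2] + [0] * (N - N // 2)

# Identical visible (left) half, different secret pattern in the right half.
photo_a = [200, 200, 200, 200, 200, 40, 200, 40]
photo_b = [200, 200, 200, 200, 40, 200, 40, 200]

shared_a = black_box(lossy_save(photo_a))
shared_b = black_box(lossy_save(photo_b))
print("left half A:", shared_a[:N // 2])
print("left half B:", shared_b[:N // 2])
```

The visible halves of `shared_a` and `shared_b` differ even though the originals' left halves were identical and the box is fully opaque. The pixels outside the box therefore still carry information about what was under it.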

Third case: lossy compressed original, lossy compressed censored. This is the same as the second case, except that there are now additional artifacts. In particular, you now have artifacts from the censored portion, and the artifacts of the previous lossy compression introduce further artifacts of their own. This is probably more difficult to exploit, but the point is that the original uncensored/unredacted artifacts are still present.

jacksilver@lemmy.world · 24 days ago

If the image/video just has black pixels on the content, then there would be no information to extract and any attempt would just be filling things in.

When you talk about layers, you're assuming that the creator left information behind in the metadata, which wouldn't require AI to extract.

2910000@lemmy.world · 24 days ago

By layers I mean image layers when manipulating an image in an image editor. So I guess what you’re saying is an image would be flattened before being passed to a compression algorithm?

jacksilver@lemmy.world · 24 days ago

Yeah, if I'm making something “masked/obscured”, I should export it in a raw format. That way there is no metadata or other information that could be leaked by accident.

Think of the Trump Epstein files: they were kept as PDFs, so you could just unhighlight the redacted sections. If they had exported them as JPEG/PNG, you wouldn't be able to extract any information.

There are ways to remove the content from a PDF, but as we've seen, that leaves room for errors.
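In a PDF, a black box drawn over text is just one more drawing operator in the page's content stream. The text-showing operator underneath is still there. The sketch below uses a hand-written, uncompressed content stream with a made-up account number. Real files usually compress their streams and need a proper PDF parser, and the regex "extractor" is deliberately naive (it ignores escaped parentheses and other text operators).

```python
import re

# Toy page content: BT/ET wrap a text object, Tj shows a string,
# "0 0 0 rg" sets the fill color to black, "re" adds a rectangle and "f" fills it.
page_content = b"""BT /F1 12 Tf 72 700 Td (Account: 1234-5678) Tj ET
0 0 0 rg 70 695 140 18 re f
"""

def shown_strings(content):
    """Naive extractor: literal strings passed to the Tj text-showing operator."""
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", content)]

print(shown_strings(page_content))
```

The filled rectangle only changes what a viewer paints. Exporting the page to a raster image keeps only the resulting pixels, which is why the flattened JPEG/PNG route leaves nothing to extract.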

JaddedFauceet@lemmy.world · 23 days ago

If the black box covers the full text, then no.

The pixel information is already gone (it has become black) before it is passed to the compression algorithm.

Ŝan • 𐑖ƨɤ@piefed.zip · 25 days ago

Honestly, I was wondering who uses pixelization. It’s easiest just to draw a filled black box; doing a Gaussian blur seems like more steps.

Þe biggest trouble I’ve had is redacting PDFs. I’ve found no reliable, easy way to do þis on Linux.

cmnybo@discuss.tchncs.de · 24 days ago

You can use LibreOffice Draw to edit a PDF. It can actually delete text and images from the PDF. You do need to install all of the fonts that the PDF uses before editing it or they will be replaced with what you do have. That will probably mess up the layout if the replacement fonts are not compatible with the originals.

Ŝan • 𐑖ƨɤ@piefed.zip · 20 days ago

Great to know. I didn't realize LibreOffice's PDF features had gotten þis advanced.