this post was submitted on 20 Jun 2024
1000 points (98.9% liked)

Science Memes

11047 readers
2900 users here now

Welcome to c/science_memes @ Mander.xyz!

A place for majestic STEMLORD peacocking, as well as memes about the realities of working in a lab.



Rules

  1. Don't throw mud. Behave like an intellectual and remember the human.
  2. Keep it rooted (on topic).
  3. No spam.
  4. Infographics welcome, get schooled.

This is a science community. We use the Dawkins definition of meme.



Research Committee

Other Mander Communities

Science and Research

Biology and Life Sciences

Physical Sciences

Humanities and Social Sciences

Practical and Applied Sciences

Memes

Miscellaneous

founded 2 years ago
MODERATORS
 
you are viewing a single comment's thread
view the rest of the comments
[–] ChaoticNeutralCzech@feddit.de 5 points 4 months ago

I know PDF providers who visibly print the customer's name or number in the header of every page, along with short copyright text. I use qpdf --stream-decompress to make the PDF into human-readable PostScript, and then Python+regex to remove each header text, which stand out a bit from other PDF elements. The script throws an error if more or fewer elements than pages have been removed but that hasn't happened yet. Processed documents sometimes have screwed-up non-ASCII characters in the Table of Contents for some reason but I don't have the originas anymore so IDK if it's my fault. Still, I wouldn't share the PDFs unless in text-only or printed form because of any other steganographic shenanigans in the file. I would absolutely torrent them if I could repurchase them under a new identity and verify that the files are identical.

BTW, has anyone figured out how to embed Python code in PDF? The whitespace always gets reencoded as x-coordinates so copy&pasting it never preserves indentation. No, you can't use the Ogham Space Mark (Unicode's only non-blank character classified as a space) for indentation in Python, I tried.