It should be fairly easy to watch the frames programmatically: original programming will have consistently mastered audio levels and a fixed video compression profile, so any shift to an ad should stick out like a sore thumb.
So as long as things aren’t locked down to a DRM’d player, it should be possible to fingerprint the audio and video stream content and drop any inserted frames that don’t match.
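A rough sketch of the audio half of that idea, assuming you already have decoded PCM samples as floats in [-1, 1]. The function names, the 6 dB threshold, and the fixed segment length are all made up for illustration; a real stream would need proper loudness measurement and would probably combine this with video-side signals, since an ad mastered to the same level would slip through.

```python
import math
from statistics import median


def rms_db(samples):
    """RMS level of a block of float samples in dB relative to full scale.

    Floored at -120 dB so silent or empty blocks don't hit log10(0).
    """
    if not samples:
        return -120.0
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(max(mean_sq, 1e-12))


def flag_outlier_segments(samples, segment_len, threshold_db=6.0):
    """Split the audio into fixed-length segments and flag the odd ones out.

    The median segment level is treated as the "original programming"
    baseline (assumption: most of the stream is the original content).
    A segment is flagged when it deviates from that baseline by more
    than threshold_db.
    """
    levels = [
        rms_db(samples[i:i + segment_len])
        for i in range(0, len(samples), segment_len)
    ]
    baseline = median(levels)
    return [abs(level - baseline) > threshold_db for level in levels]


def drop_flagged(samples, segment_len, flags):
    """Return the samples with every flagged segment removed."""
    kept = []
    for idx, flagged in enumerate(flags):
        if not flagged:
            kept.extend(samples[idx * segment_len:(idx + 1) * segment_len])
    return kept
```

Using the median rather than the mean keeps one loud insert from dragging the baseline toward itself, which matters when the ad is much hotter than the surrounding content.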
If YouTube decides to mangle the original content to fight back… then maybe that’s finally the impetus people will need to switch platforms.