That is a pretty weak argument. The issues are minor and are in a library that people are moving off of in favor of a better-built and more thoroughly validated one. Yes, it should have been like that in the first place, but the problem is minor and is being addressed.
I would look more at the various features of Matrix that aren't encrypted, like room names, topics, reactions, and so on, not to mention the oodles of unencrypted metadata. I really wouldn't call Matrix a high-privacy system.
I like Matrix and use it regularly, but it definitely doesn't have a privacy-first mindset like Signal does. I'm hoping that this improves over time, but without strong privacy-first leadership it seems unlikely to happen.
This isn't how YouTube has streamed videos for many, many years.
Most video and live streams work by serving a sequence of small self-contained video files (often in the 1-5s range). Sometimes the audio is also served as separate files (this avoids duplication, since you often use the same audio for all video qualities, and it also enables audio-only streaming). This is done for a few reasons, but primarily to allow quite seamless switching between quality levels on-the-fly.
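To make the quality switching concrete, here is a rough sketch of how a player could pick a quality level before fetching each chunk. The bitrate ladder, the `safety` factor, and the function name are all made up for illustration; real players use more elaborate heuristics than this.

```python
def pick_rendition(renditions_kbps, measured_kbps, safety=0.8):
    """Pick the highest bitrate that fits the measured throughput.

    Hypothetical heuristic: leave some headroom (safety) and fall back to
    the lowest quality when nothing fits. Because every chunk is
    self-contained, the choice can change from one chunk to the next.
    """
    budget = measured_kbps * safety
    fitting = [r for r in renditions_kbps if r <= budget]
    return max(fitting) if fitting else min(renditions_kbps)


# Assumed bitrate ladder and per-chunk throughput measurements, in kbit/s.
ladder = [400, 1200, 3000]
throughput_per_chunk = [5000, 2000, 300]
choices = [pick_rendition(ladder, t) for t in throughput_per_chunk]
```

The player just requests the next chunk from whichever quality level was chosen, so a switch happens at a chunk boundary without restarting the stream.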
Inserting ads in a stream like this is trivial. You just add a few ad chunks between the regular video chunks. The only real complication is that the ad needs to start at a chunk boundary. (And if you want it to be hard to detect, you probably want the length of the ad to be a multiple of the regular chunk size.) There is no re-encoding or other processing required at all. Just update the "playlist" (the list of chunks in the video) and the player will play the ad without knowing that it is "different" from the rest of the chunks.
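As a minimal sketch of that playlist update, assuming the playlist is just an ordered list of chunk URLs with durations: the `Chunk` type, the `insert_ad` helper, and the URLs are hypothetical and not any real service's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    url: str
    duration: float  # seconds


def insert_ad(playlist, ad_chunks, after_seconds):
    """Return a new playlist with the ad spliced in at the first chunk
    boundary at or after `after_seconds` of content. No chunk is modified
    or re-encoded; only the list changes."""
    elapsed = 0.0
    for i, chunk in enumerate(playlist):
        if elapsed >= after_seconds:
            return playlist[:i] + ad_chunks + playlist[i:]
        elapsed += chunk.duration
    # Requested point is past the end of the content: append the ad.
    return playlist + ad_chunks


# Hypothetical 4-second chunks; the ad is a multiple of the chunk length.
content = [Chunk(f"video/{i}.ts", 4.0) for i in range(4)]
ad = [Chunk("ad/0.ts", 4.0), Chunk("ad/1.ts", 4.0)]
result = insert_ad(content, ad, after_seconds=6.0)
```

Here the requested 6s mark falls in the middle of the second chunk, so the ad lands at the next boundary (8s), which is the chunk-boundary constraint mentioned above.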