this post was submitted on 02 Sep 2024
156 points (99.4% liked)

technology

all 47 comments
[–] blakeus12@hexbear.net 69 points 2 months ago (2 children)

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaand literally nobody is surprised

[–] M500@lemmy.ml 31 points 2 months ago

The biggest part of this is that they have denied it for years. But I think everyone knew it was happening.

[–] SeducingCamel@hexbear.net 3 points 2 months ago (1 children)

But the people on reddit told me this totally wasn't happening

[–] ganymede@lemmy.ml 3 points 2 months ago* (last edited 2 months ago)

smug redditors who not only 100% "know" it can't be happening, but actively get annoyed if someone dares to even wonder if it might happen

[–] SkingradGuard@hexbear.net 57 points 2 months ago* (last edited 2 months ago)

Corpos abusing phone permissions to sell you stuff? shocked-pikachu

[–] Camdat@hexbear.net 50 points 2 months ago* (last edited 2 months ago) (5 children)

This is maybe my biggest pet peeve. These companies are not listening to you in any meaningful way.

You can trivially confirm this by hooking up your home network to Wireshark and filtering packets.

Other reasons:

  1. They can get all of this information elsewhere: searches, ad pixels, location capturing etc.
  2. Processing audio data on-device in a useful way is basically impossible, and the network infrastructure to support mass transcription in the cloud would cost on the order of billions of dollars.
  3. It would be a massive endeavor to cover up the millions of hours of audio data that would need to be analyzed by the lowest paid and most unhappy workers in the industry (content labelers and moderators)

Now I'm sure this is some marketer's wet dream, but the logistical and PR nightmare this would create dissuades all but the dumbest ad agencies. This is mostly just terrible tech journalism.

[–] blame@hexbear.net 37 points 2 months ago (2 children)

Not that I disagree with your conclusion, because there's an even simpler way to check if an app is listening: iOS and Android will tell you when the mic is being used... Anyway, we do have always-on NNs listening for keywords ("Siri", "Hey Google", "Alexa"), so while I agree that full-ass voice transcription like Whisper will run like dogshit on your phone, they can certainly run a much, much lighter model to pick up a handful of keywords.

[–] Camdat@hexbear.net 13 points 2 months ago (1 children)

Sure, this is definitely true. I should clarify that single-word NNs do run on-device all the time, but those require specialized models that are trained only on those keywords. Once those models trigger, they need to send everything else to the cloud.

[–] blame@hexbear.net 15 points 2 months ago* (last edited 2 months ago)

I agree. If I was going to do something like this for advertising, though, I wouldn't really care too much about what people were saying. Instead I'd just listen for some limited set of keywords (maybe for some of my top-paying advertisers) and serve ads for keywords that hit recently. Keep it all on device until an ad actually needs to be served.
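
To make the "keywords that hit recently" idea concrete, here is a minimal sketch of the bookkeeping it would need, assuming some keyword spotter already exists. The class name `RecentKeywordHits`, the one-hour window, and the example keywords are all hypothetical; this is not taken from any real ad SDK.

```python
from collections import deque


class RecentKeywordHits:
    """Keep only recent hits for a small, fixed keyword list (hypothetical design)."""

    def __init__(self, keywords, window_seconds=3600):
        self.keywords = set(keywords)
        self.window_seconds = window_seconds
        self.hits = deque()  # (timestamp, keyword), assumed to arrive in time order

    def record(self, keyword, timestamp):
        # Anything outside the keyword list is dropped immediately.
        if keyword in self.keywords:
            self.hits.append((timestamp, keyword))

    def recent(self, now):
        # Expire hits that are older than the window, then report what is left.
        while self.hits and now - self.hits[0][0] > self.window_seconds:
            self.hits.popleft()
        return sorted({keyword for _, keyword in self.hits})


tracker = RecentKeywordHits({"pizza", "sneakers"})
tracker.record("pizza", timestamp=0)
tracker.record("weather", timestamp=10)     # not in the list, ignored
tracker.record("sneakers", timestamp=4000)
print(tracker.recent(now=4100))  # ['sneakers']
```

Nothing in this sketch touches the network, which is the point being made: only a handful of strings would ever need to leave the device.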

[–] RyanGosling@hexbear.net 22 points 2 months ago (1 children)

Not to mention cross site trackers owned by Google and Facebook.

[–] Camdat@hexbear.net 23 points 2 months ago* (last edited 2 months ago)

I think people greatly underestimate (or misunderstand) the pervasiveness of ad tracking pixels.

Basically any website that has ads or tries to sell you something has a tracking pixel. These pixels create profiles of devices and track almost everything you do while interacting with those sites.

These pixels don't require any actual "information" about you; they're only interested in what you (via the device you're browsing on) will buy. They also don't use cookies anymore; it's usually a combination of user agent, IP address, and coarse location. As you said, companies will generally share these profiles.
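
As a rough illustration of how those three signals could be combined into a device profile without cookies, here is a minimal sketch. The function name `device_profile_id`, the normalization, and the sample values are assumptions made for the example; real trackers are not documented here and likely use many more signals.

```python
import hashlib


def device_profile_id(user_agent: str, ip_address: str, coarse_location: str) -> str:
    """Derive a stable pseudonymous ID from signals a pixel request exposes."""
    # Hypothetical normalization so trivial formatting differences do not split a profile.
    parts = [
        user_agent.strip().lower(),
        ip_address.strip(),
        coarse_location.strip().lower(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


# The same device seen on two unrelated sites, then a different device on the same network.
visit_a = device_profile_id("Mozilla/5.0 (Linux; Android 14)", "203.0.113.7", "Portland, OR")
visit_b = device_profile_id("Mozilla/5.0 (Linux; Android 14)", "203.0.113.7", "portland, or")
visit_c = device_profile_id("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0)", "203.0.113.7", "Portland, OR")

print(visit_a == visit_b)  # True: same signals, same profile
print(visit_a == visit_c)  # False: different user agent, different profile
```

No name or account is involved anywhere; the ID only says "this is the same device as before", which is all a shared ad profile needs.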

[–] hotcouchguy@hexbear.net 17 points 2 months ago (1 children)

Eh, I dunno. I remember making exactly those points 20 years ago, but I think it's pretty feasible now. There are open source NNs that look like they can do this locally on mediocre phones. And if the output is garbage quality, that's ok, it just has to be good enough to sell some ads. I think it's largely feasible, although I'm sure it's inflated by startups looking to impress clients and investors.

[–] Camdat@hexbear.net 12 points 2 months ago (2 children)

Feel free to Wireshark your smart devices and confirm what I've said yourself. The most efficient way to do this is the pixels that already exist on almost every site.

On-device NNs use insane amounts of processing, even on "high-end" phones. You would notice if there was an always-on NN running on your device; this is also something you can try for yourself.

[–] hotcouchguy@hexbear.net 19 points 2 months ago* (last edited 2 months ago) (1 children)

And what exactly am I looking for in wireshark? A few KB of encrypted text data occasionally sent to who-knows-where? Mixed in among a flood of other tracking bullshit and general wasteful bloat? Yeah lemme go check real quick.

Computationally, we've had low-quality speech-to-text on home PCs for like 30 years, and we've had OK-quality NN implementations for like 15 years. Yes, it would be a bit wasteful, but a trimmed-down NN could easily hide among the general bloat of modern software.

Yes it would be kind of a clunky and impractical way to collect data compared to other methods, but it's definitely plausible that an adtech startup could hack together a semi-functional version of this and then slap it in a slide deck. It would let them say "AI" more times during their pitch.

[–] Camdat@hexbear.net 11 points 2 months ago

You can filter by device. Leave your suspect device connected to your network for a few days, filter by destination and review. Also keep an eye on CPU usage.

If your devices have a ton of random outgoing network requests you're already being tracked in a myriad of other ways and need to lock your shit down.

I've done this before, there's not as much network bloat as you might think.
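
For the "filter by destination and review" step, a minimal sketch of the tallying might look like the following. It assumes the capture has already been exported into `(source_ip, destination, byte_count)` rows; that row shape, the function name, and the sample addresses are hypothetical. Since most traffic is encrypted, this only summarizes where data goes and how much, not what it contains.

```python
from collections import Counter


def bytes_by_destination(packets, device_ip):
    """Sum outgoing bytes per destination for one device.

    `packets` is an iterable of (source_ip, destination, byte_count) tuples,
    a hypothetical shape for rows exported from a capture.
    """
    totals = Counter()
    for source_ip, destination, byte_count in packets:
        if source_ip == device_ip:
            totals[destination] += byte_count
    # Largest talkers first, so unexpected destinations are easy to spot.
    return totals.most_common()


sample = [
    ("192.168.1.50", "ads.example.net", 1200),
    ("192.168.1.50", "updates.example.com", 50000),
    ("192.168.1.23", "ads.example.net", 900),
    ("192.168.1.50", "ads.example.net", 800),
]

for destination, total in bytes_by_destination(sample, "192.168.1.50"):
    print(destination, total)
```

Run over a few days of capture, a steady trickle to an unfamiliar destination while the device sits idle is the kind of pattern worth a closer look.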

[–] ganymede@lemmy.ml 8 points 2 months ago* (last edited 2 months ago) (2 children)

it sounds like you have enough knowledge to know it’s almost impossible for an individual to assert it absolutely 100% isn’t happening.

imo if you make an honest effort to break the technical problem down you will arrive at a different conclusion - or at the very least not be nearly so bold as to allow this to be an emotional peeve.

consider forgetting the propaganda the media has subjected you to, and most importantly forget whether you do or don’t want it to be true. approach the problem from a purely technical perspective while considering these companies can hire hundreds of very smart people from a variety of subdisciplines. recall these companies have virtually bottomless greed and almost exactly 0 morals.

[–] Camdat@hexbear.net 7 points 2 months ago* (last edited 2 months ago) (1 children)

The Internet and smartphones are not mystical devices. This is something you can independently confirm yourself very easily.

I have the knowledge necessary to say this 100% does not occur on devices that I own.

[–] ganymede@lemmy.ml 8 points 2 months ago* (last edited 2 months ago) (1 children)

The Internet and smartphones are not mystical devices.

Whether they're mystical or not is an entirely different conversation ;p

This is something you can independently confirm yourself very easily...

you are vastly understating how non-trivial this task is. or you are allowing your emotional desires to cloud your technical analysis.

teams of experts put in months at a time to assess only a fraction of the required scope. these experts are putting in so much time while admitting they couldn't achieve full coverage despite having financial backing & well trained teams. it's reasonably unlikely so many experts would dedicate so much time & resources if it's such an easy thing to independently confirm.

if Camdat and ganymede were sitting with one of their nontechnical friends, and their friend says "hey my stock smart device which i only use with facebook and a few things seemed like it eavesdropped on my voice about <common product/brand>". and they swear they didn't reveal it via some other channel etc. blah blah we've all heard it many times.

if you, Camdat, listed all the reasons why the same phenomenon can likely be attributed to a variety of other surveillance and correlation methods, some of which are arguably at least as scary, i would likely agree with every single thing you said.

imo it's wiser to leave it at that, rather than making the assertion it's absolutely not happening, or getting frustrated with them for even wondering.

[–] sunshine@hexbear.net 3 points 2 months ago

Your posts in this thread have been very helpful! thank you!

[–] whogivesashit@lemmygrad.ml 5 points 2 months ago (1 children)

I don't know if I would say it's impossible, but in my experience I feel like it's unlikely.

Also there is similarly a very large pool of impossibly smart people who don't work for these companies, and who spend a lot of time looking for all kinds of nefarious stuff like this. It would be very unlikely that they could hide something of this nature from the entire world of people who own these devices.

[–] ganymede@lemmy.ml 5 points 2 months ago* (last edited 2 months ago) (1 children)

in each of the studies i've read, if you dig past the popsci headlines reported in the media and into the actual academic claims being made, the authors have been quite upfront about the limits of the study and how they were unable to achieve full scope to absolutely rule it out. if you know of any absolutely conclusive full binary analysis please link.

tbh i don't mind people saying they think it's not happening, or that it's unlikely etc

saying it's absolutely not happening is a very different thing. and a very difficult assertion to justify.

it's always something like "it's impossible cos it's too much data to record everyone 24/7/365" when even a tiny bit of common sense (even if one knows nothing about computers, networks or even audio) could quickly conceive of the idea that some simple mechanism might detect noise thresholds and not need to record 24/7. you don't even need to be technically minded to work that part out.
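
A minimal sketch of such a noise-threshold mechanism, assuming audio arrives as short frames of samples scaled to [-1.0, 1.0]. The function names, the 0.05 threshold, and the synthetic test frames are illustrative assumptions, not a description of any real product.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def frames_worth_keeping(frames, threshold=0.05):
    """Return the indices of frames whose level exceeds the threshold."""
    return [i for i, frame in enumerate(frames) if rms(frame) > threshold]


# Synthetic 20 ms frames at 8 kHz: near-silence, and a 200 Hz tone standing in for speech.
quiet = [0.001 * ((-1) ** n) for n in range(160)]
loud = [0.3 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]

frames = [quiet, quiet, loud, loud, quiet]
print(frames_worth_keeping(frames))  # [2, 3]
```

Only the frames above the threshold would need any further processing, so a mostly quiet room costs almost nothing.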

i could go on and dig into the actual technical aspects, but the main point is it's always some unbelievably contrived scenario. basically fabricating low hanging fruit which is so low it's underground. and then declaring that not only is everything 100.000% safe, but it's actually a peeve that you even wondered.

[–] whogivesashit@lemmygrad.ml 4 points 2 months ago (1 children)

Yeah I'm not familiar with any particular research that completely rules it out.

I don't think it's so much that it would be impossible to conceive of them being able to record you in short bursts. It's more the amount of computing power needed to process even small amounts of audio data on a large scale.

And beyond that, not that I think it's not possible for that to be done either, but understanding that these are capitalist systems that will engage with whatever is most profitable.

It has already been shown that it is quite easy to track people through all of the other methods already in place and serve those advertisements very well, which is probably much more cost effective than the audio stuff.

Although with the progression of some of these machine learning models, the equation may look a little different before too long.

[–] ganymede@lemmy.ml 5 points 2 months ago* (last edited 2 months ago) (1 children)

I don’t think it’s so much that it would be impossible to conceive of them being able to record you in short bursts.

that's exactly my point. if there's an argument to be made over a technical aspect, why undermine it with some nonsensical requirements? imo it really suggests an emotional desire for it not to be true, which just compromises the integrity of any subsequent technical analysis.

as for the actual technical analysis, i'm always up to discuss each aspect of it :)

regarding the computing requirements for audio, this is something well worth looking into.

human vocal frequencies are quite narrowband compared with the audio most people think of with their music, gaming and movies/episodes.

CD quality audio is 16-bit 44.1 kHz sample rate, modern 'high fidelity' audio is in the realm of 24-bit 96 kHz or 192 kHz sample rate.

compare with even ancient voice codecs where bandlimited sampling requirements are only 6.6 kHz and 8-bit samples can produce an effective 12-bit response! that's almost half a century ago btw!

the telecommunications industry has put considerable effort into understanding the human voice and the kinds of margins they can use to be profitable. they can even estimate the differential energy footprint based on different choice of words and tones in a conversation, this stuff has been studied quite a bit, for decades.

therefore the audio computational requirements are quite a bit less than i think a lot of people realise. but we can ofc go deeper with the technical analysis into a variety of subdisciplines for the computational requirements to be substantially reduced even further.
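
To put rough numbers on that comparison, here is a small sketch computing uncompressed bitrates from the figures quoted above. Stereo for the music formats and mono for voice are assumptions made for the example, and the helper name `bitrate_kbps` is hypothetical.

```python
def bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bitrate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


cd_stereo = bitrate_kbps(44_100, 16, channels=2)    # CD quality, stereo assumed
hifi_stereo = bitrate_kbps(96_000, 24, channels=2)  # 'high fidelity', stereo assumed
narrowband_voice = bitrate_kbps(6_600, 8)           # the voice figures quoted above, mono assumed

print(f"CD stereo:        {cd_stereo:.1f} kbit/s")
print(f"Hi-fi stereo:     {hifi_stereo:.1f} kbit/s")
print(f"Narrowband voice: {narrowband_voice:.1f} kbit/s")
print(f"CD / voice ratio: {cd_stereo / narrowband_voice:.1f}x")
```

Under those assumptions the voice stream is roughly 27 times smaller than CD audio before any compression or silence skipping is applied.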

understanding that these are capitalist systems

regardless of the reduced costs alluded to above, i think the capitalist system is another insight for us to examine. they are boundlessly greedy, nothing is ever enough.

there's always been the argument they 'have enough data already', (and that is a good argument, because they do have enough).

but when has 'enough' ever been sufficient for these systems? they already had cookies, but they wanted tracking pixels. and when they had tracking pixels, they devised browser fingerprinting. but that still wasn't enough, so they started devising audio beacons, but that wasn't enough, then they started spying on shopping center wireless traffic. etc etc

it's never ever enough. when we demand infinite growth on a finite planet, it will never be enough.

and imo it doesn't actually need to be directly profitable in effect, only to be marketed as such to feed their bottomless appetite. especially when correlated surveillance is highly prized, and an additional channel or medium adds value to the existing gathered surveillance.

Although with the progression of some of these machine learning models, the equation may look a little different before too long.

exactly, imo it's not a matter of if but when.

and imo if it's finally revealed, some people will say "no shit", some powerless people will be upset. but most people will say "i'm not doing anything wrong so i don't care".

and i'm willing to bet a bunch of the people currently telling us "it's impossible" will unironically switch overnight to saying "i always knew they were doing it and it never bothered me"

[–] whogivesashit@lemmygrad.ml 4 points 2 months ago

You make an awful lot of compelling points. I very much appreciate your analysis 😊 thank you

[–] pineapplelover@lemm.ee 9 points 2 months ago

It's actually really successful. I've had some conversations with people and right after, something on their Instagram feed would show something we just talked about. Most recently I made a joke to my friend about my name, next thing on his feed was a meme using my name.

[–] SkingradGuard@hexbear.net 7 points 2 months ago

Bad headline then? Huh

[–] hotcouchguy@hexbear.net 44 points 2 months ago (2 children)
[–] sexywheat@hexbear.net 49 points 2 months ago

Well there it is.

I have nothing but anecdotal examples, but the amount of weirdly specific ads I've gotten related to shit I (or my partner) was just talking about the other day just keeps piling up.

The most noteworthy though was something I heard from an acquaintance. She has no interest in sports, doesn't watch them, doesn't search about them, doesn't care. One day she went to a gathering with her husband where most of the people there were watching the hockey game. In the days that followed she started getting ads (and, IIRC, even push notifications) about hockey disgost

[–] Assian_Candor@hexbear.net 30 points 2 months ago (1 children)

We know what you are thinking...

Is this legal? YES- it is totally legal for phones and devices to listen to you. That's because consumers usually give consent when accepting terms and conditions of software updates or app downloads.

doomjak

[–] Sauerkraut@discuss.tchncs.de 5 points 2 months ago (1 children)

How do we stop it? Maybe an app that alerts us anytime the video or mic gets turned on?

[–] Assian_Candor@hexbear.net 5 points 2 months ago

Practically? I think the Disney thing that just happened shows that these unilateral T&C impositions and changes need to be challenged in court. Nobody reads them and they can just bury whatever the hell they want in there. It's completely unreasonable.

[–] OptimusSubprime@hexbear.net 24 points 2 months ago
[–] Frogmanfromlake@hexbear.net 15 points 2 months ago (1 children)

Lol I pity anyone who listens to me rambling about my special interests.

[–] aaaaaaadjsf@hexbear.net 10 points 2 months ago* (last edited 2 months ago)

I pity anyone hearing the amount of swearing I do on a daily basis lol.

I just checked and the only app on my phone that accessed the microphone permission in the last 24 hours is WhatsApp, and I've set it to "ask every time". So hopefully no poor Meta employee has heard me use foul language.

[–] Spongebobsquarejuche@hexbear.net 14 points 2 months ago* (last edited 2 months ago) (2 children)

I'm confused, this has been a thing for years. Is this normies catching up? Limited hang out? Your privacy was given away at the altar of profit long ago.

[–] MonkderVierte@lemmy.ml 3 points 2 months ago

That's why F-Droid on custom ROM.

[–] ganymede@lemmy.ml 2 points 2 months ago

you're definitely asking the right questions. had the same thoughts.