Can't see why he would given that Starmer sent advisors to help Harris campaign and has been openly hostile to Trump during the election. Trump is a very petty man.
Labour in their infinite wisdom sent advisors to work with the Harris campaign against Trump. I think he hates Starmer on a personal level at this point, and wants revenge.
These are great examples. Never considered stuff like lesson planning, but it makes perfect sense once you described it. Completely agree that once the profit motive is removed, we can start finding genuinely good uses for this tech. I'm really hoping that the open source nature of DeepSeek is going to play a positive role in that regard.
I think it’s not fair to call DeepSeek open source. They’ve released the weights of their model but that’s all. The code they used to train it and the training data itself is decidedly not open source.
Sure, but that's now become the accepted definition for open-sourcing AI models. I personally find that sufficient, especially given that they published the research associated with it, which is ultimately what matters the most.
That said, I strongly believe that the architecture of LLMs is fundamentally incapable of intelligent behavior. They’re more like a photograph of intelligence than the real thing.
I think you'd have to provide the definition of intelligence you're using here. Here's mine: I would define it as the capacity to construct and refine mental models of specific domains in order to make predictions about future states or outcomes within those contexts. It stems from identifying the rules, patterns, and relationships that govern a particular system or environment. It's a combination of knowledge and pattern recognition that can be measured by predictive accuracy within a specific context.
Given that definition, I do not see why LLMs are fundamentally incapable of intelligent behavior. If a model is able to encode the rules of a particular domain, then it is able to create an internal simulation of the system to make predictions about future states. And I think that's precisely what deep neural networks do, and how our own brains operate. To be clear, I'm not suggesting that GPT is directly analogous to the way the brain encodes information, rather that they operate in the same fundamental fashion.
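To make that definition concrete, here's a deliberately trivial sketch (not a claim about how LLMs or brains work internally): a "model" that extracts the rule governing a simple domain from observations, simulates the system forward, and is scored purely by predictive accuracy on future states. All names here are made up for illustration.

```python
def fit_rule(history):
    """Infer the constant step of an arithmetic sequence from observations."""
    steps = [b - a for a, b in zip(history, history[1:])]
    return sum(steps) / len(steps)

def predict(last, step, n):
    """Run the learned rule forward to simulate the next n states."""
    out = []
    for _ in range(n):
        last += step
        out.append(last)
    return out

observed = [2, 5, 8, 11]               # observations from the domain
step = fit_rule(observed)              # learned internal model: +3 per state
future = predict(observed[-1], step, 3)
print(future)                          # [14.0, 17.0, 20.0]

# Under the definition above, "intelligence" is measured by predictive
# accuracy against the domain's actual future states.
truth = [14, 17, 20]
accuracy = sum(p == t for p, t in zip(future, truth)) / len(truth)
print(accuracy)                        # 1.0
```

The point of the toy is just that nothing in "build an internal model, use it to predict future states" is off-limits to a learned system, whether the rule is hand-extracted like here or encoded in network weights.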
However, you don’t need to dump an absurd amount of resources into training an LLM to test the viability of any of the incremental improvements that DeepSeek has made. You only do that if your goal is to compete with OpenAI and others for access to capital.
How do you define an absurd amount of resources? That seems kind of arbitrary to me. Furthermore, we also see that there are emergent phenomena that appear at certain scales. So, the exercise of building large models is useful to see what happens at those scales.
I would be much happier if the capital currently directed towards LLMs was redirected towards this type of work. Unfortunately, we’re forced to abide by the dictates of capitalism and so that won’t happen anytime soon.
I do think LLMs get a disproportionate amount of attention, but eventually the hype will die down and people will start looking at other methods again. In fact, that's exactly what's already happening with stuff like neurosymbolic systems, where deep neural networks are combined with symbolic logic. The GPT algorithm proved to be flexible and useful in many different contexts, so I don't have a problem with people spending the time to find what its limits are.
You're right that R1 does the tuning up front as opposed to dynamically, but I'd still consider that a layer on top of the base LLM.
Sure but the technology has honestly been a bit more evolutionary than revolutionary as far as I’m concerned. The biggest change was the amount of compute and data used to train these models. That only really happened because it seems capital had nowhere else to go and not because LLMs are uniquely promising.
I'm not suggesting LLMs are uniquely promising, it's just the approach that's currently popular and we don't know how far we can push it yet. What's appealing about GPT architecture is that it appears to be fairly general and adaptable in many domains. However, I do think it will end up being combined with other approaches going forward. We're already seeing that happening with stuff like neurosymbolic architecture.
My main point is that the limitations of the approach that people keep fixating on don't appear to be inherent in the way the algorithm works; they're just an artifact of people still figuring out how to apply this algorithm efficiently. The fact that massive improvements have already been found suggests that there's probably a while yet before we run out of ideas.
Sure, but how exactly are the companies investing in “AI” going to make it work? To me it just seems like they’re dumping resources into a dead end because they have no other path forward. Tech companies have been promising a new Industrial Revolution since their inception. However, even their “AI” products have yet to have a meaningful impact on worker productivity. It’s worth interrogating why that is.
I don't really care about AI companies myself. I want to see open source projects like DeepSeek and ultimately state level funding which we'll likely see happening in China. It's also a fallacy to extrapolate from the fact that something hasn't happened that it won't happen. Companies often hype and overpromise, but that doesn't mean that the goals themselves aren't achievable.
As I stated before, I think they all fundamentally misunderstand how human cognition works, perhaps willfully. That’s why I’m confident tech companies as they exist will not deliver on the promise of “AGI”, a lovely marketing term created to make up for the fact that their “AIs” are not very intelligent.
Again, I agree that companies like OpenAI are largely hype driven. However, some people do make a genuine effort to understand how human cognition works. For example, Jeff Hawkins made a serious effort exploring this topic in his On Intelligence book. The impression I get with DeepSeek is that their goal is largely to do research for the sake of research, and they've actually stated that commercial application isn't their primary goal right now. I think that exploration for the sake of exploration is the correct view to have here.
I got deepseek-r1:14b-qwen-distill-fp16 running locally with 32 GB of RAM and a GPU, but yeah, you do need a fairly beefy machine to run even medium sized models.
seems like I live rent free in the heads of lemmy libs :)
It uses a mix; it doesn't hallucinate words, and it's actually pretty good at catching stuff like grammar mistakes. I find the big value is that it forces you to do free form conversation where you have to think on your feet. I find this more valuable than just reading and memorizing stuff, which is what other apps do. I ended up getting the Plus plan, and definitely feel it's been worth it. The app itself also has a lot of regular lessons; the AI isn't the only part of it.
You can tell it was written by a lib.
The way to look at models like R1 is as layers on top of the LLM architecture. We've basically hit a limit of what generative models can do on their own, and now research is branching out in new directions to supplement what the GPT architecture is good at doing.
The potential here is that these kinds of systems will be able to do tasks that fundamentally could not be automated previously. Given that, I think it's odd to say that the utility is not commensurate with the effort being invested into pursuing this goal. Making this work would effectively be a new industrial revolution. The reality is that we don't actually know what's possible, but the rate of progress so far has been absolutely stunning.
I think you've basically nailed it.