I'm not entirely sure how to use these models effectively, I guess. I tried some basic coding prompts, and the results were very bad. I'm using R1 Distill Qwen 32B at a 4-bit quant.
The first answer had incorrect syntax and wouldn't run. After multiple follow-up prompts I got it to fix the syntax, but I was NOT able to get it to fix the bugs. Each prompt took several minutes of thinking time, and the answers were worse than what the stock Qwen model gave me.
For comparison, GPT-4o and Claude Sonnet 3.5 both gave me code that would at least run on the first shot. 4o's code was even functional in one shot (Sonnet's was close but had bugs). Each of them took just a few seconds instead of 10+ minutes.
Looking over its chain of thought, it seems to go in circles, restating the same points again and again.
Not sure exactly what the use case is for this. For coding, it seems worse than useless.