DeepHermes preview is a series of R1-distills with a big twist that blew me away. You can toggle the reasoning on and off by injecting a specific system prompt.
System prompts that coax CoT-style reasoning out of most models have been passed around hobbyist forums for a while, but they tended to be quite large, eating valuable context space. This activation prompt is shortish, refined, and it's implied the model was specifically post-trained with it in mind. I would love to read the technical paper on what they did differently.
You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
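For anyone who wants to try it, here's a minimal sketch of the toggle against an OpenAI-compatible local server (llama.cpp, LM Studio, etc.). The endpoint URL and model name are placeholders for whatever your setup uses; the prompt string is just the activation prompt above.

```python
# Sketch: toggle DeepHermes reasoning by injecting the activation prompt.
# Assumes an OpenAI-compatible server at localhost:8080 and a placeholder
# model name -- adjust both for your own setup.
from openai import OpenAI

REASONING_PROMPT = (
    "You are a deep thinking AI, you may use extremely long chains of thought "
    "to deeply consider the problem and deliberate with yourself via systematic "
    "reasoning processes to help come to a correct solution prior to answering. "
    "You should enclose your thoughts and internal monologue inside <think> "
    "</think> tags, and then provide your solution or response to the problem."
)

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def ask(question: str, reasoning: bool = False) -> str:
    messages = []
    if reasoning:
        # With the prompt present, the model emits a <think>...</think> block
        # before its answer; without it, it behaves like a normal model.
        messages.append({"role": "system", "content": REASONING_PROMPT})
    messages.append({"role": "user", "content": question})
    resp = client.chat.completions.create(
        model="deephermes-24b-preview",  # placeholder name
        messages=messages,
    )
    return resp.choices[0].message.content
```

Same weights either way; the only difference between the two modes is that one system message.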
I've been playing around with R1 CoT models for a few months now. They are great at examining many sides of a problem, comparing abstract concepts against each other, speculating on open-ended questions, and solving advanced multi-step STEM problems.
However, they fall short when you try to get the model to change personality or roleplay a scenario, or when you just want a straight short summary without 3000 tokens spent thinking about it first.
So I would find myself swapping between CoT models and a general-purpose Mistral Small depending on what I wanted, which was an annoying pain in the ass.
With DeepHermes it seems they took a good approach to solving this problem: associate the R1-distill reasoning with a specific opt-in system prompt instead of baking it into the base behavior.
Unfortunately, constantly editing the system prompt is annoying. I need to see if the engine I'm using offers a way to save system prompts across conversation profiles. If this kind of thing takes off, I think it would be cool to have a reasoning toggle button like some front ends for company LLMs already have.
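In the meantime a thin wrapper works. A hypothetical sketch of per-profile system prompts; the profile names and file layout here are made up, not any particular engine's format:

```python
# Hypothetical per-profile system prompts, so switching modes is one key
# lookup instead of hand-editing the prompt every conversation.
import json

PROFILES = {
    "reasoning": "<the DeepHermes activation prompt>",  # paste the full prompt
    "assistant": "You are a concise, helpful assistant. Keep answers short.",
    "roleplay":  "You are the ship's AI on a long-haul freighter. Stay in character.",
}

# Persist them so a front end (or this script) can reload them per profile.
with open("profiles.json", "w") as f:
    json.dump(PROFILES, f, indent=2)

def system_prompt_for(profile: str) -> str:
    return PROFILES.get(profile, PROFILES["assistant"])
```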
Agreed. I also shift between them. At a bare minimum, I use a thinking model to 'open up' the conversation and then often continue with a normal model, but it certainly depends on the topic.
A while ago we got RouteLLM, I think, which routed a request depending on its content, but the concept never got traction for some reason. Now it seems that ClosedAI and the other big names are putting some attention into it. Great to see DeepHermes and other open players out in front of the pack.
I don’t think it will take long before we have agentic frameworks activating different ‘modes’ of thinking depending on content/context, goals, etc. It would be great if a model could be triggered into several modes in a standard way.
I think the idea of exposing multiple different ways for LLMs to ‘process’ a given input in a standard way is promising.
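For instance, here's a toy version of that routing in the RouteLLM spirit: pick a mode from crude keyword heuristics and inject its system prompt. A real router would use a classifier model; these keywords are just placeholders.

```python
# Toy content-based mode router: choose which system prompt to inject based
# on the user's message. Keyword heuristics stand in for a real classifier.
MODE_PROMPTS = {
    "reasoning": "<the DeepHermes activation prompt>",
    "summary":   "Answer in three sentences or fewer. No preamble.",
    "general":   "You are a helpful assistant.",
}

def route(user_message: str) -> str:
    """Return the name of the mode whose system prompt should be used."""
    text = user_message.lower()
    if any(w in text for w in ("prove", "solve", "derive", "step by step")):
        return "reasoning"
    if any(w in text for w in ("summarize", "tl;dr", "in short")):
        return "summary"
    return "general"

mode = route("Solve for x: 3x^2 - 5x + 1 = 0")
system_prompt = MODE_PROMPTS[mode]  # -> the reasoning activation prompt
```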
I feel that after reasoning, we will train models to think emotionally in a more intricate way. By combining reasoning with a more advanced sense of individuality and richer emotion simulation, we may get a little closer to a breakthrough.
That’s pretty cool. I’ve tried a few of the distills, but I’ve mostly gone back to regular models.
How does it compare to the regular DeepSeek distills, though?
DeepHermes 24B's CoT thought patterns feel about on par with the official R1 distills I've tried. It's important to note, though, that my experience is limited to the DeepSeek R1 NeMo 12B distill, as that's what fit nice and fast on my card.
All the R1 distill thought-process internal monologue humanisms are there: "let me write that down", "if I remember correctly", "oh, but wait, that doesn't sound right, let's try again". The multiple "but wait, what if"s before ending the thought to examine multiple sides are there too. It spends about 2-5k tokens thinking. It tends to stay on track and catch minor mistakes or hallucinations.
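Side note: if your front end doesn't hide the thinking, splitting it out is easy since the model wraps it in <think> tags. A quick sketch, assuming one well-formed block per response (which matches what I've seen):

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Separate the <think>...</think> monologue from the final answer."""
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if not match:
        return "", output.strip()   # model answered without a thinking block
    thoughts = match.group(1).strip()
    answer = output[match.end():].strip()
    return thoughts, answer
```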
Compared to the unofficial Mistral 24B distills, this is top tier for sure. I think it's toe to toe with ComputationDolphins' 24B R1 distill, and it's just a preview.