Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning.
Our research explores the scalability of RL and its impact on enhancing the intelligence of large language models.
Settings don’t really matter if you’re comparing two models side by side, and yes, I see the same indecision in your reply.
.model ollama:qwq:32b
> Which is larger, 9.21 or 9.1999999999?
<think>
Okay, let's see... I need to figure out which number is bigger between 9.21 and 9.1999999999. Hmm, both numbers start with 9. So the first digit after the decimal point is important here.
Starting with the first number, 9.21. That's straightforward—it's nine and twenty-one hundredths. The digits after the decimal are 2 and 1.
The second number is 9.1999999999. This looks like it's nine point one followed by a bunch of 9s. Let me write them out to compare digit by digit. Let me line them up:
9.2100000000
9.1999999999
Wait, right? Because 9.21 is the same as 9.2100000000 if we add more decimal places. So now comparing each digit after the decimal:
First digit after the decimal: both have 2 vs. 1. Wait no—the first number's first decimal digit is 2, and the second one's first decimal digit is 1? Wait wait, hold on.
Wait, actually, let me check again. The first number is 9.21. So breaking it down:
- The integer part is 9.
- Decimal parts: tenths place is 2, hundredths is 1, and then the rest are zeros if we go further.
The second number is9.1999999999. Let's see:
- Integer part also 9.
- Tenths place: 1 (since it's .1...), then the next digits are all 9s up to the 10th decimal place?
Wait, so tenths digit of first number is 2 vs. second one's tenths digit is 1? That already would make the first number larger because 2 is bigger than 1 in the tenths place.
But maybe I'm miscounting the places? Let me confirm. For 9.21: The first decimal digit (tenths) is 2, second (hundredths) is 1, then zeros beyond.
For 9.1999999999: Tenths place is 1, hundredths is 9, and then each subsequent digit up to the 10th decimal is also 9.
So when comparing the two numbers:
At the tenths place:
- 9.21 has a 2
- 9.199... has a 1
Since 2 > 1 in the tenths place, that means 9.21 is larger right there. The digits after that don't even matter because the first differing digit (the tenths) already determines it.
But maybe I should think if there's another way. Suppose someone might confuse the number of decimal places? Like thinking that more 9s make it closer to 9.2? Wait, but 0.1999999999 is actually equal to 0.2 when rounded to one decimal place because 0.1999999999... recurring would be exactly 0.2, but here the second number has finite decimals—it's not an infinite series.
Wait a second—wait, 0.1999999999 is just under 0.2, right? Because if you have infinitely many 9s after 0.1..., then it equals exactly 0.2. But in this case, the number given is 9.1999999999 with only ten decimal places (assuming that's how it's written). Let me count the digits after the decimal for the second number:
After the decimal, starting from first digit:
1 (tenths), then nine more 9s? Let's see: 9.1 followed by 9, so total of 10 decimal places?
Yes, "9.1999999999" has 10 digits after the decimal. So that's tenths digit is 1, and then nine more 9s in hundredths to ten-billionths or whatever.
Therefore, comparing with 9.21 (which is 9.2100000000), the first difference is at the tenths place: 2 vs. 1. Therefore 9.21 is bigger.
Alternatively, if someone thought maybe converting both to the same number of decimal places? Let's see: 9.21 can be written as 9.2100000000
Comparing term by term:
First digit after .: 2 vs 1 → 2 is bigger. So yeah, definitely.
Alternatively, subtract them to see the difference?
Calculate 9.21 - 9.1999999999 = ?
Let's do that subtraction: 9.2100000000
-9.1999999999
= 0.0100000001
Which is a positive number, so yes, 9.21 is larger by approximately 0.01 (exactly 0.0100000001).
Therefore the answer should be that 9.21 is larger.
</think>
The number 9.21 is larger than 9.1999999999.
**Step-by-Step Explanation:**
1. **Compare the integer parts:** Both numbers have an integer part of 9, so move to the decimal parts.
2. **First decimal place (tenths):**
- 9.21 has a tenths digit of **2**.
- 9.1999999999 has a tenths digit of **1**.
Since \(2 > 1\), the number 9.21 is already larger at this point. The remaining digits do not need to be compared further.
3. **Verification via subtraction:**
- Subtracting \(9.1999999999\) from \(9.21\) gives:
\[
9.2100000000 - 9.1999999999 = 0.0100000001
\]
The positive result confirms that \(9.21\) is indeed larger.
**Answer:** 9.21 is larger than 9.1999999999.
> .copy
> .info
model ollama:qwq:32b
temperature 0.7
top_p 0.9
use_tools null
max_output_tokens null
save_session null
compress_threshold 4000
rag_reranker_model null
rag_top_k 5
dry_run false
function_calling true
stream true
save true
keybindings vi
wrap auto
wrap_code false
highlight true
light_theme false
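The comparison the model keeps second-guessing can be checked mechanically. Below is a minimal Python sketch; the function name `compare_decimals` is hypothetical, and it assumes plain non-negative decimal strings with no sign or exponent. It pads the fractional parts to the same length, as the model does when it rewrites 9.21 as 9.2100000000, and lets the first differing digit decide. It then uses the standard-library `decimal` module to redo the subtraction exactly.

```python
from decimal import Decimal

def compare_decimals(a: str, b: str) -> int:
    """Compare two non-negative decimal strings digit by digit.

    Returns 1 if a > b, -1 if a < b, 0 if equal.
    Assumes plain inputs such as "9.21" (no sign, no exponent).
    """
    a_int, _, a_frac = a.partition(".")
    b_int, _, b_frac = b.partition(".")
    # Integer parts: strip leading zeros, then a longer integer part is larger
    a_int = a_int.lstrip("0") or "0"
    b_int = b_int.lstrip("0") or "0"
    if len(a_int) != len(b_int):
        return 1 if len(a_int) > len(b_int) else -1
    if a_int != b_int:
        return 1 if a_int > b_int else -1
    # Pad fractional parts with trailing zeros so they line up (9.21 -> 9.2100000000)
    width = max(len(a_frac), len(b_frac))
    a_frac, b_frac = a_frac.ljust(width, "0"), b_frac.ljust(width, "0")
    # The first differing digit decides the result
    for da, db in zip(a_frac, b_frac):
        if da != db:
            return 1 if da > db else -1
    return 0

print(compare_decimals("9.21", "9.1999999999"))  # 1 (tenths digit: 2 > 1)
# Exact decimal arithmetic, no binary floating-point rounding
print(Decimal("9.21") - Decimal("9.1999999999"))  # 0.0100000001
```

Working on the digit strings (or on `Decimal`) rather than on binary floats keeps the digits exactly as written, so both the tenths-place comparison and the 0.0100000001 difference come out exact.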
Ultimately what matters is whether it gets the correct answer or not. It’s interesting that yours wasn’t able to do the strawberry test while mine did it with a very short thinking cycle.
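For reference, the "strawberry test" is presumably the common prompt asking how many times the letter r appears in "strawberry" (an assumption; the thread never spells it out). The ground truth is trivial to compute, which is what makes long or wrong answers to it notable. A minimal sketch, with `count_letter` and `letter_positions` as hypothetical helpers:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())

def letter_positions(word: str, letter: str) -> list[int]:
    """0-based positions of the letter, i.e. spelling the word out one character at a time."""
    return [i for i, ch in enumerate(word.lower()) if ch == letter.lower()]

print(count_letter("strawberry", "r"))      # 3
print(letter_positions("strawberry", "r"))  # [2, 7, 8]
```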
Ultimately what matters is whether it gets the correct answer or not.
That’s… not true at all. It had the right answer to most of the questions I asked it just as fast as R1, and yet it kept saying “but wait! maybe I’m wrong”. It’s a huge red flag when the CoT is just trying to 1000 monkeys a problem.
While it did manage to complete the strawberry problem when I adjusted the top_p/top_k, I was using the previous values with other models I’ve tested and never had a CoT go that off-kilter before. And this is considering that even the 7B DeepSeek model was able to get the correct answer with 1/4 of the VRAM.
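The `top_p`/`top_k` knobs mentioned here (and the `temperature 0.7` / `top_p 0.9` values in the `.info` dump above) control which tokens the sampler may choose from at each step. Below is a generic sketch of temperature scaling followed by top-k and nucleus (top-p) filtering in plain Python, for illustration only; it is not taken from Ollama or any particular inference engine, and `filter_candidates`, `sample` and the logits are hypothetical.

```python
import math
import random

def filter_candidates(logits, temperature=0.7, top_k=None, top_p=0.9):
    """Return (token_index, probability) pairs that survive top-k and top-p.

    Generic sketch; not the sampling code of any specific engine.
    Assumes temperature > 0 and 0 < top_p <= 1.
    """
    # Temperature scaling: values below 1 sharpen the distribution
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    # Sort candidates from most to least likely
    probs = sorted(((i, e / total) for i, e in enumerate(exps)),
                   key=lambda pair: pair[1], reverse=True)
    # Top-k: keep only the k most likely tokens, then renormalise
    if top_k is not None:
        probs = probs[:top_k]
        z = sum(p for _, p in probs)
        probs = [(i, p / z) for i, p in probs]
    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p
    kept, cumulative = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalise the survivors so they sum to 1
    z = sum(p for _, p in kept)
    return [(i, p / z) for i, p in kept]


def sample(logits, temperature=0.7, top_k=None, top_p=0.9, rng=random):
    """Draw one token index from the filtered distribution."""
    candidates = filter_candidates(logits, temperature, top_k, top_p)
    indices = [i for i, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(indices, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
print(filter_candidates(logits))  # two candidates remain at top_p=0.9
print(sample(logits, rng=random.Random(0)))
```

With these made-up logits, `temperature=0.7` and `top_p=0.9` leave only the two most likely tokens as candidates; lowering `top_p` or `top_k` narrows the set further, while raising them lets more low-probability tokens through.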
It’s true for me. I generally don’t read through the think part. I make the query, do something else, and then come back to see what the actual output is. Overall, I find it gives me way better answers than I got with the version of R1 I was able to get running locally. Turns out the settings do matter, though.