What I’ve ultimately converged on, without any rigorous testing, is:
using Q6 if it fits in VRAM+RAM (anything higher is a waste of memory and compute for barely any gain), otherwise either some small quant (rarely) or skipping the model altogether - see the rough sizing sketch after this list;
not really using IQ quants - as far as I remember they depend on a calibration dataset, and I don’t want the model’s behaviour to be influenced by some additional dataset;
other than the Q6 rule, in any trade-off between speed and quality I choose quality - my usage volume is low and I’d rather wait for a good result;
I load as much of the model as I can into VRAM, leaving 1-3 GB free for the system and context (loading sketch below).
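For the "does Q6 fit" question, a back-of-envelope estimate is usually enough. This is only a rough sketch: the ~6.6 bits-per-weight figure for Q6_K and the fixed overhead for context/buffers are approximations I'm assuming, not exact numbers.

```python
# Rough check: does a ~Q6 quant of an N-billion-parameter model fit in VRAM + RAM?
# Assumptions (not exact): Q6_K is ~6.6 bits per weight; a few GB of overhead
# covers the KV cache, compute buffers and the OS.

def q6_fits(n_params_billion: float, vram_gb: float, ram_gb: float,
            bpw: float = 6.6, overhead_gb: float = 3.0) -> bool:
    """Return True if a ~Q6 quant should fit into combined VRAM + RAM."""
    weights_gb = n_params_billion * bpw / 8      # weights only, in GB (1e9 params * bits / 8 / 1e9)
    needed_gb = weights_gb + overhead_gb         # add context / buffer overhead
    return needed_gb <= vram_gb + ram_gb

# Example: a 70B model on 24 GB VRAM + 64 GB RAM needs ~60.8 GB -> fits.
print(q6_fits(70, vram_gb=24, ram_gb=64))
```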
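And for the "as much as possible into VRAM" part, here's a minimal sketch of partial GPU offload using the llama-cpp-python bindings; the model path is a placeholder and the layer count is something you tune per GPU (raise it until you're about 1-3 GB short of full VRAM), not a recommended value.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q6_K.gguf",  # placeholder path
    n_gpu_layers=35,   # number of layers to offload to VRAM; -1 offloads everything
    n_ctx=8192,        # context size also eats VRAM via the KV cache
)

out = llm("Q: Why leave a little VRAM free? A:", max_tokens=64)
print(out["choices"][0]["text"])
```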