

You can argue that a 4090 is more of a ‘flagship’ model on the consumer market, but it could just be a typo, and then you’d miss the point and the knowledge you could have gained:
“Their system, FlightVGM, recorded a 30 per cent performance boost and had an energy efficiency that was 4½ times greater than Nvidia’s flagship RTX 3090 GPU – all while running on the widely available V80 FPGA chip from Advanced Micro Devices (AMD), another leading US semiconductor firm.”
So they have found a way to use an ‘off-the-shelf’ FPGA for video inference, and to me it looks like it could match a 4090(?), but who cares. With this upgrade, these standard FPGAs become cheaper (especially running 24/7) and better than any consumer Nvidia GPU up to at least the 3090/4090.
And here from the paper:
"[problem] …sparse VGMs [video generating models] cannot fully exploit the effective throughput (i.e., TOPS) of GPUs. FPGAs are good candidates for accelerating sparse deep learning models. However, existing FPGA accelerators still face low throughput ( < 2TOPS) on VGMs due to the significant gap in peak computing performance (PCP) with GPUs ( > 21× ).
[solution] …we propose FlightVGM, the first FPGA accelerator for efficient VGM inference with activation sparsification and hybrid precision. […] Implemented on the AMD V80 FPGA, FlightVGM surpasses NVIDIA 3090 GPU by 1.30× in performance and 4.49× in energy efficiency on various sparse VGM workloads."
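To make the “activation sparsification” idea concrete, here is a toy sketch (my own illustration, not the FlightVGM implementation): small activations are zeroed, and the sparse path then skips the multiply-accumulates for those zeros — the work a GPU still burns its peak TOPS on.

```python
import numpy as np

def sparsify_activations(x, threshold=0.5):
    """Zero out activations below the threshold (toy illustration)."""
    mask = np.abs(x) >= threshold
    return x * mask, mask

def sparse_matvec(w, x, mask):
    """Skip multiply-accumulates for zeroed activations.
    A dense engine computes all of w @ x regardless; a sparse
    dataflow (as on an FPGA) can skip the zero columns entirely."""
    idx = np.nonzero(mask)[0]      # indices of surviving activations
    return w[:, idx] @ x[idx]

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
w = rng.standard_normal((16, 64))
xs, mask = sparsify_activations(x)
dense = w @ xs                     # full matvec, zeros included
sparse = sparse_matvec(w, xs, mask)
print(np.allclose(dense, sparse))  # same result, fewer MACs on the sparse path
```

Both paths give the same answer; the point is that the sparse path touches only the nonzero activations, which is the kind of effective-throughput gap the paper abstract is talking about.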
You’ll have to look up what that means yourself, but expect a throng of bitcrap miner cards to be converted to VGM accelerators, and maybe it gives new life to older/smaller/cheaper FPGAs?
Agree. I also shift between them. At a bare minimum, I use a thinking model to ‘open up’ the conversation, and then often continue with a normal model, but it certainly depends on the topic.
Long ago we got ‘RouteLLM’, I think, which routed a request depending on its content, but the concept never got traction for some reason. Now it seems that closedai and other big names are putting some attention on it. Great to see DeepHermes and other open players out in front of the pack.
I don’t think it will take long before agentic frameworks activate different ‘modes’ of thinking depending on content/context, goals, etc. It would be great if a model could be triggered into several modes in a standard way.
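That kind of content-based routing can be sketched in a few lines (a toy heuristic of my own — the model names and keyword list are made up, and real routers like RouteLLM train a classifier instead):

```python
def route(prompt: str) -> str:
    """Toy router: pick a 'thinking' model for prompts that look like
    multi-step reasoning, and a fast model otherwise.
    (Model names and cue words are illustrative only.)"""
    reasoning_cues = ("prove", "derive", "step by step", "why", "plan")
    if any(cue in prompt.lower() for cue in reasoning_cues):
        return "thinking-model"
    return "fast-model"

print(route("Derive the gradient of softmax"))  # thinking-model
print(route("Translate 'hello' to French"))     # fast-model
```

An agentic framework could call something like this before each turn, which is essentially the ‘mode switching’ described above, just made explicit.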