GGUF quants are already up and llama.cpp was updated today to support it.
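For anyone who wants to try them, a minimal sketch of running a GGUF quant with llama.cpp's CLI — the model filename and prompt here are placeholders, not the actual release files:

```shell
# Hypothetical invocation: substitute the real GGUF file you downloaded.
# -ngl offloads layers to the GPU, -c sets the context window in tokens.
./llama-cli \
  -m ./models/model-Q4_K_M.gguf \
  -ngl 99 \
  -c 8192 \
  -p "Summarize the following text:"
```

Requires a llama.cpp build recent enough to include today's update, since older builds won't recognize the new architecture.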

  • brucethemoose@lemmy.world · 5 days ago
    I tested these out and found they're really bad at longer context, at least with settings that can sanely fit on most GPUs.

    Seems the Gemma family is mostly for short-context work, still.