vLLM – virtual Large Language Model (LLM). vLLM was developed at UC Berkeley as “an open source library for fast LLM inference and serving” and is now a community-maintained open source project. According to Red Hat, it “is an inference server that speeds up the output of generative AI applications by making better use of the GPU memory.”
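To make “fast LLM inference and serving” concrete, here is a minimal sketch of vLLM’s offline Python API; the model name and sampling settings are placeholders chosen for illustration, and any Hugging Face model identifier supported by vLLM could be substituted.

```python
from vllm import LLM, SamplingParams

# Load a model for offline batched inference.
# "facebook/opt-125m" is just a small example model, not a recommendation.
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

prompts = [
    "Explain what a KV cache is in one sentence.",
    "What does an inference server do?",
]

# vLLM batches these prompts internally and schedules them on the GPU.
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```

The same engine can also be exposed as an OpenAI-compatible HTTP server, which is the “serving” half of the description above.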
Red Hat says: “Essentially, vLLM works as a set of instructions that encourage the KV (key-value) cache to create shortcuts by continuously ‘batching’ user responses.” The KV cache is the “short-term memory of an LLM [which] shrinks and grows during throughput.”
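The continuous “batching” Red Hat describes can be pictured with a toy scheduler; the sketch below is a conceptual illustration only, not vLLM’s actual scheduler or memory manager, and all class and variable names are invented for this example.

```python
from collections import deque


class ToyContinuousBatcher:
    """Toy model of continuous batching: requests join and leave the running
    batch as they arrive and finish, instead of waiting for a fixed batch,
    and each request's KV-cache footprint grows while it generates and is
    freed when it completes."""

    def __init__(self, max_batch_size: int = 4):
        self.waiting = deque()        # (request_id, tokens_to_generate) not yet scheduled
        self.running = {}             # request_id -> tokens still to generate
        self.kv_cache_blocks = {}     # request_id -> cache blocks held (grows/shrinks)
        self.max_batch_size = max_batch_size

    def submit(self, request_id: int, tokens_to_generate: int) -> None:
        self.waiting.append((request_id, tokens_to_generate))

    def step(self) -> list:
        """Run one decode step and return the ids of requests that finished."""
        # Admit new requests whenever there is room (the "continuous" part).
        while self.waiting and len(self.running) < self.max_batch_size:
            rid, remaining = self.waiting.popleft()
            self.running[rid] = remaining
            self.kv_cache_blocks[rid] = 1      # cache starts small

        finished = []
        for rid in list(self.running):
            self.running[rid] -= 1             # pretend one token was generated
            self.kv_cache_blocks[rid] += 1     # KV cache grows during generation
            if self.running[rid] == 0:
                finished.append(rid)
                del self.running[rid]
                del self.kv_cache_blocks[rid]  # cache shrinks when the request ends
        return finished
```

In this picture, the KV cache “shrinks and grows during throughput” because memory is allocated token by token while a request is active and released as soon as it completes, letting new requests slot into the batch immediately.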