HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading Paper • 2502.12574 • Published Feb 18 • 11
Dolphin 3.0 Collection Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model. • 9 items • Updated Feb 7 • 145