Stephen (smpanaro)
AI & ML interests: Apple Neural Engine, Quantization
Recent Activity
- Updated a model 30 days ago: smpanaro/Qwen2.5-0.5B-4bit-PerTensor
- Published a model about 1 month ago: smpanaro/Qwen2.5-0.5B-4bit-PerTensor
- New activity 3 months ago: smpanaro/Llama-3.2-1B-Instruct-CoreML (Context length)
quant
- SqueezeLLM: Dense-and-Sparse Quantization (Paper • arXiv:2306.07629)
- Norm Tweaking: High-performance Low-bit Quantization of Large Language Models (Paper • arXiv:2309.02784)
- Extreme Compression of Large Language Models via Additive Quantization (Paper • arXiv:2401.06118)
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (Paper • arXiv:2402.04291)
gpt-2 GPTQ: gpt-2 model family quantized using AutoGPTQ.
prune
text to speech
interesting
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (Paper • arXiv:2401.01325)
- WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation (Paper • arXiv:2312.14187)
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data (Paper • arXiv:2401.10891)
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies (Paper • arXiv:2404.06395)
Pythia GPTQ: Pythia model family quantized using AutoGPTQ.
Apple Neural Engine LLMs: CoreML LLMs optimized for Apple Neural Engine.