
#1158
by mrmage - opened

https://huggingface.co/google/gemma-3n-E4B

Would love an i1-Q4_K_M version of this non-instruction-tuned variant of Gemma-3n.

It's queued! :D

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#gemma-3n-E4B-GGUF for quants to appear.
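
For context, a minimal sketch of what an i1 (weighted/imatrix) quant job involves, assuming current llama.cpp tool names (the local paths and the calibration file name are illustrative):

# 1. Convert the HF checkpoint to GGUF (paths are illustrative)
python convert_hf_to_gguf.py ./gemma-3n-E4B --outfile gemma-3n-E4B.gguf

# 2. Compute the importance matrix over a calibration corpus
./llama-imatrix -m gemma-3n-E4B.gguf -f calibration.txt -o gemma-3n-E4B.imatrix

# 3. Produce the weighted quant using that matrix
./llama-quantize --imatrix gemma-3n-E4B.imatrix gemma-3n-E4B.gguf gemma-3n-E4B.i1-Q4_K_M.gguf Q4_K_M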

[ 493/ 847]                 blk.20.attn_k.weight - [ 2048,   512,     1,     1], type =    f16,
====== llama_model_quantize_impl: did not find weights for blk.20.attn_k.weight


============================================================
Missing importance matrix for tensor blk.20.attn_k.weight in a very low-bit quantization
The result will be garbage, so bailing out
============================================================

llama_model_quantize: failed to quantize: Missing importance matrix for tensor blk.20.attn_k.weight in a very low-bit quantization
main: failed to quantize model from './gemma-3n-E4B.gguf'
job finished, status 47
job-done<0 gemma-3n-E4B imatrix 47>

https://huggingface.co/google/gemma-3n-E4B

Strange, I see no missing experts in the imatrix log:

system_info: n_threads = 1 (n_threads_batch = 1) / 54 | CUDA : ARCHS = 890 | FORCE_MMQ = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
compute_imatrix: tokenizing the input ..
compute_imatrix: tokenization took 350.925 ms
compute_imatrix: computing over 320 chunks with batch_size 512
compute_imatrix: 1.49 seconds per pass - ETA 7.97 minutes
[1]5.3642,[2]3.9367,[3]3.7815,[4]4.7199,[5]4.9072,[6]4.2094,[7]4.6123,[8]4.8544,[9]5.0650,[10]4.6340,[11]4.8082,[12]5.2022,[13]5.6028,[14]5.7827,[15]6.0963,[16]6.3305,[17]6.4290,[18]6.6484,[19]6.3537,[20]6.4198,[21]6.5066,[22]6.4400,[23]6.5202,[24]6.6168,[25]6.7711,[26]6.5787,[27]6.7394,[28]6.9142,[29]6.9916,[30]6.9475,[31]6.5695,[32]6.3636,[33]6.2266,[34]6.1206,[35]6.0391,[36]5.9990,[37]6.0331,[38]6.1135,[39]6.2312,[40]6.3615,[41]6.4274,[42]6.6757,[43]6.8743,[44]7.0283,[45]7.1703,[46]7.0843,[47]7.0186,[48]7.1532,[49]7.2629,[50]7.2178,[51]7.1953,[52]7.2486,[53]7.3552,[54]7.4962,[55]7.5788,[56]7.6088,[57]7.6219,[58]7.6372,[59]7.5758,[60]7.5559,[61]7.4913,[62]7.4260,[63]7.4597,[64]7.4699,[65]7.3931,[66]7.3690,[67]7.3805,[68]7.3459,[69]7.3043,[70]7.3237,[71]7.3459,[72]7.2992,[73]7.3204,[74]7.3145,[75]7.3042,[76]7.2975,[77]7.2535,[78]7.2029,[79]7.1595,[80]7.1783,[81]7.2356,[82]7.2288,[83]7.2172,[84]7.2024,[85]7.2461,[86]7.1490,[87]7.1274,[88]7.1181,[89]7.1208,[90]7.1412,[91]7.1338,[92]7.0825,[93]7.0242,[94]6.9704,[95]6.9097,[96]6.8643,[97]6.8106,[98]6.7598,[99]6.7247,[100]6.7364,[101]6.7453,[102]6.8241,[103]6.8873,[104]6.9536,[105]7.0543,[106]7.1425,[107]7.1638,[108]7.1838,[109]7.1986,[110]7.1848,[111]7.1733,[112]7.1171,[113]7.0432,[114]7.0769,[115]7.0946,[116]7.0933,[117]7.0921,[118]7.1219,[119]7.1455,[120]7.1567,[121]7.1677,[122]7.1673,[123]7.1361,[124]7.1919,[125]7.2387,[126]7.2767,[127]7.3409,[128]7.3970,[129]7.4472,[130]7.4727,[131]7.5578,[132]7.6033,[133]7.6745,[134]7.7371,[135]7.7921,[136]7.8647,[137]7.9299,[138]8.0200,[139]8.0870,[140]8.1586,[141]8.2152,[142]8.2846,[143]8.3451,[144]8.3989,[145]8.4576,[146]8.5193,[147]8.5571,[148]8.5993,[149]8.6674,[150]8.7484,[151]8.7910,[152]8.8417,[153]8.8842,[154]8.9371,[155]8.9940,[156]9.0272,[157]9.0676,[158]9.1040,[159]9.1516,[160]9.2093,[161]9.2272,[162]9.2744,[163]9.3060,[164]9.3461,[165]9.3878,[166]9.4307,[167]9.4451,[168]9.4884,[169]9.5446,[170]9.5699,[171]9.6016,[172]9.6327,[173]9.6669,[174]9.7100,[175]9.7173,[176]9.7389,[177]9.7827,[178]9.8235,[179]9.8485,[180]9.8785,[181]9.8968,[182]9.9333,[183]9.9554,[184]10.0068,[185]10.0082,[186]10.0430,[187]10.0798,[188]10.0989,[189]10.1417,[190]10.1643,[191]10.1911,[192]10.2015,[193]10.2229,[194]10.2506,[195]10.2916,[196]10.3233,[197]10.3478,[198]10.3471,[199]10.3654,[200]10.4094,[201]10.4435,[202]10.4736,[203]10.5044,[204]10.5395,[205]10.5602,[206]10.5882,[207]10.6095,[208]10.6563,[209]10.6783,[210]10.6986,[211]10.7168,[212]10.7390,[213]10.7491,[214]10.7800,[215]10.8016,[216]10.8337,[217]10.8346,[218]10.8490,[219]10.8707,[220]10.8900,[221]10.9255,[222]10.9510,[223]10.9755,[224]11.0175,[225]11.0280,[226]11.0478,[227]11.0636,[228]11.1042,[229]11.1236,[230]11.1522,[231]11.1797,[232]11.2015,[233]11.2359,[234]11.2710,[235]11.2905,[236]11.3011,[237]11.3266,[238]11.3412,[239]11.3664,[240]11.3911,[241]11.4036,[242]11.4173,[243]11.4503,[244]11.4711,[245]11.4932,[246]11.5230,[247]11.5578,[248]11.5903,[249]11.6170,[250]11.6423,[251]11.6718,[252]11.6917,[253]11.7197,[254]11.7352,[255]11.7496,[256]11.7963,[257]11.8130,[258]11.8225,[259]11.8541,[260]11.8827,[261]11.8990,[262]11.9052,[263]11.9161,[264]11.9338,[265]11.9590,[266]11.9881,[267]12.0249,[268]12.0335,[269]12.0497,[270]12.0649,[271]12.0849,[272]12.1143,[273]12.1342,[274]12.1446,[275]12.1599,[276]12.1863,[277]12.2177,[278]12.2406,[279]12.2651,[280]12.2922,[281]12.2978,[282]12.3089,[283]12.3174,[284]12.3238,[285]12.3360,[286]12.3443,[287]12.3788,[288]12.3871,[289]12.4167,[290]12.4411,[291]12.4539,[292]12.4685,[293]12.4858,[294]12.5050,[295]12.5312,[296]12.5501,[297]12.5557,[298]12.5764,[299]12.5936,[300]12.6249,[301]12.6359,[302]12.6512,[303]12.6746,[304]12.6934,[305]12.7047,[306]12.7221,[307]12.7388,[308]12.7463,[309]12.7614,[310]12.7788,[311]12.7931,[312]12.8160,[313]12.8326,[314]12.8523,[315]12.8573,[316]12.8624,[317]12.8845,[318]12.8917,[319]12.9129,[320]12.9291,
Final estimate: PPL = 12.9291 +/- 0.11929

llama_perf_context_print:        load time =    5086.35 ms
llama_perf_context_print: prompt eval time =  251353.06 ms / 163840 tokens (    1.53 ms per token,   651.83 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =  267277.33 ms / 163841 tokens
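
One cheap way to check whether the imatrix file actually contains an entry for the failing tensor: tensor names are stored as plain strings in the imatrix file, so grepping for them works as a probe. A sketch, assuming the output file is named gemma-3n-E4B.imatrix:

# Probe the imatrix file for layer-20 attention entries; no output
# means no importance data was ever collected for those projections.
strings gemma-3n-E4B.imatrix | grep 'blk\.20\.attn_'

If nothing shows up, the activations for those projections were never recorded. That would be consistent with gemma-3n sharing its KV cache across the upper layers: blk.20+ attn_k/attn_v are then never exercised during the imatrix pass, even though the quantizer still expects data for them.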

This seems to be a very common issue for all gemma-3n-based models currently in the queue:

====== llama_model_quantize_impl: did not find weights for blk.20.attn_k.weight


============================================================
Missing importance matrix for tensor blk.20.attn_k.weight in a very low-bit quantization
The result will be garbage, so bailing out
============================================================

llama_model_quantize: failed to quantize: Missing importance matrix for tensor blk.20.attn_k.weight in a very low-bit quantization
main: failed to quantize model from './Huihui-gemma-3n-E4B-it-abliterated.gguf'
job finished, status 47
job-done<0 Huihui-gemma-3n-E4B-it-abliterated imatrix 47>

https://huggingface.co/huihui-ai/Huihui-gemma-3n-E4B-it-abliterated
====== llama_model_quantize_impl: did not find weights for blk.20.attn_k.weight


============================================================
Missing importance matrix for tensor blk.20.attn_k.weight in a very low-bit quantization
The result will be garbage, so bailing out
============================================================

llama_model_quantize: failed to quantize: Missing importance matrix for tensor blk.20.attn_k.weight in a very low-bit quantization
main: failed to quantize model from './Orin-Instruct-Alpaca-JP.gguf'
job finished, status 47
job-done<0 Orin-Instruct-Alpaca-JP imatrix 47>

https://huggingface.co/MakiAi/Orin-Instruct-Alpaca-JP

@mradermacher Please skip all of the low-bit-per-weight quants for those. They are so small that nobody really needs them anyway. We could requantize them if llama.cpp fixes this by the time mmproj extraction support gets added. Actually, by then the imatrix GGUF branch will probably be merged anyway, in which case this should all be a problem of the past.
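
If it helps, a sketch of what skipping them could look like on the quantize side; only the very-low-bit types hard-fail on missing imatrix entries, so the remaining types should still go through (the type list and paths are illustrative):

# IQ1_*/IQ2_* refuse to run without full imatrix coverage, so run only
# the types that merely warn and fall back on partial coverage.
for type in Q4_K_S Q4_K_M Q5_K_M Q6_K; do
    ./llama-quantize --imatrix gemma-3n-E4B.imatrix \
        gemma-3n-E4B.gguf "gemma-3n-E4B.i1-${type}.gguf" "${type}"
done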
