Stanisław Szymczyk

sszymczyk

AI & ML interests

None yet

Recent Activity

new activity about 19 hours ago

perplexity-ai/r1-1776: This model performs worse in complex problems compared to the DeepSeek R1

new activity about 1 month ago

MiniMaxAI/MiniMax-Text-01: Requesting Support for GGUF Quantization of MiniMax-Text-01 through llama.cpp

new activity about 1 month ago

MiniMaxAI/MiniMax-Text-01: In modeling_minimax_text_01.py attention mask is not passed correctly to MiniMaxText01FlashAttention2::forward() method

Organizations

None yet

sszymczyk's activity

New activity in perplexity-ai/r1-1776 about 19 hours ago

This model performs worse in complex problems compared to the DeepSeek R1

#254 opened 2 days ago by

New activity in MiniMaxAI/MiniMax-Text-01 about 1 month ago

Requesting Support for GGUF Quantization of MiniMax-Text-01 through llama.cpp

#1 opened about 2 months ago by

Doctor-Chad-PhD

In modeling_minimax_text_01.py attention mask is not passed correctly to MiniMaxText01FlashAttention2::forward() method

#13 opened about 1 month ago by

New activity in RUC-AIBOX/Virgo-72B about 2 months ago

Missing tokenizer.json and tokenizer_config.json files

#2 opened about 2 months ago by

Please add the "tokenizer.model" file

#3 opened about 2 months ago by

New activity in deepseek-ai/DeepSeek-V3 2 months ago

CUDA out of memory error during fp8 to bf16 model conversion + fix

#17 opened 2 months ago by

New activity in Qwen/QwQ-32B-Preview 3 months ago

Hardware Requirements

#1 opened 3 months ago by

New activity in Qwen/QwQ-32B-Preview 3 months ago

Problem with specific output format

#15 opened 3 months ago by

New activity in allenai/ZebraLogic 3 months ago

Please test the QwQ-32B-Preview model

#3 opened 3 months ago by

New activity in AIDC-AI/Marco-o1 3 months ago

Can you provide code for inference with MCTS?

#3 opened 3 months ago by

New activity in allenai/Llama-3.1-Tulu-3-70B 3 months ago

Reason behind not using special tokens in the prompt format?

#2 opened 3 months ago by

New activity in mistralai/Mistral-Large-Instruct-2411 3 months ago

The curse of the Consolidated Safetensors strikes again...

#4 opened 3 months ago by

New activity in meta-llama/Llama-3.1-8B-Instruct 7 months ago

What call() function parameters besides "query" can be used by the model when doing brave_search and wolfram_alpha tool calls?

#89 opened 7 months ago by

What form of the built-in brave_search and wolfram_alpha tool call output is expected by the model?

#88 opened 7 months ago by

The model often enters infinite generation loops

#32 opened 7 months ago by

New activity in nvidia/Nemotron-4-340B-Instruct 8 months ago

Gguf

#5 opened 8 months ago by

New activity in google-t5/t5-3b 8 months ago

Translation to German doesn't work in 3B model

#8 opened 8 months ago by

New activity in deepseek-ai/DeepSeek-V2 9 months ago

Calculation of _mscale during YARN RoPE scaling

#4 opened 10 months ago by

New activity in Snowflake/snowflake-arctic-instruct 10 months ago

Wrong BOS and EOS tokens in tokenizer.model file

#12 opened 10 months ago by

Confusing ArcticDecoderLayer::forward() implementation

#11 opened 10 months ago by