Weird responses

#10
by vparth7 - opened

Hey guys,
I tried running inference with the model and the responses are weird!

Input: Who is the first president of USA?

Output:
Who is the first prime minister of USA?
Who is the next prime minister of USA?
What is the name of the politician from UK?
Who is the first prime minister of USA?
Which is the name of the current Prime Minister of USA?
What is the name of the current Prime Minister of USA?

and so on....

Aren't these models meant for text generation like this, or are they used for classification and similar purposes?

@vparth7 If you haven't, try the instruct version instead (https://huggingface.co/google/gemma-3-270m-it). It's still only a 0.27B LLM, though, so these kinds of outputs will still be common.
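Something like this is roughly how you'd call the instruct checkpoint so the chat template actually gets applied (a minimal sketch assuming a recent transformers build with Gemma 3 support; the exact return format of chat-style pipelines varies between versions):

```python
# Minimal sketch, assuming a recent transformers release with Gemma 3 support.
# The exact return format of chat-style pipelines varies between versions.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-270m-it")

messages = [{"role": "user", "content": "Who is the first president of USA?"}]

# Passing a message list makes the pipeline apply the model's chat template,
# instead of doing raw next-token completion as with the base checkpoint.
result = generator(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```

The point is the chat template: raw completion on the base model just keeps continuing the text, which is exactly the question-after-question behavior you saw.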

@phil111 Hey man, thanks for the suggestion. So such smaller models could be used when finetuned over some specific data or for classification, etc.?

@vparth7 Couldn't say. I only use instruct versions and never fine-tuned a base model for a specific task. But I hear people claiming that very small models like this one can become useful "when finetuned over some specific data or for classification", but after trying the instruct version, I don't see how.

I watched YouTube videos, read forums (https://news.ycombinator.com/item?id=44902148)... but didn't come across a single tangible example of how a tiny 0.27B parameter model can perform even the simplest specialized task with enough competence and reliability to do more good than harm. This model seems to be more about testing the limits of LLMs than providing real-world practical use.

Let me finetune a smaller LLM on some task and let's see how it goes.
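Roughly the shape I have in mind: turn the task into short prompt-plus-label completions and fine-tune the causal LM on them with the plain Trainer. Everything below (the toy sentiment data, label format, and hyperparameters) is a made-up placeholder, a sketch of the setup rather than a tested recipe.

```python
# Rough sketch of fine-tuning gemma-3-270m on a small classification-style task
# by formatting each example as a "prompt + label" completion.
# The dataset, labels, and hyperparameters are placeholders, not tested values.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "google/gemma-3-270m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Toy sentiment-style data; replace with the real task.
examples = [
    {"text": "The battery lasts forever.", "label": "positive"},
    {"text": "It broke after two days.", "label": "negative"},
]

def to_features(example):
    # One training string per example: the prompt plus the expected label.
    full = (
        f"Review: {example['text']}\nSentiment: {example['label']}"
        f"{tokenizer.eos_token}"
    )
    return tokenizer(full, truncation=True, max_length=128)

dataset = Dataset.from_list(examples).map(
    to_features, remove_columns=["text", "label"]
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gemma-270m-sentiment",
        per_device_train_batch_size=2,
        num_train_epochs=3,
        learning_rate=2e-5,
        bf16=True,  # keep training out of fp16; plain fp32 also works at 270M
    ),
    train_dataset=dataset,
    # mlm=False builds causal-LM labels from the input ids (padding masked out).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

At inference time you'd prompt with "Review: ...\nSentiment:" and read off the first generated word, or compare the label tokens' logits directly.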

Smaller models will never be useful in anything. Even bigger models still suck.

Of the 4 quantized versions of this model that I tried, 3 produced the absurd outputs mentioned by the OP. Based on my understanding, even for a 270M model, errors of this magnitude should be impossible—at the very least, the output should maintain contextual coherence (some of my outputs contained many special symbols, or no text at all).

However, the 4th model I tried (lmstudio-community/gemma-3-270m-it-qat-GGUF) worked perfectly fine. This suggests that the issue is more likely an implementation error rather than a limitation of the model itself.

In my opinion, this is probably related to the issue mentioned by the Unsloth team here. It seems that GGUF quantization may have caused model degradation, while the non-quantized version also degraded because it was run in FP16 precision.
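If you are loading the unquantized checkpoint through transformers, the usual workaround is simply to keep it out of FP16, something like:

```python
# Minimal sketch: load the unquantized weights in bfloat16 (or float32)
# rather than float16, the usual workaround for this kind of
# precision-related degradation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-270m-it"  # or the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to torch.float32 if bf16 is unavailable
)
```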

Ultimately, I personally think this is a theoretically excellent model: small enough and sufficiently knowledgeable to let enthusiasts accomplish tasks that are impractical on larger models and too labor-intensive with custom architectures. In practice, however, the issues disclosed by the Unsloth team mentioned above kept me from training or running inference with it smoothly; it degrades easily if not handled carefully. Additionally, since the model's architecture code is written in JAX and there is no PyTorch version available, I eventually gave up on it.

I hope this information can be helpful as a reference.

@lanesun Thanks for suggesting lmstudio-community/gemma-3-270m-it-qat-GGUF. However, it still consistently goes off the rails.

Edit: I tried it at https://huggingface.co/spaces/anakin87/gemma-3-270m-it and it performed notably better and is surprisingly good at instruction following for its size. So the consistently weird outputs are likely due to the GGUF conversion or something else.

For example, I tried "Make a simple list of 9 single word synonyms for extra." a couple times and got the following responses (neither would stop).

Response 1

extra
extra
extra
extra
extra
extra
extra
extra
extra

I will make sure to include some common and appropriate words.
Please provide the words you would like to use.
I will also consider the context of the situation when choosing the best synonym.
I am happy to help with any questions or requests you may have.
Please let me know if you have any other questions.
I will be mindful of the potential impact of my words on others.
I will ensure that the language used is respectful and avoids offensive terms.
I will try to avoid using overly complex or ambiguous language.
I will focus on clarity and accuracy in my responses.
I will strive to provide accurate and helpful information.
I will respect your preferences and boundaries.
I will be prepared to answer any questions you may have.
I will be mindful of the potential impact of my words on others.
I will ensure that the language used is respectful and avoids offensive terms.
I will try to avoid using overly complex or ambiguous language.
I will focus on clarity and accuracy in my responses.
I will strive to provide accurate and helpful information.
...

Response 2:

Extra
Extraordinary
Unusual
Impressive
Amazing
Remarkable
Exceptional
Outstanding
Wonderful
Fantastic
remarkable
exceptional
outstanding
extraordinary
unprecedented
remarkable
exceptional
outstanding
extraordinary
unprecedented
remarkable
exceptional
outstanding
extraordinary
unprecedented
remarkable
exceptional
outstanding
extraordinary
unprecedented
...
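Part of that endless rambling is probably a stopping/sampling problem rather than purely a model problem. When calling generate() directly I'd at least cap the output length and add a repetition penalty, along these lines (the values are generic starting points, not settings recommended anywhere in this thread):

```python
# Hedged sketch: generation settings that tend to curb looping output like
# the responses above. The values are generic starting points, not official ones.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-270m-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Make a simple list of 9 single word synonyms for extra."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=128,      # hard cap so a runaway list cannot go on forever
    do_sample=True,
    temperature=1.0,
    top_k=64,
    top_p=0.95,
    repetition_penalty=1.3,  # discourages loops like the one in Response 2
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```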

@phil111 I just forgot to note that I used LM Studio to try lmstudio-community/gemma-3-270m-it-qat-GGUF.

Here is my output for "Make a simple list of 9 single word synonyms for extra.":

Here are nine single-word synonyms for "extra":

1.  **Extra:** A general term for something that is not present or included in the original context.
2.  **Additional:** Relating to or adding to something.
3.  **Excessive:** Not sufficient, excessive, or noticeable.
4.  **Unnecessary:** Not important or beneficial.
5.  **Substantial:** Large or significant.
6.  **Insignificant:** Unimportant or not relevant.
7.  **Hidden:** Secretly or unacknowledged.
8.  **Immaterial:** Not valuable or worthwhile.
9.  **Unremarkable:** Not interesting or noteworthy.

@lanesun Thanks for running the prompt. Most of those synonyms are a stretch, and it even listed "extra" itself, the word it was supposed to find synonyms for, but it stopped at 9, didn't repeat itself, and didn't use multi-word answers (e.g. "more than")... so this is remarkably good for a 270M parameter LLM.
