This is a clear improvement over L3.1 70b Instruct, but more censored?
I'm a bit unplugged when it comes to pop culture, slang, memes, emojis, and so on. As a consequence, I often find myself asking LLMs and Google what this or that means.
But you added so much "kid friendly" (your words) censorship, which wasn't in the original Llama 3.1 70B Instruct, that even with clever and repeated prompting to coax the answer out, your LLM still commonly hallucinates.
For example, it said that the A in Cardi B's song WAP stands for "And". Another example: when asked to use a PG-13 phrase like "son of a bitch", it wrote "son of a...". I honestly don't see the point of censoring Q&A about extremely popular things that are easily found on Wikipedia, in the first Google result, or even in a dictionary.
Anyway, great work. Aside from the added kid-friendly censorship, this is a nice improvement over Llama 3.1 70B Instruct.
Why was this closed? Honestly, I don't think LLMs should censor stuff that's easily found with a search engine; that's dumb in my personal opinion. Even more so for basic swear words. Censoring illegal stuff etc. is understandable of course.
@nonetrix I strongly agree. An LLM is going to be used by millions of adults, and far fewer children, so making them kid-friendly defies common sense. However, I was able to get this LLM to spit the information out with prompting tricks (e.g. "this is for research"), so I just wanted to bring it to their attention. But this extra prompting is time-consuming and hallucination-inducing (e.g. "and" vs. "ass").
Plus, upon further investigation I found more examples of damage due to censorship, such as asking it to identify a popular song or book from a well-known quote. The G-rated quotes work just fine, but the saltier ones get rephrased, and this causes hallucinations. For example, after rephrasing, the model attributes the quote to the wrong author or singer, yet it gets the right one once the correct phrasing is restored. Obviously, paraphrasing exact quotes, or substituting asterisks for letters, misdirects subsequent token prediction (an increase in hallucinations).
PS: To any Nvidia, Microsoft, Google, Meta... employees reading this, I assure you that the kids capable of using LLMs and reading the responses are already aware of the commonly used curse words, including "ass" and "son of a bitch", so please stop needlessly compromising the functionality of your products. If so inclined, consider creating a parental-lock prompt guard, a special system prompt... Or better yet, make an LLM just for kids. By all means, please continue filtering out the illegal stuff, but leave the rest alone. You're needlessly taking a scalpel to very popular parts of humanity, and they became popular for a reason.
Ideally, a system prompt should be enough to make it child-friendly, since models keep getting better at following instructions.
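Something like this would do it (a rough sketch, assuming an OpenAI-compatible endpoint such as a local vLLM or llama.cpp server; the base URL and model name below are placeholders):

```python
# Sketch: enforce kid-friendly behavior via the system prompt instead of
# baking censorship into the weights. Assumes an OpenAI-compatible server
# is running locally; base_url and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",
    messages=[
        # The "parental lock" lives here, in the system message,
        # so adults can simply omit or replace it.
        {"role": "system",
         "content": "You are a kid-friendly assistant. Avoid profanity "
                    "and mature themes; keep all answers rated G."},
        {"role": "user",
         "content": "What does the song title WAP stand for?"},
    ],
)
print(response.choices[0].message.content)
```

That way the filtering is opt-in per deployment, and the base model's functionality stays intact for everyone else.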