IceMoonshineRP-7b (Ice0.130-16.06)

Mistral-7B v0.2 base


The Alpaca format will generally work, but I recommend trying my SillyTavern settings preset and rules lorebook for best results. See the How to run section below for details.

The model has a context limit of 32k tokens. However, the quality of responses from any small-to-medium model begins to decline after 16k tokens, with more rapid degradation beyond 21k tokens. I recommend using 21k tokens as the maximum for optimal performance.

EXL2 Quants

GGUF

Download

I recommend using the huggingface-hub Python library:

pip3 install huggingface-hub

To download the main branch to a folder called IceMoonshineRP-7b:

mkdir IceMoonshineRP-7b
huggingface-cli download icefog72/IceMoonshineRP-7b --local-dir IceMoonshineRP-7b --local-dir-use-symlinks False
More advanced huggingface-cli download usage

If you remove the --local-dir-use-symlinks False parameter, the files will instead be stored in the central Hugging Face cache directory (the default location on Linux is ~/.cache/huggingface), and symlinks will be added to the specified --local-dir, pointing to their real location in the cache. This allows interrupted downloads to be resumed, and lets you quickly clone the repo to multiple places on disk without triggering a download again. The downside, and the reason I don't list it as the default option, is that the files are then hidden away in a cache folder, making it harder to see where your disk space is being used and to clear it up if/when you want to remove a downloaded model.
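
For example, to download through the cache (a sketch; flag behavior can vary between huggingface_hub versions, and the --include patterns are just an illustration):

# Download via the cache; symlinks are placed in the target folder
huggingface-cli download icefog72/IceMoonshineRP-7b --local-dir IceMoonshineRP-7b

# Optionally fetch only matching files, e.g. the weights and configs
huggingface-cli download icefog72/IceMoonshineRP-7b --include "*.safetensors" "*.json" --local-dir IceMoonshineRP-7b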

The cache location can be changed with the HF_HOME environment variable, and/or the --cache-dir parameter to huggingface-cli.
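
For example (a sketch; the cache path is just a placeholder):

# Keep the cache on another drive for this download
HF_HOME=/path/to/custom/cache huggingface-cli download icefog72/IceMoonshineRP-7b --local-dir IceMoonshineRP-7b --local-dir-use-symlinks False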

For more documentation on downloading with huggingface-cli, please see: HF -> Hub Python Library -> Download files -> Download from the CLI.

To accelerate downloads on fast connections (1Gbit/s or higher), install hf_transfer:

pip3 install hf_transfer

And set the environment variable HF_HUB_ENABLE_HF_TRANSFER to 1:

mkdir FOLDERNAME
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MODEL --local-dir FOLDERNAME --local-dir-use-symlinks False

Windows Command Line users: You can set the environment variable by running set HF_HUB_ENABLE_HF_TRANSFER=1 before the download command.
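
For example, in cmd.exe (a sketch of the same download as above):

rem Enable hf_transfer for this console session, then download as usual
set HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download icefog72/IceMoonshineRP-7b --local-dir IceMoonshineRP-7b --local-dir-use-symlinks False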

This is a merge of pre-trained language models created using mergekit.

How to run

A short guide to how I prefer to run this model.

If you want to run an EXL2 quant, look here.

For GGUF, just grab KoboldCpp. Set GPU layers to 33 (this depends on your VRAM and the model quant), context to 20k, and enable flash attention and the 4-bit KV cache; add Low VRAM if you only have 6GB of VRAM, and you're good to go.
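
If you prefer launching KoboldCpp from the command line, something like this maps to those settings. A sketch only: flag names can vary between KoboldCpp versions, the GGUF filename is a placeholder, and the Low VRAM toggle is easiest to set in the GUI launcher:

# 33 GPU layers, 20k context, flash attention, 4-bit KV cache (--quantkv 2)
koboldcpp --model IceMoonshineRP-7b.Q4_K_M.gguf --gpulayers 33 --contextsize 20480 --flashattention --quantkv 2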

Next:

Now grab the latest versions of the rules and formatting presets for ST (use this to install ST if you haven't already). Thanks to Equinox Psychosis for the slop list.

  1. Download the two files: the rules lorebook and the formatting preset.

  2. Import these files into ST and select them. In Active World(s) for all chats, set the rules lorebook.

  3. If you are using a Vectorization Source, set the rules lorebook to Vectorized.

  4. Set Start Reply With manually (shown below) if you want planning. If you don't, you need to edit Prompt Content and the rules lorebook to remove everything related to it.

<npc-planning>
-

(Note: nothing should be written after the dash.)

  1. What should a good working setup look like? Something like this: planning (thinking) with a few short bullet points about what the NPC should do.

  2. What if I have a mess in the response? Look at the card's Advanced Definitions and move Main Prompt, Post-History Instructions, and Character's Note to Description. Don't forget to have properly formatted Examples of dialogue for ST (some cards from web chat platforms are a mess). The smaller the model, the more demanding it is about clean prompt formatting.

  3. Treat my rules as an example. Everyone has their own taste for how RP should look. For example, I think it's bad form to use second-person narration, and it makes models impersonate the user more. As a result, cards written that way should be rewritten in third person if you use the rules without editing them.

  4. Why planning instead of just using reasoning? There are many reasons, but the main one is that pure reasoning tends to overthink things, and it's less controllable and more error-prone.

  5. Can this setup work with other models? Yes, if they're smarter than 7B and not overcooked (12B Nemo works fine for me).

  6. The Role-Play Rules lorebook is so big... You have 20k of context -_-. Again, feel free to edit it.

  7. Get the latest version of the rules and ST settings presets, or ask questions and leave feedback, on my AI-related Discord server here.

ko-fi: To buy sweets for my cat :3


Models Merged

The following models were included in the merge:

  • icefog72/Ice0.104-13.04-RP
  • icefog72/Ice0.125-29.05-RP
  • icefog72/Ice0.128-15.06-RP
  • icefog72/Ice0.80-10.04-RP-GRPO

Configuration

The following YAML configuration was used to produce this model:


models:
  - model: icefog72/Ice0.128-15.06-RP
    parameters:
      weight: 0.5
  - model: icefog72/Ice0.104-13.04-RP
    parameters:
      weight: 0.3
  - model: icefog72/Ice0.80-10.04-RP-GRPO
    parameters:
      weight: 0.5
  - model: icefog72/Ice0.125-29.05-RP
    parameters:
      weight: 0.7
merge_method: breadcrumbs
base_model: unsloth/mistral-7b-v0.2
parameters:
  lambda: 0.5
  density: 0.9
  gamma: 0.01

dtype: bfloat16
chat_template: "alpaca"
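
If you want to reproduce the merge, you can save this config to a file and run it through mergekit's CLI. A minimal sketch, assuming a local config.yml and a GPU for the --cuda flag (the output path is just a placeholder):

# Run the merge defined in config.yml, writing the result to ./IceMoonshineRP-7b
mergekit-yaml config.yml ./IceMoonshineRP-7b --cuda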
