---
license: mit
language:
- zh
base_model:
- google/gemma-3-1b-it
pipeline_tag: question-answering
---

# hardPrompt2softPrompt
do PPO on gemma-it based lm with prefix tuning 

[github](https://github.com/yasaisen/hardPrompt2softPrompt)

## environment setup
```bash
conda create --name hard2softPPO python=3.10
conda activate hard2softPPO

git clone https://github.com/yasaisen/hardPrompt2softPrompt.git
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.51.3, matplotlib

huggingface-cli login
```

## how to use
```python
from hardPrompt2softPrompt.models.policyModel.modeling_policyModel import PrefixTuningPolicyModel

model = PrefixTuningPolicyModel.from_pretrained(
    model_name='google/gemma-3-1b-it', 
)

messages = [
        {
            "role": "user",
            "content": [{"type": "text", "text": '你覺得 YouTube 頻道中，最吸引你的類型是哪一種呢？'},]
        }
]

messages_ids = model.chat_template_tokenizer(
    messages=messages
)

response = model.generate_response(
    messages_ids=messages_ids, 
    temperature=0.7,
)
print(response)
```