arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-penalize • Reinforcement Learning • Updated Jul 17
arianaazarbal/Qwen3-8B-train-away-lying-lr1e4-temp10-penalize • Reinforcement Learning • Updated Jul 16
arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-hindsight • Reinforcement Learning • Updated Jul 16
arianaazarbal/Qwen3-8B-train-away-lying-lr1e4-temp10-hindsight • Reinforcement Learning • Updated Jul 16
arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-penalize-neutral-neutral • Reinforcement Learning • Updated Jul 17
arianaazarbal/Qwen3-4B-train-away-lying-lr1e4-temp10-penalize-neutral-neutral • Reinforcement Learning • 4B • Updated Jul 17 • 2
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-balanced_science-penalize-neutral-neutral • Reinforcement Learning • 4B • Updated Jul 21 • 9
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-HER-neutral-gen-honest-training-prompt • Reinforcement Learning • 4B • Updated Jul 21 • 10
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-HER-neutral-gen-honest-training-prompt-seed12 • Reinforcement Learning • 4B • Updated Jul 21 • 10
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-HER-neutral-gen-honest-training-prompt-seed5-epochs2 • Reinforcement Learning • 4B • Updated Jul 21 • 10
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-HER-neutral-gen-honest-training-prompt-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 21 • 10
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-HINDSIGHT-neutral-gen-honest-training-prompt-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 21 • 10
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-CONTRASTIVE-honest-gen-honest-training-prompt-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 21 • 9
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-DATA_FILTER-honest-gen-honest-training-prompt-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-DATA_FILTER-lr_1e5-honest-gen-honest-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-HER-honest-and-gen-honest-fr-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 9
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-true-neutral-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-LIE-neutral-gen-honest-training-prompt-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-TRUE-neutral-gen-honest-training-prompt-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 8
arianaazarbal/Qwen3-4B-train-away-lying-on-geo-own_rollouts-gen-honest-training-prompt-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 9
arianaazarbal/Qwen3-4B-train-away-lying-neutral_gen_HER-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-neutral_gen_data_filter_honest_tags-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-neutral_gen_data_filter_neutral_tags-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-neutral_gen_prepend_lie-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 9
arianaazarbal/Qwen3-4B-train-away-lying-neutral_gen_prepend_nothing-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-neutral_gen_prepend_nothing-lower_temp-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-neutral_gen_prepend_true-seed5-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-neutral_gen_HER-seed123-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 10
arianaazarbal/Qwen3-4B-train-away-lying-neutral_gen_HER-seed42-epochs1 • Reinforcement Learning • 4B • Updated Jul 22 • 9