PRIME-RL/EurusPRM-Stage2


yuchenFan committed on Jan 2

Commit c57b417 · 1 Parent(s): be3201e

Upload README.md

Files changed (1)

README.md +2 -2


@@ -29,7 +29,7 @@ $$
 Define
 $$
-q_\phi^t(\mathbf{y}_{<t}, y_t) := \sum{i=1}^{t} \beta \log \frac{\pi_\phi(y_{i}|\mathbf{y}_{<i})}{\pi_\text{ref}(y_{i}|\mathbf{y}_{<i})}.
 $$
 is the exponential average of \\(r_\theta\\) at step \\(t\\).
@@ -38,7 +38,7 @@ $$
 q_\phi^t(\mathbf{y}_{<t}, y_t) = \beta \log \mathbb{E}{\pi_\text{ref}(\mathbf{y}|\mathbf{y}_{\leq t})} \left[ e^{\frac{1}{\beta} r_\phi(\mathbf{y})} \right]
 $$
-Hence, \\(**q_\theta^t**\\)represents an exact expectation of outcome reward \\(**r_\theta**\\) at step \\(t\\), i.e., the Q value.
 The proposition indicates that when modeling

 Define
 $$
+q_\phi^t(\mathbf{y}_{<t}, y_t) := \sum_{i=1}^{t} \beta \log \frac{\pi_\phi(y_{i}|\mathbf{y}_{<i})}{\pi_\text{ref}(y_{i}|\mathbf{y}_{<i})}.
 $$
 is the exponential average of \\(r_\theta\\) at step \\(t\\).
 q_\phi^t(\mathbf{y}_{<t}, y_t) = \beta \log \mathbb{E}{\pi_\text{ref}(\mathbf{y}|\mathbf{y}_{\leq t})} \left[ e^{\frac{1}{\beta} r_\phi(\mathbf{y})} \right]
 $$
+Hence, \\(q_\theta^t\\)represents an exact expectation of outcome reward \\(r_\theta\\) at step \\(t\\), i.e., the Q value.
 The proposition indicates that when modeling
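
As an aside on the corrected definition in this diff: with the subscript restored, \\(q_\phi^t\\) reads as a running sum over tokens \\(i = 1, \dots, t\\) of \\(\beta\\) times the log-ratio between the policy and the reference model. The sketch below illustrates only that arithmetic. The function name `implicit_q_values` and the probabilities are hypothetical, chosen for illustration, and are not taken from the repository.

```python
import math
from itertools import accumulate


def implicit_q_values(logp_policy, logp_ref, beta):
    """Return [q^1, ..., q^T], where q^t = sum_{i<=t} beta * (log pi_phi - log pi_ref).

    logp_policy and logp_ref hold per-token log-probabilities of the same
    response under the policy and the reference model (assumed inputs).
    """
    if len(logp_policy) != len(logp_ref):
        raise ValueError("both sequences must cover the same tokens")
    # Per-token term: beta * log(pi_phi(y_i | y_<i) / pi_ref(y_i | y_<i))
    per_token = [beta * (lp - lr) for lp, lr in zip(logp_policy, logp_ref)]
    # Running sum over i = 1..t gives q^t for every prefix length t
    return list(accumulate(per_token))


# Hypothetical per-token probabilities for a three-token response
logp_policy = [math.log(0.5), math.log(0.4), math.log(0.9)]
logp_ref = [math.log(0.25), math.log(0.4), math.log(0.45)]

q = implicit_q_values(logp_policy, logp_ref, beta=0.5)
print(q)
```

Differences between consecutive entries of `q` recover the per-token terms, so a token on which both models agree (the second one here) leaves the running value unchanged.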