yuchenFan committed on
Commit
c57b417
·
1 Parent(s): be3201e

Upload README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -29,7 +29,7 @@ $$
 Define

 $$
-q_\phi^t(\mathbf{y}_{<t}, y_t) := \sum{i=1}^{t} \beta \log \frac{\pi_\phi(y_{i}|\mathbf{y}_{<i})}{\pi_\text{ref}(y_{i}|\mathbf{y}_{<i})}.
+q_\phi^t(\mathbf{y}_{<t}, y_t) := \sum_{i=1}^{t} \beta \log \frac{\pi_\phi(y_{i}|\mathbf{y}_{<i})}{\pi_\text{ref}(y_{i}|\mathbf{y}_{<i})}.
 $$

 is the exponential average of \\(r_\theta\\) at step \\(t\\).
@@ -38,7 +38,7 @@ $$
 q_\phi^t(\mathbf{y}_{<t}, y_t) = \beta \log \mathbb{E}{\pi_\text{ref}(\mathbf{y}|\mathbf{y}_{\leq t})} \left[ e^{\frac{1}{\beta} r_\phi(\mathbf{y})} \right]
 $$

-Hence, \\(**q_\theta^t**\\)represents an exact expectation of outcome reward \\(**r_\theta**\\) at step \\(t\\), i.e., the Q value.
+Hence, \\(q_\theta^t\\)represents an exact expectation of outcome reward \\(r_\theta\\) at step \\(t\\), i.e., the Q value.

 The proposition indicates that when modeling
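
For readers checking the corrected `\sum_{i=1}^{t}` in the first hunk, the definition of \\(q_\phi^t\\) amounts to a running sum of per-token scaled log-ratios. The snippet below is a minimal sketch of that arithmetic only, not code from this repository: the function name `implicit_q_values`, its parameters, and the sample log-probabilities are all hypothetical, and it assumes per-token log-probabilities under \\(\pi_\phi\\) and \\(\pi_\text{ref}\\) have already been computed.

```python
from itertools import accumulate


def implicit_q_values(policy_logprobs, ref_logprobs, beta):
    """Return [q^1, ..., q^T] with q^t = sum_{i<=t} beta * (log pi_phi - log pi_ref).

    Hypothetical helper for illustration; inputs are per-token log-probabilities
    of the same response under the trained model and the reference model.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    # Per-token term: beta * log(pi_phi(y_i | y_<i) / pi_ref(y_i | y_<i)).
    per_token = [beta * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)]
    # The running sum over i = 1..t yields q^t for every prefix length t.
    return list(accumulate(per_token))


# Made-up log-probabilities for a 3-token response (illustrative values only).
policy_lp = [-0.5, -1.0, -0.25]
ref_lp = [-1.0, -1.0, -0.75]
q = implicit_q_values(policy_lp, ref_lp, beta=0.5)
print(q)
```

The last entry of the returned list is the full-sequence sum, and each earlier entry is the partial sum for the corresponding prefix.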