Super bad performance
Can you describe what you mean by super bad performance?
I used the recommended parameters, but on a series of simple stochastic calculus problems it just goes around the problem without ever finding the correct path (which other models of this size find easily).
e.g., define V(t, x) = E[g(T, X(T)) | F(t)],
and the SDE for X(u):
dX(u) = μ(u, X(u)) du + σ(u, X(u)) dW(u), with X(t) = x.
Part 1: Show that V is a martingale.
To show that V is a martingale under the filtration F(t), we need to verify the conditional expectation property and also check if it satisfies the martingale condition. However, note that by the definition of conditional expectation, for any stopping time τ ≤ T, we have:
E[V(t, X(t)) | F(s)] = E[ E[g(T, X(T)) | F(t)] | F(s) ] for s < t.
But this is not necessarily equal to V(s, X(s)) unless the filtration is such that it reveals information about g and the process. Actually, we are considering a forward equation (like the one from Ito's lemma) but here we have an expectation conditional on time t with respect to future times.
Recall: The value function V(t,x) defined as E[g(T,X(T)) | F(t)] is indeed a martingale if it satisfies the Markov property and the dynamics of X are given by the SDE. However, note that we are conditioning on the entire filtration up to time t, but since g depends only on T and X(T), and X(T) is driven by the Brownian motion from 0 to T, the conditional expectation might not be adapted in a way that makes V(t,X(t)) a martingale without further assumptions.
Actually, we are going to use the fact that if V satisfies the PDE derived below (the Kolmogorov backward equation) and is adapted to the filtration of X, then it can be shown to be a martingale. But let's think carefully.
The standard approach for such problems uses Ito's lemma on the function g(T,X(T)) and then conditions at time t. However, note that V(t,x) = E[g(T, x_T) | X(t)=x], where x_T is the value of X at time T starting from x at time t.
But to show that this conditional expectation is a martingale, we need to check:
For 0 ≤ s < t ≤ T,
E[V(t,X(t)) - V(s,X(s)) | F_u] for u=s? Actually, by definition, if V(t,x) is the value at time t of some payoff g(T,X(T)), then under the natural filtration generated by {W(u)}, we have that (V(t,X(t)))_{t∈[0,T]} is a martingale.
This is because:
d(V(t,X(t))) = ∂V/∂t dt + (∂V/∂x)(dX) + (1/2)∂²V/∂x²(dX)^2
But if V satisfies the PDE from part 2, which comes from applying Ito's lemma to g(T,X(T)), then we can show that d(V(t,X(t))) is a martingale under the filtration of X.
However, note: The conditional expectation E[g(T,X(T)) | F_t] is adapted and satisfies:
E[V(t+dt, X(t+dt)) | F_t] = V(t,x) + (some terms)
But to show that it's a martingale, we need the increment to be zero in expectation when conditioned on past. Actually, by definition of conditional expectation, for s < t,
V(s,X(s)) = E[ g(T,X(T)) | F_s ]
and
E[V(t,X(t)) | F_s] = E[ E[g(T,X(T))|F_t] | F_s ].
If the filtration is generated by {W(u)}, then we need to show that:
For s < t,
E[V(t+dt, X(t+dt)) - V(s,X(s)) | F_u] for u=s? Actually, let me define M_t = g(T, X(T)), and consider the process (M_{t})_t. Then by definition of conditional expectation, V(t,x) is exactly the expected value of M at time T given that we are at time t with state x.
Moreover, if we assume that the filtration F_t is the natural filtration generated by {W(u)}, then it is known that such a conditional expectation (the optimal prediction) satisfies:
E[M_{t} | F_s] = E[M_{s}|F_s] + something? Actually, for M_T being measurable with respect to F_T and the process X(t), we can use the Ito's lemma.
But let me recall: The forward equation is given by the Kolmogorov equation. However, there are two types of equations:
- Backward (for pricing):
∂g/∂t + μ ∂g/∂x + σ²/2 ∂²g/∂x² = 0
But here we have a conditional expectation with respect to the entire filtration up to t.
Actually, in this context, V(t,x) is defined as E[g(T,X(T)) | X(t)=x]. This is equivalent to saying that for each fixed path of Brownian motion at time t, we condition on that and take expectation. However, note that g might depend only on the state at T.
Now, if we consider a process Y_t = V(t, X(t)), then by definition:
Y_t = E[g(T,X(T)) | F_t]
To show that {Y_t} is a martingale with respect to the filtration F_t and probability measure P, we need for 0 ≤ s < t ≤ T,
E[Y_t - Y_s | F_u] for u=s? Actually, by definition of conditional expectation:
For any event A in F_s,
∫_A (Y_t) dP = ∫_A E[g(T,X(T))|F_t] dP
But this is not the standard way. We can use the fact that {X(t)}_{t∈[0,T]} is a martingale under the physical measure? Actually, no: X(t) might be an Ito integral but without drift adjustment.
Alternatively, we can consider the process in discrete time and then take limits.
But note: The conditional expectation V(t,x) = E[g(T,X_T)|X_t=x] (where X_T is the value at T starting from x at t). This is a function of t and x.
Now, let me define M_u = g(T, X(u)). But wait, that doesn't make sense because we are conditioning on time t to predict time T.
Actually, consider the process:
Let (Ω, F, P) be the probability space with Brownian motion W(t). Define a new process Z_t = V(t+dt, X(t+dt)) - V(t,X(t)). Then for it to be a martingale increment at time t, we need E[Z_{t} | F_s] = 0 for s < t.
But by the Markov property and the definition of conditional expectation, we can use Ito's lemma on g(T,X_T) but that is not defined until T.
Alternatively, consider the following: The process X(t) satisfies:
dX(u) = μ(u,X(u))du + σ(u,X(u))dW(u)
Then, for a fixed initial condition at time t with state x, we can write g(T,x_T) as a function of u and x_u. Then by Ito's lemma, the expected value E[g(T,X_T)|F_t] should satisfy:
dg = ∂g/∂t dt + (∂g/∂x)(dX) + (1/2)∂²g/∂x²(dX)^2
But note: g is defined at time T and we are conditioning on the path of X from t to T. So, let me define:
Let h(u,x) = E[g(T,X_T)|X_u=x]
Then V(t,x) = h(t,x).
Now, by Ito's lemma (or Feynman-Kac), for a function f that is adapted and satisfies the Markov property, we have:
df = ∂f/∂t dt + (∂f/∂x)(dX_t) + (1/2)∂²f/∂x²(dX_t)^2
But in our case, V(t,x) = h(t,x), so it is a function of t and x.
Now, for the process Y_t = h(t,X(t)) to be adapted, we need that h(t,x) is measurable with respect to F_t.
Moreover, by definition:
Y_{t} = E[g(T,X_T)|F_t]
Then, for s < t,
E[Y_t | F_s] = ?
But note: g(T,X_T) depends on the entire path from 0 to T. However, we can use the law of iterated expectations.
Alternatively, consider that h(t,x) is defined as:
h(t,x) = E[g(T,X_T)|X_t=x]
Then, by the Markov property and Ito's lemma applied at time t with initial condition x, we have:
dh = ∂h/∂t dt + (∂h/∂x)(dX_t) + (1/2)∂²h/∂x²(dX_t)^2
But dX_t is the increment of X from s to u. However, note that h(t,x) is defined at time t and we are conditioning on F_s with s < t.
Actually, let me consider a discrete version first. Suppose we have a partition 0=t_0<t_1<...<t_n=T. Then by the tower property of conditional expectation:
E[Y_t | F_{t_k}] for k such that t_k ≤ t?
We can write:
Y_s = E[g(T,X_T)|F_s] = ∫ (g(T,x) * p(s,T;x,X_s)) dx
where p(s,T;x,y) is the transition density of X from y at s to x at T.
Then, for s < t,
E[Y_t | F_s] = E[∫_{-\infty}^{\infty} g(T,z) p(t,T;X(t),z) dZ|F_s]
But this might be messy. Alternatively, we can use the fact that {Y_t}_{t∈[0,T]} is a martingale if and only if:
E[Y_T] = E[Y_0] for all t? Actually, by definition of conditional expectation, Y_T = g(T,X_T), so E[Y_T]=g(T,x) evaluated at T. But we are not given that.
Actually, the standard result is: The process {Y_t}_{t∈[0,T]} defined as Y_t = E[g(T,X_T)|F_t] is a martingale if and only if g(T,X_T) satisfies the adaptedness condition with respect to the filtration of X. But in our case, since we are conditioning on F_t generated by {W}, then yes.
But let me recall: In finance, for example, when pricing derivatives under no arbitrage, we use risk-neutral measures and backward equations (like Black-Scholes). However, here we are using the physical measure because μ is given as a drift term. But note that in this problem, we haven't specified if it's the risk-neutral or not.
Actually, regardless of the measure, the process Y_t = E[g(T,X_T)|F_t] satisfies:
dY_t = (∂V/∂x)(t,X(t)) dX_t + (terms from Ito that are martingale parts)
But let me apply Ito's lemma to V(t,x) with respect to t and x.
We have the SDE for X: dX(u)=μ(u,X(u))du+σ(u,X(u))dW(u).
Now, consider a function f(t,x). Then by Ito's lemma:
df = (∂f/∂t + ∂f/∂x * μ + (1/2)∂²f/∂x²*σ²) dt + (∂f/∂x)*σ dW
But in our case, V(t,X_t) is defined as E[g(T,X_T)|F_t].
We can think of g(T,X_T) as a function that depends on the entire path. However, if we fix T and consider the process up to time T, then for each fixed t≤T, V(t,x)=E[g(T,X_T)|X_t=x] is well-defined.
Now, let me define f(t,x) = E[g(T,X_T)|X_t=x]. Then by Ito's lemma applied at time t:
dV(t,X_t) = [∂f/∂t + ∂f/∂x * μ(t,X_t) + (1/2)∂²f/∂x²*σ²(t,X_t)] dt + (∂f/∂x)*σ(t,X_t) dW(t)
But note: V(t,X_t) is a random variable, and we are writing its differential. However, by definition of conditional expectation, for s < t,
E[V(t,X_T)|F_s] = ?
Actually, the process (V(t,X_t))_{t∈[0,T]} should be adapted because it's defined as an expectation given F_t.
Moreover, to show that V is a martingale, we need:
For 0 ≤ s < t ≤ T,
E[V(t+dt) - V(s)|F_s] = ?
But by the definition of conditional expectation and the Markov property, we can write:
V(t,X_t) = E[g(T,X_T)|F_t]
Then for any stopping time τ ≥ t,
E[ g(τ,X_{\tau}) ] is not necessarily related.
Alternatively, consider that V(s,x) must satisfy a certain equation. In fact, from Ito's lemma above, we have:
dV(t,X_t) = [∂f/∂t + ∂f/∂x * μ(t,X_t) + (1/2)∂²f/∂x²*σ²(t,X_t)] dt + (∂f/∂x)*σ(t,X_t) dW(t)
But by definition, V(s,x) = E[g(T,X_T)|X_s=x]. Then for this process to be a martingale (i.e., the drift part must vanish), we require:
E[V(t+dt) - V(t)|F_t] should be zero? Actually, no: In Ito's lemma, the dt term is deterministic. But note that in our case, when we condition on F_s for s < t, we have to consider the entire process.
Wait, let me clarify: The expression above for dV(t,X_t) is under the filtration generated by {W}. However, V(t,x) as defined is a function of time and state.
But actually, from Ito's lemma applied at time s < t:
We can write that for any fixed event in F_s,
E[ g(T,X_T) | X_s=x ] = f(s,x)
and
f(t,y) - E[f(t,Y_t)|F_s] might be related.
This is getting complicated. Let me use the fact that V(t,x) satisfies:
V(0,x_0) = g(T, x_T) but no, we are conditioning on time t to predict T.
Actually, a better way: The process Y_t = E[g(T,X_T)|F_t] is adapted and by definition,
E[Y_{t} | F_s] = E[ E[g(T,X_T)|F_t] | F_s ] for s < t.
But this equals E[g(T,X_T)|F_s], which is exactly Y_s. So:
E[Y_t | F_s] = Y_s
which means that {Y_t}_{t∈[0,T]} is a martingale with respect to the filtration (F_t, P).
So part 1 is proven by definition of conditional expectation: The tower property gives E[V(t,X_T)|F_s]=V(s,X_s) for s≤t.
But note: In our notation, V(t,x)=E[g(T,X_T)|X_t=x]. Then Y_t = V(t,X_t). So:
For any event A in F_s,
∫_A Y_t dP = ∫_A E[g|F_t] dP
and by definition of conditional expectation,
E[Y_s | F_t] for s>t? Not defined. Actually, we need to consider the filtration increasing.
Standard result: If (Ω,F,P) is a probability space with filtration {F_t} and if Y_t = E[Z|F_t], where Z is an adapted random variable such that E[|Z|]<∞, then for s < t,
E[Y_t | F_s] = E[Z|F_s] = Y_s
provided that the filtration is right-continuous or satisfies certain conditions. In our case, since we are conditioning on g(T,X_T) and X(t), and assuming that {W} generates a right-continuous filtration, then yes.
So V(t,x)=E[g(T,X_T)|X_t=x] is adapted (because it's the expectation of an F_T-measurable random variable given information up to t). Then Y_t = E[Y_{T}|F_t]? No: Actually, we have:
Y_t = E[Z|F_t], where Z=g(T,X_T) and note that X_T is defined at time T but the filtration goes up to T. However, in standard theory, if Z is F_T-measurable then Y_t=E[Z|F_t] is well-defined for t≤T.
Moreover, by properties of conditional expectation,
E[Y_{t} | F_s] = E[ E[Z|F_t] | F_s ] = E[Z|F_s] because s<t and the filtration increases. But wait:
Actually, we have: For s < t,
E[Y_{t}|F_s] = E[Z|F_s]
But Y_s = E[Z|F_s], so indeed E[Y_{t}|F_s]=Y_s.
Therefore, by definition of martingale (adapted and E[Y_t|F_s]=Y_s for all s<t), we have that {V(t,X_t)} is a martingale.
Note: This assumes that g(T,X_T) is integrable.
So part 1 is shown.
Part 2: Derive the partial differential equation that V has to satisfy.
We use Ito's lemma on the function f(t,x)=E[g(T,X_T)|X_t=x].
But note: The standard Feynman-Kac formula uses a backward stochastic differential equation (BSDE) and leads to:
∂f/∂t + μ ∂f/∂x + σ²/2 ∂²f/∂x² = 0
with boundary condition f(T,x)=g(T,x).
But in our case, we are conditioning at time t on the state x_t=x.
Let me fix a fixed path of Brownian motion up to time s (but actually, since it's a conditional expectation with respect to F_s, I need to consider two cases: one is that g depends only on X_T and T, so we can use the Markov property.
So, by definition:
f(t,x) = E[g(T,X_T)|X_t=x]
Now, let me apply Ito's lemma at time s < t. Consider a small increment dt from s to t (with s fixed). Then,
E[f(t+dt, X_{t+dt}) | F_s] - f(s, x)
But note: We are conditioning on the entire filtration up to s.
Actually, we can use the fact that for any stopping time τ ≤ T,
f(τ,x) = E[g(T,X_T)|X_τ=x]
Then by Ito's lemma (or more precisely, the dynamic programming principle), we have:
E[f(s+dt, X_{s+dt}) | F_s] - f(s,x)
But wait, let me consider a discrete time step.
Suppose at time s, we know that the state is x. Then from there to t=s+dt, the process moves according to its SDE until T. But note: We are conditioning on information up to s, so X_{s} = x (given F_s).
Now, let me consider a small interval [s,t] with length dt.
We can write:
E[ g(T,X_T) | X_s=x ] = E[ E[g(T,X_T)|X_t=X_{t+s}] | X_s=x ]
But by definition of f(t,y)=E[g(T,X_T)|X_t=y], so
f(s,x) = ∫_y p(y,z; s,t) f(t,z) dz (where the integral is over y, and z is from the state at t)
More precisely:
By the Markov property,
E[ g(T,X_T) | X_s=x ] = E[ E[g(T,X_T)|X_t=X_{t+s}] | X_s=x ]
Now, condition on the value of X_t. Let Z be a random variable that is measurable with respect to F_t and depends only on X_t.
Then,
E[Z|F_s] = ∫_y p(y,Z; s,t) dy
But in our case, f(t,y)=E[g(T,X_T)|X_t=y], so:
f(s,x) = E[ g(T,X_T) | X_s=x ] = E[ E[g(T,X_T)|X_t=X_{t+s}] | X_s=x ]
Now, condition on the value of X_t. Let Y_t be a random variable (which is X_t). Then,
E[Z|F_s] where Z=f(t,Y_t)
But f(t,y) is defined for each y.
We can use the law of total expectation:
f(s,x) = ∫_{-\infty}^{\infty} E[ g(T,X_T) | X_t=y ] p(y|x;s,t) dy
where p(y|x;s,t) is the transition density from x at s to y at t.
Now, by Ito's lemma (or Feynman-Kac), we can write:
f(t,y) = g(T,y) + ∫_s^T μ(u,f(?,?)) du ... wait no.
Actually, let me consider a function that satisfies the backward equation.
We know from standard theory that if f is twice continuously differentiable and adapted to {F_t}, then it must satisfy:
∂f/∂t + L^{μ,σ} f = 0
where L^{μ,σ} is the infinitesimal generator: μ ∂/∂x + σ²/2 ∂²/∂x².
But let me derive this from Ito's lemma and the definition of conditional expectation.
Consider that for any fixed s < t,
f(s,x) = E[ g(T,X_T) | X_s=x ]
Then, by Ito's lemma applied to f(t,y):
df(t,Y_t) = (∂f/∂t + ∂f/∂y * dY_t + (1/2)∂²f/∂y²(dY_t)^2)
But Y_t is the state at time t.
Now, we condition on F_s. Then:
E[ df(t,Y_t) | F_s ] = E[ (∂f/∂t)|_t dt + (∂f/∂x)|_{X_t=Y_t} dX_t + (1/2)(∂²f/∂x²)|_{X_t=Y_t}(dX_t)^2 | F_s ]
But note: We are conditioning on the path of X from s to t. However, in our case, we want to relate f(s,x) and E[f(t,X_t)|F_s].
Now, let me define:
Let M_u = g(T,X_T). Then V(u,x)=E[M|F_u] is a martingale.
But by the definition of conditional expectation, for s < t,
f(t,y) must satisfy that f(s,x) = E[ f(t,Y_t) | F_s ] because otherwise it wouldn't be consistent with the tower property?
Wait: We have:
Y_s = V(s,X_s)
and
E[Y_t|F_s]=Y_s.
But Y_t is defined as g(T,X_T), and we are conditioning on X_s=x. So, for example, if I fix a path up to s, then the conditional expectation of M at time t given that fixed path must equal V(s,x) (the value at time s).
Now, by Ito's lemma applied to f(t,y):
We can write:
E[ df(t,Y_t) | F_s ] = ∂f/∂t|_s dt + (∂f/∂y)|_{Y_t|F_s} dX_t + ...
But let me consider a small interval from s to t. Let τ be the stopping time at which we condition, but I think it's easier to use the density.
We have:
f(s,x) = ∫_y p(y|x;s,t) f(t,y) dy
Now, take expectation conditional on F_s (which is given by knowing X_s=x). But note that in this equation, x is fixed and y is varying.
Actually, we can differentiate both sides with respect to t.
But let me use the definition of the infinitesimal generator for Markov processes.
Consider a function f(t,x) twice continuously differentiable. Then, for any stopping time τ ≤ T,
E[ f(τ,X_{\tau}) | F_s ] = E[ g(T,X_T) | F_s ]
and by Ito's lemma (or the martingale representation), we have:
f(s,x) = E[ f(t,Y_t) + ∫_s^t (∂f/∂u + μ(?,?) ∂f/∂x + σ²(?,?) /2 ∂²f/∂x² ) du | F_s ]
But wait, no: We have to be careful.
Actually, the standard result is that if f(t,x) satisfies:
df = (∂f/∂t + μ ∂f/∂x + (1/2)σ² ∂²f/∂x²) dt + σ ∂f/∂x dW
then E[f(t,X_t)|F_s] for s<t might not be equal to f(s,x).
But in our case, we want the process V(t,X_t) to satisfy:
V(0,x_0)=g(T,X_T)
and by definition of conditional expectation.
Alternatively, let me use the fact that V(t,x) is defined as E[g|F_t] and must be adapted. Then for it to be a martingale (which we already proved), its differential should not have drift in the filtration up to t.
But from Ito's lemma applied at time s < t:
dV(s,X_s) = (∂f/∂t + μ ∂f/∂x + σ²/2 ∂²f/∂x²) ds + (∂f/∂x)*σ dW
Now, the expectation of this differential under F_t (for any fixed path up to t) should be zero if V is a martingale. But note: We are conditioning on information at time s.
Wait, no: The process V(u,X_u) for u from 0 to T is a martingale. So by definition, its differential under the filtration {F_t} must have expectation conditional on F_s equal to zero? Not exactly; the martingale property requires that E[V(t+dt)-V(s)|F_s]=0.
But let me consider:
We know from part 1 that V(u,X_u) is a martingale. So for s < t,
E[ V(t,X_t) - V(s,X_s) | F_s ] = 0
Now, expand the expectation using Ito's lemma on g(T,X_T). But wait, we have to use the fact that X_t satisfies an SDE.
Consider a function f(u,x)=V(u,x). Then by definition:
f(t,x) = E[ g(T,X_T) | F_t ]
But also, note that V(s,y) for y=X_s is defined similarly.
Now, let me consider two times s and t with s < t. We condition on X_s=y.
Then,
E[ f(t,X_t) | X_s=y ] = ?
By definition of conditional expectation given the filtration at time s:
f(t,x) must be such that when we fix x=X_s, then
V(s,y) = E[ g(T,X_T) | F_s ]
and for any stopping time τ ≥ s,
E[ V(τ,X_τ)) | F_s ] = ?
But note: We have a specific path from s to t.
Since the process is Markov and we are conditioning on X_s=y, then:
V(s,y) = E_{X_s=y} [ g(T,X_T) ]
Now, consider that for any stopping time τ ≤ T,
E[ V(τ,X_τ)) | F_s ] = ?
But by definition of conditional expectation, if s < τ, then
E[ V(τ,X_τ)) | F_s ] = E_{X_s=y} [ g(T,X_T) ]
which is exactly V(s,y).
Now, let me consider the process from time s to t. We have:
V(t,x_t) - V(s,x_s)
But x_s=y (given).
Then,
E[ V(t,X_t) | F_s ] = E_{X_s=y} [ g(T,X_T) ]
and
E[ V(s,X_s)) | F_s ] is just V(s,y) because we are conditioning on X_s=y.
Now, by Ito's lemma applied to f(u,x)=V(u,x):
df(u,X_u) = (∂f/∂u + ∂f/∂x * dX_u + (1/2)∂²f/∂x²(dX_u)^2)
But we are conditioning on F_s.
Alternatively, consider that the value at time t given initial condition y is:
V(t,x_t) = E[ g(T,X_T) | X_t=x_t ]
Then, by definition of conditional expectation,
E_{X_0} [ V(t,X_t) ] might not be directly helpful.
But we can use the fact that for any fixed path up to s (with value y), then
V(s,y) = E[ g(T,X_T) | X_s=y ]
and
V(t,x) = E[ g(T,X_T) | X_t=x ]
Then, by Ito's lemma applied at time s:
The differential of V(t,X_t) with respect to t and x is given by the infinitesimal generator.
But let me consider a small interval from s to t. Let dt=t-s be fixed but small? Actually, we can use Taylor expansion or density arguments.
Consider that for any function h(x),
E[ g(T,X_T) | X_t=x ] = E[ h(X_t) | F_t] evaluated at x, but no.
Actually, by the Markov property and Ito's lemma:
We have dX_u = μ(u,X_u)du + σ(u,X_u)dW
Then, for a function f(t,x),
df = (∂f/∂t dt + ∂f/∂x dX_t + (1/2)∂²f/∂x²(dX_t)^2)
But in our case, we are conditioning on the past at time s.
We can use the following: The function f(t,x)=V(t,x) must satisfy that for any fixed y,
E[ g(T,X_T) | X_t=y ] is defined and smooth (assuming sufficient regularity).
Then, by Ito's lemma applied to the process from t=s to u=t:
f(s,y) = E[ f(t,Y_t) + ∫_s^t (∂f/∂u + μ(?,?) ∂f/∂x + σ²(?,?) /2 ∂²f/∂x² ) du | F_s ]
But wait, this is not standard. Actually, let me define a new process:
Let ξ_u = X_{s+u} for u from 0 to t-s.
Then dξ_u = μ(s+u, ξ_u)du + σ(s+u, ξ_u)dW
And we have f(t,y)=E[ g(T,X_T) | ξ_{t-s}=y ]
But note: X_{s+u} is the state at time s+u. Then:
f(t,y) = E[ g(T,X_{T}) | ξ_{t-s}=y ]
Now, we condition on F_s, which gives us y=ξ_0=X_s.
Then,
E[ f(t,Y_t) | F_s ] = ?
But by definition of conditional expectation given the filtration at time s (which is fixed), and since g(T,X_T) depends only on the path from 0 to T, we can write:
f(s,y) = E_{X_0} [ g(T,X_T) | X_s=y ]
Now, consider that for any stopping time τ ≤ T,
E[ f(τ,Y_τ)) | F_s ] might not be equal to something.
But actually, the standard way is to use Ito's lemma on the function g and then condition. However, here we are conditioning at an earlier time.
Let me define a new process Z_u = X_{s+u} for u≥0 until t-s.
Then dZ_u = μ(s+u,Z_u)du + σ(s+u,Z_u)dW
And f(t,y)=E[ g(T,X_T) | Z_{t-s}=y ]
Now, by Ito's lemma applied to the function h(u,z)=f(s+u,z):
dh = (∂h/∂u + ∂h/∂z dZ_u + (1/2)∂²h/∂z²(dZ_u)^2)
But we are interested in f(t,y).
Actually, from standard theory, if we consider the function u↦f(u,x), then for each fixed x, it must satisfy:
df = (∂f/∂t + μ ∂f/∂x + σ²/2 ∂²f/∂x²) dt
but this is under the filtration of X.
But note: In our case, we are conditioning on F_s.
We can use the fact that V(t,x)=E[g(T,X_T)|F_t] and it must satisfy:
V(s,y) = E[ V(t,X_t) | F_s ] (since by tower property)
Now, expand V(t,X_t):
Let me write X_t as a function of t and its path.
By Ito's lemma,
dX_u = μ(u,X_u)du + σ(u,X_u)dW
Then the process from s to t is:
X_t - X_s = ∫_s^t μ(u,X_u) du + ∫_s^t σ(u,X_u) dW_u
Now, V(t,x) depends on x=X_t.
We can write:
E[ V(t,X_t) | F_s ] = E_{\omega} [ \int_{-\infty}^{X_t(\omega)} ... ]
But this is not helpful.
Alternatively, consider the following: The function f(u,y)=V(u,y) must be such that when we condition on X_s=y, then
f(s,y) = E[ f(t,X_t) | F_s ] (6)
Now, by Ito's lemma applied to f at time s:
The differential of f at time t is given by the generator. But let me consider a small dt from s.
We have X_{s+dt} - X_s = ∫_s^{s+dt} μ(u,X_u) du + σ(s, X_s) (W_{s+dt}-W_s)
But this approximation assumes that dZ_u ≈ σ(s,Z_s) du for small dt? Actually, no: the drift might be significant.
In general, we have:
dX_u = a(u,X_u)du + b(u,X_u)dW
Then,
E[ f(t+dt, X_{t+dt}) | F_t ] - f(t,x)
But in our case for equation (6), we condition on F_s and s < t.
Let me define the expectation operator at time u:
L^u [f] = ∂f/∂x * μ(u,f) + σ²(u)/2 * ∂²f/∂x² ? No, this is not correct for our purpose.
Actually, let's use Taylor expansion in two variables (t,x).
Fix y and s < t. Then,
E[ V(t,X_t) | F_s ] = E_{X_0} [ g(T,X_T) | X_s=y ]
Now, expand the expectation of V(t,z) where z=X_t.
But we need to express X_t in terms of its dynamics from s.
We have:
Z_u = X_{s+u}, so Z_0 = y
Then,
V(s,y) = E[ g(T,X_T) | F_s ]
and
E[ V(t,Z_t) | F_s ] should equal V(s,y)
But by definition, V(t,z)=E[g|F_t] given the state at time t.
Now, we can use Ito's lemma for Z_u:
dZ_u = μ(s+u,Z_u) du + σ(s+u,Z_u)dW
Then,
V(s,y) = E[ g(T,X_T) | F_s ]
and it is known that if V(t,z) satisfies the backward equation, then this holds.
But let me assume that f(u,x)=V(u,x) is twice continuously differentiable in x and once in u.
Then, by Ito's lemma:
We can write:
E[ g(T,X_T) | F_s ] = E_{X_0} [ \int_{-\infty}^{T} ... ]
But this is not the way.
Let me consider that for any function h(x), we have:
E[ V(t,x_t) | X_s=y ] ≈ ?
We can use the Markov property and Ito's lemma to say that:
V(s,y) = E_{X_0} [ g(T,X_T) | X_s=y ]
Then, by Ito-Tanaka formula or something.
But let me consider a small interval from s to t. Let dt=t-s be fixed but small? Actually, we can use the generator at time u=s+u for u from 0 to t.
I think it's easier to derive the backward equation directly.
We know that V(t,x) is defined as E[g|F_t] and must satisfy:
∂/∂t V(t,x) + L^{μ,σ}V = 0
But let me try a different approach.
Consider two times s and t with s < t. Let x be the state at time t=X_s=x (given F_s).
Then,
E[ g(T,X_T) | X_t=y ] for y varying must satisfy:
By Ito's lemma applied to f(t,y):
The value of V(s,x) can be expressed as an expectation over the future path.
But we have:
V(s,y) = E_{X_0} [ g(T,X_T) | X_s=y ]
Now, consider that for any stopping time τ ≤ T,
E[ V(τ,X_τ)) | F_s ] might not be defined directly.
I recall that in the theory of pricing and hedging under incomplete information or something, but I think we are overcomplicating.
Let me use the density function p(x,y,s,t) for X from s to t.
We have:
V(s,y) = ∫_x g(T,x) p_{s,t}(y,x) dx (7)
where p_{s,t}(y,x) is the transition density at time t given y at time s.
Now, take expectation conditional on F_s. But wait, this equation holds for each fixed path up to s with value y.
Then,
E[ V(t,X_t) | F_s ] = ∫_x p_{s,t}(y,x) V(s,y) dx
But from (7), we have:
V(s,y) = E[ g(T,X_T) | X_s=y]
and the conditional expectation of V(t,X_t)) given F_s is defined as above.
Now, differentiate both sides with respect to t at s.
Consider that for any fixed y,
E[ V(t+dt, X_{t+dt}) | F_s ] - E[ V(s,y) ]
But by definition, this should be the same as the differential of a function under the dynamics from s.
Let me consider:
We have dX_u = μ(u,X_u) du + σ(u,X_u) dW
Then,
E[ g(T,X_T) | F_s ] is defined at time s.
Now, for small dt>0,
V(s,y) = E_{X_0} [ V(t+dt, X_t+dx) | F_s ]
But we need to express the expectation of a function evaluated at t+s and state x.
Consider that from time s to t=s+dt, the process moves as:
dX_u for u in [s,t] is given by dZ_u = μ(s+u,Z_u)du + σ(s+u,Z_u)dW
Then,
E[ g(T,X_T) | F_s ] can be written using Ito's lemma on g.
But let me define a function f(t,x)=V(t,x). Then, from the SDE (5), we have:
The generator L^{μ,σ} applied to f at time u is given by μ ∂/∂x + σ²/2 ∂²/∂x².
Then, for V to be a martingale, it must satisfy that its differential under the dynamics has no drift in the filtration up to s.
But from equation (6):
V(s,y) = E[ V(t,X_t) | F_s ]
Now, expand the right-hand side:
E[ f(t+dt, X_{t+dt}) | F_s ] - f(t,x)
But t is fixed? No, we have to be careful.
Let me fix s and y. Let t=s+h for a small h>0.
Then,
V(s,y) = E[ V(s+h,X_{s+h}) | F_s ]
Now, X_{s+h} is the state at time s+h.
By Ito's lemma applied to f(u,x)=V(u,x):
The process from s to s+h:
dX_u for u in [s,s+h] with dW = W_{u}-W_s
Then,
E[ V(s+h, X_{s+h}) | F_s ] - V(s,y)
But this is the expectation of f(t,z) where t=s+h and z=X_{s+h}.
And by definition, it equals V(s,y).
Now, we can write:
V(s,y) = E[ g(T,X_T) | X_s=y ]
and
E[ V(t+dt, X_{t+dt}) | F_t ] for the same path from 0 to t is given by Ito's lemma.
But let me use a change of variable.
Consider that at time s+h (for h small), we have:
X_{s+h} = y + μ(s,y)h + σ(s,?) dW_s ... no, it's not linear in the drift because X_u changes and so does μ and σ.
But if I assume that V(t,x) is smooth enough, then I can use the approximation:
The transition from x at time s to z at time t=s+h (with h small) is approximately given by the Euler scheme or something.
In fact, for a Markov process with generator L = μ ∂/∂x + σ²/2 ∂²/∂x²,
then
E[ f(t,X_t) | X_s=y ] ≈ f(s,y) + (∂f/∂t |_s + L^{μ,σ} f |_y ) (t-s)
where t=s+h.
Then,
V(s,y) = E_{X_0}[g|F_s] and it must be that
E[ V(t,X_t) | F_s ] - V(s,y) = 0 for the value at time y given by conditioning on X_s=y, but wait no: from equation (6)
E[ V(t,X_t) | F_s ] = f(s,y)
But this is not true; we have:
V(s,y) = E_{X_0}[g|F_s] and also equals E[V(t,X_t)|F_s]
So,
f(s,x) = E[ g(T,X_T) | X_s=x ]
and
E[ V(t,X_t) | F_s ] should equal f(s,x)
Now, expand the expectation of V(t,z):
Let me use Ito's lemma for the function u↦V(u,x).
We have:
dX_u = μ(u,X_u) du + σ(u,X_u) dW
Then,
E[ g(T,X_T) | F_s ] is a function of X_t.
But let me consider that V(t,z) must be twice continuously differentiable in t and x, so we can apply Ito's lemma to it at time s:
V(s+dt,x) = ?
Consider the state at time t=s+dt.
We have two ways: one is from the definition of conditional expectation.
From standard theory, since V(t,x)=E[g|F_t] and X_t satisfies an SDE with generator L^{μ,σ}, then it must satisfy:
∂V/∂t + μ ∂V/∂x + σ²/2 ∂²V/∂x² = 0
with boundary condition V(T,x)=g(T,x).
But let me prove this.
Fix x and s < t. Then,
E[ g(T,X_T) | X_t=x ] is defined for each t≤T.
Now, by Ito's lemma applied to the process from time 0 to T:
Let me define a new function h(t,z)=V(t,z). Then,
dh = (∂h/∂t + L^{μ,σ} h) dt
But this dh is under the filtration of X.
Then, E[ dh | F_s ] might not be defined.
I think I need to use the fact that V(s,y) = E[V(t,X_t)|F_s]
Now, expand both sides using Taylor expansion in x (the state variable).
Fix y=X_s. Then,
E[ V(t,X_t) | X_s=y ]
X_t is a random variable given F_s.
Then,
V(s,y) = \int_{-\infty}^{T-s} ... but no.
Let me use the density of X_T given X_s=y.
From time s to T, we have:
g(T,X_T) depends on X_T. But V(t,x)=E[g|F_t] is defined at each t≤T.
But for the conditional expectation from s to t, let's consider that between s and t, the process evolves according to its SDE.
Then,
V(s,y) = E[ g(T,X_T) | X_s=y ]
And V(t,x)=E[g|F_t] given X_t=x.
Now, by Ito's lemma:
The change in state from s to t is due to drift and diffusion.
But we can use the following: The conditional expectation satisfies a PDE because it must be smooth (under some regularity conditions).
In fact, for any fixed y,
E[ g(T,X_T) | X_s=y ] = \int_{-\infty}^{T-s} ... no.
Let me define the transition from s to t:
The state at time t is a function of the initial value and Brownian motion increments.
But perhaps it's easier to use the generator in terms of x only, not t.
Since X_t satisfies an SDE with coefficients μ(u,x) and σ(u,x), then for any bounded measurable function f(x),
E[ g(T,X_T) | F_s ] = E_{X_0} [ g(T,X_T) | X_s=y ]
Now, the conditional expectation given information at time s is a martingale.
But let me use the definition of V(t,x).
We have:
V(s,y) = \mathbb{E}[g(T,X(T))|\mathcal{F}(s)]
and
V(t,x_t) = \mathbb{E}[g(T,X(T)) | x] (with X_s=y)
Then, by Ito's lemma applied to the function f(u,z)=V(u,z):
At time u=s, we have:
dZ_u = μ(s+u,Z_u)du + σ(s+u,Z_u)dW
Now, let me consider that for a small interval [s,t],
E[ V(t,X_t) | F_s ] - V(s,y)
But this is the same as above.
I recall now: The function f(u,x)=V(u,x) must satisfy:
f(s,x) = E_{X_0} [ g(T,X_T) | X_s=x ]
and it can be shown that if we consider a small time interval from s to t, then
E[ V(t+dt, X_{t+dt}) - V(t,X_t) | F_t ] might not help.
But let's use the fact that for any fixed function, the conditional expectation is a martingale.
We already know it is a martingale from part 1.
So by definition of martingale,
E[ V(t+dt, X_{t+dt}) | F_t ] = V(t,X_t)
Now, expand this using Ito's lemma on the function g(T,X_T).
But we are conditioning on information at time s < t.
Let me consider times s and t with s < t. Then,
V(s,y) = E[ V(t,X_t) | F_s ]
Expand the expectation of V(t,x_t):
x_t is X_t, which depends on its path from 0 to t.
But we can write:
E[ g(T,X_T) | F_s ] = \int_{-\infty}^{\infty} p(x|y;s,t) E[ g(T,X_T) | X_t=x ] dy
This is the same as equation (7).
Now, take derivative with respect to t.
But let me use the definition of the generator for V(t,x):
We want that for each fixed y,
E_{X_0} [ g(T,X_T) | X_s=y] = E_{\omega}[g|F_s]
and this function must be smooth in (t,y).
Then, by Ito's lemma applied to f(u,z)=V(s+u,z):
df = (∂f/∂u + ∂f/∂z dZ_u + σ²(z)/2 * ... )
But Z_u is the state at time s+u.
I think I need to use the generator for the process starting from y.
From standard theory, if we have a function f(u,x) that satisfies:
df = (L^{μ,σ}f)(s,y)) ds + σ(s,y) ∂f/∂x dW
Then under some conditions, E[ df | F_s] is given by the generator.
But let me state it clearly: The process V(t,X_t) is a martingale. So its differential must have zero expectation when conditioned on earlier information.
From Ito's lemma:
dV(t,X_t) = (∂f/∂t + μ ∂f/∂x + σ²/2 ∂²f/∂x²) dt + σ(u) ∂f/∂x dW
But this is the differential at time t. When we condition on F_s, s < t, then:
E[ V(t+dt,X_{t+dt}) - V(s,X_s) | F_s ] = 0 for all s<t.
Now, expand V(t,x_t):
Let me use a Taylor expansion in x and t around (s,y).
V(t,z) is defined at time t with state z. So,
E[ V(t,X_t) | X_s=y] - V(s,y)
But by definition of conditional expectation given F_s.
Now, the change from s to t is dt=t-s small? Let me assume that we can interchange differentiation and integration in some way.
Consider:
We have for each fixed y (which is x=X_s),
V(s,y) = E[ g(T,X_T) | X_s=y ]
and
E_{X_0} [ V(t,X_t) | F_s ] should equal V(s,y)
But by Ito's lemma, the value at time t given initial condition y can be written as:
Let me define a function f(u,x)=V(u,x). Then,
f(t,z) = E[ g(T,X_T) | X_t=z ]
Now, we know that for any stopping time τ ≥ s,
E_{X_0}[g|F_s] must satisfy the backward equation.
But let's use the generator.
We have:
dX_u = μ(u,X_u) du + σ(u,X_u) dW
Then, by Ito's lemma applied to f at time u=s:
The infinitesimal generator L^{μ,σ} acts on functions of x only?
Actually, for a function that depends only on the state at t and not on t explicitly, we have:
But in our case, V(t,x) does depend on t.
Let me assume that V(t,x) is twice continuously differentiable in both variables. Then, from equation (6):
V(s,y) = E[ V(t,X_t) | F_s ]
Now, by Ito's lemma applied to the process Z_u = X_{s+u}:
dZ_0 = y
Then,
E[ V(t,Z_t) - V(s,y) + \int_s^t dV(Z_u) | F_s ]
But this is not a differential equation.
Let me consider that for any fixed function, the conditional expectation satisfies the backward equation. I think it's standard to state:
The PDE that V must satisfy is:
∂_t V(t,x) + L^{μ,σ} V = 0
with boundary condition at t=T: V(T,x)=g(T,x)
But let me derive this.
We know from part 1 that V(u,X_u) is a martingale. So for s < t,
E[ V(t+dt, X_{t+dt}) | F_s ] = E[ g(T,X_T) | F_t ]
No, wait: the conditional expectation at time t given information up to s.
But from part 1:
V(s,X_s) = \mathbb{E}[g|F_s]
and
\mathbb{E}[ V(t+dt, X_{t+dt}) | F_s ] - V(s,X_s)
This must be zero for all fixed paths (given the information up to s).
But by Ito's lemma,
V(t,x) = f(t,x), and its differential is given by:
dV(u,z) = (∂f/∂u + ∂f/∂z dZ_u + σ^2(?,?) /2 * ∂²f/∂z² (dZ_u)^2)
But we need to condition on F_s.
I think I found the way.
From equation (6):
V(s,y) = E[ V(t,X_t) | F_s ]
Now, by definition of conditional expectation given information at time s, and since X_t is a continuous martingale from s with generator L^{μ,σ}, then we can use the fact that for any function φ(x),
E_{X_0} [ φ(X_t) | F_s ] = E[ \int_s^t (∂φ/∂x * dX_u + ...) du) + (something) | F_s ]
But let's consider the increment from s to t.
Consider that V(t,x) is a function of T and X(T),
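(The output keeps circling like this and never concludes.) For reference, the intended answer is short; a sketch, assuming $g(T, X(T))$ is integrable and $V$ is smooth enough for Itô's formula. Part 1 is just the tower property: for $s < t$,

$$\mathbb{E}[V(t, X(t)) \mid \mathcal{F}_s] = \mathbb{E}\bigl[\mathbb{E}[g(T, X(T)) \mid \mathcal{F}_t] \mid \mathcal{F}_s\bigr] = \mathbb{E}[g(T, X(T)) \mid \mathcal{F}_s] = V(s, X(s)).$$

For Part 2, Itô's formula gives

$$dV(t, X(t)) = \Bigl(\partial_t V + \mu\,\partial_x V + \tfrac{1}{2}\sigma^2\,\partial_{xx} V\Bigr)\,dt + \sigma\,\partial_x V\,dW(t),$$

and since a martingale has no $dt$ term, the drift must vanish:

$$\partial_t V(t,x) + \mu(t,x)\,\partial_x V(t,x) + \tfrac{1}{2}\sigma^2(t,x)\,\partial_{xx} V(t,x) = 0, \qquad V(T,x) = g(T,x),$$

i.e., the Kolmogorov backward equation.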
NOTE. This is me (who opened this issue). This problem can easily be solved in the HF playground, which I assume serves the non-quantized model.
BTW, if you use the bf16 or fp16 format, there is no problem. So I suppose something went very wrong in the quantized models (like the Q8 and Q6 variants I tested).
Can I ask how you ran the model? Ollama, llama.cpp, or something else?
I tested on Ollama and LM Studio (using llama.cpp). Both performed the same.
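For anyone trying to reproduce, a minimal comparison harness along these lines works via llama-cpp-python (the model path, context size, and DeepSeek-R1's commonly recommended sampling settings of temperature 0.6 / top-p 0.95 are illustrative assumptions, not values confirmed in this thread):

```python
# Minimal harness for comparing GGUF quants on the same prompt.
# Assumes llama-cpp-python is installed (pip install llama-cpp-python);
# the model path below is a placeholder for the quant under test.
from llama_cpp import Llama

PROMPT = (
    "Let V(t, x) = E[g(T, X(T)) | F(t)] where "
    "dX(u) = mu(u, X(u)) du + sigma(u, X(u)) dW(u), X(t) = x.\n"
    "1. Show that V(t, X(t)) is a martingale.\n"
    "2. Derive the PDE that V has to satisfy."
)

llm = Llama(
    model_path="DeepSeek-R1-0528-Qwen3-8B-Q6_K.gguf",  # swap in each quant
    n_ctx=16384,  # reasoning traces are long, so give the context room
)

out = llm.create_completion(
    PROMPT,
    max_tokens=8192,
    temperature=0.6,  # commonly recommended DeepSeek-R1 settings (assumption)
    top_p=0.95,
)
print(out["choices"][0]["text"])
```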
It's also related to this issue in the official repo: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/discussions/4
Someone said they tried Q6_K_XL from us and it works great. Did you use our XL versions or basic Q8_0?
Maybe you could try the latest version of llama.cpp (>b5556); it seems they may have fixed the issue.
At least my Q4_K_M is working now.
It worked. I tested on the latest version of llama.cpp. But the quantized models output significantly worse quality than the original safetensors.
You may try these two questions:
1. Evaluate the stochastic integral (rewrite it into expressions without a dW term, i.e., as integrals with dt terms):
2. We are interested in evaluating the following conditional expectation, for any $t$,
where, for $u \in [t, T]$,
- Show V(...) is a martingale.
- Derive the partial differential equation that V has to satisfy.
So, I mean: although it works now, the quality is still bad.
Test your questions with the Intel quants:
https://huggingface.co/Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound-gguf-q4ks-inc
https://huggingface.co/Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound-awq-inc
https://huggingface.co/Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound-gptq-inc
https://huggingface.co/Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound-inc
Well, I mean a quantized version of the model is never supposed to work exactly the same as the full unquantized version. You can just use the full bf16 or Q8 version for now if the smaller quants don't meet your needs.
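To make "never supposed to work the same" concrete, here is a toy blockwise round-to-nearest int4 quantizer in numpy; a deliberately simplified sketch (real K-quants such as Q4_K are more elaborate, with super-blocks and quantized scales), just to show that every 4-bit quant carries a measurable rounding error:

```python
import numpy as np

# Toy blockwise round-to-nearest 4-bit quantization of a weight row.
rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)  # stand-in for one weight row

block = 32
wq = np.empty_like(w)
for i in range(0, w.size, block):
    b = w[i:i + block]
    scale = np.abs(b).max() / 7.0          # symmetric int4 range [-7, 7]
    q = np.clip(np.round(b / scale), -7, 7)
    wq[i:i + block] = q * scale            # dequantized values

rms_err = np.sqrt(np.mean((w - wq) ** 2))
print(f"relative RMS error: {rms_err / np.sqrt(np.mean(w**2)):.3%}")
```

On random Gaussian weights this gives a relative RMS error of roughly a few percent per tensor, and that error compounds across layers, so some quality gap versus bf16 is expected even when nothing is broken.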
The issue seems to happen for every other quant provider as well, so it's not specific to Unsloth.
I tried the model using Q4_K_XL and it works fine, but when I tried your question it did go into endless looping; then again, so does the safetensors file.
I have tested it on unsloth bf16.
Part 1: V is a martingale.
After a lot of "However" and "But note", it gives the answer:
Consider $V(t, X(t))$ as defined. By the definition of conditional expectation, $V(s, X(s)) = \mathbb{E}[g(T, X(T)) \mid \mathcal{F}_s]$. Now, for $t > s$, we have:

$$\mathbb{E}[V(t, X(t)) \mid \mathcal{F}_s] = \mathbb{E}\left[\mathbb{E}\left[g(T, X(T)) \mid \mathcal{F}_t\right] \mid \mathcal{F}_s\right].$$

Since $\mathcal{F}_s \subseteq \mathcal{F}_t$ (because the filtration is the natural filtration of $W$, and $s < t$), the tower property of conditional expectation states:

$$\mathbb{E}\left[\mathbb{E}\left[g(T, X(T)) \mid \mathcal{F}_t\right] \mid \mathcal{F}_s\right] = \mathbb{E}\left[g(T, X(T)) \mid \mathcal{F}_s\right] = V(s, X(s)).$$

Thus, $V(t, X(t))$ satisfies the martingale property. Additionally, $V(t, X(t))$ is adapted to the filtration $\{\mathcal{F}_t\}$ because $X(t)$ is an Itô process adapted to $\mathcal{F}_t$, and $g(T, X(T))$ is a function of $X(T)$, which is adapted to $\mathcal{F}_T$. Since $\mathcal{F}_t \subseteq \mathcal{F}_T$ for $t \leq T$, the conditional expectation $\mathbb{E}[g(T, X(T)) \mid \mathcal{F}_t]$ is adapted.

Therefore, $V(t, X(t))$ is a martingale with respect to the filtration $\{\mathcal{F}_t\}$.
That's right, basically. A martingale needs three things:
1. V(t, X(t)) is F_t-adapted (the model mentioned this, correct).
2. E[|V(t, X(t))|] < ∞, i.e., V(t, X(t)) is integrable. This point was omitted in the model's proof. Usually this requires assuming that g(T, X(T)) itself is integrable (i.e., E[|g(T, X(T))|] < ∞), and then the integrability of V(t, X(t)) follows from properties of conditional expectation (see the sketch below).
3. E[V(t, X(t)) | F_s] = V(s, X(s)) for s < t (the model proved this, correct).
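A one-line sketch of the omitted integrability step, assuming $\mathbb{E}[|g(T, X(T))|] < \infty$: by conditional Jensen and the tower property,

$$\mathbb{E}\bigl[|V(t, X(t))|\bigr] = \mathbb{E}\Bigl[\bigl|\mathbb{E}[g(T, X(T)) \mid \mathcal{F}_t]\bigr|\Bigr] \le \mathbb{E}\Bigl[\mathbb{E}\bigl[\,|g(T, X(T))|\,\bigm|\,\mathcal{F}_t\bigr]\Bigr] = \mathbb{E}\bigl[|g(T, X(T))|\bigr] < \infty.$$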
Closing this issue for now. If anyone has anything else to share, feel free to comment.