nirajandhakal/StockZero-v2

Reinforcement Learning


nirajandhakal committed on Mar 24

Commit ddb6265 · verified · 1 Parent(s): 7f42a5a

Update Demo Preview video

Files changed (1)

README.md +2 -0


@@ -43,6 +43,8 @@ The model outputs two vectors:
 1.  **Policy**: A probability distribution over `NUM_POSSIBLE_MOVES=4672` representing the probability of making each move, obtained using `softmax` activation.
 2.  **Value**: A single scalar value indicating win/loss probability from current player’s perspective, ranging from -1 (loss) to 1 (win), obtained using `tanh` activation.
+[![StockZero Demo Gameplay Video](https://huggingface.co/nirajandhakal/StockZero/blob/main/demo_video_thumbnail.png)](https://huggingface.co/nirajandhakal/StockZero/blob/main/v2-gameplay-svg-high-quality.mp4)
 ### Model Architecture
 The neural network architecture consists of:
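
For readers of this hunk, the two outputs described in the context lines can be sketched as follows. This is a minimal illustration and not the repository's code: the helper name `heads` and the example logits are hypothetical, and only `NUM_POSSIBLE_MOVES=4672`, `softmax`, and `tanh` come from the README text above.

```python
import math

NUM_POSSIBLE_MOVES = 4672  # value quoted in the README


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def heads(policy_logits, value_logit):
    """Hypothetical helper: map raw network outputs to (policy, value).

    policy: probabilities over all moves (softmax), summing to 1.
    value:  scalar in (-1, 1) from the current player's perspective (tanh).
    """
    if len(policy_logits) != NUM_POSSIBLE_MOVES:
        raise ValueError("expected one logit per possible move")
    return softmax(policy_logits), math.tanh(value_logit)


# Made-up logits: move index 0 is slightly preferred over all others.
example_logits = [0.0] * NUM_POSSIBLE_MOVES
example_logits[0] = 2.0
policy, value = heads(example_logits, 0.5)
```

In this sketch `policy` sums to 1 and `value` stays strictly between -1 and 1, which matches the ranges stated in the README.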