tags:
- time-series-foundation-models
---
# Sundial
Sundial is a family of **generative** time series foundation models. The models can make zero-shot predictions for both **point** and **probabilistic** forecasting.

The base version is pre-trained on **1 trillion** time points with **128M** parameters.

Figure 1. Overall architecture of Sundial. The input time series is divided into patch tokens, which are embedded from the original continuous values. The patch embeddings are fed into a decoder-only Transformer, a stabilized and accelerated variant that learns token representations via causal self-attention. The model is optimized using our TimeFlow Loss, a parameterized loss function that models the per-token probability distribution conditioned on the learned representations and generates multiple plausible predictions under the flow-matching framework.
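
To make the pipeline in the caption concrete, here is a minimal sketch of a patch-embedding front end plus a generic per-token conditional flow-matching objective and sampler. It is an illustration only: the module names (`PatchEmbed`, `FlowMatchingHead`), the sizes, and the linear interpolation path are our assumptions, not the actual Sundial or TimeFlow Loss implementation.

```python
# Illustrative sketch only -- not the released Sundial / TimeFlow Loss code.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split a series into non-overlapping patches and embed each patch as a token."""
    def __init__(self, patch_len: int = 16, d_model: int = 768):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:          # x: [batch, length]
        patches = x.unfold(-1, self.patch_len, self.patch_len)   # [batch, tokens, patch_len]
        return self.proj(patches)                                 # [batch, tokens, d_model]

class FlowMatchingHead(nn.Module):
    """Velocity network conditioned on a token representation from the causal Transformer."""
    def __init__(self, patch_len: int = 16, d_model: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(patch_len + d_model + 1, d_model),
            nn.GELU(),
            nn.Linear(d_model, patch_len),
        )

    def forward(self, x_t, t, h):
        return self.net(torch.cat([x_t, t, h], dim=-1))

def flow_matching_loss(head, h, target):
    """Per-token conditional flow matching: regress the velocity (target - noise)
    along the linear path x_t = (1 - t) * noise + t * target."""
    noise = torch.randn_like(target)
    t = torch.rand(*target.shape[:-1], 1)
    x_t = (1 - t) * noise + t * target
    return ((head(x_t, t, h) - (target - noise)) ** 2).mean()

@torch.no_grad()
def sample_next_patch(head, h, patch_len: int = 16, steps: int = 16):
    """Draw one plausible next patch by Euler-integrating the learned velocity field."""
    x = torch.randn(*h.shape[:-1], patch_len)
    for i in range(steps):
        t = torch.full((*h.shape[:-1], 1), i / steps)
        x = x + head(x, t, h) / steps
    return x
```

In the model described by the caption, the representations `h` come from the decoder-only Transformer over the patch embeddings, and repeating the sampler with fresh noise yields multiple plausible predictions.
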
## Evaluation
We evaluate performance on the following benchmarks:

- [Gift-Eval](https://cdn-uploads.huggingface.co/production/uploads/64fbe24a2d20ced4e91de38a/AXhZLVGR8Cnuxe8CVK4Fu.png).
- [FEV Leaderboard](https://cdn-uploads.huggingface.co/production/uploads/64fbe24a2d20ced4e91de38a/AXhZLVGR8Cnuxe8CVK4Fu.png).
- [TSLib Dataset](https://cdn-uploads.huggingface.co/production/uploads/64fbe24a2d20ced4e91de38a/AXhZLVGR8Cnuxe8CVK4Fu.png).

We evaluate inference speed against the following time series foundation models:

- [FEV Leaderboard](https://cdn-uploads.huggingface.co/production/uploads/64fbe24a2d20ced4e91de38a/AXhZLVGR8Cnuxe8CVK4Fu.png).

We are actively working on this and are glad to hear suggestions and noteworthy cases :)

## Quickstart
```
pip install transformers==4.40.1 # Use this version and Python 3.10 for stable compatibility
```
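
After installation, a zero-shot forecasting call looks roughly like the sketch below. Treat it as an outline to check against the linked notebook: the repository id (`thuml/sundial-base-128m`), the `generate` keyword arguments, and the tensor shapes are assumptions rather than a verified API reference.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the pre-trained model (uses custom modeling code shipped with the checkpoint).
model = AutoModelForCausalLM.from_pretrained(
    "thuml/sundial-base-128m",   # assumed repository id for the base (128M) version
    trust_remote_code=True,
)

# One univariate context window; random values are used here purely for illustration.
batch_size, context_length = 1, 2880
seqs = torch.randn(batch_size, context_length)

# Zero-shot forecast: 96 future points, 20 sampled trajectories (assumed arguments).
forecast_length = 96
output = model.generate(seqs, max_new_tokens=forecast_length, num_samples=20)

print(output.shape)  # generate 20 probable predictions
```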
A notebook example is also provided [here](https://github.com/thuml/Sundial/blob/main/examples/quickstart_zero_shot.ipynb). Try it out!
## Specification
* Architecture: Causal Transformer (Decoder-only)
* Number of Layers: 12
* Speedup with KV Cache & FlashAttention
## Acknowledgments
This work was supported by the National Natural Science Foundation of China (62022050 and U2342217), the BNRist Innovation Fund (BNR2024RC01010), and the National Engineering Research Center for Big Data Software.