arxiv:2405.15682

The Road Less Scheduled

Published on May 24 · Submitted by akhaliq on May 27
Abstract

Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available (https://github.com/facebookresearch/schedule_free).
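To make the core idea concrete, below is a minimal sketch of the Schedule-Free SGD update on a toy least-squares problem. The method maintains a base iterate z_t and a running average x_t; the gradient is evaluated at the interpolation y_t = (1 - beta) z_t + beta x_t, and x_t is the point used at evaluation time. The toy objective, step size, and variable names here are illustrative choices, not from the paper, and warmup is omitted:

```python
import numpy as np

# Toy least-squares objective f(w) = (1/2n) ||A w - b||^2 (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
b = rng.standard_normal(100)

def grad(w):
    return A.T @ (A @ w - b) / len(b)

beta, lr, steps = 0.9, 0.5, 1000
z = np.zeros(10)  # base SGD iterate z_t
x = np.zeros(10)  # averaged point x_t, used for evaluation

for t in range(1, steps + 1):
    y = (1 - beta) * z + beta * x      # gradient point y_t
    z = z - lr * grad(y)               # z_{t+1} = z_t - lr * grad f(y_t)
    x = (1 - 1 / t) * x + (1 / t) * z  # x_{t+1} = (1 - c_t) x_t + c_t z_{t+1}, c_t = 1/t

print("objective at averaged point x:", 0.5 * np.mean((A @ x - b) ** 2))
```

Note there is no learning rate decay anywhere in the loop: the uniform averaging weight c_t = 1/t plays the role that a schedule (and its stopping time T) would otherwise play.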

Community


In Appendices G.2, G.3, G.5, and G.6, there is a hyperparameter called Schedule-Free warmup, which is set to 5%.

How can you set this hyperparameter if you don't know the optimization stopping time T in advance?

Paper author

Normally you would just set the warmup parameter to be a fixed number of steps; it's not necessary to scale it with the length of the training run. The percentages in the appendix are just to make it easy to see how long the warmup was.
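For reference, in the open-source implementation linked in the abstract, warmup is passed as an absolute step count. Here is a minimal usage sketch, assuming the pip-installable schedulefree package and its AdamWScheduleFree optimizer (check the repository README for the current interface); the model and training loop are placeholders:

```python
import torch
import schedulefree

model = torch.nn.Linear(10, 1)  # placeholder model

# warmup_steps is a fixed number of steps, chosen independently of the
# (unknown) total training length T.
optimizer = schedulefree.AdamWScheduleFree(
    model.parameters(), lr=1e-3, warmup_steps=1000
)

optimizer.train()  # Schedule-Free optimizers need explicit train/eval switching
for step in range(10_000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()
    loss.backward()
    optimizer.step()

optimizer.eval()  # evaluate or checkpoint with the averaged parameters
```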


