RWKV Open Source Development Blog

🐦 RWKV v6 Finch 14B is here!

From 14B down to 7B, 3B, and 1.6B, here are the various RWKV v6 models

RWKV
Sep 03, 2024

Announcing the latest RWKV model: Finch 14B!

Finch is the 6th and latest version of the RWKV architecture, succeeding the Eagle / v5 line of models. Finch improves upon Eagle by introducing data-dependence into the token shift and time-mixing, making Finch more efficient at managing its “long-term memory” as it processes a prompt and thereby giving it better effective range.
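To make the idea concrete, here is a minimal, hedged sketch of a Finch-style data-dependent token shift (the “ddlerp” described in the paper), written in PyTorch. The dimensions, low-rank size, and zero initialization are illustrative assumptions, not the released RWKV-LM code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DataDependentTokenShift(nn.Module):
    """Sketch of a Finch-style data-dependent token shift (ddlerp).
    Hyperparameters and initialization are assumptions for illustration."""

    def __init__(self, dim: int, lora_rank: int = 32):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(dim))   # static mix weight, as in Eagle
        self.lam = nn.Parameter(torch.zeros(dim))  # bias of the low-rank adapter
        self.A = nn.Parameter(torch.zeros(dim, lora_rank))
        self.B = nn.Parameter(torch.zeros(lora_rank, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); x_prev holds the previous token's representation.
        x_prev = F.pad(x, (0, 0, 1, -1))  # shift the sequence right by one step
        delta = x_prev - x
        # Eagle mixes x and x_prev with a fixed learned weight mu; Finch modulates
        # that weight per token with a data-dependent low-rank (LoRA-style) term.
        coarse = x + delta * self.mu
        mix = self.lam + torch.tanh(coarse @ self.A) @ self.B
        return x + delta * mix
```

In Eagle the mix weight is a constant per channel; here it varies with the current and previous token, which is what lets the model decide, token by token, how much old context to carry forward.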

The Finch architecture is covered in detail alongside Eagle in https://arxiv.org/pdf/2404.05892. Finch models smaller than 14B have been appearing throughout 2024, and 14B is the largest Finch trained to date. It is also the largest RWKV model of any kind: the largest Eagle trained was 7B.

Training details and Evals

Both Finch 7B and Finch 14B are derived from continued training of the Eagle 7B weights on the same dataset (known as World v2.1, the constituents of which are described here). The 14B model is initialized by stacking two copies of the 7B model. Stacking effectively increases the short-term memory of the model (i.e. how much of the exact prompt feeds into the NN layers at each level), which has a different effect from widening the model.
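To illustrate what “stacking two copies” can look like in practice, here is a minimal sketch that duplicates a checkpoint’s per-layer weights to initialize a model twice as deep. The `blocks.<i>.` key naming, the 32-layer count, and appending the copy after the original are assumptions for illustration; the actual 14B initialization recipe may differ.

```python
import torch

def stack_checkpoint(state_dict: dict, n_layers: int) -> dict:
    """Duplicate per-block weights so an n_layers-deep checkpoint can
    initialize a model twice as deep. Key layout is an assumption."""
    stacked = {}
    for name, tensor in state_dict.items():
        if name.startswith("blocks."):
            _, idx, rest = name.split(".", 2)
            i = int(idx)
            stacked[f"blocks.{i}.{rest}"] = tensor                     # original position
            stacked[f"blocks.{i + n_layers}.{rest}"] = tensor.clone()  # appended copy
        else:
            stacked[name] = tensor  # embeddings, head, and outer norms are kept once
    return stacked

# Hypothetical filenames, purely for illustration:
ckpt = torch.load("eagle-7b.pth", map_location="cpu")
torch.save(stack_checkpoint(ckpt, n_layers=32), "finch-14b-init.pth")
```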

We evaluated the Finch models using https://github.com/RWKV/lm-evaluation-harness, a fork of the standard LLM evaluation framework that also powers HuggingFace’s Open LLM Leaderboard (the fork exists only to make the harness run via our automation).
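As a rough illustration of how such an evaluation can be driven, the sketch below assumes the fork keeps the upstream harness’s Python entry point (`lm_eval.simple_evaluate`); the model id and the three tasks shown are placeholders, not the actual 235-benchmark configuration.

```python
import lm_eval

# Hedged sketch: assumes the fork exposes the upstream simple_evaluate() API.
results = lm_eval.simple_evaluate(
    model="hf",                                    # HuggingFace-compatible backend
    model_args="pretrained=RWKV/v6-Finch-14B-HF",  # hypothetical repo id
    tasks=["arc_challenge", "hellaswag", "winogrande"],  # tiny subset for illustration
    batch_size=8,
)
print(results["results"])
```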

We ran a wide variety of benchmarks (235 in total), attempting to maximize breadth while keeping computation time manageable: each model took two days to evaluate in our setup (!).

Finch 7B improved +5.38% across all benchmarks, and Finch 14B improved a further +7.14% (both figures relative to Eagle 7B). Given that Eagle 7B was the starting point for training both models, some increase was a given; the size of the increase is evidence of the value of Finch’s architectural changes, and also shows that the model’s depth is not saturated by our 1.42T-token training run.
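For clarity, “+X% across all benchmarks” can be read as an average score delta over the benchmark suite. The toy sketch below shows one plausible way to compute such a figure (percentage-point deltas averaged over shared benchmarks); the aggregation rule and the scores are assumptions for illustration, not our actual numbers or script.

```python
def mean_improvement(baseline: dict, candidate: dict) -> float:
    """Average score delta (percentage points) over benchmarks both models share."""
    shared = baseline.keys() & candidate.keys()
    return sum(candidate[b] - baseline[b] for b in shared) / len(shared)

# Toy, made-up scores purely to show the arithmetic:
eagle_7b = {"arc_easy": 70.0, "hellaswag": 72.0, "winogrande": 67.0}
finch_7b = {"arc_easy": 74.0, "hellaswag": 77.0, "winogrande": 74.0}
print(f"{mean_improvement(eagle_7b, finch_7b):+.2f}")  # +5.33 on these toy numbers
```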

Focusing specifically on the Open LLM Leaderboard v1 benchmarks shows the same pattern of improvement from Eagle 7B through Finch 7B to Finch 14B.

Contributing GPU cluster time to RWKV!

RWKV is an open source project recognized by the Linux Foundation. The project has various bottlenecks, and GPU time is one of them, so we gratefully accept donations. If your organization has idle GPU time, please reach out to eugene@rwkv.com or nathan@rwkv.com to explore a donation and learn what kinds of training runs those spare cycles could power.


References

  • Model weights:

    • 14B

    • 7B

    • 3B

    • 1.6B

  • Hosted inference: https://featherless.ai/models/RWKV/Finch-14B

  • Training code: https://github.com/BlinkDL/RWKV-LM
