mædoc's notes

grokking RWKV

WIP over winter holidays

Resources

https://github.com/huggingface/candle/blob/main/candle-transformers/src/models/rwkv_v6.rs Rust impl of v6

https://github.com/Jellyfish042/LongMamba/tree/main context length verification

Stuff to try

v5-v7 math overview

The innovations in RWKV are mainly in the time mixing parts, so let's cover those first.

Eagle (v5)

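Drops v4's recurrently computed ratio of exponentials (a normalized, exponentially weighted average of values) and instead uses a matrix-valued state per head that accumulates key-value outer products under a fixed, learned per-channel decay.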
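
To make the matrix-valued state concrete, here is a minimal NumPy sketch of a single-head recurrence in the Eagle style. The names `eagle_time_mix`, `w` (fixed per-channel decay) and `u` (bonus applied to the current token) are my own labels, and token shift, the output gate and the per-head normalization are left out, so treat it as an illustration under those assumptions rather than the reference implementation.

```python
import numpy as np

def eagle_time_mix(r, k, v, w, u):
    """Single-head recurrence with a matrix-valued state (simplified sketch).

    r, k, v: arrays of shape (T, D); w, u: arrays of shape (D,).
    Returns the outputs, shape (T, D), and the final state, shape (D, D).
    """
    T, D = k.shape
    S = np.zeros((D, D))   # rows indexed by key channel, columns by value channel
    out = np.zeros((T, D))
    for t in range(T):
        kv = np.outer(k[t], v[t])                # rank-1 key-value update
        out[t] = r[t] @ (u[:, None] * kv + S)    # current token gets bonus u, the past comes from S
        S = w[:, None] * S + kv                  # fixed per-channel decay, then write
    return out, S
```

With `w` and `u` set to all ones this collapses to plain causal linear attention without normalization, which is a handy sanity check on the indexing.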

Finch (v6)

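Uses a data-dependent decay: the per-channel decay is computed from the input at each step instead of being a fixed learned vector.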
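
A rough sketch of what a data-dependent decay could look like, assuming a learned base `w0` plus a low-rank (LoRA-style) correction with weights `A` and `B`, squashed through a double exponential. All three names are hypothetical, and the token-shift interpolation that feeds the real layer is omitted, so this only illustrates the idea that the decay has shape (T, D) rather than (D,).

```python
import numpy as np

def finch_decay(x, w0, A, B):
    """Per-token, per-channel decay in (0, 1), computed from inputs x of shape (T, D)."""
    d = w0 + np.tanh(x @ A) @ B    # low-rank, input-dependent offset around the base w0
    return np.exp(-np.exp(d))      # double exponential maps any real d into (0, 1)

def finch_time_mix(r, k, v, w, u):
    """Matrix-state recurrence where the decay w has shape (T, D) instead of (D,)."""
    T, D = k.shape
    S = np.zeros((D, D))
    out = np.zeros((T, D))
    for t in range(T):
        kv = np.outer(k[t], v[t])
        out[t] = r[t] @ (u[:, None] * kv + S)
        S = w[t][:, None] * S + kv   # decay now varies with the token at step t
    return out, S
```

Setting `A` to zeros makes every row of the decay identical, which recovers the fixed-decay case.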

Goose (v7)

Treats the state update as an explicit in-context iterative least-squares step (a generalized delta rule), which is reminiscent of the update step of a Kalman filter?
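
To see the least-squares view, here is a sketch of the plain delta rule, which is the simplest member of this family and not the full v7 update (v7 adds per-channel decay and in-context learning rates on top). The function name `delta_rule_step` and the scalar `lr` are my own assumptions. The state S has shape (Dk, Dv) and is read with `k @ S`.

```python
import numpy as np

def delta_rule_step(S, k, v, lr=1.0):
    """One gradient step on L(S) = 0.5 * ||v - k @ S||^2 for a single (k, v) pair.

    The gradient of L with respect to S is -outer(k, v - k @ S), so the step
    adds lr * outer(k, error).
    """
    error = v - k @ S                      # what the state currently gets wrong for this key
    return S + lr * np.outer(k, error)     # same as (I - lr k k^T) S + lr k v^T

# Usage: with unit-norm keys and lr=1, one step makes retrieval exact,
# and rewriting a key replaces its old value instead of adding to it.
keys = np.eye(3)
S = np.zeros((3, 2))
S = delta_rule_step(S, keys[0], np.array([1.0, 2.0]))
S = delta_rule_step(S, keys[0], np.array([5.0, 6.0]))   # overwrite the same key
retrieved = keys[0] @ S                                 # the newer value
```

The "state plus gain times prediction error" shape of the update is the part that looks like a Kalman filter's measurement update, though here the gain is just `lr * k` rather than something derived from a covariance.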

Channel mixing

some tricks with ICL

Working through these steps reinforced how the state S acts as a key-value cache for retrieval.
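
A toy illustration of the kv-cache reading of S, assuming purely additive writes and no decay (`write` and `read` are hypothetical helper names): writing stores `outer(k, v)`, and reading with a query returns each stored value weighted by the dot product between the query and its key.

```python
import numpy as np

def write(S, k, v):
    """Additive write: store value v under key k as a rank-1 update."""
    return S + np.outer(k, v)

def read(S, q):
    """Read: each stored value comes back weighted by the dot product q . k_i."""
    return q @ S

k1, k2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
v1, v2 = np.array([1.0, 2.0]), np.array([3.0, 4.0])
S = write(write(np.zeros((3, 2)), k1, v1), k2, v2)

exact = read(S, k1)                        # orthonormal keys: recovers v1 with no crosstalk
mixed = read(S, (k1 + k2) / np.sqrt(2))    # a query between the keys returns a blend of v1 and v2
```

Unlike a transformer's kv cache, S stays the same size no matter how many pairs are written, so retrieval is only exact while the keys are close to orthogonal.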

#llm #rwkv