why are LLMs differentiable?
interpolation between tokens, thanks to embeddings? the token IDs themselves are discrete; it’s the embedding lookup that puts them in a continuous space where gradients exist. what does differentiability of token sequences mean for language? there’s an exercise in embedded cognition somewhere!
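a minimal sketch of that point (a toy model, not any particular LLM): the embedding table maps each discrete ID to a continuous vector, everything after that lookup is differentiable, and you can interpolate between two token embeddings and get a gradient w.r.t. the interpolation weight — something that has no meaning for the integer IDs.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 100, 16
embed = nn.Embedding(vocab_size, d_model)   # discrete id -> continuous vector
lm_head = nn.Linear(d_model, vocab_size)    # toy "LM": a single linear layer

tok_a, tok_b = torch.tensor([3]), torch.tensor([42])

# alpha is a continuous knob sitting between two tokens; gradients flow into it
alpha = torch.tensor(0.5, requires_grad=True)
x = (1 - alpha) * embed(tok_a) + alpha * embed(tok_b)   # a point "between" tokens

logits = lm_head(x)                                     # (1, vocab_size)
loss = -torch.log_softmax(logits, dim=-1)[0, 42]        # "predict token b" objective
loss.backward()

print(alpha.grad)   # well-defined; no analogous gradient exists w.r.t. the integer IDs
```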
maybe in expressing the relation between movement in space and language: an agent that describes moving through a maze and one that actually moves through it probably show similar activation patterns. grounding LLMs in an (artificial) reality seems to be the next trend:
- https://blog.eleuther.ai/minetester-intro/
- https://www.ft.com/content/7c3dafa8-ffb9-4ca8-b677-ab3cc2afbdcb
this could be tested concretely with less hardware, at a smaller scale, with a small LM in e.g. an ASCII roguelike game (rough sketch below). maybe pair that with an active-inference-style approach.
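a rough sketch of what that test could look like (everything here is hypothetical scaffolding, not an existing codebase): a tiny ASCII gridworld whose observations are plain text, so a small LM could be dropped in as the policy. the `policy` below is a random stand-in; the idea would be to replace it with a small LM scoring "up/down/left/right" continuations of the ASCII observation, and to compare activations for described vs. enacted movement.

```python
import random

WORLD = [
    "#######",
    "#@....#",
    "#.###.#",
    "#....>#",
    "#######",
]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def render(grid, pos):
    # plain-text observation: the map with the agent drawn at its current position
    rows = [list(r.replace("@", ".")) for r in grid]
    rows[pos[0]][pos[1]] = "@"
    return "\n".join("".join(r) for r in rows)

def step(grid, pos, move):
    # walls block movement, everything else is walkable
    dr, dc = MOVES[move]
    r, c = pos[0] + dr, pos[1] + dc
    return pos if grid[r][c] == "#" else (r, c)

def policy(observation: str) -> str:
    # stand-in for a small LM conditioned on the ASCII observation
    return random.choice(list(MOVES))

pos = (1, 1)
for t in range(20):
    obs = render(WORLD, pos)
    pos = step(WORLD, pos, policy(obs))
    if WORLD[pos[0]][pos[1]] == ">":
        print(f"reached the exit in {t + 1} steps")
        break
```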