[FRIAM] Grokking Mechanistic Interpretability

Thu Aug 15 12:06:49 EDT 2024

Digging a bit more into the arXiv paper: https://arxiv.org/pdf/2301.05217

I am a bit surprised that the network bothers to 'discover' discrete
Fourier transforms rather than learning the convolution directly in the
original (untransformed) domain.
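
To make the comparison concrete, here is a minimal sketch of the two routes
for addition mod p. It is my own illustration, not code from the paper, and
the function names and the one-hot encoding are assumptions. The first
function convolves one-hot vectors circularly in the original domain; the
second multiplies their DFTs pointwise and inverts, which by the convolution
theorem gives the same answer.

```python
import numpy as np

def one_hot(k, p):
    """Length-p indicator vector for the residue k mod p."""
    v = np.zeros(p)
    v[k % p] = 1.0
    return v

def add_mod_p_direct(a, b, p):
    """Circular convolution of the two one-hot vectors, computed term by term."""
    x, y = one_hot(a, p), one_hot(b, p)
    out = np.zeros(p)
    for i in range(p):
        for j in range(p):
            out[(i + j) % p] += x[i] * y[j]
    return int(np.argmax(out))

def add_mod_p_fourier(a, b, p):
    """Same convolution via the DFT: transform, multiply pointwise, invert."""
    x, y = one_hot(a, p), one_hot(b, p)
    out = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real
    return int(np.argmax(out))

if __name__ == "__main__":
    p = 113  # illustrative prime modulus
    print(add_mod_p_direct(100, 50, p), add_mod_p_fourier(100, 50, p))
```

Both routes compute the same function; the sketch only shows that they are
interchangeable on paper, not why training would favor one over the other.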

It is also surprising that the 'phase transition' to generalization is
relatively smooth with respect to continued performance on the task. The
network appears to use memorization as scaffolding while it amplifies
structured mechanisms, and then garbage-collects the scaffolding. Is this
kind of thing specific to these architectures? Is there evidence for
something similar in us?