<div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small;color:#333333">Digging a bit more into the arxiv paper: <a href="https://arxiv.org/pdf/2301.05217">https://arxiv.org/pdf/2301.05217</a></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small;color:#333333"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small;color:#333333">I am a bit surprised that the network bothers to 'discover' discrete fourier transforms rather than discovering convolutions in the home domain.<br><br>It is also surprising that the 'phase transition' to generalization is relatively smooth wrt continued performance over the task. The network appears to use memorization as scaffolding toward further amplifications of structured mechanisms, this is then followed by garbage collection over the scaffolding. Is this kind of thing specific to these architectures? Is there evidence for something similar with us?<br></div></div>