[FRIAM] NickC channels DaveW
Steve Smith
sasmyth at swcp.com
Wed Jan 18 12:51:21 EST 2023
Marcus wrote:
> Cerebras says they can scale past 1 trillion parameters...
>
> Wafer Scale to ‘Brain-Scale’ – Cerebras Touts Linear Scaling up to 192
> CS-2 Systems
> <https://www.hpcwire.com/2021/08/24/wafer-scale-to-brain-scale-cerebras-touts-linear-scaling-for-up-to-192-cs-2-systems/>
>
> That would be a power budget of an HPC center, but not out of the
> ordinary. Less than 10 MW. AWS, Azure, Google and national labs
> have facilities like that.
This post/response has a dual purpose... one is to respond to / question
specifics of this modern (special-purpose) cluster architecture, and the
other is to *allude to* Glen's point about the topology/conformation of
the "networks" (in this case the computational fabric) vs. simple
brute-force scaling (e.g. count of processing elements vs. neurons/axons
vs. microbiome flux, etc.).
Thanks for this reference... I haven't been (even a poser as) an HPC
wonk for about two decades, but I was surprised when I drilled down on
the "SwarmX Fabric"
<https://f.hubspotusercontent30.net/hubfs/8968533/Virtual%20Booth%20Docs/CS%20Weight%20Streaming%20White%20Paper%20111521.pdf>
that Cerebras quotes/advertises/invokes and was unable to find many
details on what they mean by "weight streaming" and quite what a
"bidirectional tree topology" is (might just be my rustiness in
cluster-interconnect-fabric terminology, or my limited
focus/attention-span?). I can parse the words, but I am having a
problem finding (understanding?) the specifics.
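For my own sanity, here is the toy mental model I ended up with... a
sketch under my own assumptions and with my own names (the white paper
may well mean something quite different): a central weight store
streams layer weights *down* a fan-out tree to the compute elements,
and gradients flow back *up* the same tree, getting summed at each
branch... hence, I presume, "bidirectional".

    # Toy model of "weight streaming" over a "bidirectional tree".
    # All names and structure here are mine, not Cerebras's.
    class Node:
        def __init__(self, children=()):
            self.children = list(children)
            self.weights = None

        def receive(self, chunk):
            self.weights = chunk  # a real leaf would compute with it

        def local_gradient(self):
            return 1.0 if not self.children else 0.0  # leaves only

    def broadcast(node, chunk):
        # Stream a weight chunk down the tree (root -> leaves).
        node.receive(chunk)
        for child in node.children:
            broadcast(child, chunk)

    def reduce_up(node):
        # Sum gradients on the way back up (leaves -> root).
        grads = sum(reduce_up(c) for c in node.children)
        return node.local_gradient() + grads

    # Four compute leaves behind two intermediate "switches":
    root = Node([Node([Node(), Node()]), Node([Node(), Node()])])
    broadcast(root, chunk=[0.1, 0.2, 0.3])
    print(reduce_up(root))  # 4.0 ... one unit per leaf

If I read it right, the appeal is that the compute elements never have
to *hold* all trillion-odd parameters at once; the weights stream past
them, layer by layer.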
I am taking the terminology about "weight sparsity" to be somewhat
specific to neural-net training/models, but also maybe not that far
from the more general problems encountered in sparse matrix solvers?
In 2001-2003 I worked on a *simulator* for the ASCI Q-machine
<https://www.lanl.gov/conferences/salishan/salishan2003/morrison.pdf>
before the machine itself was delivered/constructed/debugged/testable,
and became modestly familiar with its topology and with problems that
*seem* (superficially) parallel to those presented in this Cerebras
system.
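The analogy in miniature, as I see it (purely illustrative... I am not
claiming this is how either the Cerebras stack or the old ASCI solvers
actually laid anything out): exploiting "weight sparsity" looks a lot
like a compressed-sparse-row matrix-vector multiply, where you only
ever touch the nonzero entries.

    # Weight sparsity as sparse matvec, in miniature (illustrative only).
    import numpy as np
    from scipy.sparse import csr_matrix

    rng = np.random.default_rng(0)
    dense = rng.standard_normal((1000, 1000))
    dense[rng.random((1000, 1000)) < 0.9] = 0.0  # zero ~90% of "weights"

    W = csr_matrix(dense)         # store only the surviving ~10%
    x = rng.standard_normal(1000)

    y_sparse = W @ x              # touches only the nonzeros
    y_dense = dense @ x           # touches every entry
    assert np.allclose(y_sparse, y_dense)
    print(f"{W.nnz} nonzeros out of {dense.size} entries")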
The Quadrics(tm) 4x4 high-speed interconnects used to build a
hierarchical(ish) (Fat H-Tree
<https://www.researchgate.net/publication/260585525_Expandable_and_Cost-Effective_Network_Structures_for_Data_Centers_Using_Dual-Port_Servers/figures?lo=1>)
switching fabric (overlapping quad-tree) seem similar in concept, if
not in spec/implementation, to what Cerebras is building. The
Q-machine was perhaps designed for more general-purpose problems, but
the flagship problem of the day (as I understood it) WAS huge sparse
matrix solvers, albeit addressing Computational Fluid Dynamics and
Radiation Transport (rather than Large Language) models. GPU
computational fabric was just becoming a thing at the time; it is
amazing what the special-purpose *tensor processors* of today seem to
be capable of.
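A back-of-envelope (my own simplification, toy units) for why these
fabrics go "fat" toward the root: in a plain tree every link has the
same capacity, so all traffic between the two halves of the machine
squeezes through the root, whereas a fat tree thickens the links at
each level up so that bisection bandwidth keeps pace with the number
of leaves.

    # Plain tree vs. fat tree bisection bandwidth, in units of one
    # leaf link's capacity. My simplification, not anyone's spec.
    def bisection_bw(leaves, fat=False):
        # plain: every link is capacity 1, so the cut at the root is
        # a single-unit bottleneck no matter how many leaves.
        # fat: link capacity doubles each level up, so each link just
        # below the root carries leaves/2 units.
        return leaves // 2 if fat else 1

    for n in (4, 64, 1024):
        print(n, "leaves -> plain:", bisection_bw(n),
              " fat:", bisection_bw(n, fat=True))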
There are probably some HPC wonks here who *do* know this off the top of
their heads, but I was fascinated (as I drilled down) to discover how
(relatively) similar the architectures and the problems seem (to me).
It is likely that I am just the hammer seeing everything as a nail, of
course.
Our "garish" collection of visual representations of the Q machine
switching fabric (with simulated traffic on a simulated problem), much
better experienced en-virtu, of course! Ed will probably remember
seeing this "back in the day"...
https://www.researchgate.net/publication/220586586_Graph_Visualization_for_the_Analysis_of_the_Structure_and_Dynamics_of_Extreme-Scale_Supercomputers/figures?lo=1
nod to Glen... I might be able to install/load/try the OpenAI model if
I weren't wasting so much time careening down memory lane, trying to
register what I see in my rear-view mirrors with what is screaming
down the Autobahn-of-the-mind through my windscreen!
Or maybe it is ChatGPT/GPT-3 actually *writing* this post for me? Or
have *I* become "one" with my AI-Overlord "dual" who is collaborating
with me on this? What is the distinction between parasitic and symbiotic?
Who will ever know...