[FRIAM] NickC channels DaveW
Steve Smith
sasmyth at swcp.com
Wed Jan 18 12:51:21 EST 2023
Marcus wrote:
> Cerebras says they can scale past 1 trillion parameters...
>
> Wafer Scale to ‘Brain-Scale’ – Cerebras Touts Linear Scaling up to 192
> CS-2 Systems
> <https://www.hpcwire.com/2021/08/24/wafer-scale-to-brain-scale-cerebras-touts-linear-scaling-for-up-to-192-cs-2-systems/>
>
> That would be a power budget of an HPC center, but not out of the
> ordinary. Less than 10 MW. AWS, Azure, Google and national labs
> have facilities like that.
This post/response has a dual purpose... one is to respond to / question
specifics of this modern (special-purpose) cluster architecture, and the
other is to *allude to* Glen's point about the topology/conformation of
the "networks" (in this case the computational fabric) vs. simple
brute-force scaling (e.g. count of processing elements vs. neurons/axons
vs. microbiome flux, etc.).
Thanks for this reference... I haven't been (even a poser as) an HPC
wonk for about two decades, but I was surprised when I drilled down on
the "SwarmX Fabric"
<https://f.hubspotusercontent30.net/hubfs/8968533/Virtual%20Booth%20Docs/CS%20Weight%20Streaming%20White%20Paper%20111521.pdf>
that Cerebras quotes/advertises/invokes and was unable to find many
details on what they mean by "weight streaming" and quite what a
"bidirectional tree topology" is (might just be my rustiness in
cluster-interconnect-fabric terminology, or my limited
focus/attention-span?). I can parse the words, but I am having a
problem finding (understanding?) the specifics.
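For my own sanity, here is the toy mental model I ended up with... a
sketch under my own assumptions and with my own names (the white paper
may well mean something quite different): a central weight store
streams layer weights *down* a fan-out tree to the compute elements,
and gradients flow back *up* the same tree, getting summed at each
branch... hence, I presume, "bidirectional".

    # Toy model of "weight streaming" over a "bidirectional tree".
    # All names and structure here are mine, not Cerebras's.
    class Node:
        def __init__(self, children=()):
            self.children = list(children)
            self.weights = None

        def receive(self, chunk):
            self.weights = chunk  # a real leaf would compute with it

        def local_gradient(self):
            return 1.0 if not self.children else 0.0  # leaves only

    def broadcast(node, chunk):
        # Stream a weight chunk down the tree (root -> leaves).
        node.receive(chunk)
        for child in node.children:
            broadcast(child, chunk)

    def reduce_up(node):
        # Sum gradients on the way back up (leaves -> root).
        grads = sum(reduce_up(c) for c in node.children)
        return node.local_gradient() + grads

    # Four compute leaves behind two intermediate "switches":
    root = Node([Node([Node(), Node()]), Node([Node(), Node()])])
    broadcast(root, chunk=[0.1, 0.2, 0.3])
    print(reduce_up(root))  # 4.0 ... one unit per leaf

If I read it right, the appeal is that the compute elements never have
to *hold* all trillion-odd parameters at once; the weights stream past
them, layer by layer.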
I am taking the terminology about "weight sparsity" to be somewhat
specific to neural-net training/models, but also maybe not that far
from the more general problems encountered in sparse matrix solvers?
In 2001-2003 I worked on a *simulator* for the ASCI Q-machine
<https://www.lanl.gov/conferences/salishan/salishan2003/morrison.pdf>
before the machine itself was delivered/constructed/debugged/testable,
and became modestly familiar with its topology and with problems that
*seem* (superficially) parallel to those presented in this Cerebras
system.
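The analogy in miniature, as I see it (purely illustrative... I am not
claiming this is how either the Cerebras stack or the old ASCI solvers
actually laid anything out): exploiting "weight sparsity" looks a lot
like a compressed-sparse-row matrix-vector multiply, where you only
ever touch the nonzero entries.

    # Weight sparsity as sparse matvec, in miniature (illustrative only).
    import numpy as np
    from scipy.sparse import csr_matrix

    rng = np.random.default_rng(0)
    dense = rng.standard_normal((1000, 1000))
    dense[rng.random((1000, 1000)) < 0.9] = 0.0  # zero ~90% of "weights"

    W = csr_matrix(dense)         # store only the surviving ~10%
    x = rng.standard_normal(1000)

    y_sparse = W @ x              # touches only the nonzeros
    y_dense = dense @ x           # touches every entry
    assert np.allclose(y_sparse, y_dense)
    print(f"{W.nnz} nonzeros out of {dense.size} entries")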
The Quadrics(tm) 4x4 high-speed interconnects used to build a
hierarchical(ish) (Fat H-Tree
<https://www.researchgate.net/publication/260585525_Expandable_and_Cost-Effective_Network_Structures_for_Data_Centers_Using_Dual-Port_Servers/figures?lo=1>)
switching fabric (overlapping quad-tree) seem similar in concept, if
not in spec/implementation, to what Cerebras is building. The
Q-machine was perhaps designed for more general-purpose problems, but
the flagship problem of the day (as I understood it) WAS huge sparse
matrix solvers, albeit addressing Computational Fluid Dynamics and
Radiation Transport (rather than Large Language) models. GPU
computational fabric was just becoming a thing at the time; it is
amazing what the special-purpose *tensor processors* of today seem to
be capable of.
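A back-of-envelope (my own simplification, toy units) for why these
fabrics go "fat" toward the root: in a plain tree every link has the
same capacity, so all traffic between the two halves of the machine
squeezes through the root, whereas a fat tree thickens the links at
each level up so that bisection bandwidth keeps pace with the number
of leaves.

    # Plain tree vs. fat tree bisection bandwidth, in units of one
    # leaf link's capacity. My simplification, not anyone's spec.
    def bisection_bw(leaves, fat=False):
        # plain: every link is capacity 1, so the cut at the root is
        # a single-unit bottleneck no matter how many leaves.
        # fat: link capacity doubles each level up, so each link just
        # below the root carries leaves/2 units.
        return leaves // 2 if fat else 1

    for n in (4, 64, 1024):
        print(n, "leaves -> plain:", bisection_bw(n),
              " fat:", bisection_bw(n, fat=True))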
There are probably some HPC wonks here who *do* know this off the top of
their heads, but I was fascinated (as I drilled down) to discover how
(relatively) similar the architectures and the problems seem (to me).
It is likely that I am just the hammer seeing everything as a nail, of
course.
Our "garish" collection of visual representations of the Q machine
switching fabric (with simulated traffic on a simulated problem), much
better experienced en-virtu, of course! Ed will probably remember
seeing this "back in the day"...
https://www.researchgate.net/publication/220586586_Graph_Visualization_for_the_Analysis_of_the_Structure_and_Dynamics_of_Extreme-Scale_Supercomputers/figures?lo=1
nod to Glen... I might be able to install/load/try the OpenAI model if
I weren't wasting so much time careening down memory lane, trying to
register what I see in my rear-view mirrors with what is screaming
down the Autobahn-of-the-mind through my windscreen!
Or maybe it is ChatGPT/GPT-3 actually *writing* this post for me? Or
have *I* become "one" with my AI-Overlord "dual" who is collaborating
with me on this? What is the distinction between parasitic and symbiotic?
Who will ever know...