<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">Marcus wrote:<br>
</div>
<blockquote type="cite"
cite="mid:F6E259DD-038A-4C85-AC65-8CC4BA0BD377@snoutfarm.com">
Cerebras says they can scale past 1 trillion parameters.
<div><br>
</div>
<div>
<div><a
href="https://www.hpcwire.com/2021/08/24/wafer-scale-to-brain-scale-cerebras-touts-linear-scaling-for-up-to-192-cs-2-systems/">Wafer
Scale to ‘Brain-Scale’ – Cerebras Touts Linear Scaling up to 192
CS-2 Systems (hpcwire.com)</a></div>
<div><br>
</div>
That would be a power budget of an HPC center, but not out of the
ordinary. Less than 10 MW. AWS, Azure, Google and national
labs have facilities like that.</div>
</blockquote>
This post/response has a dual purpose... one is to respond to and
question specifics of this modern (special purpose)
cluster architecture, and the other is to *allude to* Glen's point
about the topology/conformation of the "networks" (in this case the
computational fabric) vs. simple brute-force scaling (e.g. count of
processing elements vs. neurons/axons vs. microbiome flux, etc.)<br>
<p>Thanks for this reference... I haven't been (even a poser as) an
HPC wonk for about two decades, but was surprised when I <a
href="https://f.hubspotusercontent30.net/hubfs/8968533/Virtual%20Booth%20Docs/CS%20Weight%20Streaming%20White%20Paper%20111521.pdf">drilled
down on the "SwarmX Fabric"</a> Cerebras
quotes/advertises/invokes and was unable to find many details on
what they mean by "weight streaming" and quite what a
"bidirectional tree topology" is (might just be my rustiness in
cluster-interconnect-fabric terminology, or my limited
focus/attention-span?)... I can parse the words, but am having a
problem finding (understanding?) the specifics.</p>
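<p>To check my own parsing, here is a minimal Python sketch of what I
*think* the white paper means by "weight streaming": the weights never
reside on the compute fabric, they stream through it a layer at a time
while activations stay put, and gradients stream back out. Every name
below is my own invention (not a Cerebras API), and the
broadcast-down/reduce-up reading of "bidirectional tree" is a guess on
my part:</p>
<pre>
import numpy as np

# A toy rendering of my reading of the white paper: weights never
# live on the compute fabric.  An external store (MemoryX, in their
# terms) streams one layer's weights across the fabric (SwarmX), the
# wafer applies them to resident activations, and gradients stream
# back out so the update happens in external memory.  With many
# CS-2s the fabric presumably broadcasts weights down a tree and
# reduces gradients back up it -- my guess at "bidirectional tree".

rng = np.random.default_rng(0)

class WeightStore:
    """Stands in for the off-wafer memory service holding all layers."""
    def __init__(self, shapes):
        self.w = [rng.standard_normal(s) * 0.1 for s in shapes]

    def stream_out(self, i):
        return self.w[i]                 # weights flow toward the wafer

    def stream_in_grad(self, i, grad, lr=1e-3):
        self.w[i] -= lr * grad           # gradients flow back; update off-wafer

def train_step(store, x, target):
    """One pass with weights streamed layer by layer, never all resident."""
    acts = [x]
    for i in range(len(store.w)):                       # forward: stream in
        acts.append(np.tanh(acts[-1] @ store.stream_out(i)))
    delta = (acts[-1] - target) * (1 - acts[-1] ** 2)   # dL/dz for MSE loss
    for i in reversed(range(len(store.w))):             # backward: stream out
        grad = acts[i].T @ delta
        delta = (delta @ store.stream_out(i).T) * (1 - acts[i] ** 2)
        store.stream_in_grad(i, grad)
    return float(np.mean((acts[-1] - target) ** 2))

store = WeightStore([(32, 32)] * 3)
x, t = rng.standard_normal((4, 32)), rng.standard_normal((4, 32))
print(train_step(store, x, t))
</pre>
<p>The point of the exercise (if I have it right) is that the fabric's
memory footprint per step is one layer's weights, not the whole model,
which is how the parameter count can outrun the wafer.</p>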
<p>I am taking the terminology about "weight sparsity" to be somewhat
specific to neural-net training/models, but also maybe not that far
from the more general problems encountered in sparse matrix
solvers? In 2001-2003 I worked with the <a
href="https://www.lanl.gov/conferences/salishan/salishan2003/morrison.pdf">ASCI
Q-machine</a> project on a *simulator* for the machine before it
was delivered/constructed/debugged/testable, and became modestly
familiar with its topology and the problems it presented, which
*seem* (superficially) parallel to those presented by this Cerebras
system. The Quadrics(tm) 4x4 high-speed interconnects used to
build a hierarchical(ish) (<a
href="https://www.researchgate.net/publication/260585525_Expandable_and_Cost-Effective_Network_Structures_for_Data_Centers_Using_Dual-Port_Servers/figures?lo=1">Fat
H Tree</a>) switching fabric (overlapping quad-tree) seem to be
similar in concept, if not in spec/implementation, to what Cerebras
is building. The Q-machine was perhaps designed for more
general-purpose problems, but the flagship problem of the day (as I
understood it) WAS huge sparse matrix solvers, albeit addressing
Computational Fluid Dynamics and Radiation Transport (rather than
LLM) models. GPU computational fabric was (just) becoming a new
thing at the time; it is amazing what the special-purpose *tensor
processors* of today seem to be capable of.<br>
</p>
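<p>The kernel I suspect both worlds share is the sparse matrix-vector
product: work and communication scale with the nonzeros rather than
the dense shape, whether the matrix is a discretized PDE operator or
a pruned weight matrix. A hand-rolled compressed-sparse-row version,
purely illustrative, no claim that either Cerebras or the Q-machine
codes did it this way:</p>
<pre>
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """y = A @ x with A stored in compressed-sparse-row (CSR) form.

    The same kernel underlies sparse PDE solvers (CFD, radiation
    transport) and sparse NN weights: work and traffic scale with
    the nonzeros, not with the dense matrix shape.
    """
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):
        lo, hi = indptr[row], indptr[row + 1]       # this row's nonzeros
        y[row] = np.dot(data[lo:hi], x[indices[lo:hi]])
    return y

# A tiny 3x4 matrix with 4 nonzeros:
#   [[5, 0, 0, 2],
#    [0, 0, 3, 0],
#    [0, 1, 0, 0]]
data    = np.array([5.0, 2.0, 3.0, 1.0])   # nonzero values, row by row
indices = np.array([0, 3, 2, 1])           # their column positions
indptr  = np.array([0, 2, 3, 4])           # row boundaries into data

print(csr_matvec(data, indices, indptr, np.array([1.0, 2.0, 3.0, 4.0])))
# [13.  9.  2.]
</pre>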
<p>There are probably some HPC wonks here who *do* know this off the
top of their heads, but I was fascinated (as I drilled down) to
discover how (relatively) similar the architectures and the
problems seem (to me). It is likely that I am just the hammer
seeing everything as a nail, of course.</p>
<p>Our "garish" collection of visual representations of the Q
machine switching fabric (with simulated traffic on a simulated
problem), much better experienced en-virtu, of course! Ed will
probably remember seeing this "back in the day"...<br>
</p>
<div align="center"><img moz-do-not-send="true"
src="https://www.researchgate.net/profile/Kei-Davis/publication/220586586/figure/fig1/AS:276976649687057@1443047788013/Representation-QS-Quaternary-fat-tree-network-with-64-computational-nodes-small-circles_W640.jpg"
alt="" width="801" height="267"></div>
<div align="center"><a
href="https://www.researchgate.net/publication/220586586_Graph_Visualization_for_the_Analysis_of_the_Structure_and_Dynamics_of_Extreme-Scale_Supercomputers/figures?lo=1"
class="moz-txt-link-freetext">https://www.researchgate.net/publication/220586586_Graph_Visualization_for_the_Analysis_of_the_Structure_and_Dynamics_of_Extreme-Scale_Supercomputers/figures?lo=1</a></div>
<div align="center"><br>
</div>
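<p>For anyone (like me) re-deriving the arithmetic: a k-ary n-tree of
the sort Quadrics built has k^n compute nodes at the leaves and
n*k^(n-1) switches, so the 64-node quaternary (k=4) network pictured
above is the n=3 case with 48 switches. A little bookkeeping sketch
(my shorthand, not the QsNet spec):</p>
<pre>
def k_ary_n_tree(k, n):
    """Node/switch counts for a k-ary n-tree (Quadrics-style fat tree).

    k^n compute nodes hang off the leaf switches; each of the n levels
    has k^(n-1) switches, replicated so the upper links stay "fat"
    (constant bisection bandwidth rather than a thinning root).
    """
    nodes = k ** n
    switches = n * k ** (n - 1)
    return nodes, switches

# Quaternary (k=4) trees, as in the QsNet-era Q-machine:
for n in range(1, 5):
    nodes, switches = k_ary_n_tree(4, n)
    print(f"n={n}: {nodes:4d} nodes, {switches:4d} switches")
</pre>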
<p><a href="https://www.lanl.gov/conferences/salishan/salishan2003/morrison.pdf"
class="moz-txt-link-freetext">https://www.lanl.gov/conferences/salishan/salishan2003/morrison.pdf</a></p>
<p><a href="https://f.hubspotusercontent30.net/hubfs/8968533/Virtual%20Booth%20Docs/CS%20Weight%20Streaming%20White%20Paper%20111521.pdf"
class="moz-txt-link-freetext">https://f.hubspotusercontent30.net/hubfs/8968533/Virtual%20Booth%20Docs/CS%20Weight%20Streaming%20White%20Paper%20111521.pdf</a></p>
<p>nod to Glen... I might be able to install/load/try the
OpenAI model if I weren't wasting so much time careening down
memory lane and trying to register what I see in my rear-view
mirrors with what I see screaming down the Autobahn-of-the-mind
through my windscreen!</p>
<p>Or maybe it is ChatGPT/GPT-3 actually *writing* this post for
me? Or have *I* become "one" with my AI-Overlord "dual" who is
collaborating with me on this? What is the distinction between
parasitic and symbiotic? <br>
</p>
<p>Who will ever know... <br>
</p>
<p><br>
</p>
</body>
</html>