<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


<meta name="Generator" content="Microsoft Word 15 (filtered medium)">


<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}


o\:* {behavior:url(#default#VML);}


w\:* {behavior:url(#default#VML);}


.shape {behavior:url(#default#VML);}


</style><![endif]--><style><!--


/* Font Definitions */


@font-face


        {font-family:"Cambria Math";


        panose-1:2 4 5 3 5 4 6 3 2 4;}


@font-face


        {font-family:Calibri;


        panose-1:2 15 5 2 2 2 4 3 2 4;}


/* Style Definitions */


p.MsoNormal, li.MsoNormal, div.MsoNormal


        {margin:0in;


        font-size:11.0pt;


        font-family:"Calibri",sans-serif;}


a:link, span.MsoHyperlink


        {mso-style-priority:99;


        color:blue;


        text-decoration:underline;}


span.EmailStyle20


        {mso-style-type:personal-compose;


        font-family:"Calibri",sans-serif;


        color:windowtext;}


.MsoChpDefault


        {mso-style-type:export-only;


        font-family:"Calibri",sans-serif;


        mso-ligatures:none;}


@page WordSection1


        {size:8.5in 11.0in;


        margin:1.0in 1.0in 1.0in 1.0in;}


div.WordSection1


        {page:WordSection1;}


--></style>


</head>


<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">


<div class="WordSection1">


<p class="MsoNormal">The “large” refers to the number of parameters. A smaller large language model (a deep neural net) starts at about 3 billion parameters, but larger ones like Claude 2 (the latest large language model from the company that wrote the paper Steve mentioned) reportedly have more than 130 billion parameters. Amazingly, it is possible, using (rooms of) GPUs and other accelerators, to optimize in a space of this size. The billions of parameters come from the vocabulary size (the number of tokens that need to be discriminated), the many layers of transformers that are needed to capture the complexity of human and non-human languages (like DNA), and the context window size (how many paragraphs or pages the model is trained on at a time). A small language model might be suitable for understanding the geometries of chemicals, say.<o:p></o:p></p>
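<p class="MsoNormal">To make the arithmetic concrete, here is a rough sketch of how those three ingredients add up. It assumes a generic decoder-only transformer with learned position embeddings and a 4x-wide feed-forward block, and it ignores biases and normalization weights. The function names and the sizes plugged in are hypothetical, picked only for illustration, not taken from any particular model.<o:p></o:p></p>

```python
# Rough parameter count for a generic decoder-only transformer.
# Assumptions: learned position embeddings, a feed-forward block 4x as wide
# as the model, biases and normalization weights ignored.
# All sizes used below are hypothetical, chosen only for illustration.


def embedding_params(vocab_size, d_model):
    # One vector of length d_model per token in the vocabulary.
    return vocab_size * d_model


def position_params(context_len, d_model):
    # One learned vector per position in the context window.
    return context_len * d_model


def layer_params(d_model):
    # Attention: query, key, value and output projections, each d_model x d_model.
    attention = 4 * d_model * d_model
    # Feed-forward: d_model to 4*d_model and back again.
    mlp = 8 * d_model * d_model
    return attention + mlp


def total_params(vocab_size, context_len, d_model, n_layers):
    return (embedding_params(vocab_size, d_model)
            + position_params(context_len, d_model)
            + n_layers * layer_params(d_model))


if __name__ == "__main__":
    total = total_params(vocab_size=50_000, context_len=2_048,
                         d_model=4_096, n_layers=32)
    print(f"{total / 1e9:.2f} billion parameters")
```

<p class="MsoNormal">Under these assumptions the stacked layers account for nearly all of the total, while the vocabulary and context-window terms are comparatively small. Real models differ in the details, so treat this as an order-of-magnitude estimate only.<o:p></o:p></p>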


<p class="MsoNormal"><o:p> </o:p></p>


<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">


<p class="MsoNormal"><b>From:</b> Friam &lt;friam-bounces@redfish.com&gt; <b>On Behalf Of</b> Tom Johnson<br>


<b>Sent:</b> Saturday, October 7, 2023 2:38 PM<br>


<b>To:</b> The Friday Morning Applied Complexity Coffee Group &lt;friam@redfish.com&gt;<br>


<b>Subject:</b> Re: [FRIAM] Language Model Understanding<o:p></o:p></p>


</div>


<p class="MsoNormal"><o:p> </o:p></p>


<div>


<p class="MsoNormal">Thanks for passing this along, Steve. I wish, however, that the authors of this short piece had defined "Large Language Models" and "Small Language Models" as they use the terms. Perhaps I can find those definitions in the larger paper.<o:p></o:p></p>


<div>


<p class="MsoNormal">Tom<o:p></o:p></p>


</div>


</div>


<p class="MsoNormal"><o:p> </o:p></p>


<div>


<div>


<p class="MsoNormal">On Sat, Oct 7, 2023 at 12:34 PM Steve Smith &lt;<a href="mailto:sasmyth@swcp.com">sasmyth@swcp.com</a>&gt; wrote:<o:p></o:p></p>


</div>


<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">


<div>


<p>This popular-press article came through my Google News feed recently, and I thought it might be useful to the Journalists/English-Majors on the list to help understand how LLMs work, etc. When I read it in detail (forwarded from my TS (TinyScreenPhone) on my LS (Large Screen Laptop)) I found it a bit more detailed and technical than I'd expected, but nevertheless rewarding and possibly offering some traction to Journalism/English majors as well as those with a larger investment in the CS/Math implied.<o:p></o:p></p>


<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">


<p><a href="https://www.anthropic.com/index/decomposing-language-models-into-understandable-components" target="_blank">Decomposing Language Models into Understandable Components<br>


</a><o:p></o:p></p>


<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">


<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">


<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">


<p class="MsoNormal"><img border="0" width="238" height="133" style="width:2.4821in;height:1.3869in" id="_x0000_i1025" src="https://efficient-manatee.transforms.svdcdn.com/production/images/Untitled-Artwork-11.png?w=2880&h=1620&auto=compress%2Cformat&fit=crop&dm=1696477668&s=d32264d5f5e32c79026b8e310e415c74"><o:p></o:p></p>


</blockquote>


</blockquote>


</blockquote>


</blockquote>


<p>and the (more) technical paper behind the article<o:p></o:p></p>


<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">


<p><a href="https://transformer-circuits.pub/2023/monosemantic-features/index.html" target="_blank">https://transformer-circuits.pub/2023/monosemantic-features/index.html<br>


</a><o:p></o:p></p>


</blockquote>


<p class="MsoNormal">Despite having sent a few dogs into vaguely similar scuffles in my careen(r):<o:p></o:p></p>


<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">


<p class="MsoNormal"><a href="https://apps.dtic.mil/sti/tr/pdf/ADA588086.pdf" target="_blank">Faceted Ontologies for Pre Incident Indicator Analysis


</a><br>


<a href="https://www.ehu.eus/ccwintco/uploads/c/c6/HAIS2010_925.pdf" target="_blank">SpindleViz</a><br>


...<o:p></o:p></p>


</blockquote>


<p>... I admit to finding this both intriguing and well over my head on casual inspection...  the (metaphorical?) keywords that drew me in  most strongly included


<i>Superposition</i> and <i>Thought Vectors</i>, though they are (nod to Glen) probably riddled (heaped, overflowing, bursting, bloated ... )  with excess meaning.<o:p></o:p></p>


<p><a href="https://gabgoh.github.io/ThoughtVectors/" target="_blank">https://gabgoh.github.io/ThoughtVectors/</a><o:p></o:p></p>


<p>This leads me (surprise!) to an open-ended, discursive series of thoughts probably better left for a separate posting (probably rendered in a semasiographic language like


<a href="https://en.wikipedia.org/wiki/Heptapod_languages#Orthography" target="_blank">


Heptapod B</a>).  <o:p></o:p></p>


<p>&lt;must... stop... now...&gt;<o:p></o:p></p>


<p>- Steve<o:p></o:p></p>


</div>


<p class="MsoNormal">-. --- - / ...- .- .-.. .. -.. / -- --- .-. ... . / -.-. --- -.. .<br>


FRIAM Applied Complexity Group listserv<br>


Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom <a href="https://bit.ly/virtualfriam" target="_blank">


https://bit.ly/virtualfriam</a><br>


to (un)subscribe <a href="http://redfish.com/mailman/listinfo/friam_redfish.com" target="_blank">


http://redfish.com/mailman/listinfo/friam_redfish.com</a><br>


FRIAM-COMIC <a href="http://friam-comic.blogspot.com/" target="_blank">http://friam-comic.blogspot.com/</a><br>


archives:  5/2017 thru present <a href="https://redfish.com/pipermail/friam_redfish.com/" target="_blank">


https://redfish.com/pipermail/friam_redfish.com/</a><br>


  1/2003 thru 6/2021  <a href="http://friam.383.s1.nabble.com/" target="_blank">http://friam.383.s1.nabble.com/</a><o:p></o:p></p>


</blockquote>


</div>


</div>


</body>


</html>