[FRIAM] Language Model Understanding

Marcus Daniels marcus at snoutfarm.com
Sat Oct 7 18:10:59 EDT 2023


The “large” refers to the number of parameters used.  A smaller large language model (a deep neural net) starts at about 3 billion parameters, but larger ones like Claude 2 (the latest large language model from the company that wrote the paper Steve mentioned) have more than 130 billion parameters.  Amazingly, it is possible, using (rooms of) GPUs and other accelerators, to optimize in a space of this size.  The billions of parameters come from the vocabulary size (the number of tokens that need to be discriminated), the many layers of transformers needed to capture the complexity of human and non-human languages (like DNA), and the context window size (how many paragraphs or pages the model is trained on at a time).  A small language model might be suitable for understanding the geometries of chemicals, say.
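To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python of where a transformer's parameters come from.  The configuration numbers (vocab_size, d_model, n_layers, and so on) are made up for illustration and don't describe Claude 2 or any other particular model; they're only chosen to land near the "about 3 billion" figure above.

# Back-of-the-envelope parameter count for a generic decoder-only transformer.
# All configuration values below are illustrative, not any real model's settings.
vocab_size  = 32_000          # tokens the model must discriminate
d_model     = 2_560           # embedding / hidden width
n_layers    = 32              # stacked transformer blocks
d_ff        = 4 * d_model     # feed-forward inner width
context_len = 4_096           # context window (adds parameters only if positions are learned)

embedding  = vocab_size * d_model             # token embedding table
positional = context_len * d_model            # learned positional embeddings (comparatively small)
attention_per_layer = 4 * d_model * d_model   # Q, K, V, and output projections
ffn_per_layer       = 2 * d_model * d_ff      # the two feed-forward matrices

total = embedding + positional + n_layers * (attention_per_layer + ffn_per_layer)
print(f"roughly {total / 1e9:.1f} billion parameters")   # ~2.6 billion with these numbers

Scaling the same arithmetic up (a bigger vocabulary, wider layers, and more of them) is what pushes the count into the tens or hundreds of billions.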

From: Friam <friam-bounces at redfish.com> On Behalf Of Tom Johnson
Sent: Saturday, October 7, 2023 2:38 PM
To: The Friday Morning Applied Complexity Coffee Group <friam at redfish.com>
Subject: Re: [FRIAM] Language Model Understanding

Thanks for passing this along, Steve. I wish, however, that the authors of this short piece had included definitions of "Large Language Models" and "Small Language Models" as they use the terms.  Perhaps I can find those in the larger paper.
Tom

On Sat, Oct 7, 2023 at 12:34 PM Steve Smith <sasmyth at swcp.com<mailto:sasmyth at swcp.com>> wrote:

This popular-press article came through my Google News feed recently, and I thought it might be useful to the Journalists/English-Majors on the list to help understand how LLMs work, etc.   When I read it in detail (forwarded from my TS (TinyScreenPhone) to my LS (Large Screen Laptop)), I found it a bit more detailed and technical than I'd expected, but nevertheless rewarding and possibly offering some traction to Journalism/English majors as well as to those with a larger investment in the CS/Math implied.

Decomposing Language Models into Understandable Components
<https://www.anthropic.com/index/decomposing-language-models-into-understandable-components>

and the (more) technical paper behind the article

https://transformer-circuits.pub/2023/monosemantic-features/index.html
Despite having sent a few dogs into vaguely similar scuffles in my careen(r):
Faceted Ontologies for Pre Incident Indicator Analysis <https://apps.dtic.mil/sti/tr/pdf/ADA588086.pdf>
SpindleViz <https://www.ehu.eus/ccwintco/uploads/c/c6/HAIS2010_925.pdf>
...

... I admit to finding this both intriguing and well over my head on casual inspection...  the (metaphorical?) keywords that drew me in most strongly included Superposition and Thought Vectors, though they are (nod to Glen) probably riddled (heaped, overflowing, bursting, bloated ...) with excess meaning.

https://gabgoh.github.io/ThoughtVectors/

This leads me (surprise!) to an open-ended discursive series of thoughts probably better left for a separate posting (probably rendered in a semasiographic language like Heptapod B <https://en.wikipedia.org/wiki/Heptapod_languages#Orthography>).

<must... stop... now... >

- Steve
-. --- - / ...- .- .-.. .. -.. / -- --- .-. ... . / -.-. --- -.. .
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives:  5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
  1/2003 thru 6/2021  http://friam.383.s1.nabble.com/