[FRIAM] AI possibilities

glen gepropella at gmail.com
Fri Apr 7 15:54:52 EDT 2023


One of the principles of FAIR+ is to carefully track the provenance of data. Taken seriously, this implies we should keep the "raw" data [⛧], each transform, and each checkpoint of the derived data, throughout any given workflow. What LLMs do is start a workflow from a checkpoint (or a set of checkpoints) of the data, largely ignoring the provenance of that data. Biological organisms are a bit different in their ability to return to the "raw" data and reproduce (not repeat) checkpointed data through similar workflows. E.g. I can not only read about brewing (starting in the middle of someone else's workflow), I can also brew (reproduce their workflow myself). LLMs can't do that. But an LLM embodied in metal or flesh *could* do that.

[⛧] There is no such thing as raw data. There is no such thing as absolute grounding. It's an infinite regress.
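To make that concrete, here's a minimal sketch of what keeping the "raw" data, each transform, and each checkpoint might look like. All the names (Checkpoint, Workflow, provenance) are mine, purely illustrative, and not tied to any particular FAIR+ tooling:

    # Sketch: keep the "raw" data, each transform, and each checkpoint
    # of derived data, with links back through the whole workflow.
    from dataclasses import dataclass, field
    from typing import Any, Optional

    @dataclass
    class Checkpoint:
        label: str                             # which step produced this data
        data: Any                              # the derived data at this point
        parent: Optional["Checkpoint"] = None  # link back toward the raw data

    @dataclass
    class Workflow:
        raw: Any
        checkpoints: list = field(default_factory=list)

        def run(self, transforms):
            current = Checkpoint("raw", self.raw)
            self.checkpoints.append(current)
            for label, fn in transforms:   # apply each transform, keep each checkpoint
                current = Checkpoint(label, fn(current.data), parent=current)
                self.checkpoints.append(current)
            return current

        def provenance(self, cp):
            chain = []                     # walk from any checkpoint back to raw
            while cp is not None:
                chain.append(cp.label)
                cp = cp.parent
            return list(reversed(chain))

    # Toy "brewing" workflow: every intermediate state is retained, so the
    # workflow can be reproduced from the raw data, not just read about.
    wf = Workflow(raw="grain")
    final = wf.run([("mash", lambda x: x + "+water"),
                    ("boil", lambda x: x + "+hops"),
                    ("ferment", lambda x: x + "+yeast")])
    print(wf.provenance(final))   # ['raw', 'mash', 'boil', 'ferment']

In this picture, an LLM is handed something like final.data with the parent links stripped off.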

On 4/7/23 12:21, Steve Smith wrote:
> 
>>> Or ... or ... they counter the conventional wisdom that *humans* generalize their learning or reasoning beyond text. We are the OG bots.
> 
> I am fascinated by the "bootstrapping" that semantic/syntactic recursion seems to imply.  I'm looking for examples in these LLMs where this is exposed.  It seems that, at the level of sophistication at which GPT-4 is responding, this must be common?
> 
> It feels as if there might be a surprisingly robust "house of cards" built entirely of lexical units stacked very cleverly together (in an echo of the patterns of the artificial neural net connections made while training an LLM?), and you are suggesting that this is roughly what human learning/understanding looks like as well...
> 
> Lakoff and Nunez's "embodiment" arguments seem to suggest that even if there may be some elaborate card-stacking going on, there is a foundation (in human understanding) of sensorial/sentient/qualia experiences that they ground out upon/in?
> 
> If we tentatively accept this account, then I wonder what LLMs are grounded in?  It would seem maybe "hallucinations" of sensorial experiences?
> 
> So: from the horse's mouth:
> 
>     what are LLMs' structures grounded out in?
> 
>     Language models such as LLMs (large language models) are grounded in vast amounts of textual data that they are trained on. The structures of LLMs are derived from the statistical patterns and regularities that are present in the data. These patterns are then encoded as learned parameters in the neural network architecture of the language model.
> 
>     In other words, the structures of LLMs are not explicitly programmed by humans, but are instead learned from large datasets of natural language text. This allows LLMs to capture the complex and subtle patterns of natural language, including its syntax, semantics, and pragmatics.
> 
>     While LLMs are not directly grounded in embodied experiences, as in the theory of embodied mind, they do reflect the linguistic and cultural context in which the data they are trained on was produced.
> 
> 
>>
>> I do really appreciate this duality/tension: I think you were the first to alert me to this a few thousand messages back (before LLM/GPT talk, etc. erupted here), though I vaguely remember Marcus making a (qualitatively) similar statement as well.  I think his comment was about whether human learning (early childhood learning in particular) was anything different from "emulation".
>>
>>>
>>> On 4/7/23 09:15, Steve Smith wrote:
>>>>     These findings counter the conventional wisdom that LLMs are merely statistical next-word predictors and can’t generalize their learning or reasoning beyond text.
>>>


-- 
ꙮ Mɥǝu ǝlǝdɥɐuʇs ɟᴉƃɥʇ' ʇɥǝ ƃɹɐss snɟɟǝɹs˙ ꙮ


