[FRIAM] Training your value network

Tue Feb 1 19:31:31 EST 2022

Yeah, there were a couple of things that struck me as either interesting
or telling about the video. There is a novelty regarding the lack of
editing, the unselfconsciousness of the audience, and the humility with
which the project of *being* a value network was approached. The video
is a welcome respite from the overwhelming number of highly polished
video essays that dominate YouTube.

I arrived at this video while watching some professional go games on
NHK (thank you, Jonathan Hop, for your closed-captioned translations!).
At work, factions are almost completely polarized into anti-AI and pro-AI
stances, each placing bets on what payoff AI will ultimately have for
the sciences. In the meantime, I am impressed by certain progressive
attitudes toward AI that we see in gaming communities. While I know that
you (EricC) can contribute quite a bit about the impact of AI analysis
in poker, I mostly understand the impact on the go community. There, and
at the risk of saying something ugly, I see a parallel to the wholesale
adoption of Western-style thinking in Japan after the atomic bombings of 1945.

Professional games and analyses today are heavily influenced by the
discoveries of AlphaGo. The live commentaries explicitly point out
when a player does something classical (pre-2016), before playing out
variations more indicative of the new style. "Yeah, players once thought
that the center wasn't that big, but now we see with AlphaGo that this
isn't the case" or "It seems that invading at the 3-3 point early is
bigger than we once thought" or "Yes, this is one of the new josekis
(corner patterns giving an even result) *discovered* by AlphaGo"...

There is a sense that the AI is a kind of telescope, allowing players
to see more *deeply* into the universe of go. In the video, we see yet
another variant of this kind of thinking. There, the lecturer discusses
how DeepMind went about factoring their bot into a collaborative (rather
than adversarial) pair: a policy network (a kind of navigator suggesting
possible local strategies) and a value network (the pilot who ultimately
determines the course). Then the lecturer discusses how this network
was trained before inviting the audience to train their *value networks*
like AlphaGo does. Interesting stuff.
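
For anyone who wants the navigator/pilot split in concrete terms, here is
a minimal sketch of my own, not DeepMind's code. Everything in it is an
assumption made for illustration: the game is single-pile Nim rather than
go, and `policy`, `value`, and `choose_move` are hypothetical hand-written
stand-ins for the trained networks. The real AlphaGo also wraps both
networks in a Monte Carlo tree search, which this leaves out.

```python
# Toy stand-ins, not DeepMind's code. The "game" is single-pile Nim:
# take 1-3 stones per turn; whoever takes the last stone wins.

def policy(stones):
    """Navigator: propose legal moves, each with a prior weight (uniform here)."""
    moves = [m for m in (1, 2, 3) if m <= stones]
    return {m: 1.0 / len(moves) for m in moves}

def value(stones):
    """Estimated win chance for the player about to move (hand-written, not learned)."""
    return 0.0 if stones % 4 == 0 else 1.0

def choose_move(stones):
    """Pilot: among the policy's candidates, pick the move that leaves the opponent worst off."""
    candidates = policy(stones)
    # Lower value for the opponent is better; ties go to the higher policy prior.
    return min(candidates, key=lambda m: (value(stones - m), -candidates[m]))

print(choose_move(10))  # prints 2, leaving the opponent with 8 stones
```

The point of the split is that the policy narrows the search to a few
plausible moves, while the value function judges where each one leads.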