<div dir="auto">I think we were talking about explanation of reasoning in AI systems recently. This thesis proposal is relevant.<br><br><div data-smartmail="gmail_signature">---<br>Frank C. Wimberly<br>140 Calle Ojo Feliz, <br>Santa Fe, NM 87505<br><br>505 670-9918<br>Santa Fe, NM</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">---------- Forwarded message ---------<br>From: <strong class="gmail_sendername" dir="auto">Diane Stidle</strong> <span dir="auto">&lt;<a href="mailto:stidle@andrew.cmu.edu">stidle@andrew.cmu.edu</a>&gt;</span><br>Date: Mon, Nov 30, 2020, 8:04 AM<br>Subject: Reminder: Thesis Proposal - December 1, 2020 - Nicolay Topin - Unifying State and Policy Level Explanations for Reinforcement Learning<br>To: <a href="mailto:ml-seminar@cs.cmu.edu">ml-seminar@cs.cmu.edu</a> &lt;<a href="mailto:ML-SEMINAR@cs.cmu.edu">ML-SEMINAR@cs.cmu.edu</a>&gt;, &lt;<a href="mailto:marie.desjardins@simmons.edu">marie.desjardins@simmons.edu</a>&gt;<br></div><br><br>

  
  <div>

    <blockquote type="cite">

      
      <p><i><b>Thesis Proposal</b></i></p>

      <p>Date: December 1, 2020<br>

        Time: 10:00am (EST)<br>

        Speaker: Nicolay Topin</p>

      <p><font size="-1">Zoom Meeting Link: <a href="https://cmu.zoom.us/j/99269721240?pwd=a3c5QytZbE01a0w4WEpIS3RpSjFSdz09" target="_blank" rel="noreferrer">https://cmu.zoom.us/j/99269721240?pwd=a3c5QytZbE01a0w4WEpIS3RpSjFSdz09</a><br>

          Meeting ID: 992 6972 1240<br>

          Password: 068976</font></p>

      <p><b>Title</b><b><font size="-1">: Unifying State and Policy

            Level Explanations for Reinforcement Learning</font></b></p>

      <p><font size="-1">Abstract:</font><br>

        <font size="-1"><font size="-1">In an off-policy reinforcement

            learning setting, an agent observes interactions with an

            environment to learn a policy to maximize reward. Before the

            agent is allowed to follow its learned policy, a human

            operator can use explanations to gauge the agent's

            competency and try to understand its behavior. Policy-level

            behavior explanations illustrate the long-term behavior of

            the agent. Feature importance explanations identify the

            features of a state that affect an agent’s action choice for

            that state. Experience importance explanations show which

            past experiences led to the current behavior. Previous

            methods for creating explanations have provided only a subset of

            these information types, not all three at once. In this thesis,

            we address the problem of creating explanations for a

            reinforcement learning agent that include this full set of

            information types. We contribute a novel explanation method

            that unifies and extends these existing explanation types.<br>

            <br>

            We have created a method for producing feature importance

            explanations by learning a decision tree policy using

            reinforcement learning. This method formulates the problem

            as a Markov decision process, so standard off-policy

            learning algorithms can be used to learn an optimal decision

            tree. Likewise, we have created an algorithm for summarizing

            policy-level behavior as a Markov chain over abstract

            states. Our approach uses a set of decision trees to map

            states to abstract states. In addition, we have introduced a

            method for creating experience importance explanations, which

            identifies sets of similarly treated inputs and shows how these

            sets affected training.<br>

            <br>

            We propose two lines of future work. First, we will

            integrate the two decision tree explanations (for feature

            importance and policy-level behavior) via a shared state

            featurization. Second, we

            will extend the experience importance explanation algorithm

            to identify important experiences for both abstract state

            division and the agent's choice of features to

            examine.<br>

            <br>

            <b>Thesis committee:</b><br>

            Manuela Veloso (Chair)<br>

            Tom Mitchell<br>

            Ameet Talwalkar<br>

            Marie desJardins (Simmons University)</font><br>

        </font></p>

    </blockquote>

  </div>


</div>