<div dir="auto"><div><br></div><div><br></div><div data-smartmail="gmail_signature">---<br>Frank C. Wimberly<br>140 Calle Ojo Feliz, <br>Santa Fe, NM 87505<br><br>505 670-9918<br>Santa Fe, NM</div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">---------- Forwarded message ---------<br>From: <strong class="gmail_sendername" dir="auto">Andrej Risteski</strong> <span dir="auto"><<a href="mailto:aristesk@andrew.cmu.edu">aristesk@andrew.cmu.edu</a>></span><br>Date: Thu, May 22, 2025, 3:32 PM<br>Subject: Impromptu talk by Vahab Mirrokni (Google) of possible interest<br>To: <<a href="mailto:ml-seminar@cs.cmu.edu">ml-seminar@cs.cmu.edu</a>><br></div><br><br><div dir="ltr">Hi all, <div><br></div><div>of possible interest: Vahab Mirrokni (Google) will speak tomorrow (Friday May 23) in GHC 6115 from 1:30-2pm. (This got arranged at the very last minute, so the last minute advertisement) Talk info below. </div><div><br></div><div dir="ltr"><div dir="ltr"><b>Title: </b><i>ML Efficiency for Large Models: From Data Efficiency to Faster Transformers</i></div><div dir="ltr"><br></div><div dir="ltr">Abstract: Scaling large models efficiently for faster training and inference is a fundamental challenge. In this talk, we present a number of algorithmic challenges and potential solutions from theory to practice. First, we discuss data efficiency and model efficiency problems that can be formalized as a subset selection problem. For model efficiency, we present sequential attention for feature selection and sparsification[ICLR'23, arxiv]. For data efficiency, we present a sensitivity sampling technique for improved quality and efficiency of the models[ICML'24]. Furthermore, we discuss the intrinsic quadratic complexity of attention models as well as token generation. We first discuss HyperAttention; a technique to develop linear-time attention algorithms under mild assumptions[ICLR'24]. We then present PolySketchFormer, a technique to bypass the hardness results of achieving sub-quadratic attention by applying sketching of polynomial functions[ICML'24]. We also show how to address the complexity of token generation via clustering techniques[arxiv]. Finally, I will discuss Titans, which is a family of architectures based on a new neural long-term memory module that learns to memorize a historical context and helps an attention attend to the current context while utilizing long past information.</div><div dir="ltr"><br></div><div dir="ltr"><b>Bio</b>: Vahab Mirrokni is a Google Fellow and VP at Google Research, and now Gemini data area lead. He also leads the algorithm and optimization research groups at Google. These research teams include: market algorithms, large-scale graph mining, and large-scale optimization. Previously he was a distinguished scientist and senior research director at Google. He received his PhD from MIT in 2005 and his B.Sc. from Sharif University of Technology in 2001. He joined Google Research in 2008, after research positions at Microsoft Research, MIT and Amazon. He is the co-winner of best paper awards at KDD, ACM EC, and SODA. His research areas include algorithms, distributed and stochastic optimization, and computational economics. Recently he has been working on various algorithmic problems in machine learning, online optimization and mechanism design, and large-scale graph-based learning.</div></div><br></div>
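P.S. For anyone unfamiliar with the quadratic-attention bottleneck the abstract mentions, here is a minimal NumPy sketch (my own illustration, not code from the talk or the cited papers): standard softmax attention materializes an n x n score matrix, while a fixed polynomial feature map lets the same computation factor so that the cost grows linearly in sequence length. The feature map and function names below are toy choices, not the PolySketchFormer or HyperAttention algorithms.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an (n, n) score matrix -> O(n^2 * d) work."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                        # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                             # (n, d)

def poly_feature_attention(Q, K, V):
    """Toy linear-time variant: replace exp(q.k) with the degree-2 polynomial
    kernel 1 + q.k + (q.k)^2 via an explicit feature map phi, so attention
    factors as phi(Q) @ (phi(K).T @ V) without ever forming the (n, n) matrix.
    Illustrative stand-in only -- not the PolySketchFormer algorithm."""
    def phi(X):
        n, d = X.shape
        quad = np.einsum('ni,nj->nij', X, X).reshape(n, d * d)
        return np.concatenate([np.ones((n, 1)), X, quad], axis=1)  # (n, 1+d+d^2)
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V               # (D, d): cost independent of sequence length n
    norm = Kf.sum(axis=0)       # (D,): normalizer accumulated once over all keys
    return (Qf @ kv) / (Qf @ norm)[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 512, 16
    Q, K, V = rng.standard_normal((3, n, d))
    print(softmax_attention(Q, K, V).shape)       # (512, 16), O(n^2) in n
    print(poly_feature_attention(Q, K, V).shape)  # (512, 16), O(n) in n
```

The point of the factorization is that phi(K).T @ V and the normalizer are accumulated once over the keys, so per-query cost depends only on the feature dimension, not on n; sketching-based methods like those in the abstract compress that feature dimension further.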