TurboQuant is a compression algorithm introduced by Google Research (Zandieh et al.) at ICLR 2026 that solves the primary memory bottleneck in large language model inference: the key-value (KV) cache.
The third quarter of 2025 was dominated by massive rounds for companies developing AI chips and quantum computers. Over $2.5 billion went to AI, with wafer-scale chip maker Cerebras leading the pack ...
Most linear algebra courses start by considering how to solve a system of linear equations. \[ \begin{align} a_{0,0}x_0 + a_{0,1}x_0 + \cdots a_{0,n-1}x_0 & = b_0 ...
Matrix-vector multiplications form the core of a plethora of scientific computing and machine learning applications that include solving partial differential equations, forward and back propagation in ...
Despite the increasing number of pharmaceutical companies, university laboratories and funding, less than one percent of initially researched drugs enter the commercial market. In this context, ...
This paper concerns the more foundational tasks of distributed dense linear algebra. While a single TPU core can already store and operate on large matrices (e.g., of size 16,384, 32,768 in single ...
The calculation of derivatives is ubiquitous in science and engineering. In thermodynamics, in particular, state properties can be expressed as derivatives of thermodynamic potentials. The manual ...