CNN has been widely used in many image-related machine learning algorithms due to its high accuracy for image recognition. Convolution and fully-connected layers are two essential components for CNN.
Abstract: Band matrix multiplication is widely used in DSP systems. However, for band matrix multiplication, the traditional Kung-Leiserson systolic array cannot be realized with high cell-efficiency.
This paper concerns the more foundational tasks of distributed dense linear algebra. While a single TPU core can already store and operate on large matrices (e.g., of size 16,384, 32,768 in single ...
Abstract: Systolic arrays (SAs) are highly parallel pipelined structures capable of executing various tasks such as matrix multiplication and convolution. They comprise a grid of usually homogeneous ...