Transformer Encoder/Decoder

KAT: A Knowledge Augmented Transformer for Vision-and-Language

Can multimodal transformers leverage explicit knowledge in their reasoning? Existing, primarily unimodal, methods have explored approaches under the paradigm of knowledge retrieval followed by answer ...

IEEE

MISSFormer: An Effective Transformer for 2D Medical Image Segmentation

Abstract: Transformer-based methods are recently popular in vision tasks because of their capability to model global dependencies alone. However, it limits the performance of networks due to the lack ...

GitHub

HunyuanVideo: A Systematic Framework For Large Video Generation Model

We present HunyuanVideo, a novel open-source video foundation model that exhibits performance in video generation that is comparable to, if not superior to, leading closed-source models. In order to ...

IEEE

Temporal Convolutional and Fusional Transformer Model With Bi-LSTM Encoder-Decoder for Multi-Time-Window Remaining Useful Life Prediction

Abstract: Health prediction is crucial for ensuring reliability, minimizing downtime, and optimizing maintenance in industrial systems. Remaining Useful Life (RUL) prediction is a key component of ...

InfoQ

Gemma 4 12B Enables On-Device, Multimodal Agentic Workflows with an Encoder-free Architecture

