Arxiv 2021: Sparse attention Planning

Is Sparse Attention more Interpretable?

HPCA' SpAtten: Efficient Sparse Attention Architecture w/ Cascade Token/Head Pruning by Hanrui Wang

MICRO21 SRC 'Transformer Acceleration with Dynamic Sparse Attention'

Sparse is Enough in Scaling Transformers (aka Terraformer) | ML Research Paper Explained

Learning Manipulation Skills Via Hierarchical Spatial Attention

Short Intro HPCA'21 SpAtten: Efficient Sparse Attention Architecture with Cascade Token/Head Pruning

HPCA'21 SpAtten: Efficient Sparse Attention Architecture with Cascade Token/Head Pruning Hanrui Wang

[QA] Star Attention: Efficient LLM Inference over Long Sequences

Adaptive Transformers in NLP

[QA] Attention as a Hypernetwork

Big Bird: Transformers for Longer Sequences (Paper Explained)

What Matters in Transformers? Not All Attention is Needed

From Sparse to Soft Mixtures of Experts

Mixture of Sparse Attention for Automatic LLM Compression

arxiv 2404 01306

TransformerFAM: Feedback attention is working memory

Embracing Single Stride 3D Object Detector with Sparse Transformer

CVPR2023 Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

Giannis Daras: Improving sparse transformer models for efficient self-attention (spaCy IRL 2019)