Crates.io | candle-flash-attn-v1 |
lib.rs | candle-flash-attn-v1 |
version | 0.0.1 |
created_at | 2025-04-15 11:23:22.869699+00 |
updated_at | 2025-04-15 11:23:22.869699+00 |
description | Flash attention V1 layer for the candle ML framework. |
homepage | https://github.com/huggingface/candle-extensions/candle-flash-attn-v1/ |
repository | https://github.com/huggingface/candle-extensions/ |
max_upload_size | |
id | 1634308 |
size | 25,601,223 |
Flash Attention v2 does not support Turing GPUs (e.g. T4, RTX 2080). This layer can be used as a drop-in replacement for the official Candle flash-attention layer in the meantime.