candle-lstm

- Version: 0.2.3
- Description: optimize HuggingFace candle LSTM in some cases
- Repository: https://github.com/kigichang/candle-lstm
- Author: kigi (kigichang)

README

Candle LSTM

A re-implementation of Candle's LSTM inference, including bidirectional LSTM, aimed at speeding inference up.

This implementation is ONLY FOR CPU INFERENCE. DO NOT USE IT ON METAL OR CUDA.
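To illustrate what is being re-implemented, here is a minimal, dependency-free sketch of one LSTM cell step using the standard gate equations that Candle and PyTorch both follow. The function names (`lstm_step`, `sigmoid`) are illustrative only and are not part of this crate's API.

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// One time step of an LSTM cell for a single feature dimension.
/// `pre_i`, `pre_f`, `pre_g`, `pre_o` are the gate pre-activations
/// (W_x * x + W_h * h + b) for the input, forget, cell and output gates.
/// Returns the new hidden state `h` and cell state `c`.
fn lstm_step(pre_i: f64, pre_f: f64, pre_g: f64, pre_o: f64, c_prev: f64) -> (f64, f64) {
    let i = sigmoid(pre_i);     // input gate
    let f = sigmoid(pre_f);     // forget gate
    let g = pre_g.tanh();       // candidate cell state
    let o = sigmoid(pre_o);     // output gate
    let c = f * c_prev + i * g; // new cell state
    let h = o * c.tanh();       // new hidden state
    (h, c)
}
```

A CPU-oriented implementation can fuse these per-gate operations across the whole hidden dimension in one pass, which is where the speedup over a generic tensor-op formulation comes from.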

Metal and CUDA

I tested inference on my MacBook Pro with an M2 chip: this implementation is ~5 ms slower than Candle on Metal.

On an RTX 4090 with CUDA 12.5, it is ~6x slower than Candle.

Test Data

Install PyTorch and run simple.py to generate the test data.

  1. lstm_test.pt: PyTorch LSTM with batch_first = False.
  2. lstm_test_batch_first.pt: PyTorch LSTM with batch_first = True.
  3. bi_lstm_test.pt: PyTorch bidirectional LSTM with batch_first = False.
  4. bi_lstm_test_batch_first.pt: PyTorch bidirectional LSTM with batch_first = True.
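Outputs computed by this crate can then be checked against the PyTorch reference tensors with an element-wise tolerance comparison. The helper below is a hypothetical sketch (not part of this crate), mirroring the default tolerances of `torch.allclose` (rtol = 1e-5, atol = 1e-8):

```rust
/// Element-wise closeness check between two flattened tensors,
/// using |a - b| <= atol + rtol * |b|, as torch.allclose does.
fn allclose(a: &[f64], b: &[f64], rtol: f64, atol: f64) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| (x - y).abs() <= atol + rtol * y.abs())
}
```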