fry

Crates.io	fry
lib.rs	fry
version	0.0.4
created_at	2023-10-25 01:40:13.286581+00
updated_at	2025-08-02 18:45:20.918654+00
description	A dead-simple, no-alloc, no-std TTS.
repository	https://github.com/TTWNO/fry
id	1012997
size	1,553,142

Tait Hoyem (TTWNO)


`fry`

A very simple, dumb, no-alloc, no-std TTS. It comes with a handful of extreme limitations:

- It only works on text in a fixed-size buffer (at most 32 characters at a time).
- It only spells words; it does not actually speak them.
- It uses espeak to generate the sound files, and sox to pad the output to a fixed length.
- It has no output capability. It is up to the user of the library to decide where to send this data.
- The audio produced is only 16-bit signed PCM with one channel.
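Because every letter clip has the same length, spelling reduces to copying clips back to back into a caller-supplied buffer, so letter k of the input lands at offset k * CLIP_SAMPLES and no allocation is needed. The sketch below illustrates that idea under stated assumptions: `spell`, `clip_for`, `SpellError` and `CLIP_SAMPLES` are hypothetical names (not fry's actual API), and the clips are tiny placeholders rather than real audio. Only the 32-character limit and the 16-bit mono sample type come from this README.

```rust
/// Maximum input length handled at once (the README's 32-character limit).
const MAX_CHARS: usize = 32;
/// Samples per letter clip. Hypothetical and tiny for readability; real clips
/// are much longer, but every clip has the same length.
const CLIP_SAMPLES: usize = 4;

#[derive(Debug, PartialEq)]
enum SpellError {
    TooLong,        // more than MAX_CHARS bytes of input
    BufferTooSmall, // `out` cannot hold the next clip
}

/// Placeholder for the real per-letter audio: letter `idx` (0 = 'a') becomes
/// CLIP_SAMPLES copies of `idx + 1`.
fn clip_for(idx: usize) -> [i16; CLIP_SAMPLES] {
    [(idx + 1) as i16; CLIP_SAMPLES]
}

/// Copies the clip of each ASCII letter in `text` back to back into `out`
/// and returns the number of samples written. Other characters are skipped.
/// Nothing is allocated: the caller owns the output buffer.
fn spell(text: &str, out: &mut [i16]) -> Result<usize, SpellError> {
    if text.len() > MAX_CHARS {
        return Err(SpellError::TooLong);
    }
    let mut written = 0;
    for b in text.bytes() {
        if !b.is_ascii_alphabetic() {
            continue;
        }
        let idx = (b.to_ascii_lowercase() - b'a') as usize;
        // Fixed-length clips mean the destination is just the next CLIP_SAMPLES slots.
        let dst = out
            .get_mut(written..written + CLIP_SAMPLES)
            .ok_or(SpellError::BufferTooSmall)?;
        dst.copy_from_slice(&clip_for(idx));
        written += CLIP_SAMPLES;
    }
    Ok(written)
}
```

A buffer of MAX_CHARS * CLIP_SAMPLES samples is always large enough; what happens to the samples afterwards is, as noted above, up to the caller.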

The name is from the term "vocal fry".

Building new files

If you'd like to produce new files for the library to use (this is required to change the speed of the speech, for example), you can use the scripts in the data directory to create them. Please note that you will have to manually update variables in both the scripts and the library to accommodate any changes made to the data files.

`generate_base.sh`

This script uses espeak and the list of English alphabetic characters to create [a-z].wav. You may modify the arguments to espeak to produce faster speech via the -s flag, which sets the speed in words per minute.
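The contents of generate_base.sh are not reproduced here. As a hypothetical Rust equivalent, the sketch below builds one espeak command per letter using espeak's `-s` (speed in words per minute) and `-w` (write output to a WAV file instead of playing it) options. The function names are illustrative, and the sketch assumes espeak is on the PATH when the commands are eventually run; it only constructs them.

```rust
use std::process::Command;

/// Builds (but does not run) an espeak invocation that speaks `letter` at
/// `wpm` words per minute and writes the result to `<letter>.wav`.
fn espeak_command(letter: char, wpm: u32) -> Command {
    let mut cmd = Command::new("espeak");
    cmd.arg("-s")
        .arg(wpm.to_string()) // speed in words per minute
        .arg("-w")
        .arg(format!("{}.wav", letter)) // write a WAV file instead of playing audio
        .arg(letter.to_string()); // the text to speak
    cmd
}

/// One command per letter of the English alphabet, producing a.wav through z.wav.
fn all_letter_commands(wpm: u32) -> Vec<Command> {
    ('a'..='z').map(|c| espeak_command(c, wpm)).collect()
}
```

Running them is a matter of calling `.status()` on each `Command`. espeak's default speed is 175 words per minute, so passing a larger `wpm` gives faster speech.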

Check out man espeak or man espeak-ng for more details.

`calc.py`

This Python script uses the sox command, along with some basic math, to calculate how much zero padding to add to each WAV file so that all files have exactly the same length (in bytes and in time). It then strips the headers so that the WAV data is simply raw PCM data. It is up to the user what to do with this data.
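The exact constants in calc.py are not shown here. The sketch below reproduces the two steps it describes, in Rust and without sox, under some assumptions: headers are stripped by locating the RIFF `data` chunk, and the raw samples are then zero-padded to a target byte length derived from the sample rate, channel count and sample width. Stripping before padding yields the same raw output as padding first. `Format`, `pcm_data` and `pad_to` are hypothetical names.

```rust
/// PCM layout of the per-letter clips. The values you pass in are assumptions
/// and should match what mediainfo reports for [a-z].wav.
struct Format {
    sample_rate: u32,      // frames per second
    channels: u16,         // 1 = mono
    bytes_per_sample: u16, // 2 = 16-bit
}

impl Format {
    /// Number of bytes in `millis` milliseconds of audio, rounded down to whole frames.
    fn bytes_for_millis(&self, millis: u32) -> usize {
        let frame_bytes = self.channels as usize * self.bytes_per_sample as usize;
        let frames = self.sample_rate as usize * millis as usize / 1000;
        frames * frame_bytes
    }
}

/// Returns the body of the `data` chunk of a RIFF/WAVE file, i.e. the raw PCM
/// with all headers stripped, or `None` if no well-formed `data` chunk exists.
fn pcm_data(wav: &[u8]) -> Option<&[u8]> {
    if wav.len() < 12 || &wav[0..4] != b"RIFF" || &wav[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12; // first chunk header follows "RIFF", <size>, "WAVE"
    while pos + 8 <= wav.len() {
        let id = &wav[pos..pos + 4];
        let size =
            u32::from_le_bytes([wav[pos + 4], wav[pos + 5], wav[pos + 6], wav[pos + 7]]) as usize;
        let body = pos + 8;
        if id == b"data" {
            return wav.get(body..body.checked_add(size)?);
        }
        // Skip this chunk; RIFF chunk bodies are padded to an even length.
        pos = body.checked_add(size)?.checked_add(size & 1)?;
    }
    None
}

/// Zero-pads raw PCM to exactly `target` bytes. Zero is silence for signed PCM
/// such as s16. Returns `None` if the clip is already longer than `target`.
fn pad_to(mut pcm: Vec<u8>, target: usize) -> Option<Vec<u8>> {
    if pcm.len() > target {
        return None;
    }
    pcm.resize(target, 0);
    Some(pcm)
}
```

For example, with 22050 Hz mono 16-bit audio (an assumed format), `bytes_for_millis(1000)` is 44100 bytes, so every clip shorter than one second would be padded to exactly that size.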

If mediainfo displays different information than this for [a-z].wav, you may need to change the constants in calc.py to produce correctly sized padded/raw files.
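To compare a generated file against the expected format without mediainfo, a sketch like the following reads the standard WAVE `fmt ` chunk (audio format, channel count, sample rate, bits per sample). `WavFormat`, `read_fmt` and `matches_expected` are hypothetical names, and the `EXPECTED` values are assumptions standing in for whatever calc.py is configured with.

```rust
/// The format fields that matter for fry's raw output.
#[derive(Debug, PartialEq)]
struct WavFormat {
    audio_format: u16, // 1 = integer PCM
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
}

/// Assumed target format; replace with the values calc.py actually uses.
const EXPECTED: WavFormat =
    WavFormat { audio_format: 1, channels: 1, sample_rate: 22050, bits_per_sample: 16 };

fn u16_at(b: &[u8], i: usize) -> Option<u16> {
    Some(u16::from_le_bytes([*b.get(i)?, *b.get(i + 1)?]))
}

fn u32_at(b: &[u8], i: usize) -> Option<u32> {
    Some(u32::from_le_bytes([*b.get(i)?, *b.get(i + 1)?, *b.get(i + 2)?, *b.get(i + 3)?]))
}

/// Walks the RIFF chunks and decodes the first `fmt ` chunk, if any.
fn read_fmt(wav: &[u8]) -> Option<WavFormat> {
    if wav.get(0..4)? != b"RIFF" || wav.get(8..12)? != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    while pos + 8 <= wav.len() {
        let size = u32_at(wav, pos + 4)? as usize;
        let body = pos + 8;
        if &wav[pos..pos + 4] == b"fmt " {
            // Field offsets within the fmt chunk body: 0, 2, 4 and 14.
            return Some(WavFormat {
                audio_format: u16_at(wav, body)?,
                channels: u16_at(wav, body + 2)?,
                sample_rate: u32_at(wav, body + 4)?,
                bits_per_sample: u16_at(wav, body + 14)?,
            });
        }
        pos = body.checked_add(size)?.checked_add(size & 1)?;
    }
    None
}

fn matches_expected(wav: &[u8]) -> bool {
    read_fmt(wav) == Some(EXPECTED)
}
```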

`fry_normalize`

fry_normalize is the beginning of a text-to-speech engine written entirely in Rust. This module only normalizes text into a restricted, known form. Check out fry_normalize's README for more information.
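To illustrate what "normalizing into a restricted, known form" can mean for a spell-only engine, here is a hypothetical normalizer. It is not fry_normalize's API, whose actual rules are documented in its own README. This sketch keeps ASCII letters (lowercased), collapses whitespace runs into single spaces, drops everything else, and writes into a caller-provided buffer so it needs no allocation.

```rust
/// Hypothetical normalizer (not the fry_normalize API). Writes the normalized
/// form of `input` into `out` and returns it as a string slice.
fn normalize<'a>(input: &str, out: &'a mut [u8]) -> &'a str {
    let mut len = 0;
    let mut last_was_space = true; // true at the start so leading whitespace is dropped
    for c in input.chars() {
        let b = if c.is_ascii_alphabetic() {
            c.to_ascii_lowercase() as u8
        } else if c.is_whitespace() {
            if last_was_space {
                continue; // collapse runs of whitespace
            }
            b' '
        } else {
            continue; // digits, punctuation and non-ASCII characters are dropped
        };
        if len == out.len() {
            break; // output full: silently truncate
        }
        out[len] = b;
        len += 1;
        last_was_space = b == b' ';
    }
    if len > 0 && out[len - 1] == b' ' {
        len -= 1; // drop a single trailing space
    }
    // Only ASCII bytes were written, so this cannot fail.
    // (core::str::from_utf8 is the no-std equivalent.)
    std::str::from_utf8(&out[..len]).unwrap_or("")
}
```

Dropping digits is a simplification of this sketch; a fuller normalizer would more likely expand them into words.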

TODO

- Add some tests to verify that bit patterns are indeed concatenated correctly.
- Simplify the build process for the per-letter .wav files.
- Eliminate non-Rust dependencies for building the .wav files:
  - sox (calc.py)
  - python (calc.py)
  - espeak (generate_base.sh)
  - bash (generate_base.sh)
- Actually use a TTS engine instead of manually cramming in spelling letter by letter.
- Add std and alloc features for when they are available to the consumer.
- Add compile-time or test-time tests that the following match both the .wav files and some constant in the lib.rs file:
  - Verify bit arrangement (LE, BE)
  - Verify number of channels (mono, stereo)
  - Verify sample rate (22050, 44100, etc.)
  - Verify PCM width (s16, u16, s32, s8, etc.)
- Wrap raw data in a PCM type, because otherwise test output is WAY too big, generic over:
  - bit arrangement (LE, BE)
  - number of channels (mono, stereo)
  - sample rate (22050, 44100, etc.)
  - PCM width (s16, u16, s32, s8, etc.)
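One possible shape for the PCM wrapper in the last TODO item, sketched with hypothetical names: sample width and byte order are type parameters, while channel count and sample rate are const generics. A hand-written `Debug` impl prints a one-line summary instead of every byte, which is what keeps test output small. The `FryPcm` alias assumes little-endian data at 22050 Hz; only "16-bit signed, one channel" comes from this README.

```rust
use std::fmt;
use std::marker::PhantomData;

/// Byte-order markers (hypothetical names).
struct LittleEndian;
struct BigEndian;

trait ByteOrder {
    const NAME: &'static str;
}
impl ByteOrder for LittleEndian {
    const NAME: &'static str = "LE";
}
impl ByteOrder for BigEndian {
    const NAME: &'static str = "BE";
}

/// Sample width and signedness, named like the README's s8, s16, u16, s32.
trait Sample {
    const NAME: &'static str;
    const BYTES: usize;
}
impl Sample for i8 { const NAME: &'static str = "s8"; const BYTES: usize = 1; }
impl Sample for i16 { const NAME: &'static str = "s16"; const BYTES: usize = 2; }
impl Sample for u16 { const NAME: &'static str = "u16"; const BYTES: usize = 2; }
impl Sample for i32 { const NAME: &'static str = "s32"; const BYTES: usize = 4; }

/// Borrowed raw PCM bytes whose format lives in the type, not in the data.
struct Pcm<'a, S, E, const CHANNELS: u16, const RATE: u32> {
    bytes: &'a [u8],
    _format: PhantomData<(S, E)>,
}

/// Assumed format for fry's clips: s16, little-endian, mono, 22050 Hz.
type FryPcm<'a> = Pcm<'a, i16, LittleEndian, 1, 22050>;

impl<'a, S: Sample, E: ByteOrder, const CHANNELS: u16, const RATE: u32>
    Pcm<'a, S, E, CHANNELS, RATE>
{
    fn frame_bytes() -> usize {
        S::BYTES * CHANNELS as usize
    }

    /// Returns `None` unless `bytes` holds a whole number of frames.
    fn new(bytes: &'a [u8]) -> Option<Self> {
        let frame = Self::frame_bytes();
        if frame == 0 || bytes.len() % frame != 0 {
            return None;
        }
        Some(Pcm { bytes, _format: PhantomData })
    }

    fn frames(&self) -> usize {
        self.bytes.len() / Self::frame_bytes()
    }
}

impl<'a, S: Sample, E: ByteOrder, const CHANNELS: u16, const RATE: u32> fmt::Debug
    for Pcm<'a, S, E, CHANNELS, RATE>
{
    // A short summary instead of dumping every sample.
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "Pcm({} {}, {} ch, {} Hz, {} frames)",
            S::NAME, E::NAME, CHANNELS, RATE, self.frames())
    }
}
```

Because the format is part of the type, handing 44100 Hz data to code that expects `FryPcm` becomes a type mismatch at compile time rather than something a test has to catch.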

Commit count: 12
