Crates.io | ytranscript |
lib.rs | ytranscript |
version | 0.1.0 |
source | src |
created_at | 2024-05-15 18:16:00.212657 |
updated_at | 2024-05-15 18:16:00.212657 |
description | ytranscript is a rust crate that provides functionality to fetch YouTube video transcripts. |
homepage | https://github.com/rudrodip/ytranscript |
repository | https://github.com/rudrodip/ytranscript |
max_upload_size | |
id | 1241336 |
size | 18,062 |
ytranscript
is a Rust crate that provides functionality to fetch YouTube video transcripts. It supports fetching transcripts in different languages and handles various error scenarios that might occur while retrieving the transcripts.
Add ytranscript
to your Cargo.toml
:
Here is an example of how to use the ytranscript
crate in a binary crate:
use ytranscript::YoutubeTranscript;
use std::env;
#[tokio::main]
async fn main() {
// Get the video ID from command line arguments
let args: Vec<String> = env::args().collect();
if args.len() != 2 {
eprintln!("Usage: ytranscript_bin <video_id>");
return;
}
let video_id = &args[1];
// Fetch the transcript
match YoutubeTranscript::fetch_transcript(video_id, None).await {
Ok(transcript) => {
for entry in transcript {
println!("{:?}", entry);
}
}
Err(e) => {
eprintln!("Error: {}", e);
}
}
}
YoutubeTranscript::fetch_transcript
Fetches the transcript for a given YouTube video ID or URL.
Arguments:
video_id
: A string slice representing the YouTube video URL or ID.config
: An optional TranscriptConfig
specifying the desired language for the transcript.Returns:
Ok(Vec<TranscriptResponse>)
: A vector of TranscriptResponse
if the transcript is successfully fetched.Err(YoutubeTranscriptError)
: An error if the transcript cannot be fetched.The crate defines a set of errors that might occur while fetching transcripts:
use thiserror::Error;
#[derive(Error, Debug)]
pub enum YoutubeTranscriptError {
#[error("YouTube is receiving too many requests from this IP and now requires solving a captcha to continue")]
TooManyRequests,
#[error("The video is no longer available ({0})")]
VideoUnavailable(String),
#[error("Transcript is disabled on this video ({0})")]
TranscriptDisabled(String),
#[error("No transcripts are available for this video ({0})")]
TranscriptNotAvailable(String),
#[error("No transcripts are available in {0} for this video ({2}). Available languages: {1:?}")]
TranscriptNotAvailableLanguage(String, Vec<String>, String),
#[error("Impossible to retrieve Youtube video ID.")]
InvalidVideoId,
}
The crate uses regex patterns to extract YouTube video IDs and parse XML transcripts:
pub const RE_YOUTUBE: &str =
r#"(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/\s]{11})"#;
pub const USER_AGENT: &str = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36,gzip(gfe)";
pub const RE_XML_TRANSCRIPT: &str = r#"<text start="([^"]*)" dur="([^"]*)">([^<]*)<\/text>"#;
The crate defines the following types:
#[derive(Debug)]
pub struct TranscriptConfig {
pub lang: Option<String>,
}
#[derive(Debug)]
pub struct TranscriptResponse {
pub text: String,
pub duration: f64,
pub offset: f64,
pub lang: String,
}
You can test the functionality of the ytranscript
crate by running the following command:
cargo test
This project is licensed under the MIT License. See the LICENSE file for details