srt_subtitles_parser

Crates.iosrt_subtitles_parser
lib.rssrt_subtitles_parser
version0.1.3
created_at2025-11-05 20:28:45.417227+00
updated_at2025-11-17 09:59:30.505165+00
descriptionA Rust-based parser that processes .srt (SubRip Subtitle) files and converts them into a structured data format
homepage
repositoryhttps://github.com/annnzinch/srt-subtitles-parser
max_upload_size
id1918571
size34,924
(annnzinch)

documentation

README

Srt Subtitles Parser

Links

Crate: https://crates.io/crates/srt_subtitles_parser
Docs: https://docs.rs/srt_subtitles_parser

Brief Description

Srt Subtitles Parser is a Rust-based parser that processes .srt (SubRip Subtitle) files. The parser reads .srt files, validates their structure, and extracts subtitle entries consisting of index number, a start timestamp, an end timestamp, and one or more lines of subtitle text. The parser converts the file into a structured data format, which can be used for:

  • Converting subtitles to other formats such as WebVTT, JSON, CSV.
  • Performing time-based analysis (total duration, reading speed, gaps detection)
  • Validating subtitle file consistency (sequential numbering, non-overlapping timestamps)
  • Filtering, searching, or manipulating subtitle text
  • Synchronizing subtitles by shifting timecodes

Parsing Process

What is Being Parsed

The parser processes SRT subtitle files with the following structure:

1
00:00:00,000 --> 00:00:02,500
Welcome to the Example Subtitle File!

2
00:00:03,000 --> 00:00:06,000
This is a demonstration of SRT subtitles.

Each subtitle entry consists of:

  • Index: sequential number identifying the subtitle
  • Timecode: start and end times in format HH:MM:SS,mmm --> HH:MM:SS,mmm
    • hours: 00-99
    • minutes: 00-59
    • seconds: 00-59
    • milliseconds: 000-999
  • Text: one or more lines of text content
  • Separator: empty line between entries

Grammar Overview

The parser uses Pest grammar with the following rules:

  • WHITESPACE: a whitespace character, which can be a space or a tab
WHITESPACE = _{ " " | "\t" }
  • NEWLINE: handles line breaks
NEWLINE = _{ "\r\n" | "\n" }
  • index: index number (integer)
index = @{ ASCII_DIGIT+ }
  • hours, minutes, seconds, milliseconds: components of timestamp, each with fixed width
hours = @{ ASCII_DIGIT{2} }
minutes = @{ ASCII_DIGIT{2} }
seconds = @{ ASCII_DIGIT{2} }
milliseconds = @{ ASCII_DIGIT{3} }
  • timestamp: time in HH:MM:SS,mmm format
timestamp = { hours ~ ":" ~ minutes ~ ":" ~ seconds ~ "," ~ milliseconds }
  • timecode: start and end timestamps separated by " --> ".
timecode = { timestamp ~ WHITESPACE* ~ "-->" ~ WHITESPACE* ~ timestamp }
  • text_line: single line of subtitle text (cannot be empty)
text_line = @{ (!NEWLINE ~ ANY)+ }
  • text_content: subtitle content, which can span multiple lines
text_content = { text_line ~ (NEWLINE ~ text_line)* }
  • subtitle_block: a complete subtitle entry: index, timecode, text, and mandatory blank line
subtitle_block = { 
    index ~ NEWLINE ~ 
    timecode ~ NEWLINE ~ 
    text_content ~ NEWLINE ~ 
    NEWLINE
}
  • subtitle_file: a full subtitle file containing one or more subtitle blocks.
subtitle_file = { 
    SOI ~ 
    (subtitle_block)+ ~
    NEWLINE* ~ 
    EOI 
}

Parsing Process

The parsing process includes:

  1. Reading: input .srt file path
  2. Tokenization: splitting input into subtitle blocks using Pest grammar rules
  3. Extracting: parsing each block to extract: index, start and end timestamps, and text content
  4. Validating: checking format, valid time ranges, presence of required blank lines and block structure completeness
  5. Transforming: parsing data into a structured Rust types (Subtitle, Timestamp, SubtitleFile)

Data Structures

The parser produces the following structured data:

pub struct SubtitleFile {
    pub subtitles: Vec<Subtitle>,
}

pub struct Subtitle {
    pub index: u32,
    pub start: Timestamp,
    pub end: Timestamp,
    pub text: String,
}

pub struct Timestamp {
    pub hours: u32,
    pub minutes: u32,
    pub seconds: u32,
    pub milliseconds: u32,
}

How Results Are Used

The structured subtitle data can be used for:

  • Serialization: conversion to JSON using Serde

  • Deserialization: conversion from JSON using Serde

  • Text Analysis: extracting text for translation or word count

  • Quality Control: detecting timing errors, missing indices, or overlapping subtitles

  • Statistics: calculating total duration, average subtitle length, reading speed

  • Timecode Manipulation: shifting all timestamps by a fixed offset

  • Time Conversion: converting timestamps to/from milliseconds for calculations

Example Input

1
00:00:00,000 --> 00:00:02,500
Welcome to the Example Subtitle File!

2
00:00:03,000 --> 00:00:06,000
This is a demonstration of SRT subtitles.

Example Output

{
  "subtitles": [
    {
      "index": 1,
      "start": {
        "hours": 0,
        "minutes": 0,
        "seconds": 0,
        "milliseconds": 0
      },
      "end": {
        "hours": 0,
        "minutes": 0,
        "seconds": 2,
        "milliseconds": 500
      },
      "text": "Welcome to the Example Subtitle File!"
    },
    {
      "index": 2,
      "start": {
        "hours": 0,
        "minutes": 0,
        "seconds": 3,
        "milliseconds": 0
      },
      "end": {
        "hours": 0,
        "minutes": 0,
        "seconds": 6,
        "milliseconds": 0
      },
      "text": "This is a demonstration of SRT subtitles."
    }
  ]
}
Commit count: 0

cargo fmt