srt_subtitles_parser

Crates.io	srt_subtitles_parser
lib.rs	srt_subtitles_parser
version	0.1.3
created_at	2025-11-05 20:28:45.417227+00
updated_at	2025-11-17 09:59:30.505165+00
description	A Rust-based parser that processes .srt (SubRip Subtitle) files and converts them into a structured data format
homepage
repository	https://github.com/annnzinch/srt-subtitles-parser
max_upload_size
id	1918571
size	34,924

(annnzinch)

documentation

README

Srt Subtitles Parser

Links

Crate: https://crates.io/crates/srt_subtitles_parser
Docs: https://docs.rs/srt_subtitles_parser

Brief Description

Srt Subtitles Parser is a Rust-based parser that processes .srt (SubRip Subtitle) files. The parser reads .srt files, validates their structure, and extracts subtitle entries consisting of index number, a start timestamp, an end timestamp, and one or more lines of subtitle text. The parser converts the file into a structured data format, which can be used for:

Converting subtitles to other formats such as WebVTT, JSON, CSV.
Performing time-based analysis (total duration, reading speed, gaps detection)
Validating subtitle file consistency (sequential numbering, non-overlapping timestamps)
Filtering, searching, or manipulating subtitle text
Synchronizing subtitles by shifting timecodes

Parsing Process

What is Being Parsed

The parser processes SRT subtitle files with the following structure:

1
00:00:00,000 --> 00:00:02,500
Welcome to the Example Subtitle File!

2
00:00:03,000 --> 00:00:06,000
This is a demonstration of SRT subtitles.

Each subtitle entry consists of:

Index: sequential number identifying the subtitle
Timecode: start and end times in format HH:MM:SS,mmm --> HH:MM:SS,mmm
- hours: 00-99
- minutes: 00-59
- seconds: 00-59
- milliseconds: 000-999
Text: one or more lines of text content
Separator: empty line between entries

Grammar Overview

The parser uses Pest grammar with the following rules:

WHITESPACE: a whitespace character, which can be a space or a tab

WHITESPACE = _{ " " | "\t" }

NEWLINE: handles line breaks

NEWLINE = _{ "\r\n" | "\n" }

index: index number (integer)

index = @{ ASCII_DIGIT+ }

hours, minutes, seconds, milliseconds: components of timestamp, each with fixed width

hours = @{ ASCII_DIGIT{2} }
minutes = @{ ASCII_DIGIT{2} }
seconds = @{ ASCII_DIGIT{2} }
milliseconds = @{ ASCII_DIGIT{3} }

timestamp: time in HH:MM:SS,mmm format

timestamp = { hours ~ ":" ~ minutes ~ ":" ~ seconds ~ "," ~ milliseconds }

timecode: start and end timestamps separated by " --> ".

timecode = { timestamp ~ WHITESPACE* ~ "-->" ~ WHITESPACE* ~ timestamp }

text_line: single line of subtitle text (cannot be empty)

text_line = @{ (!NEWLINE ~ ANY)+ }

text_content: subtitle content, which can span multiple lines

text_content = { text_line ~ (NEWLINE ~ text_line)* }

subtitle_block: a complete subtitle entry: index, timecode, text, and mandatory blank line

subtitle_block = { 
    index ~ NEWLINE ~ 
    timecode ~ NEWLINE ~ 
    text_content ~ NEWLINE ~ 
    NEWLINE
}

subtitle_file: a full subtitle file containing one or more subtitle blocks.

subtitle_file = { 
    SOI ~ 
    (subtitle_block)+ ~
    NEWLINE* ~ 
    EOI 
}

Parsing Process

The parsing process includes:

Reading: input .srt file path
Tokenization: splitting input into subtitle blocks using Pest grammar rules
Extracting: parsing each block to extract: index, start and end timestamps, and text content
Validating: checking format, valid time ranges, presence of required blank lines and block structure completeness
Transforming: parsing data into a structured Rust types (Subtitle, Timestamp, SubtitleFile)

Data Structures

The parser produces the following structured data:

pub struct SubtitleFile {
    pub subtitles: Vec<Subtitle>,
}

pub struct Subtitle {
    pub index: u32,
    pub start: Timestamp,
    pub end: Timestamp,
    pub text: String,
}

pub struct Timestamp {
    pub hours: u32,
    pub minutes: u32,
    pub seconds: u32,
    pub milliseconds: u32,
}

How Results Are Used

The structured subtitle data can be used for:

Serialization: conversion to JSON using Serde
Deserialization: conversion from JSON using Serde
Text Analysis: extracting text for translation or word count
Quality Control: detecting timing errors, missing indices, or overlapping subtitles
Statistics: calculating total duration, average subtitle length, reading speed
Timecode Manipulation: shifting all timestamps by a fixed offset
Time Conversion: converting timestamps to/from milliseconds for calculations

Example Input

1
00:00:00,000 --> 00:00:02,500
Welcome to the Example Subtitle File!

2
00:00:03,000 --> 00:00:06,000
This is a demonstration of SRT subtitles.

Example Output

{
  "subtitles": [
    {
      "index": 1,
      "start": {
        "hours": 0,
        "minutes": 0,
        "seconds": 0,
        "milliseconds": 0
      },
      "end": {
        "hours": 0,
        "minutes": 0,
        "seconds": 2,
        "milliseconds": 500
      },
      "text": "Welcome to the Example Subtitle File!"
    },
    {
      "index": 2,
      "start": {
        "hours": 0,
        "minutes": 0,
        "seconds": 3,
        "milliseconds": 0
      },
      "end": {
        "hours": 0,
        "minutes": 0,
        "seconds": 6,
        "milliseconds": 0
      },
      "text": "This is a demonstration of SRT subtitles."
    }
  ]
}

Commit count: 0

srt_subtitles_parser

documentation

README

Srt Subtitles Parser

Links

Brief Description

Parsing Process

What is Being Parsed

Grammar Overview

Parsing Process

Data Structures

How Results Are Used

Example Input

Example Output

cargo fmt