
[BUG] Conversion does not remove optional cue identifiers in a vtt cue block (results in doubled sequence numbering) #24

@e-d-n-a


Expected Behavior

In the WebVTT format each cue block can include an optional cue identifier.

The VTT files I want to convert with this library are very close to the SRT format and already use a sequence number as the cue identifier.

Looking at the conversion code, I would expect this cue identifier to be handled (removed) so that the result is a valid SRT file.

Current Behavior

When I convert those numbered cues, I end up with doubled numbering: two consecutive number lines before each timestamp.
This happens because the current version of the library does not look for cue identifiers and remove them.
Sequence numbers are added in every case, so the existing identifier and the new sequence number both end up in the output.
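
The effect can be reproduced in isolation. The following is a minimal sketch, not the library's actual code: it assumes numbering is done by putting a counter in front of every timing line, and `number_cues_naively` and the `TIMESTAMP` pattern are names made up for this illustration.

```python
import re

# Assumption: timing lines look like "00:00:01,000 --> 00:00:02,000"
# (hours optional, '.' or ',' before the milliseconds).
TIMESTAMP = re.compile(
    r'(?:\d{2,}:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}[.,]\d{3}'
)


def number_cues_naively(contents):
    """Put a running number in front of every timing line.

    Existing cue identifier lines are left untouched, which is what
    produces the doubled numbering.
    """
    output = []
    counter = 1
    for line in contents.split('\n'):
        if TIMESTAMP.search(line):
            output.append(str(counter))
            counter += 1
        output.append(line)
    return '\n'.join(output)


sample = (
    "1\n"
    "00:00:01,000 --> 00:00:02,000\n"
    "Hello\n"
    "\n"
    "2\n"
    "00:00:03,000 --> 00:00:04,000\n"
    "World"
)
print(number_cues_naively(sample))
```

With numeric cue identifiers in the input, every cue in the printed result starts with two number lines.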

Possible Solution

The conversion in this library is mostly done by a series of direct replacements on the file contents rather than by parsing the full VTT content first, so it is not easy to change it to detect and drop cue identifier lines.
Instead, I modified the function add_sequence_numbers to drop any non-empty line that directly precedes a line with a timestamp.
It is not a very elegant solution, but it works and does not require a complex redesign of the input handling in convert_content.

Steps to Reproduce

  1. Download example file 'E1x1_en.vtt.txt' from attachments and rename it to .vtt
  2. Convert the single file with the following code snippet

      import vtt_to_srt.vtt_to_srt as vtt_to_srt
      vtt_file = vtt_to_srt.ConvertFile('E1x1_en.vtt', 'utf-8')
      vtt_file.convert()

  3. Check created srt-file for double numbering in front of any cue block

Context (Environment)

  • Version: vtt_to_srt3-0.2.0.0-py3-none-any.whl
  • Platform: Windows 64-bit, Python 3.7-32bit
  • Subsystem: -
  • Files: vtt_to_srt.py

Detailed Description

see above

Possible Implementation

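    def add_sequence_numbers(self, contents):
        """Adds sequence numbers to subtitle contents and returns new subtitle contents.

        A non-empty line directly before a timestamp line is treated as a
        cue identifier and dropped.

        :contents -- contents of vtt file
        """
        output = ''
        lines = contents.split('\n')
        i = 1  # next sequence number
        n = 0  # index of the current line
        while n < len(lines)-1:
            line = lines[n]
            next_line = lines[n+1]
            if self.has_timestamp(next_line):
                # Keep a blank separator line; any other line here is a
                # cue identifier and is dropped
                if line == '':
                    output += '\n'
                output += str(i) + '\n'
                output += next_line + '\n'
                i += 1
                n += 2
            else:
                output += line + '\n'
                n += 1
        # Emit the last line only if it was not already consumed above
        # as a timestamp line
        if n < len(lines):
            output += lines[-1] + '\n'
        return output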
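
To try the idea outside the library, here is a standalone sketch of the same approach (drop a non-empty line directly before a timing line, then number the cue). It is only an illustration under assumptions: `renumber_cues`, the module-level `has_timestamp` and the `TIMESTAMP` pattern are hypothetical stand-ins, not the library's own helpers.

```python
import re

# Assumption: timing lines look like "00:00:01,000 --> 00:00:02,000"
# (hours optional, '.' or ',' before the milliseconds).
TIMESTAMP = re.compile(
    r'(?:\d{2,}:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}[.,]\d{3}'
)


def has_timestamp(line):
    """Hypothetical stand-in for the library's timestamp check."""
    return TIMESTAMP.search(line) is not None


def renumber_cues(contents):
    """Drop cue identifiers and add fresh sequence numbers."""
    out = []
    counter = 1
    for line in contents.split('\n'):
        if has_timestamp(line):
            # A non-empty line directly before a timing line is taken
            # to be a cue identifier and removed.
            if out and out[-1] != '':
                out.pop()
            out.append(str(counter))
            counter += 1
        out.append(line)
    return '\n'.join(out)


sample = (
    "cue-a\n"
    "00:00:01,000 --> 00:00:02,000\n"
    "Hello\n"
    "\n"
    "00:00:03,000 --> 00:00:04,000\n"
    "World"
)
print(renumber_cues(sample))
```

Like the method above, this relies on cues being separated by a blank line: if cue text is immediately followed by the next timing line, that text line would be mistaken for an identifier and dropped.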

E1x1_en.vtt.txt
