Description
Expected Behavior
In the WebVTT format each cue block can include an optional cue identifier.
The VTT files I want to convert with this library are very close to the SRT format and already use a sequence number as the cue identifier.
Looking at the code that converts between the formats, I would expect this cue identifier to be handled (removed) so that the conversion produces valid output.
Current Behavior
When I try to convert those numbered cues, I end up with doubled numbering on consecutive lines before each timestamp.
This happens because the current version of the library does not look for cue identifiers or remove them.
Sequence numbers are added in every case, so the existing identifier and the new sequence number both appear in the output.
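To illustrate the effect, here is a minimal sketch. It is not the library's actual code: `number_every_timestamp` and the `TIMESTAMP` pattern are hypothetical stand-ins that only mimic the behaviour described above (a counter is inserted before every timing line, regardless of what precedes it).

```python
import re

# Hypothetical pattern for a cue timing line such as
# "00:00:01,000 --> 00:00:04,000" (comma or dot before the milliseconds).
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")


def number_every_timestamp(contents):
    """Simplified stand-in: put a counter before each timing line, unconditionally."""
    output = []
    counter = 1
    for line in contents.split('\n'):
        if TIMESTAMP.search(line):
            output.append(str(counter))
            counter += 1
        output.append(line)
    return '\n'.join(output)


# Cues that already carry a numeric cue identifier.
numbered_cues = (
    "1\n00:00:01,000 --> 00:00:04,000\nHello\n\n"
    "2\n00:00:05,000 --> 00:00:08,000\nWorld"
)
doubled = number_every_timestamp(numbered_cues)
print(doubled)
```

With this input, each cue starts with two number lines ("1" followed by "1", then "2" followed by "2"), which matches the doubled numbering I see in the converted file.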
Possible Solution
The conversion in this library is mostly done by a series of direct replacements on the file contents rather than by parsing the full VTT content first, so it is not easy to detect and drop cue identifier lines during that step.
Instead, I modified the function add_sequence_numbers to drop any non-empty line that directly precedes a line with a timestamp.
It is not a very elegant solution, but it works and does not require a complex redesign of the input handling in convert_content.
Steps to Reproduce
- Download the example file 'E1x1_en.vtt.txt' from the attachments and rename it to 'E1x1_en.vtt'
- Convert the single file with the following code snippet
import vtt_to_srt.vtt_to_srt as vtt_to_srt
vtt_file = vtt_to_srt.ConvertFile('E1x1_en.vtt', 'utf-8')
vtt_file.convert()
- Check the created SRT file for doubled numbering in front of each cue block
Context (Environment)
- Version: vtt_to_srt3-0.2.0.0-py3-none-any.whl
- Platform: Windows 64-bit, Python 3.7-32bit
- Subsystem: -
- Files: vtt_to_srt.py
Detailed Description
see above
Possible Implementation
def add_sequence_numbers(self, contents):
    """Adds sequence numbers to subtitle contents and returns new subtitle contents

    :contents -- contents of vtt file
    """
    output = ''
    lines = contents.split('\n')
    i = 1
    n = 0
    while n < len(lines)-1:
        line = lines[n]
        next_line = lines[n+1]
        if self.has_timestamp(next_line):
            # Keep an empty separator line, but drop a non-empty line
            # (the cue identifier) that directly precedes the timestamp.
            if line == '':
                output += '\n'
            output += str(i) + '\n'
            output += next_line + '\n'
            i += 1
            n += 2
        else:
            output += line + '\n'
            n += 1
    # Emit the last line only if it was not already consumed as a timestamp.
    if n < len(lines):
        output += lines[-1] + '\n'
    return output
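For trying the idea outside the library, the same approach can be sketched as a standalone function. This is an adaptation, not the library's code: the module-level `has_timestamp` and its regular expression are assumptions standing in for the library's own method, and the loop is restructured to build a list of lines instead of concatenating strings.

```python
import re

# Assumed stand-in for the library's has_timestamp method: matches a timing
# line such as "00:00:01,000 --> 00:00:04,000" (comma or dot separator).
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")


def has_timestamp(line):
    return TIMESTAMP.search(line) is not None


def add_sequence_numbers(contents):
    """Number each cue and drop a non-empty line directly before a timestamp."""
    lines = contents.split('\n')
    output = []
    counter = 1
    for index, line in enumerate(lines):
        next_is_timestamp = (
            index + 1 < len(lines) and has_timestamp(lines[index + 1])
        )
        # A non-empty, non-timing line right before a timing line is treated
        # as a cue identifier and skipped.
        if next_is_timestamp and line != '' and not has_timestamp(line):
            continue
        if has_timestamp(line):
            output.append(str(counter))
            counter += 1
        output.append(line)
    return '\n'.join(output)


with_identifiers = (
    "intro\n00:00:01,000 --> 00:00:04,000\nHello\n\n"
    "7\n00:00:05,000 --> 00:00:08,000\nWorld\n"
)
print(add_sequence_numbers(with_identifiers))
```

In this sketch the identifiers "intro" and "7" are replaced by the sequence numbers 1 and 2, and cues without any identifier are numbered as before. Like the method above, it also drops a line of cue text if it directly precedes a timestamp without a blank line in between, so it relies on cue blocks being separated by empty lines.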