WebVTT for subtitles and captions

WebVTT (Web Video Text Tracks) is a format primarily used for captioning video content. It carries timed text data such as captions, subtitles, and video descriptions, as well as any other metadata that is synchronized with a time segment of a media element.

A WebVTT file is a simple text file, encoded as UTF-8, which has a .vtt file extension. It follows the format defined by the specifications listed at http://dev.w3.org/html5/webvtt/.

WebVTT in HTTP Live Streaming (HLS)

The WebVTT feature adds caption support for Video on Demand (VoD) content delivered over HLS. Using AMS, you can repackage WebVTT files for HLS delivery. To do so, you reference the m3u8 playlists for subtitles or captions from a set-level m3u8 playlist or from a variant playlist.

AMS handles requests both for the captions m3u8 file and for the m3u8 files of the media segments. You can configure the duration of WebVTT segments; AMS splits each WebVTT file into segments of the specified duration.

Note:

Support for WebVTT captions is available in iOS 6 and later.

The file format

The following is a brief overview of the WebVTT file and cue format. For details, refer to the specification at http://dev.w3.org/html5/webvtt/. A WebVTT file begins with the string WEBVTT, optionally followed by header lines; a blank line separates this header from the cues that follow.

The following are the contents of a simple VTT file that captions a part of the video content.

WEBVTT

00:01.000 --> 00:04.000
The first cue.

00:05.000 --> 00:09.000
The second cue.
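To make the layout concrete, here is a minimal Python sketch that produces a file like the one above: the WEBVTT signature, a blank line, and blank-line-separated cues, written as UTF-8. The `Cue` class and the function names are hypothetical helpers for illustration, not part of AMS.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # cue start time in seconds
    end: float    # cue end time in seconds
    text: str     # cue payload (may contain cue components such as <b>)


def format_timestamp(seconds: float) -> str:
    """Format seconds as [hh:]mm:ss.ttt, omitting the optional hour part when it is zero."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Return WebVTT text: the WEBVTT line, then each cue preceded by a blank line."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


def write_webvtt(path: str, cues: list[Cue]) -> None:
    """Write the cues to a .vtt file; WebVTT files are UTF-8 encoded."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(to_webvtt(cues))
```

For example, `write_webvtt("sample.vtt", [Cue(1, 4, "The first cue."), Cue(5, 9, "The second cue.")])` writes a two-cue file equivalent to the example.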

WebVTT cues

A WebVTT cue specifies text for a particular part of a media file, such as a subtitle, together with the timestamp range of the media to which that text applies. You can also assign a cue a unique identifier: a simple string that cannot contain the substring --> or any of the WebVTT line terminators. Each cue takes the following form:

[idstring]
[hh:]mm:ss.msmsms --> [hh:]mm:ss.msmsms
Text string

The timestamps follow a standard format: the hour part (hh:) is optional, and the milliseconds are separated from the seconds by a dot (.) rather than a colon (:). The end timestamp of the range must be greater than the start timestamp. Timestamps of different cues can overlap. Cue text cannot contain two consecutive line terminators (that is, a blank line) or the string "-->".

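The timestamp rules above can be checked with a small parser. This Python sketch, with hypothetical function names, accepts a cue timing line, allows the hour part to be omitted, requires a dot before the three millisecond digits, and rejects ranges whose end is not greater than the start. It returns times in integer milliseconds to avoid floating-point rounding and passes any trailing cue settings through unparsed.

```python
import re

# One timestamp: optional hours (two or more digits), then mm:ss, a dot, and three-digit milliseconds.
_TS = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
# Start --> end, separated by spaces or tabs, optionally followed by cue settings.
_TIMING = re.compile(rf"^{_TS}[ \t]+-->[ \t]+{_TS}(?:[ \t]+(.*))?$")


def _to_ms(hours, minutes, seconds, millis) -> int:
    """Convert captured timestamp fields to milliseconds; a missing hour part counts as 0."""
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_timing_line(line: str) -> tuple[int, int, str]:
    """Return (start_ms, end_ms, settings) for a cue timing line, or raise ValueError."""
    match = _TIMING.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid cue timing line: {line!r}")
    fields = match.groups()
    start = _to_ms(*fields[0:4])
    end = _to_ms(*fields[4:8])
    if end <= start:
        raise ValueError("the end timestamp must be greater than the start timestamp")
    return start, end, fields[8] or ""
```

For example, `parse_timing_line("00:01.000 --> 00:04.000 align:start")` returns `(1000, 4000, "align:start")`, while a line with a broken arrow such as `00:01.000 -- > 00:04.000` raises `ValueError`.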

WebVTT cue settings

Several settings can be specified per cue. They follow the timestamp range:

[idstring]
[hh:]mm:ss.msmsms --> [hh:]mm:ss.msmsms [cue settings]
Text string

Cue settings let you control the position and alignment of the cue text; examples include align, size, position, and vertical.

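Cue settings are whitespace-separated name:value pairs. The following Python sketch splits them into a dictionary, assuming the settings string has already been separated from the timestamps; the function and constant names are hypothetical. It skips malformed tokens (no colon, empty name, or empty value) and unrecognized setting names.

```python
# Setting names defined by the WebVTT specification; other names are ignored.
KNOWN_SETTINGS = {"vertical", "line", "position", "size", "align", "region"}


def parse_cue_settings(settings: str) -> dict[str, str]:
    """Split a settings string such as "align:start size:50%" into a name -> value dict."""
    parsed: dict[str, str] = {}
    for token in settings.split():
        name, colon, value = token.partition(":")
        # Skip tokens without a colon, with an empty name or value, or with an unknown name.
        if colon and name and value and name in KNOWN_SETTINGS:
            parsed[name] = value
    return parsed
```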

WebVTT cue components

WebVTT cue components let you add more information to the cue text. They are similar to HTML elements and can add semantics and styling to the text strings. Available components include i for italics, b for bold, and u for underline, among others.

The following example WebVTT file uses a cue setting (align) and a cue component (b, for bold):

WEBVTT

00:01.000 --> 00:04.000 align:start
The <b>first</b> cue.

00:05.000 --> 00:09.000 align:end
The second cue.
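When you need only the readable text of a cue, for example to index captions for search, you can drop the cue component tags. This Python sketch (hypothetical helper name) removes everything between < and >, which also drops the speaker name of a v (voice) span, and then decodes character references such as &amp; with the standard html module.

```python
import html
import re

# Matches a start tag (<b>, <c.red>, <v Roger>), an end tag (</b>), or a cue timestamp tag (<00:00:01.000>).
_TAG = re.compile(r"<[^>]*>")


def cue_plain_text(cue_text: str) -> str:
    """Strip cue component tags from cue text, then decode character references."""
    return html.unescape(_TAG.sub("", cue_text))
```

Tags are removed before references are decoded, so an escaped `&lt;b&gt;` survives as the literal text `<b>` rather than being stripped.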

Implement WebVTT in AMS

Use the following workflow in AMS for WebVTT:

  1. Host the Video on Demand (VoD) content and the corresponding WebVTT file using the same file name. Place the WebVTT file in a subfolder named vtt at the VoD location.

    For example, if the VoD file is available at [rootinstall]\webroot\vod\test, the corresponding WebVTT file is located at [rootinstall]\webroot\vod\test\vtt.

  2. Provide subscribers with the URL of the m3u8 file. For a WebVTT file, the m3u8 URL is the same as the media m3u8 URL, with a vtt path segment inserted before the file name. For example, to access the VoD content named sample.f4v, a client requests http://example.com/hls-vod/medialocation/sample.f4v.m3u8; the m3u8 URL for the corresponding VTT file is http://example.com/hls-vod/medialocation/vtt/sample.f4v.m3u8.

  3. Use the f4mconfigurator tool to generate a set-level or variant playlist that references both m3u8 files in the required format.

  4. When the HLS module receives a request for a WebVTT (subtitles) m3u8 file, it loads the WebVTT file from the location specified in the URL, parses it in memory, and creates the playlist with the virtual URL of each WebVTT segment. Each URL is named to indicate the start time of the caption text it contains.

    The URL format for a WebVTT segment is <WebVTT file base name>NumX.vtt, where X is the segment index. If T is the duration of each WebVTT segment, the segment starts at SegmentStartTime = X*T. The duration T is configured in the httpd.conf file. The URLs can be absolute or relative.

  5. When the module receives a request for a VTT segment, it loads the WebVTT file, computes the segment start time (X*T), and generates and serves the requested segment, which contains the T seconds of caption text that begin at that start time.
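The naming rules in steps 2, 4, and 5 can be summarized in code. The following Python sketch only mirrors the conventions described in those steps; it is not AMS code, and the function names are hypothetical. It derives the captions playlist URL from the media playlist URL, names segment X, and computes its start time X*T.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def vtt_playlist_url(media_playlist_url: str) -> str:
    """Insert a 'vtt' path segment before the playlist file name (step 2)."""
    parts = urlsplit(media_playlist_url)
    directory, filename = posixpath.split(parts.path)
    return urlunsplit(parts._replace(path=posixpath.join(directory, "vtt", filename)))


def segment_name(base_name: str, index: int) -> str:
    """Name of WebVTT segment X: <base name>NumX.vtt (step 4)."""
    return f"{base_name}Num{index}.vtt"


def segment_start(index: int, segment_duration: float) -> float:
    """Start time of segment X in seconds: X * T (steps 4 and 5)."""
    return index * segment_duration
```

For example, with T = 20 seconds, `segment_name("sample", 3)` is `sampleNum3.vtt` and that segment starts at 60 seconds.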

WebVTT examples

The segmented WebVTT caption files are listed in a separate m3u8 playlist, which is referenced from the set-level m3u8 playlist.

The following is a sample of a set-level m3u8 file:

#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,URI="/hls-vod/webvtt/vtt/sample.f4v.m3u8",LANGUAGE="en"
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=150000,SUBTITLES="subs"
/hls-vod/webvtt/sample.f4v.m3u8
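In AMS, the f4mconfigurator tool generates this playlist. Purely to show how the pieces link together, the following Python sketch (a hypothetical function, not part of AMS) builds the same set-level playlist: the GROUP-ID of the EXT-X-MEDIA subtitles entry must match the SUBTITLES attribute of the EXT-X-STREAM-INF entry.

```python
def set_level_playlist(media_uri: str, subtitles_uri: str, bandwidth: int,
                       group_id: str = "subs", name: str = "English",
                       language: str = "en") -> str:
    """Build a set-level playlist that pairs one media playlist with one subtitles playlist."""
    # The subtitles rendition; GROUP-ID names the group that the stream entry refers to.
    media = (
        f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="{group_id}",NAME="{name}",'
        f'DEFAULT=YES,AUTOSELECT=YES,URI="{subtitles_uri}",LANGUAGE="{language}"'
    )
    # The variant stream; its SUBTITLES attribute must equal the GROUP-ID above.
    stream = f'#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH={bandwidth},SUBTITLES="{group_id}"'
    return "\n".join(["#EXTM3U", media, stream, media_uri]) + "\n"
```

Calling `set_level_playlist("/hls-vod/webvtt/sample.f4v.m3u8", "/hls-vod/webvtt/vtt/sample.f4v.m3u8", 150000)` yields the sample playlist line for line.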

This playlist references an m3u8 file that serves the segmented WebVTT caption files and an m3u8 file for the media segments of sample.f4v. The following is the response for the URL /hls-vod/webvtt/vtt/sample.f4v.m3u8:

#EXTM3U
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ALLOW-CACHE:NO
#EXT-X-VERSION:2
#EXT-X-TARGETDURATION:20
#EXTINF:20,
sampleNum0.vtt
#EXTINF:20,
sampleNum1.vtt
#EXTINF:20,
sampleNum2.vtt
#EXTINF:20,
sampleNum3.vtt
#EXTINF:20,
sampleNum4.vtt
#EXTINF:15,
sampleNum5.vtt
#EXT-X-ENDLIST
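The segment list follows from the segment duration T. Assuming the captions span 115 seconds (inferred from the EXTINF values in this sample: five 20-second segments plus one 15-second segment) and T = 20, the Python sketch below, with a hypothetical function name, reproduces this playlist. How AMS determines the total duration is not described here, so treat that input as an assumption.

```python
def subtitle_playlist(base_name: str, total_seconds: int, segment_seconds: int) -> str:
    """Build a VoD subtitles playlist listing <base name>NumX.vtt segments of T seconds each."""
    count = -(-total_seconds // segment_seconds)  # ceiling division; the last segment may be shorter
    lines = [
        "#EXTM3U",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-ALLOW-CACHE:NO",
        "#EXT-X-VERSION:2",
        f"#EXT-X-TARGETDURATION:{segment_seconds}",
    ]
    for x in range(count):
        # Segment X covers [X*T, (X+1)*T), truncated at the end of the captions.
        duration = min(segment_seconds, total_seconds - x * segment_seconds)
        lines.append(f"#EXTINF:{duration},")
        lines.append(f"{base_name}Num{x}.vtt")
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

Using T as the target duration keeps EXT-X-TARGETDURATION at least as large as every EXTINF value, which HLS requires.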

All VTT segments except the last are about 20 seconds long; the last segment is 15 seconds long. The contents of sampleNum0.vtt are shown below for reference:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:63000, LOCAL:00:00:00.000

1
00:00.100 --> 00:00:30.059
This text appears from 0 to 30 seconds.

The contents of sampleNum1.vtt are shown below for reference:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:63000, LOCAL:00:00:00.000

1
00:00.100 --> 00:00:30.059
This text appears from 0 to 30 seconds.

2
00:30.070 --> 00:50.110
This text appears from 30 sec to 50 sec.
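The two sample segments suggest that each segment repeats every cue that overlaps its time window: the first cue (0.1 s to 30.059 s) appears in both sampleNum0.vtt and sampleNum1.vtt. Assuming that rule and this example's 20-second segment duration, a Python sketch of the selection (hypothetical names, not AMS code) is:

```python
from dataclasses import dataclass


@dataclass
class TimedCue:
    ident: str     # cue identifier, such as "1"
    start_ms: int  # start time in milliseconds
    end_ms: int    # end time in milliseconds
    text: str


def cues_for_segment(cues: list[TimedCue], index: int, segment_ms: int) -> list[TimedCue]:
    """Return the cues whose time range overlaps segment X's window [X*T, (X+1)*T)."""
    window_start = index * segment_ms
    window_end = window_start + segment_ms
    return [c for c in cues if c.start_ms < window_end and c.end_ms > window_start]


# The two cues from the sample files above.
cues = [
    TimedCue("1", 100, 30_059, "This text appears from 0 to 30 seconds."),
    TimedCue("2", 30_070, 50_110, "This text appears from 30 sec to 50 sec."),
]
```

With `segment_ms=20_000`, segment 0 selects only cue 1 and segment 1 selects cues 1 and 2, matching sampleNum0.vtt and sampleNum1.vtt.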

