4.4. Video

The server must stream video to the client with minimal latency. This section describes the structure and content of the video frames, how low latency can be achieved, and what the client must do to process the data.

4.4.1. Video Texture Layout

The server should use a video encoder to compress the video stream. The rendering output of the server application should be written to a video texture, which should be provided as an input surface to the video encoder. The protocol is flexible regarding the layout of the video texture; it is up to the client and server to support the same layout. The resolution of the video texture is limited only by what the application needs and what the encoder and decoder can support. It may contain one or more sub-textures as required by the client.

Currently the Teleport protocol specifies a video texture layout for VR applications wanting to implement hybrid rendering. The server application should render an axis-aligned cubemap each frame and send the cubemap as part of the video texture to the client. YUV formats are commonly used with video encoders and do not carry an alpha channel, so the alpha data must be stored in extra pixels of the video texture. The alpha channel will store the depth of the scene rendered on the server. The RGB and alpha channels of the cubemap should therefore be rendered separately to the video texture, as a colour cubemap and a depth cubemap. The depth cubemap should be rendered at half the resolution of the colour cubemap. The resolution of the video texture is configurable but must be communicated to the client in the VideoConfig structure of the SetupCommand. A resolution of 1536x1536 pixels is recommended for the video texture, with 512x512 for each face of the colour cubemap and 256x256 for each face of the depth cubemap.

A Tag ID should also be included in the video texture. This links the video frame with the associated video metadata. The metadata is sent in a separate stream. See Video Metadata. The Tag ID should be encoded as a sequence of 5 bits at the bottom right of the video texture. This allows 32 different IDs or a maximum delay of 31 frames between the transmission of the metadata and video texture before a Tag ID is reused.
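As an illustration, the following sketch shows how a server might write the 5-bit Tag ID into the bottom-right corner of an RGBA8 video texture. The function name, the 4x4 pixel block per bit and the bit ordering are assumptions for illustration; only the 5-bit encoding at the bottom right is specified by the protocol.

    #include <cstdint>
    #include <vector>

    // Illustrative sketch: write a 5-bit Tag ID as five 4x4 pixel blocks at the
    // bottom-right of a square RGBA8 video texture. Block size, bit order and
    // placement within the strip are assumptions.
    void WriteTagId(std::vector<uint8_t>& rgba, uint32_t textureSize, uint8_t tagId)
    {
        const uint32_t blockSize = 4;                       // pixels per bit (assumed)
        const uint32_t numBits   = 5;                       // 32 possible Tag IDs
        const uint32_t y0        = textureSize - blockSize; // bottom row of blocks
        for (uint32_t bit = 0; bit < numBits; ++bit)
        {
            // Most significant bit leftmost; the strip ends at the right edge.
            const uint32_t x0    = textureSize - (numBits - bit) * blockSize;
            const uint8_t  value = ((tagId >> (numBits - 1 - bit)) & 1) ? 255 : 0;
            for (uint32_t y = y0; y < y0 + blockSize; ++y)
            {
                for (uint32_t x = x0; x < x0 + blockSize; ++x)
                {
                    uint8_t* px = &rgba[(y * textureSize + x) * 4];
                    px[0] = px[1] = px[2] = value; // encode the bit in RGB
                    px[3] = 255;
                }
            }
        }
    }

With the recommended 1536x1536 texture this writes a 20x4 pixel strip ending at (1536, 1536), matching the Tag ID entry in Table 4.14.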

Teleport also supports the sending of lighting information in the video texture. Each frame, the server application can use the rendered cubemap to generate a specular cubemap for reflections and a diffuse cubemap for global illumination. A cubemap containing the specular lighting of the scene should be written to the video texture at 6 different mip levels. Each mip should be half the resolution of the previous mip, from 64x64 pixels down to 2x2. A cubemap containing the diffuse lighting of the scene should be rendered to the video texture at 64x64 resolution. The resolution and offsets of the lighting cubemaps can be communicated to the client in the VideoConfig structure of the SetupCommand.
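As an arithmetic aside, the six specular mip levels halve from 64x64 down to 2x2, and the sum of their widths is 64+32+16+8+4+2 = 126 pixels, which matches the width of the Specular Lighting Cubemap entry in Table 4.14. The sketch below only computes that sum; the actual packing of faces and mips is determined by the offsets communicated in VideoConfig.

    // Sum of the specular mip chain widths: 64, 32, 16, 8, 4, 2.
    // Returns 126, the Specular Lighting Cubemap width in Table 4.14.
    int SpecularChainWidth()
    {
        int width = 0;
        for (int size = 64; size >= 2; size /= 2)
            width += size;
        return width; // 126
    }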

A webcam image may also be sent to the client. The dimensions and offset of the webcam image may vary as long as the image fits within the video texture. A flag indicating whether the webcam image is being streamed, together with the width, height and offset of the image, can be communicated to the client in the VideoConfig structure of the SetupCommand.
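The sketch below gathers the values this section says must be communicated to the client into a single structure. The field names and types are purely illustrative; the authoritative definition is the VideoConfig structure carried in the SetupCommand.

    #include <cstdint>

    // Illustrative only: field names and types are assumptions based on the
    // values this section says must be communicated in VideoConfig.
    struct VideoConfigSketch
    {
        uint32_t videoWidth;            // e.g. 1536
        uint32_t videoHeight;           // e.g. 1536
        uint32_t colourCubemapSize;     // e.g. 512 per face
        uint32_t depthCubemapSize;      // e.g. 256 per face (half of colour)
        uint32_t specularCubemapSize;   // e.g. 64 (top mip)
        uint32_t specularOffsetX, specularOffsetY;
        uint32_t diffuseCubemapSize;    // e.g. 64
        uint32_t diffuseOffsetX, diffuseOffsetY;
        uint8_t  streamWebcam;          // flag: webcam image present in the texture
        uint32_t webcamWidth, webcamHeight;
        uint32_t webcamOffsetX, webcamOffsetY;
        uint8_t  videoCodec;            // HEVC or H264, agreed by client and server
    };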

The video texture should contain the following regions:

  1. Colour Cubemap

  2. Depth Cubemap

  3. Specular Cubemap

  4. Diffuse Cubemap

  5. Webcam

  6. Tag ID

Each cubemap sub-layout should arrange the six faces in a 3x2 grid, with the front, back and right faces on the top row and the left, top and bottom faces on the bottom row:

  1. Front Face (+X)

  2. Back Face (-X)

  3. Right Face (+Y)

  4. Left Face (-Y)

  5. Top Face (+Z)

  6. Bottom Face (-Z)

The offsets and dimensions of each region within the recommended 1536x1536 video texture are given in Table 4.14.

Table 4.14 Video Texture Layout

  Offset X   Offset Y   Width   Height   Description
  0          0          512     512      Colour Cubemap Front Face
  512        0          512     512      Colour Cubemap Back Face
  1024       0          512     512      Colour Cubemap Right Face
  0          512        512     512      Colour Cubemap Left Face
  512        512        512     512      Colour Cubemap Top Face
  1024       512        512     512      Colour Cubemap Bottom Face
  0          1024       256     256      Depth Cubemap Front Face
  256        1024       256     256      Depth Cubemap Back Face
  512        1024       256     256      Depth Cubemap Right Face
  0          1280       256     256      Depth Cubemap Left Face
  256        1280       256     256      Depth Cubemap Top Face
  512        1280       256     256      Depth Cubemap Bottom Face
  768        1280       126     64       Specular Lighting Cubemap
  768        1406       64      64       Diffuse Lighting Cubemap
  960        1406       128     96       Webcam Texture
  1516       1532       20      4        Tag ID

Note: In the offsets above, X increases from left to right and Y increases from top to bottom.
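As a sketch of how a client might address one of these regions, the pixel offsets can be converted to normalized texture coordinates, assuming a top-left texture origin as in the table. The function name is illustrative.

    struct Region { float u0, v0, u1, v1; };

    // Convert a Table 4.14 entry (pixel offsets and size, Y increasing downward)
    // into normalized UVs for sampling the decoded video texture.
    Region RegionToUV(unsigned offsetX, unsigned offsetY,
                      unsigned width, unsigned height,
                      unsigned textureSize /* e.g. 1536 */)
    {
        Region r;
        r.u0 = float(offsetX) / textureSize;
        r.v0 = float(offsetY) / textureSize;
        r.u1 = float(offsetX + width)  / textureSize;
        r.v1 = float(offsetY + height) / textureSize;
        return r;
    }

    // Example: the depth cubemap front face from Table 4.14.
    // Region depthFront = RegionToUV(0, 1024, 256, 256, 1536);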

4.4.2. Video Frame Structure

The video encoder should be configured to accept the YUV 4:2:0 pixel format (12 bits per pixel) as input for the video frame. Higher-precision formats such as YUV 4:4:4 are available, but YUV 4:2:0 minimizes decoding time and latency. The video texture must therefore be converted to the YUV 4:2:0 format for processing by the video encoder. The server must send the video encoder output to the client each frame. The raw, unmodified output must be sent as one large chunk or payload to the client. The structure of the output depends on the video codec used. The server and client must use the same video codec and a software or hardware video encoder and decoder that supports it. The server must tell the client which codec is being used in the VideoConfig structure of the SetupCommand. For HEVC/H264, the output is made up of multiple NAL-units, such as picture parameter sets (VPS, SPS, PPS etc.) and video coding layer (VCL) units containing the compressed data of the video texture. Each frame has at least one VCL unit, and may include picture parameters if the frame is an IDR frame or if the video encoder is configured to send the picture parameters with every frame. The video data should be transferred in accordance with the section of the protocol outlined in Data Transfer.
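The conversion to YUV 4:2:0 would normally be done on the GPU before the surface is handed to the encoder, but the sketch below shows the arithmetic on the CPU for clarity. The BT.601 full-range coefficients are used purely as an example; the coefficients and range actually used are an implementation choice.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Sketch: convert an RGBA8 texture to planar YUV 4:2:0 (I420).
    void RgbaToI420(const std::vector<uint8_t>& rgba, uint32_t w, uint32_t h,
                    std::vector<uint8_t>& yPlane,
                    std::vector<uint8_t>& uPlane,
                    std::vector<uint8_t>& vPlane)
    {
        yPlane.resize(w * h);
        uPlane.resize((w / 2) * (h / 2));
        vPlane.resize((w / 2) * (h / 2));
        auto clamp8 = [](float v) { return (uint8_t)std::clamp(v, 0.0f, 255.0f); };
        for (uint32_t y = 0; y < h; ++y)
        {
            for (uint32_t x = 0; x < w; ++x)
            {
                const uint8_t* p = &rgba[(y * w + x) * 4];
                const float r = p[0], g = p[1], b = p[2];
                yPlane[y * w + x] = clamp8(0.299f * r + 0.587f * g + 0.114f * b);
                if ((x % 2 == 0) && (y % 2 == 0)) // chroma subsampled 2x2
                {
                    const uint32_t ci = (y / 2) * (w / 2) + (x / 2);
                    uPlane[ci] = clamp8(-0.169f * r - 0.331f * g + 0.500f * b + 128.0f);
                    vPlane[ci] = clamp8( 0.500f * r - 0.419f * g - 0.081f * b + 128.0f);
                }
            }
        }
    }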

4.4.3. Recovering from Corruption

An IDR frame is a special type of I-frame or keyframe in HEVC/H264. It does not rely on any prior frames for decoding and subsequent frames will reference it until the next I-frame. The IDR frame will also include picture parameters added by the encoder for the decoder to process. This includes information such as the bitrate of the encoder, the texture resolution and pixel format etc. The video encoder will output an IDR frame as the very first frame and at periodic intervals determined by the encoder settings.

To reduce latency, the video encoder should be configured to send only the first frame as an IDR. The encoder should only produce further IDR frames if requested by the client. If the client receives a corrupted video frame and the following frame references it (a P-frame), this will cause corrupted video. The stream will not recover because the encoder will not automatically send a new IDR frame. Therefore, the client must be able to identify whether it has missed a video frame. To achieve this, the client has to keep count of the number of video frames received from the server. The client needs to compare this count with the stream-payload-id set by the server. If there is a mismatch between the two values and the current video frame is not an IDR frame, or if the video frame has been corrupted during the transfer, the client must send an HTTP message to the server requesting an IDR frame. On receiving the HTTP message, the server must tell the video encoder to force an IDR for the next frame. This allows the video stream to recover. To understand how the stream-payload-id is managed and how the client determines if a payload is corrupted, see Data Transfer.
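A minimal sketch of the client-side check follows. The helper functions are hypothetical placeholders; how corruption is detected and how the stream-payload-id is carried are specified in Data Transfer.

    #include <cstddef>
    #include <cstdint>

    // Hypothetical helpers; names are illustrative only.
    bool PayloadIsCorrupted(const uint8_t* data, size_t size);
    bool FrameIsIDR(const uint8_t* data, size_t size);
    void SendIDRRequestOverHTTP();

    // Called for every video payload received from the server.
    void OnVideoPayload(const uint8_t* data, size_t size, uint64_t streamPayloadId)
    {
        static uint64_t expectedPayloadId = 0;
        const bool missedFrame = (streamPayloadId != expectedPayloadId);
        const bool corrupted   = PayloadIsCorrupted(data, size);
        // A missed or corrupted frame only matters if the stream cannot recover
        // on its own, i.e. the current frame is not an IDR frame.
        if ((missedFrame && !FrameIsIDR(data, size)) || corrupted)
        {
            SendIDRRequestOverHTTP(); // server forces an IDR on the next frame
        }
        expectedPayloadId = streamPayloadId + 1; // resynchronize the local count
    }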

4.4.4. Minimizing Latency

The server must configure the video encoder to minimize latency. Different encoders may support different settings, and the capabilities of some hardware encoders will depend on the GPU and driver installed. The server application must therefore query the capabilities of the encoder to determine which encoder settings are supported. The video decoder on the client will be informed of these settings via the picture parameters received with each IDR frame.

The following settings are recommended to minimize latency (an illustrative configuration sketch follows the list):

  1. Ultra-low latency or low latency Tuning Info

  2. Rate control mode of Constant Bit Rate (CBR)

  3. Multi Pass - Quarter/Full (evaluate and decide)

  4. Very low VBV buffer size (e.g. single frame = bitrate/framerate)

  5. No B Frames - Just I and P frames

  6. Infinite GOP length

  7. Adaptive quantization (AQ) enabled

  8. Long term reference pictures enabled

  9. Intra refresh enabled

  10. Non-reference P frames

  11. The first frame should be the only IDR sent unless recovering from a lost frame.
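The list above might map onto a configuration along these lines. The structure and field names are purely illustrative and do not correspond to any particular encoder SDK; real encoders (e.g. NVENC, AMF, Quick Sync) expose equivalent but differently named options, and the bitrate and framerate shown are example values.

    #include <cstdint>

    // Illustrative low-latency encoder configuration; not a real SDK API.
    struct EncoderSettingsSketch
    {
        enum class Tuning      { UltraLowLatency, LowLatency };
        enum class RateControl { ConstantBitRate };
        enum class MultiPass   { Disabled, QuarterResolution, FullResolution };

        Tuning      tuning      = Tuning::UltraLowLatency;       // 1. low-latency tuning
        RateControl rateControl = RateControl::ConstantBitRate;  // 2. CBR
        MultiPass   multiPass   = MultiPass::QuarterResolution;  // 3. evaluate per platform
        uint32_t    bitrateBps    = 20'000'000;                  // example bitrate
        uint32_t    framerate     = 60;                          // example framerate
        uint32_t    vbvBufferBits = 20'000'000 / 60;             // 4. ~one frame of data
        uint32_t    numBFrames    = 0;                           // 5. I and P frames only
        uint32_t    gopLength     = UINT32_MAX;                  // 6. effectively infinite GOP
        bool        adaptiveQuantization = true;                 // 7.
        bool        longTermReferences   = true;                 // 8.
        bool        intraRefresh         = true;                 // 9.
        bool        nonReferenceP        = true;                 // 10.
        bool        idrOnFirstFrameOnly  = true;                 // 11. further IDRs only on request
    };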

4.4.5. Processing of Video Frames on the Client

On receiving a non-corrupted video frame, the client must parse each individual NAL-unit for the video decoder to process. For the HEVC and H264 codecs, each NAL-unit is separated by a 3-byte start code, with bytes 1 and 2 having a value of 0 and byte 3 having a value of 1 (0x000001). The client must implement a parser that uses these start codes to split the payload into NAL-units. Video decoders usually output the decoded video data in the same YUV format used as input to the video encoder. When the decoder has finished decoding a frame, the client must convert the YUV texture to an RGBA texture. The cubemap and associated lighting information must then be extracted from this texture for rendering.
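A sketch of such a parser is shown below. It scans the payload for 3-byte start codes and returns spans pointing into the original buffer; payloads that use 4-byte start codes (an extra leading zero) will leave one trailing zero byte on the preceding NAL-unit, which decoders generally tolerate.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Split an HEVC/H264 payload into NAL-units by scanning for the 3-byte
    // start code 00 00 01. Each returned span excludes the start code itself.
    struct NalUnit { const uint8_t* data; size_t size; };

    std::vector<NalUnit> SplitNalUnits(const uint8_t* payload, size_t size)
    {
        std::vector<NalUnit> nals;
        size_t start = SIZE_MAX; // start of the current NAL-unit's data
        for (size_t i = 0; i + 2 < size; ++i)
        {
            if (payload[i] == 0 && payload[i + 1] == 0 && payload[i + 2] == 1)
            {
                if (start != SIZE_MAX)
                    nals.push_back({ payload + start, i - start });
                start = i + 3; // NAL data begins after the start code
                i += 2;        // skip the rest of the start code
            }
        }
        if (start != SIZE_MAX && start < size)
            nals.push_back({ payload + start, size - start });
        return nals;
    }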