Theory 101:

A Basic Guide to Transport Streams

Video Structure

Resolution & Framerates

The resolution of a video stream is measured in pixels and is usually written as Horizontal x Vertical. Broadcast video is usually described by its vertical resolution alone (for example, 1080 or 720), whereas the movie industry usually quotes only the horizontal resolution (for example, 2K or 4K).
The framerate of a video is the number of frames displayed per second.

The most common broadcast video resolution and framerate formats in use today are:

Shorthand       Resolution     Common Uses
1080p@23.976    1920 x 1080    Film (24), PAL / NTSC HD Blu-Ray
1080i@29.97     1920 x 1080    NTSC HD Broadcast
1080i@25        1920 x 1080    PAL HD Broadcast
720p@59.94      1280 x 720     NTSC HD Broadcast
720p@50         1280 x 720     PAL HD Broadcast
576p@25         720 x 576      PAL DVD
576i@25         720 x 576      PAL DVD, PAL Broadcast
480p@23.976     720 x 480      NTSC DVD
480i@29.97      720 x 480      NTSC DVD, NTSC Broadcast

The "i" or "p" after the resolution above indicates whether the frame is interlaced or progressive (see below).
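As a rough illustration of how the shorthand above breaks down, the following Python sketch parses a string such as 1080i@29.97 into its parts. The names VideoFormat, parse_shorthand and temporal_rate are hypothetical and exist only for this example; they are not part of any tool mentioned in this guide.

```python
import re
from typing import NamedTuple


class VideoFormat(NamedTuple):
    lines: int         # vertical resolution in pixels
    interlaced: bool   # True for "i", False for "p"
    frame_rate: float  # frames per second


def parse_shorthand(text):
    """Parse a shorthand such as '1080i@29.97' (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([ip])@(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"unrecognised format shorthand: {text!r}")
    lines, scan, rate = match.groups()
    return VideoFormat(int(lines), scan == "i", float(rate))


def temporal_rate(fmt):
    """Pictures captured per second: two fields per frame when interlaced."""
    return fmt.frame_rate * 2 if fmt.interlaced else fmt.frame_rate


if __name__ == "__main__":
    for shorthand in ("1080i@29.97", "720p@50", "576i@25"):
        fmt = parse_shorthand(shorthand)
        print(shorthand, fmt, temporal_rate(fmt))
```

Note that an interlaced format at 25 frames per second carries 50 fields per second, which is why 576i@25 and 720p@50 update the picture equally often.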

Frame Structure

Progressive Frames

A Progressive Frame is a complete picture captured at a single moment in time and stored line by line, from top to bottom. Film is always progressive, because each exposure captures the whole image at once.

Interlaced Frames

An Interlaced Frame is split into two Fields. A Field is composed of either the odd or the even lines of the Frame. Interlaced frames are most commonly created by video cameras, which capture fields at twice the frame rate (for example, 50 fields per second for 25 frames per second), so each field is taken at a different moment in time.

When the two fields are combined to make a whole frame, two different instants in time are displayed at once. Any horizontal movement in the frame therefore shows up as "combed" lines (combing), which is a major issue when converting interlaced frames into progressive frames. CRT TVs display interlaced video natively, drawing each field at the correct time, so they do not show this effect.
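The field structure and the combing effect can be sketched in a few lines of Python. This is only a toy model: a frame is assumed to be a list of text lines, lines are counted from zero (so the "top" field holds lines 0, 2, 4 and so on), and the function names are made up for this example.

```python
def split_fields(frame):
    """Split a frame (a list of lines) into its top and bottom fields."""
    top = frame[0::2]     # lines 0, 2, 4, ...
    bottom = frame[1::2]  # lines 1, 3, 5, ...
    return top, bottom


def weave(top, bottom):
    """Interleave two fields back into one frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.append(top_line)
        frame.append(bottom_line)
    # A frame with an odd number of lines has one extra top-field line.
    frame.extend(top[len(bottom):])
    return frame


# A vertical bar that moves two columns to the right between two instants.
frame_t0 = ["#....", "#....", "#....", "#...."]
frame_t1 = ["..#..", "..#..", "..#..", "..#.."]

# An interlaced camera keeps the top field from one instant and the
# bottom field from the next, so the woven frame mixes both instants.
top_t0, _ = split_fields(frame_t0)
_, bottom_t1 = split_fields(frame_t1)
combed = weave(top_t0, bottom_t1)

if __name__ == "__main__":
    print("\n".join(combed))
```

In the woven frame the bar alternates between its two positions on successive lines, which is the combing described above.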

Broadcast vs Playback

To maintain compatibility with interlaced video cameras and CRT TVs, broadcasters commonly use Interlaced Frames. To broadcast progressive sources such as movies, each frame is split into two fields. In this case, however, both fields represent the same moment in time. The receiver can then either recreate the complete progressive frame or display each field on an interlaced display.

Pulldown and Repeat Field Flags

NTSC video cameras work at 29.97 frames per second, or more accurately at 59.94 fields per second. This is the same rate used when broadcasting NTSC video. To broadcast progressive frames, each frame is split into two fields. However, since the progressive frame rate is 23.976 frames per second, extra fields are needed to reach 29.97 frames per second. This is accomplished by repeating fields in a 3:2 sequence: successive frames are shown for three fields and then two fields alternately, so every four progressive frames become ten fields, or five interlaced frames. The encoder can either create the extra fields and encode them (hard encoded interlaced frames), or set flags that tell the decoder which fields to repeat for display on an interlaced display. Converting progressive sources into interlaced sources is called "Pulldown" and the reverse process is called "Inverse Telecine".
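The 3:2 sequence can be illustrated with a short Python sketch. It assumes four film frames labelled A to D, marks each output field as top ("t") or bottom ("b"), and uses the exact NTSC rates 24000/1001 and 30000/1001 that the rounded figures 23.976 and 29.97 stand for. The function names are hypothetical and this is a model of the field pattern only, not of how any particular encoder implements it.

```python
from fractions import Fraction


def pulldown_fields(frames, pattern=(3, 2)):
    """Expand progressive frames into fields, repeating them 3, 2, 3, 2, ..."""
    fields = []
    for index, frame in enumerate(frames):
        repeat = pattern[index % len(pattern)]
        for _ in range(repeat):
            # Output fields must keep alternating top, bottom, top, ...
            parity = "t" if len(fields) % 2 == 0 else "b"
            fields.append(f"{frame}{parity}")
    return fields


def pair_fields(fields):
    """Group consecutive fields into interlaced frames (top, bottom)."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


if __name__ == "__main__":
    fields = pulldown_fields("ABCD")
    print(fields)               # ten fields from four frames
    print(pair_fields(fields))  # five interlaced frames

    film_rate = Fraction(24000, 1001)  # approximately 23.976
    print(film_rate * Fraction(5, 4) == Fraction(30000, 1001))
```

Two of the five resulting interlaced frames mix fields from different film frames. Those mixed frames are what Inverse Telecine has to detect and undo when the fields were hard encoded.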

Video Compression

Frame Types

A video stream is usually made up of different types of compressed frames. Prediction reduces the size of a compressed frame by reusing information from the frames before and after it. Furthermore, a frame itself may be divided into smaller blocks or slices, with each slice able to use prediction independently of the other slices.
I-Frames are known as Intra, Index or Key Frames. In their simplest form, an I-Frame is a single frame compressed without taking any other frames into account. For example, in MPEG-1/2 video, an I-Frame is compressed in much the same way as a JPEG image. MPEG-1/2 video made up of I-Frames only is therefore similar in principle to MJPEG (Motion JPEG). I-Frames generally have the highest quality but the largest size.
P-Frames are known as Predicted Frames. These frames use information from the previous I-Frame or P-Frame to compress the frame.
B-Frames are known as Bi-Directional Predicted Frames. These use information from both the previous and the next I-Frame or P-Frame. B-Frames are generally the smallest but have the lowest quality.


Because B-Frames can use previous and future frames for reference, the order of frames is changed by the encoder to aid in the decoding process.
For example, a typical video stream sequence:

[Figure: frame sequence in display order]

would be encoded as:

[Figure: the same frames in encoded order]
You can use TSPE to see this effect on a real video stream by examining the timestamps: seek to an I-Frame by clicking the IF+ button, then click the PTS+ button. This will show the encoded sequence of frames and their timestamps (PTS, Presentation Time Stamp).
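The reordering described above can be sketched in Python. The frame labels below (a type letter followed by the display position) are made up for illustration, and the rule used is the simple one implied by the text: a B-Frame cannot be decoded until the later I-Frame or P-Frame it refers to has been sent, so each reference frame is moved ahead of the B-Frames that precede it in display order. Real encoders may use more elaborate reference structures.

```python
def display_to_decode_order(display):
    """Reorder frames so each reference frame precedes the B-Frames needing it."""
    decode = []
    pending_b = []  # B-Frames waiting for their future reference frame
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            # An I-Frame or P-Frame: send it first, then the waiting B-Frames.
            decode.append(frame)
            decode.extend(pending_b)
            pending_b = []
    # Trailing B-Frames whose future reference lies outside this list.
    decode.extend(pending_b)
    return decode


if __name__ == "__main__":
    display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
    print(display_to_decode_order(display))
```

The number in each label plays the role of the presentation timestamp: in the reordered list the numbers are no longer ascending, which is the same pattern the PTS values show on a real stream.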


A Group Of Pictures (GOP) starts with an I-Frame and contains the P-Frames and B-Frames that follow it. The GOP Length is the number of frames from one I-Frame up to, but not including, the next I-Frame. For example, a typical MPEG2 GOP is IBBPBBPBBPBB, which has a length of 12.
GOPs may be Open or Closed. In an Open GOP, the last B-Frames use information from the first I-Frame of the next GOP. A Closed GOP does not use any information from the next GOP, which makes it useful for editing, for example, because the video can be cut at a GOP boundary.
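As a small worked example, the Python sketch below splits a string of frame types into GOPs and reports the length and make-up of each one. It assumes the string lists frame types in display order, and the helper names are hypothetical. The count of trailing B-Frames is only a hint: in display order these are the frames that would refer to the next GOP's I-Frame if the GOP is open, but the frame types alone cannot prove whether a GOP is open or closed.

```python
from collections import Counter


def split_gops(frame_types):
    """Split a string such as 'IBBPIBBP' into GOPs, each starting at an I."""
    gops = []
    for frame_type in frame_types:
        if frame_type == "I" or not gops:
            gops.append("")
        gops[-1] += frame_type
    return gops


def trailing_b_count(gop):
    """Number of B-Frames at the end of a GOP given in display order."""
    return len(gop) - len(gop.rstrip("B"))


if __name__ == "__main__":
    for gop in split_gops("IBBPBBPBBPBBIBBP"):
        print(gop, len(gop), dict(Counter(gop)), trailing_b_count(gop))
```

For the typical GOP quoted above, this reports a length of 12 made up of one I-Frame, three P-Frames and eight B-Frames, with two trailing B-Frames.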