JMF 2.0 API Guide
To send or receive a live media broadcast or conduct a video conference over the Internet or an intranet, you need to be able to receive and transmit media streams in real time. This chapter introduces streaming media concepts and describes the Real-Time Transport Protocol (RTP), which JMF uses for receiving and transmitting media streams across the network.
When media content is streamed to a client in real-time, the client can begin to play the stream without having to wait for the complete stream to download. In fact, the stream might not even have a predefined duration--downloading the entire stream before playing it would be impossible. The term streaming media is often used to refer to both this technique of delivering content over the network in real-time and the real-time media content that's delivered.
Streaming media is everywhere you look on the web--live radio and television broadcasts and webcast concerts and events are being offered by a rapidly growing number of web portals, and it's now possible to conduct audio and video conferences over the Internet. By enabling the delivery of dynamic, interactive media content across the network, streaming media is changing the way people communicate and access information.
Protocols for Streaming Media
Transmitting media data across the net in real-time requires high network throughput. It's easier to compensate for lost data than to compensate for large delays in receiving the data. This is very different from accessing static data such as a file, where the most important thing is that all of the data arrive at its destination. Consequently, the protocols used for static data don't work well for streaming media.
The HTTP and FTP protocols are based on the Transmission Control Protocol (TCP). TCP is a transport-layer protocol[1] designed for reliable data communications on low-bandwidth, high-error-rate networks. When a packet is lost or corrupted, it's retransmitted. The overhead of guaranteeing reliable data transfer slows the overall transmission rate.
For this reason, underlying protocols other than TCP are typically used for streaming media. One that's commonly used is the User Datagram Protocol (UDP). UDP is an unreliable protocol; it does not guarantee that each packet will reach its destination. There's also no guarantee that the packets will arrive in the order that they were sent. The receiver has to be able to compensate for lost data, duplicate packets, and packets that arrive out of order.
Like TCP, UDP is a general transport-layer protocol--a lower-level networking protocol on top of which more application-specific protocols are built. The Internet standard for transporting real-time data such as audio and video is the Real-Time Transport Protocol (RTP).
RTP is defined in IETF RFC 1889, a product of the AVT working group of the Internet Engineering Task Force (IETF).
Real-Time Transport Protocol
RTP provides end-to-end network delivery services for the transmission of real-time data. RTP is network and transport-protocol independent, though it is often used over UDP.
Figure 7-1: RTP architecture.
RTP can be used over both unicast and multicast network services. Over a unicast network service, separate copies of the data are sent from the source to each destination. Over a multicast network service, the data is sent from the source only once and the network is responsible for transmitting the data to multiple locations. Multicasting is more efficient for many multimedia applications, such as video conferences. The standard Internet Protocol (IP) supports multicasting.
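Whether a session address calls for unicast or multicast handling can be determined from the address itself--IPv4 multicast addresses occupy the class D range, 224.0.0.0 through 239.255.255.255. The following sketch (the addresses and class name are illustrative, not taken from this guide) shows the check using the standard java.net API:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AddressKind {
    // True if the address falls in the IPv4 class D (multicast) range;
    // InetAddress.isMulticastAddress() performs the range check for us.
    public static boolean isMulticast(String host) {
        try {
            return InetAddress.getByName(host).isMulticastAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("bad address: " + host, e);
        }
    }

    public static void main(String[] args) {
        // 224.2.0.1 and 192.168.1.10 are illustrative addresses only.
        System.out.println("224.2.0.1 multicast? " + isMulticast("224.2.0.1"));
        System.out.println("192.168.1.10 multicast? " + isMulticast("192.168.1.10"));
    }
}
```

An application would typically join a multicast group (for example, with java.net.MulticastSocket) only when this check succeeds, and fall back to plain unicast sockets otherwise.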
RTP enables you to identify the type of data being transmitted, determine what order the packets of data should be presented in, and synchronize media streams from different sources.
RTP data packets are not guaranteed to arrive in the order that they were sent--in fact, they're not guaranteed to arrive at all. It's up to the receiver to reconstruct the sender's packet sequence and detect lost packets using the information provided in the packet header.
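Because the sequence number field (described later in this chapter) is only 16 bits wide, it wraps around to zero after 65535, and a receiver that reconstructs the sender's packet sequence has to account for that wraparound. A minimal sketch of the bookkeeping, using hypothetical helper names:

```java
public class SequenceTracker {
    // Signed difference between two 16-bit sequence numbers, accounting for
    // wraparound at 65536 (e.g. 65535 -> 0 is a step of +1, not -65535).
    public static int delta(int prev, int next) {
        int d = (next - prev) & 0xFFFF;      // modulo-65536 difference
        return d < 0x8000 ? d : d - 0x10000; // interpret as a signed 16-bit step
    }

    // Packets lost between two consecutively received packets:
    // a delta of 1 means no loss, a delta of 3 means two packets are missing.
    public static int lostBetween(int prev, int next) {
        int d = delta(prev, next);
        return d > 1 ? d - 1 : 0;
    }
}
```

A negative delta indicates a packet that arrived out of order (or a duplicate, when the delta is zero), which the receiver can reorder or discard as appropriate.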
While RTP does not provide any mechanism to ensure timely delivery or provide other quality of service guarantees, it is augmented by a control protocol (RTCP) that enables you to monitor the quality of the data distribution. RTCP also provides control and identification mechanisms for RTP transmissions.
If quality of service is essential for a particular application, RTP can be used over a resource reservation protocol that provides connection-oriented services.
An RTP session is an association among a set of applications communicating with RTP. A session is identified by a network address and a pair of ports. One port is used for the media data and the other is used for control (RTCP) data.
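RFC 1889 recommends that the two ports be adjacent: an even-numbered port for the RTP data and the next higher (odd) port for RTCP. A small sketch of that convention (the class and method names are illustrative):

```java
public class PortPair {
    // RFC 1889 recommends an even port for RTP data and the next
    // higher (odd) port for the corresponding RTCP control traffic.
    public static int rtcpPort(int rtpPort) {
        if (rtpPort % 2 != 0) {
            throw new IllegalArgumentException("RTP port should be even: " + rtpPort);
        }
        return rtpPort + 1;
    }
}
```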
A participant is a single machine, host, or user participating in the session. Participation in a session can consist of passive reception of data (receiver), active transmission of data (sender), or both.
Each media type is transmitted in a different session. For example, if both audio and video are used in a conference, one session is used to transmit the audio data and a separate session is used to transmit the video data. This enables participants to choose which media types they want to receive--for example, someone who has a low-bandwidth network connection might only want to receive the audio portion of a conference.
The media data for a session is transmitted as a series of packets. A series of data packets that originate from a particular source is referred to as an RTP stream. Each RTP data packet in a stream contains two parts, a structured header and the actual data (the packet's payload).
Figure 7-2: RTP data-packet header format.
The header of an RTP data packet contains:
- The RTP version number (V): 2 bits. The version defined by the current specification is 2.
- Padding (P): 1 bit. If the padding bit is set, there are one or more bytes at the end of the packet that are not part of the payload. The very last byte in the packet indicates the number of bytes of padding. The padding is used by some encryption algorithms.
- Extension (X): 1 bit. If the extension bit is set, the fixed header is followed by one header extension. This extension mechanism enables implementations to add information to the RTP Header.
- CSRC Count (CC): 4 bits. The number of CSRC identifiers that follow the fixed header. If the CSRC count is zero, the synchronization source is the source of the payload.
- Marker (M): 1 bit. A marker bit defined by the particular media profile.
- Payload Type (PT): 7 bits. An index into a media profile table that describes the payload format. The payload mappings for audio and video are specified in RFC 1890.
- Sequence Number: 16 bits. Identifies this packet's position in the sequence of packets. The sequence number is incremented by one for each RTP data packet sent, so the receiver can use it to detect packet loss and restore packet order.
- Timestamp: 32 bits. Reflects the sampling instant of the first byte in the payload. Several consecutive packets can have the same timestamp if they are logically generated at the same time--for example, if they are all part of the same video frame.
- SSRC: 32 bits. Identifies the synchronization source. If the CSRC count is zero, the payload source is the synchronization source. If the CSRC count is nonzero, the SSRC identifies the mixer.
- CSRC: 32 bits each. Identifies the contributing sources for the payload. The number of contributing sources is indicated by the CSRC count field; there can be up to 15 contributing sources. If there are multiple contributing sources, the payload is the mixed data from those sources.
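The fixed header laid out above is 12 bytes long and can be unpacked with straightforward bit operations. The following sketch (a hypothetical helper class, not part of the JMF API) parses the fields in the order listed:

```java
import java.nio.ByteBuffer;

public class RtpHeader {
    public final int version, csrcCount, payloadType, sequenceNumber;
    public final boolean padding, extension, marker;
    public final long timestamp, ssrc;

    // Unpacks the 12-byte fixed header described above.
    public RtpHeader(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet);
        int b0 = buf.get() & 0xFF;
        version   = b0 >> 6;             // V: 2 bits
        padding   = (b0 & 0x20) != 0;    // P: 1 bit
        extension = (b0 & 0x10) != 0;    // X: 1 bit
        csrcCount = b0 & 0x0F;           // CC: 4 bits
        int b1 = buf.get() & 0xFF;
        marker      = (b1 & 0x80) != 0;  // M: 1 bit
        payloadType = b1 & 0x7F;         // PT: 7 bits
        sequenceNumber = buf.getShort() & 0xFFFF;    // 16 bits, unsigned
        timestamp      = buf.getInt() & 0xFFFFFFFFL; // 32 bits, unsigned
        ssrc           = buf.getInt() & 0xFFFFFFFFL; // 32 bits, unsigned
        // csrcCount CSRC identifiers (32 bits each) would follow here,
        // then the payload.
    }
}
```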
In addition to the media data for a session, control data (RTCP) packets are sent periodically to all of the participants in the session. RTCP packets can contain information about the quality of service for the session participants, information about the source of the media being transmitted on the data port, and statistics pertaining to the data that has been transmitted so far.
There are five types of RTCP packets: sender report (SR), receiver report (RR), source description (SDES), BYE, and APP.
RTCP packets are "stackable" and are sent as a compound packet that contains at least two packets, a report packet and a source description packet.
All participants in a session send RTCP packets. A participant that has recently sent data packets issues a sender report. The sender report (SR) contains the total number of packets and bytes sent as well as information that can be used to synchronize media streams from different sessions.
Session participants periodically issue receiver reports for all of the sources from which they are receiving data packets. A receiver report (RR) contains information about the number of packets lost, the highest sequence number received, and a timestamp that can be used to estimate the round-trip delay between a sender and the receiver.
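RFC 1889 specifies the round-trip arithmetic: the receiver report echoes back the middle 32 bits of the last sender report's NTP timestamp (LSR) together with the delay since that report arrived (DLSR), and the sender subtracts both from the arrival time A of the receiver report. A sketch of the calculation (class and method names are illustrative):

```java
public class RoundTripEstimator {
    // All times use the 32-bit "middle" NTP format carried in RTCP reports:
    // upper 16 bits are seconds, lower 16 bits are 1/65536ths of a second.
    //   arrival - when the sender received the receiver report (A)
    //   lsr     - "last SR timestamp" echoed back by the receiver
    //   dlsr    - "delay since last SR" measured at the receiver
    public static long roundTrip(long arrival, long lsr, long dlsr) {
        return (arrival - lsr - dlsr) & 0xFFFFFFFFL; // modulo-2^32 arithmetic
    }

    // Converts a middle-32 NTP interval to milliseconds.
    public static double toMillis(long ntpShort) {
        return ntpShort * 1000.0 / 65536.0;
    }
}
```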
The first packet in a compound RTCP packet has to be a report packet, even if no data has been sent or received--in which case, an empty receiver report is sent.
All compound RTCP packets must include a source description (SDES) element that contains the canonical name (CNAME) that identifies the source. Additional information might be included in the source description, such as the source's name, email address, phone number, geographic location, application name, or a message describing the current state of the source.
When a source is no longer active, it sends an RTCP BYE packet. The BYE notice can include the reason that the source is leaving the session.
RTCP APP packets provide a mechanism for applications to define and send custom information via the RTP control port.
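Each RTCP packet carries a packet type code in its header; RFC 1889 assigns codes 200 through 204 to the five types described above. A small sketch mapping the codes (the class is illustrative, not part of the JMF API):

```java
public class RtcpType {
    // RTCP packet type codes assigned by RFC 1889.
    public static final int SR = 200, RR = 201, SDES = 202, BYE = 203, APP = 204;

    public static String name(int packetType) {
        switch (packetType) {
            case SR:   return "sender report";
            case RR:   return "receiver report";
            case SDES: return "source description";
            case BYE:  return "goodbye";
            case APP:  return "application-defined";
            default:   return "unknown";
        }
    }
}
```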
RTP applications are often divided into those that need to be able to receive data from the network (RTP Clients) and those that need to be able to transmit data across the network (RTP Servers). Some applications do both--for example, conferencing applications capture and transmit data at the same time that they're receiving data from the network.
Receiving Media Streams From the Network
Being able to receive RTP streams is necessary for several types of applications. For example:
- Conferencing applications need to be able to receive a media stream from an RTP session and render it on the console.
- A telephone answering machine application needs to be able to receive a media stream from an RTP session and store it in a file.
- An application that records a conversation or conference must be able to receive a media stream from an RTP session and both render it on the console and store it in a file.
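JMF provides its own high-level APIs for these tasks; purely as a protocol-level illustration, the following sketch (hypothetical names, loopback addresses only) shows the underlying datagram transport--one socket sends a payload and another receives it, as an RTP client would before handing the payload to a depacketizer and decoder:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class LoopbackDemo {
    // Sends one datagram over the loopback interface and returns the payload
    // the receiving socket saw. A real RTP client would loop on receive().
    public static String sendAndReceive(String message) {
        try (DatagramSocket receiver = new DatagramSocket(0); // any free port
             DatagramSocket sender = new DatagramSocket()) {
            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(out, out.length,
                    InetAddress.getLoopbackAddress(), receiver.getLocalPort()));
            receiver.setSoTimeout(2000); // don't block forever if the datagram is lost
            byte[] in = new byte[1500];
            DatagramPacket packet = new DatagramPacket(in, in.length);
            receiver.receive(packet);
            return new String(packet.getData(), 0, packet.getLength(),
                    StandardCharsets.UTF_8);
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```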
Transmitting Media Streams Across the Network
RTP server applications transmit captured or stored media streams across the network.
For example, in a conferencing application, a media stream might be captured from a video camera and sent out on one or more RTP sessions. The media streams might be encoded in multiple media formats and sent out on several RTP sessions for conferencing with heterogeneous receivers. Multiparty conferencing could be implemented without IP multicast by using multiple unicast RTP sessions.
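The multiple-unicast approach mentioned above amounts to sending one copy of each packet per destination. A minimal sketch (hypothetical names; real code would also add RTP headers and keep per-session state):

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.util.List;

public class UnicastFanOut {
    // Sends one copy of the payload to each destination, as a multi-unicast
    // alternative to IP multicast. Returns the number of datagrams sent.
    public static int send(byte[] payload, List<InetSocketAddress> destinations) {
        try (DatagramSocket socket = new DatagramSocket()) {
            for (InetSocketAddress dest : destinations) {
                socket.send(new DatagramPacket(payload, payload.length, dest));
            }
            return destinations.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The obvious cost is bandwidth at the sender, which transmits N copies instead of the single copy a multicast network service would require.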
The RTP specification is a product of the Audio Video Transport (AVT) working group of the Internet Engineering Task Force (IETF). For additional information about the IETF, see http://www.ietf.org. The AVT working group charter and proceedings are also available from the IETF web site.
IETF RFC 1889, RTP: A Transport Protocol for Real-Time Applications
IETF RFC 1890, RTP Profile for Audio and Video Conferences with Minimal Control
Note: These RFCs are undergoing revisions in preparation for advancement from Proposed Standard to Draft Standard and the URLs listed here are for the Internet Drafts of the revisions available at the time of publication.
In addition to these RFCs, separate payload specification documents define how particular payloads are to be carried in RTP. For a list of all of the RTP-related specifications, see the AVT working group charter.

[1] In the seven-layer ISO/OSI data communications model, the transport layer is level four. For more information about the ISO/OSI model, see Understanding OSI. Larmouth, John. International Thompson Computer Press, 1996. ISBN 1850321760.