R1. Reconstruct Table 9.1 for the case in which Victor Video is watching a 4 Mbps video, Facebook Frank is looking at a new 100 Kbyte image every 20 seconds, and Martha Music is listening to a 200 kbps audio stream.
R2. There are two types of redundancy in video. Describe them, and discuss how they can be exploited for efficient compression.
R3. Suppose an analog audio signal is sampled 16,000 times per second, and each sample is quantized into one of 1024 levels. What would be the resulting bit rate of the PCM digital audio signal?
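A quick way to check the arithmetic for R3: each of the 1024 quantization levels needs log2(1024) = 10 bits, so the PCM rate is samples-per-second times bits-per-sample.

```python
import math

sample_rate = 16_000                      # samples per second
bits_per_sample = math.log2(1024)         # 1024 levels -> 10 bits per sample
bit_rate = int(sample_rate * bits_per_sample)   # resulting PCM rate in bps
```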
R4. Multimedia applications can be classified into three categories. Name and describe each category.
R5. Streaming video systems can be classified into three categories. Name and briefly describe each of these categories.
R6. List three disadvantages of UDP streaming.
R7. With HTTP streaming, are the TCP receive buffer and the client’s application buffer the same thing? If not, how do they interact?
R8. Consider the simple model for HTTP streaming. Suppose the server sends bits at a constant rate of 2 Mbps and playback begins when 8 million bits have been received. What is the initial buffering delay tp?
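In the simple HTTP streaming model the client waits until Q bits have accumulated before playout begins, so with a constant send rate x the initial buffering delay is simply Q/x. A one-line check with the values given in R8:

```python
x = 2_000_000    # server send rate, bits per second
Q = 8_000_000    # bits that must arrive before playback begins
t_p = Q / x      # initial buffering delay, in seconds
```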
R9. What is the difference between end-to-end delay and packet jitter? What are the causes of packet jitter?
R10. Why is a packet that is received after its scheduled playout time considered lost?
R11. Section 9.3 describes two FEC schemes. Briefly summarize them. Both schemes increase the transmission rate of the stream by adding overhead. Does interleaving also increase the transmission rate?
R12. How are different RTP streams in different sessions identified by a receiver? How are different streams from within the same session identified?
R13. What is the role of a SIP registrar? How is the role of a SIP registrar different from that of a home agent in Mobile IP?
P1. Consider the figure below. Similar to our discussion of Figure 9.1 , suppose that video is encoded at a fixed bit rate, and thus each video block contains video frames that are to be played out over the same fixed amount of time, Δ. The server transmits the first video block at t0, the second block at
In this graph, the x-axis is labeled "Time" and the y-axis is labeled "Video block number." The y-axis omits 0 but moves upwards in increments of one to 9. In the left half of the x-axis we see a blue line rising upwards to the right in a stair-step fashion. This line is labeled "Constant bit rate video transmission by server." It begins at a point labeled "t0." The line rises to 1 on the y-axis then extends horizontally to the right for one unit defined as Delta. It then goes up to 2 and over one Delta, etc. until it reaches 7 on the y-axis.
In the right half of the x-axis we see a dark blue line, labeled "Video reception at client," rise from a point labeled t1 and extend upward to the right in an inconsistent fashion. The x-axis is delineated from this point into units of Delta. This line rises to 1 on the y-axis and then extends horizontally for one and a half Deltas. It rises to 2 and extends into the next Delta. Then it rises to 3 and extends within the same Delta, then rises to 4 and extends just barely into the next Delta. From here it rises to 5 and extends through the Delta and halfway into the next Delta. From here it rises to 6 and extends three Deltas, then rises to seven.
Suppose that the client begins playout as soon as the first block arrives at t1. In the figure, how many blocks of video (including the first block) will have arrived at the client in time for their playout? Explain how you arrived at your answer.
Suppose that the client begins playout now at
In the same scenario as in (b) above, what is the largest number of blocks that is ever stored in the client buffer, awaiting playout? Explain how you arrived at your answer.
What is the smallest playout delay at the client, such that every video block has arrived in time for its playout? Explain how you arrived at your answer.
P2. Recall the simple model for HTTP streaming shown in Figure 9.3 . Recall that B denotes the size of the client’s application buffer, and Q denotes the number of bits that must be buffered before the client application begins playout. Also r denotes the video consumption rate. Assume that the server sends bits at a constant rate x whenever the client buffer is not full.
Suppose that
Now suppose that
P3. Recall the simple model for HTTP streaming shown in Figure 9.3 . Suppose the buffer size is infinite but the server sends bits at variable rate x(t). Specifically, suppose x(t) has the following saw-tooth shape. The rate is initially zero at time
In this graph, the x-axis is labeled "Time" and the y-axis is labeled "Bit rate x(t)." The x-axis is divided at regular intervals into T, 2T, 3T, and 4T. The y-axis is unmarked except for an H near its top.
A blue line begins at the intersection of x and y and rises sharply to H on the y-axis at T on the x-axis, then drops vertically to the x-axis. From here it rises sharply to H on the y-axis and 2T on the x-axis, then drops vertically to the x-axis. This pattern is repeated twice more.
What is the server’s average send rate?
Suppose that
Now suppose
Suppose
Suppose
Now suppose that the buffer size B is finite. Suppose
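For part (a) of P3, the average send rate of the saw-tooth is the area under one period divided by T, which is H/2 regardless of T. A numerical check with hypothetical values of H and T (both are assumptions; the problem leaves them symbolic):

```python
H, T = 2_000_000.0, 1.0   # hypothetical peak rate (bps) and period (s)
n = 10_000                # samples per period
# x(t) ramps linearly from 0 up to H over each period of length T
samples = [H * (i % n) / n for i in range(4 * n)]   # four full periods
avg = sum(samples) / len(samples)
```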
P4. Recall the simple model for HTTP streaming shown in Figure 9.3 . Suppose the client application buffer is infinite, the server sends at the constant rate x, and the video consumption rate is r with
Suppose the video is infinitely long. How many bits are wasted (that is, sent but not viewed)?
Suppose the video is T seconds long with
P5. Consider a DASH system (as discussed in Section 2.6 ) for which there are N video versions (at N different rates and qualities) and N audio versions (at N different rates and qualities). Suppose we want to allow the player to choose at any time any of the N video versions and any of the N audio versions.
If we create files so that the audio is mixed in with the video, so that the server sends only one media stream at any given time, how many files will the server need to store (each with a different URL)?
If the server instead sends the audio and video streams separately and has the client synchronize the streams, how many files will the server need to store?
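The two storage strategies in P5 count out differently: mixing audio into video requires one file per (video version, audio version) pairing, while separate streams require only one file per version. A sketch with a hypothetical N:

```python
N = 5                    # hypothetical number of video (and audio) versions
files_combined = N * N   # one file for every (video, audio) pairing
files_separate = N + N   # one file per video version plus one per audio version
```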
P6. In the VoIP example in Section 9.3 , let h be the total number of header bytes added to each chunk, including the UDP and IP headers.
Assuming an IP datagram is emitted every 20 msecs, find the transmission rate in bits per second for the datagrams generated by one side of this application.
What is a typical value of h when RTP is used?
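As a sanity check for P6, assuming the Section 9.3 example of a 64 kbps encoding chunked every 20 ms (160 bytes of audio per chunk) and the typical header total when RTP is used (20 bytes IP + 8 bytes UDP + 12 bytes RTP = 40 bytes):

```python
chunk_bytes = 160     # 64,000 bps * 0.020 s / 8 bits per byte
h = 20 + 8 + 12       # IP + UDP + RTP header bytes (typical with RTP)
interval = 0.020      # one datagram emitted every 20 ms
rate_bps = (chunk_bytes + h) * 8 / interval   # datagram bit rate on the wire
```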
P7. Consider the procedure described in Section 9.3 for estimating average delay di. Suppose that
For a given audio application suppose four packets have arrived at the receiver with sample delays
Generalize your formula for n sample delays.
For the formula in part (b), let n approach infinity and give the resulting formula. Comment on why this averaging procedure is called an exponential moving average.
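The estimator in P7 can be sketched as a short routine; the smoothing constant u = 0.1 is an assumption here, since its value is elided in the text above.

```python
def ewma(samples, u=0.1):
    """Exponential moving average: d_i = (1 - u) * d_{i-1} + u * sample_i."""
    d = samples[0]          # initialize the estimate to the first sample delay
    estimates = [d]
    for s in samples[1:]:
        d = (1 - u) * d + u * s
        estimates.append(d)
    return estimates
```

Unrolling the recursion shows that the j-th most recent sample carries weight u(1 - u)^j, so old samples decay geometrically, which is why the procedure is called an exponential moving average.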
P8. Repeat parts (a) and (b) of P7 for the estimate of average delay deviation.
P9. For the VoIP example in Section 9.3 , we introduced an online procedure (exponential moving average) for estimating delay. In this problem we will examine an alternative procedure. Let ti be the timestamp of the ith packet received; let ri be the time at which the ith packet is received. Let dn be our estimate of average delay after receiving the nth packet. After the first packet is received, we set the delay estimate equal to
Suppose that we would like
Describe why for Internet telephony, the delay estimate described in Section 9.3 is more appropriate than the delay estimate outlined in part (a).
P10. Compare the procedure described in Section 9.3 for estimating average delay with the procedure in Section 3.5 for estimating round-trip time. What do the procedures have in common? How are they different?
P11. Consider the figure below (which is similar to Figure 9.3 ). A sender begins sending packetized audio periodically at
In this graph, the x-axis is labeled "Time" and the y-axis is labeled "Packets." The x-axis begins at 0 and moves at regular intervals in increments of one to sixteen. Only 1 and 8 are actually numbered. The y-axis is also notched at eight regular intervals, though these are not labeled.
At 1 on the x-axis, a line extends upwards to the right in a stair-step fashion. It rises to 1 on the y-axis then over to 2 on the x-axis, then up to 2 on y and 3 on x, etc. to the top of the graph. This line is labeled "Packets generated."
At 8 on the x-axis, a line extends upwards to the right in an inconsistent stair-step fashion. It rises to 1 on the y-axis then extends horizontally over to 9 on the x-axis. From there it rises to 2 on the y-axis and extends to 12 on the x-axis. From there it rises to 5 on the y-axis and extends to 15 on the x-axis. From there it rises to 7 on the y-axis and extends to 16 on the x-axis, then rises to 8 on the y-axis
What are the delays (from sender to receiver, ignoring any playout delays) of packets 2 through 8? Note that each vertical and horizontal line segment in the figure has a length of 1, 2, or 3 time units.
If audio playout begins as soon as the first packet arrives at the receiver at
If audio playout begins at
What is the minimum playout delay at the receiver that results in all of the first eight packets arriving in time for their playout?
P12. Consider again the figure in P11, showing packet audio transmission and reception times.
Compute the estimated delay for packets 2 through 8, using the formula for di from Section 9.3.2 . Use a value of
Compute the estimated deviation of the delay from the estimated average for packets 2 through 8, using the formula for vi from Section 9.3.2 . Use a value of
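The sample delays for P12 can be read off the figure described in P11: packet i is generated at t = i; packet 1 arrives at t = 8, packet 2 at t = 9, packets 3-5 at t = 12, packets 6-7 at t = 15, and packet 8 at t = 16. A sketch of both estimators, assuming u = 0.1 since the value is elided above:

```python
gen  = [1, 2, 3, 4, 5, 6, 7, 8]         # packet i generated at t = i (figure)
recv = [8, 9, 12, 12, 12, 15, 15, 16]   # arrival times read off the figure
delays = [r - g for g, r in zip(gen, recv)]

u = 0.1                  # assumed smoothing constant
d = delays[0]            # initialize the delay estimate to the first sample
v = 0.0                  # initial deviation estimate
d_est, v_est = [d], [v]
for s in delays[1:]:
    d = (1 - u) * d + u * s              # average delay estimate d_i
    v = (1 - u) * v + u * abs(s - d)     # deviation estimate v_i
    d_est.append(d)
    v_est.append(v)
```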
P13. Recall the two FEC schemes for VoIP described in Section 9.3 . Suppose the first scheme generates a redundant chunk for every four original chunks. Suppose the second scheme uses a low-bit-rate encoding whose transmission rate is 25 percent of the transmission rate of the nominal stream.
How much additional bandwidth does each scheme require? How much playback delay does each scheme add?
How do the two schemes perform if the first packet is lost in every group of five packets? Which scheme will have better audio quality?
How do the two schemes perform if the first packet is lost in every group of two packets? Which scheme will have better audio quality?
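Both schemes in P13 cost the same extra bandwidth (25 percent), but they differ in what they can recover. A rough sketch, assuming a hypothetical 64 kbps nominal stream and XOR parity over each group of five packets (four originals plus one redundant) for the first scheme:

```python
nominal = 64_000               # hypothetical nominal stream rate, bps
overhead1 = nominal / 4        # one redundant chunk per four originals: +25%
overhead2 = nominal * 0.25     # low-bit-rate copy at 25% of nominal: +25%

def scheme1_recovers(losses_in_group):
    # XOR parity over a 5-packet group can reconstruct at most one
    # missing packet; two or more losses in a group are unrecoverable
    return losses_in_group <= 1
```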
P14.
Consider an audio conference call in Skype with
Repeat part (a) for a Skype video conference call using a central server.
Repeat part (b), but now for when each peer sends a copy of its video stream to each of the
P15.
Suppose we send into the Internet two IP datagrams, each carrying a different UDP segment. The first datagram has source IP address A1, destination IP address B, source port P1, and destination port T. The second datagram has source IP address A2, destination IP address B, source port P2, and destination port T. Suppose that A1 is different from A2 and that P1 is different from P2. Assuming that both datagrams reach their final destination, will the two UDP datagrams be received by the same socket? Why or why not?
Suppose Alice, Bob, and Claire want to have an audio conference call using SIP and RTP. For Alice to send and receive RTP packets to and from Bob and Claire, is only one UDP socket sufficient (in addition to the socket needed for the SIP messages)? If yes, then how does Alice’s SIP client distinguish between the RTP packets received from Bob and Claire?
P16. True or false:
If stored video is streamed directly from a Web server to a media player, then the application is using TCP as the underlying transport protocol.
When using RTP, it is possible for a sender to change encoding in the middle of a session.
All applications that use RTP must use port 87.
If an RTP session has a separate audio and video stream for each sender, then the audio and video streams use the same SSRC.
In differentiated services, while per-hop behavior defines differences in performance among classes, it does not mandate any particular mechanism for achieving these performances.
Suppose Alice wants to establish a SIP session with Bob. In her INVITE message she includes the line: m=audio 48753 RTP/AVP 3 (AVP 3 denotes GSM audio). Alice has therefore indicated in this message that she wishes to send GSM audio.
Referring to the preceding statement, Alice has indicated in her INVITE message that she will send audio to port 48753.
SIP messages are typically sent between SIP entities using a default SIP port number.
In order to maintain registration, SIP clients must periodically send REGISTER messages.
SIP mandates that all SIP clients support G.711 audio encoding.
P17. Consider the figure below, which shows a leaky bucket policer being fed by a stream of packets. The token buffer can hold at most two tokens, and is initially full at
In this illustration, we see (from left to right) a horizontal line labeled "Arrivals," which is marked at regular intervals and labeled at every other mark. The first mark is labeled "t = 8," then "t = 6," and so on to "t = 0." At t = 7 we see two data packets labeled 10 and 9. At t = 6 we see two data packets labeled 8 and 7. At t = 3 we see one data packet labeled 6. At t = 2 we see one data packet labeled 5. At t = 1 we see one data packet labeled 4. And at t = 0 we see three data packets labeled 3, 2, and 1.
To the right of the arrivals we see "Packet queue (wait for tokens)" with eight empty slots. To the right of the packet queue is the "remove token" diamond shape from Figure 9.14 (though unlabeled here). Above it is the leaky bucket, with the token generation rate labeled "r = 1 token/slot," and the bucket labeled "b = 2 tokens."
To the right of the diamond is a horizontal line pointing to the right, notched at regular intervals and labeled at every other notch, beginning with "t = 4" and ending with "t = 0."
Packets (if any) arrive at the beginning of the slot. Thus in the figure, packets 1, 2, and 3 arrive in slot 0. If there are already packets in the queue, then the arriving packets join the end of the queue. Packets proceed towards the front of the queue in a FIFO manner.
After the arrivals have been added to the queue, if there are any queued packets, one or two of those packets (depending on the number of available tokens) will each remove a token from the token buffer and go to the output link during that slot. Thus, packets 1 and 2 each remove a token from the buffer (since there are initially two tokens) and go to the output link during slot 0.
A new token is added to the token buffer if it is not full, since the token generation rate is r = 1 token/slot.
Time then advances to the next time slot, and these steps repeat.
Answer the following questions:
For each time slot, identify the packets that are in the queue and the number of tokens in the bucket, immediately after the arrivals have been processed (step 1 above) but before any of the packets have passed through the queue and removed a token. Thus, for the
For each time slot indicate which packets appear on the output after the token(s) have been removed from the queue. Thus, for the
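The slot-by-slot procedure described above can be sketched as a short simulation. The arrival pattern below is the one from the figure (packets 1-3 in slot 0, packet 4 in slot 1, packet 5 in slot 2, packet 6 in slot 3, packets 7-8 in slot 6, packets 9-10 in slot 7); the token-add-at-end-of-slot ordering is an interpretation of step 3 above.

```python
def leaky_bucket(arrivals, b=2, r=1, start_tokens=2, n_slots=9):
    """Simulate the policer: arrivals maps slot -> packet ids arriving
    at the start of that slot; returns slot -> packets sent that slot."""
    tokens = start_tokens
    queue = []
    sent = {}
    for t in range(n_slots):
        queue.extend(arrivals.get(t, []))   # step 1: arrivals join the queue
        out = []
        while queue and tokens > 0:         # step 2: each departure uses a token
            out.append(queue.pop(0))
            tokens -= 1
        sent[t] = out
        tokens = min(b, tokens + r)         # step 3: add one token, capped at b
    return sent
```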
P18. Repeat P17 but assume that
P19. Consider P18 and suppose now that
P20. Consider the leaky bucket policer that polices the average rate and burst size of a packet flow. We now want to police the peak rate, p, as well. Show how the output of this leaky bucket policer can be fed into a second leaky bucket policer so that the two leaky buckets in series police the average rate, peak rate, and burst size. Be sure to give the bucket size and token generation rate for the second policer.
P21. A packet flow is said to conform to a leaky bucket specification (r, b) with burst size b and average rate r if the number of packets that arrive to the leaky bucket is less than
P22. Show that as long as
In this lab, you will implement a streaming video server and client. The client will use the Real-Time Streaming Protocol (RTSP) to control the actions of the server. The server will use the Real-time Transport Protocol (RTP) to packetize the video for transport over UDP. You will be given Python code that partially implements RTSP and RTP at the client and server. Your job will be to complete both the client and server code. When you are finished, you will have created a client-server application that does the following:
The client sends SETUP, PLAY, PAUSE, and TEARDOWN RTSP commands, and the server responds to the commands.
When the server is in the playing state, it periodically grabs a stored JPEG frame, packetizes the frame with RTP, and sends the RTP packet into a UDP socket.
The client receives the RTP packets, removes the JPEG frames, decompresses the frames, and renders the frames on the client’s monitor.
The code you will be given implements the RTSP protocol in the server and the RTP depacketization in the client. The code also takes care of displaying the transmitted video. You will need to implement RTSP in the client and RTP packetization in the server. This programming assignment will significantly enhance your understanding of RTP, RTSP, and streaming video. It is highly recommended. The assignment also suggests a number of optional exercises, including implementing the RTSP DESCRIBE command at both client and server. You can find full details of the assignment, as well as an overview of the RTSP protocol, at the Web site www.pearsonhighered.com/
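The RTP packetization step boils down to prepending a 12-byte header (version, marker, payload type, sequence number, timestamp, SSRC) to each JPEG frame. A minimal sketch, assuming no padding, no extension, no CSRCs, and payload type 26 (the standard RTP/AVP type for JPEG); field names and defaults here are illustrative, not the assignment's actual skeleton:

```python
import struct

def make_rtp_header(seq, timestamp, ssrc, pt=26, version=2, marker=0):
    """Pack a minimal 12-byte RTP header (no padding/extension, CC = 0)."""
    byte0 = version << 6                  # V=2 in the top two bits
    byte1 = (marker << 7) | (pt & 0x7F)   # marker bit plus 7-bit payload type
    return struct.pack('!BBHII',
                       byte0, byte1,
                       seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)

# An RTP packet is then the header followed by the JPEG frame bytes:
# packet = make_rtp_header(seq, ts, ssrc) + frame_data
```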