TCP: From Data to Streaming Video


As use of the Transmission Control Protocol (TCP) becomes ubiquitous for streaming video, service providers can achieve the best performance by understanding this technology’s inherent limitations.

TCP, in combination with HTTP adaptive streaming (HAS), has many advantages for transporting non-real-time video. Non-real-time video refers to video in which the delay incurred before a video image can be presented to the user can be substantial (that is, several tens of seconds). A typical example is video for entertainment. TCP is widely used and well understood, and HAS relies on a widely deployed infrastructure and is not hindered by firewalls.

However, TCP has some built-in limitations that stem from its initial purpose. Because it was designed for reliable data transport that offers every user a fair share of the available capacity, it includes congestion control mechanisms that can affect video streaming. In addition, HAS has inherent inefficiencies that require more capacity than video over Real-time Transport Protocol (RTP)/User Datagram Protocol (UDP), the traditional method of transporting video.

For best results, therefore, TCP should be used in the proper circumstances. HAS over TCP is ideally suited for “instantly consumed,” non-real-time video, where users can tolerate a latency of 20 to 30 seconds between the information sent and the information displayed, but are reluctant to wait much longer than this after they have selected the video of their choice. It is not suitable for video streams that require some form of real-time interaction between users (for example, video telephony). At the other extreme, distributing video files through a lengthy download before viewing does not offer users the instant gratification that they have become used to.

The best video performance can be provided by the right version of TCP. It should include the window scaling option, as well as Explicit Congestion Notification (ECN) and Selective Acknowledgements (SACK), the latter two to help improve efficiency in a wireless environment. In addition, care needs to be taken when sizing buffers in the bottleneck nodes, and a large play-out buffer needs to be used in the video client. Clients generally should not leave time gaps between the chunks of video they request, unless they are using the highest video bit rate that can be supported over the network.

Adjustments for congestion control

Initially, TCP’s key role was to provide reliable data transfer over unreliable networks.[1] Consequently, congestion control mechanisms[2] have been built in to maintain “fair” end-to-end flow control while utilizing the network efficiently. Service providers need to be aware of these mechanisms and consider the following adjustments to address their impact on streaming video traffic using HAS over TCP.

Understanding the TCP window

To support reliable delivery of data packets, TCP uses sequence numbers and acknowledgements (ACKs) sent from the client to the server. For the purpose of congestion control, a TCP connection is allowed to send a designated amount of unacknowledged (unACKed) information, known as the “TCP window.” The size of this window changes dynamically: as the client acknowledges data (in particular, data associated with streaming video), the TCP window grows until congestion is “sensed,” using packet loss (that is, the absence of ACKs) as the congestion signal. At that point, the window size is reduced by half, after which it begins to increase again as new ACKs trickle in, until congestion is recognized once more. This process results in a “sawtooth” pattern of throughput. In the initial phase, before congestion is sensed for the first time, the window grows exponentially; subsequently, the window grows linearly.
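The sawtooth can be illustrated with a deliberately simplified model. The sketch below is not an implementation of any real TCP stack: the function name, the fixed `loss_threshold` that stands in for the bottleneck capacity, and the one-step-per-round-trip granularity are all assumptions made for illustration.

```python
def simulate_window(rounds, loss_threshold, initial_window=1):
    """Return the window size (in segments) at the start of each round trip.

    Simplified model, assumed for illustration only:
    - a loss is "sensed" whenever the window exceeds loss_threshold;
    - before the first loss the window doubles every round trip;
    - on a loss the window is halved;
    - after the first loss the window grows by one segment per round trip.
    """
    window = initial_window
    seen_loss = False
    history = []
    for _ in range(rounds):
        history.append(window)
        if window > loss_threshold:
            window = max(1, window // 2)  # congestion sensed: halve the window
            seen_loss = True
        elif not seen_loss:
            window *= 2                   # initial exponential growth
        else:
            window += 1                   # linear growth afterwards
    return history


trace = simulate_window(rounds=20, loss_threshold=20)
print(trace)
# [1, 2, 4, 8, 16, 32, 16, 17, 18, 19, 20, 21, 10, 11, 12, 13, 14, 15, 16, 17]
```

The printed trace shows the exponential start, the first halving, and then the repeating linear climb that gives the throughput its sawtooth shape.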

Determining buffer size in the bottleneck nodes

Because TCP relies on packet loss to indicate network congestion, buffers need to be carefully dimensioned for TCP to work effectively. When a buffer is too small, the bursts of traffic that TCP produces cannot be absorbed by the buffer. Packet loss then occurs too frequently, which results in lower throughput than expected. When a buffer is too large, it tends to delay the feedback TCP receives regarding the state of network congestion. Consequently, it becomes harder for TCP to adjust throughput to maintain its fair share. For this reason, TCP often does not work well with traffic shapers.
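The article does not give a sizing formula. A commonly cited starting point, which is an assumption brought in here rather than a recommendation from the authors, is the bandwidth-delay product: the amount of data in flight on a link of a given rate over one round-trip time. The sketch below only computes that figure; the function name and the example values are hypothetical, and the right buffer size in practice depends on the traffic mix.

```python
def bandwidth_delay_product_bytes(link_rate_bps, rtt_ms):
    """Bytes in flight on a link of link_rate_bps over one RTT of rtt_ms.

    Integer arithmetic: bits/s * ms / 1000 gives bits, / 8 gives bytes.
    """
    return link_rate_bps * rtt_ms // 8000


# Hypothetical bottleneck: 10 Mb/s link, 100 ms round-trip time.
bdp = bandwidth_delay_product_bytes(10_000_000, 100)
print(bdp)  # 125000
```

A buffer far below this figure is likely to drop TCP bursts, while a buffer far above it adds queuing delay before TCP sees any loss, which are the two failure modes described above.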

Using advanced TCP for the wireless environment

Widely deployed versions of TCP cannot distinguish between packet loss caused by congestion and packet loss that results from noise (transport errors). As a result, TCP often cuts the window size in half when it should not, which decreases its efficiency over noisy channels, such as wireless links. To address these issues, service providers can take advantage of the following options to make TCP more robust. (Additional details are available in RFC 3481.[3])

  • Explicit Congestion Notification (ECN): Instead of waiting for packet loss to signal congestion, TCP is notified when a buffer is filling up. To improve efficiency in a wireless environment, ECN also allows TCP to partially distinguish between packet loss due to errors and packet loss due to congestion.
  • Selective Acknowledgements (SACK): To avoid excessive retransmission of packets, the TCP receiver indicates more precisely which packets were lost, so that the sender retransmits only those packets rather than everything from the first reported loss onward. This approach is especially valuable when the round-trip time (RTT) is large.
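The saving that SACK offers can be seen in a small sketch. It models only the two behaviors contrasted in the list above (retransmit everything from the first reported loss, versus retransmit only the reported losses); it is not a model of real TCP loss recovery, and the function name and segment numbers are hypothetical.

```python
def segments_to_retransmit(sent, lost, use_sack):
    """Return the segments the sender retransmits under each scheme.

    sent: ordered list of segment numbers already transmitted.
    lost: set of segment numbers that did not arrive.
    """
    if not lost:
        return []
    if use_sack:
        # The receiver reported exactly which segments are missing.
        return [s for s in sent if s in lost]
    # Without SACK (simplified): resend everything from the first loss on.
    first_lost = min(lost)
    return [s for s in sent if s >= first_lost]


sent = list(range(1, 11))   # segments 1..10
lost = {3, 7}
without_sack = segments_to_retransmit(sent, lost, use_sack=False)
with_sack = segments_to_retransmit(sent, lost, use_sack=True)
print(without_sack)  # [3, 4, 5, 6, 7, 8, 9, 10]
print(with_sack)     # [3, 7]
```

The larger the RTT, the more segments are in flight behind the first loss, so the gap between the two lists widens.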

Wireless links can also be protected by link layer mechanisms that are designed for unreliable links. For example, Forward Error Correction (FEC) and Hybrid Automatic Repeat reQuest (HARQ) help reduce packet loss between the endpoints of the link. However, they also introduce additional delay and bit rate overhead. Service providers need to evaluate how well this trade-off meets their own requirements.

The effects of HTTP adaptive streaming

When HTTP adaptive streaming is used to transport video over TCP, the video is segmented into consecutive chunks, and each chunk is encoded at various bit rates. As illustrated in Figure 1, the client uses an “HTTP GET” command to request each successive chunk, choosing the encoding whose bit rate most closely matches the throughput the client measures by monitoring how fast previous chunks were downloaded. The throughput the client gets is the fair share that TCP offers.
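A minimal sketch of this selection step follows. It assumes the simplest possible policy (pick the highest encoded bit rate that does not exceed the throughput measured on the last chunk); real HAS clients use their own, usually more elaborate, heuristics. The function names and the bit-rate ladder are hypothetical.

```python
def measured_throughput_bps(chunk_bytes, download_seconds):
    """Throughput observed while downloading one chunk, in bits per second."""
    return chunk_bytes * 8 / download_seconds


def select_bit_rate(available_bps, throughput_bps):
    """Pick the highest available bit rate not exceeding the throughput.

    Falls back to the lowest bit rate when even that one is too high.
    """
    candidates = [r for r in sorted(available_bps) if r <= throughput_bps]
    return candidates[-1] if candidates else min(available_bps)


# Hypothetical encoding ladder, in bits per second.
ladder = [300_000, 700_000, 1_500_000, 3_000_000]

# The previous chunk (500,000 bytes) took 2 seconds to download.
throughput = measured_throughput_bps(500_000, 2.0)
next_rate = select_bit_rate(ladder, throughput)
print(throughput, next_rate)  # 2000000.0 1500000
```

Because the choice is based on past downloads, the selected bit rate always lags the throughput that TCP currently offers.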

At a larger time scale (a few seconds), HAS tries to follow the fair share that TCP offers. But it can only do so with some delay, as shown in Figure 1. To absorb the mismatch between the video bit rate and the throughput profile offered by TCP, a large play-out buffer is needed at the client side. A video buffer of tens of seconds is not uncommon with HAS. Consequently, there is a large latency between the video information being sent and the video information being displayed. As previously stated, this works well for video distribution, but it is detrimental for interactive video communication.

As illustrated in Figure 2, a typical HAS client will request the next video chunk only after the current chunk has been completely received. This process leaves a time gap, typically about one RTT, between video chunks. Sometimes a client voluntarily introduces an additional gap to prevent the video buffer from growing too large. This might be justified when the client requests the highest video bit rate, but some clients introduce these gaps at a bit rate lower than the maximum for the video. During these time gaps, transmission opportunities are missed, and if the gaps grow too long (that is, longer than the TCP retransmission timer, typically 1 second), TCP will start rebuilding its window from scratch. As a result, HAS is not capable of fully using the fair share that TCP offers. At times, it even results in the client choosing a lower rate than necessary.

This phenomenon becomes more pronounced as the RTT increases. First, a larger RTT means larger gaps; second, the responsiveness of TCP decreases as the RTT increases. Typically, HAS works fine for RTTs below 100 ms, while for larger RTTs HAS starts to suffer.[4]
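The cost of the idle gaps can be estimated with a rough back-of-the-envelope sketch. It assumes a constant fair share and one idle gap per chunk, and it ignores the window rebuilding and reduced responsiveness described above, so it understates the real penalty. The function name and the example figures are hypothetical.

```python
def link_utilization(chunk_bits, fair_share_bps, gap_seconds):
    """Fraction of time spent transferring when each chunk is followed by a gap."""
    transfer_seconds = chunk_bits / fair_share_bps
    return transfer_seconds / (transfer_seconds + gap_seconds)


# Hypothetical case: 2-second chunks encoded at 2 Mb/s (4,000,000 bits each),
# downloaded over a 4 Mb/s fair share, so each transfer takes 1 second.
for rtt in (0.05, 0.1, 0.3):
    share_used = link_utilization(4_000_000, 4_000_000, gap_seconds=rtt)
    print(f"gap of {rtt:.2f} s -> {share_used:.1%} of the fair share used")
```

Even in this optimistic model, the usable share of the link shrinks as the gap (and therefore the RTT) grows.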

The impact of “link-fair” congestion control

TCP is link-fair but not application-fair. That is, it offers throughput without taking into account the application’s needs. When a network is congested, an application-fair congestion control mechanism downgrades the throughput of a high definition video flow and a standard definition video flow more or less in proportion to their respective maximum bit rates. In contrast, TCP offers them the same throughput, if their RTTs are the same. As a result, the high definition video flow suffers more than the standard definition video flow. This problem can be addressed, to some extent, by increasing the number of TCP connections used for the high definition video. Current HAS clients typically open more than one TCP connection, making them behave unfairly to other applications that only rely on one TCP connection. Unfortunately, there is no easy way for the network to control how many TCP connections an application opens.

Today, TCP and HAS are becoming ubiquitous, as the phenomenal growth of video traffic continues. These technologies offer the advantages of familiarity and a widely deployed infrastructure. And when their inherent limitations are understood and addressed appropriately, they provide an excellent option for transporting “instantly consumed” video.

To contact the authors or request additional information, please send an email to

The authors would like to thank Werner Van Leekwijck, Volker Hilt, John Hearn and their teams for the valuable discussions on this topic.


  [1] J. Postel, “Transmission Control Protocol,” STD 7, IETF RFC 793, September 1981.
  [2] M. Allman, V. Paxson, W. Stevens, “TCP Congestion Control,” IETF RFC 2581, April 1999.
  [3] H. Inamura, G. Montenegro, R. Ludwig, A. Gurtov, F. Khafizov, “TCP over Second (2.5G) and Third (3G) Generation Wireless Networks,” IETF RFC 3481, February 2003.
  [4] V. Craciun, N. Degrande, Y. Jutras, D. Robinson, “HAS Shows Its Value for Service Providers,” TechZine, January 12, 2011.

David Robinson

Dave Robinson is a technology strategist in Alcatel-Lucent’s Network and Platform Division with a particular interest in video. He worked for several organizations, including Digital Equipment Corporation and Oracle, before joining Alcatel-Lucent to work on IPTV and Internet TV solutions. He has a long-standing interest in measuring and improving end-user QoE with IPTV and OTT video entertainment systems. This includes the practical application of caching, recommendation engines, and adaptive streaming to improve the end-user experience. Currently he is investigating ways to optimise delivery of HTTP Adaptive Streaming video over the Internet. Dave has a B.Sc. and a Ph.D., both from Imperial College in London.

Danny De Vleeschauwer

Danny De Vleeschauwer is a Network Strategist in the video and immersion advisory group of Bell Labs CTO in Antwerp, Belgium. He received an M.Sc. in Electrical Engineering and a Ph.D. in applied sciences from Ghent University in Belgium. Prior to joining Alcatel-Lucent, Dr. De Vleeschauwer was a researcher at Ghent University. His early work was on image processing, and he later worked on the application of queuing theory in packet-based networks. His current research focus is on ensuring adequate quality for triple-play services offered over packet-based networks. He is a guest professor in the Telecommunications and Information Processing Department (TELIN) of Ghent University, a Distinguished Member of Technical Staff, and a member of the Alcatel-Lucent Technical Academy (ALTA).
