Early Packet Loss Feedback for Improved Video Delivery over 802.11 Wireless Channels
Research Paper / Feb 2014

Weimin Liu, Rahul Vanam, Liangping Ma, Yuriy A. Reznik, and Gregory S. Sternberg
InterDigital Communications, LLC, USA

Abstract—This paper proposes a cross-layer real-time video transmission scheme in which packet losses over the local 802.11 wireless link are reported to the video encoder as early as possible. Upon notification, the video encoder applies one of several prediction-resetting techniques to prevent further error propagation. The proposed scheme is based on current Internet protocols and overcomes common challenges presented by these protocols, such as encryption. Experimental results show a significant improvement in video quality over conventional packet loss feedback methods.

Keywords—cross-layer; video; packet loss; feedback; 802.11; WLAN; error propagation

I. INTRODUCTION

Mobile multimedia traffic has grown rapidly in recent years with the introduction of a new generation of smartphones and tablet computers that offer high-resolution video capabilities and support interactive applications. Video now accounts for 51% of mobile traffic, and Cisco predicts that mobile video will increase 16-fold between 2012 and 2017, representing two thirds of total mobile data traffic [1].

Wireless local area networking (WLAN), generally known as Wi-Fi and based on the IEEE 802.11 standards [2], has been a key technology for data delivery to both mobile and non-mobile users. At the end of 2012, data consumption over Wi-Fi was four times that of cellular [1].

Real-time video applications impose challenging latency requirements on wireless networks. We consider mobile video telephony operating over WLAN links, as illustrated in Figure 1. Like most wireless technologies, WLANs suffer from transmission errors, which result in degraded video quality.

Figure 1. WLAN communication links and conventional feedback in mobile video telephony. Only one direction, from Alice to Bob, is shown. (Diagram labels: Alice, AP, Internet, Bob; local wireless link; wired or wireless link; feedback.)

In order to improve delivery of voice and video data, IEEE 802.11 and the Wi-Fi Alliance have defined quality-of-service (QoS) provisions that provide different access priorities through enhanced distributed channel access (EDCA) and hybrid coordination function (HCF) controlled channel access (HCCA) [3]. Various MAC-layer and cross-layer approaches to improving delivery of video over WLANs have been proposed, including relaying [4], rate control, selective retransmission [5], smart packet drop, finer prioritization of packets within one stream, and content-specific methods ([6], [7]).

Recognizing that packet losses do happen from time to time, another approach is to reduce the degradation of the video when a packet is lost during transmission. The video encoder can limit error propagation if it knows which packet was lost. Positive acknowledgements (ACK) and negative acknowledgements (NACK) can be collected at the receiver and then transmitted as a report to the sender. For example, the report can be encapsulated according to IETF RFC 4585 [9] and ITU-T H.271 [10] and carried in RTP Control Protocol (RTCP) reports [8]. However, there is often latency in transmitting the feedback report, as depicted in Figure 1. The collection period for the RTCP report is regulated by the timing rules specified in RFC 4585. In practice, such reports are usually sent periodically and infrequently, at intervals of about one second. By then, significant error propagation has occurred.

Consider mobile video telephony operating over the RTP transport protocol with RTCP-type feedback in the architecture shown in Figure 1. The first, or local, wireless link on the path from Alice to Bob is the one closest to Alice and has the shortest feedback delay.
The goal of this paper is to establish a mechanism by which a packet loss over the local 802.11 link is fed back to the video encoder in order to stop error propagation. Timeliness is of the essence: the earlier the feedback arrives, the sooner the video encoder can take measures to prevent error propagation, and the better the quality of the decoded video at Bob.

In this paper, we propose a novel scheme comprising early packet loss detection and notification at the local wireless link and the use of feedback-based video coding methods. The scheme can overcome the obstacles presented by encryption. We also impose the constraint that standard and commonly used protocols be the foundation of the scheme.

This paper is organized as follows. Section II gives background on packet loss over 802.11. Our proposed approach is described in Section III. Results are provided in Section IV, and we conclude in Section V.

II. BACKGROUND

IEEE 802.11 links suffer from transmission errors mainly for two reasons. One is interference and fading caused by ever-changing wireless channel conditions. There have been many studies of rate adaptation algorithms that estimate the channel condition and adapt to its changes (e.g., [18], [19], [20], [21], [22]), but transmission errors are unavoidable and are, to some degree, accepted by design as part of the rate-error tradeoff in selecting the modulation and coding scheme (MCS).

The other source of transmission errors in 802.11 is collision. 802.11 networks use a carrier sense multiple access / collision avoidance (CSMA/CA) mechanism to allow multiple stations to share the same wireless medium without central coordination. Because more than one 802.11 station can begin transmitting in the same time slot, collisions can happen, and they are likely to cause transmission errors. The probability of collision can be significant when the number of stations is large (see, for example, [16], [17]).
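As a rough illustration of why the collision probability grows with the number of stations, the sketch below uses a deliberately simplified model that is an assumption of this example and not taken from the paper: each of n stations draws one backoff slot uniformly from a fixed contention window of W slots, and a collision occurs when two or more stations share the earliest slot. Real 802.11 backoff (binary exponential window growth, counter freezing) is more involved, and [16] and [17] give proper analyses. The function name and its parameters are hypothetical.

```python
def collision_probability(n_stations: int, window: int) -> float:
    """Probability that two or more stations pick the earliest backoff slot.

    Toy model (an assumption, not the paper's): every station draws one slot
    uniformly from range(window); the earliest slot wins the channel, and a
    collision happens when that slot was drawn by more than one station.
    """
    if n_stations < 1 or window < 1:
        raise ValueError("need at least one station and one slot")
    # P(no collision) = sum over slots k of
    #   P(exactly one station draws k and all other stations draw a later slot).
    p_success = 0.0
    for k in range(window):
        later = (window - 1 - k) / window  # P(a given station draws a slot > k)
        p_success += n_stations * (1 / window) * later ** (n_stations - 1)
    return 1.0 - p_success


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, round(collision_probability(n, window=16), 3))
```

Even in this crude model the collision probability rises steadily as stations are added while the window stays fixed, which is the qualitative behavior the text relies on.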
The 802.11 standard defines its own ACK frame within the Medium Access Control (MAC) sub-layer. The receiving station sends an ACK control frame immediately after successfully receiving a frame. There is no explicit NACK frame in 802.11 when a transmission fails. On the transmitting side, if no ACK is received, the 802.11 MAC retransmits until an ACK is received or a maximum number of transmission attempts is reached. A frame is deemed lost when the transmitting station has received no ACK after the maximum number of transmission attempts.

Retransmission is the mechanism 802.11 employs to deal with errors in individual transmission attempts. However, repeated transmission errors can still lead to loss of the packet. The 802.11 MAC sub-layer gives no indication of transmission failure to the sub-layer above it, the Logical Link Control (LLC). When a frame has failed transmission, the MAC simply drops it and stops trying. With a transport protocol such as the User Datagram Protocol (UDP), which is typically employed in conjunction with RTP, there is no additional retransmission at an upper protocol layer.

III. PROPOSED APPROACH

In this section, we first describe our early packet loss detection approach and then our feedback-based video coding methods.

A. Early packet loss detection

Standards-based communication systems usually employ a stack of protocol layers. Without loss of generality, we consider the Internet protocol suite, commonly known as TCP/IP, which consists of application, transport, network, data link, and physical layers. 802.11 occupies the physical layer and the lower sub-layer of the data link layer, and the packet loss feedback travels up from the 802.11 MAC to the video encoder, as illustrated in Figure 2.
We consider two representative scenarios in the choice of the application-layer protocol:

1) Scenario 1: the video encoder generates Real-time Transport Protocol (RTP [27]) packets directly, and the Secure Real-time Transport Protocol (SRTP [28]) profile is used for RTP delivery.
2) Scenario 2: the (H.264-compliant) video encoder generates Network Abstraction Layer (NAL) packets, and Transport Layer Security (TLS [29]) is used at the application layer for security.

Figure 2. Video encoder and 802.11 in the Internet protocol stack, showing the flow of packet loss notification. (Diagram labels: Application: Video Encoder, RTP or TLS; Transport: UDP; Network: IP; Data Link: 802.2 LLC, 802.11 MAC; Physical: 802.11 PHY; arrows: Transmission, Packet Loss Notification.)

The key difference between these two scenarios lies in encryption. In Scenario 1, the RTP sequence number is not encrypted and is available to the 802.11 MAC sub-layer through deep packet inspection for identifying video packets. In Scenario 2, however, TLS encrypts the entire payload, and no sequence number is available to the 802.11 MAC for identifying video packets.

Packet loss detection is performed in the 802.11 MAC to determine whether a packet has failed all transmission attempts to the receiver. Alternatively, for video data, one can define transmission failure as the case in which a MAC protocol data unit (MPDU) has not been delivered within a certain duration. The duration limit should be set according to the type of application (video conferencing, video calling, etc.).

Upon detection of a packet loss, it is necessary to identify which particular packet of a video stream has failed. If multiple applications or multiple video streams use 802.11 concurrently, we wish to identify video packets only for a specific stream. A video stream can be identified by the IP 5-tuple consisting of the source and destination IP addresses, the source and destination port numbers, and the protocol type.
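The three building blocks just described (stream identification by 5-tuple, reading the unencrypted RTP sequence number in Scenario 1, and the two loss criteria) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `FlowKey`, `rtp_sequence_number`, and `is_lost` are hypothetical names, and the default retry limit of 7 is only an assumed value. The header layout used (a 12-byte fixed RTP header with the version in the top two bits of byte 0 and the sequence number in bytes 2 and 3) follows the RTP specification.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowKey:
    """IP 5-tuple identifying one stream (hypothetical helper type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 17 for UDP


def rtp_sequence_number(udp_payload: bytes) -> Optional[int]:
    """Return the 16-bit RTP sequence number, or None if this is not RTP v2.

    SRTP encrypts the RTP payload but leaves this fixed header readable.
    """
    if len(udp_payload) < 12 or udp_payload[0] >> 6 != 2:
        return None
    (seq,) = struct.unpack_from("!H", udp_payload, 2)  # network byte order
    return seq


def is_lost(attempts: int, age_s: float, max_attempts: int = 7,
            time_limit_s: Optional[float] = None) -> bool:
    """Declare a frame lost on retry exhaustion or, optionally, on timeout.

    max_attempts=7 is an assumed default; time_limit_s is the
    application-dependent duration limit mentioned in the text.
    """
    if attempts >= max_attempts:
        return True
    return time_limit_s is not None and age_s >= time_limit_s


if __name__ == "__main__":
    video_flow = FlowKey("10.0.0.2", "192.0.2.7", 5004, 5004, 17)
    packet = bytes([0x80, 96]) + struct.pack("!HII", 4660, 0, 0)
    print(video_flow, rtp_sequence_number(packet))
    print(is_lost(attempts=3, age_s=0.05, time_limit_s=0.04))
```

Making `FlowKey` a frozen dataclass keeps it hashable, so it can serve directly as a dictionary key when the MAC filters out packets that belong to other streams.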
Scenario 1 is the simpler case: a video packet can be uniquely identified by its RTP sequence number SN_RTP, which the 802.11 MAC can determine through deep packet inspection.

In Scenario 2, TLS encrypts the entire payload, and the MAC sub-layer cannot identify any sequence number in the video packet directly; it sees only the encrypted data. However, the TLS protocol layer, which performs the encryption, can establish the mapping between the NAL sequence number SN_NAL of a video packet and the encrypted data. The proposed approach is to use part of the encrypted data as a “signature,” denoted ID_TLS, and to perform a table lookup to find the SN_NAL corresponding to an ID_TLS.

Encrypted data appear random, and one can choose a longer pattern to increase the probability that the signature is unique among a given number of video packets. Consider M random patterns of N bits each. There are $(2^N)!/(2^N - M)!$ ways to select the M patterns from the $2^N$ possible patterns such that they are all unique, and the total number of choices of M patterns is $2^{NM}$. Therefore, the probability that all M patterns are unique is $(2^N)!/\left(2^{NM}\,(2^N - M)!\right)$. For example, if the video encoder generates 30 packets per second, we want the signature ID_TLS to be unique among the M = 90 consecutive packets in any 3-second period, so that each video packet can be uniquely identified. If we choose a signature length of N = 32 bits (4 bytes), the probability that any two patterns out of 90 match is less than one in a million ($9.32 \times 10^{-7}$).

In the 802.11 MAC, data arrive from the LLC sub-layer as MAC service data units (MSDUs), whereas packet loss happens at the MAC/PHY layers and is identified by the MAC as lost MPDUs. Because 802.11 allows aggregation and fragmentation, the mapping between MSDUs and MPDUs is not necessarily one-to-one. When an MPDU fails transmission, more than one MSDU or IP packet can be affected.
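The uniqueness figure quoted above for the TLS signature can be checked numerically. The sketch below evaluates the stated expression exactly using Python's arbitrary-precision integers and then reports the complementary probability that at least two of the M patterns match; the function name is hypothetical.

```python
from fractions import Fraction
from math import perm


def prob_all_unique(n_bits: int, m_patterns: int) -> Fraction:
    """Exact probability that m random n-bit patterns are pairwise distinct."""
    space = 2 ** n_bits
    # (2^N)! / (2^N - M)! ordered selections without repetition,
    # out of 2^(N*M) ordered selections with repetition allowed.
    return Fraction(perm(space, m_patterns), space ** m_patterns)


# N = 32-bit signature, M = 90 packets (3 seconds at 30 packets per second)
p_match = 1 - prob_all_unique(32, 90)
print(f"P(at least one match) = {float(p_match):.3e}")
```

For small probabilities the result is close to the familiar approximation M(M−1)/2 divided by 2^N, which gives a quick way to size N for other packet rates or time windows.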
An MPDU is identified by its Sequence Control (SC) field, SC_MPDU, while an MSDU is identified by its sequence number, SN_MSDU. To identify video packets, we need to map the SC_MPDU of a failed MPDU to SN_RTP (Scenario 1) or SN_NAL (Scenario 2).

In the proposed approach, when a transmission failure occurs, the mapping SC_MPDU→SN_MSDU is established first by looking up a table built during the aggregation and fragmentation processes in the 802.11 MAC. An entry is added to the table when a new MSDU is aggregated and/or fragmented, and is deleted once that MSDU is deemed successfully transmitted or lost. Then the mapping SN_MSDU→SN_RTP (Scenario 1) or SN_MSDU→ID_TLS (Scenario 2) is established. Neither step is encumbered by 802.11-level encryption.

For Scenario 1, the mapping SC_MPDU→SN_MSDU→SN_RTP provides sufficient information for notifying the video encoder of a packet loss. Table I shows how the packet loss feedback tasks may be distributed. For Scenario 2, an additional mapping ID_TLS→SN_NAL is performed in the TLS layer to complete the mapping SC_MPDU→SN_MSDU→ID_TLS→SN_NAL. Table II illustrates a possible distribution of these tasks. Note that these mappings may be one-to-many. The methods described here are general and can be applied to security protocols other than SRTP or TLS.

Notification of packet loss is a feedback message that traverses several protocol layers. It can generally be accomplished in one of three ways. When all the protocol layers are implemented in the same physical device, notification can be accomplished using an application programming interface (API), software mailboxes, sockets, or other forms of inter-process communication such as shared memory or operating-system-level signals. When the video encoder and the 802.11 MAC are not in the same physical device or are provided by different vendors, the notification message may pass through some standard protocol interfaces, such as IP.
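For the same-device case, the table lookups and the notification can be sketched together. The following is a minimal illustration for Scenario 1, not the authors' implementation: the class `LossMapper`, its method names, and the callback signature are all hypothetical, and a real MAC would also handle retry state and stream filtering.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class LossMapper:
    """Hypothetical MAC-side bookkeeping for SC_MPDU -> SN_MSDU -> SN_RTP."""

    def __init__(self, notify: Callable[[List[int]], None]) -> None:
        self._mpdu_to_msdus: Dict[int, Set[int]] = defaultdict(set)
        self._msdu_to_rtp: Dict[int, int] = {}
        self._notify = notify  # e.g. an API callback into the video encoder

    def on_msdu_queued(self, sn_msdu: int, sn_rtp: int) -> None:
        """Record SN_MSDU -> SN_RTP when a video MSDU arrives from the LLC."""
        self._msdu_to_rtp[sn_msdu] = sn_rtp

    def on_mpdu_built(self, sc_mpdu: int, sn_msdus: List[int]) -> None:
        """Record SC_MPDU -> SN_MSDU during aggregation or fragmentation."""
        self._mpdu_to_msdus[sc_mpdu].update(sn_msdus)

    def on_mpdu_done(self, sc_mpdu: int, acked: bool) -> None:
        """Resolve an MPDU outcome, notify on loss, then drop stale entries."""
        msdus = self._mpdu_to_msdus.pop(sc_mpdu, set())
        if not acked:
            # One failed MPDU may map to several RTP packets (one-to-many).
            lost = sorted({self._msdu_to_rtp[m] for m in msdus
                           if m in self._msdu_to_rtp})
            if lost:
                self._notify(lost)
        for m in msdus:
            # Keep an MSDU entry while other MPDUs (fragments) still carry it.
            if not any(m in s for s in self._mpdu_to_msdus.values()):
                self._msdu_to_rtp.pop(m, None)


if __name__ == "__main__":
    reports: List[List[int]] = []
    mapper = LossMapper(notify=reports.append)
    mapper.on_msdu_queued(sn_msdu=10, sn_rtp=500)
    mapper.on_msdu_queued(sn_msdu=11, sn_rtp=501)
    mapper.on_mpdu_built(sc_mpdu=77, sn_msdus=[10, 11])  # two MSDUs aggregated
    mapper.on_mpdu_done(sc_mpdu=77, acked=False)         # MPDU lost
    print(reports)
```

Passing the notification path in as a callback keeps the bookkeeping independent of the transport: the same class could hand the lost sequence numbers to an in-process API, a mailbox, or a socket writer.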
When the notification crosses such an interface, an additional standard or proprietary protocol may be needed so that the notification is understood by its recipient. Alternatively, the 802.11 MAC can spoof a standard packet that appears to have originated from the receiver side (e.g., an RTCP Receiver Report).

B. Feedback-based error resilient video coding

We propose to use a video encoder capable of adapting its coding structure upon receiving a packet loss notification in order to stop error propagation effectively. We briefly review two well-known feedback-based video coding techniques and then describe our two approaches. All the schemes described below are based on the H.264 video encoder.

Figure 3. Feedback-based video coding methods: (a) intra refresh, (b) reference picture selection, and (c) reference set of picture selection.

a) Intra refresh (IR) [11]: In this scheme, upon receiving a packet loss notification, the encoder encodes the next frame as an intra or instantaneous decoder refresh (IDR) frame, which breaks prediction from all previous frames. This approach is illustrated in Figure 3(a).
Although it stops error propagation, it requires more bits to encode;
b) Reference picture selection (RPS) [11]: In this scheme, upon receiving a packet loss notification, the encoder predicts the next frame from an uncorrupted reference frame, as illustrated in Figure 3(b). This technique is efficient and takes fewer bits than IR;
c) Rate-distortion optimized reference picture selection (RDO-RPS): The encoder performs rate-distortion optimization to decide between encoding the next frame as an intra/IDR frame and encoding it as a predicted (P) frame using an uncorrupted reference frame;
d) Reference set of picture selection (RSPS): This scheme is a generalization of our RDO-RPS. In addition to the choices available in RDO-RPS, the next frame can be encoded using multiple uncorrupted reference frames, as shown in Figure 3(c), thereby further reducing the bits required for encoding.

TABLE I
TASK DISTRIBUTION FOR EARLY PACKET LOSS FEEDBACK (SCENARIO 1)

Protocol | Functions | Mapping Table(s)
Video Encoder | Encode video into RTP packets; map SN_RTP to video frames or slices; perform IDR or prediction resetting | SN_RTP→frame/slice
802.11 MAC | Filter out other data streams; detect packet loss (SC_MPDU); map SC_MPDU to SN_MSDU; map SN_MSDU to SN_RTP | SC_MPDU→SN_MSDU; SN_MSDU→SN_RTP

TABLE II
TASK DISTRIBUTION FOR EARLY PACKET LOSS FEEDBACK (SCENARIO 2)

Protocol | Functions | Mapping Table(s)
Video Encoder | Encode video into NAL packets; map SN_NAL to video frames or slices; perform prediction resetting | SN_NAL→frame/slice
TLS | Map ID_TLS to SN_NAL | ID_TLS→SN_NAL
802.11 MAC | Filter out other data streams; detect packet loss (SC_MPDU); map SC_MPDU to SN_MSDU; map SN_MSDU to ID_TLS | SC_MPDU→SN_MSDU; SN_MSDU→ID_TLS

Figure 4. Rate-distortion plots comparing early packet loss feedback vs. RTCP feedback. Both methods use RDO-RPS feedback-based video coding. Results are for News.yuv at (a) PER=0.1%, (b) PER=0.7%, and (c) PER=1.4%; and BQMall.yuv at (d) PER=0.1%, (e) PER=0.7%, and (f) PER=1.4%.

IV. EXPERIMENTAL SETUP AND RESULTS

We use the x264 encoder [30], modified to perform RDO-RPS. The IPPP coding structure is used during encoding, since it is a popular choice in video conferencing and video streaming applications. The JM video decoder [31] is used with frame-copy error concealment. We test quantization parameters (QP) of {26, 28, 30, 32, 34} on the test sequences 'News' (352×288, 30 fps) and 'BQMall' (832×480, 30 fps, 300 frames) [31]. The News sequence was looped repeatedly to generate 2236 frames. We test packet error rates (PER) of 0.1%, 0.7%, and 1.4%, and use a 60 ms early notification delay. For BQMall, we encode each frame as 8 slices (packets), while for News we use one frame per packet. Packet error patterns were obtained by adjusting the number of stations attached to an AP ([16], [17]) to achieve the target PER for a given timeout limit. We compare our scheme to RTCP feedback with a feedback delay of one second.

Figure 4 shows rate-distortion (RD) plots for the two sequences at various PERs. For News.yuv, which has relatively low motion, we find that at the very low PER of 0.1%, the RD performance of the two schemes is very similar. At higher error rates, our early packet loss detection yields up to 0.5–1 dB improvement in PSNR. BQMall.yuv is a relatively high-motion sequence containing a camera pan and moving people. For this video, our scheme yields higher RD performance than RTCP feedback for all three PERs, with maximum PSNR gains of 0.5–6 dB. RTCP feedback performs worse on this video because its high motion causes severe error propagation during the feedback delay. Figure 5 shows the PSNR per frame for the BQMall sequence at QP=26 and PER=1.4%, and it illustrates that our scheme recovers from error propagation quickly compared to conventional RTCP feedback.

Figure 5. PSNR-per-frame for BQMall at QP=26 and PER=1.4% when comparing the proposed early packet loss feedback and RTCP feedback schemes. Both schemes use RDO-RPS for feedback-based video coding.

V. CONCLUSIONS

This paper shows that the combination of early packet loss feedback from the local 802.11 WLAN link and the associated video encoding techniques can effectively prevent prolonged error propagation in the event of packet loss during transmission. The proposed mechanism is based on current Internet protocols and can overcome challenges presented by packet aggregation, fragmentation, and encryption. Experimental results show significant improvements in video quality over conventional RTCP roundtrip feedback.

REFERENCES

[1] Cisco, “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2012–2017,” http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-520862.html, accessed 3/7/2013.
[2] IEEE 802.11-2012: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: http://standards.ieee.org/getieee802/download/802.11-2012.pdf
[3] Wi-Fi Multimedia: http://www.wi-fi.org/knowledge-center/articles/wi-fi-multimedia%E2%84%A2-wmm%C2%AE
[4] M. van der Schaar, Y. Andreopoulos, and Z. Hu, “Optimized scalable video streaming over IEEE 802.11a/e HCCA wireless network under delay constraints,” IEEE Trans. Mobile Computing, vol. 5, no. 6, pp. 755-768, June 2006.
[5] M.-H. Lu, P. Steenkiste, and T. Chen, “Robust wireless video streaming using hybrid spatial/temporal retransmission,” IEEE J. Selected Areas in Comm., vol. 28, no. 3, April 2010.
[6] I. Haratcherev, J. Taal, K. Langendoen, R. Lagendijk, and H. Sips, “Optimized video streaming over 802.11 by cross-layer signaling,” IEEE Comm. Mag., January 2006.
[7] G. Venkatesan, A. Ashley, E. Reuss, and T. Cooklev, “IEEE 802 tutorial: video over 802.11,” IEEE 802, March 2007.
[8] IETF RFC 3611: “RTP Control Protocol Extended Reports (RTCP XR),” T. Friedman, R. Caceres, and A. Clark, November 2003.
http://www.ietf.org/rfc/rfc3611.txt
[9] IETF RFC 4585: “Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF),” J. Ott, S. Wenger, N. Sato, C. Burmeister, and J. Rey, July 2006. http://www.ietf.org/rfc/rfc4585.txt
[10] ITU-T Recommendation H.271, “Video back channel messages for conveyance of status information and requests from a video receiver to a video sender,” May 2006.
[11] B. Girod and N. Färber, “Feedback-based error control for mobile video transmission,” in Proc. of IEEE, vol. 87, no. 10, Oct. 1999, pp. 1707–1723.
[12] G. J. Conklin, G. S. Greenbaum, K. O. Lillevold, A. F. Lippman, and Y. A. Reznik, “Video Coding for Streaming Media Delivery on the Internet,” IEEE Trans. Circuits Syst. Video Technology, 2001, vol. 11, no. 3, pp. 20-34.
[13] MPEG, “ISO/IEC DIS 23009-1 Information Technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats,” August 30, 2011.
[14] Apple, HTTP Live Streaming Overview, 2011. Real Networks, Helix Universal Media Server, 2011.
[15] G. Bianchi, “Performance analysis of the IEEE 802.11 distributed coordination function,” IEEE J. Selected Areas in Comm., vol. 18, no. 3, 2000.
[16] H. Vu and T. Sakurai, “Collision Probability in Saturated IEEE 802.11 Networks,” Australian Telecomm Networks & Appl. Conf. (ATNAC), Dec. 2006.
[17] D. Xu, T. Sakurai, and H.L. Vu, “An Access Delay Model for IEEE 802.11e EDCA,” IEEE Trans. Mobile Computing, vol. 8, no. 2, 2009.
[18] A. Kamerman and L. Monteban, “WaveLAN II: a high-performance wireless LAN for the unlicensed band,” Bell Labs Technical Journal, pp. 118-133, Summer 1997.
[19] M. Lacage, M.H. Manshaei, and T. Turletti, “IEEE 802.11 rate adaptation: a practical approach,” ACM MSWiM, 2004, Venezia, Italy.
[20] J. Bicket, “Bit-rate selection in wireless networks,” MIT Master’s Thesis, 2005.
[21] Onoe: http://madwifi-project.org/browser/madwifi/branches/madwifi-0.9.4/ath_rate/onoe/onoe.c
[22] S.H.Y. Wong, H. Yang, S. Lu, and V. Bharghavan, “Robust rate adaptation for 802.11 wireless networks,” IEEE MobiCom, Sept. 23-26, 2006, Los Angeles, USA.
[23] OpenMAX. http://www.khronos.org/openmax
[24] IETF Internet-Draft, “Transport-layer consideration for explicit cross-layer indications,” 2007, http://tools.ietf.org/html/draft-sarolahti-tsvwg-crosslayer-01.
[25] IEEE Standard 802.2, 1998 Edition (R2003), http://standards.ieee.org/getieee802/download/802.2-1998.pdf
[26] IETF RFC 768: “User Datagram Protocol,” J. Postel, August 1980. http://www.ietf.org/rfc/rfc768.txt
[27] IETF RFC 1889: “RTP: A Transport Protocol for Real-Time Applications,” H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, January 1996. http://www.ietf.org/rfc/rfc1889.txt
[28] IETF RFC 3711: “The Secure Real-time Transport Protocol (SRTP),” M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman, March 2004. http://www.ietf.org/rfc/rfc3711.txt
[29] IETF RFC 5246: “The Transport Layer Security (TLS) Protocol Version 1.2,” T. Dierks and E. Rescorla, August 2008. http://www.ietf.org/rfc/rfc5246.txt
[30] x264 encoder, URL: http://www.videolan.org/developers/x264.html
[31] JM decoder, URL: http://iphome.hhi.de/suehring/tml/
[32] Xiph.org, URL: http://media.xiph.org/video/derf/