Intra-Stream Traffic Differentiation and Resource Allocation for Video Teleconferencing in LTE
Research Paper / Feb 2014

Intra-Stream Traffic Differentiation and Resource Allocation for Video Teleconferencing in LTE Systems

A. Balasubramanian∗, L. Ma∗, A. Rapaport†, W. Liu†, G. Sternberg†, and A. Zeira∗
∗ Interdigital Communications, Inc., San Diego, CA 92121, USA
{anantharaman.balasubramanian, liangping.ma, ariela.zeira}@interdigital.com
† Interdigital Communications, Inc., King of Prussia, PA 19406, USA
{weimin.liu, avi.rapaport, gregory.sternberg}@interdigital.com

MMSP'13, Sept. 30 - Oct. 2, 2013, Pula (Sardinia), Italy. 978-1-4799-0125-8/13/$31.00 ©2013 IEEE.

Abstract: This paper considers a downlink LTE system where a basestation (eNodeB) is serving video traffic (generated possibly due to video teleconferencing applications) to many users in a cell. The video traffic is separated into multiple sub-streams (logical channels) based on the coding structure or priority of video packets. The objective is to maximize video quality by servicing appropriate video sub-streams and users while taking into account the resource constrained wireless channel. It is shown that one can obtain significant gains in video quality by determining the quality of service (QoS) parameters on a per sub-stream basis, rather than in a 'user-based' approach where all the sub-streams are lumped into one single stream. Motivated by video teleconferencing applications, we provide simulation results for the case where video traffic is separated into logical channels based on a Hierarchical-P encoding structure. Furthermore, we demonstrate the gains in video quality that can be obtained by allocating resources across logical channels. With the exploding growth in mobile video traffic expected to happen in 4G systems, this study suggests potential benefits of a logical channel based approach as compared to conventional user-based schemes.

Index Terms: Video aware scheduling, QoS, LTE, Hierarchical-P.

I. INTRODUCTION

Mobile multimedia traffic has grown rapidly in recent years, driven in part by the introduction of devices such as the iPhone and iPad, among others. These devices are endowed with advanced multimedia capabilities such as video streaming, high resolution displays, and support for interactive applications like video conferencing and video chatting. The increased availability of advanced mobile multimedia devices is matched by a plethora of video enabled applications on the market, such as Facetime, which adds video to conventional voice calls. This explosion in multimedia traffic is likely to continue: Cisco [1] has predicted that mobile traffic will be dominated by video, which will exceed 90% of the global consumer traffic in a few years [2]. Although 4G systems (such as LTE and LTE-A) can deliver higher data rates than their 3G/2G counterparts, video traffic growing at the rate predicted by Cisco [1] could easily outgrow the increased capacity offered by such systems. Moreover, video teleconferencing over mobile networks is constrained by the strict latency requirements that such applications impose. It is typical to add redundancy to video bitstreams to combat packet losses in the network. The fact that video packets are not equally important, owing to different levels of redundancy, presents an opportunity for the network to differentiate video packets intelligently to optimize video quality.

We briefly review related work. In [3], the authors propose scheduling policies for LTE systems, but do not consider video traffic. Liebl et al. [4] have proposed a scheduling scheme where, by knowing the future channel behavior, video packets are transmitted to users with favorable channel conditions until a deadline approaches. The problem with this approach is that it may not always be possible to know the future channel state precisely (for example, if the users have high mobility).
A content aware utility function is maximized as the scheduling performance metric in [5] for streaming based applications, whereas this paper focuses on video teleconferencing applications, which are real-time. Video delivery over LTE systems has been considered in [6] and [7], where video is treated as a single stream. The work proposed in this paper instead addresses video traffic differentiation, whereby video traffic is modeled as multiple video sub-streams based on the characteristics of the video, such as the video coding structure. In [8], sub-stream based scheduling is performed, but the authors consider a Multimedia Broadcast and Multicast (MBMS) scenario in LTE, whereas the scheme proposed in this paper is for unicast scenarios.

Our main contribution is a scheme in which video traffic is separated into logical channels (sub-streams) based on video characteristics such as the video coding structure, and more specifically on the impact of packet loss on the video quality. A logical channel based resource allocation policy (scheduling policy) is then designed that considers sub-streams in order of priority, and services users taking into account multiuser diversity, fairness and QoS metrics. It is important to emphasize that the fairness and QoS metrics are considered on a sub-stream basis rather than by lumping all the sub-streams into one. Furthermore, wireless resources are allocated to lower priority sub-streams only after allocating just enough resources for higher priority sub-streams.

The remainder of this paper is organized as follows. Section II contains the video and wireless system model. In Section III we propose a method for separating a video flow into multiple sub-streams and a logical channel based (sub-stream based) resource allocation policy. Section IV discusses the simulation results. Finally, Section V concludes the paper.

II. SYSTEM MODEL

Motivated by real time video applications, we consider an end-to-end system such as video teleconferencing (for example, supported by the RTP/UDP protocol). We assume that the losses between the packet gateway and the eNodeB are negligible (which is typically the case). Therefore, the focus of this paper is on the air interface between the eNodeB and the UE.

A. Video Coding Structure

It is assumed that video is encoded according to a hierarchical-P structure [9]. Fig. 1 shows three layers of hierarchy (numbered 1 through 3), where layer-2 frames are predicted from layer-1, while layer-3 frames are predicted from layer-1 and layer-2. The impact of losing a layer-1 frame is quite large, because losing such a frame affects all subsequent frames across all layers. A layer-2 frame is less important than a layer-1 frame because losing a layer-2 frame affects only a layer-3 frame. Similarly, layer-3 frames are the least important. As such, it makes sense to prioritize video frames in accordance with the impact of their loss. This leads to layer-1 frames having higher priority than layer-2 frames, and so on. Aside from its error resilience properties and its attractiveness for video conferencing applications, this encoding structure provides flexibility in adapting to network conditions. For example, layer-3 frames can be dropped, thus halving the frame rate (and the bit rate) without significantly affecting the video quality, as layer-3 frames are not used for prediction. More details on the properties of the hierarchical-P structure can be found in [9].

Fig. 1: Hierarchical-P video encoding structure (an I frame followed by P frames, each labeled with its layer, 1 through 3)

As described above, the hierarchical-P coding structure provides a way to model a single video stream as multiple sub-streams, depending on which layer each video frame belongs to. For example, sub-stream-1 can be thought of as containing the layer-1 video frames shown in Fig. 1, sub-stream-2 the layer-2 frames, and so on.
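The layer-to-sub-stream mapping described above can be illustrated with a short Python sketch. It assumes a repeating four-frame layer pattern of 1, 3, 2, 3, which is a common three-layer hierarchical-P arrangement and is consistent with the statement that dropping layer-3 frames halves the frame rate; the exact frame order of Fig. 1 is not recoverable from the text, so this pattern and the function names `temporal_layer` and `split_into_substreams` are illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def temporal_layer(frame_index: int) -> int:
    """Return the hierarchical-P layer (1, 2 or 3) of a frame.

    Assumption: layers repeat with period 4 as 1, 3, 2, 3, so that
    every other frame belongs to layer 3.
    """
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    position = frame_index % 4
    if position == 0:
        return 1
    if position == 2:
        return 2
    return 3


def split_into_substreams(num_frames: int) -> dict[int, list[int]]:
    """Group frame indices into sub-streams keyed by layer number."""
    substreams = defaultdict(list)
    for n in range(num_frames):
        substreams[temporal_layer(n)].append(n)
    return dict(substreams)
```

Under this assumed pattern, calling `split_into_substreams(8)` places frames 0 and 4 in sub-stream 1, frames 2 and 6 in sub-stream 2, and the remaining four frames in sub-stream 3.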
The benefit of this approach is that the sub-stream separation performed at the application layer can be preserved as the sub-streams pass through the LTE protocol stack to the MAC layer, where priority-based resource allocation can be performed. This adaptability makes the proposed scheme suitable for implementation in existing 3GPP systems.

B. Wireless Model

We consider an LTE cellular network with an eNodeB serving $N$ active wireless users with a SU-MIMO (4x2) antenna configuration. As shown in Fig. 2, the eNodeB holds a transmit buffer for every user, containing $L$ logical channels $(1, 2, \ldots, L)$ that store video packets of the appropriate layers in order of decreasing importance. A logical channel is an information stream dedicated to the transfer of a specific type of information over the radio interface [10]. Logical channels are provided on top of the MAC layer. Each user can have multiple logical channels. However, traffic on these logical channels is mapped onto the same single physical channel. The system bandwidth is divided into $m$ physical resource blocks (PRBs), which are grouped into $M$ subbands: $(M - 1)$ subbands each have $p = \lceil m/M \rceil$ consecutive PRBs, and one subband has $m - (M - 1)p$ consecutive PRBs [11]. The eNodeB can allocate the $M$ subbands to the $N$ users in the system every transmission time interval (TTI) of 1 ms. In each TTI, a subband can be allocated to at most one user, and can serve one or more of that user's logical channels. We do not assume an infinitely backlogged transmit buffer model in this paper (that is, there are time instants during which there may not be any data to transmit). It is important to note that the presence of multiple logical channels in each user's transmit buffer gives additional freedom to allocate subbands based on fairness and on the buffer occupancy of each logical channel, as opposed to lumping all the logical channels together and considering them as one entity.
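As a worked example of the subband partition above, the following Python sketch computes the subband sizes for given $m$ and $M$. The function name `subband_sizes` is hypothetical; the values $m = 50$ and $M = 7$ used in the usage note are the ones reported for the simulations in Section IV.

```python
def subband_sizes(m: int, M: int) -> list[int]:
    """Split m PRBs into M subbands.

    (M - 1) subbands get p = ceil(m / M) PRBs each, and the last
    subband gets the remaining m - (M - 1) * p PRBs.
    """
    if m <= 0 or M <= 0:
        raise ValueError("m and M must be positive")
    p = -(-m // M)  # integer ceiling of m / M
    last = m - (M - 1) * p
    if last <= 0:
        # The partition rule leaves nothing for the last subband.
        raise ValueError("no PRBs left for the last subband")
    return [p] * (M - 1) + [last]
```

For $m = 50$ and $M = 7$ this gives $p = 8$, so six subbands of 8 PRBs and one subband of 2 PRBs.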
For every subband, a user-dependent channel quality indicator (CQI) that is indicative of channel conditions is assumed to be known at the eNodeB, in addition to the user-dependent rank and precoding matrix. Furthermore, the eNodeB always chooses spatial multiplexing mode whenever the rank of the channel is greater than one.

Fig. 2: Wireless System Model (the eNodeB scheduler holds Logical Channel-1 through Logical Channel-L for each of User-1 through User-N, and serves UE-1 through UE-N)

III. PROPOSED SCHEME

As described in Sec. II-A, one can model a video stream as composed of multiple sub-streams based on the impact of packet loss on the video quality, which in turn depends on the video coding structure. The encoded video bitstream is separated into multiple sub-streams by assigning a distinct port number to each layer of the video coding hierarchy (shown in Fig. 1), thereby creating several sub-streams of a single video flow. For each sub-stream, which is identified by the IP 5-tuple (source address, destination address, source port number, destination port number, and protocol number), an EPS bearer is created and associated with a QoS Class Identifier (QCI). These EPS bearers are then mapped one-to-one to radio bearers and subsequently to logical channels. At the MAC layer, the separation of the video traffic is preserved. This process requires no change to the existing 3GPP standard. Fig. 2 shows an example where encoded video is separated into $L = 3$ sub-streams (logical channels) as it enters each user's transmit buffer. The eNodeB chooses packets from logical channels and delivers them to the appropriate users through an error prone wireless channel.

Fig. 3: Average PSNR loss due to a packet loss in different logical channels (x-axis: Hierarchical-P temporal layer (logical channel), 1 to 3; y-axis: average PSNR loss (dB); curves for QP=30, QP=27, QP=24)

Similar to the approach in [3], we can define an objective function accounting for the importance of packets belonging to different logical channels, and formulate the problem of optimizing the video quality as an integer programming problem. Our formulation will include the one in [3] as a special case. As the problem in [3] is NP-hard, so would be our formulation. Therefore, instead of trying to find an approximation algorithm for the problem of optimizing a pre-defined objective function with various constraints, we design an algorithm based on the characteristics of video traffic and then show that the algorithm does reasonably well in achieving the goal of enhancing the video quality.

Fig. 3 shows the average peak signal-to-noise ratio (PSNR) loss of a decoded video sequence due to a packet loss from each of the logical channels for several Quantization Parameters (QPs). It is clear that the PSNR loss due to packet loss from logical channel-1 is much higher than that of the other logical channels. Given the error propagation characteristics of the different layers of video frames discussed in [9], and the PSNR loss characteristics depicted in Fig. 3, one possibility is a strict priority based scheduling scheme in which resources (subbands) are allocated so as to meet the requirements of logical channel-1, and the remaining resources (if available) are used for lower priority logical channels. Another advantage of this approach is that it enables us to compute the fairness and QoS metrics at the logical channel level, which provides finer granularity than computing these metrics coarsely by lumping all the logical channels together and treating them as one entity. We show that this simple scheme indeed gives significant gains when compared with state-of-the-art scheduling schemes.
Let $R^c_{i,j}$ represent the number of bits transmitted from the $j$-th logical channel of user-$i$ in subband $c$. Let $\tilde{R}_{i,j}(t)$ denote user-$i$'s exponentially weighted moving average throughput from logical channel-$j$ achieved until TTI $t$, which is calculated as:

$$\tilde{R}_{i,j}(t) = (1 - \alpha)\,\tilde{R}_{i,j}(t-1) + \alpha \sum_{c} \tilde{R}^c_{i,j}(t-1) \tag{1}$$

where $\tilde{R}^c_{i,j}$ denotes the data successfully delivered in subband $c$ from the $j$-th logical channel of the $i$-th user's buffer, and $\alpha$ denotes the averaging constant. The proposed scheme considers a logical channel-$j$ (in the order of decreasing priority) for all the users in the system at a given TTI and allocates a subband $c$ to user-$k$ that satisfies the following:

$$k = \arg\max_{i} \; P_{i,j}\, Q_{i,j}\, r^c_i \tag{2}$$

Here $P_{i,j}$ represents the proportional fairness (PF) weight for logical channel-$j$ of user-$i$, which is calculated as $P_{i,j} = 1/\tilde{R}_{i,j}$, where $\tilde{R}_{i,j}$ is computed according to (1) ($t$ omitted for simplicity); $Q_{i,j}$ represents the head of line (HOL) delay for logical channel-$j$ of user-$i$; and $r^c_i$ represents the amount of data that can be transmitted in the current TTI for user-$i$ in subband $c$, which can be obtained from the user-dependent CQI reported for this subband. For ease of notation, we do not explicitly show the dependence of $k$ on $j$ and $c$ in (2). Since we do not assume an infinitely backlogged transmit buffer in this paper, it is possible that for user-$k$, logical channel-$j$ may not have any packets to transmit, while logical channel-$(j+1)$ might have some, in which case subband $c$ would not be allocated to user-$k$, but instead to the next best user that satisfies (2).

The detailed algorithm is outlined below. As can be seen, the intuition behind the logical channel based resource allocation algorithm is simple: We begin by considering the first logical channel (logical channel-1) of all users in the system and allocate only the required amount of resources.
If any resources remain unused, then we consider the next logical channel (i.e., logical channel-2) of all the users, and this process is repeated until there are no resources left. Every time a logical channel-$j$ ($j = 1, 2, \ldots, L$) is considered, the resources need to be shared among the $N$ users in the system. Furthermore, it is important to note that although we assign one subband to a user for transmission of packets in logical channel-$j$ (as in line 13 of the algorithm), it is possible that the allocated subband could be used for carrying packets from logical channel-$j'$, where $j' > j$. This is because, once all packets are serviced from logical channel-$j$, and given the fact that a subband can be allocated to at most one user in a SU-MIMO system, a natural strategy would be to service packets from logical channels with index higher than $j$, that is, of lower priority (provided the subband capacity is high enough to carry them), in order to make the best use of the assigned subband.

Algorithm: Logical Channel Based Scheduling
 1: Let $S = \{1, 2, \ldots, M\}$ denote the subbands in the system.
 2: Let $B_{i,j}$ denote the amount of data waiting to be transmitted in logical channel-$j$ of user-$i$ from the eNodeB.
 3: $\forall\, i, j$: $E_{i,j} = 0$, where $E_{i,j}$ denotes the estimated amount of data serviced from logical channel-$j$ of user-$i$.
 4: for logical channel $j = 1, 2, \ldots, L$ do
 5:     Let $U = \{1, 2, \ldots, N\}$ denote the users in the system.
 6:     for every subband $c \in S$ do
 7:         Select the best user $k \in U$ according to equation (2).
 8:         if $B_{k,j} \le E_{k,j}$ then
 9:             // Do not consider this user for this subband
10:             $U \leftarrow U \setminus \{k\}$.
11:             $r^c_k = 0$. Go to step (7).
12:         else
13:             Assign subband $c$ to user $k$ chosen in step (7).
14:             $S \leftarrow S \setminus \{c\}$
15:         end if
16:         Update $E_{k,j} \leftarrow E_{k,j} + r^c_k$
17:     end for
18: end for

IV. SIMULATION RESULTS AND DISCUSSION

Simulations were performed on a system-level simulator for the LTE air-interface, with abstractions of the application, transport, medium access control, and physical layers, built on the MATLAB platform. A SU-MIMO 4x2 system was considered with the following system parameters [11]: bandwidth: 10 MHz, number of physical resource blocks: 50, number of subbands: 7, with 38 wireless users randomly distributed in the cell and $\alpha = 0.0029$ (see (1)). The eNodeB services video traffic from the logical channels, which contain packets of the appropriate layer as explained in Section II-A, and sends it to the users. It should be noted that the ability of the eNodeB to allocate only the required amount of resources (subbands) to service a particular logical channel of a user depends on estimating the rate admitted by the subband. This can be calculated by jointly considering the user-dependent CQI of the subband (which determines the modulation, coding and transmit block size) and the rank of the channel. For the case when multiple subbands are assigned to a logical channel of a particular user, the rate is estimated using the Exponential Effective SINR Mapping (EESM) method [11]. Each user decodes the frames that are received in error using frame-copy based error concealment (where frames that are received in error or lost are copied from the previous frame). The PSNR of the decoded frames was chosen as the performance metric for evaluating the different schemes. Furthermore, it is assumed that a packet is lost if it is not transmitted within its delay constraint (in practical LTE systems, the delay limit for the eNodeB to UE link is normally kept below 50 ms).
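To make the interplay between the per-subband rate estimate discussed above and the scheduling rule of Section III concrete, the following Python sketch runs the logical channel based scheduling loop for a single TTI. It is a simplified illustration under stated assumptions, not the simulator used in the paper: the names `ewma_throughput` and `schedule_tti`, the list-of-lists inputs, and the small constant `eps` that guards against division by a zero average throughput are all hypothetical, and the sketch does not model reuse of an assigned subband by lower priority logical channels or EESM-based rate estimation.

```python
def ewma_throughput(previous, delivered_per_subband, alpha):
    """Equation (1): exponentially weighted moving average throughput."""
    return (1.0 - alpha) * previous + alpha * sum(delivered_per_subband)


def schedule_tti(backlog, avg_throughput, hol_delay, rate, eps=1e-9):
    """Assign subbands for one TTI, one logical channel at a time.

    backlog[i][j]        : data waiting in logical channel j of user i (B)
    avg_throughput[i][j] : average throughput from equation (1)
    hol_delay[i][j]      : head-of-line delay (Q)
    rate[i][c]           : data user i can carry in subband c this TTI (r)
    Returns a dict {subband: (user, logical_channel)}, all zero-based.
    """
    num_users = len(backlog)
    num_channels = len(backlog[0])
    free_subbands = list(range(len(rate[0])))                   # S
    served = [[0.0] * num_channels for _ in range(num_users)]   # E
    assignment = {}

    for j in range(num_channels):            # decreasing priority
        users = set(range(num_users))        # U, reset per logical channel
        for c in list(free_subbands):        # iterate over a copy of S
            while users:
                # Equation (2) with P = 1 / average throughput.
                k = max(
                    sorted(users),
                    key=lambda i: hol_delay[i][j] * rate[i][c]
                    / max(avg_throughput[i][j], eps),
                )
                if backlog[k][j] <= served[k][j]:
                    users.discard(k)         # channel j of user k is covered
                    continue                 # try the next best user
                assignment[c] = (k, j)
                free_subbands.remove(c)
                served[k][j] += rate[k][c]
                break
    return assignment
```

In this sketch a subband that no user needs for logical channel $j$ stays in the free set and is offered again when logical channel $j+1$ is considered, which mirrors the strict priority order described in Section III.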
We have compared our logical channel based scheduling algorithm with the resource allocation algorithm proposed for video delivery over LTE systems in [6], in addition to well-known, state-of-the-art algorithms such as proportional fair (PF) and the Maximum Largest Weighted Delay First scheme (MLWDF). In particular, the resource allocation scheme in [6] is used in our multiple logical channel setting (which we henceforth call the 'LuoLC' algorithm) and allocates a subband to user-$k$ that satisfies:

$$k = \arg\max_{i} \left( W^a_i + \max_{j} W^b_{i,j} - W^c_i \right) \tag{3}$$

where $W^a_i$ represents the normalized channel rate admitted by the subband for the $i$-th user; $W^b_{i,j} = \mathrm{HOLDelay}_{i,j} / \mathrm{QosDelay}^{\max}_{j}$ denotes the HOL delay of the $i$-th user in the $j$-th logical channel ($\mathrm{HOLDelay}_{i,j}$) normalized by the maximum QoS delay that can be tolerated by the $j$-th logical channel ($\mathrm{QosDelay}^{\max}_{j}$); and $W^c_i$ represents the normalized exponentially weighted moving average throughput for user-$i$, calculated in the same way as (1).

We have also considered a variant of the MLWDF algorithm that takes the logical channels into account (which we henceforth call MLWDFLC), wherein the HOL delay of a user is taken to be the maximum of the HOL delays of all its logical channels. This scheme ensures that users that have any of their logical channels backlogged (as compared to others) are given more priority. More specifically, in MLWDFLC, for a subband $c$, we choose user-$k$ that satisfies:

$$k = \arg\max_{i} \; P_i \left( \max_{j} Q_{i,j} \right) r^c_i$$

where $P_i$ is the PF weight of user-$i$, calculated as $P_i = 1/\tilde{R}_i$, where $\tilde{R}_i$ denotes the exponentially weighted moving average throughput obtained by lumping all the logical channels, and $Q_{i,j}$, $r^c_i$ are as described in Section III.

Fig. 4: Cumulative distribution of the decoded frame PSNR for Soccer sequence (x-axis: frame PSNR (dB); y-axis: CDF; curves: PF, MLWDF, MLWDFLC, LC, LuoLC).
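The two user-based selection rules above can be sketched in Python for a single subband. This is an illustration only: the function names and argument layout are hypothetical, and the inner operation over $j$ is read as taking the maximum value across a user's logical channels, as the description of MLWDFLC suggests.

```python
def select_mlwdflc(avg_user_throughput, hol_delay, rate_c):
    """MLWDFLC: maximize P_i * (max_j Q_ij) * r_i^c over users i.

    avg_user_throughput[i] : average throughput of user i, all channels lumped
    hol_delay[i][j]        : HOL delay of logical channel j of user i
    rate_c[i]              : data user i can carry in this subband
    """
    return max(
        range(len(rate_c)),
        key=lambda i: max(hol_delay[i]) * rate_c[i] / avg_user_throughput[i],
    )


def select_luolc(norm_rate, hol_delay, qos_delay_max, norm_throughput):
    """LuoLC, equation (3): maximize W^a_i + max_j W^b_ij - W^c_i.

    norm_rate[i]       : normalized channel rate for user i (W^a)
    qos_delay_max[j]   : maximum tolerable delay of logical channel j
    norm_throughput[i] : normalized average throughput of user i (W^c)
    """
    def score(i):
        # W^b: HOL delay normalized per logical channel, then maximized.
        w_b = max(d / q for d, q in zip(hol_delay[i], qos_delay_max))
        return norm_rate[i] + w_b - norm_throughput[i]

    return max(range(len(norm_rate)), key=score)
```

Both rules collapse a user's logical channels into one per-user score, which is the main difference from the per-channel metric in (2).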
Fig. 4 and Fig. 5 depict the cumulative distribution of the decoded PSNR of all frames of all the users in the system under the various schemes, for the soccer sequence (delay constraint of 50 ms) and the foreman sequence (delay constraint of 30 ms) respectively. It is clear that the logical channel approach gives significant and consistent gains over the other algorithms.

Fig. 5: Cumulative distribution of the decoded frame PSNR for Foreman sequence (x-axis: frame PSNR (dB); y-axis: CDF; curves: PF, MLWDF, MLWDFLC, LC, LuoLC)

Fig. 6: Logical channel-1 rate of packet loss due to congestion (packet loss rate (%) for the Foreman and Soccer sequences; bars: MLWDFLC, LuoLC, LC)

To see why this should be the case, Fig. 6 shows the packet loss rate due to congestion (that is, the percentage of packets dropped at the transmitter due to violation of the delay constraint) in logical channel-1 for the different sequences. It is clear that the LC approach has far less packet loss than LuoLC and MLWDFLC (congestion losses for individual logical channels are not applicable to PF and MLWDF, as all the logical channels are lumped together). However, it should be noted that the packet loss rate of the other logical channels is higher for the LC scheme than for LuoLC and MLWDFLC. This is because the LC approach uses the importance of the video packets in allocating resources to users (and to the logical channels), a trade-off that proves beneficial in improving the video quality. It is instructive to see that the gains for the soccer sequence are higher than those for foreman. For example, for the soccer sequence there is a 12 dB gain for the median user between LC and MLWDF, while there is none for the foreman sequence. This is because there is less dependency among the frames in the soccer sequence (due to high motion) than in foreman, making error concealment less effective for soccer than for foreman.

V. CONCLUSION

This paper has considered real-time video delivery over LTE systems, wherein traffic differentiation of a video stream is first performed by separating it into multiple sub-streams (logical channels) based on the video coding structure, and more specifically on the impact of packet loss on the video quality. Resources are then allocated to users with the parameters of interest considered on a logical channel basis. Simulation results show that the combination of traffic differentiation and a logical channel based resource allocation scheme provides significant gains compared to state-of-the-art algorithms. The proposed scheme is especially useful for applications such as video teleconferencing and can be easily adopted in existing 3GPP systems. With the rapid increase in video traffic expected to happen, this study demonstrates the potential benefits that can be obtained by the proposed scheme.

REFERENCES

[1] "Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2011–2016," http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-520862.html, accessed: 11/01/2012.
[2] O. Oyman and S. Singh, "Quality of Experience for HTTP Adaptive Streaming Services," IEEE Communications Magazine, vol. 50, no. 4, pp. 20–27, 2012.
[3] S. Lee, S. Choudhury, A. Khoshnevis, S. Xu, and S. Lu, "Downlink MIMO with frequency-domain packet scheduling for 3GPP LTE," in IEEE INFOCOM, 2009, pp. 1269–1277.
[4] G. Liebl, M. Kalman, and B. Girod, "Deadline-Aware Scheduling for Wireless Video Streaming," in IEEE International Conference on Multimedia and Expo, ICME, July 2005.
[5] P. Pahalawatta, R. Berry, T. Pappas, and A. Katsaggelos, "Content-Aware Resource Allocation and Packet Scheduling for Video Transmission over Wireless Networks," IEEE Journal on Selected Areas in Communications, vol. 25, no. 4, pp. 749–759, 2007.
[6] H. Luo, S. Ci, D. Wu, J. Wu, and H. Tang, "Quality-driven cross-layer optimized video delivery over LTE," IEEE Communications Magazine, vol. 48, no. 2, pp. 102–109, 2010.
[7] G. Piro, L. A. Grieco, G. Boggia, R. Fortuna, and P. Camarda, "Two-level downlink scheduling for real-time multimedia services in LTE networks," IEEE Transactions on Multimedia, vol. 13, no. 5, pp. 1052–1065, 2011.
[8] C. Lou and L. Qiu, "QoS-Aware Scheduling and Resource Allocation for Video Streams in e-MBMS towards LTE-A System," in IEEE Vehicular Technology Conference, Sept. 2011, pp. 1–5.
[9] D. Hong, M. Horowitz, A. Eleftheriadis, and T. Wiegand, "H.264 hierarchical P coding in the context of ultra-low delay, low complexity applications," in Picture Coding Symposium (PCS), 2010. IEEE, 2010, pp. 146–149.
[10] 3GPP TR 21.905, "Third generation partnership project, technical specification group services and system aspects; Vocabulary for 3GPP specifications (Release 10)," V10.3.0, (2011-03).
[11] A. Ghosh, J. Zhang, R. Muhamed, and J. Andrews, Fundamentals of LTE. Prentice Hall, 2010.
[12] "JM Software Codec, Ver 16.2," http://iphome.hhi.de/suehring/tml, accessed: 11/01/2012.
[13] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, P. Whiting, and R. Vijayakumar, "Providing quality of service over a shared wireless link," IEEE Communications Magazine, vol. 39, no. 2, pp. 150–154, 2001.
[14] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.