IEEE COMSOC MMTC E-Letter, Vol. 8, No. 6, November 2013
http://www.comsoc.org/~mmc/

Sender-Side Adaptation for Video Telephony over Wireless Communication Systems

Liangping Ma, Yong He, Gregory Sternberg, Yan Ye, and Yuriy Reznik
InterDigital Communications, Inc., USA
{liangping.ma, yong.he, gregory.sternberg, yan.ye, yuriy.reznik}@interdigital.com

1. Introduction

Mobile video telephony is gaining significant traction due to the availability of highly efficient video compression technologies such as H.264/AVC [1] and HEVC [2] and the availability of high-capacity wireless access networks such as LTE/LTE-Advanced [3]. This is evidenced by the increasing popularity of video telephony applications developed for smart phones such as iPhones and Android phones. Compared to traditional audio-only communication, video telephony provides much richer content and a better user experience. However, if the video sender and the video receiver do not coordinate well, mismatches may occur, resulting in poor user experience and/or inefficient use of network resources. The mismatches may be in video orientation, video aspect ratio, or video resolution.

The video orientation mismatch occurs when the orientation of the transmitted video does not align correctly with the orientation of the display at the receiver; for example, the transmitted video is vertical, whereas the video display at the receiver is horizontal. This mismatch could be resolved by manually rotating the receiver until it aligns with the sent video, at the cost of degraded user experience. The other two mismatches cannot be resolved by rotating the receiver. The video aspect ratio mismatch occurs if the aspect ratio of the transmitted video is different from that of the display at the receiver, even if the video orientations match.
For example, the transmitted video is generated by an iPhone 4S smart phone (960×640), with an aspect ratio of 960:640 = 3:2, whereas the aspect ratio of the display at the receiver (a Samsung Galaxy S III) is 1280:720 = 16:9. The video resolution mismatch occurs when the resolution of the transmitted video is different from that of the display at the receiver. For example, the transmitted video has a resolution of 1080P (1920×1080), whereas the display at the receiver has a resolution of 720P (1280×720).

Desired solutions to these mismatch problems should be standards-based, considering the heterogeneity of mobile devices. The 3GPP multimedia telephony service for IMS (MTSI) is such an effort, intended to resolve the video orientation mismatch without user intervention (i.e., without manually rotating the receiver device). MTSI mandates that the sender signal the orientation of the image captured on the sender side to the receiver for appropriate rendering and projection on the screen [4]. The rendering and displaying could include cropping or rotating the video. However, the MTSI method, where the receiver adapts to the orientation of the sender, may not fully resolve the video orientation mismatch problem, as illustrated in Figure 1. With knowledge of the orientation of the transmitted video, the receiver can adapt to the captured video orientation by either (a) cropping and scaling up or (b) scaling down the received video to fit its own display. In Figure 1(a), portions of the image are lost, and in Figure 1(b), the video is downsampled and there are black bars around the displayed video. Both lead to a sub-optimal user experience. Additionally, we note that in the examples shown in Figure 1, the MTSI approach is inefficient, because not all the video data delivered across the communication network are fully utilized by the receiver: either part of the video is thrown away or the whole video is downsampled.
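The waste in these two receiver-side options can be quantified with a small sketch (the device resolutions are the ones given above; the helper function itself is illustrative, not part of any standard):

```python
def wasted_fraction(src_w, src_h, disp_w, disp_h):
    """Fraction of transmitted pixels discarded when the receiver crops
    the video to its own aspect ratio; by symmetry, it is also the
    fraction of the display left as black bars when it scales down
    (letterboxes) instead of cropping."""
    src_ar = src_w / src_h
    disp_ar = disp_w / disp_h
    return 1 - min(src_ar, disp_ar) / max(src_ar, disp_ar)

# iPhone 4S capture (960x640, 3:2) shown on a Galaxy S III display (1280x720, 16:9)
print(f"{wasted_fraction(960, 640, 1280, 720):.1%}")  # 15.6%
```

Either way, roughly one sixth of what was delivered over the network (or of the display area) goes unused, which is the inefficiency the sender-side approach removes.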
Figure 1(c) and Figure 1(d) show the inefficiency of the MTSI method for the cases of aspect ratio mismatch and resolution mismatch, respectively.

Figure 1. The MTSI method (receiver-side adaptation) leads to undesired user experience and/or inefficient use of network resources: (a) crop and scale up; (b) scale down; (c) aspect ratio mismatch (16:9 capture, 4:3 display); (d) resolution mismatch (1920×1080 capture, 1280×720 display).

In this paper, we propose a sender-side adaptation method that can solve all three of the aforementioned mismatch problems, resulting in better user experience, more efficient use of network resources, and improved network-wide system performance.

2. Sender-Side Adaptation

The basic idea of our proposed method is to adapt the video processing and/or video capturing on the sender side to the display of the receiver. With the proposed method, every bit of video data delivered across the wireless communication system is fully utilized by the receiver. In our proposed method, the receiver informs the sender of its desired video display orientation, aspect ratio, and/or the width and height of the video to be displayed. Note that, by providing the desired width and height, the receiver also provides the desired aspect ratio. After obtaining such information, the video sender can use various adaptation techniques; we consider two of them here.
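The information the receiver reports to the sender can be pictured as a small structure (a sketch only; the field names are illustrative and not taken from any signaling specification):

```python
from dataclasses import dataclass

@dataclass
class ReceiverDisplayInfo:
    """What the receiver signals to the sender (illustrative fields)."""
    width_px: int        # desired picture width, in pixels
    height_px: int       # desired picture height, in pixels
    up_angle_deg: float  # preferred up direction, relative to the display width

    @property
    def aspect_ratio(self) -> float:
        # The desired width and height imply the desired aspect ratio.
        return self.width_px / self.height_px

info = ReceiverDisplayInfo(width_px=1280, height_px=720, up_angle_deg=0.0)
print(round(info.aspect_ratio, 3))  # 1.778, i.e., 16:9
```

The derived `aspect_ratio` property reflects the observation in the text that signaling width and height makes a separate aspect-ratio field redundant.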
In the first technique, the video sender crops the captured video according to the display orientation, the aspect ratio, and/or the resolution of the receiver, and encodes and transmits the cropped video, as illustrated in Figure 2. Such cropping has the benefit of potentially saving a significant amount of network resources, which are usually precious in a wireless communication system. As an example, consider the scenario in Figure 2(a). Let the image length be L pixels and the width be W. Then, instead of sending encoded bits corresponding to L × W raw pixels per image, we only need to send encoded bits corresponding to (W × (W/L)) × W = W³/L raw pixels per image. Assuming the same video encoding efficiency, this represents a reduction of α = 1 − (W/L)² in the encoded bit rate. Taking the 1080P (1920×1080) resolution as an example, the reduction α is 68.36%. Alternatively, we can maintain the same encoded bit rate (thereby keeping the same traffic load on the communication system) when encoding the cropped images, which can significantly improve the objective video quality, generally resulting in better user experience.

In the second technique, the video sender adapts its video capturing to the display orientation, aspect ratio, and/or resolution of the receiver. During video capturing, a subset of the image sensors is selected according to the orientation, aspect ratio, and/or resolution of the display of the receiver. This is illustrated in Figure 3. It is possible for video adaptively captured in this way to have the same resolution as the display at the receiver, since in practice the resolution of the image sensor array may be much higher than that of the video to be captured. For example, the Nokia Lumia 1020 smart phone features a sensor array of 41 megapixels, much higher than the 1080P (1920×1080 ≈ 2.07 megapixels) resolution.
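The bit-rate saving above can be checked numerically (a sketch only, under the same assumption as in the text that the encoded bit rate scales with the number of raw pixels per image):

```python
def crop_bitrate_reduction(length_px, width_px):
    """Relative bit-rate saving when an L x W landscape capture is
    cropped to a portrait picture of (W * W/L) x W pixels, assuming
    encoded bits are proportional to the number of raw pixels."""
    original = length_px * width_px                          # L * W
    cropped = (width_px * width_px / length_px) * width_px   # (W * W/L) * W
    return 1 - cropped / original                            # = 1 - (W/L)**2

print(f"{crop_bitrate_reduction(1920, 1080):.2%}")  # 68.36%
```

This reproduces the 68.36% figure quoted for the 1080P example.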
Figure 2. Cropping on the sender side: (a), (b) orientation mismatch; (c) aspect ratio mismatch (16:9 capture, 4:3 display); (d) resolution mismatch (1920×1080 capture, 1280×720 display).

To enable the aforementioned sender-side video adaptation techniques, the receiver can provide the sender with the following information: the height and width of the desired video pictures, and the up direction (as preferred by a user) in the video. The up direction is not necessarily opposite to the direction of gravity, e.g., when a phone is placed flat on a horizontal table. The up direction can be represented by an angle θ relative to the width of the display, as shown in Figure 4. After receiving this information, the video sender can find its own up direction and then determine the picture that it needs to crop or capture. For example, the width is in the direction θ, and the height is in the direction θ + 90 degrees. The sender can also decide how many pixels to use in the width direction and the height direction according to the width and height specified by the receiver. The angle θ is generally quantized at an appropriate granularity and signaled to the sender, and the signaling occurs only if the angle has changed significantly.

Another benefit of the proposed method is that it can improve the network-wide system performance. For example, with the cropping technique, when a user reduces its encoded bit rate, the network can release network resources from this user and assign them to other users that experience poor channel conditions. In doing so, the video quality of the other users is improved while the video quality of the first user remains the same, regardless of the antenna configuration.
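The quantize-and-signal-on-change behavior described above can be sketched as follows (the 45-degree quantization step is an illustrative assumption; the paper leaves the granularity open):

```python
STEP_DEG = 45  # assumed quantization granularity (illustrative)

def quantize_up_direction(theta_deg):
    """Quantize the up-direction angle theta (relative to the display
    width) to the nearest multiple of STEP_DEG, modulo 360 degrees."""
    return (round(theta_deg / STEP_DEG) * STEP_DEG) % 360

class UpDirectionSignaler:
    """Report the quantized angle to the sender only when it changes."""
    def __init__(self):
        self.last_sent = None

    def maybe_signal(self, theta_deg):
        q = quantize_up_direction(theta_deg)
        if q != self.last_sent:
            self.last_sent = q
            return q      # significant change: signal the new angle
        return None       # no significant change: stay silent

s = UpDirectionSignaler()
print(s.maybe_signal(3))   # 0    -- first report
print(s.maybe_signal(10))  # None -- still quantizes to 0, nothing sent
print(s.maybe_signal(93))  # 90   -- phone rotated, new angle signaled
```

Quantization plus change-triggered reporting keeps the signaling overhead small, as the text intends: small jitters of the handheld device produce no traffic at all.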
Figure 3. Adaptation in video capturing: a subset of the image sensors is selected on the sender side according to the display of the receiver.

Figure 4. The desired video orientation, aspect ratio, and resolution for the receiver.

In addition, if the network cannot provide enough resources for delivering the video at the desired resolution, the sender can generate a video of the same aspect ratio but at a lower resolution and transmit this lower bit rate video. The receiver can then upsample the decoded video. This reduces packet losses in the network, which cause error propagation and thus undesired user experience.

3. Conclusion

In this short paper we have proposed a sender-side video adaptation method that can significantly improve the user experience and the efficiency of network resource usage for video telephony over wireless communication systems. The proposed method is attractive due to its effectiveness and simplicity.

References

[1] ITU-T Recommendation H.264, "Advanced Video Coding for Generic Audiovisual Services," Nov. 2007.
[2] ITU-T Recommendation H.265, "High Efficiency Video Coding," June 2013.
[3] 3GPP TS 36.300 V11.6.0, "Evolved Universal Terrestrial Radio Access Network; Overall Description," Release 11, 2013.
[4] 3GPP TS 26.114 V12.1.0, "IP Multimedia Subsystem (IMS); Multimedia Telephony; Media Handling and Interaction (Release 12)," 2013.

Liangping Ma (M'05-SM'13) is currently working on network resource allocation for video QoE optimization and on cognitive radios at InterDigital. He was the principal investigator of two US government funded research projects. He was with San Diego Research Center Inc. (2005-2007) and Argon ST Inc. (2007-2009). He received his B.S. degree in physics from Wuhan University, China, in 1998, and his Ph.D. in electrical engineering from the University of Delaware, US, in 2004.
He has authored/co-authored more than 30 journal and conference papers.

Yong He is a member of technical staff at InterDigital Communications, Inc., San Diego, CA, USA. His earlier experience includes various positions, including Principal Staff Engineer, at Motorola, San Diego, CA, USA, from 2001 to 2011, and at the Motorola Australia Research Centre from 1999 to 2001. He is currently active in video coding related standardization in MPEG, JCT-VC, and the 3GPP SA4 working group. He received his Ph.D. degree from the Hong Kong University of Science and Technology, and his M.S. and B.S. degrees from Southeast University, China.

Gregory Sternberg received his MSEE degree from the University of Pennsylvania (1996) and his BSEE from the Pennsylvania State University (1994). He joined InterDigital in 2000, where he has developed algorithms for various 3GPP cellular systems for both technology and product development projects. He is currently a Principal Engineer at InterDigital, where he is leading a project on video optimization over wireless networks. He holds more than 20 issued patents, with many other patents pending, and has co-authored several conference papers.

Yan Ye (M'08-SM'13) received her Ph.D. from the Electrical and Computer Engineering Department at the University of California, San Diego in 2002. She received her M.S. and B.S. degrees, both in Electrical Engineering, from the University of Science and Technology of China, in 1997 and 1994, respectively. She currently works at the Innovation Labs at InterDigital Communications. Previously, she worked in Image Technology Research at Dolby Laboratories Inc. and in Multimedia R&D and Standards at Qualcomm Inc. She has been involved in the development of various video coding standards, including the HEVC standard and its scalable extensions, the Key Technology Area of ITU-T/VCEG, and the scalable extensions of H.264/AVC.
Her research interests include video coding, processing, and streaming.

Yuriy A. Reznik (M'97-SM'07) is a Director of Engineering at InterDigital Communications, Inc. (San Diego, CA), where he leads R&D in multimedia coding and delivery over wireless networks. Previously, he worked at Qualcomm (2005-2011) and RealNetworks (1998-2005), and was a Visiting Scholar at Stanford University (2008). He holds a Ph.D. degree in Computer Science from Kiev University. He has authored/co-authored over 90 conference and journal papers and is a co-inventor of over 20 issued US patents.