High-quality facial attribute editing in videos is a challenging problem, as it requires modifications to be both realistic and consistent across video frames. Previous works address the problem with auto-encoder architectures and rely on adversarial training to ensure attribute editing and temporal consistency of the results. However, many of these algorithms are limited to a specific task and exhibit noticeable artifacts on high-resolution images. To tackle these limitations, we propose to edit facial attributes on real images via the latent space of high-quality generative networks. We further introduce a simple pipeline that generalizes face editing to video frames. Our model achieves disentangled and controllable attribute editing on real images and videos. We conduct extensive experiments on image and video datasets and show that our model outperforms other state-of-the-art methods. The presented pipeline is potentially useful for real-world video applications.
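The core idea the abstract describes, editing by shifting a real image's inverted latent code along an attribute direction, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the latent dimensionality, the attribute direction, and the edit strength `alpha` are all placeholder assumptions, and a random vector stands in for a learned direction.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 512  # assumed latent size, typical for StyleGAN-style generators

# Latent code of a real face, obtained in practice by GAN inversion
w = rng.standard_normal(latent_dim)

# Unit-norm attribute direction (e.g. "smile"); here random for illustration
direction = rng.standard_normal(latent_dim)
direction /= np.linalg.norm(direction)

def edit(w, direction, alpha):
    """Shift the latent code along an attribute direction by strength alpha."""
    return w + alpha * direction

w_edited = edit(w, direction, alpha=3.0)

# Since the direction has unit norm, the code moves by exactly |alpha|
print(round(float(np.linalg.norm(w_edited - w)), 3))  # 3.0
```

Feeding `w_edited` back through the generator would yield the edited face; applying the same direction with the same strength to each frame's latent code is the simplest way to carry such an edit across a video.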