Article title

Comixify: Transform Video Into Comics

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
In this paper, we propose a solution that transforms a video into a comic. We approach this task using a neural style transfer algorithm based on Generative Adversarial Networks (GANs). Several recent works in the field of Neural Style Transfer have shown that producing an image in the style of another image is feasible. In this paper, we build on these works and extend the existing set of style transfer use cases with a working application of video comixification. To that end, we train an end-to-end solution that transforms an input video into a comic in two stages. In the first stage, we propose a state-of-the-art keyframe extraction algorithm that selects a subset of frames from the video to provide the most comprehensive video context, and we filter those frames using an image aesthetics estimation engine. In the second stage, the selected keyframes are transferred into a comic style. To provide the most aesthetically compelling results, we selected the state-of-the-art style transfer solution and, building on it, implemented our own ComixGAN framework. The final contribution of our work is a working Web-based application for video comixification, available at http://comixify.ii.pw.edu.pl.
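
The abstract describes a two-stage pipeline: aesthetics-filtered keyframe extraction followed by GAN-based stylization. The following Python fragment is a minimal illustrative sketch of that flow, not the authors' implementation; the names score_aesthetics, comix_gan, and top_k are hypothetical placeholders standing in for the aesthetics engine (e.g. a NIMA-style model [19]) and the ComixGAN stylizer.

  import numpy as np

  def extract_keyframes(frames, scores, top_k=8):
      # Stage 1 (sketch): keep the top_k frames by aesthetic score,
      # restoring temporal order so the comic reads chronologically.
      best = np.argsort(scores)[::-1][:top_k]
      return [frames[i] for i in sorted(best)]

  def comixify(frames, score_aesthetics, comix_gan, top_k=8):
      # score_aesthetics: frame -> float (hypothetical aesthetics engine)
      # comix_gan: frame -> stylized frame (stand-in for ComixGAN)
      scores = [score_aesthetics(f) for f in frames]
      keyframes = extract_keyframes(frames, scores, top_k)
      return [comix_gan(f) for f in keyframes]  # Stage 2: comic-style panels
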
Publisher
Year
Pages
311-333
Physical description
Bibliography: 52 items; photographs, figures, tables.
Authors
  • Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
  • Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
  • Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
  • Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
  • Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
Bibliografia
  • [1] Gygli M, Grabner H, Riemenschneider H, Gool LJV. Creating Summaries from User Videos. In: ECCV (7), volume 8695 of Lecture Notes in Computer Science. Springer, 2014 pp. 505-520. doi:10.1007/978-3-319-10584-0_33.
  • [2] Song Y, Vallmitjana J, Stent A, Jaimes A. TVSum: Summarizing web videos using titles. In: CVPR. IEEE Computer Society, 2015 pp. 5179-5187. doi:10.1109/CVPR.2015.7299154.
  • [3] Zhang K, Chao W, Sha F, Grauman K. Video Summarization with Long Short-Term Memory. In: ECCV (7), volume 9911 of Lecture Notes in Computer Science. Springer, 2016 pp. 766-782. doi:10.1007/978-3-319-46478-7_47.
  • [4] Mahasseni B, Lam M, Todorovic S. Unsupervised Video Summarization with Adversarial LSTM Networks. In: CVPR. IEEE Computer Society, 2017 pp. 2982-2991. doi:10.1109/CVPR.2017.318.
  • [5] Zhou K, Qiao Y, Xiang T. Deep Reinforcement Learning for Unsupervised Video Summarization With Diversity-Representativeness Reward. In: AAAI. AAAI Press, 2018 pp. 7582-7589. URL https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16395.
  • [6] Almeida VAF, Bestavros A, Crovella M, de Oliveira A. Characterizing Reference Locality in the WWW. In: PDIS. IEEE Computer Society, 1996 pp. 92-103. doi:10.1109/PDIS.1996.568672.
  • [7] Chesire M, Wolman A, Voelker GM, Levy HM. Measurement and Analysis of a Streaming Media Workload. In: USITS. USENIX, 2001 pp. 1-12. URL http://dl.acm.org/citation.cfm?id=1251440.1251441.
  • [8] Jiang L, Miao Y, Yang Y, Lan Z, Hauptmann AG. Viral Video Style: A Closer Look at Viral Videos on YouTube. In: ICMR. ACM, 2014 p. 193. doi:10.1145/2578726.2578754.
  • [9] Crane R, Sornette D. Robust dynamic classes revealed by measuring the response function of a social system. Proceedings of the National Academy of Sciences, 2008. 105(41):15649-15653. doi:10.1073/pnas.0803685105.
  • [10] Szabó G, Huberman BA. Predicting the popularity of online content. Commun. ACM, 2010. 53(8):80-88. doi:10.1145/1787234.1787254.
  • [11] Pinto H, Almeida JM, Gonçalves MA. Using early view patterns to predict the popularity of youtube videos. In: WSDM. ACM, 2013 pp. 365-374. doi:10.1145/2433396.2433443.
  • [12] Khosla A, Sarma AD, Hamid R. What makes an image popular? In: WWW. ACM, 2014 pp. 867-876. doi:10.1145/2566486.2567996.
  • [13] Trzcinski T, Andruszkiewicz P, Bochenski T, Rokita P. Recurrent Neural Networks for Online Video Popularity Prediction. In: ISMIS, volume 10352 of Lecture Notes in Computer Science. Springer, 2017 pp. 146-153. doi:10.1007/978-3-319-60438-1_15.
  • [14] Murray N, Marchesotti L, Perronnin F. AVA: A large-scale database for aesthetic visual analysis. In: CVPR. IEEE Computer Society, 2012 pp. 2408-2415. doi:10.1109/CVPR.2012.6247954.
  • [15] Ponomarenko NN, Ieremeiev O, Lukin VV, Egiazarian KO, Jin L, Astola J, Vozel B, Chehdi K, Carli M, Battisti F, Kuo CJ. Color image database TID2013: Peculiarities and preliminary results. In: EUVIP. IEEE, 2013 pp. 106-111. URL https://ieeexplore.ieee.org/document/6623960.
  • [16] Lu X, Lin Z, Shen X, Mech R, Wang JZ. Deep Multi-patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation. In: ICCV. IEEE Computer Society, 2015 pp. 990-998. doi:10.1109/ICCV.2015.119.
  • [17] Kao Y, Wang C, Huang K. Visual aesthetic quality assessment with a regression model. In: ICIP. IEEE, 2015 pp. 1583-1587. doi:10.1109/ICIP.2015.7351067.
  • [18] Kim J, Zeng H, Ghadiyaram D, Lee S, Zhang L, Bovik AC. Deep Convolutional Neural Models for Picture-Quality Prediction: Challenges and Solutions to Data-Driven Image Quality Assessment. IEEE Signal Process. Mag., 2017. 34(6):130-141. doi:10.1109/MSP.2017.2736018.
  • [19] Talebi H, Milanfar P. NIMA: Neural Image Assessment. IEEE Trans. Image Processing, 2018. 27(8):3998-4011. doi:10.1109/TIP.2018.2831899.
  • [20] Hou L, Yu C, Samaras D. Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks. CoRR, 2016. abs/1611.05916. URL http://arxiv.org/abs/1611.05916.
  • [21] Gatys LA, Ecker AS, Bethge M. Image Style Transfer Using Convolutional Neural Networks. In: CVPR. IEEE Computer Society, 2016 pp. 2414-2423. doi:10.1109/CVPR.2016.265.
  • [22] Li Y, Fang C, Yang J, Wang Z, Lu X, Yang M. Diversified Texture Synthesis with Feed-Forward Networks. In: CVPR. IEEE Computer Society, 2017 pp. 266-274. doi:10.1109/CVPR.2017.36.
  • [23] Li Y, Wang N, Liu J, Hou X. Demystifying Neural Style Transfer. In: IJCAI. 2017 pp. 2230-2236. doi:10.24963/ijcai.2017/310.
  • [24] Johnson J, Alahi A, Fei-Fei L. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In: ECCV (2), volume 9906 of Lecture Notes in Computer Science. Springer, 2016 pp. 694-711. doi:10.1007/978-3-319-46475-6_43.
  • [25] Ulyanov D, Lebedev V, Vedaldi A, Lempitsky VS. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images. In: ICML, volume 48 of JMLR Workshop and Conference Proceedings. JMLR.org, 2016 pp. 1349-1357. URL http://dl.acm.org/citation.cfm?id=3045390.3045533.
  • [26] Ulyanov D, Vedaldi A, Lempitsky VS. Instance Normalization: The Missing Ingredient for Fast Stylization. CoRR, 2016. abs/1607.08022. URL http://arxiv.org/abs/1607.08022.
  • [27] Yeh M, Tang S. Improved Style Transfer by Respecting Inter-layer Correlations. CoRR, 2018. abs/1801.01933. URL http://arxiv.org/abs/1801.01933.
  • [28] Wang X, Oxholm G, Zhang D, Wang Y. Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer. In: CVPR. IEEE Computer Society, 2017 pp. 7178-7186. doi:10.1109/CVPR.2017.759.
  • [29] Wilmot P, Risser E, Barnes C. Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses. CoRR, 2017. abs/1701.08893. URL http://arxiv.org/abs/1701.08893.
  • [30] Dumoulin V, Shlens J, Kudlur M. A Learned Representation For Artistic Style. In: ICLR. 2017. URL https://openreview.net/forum?id=BJO-BuT1g.
  • [31] Chen TQ, Schmidt M. Fast Patch-based Style Transfer of Arbitrary Style. CoRR, 2016. abs/1612.04337. URL http://arxiv.org/abs/1612.04337.
  • [32] Huang X, Belongie SJ. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. In: ICCV. IEEE Computer Society, 2017 pp. 1510-1519. doi:10.1109/ICCV.2017.167.
  • [33] Desai S. End-to-End Learning of One Objective Function to Represent Multiple Styles for Neural Style Transfer. Technical report. URL http://cs231n.stanford.edu/reports/2017/pdfs/407.pdf.
  • [34] Ghiasi G, Lee H, Kudlur M, Dumoulin V, Shlens J. Exploring the structure of a real-time, arbitrary neural artistic stylization network. CoRR, 2017. abs/1705.06830. URL http://arxiv.org/abs/1705.06830.
  • [35] Shen F, Yan S, Zeng G. Meta Networks for Neural Style Transfer. CoRR, 2017. abs/1709.04111. URL http://arxiv.org/abs/1709.04111.
  • [36] Zhao H, Rosin PL, Lai Y. Automatic Semantic Style Transfer using Deep Convolutional Neural Networks and Soft Masks. CoRR, 2017. abs/1708.09641. URL http://arxiv.org/abs/1708.09641.
  • [37] Li Y, Fang C, Yang J, Wang Z, Lu X, Yang M. Universal Style Transfer via Feature Transforms. In: NIPS. 2017 pp. 385-395. URL http://papers.nips.cc/paper/6642-universal-style-transfer-via-feature-transforms.pdf.
  • [38] Li Y, Liu M, Li X, Yang M, Kautz J. A Closed-Form Solution to Photorealistic Image Stylization. In: ECCV (3), volume 11207 of Lecture Notes in Computer Science. Springer, 2018 pp. 468-483. doi:10.1007/978-3-030-01219-9_28.
  • [39] Luan F, Paris S, Shechtman E, Bala K. Deep Photo Style Transfer. In: CVPR. IEEE Computer Society, 2017 pp. 6997-7005. doi:10.1109/CVPR.2017.740.
  • [40] Chen D, Liao J, Yuan L, Yu N, Hua G. Coherent Online Video Style Transfer. In: ICCV. IEEE Computer Society, 2017 pp. 1114-1123. doi:10.1109/ICCV.2017.126.
  • [41] Chen D, Yuan L, Liao J, Yu N, Hua G. StyleBank: An Explicit Representation for Neural Image Style Transfer. In: CVPR. IEEE Computer Society, 2017 pp. 2770-2779. doi:10.1109/CVPR.2017.296.
  • [42] Chen D, Yuan L, Liao J, Yu N, Hua G. Stereoscopic Neural Style Transfer. CoRR, 2018. abs/1802.10591. URL http://arxiv.org/abs/1802.10591.
  • [43] Chen Y, Lai YK, Liu YJ. CartoonGAN: Generative Adversarial Networks for Photo Cartoonization. In: CVPR. IEEE Computer Society, 2018. doi:10.1109/CVPR.2018.00986.
  • [44] Szegedy C, Liu W, Jia Y, Sermanet P, Reed SE, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: CVPR. IEEE Computer Society, 2015 pp. 1-9. doi:10.1109/CVPR.2015.7298594.
  • [45] Deng J, Dong W, Socher R, Li L, Li K, Li F. ImageNet: A large-scale hierarchical image database. In: CVPR. IEEE Computer Society, 2009 pp. 248-255. doi:10.1109/CVPR.2009.5206848.
  • [46] Zeng K, Chen T, Niebles JC, Sun M. Title Generation for User Generated Videos. In: ECCV (2), volume 9906 of Lecture Notes in Computer Science. Springer, 2016 pp. 609-625. doi:10.1007/978-3-319-46475-6_38.
  • [47] Potapov D, Douze M, Harchaoui Z, Schmid C. Category-Specific Video Summarization. In: ECCV (6), volume 8694 of Lecture Notes in Computer Science. Springer, 2014 pp. 540-555. doi:10.1007/978-3-319-10599-4_35.
  • [48] Zoph B, Vasudevan V, Shlens J, Le QV. Learning Transferable Architectures for Scalable Image Recognition. In: CVPR. IEEE Computer Society, 2018. doi:10.1109/CVPR.2018.00907.
  • [49] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, 2014. abs/1409.1556. URL http://arxiv.org/abs/1409.1556.
  • [50] Pesko M, Trzcinski T. Neural Comic Style Transfer: Case Study. CoRR, 2018. abs/1809.01726. URL http://arxiv.org/abs/1809.01726.
  • [51] Lin T, Maire M, Belongie SJ, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL. Microsoft COCO: Common Objects in Context. In: ECCV (5), volume 8693 of Lecture Notes in Computer Science. Springer, 2014 pp. 740-755. doi:10.1007/978-3-319-10602-1_48.
  • [52] Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville AC, Bengio Y. Generative Adversarial Nets. In: NIPS. 2014 pp. 2672-2680. URL http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf.
Notes
Record compiled under agreement 509/P-DUN/2018 from funds of the Polish Ministry of Science and Higher Education (MNiSW) designated for activities popularizing science (2019).
Document type
YADDA identifier
bwmeta1.element.baztech-c10222e2-b3a3-4b08-a9df-9dd40184102c