

A survey of multimodal machine learning

CHEN Peng, LI Qing, ZHANG De-zheng, YANG Yu-hang, CAI Zheng, LU Zi-yi

Citation: CHEN Peng, LI Qing, ZHANG De-zheng, YANG Yu-hang, CAI Zheng, LU Zi-yi. A survey of multimodal machine learning[J]. Chinese Journal of Engineering, 2020, 42(5): 557-569. doi: 10.13374/j.issn2095-9389.2019.03.21.003


doi: 10.13374/j.issn2095-9389.2019.03.21.003
Funding: National Key Research and Development Program of China (Cloud Computing and Big Data special program), Grant No. 2017YFB1002304
    Corresponding author e-mail: liqing@ies.ustb.edu.cn

  • Chinese Library Classification (CLC) number: TP18

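  • Abstract: Big data is multi-source and heterogeneous. With the rapid development of information technology, multimodal data has become the main form of data resources in recent years. Studying multimodal learning methods, and thereby giving computers the ability to understand massive multi-source heterogeneous data, is of great value. This paper summarizes the definition of multimodality and the basic tasks of multimodal learning, and introduces the cognitive mechanisms and development history of multimodal learning. On this basis, it reviews multimodal statistical learning methods and deep learning methods in detail. In addition, it systematically summarizes the cross-modal matching and generation techniques based on adversarial learning that have emerged over the past two years. Finally, it summarizes the main forms of multimodal learning and offers reflections and an outlook on possible future research directions.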
  • Figure 1. Multimodal data for a “snow” scene (images, sound and text)

    Figure 2. Multi-kernel learning

    Figure 3. Common subspace learning

    Figure 4. Co-training

Publication history
  • Received: 2019-03-21
  • Published: 2020-05-01
