[1] |
Pedwell R K, Hardy J A, Rowland S L. Effective visual design and communication practices for research posters: Exemplars based on the theory and practice of multimedia learning and rhetoric. Biochem Mol Biol Educ, 2017, 45(3): 249 doi: 10.1002/bmb.21034
|
[2] |
Welch K E. Electric Rhetoric: Classical Rhetoric, Oralism, and A New Literacy. Cambridge: MIT Press, 1999
|
[3] |
Berlin J A. Contemporary composition: the major pedagogical theories. College English, 1982, 44(8): 765 doi: 10.2307/377329
|
[4] |
O'Halloran K L. Interdependence, interaction and metaphor in multi-semiotic texts. Social Semiotics, 1999, 9(3): 317 doi: 10.1080/10350339909360442
|
[5] |
O'Halloran K L. Classroom discourse in mathematics: a multi-semiotic analysis. Linguistics Educ, 1998, 10(3): 359 doi: 10.1016/S0898-5898(99)00013-3
|
[6] |
Morency L P, Baltrusaitis T. Tutorial on multimodal machine learning [R/OL]. Language Technologies Institute (2016-6-26) [2019-03-05]. https://www.cs.cmu.edu/~morency/MMML-Tutorial-ACL2017.pdf
|
[7] |
Plummer B A, Wang L W, Cervantes C M, et al. Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models // Proceedings of IEEE International Conference on Computer Vision (ICCV 2015). Santiago, 2015: 2641
|
[8] |
von Glasersfeld E, Pisani P P. The multistore parser for hierarchical syntactic structures. Commun ACM, 1970, 13(2): 74 doi: 10.1145/362007.362026
|
[9] |
Jackson P. Introduction to Expert Systems. 3rd Ed. Boston: Addison Wesley, 1998
|
[10] |
Cortes C, Vapnik V. Support-vector networks. Machine Learning, 1995, 20(3): 273
|
[11] |
Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco: Morgan Kaufmann Publishers, 1988
|
[12] |
Jelinek F. Statistical Methods for Speech Recognition. Cambridge: MIT Press, 1997
|
[13] |
McGurk H, MacDonald J. Hearing lips and seeing voices. Nature, 1976, 264(5588): 746 doi: 10.1038/264746a0
|
[14] |
Petajan E D. Automatic Lipreading to Enhance Speech Recognition (Speech Reading) [Dissertation]. University of Illinois at Urbana-Champaign, 1984
|
[15] |
Fels S S, Hinton G E. Glove-Talk: a neural network interface between a data-glove and a speech synthesizer. IEEE Trans Neural Networks, 1993, 4(1): 2 doi: 10.1109/72.182690
|
[16] |
Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Machine Learning Res, 2014, 15: 1929
|
[17] |
Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks // Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Fort Lauderdale, 2011: 315
|
[18] |
He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition // Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). Las Vegas, 2016: 770
|
[19] |
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks // Advances in Neural Information Processing Systems. Lake Tahoe, 2012: 1097
|
[20] |
Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning // Proceedings of Thirty-First AAAI Conference on Artificial Intelligence. San Francisco, 2017: 4278
|
[21] |
Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database // Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009). Miami, 2009: 248
|
[22] |
Lample G, Ballesteros M, Subramanian S, et al. Neural architectures for named entity recognition // Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, 2016: 260
|
[23] |
Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning // Proceedings of the 28th International Conference on Machine Learning. Bellevue, 2011: 689
|
[24] |
Baltrusaitis T, Ahuja C, Morency L P. Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Machine Intelligence, 2019, 41(2): 423 doi: 10.1109/TPAMI.2018.2798607
|
[25] |
Zhang L, Zhao Y, Zhu Z F, et al. Multi-view missing data completion. IEEE Trans Knowledge Data Eng, 2018, 30(7): 1296 doi: 10.1109/TKDE.2018.2791607
|
[26] |
Wang L Q, Sun W C, Zhao Z C, et al. Modeling intra- and inter-pair correlation via heterogeneous high-order preserving for cross-modal retrieval. Signal Process, 2017, 131: 249 doi: 10.1016/j.sigpro.2016.08.012
|
[27] |
Liu H P, Li F X, Xu X Y, et al. Multi-modal local receptive field extreme learning machine for object recognition. Neurocomputing, 2018, 277: 4 doi: 10.1016/j.neucom.2017.04.077
|
[28] |
Fu K, Jin J Q, Cui R P, et al. Aligning where to see and what to tell: image captioning with region-based attention and scene-specific contexts. IEEE Trans Pattern Anal Machine Intelligence, 2017, 39(12): 2321 doi: 10.1109/TPAMI.2016.2642953
|
[29] |
Breiman L. Random forests. Machine Learning, 2001, 45(1): 5 doi: 10.1023/A:1010933404324
|
[30] |
Breiman L, Friedman J H, Olshen R A, et al. Classification and Regression Trees. Florida: Chapman and Hall/CRC, 1998
|
[31] |
Breiman L. Statistical modeling: the two cultures. Statist Sci, 2001, 16(3): 199
|
[32] |
Vapnik V N, Cervonenkis A J. Empirical Inference. Berlin: Springer, 2013
|
[33] |
Schölkopf B, Smola A J, Bach F. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge: MIT Press, 2002
|
[34] |
Mercer J. Functions of positive and negative type, and their connection with the theory of integral equations. Philos Trans R Soc London Ser A, 1909, 209(441-458): 415 doi: 10.1098/rsta.1909.0016
|
[35] |
Aronszajn N. Theory of reproducing kernels. Trans Am Math Soc, 1950, 68(3): 337 doi: 10.1090/S0002-9947-1950-0051437-7
|
[36] |
Steinwart I, Hush D, Scovel C. An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Trans Inf Theory, 2006, 52(10): 4635 doi: 10.1109/TIT.2006.881713
|
[37] |
Lodhi H, Saunders C, Shawe-Taylor J, et al. Text classification using string kernels. J Machine Learning Res, 2002, 2(3): 419
|
[38] |
Wu J X, Rehg J M. Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel // 2009 IEEE 12th International Conference on Computer Vision. Kyoto, 2009: 630
|
[39] |
Lanckriet G R, Deng M, Cristianini N, et al. Kernel-based data fusion and its application to protein function prediction in yeast // Proceedings of Pacific Symposium on Biocomputing. Hawaii, 2004: 300
|
[40] |
Lee W J, Verzakov S, Duin R P W. Kernel combination versus classifier combination // Proceedings of International Workshop on Multiple Classifier Systems, MCS 2007. Prague, 2007: 22
|
[41] |
Gönen M, Alpaydin E. Localized multiple kernel learning // Proceedings of the 25th International Conference on Machine learning. Helsinki, 2008: 352
|
[42] |
Jiang T J, Wang S Z, Wei R X. Support vector machine with composite kernels for time series prediction // Proceedings of International Symposium on Neural Networks. Nanjing, 2007: 350
|
[43] |
Hotelling H. Relations between two sets of variates. Biometrika, 1936, 28(3-4): 321
|
[44] |
Cooley W W, Lohnes P R. Multivariate Procedures for the Behavioral Sciences. New York: John Wiley & Sons, 1962
|
[45] |
Akaho S. A kernel method for canonical correlation analysis // Proceedings of the International Meeting of the Psychometric Society (IMPS2001). Osaka, 2001: 1
|
[46] |
Wang S, Lu J F, Gu X J, et al. Unsupervised discriminant canonical correlation analysis based on spectral clustering. Neurocomputing, 2016, 171: 425 doi: 10.1016/j.neucom.2015.06.043
|
[47] |
Hu H F. Multiview gait recognition based on patch distribution features and uncorrelated multilinear sparse local discriminant canonical correlation analysis. IEEE Trans Circuits Syst Video Technol, 2014, 24(4): 617 doi: 10.1109/TCSVT.2013.2280098
|
[48] |
Farquhar J D R, Hardoon D R, Meng H, et al. Two view learning: SVM-2K, theory and practice // Proceedings of the 18th International Conference on Neural Information Processing. Vancouver, 2005: 355
|
[49] |
Ozerov A, Fevotte C. Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Trans Audio Speech Language Process, 2010, 18(3): 550 doi: 10.1109/TASL.2009.2031510
|
[50] |
Zhang J, Huan J. Inductive multi-task learning with multiple view data // Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Beijing, 2012: 543
|
[51] |
Kong X N, Ng M K, Zhou Z H. Transductive multilabel learning via label set propagation. IEEE Trans Knowledge Data Eng, 2013, 25(3): 704 doi: 10.1109/TKDE.2011.141
|
[52] |
Blum A, Mitchell T. Combining labeled and unlabeled data with co-training // Proceedings of the Eleventh Annual Conference on Computational Learning Theory. Madison, 1998: 92
|
[53] |
Collins M. Unsupervised models for named entity classification // Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. College Park, 1999: 100
|
[54] |
Brefeld U, Scheffer T. Co-EM support vector learning // Proceedings of the Twenty-first International Conference on Machine Learning. Banff, 2004: 16
|
[55] |
Muslea I, Minton S, Knoblock C A. Active + semi-supervised learning = robust multi-view learning // Proceedings of the 19th International Conference on Machine Learning. Sydney, 2002: 435
|
[56] |
LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proc IEEE, 1998, 86(11): 2278 doi: 10.1109/5.726791
|
[57] |
Mikolov T, Karafiát M, Burget L, et al. Recurrent neural network based language model // Eleventh Annual Conference of the International Speech Communication Association. Makuhari, 2010: 1045
|
[58] |
Hinton G E. Deep belief networks[J/OL]. Scholarpedia (2009-04-11) [2019-03-05]. http://www.scholarpedia.org/article/Deep_belief_networks
|
[59] |
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition // Proceedings of International Conference on Learning Representations 2015. San Diego, 2015: 1
|
[60] |
Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, 2016: 779
|
[61] |
Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Machine Intelligence, 2017, 39(4): 640 doi: 10.1109/TPAMI.2016.2572683
|
[62] |
Kim Y. Convolutional neural networks for sentence classification // Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing. Doha, 2014: 1746
|
[63] |
Shen Y L, He X D, Gao J F, et al. A latent semantic model with convolutional-pooling structure for information retrieval // Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. Shanghai, 2014: 101
|
[64] |
Hu B T, Lu Z D, Li H, et al. Convolutional neural network architectures for matching natural language sentences // Advances in Neural Information Processing Systems. Montreal, 2014: 2042
|
[65] |
Bengio Y, Ducharme R, Vincent P, et al. A neural probabilistic language model. J Machine Learning Res, 2003, 3(6): 1137
|
[66] |
Collobert R, Weston J. A unified architecture for natural language processing: Deep neural networks with multitask learning // Proceedings of the 25th International Conference on Machine Learning. Helsinki, 2008: 160
|
[67] |
Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[J/OL]. arXiv (2013-09-07) [2019-03-05]. https://arxiv.org/pdf/1301.3781.pdf
|
[68] |
Graves A, Schmidhuber J. Frame-wise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 2005, 18(5-6): 602 doi: 10.1016/j.neunet.2005.06.042
|
[69] |
Liu P F, Qiu X P, Huang X J. Recurrent neural network for text classification with multi-task learning // Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. New York, 2016: 2873
|
[70] |
Sundermeyer M, Alkhouli T, Wuebker J, et al. Translation modeling with bidirectional recurrent neural networks // Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, 2014: 14
|
[71] |
Cho K, van Merrienboer B, Bahdanau D, et al. On the properties of neural machine translation: encoder-decoder approaches // Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation. Doha, 2014: 103
|
[72] |
Wöllmer M, Eyben F, Graves A, et al. Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework. Cognitive Comput, 2010, 2(3): 180 doi: 10.1007/s12559-010-9041-8
|
[73] |
Zhou P, Shi W, Tian J, et al. Attention-based bidirectional long short-term memory networks for relation classification // Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, 2016: 207
|
[74] |
Zhang J, Man K F. Time series prediction using RNN in multi-dimension embedding phase space // SMC'98 Conference Proceedings. 1998 IEEE International Conference on Systems, Man, and Cybernetics. San Diego, 1998: 1868
|
[75] |
Graves A, Mohamed A R, Hinton G. Speech recognition with deep recurrent neural networks // 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, 2013: 6645
|
[76] |
Karpathy A, Joulin A, Fei-Fei L. Deep fragment embeddings for bidirectional image sentence mapping // Advances in Neural Information Processing Systems. Montreal, 2014: 1889
|
[77] |
Donahue J, Hendricks L A, Rohrbach M, et al. Long-term recurrent convolutional networks for visual recognition and description. IEEE Trans Pattern Anal Machine Intelligence, 2017, 39(4): 677
|
[78] |
Kiros R, Salakhutdinov R, Zemel R. Unifying visual-semantic embeddings with multimodal neural language models // Deep Learning and Representation Learning Workshop: NIPS 2014. Montreal, 2014: 1
|
[79] |
Mitchell M, Han X F, Dodge J, et al. Midge: Generating image descriptions from computer vision detections // Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Avignon, 2012: 747
|
[80] |
Ma L, Lu Z D, Li H. Learning to answer questions from image using convolutional neural network // Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. Phoenix, 2016: 3567
|
[81] |
Wan J, Wang D Y, Hoi S C H, et al. Deep learning for content-based image retrieval: a comprehensive study // Proceedings of the 22nd ACM International Conference on Multimedia. Orlando, 2014: 157
|
[82] |
Wöllmer M, Metallinou A, Eyben F, et al. Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling // Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010). Makuhari, 2010: 2362
|
[83] |
Su Y H, Fan K, Bach N, et al. Unsupervised multi-modal neural machine translation // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, 2019: 10482
|
[84] |
Wang X, Huang Q Y, Celikyilmaz A, et al. Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, 2019: 6629
|
[85] |
Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets // Advances in Neural Information Processing Systems. Montreal, 2014: 2672
|
[86] |
Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks[J/OL]. arXiv (2016-01-07) [2019-03-05]. https://arxiv.org/pdf/1511.06434.pdf
|
[87] |
Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks // Proceedings of the 34th International Conference on Machine Learning. Sydney, 2017: 214
|
[88] |
Mirza M, Osindero S. Conditional generative adversarial nets[J/OL]. arXiv (2014-11-06) [2019-03-05]. https://arxiv.org/pdf/1411.1784.pdf
|
[89] |
Tzeng E, Hoffman J, Saenko K, et al. Adversarial discriminative domain adaptation // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, 2017: 7167
|
[90] |
Liu M Y, Tuzel O. Coupled generative adversarial networks // Advances in Neural Information Processing Systems. Barcelona, 2016: 469
|
[91] |
Pei Z Y, Cao Z J, Long M S, et al. Multi-adversarial domain adaptation // Proceedings of Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, 2018: 3934
|
[92] |
Cao Z J, Long M S, Wang J M, et al. Partial transfer learning with selective adversarial networks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018: 2724
|
[93] |
Xie S A, Zheng Z B, Chen L, et al. Learning semantic representations for unsupervised domain adaptation // Proceedings of the 35th International Conference on Machine Learning. Stockholm, 2018: 5423
|
[94] |
Denton E L, Chintala S, Szlam A, et al. Deep generative image models using a Laplacian pyramid of adversarial networks // Advances in Neural Information Processing Systems. Montreal, 2015: 1486
|
[95] |
Zhang H, Goodfellow I, Metaxas D, et al. Self-attention generative adversarial networks[J/OL]. arXiv (2018-05-21) [2019-03-05]. https://arxiv.org/pdf/1805.08318.pdf
|
[96] |
Rush A M, Chopra S, Weston J. A neural attention model for abstractive sentence summarization // Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, 2015: 379
|
[97] |
Miyato T, Kataoka T, Koyama M, et al. Spectral normalization for generative adversarial networks[J/OL]. arXiv (2018-02-16) [2019-03-05]. https://arxiv.org/pdf/1802.05957.pdf
|
[98] |
Brock A, Donahue J, Simonyan K. Large scale GAN training for high fidelity natural image synthesis[J/OL]. arXiv (2019-02-25) [2019-03-05]. https://arxiv.org/pdf/1809.11096.pdf
|
[99] |
Isola P, Zhu J Y, Zhou T H, et al. Image-to-image translation with conditional adversarial networks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, 2017: 1125
|
[100] |
Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks // Proceedings of the IEEE International Conference on Computer Vision. Venice, 2017: 2223
|
[101] |
Choi Y, Choi M, Kim M, et al. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018: 8789
|
[102] |
Huang X, Liu M Y, Belongie S, et al. Multimodal unsupervised image-to-image translation // Proceedings of the European Conference on Computer Vision (ECCV). Munich, 2018: 172
|
[103] |
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation // Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, 2015: 234
|
[104] |
Anderson P, He X D, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018: 6077
|
[105] |
Chen X P, Ma L, Jiang W H, et al. Regularizing RNNs for caption generation by reconstructing the past with the present // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018: 7995
|
[106] |
Chen F H, Ji R R, Sun X S, et al. GroupCap: Group-based image captioning with structured relevance and diversity constraints // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018: 1345
|
[107] |
Reed S, Akata Z, Yan X, et al. Generative adversarial text to image synthesis // Proceedings of the 33rd International Conference on Machine Learning. New York, 2016: 1060
|
[108] |
Zhang H, Xu T, Li H S, et al. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks // Proceedings of the IEEE International Conference on Computer Vision. Venice, 2017: 5907
|
[109] |
Zhang H, Xu T, Li H S, et al. StackGAN++: Realistic image synthesis with stacked generative adversarial networks. IEEE Trans Pattern Anal Machine Intelligence, 2019, 41(8): 1947 doi: 10.1109/TPAMI.2018.2856256
|
[110] |
Hong S, Yang D D, Choi J, et al. Inferring semantic layout for hierarchical text-to-image synthesis // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018: 7986
|
[111] |
Xu T, Zhang P C, Huang Q Y, et al. AttnGAN: fine-grained text to image generation with attentional generative adversarial networks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018: 1316
|