Semantic Segmentation of Autonomous Vehicle Images Using the Teacher-Student Technique
Journal of Electronical & Cyber Defence
Article 1, Volume 9, Issue 4 (Serial Number 36), Esfand 1400 (March 2022), Pages 1-19. Original article (1.59 MB)
Article type: Research article
Authors
Amir Khosravian 1; Masoud Masih-Tehrani 2; Abdollah Amirkhani* 2
1 M.Sc., University of Science and Technology, Tehran, Iran
2 Assistant Professor, University of Science and Technology, Tehran, Iran
Received: 14 Mehr 1399 (5 October 2020); Revised: 18 Tir 1400 (9 July 2021); Accepted: 19 Tir 1400 (10 July 2021)
Abstract
Semantic segmentation is one of the most common image-processing outputs for vision-equipped autonomous vehicles. Deep-learning models require massive amounts of data to learn the features of a new environment with a different domain, yet manually labeling that volume of data is extremely time-consuming. Whereas most papers train deep-learning models with supervised methods, this paper applies semantic segmentation with a semi-supervised method. More precisely, this study uses the teacher-student technique to establish an interaction between deep-learning models. First, the DABNet and ContextNet models are trained as teachers on the BDD100K database. Given the importance of generalizability and robustness for models used in autonomous vehicles, these criteria of the teacher networks are evaluated through simulation in the CARLA software. The teacher networks then teach the entire Cityscapes database to the FastSCNN model via semi-supervised learning, with no human involvement in the training process. Unlike other semi-supervised approaches, the presence of two databases with a considerable domain gap makes the teacher-student method even more challenging. The results show that the student model's performance on classes such as vehicle, person, and road, whose detection is among the highest priorities of an autonomous vehicle, differs from manual labeling by only 1.2%, 3%, and 3.8%, respectively. Moreover, the student model's mean accuracy differs by only 4.5% from that of a model whose database preparation requires a very large amount of time.
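As a rough illustration of the teacher-student idea described above (a sketch, not the authors' actual pipeline), the snippet below fuses the softmax outputs of two hypothetical teacher networks and keeps only the confident pixels as pseudo-labels for the student; the function name, confidence threshold, and ignore value are all assumptions:

```python
import numpy as np

def make_pseudo_labels(teacher_probs, confidence_threshold=0.9, ignore_index=255):
    """Fuse per-pixel class probabilities from several teacher networks and
    keep only confident predictions as pseudo-labels for the student.

    teacher_probs: list of (H, W, C) arrays, each softmax-normalized per pixel.
    Returns an (H, W) integer label map; uncertain pixels get ignore_index.
    """
    # Average the teachers' softmax outputs (simple ensembling).
    mean_probs = np.mean(teacher_probs, axis=0)
    labels = np.argmax(mean_probs, axis=-1)
    confidence = np.max(mean_probs, axis=-1)
    # Mask out pixels where the ensemble is not confident enough.
    labels = np.where(confidence >= confidence_threshold, labels, ignore_index)
    return labels.astype(np.int64)

# Toy example: a 2x2 image, 3 classes, two "teachers" (e.g. DABNet and ContextNet).
t1 = np.array([[[0.97, 0.02, 0.01], [0.40, 0.35, 0.25]],
               [[0.05, 0.90, 0.05], [0.10, 0.10, 0.80]]])
t2 = np.array([[[0.95, 0.03, 0.02], [0.30, 0.40, 0.30]],
               [[0.02, 0.96, 0.02], [0.05, 0.05, 0.90]]])
pseudo = make_pseudo_labels([t1, t2], confidence_threshold=0.85)
# The low-agreement pixel (top right) is ignored; the rest become labels.
```

A student network such as FastSCNN would then be trained on these pseudo-labels with a loss that skips `ignore_index` pixels.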
Keywords
Autonomous vehicle; Convolutional neural networks; Semantic segmentation; Teacher-student technique
Article Title [English]
The Semantic Segmentation of Autonomous Vehicles Images with the Teacher-Student Technique
Authors [English]
Amir Khosravian 1; Masoud Masih-Tehrani 2; Abdollah Amirkhani 2
1 M.Sc., University of Science and Technology, Tehran, Iran
2 Assistant Professor, University of Science and Technology, Tehran, Iran
Abstract [English]
Semantic segmentation is one of the most common image-processing outputs for vision-based autonomous vehicles. Deep neural networks require large-scale data in order to learn the features of new environments with diverse domains. While a great many papers are based on supervised learning, in this paper semantic segmentation is implemented with a semi-supervised learning method. More specifically, this study uses the teacher-student technique to establish an interaction between the deep-learning models. First, the DABNet and ContextNet models are trained as teacher networks on the BDD100K database. Given the significance of generalization and robustness for models in autonomous vehicles, these criteria of the teacher models are evaluated by simulations in the CARLA software. Finally, the teacher networks train the FastSCNN model automatically on the Cityscapes database without any human interference. In contrast with other semi-supervised approaches, the existence of two different databases with a noticeable domain shift challenges the teacher-student technique even more. The results indicate that the student's accuracy in classes such as vehicle, pedestrian, and road, which are the highest-priority classes to detect, differs by only 1.2%, 3%, and 3.8%, respectively. Moreover, the student model's mean intersection-over-union is only 4.5% below that of a similar model trained with an entirely supervised method, whose database requires a long time to prepare.
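The mean intersection-over-union (mIoU) figure quoted above is the average, over classes, of intersection divided by union between the predicted and ground-truth pixel sets of each class. A minimal NumPy sketch (helper names are hypothetical, not from the paper):

```python
import numpy as np

def per_class_iou(pred, target, num_classes, ignore_index=255):
    """IoU per class between a predicted and a ground-truth label map;
    pixels equal to ignore_index in the ground truth are skipped."""
    valid = target != ignore_index
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c) & valid
        targ_c = (target == c) & valid
        union = np.logical_or(pred_c, targ_c).sum()
        if union == 0:
            ious.append(float("nan"))  # class absent from both maps
        else:
            ious.append(np.logical_and(pred_c, targ_c).sum() / union)
    return ious

def mean_iou(pred, target, num_classes):
    # nanmean ignores classes that never appear.
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))

# Toy 2x2 label maps with 2 classes.
pred   = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
ious = per_class_iou(pred, target, 2)
# class 0: intersection 1, union 2 -> 0.5; class 1: intersection 2, union 3 -> 2/3
```

Reporting a per-class IoU gap (as done here for vehicle, pedestrian, and road) pinpoints where the pseudo-labeled student falls behind a fully supervised model.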
Keywords [English]
Autonomous vehicles; Convolutional neural networks; Semantic segmentation; Teacher-student technique