Cyber Threat Information Extraction using Deep Learning and Knowledge Representation | ||
| Electronic and Cyber Defense (پدافند الکترونیکی و سایبری) | ||
| Article 7, Volume 13, Issue 2 - Serial Number 50, Tir 1404, Pages 73-88 | ||
| Article type: Research Paper | ||
| Authors | ||
| Samira Hourali* 1; Fatemeh Hourali 2; Atefe Pakzad 1 | ||
| 1 Assistant Professor, Department of Computer Engineering, Faculty of Engineering and Basic Sciences, Kosar University of Bojnord, Bojnord, Iran | ||
| 2 Assistant Professor, Department of Electrical Engineering, Faculty of Electrical and Computer Engineering, Esfarayen University of Technology, Esfarayen, Iran | ||
| Received: 27 Farvardin 1404; Revised: 04 Khordad 1404; Accepted: 26 Khordad 1404 | ||
| Subjects | ||
| Cyberspace vulnerabilities and threats | ||
| Abstract | ||
| Cybersecurity information on the internet is growing rapidly, and cyber attacks are increasing daily. Attackers mostly target the military, government, and corporate sectors, because these hold sensitive and classified information that requires appropriate defense strategies. Cyber threat information extraction, i.e., extracting entities, the relationships between them, and events from cyber texts, is an important step in detecting cyber attacks and harmful events and in mitigating them in real time when they occur. Extracting valuable information from cyber threats effectively can help security professionals make informed decisions and develop strong defense strategies. It is also a fundamental means of improving the performance of systems such as text summarization, machine translation, and question answering. Although information extraction has been an active research topic for the past four decades, its accuracy is still not acceptable and there is no accurate computational model for it. In this paper, the entities in the text are first extracted with high accuracy using the latest word embedding method, a bidirectional recurrent network (Bi-GRU), the attention mechanism, and knowledge representation. Then, expressions related to the entities are recognized by calculating the importance and weight of each feature and considering all the criteria required for decision-making. Relationships between entities are extracted using a graph-based neural network and a heuristic loss function. A KVP deep network based on the attention mechanism is used for accurate detection and prediction of security events; it can identify the correlation between two elements that occupy different positions in an input sequence. Extensive simulations have been carried out to evaluate the performance of the proposed method. According to the simulation results, the proposed method achieves F1 scores of 89.8% and 93.4% on the CoNLL-2012 and OSINT datasets, respectively. | ||
| Keywords | ||
| Information extraction, cyber threats, entity relationships, event extraction, deep learning, knowledge representation | ||
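The abstract describes an attention mechanism that can identify the correlation between two elements at different positions in an input sequence. As a minimal sketch of that general idea, and not the paper's KVP network (whose architecture the abstract does not specify), the following pure-Python code computes scaled dot-product self-attention with identity projections. The token list and embedding values are hypothetical and chosen only so that two distant positions are similar.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    weights[i][j] measures how strongly position i attends to position j;
    outputs[i] is the attention-weighted average of all input vectors.
    """
    dim = len(vectors[0])
    weights, outputs = [], []
    for query in vectors:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * v[d] for w, v in zip(row, vectors)) for d in range(dim)]
        )
    return weights, outputs


# Hypothetical toy embeddings: positions 0 and 3 are similar but far apart.
tokens = ["malware", "was", "by", "trojan"]
embeddings = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.9, 0.1, 0.8, 0.0],
]
weights, outputs = self_attention(embeddings)

for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

In this toy input, the row for position 0 gives more weight to position 3 than to the two positions in between, which illustrates how attention can link related elements regardless of their distance in the sequence. A trained model would learn separate query, key, and value projections rather than using the raw embeddings.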