Comparison of Supervised Machine Learning Algorithms in Detection of Botnets Domain Generation Algorithms

Asadi, Mehdi; Jabraeil Jamali, Mohammad Ali; Parsa, Saeed; Majidnezhad, Vahid

Journal of Electronical & Cyber Defence
Article 2, Volume 8, Issue 4 - Serial Number 32, Dey 1399, Pages 17-29
Article type: Research Paper
Authors
Mehdi Asadi¹; Mohammad Ali Jabraeil Jamali*²; Saeed Parsa³; Vahid Majidnezhad²
¹Ph.D. Student, Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar, Iran
²Assistant Professor, Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar, Iran
³Associate Professor, Faculty Member, Iran University of Science and Technology
Received: 06 Ordibehesht 1400; Accepted: 06 Ordibehesht 1400
Abstract
Domain generation algorithms (DGAs) are used in botnets as rendezvous points between the botmaster and its command and control (C&C) servers, and they can continuously generate large numbers of domains to evade detection by traditional methods such as blacklists. Internet security vendors usually rely on blacklists to identify botnets and malware, but a DGA can continuously update its domains to avoid blacklist detection. Detecting DGA-based botnets is therefore a challenging problem in computer system security. In this paper, feature engineering is first used to extract three types of features (structural, statistical, and linguistic) for detecting DGAs, and a new dataset is then built by combining one dataset of benign domains with two datasets of malicious DGA-generated domains. The domains are classified with machine learning algorithms, and the results are compared to identify the model with the higher accuracy rate and lower false positive rate for DGA detection. The results show that the random forest algorithm achieves a higher accuracy rate, detection rate, and receiver operating characteristic (ROC) value, equal to 89.32%, 91.67%, and 0.889, respectively. Compared with the other algorithms examined, the random forest algorithm also shows a lower false positive rate, equal to 0.373.
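The abstract names three feature families (structural, statistical, linguistic) without listing the individual features. The following is a minimal sketch of what such features could look like, not the authors' feature set: the function names, the choice of features (label length, digit ratio, Shannon entropy, vowel ratio), and the decision to analyze only the leftmost label of the domain are all assumptions made for illustration.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits (0.0 for an empty string)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(domain: str) -> dict:
    """Hypothetical feature extractor; feature names are illustrative only."""
    # Assumption: only the leftmost label is analyzed ("google" in "google.com").
    label = domain.lower().split(".")[0]
    letters = [ch for ch in label if ch.isalpha()]
    return {
        # Structural: size and character composition of the label.
        "length": len(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / len(label) if label else 0.0,
        # Statistical: randomness of the character distribution.
        "entropy": shannon_entropy(label),
        # Linguistic: natural-language words tend to contain more vowels.
        "vowel_ratio": sum(ch in VOWELS for ch in letters) / len(letters) if letters else 0.0,
    }


if __name__ == "__main__":
    for name in ("google.com", "xjw3k9qzt7vb.net"):
        print(name, extract_features(name))
```

Feature dictionaries of this kind could then be passed to any supervised classifier; the second domain in the usage example is a made-up string, not a sample from the paper's datasets.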
Keywords
Botnet; Domain Generation Algorithms; Machine Learning Algorithms; Blacklist; Command and Control Server
Article Title [English]
Comparison of Supervised Machine Learning Algorithms in Detection of Botnets Domain Generation Algorithms
Authors [English]
M. Asadi¹; M. A. Jabraeil Jamali²; S. Parsa³; V. Majidnezhad²
¹Ph.D. Student, Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar; Faculty Member, Department of Computer Engineering, Khamneh Branch
²Assistant Professor, Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar, Iran
³Associate Professor, Department of Computer, Iran University of Science and Technology, Tehran, Iran
Abstract [English]
Domain generation algorithms (DGAs) are used in botnets as rendezvous points with their command and control (C&C) servers, and can continuously produce a large number of domains that evade detection by traditional methods such as blacklists. Internet security vendors often use blacklists to detect botnets and malware, but a DGA can continuously update its domains to evade blacklist detection. In this paper, feature engineering is first used to extract three types of features (structural, statistical, and linguistic) for the detection of DGAs, and a new dataset is then produced by combining one dataset of benign domains with two datasets of malicious DGA domains. The domains are classified using supervised machine learning algorithms, and the results are compared to determine a DGA detection model with a higher accuracy and a lower false positive rate. The results obtained in this paper show that the random forest algorithm offers an accuracy rate, detection rate, and receiver operating characteristic (ROC) value equal to 89.32%, 91.67%, and 0.889, respectively. Also, compared to the results of the other investigated algorithms, the random forest algorithm presents a lower false positive rate (FPR), equal to 0.373.
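The accuracy rate, detection rate, and false positive rate mentioned above can be related to a binary confusion matrix. The sketch below assumes the standard textbook definitions, with the detection rate taken to be the true positive rate; the abstract does not state the exact formulas used, and the function name and the label convention (1 = DGA, 0 = benign) are assumptions.

```python
def classification_rates(y_true, y_pred):
    """Compute accuracy, detection rate (TPR), and false positive rate.

    Assumed convention: label 1 = DGA domain, label 0 = benign domain.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1          # DGA domain correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1          # benign domain wrongly flagged
        elif truth == 0 and pred == 0:
            tn += 1          # benign domain correctly passed
        else:
            fn += 1          # DGA domain missed
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    print(classification_rates([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Comparing classifiers on both the detection rate and the false positive rate matters because a model can reach a high detection rate simply by flagging many benign domains.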
Keywords [English]
Botnet, Domain Generation Algorithms (DGAs), Machine Learning Algorithms, Blacklist, C&C Server
