Automatic Persian Text Generation Using Rule-Based Models and Word Embedding

Hajipoor, Omid; Sadidpour, Saeedeh Sadat

Electronic and Cyber Defense (پدافند الکترونیکی و سایبری)
Article 4, Volume 9, Issue 4 - Serial No. 36, Esfand 1400 (Solar Hijri), pages 43-54
Article type: Research paper
Authors
Omid Hajipoor¹; Saeedeh Sadat Sadidpour*²
¹PhD student, Amirkabir University of Technology, Tehran, Iran
²Assistant Professor, Malek Ashtar University of Technology, Tehran, Iran
Received: 08 Bahman 1399; Revised: 26 Tir 1400; Accepted: 27 Tir 1400
Abstract
Natural language generation (NLG) is a subfield of natural language processing in which natural-language text is produced from a machine representation such as a knowledge base. NLG systems have existed for a long time, but commercial tools built on the technology have only recently become widespread. An NLG system must decide how to put a concept into words. The ability to produce meaningful text plays a key role in many natural language processing applications, such as machine translation, speech, and image-to-text conversion. The aim of this paper is to present a method for generating well-structured text using artificial intelligence techniques, as a starting point for Persian (Farsi) text generation. In other words, the proposed method can generate long and varied Persian text while preserving meaning and structure. To this end, machine learning methods are combined with probabilistic models. In the proposed model, probabilistic models are used to extract rules and Word2vec is used to vectorize the text; in the generation phase, the two are combined with cosine distance. The results indicate that the text generated by the model has appropriate structure, meaning, and variety. The model is also reported to perform well from a human standpoint and in terms of complexity.
Keywords
Natural language generation; automatic text generation; language model; rule-based method; word embedding
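
The abstract names three ingredients: probabilistic rules, word vectors, and cosine distance. The sketch below is one possible way to combine them, not the authors' implementation. It assumes bigram relative frequencies as the probabilistic "rules", hand-written 2-D vectors as stand-ins for trained Word2vec embeddings, and a hypothetical mixing weight `alpha` that trades transition probability against similarity to a topic vector. All function names and the scoring rule are illustrative assumptions.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    """Return 1 - cos(u, v): 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def bigram_probabilities(sentences):
    """Estimate P(next | current) by relative frequency over tokenized sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur, followers in counts.items():
        total = sum(followers.values())
        probs[cur] = {w: c / total for w, c in followers.items()}
    return probs


def next_word(current, topic_vec, probs, vectors, alpha=0.5):
    """Pick the follower with the best mixed score (assumed rule, not from the paper):
    alpha * P(next | current) + (1 - alpha) * cosine similarity to the topic."""
    best, best_score = None, float("-inf")
    # Sorted iteration makes tie-breaking deterministic (alphabetical).
    for word, p in sorted(probs.get(current, {}).items()):
        if word not in vectors:
            continue  # no embedding available for this candidate
        similarity = 1.0 - cosine_distance(vectors[word], topic_vec)
        score = alpha * p + (1.0 - alpha) * similarity
        if score > best_score:
            best, best_score = word, score
    return best


def generate(start, topic_vec, probs, vectors, max_len=10, alpha=0.5):
    """Greedily extend a word sequence until no follower exists or max_len is hit."""
    words = [start]
    while len(words) < max_len:
        nxt = next_word(words[-1], topic_vec, probs, vectors, alpha)
        if nxt is None:
            break
        words.append(nxt)
    return words


if __name__ == "__main__":
    # Toy corpus and toy 2-D vectors; a real system would train Word2vec on Persian text.
    corpus = [["model", "generates", "text"], ["model", "learns", "rules"]]
    toy_vectors = {
        "model": [1.0, 1.0], "generates": [1.0, 0.2], "text": [0.9, 0.1],
        "learns": [0.2, 1.0], "rules": [0.1, 0.9],
    }
    rules = bigram_probabilities(corpus)
    print(generate("model", toy_vectors["text"], rules, toy_vectors))
```

With `alpha=1.0` the generator follows the bigram statistics alone, and with `alpha=0.0` it follows embedding similarity alone. Intermediate values let the extracted rules constrain structure while the vectors steer meaning.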
