Welcome to NLP 2022

11th International Conference on Natural Language Processing (NLP 2022)

September 17~18, 2022, Copenhagen, Denmark



Accepted Papers
RestroDroid - A Restaurant App using Bot Service

Jeny Jijo1, Supreet Ronad2, Sathvik Saya2, Sampreeth Naik2 and Priyadarshini V2, 1Assistant Professor, Dept of CSE, PES University, Electronic City Campus, Bengaluru, 2Dept of CSE, PES University, Electronic City Campus, Bengaluru

ABSTRACT

An automated system that can make the operation of a restaurant more efficient is described. In today’s age of fast meals and the need for social distancing due to COVID-19, ensuring hygiene and safety has become a top priority. In businesses that involve serving people at close range, however, this is difficult to achieve. The solution is to make the whole system contactless using technology. The proposed system uses a mobile app and a line-following robot to deliver food to customers in a restaurant. Ordering through the application gives users fast visual confirmation of their choices and assures that the order placed is exactly what the customer intended. This technology is paired with a robot that can bring meals to a specified table. The whole system is low-cost compared with existing systems.

KEYWORDS

Automated service, Bot, Application, Flutter, Arduino, IoT, Restaurant.


Mining Online Drug Reviews Database for the Treatment of Rheumatoid Arthritis by using Deep Learning

Pinar Yildirim, Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Istanbul Okan University, Istanbul, Turkey

ABSTRACT

In this paper, a research study to extract knowledge from online patient reviews for rheumatoid arthritis is introduced. Rheumatoid arthritis is a long-term and disabling autoimmune disease. Today, a large number of people in the world have rheumatoid arthritis. Considering the importance of medication for rheumatoid arthritis, we aimed to investigate patient reviews in the WebMD database and obtain useful information about this disease. Our results revealed that etanercept treatment has the highest number of reviews. Data analysis was applied to discover knowledge about this drug. A deep learning approach was used to predict the effectiveness of etanercept, and the classification results were compared with those of traditional classifiers. According to the comparison, the deep neural network has better accuracy metrics than the others. The results therefore suggest that deep learning is promising for medical data analysis. We hope that our study can contribute to intelligent data analysis in the medical domain.

KEYWORDS

Classification, Deep Learning, Etanercept, Online Drug Reviews.


Cyberbullying Detection using Ensemble Method

Saranyanath K P1, Wei Shi2 and Jean-Pierre Corriveau1, 1School of Computer Science, Carleton University, Ottawa, Canada, 2School of Information Technology, Carleton University, Ottawa, Canada

ABSTRACT

Cyberbullying is a form of bullying that occurs across social media platforms using electronic messages. In this paper we propose three different approaches and five models to identify cyberbullying on a generated social media dataset derived from multiple online platforms. Our initial approach consists of enhancing a Support Vector Machine (SVM). Our second approach is based on DistilBERT, which is a lighter and faster Transformer model than BERT. Stacking the first three models, we obtain two more ensemble models. Contrasting the ensemble models with the three others, we observe that the ensemble models outperform the base model on all evaluation metrics except precision. The highest accuracy, 89.6%, was obtained using an ensemble model, while the lowest accuracy, 85.53%, was obtained with the SVM model. The DistilBERT model exhibited the highest precision, at 91.17%. The model developed using features of different granularity outperformed simple TF-IDF.

KEYWORDS

Machine Learning, Natural Language Processing, Support Vector Machine, DistilBERT, Cyberbullying.
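The stacking idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the two base scorers (`keyword_score`, `second_person_score`) are hypothetical toy stand-ins for the SVM and DistilBERT models, the tiny dataset is invented for illustration, and the meta-learner is assumed to be a plain logistic regression trained on the base-model outputs.

```python
import math

# Hypothetical base model 1: fraction of tokens found in a small abusive-word list.
def keyword_score(text):
    abusive = {"stupid", "loser", "ugly", "hate"}
    tokens = text.lower().split()
    return sum(t in abusive for t in tokens) / max(len(tokens), 1)

# Hypothetical base model 2: 1.0 if the message addresses "you", else 0.0.
def second_person_score(text):
    return 1.0 if "you" in text.lower().split() else 0.0

BASE_MODELS = [keyword_score, second_person_score]

def base_features(text):
    """Outputs of all base models become the input features of the meta-learner."""
    return [model(text) for model in BASE_MODELS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta(texts, labels, epochs=2000, lr=0.5):
    """Fit a logistic-regression meta-learner on base-model outputs with SGD."""
    weights = [0.0] * len(BASE_MODELS)
    bias = 0.0
    rows = [base_features(t) for t in texts]
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(text, weights, bias):
    """Return True if the stacked model flags the text as bullying."""
    x = base_features(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5

# Invented toy data: 1 = bullying, 0 = not bullying.
TEXTS = [
    "you are a stupid loser",
    "i hate rainy days",
    "you did great today",
    "see you at lunch",
    "you ugly loser",
    "lovely weather",
]
LABELS = [1, 0, 0, 0, 1, 0]

WEIGHTS, BIAS = train_meta(TEXTS, LABELS)
```

In this toy setting neither base scorer separates the classes on its own ("i hate rainy days" triggers the keyword scorer, "see you at lunch" triggers the second-person scorer), but the meta-learner can combine them, which is the motivation for stacking. In practice the meta-learner would be trained on held-out predictions of the base models rather than on the same data they were fitted to.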


WassBERT: High Performance BERT-based Persian Sentiment Analyzer and Comparison to Other State-of-the-art Approaches

Masoumeh Mohammadi and Shadi Tavakoli, Department of Data Science & Machine Learning, Telewebion, Tehran, Iran

ABSTRACT

Applications require the ability to perceive others' opinions as one of the most important parts of knowledge. Finding the positive or negative feelings in sentences is called sentiment analysis (SA). Businesses use it to understand customer sentiment in comments on websites or social media. An optimized loss function and novel data augmentation methods, based on Bidirectional Encoder Representations from Transformers (BERT), are proposed in this study. First, a dataset of Persian movie comments crawled from various sites is prepared. Then, balancing and augmentation techniques are applied to the dataset. Next, several deep models and the proposed BERT model are applied to the dataset. We focus on customizing the loss function, which achieves an overall accuracy of 94.06 for multi-label (positive, negative, neutral) sentences. Comparative experiments are conducted on the dataset, and the results show that the proposed model performs significantly better than the other models.

KEYWORDS

Bidirectional encoder representations from transformers (BERT), Bidirectional long short-term memory (Bi-LSTM), Comment classification, Convolutional neural network (CNN), Deep learning, Opinion mining (OM), Natural language processing (NLP), Persian language sentiment classification, Persian Sentiment analysis, Text mining.
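The abstract above does not say how the loss function is customized. As an illustration only, the sketch below shows one common customization for an imbalanced three-class sentiment task: cross-entropy with inverse-frequency class weights. The choice of loss, the function names, and the label encoding (0 = positive, 1 = negative, 2 = neutral) are all assumptions, not the authors' method.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_frequency_weights(labels, num_classes):
    """Weight each class by n / (num_classes * count), so rare classes count more."""
    n = len(labels)
    counts = [labels.count(c) for c in range(num_classes)]
    return [n / (num_classes * c) if c else 0.0 for c in counts]

def weighted_cross_entropy(logits, target, class_weights):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    probs = softmax(logits)
    return -class_weights[target] * math.log(probs[target])

# Assumed encoding: 0 = positive, 1 = negative, 2 = neutral (invented toy labels).
TRAIN_LABELS = [0, 0, 0, 0, 1, 2]
CLASS_WEIGHTS = inverse_frequency_weights(TRAIN_LABELS, num_classes=3)
```

With these toy labels, a mistake on a rare negative or neutral comment is penalized four times as heavily as one on a frequent positive comment, which is one way a loss can compensate for class imbalance alongside the balancing and augmentation steps the abstract mentions.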


DAGAM: Data Augmentation with Generation and Modification

Byeong-Cheol Jo1,*, Tak-Sung Heo1,*, Yeongjoon Park1, Yongmin Yoo1, Won Ik Cho2 and Kyungsun Kim1, 1AI R&D Group, NHN, Seoul, Republic of Korea, 2Department of Electrical and Computer Engineering and INMC, Seoul National University, Seoul, Republic of Korea

ABSTRACT

Text classification has exhibited excellent performance since the advent of pre-trained language models based on the Transformer architecture. However, in pre-trained language models, under-fitting often occurs because the model is very large compared to the amount of available training data. In light of this, we introduce three data augmentation schemes that help reduce the under-fitting problems of large-scale language models. First, we use a generation model for data augmentation, which is defined as Data Augmentation with Generation (DAG). Next, we augment data using text modification techniques such as corruption and word order change (Data Augmentation with Modification, DAM). Finally, we propose Data Augmentation with Generation And Modification (DAGAM), which combines the DAG and DAM techniques. We conduct data augmentation on six benchmark text classification datasets and verify the usefulness of DAG, DAM, and DAGAM through BERT-based fine-tuning and evaluation.

KEYWORDS

Data Augmentation, Text Generation, Text Modification, Summarization, Character Order Change.
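The modification-based augmentation (DAM) described above can be sketched roughly as follows. This is one plausible reading of the "Character Order Change" keyword, not the authors' code: the inner characters of each word are shuffled while the first and last characters stay in place. The function names, the `copies` and `seed` parameters, and the example data are hypothetical.

```python
import random

def shuffle_inner_chars(word, rng):
    """Shuffle the characters between the first and last character of a word."""
    if len(word) <= 3:
        return word  # too short to have more than one inner character
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]

def character_order_change(sentence, rng):
    """Apply the inner-character shuffle to every whitespace-separated word."""
    return " ".join(shuffle_inner_chars(w, rng) for w in sentence.split())

def augment(dataset, copies=1, seed=0):
    """Return the original (text, label) pairs plus `copies` modified variants of each."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    augmented = list(dataset)
    for text, label in dataset:
        for _ in range(copies):
            augmented.append((character_order_change(text, rng), label))
    return augmented

# Invented example data for illustration.
DATA = [("the service was excellent", "pos"), ("terrible waiting time", "neg")]
AUGMENTED = augment(DATA, copies=2)
```

Each modified variant keeps the label of its source sentence, so the augmented set can be passed directly to a classifier fine-tuning loop. A shuffle may occasionally leave a word unchanged, which is harmless for this purpose.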


Topic Segmentation of Research Article Collections

Erion Çano and Benjamin Roth, Digital Philology, Research Group Data Mining and Machine Learning, University of Vienna

ABSTRACT

Collections of research article data harvested from the web have become common recently since they are important resources for experimenting on tasks such as named entity recognition, text summarization, or keyword generation. In fact, certain types of experiments require collections that are both large and topically structured, with records assigned to separate research disciplines. Unfortunately, the current collections of publicly available research articles are either small or heterogeneous and unstructured. In this work, we perform topic segmentation of a paper data collection that we crawled and produce a multitopic dataset of roughly seven million paper data records. We construct a taxonomy of topics extracted from the data records and then annotate each document with its corresponding topic from that taxonomy. As a result, it is possible to use this newly proposed dataset in two modalities: as a heterogeneous collection of documents from various disciplines or as a set of homogeneous collections, each from a single research topic.

KEYWORDS

Research Articles, Topic Segmentation, Multitopic Dataset, Keyword Generation, Research Resources.


Comparison of Various Forms of Serious Games: Exploring the Potential use of Serious Game Walkthrough in Education Outside the Classroom

Xiaohan Feng1 and Makoto Murakami2, 1Graduate School of Information Sciences and Arts, Toyo University, Kawagoe, Saitama, Japan, 2Dept. of Information Sciences and Arts, Toyo University, Kawagoe, Saitama, Japan

ABSTRACT

The advantages of using serious games for education have already been demonstrated in many studies, especially for narrative VR games, which allow players to remember more information. On the other hand, game walkthroughs can compensate for the disadvantages of gaming, such as limited pervasiveness and convenience. This study investigates whether walkthroughs of serious games can have the same learning effect as the games themselves. Using game creation (samples) and questionnaires, this study compares the information that viewers remember from game walkthroughs and from actual gameplay, analyzes their strengths and weaknesses, and examines the impact of the VR format on the results. The results showed that while game walkthroughs allow subjects to follow the experiences of actual game players with a certain degree of empathy, they have limitations compared with actual gameplay, especially for topics that require subjects to think for themselves. Meanwhile, a walkthrough of a VR game is not a suitable medium for making the receiver memorize information. In terms of prevalence and convenience, however, serious game walkthroughs are a viable educational option outside the classroom.

KEYWORDS

Serious game, multimedia, educational game, virtual reality, narratology, Education Outside the Classroom (EOTC).


Fast Rank Optimization Scheme by the Estimation of Vehicular Speed and Phase Difference in Mu-MIMO

Shin-Hwan Kim1, Kyung-Yup Kim2, Sang-Wook Kim3 and Jae-Hyung Koo4, 1Access Network Technology Team, Korea Telecom, Seoul, Korea, 2Access Network Technology Team, Korea Telecom, Seoul, Korea, 3Access Network Technology Department, Korea Telecom, Seoul, Korea, 4Network Research Technology Unit, Korea Telecom, Seoul, Korea

ABSTRACT

The recent MU-MIMO (Multi-User Multiple-Input Multiple-Output) scheme is an important and advanced technology. In particular, it is a suitable technique for increasing capacity and thereby relieving cell load, which is one of the major issues in 5G commercial field optimization. While MU-MIMO has the important advantage of expanding cell capacity, it has the disadvantage of interference among the multi-user beams. It is important to use advanced beamforming technology for MU-MIMO to overcome this disadvantage. Applying interference cancellation among inter-UE (User Equipment) beams improves each UE’s performance and thus contributes to improving cell throughput. This paper introduces various techniques for eliminating interference in MU-MIMO systems. It is also important that the UE reports a rank indicator that reflects the interference of multi-user beams. This paper analyses the problem with the conventional method of rank decision in MU-MIMO systems, quickly estimates the vehicular speed with the proposed rank optimization technique, and shows that DL (Downlink) UE performance is improved by applying a proposed rank value suited to the vehicular speed. This technique can be applied effectively to increase overall cell capacity by improving DL UE throughput in the MU-MIMO system.

KEYWORDS

MU-MIMO, 5G, multi-user, interference, UE, DL, rank indicator, cell capacity.


From the Art of Software Testing to Test-as-a-Service

Janete Amaral, Alberto S. Lima, José Neuman de Souza, Lincoln S. Rocha, MDCC, Universidade Federal do Ceará, Fortaleza/CE – Brasil

ABSTRACT

Researchers consider that the first edition of the book “The Art of Software Testing”, by Myers (1979), initiated research in software testing. Since then, software testing has gone through evolutions that have driven standards and tools. This evolution has accompanied the growing complexity and variety of software deployment platforms. Migration to the cloud has brought benefits such as scalability, agility, and better return on investment. Cloud computing requires greater involvement of software testing to ensure that services work as expected. In addition to the testing of cloud applications, cloud computing has paved the way for testing in the Test-as-a-Service model. This work aims to characterize Test-as-a-Service in the context of cloud computing. Based on the knowledge explained here, we sought to linearize the evolution of software testing, characterizing fundamental points and allowing us to compose a synthesis of the body of knowledge in software testing, expanded by the paradigm of cloud computing.

KEYWORDS

Cloud computing, Software Testing, Test-as-a-Service.