Revisiting K-Means and Topic Modeling, a Comparison Study to Cluster Arabic Documents

Clustering Arabic text documents is important for many natural language technologies. This study uses a combined method to cluster Arabic text documents, drawing on generative models and clustering techniques. It applies Latent Dirichlet Allocation (LDA) and the k-means clustering algorithm to a news dataset used in previous similar studies. The aim of this research is twofold. First, it shows that normalizing the weights in the vector space for the document-term matrix of the text documents dramatically improves the quality of the clusters, and hence the accuracy of clustering, when using the k-means algorithm. The results are compared to a recent study on clustering Arabic text documents. Second, it shows that the combined method is superior in clustering quality for Arabic text documents according to external measures such as purity, F-measure, entropy, and accuracy, among others. The purity of the combined method is 0.933, compared to 0.82 for the k-means algorithm; both figures are higher than those of a recent similar study. This is also confirmed by the other validation measures used. The correctness of the combined method is then confirmed using different Arabic datasets.
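
The two ideas in this abstract, normalizing document vectors and scoring clusters by purity, can be illustrated with a minimal sketch. It assumes L2 (unit-length) row normalization, which the abstract does not specify, and all counts, cluster assignments, and category names below are invented for illustration; this is not the authors' implementation.

```python
import math
from collections import Counter


def l2_normalize(rows):
    """Scale each document vector (row) to unit Euclidean length."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(w * w for w in row))
        # Leave all-zero rows unchanged to avoid dividing by zero.
        normalized.append([w / norm for w in row] if norm > 0 else list(row))
    return normalized


def purity(cluster_ids, class_labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(class_labels)


# Toy term counts, invented for illustration (rows = documents, columns = terms).
doc_term = [[3, 0, 4], [0, 5, 0], [1, 1, 0]]
unit_rows = l2_normalize(doc_term)

# Hypothetical cluster assignments and true categories for six documents.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["sport", "sport", "economy", "economy", "economy", "economy"]
score = purity(clusters, labels)
```

After normalization every non-empty row has length 1, so long and short documents contribute comparably to the distances k-means computes. Purity rewards clusters dominated by a single true category, which is why it serves as an external quality measure.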

...

Published on: 2018-10-11 10:05:36

An Approach for Integrating Data Mining with Saudi Universities Database Systems: Case Study

This paper presents an approach for integrating data mining algorithms within a Saudi university's database system, with Prince Sattam Bin Abdulaziz University (PSAU) as a case study. The approach is based on a bottom-up methodology: it starts by providing a data mining application that solves one of the problems facing Saudi universities' systems, then integrates and implements the solution inside the university's database system. This process is repeated to enhance the university system with data mining tools that help different parties, especially decision makers, to make certain decisions. The paper presents a case study that analyzes and predicts student withdrawal from courses at PSAU using association rule mining, neural networks, and decision trees. It then provides a conceptual and practical approach for integrating the resulting application within the university's database system. The experiment shows that this approach can be used as a framework for integrating data mining techniques within Saudi universities' database systems. The paper concludes that mining universities' data can be applied as a computer system (an intelligent university system), and that data mining algorithms can be adapted to any database system, regardless of whether the system is new, existing, or legacy. Moreover, data mining algorithms can be a solution for some educational problems, in addition to providing information for decision makers and users.
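
Association rule mining, one of the techniques named in this abstract, scores a rule by its support and confidence. The sketch below shows only those two measures; the records and attribute names (`low_midterm`, `many_absences`, `withdrew`) are hypothetical and do not come from the PSAU data or the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base > 0 else 0.0


# Invented enrolment records; every attribute name here is hypothetical.
records = [
    {"low_midterm", "many_absences", "withdrew"},
    {"low_midterm", "withdrew"},
    {"low_midterm", "many_absences", "withdrew"},
    {"many_absences"},
    {"low_midterm"},
]

# Score the candidate rule {low_midterm} -> {withdrew}.
rule_support = support(records, {"low_midterm", "withdrew"})
rule_confidence = confidence(records, {"low_midterm"}, {"withdrew"})
```

A rule miner such as Apriori would enumerate many candidate rules and keep those above chosen support and confidence thresholds; this sketch scores a single rule by hand.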

...

Published on: 2018-10-11 10:05:04

A Domain-Based Approach to Extract Arabic Person Names Using N-Grams and Simple Rules

Named Entity Recognition (NER) is considered an important task in many human language technologies, including information extraction, Natural Language Processing (NLP), and machine translation. It is believed to be a challenging task for the Arabic language. Most existing research studies deal only with names found in Modern Standard Arabic (MSA) sources such as news. In this study, we aim to build a Classical Arabic name list, or gazetteer, which represents an important part of a lively Arabic literature and culture. To achieve this goal, we propose a new approach for extracting Arabic Person Names (APNs). This approach constitutes a new model for extracting named entities from unstructured Arabic text without the need for Part of Speech (POS) tagging or morphological analysis. The proposed approach is based on formulating a model for a specific domain. For this study, we use an authentic text from the literature of Islamic-Arabic studies, namely the “Hadith”, the sayings of the Prophet Mohammad (Peace Be Upon Him, PBUH). We use NLP and text mining techniques to extract and build an accurate standard list of classical APNs. We also built a standard evaluation list of classical names in order to evaluate our approach. Results show very good precision of around 84%.
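
The abstract does not list the actual rules, so the following is only a generic sketch of how word n-grams, a simple trigger-word rule, and a precision check against a gold list fit together. The trigger words, placeholder tokens, and gold list are all hypothetical stand-ins, not the paper's rules or data.

```python
def word_ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_after_triggers(tokens, triggers):
    """Simple rule: in each bigram, a token that follows a trigger word is a candidate name."""
    return [second for first, second in word_ngrams(tokens, 2) if first in triggers]


def precision(extracted, gold):
    """Share of extracted items that also appear in the gold list."""
    extracted = set(extracted)
    if not extracted:
        return 0.0
    return len(extracted & set(gold)) / len(extracted)


# Placeholder tokens standing in for a narration chain (hypothetical).
tokens = ["narrated", "NameA", "from", "NameB", "from", "NameC", "said"]
triggers = {"narrated", "from"}  # hypothetical trigger words

candidates = extract_after_triggers(tokens, triggers)
gold_names = {"NameA", "NameB", "NameD"}  # hypothetical evaluation list
p = precision(candidates, gold_names)
```

Because the rule depends only on neighbouring tokens, it needs no POS tagger or morphological analyzer, which matches the constraint stated in the abstract.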

...

Published on: 2018-10-11 10:04:29

Fine-Grained Quran Dataset

Extracting knowledge from text documents has become one of the main hot topics in Natural Language Processing (NLP) in the era of information explosion. Arabic NLP is considered immature for several reasons, including the scarcity of available resources. At the same time, automatically extracting reliable knowledge from specialized data sources such as holy books is considered an extremely challenging task, but one of great benefit to all humans. In this context, this paper provides a comprehensive Quranic dataset as the first part (foundation) of ongoing research that attempts to lay the groundwork for approaches and applications to explore the Holy Quran. The paper presents the algorithms and approaches designed to extract aggregated data from massive Arabic text sources, including the Holy Quran and closely associated books. The text of the Holy Quran is transformed into structured multi-dimensional data records, starting from the chapter level, then the word level, and then the character level. All of these are linked with interpretations and meanings, parsing, translations, intonation, and the roots and stems of words, all from authentic and reliable sources. The final dataset is provided as Excel sheets and database records. The paper also presents models of the dataset at all levels. The Quranic dataset presented in this paper was designed to suit database, data mining, text mining, and Artificial Intelligence applications; it is also designed to serve as a comprehensive encyclopedia of the Holy Quran and the Quranic science books.

...

Published on: 2018-10-11 10:02:42

Extracting Topics from the Holy Quran Using Generative Models

The Holy Quran is one of the Holy Books of God. It is considered one of the main references for an estimated 1.6 billion Muslims around the world. The language of the Holy Quran is Arabic. Specialists and non-specialists in religion alike need to search for and look up certain information in the Holy Quran. Most research projects concentrate on translations of the Holy Quran into different languages; few pay attention to the original Arabic text. Keyword search is one Information Retrieval (IR) method, but it retrieves only exact matches. Semantic search aims at finding deeper meanings of a text, and it is a hot field of study in Natural Language Processing (NLP). In this paper, topic modeling techniques are explored to set up a framework for semantic search in the Holy Quran. As the Holy Quran is the word of God, its meanings are unlimited. In this paper, the words of the chapter Joseph (Peace Be Upon Him (PBUH)) from the Holy Quran are analyzed using topic modeling techniques as a case study. The Latent Dirichlet Allocation (LDA) topic modeling technique is applied to two structures of the Joseph chapter (Hizb quarters and verses), represented as words, roots, and stems. The log-likelihood is calculated for the two structures of the chapter. Results show that the best structure to use is verses, which gives the least energy for the data. Some of the resulting topics are shown. These results suggest that topic modeling techniques failed to accurately capture the coherent topics of the chapter.

...

Published on: 2018-10-11 10:02:08

The Challenges and the Opportunities of Teaching the Introductory Computer Programming Course: Case Study

Teaching practical courses has always been a burden on colleges offering non-technical degrees. One of these courses is computer programming, especially in degrees such as computer science, information systems, and software engineering. In such programs, students are expected to take between two and five computer programming courses. The success rate in the first course is usually low. This paper discusses why it is low and how it can be increased. The discussion is set in the context of an introductory computer programming course at the College of Computer Engineering and Sciences (CCES) at Prince Sattam Bin Abdulaziz University in the Kingdom of Saudi Arabia. The study also relies on the authors' teaching experience, over around ten years, in different countries including the UK, USA, Sudan, Jordan, and the Kingdom of Saudi Arabia. To identify problems and solutions in teaching the introductory computer programming course, interviews were conducted with selected instructors who teach the course at the college. A questionnaire was also designed and distributed to students. The results of the analysis of both the interviews and the questionnaire were used, along with the results of similar studies, to recommend solutions to problems that occur in such a course. As the questionnaire shows, most students think that giving the whole course in the lab would be much better, and they prefer not to work alone. These and other recommendations presented in this study are especially appropriate for similar institutions in the Middle East and Gulf area.

...

Published on: 2018-10-11 10:01:16

Analysis of the Dynamics of a Nonlinear Neuron Model

Chaos may increase the computational capabilities of artificial neural networks. This is possible because of the large number of states that can be obtained by utilizing chaos attributes such as control, space filling, and sensitivity to initial conditions. In this paper, a mathematical analysis of a chaotic spiking neuron model is carried out. The analysis is performed to understand, and hence to exploit, the rich dynamics that such a system may provide, which can then be used in information processing tasks. To accomplish this, a chaotic spiking neural model called the Nonlinear Dynamic State (NDS) neuron is used. The study includes a detailed mathematical analysis in both phase space and eigenspace. These methods have revealed certain facts regarding the NDS attractor and also suggest the stabilization of the model. It is shown in this paper that one of the major ingredients driving the model is the repelling force of the two fixed points of spiral-repellor type. These fixed points were two spiral saddle points of index 1 and index 2 in the original Rössler attractor. The other ingredient that allows the NDS chaotic attractor to exist is the combination of the reset and self-feedback mechanisms. The analytical investigation strongly indicates that the dynamics of the NDS model allow diverse dynamic behaviors such as Unstable Periodic Orbits (UPOs), which can be stabilized and controlled. A UPO is one of the dynamic behaviors exhibited by nonlinear systems in phase space. The vast variety of dynamic behaviors that the NDS neuron provides may be utilized in carrying out information processing functions.
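
For orientation, the two fixed points of the original Rössler system mentioned above can be located in closed form. The sketch below uses the standard Rössler equations, x' = -y - z, y' = x + a y, z' = b + z (x - c), with the commonly quoted parameters a = b = 0.2 and c = 5.7. These parameter values are an assumption, and the code does not implement the NDS neuron itself, whose equations are not given in this abstract.

```python
import math


def rossler_field(x, y, z, a, b, c):
    """Right-hand side of the standard Rossler system."""
    return (-y - z, x + a * y, b + z * (x - c))


def rossler_fixed_points(a, b, c):
    """Solve x^2 - c*x + a*b = 0, then set y = -x/a and z = x/a."""
    disc = c * c - 4 * a * b
    if disc < 0:
        return []  # no real fixed points for these parameters
    root = math.sqrt(disc)
    return [(x, -x / a, x / a) for x in ((c - root) / 2, (c + root) / 2)]


# Commonly used parameter values (an assumption, not taken from the paper).
A, B, C = 0.2, 0.2, 5.7
fixed_points = rossler_fixed_points(A, B, C)

# The vector field should vanish (up to rounding) at each fixed point.
residuals = [rossler_field(x, y, z, A, B, C) for x, y, z in fixed_points]
```

Classifying each point as a spiral saddle of a given index would additionally require the eigenvalues of the Jacobian at that point, which is the kind of eigenspace analysis the abstract refers to.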

...

Published on: 2018-10-11 09:59:30

Dynamic Authentication Protocol for Mobile Networks Using Public-Key Cryptography

The authentication and key agreement (AKA) protocol of the Universal Mobile Telecommunication System (UMTS) is still vulnerable to a redirection attack, which allows an adversary to redirect user traffic from one network to another and to eavesdrop on or mischarge the subscribers in the system. Moreover, the International Mobile Subscriber Identity (IMSI), which uniquely identifies a user, is still revealed to the visited network and can still be demanded by an attacker who impersonates a base station, as there is no network authentication in this case. In addition, the non-repudiation requirement involves two important points: protecting subscribers from incorrect bill charging, and providing service providers with legal evidence when collecting bills. In this paper, a dynamic authentication protocol that integrates public-key cryptography with the hash-chaining technique is presented to significantly improve the security level as well as the performance.
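
The hash-chaining idea can be sketched in isolation. The example below is a generic Lamport-style one-time token chain built on SHA-256. It is not the protocol proposed in the paper, it omits the public-key part entirely, and the seed value and class name are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_chain(seed: bytes, length: int) -> list:
    """Return [h^1(seed), ..., h^length(seed)]."""
    chain = []
    value = seed
    for _ in range(length):
        value = sha256(value)
        chain.append(value)
    return chain


class Verifier:
    """Holds the latest accepted chain value; starts from the chain's last element."""

    def __init__(self, anchor: bytes):
        self.current = anchor

    def accept(self, token: bytes) -> bool:
        # A valid token is the preimage of the value currently stored.
        if sha256(token) == self.current:
            self.current = token  # move one step back along the chain
            return True
        return False


# Hypothetical subscriber secret; a real system would use a random seed.
chain = build_chain(b"hypothetical-subscriber-secret", 5)
verifier = Verifier(chain[-1])

first_ok = verifier.accept(chain[-2])   # next unused token is accepted
replay_ok = verifier.accept(chain[-2])  # replaying the same token is rejected
second_ok = verifier.accept(chain[-3])  # the following token is accepted
```

Each token is cheap to verify with a single hash and cannot be reused. That property is one plausible reason for combining a hash chain with more expensive public-key operations, though the abstract does not spell out the paper's rationale.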

...

Published on: 2018-10-11 09:57:39

Investigation of a Chaotic Spiking Neuron Model

Chaos provides many interesting properties that can be used to achieve computational tasks. Such properties include sensitivity to initial conditions, space filling, control, and synchronization. Chaotic neural models have been devised to exploit such properties. In this paper, a chaotic spiking neuron model, the nonlinear dynamic state (NDS) neuron, is investigated experimentally in order to understand its dynamic behaviours. The experimental approach has revealed some quantitative and qualitative properties of the NDS model, such as the control mechanism, the reset mechanism, and the way the model may exhibit dynamic behaviours in phase space. It is shown experimentally that both the reset mechanism and the self-feedback control mechanism are important for the NDS model to work and to stabilise to one of the large number of unstable periodic orbits (UPOs) embedded in its attractor. The experimental investigation suggests that the internal dynamics of the NDS neuron provide a rich set of dynamic behaviours that can be controlled and stabilised. This wide range of dynamic behaviours may be exploited to carry out information processing tasks.

...

Published on: 2018-10-11 09:55:30

Studying a Chaotic Spiking Neural Model

The dynamics of a chaotic spiking neuron model are studied mathematically and experimentally. The Nonlinear Dynamic State (NDS) neuron is analysed to further understand the model and improve it. Chaos has many interesting properties such as sensitivity to initial conditions, space filling, control, and synchronization. As suggested by biologists, these properties may be exploited and may play a vital role in carrying out computational tasks in the human brain. The NDS model has some limitations; in this paper the model is investigated to overcome some of these limitations and so enhance the model. To this end, the model's parameters are tuned and the resulting dynamics are studied. The discretization method of the model is also considered. Moreover, a mathematical analysis is carried out to reveal the underlying dynamics of the model after tuning its parameters. The results of these methods reveal some facts regarding the NDS attractor and suggest the stabilization of a large number of unstable periodic orbits (UPOs), which might correspond to memories in phase space.

...

Published on: 2018-10-11 09:54:30