Identifying Virality Attributes of Arabic Language News Articles
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Conference Proceedings, Volume 2016, Issue 1, March 2016, ICTPP3229
Abstract
Our research focuses on expanding the reach and impact of Arabic-language news articles by attracting more readers. In pursuit of this goal, we analyze the attributes that cause certain news articles to become viral, relative to other news articles that do not become viral or become less viral. We focus specifically on Arabic-language news articles because they carry unique linguistic, cultural, and social constraints relative to news stories in most Western languages. To understand virality, we take two approaches, a time-series approach and a linguistic approach, applied to an Arabic-language dataset of more than 1,000 news articles with associated temporal traffic data. For data collection, we selected Kasra (“a breaking”) (http://kasra.co/), an Arabic-language online news site that targets Arabic speakers worldwide, but particularly in the Middle East and North Africa (MENA) region. We originally gathered more than 3,000 articles and then collected traffic data for them; keeping only the articles with complete traffic data reduced the set to more than 1,000. We first use the temporal attributes of traffic to identify clusters of virality: based on the time-series analysis, we cluster articles by common temporal characteristics of traffic access. We then analyze each time-series cluster for content attributes, both topical and linguistic, to identify the specific attributes that may be driving the virality of the articles within it. To compute dissimilarity between time series, we use and evaluate several state-of-the-art dissimilarity-based time-series clustering approaches, such as dynamic time warping, discrete wavelet transformation, and others.
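To make the time-series step concrete, the sketch below computes two of the dissimilarities named above, dynamic time warping (DTW) and a discrete-wavelet-transform (DWT) based distance, and then clusters articles from a dissimilarity matrix. It is a minimal illustration under stated assumptions, not the authors' implementation: the DWT variant (orthonormal Haar wavelet at a single fixed level, compared by Euclidean distance), the average-linkage hierarchical clustering, and all function names are hypothetical choices, since the abstract does not specify them. The four traffic series are made up for illustration; the study itself reports five clusters over more than 1,000 articles, while this toy example asks for two.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance using absolute difference as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def haar_approximation(x: Sequence[float], levels: int) -> List[float]:
    """Approximation coefficients of an orthonormal Haar DWT after `levels` steps.

    Assumption: len(x) is divisible by 2**levels.
    """
    coeffs = list(x)
    for _ in range(levels):
        coeffs = [(coeffs[k] + coeffs[k + 1]) / math.sqrt(2)
                  for k in range(0, len(coeffs), 2)]
    return coeffs


def dwt_dissimilarity(a: Sequence[float], b: Sequence[float], levels: int = 2) -> float:
    """Euclidean distance between the Haar approximation coefficients of a and b."""
    ca, cb = haar_approximation(a, levels), haar_approximation(b, levels)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(ca, cb)))


def dissimilarity_matrix(series, dist) -> List[List[float]]:
    """Pairwise dissimilarities between all traffic series."""
    n = len(series)
    return [[dist(series[i], series[j]) for j in range(n)] for i in range(n)]


def average_linkage(dmat: List[List[float]], k: int) -> List[List[int]]:
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise dissimilarity until k clusters remain."""
    clusters = [[i] for i in range(len(dmat))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                ca, cb = clusters[p], clusters[q]
                avg = sum(dmat[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
                if best is None or avg < best[0]:
                    best = (avg, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


# Made-up hourly traffic: two "spike then decay" articles, two "slow build" articles.
series = [
    [100, 80, 40, 20, 10, 5, 2, 1],
    [90, 85, 45, 18, 9, 6, 3, 1],
    [5, 10, 20, 40, 60, 80, 90, 100],
    [4, 12, 22, 38, 62, 78, 92, 98],
]
dmat = dissimilarity_matrix(series, dwt_dissimilarity)
clusters = average_linkage(dmat, k=2)
print(clusters)  # prints [[0, 1], [2, 3]]
```

Swapping `dwt_dissimilarity` for `dtw_distance` in `dissimilarity_matrix` yields the DTW-based matrix, so several measures can be compared on the same articles. DTW tolerates shifts in when traffic peaks, whereas the Haar approximation smooths short-term noise and compares overall shape at a coarser scale.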
To identify the dissimilarity measure with the most discriminating power, we conduct a principal component analysis (PCA), a statistical technique that highlights variation and patterns in a dataset. Based on our PCA findings, we select discrete wavelet transformation-based dissimilarity as the best time-series measure for our research, because its resulting principal axes explain a larger proportion of the variability (75.43 percent) than those of the other time-series algorithms we employed. Using time series, we identify five virality clusters. For topic modeling, we employ Latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text; LDA explains similarities among groups of observations within a dataset, and for text modeling its topic probabilities provide an explicit representation of each document. For the linguistic analysis, we use Linguistic Inquiry and Word Count (LIWC), a text-processing tool used for sentiment analysis that counts occurrences of words in several categories covering writing style and psychological meaning. Prior empirical work shows the value of LIWC analysis for detecting meaning in various experimental settings, including attention focus, thinking style, and social relationships. Surprisingly, and perhaps counterintuitively, the topic of an article is not predictive of the virality of Arabic-language news articles. Instead, we find that the linguistic aspects and style of a news article are the most predictive attributes of virality for Arabic news articles.
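The PCA-based selection step can be sketched as follows: for each candidate dissimilarity measure, run PCA and compute the share of total variance captured by the leading principal axes, then prefer the measure with the largest share. The abstract does not say exactly how PCA was applied to the dissimilarity output, so this is a minimal sketch under one assumption: each article's row of a dissimilarity matrix is treated as its feature vector. The function names and the toy inputs are hypothetical, and the code does not reproduce the 75.43 percent figure reported above.

```python
import numpy as np


def explained_variance_ratio(features, n_axes=2):
    """Share of total variance captured by the first `n_axes` principal axes."""
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)  # PCA operates on column-centered data
    # Squared singular values of the centered matrix are proportional to the
    # variance along each principal axis; NumPy returns them largest first.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return float(var[:n_axes].sum() / var.sum())


def pick_most_discriminating(candidates, n_axes=2):
    """Return the candidate whose leading principal axes explain the most variance.

    `candidates` maps a (hypothetical) measure name to a matrix whose rows are
    per-article feature vectors, e.g. rows of an n x n dissimilarity matrix.
    """
    scores = {name: explained_variance_ratio(m, n_axes)
              for name, m in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy inputs for illustration only (not data from the study):
# "measure_a" varies along a single direction, "measure_b" along two.
toy = {
    "measure_a": [[0, 0, 0], [1, 2, 3], [2, 4, 6]],
    "measure_b": np.eye(3),
}
best, scores = pick_most_discriminating(toy, n_axes=1)
print(best, scores)
```

A higher ratio means the leading axes summarize more of the structure in the measure's output, which is the sense in which the abstract calls one dissimilarity more discriminating than another.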
Building on these findings, we will combine aspects of the news articles with other factors to develop tools that help content creators reach their audience segments more effectively. Our results will help explain the virality of Arabic news and, ultimately, improve the readership and dissemination of Arabic-language news articles.