Qatar Foundation Annual Research Conference Proceedings Volume 2018 Issue 3
- Conference date: 19-20 Mar 2018
- Location: Qatar National Convention Center (QNCC), Doha, Qatar
- Volume number: 2018
- Published: 15 March 2018
Coexistence of IEEE 802.15.4g and WLAN
Authors: Saad Mohamed Shalbia, Ridha Hamila and Naofal Al-Dhahir

The aging electric grid was established a hundred years ago, when electricity needs were simple: power plants were centralized and most homes had only small energy demands such as a few lamps and radios. The grid's role was to carry electricity from utilities to consumers' homes. This limited, one-way interaction makes it difficult for the grid to respond to the sudden changes and higher energy demands of the 21st century. The smart grid (SG) is a two-way network that allows the utility and its customers to exchange electricity and information over the same network by installing real-time sensors that collect data about ever-changing power consumption. It is an integrated network of communications, automated control, computers, and tools operating together to make the grid more efficient, reliable, secure and greener. The SG integrates additional technologies such as wind power, solar energy and plug-in electric vehicles (PEVs). As the SG replaces the aging electric grid, homes and utilities can better communicate with each other to manage electricity usage by measuring the consumer's consumption instantaneously through smart meters. The SG infrastructure also enables efficient integration of PEVs, which may play an important role in balancing the grid during critical peak or emergency periods by injecting power back into it. This two-way dialogue allows plug-in hybrids (PHEVs) to communicate with the grid and obtain information about grid demand, deciding whether to supply the grid with power or to charge their batteries from it. All of this requires a modern wireless communication network. IEEE 802.15.4g was introduced as a standard for smart utility networks (SUNs) to enable communication between different parts of the SG. IEEE 802.15.4g operates in several frequency bands; our work concentrates on the 2.4 GHz ISM (Industrial, Scientific and Medical) band, which is unlicensed and overcrowded with devices from other standards, e.g., ZigBee, Bluetooth and wireless local area networks (WLAN). The desired SUN signal may overlap with interfering signals operating in the same band, hindering the receiver's ability to extract the proper signal; this is called the coexistence problem. In this contribution, the coexistence mechanism is therefore investigated thoroughly in order to improve performance. The SUN has been studied considering signal attenuation due to path loss in the presence of additive interference and white Gaussian noise at the receiver. The effect of packet length on packet error rate (PER) is examined to find the optimum packet length that achieves the maximum effective throughput for the network in coexistence with interfering WLAN packets. Employing a longer packet length reduces the relative overhead and can increase effective throughput, but it also leads to a higher PER, since more interferers collide with the desired packet. Conversely, a shorter packet length provides a lower PER but higher overhead due to the packet header and preamble, reducing the throughput. Simulations showed that, as the signal-to-interference-plus-noise ratio (SINR) increases, longer packet lengths can be used to achieve maximum throughput. Moreover, a multipath Rayleigh fading channel has also been introduced, along with minimum mean square error (MMSE) equalization as an interference mitigation technique. Simulations showed that MMSE achieves good performance and improves the PER in coexistence with an interfering WLAN system.
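The packet-length tradeoff described above can be illustrated with a toy model; the overhead size and the bit-error probabilities below are illustrative assumptions standing in for the SINR-dependent link quality, not the paper's simulation parameters.

```python
import numpy as np

def per(payload_bytes, overhead_bytes, bit_error_prob):
    """Packet error rate assuming independent bit errors (illustrative model only)."""
    total_bits = 8 * (payload_bytes + overhead_bytes)
    return 1.0 - (1.0 - bit_error_prob) ** total_bits

def effective_throughput(payload_bytes, overhead_bytes, bit_error_prob):
    """Fraction of useful bits delivered: payload share times packet success probability."""
    share = payload_bytes / (payload_bytes + overhead_bytes)
    return share * (1.0 - per(payload_bytes, overhead_bytes, bit_error_prob))

# Sweep payload lengths for two interference levels; a lower bit error probability
# plays the role of a higher SINR, which pushes the optimum toward longer packets.
overhead = 24                      # header + preamble bytes (assumed)
for p_b in (1e-5, 1e-4):
    lengths = np.arange(16, 2048, 16)
    tput = [effective_throughput(L, overhead, p_b) for L in lengths]
    best = lengths[int(np.argmax(tput))]
    print(f"bit error prob {p_b:.0e}: best payload = {best} bytes")
```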
Data Privacy in Online Social Networks With Fine-Grained Access Control
Authors: Ahmed Khalil Abdulla and Dr. Spiridon Bakiras

Online Social Networks (OSNs), such as Facebook and Twitter, are popular platforms that enable users to interact and socialize through their networked devices. However, the social nature of such applications forces users to share a great amount of personal data with other users and with the OSN service providers, including pictures, location check-ins, etc. Even though some OSNs offer configurable privacy controls that limit access to shared data, users might misconfigure these controls due to their complexity or a lack of clear instructions. Furthermore, the fact that OSN service providers have full access to the data stored on their servers is an alarming thought, especially for users who are conscious about their privacy. For example, OSNs might share such data with third parties, mine them for targeted advertisements, collect statistics, etc. As a result, data and communication privacy over OSNs is a popular topic in the data privacy research community. Existing solutions include cryptographic mechanisms [1], trusted third parties [2], external dictionaries [3], and steganographic techniques [4]. Nevertheless, none of the aforementioned approaches offers a comprehensive solution that (i) implements fine-grained access control over encrypted data and (ii) works seamlessly over existing OSN platforms. To this end, we will design and implement a flexible and user-friendly system that leverages encryption-based access control and allows users to assign arbitrary decryption privileges to every data object that is posted on the OSN servers. The decryption privileges can be assigned at the finest granularity level, for example, to a hand-picked group of users. In addition, data decryption is performed automatically at the application layer, thus enhancing the overall experience for the end-user. Our cryptographic solution leverages hidden vector encryption (HVE) [5], which is a ciphertext-policy-based access control mechanism. Under HVE, each user generates his/her own master key (one time) that is subsequently used to generate a unique decryption key for every user with whom they share a link in the underlying social graph. Moreover, during the encryption process, the user interactively selects a list of friends and/or groups that will be granted decryption privileges for that particular data object. To distribute the decryption keys, we utilize an untrusted database server where users have to register before using our system. The server stores (i) the social relationships of the registered users, (ii) their public keys, and (iii) the HVE decryption keys assigned to each user. As the database server is untrusted, the decryption keys are stored in encrypted form, i.e., they are encrypted with the public key of the underlying user. Therefore, our solution relies on the existing public key infrastructure (PKI) to ensure the integrity and authenticity of the users' public keys. To facilitate the deployment of our system over existing OSN platforms, we use steganographic techniques [6] to hide the encrypted data objects within randomly chosen cover images (stego images). The stego images are then uploaded to the OSN servers, and only authorized users (with the correct decryption keys) are able to extract the embedded data. Unauthorized users will simply see the random cover images.
We aim to implement our system as a Chrome-based browser extension where, after installation, the user registers with the untrusted server and uploads/downloads the necessary decryption keys. The keys are also stored locally, in order to provide a user-friendly interface for sharing private information. Specifically, our system will offer a seamless decryption process, where all hidden data objects are displayed automatically while surfing the OSN platform, without any user interaction.

References
[1] S. Jahid, P. Mittal, and N. Borisov, “EASiER: encryption-based access control in social networks with efficient revocation,” in Proc. ACM Symposium on Information, Computer and Communications Security (ASIACCS), pp. 411–415, 2011.
[2] A. Tootoonchian, S. Saroiu, Y. Ganjali, and A. Wolman, “Lockr: better privacy for social networks,” in Proc. ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT), Rome, Italy, pp. 169–180, 2009.
[3] S. Guha, K. Tang, and P. Francis, “NOYB: privacy in online social networks,” in Proc. Workshop on Online Social Networks (WOSN), pp. 49–54, 2008.
[4] J. Ning, I. Singh, H. V. Madhyastha, S. V. Krishnamurthy, G. Cao, and P. Mohapatra, “Secret message sharing using online social media,” in Proc. IEEE Conference on Communications and Network Security (CNS), pp. 319–327, 2014.
[5] T. V. X. Phuong, G. Yang, and W. Susilo, “Efficient hidden vector encryption with constant-size ciphertext,” in Proc. European Symposium on Research in Computer Security (ESORICS), pp. 472–487, 2014.
[6] S. Kaur, S. Bansal, and R. K. Bansal, “Steganography and classification of image steganography techniques,” in Proc. International Conference on Computing for Sustainable Global Development (INDIACom), 2014.
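As an illustration of the steganographic step, the sketch below hides an opaque ciphertext in the least-significant bits of a cover image. The paper cites [6] for steganography but does not prescribe a particular embedding scheme, so LSB embedding and the embed/extract helpers are our own assumption, not the system's actual method.

```python
import numpy as np

def embed(cover: np.ndarray, payload: bytes) -> np.ndarray:
    """Hide payload bytes (length-prefixed) in the LSBs of an 8-bit grayscale image."""
    bits = np.unpackbits(np.frombuffer(len(payload).to_bytes(4, "big") + payload,
                                       dtype=np.uint8))
    flat = cover.flatten()
    if bits.size > flat.size:
        raise ValueError("cover image too small for payload")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits   # overwrite LSBs
    return flat.reshape(cover.shape)

def extract(stego: np.ndarray) -> bytes:
    """Recover the payload by reading the LSBs back."""
    flat = stego.flatten()
    length = int.from_bytes(np.packbits(flat[:32] & 1).tobytes(), "big")
    bits = flat[32:32 + 8 * length] & 1
    return np.packbits(bits).tobytes()

cover = np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)  # random cover image
ciphertext = b"HVE ciphertext bytes would go here"                   # output of HVE encryption
assert extract(embed(cover, ciphertext)) == ciphertext
```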
Resilient Output Feedback Control of Cyber-Physical Systems
By Nader Meskin

Cyber-physical system (CPS) architectures are being used in many different applications such as power systems, transportation systems, process control systems, large-scale manufacturing systems, ecological systems, and health-care systems. Many of these applications involve safety-critical systems, and hence any failure or cyber attack can cause catastrophic damage to the physical system being controlled, resulting in drastic societal ramifications. Due to the open communication and computation platform architectures of CPS, one of the most important challenges in these systems is their vulnerability to malicious cyber attacks. Cyber attacks can severely compromise system stability, performance, and integrity. In particular, malicious attacks in feedback control systems can compromise sensor measurements as well as actuator commands to severely degrade closed-loop system performance and integrity. Cyber attacks are continuously becoming more sophisticated and intelligent, and hence it is vital to develop algorithms that can suppress their effects on cyber-physical systems. In this paper, an output feedback adaptive control architecture is presented to suppress or counteract the effect of false data injection actuator attacks in linear systems, where it is assumed that the attacker is capable of maliciously manipulating the controller commands to the actuators. In particular, the proposed controller is composed of two components, namely a nominal controller and an additive corrective signal. It is assumed that the nominal controller has already been designed and implemented to achieve a desired closed-loop nominal performance. Using the nominal controller, an additive adaptive corrective signal is designed and added to the output of the nominal controller in order to suppress the effect of the actuator attacks. Thus, in the proposed control architecture, there is no need to redesign the nominal controller; only the adaptive corrective signal is designed, using the available information from the nominal controller and the system.
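A minimal sketch of the idea of augmenting a fixed nominal controller with an additive adaptive corrective signal is given below. It uses a scalar plant with full state feedback and a Lyapunov-style adaptation law; the plant, gains and adaptation rule are illustrative assumptions, not the paper's output-feedback design.

```python
import numpy as np

# Scalar plant x' = a*x + b*(u + d), where d is a false-data-injection actuator attack.
a, b = 0.5, 1.0          # open-loop unstable plant (assumed values)
k = 2.0                  # pre-designed nominal gain: u_nom = -k*x, so a - b*k < 0
gamma = 5.0              # adaptation gain (tuning assumption)
dt, T = 1e-3, 10.0

x, d_hat = 1.0, 0.0
for step in range(int(T / dt)):
    t = step * dt
    d = 2.0 if t > 2.0 else 0.0           # constant attack injected at t = 2 s
    u_nom = -k * x                        # unchanged nominal controller
    u_corr = -d_hat                       # additive adaptive corrective signal
    x += dt * (a * x + b * (u_nom + u_corr + d))
    d_hat += dt * (gamma * b * x)         # Lyapunov-based update of the attack estimate
print(f"final state {x:.4f}, attack estimate {d_hat:.4f} (true attack 2.0)")
```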
On Dependability, Traffic Load and Energy Consumption Tradeoff in Data Center Networks
Authors: Zina Chkirbene, Ala Gouissem, Ridha Hamila and Sebti Foufou

Mega data centers (DCs) are considered efficient and promising infrastructures for supporting numerous cloud computing services such as online office, online social networking, Web search and IT infrastructure outsourcing. The scalability of these services is influenced by the performance and dependability characteristics of the DCs. Consequently, DC networks are constructed with a large number of network devices and links in order to achieve high performance and reliability. As a result, these requirements increase the energy consumption in DCs. In fact, a 2010 estimate put the total energy consumed by DCs in 2012 at about 120 billion kilowatt-hours of electricity, which is about 2.8% of the total electricity bill in the USA. According to industry estimates, the USA data center market reached almost US$ 39 billion in 2009, growing from US$ 16.2 billion in 2005. One of the primary reasons behind this issue is that all the links and devices are always powered on regardless of the traffic status. Statistics show that the traffic fluctuates drastically, especially between mornings and nights, and also between working days and weekends. Thus, the network utilization depends on the time period, and generally the peak capacity of the network is reached only at rush times. This non-proportionality between traffic load and energy consumption is caused by the fact that, most of the time, only a subset of the network devices and links is enough to forward the data packets to their destinations, while the remaining idle nodes are simply wasting energy. Such observations inspired us to propose a new approach that powers off the unused links by deactivating the end-ports of each of them to save energy. The deactivation of ports has been proposed in many studies; however, these solutions suffer from high computational complexity, network delay and reduced network reliability. In this paper, we propose a new approach to reduce the power consumption in DCs. By exploiting the correlation in time of the network traffic, the proposed approach uses the traffic matrix of the current network state and manages the state of switch ports (on/off) at the beginning of each period, while making sure to keep the data center fully connected. During the rest of each time period, the network must be able to forward its traffic through the active ports. The decision to close or open a port depends on a predefined threshold value; a port is closed only if the sum of the traffic generated by its connected node is less than the threshold. We also investigate the minimum period of time during which a port should not change its status. This minimum period is necessary given that it takes time and energy to switch a port on and off. One of the major challenges in this work is powering off the idle devices for more energy saving while guaranteeing the connectivity of each server. We therefore propose a new traffic-aware algorithm that presents a tradeoff between energy saving and reliability satisfaction. For instance, in HyperFlatNet, simulation results show that the proposed approach reduces the energy consumption by 1.8×10^4 WU (watts per unit of time) for a correlated network with 1000 servers (38% energy saving).
In addition, thanks to the proposed traffic-aware algorithm, the new approach shows good performance even at a high failure rate (up to 30%): even when one third of the links fail, the connection failure rate is only 0.7%. Both theoretical analysis and simulation experiments are conducted to evaluate and verify the performance of the proposed approach compared to state-of-the-art techniques.
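The threshold-based port deactivation step, including the connectivity guarantee, can be sketched as follows; the toy topology, loads and greedy least-loaded-first order are illustrative assumptions, not the paper's algorithm.

```python
from collections import defaultdict

def connected(edges, nodes):
    """Breadth-first search: are all `nodes` reachable using only `edges`?"""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == set(nodes)

def deactivate_ports(links, traffic, threshold):
    """A link is a candidate for power-off when its load is below the threshold;
    it is actually switched off only if every server stays connected."""
    nodes = {n for link in links for n in link}
    active = set(links)
    for link in sorted(links, key=lambda l: traffic[l]):   # least-loaded links first
        if traffic[link] < threshold and connected(active - {link}, nodes):
            active -= {link}
    return active

# Toy 4-node ring plus one chord; the loads are arbitrary illustrative numbers.
links = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
traffic = {("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 4, ("d", "a"): 2, ("a", "c"): 1}
print(deactivate_ports(links, traffic, threshold=3))
```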
Visualization of Electricity Consumption in Qatar
Authors: Salma Tarek Shalaby, Engy Khaled Soliman and Noora Fetais

The amount of raw data related to electricity consumption is increasing rapidly with the growth of construction sites and population and with Qatar's preparation for the 2022 World Cup. With this increase, managers will find it difficult to study the data and keep track of consumption. Taking actions and planning for the future therefore becomes a hard task, and the decisions taken might not be beneficial because of a misunderstanding of the data. In this project, a customized web application is developed to visualize the data on an interactive map. The idea behind the project is to help decision makers take actions efficiently and easily based on the visualized data; it thus supports Qatar's 2030 vision of saving time and electricity. Instead of reading big tables with huge, incomprehensible numbers, the application visualizes the average consumption directly on the map. It also provides different chart types to help the user compare the data and consequently take the right decision. The rapid increase of data challenges the ability to use such data in decision-making; the challenge also extends to the ability to avoid getting lost in these big numbers. Reading such data and trying to analyze it can be wasteful in terms of time and money. Moreover, it can cut down industrial and scientific opportunities. The current solution in Qatar for electricity consumption analysis is Microsoft Excel. The stakeholders only use professional software for operational purposes, not for analyzing the data. As a result, they see only what they asked for and lose any opportunity for deeper insight into these data. Visual analytics is a powerful tool that makes processes visible and transparent, providing a means of communicating about them rather than merely presenting results. Data visualization is an effective tool for communication regardless of the communicators' expertise. It is also viewed as an important analytical tool for effective research communication. It is not limited to the display of raw data sets, but covers all static and interactive visual representations of research data, which may include interactive charts, queries, etc. Combining the visualization aspect with the analytics of big data will significantly help resolve the problem of reading the electricity consumption data. One of the project's goals is to improve the readability of data insights and unlock the power of data visualization; the data presentation element is where alternative representations will be used to test the persuasive power of visualization. The project aims to make data understandable using data visualization techniques. It provides several features such as an interactive map that indicates the average consumption. The zooming levels are divided into three levels: 1) the whole country, 2) municipalities, and 3) areas. In addition, the data is visualized using different graph types: line graphs, pie charts, bar charts and others. This helps managers and decision makers effectively analyze the data and compare different areas and the average consumption over the years. Furthermore, it provides different utilities such as emailing the results, printing, saving and showing the data table.
Saffara: Intelligent queuing application for improving clinical workflow
This paper examines the impact on patient experience of creating a bespoke patient queuing and communication application using in-house developed technologies. Sidra Medicine hospital's outpatient pharmacy was experiencing mismanaged queue lines, dissatisfied patients, and a lack of the data necessary to determine the length of time elapsing in obtaining medication. After analyzing patient surveys through sentiment analysis and the generation of word clouds, we validated that there was scope for workflow improvement in the pharmacy department. The Center for Medical Innovation, Software, and Technology (CMIST) department was commissioned to develop the software application necessary to deliver efficiency and improvement in response to the lack of a queuing and communication system. The use of an in-house development team to create an application for queuing and communication, as opposed to selecting a popular vendor software, resulted in many advantages. Chief among them, the requirements of the pharmacy were delivered through rapid customization and in multiple iterations, in response to ever-changing customer demand. By using the Scrum methodology, the team was able to deliver the application, called Saffara, for managing queues in the pharmacy and improving patient experience while obtaining medication. The Saffara application has the unique feature of being integrated with the hospital's EMR (Electronic Medical Record) system while ensuring confidentiality, efficiency and time saving. The application integrates with the hospital's EMR to obtain patient information, appointment times and prescribed medication. This integration allowed for the identification of patients' progress and the calculation of patients' wait times. Patients are automatically notified when their medication is ready for collection through system-generated SMS texts. The application also utilizes a notification display for communication with patients as part of our business continuity procedure. In addition to notifying the patient, the Saffara application also generates detailed analytical reports for each hour and for each patient, which allows us to analyze the bottlenecks in the clinical workflow. We present these analytics to stakeholders through a web dashboard and detailed web-based reports in our application. The pharmacy stakeholders, i.e., the pharmacy management team, utilize the dashboards and the quantitative data in the reports to predict staffing levels and optimize patient medication delivery. In this paper, we present the methods we use to calculate useful analytics such as patient wait times across different stages in the workflow and the hourly breakdown of patients being served. We will also discuss how we reduced patient wait times by adding unique features to a queuing application, such as automation of steps in the pharmacy workflow through the generation of patient identifiers and automatic ticket tracking. This paper will also highlight how we are scaling our application from the pharmacy to all clinics of the hospital. The goal of the application is to provide a consistent experience for patients in all clinics as well as a consistent way for staff to gather and analyze data for workflow improvement. Our future work is to explore how we can use machine learning to identify the parameters that play a vital role in wait times as well as patient experience.
The objective of this paper is to highlight how our technology brings together patient experience and staff workflow enhancements to deliver improvement in a clinical workflow setting.
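A minimal sketch of the wait-time analytics described above is given below; the event schema, stage names and timestamps are hypothetical, not Sidra's actual EMR or Saffara fields.

```python
import pandas as pd

# Hypothetical event log: one row per queue-stage transition for each ticket.
events = pd.DataFrame({
    "ticket":    ["T1", "T1", "T1", "T2", "T2", "T2"],
    "stage":     ["checked_in", "ready", "collected"] * 2,
    "timestamp": pd.to_datetime(["2018-03-19 09:00", "2018-03-19 09:25",
                                 "2018-03-19 09:40", "2018-03-19 09:10",
                                 "2018-03-19 09:50", "2018-03-19 10:05"]),
})

# Wait time per ticket between consecutive stages of the pharmacy workflow.
wide = events.pivot(index="ticket", columns="stage", values="timestamp")
wide["prep_wait_min"] = (wide["ready"] - wide["checked_in"]).dt.total_seconds() / 60
wide["pickup_wait_min"] = (wide["collected"] - wide["ready"]).dt.total_seconds() / 60
print(wide[["prep_wait_min", "pickup_wait_min"]])

# Hourly breakdown of patients served (tickets reaching "collected" in each hour).
served = events[events["stage"] == "collected"]
print(served.groupby(served["timestamp"].dt.hour).size())
```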
Advances in Data-Based Process Monitoring and Applications
Authors: M. Ziyan Sheriff, Hazem Nounou, M. Nazmul Karim and Mohamed Nounou

Many processes utilize statistical process monitoring (SPM) methods in order to ensure that process safety and product quality are maintained. Principal Component Analysis (PCA) is a data-based modeling and fault detection technique that is widely used in industry [1]. PCA is a dimensionality reduction technique that transforms multivariate data into a new set of variables, called principal components, which capture most of the variation in the data in a small number of variables. This work examines different improved PCA-based monitoring techniques, discusses their advantages and drawbacks, and provides solutions to address the issues faced by these techniques. Most data-based monitoring techniques are known to rely on three fundamental assumptions: that fault-free data are not contaminated with excessive noise, are decorrelated (independent), and follow a normal distribution [2]. However, in reality, most processes violate one or more of these assumptions. Multiscale wavelet-based data representation is a powerful data analysis tool: wavelet coefficients possess characteristics that are inherently able to satisfy these assumptions, since they denoise the data and force them to be approximately normally distributed and decorrelated at multiple scales. Multiscale representation has been utilized to develop a multiscale principal component analysis (MSPCA) method for improved fault detection [3]. In a previous work, we also analyzed the performance of multiscale charts under violation of the main assumptions, demonstrating that multiscale methods provide lower missed detection rates and ARL1 values when compared to conventional charts, with comparable false alarm rates [2]. The choice of wavelet, the choice of decomposition depth, and the Gibbs phenomenon are a few issues faced by multiscale representation, and these will be discussed in this work. Another common drawback of most conventional monitoring techniques used in industry is that they are only capable of efficiently handling linear data [4]. The kernel principal component analysis (KPCA) method is a simple improvement to the PCA model that enables nonlinear data to be handled. KPCA relies on transforming data from the time domain to a higher-dimensional space where linear relationships can be drawn, making PCA applicable [5]. From a fault detection standpoint, KPCA suffers from a few issues that require discussion, i.e., the importance of the choice of kernel, the kernel parameters, and the procedures required to bring the data back to the time domain, also known as the pre-image problem in the literature [6]. Therefore, this work also provides a discussion of these concerns. Recently, the literature has shown that hypothesis-testing methods, such as Generalized Likelihood Ratio (GLR) charts, can provide improved fault detection performance [7]. This is accomplished by utilizing a window of previous observations in order to compute the maximum likelihood estimates (MLEs) of the mean and variance, which are then used to maximize the likelihood functions in order to detect shifts in the mean and variance [8], [9]. Although utilizing a larger window length to compute the MLEs has been shown to reduce the missed detection rate and ARL1 values, a larger window length also increases both the false alarm rate and the computational time required for the GLR statistic.
Therefore, an approach to select the window length parameter while keeping all fault detection criteria in mind is required; such an approach will be presented and discussed. The individual techniques described above have their own advantages and limitations. Another goal of this work is to develop new algorithms, through the efficient combination of the different SPM techniques, to improve fault detection performance. Illustrative examples using real-world applications will be presented in order to demonstrate the performance of the developed techniques as well as their applicability in practice.

References
[1] I. T. Joliffe, Principal Component Analysis, 2nd ed. New York, NY: Springer-Verlag, 2002.
[2] M. Z. Sheriff and M. N. Nounou, “Improved fault detection and process safety using multiscale Shewhart charts,” J. Chem. Eng. Process Technol., vol. 8, no. 2, pp. 1–16, 2017.
[3] B. Bakshi, “Multiscale PCA with application to multivariate statistical process monitoring,” AIChE J., vol. 44, no. 7, pp. 1596–1610, Jul. 1998.
[4] M. Z. Sheriff, C. Botre, M. Mansouri, H. Nounou, M. Nounou, and M. N. Karim, “Process Monitoring Using Data-Based Fault Detection Techniques: Comparative Studies,” in Fault Diagnosis and Detection, InTech, 2017.
[5] J.-M. Lee, C. Yoo, S. W. Choi, P. A. Vanrolleghem, and I.-B. Lee, “Nonlinear process monitoring using kernel principal component analysis,” Chem. Eng. Sci., vol. 59, no. 1, pp. 223–234, 2004.
[6] G. H. Bakır, J. Weston, and B. Schölkopf, “Learning to Find Pre-Images,” Adv. Neural Inf. Process. Syst. 16, pp. 449–456, 2004.
[7] M. Z. Sheriff, M. Mansouri, M. N. Karim, H. Nounou, and M. Nounou, “Fault detection using multiscale PCA-based moving window GLRT,” J. Process Control, vol. 54, pp. 47–64, Jun. 2017.
[8] M. R. Reynolds and J. Y. Lou, “An Evaluation of a GLR Control Chart for Monitoring the Process Mean,” J. Qual. Technol., vol. 42, no. 3, pp. 287–310, 2010.
[9] M. R. Reynolds, J. Lou, J. Lee, and S. A. I. Wang, “The Design of GLR Control Charts for Monitoring the Process Mean and Variance,” vol. 45, no. 1, pp. 34–60, 2013.
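For reference, a minimal numpy sketch of baseline PCA monitoring with the usual Hotelling's T2 and Q (SPE) statistics is shown below; control limits and the multiscale, kernel and GLR extensions discussed above are omitted, and the training data and injected fault are synthetic.

```python
import numpy as np

def pca_model(X_train, n_pc):
    """Fit a PCA model on fault-free training data (rows = samples)."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    Z = (X_train - mu) / sigma
    eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigval)[::-1]
    return mu, sigma, eigval[order][:n_pc], eigvec[:, order][:, :n_pc]

def monitoring_stats(x, mu, sigma, eigval, P):
    """Hotelling's T2 (variation inside the PC subspace) and Q/SPE (residual)."""
    z = (x - mu) / sigma
    t = P.T @ z                          # scores on the retained components
    T2 = float(np.sum(t**2 / eigval))
    residual = z - P @ t
    return T2, float(residual @ residual)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # correlated normal data
mu, sigma, eigval, P = pca_model(X_train, n_pc=2)
x_faulty = X_train[0] + np.array([0, 0, 4, 0, 0])               # injected sensor bias
print(monitoring_stats(X_train[1], mu, sigma, eigval, P))        # normal sample
print(monitoring_stats(x_faulty, mu, sigma, eigval, P))          # faulty sample
```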
Haralick feature extraction from time-frequency images for automatic detection and classification of audio anomalies for road surveillance
In this paper, we propose a novel method for the detection of road accidents by analyzing audio streams for road surveillance applications. In the last decade, due to the increase in the number of people and transportation vehicles, traffic accidents have become one of the major public issues worldwide. The vast number of injuries and deaths due to road traffic accidents reveals the story of a global road safety crisis. A total of 52,160 road traffic accidents (RTA), 1130 injuries and 85 fatalities were registered during the year 2000 in the state of Qatar. The increase in the number of transportation vehicles around cities has raised the need for more security and safety in public environments. The most obvious reason for a person's death during an accident is the absence or prolonged response time of first aid, which is due to the delay in information about the accident reaching the police, hospital or ambulance team. In the last couple of years, several surveillance systems based on image and video processing have been proposed for automatically detecting road accidents and car crashes to ensure a quick response by emergency teams. However, in some situations, such as adverse weather conditions or cluttered environments, the visual information alone is not sufficient, whereas analyzing the audio tracks can significantly improve the overall reliability of surveillance systems. In this paper we propose a novel method that automatically identifies hazardous situations such as tire skidding and car crashes in the presence of background noise by analyzing the audio streams. Previous studies show that methods for the detection, estimation, and classification of nonstationary signals can be enhanced by utilizing the time-frequency (TF) characteristics of such signals. TF-based techniques have been proved to outperform classical techniques based on either the time or the frequency domain in analyzing real-life nonstationary signals. Time-frequency distributions (TFDs) give additional information about signals that cannot be extracted from either the time-domain or frequency-domain representations, e.g., the instantaneous frequency of the components of a signal. In order to utilize this extra information provided by the TF-domain representation, the proposed approach extracts TF image features from quadratic time-frequency distributions (QTFDs) for the detection of audio anomalies. The extended modified-B distribution (EMBD) is utilized to transform a one-dimensional audio signal into a two-dimensional TF representation, which is interpreted as an image. Image-descriptor-based features are then extracted from the TF representation to classify the audio signals into background or abnormal activity patterns in the TF domain. The proposed features are based on Haralick's texture features extracted from the TF representation of the audio signals, considered and processed as a textured image. These features are used to characterize and hence classify audio signals into M classes. This research study demonstrates that a TF image pattern recognition approach offers significant advantages over standard signal classification methods that utilize either t-domain-only or f-domain-only features. The proposed method has been experimentally validated on a large open-source database of sounds, including several kinds of background noise. The events to be recognized are superimposed on different background sounds of roads and traffic jams.
The obtained results are compared with a recent study that uses the same large and complex data set of audio signals and the same experimental setup. The overall classification results confirm the superior performance of the proposed approach, with an accuracy improvement of up to 6%.
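The feature-extraction pipeline can be sketched as follows. A plain spectrogram stands in for the EMBD quadratic TFD, and the gray-level quantization and GLCM settings are assumptions; note that older scikit-image versions spell these functions greycomatrix/greycoprops.

```python
import numpy as np
from scipy.signal import spectrogram
from skimage.feature import graycomatrix, graycoprops

def tf_haralick_features(audio, fs, levels=32):
    """Map a 1-D audio signal to a TF image and extract Haralick-style texture features."""
    _, _, S = spectrogram(audio, fs=fs, nperseg=256, noverlap=128)
    img = 10 * np.log10(S + 1e-12)                         # log-power TF image
    img = (img - img.min()) / (img.max() - img.min())      # normalize to [0, 1]
    img = (img * (levels - 1)).astype(np.uint8)            # quantize gray levels
    glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("contrast", "homogeneity", "energy", "correlation")])

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
background = 0.1 * np.random.randn(t.size)
skid = background + np.sin(2 * np.pi * (2000 + 1500 * t) * t)   # crude chirp as a "skid"
print(tf_haralick_features(background, fs))
print(tf_haralick_features(skid, fs))
```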
Annotation Guidelines for Text Analytics in Social Media
Authors: Wajdi Zaghouani and Anis Charfi

A person's language use reveals much about their profile; however, research in author profiling has always been constrained by the limited availability of training data, since collecting textual data with the appropriate meta-data requires a large collection and annotation effort (Maamouri et al. 2010; Diab et al. 2008; Hawwari et al. 2013). For every text, the characteristics of the author have to be known in order to successfully profile the author. Moreover, when the text is written in a dialectal variety, such as the Arabic text found online in social media, a representative dataset needs to be available for each dialectal variety (Zaghouani et al. 2012; Zaghouani et al. 2016). The existing Arabic dialects are historically related to Classical Arabic and they co-exist with Modern Standard Arabic in a diglossic relation. While standard Arabic has a clearly defined set of orthographic standards, the various Arabic dialects have no official orthographies, and a given word can be written in multiple ways in different Arabic dialects (Maamouri et al. 2012; Jeblee et al. 2014). This abstract presents the guidelines and annotation work carried out within the framework of the Arabic Author Profiling project (ARAP), a project that aims at developing author profiling resources and tools for a set of 12 regional Arabic dialects. We harvested our data from social media, which reflects a natural and spontaneous writing style in dialectal Arabic from users in different regions of the Arab world. For the Arabic language and its dialectal varieties as found in social media, to the best of our knowledge, there is no corpus available for the detection of age, gender, native language and dialectal variety. Most of the existing resources are available for English or other European languages. Having a large amount of annotated data remains the key to reliable results in the task of author profiling. In order to start the annotation process, we created guidelines for the annotation of the tweets according to their dialectal variety, the native language, the gender of the user and the age. Before starting the annotation process, we hired and trained a group of annotators and we implemented a smooth annotation pipeline to optimize the annotation task. Finally, we followed a consistent annotation evaluation protocol to ensure a high inter-annotator agreement. The annotations were done by carefully analyzing each of the users' profiles and their tweets, and, when possible, we instructed the annotators to use external resources such as LinkedIn or Facebook. We created general profile validation guidelines and task-specific guidelines to annotate the users according to their gender, age, dialect and native language. For some accounts, the annotators were not able to identify the gender, as this was based in most cases on the name of the person or the profile photo, and in some cases on the biography or profile description.
In case this information is not available, we instructed the annotators to read the user's posts and find linguistic indicators of the gender of the user. Like many other languages, Arabic conjugates verbs through numerous prefixes and suffixes, and the gender is sometimes clearly marked, as in the case of verbs ending in taa marbuTa, which usually indicate feminine gender. In order to annotate the users for their age, we used three categories: under 20 years, between 20 and 40 years, and 40 years and up. In our guidelines, we asked our annotators to try their best to annotate the exact age; for example, they can check the education history of the users in their LinkedIn and Facebook profiles and find when they graduated from high school in order to estimate the age of the users. As the dialect and the regions are known in advance to the annotators, we instructed them to double-check and mark the cases where the profile appears to be from a different dialect group. This is possible despite our initial filtering based on distinctive regional keywords. We noticed that in more than 90% of cases the profiles selected belong to the specified dialect group. Moreover, we asked the annotators to mark and identify Twitter profiles with a native language other than Arabic, so that they are considered Arabic L2 speakers. In order to help the annotators identify those, we instructed them to look for various cues such as the writing style, the sentence structure, the word order and the spelling errors.

Acknowledgements
This publication was made possible by NPRP grant #9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

References
Diab, Mona, Aous Mansouri, Martha Palmer, Olga Babko-Malaya, Wajdi Zaghouani, Ann Bies, and Mohammed Maamouri. A Pilot Arabic Propbank. LREC 2008, Marrakech, Morocco, May 28-30, 2008.
Hawwari, A.; Zaghouani, W.; O'Gorman, T.; Badran, A.; Diab, M. “Building a Lexical Semantic Resource for Arabic Morphological Patterns,” Communications, Signal Processing, and their Applications (ICCSPA), pp. 1-6, 12-14 Feb. 2013.
Jeblee, Serena, Houda Bouamor, Wajdi Zaghouani, and Kemal Oflazer. CMUQ@QALB-2014: An SMT-based System for Automatic Arabic Error Correction. In Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), Doha, Qatar, October 2014.
Maamouri, Mohamed, Ann Bies, Seth Kulick, Wajdi Zaghouani, Dave Graff, and Mike Ciul. 2010. From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News. In Proceedings of LREC 2010, Valetta, Malta, May 17-23, 2010.
Maamouri, Mohammed, Wajdi Zaghouani, Violetta Cavalli-Sforza, Dave Graff, and Mike Ciul. 2012. Developing ARET: An NLP-based Educational Tool Set for Arabic Reading Enhancement. In Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications, NAACL-HLT 2012, Montreal, Canada.
Obeid, Ossama, Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Kemal Oflazer, and Nadi Tomeh. A Web-based Annotation Framework For Large-Scale Text Correction. In Proceedings of IJCNLP 2013, Nagoya, Japan.
Zaghouani, Wajdi, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, and Kemal Oflazer. 2016. Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2016).
Zaghouani, Wajdi, Abdelati Hawwari, and Mona Diab. 2012. A Pilot PropBank Annotation for Quranic Arabic.
In Proceedings of the first workshop on Computational Linguistics for Literature, NAACL-HLT 2012, Montreal, Canada.
Towards Open-Domain Cross-Language Question Answering
Authors: Ines Abbes, Alberto Barrón-Cedeño and Mohamed Jemni

We present MATQAM (Multilingual Answer Triggering Question Answering Machine), a multilingual, answer-triggering, open-domain QA system focused on answering questions whose answers might be found in free text in multiple languages within Wikipedia. Obtaining relevant information from the Web has become more challenging, since online communities and social media tend to confine people to bounded trends and ways of thinking. Due to the large amount of data available, getting the relevant information has become a more challenging task. Unlike standard Information Retrieval (IR), Question Answering (QA) systems aim at retrieving the relevant answer(s) to a question expressed in natural language, instead of returning a list of documents. On the one hand, information is dispersed across different languages and needs to be gathered to obtain more knowledge. On the other hand, extracting answers from multilingual documents is a complicated task because natural languages follow diverse linguistic syntaxes and rules, especially Semitic languages such as Arabic. This project tackles open-domain QA using Wikipedia as the source of knowledge by building a multilingual (Arabic, French, English) QA system. In order to obtain a collection of Wikipedia articles as well as questions in multiple languages, we extended an existing English dataset: WikiQA (Yang et al., 2015). We used the WikiTailor toolkit (Barrón-Cedeño et al., 2015) to build a comparable corpus from Wikipedia articles and to extract the corresponding articles in Arabic, French, and English. We used neural machine translation to generate the questions in the three languages as well. Our QA system consists of the following three modules. (i) Question processing transforms a natural language question into a query and determines the expected type of the answer in order to define the retrieval mechanism for the extraction function. (ii) The document retrieval module retrieves the most relevant documents from the search engines, in multiple languages, given the produced query. The purpose of this module is to identify the documents that may contain an answer to the question. It requires cross-language representations as well as machine translation technology, as the question could be asked in Arabic, French or English and the answer could be in any of these languages. (iii) The answer identification module ranks specific text fragments that are plausible answers to the question. It first ranks the candidate text fragments in the different languages and, if any are found, combines them into one consolidated answer. This is a variation of the cross-language QA scenario enabling answer triggering, where no concrete answer has to be provided if none exists. In order to build our QA system, we extend an existing framework (Rücklé and Gurevych, 2017) integrating neural networks for answer selection.

References
Alberto Barrón-Cedeño, Cristina España Bonet, Josu Boldoba Trapote, and Luís Márquez Villodre. A Factory of Comparable Corpora from Wikipedia. In Proceedings of the Eighth Workshop on Building and Using Comparable Corpora, pages 3–13, Beijing, China, 2015. Association for Computational Linguistics.
Andreas Rücklé and Iryna Gurevych. End-to-End Non-Factoid Question Answering with an Interactive Visualization of Neural Attention Weights.
In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics-System Demonstrations (ACL 2017), pages 19–24, Vancouver, Canada, August 2017. Association for Computational Linguistics. doi:10.18653/v1/P17-4004. URL http://aclweb.org/anthology/P17-4004. Yi Yang, Wen-tau Yih, and Christopher Meek. WikiQA: A Challenge Dataset for Open-Domain Question Answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 2013–2018, Lisbon, Portugal, 2015.
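A minimal sketch of the three-module pipeline with answer triggering is shown below; it uses monolingual TF-IDF retrieval and a fixed score threshold as stand-ins for the cross-language representations and neural answer selection used in MATQAM, so treat it as an illustration of the control flow rather than the system itself.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def answer_question(question, documents, threshold=0.15):
    """Retrieve candidate documents, rank candidate sentences, and trigger an answer
    only when the best score clears a confidence threshold."""
    # (i) Question processing: here just the raw text; the real system also predicts
    #     the expected answer type and handles the cross-language query.
    # (ii) Document retrieval: rank documents by TF-IDF cosine similarity.
    vec = TfidfVectorizer().fit(documents + [question])
    doc_scores = cosine_similarity(vec.transform([question]), vec.transform(documents))[0]
    best_doc = documents[int(doc_scores.argmax())]
    # (iii) Answer identification: rank the sentences of the best document.
    sentences = [s.strip() for s in best_doc.split(".") if s.strip()]
    sent_scores = cosine_similarity(vec.transform([question]), vec.transform(sentences))[0]
    best = int(sent_scores.argmax())
    # Answer triggering: return nothing if no sentence is a plausible answer.
    return sentences[best] if sent_scores[best] >= threshold else None

docs = ["Doha is the capital of Qatar. It lies on the Persian Gulf coast.",
        "Wikipedia is a free online encyclopedia edited by volunteers."]
print(answer_question("What is the capital of Qatar?", docs))   # answered
print(answer_question("Who painted the Mona Lisa?", docs))      # triggered: None
```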
Toward a Cognitive Evaluation Approach for Machine Translation Post-Editing
Authors: Wajdi Zaghouani and Irina Temnikova

Machine Translation (MT) today is used more and more by professional translators, including freelancers, companies, and official organisations such as, for example, the European Parliament. MT output, especially that of publicly available MT engines such as Google Translate, is, however, well known to contain errors and to lack fluency from the point of view of human expectations. For this reason, MT-translated texts often need manual (or automatic) corrections, known as 'post-editing' (PE). Although there are fast and simple measures of post-editing cost, such as time to post-edit or edit distance, these measures do not reflect the cognitive difficulty involved in correcting the specific errors in the MT output text. As MT output texts can be of different quality and thus contain errors of differing difficulty to correct, fair compensation for post-editing should take into account the difficulty of the task, which should therefore be measured in the most reliable way. The best solution would be to build an automatic classifier which (a) assigns each MT error to a specific correction class, (b) assigns an effort value reflecting the cognitive effort a post-editor needs to make such a correction, and (c) gives a post-editing effort score to a text. On our way to building such a classifier, we investigate whether an existing cognitive effort model could provide fairer compensation for the post-editor, by testing it on a new language which strongly differs from the languages on which this methodology was previously tested. The model made use of the Statistical Machine Translation (SMT) error classification schema, from which the error classes were subsequently re-grouped and ranked in increasing order, so as to reflect the cognitive load post-editors experience while correcting the MT output. Error re-grouping and ranking was done on the basis of relevant psycholinguistic error correction literature. The aim of proposing such an approach was to create a better metric for the effort a post-editor faces while correcting MT texts, instead of relying on a non-transparent MT evaluation score such as BLEU. The approach does not rely on specific software, in contrast to PE cognitive evaluation approaches based on keystroke logging or eye-tracking. Furthermore, the approach is more objective than approaches which rely on human scores for perceived post-editing effort. In its essence, it is similar to other error classification approaches. It is enriched, however, by error ranking, based on information specifying which errors require more cognitive effort to correct and which require less. In this way, the approach only requires counting the number of errors of each type in the MT output, and thus it allows the comparison of the post-editing cost of different output texts of the same MT engine, of the same text as output by different MT engines, or across different language pairs. Temnikova et al. (2010) tested this approach on two emergency instruction texts, one original (called 'Complex') and one manually simplified (called 'Simplified') according to Controlled Language (CL) text simplification rules. Both texts were translated using the web version of Google Translate into three languages: Russian, Spanish, and Bulgarian.
The MT output was manually post-edited by 3-5 human translators per language, and then the number of errors per category was manually counted by one annotator per language. Several researchers have based their work on Temnikova's cognitive evaluation approach. Among them, Koponen et al. (2012) modified the error classification by adding one additional class. Lacruz and Munoz et al. (2014) enriched our original error ranking/classification with numerical weights from 1 to 9, which showed a good correlation with another metric they used (Pause to Word Ratio), but did not normalize the scores per text length. The weights were added to form a unique score for each text called Mental Load (ML). The current work presented in this abstract makes the following contributions compared to our previous work: (1) we separate the Controlled Language (CL) evaluation, as it was in Temnikova's work, from the MT evaluation and apply the method only to MT evaluation; (2) we test the error classification and ranking method on a new (non-Indo-European) language (Modern Standard Arabic, MSA); (3) we increase the number of annotators and the amount of textual data; and (4) we test the approach on new text genres (news articles). On our way to building a classifier that would assign post-editing effort scores to new texts, we have conducted a new experiment aiming to test whether the previously introduced approach also applies to Arabic, a language different from those for which the cognitive evaluation model was initially developed. The results of the experiment confirmed once again that machine-translated texts of different translation quality exhibit different distributions of error categories, with the texts of lower MT quality containing more errors and more error categories that are difficult to correct (e.g., word order errors). The results also showed some variation in the presence of certain categories of errors, which we deem typical for Arabic. The comparison of texts of better MT quality showed similar results across all four languages (Modern Standard Arabic, Russian, Spanish, and Bulgarian), which shows that the approach can be applied without modification to non-Indo-European languages in order to distinguish texts of better MT quality from those of worse MT quality. In future work, we plan to adapt the error categories to Arabic (e.g., add the category "merge tokens"), in order to test whether such language-specific adaptation would lead to better results for Arabic. We plan to use a much bigger dataset and to extract most of the categories automatically. We also plan to assign weights and develop a unique post-editing cognitive difficulty score for MT output texts. We are confident that this will provide a fair estimation of the cognitive effort required for post-editors to edit such texts, and will help translators receive fair compensation for their work.
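The scoring idea can be sketched as follows; the error classes, weights and counts are illustrative placeholders in the spirit of the ranked classification (and of the Lacruz-style numerical weights), not the values used in this study.

```python
# Error classes ordered and weighted by assumed increasing cognitive correction effort.
ERROR_WEIGHTS = {
    "punctuation": 1,
    "wrong_inflection": 2,
    "wrong_word": 4,
    "missing_word": 6,
    "word_order": 9,
}

def pe_effort_score(error_counts: dict, n_words: int) -> float:
    """Weighted error count, normalized by text length (per 100 words)."""
    total = sum(ERROR_WEIGHTS[c] * n for c, n in error_counts.items())
    return 100.0 * total / n_words

# Hypothetical counts for a 'Complex' and a 'Simplified' MT output text.
complex_text = {"punctuation": 3, "wrong_word": 7, "missing_word": 2, "word_order": 5}
simplified_text = {"punctuation": 2, "wrong_inflection": 4, "wrong_word": 3}
print(pe_effort_score(complex_text, n_words=250))     # higher score = harder to post-edit
print(pe_effort_score(simplified_text, n_words=230))
```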
A Decomposition Algorithm to Measure Redundancy in Structured Linear Systems
Authors: Vishnu Vijayaraghavan, Kiavash Kianfar, Hamid Parsaei and Yu Ding

Nowadays, inexpensive smart devices with multiple heterogeneous on-board sensors, networked through wired or wireless links and deployable in large numbers, are distributed throughout a physical process or a physical environment, providing real-time, dense spatio-temporal measurements and enabling surveillance and monitoring capability that could not be imagined a decade ago. Such a system-wide deployment of sensing devices is known as distributed sensing, and is considered one of the top ten emerging technologies that will change the world. Oil and gas pipeline systems, electrical grid systems, transportation systems, environmental and ecological monitoring systems, security systems, and advanced manufacturing systems are just a few examples among many others. Malfunction of any of these large-scale systems typically results in enormous economic loss and sometimes even endangers critical infrastructure and human lives. In any of these systems, the system state variables, whose values trigger various actions, are estimated based on the measurements gathered by the sensor system that monitors and controls the system of interest. Consequently, the reliability of these estimates is of utmost importance for the economic and safe operation of these large-scale systems. In a linear sensor system, the sensor measurements are combined linear responses of the system states that need to be estimated. In the engineering literature, a linear model is often used to establish the connection between the sensor measurements and the system's state variables through the sensor system's design matrix. In such systems, the sensor outputs y and the system states x are linked by the set of linear equations y = Ax + e, where y and e are n-by-1 vectors and x is a p-by-1 vector. A is an n-by-p design matrix (n >> p) that models the linear measurement process. The matrix A is assumed to be of full column rank, i.e., r(A) = p, where r(A) denotes the rank of A. The last term, e, is a random noise vector, which is assumed to be normally distributed with mean 0. In the context of estimation reliability, the redundancy degree of a sensor system is the minimum number of sensor failures (or measurement outliers) which can happen before the identifiability of any state is compromised. This number, called the degree of redundancy of the matrix A and denoted by d(A), is formally defined as d(A) = min{d : there exists A[-d] such that r(A[-d]) < r(A)} - 1, where A[-d] is the reduced matrix after deleting some d rows from the original matrix. The degree of redundancy of a linear sensor system is a measure of the robustness of the system against sensor failures and hence of the reliability of the linear sensor system. Finding the degree of redundancy for structured linear systems is proven to be NP-hard. Bound-and-decompose, mixed integer programming, and l1-minimization methods have all been studied and compared in the literature, but none of these methods is suitable for finding the degree of redundancy in large-scale sensor systems. We propose a decomposition approach which effectively disintegrates the problem into a reasonable number of smaller subproblems, utilizing the structure inherent in such linear systems via the concepts of duality and connectivity from matroid theory. We propose two different but related algorithms, both of which solve the same redundancy degree problem.
While the former algorithm applies the decomposition technique to the vector matroid (the design matrix), the latter uses its corresponding dual matroid. The subproblems are then solved using mixed integer programming to evaluate the degree of redundancy of the whole sensor system. We report substantial computational gains (up to 10 times) for both algorithms compared to even the best known existing algorithms.
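For small instances, the definition of d(A) can be evaluated directly by brute force, which also makes clear why a decomposition is needed at scale; the example matrix below is illustrative.

```python
from itertools import combinations
import numpy as np

def degree_of_redundancy(A: np.ndarray) -> int:
    """Brute-force d(A): the smallest number of deleted rows that drops the rank, minus one.
    Exponential in n, so only usable on tiny matrices; the abstract's contribution is
    precisely a decomposition that avoids this enumeration."""
    n, p = A.shape
    full_rank = np.linalg.matrix_rank(A)
    for d in range(1, n - p + 2):
        for rows in combinations(range(n), d):
            if np.linalg.matrix_rank(np.delete(A, rows, axis=0)) < full_rank:
                return d - 1
    return n - p  # unreachable for a full-column-rank A; kept as a safety net

# Five sensors measuring two states; here any two sensor failures are tolerable (d(A) = 2).
A = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [1, 0],
              [2, 1]], dtype=float)
print(degree_of_redundancy(A))
```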
Framework for Visualizing Browsing Patterns Captured in Computer Logs
Authors: Noora Fetais and Rachael Fernandez

Research Problem
An Intrusion Detection System (IDS) is used for preventing security breaches by monitoring and analyzing the data recorded in log files. An IDS analyst is responsible for detecting intrusions in a system by manually investigating the vast amounts of textual information captured in these logs. The activities performed by the analyst can be split into three phases, namely: i) monitoring, ii) analysis and iii) response [1]. The analyst starts by monitoring the system, application and network logs to find attacks against the system. If an abnormality is observed, the analyst moves to the analysis phase, in which he tries to diagnose the attacks by analyzing the users' activity patterns. After the reason has been diagnosed, appropriate steps are taken to resolve the attacks in the response phase. The analyst's job is time-consuming and inevitably prone to errors due to the large amount of textual information that has to be analyzed [2]. Though there have been various frameworks for visualizing information, there has not been much research aimed at visualizing the events that are captured in log files. Komlodi et al. (2004) proposed a popular framework which is enriched with a good set of requirements for visualizing the intrusions in an IDS. However, they do not provide any details for handling the data in the logs, which is essentially the source of data for an IDS, nor do they provide any tasks for predicting an attack. It has also been identified that current IV systems tend to place more importance on the monitoring phase than on the other two equally important phases. Hence, a framework that can tackle this problem should be developed.

Proposed Framework
We propose a framework for developing an IDS which works by monitoring the log files. The framework provides users with a set of parameters that have to be decided before developing the IDS and supports the classification of activities in the network into three types, namely: Attack, Suspicious and Not Attack. It also provides phase-specific visualization tasks, other tasks required for extracting information from log files, and tasks that limit the size of the logs. We also outline the working of a Log Agent that is responsible for collecting information from different log files and then summarizing them into one master log file [3]. The proposed framework is applied to a simple file portal system that keeps track of users who access/delete/modify an existing file or add new files. The master log file captures the browsing patterns of the users of the file portal. This data is then visualized to monitor every activity in the network. Each activity is visualized as a pixel whose attributes describe whether it is an authorized activity or an illegal attempt to access the system. In the analysis phase, tasks that help to determine a potential attack and the reasoning behind the classification of an activity as Suspicious or Attack are provided. Finally, in the response phase, tasks that can resolve the attack and tasks for reporting the details of the attack for future analysis are provided.

References
[1] A. Komlodi, J. Goodall, and W. Lutters, “An information visualization framework for intrusion detection,” in CHI '04 Extended Abstracts, pp. 1743–1746, 2004. [Online]. Available: http://dl.acm.org/citation.cfm?id=1062935
[2] R. Fernandez and N. Fetais, “Framework for Visualizing Browsing Patterns Captured in Computer Logs Using Data Mining Techniques,” International Journal of Computing & Information Sciences, vol. 12, no. 1, pp. 83–87, 2016.
[3] H. Kato, H. Hiraishi, and F. Mizoguchi, “Log summarizing agent for web access data using data mining techniques,” pp. 2642–2647.
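A minimal sketch of the classification of activities into Attack, Suspicious and Not Attack is given below; the rules, thresholds and log fields are illustrative assumptions, since the framework leaves these parameters to the IDS designer.

```python
from datetime import datetime

# Illustrative rule set only; field names below are assumptions about the master log.
SUSPICIOUS_ACTIONS = {"delete", "modify"}
WORK_HOURS = range(7, 19)

def classify_activity(entry: dict) -> str:
    """Classify a master-log entry as 'Attack', 'Suspicious' or 'Not Attack'."""
    hour = datetime.fromisoformat(entry["timestamp"]).hour
    if not entry["authorized"]:
        return "Attack"                       # illegal access attempt
    if entry["action"] in SUSPICIOUS_ACTIONS and hour not in WORK_HOURS:
        return "Suspicious"                   # legitimate user, unusual behaviour
    return "Not Attack"

log = [
    {"timestamp": "2018-03-19T03:12:00", "user": "u7", "action": "delete", "authorized": True},
    {"timestamp": "2018-03-19T10:05:00", "user": "u2", "action": "access", "authorized": True},
    {"timestamp": "2018-03-19T11:40:00", "user": "u9", "action": "modify", "authorized": False},
]
for entry in log:
    print(entry["user"], classify_activity(entry))
```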
-
-
-
Automated Service Delivery and Optimal Placement for C-RANs
Authors: Aiman Erbad, Deval Bhamare, Raj Jain and Mohammed Samaka

Traditionally, in cellular networks, users communicate with the base station that serves the particular cell under coverage. The main functions of a base station can be divided into two groups: the baseband unit (BBU) functionalities and the remote radio head (RRH) functionalities. The RRH module is responsible for digital processing, frequency filtering and power amplification. The main sub-functions of the baseband processing module are coding, modulation, Fast Fourier Transform (FFT) and others. Data generally flows from the RRH to the BBU for further processing. Such BBU functionalities may be shifted to a cloud-based resource pool, called the Cloud Radio Access Network (C-RAN), to be shared by multiple RRHs. Advancements in cloud computing, software-defined networking and virtualization may be leveraged by operators for the deployment of their BBU services, reducing the total cost of deployment. Recently, there has been a trend to collocate the BBU functionalities and services from multiple cellular base stations into a centralized BBU pool for statistical multiplexing gains. C-RAN is a novel mobile network architecture that can address a number of challenges mobile operators face while trying to support growing end-user needs. The idea is to virtualize the BBU pools, which can be shared by different cellular network operators, allowing them to rent the radio access network (RAN) as a cloud service. However, manual configuration of the BBU services over the virtualized infrastructure may be inefficient and error-prone as mobile traffic increases. Similarly, in centralized BBU pools, non-optimal placement of the Virtual Functions (VFs) might result in a high deployment cost as well as long delays for end-users. This may offset the advantages of this novel technology platform. Hence, optimized placement of these VFs is necessary to reduce the total delays as well as minimize the overall cost of operating C-RANs. Despite the great advantages provided by the C-RAN architecture, there is no explicit support for mobile operators to deploy their BBU services over the virtualized infrastructure, which may lead to ad-hoc and error-prone service deployment in the BBU pools. Given the importance of C-RANs and the ad-hoc nature of their deployment, there is a need for automated and optimal application delivery in the context of cloud-based radio access networks to fully leverage cloud computing opportunities in the Internet. In this work, we propose the development of a novel automated service deployment platform that automates the instantiation of virtual machines in the cloud as user demands vary, achieving end-to-end automation in service delivery for C-RANs. We also consider the problem of optimal VF placement over distributed virtual resources spread across multiple clouds, creating a centralized BBU cloud. The aim is to minimize the total response time to the base stations in the network while satisfying the cost and capacity constraints. We implement an enhanced version of two common approaches from the literature: (1) branch-and-bound (BnB) and (2) Simulated Annealing (SA). The enhancement reduces the execution complexity of the BnB heuristic so that the allocation is faster. The proposed enhancements also improve the quality of the solution significantly. We compare the results of the standard BnB and SA schemes with the enhanced approaches to demonstrate these claims. Our aim was to develop a faster solution that meets the latency requirements of C-RANs while its performance (here, in terms of cost and latency) remains close to the optimal. The proposed work contributes to the "Information & Computing Technology" pillar of ARC'18. It also contributes to Qatar National Vision 2030, which encourages ICT initiatives and envisages Qatar at the forefront of the latest revolutions in computing, networking, the Internet and mobility. Mobile applications form the majority of business applications on the Internet. This research addresses current research issues in the proliferation of novel technologies such as 5G. The project is timely because there is limited research, in Qatar as well as globally, on supporting application delivery in the context of multiple heterogeneous cloud-based application deployment environments.
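To make the Simulated Annealing side of the comparison concrete, the following is a minimal sketch of SA-based VF placement under an assumed weighted delay-plus-cost objective with capacity penalties; the sites, capacities and parameters are illustrative and do not reproduce the authors' formulation or their BnB enhancements.

```python
# Hedged sketch of SA-based placement: assign virtual functions (VFs) to cloud
# sites so that total response time and deployment cost are minimized under
# capacity constraints. All data and weights below are invented for illustration.
import math
import random

VFS = {"vf1": 2, "vf2": 1, "vf3": 3}                 # VF -> required capacity units
SITES = {"cloudA": {"cap": 4, "delay": 5, "cost": 2.0},
         "cloudB": {"cap": 6, "delay": 12, "cost": 1.0}}

def objective(placement, w_delay=1.0, w_cost=1.0, penalty=1e6):
    """Weighted delay + cost; capacity violations are penalized heavily."""
    used = {s: 0 for s in SITES}
    total = 0.0
    for vf, site in placement.items():
        used[site] += VFS[vf]
        total += w_delay * SITES[site]["delay"] + w_cost * SITES[site]["cost"] * VFS[vf]
    over = sum(max(0, used[s] - SITES[s]["cap"]) for s in SITES)
    return total + penalty * over

def simulated_annealing(iters=5000, t0=10.0, cooling=0.999):
    current = {vf: random.choice(list(SITES)) for vf in VFS}
    best = dict(current)
    t = t0
    for _ in range(iters):
        candidate = dict(current)
        candidate[random.choice(list(VFS))] = random.choice(list(SITES))  # move one VF
        delta = objective(candidate) - objective(current)
        if delta < 0 or random.random() < math.exp(-delta / max(t, 1e-9)):
            current = candidate
            if objective(current) < objective(best):
                best = dict(current)
        t *= cooling                                   # cool the temperature
    return best, objective(best)

if __name__ == "__main__":
    placement, score = simulated_annealing()
    print(placement, score)
```

A BnB counterpart would instead enumerate partial assignments and prune branches whose lower bound already exceeds the best feasible objective found so far.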
-
-
-
Does Cultural Affinity Influence Visual Attention? An Eye Tracking Study of Human Images in E-Commerce Websites
The objective of this research is to better understand the influence of cultural affinity on the design of Arab e-commerce websites, specifically the use of human images. Three versions of an e-commerce website selling cameras were built for this study: one with the image of an Arab woman holding a camera, another with the image of a Western woman holding a camera, and a third with no human image. All three websites displayed the same products (cameras) and contained the same navigational and textual elements. An eye tracking experiment involving 45 Arab participants showed that the image of the Arab woman attracted participants' visual attention faster and held it for a longer duration than the image of the Western woman. When participants were presented with all three websites, 64.5% expressed a preference to purchase from the website with the Arab image. Although not reported in detail here, a structured questionnaire was also administered to study the influence of cultural affinity on perceived social presence and image appeal. A post-experiment interview yielded further insights into participant preferences and the selection of culture-specific content for designing Arab e-commerce websites.
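For readers unfamiliar with how such attention measures are obtained, the sketch below computes two standard eye-tracking metrics, time to first fixation and total fixation duration, for an area of interest (AOI) such as the human-image region; the gaze-data format and AOI coordinates are assumptions, not the study's actual processing pipeline.

```python
# Illustrative sketch (not from the paper): AOI-level eye-tracking metrics.
from dataclasses import dataclass

@dataclass
class Fixation:
    t_start_ms: float   # fixation onset relative to stimulus onset
    duration_ms: float
    x: float
    y: float

def aoi_metrics(fixations, aoi):
    """aoi = (x_min, y_min, x_max, y_max) in screen pixels."""
    x0, y0, x1, y1 = aoi
    hits = [f for f in fixations if x0 <= f.x <= x1 and y0 <= f.y <= y1]
    if not hits:
        return {"time_to_first_fixation_ms": None, "total_fixation_ms": 0.0}
    return {
        "time_to_first_fixation_ms": min(f.t_start_ms for f in hits),
        "total_fixation_ms": sum(f.duration_ms for f in hits),
    }

# Example: fixations recorded while viewing one website variant (invented data).
fixations = [Fixation(120, 250, 400, 300), Fixation(900, 180, 1000, 200)]
print(aoi_metrics(fixations, aoi=(350, 250, 600, 450)))
```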
-
-
-
The political influence of the Internet in the Gulf
I am a current student researching the findings of the QNRF NPRP grant "Media Use in the Middle East" (NPRP 7-1757-5-261), a seven-nation survey by Northwestern University in Qatar. I am particularly interested in the potential of the Internet to increase feelings of political efficacy among citizens in the Arab region. The Internet has been shown to create feelings of political efficacy in many instances around the world, for example when social media accelerated Egypt's 2011 revolution (Gustin, 2011), but it can also create feelings of disempowerment (Bibri, 2015). I am interested in this topic because, given the lack of freedom of expression in the Gulf region, many are turning to the Internet to share their opinions on their country's political matters. Although there are consequences for those who criticize Gulf governments, the Internet has become an increasingly significant communication channel between a country and its people. In my research, I look only at nationals of the Gulf countries of Qatar, Saudi Arabia and the United Arab Emirates. Although the survey also covers expatriate residents, I chose to use data on nationals only because of the personal nature of the question: expatriates living in one of these Gulf countries might answer with their own country in mind or with their country of residence in mind, and we cannot interpret their responses without knowing which they were thinking of. Nationals, by contrast, will be thinking of whether they have increased political efficacy in their own countries, which makes the analysis clearer. Nationals are also likely to have a bigger political influence on their country than expatriates, and officials would probably prioritize their opinions over those of expatriates, which further justifies focusing on nationals. I investigate feelings of political efficacy through the Internet by focusing on a pair of survey questions. Each begins with "Do you think by using the Internet…" and then presents one of two statements: "…people like you can have more political influence?" and "…public officials will care more what people like you think?" The response options were a scale of 1 to 5, where 1 means strongly disagree and 5 means strongly agree. These two statements probe different but important areas of political efficacy: freedom of expression in the Gulf and whether these expressions are being heard and acted upon by the government, both in terms of citizens' perceptions of political influence and their emotional connection. My initial research demonstrates differences in levels of political efficacy across the three countries, over time, and within different demographics. The overall results show Saudi Arabia (58%) with the strongest belief in the political efficacy of the Internet, followed by Qatar (36%) and the UAE (18%). In Saudi Arabia, the majority of the sample believes they have political influence on the Internet, whereas in Qatar and the UAE fewer than half believe so. This is a significant difference between the nations that will need further investigation. We also see some interesting changes over time. Overall, more people in 2017 than in 2015 believe that officials care about what is being said on social media; in Saudi Arabia the share rose by 15%, whereas in the UAE there was only a 4% increase. Since this particular question was asked differently in Qatar in 2015, comparison data was not available. Nevertheless, the overall results suggest that Gulf governments have begun to respond to the public's political input on the Internet; this will be explored further through interviews with nationals from the three countries. I have also found similarities among the three countries. Looking at the results by gender, more men (for example, 62% in Saudi Arabia) than women (54%) believe they have more political influence by using the Internet. Looking at the results by cultural conservatism or progressivism, more progressives than conservatives believe they have political influence (Saudi Arabia: 65% vs. 53%) and that the government cares more about what people like them think when they use the Internet (Saudi Arabia: 61% vs. 59%). My next stage of research is to investigate the issues about free speech and political efficacy in the Middle East that I have begun to uncover through analysis of the survey data. The main story I want to explain is the belief that use of the Internet increases political influence in this region. I am particularly interested in questions about gender inequality on social media, the extent to which the public is able to discuss important issues facing their countries on the Internet, and whether they feel they are being heard. I am also interested in learning more about how conservative and progressive people perceive free speech and politics and how these perceptions have changed over the years. My analysis will be based on the survey data as well as interviews I will conduct with citizens from the three countries, which will give context to the survey data.

Works cited:
Bibri, S. E. (2015). The Shaping of Ambient Intelligence and the Internet of Things. Norway: Atlantis Press.
Dennis, E. E., Martin, J. D., & Wood, R. (2017). Media use in the Middle East, 2017: A seven-nation survey. Northwestern University in Qatar. Retrieved from www.mideastmedia.org/survey/2017.
Gustin, S. (2011, Nov 2). Social Media Sparked, Accelerated Egypt's Revolutionary Fire. Wired. Retrieved from https://www.wired.com/2011/02/egypts-revolutionary-fire/
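The percentages discussed above are, in essence, shares of respondents choosing 4 or 5 on the five-point scale; the sketch below shows how such shares could be computed by country and by gender with pandas, using invented column names and toy data rather than the actual survey file.

```python
# Illustrative sketch (assumed column names and coding, not the survey dataset):
# share of nationals who agree (4 or 5) that the Internet gives "people like
# you" more political influence, broken down by country and by gender.
import pandas as pd

df = pd.DataFrame({
    "country": ["Qatar", "Qatar", "Saudi Arabia", "Saudi Arabia", "UAE", "UAE"],
    "gender":  ["M", "F", "M", "F", "M", "F"],
    "internet_political_influence": [4, 2, 5, 4, 2, 3],  # 1 = strongly disagree ... 5 = strongly agree
})

df["agrees"] = df["internet_political_influence"] >= 4

by_country = df.groupby("country")["agrees"].mean().mul(100).round(1)
by_country_gender = df.groupby(["country", "gender"])["agrees"].mean().mul(100).round(1)

print(by_country)          # % agreeing per country
print(by_country_gender)   # % agreeing per country and gender
```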
-
-
-
A Chereme-based Sign Language Description System
Authors: Abdelhadi Soudi and Corinne Vinopol

Because sign languages are not written, it is challenging to describe signs without knowing a spoken-language equivalent. Sign language does not represent the form of the spoken language in any direct way, either by visually representing sounds or by following the syntactic sequences of the spoken language's words. One sign may mean an entire Arabic phrase and vice versa. Sign language can only be described, animated or videotaped. For example, a Deaf person may find it difficult to convey on paper a sign for a concept such as "moon" if he/she does not know the Arabic word. In this paper, we describe a notation system that (among other functions) enables sign language users to identify, from pictorial lists, the four cheremes of each hand for the STEM sign for which they want to find Standard Arabic equivalents. Using the four descriptors (hand shape, movement, location and palm orientation) of both hands, Deaf and Hard-of-Hearing users can describe Moroccan Sign Language (MSL) signs and find corresponding Arabic STEM terms, MSL and Arabic definitions, and concept pictures. The program searches the STEM Sign Database for the sign that most closely matches the selected cheremes, and the Standard Arabic information (definitions, parts of speech, etc.) and MSL information (graphic signs and videos) are then displayed to the user. Two scenarios can occur: if the system finds an exact match for the selected cheremes, it returns that sign with its Standard Arabic and MSL information; otherwise, it displays the signs that most closely match the selected cheremes. In our database, signs and words have an N-N relationship, meaning that some signs refer to multiple Standard Arabic words and vice versa. Therefore, we had to group the database by signs and select signs with more than one Standard Arabic equivalent. Our database is in alphabetical order by Arabic base form word/stem, so when an Arabic word has more than one meaning, and consequently different signs, there are separate entries for that word. To reverse the strategy, that is, to identify signs that can be expressed as different Arabic words and invariably have different meanings, we reordered the database by sign graphic file name. By programming retrieval using this reverse strategy, we have created the first-ever digital MSL thesaurus. The creation of this resource required, among other things, the identification and development of codes for the MSL cheremes, the assignment of codes to the STEM signs, and the addition of Arabic definitions and videotaped MSL translations of those definitions. A usability and feasibility evaluation of the tool was conducted by having educators of deaf children, their parents and deaf children themselves test the software.
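The retrieval logic can be pictured as a nearest-match search over chereme tuples; the sketch below is a simplified illustration with an invented database format and chereme labels, not the project's actual code.

```python
# Minimal sketch of chereme-based lookup: each sign is stored as the four
# descriptors per hand (hand shape, movement, location, palm orientation);
# a query either matches exactly or the closest signs are returned.
# The sample entries and chereme labels are invented for illustration.
SIGN_DB = [
    {"id": "sign_001.png", "arabic": ["قمر"], "cheremes": {
        "right": ("C-shape", "arc-up", "head-side", "palm-in"),
        "left":  ("rest", "none", "neutral", "none")}},
    {"id": "sign_002.png", "arabic": ["نجم"], "cheremes": {
        "right": ("pinch", "flick", "above-head", "palm-out"),
        "left":  ("rest", "none", "neutral", "none")}},
]

def chereme_distance(query, entry):
    """Number of mismatching descriptors across both hands (0 = exact match)."""
    return sum(q != e
               for hand in ("right", "left")
               for q, e in zip(query[hand], entry["cheremes"][hand]))

def lookup(query, db=SIGN_DB, top_n=3):
    ranked = sorted(db, key=lambda entry: chereme_distance(query, entry))
    if chereme_distance(query, ranked[0]) == 0:
        return ranked[:1]          # exact match: return that sign only
    return ranked[:top_n]          # otherwise: closest candidates

query = {"right": ("C-shape", "arc-up", "head-side", "palm-in"),
         "left":  ("rest", "none", "neutral", "none")}
print([entry["arabic"] for entry in lookup(query)])
```

The reverse (thesaurus) strategy corresponds to grouping the same records by the sign graphic file name instead of by the Arabic word.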
-
-
-
Effectiveness of Driver Simulator as a Driving Education Tool
Authors: Semira Omer Mohammed, Wael Alhajyaseen, Rafaat Zriak and Mohammad Khorasani

The impact and validity of driving simulators as an educational tool in driving schools and in the licensing process remain questioned in the literature. Many driving schools use driving simulators to help students learn the required skills faster, but the few existing studies report conflicting results on whether simulators actually improve the quality of driver education. The applications of driving simulators are not limited to driver training and education; they can also assist in identifying risky drivers for further training to improve risk perception. Driver training has two key aspects: vehicle control and safety knowledge. However, training courses commonly focus on vehicle control and priority rules while paying less attention to safety and risk-identification skills. In this regard, driving simulators can play an important role by providing an artificial environment in which students can experience potential risks while driving. In Qatar, the training process for obtaining a license typically covers the basics of vehicle control and driving laws (road signs, etc.). Advanced training courses, such as defensive driving, are also available for those who have completed the normal training process and successfully received their license, but such courses are usually limited to companies that require the training for their employees. This paper aims to investigate the effectiveness of driving simulators in driving education. A driving school in the State of Qatar uses advanced simulators in its training programme, and this study looks at students who go through its simulator and non-simulator training tracks. Novice students begin with a 10-hour theory course that mainly focuses on road signs and markings, followed by a sign test. Students registered in the simulator track then complete five 20-minute simulator sessions. The first session is for simulator adaptation and familiarization with the vehicle's controls: the student drives slowly around a simple oval road, and a few cars are added at the end of the session. The second session uses a more complex road network with intersections and roundabouts. The last three sessions use a virtual replica of a section of Doha: the third session has no cars on the road, traffic is added in the fourth session, and in the fifth session the surrounding vehicles are designed to behave unexpectedly or even aggressively, with sudden lane changes, speeding and failure to give right of way. At the end of the fifth session, the student receives a performance report. After the simulator sessions, students begin 40 hours of on-road training. Each student must then pass a parking test followed by a road test; students are permitted to take their road test after 20 hours of on-road training, and each student may fail up to two road tests before being required to sign up for further courses. A random sample of student data was collected from both the simulator and non-simulator training tracks. All students were first-time learners with no previous license who passed their road tests. The data collected include gender, age, nationality and the number of road tests taken before passing. The study aims to determine whether any of the collected variables has a significant effect on the number of road tests attempted and on passing the driving test on the first attempt. The factors tested are gender, ethnicity, age and training track (simulator or non-simulator). Furthermore, the study attempts to formulate a model that predicts the likelihood of passing the driving test on the first attempt. This pilot study is expected to clarify the effectiveness of driving simulators as an educational tool and whether their utilization is justifiable.

Acknowledgment: This publication was made possible by the NPRP award [NPRP 9-360-2-150] from the Qatar National Research Fund (a member of The Qatar Foundation). The statements made herein are solely the responsibility of the author[s].
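One plausible form for such a predictive model is a logistic regression on the collected variables; the sketch below, with invented data and column names, is only an illustration of that modelling step, not the study's actual analysis.

```python
# Hedged sketch: logistic regression predicting a first-attempt pass from
# gender, age, nationality and training track. Data and coding are invented.
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.DataFrame({
    "age":         [19, 23, 31, 27, 22, 40, 25, 35],
    "gender":      ["M", "F", "F", "M", "M", "F", "M", "F"],
    "nationality": ["QA", "IN", "QA", "PH", "IN", "QA", "PH", "IN"],
    "track":       ["simulator", "simulator", "classic", "classic",
                    "simulator", "classic", "classic", "simulator"],
    "passed_first_attempt": [1, 1, 0, 0, 1, 0, 1, 1],
})

# One-hot encode the categorical predictors and fit the model.
X = pd.get_dummies(data[["age", "gender", "nationality", "track"]], drop_first=True)
y = data["passed_first_attempt"]
model = LogisticRegression(max_iter=1000).fit(X, y)

# Predicted probability of a first-attempt pass for a hypothetical 24-year-old
# male Qatari student in the simulator track.
new_student = pd.DataFrame([{"age": 24, "gender_M": 1, "nationality_PH": 0,
                             "nationality_QA": 1, "track_simulator": 1}])
print(model.predict_proba(new_student[X.columns])[0, 1])
```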
-
-
-
Threat-Based Security Risk Evaluation in the Cloud
Authors: Armstrong Nhlabatsi, Khaled Khan, Noora Fetais, Rachael Fernandez, Jin Hong and Dong Seong Kim

Research Problem: Cyber attacks are targeting cloud computing systems, where enterprises, governments and individuals outsource their storage and computational resources for improved scalability and dynamic management of their data. However, the different types of cyber attacks, as well as the different attack goals, make it difficult to provide the right security solution. This is because different cyber attacks are associated with different threats in cloud computing systems, and the importance of these threats varies with the cloud user's requirements. For example, a hospital patient record system may prioritize defending against attacks that tamper with patient records, while a media storage system may prioritize defending against denial-of-service attacks to ensure high availability. As a result, it is of paramount importance to analyze the risk associated with cloud computing systems while taking into account the importance of threats under different cloud user requirements. However, current risk evaluation approaches focus on the risk associated with the asset rather than the risk associated with different types of threats. Such a holistic approach does not show explicitly how different types of threats contribute to the overall risk of the cloud computing system. Consequently, it is difficult for security administrators to make fine-grained decisions when selecting security solutions based on the varying importance of threats given the cloud user requirements. It is therefore necessary to analyze the risk of cloud computing systems while accounting for the different importance of threats, which enables resources to be allocated to reducing particular threats, the risk associated with different threats to be identified, and the threats associated with each cloud component to be identified.

Proposed Solution: The STRIDE threat modeling framework (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege), proposed by Microsoft, can be used for threat categorization. Using STRIDE, we propose a threat-guided risk evaluation approach for cloud computing systems that explicitly evaluates the risk associated with each STRIDE threat category. Further, we use seven types of security metrics to evaluate the risk, namely: component, component-threat, threat-category, snapshot, path-components, path-threat, and overall asset. The component, component-threat, threat-category and snapshot risks measure the total risk on a component, a component's risk for a particular threat category, the total snapshot risk for a single threat, and the total risk of the snapshot considering all threat categories, respectively. The path-components, path-threat and overall asset risks measure the total risk of the components in an attack path, the risk of a single threat category in the attack path, and the overall risk to an asset considering all attack paths, respectively. These metrics make it possible to measure the contribution of each threat category to the overall risk more precisely. When a vulnerability is discovered in a component (e.g., a virtual machine) of the cloud deployment, the administrator first determines which types of threats could be posed should the vulnerability be successfully exploited, and what the impact of each of those threats on the asset would be. The impact assigned to each threat type is weighted by the importance of the component. For example, a virtual machine (VM) that acts as a web server in a medical records management application could be assigned a higher weighting for denial-of-service threats, because if such an attack is successfully launched, the rest of the VMs reached through the web server will be unavailable. On the other hand, a vulnerability discovered in a VM that hosts a database of medical records would be rated highest impact for information disclosure, because if it is compromised, the confidentiality of patients' medical histories will be violated. By multiplying the probability of successfully exploiting the vulnerability by the threat impact, we compute the risk of each threat type. Varying the impact assigned to different threat types enables our approach to compute the risks associated with the threats, thus empowering the security administrator to make fine-grained decisions on how many resources to allocate to mitigating each type of threat and which threats to prioritize. We evaluated the usefulness of our approach by applying it to attack scenarios in an example cloud deployment. Our results show that it is more effective and informative to administrators than asset-based approaches to risk evaluation.
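The per-threat risk computation described above can be sketched as follows, with invented exploit probabilities and impact weights; the aggregation shown covers only the component, component-threat, threat-category and overall views, not the path-based metrics.

```python
# Illustrative sketch (assumed numbers, not the authors' metric definitions):
# per-threat risk = exploit probability x weighted impact, aggregated per
# STRIDE category and per component, so an administrator can see which threat
# category dominates the overall risk.
from collections import defaultdict

STRIDE = ["Spoofing", "Tampering", "Repudiation",
          "Information disclosure", "Denial of service", "Elevation of privilege"]

# Hypothetical findings: (component, threat category, exploit probability, impact weight 0-10)
findings = [
    ("web-server-vm", "Denial of service", 0.6, 9),
    ("web-server-vm", "Spoofing", 0.3, 4),
    ("records-db-vm", "Information disclosure", 0.4, 10),
    ("records-db-vm", "Tampering", 0.2, 8),
]

component_threat_risk = {(c, t): p * i for c, t, p, i in findings}   # component-threat view
component_risk = defaultdict(float)                                   # component view
threat_category_risk = defaultdict(float)                             # threat-category view
for (c, t), r in component_threat_risk.items():
    component_risk[c] += r
    threat_category_risk[t] += r
overall_risk = sum(component_threat_risk.values())                    # overall view

for category in STRIDE:                      # report categories in STRIDE order
    if category in threat_category_risk:
        print(category, round(threat_category_risk[category], 2))
print("per-component:", dict(component_risk), "overall:", overall_risk)
```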
-
-
-
Compressive Sensing-Based Remote Monitoring Systems for IoT Applications
Authors: Hamza Djelouat, Mohamed Al Disi, Abbes Amira and Faycal Bensaali

The Internet of Things (IoT) is shifting the healthcare delivery paradigm from in-person encounters between patients and providers to an "anytime, anywhere" delivery model. Connected health has become more prominent than ever due to the availability of wireless wearable sensors, reliable communication protocols and storage infrastructures. Wearable sensors offer various insights into the patient's health (electrocardiogram (ECG), electroencephalography (EEG), blood pressure, etc.) and daily activities (hours slept, step counts, stress maps, etc.), which can be used to provide a thorough diagnosis and to alert healthcare providers to medical emergencies. Remote elderly monitoring systems (REMS) are the most popular sector of connected health, owing to the spread of chronic diseases among the older generation. Current REMS use low-power sensors to continuously collect patients' records and feed them to a local computing unit that performs real-time processing and analysis. The local processing unit, which acts as a gateway, then feeds the data and the analysis report to a cloud server for further analysis. Finally, healthcare providers can access the data, visualize it and provide proper medical assistance if necessary. Nevertheless, state-of-the-art IoT-based REMS still face limitations in terms of high energy consumption due to raw data streaming. The high energy consumption shortens the sensors' lifespan immensely and hence severely degrades the overall performance of the REMS platform. Therefore, sophisticated signal acquisition and analysis methods, such as compressed sensing (CS), should be incorporated. CS is an emerging sampling/compression theory which guarantees that an N-length sparse signal can be recovered from an M-length measurement vector (M << N) using efficient algorithms such as convex relaxation approaches and greedy algorithms. This work aims to enable two different scenarios for REMS by leveraging CS to reduce the number of samples transmitted from the sensors while maintaining a high quality of service. The first scenario is dedicated to abnormal heartbeat detection, in which ECG data from different patients is collected, transmitted and analysed to identify any type of arrhythmia or irregular abnormality in the ECG. The second aims to develop an automatic fall detection platform that detects the occurrence of falls, their strength and their direction in order to raise alerts and provide prompt assistance and adequate medical treatment. In both applications, CS is exploited to reduce the number of samples transmitted from the sensors and hence increase the sensors' lifespan, while identification and detection are enabled by means of machine learning and pattern recognition algorithms. To quantify the performance of the system, subspace pursuit (SP) has been adopted as the recovery algorithm, whereas for data identification and classification, K-nearest neighbour (KNN), E-nearest neighbour (ENN), decision tree (BDT) and committee machine (CM) classifiers have been adopted.
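To illustrate the CS acquisition-and-recovery loop, the sketch below measures a synthetic sparse signal with a random Gaussian matrix and recovers it with Orthogonal Matching Pursuit, a simpler greedy algorithm standing in for the subspace pursuit (SP) recovery used in the work; the signal, matrix and dimensions are illustrative assumptions.

```python
# Minimal compressed-sensing sketch (illustrative, not the paper's pipeline):
# an N-length K-sparse signal is measured with M << N random projections and
# then recovered greedily.
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 256, 64, 8                       # signal length, measurements, sparsity

# K-sparse test signal (stand-in for a sparse representation of an ECG frame)
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.standard_normal(K)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # sensing matrix
y = Phi @ x                                      # compressed measurements sent by the sensor

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: pick the column most correlated with the
    residual, then re-fit by least squares on the selected support."""
    residual, idx = y.copy(), []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(Phi.T @ residual))))
        coef, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
        residual = y - Phi[:, idx] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[idx] = coef
    return x_hat

x_hat = omp(Phi, y, K)
print("relative recovery error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```

The recovered frame would then be passed to the classifiers (e.g., KNN) exactly as a conventionally sampled frame would be.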
-