Qatar Foundation Annual Research Conference Proceedings Volume 2018 Issue 3
- Conference date: 19-20 Mar 2018
- Location: Qatar National Convention Center (QNCC), Doha, Qatar
- Volume number: 2018
- Published: 15 March 2018
Toward a Cognitive Evaluation Approach for Machine Translation Post-Editing
Authors: Wajdi Zaghouani and Irina Temnikova

Machine Translation (MT) today is used more and more by professional translators, including freelancers, companies, and official organisations such as the European Parliament. MT output, especially that of publicly available MT engines such as Google Translate, is, however, well known to contain errors and to lack fluency from the human reader's point of view. For this reason, MT-translated texts often need manual (or automatic) corrections, known as "Post-Editing" (PE).

Although there are fast and simple measures of post-editing cost, such as time to post-edit or edit distance, these measures do not reflect the cognitive difficulty involved in correcting the specific errors in the MT output. As MT output texts can be of different quality, and thus contain errors of differing correction difficulty, fair compensation for post-editing should take the difficulty of the task into account, measured in the most reliable way possible. The best solution would be to build an automatic classifier which (a) assigns each MT error to a specific correction class, (b) assigns an effort value reflecting the cognitive effort a post-editor needs to make such a correction, and (c) gives a post-editing effort score to a text. As a step toward building such a classifier, we investigate whether an existing cognitive effort model could provide fairer compensation for the post-editor, by testing it on a new language which strongly differs from the languages on which the methodology was previously tested.

The model made use of a Statistical Machine Translation (SMT) error classification schema, whose error classes were subsequently re-grouped and ranked in increasing order, so as to reflect the cognitive load post-editors experience while correcting the MT output.
Error re-grouping and ranking was done on the basis of the relevant psycholinguistic error-correction literature. The aim of the approach is to provide a better metric for the effort a post-editor faces while correcting MT texts, instead of relying on a non-transparent MT evaluation score such as BLEU. The approach does not rely on specific software, in contrast to PE cognitive evaluation approaches based on keystroke logging or eye tracking. Furthermore, it is more objective than approaches which rely on human scores of perceived post-editing effort. In its essence, it is similar to other error classification approaches; it is enriched, however, by error ranking, based on information specifying which errors require more cognitive effort to correct and which less. In this way, the approach only requires counting the number of errors of each type in the MT output, and thus allows comparing the post-editing cost of different output texts of the same MT engine, of the same text as output by different MT engines, or across language pairs.

Temnikova (2010) tested her approach on two emergency-instructions texts, one original (called "Complex") and one manually simplified (called "Simplified") according to Controlled Language (CL) text simplification rules. Both texts were translated into three languages (Russian, Spanish, and Bulgarian) using the web version of Google Translate. The MT output was manually post-edited by 3-5 human translators per language, and the number of errors per category was then manually counted by one annotator per language.

Several researchers have based their work on Temnikova's cognitive evaluation approach. Among them, Koponen et al. (2012) modified the error classification by adding one additional class. Lacruz and Munoz
(2014) enriched our original error ranking/classification with numerical weights from 1 to 9, which showed good correlation with another metric they used (Pause-to-Word Ratio), but did not normalize the scores by text length. The weights were combined to form a single score for each text, called Mental Load (ML).

The work presented in this abstract makes the following contributions compared to our previous work: (1) we separate the Controlled Language (CL) evaluation present in Temnikova's work from the MT evaluation, and apply the method to MT evaluation only; (2) we test the error classification and ranking method on a new (non-Indo-European) language, Modern Standard Arabic (MSA); (3) we increase the number of annotators and the amount of textual data; and (4) we test the approach on a new text genre (news articles). On our way toward building a classifier that would assign post-editing effort scores to new texts, we conducted a new experiment to test whether the previously introduced approach also applies to Arabic, a language different from those for which the cognitive evaluation model was initially developed.

The results of the experiment confirmed once again that Machine Translation (MT) texts of different translation quality exhibit different distributions of error categories: texts with lower MT quality contain more errors, and more errors from categories that are difficult to correct (e.g., word order errors). The results also showed some variation in the presence of certain categories of errors, which we deem typical of Arabic.
The comparison of texts of better MT quality showed similar results across all four languages (Modern Standard Arabic, Russian, Spanish, and Bulgarian), which indicates that the approach can be applied without modification to non-Indo-European languages as well, in order to distinguish texts of better MT quality from those of worse MT quality.

In future work, we plan to adapt the error categories to Arabic (e.g., add a "merge tokens" category) in order to test whether such language-specific adaptation leads to better results for Arabic. We plan to use a much bigger dataset and to extract most of the categories automatically. We also plan to assign weights and develop a single post-editing cognitive difficulty score for MT output texts. We are confident that this will provide a fair estimation of the cognitive effort post-editors require to edit such texts, and will help translators receive fair compensation for their work.
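As an illustration of steps (a)-(c), the score such a classifier would produce can be sketched as a weighted, length-normalized error count. The category names and 1-9 weights below are invented placeholders, not the published classification schema or Lacruz and Munoz's actual weights:

```python
# Illustrative sketch: a cognitive post-editing effort score computed from
# per-category MT error counts, weighted by assumed correction difficulty.
# Category names and weights are hypothetical, not the published schema.

# Error categories ranked by assumed cognitive correction difficulty (1 = easiest).
EFFORT_WEIGHTS = {
    "punctuation": 1,
    "morphology": 3,
    "wrong_word": 5,
    "missing_word": 7,
    "word_order": 9,
}

def effort_score(error_counts, text_length_words):
    """Weighted error count, normalized per 100 words of MT output."""
    raw = sum(EFFORT_WEIGHTS[cat] * n for cat, n in error_counts.items())
    return 100.0 * raw / text_length_words

# Two outputs of the same engine: the second needs harder corrections.
easy_text = {"punctuation": 4, "morphology": 2}    # raw = 4*1 + 2*3 = 10
hard_text = {"word_order": 3, "missing_word": 2}   # raw = 3*9 + 2*7 = 41

print(effort_score(easy_text, 200))   # 5.0
print(effort_score(hard_text, 200))   # 20.5
```

Normalizing by text length addresses the limitation noted above for the Mental Load score, which left raw weighted counts incomparable across texts of different sizes.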
A Decomposition Algorithm to Measure Redundancy in Structured Linear Systems
Authors: Vishnu Vijayaraghavan, Kiavash Kianfar, Hamid Parsaei and Yu Ding

Nowadays, inexpensive smart devices with multiple heterogeneous on-board sensors, networked through wired or wireless links and deployable in large numbers, are distributed throughout a physical process or environment, providing real-time, dense spatio-temporal measurements and enabling surveillance and monitoring capabilities that could not be imagined a decade ago. Such a system-wide deployment of sensing devices is known as distributed sensing, and is considered one of the top ten emerging technologies that will change the world. Oil and gas pipeline systems, electrical grid systems, transportation systems, environmental and ecological monitoring systems, security systems, and advanced manufacturing systems are just a few examples among many others. Malfunction of any of these large-scale systems typically results in enormous economic loss and sometimes even endangers critical infrastructure and human lives. In any of these systems, the system state variables, whose values trigger various actions, are estimated from the measurements gathered by the sensor system that monitors and controls the system of interest. Consequently, the reliability of these estimates is of utmost importance for the economic and safe operation of such large-scale systems.

In a linear sensor system, the sensor measurements are combined linear responses of the system states that need to be estimated. In the engineering literature, a linear model is often used to connect the sensor measurements to the system's state variables through the sensor system's design matrix. In such systems, the sensor outputs y and the system states x are linked by the set of linear equations y = Ax + e, where y and e are n-by-1 vectors and x is a p-by-1 vector. A is an n-by-p design matrix (n >> p) that models the linear measurement process.
The matrix A is assumed to be of full column rank, i.e., r(A) = p, where r(A) denotes the rank of A. The last term e is a random noise vector, assumed to be normally distributed with mean 0. In the context of estimation reliability, the redundancy degree of a sensor system is the minimum number of sensor failures (or measurement outliers) that can happen before the identifiability of some state is compromised. This number, called the degree of redundancy of the matrix A and denoted d(A), is formally defined as d(A) = min{d − 1 : there exists A[−d] s.t. r(A[−d]) < r(A)}, where A[−d] is the reduced matrix obtained by deleting some d rows from the original matrix. The degree of redundancy of a linear sensor system is a measure of the system's robustness against sensor failures, and hence of its reliability.

Finding the degree of redundancy for structured linear systems is proven to be NP-hard. Bound-and-decompose, mixed-integer programming, and l1-minimization methods have all been studied and compared in the literature, but none of them is suitable for finding the degree of redundancy in large-scale sensor systems. We propose a decomposition approach which effectively breaks the problem into a reasonable number of smaller subproblems, exploiting the structure inherent in such linear systems via the concepts of duality and connectivity from matroid theory. We propose two different but related algorithms, both of which solve the same redundancy degree problem. While the former applies the decomposition technique to the vector matroid (the design matrix), the latter uses its corresponding dual matroid. The subproblems are then solved using mixed-integer programming to evaluate the degree of redundancy of the whole sensor system. We report substantial computational gains (up to 10 times) for both algorithms compared to even the best known existing algorithms.
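For intuition, d(A) can be computed by brute force on tiny examples directly from the definition above. This exponential check is only a sanity baseline, not the decomposition algorithm the abstract proposes:

```python
# Brute-force degree of redundancy d(A): the smallest number of row deletions
# that can drop the rank of A, minus one. Exponential in n, so this is only
# a sanity check for tiny systems, not the proposed decomposition method.
from itertools import combinations
import numpy as np

def degree_of_redundancy(A):
    n, p = A.shape
    full_rank = np.linalg.matrix_rank(A)
    for d in range(1, n + 1):
        for rows in combinations(range(n), d):
            reduced = np.delete(A, rows, axis=0)
            if np.linalg.matrix_rank(reduced) < full_rank:
                return d - 1   # d failures break identifiability; d-1 are tolerable
    return n - 1

# 4 sensors measuring 2 states: no single row removal drops the rank,
# but removing rows 2 and 3 leaves only the first state observable.
A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
print(degree_of_redundancy(A))  # 1
```

The NP-hardness cited above is visible here: the inner loop enumerates all row subsets, which is exactly what the matroid-based decomposition into smaller mixed-integer subproblems is designed to avoid.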
Framework for Visualizing Browsing Patterns Captured in Computer Logs
Authors: Noora Fetais and Rachael Fernandez

Research Problem: An Intrusion Detection System (IDS) is used to prevent security breaches by monitoring and analyzing the data recorded in log files. An IDS analyst is responsible for detecting intrusions in a system by manually investigating the vast amounts of textual information captured in these logs. The activities performed by the analyst can be split into three phases: i) Monitoring, ii) Analysis, and iii) Response [1]. The analyst starts by monitoring the system, application, and network logs to find attacks against the system. If an abnormality is observed, the analyst moves to the analysis phase, diagnosing the attack by analyzing the users' activity patterns. After the cause has been diagnosed, appropriate steps are taken to resolve the attack in the response phase. The analyst's job is time-consuming and inevitably prone to errors due to the large amount of textual information that has to be analyzed [2]. Though there have been various frameworks for visualizing information, there has not been much research aimed at visualizing the events captured in log files. Komlodi et al. (2004) proposed a popular framework with a good set of requirements for visualizing intrusions in an IDS. However, they do not provide any details for handling the data in the logs, which is essentially the source of data for an IDS, nor do they provide any tasks for predicting an attack. It has also been observed that current information visualization (IV) systems tend to place more importance on the monitoring phase than on the other two equally important phases. Hence, a framework that can tackle this problem should be developed.

Proposed Framework: We propose a framework for developing an IDS which works by monitoring the log files.
The framework provides users with a set of parameters that have to be decided before developing the IDS, and supports the classification of activities in the network into three types: Attack, Suspicious, and Not Attack. It also provides phase-specific visualization tasks, tasks required for extracting information from log files, and tasks that limit the size of the logs. We also outline the working of a Log Agent that is responsible for collecting information from different log files and summarizing them into one master log file [3]. The proposed framework is applied to a simple file portal system that keeps track of users who access/delete/modify an existing file or add new files. The master log file captures the browsing patterns of the users of the file portal. This data is then visualized to monitor every activity in the network. Each activity is visualized as a pixel whose attributes describe whether it is an authorized activity or an illegal attempt to access the system. In the analysis phase, tasks are provided that help determine a potential attack and the reasoning behind classifying an activity as Suspicious or Attack. Finally, in the response phase, tasks that can resolve the attack and tasks for reporting its details for future analysis are provided.

References
[1] A. Komlodi, J. Goodall, and W. Lutters, "An information visualization framework for intrusion detection," CHI '04 Extended Abstracts, pp. 1743-1746, 2004. [Online]. Available: http://dl.acm.org/citation.cfm?id=1062935
[2] R. Fernandez and N. Fetais, "Framework for Visualizing Browsing Patterns Captured in Computer Logs Using Data Mining Techniques," International Journal of Computing & Information Sciences, vol. 12, no. 1, pp. 83-87, 2016.
[3] H. Kato, H. Hiraishi, and F. Mizoguchi, "Log summarizing agent for web access data using data mining techniques," pp. 2642-2647.
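The three-way activity classification applied to master-log entries could be sketched as follows; the field names and decision rules here are assumptions for illustration, not the framework's actual tasks:

```python
# Minimal sketch of the Attack / Suspicious / Not Attack classification a
# Log Agent could apply to entries of the summarized master log file.
# Field names and rules are illustrative assumptions, not the paper's logic.

def classify_activity(entry):
    """Classify one master-log entry from the file-portal system."""
    if not entry["authorized"]:
        # Unauthorized delete/modify attempts are treated as attacks outright.
        if entry["action"] in ("delete", "modify"):
            return "Attack"
        return "Suspicious"          # e.g. an unauthorized read attempt
    if entry["failed_logins"] >= 3:
        return "Suspicious"          # authorized user, abnormal login pattern
    return "Not Attack"

master_log = [
    {"user": "u1", "action": "read",   "authorized": True,  "failed_logins": 0},
    {"user": "u2", "action": "delete", "authorized": False, "failed_logins": 1},
    {"user": "u3", "action": "read",   "authorized": True,  "failed_logins": 4},
]
print([classify_activity(e) for e in master_log])
# ['Not Attack', 'Attack', 'Suspicious']
```

Each classified entry would then feed the pixel visualization described above, with the class driving the pixel's attributes.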
Automated Service Delivery and Optimal Placement for C-RANs
Authors: Aiman Erbad, Deval Bhamare, Raj Jain and Mohammed Samaka

Traditionally, in cellular networks, users communicate with the base station that serves the particular cell under coverage. The main functions of a base station can be divided into two groups: the baseband unit (BBU) functionalities and the remote radio head (RRH) functionalities. The RRH module is responsible for digital processing, frequency filtering, and power amplification. The main sub-functions of the baseband processing module are coding, modulation, the Fast Fourier Transform (FFT), and others. Data generally flows from the RRH to the BBU for further processing. The BBU functionalities may be shifted to a cloud-based resource pool, to be shared by multiple RRHs. Advancements in cloud computing, software-defined networking, and virtualization technology may be leveraged by operators for the deployment of their BBU services, reducing the total cost of deployment. Recently, there has been a trend to consolidate the BBU functionalities and services from multiple cellular base stations into a centralized BBU pool for statistical multiplexing gains. This technology is known as the Cloud Radio Access Network (C-RAN). C-RAN is a novel mobile network architecture that can address a number of challenges mobile operators face while trying to support growing end-user needs. The idea is to virtualize the BBU pools, which can be shared by different cellular network operators, allowing them to rent the radio access network (RAN) as a cloud service. However, manual configuration of the BBU services over the virtualized infrastructure may be inefficient and error-prone as mobile traffic increases. Similarly, in centralized BBU pools, non-optimal placement of the Virtual Functions (VFs) might result in high deployment cost as well as long delays for end users.
This may undermine the advantages of this novel technology platform. Hence, optimized placement of these VFs is necessary to reduce the total delays as well as to minimize the overall cost of operating the C-RAN. Despite the great advantages provided by the C-RAN architecture, there is no explicit support for mobile operators to deploy their BBU services over the virtualized infrastructure, which may lead to ad-hoc and error-prone service deployment in the BBU pools. Given the importance of C-RANs and the ad-hoc nature of their deployment, there is a need for automated and optimal application delivery in the context of cloud-based radio access networks, to fully leverage cloud computing opportunities in the Internet.

In this work, we propose the development of a novel automated service deployment platform that automates the instantiation of virtual machines in the cloud as user demands vary, achieving end-to-end automation of service delivery for C-RANs. We also consider the problem of optimal VF placement over distributed virtual resources spread across multiple clouds, creating a centralized BBU cloud. The aim is to minimize the total response time to the base stations in the network while satisfying cost and capacity constraints. We implement enhanced versions of two approaches common in the literature: (1) branch-and-bound (BnB) and (2) Simulated Annealing (SA). The enhancement reduces the execution complexity of the BnB heuristic so that allocation is faster, and also improves the quality of the solution significantly. We compare the results of the standard BnB and SA schemes with the enhanced approaches to demonstrate these claims. Our aim was to develop a faster solution which can meet the latency requirements of C-RANs while its performance (here, in terms of cost and latency) remains close to optimal.
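The simulated annealing side of the comparison can be illustrated with a toy placement problem. The cloud names, per-VF costs, latencies, and cooling schedule below are invented for illustration and do not reproduce the paper's enhanced BnB/SA variants or their constraint handling:

```python
# Toy simulated-annealing sketch for placing VFs on candidate clouds,
# trading off deployment cost against latency (illustrative numbers only).
import math
import random

COST    = {"cloud_a": 3.0, "cloud_b": 1.0, "cloud_c": 1.5}   # per-VF deployment cost
LATENCY = {"cloud_a": 1.0, "cloud_b": 5.0, "cloud_c": 2.0}   # response time to base stations

def objective(placement):
    """Total cost + latency of a placement (one cloud per VF)."""
    return sum(COST[c] + LATENCY[c] for c in placement)

def anneal(num_vfs, steps=5000, temp=2.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    clouds = list(COST)
    placement = [rng.choice(clouds) for _ in range(num_vfs)]
    best = list(placement)
    for _ in range(steps):
        candidate = list(placement)
        candidate[rng.randrange(num_vfs)] = rng.choice(clouds)   # move one VF
        delta = objective(candidate) - objective(placement)
        # Accept improvements always, worsenings with Boltzmann probability.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            placement = candidate
            if objective(placement) < objective(best):
                best = list(placement)
        temp *= cooling
    return best, objective(best)

best, cost = anneal(num_vfs=4)
print(best, cost)
```

With these numbers cloud_c dominates (cost 1.5 + latency 2.0 per VF), so the search should settle on placing all four VFs there; real instances would add the capacity constraints the abstract mentions.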
The proposed work contributes to the "Information & Computing Technology" pillar of ARC'18. It also contributes to Qatar National Vision 2030, which encourages ICT initiatives and envisages Qatar at the forefront of the latest revolutions in computing, networking, the Internet, and mobility. Mobile applications form the majority of business applications on the Internet, and this research addresses the latest research issues in the proliferation of novel technologies such as 5G. The project is timely, since there is limited research in Qatar (as well as globally) on supporting application delivery in the context of multiple heterogeneous cloud-based application deployment environments.
Does Cultural Affinity Influence Visual Attention? An Eye Tracking Study of Human Images in E-Commerce Websites
The objective of this research is to better understand the influence of cultural affinity on the design of Arab e-commerce websites, specifically the use of human images. Three versions of an e-commerce website selling cameras were built for this study: one with the image of an Arab woman holding a camera, another with the image of a Western woman holding a camera, and a third with no human image. All three websites displayed the same products (cameras) and contained the same navigational and textual elements. An eye-tracking experiment involving 45 Arab participants showed that the image of the Arab woman gained participants' visual attention faster and held it for a longer duration than the image of the Western woman. When participants were presented with all three websites, 64.5% expressed a preference to purchase from the website with the Arab image. Although not reported in detail here, a structured questionnaire was also administered to study the influence of cultural affinity on perceived social presence and image appeal. A post-study interview yielded further insights into participant preferences and the selection of culture-specific content for designing Arab e-commerce websites.
The political influence of the Internet in the Gulf
I am a student researching the findings of the QNRF NPRP grant "Media Use in the Middle East" (NPRP 7-1757-5-261), a seven-nation survey by Northwestern University in Qatar. I am particularly interested in the potential of the Internet to increase feelings of political efficacy among citizens in the Arab region. The Internet has been shown to create feelings of political efficacy in many instances around the world, as when social media accelerated Egypt's 2011 revolution (Gustin, 2011), but it can also create feelings of disempowerment (Bibri, 2015). I am interested in this topic specifically because, given the lack of freedom of expression in the Gulf region, many people are turning to the Internet to share opinions on their countries' political matters. Although there are consequences for those who criticize Gulf governments, the Internet has become an increasingly significant communication channel between governments and their people. In my research, I look only at nationals in the Gulf countries of Qatar, Saudi Arabia, and the United Arab Emirates. Although the survey covers expatriate residents as well, I chose to use data on nationals only due to the personal nature of the question: expatriates living in one of these Gulf countries might answer with their own country in mind or with their country of residence in mind, and we cannot analyze the meaning of their responses without knowing which they were thinking of. Nationals, by contrast, will be considering whether they have increased political efficacy in their own countries, making the analysis clearer. Nationals are also likely to have a bigger political influence on their country than expatriates, and officials would probably prioritize their opinions, which further justifies looking only at nationals.
I investigate feelings of political efficacy through the Internet by focusing on a pair of survey items. Each begins with "Do you think by using the Internet…" and then presents one of two statements: "…people like you can have more political influence?" and "…public officials will care more what people like you think?" The response options were a scale of 1 to 5, where 1 means strongly disagree and 5 means strongly agree. These two statements probe different but important areas of political efficacy: the freedom of expression in the Gulf, and whether these expressions are being heard and acted upon by the government, both in terms of citizens' perceptions of political influence and their emotional connection. My initial research demonstrates differences in levels of political efficacy across the three countries, over time, and within different demographics. The overall results show Saudi Arabia (58%) with the strongest belief in the political efficacy of the Internet, followed by Qatar (36%) and then the UAE (18%). In Saudi Arabia, a majority of the sample believe they have political influence through the Internet, whereas in Qatar and the UAE less than half believe so. This demonstrates a significant difference between the nations, which needs further investigation. We also see some interesting changes over time: overall, more people in 2017 than in 2015 believed that officials care about what is being said on social media. In Saudi Arabia, the figure rose by 15 percentage points, whereas in the UAE there was only a 4-point increase. Since this particular question was asked differently in Qatar in 2015, comparison data was not available. Still, the overall results suggest that Gulf governments have begun to respond to the public's political input on the Internet. This will be explored further through interviews with nationals from the three countries. I have also found similarities among the three countries.
Looking at the results by gender, more men (for example, 62% in Saudi Arabia) than women (54%) believe they have more political influence by using the Internet. Also, when splitting the results by cultural conservatism versus cultural progressivism, more progressives than conservatives believe they have political influence (Saudi Arabia: 65% vs. 53%) and that the government cares more about what people like them think when they use the Internet (Saudi Arabia: 61% vs. 59%). My next stage of research is to investigate the issues about free speech and political efficacy in the Middle East that I have begun to uncover through analysis of the survey data. The main story I want to explain is the belief that use of the Internet increases political influence in this region. I am particularly interested in investigating questions about gender inequality on social media, and the extent to which the public is able to talk about important issues facing their countries on the Internet and feels heard. I am also interested in learning more about the perceptions of conservative and progressive people on free speech and politics, and how these perceptions have changed over the years. My analysis will be based on data from the survey as well as interviews I conduct with citizens of the three countries, which will give context to the survey data.

Works cited:
Bibri, S. E. (2015). The Shaping of Ambient Intelligence and the Internet of Things. Norway: Atlantis Press.
Dennis, E. E., Martin, J. D., & Wood, R. (2017). Media use in the Middle East, 2017: A seven-nation survey. Northwestern University in Qatar. Retrieved from www.mideastmedia.org/survey/2017.
Gustin, S. (2011, Nov 2). Social Media Sparked, Accelerated Egypt's Revolutionary Fire. Wired. Retrieved from https://www.wired.com/2011/02/egypts-revolutionary-fire/
A Chereme-based Sign Language Description System
Authors: Abdelhadi Soudi and Corinne Vinopol

Because sign languages are not written, it is challenging to describe signs without knowing a spoken-language equivalent. Sign language does not represent the form of the spoken language in any direct way, either by visually representing sounds or by following the syntactic sequences of words of the spoken language. One sign may mean an entire Arabic phrase, and vice versa. Sign language can only be described, animated, or videotaped. For example, a Deaf person may find it difficult to convey a sign for a concept such as "moon" on paper if he or she does not know the Arabic word. In this paper, we describe a notation system that enables users (among other functions) to look up signs by their cheremes. Sign language users identify the four cheremes for each hand of the STEM sign for which they want to find Standard Arabic equivalents by using pictorial lists. Using the four descriptors (hand shape, movement, location, and palm orientation of both hands), Deaf and Hard-of-Hearing users can describe Moroccan Sign Language (MSL) signs and find corresponding Arabic STEM terms, MSL and Arabic definitions, and concept pictures. The program then searches the STEM Sign Database for the sign that most closely matches the selected cheremes, and displays Standard Arabic information (definitions, parts of speech, etc.) and MSL information (graphic signs and videos) to the user. Two scenarios are possible: if the system finds an exact match for the selected cheremes, it returns the exact sign with its Standard Arabic and MSL information; otherwise, it displays the signs that most closely match the selected cheremes. In our database, signs and words have a many-to-many (N:N) relationship, meaning that some signs refer to multiple Standard Arabic words and vice versa. Therefore, we had to group the database by signs and try to select signs with more than one Standard Arabic equivalent.
Our database is in alphabetical order by Arabic base-form word/stem. This means that when an Arabic word has more than one meaning, and consequently different signs, there are separate entries for that word. In order to reverse the strategy, that is, to identify signs that can be expressed as different Arabic words with different meanings, we reordered the database by sign graphic file name. By programming retrieval using this reverse strategy, we have created the first-ever digital MSL thesaurus. The creation of this resource required, among other things, the identification and development of codes for the MSL cheremes, the assignment of codes to the STEM signs, and the addition of Arabic definitions and videotaped MSL translations of those definitions. A usability and feasibility evaluation of the tool was conducted by having educators of deaf children, their parents, and deaf children themselves test the software.
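The exact-match-else-closest lookup described above can be sketched over a toy chereme database; the chereme values and Arabic glosses below are invented examples, not entries from the STEM Sign Database:

```python
# Sketch of chereme-based sign lookup: exact match on the four descriptors
# (hand shape, movement, location, palm orientation), else the closest
# matches ranked by number of agreeing cheremes. Entries are invented.

SIGN_DB = [
    # (handshape, movement, location, orientation) -> Arabic equivalents
    {"cheremes": ("flat", "arc",  "chest", "up"),   "arabic": ["qamar"]},  # "moon"
    {"cheremes": ("flat", "arc",  "chest", "down"), "arabic": ["shams"]},  # "sun"
    {"cheremes": ("fist", "line", "head",  "up"),   "arabic": ["fikra"]},  # "idea"
]

def lookup(cheremes):
    """Return exact matches if any, otherwise the closest signs."""
    exact = [s for s in SIGN_DB if s["cheremes"] == tuple(cheremes)]
    if exact:
        return exact
    # No exact match: rank by how many of the four cheremes agree.
    def overlap(sign):
        return sum(a == b for a, b in zip(sign["cheremes"], cheremes))
    best = max(overlap(s) for s in SIGN_DB)
    return [s for s in SIGN_DB if overlap(s) == best]

print(lookup(("flat", "arc", "chest", "up"))[0]["arabic"])   # ['qamar'] (exact)
print(lookup(("flat", "arc", "head", "up"))[0]["arabic"])    # ['qamar'] (3/4 agree)
```

Grouping such entries by sign rather than by Arabic stem is what enables the reverse, thesaurus-style retrieval described above.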
Effectiveness of Driver Simulator as a Driving Education Tool
Authors: Semira Omer Mohammed, Wael Alhajyaseen, Rafaat Zriak and Mohammad Khorasani

The impact and validity of driving simulators as an educational tool in driving schools and in the licensing process remain questioned in the literature. Many driving schools use driving simulators to help students learn the required skills faster, but the few existing studies show conflicting results on whether driving simulators improve the quality of driver education. The applications of driving simulators are not limited to driver training and education; they can also help identify risky drivers for further training to improve risk perception. Driver training has two key aspects: vehicle control and safety knowledge. However, it is common for training courses to focus on vehicle control and priority rules while giving less attention to safety and risk-identification skills. Here, driving simulators can play an important role by providing an artificial environment in which students can experience potential risks while driving. In Qatar, the training process for licensing typically covers the basics of vehicle control and driving laws (road signs, etc.). Advanced training courses, such as defensive driving, are also available for those who have completed the normal training and received their license, but such courses are usually limited to companies that require the training for their employees. This paper investigates the effectiveness of driving simulators in driving education. A driving school in the State of Qatar uses advanced simulators in its training programme, and this study looks at students who go through the simulator and non-simulator training tracks at that school. Novice students begin with a 10-hour theory course which mainly focuses on road signs and markings, after which they are required to complete a sign test.
For those registered in the simulator track, this is followed by five simulator training sessions of 20 minutes each. The first session is for simulator adaptation and familiarization with the vehicle's controls: the student drives slowly around a simple oval road, and a few cars are added at the end of the session. The second session uses a more complex road network with intersections and roundabouts. The last three sessions use a virtual replica of a section of Doha: the third session has no cars on the road, traffic is added in the fourth session, and in the fifth session the surrounding vehicles are designed to behave unexpectedly or even aggressively, with sudden lane changes, speeding, and failure to give right of way. At the end of the fifth session, the student is issued a performance report. After the simulator sessions, students start the 40 hours of on-road training. Each student is then required to take a parking test followed by a road test; students are permitted to take their road test after 20 hours of on-road training, and may fail up to two road tests before being required to sign up for further courses. A random sample of student data was collected from both the simulator and non-simulator training tracks. All students were first-time learners with no previous license who passed their road tests. The data collected include gender, age, nationality, and the number of road tests taken before passing. The study aims to determine whether any of these variables (gender, ethnicity, age, and simulator vs. non-simulator track) has a significant effect on the number of road tests attempted and on passing the driving test on the first attempt. Furthermore, the study attempts to formulate a model that can predict the likelihood of passing the driving test on the first attempt.
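As a sketch of the kind of predictive model described, the following fits a logistic regression for pass-on-first-attempt with plain gradient descent. The student records and the encoding of predictors (simulator track, gender, standardized age) are fabricated purely for illustration:

```python
# Hedged sketch: logistic regression predicting pass-on-first-attempt from
# simulator track, gender, and age. The data rows below are fabricated.
import numpy as np

# Columns: intercept, simulator_track (1 = yes), gender (1 = female), age (standardized).
X = np.array([[1, 1, 0, -0.5],
              [1, 1, 1,  0.0],
              [1, 0, 0,  0.5],
              [1, 0, 1,  1.0],
              [1, 1, 0, -1.0],
              [1, 0, 1,  0.0]], dtype=float)
y = np.array([1, 1, 0, 0, 1, 0], dtype=float)   # 1 = passed on first attempt

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Minimize mean logistic loss by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted pass probability
        w -= lr * X.T @ (p - y) / len(y)        # gradient of the mean log-loss
    return w

w = fit_logistic(X, y)
p_hat = 1.0 / (1.0 + np.exp(-X @ w))
print(np.round(p_hat, 2))
```

A positive fitted weight on the simulator-track column would correspond to simulator students being more likely to pass on the first attempt, which is exactly the effect the study sets out to test on real data.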
This pilot study is expected to clarify the effectiveness of driving simulators as an educational tool and whether their utilization is justifiable. Acknowledgment: This publication was made possible by the NPRP award [NPRP 9-360-2-150] from the Qatar National Research Fund (a member of The Qatar Foundation). The statements made herein are solely the responsibility of the author[s].
-
-
-
Threat-Based Security Risk Evaluation in the Cloud
Authors: Armstrong Nhlabatsi, Khaled Khan, Noora Fetais, Rachael Fernandez, Jin Hong and Dong Seong Kim. Research Problem: Cyber attacks increasingly target cloud computing systems, to which enterprises, governments, and individuals outsource their storage and computational resources for improved scalability and dynamic management of their data. However, the different types of cyber attacks, as well as the different attack goals, make it difficult to provide the right security solution. This is because different cyber attacks are associated with different threats in cloud computing systems, where the importance of a threat varies based on the cloud user's requirements. For example, a hospital patient record system may prioritize defending against attacks that tamper with patient records, while a media storage system may prioritize defending against denial-of-service attacks to ensure high availability. As a result, it is of paramount importance to analyze the risk associated with cloud computing systems taking into account the importance of threats under different cloud user requirements. However, current risk evaluation approaches focus on evaluating the risk associated with the asset, rather than the risk associated with different types of threats. Such a holistic approach to risk evaluation does not show explicitly how different types of threats contribute to the overall risk of the cloud computing system. This makes it difficult for security administrators to make fine-grained decisions when selecting security solutions based on the varying importance of threats given the cloud user requirements. 
Therefore, it is necessary to analyze the risk of cloud computing systems taking into account the different importance of threats, which enables the allocation of resources to reduce particular threats, the identification of the risk associated with different threats imposed, and the identification of different threats associated with cloud components. Proposed Solution: The STRIDE threat modeling framework, proposed by Microsoft, can be used for threat categorization. Using STRIDE, we propose a threat-guided risk evaluation approach for cloud computing systems, which can evaluate the risk associated with each STRIDE threat category explicitly. Further, we utilize seven different types of security metrics to evaluate the risk, namely: component, component-threat, threat-category, snapshot, path-components, path-threat, and overall asset. The component, component-threat, threat-category, and snapshot risks measure the total risk on a component, the component risk for a particular threat category, the total snapshot risk for a single threat, and the total risk of the snapshot considering all threat categories, respectively. The path-components, path-threat, and overall asset risks measure the total risk of components in an attack path, the risk of a single threat category in the attack path, and the overall risk to an asset considering all attack paths, respectively. These metrics make it possible to measure the contribution of each threat category to the overall risk more precisely. When a vulnerability is discovered in a component (e.g. a virtual machine) of the cloud deployment, the administrator first determines which types of threats could be posed should the vulnerability be successfully exploited, and what the impact of each of those threats on the asset would be. The impact assigned to each threat type is weighted depending on the importance of the component. 
For example, a Virtual Machine (VM) that acts as a web server in a medical records management application could be assigned a higher weighting for denial-of-service threats, because if such attacks are successfully launched, the rest of the VMs reached through the web server will be unavailable. On the other hand, a vulnerability discovered in a VM that hosts a database of medical records would be assigned the highest impact for information disclosure, because if it is compromised, the confidentiality of patients' medical histories is violated. By multiplying the probability of successfully exploiting the vulnerability with the threat impact, we compute the risk of each threat type. The variation in the impact assigned to different threat types enables our approach to compute the risks associated with the threats, thus empowering the security administrator to make fine-grained decisions on how many resources to allocate for mitigating each type of threat and which threats to prioritize. We evaluated the usefulness of our approach through its application to attack scenarios in an example cloud deployment. Our results show that it is more effective and informative to administrators than asset-based approaches to risk evaluation.
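The probability-times-weighted-impact computation described above can be sketched as follows. The STRIDE category names are standard, but the numeric probability, impacts, and weights are invented placeholders, not values from the paper.

```python
# Illustrative sketch of the threat-guided risk metrics described above.
STRIDE = ["spoofing", "tampering", "repudiation",
          "info_disclosure", "denial_of_service", "elevation_of_privilege"]

def component_threat_risk(p_exploit, impact, weight):
    """Risk of one threat category on one component: probability x weighted impact."""
    return p_exploit * impact * weight

def component_risk(p_exploit, impacts, weights):
    """Total component risk: sum over all STRIDE threat categories."""
    return sum(component_threat_risk(p_exploit, impacts[t], weights[t]) for t in STRIDE)

# Hypothetical web-server VM: denial of service weighted highest, per the
# medical-records example; all numbers are assumed for illustration.
impacts = {t: 0.5 for t in STRIDE}
weights = dict.fromkeys(STRIDE, 1.0)
weights["denial_of_service"] = 3.0
p = 0.4  # assumed probability of successful exploitation

dos_risk = component_threat_risk(p, impacts["denial_of_service"], weights["denial_of_service"])
total = component_risk(p, impacts, weights)
```

Raising the weight of one category makes its contribution to the total risk explicit, which is what lets the administrator compare threat categories rather than a single asset-level score.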
-
-
-
Compressive Sensing-Based Remote Monitoring Systems for IoT Applications
Authors: Hamza Djelouat, Mohamed Al Disi, Abbes Amira and Faycal Bensaali. The Internet of Things (IoT) is shifting the healthcare delivery paradigm from in-person encounters between patients and providers to an "anytime, anywhere" delivery model. Connected health has become more prominent than ever due to the availability of wireless wearable sensors, reliable communication protocols and storage infrastructures. Wearable sensors offer various insights into the patient's health (electrocardiogram (ECG), electroencephalography (EEG), blood pressure, etc.) and daily activities (hours slept, step counts, stress maps, etc.), which can be used to provide a thorough diagnosis and alert healthcare providers to medical emergencies. The remote elderly monitoring system (REMS) is the most popular sector of connected health, due to the spread of chronic diseases among the older generation. Current REMS use low-power sensors to continuously collect patients' records and feed them to a local computing unit that performs real-time processing and analysis. The local processing unit, which acts as a gateway, then feeds the data and the analysis report to a cloud server for further analysis. Finally, healthcare providers can access the data, visualize it and provide the proper medical assistance if necessary. Nevertheless, state-of-the-art IoT-based REMS still face limitations in terms of high energy consumption due to raw data streaming. The high energy consumption decreases the sensor's lifespan immensely, and hence severely degrades the overall performance of the REMS platform. Therefore, sophisticated signal acquisition and analysis methods, such as compressed sensing (CS), should be incorporated. CS is an emerging sampling/compression theory which guarantees that an N-length sparse signal can be recovered from an M-length measurement vector (M << N) using efficient algorithms such as convex relaxation approaches and greedy algorithms. 
This work aims to enable two different scenarios for REMS by leveraging the concept of CS to reduce the number of samples transmitted from the sensors while maintaining a high quality of service. The first is dedicated to abnormal heartbeat detection, in which ECG data from different patients is collected, transmitted and analysed to identify any type of arrhythmia or irregular abnormality in the ECG. The second aims to develop an automatic fall detection platform that detects the occurrence, strength and direction of falls in order to raise alerts and provide prompt assistance and adequate medical treatment. In both applications, CS is explored to reduce the number of samples transmitted from the sensors and hence increase the sensors' lifespan. In addition, identification and detection are enabled by means of machine learning and pattern recognition algorithms. To quantify the performance of the system, subspace pursuit (SP) has been adopted as the recovery algorithm, while for data identification and classification, K-nearest neighbour (KNN), E-nearest neighbour (ENN), decision tree (BDT) and committee machine (CM) classifiers have been adopted.
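The sparse-recovery step at the heart of the scheme can be illustrated with a toy example. The sketch below uses orthogonal matching pursuit, a greedy relative of the subspace pursuit algorithm adopted in the work, and all problem sizes are invented toy values rather than the paper's settings.

```python
import numpy as np

# Minimal compressed-sensing sketch: recover a K-sparse, N-length signal from
# only M < N random measurements.
rng = np.random.default_rng(0)
N, M, K = 64, 24, 3                              # signal length, measurements, sparsity

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random sensing matrix
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = Phi @ x                                      # compressed measurements

def omp(y, Phi, k):
    """Greedy recovery: pick the most correlated atom, re-fit, repeat k times."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

x_hat = omp(y, Phi, K)
```

The sensor would transmit only the M measurements in `y`, and the gateway or cloud side would run the recovery, which is how CS trades computation at the receiver for energy savings at the sensor.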
-
-
-
Applied Internet of Things (IoT): Car Monitoring System for Modeling of Road Safety and the Traffic System in the State of Qatar
Authors: Rateb Jabbar, Khalifa Al-Khalifa, Mohamed Kharbeche, Wael Alhajyaseen, Mohsen Jafari and Shan Jiang. One of the most interesting new approaches in the transportation research field is Naturalistic Driver Behavior, which is intended to provide insight into driver behavior during everyday trips by recording details about the driver, the vehicle and the surroundings through unobtrusive data-gathering equipment and without experimental control. In this paper, an Internet of Things solution that collects and analyzes data based on the Naturalistic Driver Behavior approach is proposed. The collected and analyzed data will be used for a comprehensive review and analysis of the existing Qatar traffic system, including traffic data infrastructure, safety planning, and engineering practices and standards. Moreover, data analytics for crash prediction are developed, and these predictions are used for systemic and systematic network hotspot analysis and risk-based characterization of roadways, intersections, and roundabouts. Finally, an integrated safety risk solution is proposed that enables decision makers and stakeholders (road users, state agencies, and law enforcement) to identify both high-risk locations and behaviors by measuring a set of dynamic variables including event-based data, roadway conditions, and driving maneuvers. More specifically, the solution consists of a driver behavior detection system that uses mobile technologies. The system can detect and analyze several behaviors such as drowsiness and yawning. Previous works are based on detecting and extracting facial landmarks from images; the suggested system, however, is based on a hybrid approach that detects driver behavior with a deep learning technique using a multilayer perceptron classifier. In addition, the solution collects data about everyday trips, such as start time, end time, average speed, maximum speed, distance and minimum speed. 
Furthermore, every fifteen seconds it records measurements such as GPS position, distance, acceleration and rotational velocity along the roll, pitch and yaw axes. The main advantage of the solution is that it reduces safety risks on the roads while optimizing the cost of safety mitigation to society. The proposed solution has a three-layer architecture, namely the perception, network, and application layers, as detailed below. I. The perception layer is the physical layer, composed of several Internet of Things devices, mainly smartphones equipped with cameras and sensors (magnetometer, accelerometer, gyroscope, thermometer, GPS and orientation sensors), for sensing and gathering information about the driver's behavior, the road and the environment, as shown in Fig. 1. II. The network layer is responsible for establishing the connection with the servers; its features are also used for transmitting and processing sensor data. This solution uses a hybrid system that collects data and stores it locally before sending it to the server, a technique that has proven efficient under poor Internet coverage and unstable Internet connections. III. The application layer is responsible for delivering application-specific services to the end user. It sends the collected data to a web server, where it is processed and analyzed before being displayed to the end user. The web service, which is part of the application layer, is the component responsible for collecting data not only from devices but also from other sources, such as the General Traffic Directorate at the Ministry of Interior for crash details. The web service stores all collected data in a database server and analyses it; the stored data and analysis are then available to end users via a website that has direct access to the web services. Fig. 1: Architecture of the IoT solution. Keywords: Driver Monitoring System, Drowsiness Detection, Deep Learning, Real-time Deep Neural Network
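The network layer's collect-locally-then-upload behavior can be sketched as below. The reading format, the 15-second cadence, and the upload callback are assumptions for illustration, not the system's actual interfaces.

```python
import collections

# Sketch of a hybrid store-and-forward buffer: readings are always written to
# local storage first and only uploaded when the connection is available.
class TripBuffer:
    def __init__(self):
        self.pending = collections.deque()   # stand-in for local storage

    def record(self, reading):
        self.pending.append(reading)         # store locally first, always

    def flush(self, upload, online):
        """Upload buffered readings when the link is up; keep them otherwise."""
        sent = 0
        while online and self.pending:
            upload(self.pending.popleft())
            sent += 1
        return sent

buf = TripBuffer()
for t in (0, 15, 30):                        # e.g. one GPS fix every 15 seconds
    buf.record({"t": t, "speed_kmh": 60.0})

uploaded = []
sent_offline = buf.flush(uploaded.append, online=False)  # poor coverage: keep data
sent_online = buf.flush(uploaded.append, online=True)    # link restored: drain buffer
```

Decoupling collection from transmission in this way is what makes the scheme robust to the unstable connectivity mentioned above: no reading is lost while the link is down.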
-
-
-
Importance of Capability-Driven Requirements for Smart City Operations
Capability-oriented requirements engineering is an emerging research area in which designers face the challenge of analyzing changes in the business domain, capturing user requirements, and developing adequate IT solutions that take these changes into account and answer user needs. In this context, researching the interplay between design-time and run-time requirements, with a focus on adaptability, is of great importance. Approaches to adaptation in the requirements engineering area consider issues underpinning the awareness of requirements and the evolution of requirements. We focus on researching the influence of capability-driven requirements on architectures for adaptable systems to be utilized in smart city operations (SCOs). We investigate requirements specification, algorithms, and prototypes for SCOs with a focus on intelligent management of transportation and on validating the proposed approaches. In this framework, we conducted a systematic literature review (SLR) of requirements engineering approaches for adaptive systems (REAS). We investigated the modeling methods used, the requirements engineering activities performed, the application domains involved, and the deficiencies that need to be tackled (in REAS in general, and in SCOs in particular). We aimed to provide an updated review of the state of the art in order to support researchers in understanding trends in REAS in general, and in SCOs in particular. We also focused on the study of Requirement Traceability Recovery (RTR). RTR is the process of constructing traceability links between requirements and other artifacts, and it plays an important role in many parts of the software life-cycle. RTR becomes more important and more demanding for systems that change frequently and continually, especially adaptive systems, where we need to manage requirement changes and analyze their impact. 
We formulated RTR as a mono-objective and a multi-objective search problem using a classic Genetic Algorithm (GA) and a Non-dominated Sorting-based Genetic Algorithm (NSGA-II), respectively. The mono-objective approach takes as input the software system and a set of requirements, and generates as output a set of traces between the artifacts of the system and the input requirements. This is done based on the textual similarity between the description of the requirements and the artifacts (names of code elements, documentation, comments, etc.). The multi-objective approach takes into account three objectives, namely the recency of change, the frequency of change, and the semantic similarity between the description of the requirement and the artifact. To validate the two approaches, we used three different open-source projects. The reported results confirmed the effectiveness of the two approaches in correctly generating the traces between requirements and artifacts with high precision and recall. A comparison between the two approaches shows that the multi-objective approach is more effective than the mono-objective one. We also proposed an approach aiming at optimizing service composition in service-oriented architectures in terms of security goals and cost using NSGA-II, in order to help software engineers map the optimized service composition to the business process model based on security and cost. To do this, we adapted the DREAD model for security risk assessment by suggesting new categorizations for calculating DREAD factors based on a proposed service structure and service attributes. To validate the proposal, we implemented the YAFA-SOA Optimizer. The evaluation of this optimizer shows that the risk severity for the generated service composition is less than 0.5, which matches the validation results obtained from a security expert. We also investigated requirements modeling for an event with a large crowd using the capability-oriented paradigm. 
The motivation was the need to design services that meet the challenges of alignment, agility, and sustainability in relation to dynamically changing enterprise requirements, especially in large-scale events such as sports events. We introduced the challenges faced by the stakeholders involved in this process and advocated a capability-oriented approach for successfully addressing them. We also investigated a multi-type, proactive and context-aware recommender system in the environment of smart cities. The recommender system proactively recommends gas stations, restaurants, and attractions in an Internet of Things environment. We used a neural network for the reasoning and validated the system on 7000 random contexts, with promising results. We also conducted a user acceptance survey (of 50 users) that showed satisfaction with the application. We further investigated capturing uncertainty in adaptive intelligent transportation systems, which need to monitor their environment at run-time and adapt their behavior in response to changes in this environment. We modelled an intelligent transportation case study using the KAOS goal model and captured uncertainty by extending the case study with variability points, hence having different alternatives to choose from depending on the run-time context. We handled uncertainty by modelling the alternatives using ontologies and reasoning to select the optimal alternative at run-time when uncertainty occurs. Finally, we devised a framework, called Vehicell, that exploits 5G mobile communication infrastructures to increase the effectiveness of vehicular communications and enhance the relevant services and applications offered in urban environments. This may help solve some mobility problems, smooth the way for innovative services for citizens and visitors, and improve the overall quality of life.
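A minimal sketch of the textual-similarity core of the mono-objective RTR formulation described above is shown below. The requirement texts, artifact names, and the greedy best-match strategy are invented for illustration; the actual approach searches over candidate trace sets with a genetic algorithm.

```python
import math
import re
from collections import Counter

# Toy sketch: link each requirement to the artifact whose identifiers,
# comments, or documentation are most textually similar.
def tokens(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

requirements = {   # invented example requirements
    "R1": "the user shall log in with a password",
    "R2": "the system shall export reports as pdf",
}
artifacts = {      # invented example artifact descriptions
    "LoginController": "validate user password and create login session",
    "ReportExporter": "render report data and write pdf output",
}

def recover_traces(reqs, arts):
    """Greedy trace recovery: pick the best-matching artifact per requirement."""
    return {r: max(arts, key=lambda a: cosine(tokens(rt), tokens(arts[a])))
            for r, rt in reqs.items()}

traces = recover_traces(requirements, artifacts)
```

In the multi-objective variant, this similarity score would be just one objective alongside the recency and frequency of change of each artifact.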
-
-
-
Secure RF Energy Harvesting Scheme for Future Wireless Networks
Wireless communication is shaping the future of seamless and reliable connectivity for billions of devices. The communication sector in Qatar is mainly driven by the rising demand for higher data rates and uninterrupted connectivity of wireless devices. Wireless fidelity (Wi-Fi), cellular telephones and computer interface devices (e.g. Bluetooth) are a few of the commonly used applications for the wireless distribution of information in Qatar. According to analysts, the strong growth of Islamic banking, the increase in IT consolidation and the increased adoption of mobility solutions are some of the key contributors to the growth of digital infrastructure in Qatar. Modernization of legacy infrastructure is another focal point of the government of Qatar, enabling the e-government initiative in rural areas and Tier II/III cities of Qatar, with long-term effects in various domains such as health, electricity, water, heat, communication and trade. Considering this exponential rise of wireless communication in Qatar, a great deal of research is being done on the secure deployment of new wireless networks. There is also a growing demand to develop more energy-efficient communication techniques to reduce the consumption of fossil fuel without compromising the quality of experience of users. This is also beneficial for solving the economic issues that cellular operators face with the ever-growing number of users. Moreover, with the upcoming FIFA World Cup in 2022, millions of dollars are being spent to enhance the capacity and security of existing and upcoming communication networks. However, with greater connectivity and ultimate functionality come several important challenges. The first challenge is the security of data, or more specifically, who has access to the data. The broadcast nature of wireless channels implies that the transmitted information signals are also received by nodes other than the intended receiver, which results in the leakage of information. 
Encryption techniques at higher layers are used to secure the transmitted information. However, the high computational complexity of these cryptographic techniques consumes a significant amount of energy. Moreover, secure secret key management and distribution via an authenticated third party is typically required for these techniques, which may not be realizable in dense wireless networks. Therefore, a considerable amount of work has recently been devoted to information-theoretic physical layer security (PLS) as a secure communication technique which exploits the characteristics of wireless channels, such as fading, noise, and interference. The varying nature of these factors causes randomness in the wireless channel which can be exploited to achieve security. The transmission of secret messages takes place when the receiver's channel experiences less fading than the eavesdropper's channel; otherwise, transmission remains suspended. The second concern regards the limited lifetime of wireless devices, especially when a massive amount of data needs to be collected and transferred across the network. This challenge can be addressed by innovative means of powering, and for small, energy-limited devices, this implies the use of energy harvesting (EH) techniques. In this context, the transfer of data and power over a common electromagnetic (EM) wave has gained significant research interest over the past decade. The technique which merges wireless information transfer (WIT) with wireless power transmission (WPT) is commonly termed simultaneous wireless information and power transfer (SWIPT). However, SWIPT systems cannot be supported using conventional transmitter and receiver designs. To address this issue, two broad categories of receiver architectures have been proposed in the SWIPT literature: separated and integrated architectures. In the separated receiver architecture, the information decoder and energy harvester act as dedicated and separate units. 
This not only increases the cost of the receiver but also the complexity of the hardware. In contrast, the integrated receiver architecture jointly processes the information and energy using a unified circuitry for both, which reduces the cost and hardware complexity. Our work attempts to address the aforementioned issues by evaluating the secrecy performance of EH wireless devices and proposing a practical secrecy enhancement scheme. In particular, we investigate PLS in SWIPT systems in the presence of multiple eavesdroppers. The secrecy performance of the SWIPT system is analyzed for Rician-faded communication links. The security performance is analyzed for imperfect channel estimation, and for both separated and integrated receiver architectures. We derive closed-form expressions for the secrecy outage probability and the ergodic secrecy rate for the considered scenario and validate the derived analytical expressions through extensive simulations. Our results reveal that an error floor appears due to channel estimation errors at high values of signal-to-noise ratio (SNR), such that the outage probability cannot be further minimized despite an increase in the SNR of the main link. Moreover, the results show that the largest secrecy rate can be achieved when the legitimate receiver is equipped with the separated SWIPT receiver architecture and the eavesdroppers have the integrated SWIPT receiver architecture. It is also demonstrated that the power splitting factors at both the legitimate receiver and the eavesdroppers play a prominent role in determining the secrecy performance of SWIPT. We prove that a larger power splitting factor is required to ensure link security under poor channel estimation. Finally, our work discusses transmit antenna selection and baseline antenna selection schemes to improve security, and shows that transmit antenna selection outperforms baseline antenna selection. 
The results provided in this work can be readily used to evaluate the secrecy performance of SWIPT systems operating in the presence of multiple eavesdroppers.
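The secrecy outage probability analyzed above can also be estimated numerically. The sketch below uses Rayleigh fading as a simple stand-in for the paper's Rician links, and the average SNRs, the target secrecy rate, and the sample count are illustrative assumptions.

```python
import math
import random

# Monte Carlo sketch of the secrecy outage probability for a wiretap channel.
def secrecy_capacity(snr_main, snr_eve):
    """Wiretap secrecy capacity in bits/s/Hz (zero when the eavesdropper's link wins)."""
    return max(0.0, math.log2(1 + snr_main) - math.log2(1 + snr_eve))

def secrecy_outage_prob(avg_snr_main, avg_snr_eve, target_rate, n=20000, seed=1):
    """Fraction of fading realizations whose secrecy capacity falls below the target."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(n):
        # Rayleigh fading -> exponentially distributed instantaneous SNR.
        snr_m = rng.expovariate(1.0) * avg_snr_main
        snr_e = rng.expovariate(1.0) * avg_snr_eve
        outages += secrecy_capacity(snr_m, snr_e) < target_rate
    return outages / n

p_strong = secrecy_outage_prob(avg_snr_main=100.0, avg_snr_eve=1.0, target_rate=1.0)
p_weak = secrecy_outage_prob(avg_snr_main=2.0, avg_snr_eve=1.0, target_rate=1.0)
```

As expected, strengthening the main link relative to the eavesdropper's drives the outage probability down, which is the qualitative behavior the closed-form analysis captures exactly.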
-
-
-
Implementing and Analyzing a Recursive Technique for Building Path Oblivious RAM
Authors: Maan Haj Rachid, Ryan Riley and Qutaibah Malluhi. It has been demonstrated that encrypting confidential data before storing it is not sufficient, because data access patterns can leak significant information about the data itself (Goldreich & Ostrovsky, 1996). Oblivious RAM (ORAM) schemes exist in order to protect the access pattern of data in a data store. Under an ORAM algorithm, a client accesses a data store in such a way that it does not reveal which item it is interested in. This is typically accomplished by accessing multiple items on each access and periodically reshuffling some, or all, of the data on the data store. One critical limitation of ORAM techniques is the need for large storage capacity on the client, which is typically a weak device. In this work, we utilize an ORAM technique that adapts itself to clients with very limited storage. A trivial implementation of oblivious RAM scans the entire memory for each actual memory access; this scheme is called linear ORAM. Goldreich and Ostrovsky (1996) presented two ORAM constructions with a hierarchical layered structure: the first, square-root ORAM, provides square-root access complexity and a constant space requirement; the second, hierarchical ORAM, requires logarithmic space and polylogarithmic access complexity. Square-root ORAM was revisited by Zahur et al. (2016) to improve its performance in a multi-party secure computation setting. The work of Shi et al. (2011) adopted a sequence of binary trees as the underlying structure, and Stefanov et al. (2013) utilized this concept to build a simple ORAM called Path ORAM. In Path ORAM, every block (item) of data in the input array A is mapped to a (uniformly) random leaf in a tree (typically a binary tree) on the server. This is done using a position map stored in the client memory. Each node in the tree holds exactly Z blocks, which are initially dummy blocks. 
Each data item is stored in a node on the path extending from the leaf to which the item is mapped up to the root. When a specific item is requested, the position map is used to find the leaf to which the block is mapped. The whole path, from the block's mapped leaf up to the root, is then read into a stash; we call this procedure the read-path method. The stash is a space which also resides on the client. The block is then mapped to a new (uniformly random) leaf, and the client gets the required block from the stash. We then try to evict the contents of the stash to the same path we read from, starting from the leaf and moving up to the root; we call this procedure the write-path method. To transfer a block from the stash into a node in the tree, two conditions must hold: the node must have enough space, and the node must be on the path to which the block is mapped. If both conditions are met, the block is evicted to that node. Naturally, the tree is encrypted: whenever the client reads a path into the stash, all read blocks are decrypted and the requested block is delivered to the client, and the client re-encrypts the blocks before writing them back to the path. The security proof for this type of ORAM is given in (Stefanov et al., 2013). We assume that the position map fits in the client's memory; since it requires O(N) space, that could be a problem. Stefanov et al. (2013) mentioned a general idea for a solution that uses another, smaller ORAM O1 on the server to store the position map, and stores the position map for O1 in the client's memory. We employ a recursive, generalized version of this approach: if the position map is still larger than the client's capacity, a smaller ORAM O2 is built to store the position map for O1. We call these additional trees auxiliary trees. We implemented the recursive technique for Path ORAM and studied the effect of the threshold size of the position map (and, consequently, the number of auxiliary trees) on the performance of Path ORAM. 
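The read-path / remap / evict-path cycle described above can be sketched in a few dozen lines. The toy below is a single-level, unencrypted, in-memory illustration only: the bucket size, tree height, and greedy eviction policy are simplifications, and the recursion over auxiliary trees is omitted.

```python
import random

# Toy Path ORAM sketch: heap-indexed binary tree of buckets on the "server",
# position map and stash on the "client".
Z, HEIGHT = 2, 3                         # blocks per bucket, tree height
LEAVES = 2 ** HEIGHT
rng = random.Random(0)

tree = {i: [] for i in range(2 ** (HEIGHT + 1) - 1)}   # node index -> bucket
position = {}                                          # block id -> mapped leaf
stash = {}                                             # client-side stash

def path(leaf):
    """Node indices from the leaf's bucket up to the root (heap layout)."""
    node = 2 ** HEIGHT - 1 + leaf
    nodes = [node]
    while node:
        node = (node - 1) // 2
        nodes.append(node)
    return nodes

def access(block_id, value=None):
    """Read (value=None) or write one block with a single read-path/evict-path cycle."""
    leaf = position.get(block_id, rng.randrange(LEAVES))
    for node in path(leaf):                      # read the whole path into the stash
        for bid, val in tree[node]:
            stash[bid] = val
        tree[node] = []
    if value is not None:
        stash[block_id] = value
    result = stash.get(block_id)
    position[block_id] = rng.randrange(LEAVES)   # remap to a fresh random leaf
    for node in path(leaf):                      # evict the stash back along the path
        fits = [b for b in stash if node in path(position[b])][:Z]
        tree[node] = [(b, stash.pop(b)) for b in fits]
    return result

access("a", "alpha")
access("b", "beta")
```

Because every access reads and rewrites one full root-to-leaf path regardless of which block is wanted, the server observes only uniformly random paths, which is the obliviousness property being sketched.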
We tested our implementation on 1 million items using several threshold sizes for the position map, with 10,000 accesses in all tests. Our results show the expected negative correlation between time consumption and the threshold size of the position map. However, the results suggest that unless an increase in the threshold size of the position map decreases the number of trees, no significant improvement in performance will be noticed. It is also clear that the initialization process, which builds the items' tree and the auxiliary trees and fills in the initial values, comprises more than 98% of the consumed time. Accordingly, this type of ORAM suits workloads with a large number of accesses, since the server can fulfil client requests very quickly once the initialization process has finished. References: Goldreich, O., & Ostrovsky, R. (1996). Software protection and simulation on oblivious RAMs. Journal of the ACM (JACM), vol. 43, no. 3, pp. 431–473. Shi, E., Chan, T.-H., Stefanov, E., & Li, M. (2011). Oblivious RAM with O((log N)^3) worst-case cost. International Conference on the Theory and Application of Cryptology and Information Security. Springer, pp. 197–214. Stefanov, E., Dijk, M. V., Shi, E., Fletcher, C., Ren, L., Yu, X., et al. (2013). Path ORAM: an extremely simple oblivious RAM protocol. Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. ACM, pp. 299–310. Zahur, S., Wang, X., Raykova, M., Gascon, A., Doerner, J., Evans, D., et al. (2016). Revisiting square-root ORAM: Efficient random access in multi-party computation. IEEE Symposium on Security and Privacy (SP). IEEE, pp. 218–234.
-
-
-
CONNECT: CONtextual NamE disCovery for blockchain-based services in the IoT
The Internet of Things is gaining momentum thanks to its vision of seamlessly interconnected devices. However, a unified way to discover and interact with the surrounding smart environment is missing. As a result, we have been witnessing the development of heterogeneous ecosystems, where each service provider adopts its own protocol, thus preventing IoT devices from interacting when they belong to different providers. The same is now happening with blockchain technology, which provides a robust and trusted way to accomplish tasks but does not provide interoperability, thus creating the same heterogeneous ecosystems highlighted above. In this context, the fundamental research question we address is: how do we find things or services in the Internet of Things? In this paper, we propose the first IoT discovery approach that answers this question by exploiting hierarchical and universal multi-layered blockchains. Our approach neither defines new standards nor forces service providers to change their own protocols. On the contrary, it leverages the existing, publicly available information obtained from each single blockchain to gain better knowledge of the surrounding environment. The proposed approach is detailed and discussed with the support of relevant use cases.
Multiple Input Multiple Output In-Vivo Communication for Nano Sensors at Terahertz Frequencies
Authors: Aya Fekry Abdelaziz, Ke Yang, Khalid Qaraqe, Joseph Boutros, Qammer Abbasi and Akram Alomainy
This study presents a preliminary feasibility investigation of signal propagation and antenna diversity techniques inside human skin tissues in the 0.8–1.2 terahertz (THz) frequency range, applying a multiple-input single-output (MISO) technique. THz application in in-vivo communication has received great attention for its unique properties, such as non-ionizing characteristics, strong interaction with the water content of human tissues, and molecular sensitivity [1]. This study helps to evaluate the usage and performance of a MISO system for nanoscale networks. The human skin tissue is represented by three main layers: stratum corneum, epidermis, and dermis. The path loss model and the channel characterization inside the human skin were investigated in [2]. The diversity gain (DG) for two different in-vivo channels, resulting from signal propagation between two transmitting antennas located in the dermis layer and one receiving antenna located in the epidermis layer, is calculated to evaluate the system performance. Different diversity combining techniques are applied in this study: selection combining (SC), equal-gain combining (EGC), and maximum-ratio combining (MRC). In the simulation setting in CST Microwave Studio, the distance between the transmitting antennas is fixed, while the effect of the distance between the receiver and the transmitters is analyzed at different frequencies. Although MIMO antenna systems are used in wireless communication to enhance data throughput, this initial study predicts that they might not be useful in in-vivo nano communication. Results demonstrate a high cross-correlation between the two channels. Figure 1 shows the CDF plot for the two channels with the different diversity combining techniques used in the study.
[1] J. M. Jornet and I. F. Akyildiz, “Channel modeling and capacity analysis for electromagnetic wireless nanonetworks in the terahertz band,” IEEE Transactions on Wireless Communications, vol. 10, no. 10, pp. 3211–3221, 2011.
[2] Q. H. Abbasi, H. El Sallabi, N. Chopra, K. Yang, K. A. Qaraqe, and A. Alomainy, “Terahertz channel characterization inside the human skin for nano-scale body-centric networks,” IEEE Transactions on Terahertz Science and Technology, vol. 6, no. 3, pp. 427–434, 2016.
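The three combining rules compared in the study can be illustrated with a small simulation. The sketch below is illustrative only: it uses synthetic correlated Rayleigh-fading branches, with the correlation value `rho` a hypothetical stand-in for the high cross-correlation reported above, not the CST-simulated in-vivo channels.

```python
import numpy as np

# Two correlated Rayleigh-fading branches. rho is a hypothetical stand-in
# for the high cross-correlation reported for the two in-vivo channels.
rng = np.random.default_rng(0)
n, rho = 10_000, 0.9
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
y = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
h1, h2 = x, rho * x + np.sqrt(1 - rho**2) * y

g1, g2 = np.abs(h1) ** 2, np.abs(h2) ** 2   # instantaneous branch SNRs (unit noise)
sc = np.maximum(g1, g2)                     # selection combining: pick the best branch
egc = (np.abs(h1) + np.abs(h2)) ** 2 / 2    # equal-gain combining: co-phase, equal weights
mrc = g1 + g2                               # maximum-ratio combining: optimal weights

print(sc.mean(), egc.mean(), mrc.mean())    # MRC >= EGC and MRC >= SC per sample
```

With highly correlated branches the mean output SNRs of the three combiners stay close together, which is the simulation-scale analogue of the small diversity gain the abstract predicts.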
A Cyber-physical Testbed for Smart Water Networks Research, Education and Development in Qatar
Smart water networks integrate sensor data, computation, control, and communication technologies to enhance system performance, reliability, and consumer satisfaction. Such cyber-physical systems are built from, and rely upon, the tight integration of physical elements, real-time sensors and data integration, and cyber algorithmic, control, computational, and communication layers. A cyber-physical testbed has been developed at Texas A&M University at Qatar to simulate a real smart water network for research, education, and technology development purposes. The physical components include pipes, an automated pump-storage system, programmable logic controllers, controllable valves, a disinfectant injector, sensors, and data acquisition devices. Flow, pressure, temperature, and specific water quality parameters, such as pH and conductivity, are continuously monitored by sensors, providing an operator with an up-to-date view of the current state of the system. The pump-storage system is controlled by programmable logic controllers and is designed to enable physical evaluation and enhancement of feedback and model-predictive control algorithms. The water tank is equipped with a heating apparatus to conduct experimental studies on the effect of water temperature on the fate and transport of chlorine and disinfection byproducts in the drinking water distribution system of Qatar. The physical facility is integrated with a cyber data acquisition and communications layer, and a cloud-based data storage, analytics, and visualization platform. Acquired data is stored and maintained in a non-relational database on a cloud storage service. Additionally, a MongoDB server is set up to query and write data records. The analytics backend engine performs a variety of data transforms including, but not limited to, data cleansing and time series imputation and forecasting.
The visualization frontend provides a graphical interface that allows operators to interact with the backend engine by running queries, plotting time series, executing data analytics tasks, and generating reports. Together, these integrated physical and cyber layers unleash opportunities for education, research, development, evaluation, and commercialization of a variety of smart water network technologies. The testbed provides an environment that can predict leaks and pipe bursts based on real-time analytics of high-frequency pressure readings on the cloud. It also enables developing smart pump-storage control technologies that help reduce non-revenue water loss, energy costs, and carbon emissions. The research team is also investigating harnessing the profound solar power resources available in Qatar for powering treatment plants and pumps by innovating control strategies that can handle the intermittency of such renewable power sources. Two asset management models have also been developed and implemented on the testbed. (1) A performance assessment model of water distribution systems, which comprises four assessment modules for water pipelines, water accessories, water segments, and water networks. The model identifies critical factors affecting the performance of a water network and schedules pipe maintenance and replacement plans. (2) A risk assessment model for water pipeline failure, which evaluates the risks of performance and mechanical failures and applies a hierarchical fuzzy model to determine the risk from four risk factors as inputs (i.e., environmental, physical, operational, and post-failure). In addition to research and technology purposes, this testbed has also provided a valuable learning resource for both operators and students. Several undergraduate students are already involved in the design and construction of this facility.
This has created an opportunity to train, educate, and empower undergraduate students, the future engineers and industry leaders of Qatar.
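As an illustration of the time-series imputation step mentioned above, the short sketch below fills gaps in a sensor series by linear interpolation between the nearest valid readings. The variable names and sample pressure values are hypothetical; the testbed's actual analytics engine is not described at this level of detail.

```python
import numpy as np

def impute_linear(t, values):
    """Fill NaN gaps in a sensor series by linear interpolation in time."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(values, dtype=float).copy()
    missing = np.isnan(v)
    v[missing] = np.interp(t[missing], t[~missing], v[~missing])
    return v

# Hypothetical pressure readings (bar) with two dropped samples.
t = np.arange(6)                                   # sample times
pressure = np.array([2.0, 2.2, np.nan, 2.6, np.nan, 3.0])
filled = impute_linear(t, pressure)
print(filled)   # the two NaNs become 2.4 and 2.8
```

Linear interpolation is only a baseline; a production pipeline would typically fall back to model-based forecasting for long gaps.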
Real-time Object Detection on Android using TensorFlow
Detection of images or moving objects has been worked on extensively, and has been integrated and used in commercial, residential, and industrial environments. However, most of these strategies and techniques have serious limitations. One of them is the low computational resources available at the user level. Other important limitations that need to be tackled are the lack of proper analysis of the trained data, dependency on the motion of the objects, inability to differentiate one object from another, and sensitivity to the speed of the object under detection and to illumination. Hence, there is a need to draft, apply, and recognize new detection techniques that tackle the existing limitations. In our project we have worked on a model based on Scalable Object Detection, using deep neural networks to localize and track people, cars, potted plants, and 16 other categories in the camera preview in real time. The large visual recognition ImageNet package ‘inception5h’ from Google is used. This is a model trained with images of the respective categories, which is then converted to a graph file using neural networks. The graph nodes are usually huge in number, and these are optimized for use on Android. The use of an already available trained model is purely for ease and convenience; nevertheless, any set of images can be trained and used in the Android application. An important point to note is that training the images requires huge computational speed and more than one GPU-equipped computer. Also, a .jar file built with the help of Bazel is added to Android Studio to support the integration of Java and TensorFlow. This jar file is the key to getting TensorFlow onto a mobile device. The jar file is built with the help of OpenCV, which is a library of programming functions mainly aimed at real-time computer vision.
Once this has been integrated, any input to the Android application at run time is predicted with the help of Tiny-YOLO (You Only Look Once, a Darknet reference network). The application supports multi-object detection, which is very useful. All the steps occur simultaneously at great speed, giving remarkable results and detecting all the categories of the trained model under good illumination. The real-time detection reference network used also copes with an acceptable acceleration of moving objects but is not very effective under low illumination. The objects are limited to 20 categories, but the scope can be broadened with a revised trained model. The 20 categories are "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", and "tvmonitor". The application can be used handily on a mobile phone or any other smart device with minimal computational resources, i.e., no connectivity to the internet. The main challenges for the application are speed and illumination. Effective results will help in real-time detection of traffic signs and pedestrians from a moving vehicle. This goes hand in hand with similar intelligence in cameras, which can be used as an artificial eye in many areas such as surveillance, robotics, traffic, facial recognition, etc.
An integrated multi-parametric system for infrastructure monitoring and early warning based on the Internet of Things
Authors: Farid Toauti, Damiano Crescini, Alessio Galli and Adel Ben Mnaouer
Our daily life strictly depends on distributed civil and industrial infrastructures, in which Qatar and other heavily industrialized countries have large investments. Recently, failures in such infrastructures have incurred enormous economic losses and development disruptions, as well as the loss of human lives. Infrastructures are strategic assets for sustainable development and require correct management. To this end, their health levels and serviceability should be continuously assessed. Geophysical and mechanical quantities that determine such serviceability are, for instance, tilt angles, vibration levels, applied forces, stress, and the existence of previous structural defects. It follows that for a feasible serviceability assessment, appropriate sensing and data processing of those parameters have to be achieved. For example, bridges are monitored for structure movements and stress levels, while earthquake early-warning systems detect primary seismic waves before the arrival of strong waves. In the case of riverbank conservation, the water level must be monitored together with the associated mass flow for load estimation. In addition, precipitation rate and groundwater level are paramount indicators for anticipating slope failure. Finally, strain/temperature measurements can be used to sense the health of concrete gravity or arch dams. End-users, engineers, and owners can take the most appropriate decisions based on the sensed parameters. Structural Health Assessment (SHA) is not straightforward. The structural condition is generally complex in terms of architectural parameters like damage existence, distributed masses, damping factors, stiffness matrices, and/or applied distributed forces. The above factors make such SHA extremely difficult and/or exceptionally expensive. With the aim of alleviating this difficulty, possible approaches to SHA are based on vibration measurements.
The analysis of such measurements reveals the structure's dynamic behaviour, which in turn reflects the characteristics of, and distributed forces on, structures. Also, structural soundness is obtained by inverse analyses of the dynamic performance. However, this dynamic behaviour, which is inherently complex in both time and/or spatial scale, is further complicated by the fact that, for example, deterioration/damage/erosion is essentially a local phenomenon. Commonly, technicians with specific domain knowledge perform SHAs manually. Obviously, this incurs high costs and inadequate monitoring frequency. Also, there is a high probability of errors due to improper positioning of the instrumentation or to mere mistakes during data collection. Moreover, for large buildings (e.g. towers, general buildings, bridges and tunnels), data from just a few distributed sensors cannot accurately fulfil the SHA. Consequently, the use of densely distributed sensors working at a sufficiently high sampling frequency becomes a must. Physical wiring of the site under observation is impractical due to cost and architectural constraints. Thus, for Structural Health Monitoring (SHM), networks of densely distributed, wirelessly connected sensors become imperative. When a copious number of transducers is adopted, wireless communication appears attractive. Also, the high cost of installing wired sensors can be strongly reduced by employing wireless sensors. In the present research, the authors implemented a WSN-based approach for widespread monitoring without forcing intolerable boundary conditions, i.e., without requiring wiring of the measuring nodes, triggering manual data collection, or imposing strong modifications to the site before the deployment of the sensory hardware (less intrusive). In view of the above discussion, the investigators explored some key issues on the above challenges by referring to several SHM engineering paradigms.
The authors designed a novel multi-parametric system dedicated to stability monitoring and control of soils, engineering works (e.g. bridges, stadiums, tunnels), underground rail tunnels, and offshore platforms, in order to continuously evaluate the danger levels of potentially unstable areas. The proposed system can be assembled ‘in situ’, structuring an underground instrumented column where different modules are joined together on a digital bus (e.g. via RS485 or CAN bus communication). Each module contains up to ten different sensors (e.g. accelerometers, magnetometers, inclinometers, extensometers, temperature sensors, and piezometers) and an electronic board for data collection, conversion, filtering, and transmission. Special flexible joints, which permit strong, continuous adaptability to bends and twists of the drilling hole, link the modules. A control unit installed above ground provides the readings at regular time intervals and is connected to other columns via wireless communication, forming a wide network. In particular, the proposed approach allows both analysing the response of the infrastructure to vibrations on the fly, so that an early-warning signal can be triggered, and saving the corresponding measurements for further analysis. The authors believe that this proposal is original and unique in three aspects. First, as most of the earlier studies on SHM were carried out by adapting existing hardwired solutions for snapshot measurements rather than representative long-term monitoring, our proposal presents the first initiative to develop green WSN technologies applied to sustainable SHM applications. Second, it will develop tailored sensor technology and new techniques for SHM, taking into account metrological and physical parameters such as resolution, cost, accuracy, size, and power consumption. Third, the project will commission a novel multi-parametric SHM system, which can be customized to other areas (e.g.
environmental monitoring, traffic monitoring, etc.). The research supports innovations at system and component levels, leading to out-of-the-box know-how. The proposed solution is based on novel/customized sensors and data processing, an environmentally powered communication platform, and communication networks and algorithms embracing the visionary nature of the IoT with out-of-the-box solutions. Specific outcomes have been an experimental proof-of-concept, through testing and prototyping, of a tailored SHM sensor technology and smart techniques that uniquely provide self-calibration and self-diagnostics of faults; a viable multi-sensor instrumented column for SHM with advanced techniques; and an environmentally powered wireless platform with innovative MAC protocols (power-aware, context-aware, cognitive, and polymorphic). This work employs tools and techniques of modern sensing, processing, and networking in order to generate novel SHM solutions that uniquely provide precision measurement, a green IoT-based communication approach, viability, and cost-effectiveness.
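As a minimal illustration of how vibration measurements reveal a structure's dynamic behaviour, the sketch below extracts the dominant modal frequency of a synthetic accelerometer trace with an FFT. The signal, its 4.2 Hz mode, and the 200 Hz sampling rate are invented for the example; the deployed system's actual processing chain is not specified in the abstract.

```python
import numpy as np

# Synthetic accelerometer record: a 4.2 Hz structural mode buried in noise.
# The mode frequency and 200 Hz sampling rate are invented for illustration.
fs = 200.0
t = np.arange(0, 10, 1 / fs)                 # 10-second record
rng = np.random.default_rng(1)
signal = np.sin(2 * np.pi * 4.2 * t) + 0.5 * rng.standard_normal(t.size)

spectrum = np.abs(np.fft.rfft(signal))       # magnitude spectrum
freqs = np.fft.rfftfreq(t.size, 1 / fs)
dominant = freqs[spectrum[1:].argmax() + 1]  # strongest non-DC bin
print(f"dominant mode: {dominant:.1f} Hz")
```

In a deployment, a shift of such modal frequencies over time is one of the indicators used to flag deterioration.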
Learning Spatiotemporal Latent Factors of Traffic via a Regularized Tensor Factorization: Imputing Missing Values and Forecasting
Authors: Abdelkader Baggag, Tahar Zanouda, Sofiane Abbar and Fethi Filali
Spatiotemporal data related to traffic has become commonplace due to the wide availability of cheap sensors and the rapid deployment of IoT platforms. Yet, this data suffers from several challenges related to sparsity, incompleteness, and noise, which makes traffic analytics difficult. In this paper, we investigate the problem of missing or noisy information in the context of real-time monitoring and forecasting of traffic congestion for road networks. The road network is represented as a directed graph in which nodes are junctions and edges are road segments. We assume that the city has deployed high-fidelity speed-reading sensors on a subset of edges. Our objective is to infer speed readings for the remaining edges in the network, as well as missing values from malfunctioning sensors. We propose a tensor representation for the series of road network snapshots, and develop a regularized factorization method to estimate the missing values while learning the latent factors of the network. The regularizer, which incorporates spatial properties of the road network, improves the quality of the results. The learned factors, along with a graph-based temporal dependency, are used in an autoregressive algorithm to predict the future state of the road network with a long horizon. Extensive numerical experiments with real traffic data from the cities of Doha (Qatar) and Aarhus (Denmark) demonstrate that the proposed approach is appropriate for imputing missing data and predicting traffic state.
Main contributions:
- We propose a novel temporal regularized tensor factorization framework (TRTF) for high-dimensional traffic data. TRTF provides a principled approach to account for both the spatial structure and the temporal dependencies.
- We introduce a novel data-driven graph-based autoregressive model, where the weights are learned from the data. Hence, the regularizer can account for both positive and negative correlations.
- We show that incorporating temporal embeddings into CP-WOPT leads to accurate multi-step forecasting, compared to state-of-the-art matrix-factorization-based methods.
- We conduct extensive experiments on real traffic congestion datasets from two different cities and show the superiority of TRTF for both missing-value completion and multi-step forecasting under different experimental settings. For instance, TRTF outperforms LSM-RN by 24% and TRMF by 29%.
Conclusion: We present in this paper TRTF, an algorithm for temporal regularized tensor decomposition. We show how the algorithm can be used for several traffic-related tasks such as missing-value completion and forecasting. The proposed algorithm incorporates both spatial and temporal properties into tensor decomposition procedures such as CP-WOPT, yielding better learned factors. We also extend TRTF with an autoregressive procedure to allow for multi-step-ahead forecasting of future values. We compare our method to recently developed algorithms that deal with the same type of problems using regularized matrix factorization, and show that under many circumstances TRTF does provide better results. This is particularly true in cases where the data suffers from high proportions of missing values, which is common in the traffic context. For instance, TRTF achieves a 20% gain in MAPE score compared to the second best algorithm (CP-WOPT) in completing missing values in the case of extreme sparsity observed in Doha. As future work, we will first focus on adding non-negativity constraints to TRTF, although the highest fraction of negative values generated by our method throughout all the experiments did not exceed 0.7%. Our second focus will be to optimize the TRTF training phase in order to increase its scalability to handle large dense tensors, and to implement it in a parallel environment.
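The core idea of fitting latent factors only to observed entries can be shown in miniature. The sketch below uses a masked rank-2 matrix factorization (a two-way analogue of the tensor case, without the spatial regularizer) on a toy speed matrix; all sizes, rates, and values are invented for illustration and this is not the paper's TRTF implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "speed matrix": 20 road segments x 30 time steps with rank-2 structure.
U_true, V_true = rng.random((20, 2)), rng.random((30, 2))
speeds = U_true @ V_true.T
mask = rng.random(speeds.shape) > 0.3        # True = observed; ~30% missing

# Masked rank-2 factorization by gradient descent: fit only observed entries,
# the same weighted-least-squares principle CP-WOPT applies to tensors.
k, lr, lam = 2, 0.05, 1e-3
U = 0.1 * rng.standard_normal((20, k))
V = 0.1 * rng.standard_normal((30, k))
for _ in range(2000):
    err = mask * (U @ V.T - speeds)          # residual on observed cells only
    U -= lr * (err @ V + lam * U)
    V -= lr * (err.T @ U + lam * V)

# Error on the *missing* entries: the learned factors impute them.
rmse_missing = np.sqrt(((U @ V.T - speeds)[~mask] ** 2).mean())
print(rmse_missing)
```

Because the toy matrix is exactly low-rank and most entries are observed, the factors recover the missing cells almost exactly; real traffic tensors additionally need the spatial and temporal regularizers described above.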
A Deep Learning Approach for Detection of Electricity Theft Cyber Attacks in Smart Grids
Authors: Muhammad Ismail, Mostafa Shahin, Erchin Serpedin and Khalid Qaraqe
Future smart grids rely on advanced metering infrastructure (AMI) networks for monitoring and billing purposes. However, several research works have revealed that such AMI networks are vulnerable to different kinds of cyber attacks. In this research work, we consider one type of such cyber attacks that targets electricity theft, and we propose a novel detection mechanism based on a deep machine learning approach. While existing research papers focus on shallow machine learning architectures to detect these cyber attacks, we propose a deep feedforward neural network (D-FF-NN) detector that can thwart such cyber attacks efficiently. To optimize the D-FF-NN hyper-parameters, we apply a sequential grid search technique that significantly improves the detector's performance while reducing the associated complexity of the learning process. We carry out extensive studies to test the proposed detector based on publicly available real load profile data of 5000 customers. The detector's performance is investigated against a mixture of different attacks, including partial reduction attacks, selective by-pass attacks, and price-based load control attacks. Our study reveals that the proposed D-FF-NN detector presents a superior performance compared with state-of-the-art detectors based on shallow machine learning architectures.
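A sequential grid search of the kind mentioned above can be sketched generically: tune one hyper-parameter at a time while holding the others at their current best values, visiting the sum of the grid sizes instead of their product. The objective below is a toy stand-in for the detector's validation score, and all parameter names and grids are hypothetical, not the paper's actual search space.

```python
def sequential_grid_search(score, grids, init):
    """Tune one hyper-parameter at a time, keeping the others at their
    current best values: sum(len(g)) evaluations instead of prod(len(g))."""
    best = dict(init)
    for name, candidates in grids.items():
        trials = {v: score({**best, name: v}) for v in candidates}
        best[name] = max(trials, key=trials.get)
    return best

# Toy stand-in for the detector's validation score (NOT the real D-FF-NN):
# it peaks at 3 hidden layers, 64 units per layer, and learning rate 1e-3.
def toy_score(p):
    return -((p["layers"] - 3) ** 2
             + (p["units"] - 64) ** 2 / 100
             + (p["log_lr"] + 3) ** 2)

grids = {"layers": [1, 2, 3, 4], "units": [16, 32, 64, 128], "log_lr": [-4, -3, -2]}
best = sequential_grid_search(toy_score, grids, {"layers": 1, "units": 16, "log_lr": -4})
print(best)   # {'layers': 3, 'units': 64, 'log_lr': -3}
```

Here the search makes 11 evaluations instead of the 48 a full grid would need; the trade-off is that parameters are assumed to be roughly independent.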
A simple and secure framework for protecting sensitive data stored on the cloud
Authors: Elias Yaacoub, Ali Sakr and Hassan Noura
In the past decade, cloud computing emerged as a new computing concept with a distributed nature, using virtual networks and systems. Many businesses rely on this technology to keep their systems running, but concerns are rising about security breaches in cloud computing. This work presents a secure approach for storing data on the cloud. The proposed methodology is described as follows:
1) The client who wants to store data on the cloud subscribes with n cloud providers (CPs).
2) A file F to be stored on the cloud is subdivided into n parts, or subfiles: F1, F2, ..., Fn.
3) Each part is encrypted with an encryption key Kf. The encrypted parts are denoted by F1*, F2*, ..., Fn*.
4) A random permutation vector P(F) is generated.
5) The encrypted parts are stored on the n clouds according to P(F); in other words, depending on P(F), F1* could be stored on CP3, for example, F2* on CPn, etc.
6) In order to be able to retrieve his files, the client needs to maintain some information related to the distribution of the various parts and to the encryption of the file. Thus, he maintains two tables.
7) The first table contains a hash of the file name, H(F_name), and the key Kf.
8) The second table contains a hash of the file name, H(F_name), a hash of the (unencrypted) file content itself, H(F), a hash of the encrypted file content, H(F*), and the permutation vector P(F), encrypted with Kf.
9) The two tables are stored on different servers protected by advanced security measures, and preferably located at different locations.
10) In order to obtain the file, the client enters the file name. The system then computes the hash value of the name, finds the corresponding entry in Table 1, and obtains the key. Then, the corresponding entry in Table 2 is found. The key obtained from Table 1 is used to decrypt the permutation vector P(F). Then, the encrypted parts are downloaded from the different cloud providers.
Afterwards, they are assembled in the correct order and decrypted. The hash values of the encrypted and unencrypted versions are then computed and compared to their corresponding values stored in Table 2 in order to check the integrity of the file downloaded from the cloud. This approach allows the client to use the same storage space on the cloud: instead of using a single cloud provider to store a file of size S bits, the client uses n cloud providers, storing S/n bits with each cloud provider. Thus, the storage costs of the two methods are comparable. If the client wishes to introduce redundancy into the file, such that the whole file can be recovered from j parts instead of n parts, with j ≤ n, then redundancy can be added to the original file as appropriate. In this case, the storage costs will increase accordingly, but this is an added enhancement that can be used with or without the proposed approach. On the other hand, the overhead of the proposed approach consists of maintaining two tables containing the information relevant to the file. The storage required to maintain the entry corresponding to each file in these two tables is small compared to a typical file size: in fact, we only need to store a few hash values (of fixed size), along with the encryption key. This seems a reasonable price to pay for a client who has sensitive data that he cannot post unencrypted on the cloud, and for whom even posting it encrypted with a single provider is risky in case a security breach occurs at the provider's premises.
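The storage and retrieval steps above can be sketched end-to-end. This is a minimal illustration, not the authors' implementation: the XOR keystream cipher is a stand-in for a real authenticated cipher such as AES-GCM (not in the Python standard library), the permutation vector is stored unencrypted here for brevity (the proposed scheme encrypts it with Kf), and all names and values are hypothetical.

```python
import hashlib, random, secrets

def keystream_xor(data: bytes, key: bytes) -> bytes:
    # Placeholder cipher: XOR with a SHA-256-derived keystream. A real
    # deployment would use an authenticated cipher such as AES-GCM.
    out, counter = bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

def sha(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

def store(f_name: str, content: bytes, n: int):
    kf = secrets.token_bytes(32)                       # encryption key Kf
    step = -(-len(content) // n)                       # ceil(len/n)
    parts = [content[i * step:(i + 1) * step] for i in range(n)]   # F1..Fn
    enc = [keystream_xor(p, kf) for p in parts]        # F1*..Fn*
    perm = list(range(n))
    random.shuffle(perm)                               # P(F): part i -> CP perm[i]
    clouds = {perm[i]: enc[i] for i in range(n)}
    table1 = {sha(f_name.encode()): kf}                # H(F_name) -> Kf
    table2 = {sha(f_name.encode()): (sha(content), sha(b"".join(enc)), perm)}
    return clouds, table1, table2

def retrieve(f_name: str, clouds, table1, table2) -> bytes:
    key = sha(f_name.encode())
    kf = table1[key]
    h_plain, h_enc, perm = table2[key]
    enc = [clouds[perm[i]] for i in range(len(perm))]  # undo the permutation
    assert sha(b"".join(enc)) == h_enc                 # integrity of F*
    content = b"".join(keystream_xor(p, kf) for p in enc)
    assert sha(content) == h_plain                     # integrity of F
    return content

data = b"sensitive payroll records for Q3"
clouds, t1, t2 = store("payroll.db", data, n=4)
restored = retrieve("payroll.db", clouds, t1, t2)
```

The sketch keeps the two tables and the per-provider shares as plain dictionaries; in the proposed scheme they would live on separately secured servers and the n cloud providers.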
Substring search over encrypted data
Our data, be it personal or professional, is increasingly outsourced. This results from the development of cloud computing in the past ten years, a paradigm that shifts computing to a utility. Even without realizing it, we have let cloud computing enter our lives inexorably: every owner of a smartphone and every user of a social network is using cloud computing, as most IT companies, and tech giants in particular, use infrastructure as a service to offer services in the software-as-a-service model. These services (Dropbox, Google, Facebook, Twitter…) are simple to use, flexible… and free! Users just send their data and get all the services without paying. In reality, these companies make most of their revenue by profiling users thanks to the data that the users willingly provide. The data is the indirect payment for these services. This raises privacy concerns at the personal level, as well as confidentiality issues for sensitive documents in a professional environment. The classical way of dealing with confidentiality is to conceal the data through encryption. However, cloud providers need access to data in order to provide useful services, not only to profile users. Take a cloud email service as an example, where the emails are stored and archived in the cloud and only downloaded to the user's phone or computer when the user wants to read them. If the emails are encrypted in the cloud, the cloud cannot access them and confidentiality is enforced. However, the cloud can then also not provide any useful service to the user, such as a search functionality over emails. To meet these conflicting requirements (hiding the data and accessing the data), a solution is to develop mechanisms that allow computation on encrypted data. While generic protocols for computation on encrypted data have been researched and developed, such as Gentry's breakthrough fully homomorphic encryption, their performance remains unsatisfactory.
On the contrary, tailoring solutions to specific needs results in more practical and efficient solutions. In the case of searching over encrypted data, searchable encryption algorithms have been developed for over a decade and now achieve satisfactory performance (linear in the size of the dictionary). Most of the work in this field focuses on single-keyword search in the symmetric setting. To overcome this limitation, we first proposed a scheme based on letter orthogonalization that allows testing of string membership by performing efficient inner products (AsiaCCS 2013). Going further, we now propose a general solution to the problem of efficient substring search over encrypted data. The solution enhances existing “keyword” searchable encryption schemes by allowing searching for any part of encrypted keywords without requiring one to store all possible combinations of substrings from a given dictionary. The proposed technique is based on the previous idea of letter orthogonalization. We first propose SED-1, the base protocol for substring search. We then identify some attacks on SED-1 that demonstrate the complexity of the substring search problem under different threat scenarios. This leads us to propose our second and main protocol, SED-2. This protocol is also efficient, in that the search complexity is linear in the size of the keyword dictionary. We run several experiments on a sizeable real-world dataset to evaluate the performance of our protocol. This final work has been accepted for publication in the IOS Journal of Computer Security: https://content.iospress.com/articles/journal-of-computer-security/jcs14652.
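The letter-orthogonalization idea, stripped of all cryptography, can be demonstrated in a few lines: assign each letter a vector from an orthonormal set, so an inner product of 1 signals a letter match and a sliding sum of inner products equal to the substring length signals an exact match. This plaintext sketch only illustrates the membership test, not the SED-1/SED-2 protocols themselves.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
E = np.eye(len(ALPHABET))        # one orthonormal vector per letter

def encode(word):
    # Column i is the vector of letter i; <e(a), e(b)> = 1 iff a == b.
    return np.stack([E[ALPHABET.index(c)] for c in word], axis=1)

def contains(word, sub):
    W, S = encode(word), encode(sub)
    m = S.shape[1]
    # Sliding inner-product score: equals m only where sub matches exactly.
    scores = [float(np.sum(W[:, j:j + m] * S)) for j in range(W.shape[1] - m + 1)]
    return any(abs(s - m) < 1e-9 for s in scores)

print(contains("encryption", "crypt"), contains("encryption", "crypto"))
# True False
```

In the encrypted setting, the same inner products are computed over transformed vectors, so the server can evaluate match scores without learning the letters.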
Almost BP-XOR Coding Technique for Tolerating Three Disk Failures in RAID-7 Architectures
Authors: Naram Mhaisen, Mayur Punkar, Yongge Wang, Yvo Desmedt and Qutaibah Malluhi
Redundant Array of Independent Disks (RAID) storage architectures protect digital infrastructure against potential disk failures. For example, RAID-5 and RAID-6 architectures provide protection against one and two disk failures, respectively. Recently, data generation has increased significantly due to the emergence of new technologies, and the size of storage systems is growing rapidly to accommodate such large data sizes, which increases the probability of disk failures. This necessitates a new RAID architecture that can tolerate up to three disk failures. RAID architectures implement coding techniques. The code specifies how data is stored among multiple disks and how lost data can be recovered from surviving disks. This abstract introduces a novel coding scheme for new RAID-7 architectures that can tolerate up to three disk failures. The code is an improved version of the existing BP-XOR code and is called “Almost BP-XOR”. There are multiple codes that can be used for RAID-7 architectures. However, [5,2] BP-XOR codes have significantly lower encoding and decoding complexities than most common codes [1]. Despite this fact, this code does not achieve the fastest data decoding and reconstruction speeds, due to its relatively low efficiency of 0.4. Furthermore, the existence of MDS [6,3] b×6 BP-XOR codes, b>2 (which achieve an efficiency of 0.5), is still an open research question. This work proposes [6,3] 2×6 Almost BP-XOR codes. These codes largely utilize the simple and fast BP-XOR decoder while achieving an efficiency of 0.5, leading to the fastest recovery from disk failures among other state-of-the-art codes. An algorithm to generate a [6,3] 2×6 Almost BP-XOR code has been developed, and an example code is provided in Table 1.
The [6,3] 2×6 Almost BP-XOR codes are constructed in such a way that any three-column-erasure pattern results in one of two main scenarios. First: at least one of the surviving degree-three encoding symbols contains two known information symbols. This scenario occurs in 70% of three-column-erasure cases (i.e., 14 out of the 20 possible cases). The recovery process in this scenario is identical to that of the BP-XOR codes: knowing any two information symbols in a degree-three encoding symbol is sufficient to recover the third information symbol through a simple XOR operation. Second: none of the surviving degree-three encoding symbols contains two known information symbols. This scenario occurs in the remaining 30% of three-column-erasure cases (i.e., 6 out of the possible 20). The BP-XOR decoder fails in this scenario. However, due to the construction of the codes, at least one surviving degree-three encoding symbol contains a known information symbol. Thus, recovering one of the remaining two information symbols in such a degree-three encoding symbol restarts the BP-XOR decoder. Table 2 shows these erasure patterns along with an expression for one of the missing information symbols. These expressions can be stored in buffers and used whenever the corresponding erasure pattern occurs. The solutions in Table 2 are derived from the inverse of a 6×6 submatrix that results from deleting, from a generator matrix G, the columns corresponding to erased code columns. The read complexity of Almost BP-XOR codes is 1. Decoding Almost BP-XOR codes requires just 6 XOR operations when BP-XOR decoding succeeds for a given three-column-erasure pattern. However, when the BP-XOR decoder fails, up to 15 XOR operations are required in total. The normalized repair complexity is 15/6 = 2.5. Experimentally, Fig. 1 shows that the proposed Almost BP-XOR codes require the least amount of time to decode and reconstruct erased columns.
Thus, it is concluded that the [6,3] 2x6 Almost BP-XOR codes are best suited for RAID-7 systems requiring a storage efficiency of 0.5. References [1] Y. Wang, “Array BP-XOR codes for reliable cloud storage systems,” in Proc. of the 2013 IEEE International Symposium on Information Theory (ISIT), pp. 326–330, Istanbul, Turkey, July 2013. Note: Figures, tables, and more details are provided in the complete attached file titled Abstract-ARC18.pdf (respecting the same word-count restriction).
-
Leveraging Online Social Media Data for Persona Profiling
Authors: Bernard J. Jansen, Soon-gyo Jung, Joni Salminen, Jisun An and Haewoon Kwak

The availability of large quantities of online data affords the isolation of key user segments based on demographics and behaviors for many online systems. However, there is an open question of how organizations can best leverage this user information in communication and decision-making. The automatic generation of personas to represent customer segments is an interactive design technique with considerable potential for product development, policy decisions, and content creation. A persona is an imaginary but characteristic person that represents a customer, audience, or user segment; the represented segment shares common characteristics in terms of behavioral attributes or demographics. A persona is generally developed in the form of a detailed profile narrative, typically one or two pages, about a representative but imaginary individual that embodies the collection of users with similar behaviors or demographics. To make the fictitious individual appear as a real person to system developers and other decision-makers, the persona profile usually comprises a variety of demographic and behavioral details, such as socioeconomic status, gender, hobbies, family members, friends, and possessions, among other data. Along with this data, persona profiles typically address the goals, needs, wants, frustrations, and other attitudinal aspects of the fictitious individual that are relevant to the product being designed. Personas have typically been fairly static once created using manual, qualitative methods. In this research, we demonstrate a data-driven approach for creating and validating personas in real time, based on automated analysis of actual user data.
Using a variety of data collection sites and research partners from various verticals (digital content, non-profits, retail, service, etc.), we are specifically interested in understanding the users of these organizations by identifying (1) whom the organizations are reaching (i.e., user segments) and (2) what content is associated with each user segment. Focusing on one aspect of user behavior, we collect tens of millions of instances of user interaction with online content, specifically examining the topics of the content interacted with. We then decompose the interaction patterns, discover related impactful demographics, and add personal properties; this approach creates personas, based on these behavioral and demographic aspects, that represent the core user segments for each organization. We conduct analysis to remove outliers and use non-negative matrix factorization to identify first the meaningful behavioral patterns and then the impactful demographic groupings. We then demonstrate how these findings can be leveraged to generate real-time personas based on actual user data to facilitate organizational communication and decision-making. Demonstrating that these insights can be used to develop personas in near real time, the research results provide insights into user segmentation, competitive marketing, topical interests, and preferred system features. Overall, the research implication is that personas can be generated in near real time to represent the core user groups of online products.
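As a rough illustration of the factorization step, the sketch below runs plain multiplicative-update non-negative matrix factorization (Lee–Seung style) on a toy user-by-topic interaction matrix. The matrix, dimensions, and iteration count are illustrative assumptions, not the actual pipeline or data used in this research.

```python
import random

def matmul(A, B):
    """Naive matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def nmf(V, k, iters=1000, eps=1e-9, seed=0):
    """Multiplicative-update NMF: V (m x n) ~= W (m x k) @ H (k x n),
    all factors non-negative. Rows of H act as latent behavioural patterns;
    rows of W give each user's loading on those patterns."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        WT = [list(c) for c in zip(*W)]
        num, den = matmul(WT, V), matmul(matmul(WT, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        HT = [list(c) for c in zip(*H)]
        num, den = matmul(V, HT), matmul(W, matmul(H, HT))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H

# Hypothetical user-by-topic interaction counts: two clear latent segments.
V = [[5.0, 4.0, 0.0, 0.0],
     [4.0, 3.2, 1.0, 0.8],
     [0.0, 0.0, 5.0, 4.0],
     [1.0, 0.8, 4.0, 3.2]]
W, H = nmf(V, k=2)   # each row of H is a candidate behavioural pattern
```

In practice a library routine (e.g. scikit-learn's `NMF`) would replace this hand-rolled loop; the sketch only shows why the factors are interpretable as segments.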
-
A Fast and Secure Approach for the Transmission of Monitoring Data over Multi-RATs
Authors: Elias Yaacoub, Rida Diba and Hassan Noura

In an mHealth remote patient monitoring scenario, control units/data aggregators usually receive data from the body area network (BAN) sensors, then send it to the network or “cloud”. The control unit would transmit the measurement data to the home access point (AP), e.g. using WiFi, or directly to a cellular base station (BS), e.g. using the long-term evolution (LTE) technology, or both, e.g. using multi-homing to transmit over multiple radio access technologies (multi-RATs). Fast encryption or physical-layer security techniques are needed to secure the data. In fact, during normal conditions, monitoring data can be transmitted using best-effort transmission. However, when real-time processing detects an emergency situation, the current monitoring data should be transmitted in real time to the appropriate medical personnel in emergency response teams. The proposed approach benefits from the presence of multi-RATs in order to exchange the secrecy information more efficiently while optimizing the transmission time. It can be summarized as follows (assuming there are two RATs): 1) The first step is to determine the proportion of data bits to be transmitted over each RAT in order to minimize the transmission time, given the data rates achievable on each RAT. Denoting the data rates by R1 and R2, and the total number of bits to be transmitted by D = D1 + D2, where D1 and D2 are the numbers of bits to be transmitted over RAT1 and RAT2 respectively, D1 and D2 should be selected such that D1/R1 = D2/R2. 2) Then, the exchange of the secrecy parameters between sender and receiver is done over the two RATs in order to maintain the security of the transmission.
To avoid the complexity of public-key cryptography, a three-way handshake can be used: 2-1) The sender decides to divide the data into n parts, with n1 parts sent on RAT1 and n2 parts sent on RAT2, according to the ratios determined in 1) above (i.e., the total number of bits in the n1 parts should be close to D1, and in the n2 parts close to D2). 2-2) The sender generates a scrambling vector P(D,n) to scramble the n data parts and transmit them out of order. 2-3) The sender groups the secret information S = {n, n1, n2, P(D,n)}, possibly with additional information to protect against replay attacks (e.g., timestamp, nonce), and sends this information on the two RATs, each copy encrypted with a different key: K11 on RAT1 and K12 on RAT2. Thus, {S}_K11 is sent on RAT1 and {S}_K12 is sent on RAT2. 2-4) The receiver does not know K11 and K12. It therefore encrypts the received information with two other keys, K21 (over RAT1) and K22 (over RAT2), and sends it back: {{S}_K11}_K21 is sent on RAT1 and {{S}_K12}_K22 is sent on RAT2. 2-5) The sender removes its own encryption layer from the received vectors using its keys and sends back {S}_K21 on RAT1 and {S}_K22 on RAT2. The secret information is still securely encoded by the receiver's secret keys K21 and K22. 2-6) The receiver can now decrypt the information and obtain S. 3) The two parties can now communicate using the secret scrambling approach provided by S. This information can be changed periodically as needed. For example, if the data is subdivided into 10 parts, with 40% to be sent over LTE and 60% over WiFi, according to the scrambling vector P(D,n) = {4,1,10,7,8,3,9,5,2,6}, then parts {4,1,10,7} are sent over LTE and parts {8,3,9,5,2,6} are sent over WiFi. The receiver sorts them back into the correct order.
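The rate-proportional split and the scrambling/unscrambling steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and a seeded shuffle stands in for however P(D,n) is actually generated, which the abstract does not specify.

```python
import random

def split_bits(D, R1, R2):
    """Split D bits across two RATs so both finish at the same time:
    D1/R1 = D2/R2  =>  D1 = D * R1 / (R1 + R2)."""
    D1 = round(D * R1 / (R1 + R2))
    return D1, D - D1

def scramble(parts, seed):
    """Produce the scrambling vector P(D, n) as a permutation of 1-based
    part numbers, plus the parts in that transmitted (out-of-order) order."""
    P = list(range(1, len(parts) + 1))
    random.Random(seed).shuffle(P)
    return P, [parts[i - 1] for i in P]

def unscramble(P, received):
    """Receiver side: restore the received parts to their original order."""
    out = [None] * len(P)
    for part_no, part in zip(P, received):
        out[part_no - 1] = part
    return out
```

For instance, with D = 1000 bits and achievable rates R1 = 40, R2 = 60 (arbitrary units), `split_bits` yields (400, 600), matching the 40%/60% LTE/WiFi split in the example above.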
-
Crowdsourced Multi-View Live Video Streaming using Cloud Computing
Authors: Aiman Erbad and Kashif Bilal

Multi-view videos are composed of multiple video streams captured simultaneously using multiple cameras from various angles (different viewpoints) of a scene. Multi-view videos offer a more appealing and realistic view of the scene, leading to higher user satisfaction and enjoyment. However, displaying realistic and live multi-view scenes captured from limited viewpoints faces multiple challenges, including the precise synchronization of many cameras, color differences among cameras, large bandwidth, computation, and storage requirements, and complex encoding. Current multi-view video setups are very limited and studio-based. We propose a novel system to collect individual video streams (views) captured for the same event by multiple attendees and combine them into multi-view videos, where viewers can watch the event from various angles, taking crowdsourced media streaming to a new immersive level. The proposed system is called Cloud-based Multi-View Crowdsourced Streaming (CMVCS), and it delivers multiple views of an event to viewers at the best possible video representation based on each viewer's available bandwidth. CMVCS is a complex system posing many research challenges. In this study, we focus on resource allocation in the CMVCS system. The objective is to maximize overall viewer satisfaction by allocating available resources to transcode views into an optimal set of representations, subject to computational and bandwidth constraints. We choose the video representation set that maximizes QoE using Mixed Integer Programming (MIP). Moreover, we propose a Fairness-Based Representation Selection (FBRS) heuristic algorithm to solve the resource allocation problem efficiently. We compare our results with optimal and Top-N strategies.
The simulation results demonstrate that FBRS generates near-optimal results and outperforms the state-of-the-art Top-N policy, which is used by a large-scale system (Twitch). Moreover, we consider region-based distributed datacenters to minimize the overall end-to-end latency. To further enhance viewers' satisfaction and Quality of Experience (QoE), we propose an edge-based cooperative caching and online transcoding strategy to minimize delay and backhaul bandwidth consumption. Our main research contributions are: We present the design and architecture of a Cloud-based Multi-View Crowdsourced Streaming (CMVCS) system that allows viewers to experience captured events from various angles. We propose a QoE metric to determine overall user satisfaction based on the received view representation, the viewer's bandwidth capability, and the end-to-end latency between viewer and transcoding site. We formulate a Mixed Integer Programming (MIP) optimization problem for multi-region distributed resource allocation to choose the optimal set of views and representations that maximizes QoE in constrained settings. We propose a fairness-based heuristic algorithm to find a near-optimal resource allocation efficiently. We propose an edge-computing-based video caching and online transcoding strategy to minimize delay and backhaul network consumption. We use multiple real-world traces to simulate various scenarios and show the efficiency of the proposed solution.
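A fairness-first selection heuristic in the spirit of FBRS might look like the sketch below. This is a hypothetical simplification, assuming a single transcoding budget, additive upgrade costs, and demand-ordered upgrades; the actual FBRS algorithm, its inputs, and the QoE metric are those defined in the paper, not these.

```python
def select_representations(demand, reps, budget):
    """demand: {view: expected viewers}; reps: list of (name, cost, qoe)
    ordered cheapest to best; budget: total transcoding capacity.
    Fairness first: every view gets the base representation before any view
    is upgraded; leftover budget upgrades views in decreasing demand."""
    assign = {}
    base_cost = reps[0][1]
    for view in demand:
        if budget >= base_cost:               # serve every view at base quality first
            assign[view] = 0
            budget -= base_cost
    for view in sorted(assign, key=demand.get, reverse=True):
        while assign[view] + 1 < len(reps):   # spend leftover budget on upgrades
            extra = reps[assign[view] + 1][1] - reps[assign[view]][1]
            if extra > budget:
                break
            budget -= extra
            assign[view] += 1
    return {view: reps[i][0] for view, i in assign.items()}

# Hypothetical example: three views, three representations, budget of 7 units.
chosen = select_representations(
    {"A": 10, "B": 5, "C": 1},
    [("360p", 1, 1), ("720p", 2, 2), ("1080p", 4, 4)],
    budget=7)
```

The fairness constraint (base quality for all before any upgrade) is what distinguishes this style of heuristic from a Top-N policy, which would concentrate the whole budget on the most popular views.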
-