Presenters

Dong Dai, Dr. Alexander Chaudhry, Dr. Cao Zhang, Dr. Xianfeng Zhang, Dr. Fan Shaok, Dr. Fethi Inan, Dr. Jack Cooney, Dr. Sarraf, Dr. Simon Hsiang, Dr. Weihong Ning, Elshan, Jialin Liu, TimeForge-A2

Using Property Graphs for Rich Metadata Management in HPC Systems

Abstract:
HPC platforms generate huge amounts of metadata about different entities, including jobs, users, and files. Simple metadata, which describes the attributes of these entities (e.g., file size, name, and permission mode), is well recorded and used in current systems. However, only a limited amount of rich metadata, which records not only the attributes of entities but also the relationships between them, is captured in current HPC systems. Rich metadata may include information from many sources, including users and applications, and must be integrated into a unified framework. Collecting, integrating, processing, and querying such a large volume of metadata pose considerable challenges for HPC systems. In our research, we propose a rich metadata management approach that unifies metadata into one generic property graph. We argue that this approach supports not only simple metadata operations, such as directory traversal and permission validation, but also rich metadata operations, such as provenance queries and security auditing. The property graph approach provides an extensible way to store diverse metadata and an opportunity to leverage rapidly evolving graph storage and processing techniques.
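As an illustration of the idea, here is a minimal, hypothetical sketch (plain Python, not the actual system) of how simple attribute metadata and rich relationship metadata can coexist in one property graph and support a provenance-style query; all entity names, labels, and properties are invented:

```python
# Minimal property-graph sketch: every entity (job, user, file) is a vertex
# with a property dict; relationships are labeled, directed edges.
class PropertyGraph:
    def __init__(self):
        self.vertices = {}          # id -> {"label": ..., plus attributes}
        self.edges = []             # (src, label, dst)

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = {"label": label, **props}

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))

    def out(self, vid, label):
        """Follow outgoing edges with a given label."""
        return [d for s, l, d in self.edges if s == vid and l == label]

# Simple metadata (attributes) and rich metadata (relationships) side by side.
g = PropertyGraph()
g.add_vertex("u1", "user", name="alice")
g.add_vertex("j1", "job", cmd="sim.exe")
g.add_vertex("f1", "file", size=4096, mode=0o644)
g.add_edge("u1", "ran", "j1")       # provenance: who ran the job
g.add_edge("j1", "wrote", "f1")     # provenance: what the job produced

# Provenance query: which files did alice's jobs write?
files = [f for j in g.out("u1", "ran") for f in g.out(j, "wrote")]
```

A directory traversal or permission check would use the same structure, just following `contains`-style edges and reading attribute properties instead.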


Modeling Cross-Category Dependencies in Households’ Purchase Incidence Outcomes

Abstract:
This research attempts to advance the literature on multi-category demand models by modeling cross-category dependencies in households’ purchase incidence outcomes. Such dependencies are of enormous practical interest to retailers in managing shelf placement of products and co-promotion decisions across categories. These dependencies are currently estimated by practitioners using data mining techniques (such as affinity analysis). Marketing researchers have employed econometric techniques, which can simultaneously model the impact of marketing variables and unobserved heterogeneity across households, for the same purpose. Two such econometric models are the Multivariate Probit model (Chib and Greenberg 1998; Chib, Seetharaman and Strijnev 2002) and the Multivariate Logit model (Russell and Peterson 2000; Niraj, Padmanabhan and Seetharaman 2008). One limitation of these econometric models is that they incorporate cross-category dependencies only at the pairwise level. However, such dependencies can be expected to manifest at higher orders (third-order, fourth-order, etc.) as well. The goal of this research is to explicitly model such higher-order dependencies in households’ cross-category purchase incidence outcomes. We propose an extended version of the Multivariate Logit model of Russell and Peterson (2000) that enables the estimation of such higher-order cross-category dependencies. We investigate the magnitudes of these higher-order effects relative to the second-order effects, as well as the methodological and substantive consequences of ignoring them in the model.
Key Words: Multivariate Logit, Multi-Category Demand, Cross-Category Dependencies
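To make the higher-order idea concrete, the following hedged sketch (not the paper's estimation code) enumerates all purchase-incidence outcomes of a toy three-category multivariate logit and shows how a positive third-order term shifts probability mass toward the full basket relative to a pairwise-only specification; all coefficient values are invented:

```python
from itertools import product, combinations
from math import exp

def basket_score(y, alpha, pair, triple):
    """Unnormalized MVL score with main, pairwise, and third-order terms."""
    J = len(y)
    s = sum(alpha[j] * y[j] for j in range(J))
    s += sum(pair.get((j, k), 0.0) * y[j] * y[k]
             for j, k in combinations(range(J), 2))
    s += sum(triple.get((j, k, l), 0.0) * y[j] * y[k] * y[l]
             for j, k, l in combinations(range(J), 3))
    return exp(s)

def joint(alpha, pair, triple):
    """Normalize over all 2^J purchase-incidence outcomes."""
    J = len(alpha)
    scores = {y: basket_score(y, alpha, pair, triple)
              for y in product((0, 1), repeat=J)}
    Z = sum(scores.values())
    return {y: s / Z for y, s in scores.items()}

# Three categories with symmetric pairwise dependence; adding a positive
# third-order term makes the full basket (1,1,1) more likely than the
# pairwise terms alone would predict.
alpha = [-0.5, -0.5, -0.5]
pair = {(0, 1): 0.3, (0, 2): 0.3, (1, 2): 0.3}
p_pairwise = joint(alpha, pair, {})
p_higher = joint(alpha, pair, {(0, 1, 2): 1.0})
```

Ignoring the third-order term would misattribute the extra co-occurrence of all three categories to the pairwise coefficients, which is the kind of substantive consequence the research examines.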


Scalable Spatiotemporal Analysis of Location-based Social Media Data

Abstract:
In the past several years, social media (e.g., Twitter and Facebook) has experienced a spectacular rise in popularity, becoming a ubiquitous channel for content sharing and social networking. With the widespread adoption of mobile devices and location-based services, social media typically allows users to share the whereabouts of their daily activities (e.g., checking in and taking photos), and thus strengthens the role of social media as a proxy for understanding human behaviors and complex social dynamics in geographic spaces. Unlike conventional spatiotemporal data, this new modality of data is dynamic, massive, and typically represented as streams of unstructured media (e.g., texts and photos), which poses fundamental representation, modeling, and computational challenges to conventional spatiotemporal analysis and geographic information science. In this presentation, we describe a scalable computational framework for harnessing massive location-based social media data for efficient and systematic spatiotemporal analysis. Within this framework, the concept of space-time trajectories (or paths) is applied to represent the activity profiles of social media users. A hierarchical spatiotemporal data model, namely a spatiotemporal data cube, is developed from collections of space-time trajectories to represent the collective dynamics of social media users across aggregation boundaries at multiple spatiotemporal scales. The framework is implemented on a public data stream of Twitter feeds posted in North America. To demonstrate the advantages and performance of this framework, an interactive flow mapping interface (including both single-source and multiple-source flow mapping) is developed to allow real-time, interactive visual exploration of movement dynamics in massive location-based social media at multiple scales.
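A minimal sketch of the data-cube idea, under the assumption that each post reduces to a (latitude, longitude, timestamp) triple; the cell sizes and sample points below are invented, and the real framework operates at far larger scale:

```python
from collections import Counter

def cube_cell(lat, lon, t, dlat=1.0, dlon=1.0, dt=3600):
    """Map a geo-tagged post into a space-time cube cell at a given
    spatial (degrees) and temporal (seconds) resolution."""
    return (int(lat // dlat), int(lon // dlon), int(t // dt))

def aggregate(posts, **res):
    """Collective dynamics: count posts per cell at the chosen scale."""
    return Counter(cube_cell(lat, lon, t, **res) for lat, lon, t in posts)

# Two posts near Lubbock, one near New York, all in the first hour.
posts = [(33.58, -101.85, 100), (33.60, -101.87, 200), (40.71, -74.0, 100)]
coarse = aggregate(posts)           # 1-degree, hourly cells
```

Changing `dlat`, `dlon`, and `dt` re-aggregates the same trajectories at a different spatiotemporal scale, which is what makes the cube hierarchical.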


Sentimental Interplay between Structured and Unstructured User-Generated Contents - A Case Study of Online Hotel Reviews

Abstract:
Purpose: Consumers use varied social media channels to voice opinions and share experiences, generating both structured and unstructured user-generated content (UGC). However, little research has directly explored their interrelationship. The purpose of this paper is to explore the interplay between structured and unstructured UGC produced by the same consumer, and to validate it across population segments holding varied or dispersed attitudes.
Design/methodology/approach: Natural Language Processing (NLP) techniques, specifically sentence-level topic classification and sentiment analysis, are adopted to retrieve consumers’ sentiment polarity on five components corresponding to the itemized ratings. Canonical correlation analyses are then conducted to empirically validate the interplay between structured and unstructured UGC in the overall sample as well as in subpopulations segmented by a mean-variance approach.
Findings: Although Chinese consumers differ from non-Chinese consumers in the general pattern of RATE (itemized rating) and COMMENT (review comment), the two important forms of UGC, taken collectively, show a significant interrelationship. However, the interplay varies across segments. Consumers who are extremely dissatisfied, or whose itemized ratings are heterogeneous, tend to have a closer RATE-COMMENT relationship, and the interaction between valence and dispersion further strengthens or loosens the relationship. Rooted in their cultural context, Chinese customers have a relatively looser RATE-COMMENT relationship than international customers.
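For illustration only, the canonical correlation step can be simplified to a plain Pearson correlation between one itemized rating (RATE) and the mean sentence-level sentiment of the matching comment topic (COMMENT); the ratings and polarity scores below are invented, not the study's data:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical reviews: itemized "cleanliness" rating (RATE, 1-5) and the
# mean sentiment polarity of cleanliness sentences in the comment (COMMENT).
rate = [5, 4, 2, 1, 3]
sentences = ([0.9, 0.8], [0.6], [-0.4, -0.2], [-0.9], [0.1])
comment = [mean(s) for s in sentences]
r = pearson(rate, comment)
```

A full analysis would replace this with canonical correlation over all five rating components against the five topic-level sentiment scores.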


Sentiment Analysis in Social Media Platforms: The Contribution of Social Networks and Platform-Specific Features

Abstract:
The massive amount of data on social media platforms is a key source for companies to analyze customer sentiment and opinions. Existing sentiment analysis approaches predict the sentiment of a sentence based on the composition of its words or phrases. Consequently, current sentiment analysis systems treat each sentence as a separate unit and ignore the social network relationships in social media, resulting in ineffective sentiment analysis performance. Inspired by social network theories, we propose a sentiment analysis framework that considers the social relationships among users along with social media platform features. We conducted a series of experiments comparing the proposed system against several existing approaches on a dataset collected from Facebook. The results indicate that we can classify the sentiment of sentences more accurately by utilizing social network relationships in combination with platform-specific features. The results have important implications for companies analyzing customer opinions.
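A hedged sketch of the general idea (not the proposed framework itself): blend a post's own lexicon-based score with the average sentiment of the author's neighbors, reflecting homophily in the network. The lexicon, the blending weight, and the tiny network are all invented:

```python
# Toy sentiment lexicon; a real system would use a full lexicon or model.
LEXICON = {"love": 1.0, "great": 0.8, "bad": -0.8, "hate": -1.0}

def text_score(text):
    """Average polarity of the lexicon words found in the text."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def social_score(text, author, friends_of, post_by, w=0.7):
    """Blend a post's own score with the mean score of the author's
    friends' posts; fall back to the own score if no friends posted."""
    own = text_score(text)
    nbrs = [text_score(post_by[f])
            for f in friends_of.get(author, []) if f in post_by]
    social = sum(nbrs) / len(nbrs) if nbrs else own
    return w * own + (1 - w) * social

friends_of = {"ann": ["bob", "cat"]}
post_by = {"bob": "great product", "cat": "love it"}
# "ok I guess" is lexically neutral, but ann's friends are positive,
# so the network nudges the score above zero.
s = social_score("ok I guess", "ann", friends_of, post_by)
```

Platform-specific features (e.g., likes or shares) could enter the same blend as additional weighted terms.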


Dr. Fethi Inan


Investor Demand for Information in Newly Issued Securities

Abstract:
Empirical studies of how information is impounded in prices often focus on the supply of information from corporate announcements, analyst reports, and news stories, or rely on proxies for the presence of informed traders such as insiders, institutional traders, trade size and short sellers. The demand for information by investors is less well understood because of the lack of data on the information acquisition process. Our study directly measures investor demand for information and its impact on security prices using search traffic associated with corporate filings on the EDGAR system of the Securities and Exchange Commission (SEC). Our analysis focuses on the registration period for IPOs when information asymmetries between investors and the issuing firm are likely to be high. Consistent with the important role of informed investors in the price discovery process, we find that EDGAR search traffic significantly increases for peer firms on IPO filing dates. We also find that investor demand for information is positively related to the probability of IPO success, and can predict both price revisions and initial returns. Overall, our results indicate that information acquisition is reflected in the pricing of newly issued securities.


Machine Learning Projects at TTU Applied Vision Lab

Abstract:
This talk will summarize the ongoing and anticipated research efforts of the Applied Vision Lab in the general area of machine learning. Two ongoing projects will be presented: one in the area of medical image analysis and the other in agriculture. The talk will also outline a plan to establish a multi-college team of researchers at TTU to extend the application domain of our agriculture-related work to field phenotyping. Tackling the anticipated technical challenges in these projects will require the collaborative efforts of a multidisciplinary team with expertise in all aspects of big data analytics.


Ambient Intelligence in Reality Mining – A Big Data Perspective

Abstract:
Ambient intelligence (AmI) promises calming, empowering, and enriching environments that are responsive to the presence of people, supporting their activities of daily living (ADL) or job-related tasks. Reality mining can utilize the collection and analysis of AmI data pertaining to human social behavior, with the goal of identifying predictable and collaborative information. From a big data perspective, the challenge is to provide situation-aware, anticipatory, and adaptive behavioral mining mechanisms that parsimoniously summarize the current state-space of interest, such as: (1) Who is doing what? (2) What information is needed by whom? (3) What kind of information sharing or collaboration is plausible?

To resolve the first question, we propose using a diffeomorphism to map between the manifolds of human joint postures and their major concerns (or performance), which can be applied to retirement homes, airport security, or workplace safety monitoring. For the second question, we recommend a risk-management approach to fine-tune the settings of stochastic resonance that can enhance visual, audio, and haptic perception. To reach a divide-and-conquer strategy for the final question, we aggregate Shapley values and Choquet integrals over multilayer social networks based on human activities and the resources needed. The goal is to find a balance between accommodating the needs of the majority and sacrificing individual differences.
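The Choquet aggregation step can be sketched as follows; the two-element capacity and the scores are invented for illustration, and a real application would define the capacity over network layers or activities:

```python
def choquet(x, mu):
    """Discrete Choquet integral of scores x under capacity mu.

    mu maps frozensets of indices to weights, with mu(empty set) = 0,
    mu(full set) = 1, and monotonicity. Scores are sorted in descending
    order and each score is weighted by the marginal capacity gain of
    adding its index to the coalition.
    """
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    total, prev = 0.0, frozenset()
    for i in order:
        cur = prev | {i}
        total += x[i] * (mu[cur] - mu[prev])
        prev = cur
    return total

# Hypothetical two-criterion capacity with synergy: the pair together is
# worth more than the sum of its parts (0.3 + 0.3 < 1.0), so neither
# criterion alone dominates the aggregate.
mu = {frozenset(): 0.0, frozenset({0}): 0.3, frozenset({1}): 0.3,
      frozenset({0, 1}): 1.0}
v = choquet([0.8, 0.5], mu)
```

Unlike a weighted mean, the capacity lets interaction effects between criteria (or network layers) enter the aggregation directly.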


Relationship among Customers’ Contract Status, Product Type and Levels of Customer Service - A Case Study of a Call Center in B2B E-market

Abstract:
Customer service is important to companies, as it has become core to business success. Call centers are a prevalent means for companies to communicate with their customers and provide customer service. This study investigates the call center customer service of Alibaba, a prestigious Chinese e-commerce company. Specifically, we examine the quality and efficiency of the customer service the Alibaba call center provides to customers with different contract statuses and product types. The data were obtained from the Alibaba call center of the Chinese portal 1688.com and cover a full month of call records, from 12/01/2009 to 12/31/2009, containing 50,280 cases from 17,043 customers. Using a sample of 2,000 cases as training data, two separate logistic regression models were built. Additionally, sentiment analysis was conducted to extract variables from the call logs. The dependent variables for the two regression models were customers’ contract status (new contract vs. renewed contract) and the type of product the customers used (Gold Supplier vs. AliExpress), respectively. The explanatory variables included features of the customer calls, such as the number of days between calls from the same customer, the number of calls to/from a customer per month, and customer satisfaction with the call. Preliminary results showed that there were 1,190 new contracts and 810 renewed contracts, and 277 Gold Supplier and 1,700 AliExpress customers.
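A minimal, self-contained sketch of fitting one such logistic regression (plain stochastic gradient descent, not the study's actual estimation); the per-customer features and labels below are invented:

```python
from math import exp

def sigmoid(z):
    """Logistic function, clipped to avoid overflow on extreme inputs."""
    if z < -60:
        return 0.0
    if z > 60:
        return 1.0
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(X, y, lr=0.3, epochs=1000):
    """Stochastic-gradient logistic regression; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi)))
            err = yi - p
            w[0] += lr * err
            for j, a in enumerate(xi):
                w[j + 1] += lr * err * a
    return w

def predict(w, x):
    return 1 if sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x))) > 0.5 else 0

# Hypothetical per-customer features: [calls per month, tens of days between calls]
X = [[1, 3.0], [2, 2.0], [8, 0.3], [9, 0.2], [7, 0.4], [2, 2.5]]
y = [0, 0, 1, 1, 1, 0]          # 1 = renewed contract, 0 = new contract
w = fit_logistic(X, y)
preds = [predict(w, x) for x in X]
```

The second model in the study (Gold Supplier vs. AliExpress) would be fit the same way with the product type as the dependent variable.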


Abstract:
Making useful inferences from large datasets is an emerging necessity in today’s fast-paced environment. Among the many sense-making processes, visualization stands out as one of the biggest challenges in business analytics.

This study focuses on the evolutionary side of visualization, addressing how conjectural deductions from different kinds of visualized data have evolved over human history through natural selection. It examines evolutionary fit (the probability that a specific trait survives natural selection) with respect to individuals’ inferences from 3D surface graphs versus balloon graphs. The study uses functional magnetic resonance imaging (fMRI) to investigate several hypotheses related to speed, accuracy, and areas of activation in the brain. The fMRI findings show that humans are more efficient at making inferences from 3D surface graphs than from a 4D graph. These findings are consistent with evolutionary fit, because graphs that mimic familiar presentations of visualized data prove easier to understand and comprehend than those with no basis in existing brain structures.


Speeding up Scientific Discovery with In-Advance Computing Model

Abstract:
Scientific breakthroughs are increasingly powered by advanced computing and data analysis capabilities. Data-driven scientific discovery has become the fourth paradigm of scientific innovation, after theory-, experiment-, and simulation-driven innovation. Data-driven discovery builds upon the advanced high-performance computing (HPC) that traditionally powers simulation-driven research, and further requires processing massive datasets. Revealing and exploring the interesting knowledge hidden inside scientific datasets faces critical challenges, and the problem is beyond the capability of traditional HPC software systems.
The fundamental issue is data movement, which often dominates overall analysis performance and execution time. To optimize and speed up the discovery process, this work studies the scientific workflow and designs an In-Advance Computing model intended to better support generic scientific analysis routines. The fundamental idea of this model is to trade computation for reduced data movement by predicting the analysis operations and performing the computation in advance. The In-Advance Computing model uses a distributed in-memory database to store the analytic results. When these results are hit by on-demand analytic operations, there is no need to access the requested data anymore; the results are ready to be used. Because data movement dominates the run time of big data analysis, while computing is virtually free for big data problems, the in-advance computing model can be a promising solution that fully leverages data locality and reduces both data movement and time to solution. Evaluation results show that in-advance computing can dramatically reduce data movement and improve application performance, with up to a 6X speedup.
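The core mechanism can be sketched as a predict-and-materialize cache; the `InAdvanceCache` class, the stand-in analysis routine, and the predicted operations below are all hypothetical, with a plain dict standing in for the distributed in-memory database:

```python
class InAdvanceCache:
    """Trade computation for data movement: predict upcoming analysis
    operations and materialize their results in an in-memory store."""

    def __init__(self, compute):
        self.compute = compute      # the expensive, data-heavy analysis
        self.store = {}             # stand-in for the in-memory database

    def precompute(self, predicted_ops):
        """Run predicted operations ahead of demand."""
        for op in predicted_ops:
            self.store[op] = self.compute(op)

    def query(self, op):
        """Serve from the store on a hit (no data access needed);
        fall back to on-demand computation on a miss."""
        if op in self.store:
            return self.store[op], True
        return self.compute(op), False

def analysis(n):
    # Stand-in for an analysis routine that would scan a large dataset.
    return sum(range(n))

cache = InAdvanceCache(analysis)
cache.precompute([10, 100])         # operations predicted from the workflow
result, hit = cache.query(100)      # hit: result is already materialized
```

The speedup then depends on prediction accuracy: hits avoid the data movement entirely, while misses pay the original on-demand cost.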


TimeForge
