
Data Science Seminar

Upcoming Speakers:

November 16, 2023

4-5pm, IES 111

Quang Nguyen

Opportunities in sports analytics with player tracking data: A case study of NFL pass rush evaluation

Abstract: This talk provides an overview of the opportunities provided by player tracking data in sports. First, we discuss how player tracking data have replaced traditional box-score statistics and play-by-play data as the state of the art in sports analytics. Next, we present a case study of using tracking data for continuous-time assessment in American football. In particular, we propose a novel metric for evaluating pass rushers in the National Football League (NFL). The metric, called STRAIN, is a simple, interpretable, and model-free statistic for measuring defensive pressure in football at the continuous-time, within-play level. Our metric addresses the shortcomings of previous pass rush statistics, which are either discrete-time quantities or based on subjective judgment. STRAIN also exhibits strong predictability of pressure and stability over time. We also fit a multilevel model to understand the defensive pressure contribution of every pass rusher at the play level. We apply our approach to NFL data and present comparisons of STRAIN across defensive positions and play outcomes, as well as rankings of the NFL's best pass rushers according to our metric.
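To make the continuous-time idea concrete, here is a minimal Python sketch of a strain-rate-style pressure statistic computed from tracking data: the speed at which a rusher closes on the quarterback, divided by their current distance. This is an illustration only, not the authors' code nor necessarily the exact definition of STRAIN; the 10 Hz frame rate and the array layout are assumptions.

```python
import numpy as np

def pressure_proxy(rusher_xy: np.ndarray, qb_xy: np.ndarray, fps: float = 10.0) -> np.ndarray:
    """Frame-by-frame pressure proxy: closing speed toward the QB divided by distance.

    rusher_xy, qb_xy: arrays of shape (T, 2) holding field coordinates per frame.
    fps: tracking frame rate (NFL tracking data is commonly sampled at 10 Hz).
    """
    dist = np.linalg.norm(rusher_xy - qb_xy, axis=1)   # rusher-QB distance at each frame
    closing_speed = -np.gradient(dist) * fps           # positive when the rusher closes in
    return np.clip(closing_speed, 0.0, None) / np.maximum(dist, 1e-6)

# Toy example: a rusher moving straight at a stationary quarterback.
t = np.arange(30)
rusher = np.column_stack([10.0 - 0.3 * t, np.full(30, 25.0)])
qb = np.tile([0.0, 25.0], (30, 1))
print(pressure_proxy(rusher, qb).round(3))
```

Averaging such a quantity within plays, and then across plays, is one way to arrive at player-level rankings of the kind described in the abstract.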

Bio: Quang Nguyen is a second-year PhD student in the Department of Statistics & Data Science at Carnegie Mellon University. His current research is on network analysis, and he is also interested in applications of statistics and machine learning in sports, with a focus on player tracking data. Quang previously received his MS in Applied Statistics from Loyola University Chicago and his BS in Mathematics and Data Science from Wittenberg University in Springfield, Ohio. He is a 2021 Carnegie Mellon Sports Analytics Conference (CMSAC) Reproducible Research Competition Methods Open-Track winner, a 2023 NFL Big Data Bowl finalist, and an avid supporter of Manchester United.

Past Seminars

April 27, 2023.

4-5pm, IES 111

Daniel Moreira

Scientific Integrity Verification Through Image Forensics

Abstract: Many images published in scientific articles are reused, retouched, or composed to enhance the quality of the publications. Most of these edits are benign and aim at helping the reader better understand the outcomes of the study at hand. However, there are also edits that constitute scientific misconduct, which undermine the integrity of the presented research and aim at deceiving the reader. Deciding the legitimacy of edits made to scientific images is an open problem that no current technology can solve satisfactorily in a fully automated fashion. The task of inspecting the images as part of the peer-review process thus remains with human experts. Nonetheless, they need not be left unassisted. Tools from media forensics can enter the scene to help them execute the tedious parts and provide more information for faster and more accurate decisions. In this talk, I will describe our recent efforts to make image analysis tools available to reviewers and editors in a principled way. The endeavor has involved not only the study, development, and deployment of techniques, but also the collection of a dataset of scientific papers as a benchmark, which contains real cases of articles retracted due to problems with images.
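As a flavor of the kind of assistance such tools can offer reviewers, the Python sketch below flags exactly duplicated blocks within a figure, a crude screen for copy-paste reuse. It is purely illustrative and is not one of the speaker's tools; the block size, the flat-background filter, and the use of Pillow and NumPy are assumptions.

```python
import hashlib
import numpy as np
from PIL import Image

def duplicated_blocks(path: str, block: int = 16):
    """Return pairs of non-overlapping block positions whose pixels match exactly."""
    img = np.asarray(Image.open(path).convert("L"))      # grayscale pixel array
    seen, duplicates = {}, []
    for r in range(0, img.shape[0] - block + 1, block):
        for c in range(0, img.shape[1] - block + 1, block):
            patch = img[r:r + block, c:c + block]
            if patch.std() < 1.0:                        # skip flat background blocks
                continue
            key = hashlib.sha1(patch.tobytes()).hexdigest()
            if key in seen:
                duplicates.append((seen[key], (r, c)))   # (earlier position, current position)
            else:
                seen[key] = (r, c)
    return duplicates
```

Real forensic pipelines rely on far more robust, tamper-aware features, but even a screen this simple illustrates how automation can take over the tedious parts of inspection.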

Bio: Daniel Moreira received a Ph.D. degree in computer science from the University of Campinas, Brazil, in 2016. After working four years as a systems analyst with the Brazilian Federal Data Processing Service, he joined the University of Notre Dame for six years, first as a post-doctoral fellow and later as an assistant research professor. He is currently an assistant professor in the Department of Computer Science at Loyola University Chicago. He is also a member of the IEEE Information Forensics and Security Technical Committee (IFS-TC), 2021-2023 term. His research interests include investigating the application of techniques from media forensics, computer vision, machine learning, and biometrics to improve our society. Full CV here.

April 6, 2023.

4-5pm, 312 Cuneo Hall

Yas Silva

Patterns of Global Prevalence of Anti-Asian Prejudice on Twitter and other BullyBlocker Efforts

Abstract: Anti-Asian prejudice increased during the COVID-19 pandemic, evidenced by a rise in physical attacks on individuals of Asian descent. Concurrently, as many governments enacted stay-at-home mandates, the spread of anti-Asian content increased in online spaces, including social media platforms such as Twitter. In this study, we investigated temporal and geographic patterns in the prevalence of social media content relevant to anti-Asian prejudice within the U.S. and worldwide. Results of a range of exploratory and descriptive analyses offer novel insights. For instance, in the U.S., the prevalence of anti-Asian and counter-hate messages fluctuated over time in patterns that largely mirrored salient events relevant to COVID-19. Additional analyses revealed informative patterns in the prevalence of original tweets versus retweets and the co-occurrence of negative and positive content within a tweet. Together, these findings underscore the value of research examining trends in social media messages of hate and counter-hate during the COVID-19 pandemic. This presentation will also include a review of previous and current research in the BullyBlocker project (http://bullyblocker.cs.luc.edu/).  
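As a small illustration of the temporal-prevalence analysis described above (not the study's actual pipeline), the sketch below computes the weekly share of labeled tweets by category; the column names 'created_at' and 'label' are hypothetical.

```python
import pandas as pd

def weekly_shares(tweets: pd.DataFrame) -> pd.DataFrame:
    """Weekly proportion of tweets in each category (e.g., anti-Asian vs. counter-hate).

    Assumes a 'created_at' timestamp column and a 'label' category column.
    """
    week = pd.to_datetime(tweets["created_at"]).dt.to_period("W")
    counts = tweets.groupby([week, "label"]).size().unstack(fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0)   # row-normalize to proportions per week
```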

Bio: Yas (Yasin) Silva is an associate professor in the Computer Science department at Loyola University Chicago. He leads the BullyBlocker project, currently funded by the National Science Foundation and Google Research. He received his doctorate and master's degree from Purdue University. Before joining Loyola, Dr. Silva worked as an assistant and then associate professor at Arizona State University. Dr. Silva’s research focuses on innovative ways to analyze and process data. More specifically, he has been working in the areas of social media analysis, cyberbullying detection in social networks, big data, similarity-aware data analysis, and fairness and transparency in AI. He has published more than 50 papers in top-tier conference proceedings and journals such as ACM SIGMOD, VLDB, IEEE ICDE, WSDM, SIAM SDM, IJCAI, IEEE TKDE, and the VLDB Journal. Dr. Silva received the Outstanding Innovation Award at ASU (2018) and the Motorola Scholarship for Entrepreneurship, and was inducted into Upsilon Pi Epsilon, the International Honor Society for the Computing Sciences.

March 30, 2023

4-5pm, 312 Cuneo Hall

George Thiruvathukal and Nicholas Synovic

Snapshot Metrics Are Not Enough: Analyzing Software Repositories with Longitudinal Metrics

Abstract: Software metrics capture information about software development processes and products. These metrics support decision-making, e.g., in team management or dependency selection. However, existing metrics tools measure only a snapshot of a software project. Little attention has been given to enabling engineers to reason about metric trends over time: longitudinal metrics that give insight into the process, not just the product. In this work, we present PRiME (PRocess MEtrics), a tool for computing and visualizing process metrics. The currently supported metrics include productivity, issue density, issue spoilage, and bus factor, all of which can help to understand the “health” of a software development effort.
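For a sense of what a process metric looks like in practice, here is a small Python sketch that estimates a bus factor from a repository's commit history, defined here as the smallest number of authors who together account for at least half of all commits (one common convention, not necessarily the definition PRiME uses).

```python
import subprocess
from collections import Counter

def bus_factor(repo: str, threshold: float = 0.5) -> int:
    """Smallest number of commit authors covering `threshold` of all commits."""
    log = subprocess.run(["git", "-C", repo, "log", "--pretty=%ae"],
                         capture_output=True, text=True, check=True).stdout
    counts = Counter(log.split())                  # commits per author email
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = authors = 0
    for _, n in counts.most_common():              # most prolific authors first
        covered += n
        authors += 1
        if covered / total >= threshold:
            break
    return authors
```

Recomputing such a metric at a sequence of historical cut-off dates (for example with git log --until=<date>) is what turns a snapshot number into the longitudinal trend the talk argues for.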

Bios: George K. Thiruvathukal is a professor and chairperson of the computer science department at Loyola University Chicago and a visiting computer scientist at Argonne National Laboratory. He directs the Software and Systems Laboratory (https://ssl.cs.luc.edu) at Loyola University Chicago. His research and teaching interests include high-performance computing and distributed systems, programming languages, software engineering, machine learning, and data science. He is also interested in interdisciplinary computing and works in computational science, data science, and digital humanities. His recent body of work is primarily in the areas of energy-efficient computer vision and empirical software engineering. For more information, see http://gkt.sh.

Nicholas Synovic is a Master of Science graduate student in the computer science department at Loyola University Chicago. He holds a B.S. in Computer Science from Loyola University Chicago and joined the MS program in the current academic year. His primary interests are in software engineering, with a focus on empirical software engineering in support of traditional software development and modern machine learning development. He has been a lead and co-lead student author on several papers since joining the Software and Systems Laboratory at Loyola University Chicago. For more information, see http://nsynovic.dev/.

March 23, 2023

4-5pm, 312 Cuneo Hall

Lin Wang, Department of Statistics, Purdue

Design, Modeling, and Active Learning for Computer Experiments

Abstract: Computer experiments are simulations or mathematical models that are run on a computer to study the behavior of a system. These experiments have found wide applications in science and engineering when it is either too expensive or impossible to conduct experiments in the physical world. In this talk, I will provide a comprehensive introduction to the methodology used in designing and analyzing computer experiments. Specifically, I will focus on two key techniques: surrogate modeling for analyzing computer experiments, and space-filling designs for collecting data from these experiments. In some cases, the input space of a system may be highly complex, or the output may be heterogeneous, making it difficult to analyze using a one-stage design. Active learning becomes essential for these cases. I will also introduce some active learning techniques used in computer experiments, such as entropy-based sampling and uncertainty sampling.
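A minimal sketch of the ingredients named above, under illustrative assumptions (a toy two-input simulator, SciPy for the Latin hypercube design, scikit-learn for the Gaussian process surrogate): start from a space-filling design, fit the surrogate, and repeatedly add the candidate point with the largest predictive standard deviation (uncertainty sampling).

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def simulator(x):                                   # stand-in for an expensive computer model
    return np.sin(6 * x[:, 0]) + 0.5 * np.cos(4 * x[:, 1])

# Space-filling initial design: a Latin hypercube on [0, 1]^2.
design = qmc.LatinHypercube(d=2, seed=1).random(n=10)
y = simulator(design)

# Gaussian process surrogate plus a few rounds of uncertainty sampling.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
candidates = qmc.LatinHypercube(d=2, seed=2).random(n=500)
for _ in range(5):
    gp.fit(design, y)
    _, sd = gp.predict(candidates, return_std=True)
    x_new = candidates[np.argmax(sd)][None, :]      # most uncertain candidate point
    design = np.vstack([design, x_new])
    y = np.append(y, simulator(x_new))
```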

Bio: Lin Wang is an Assistant Professor of Statistics at Purdue University. Prior to joining Purdue in 2022, she was an Assistant Professor of Statistics at George Washington University from 2019 to 2022. She obtained her PhD in Statistics in 2019 from the University of California, Los Angeles. Her research interests include computer experiments, experimental design and sampling, and causal inference.




November 1, 2022 | Miles Xi | Social Media, Big Data, and Diversity-Preserving Subsampling

MILES XI

Tuesday, November 1, 2022
11:30am - 12:30pm
Dumbach 231

OR

Zoom: https://luc.zoom.us/j/8955486944

Social Media, Big Data, and Diversity-Preserving Subsampling

Abstract:
In this talk, I will introduce a study to understand how politicians use images to express ideological rhetoric through social media. Using deep learning and computer vision techniques to analyze Facebook images posted by federal lawmakers, the study shows that images on social media are a strong proxy for distinguishing between liberals and conservatives. Motivated by this study and follow-up work, I propose an efficient subsampling method to select a subsample from the full data with balanced subgroups. It provides a representative subsample that improves downstream data analysis. An application of the proposed method to large-scale genomic data demonstrates its advantages in clustering analysis, diversity preservation, and computational efficiency.
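For contrast only (the proposed method is considerably more sophisticated about preserving diversity in large genomic data), a naive balanced subsample can be drawn by sampling roughly equally from each subgroup; the function below is a hypothetical illustration.

```python
import pandas as pd

def balanced_subsample(df: pd.DataFrame, group_col: str, n: int, seed: int = 0) -> pd.DataFrame:
    """Draw about n rows with subgroup sizes as equal as the data allow."""
    groups = df.groupby(group_col, group_keys=False)
    per_group = max(1, n // groups.ngroups)
    return groups.apply(lambda g: g.sample(min(len(g), per_group), random_state=seed))
```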

Bio:
Dr. Miles Xi is an Assistant Professor in the Department of Mathematics and Statistics at Loyola University Chicago. Dr. Xi received his Ph.D. in Statistics from UCLA. His research focuses on developing statistical and machine-learning methods for large-scale genomics data. Some of his work relates to applications in biopharmaceutical statistics and electronic health records. He is also interested in applying artificial intelligence to social science.

October 18, 2022 | Matt Stuart | Inverse Leverage Effect: from Cryptocurrencies to Meme Stocks

MATT STUART

Tuesday, October 18, 2022
11:30am - 12:30pm
Dumbach 231

Inverse Leverage Effect: from Cryptocurrencies to Meme Stocks

Abstract:
In the existing continuous-time finance literature, most financial assets exhibit a traditional leverage effect, i.e., a negative correlation between an asset's price and its volatility. In this work, we propose that assets which are highly speculative in nature exhibit an inverse leverage effect, or a positive correlation between price and volatility. We model these highly speculative assets jointly with the S&P 500 using a stochastic volatility model with bivariate asymmetric Laplace distribution (ALD) jumps in returns, which is flexible enough to allow for both independent and contemporaneous jump times. Markov chain Monte Carlo (MCMC) methods with the particle Gibbs with ancestor sampling (PGAS) algorithm are developed to estimate the model parameters and latent state variables, such as the stochastic volatility, jump times, and jump sizes; the approach is validated through simulation studies. The method is applied to fit daily returns of the S&P 500 paired with an array of speculative assets including Bitcoin, Dogecoin, AMC, and GameStop.
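For readers unfamiliar with the terminology, a generic stochastic-volatility-with-jumps specification (an illustration, not the talk's exact bivariate model) makes the leverage effect explicit as the correlation ρ between return and volatility shocks:

```latex
% Illustrative SV-with-jumps model; the talk's bivariate specification with
% asymmetric Laplace (ALD) jump sizes is more elaborate.
\begin{align*}
  dS_t / S_t &= \mu\,dt + \sqrt{V_t}\,dW_t^{S} + J_t\,dN_t, \\
  dV_t       &= \kappa(\theta - V_t)\,dt + \sigma_v \sqrt{V_t}\,dW_t^{V}, \\
  dW_t^{S}\,dW_t^{V} &= \rho\,dt.
\end{align*}
% rho < 0 gives the traditional leverage effect; rho > 0 gives the inverse
% leverage effect proposed here for highly speculative assets.
```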

Bio:
Dr. Stuart obtained his undergraduate degree from Drake University, studying actuarial science. He then went to graduate school at Iowa State University, where he obtained his Master's degree and PhD in statistics. He joined the Loyola faculty in 2022. His main research areas of interest are financial mathematics, semi-parametric methods, Bayesian statistics, survey statistics, and computational statistics. He has applied these methods in various areas, including split questionnaire designs, cryptocurrencies, and agricultural economics.

October 4, 2022 | Gregory J. Matthews | Completion of Partially Observed Curves with Application to Classification of Bovid Teeth

GREGORY J. MATTHEWS

Tuesday, October 4, 2022
11:30am - 12:30pm
Dumbach 231

Completion of Partially Observed Curves with Application to Classification of Bovid Teeth

Abstract:
Statistical shape analysis of closed curves is well-developed when curves are fully observed. This work considers partially observed curves and develops methods for curve completion or imputation by leveraging tools from the statistical analysis of shape of fully observed curves, which enables sensible curve completions. On a dataset containing partially observed bovid teeth arising from a biological anthropology application, the method is implemented and classification of the completed teeth is carried out based on a shape distance on the set of curves. Work related to classification methods for fully observed closed curves will also be presented.
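One way to read the completion idea, as a hedged sketch under assumed notation rather than the talk's exact formulation: if a closed curve is observed only on a subset O of its domain, complete it by agreeing with the observed piece while minimizing a shape distance to a fully observed donor or template curve τ.

```latex
% Hedged sketch of curve completion via shape analysis (notation assumed):
\[
  \hat{\beta} \;=\; \operatorname*{arg\,min}_{\beta \,:\, \beta|_{O} = \beta_{\mathrm{obs}}}
  \; d_{\mathrm{shape}}(\beta, \tau),
\]
% where d_shape is a distance on the shape space of closed curves, invariant to
% translation, scale, rotation, and re-parameterization.
```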

Bio:
Dr. Gregory J. Matthews graduated from Worcester Polytechnic Institute (WPI) in 2004 with a Bachelor's degree in Actuarial Science and in 2005 with a Master's degree in Applied Statistics. He worked in the direct marketing department at Brookstone in Merrimack, NH for two years before returning to graduate school, and he graduated from the University of Connecticut with a Ph.D. in Statistics in 2011. From 2011 to 2014, he completed a postdoctoral research fellowship at the School of Public Health at the University of Massachusetts-Amherst. Since 2014, he has been a faculty member in the Department of Mathematics and Statistics at Loyola University Chicago, where he was promoted to Associate Professor with tenure in 2020. In addition, he is the Director of the Data Science Program at Loyola. His areas of research interest include statistical disclosure control, missing data methods, statistical shape analysis, statistics in sports, and statistical consulting.
