Guruprasad Nayak


I am an Applied Scientist at Amazon Web Services (AWS), where I work on building Automated Machine Learning (AutoML) on AWS SageMaker. Before joining AWS, I spent over 2 years at A9/Amazon Advertising, where I worked in the Inventory Forecasting Systems group.

Before joining Amazon, I earned a PhD in Computer Science from the University of Minnesota (UMN), where I was advised by Prof. Vipin Kumar (ACM SIGKDD 2012 Innovation Award winner and author of "Introduction to Data Mining"). Prior to that, I received my Bachelor's degree in Computer Science and Engineering from the Indian Institute of Technology Kanpur (IITK) in 2013.

At UMN, I was part of a 5-year $10M Expeditions in Computing project to understand climate change from data. My doctoral research focused on extending traditional machine learning algorithms to use weak supervision when there are not enough labeled samples to train supervised learning models. To learn more about my research at UMN and elsewhere, please take a look at some of my peer-reviewed articles listed below.


email | linkedin | google scholar | CV | Thesis


nayak013 AT alumni DOT umn DOT edu
  

Publications

Deep layers beware: Unraveling the surprising benefits of JPEG compression for image classification pre-processing
Guruprasad Nayak, Gerald Friedland
25th IEEE International Symposium on Multimedia 2023
(Acceptance Rate: 24.69%)
preprint   abstract   Published version   bibtex

In this paper, we explore the intriguing effects of JPEG compression as a pre-processing technique for image classification tasks. Building upon the findings of a previous study by Friedland et al., which demonstrated that substantial JPEG compression does not significantly degrade classification accuracy, we investigate the potential benefits and limitations of this approach when applied to various classifiers, such as AutoGluon-multimodal and EfficientNet. Our experiments not only confirm the original results but also reveal notable I/O benefits, with compressed images occupying as little as 14 % of the original dataset size while maintaining comparable accuracy. Despite these promising findings, we also document several investigations that did not yield beneficial outcomes. We found no evidence to suggest that JPEG compression leads to faster model convergence or allows smaller models to achieve the same accuracy. Additionally, our experiments showed that tabular classifiers could not match the performance of deep neural networks when trained on JPEG-compressed input, and that JPEG compression does not make classifiers more resilient to noise in input images. Together, our results provide a comprehensive evaluation of JPEG compression as a pre-processing technique for image classification. While the approach offers undeniable benefits in terms of data storage and accuracy preservation, it does not appear to yield advantages in terms of model convergence, model size, or robustness to noise. This study contributes valuable insights for researchers and practitioners working in multimedia signal processing and image recognition, paving the way for further exploration and optimization of multimedia compression techniques.
@inproceedings{nayak2023deep,
  title={Deep Layers Beware: Unraveling the Surprising Benefits of JPEG Compression for Image Classification Pre-processing},
  author={Nayak, Guruprasad and Friedland, Gerald},
  booktitle={2023 IEEE International Symposium on Multimedia (ISM)},
  pages={182--185},
  year={2023},
  organization={IEEE}
}

Weakly Supervised Classification Using Group-Level Labels
Guruprasad Nayak, Rahul Ghosh, Xiaowei Jia, Vipin Kumar
2nd International Workshop on Data-Efficient Machine Learning (DeMaL), Knowledge Discovery and Data Mining (KDD) Conference 2021
preprint   abstract   Published version   bibtex

In many applications, finding adequate labeled data to train predictive models is a major challenge. In this work, we propose methods to use group-level binary labels as weak supervision to train instance-level binary classification models. Aggregate labels are common in several domains where annotating at the group level might be cheaper or might be the only way to provide annotated data without infringing on privacy. We model group-level labels as Class Conditional Noisy (CCN) labels for individual instances and use the noisy labels to regularize predictions of the model trained on the strongly-labeled instances. Our experiments on the real-world application of land cover mapping show the utility of the proposed method in leveraging group-level labels, both in the presence and absence of class imbalance.
@inproceedings{nayak2021weakly,
  title={Weakly Supervised Classification Using Group-Level Labels},
  author={Nayak, Guruprasad and Ghosh, Rahul and Jia, Xiaowei and Kumar, Vipin},
  booktitle={2nd International Workshop on Data-Efficient Machine Learning (DeMaL), Knowledge Discovery and Data Mining (KDD) Conference 2021},
  year={2021}
}

Semi-supervised Classification using Attention-based Regularization on Coarse-resolution Data
Guruprasad Nayak, Rahul Ghosh, Xiaowei Jia, Varun Mithal, Vipin Kumar
SIAM International Conference on Data Mining, 2020
(Acceptance Rate: 19.3%)
preprint   abstract   Published version   bibtex   Code

Many real-world phenomena are observed at multiple resolutions. Predictive models designed to predict these phenomena typically consider different resolutions separately. This approach might be limiting in applications where predictions are desired at fine resolutions but available training data is scarce. In this paper, we propose classification algorithms that leverage supervision from coarser resolutions to help train models on finer resolutions. The different resolutions are modeled as different views of the data in a multi-view framework that exploits the complementarity of features across different views to improve models on both views. Unlike traditional multi-view learning problems, the key challenge in our case is that there is no one-to-one correspondence between instances across different views, which requires explicit modeling of the correspondence of instances across resolutions. We propose to use the features of instances at different resolutions to learn the correspondence between instances across resolutions using an attention mechanism. Experiments on the real-world application of mapping urban areas using satellite observations and sentiment classification on text data show the effectiveness of the proposed methods.
@inbook{doi:10.1137/1.9781611976236.29,
author = {Guruprasad Nayak and Rahul Ghosh and Xiaowei Jia and Varun Mithal and Vipin Kumar},
title = {Semi-supervised Classification using Attention-based Regularization on Coarse-resolution Data},
booktitle = {Proceedings of the 2020 SIAM International Conference on Data Mining},
pages = {253--261},
year = {2020},
doi = {10.1137/1.9781611976236.29},
URL = {https://epubs.siam.org/doi/abs/10.1137/1.9781611976236.29},
eprint = {https://epubs.siam.org/doi/pdf/10.1137/1.9781611976236.29}
}

Spatio-temporal classification at multiple resolutions using multi-view regularization
Guruprasad Nayak, Rahul Ghosh, Xiaowei Jia, Varun Mithal, Vipin Kumar
4th International Workshop on Big Spatial Data, IEEE BigData 2019
Published version   abstract   bibtex   Code

In this work, we present a multi-view framework to classify spatio-temporal phenomena at multiple resolutions. This approach utilizes the complementarity of features across different resolutions and improves the corresponding models by enforcing consistency of their predictions on unlabeled data. Unlike traditional multi-view learning problems, the key challenge in our case is that there is a many-to-one correspondence between instances across different resolutions, which needs to be explicitly modeled. Experiments on the real-world application of mapping urban areas using spatial raster data-sets from satellite observations show the benefits of the proposed multi-view framework.
@inproceedings{nayak2019spatio,
  title={Spatio-temporal classification at multiple resolutions using multi-view regularization},
  author={Nayak, Guruprasad and Ghosh, Rahul and Jia, Xiaowei and Mithal, Varun and Kumar, Vipin},
  booktitle={2019 IEEE International Conference on Big Data (Big Data)},
  pages={4117--4120},
  year={2019},
  organization={IEEE}
}

Automated assessment of knowledge hierarchy evolution: comparing directed acyclic graphs
Guruprasad Nayak, Sourav Dutta, Deepak Ajwani, Patrick Nicholson, Alessandra Sala
Information Retrieval Journal, 2019
preprint   Published version   abstract   bibtex   code

Automated construction of knowledge hierarchies from huge data corpora has been gaining increasing attention in recent years, in order to tackle the infeasibility of manually extracting and semantically linking millions of concepts. As a knowledge hierarchy evolves with these automated techniques, there is a need for measures to assess its temporal evolution, quantifying the similarities between different versions and identifying the relative growth of different subgraphs in the knowledge hierarchy. In this paper, we focus on measures that leverage structural properties of the knowledge hierarchy graph to assess the temporal changes. We propose a principled and scalable similarity measure, based on Katz similarity between concept nodes, for comparing different versions of a knowledge hierarchy, modeled as a generic directed acyclic graph. We present theoretical analysis to depict that the proposed measure accurately captures the salient properties of taxonomic hierarchies, assesses changes in the ordering of nodes, along with the logical subsumption of relationships among concepts. We also present a linear time variant of the measure, and show that our measures, unlike previous approaches, are tunable to cater to diverse application needs. We further show that our measure provides interpretability, thereby identifying the key structural and logical differences in the hierarchies. Experiments on a real DBpedia and biological knowledge hierarchy showcase that our measures accurately capture structural similarity, while providing enhanced scalability and tunability. Also, we demonstrate that the temporal evolution of different subgraphs in this knowledge hierarchy, as captured purely by our structural measure, corresponds well with the known disruptions in the related subject areas.

@article{nayak2019automated,
  title={Automated assessment of knowledge hierarchy evolution: comparing directed acyclic graphs},
  author={Nayak, Guruprasad and Dutta, Sourav and Ajwani, Deepak and Nicholson, Patrick and Sala, Alessandra},
  journal={Information Retrieval Journal},
  volume={22},
  number={3-4},
  pages={256--284},
  year={2019},
  publisher={Springer}
}
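
The Katz similarity mentioned in the abstract above can be illustrated with a minimal sketch. This is not the similarity measure proposed in the paper; it only computes the standard Katz index between nodes of a small directed acyclic graph, where each directed path of length l contributes beta**l. The function name `katz_index`, the node numbering, the toy hierarchy, and the value of `beta` are all assumptions made for illustration.

```python
def katz_index(num_nodes, edges, beta=0.5):
    """Return K where K[i][j] = sum over l >= 1 of beta**l * (number of
    directed paths with l edges from node i to node j).

    Assumes the input graph is acyclic (hypothetical helper, not the
    paper's measure). For a DAG the sum is finite because no path can
    use more than num_nodes - 1 edges.
    """
    # Adjacency matrix: adj[u][v] counts edges u -> v.
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for parent, child in edges:
        adj[parent][child] += 1

    katz = [[0.0] * num_nodes for _ in range(num_nodes)]
    power = [row[:] for row in adj]  # adj**1: path counts of length 1
    weight = beta
    for _ in range(num_nodes - 1):
        # Add the contribution of paths of the current length.
        for i in range(num_nodes):
            for j in range(num_nodes):
                katz[i][j] += weight * power[i][j]
        # Advance to path counts of the next length.
        power = [
            [sum(power[i][k] * adj[k][j] for k in range(num_nodes))
             for j in range(num_nodes)]
            for i in range(num_nodes)
        ]
        weight *= beta
    return katz


# Toy hierarchy (illustrative only): 0 -> 1 -> 2, plus a shortcut 0 -> 2.
hierarchy = [(0, 1), (1, 2), (0, 2)]
scores = katz_index(3, hierarchy, beta=0.5)
```

In this toy graph, node 0 reaches node 2 both directly and through node 1, so both paths contribute to `scores[0][2]`, with the longer path damped by an extra factor of `beta`.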

Classifying Heterogeneous Sequential Data by Cyclic Domain Adaptation: An Application in Land Cover Detection
Xiaowei Jia, Guruprasad Nayak, Ankush Khandelwal, Anuj Karpatne, Vipin Kumar
SIAM International Conference on Data Mining, 2019
(Acceptance rate: 22.7%)
preprint   Published version   abstract   bibtex   code

Recent advances in processing remote sensing data have provided unprecedented potential for monitoring land covers. However, it is extremely challenging to deploy an automated monitoring system for different regions and across different years given the data heterogeneity over space and time. The heterogeneity exists in two aspects. First, for many land covers, the distinguishing temporal patterns are only visible in a certain discriminative period. Due to changes in weather conditions, the discriminative period can shift across space and time, which causes heterogeneity in the sequential data. Second, the collected remote sensing data are affected by acquisition devices and natural variables, e.g., precipitation and sunlight. In this paper, we introduce a novel framework to effectively detect land covers using sequential remote sensing data. At the same time, we propose new learning strategies based on attention networks and domain adaptation to address the aforementioned challenges. The evaluation on two real-world applications, cropland mapping and burned area detection, demonstrates that the proposed method can effectively detect land covers under different weather conditions.

@inproceedings{jia2019classifying,
  title={Classifying Heterogeneous Sequential Data by Cyclic Domain Adaptation: An Application in Land Cover Detection},
  author={Jia, Xiaowei and Nayak, Guruprasad and Khandelwal, Ankush and Karpatne, Anuj and Kumar, Vipin},
  booktitle={Proceedings of the 2019 SIAM International Conference on Data Mining},
  pages={540--548},
  year={2019},
  organization={SIAM}
}

Spatial Context-Aware Networks for Mining Temporal Discriminative Period in Land Cover Detection
Xiaowei Jia, Sheng Li, Ankush Khandelwal, Guruprasad Nayak, Anuj Karpatne, Vipin Kumar
SIAM International Conference on Data Mining, 2019
(Acceptance rate: 22.7%)
preprint   Published version   abstract   bibtex   code

Detecting land use and land cover changes is critical to monitor natural resources and analyze global environmental changes. In this paper, we investigate land cover detection using remote sensing data from earth-observing satellites. Due to natural disturbances, e.g., clouds and aerosols, and data acquisition errors by devices, remote sensing data frequently contain much noise. Also, many land covers cannot be easily identified on most dates of a year. Instead, they show distinctive temporal patterns only during a certain period of the year, which is also referred to as the discriminative period. To address these challenges, we propose a novel framework which combines the spatial context knowledge with the LSTM-based temporal modeling for land cover detection. Specifically, the framework learns the spatial context knowledge selectively from its neighboring locations. Then we propose two approaches for discriminative period detection based on multi-instance learning and local attention mechanism, respectively. Our evaluations in two real-world applications demonstrate the effectiveness of the proposed method in identifying land covers and detecting discriminative periods.

@inproceedings{jia2019spatial,
  title={Spatial Context-Aware Networks for Mining Temporal Discriminative Period in Land Cover Detection},
  author={Jia, Xiaowei and Li, Sheng and Khandelwal, Ankush and Nayak, Guruprasad and Karpatne, Anuj and Kumar, Vipin},
  booktitle={Proceedings of the 2019 SIAM International Conference on Data Mining},
  pages={513--521},
  year={2019},
  organization={SIAM}
}

Automated Knowledge Hierarchy Assessment
Guruprasad Nayak, Sourav Dutta, Deepak Ajwani, Patrick Nicholson, Alessandra Sala
Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR), SIGIR 2018
Published version   abstract   bibtex   code

Automated construction of knowledge hierarchies is gaining increasing attention to tackle the infeasibility of manually extracting and semantically linking millions of concepts. With the evolution of knowledge hierarchies, there is a need for measures to assess its temporal evolution, quantifying the similarities between different versions and identifying the relative growth of different subgraphs in the knowledge hierarchy. This work proposes a principled and scalable similarity measure, based on Katz similarity between concept nodes, for comparing knowledge hierarchies, modeled as generic Directed Acyclic Graphs (DAGs).

@inproceedings{nayak2018automated,
  title={Automated Knowledge Hierarchy Assessment},
  author={Nayak, Guruprasad and Dutta, Sourav and Ajwani, Deepak and Nicholson, Patrick and Sala, Alessandra},
  booktitle={The Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR), Michigan, USA, 12 July 2018},
  volume={2127},
  pages={59--60},
  year={2018},
  organization={CEUR Workshop Proceedings}
}

Classifying Multivariate Time Series by Learning Sequence-level Discriminative Patterns
Guruprasad Nayak, Varun Mithal, Xiaowei Jia, Vipin Kumar
SIAM International Conference on Data Mining, 2018
(Acceptance rate: 23.2%)
preprint   Published version   abstract   bibtex   code

Time series classification algorithms designed to use local context do not work on land cover classification problems, where instances of the two classes may often exhibit similar feature values due to the large natural variations in other land covers across the year and the unrelated phenomena that they undergo. In this paper, we propose to learn discriminative patterns from the entire length of the time series, and use them as predictive features to identify the class of interest. We propose a novel neural network algorithm to learn the key signature of the class of interest as a function of the feature values together with the discriminative pattern made from that signature through the entire time series in a joint framework. We demonstrate the utility of this technique on the land cover classification application of burned area mapping, which is of considerable societal importance.

@inproceedings{nayak2018classifying,
  title={Classifying multivariate time series by learning sequence-level discriminative patterns},
  author={Nayak, Guruprasad and Mithal, Varun and Jia, Xiaowei and Kumar, Vipin},
  booktitle={Proceedings of the 2018 SIAM International Conference on Data Mining},
  pages={252--260},
  year={2018},
  organization={SIAM}
}

Mapping Burned Areas in Tropical Forests Using a Novel Machine Learning Framework
Varun Mithal*, Guruprasad Nayak*, Ankush Khandelwal, Vipin Kumar, Ramakrishna Nemani, Nikunj Oza
Remote Sensing, 2018
preprint   Published version   abstract   bibtex   code   Web-viewer

This paper presents an application of a novel machine-learning framework on MODIS (moderate-resolution imaging spectroradiometer) data to map burned areas over tropical forests of South America and South-east Asia. The RAPT (RAre Class Prediction in the absence of True labels) framework is able to build data adaptive classification models using noisy training labels. It is particularly suitable when expert annotated training samples are difficult to obtain, as in the case of wildfires in the tropics. This framework has been used to build burned area maps from MODIS surface reflectance data as features and Active Fire hotspots as training labels that are known to have high commission and omission errors due to the prevalence of cloud cover and smoke, especially in the tropics. Using the RAPT framework we report burned areas for 16 MODIS tiles from 2001 to 2014. The total burned area detected in the tropical forests of South America and South-east Asia during these years is 2,071,378 MODIS (500 m) pixels (approximately 520 K sq. km.), which is almost three times the estimate from collection 5 MODIS MCD64A1 (783,468 MODIS pixels). An evaluation using Landsat-based reference burned area maps indicates that our product has an average user’s accuracy of 53% and producer’s accuracy of 55% while collection 5 MCD64A1 burned area product has an average user’s accuracy of 61% and producer’s accuracy of 27%. Our analysis also indicates that the two products can be complementary and a combination of the two approaches is likely to provide a more comprehensive assessment of tropical fires. Finally, we have created a publicly accessible web-based viewer that helps the community to visualize the burned area maps produced using RAPT and examine various validation sources corresponding to every detected MODIS pixel.

@article{mithal2018mapping,
  title={Mapping burned areas in tropical forests using a novel machine learning framework},
  author={Mithal, Varun and Nayak, Guruprasad and Khandelwal, Ankush and Kumar, Vipin and Nemani, Ramakrishna and Oza, Nikunj},
  journal={Remote Sensing},
  volume={10},
  number={1},
  pages={69},
  year={2018},
  publisher={Multidisciplinary Digital Publishing Institute}
}

RAPT: Rare Class Prediction in Absence of True Labels
Varun Mithal, Guruprasad Nayak, Ankush Khandelwal, Vipin Kumar, Nikunj Oza, Ramakrishna Nemani
IEEE Transactions on Knowledge and Data Engineering, 2017
preprint   Published version   abstract   bibtex   code

Many real-world problems involve learning models for rare classes in situations where there are no gold standard labels for training samples but imperfect labels are available for all instances. In this paper, we present RAPT, a three-step predictive modeling framework for classifying a rare class in such problem settings. The first step of the proposed framework learns a classifier that jointly optimizes precision and recall by only using imperfectly labeled training samples. We also show that, under certain assumptions on the imperfect labels, the quality of this classifier is almost as good as the one constructed using perfect labels. The second and third steps of the framework make use of the fact that imperfect labels are available for all instances to further improve the precision and recall of the rare class. We evaluate the RAPT framework on two real-world applications of mapping forest fires and urban extent from earth observing satellite data. The experimental results indicate that RAPT can be used to identify forest fires and urban areas with high precision and recall by using imperfect labels, even though obtaining expert annotated samples on a global scale is infeasible in these applications.

@article{mithal2017rapt,
  title={{RAPT}: Rare class prediction in absence of true labels},
  author={Mithal, Varun and Nayak, Guruprasad and Khandelwal, Ankush and Kumar, Vipin and Oza, Nikunj C and Nemani, Ramakrishna},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  volume={29},
  number={11},
  pages={2484--2497},
  year={2017},
  publisher={IEEE}
}

Incremental Dual-memory LSTM in Land Cover Prediction
Xiaowei Jia, Ankush Khandelwal, Guruprasad Nayak, James Gerber, Kimberly Carlson, Paul West, Vipin Kumar
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017
(Acceptance rate: 17.5%)
preprint   Published version   abstract   bibtex   code   Promotional video

Land cover prediction is essential for monitoring global environmental change. Unfortunately, traditional classification models are plagued by temporal variation and emergence of novel/unseen land cover classes in the prediction process. In this paper, we propose an LSTM-based spatio-temporal learning framework with a dual-memory structure. The dual-memory structure captures both long-term and short-term temporal variation patterns, and is updated incrementally to adapt the model to the ever-changing environment. Moreover, we integrate zero-shot learning to identify unseen classes even without labelled samples. Experiments on both synthetic and real-world datasets demonstrate the superiority of the proposed framework over multiple baselines in land cover prediction.

@inproceedings{jia2017incremental,
  title={Incremental dual-memory {LSTM} in land cover prediction},
  author={Jia, Xiaowei and Khandelwal, Ankush and Nayak, Guruprasad and Gerber, James and Carlson, Kimberly and West, Paul and Kumar, Vipin},
  booktitle={Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  pages={867--876},
  year={2017},
  organization={ACM}
}

Predict Land Covers with Transition Modeling and Incremental Learning
Xiaowei Jia, Ankush Khandelwal, Guruprasad Nayak, James Gerber, Kimberly Carlson, Paul West, Vipin Kumar
SIAM International Conference on Data Mining, 2017
(Acceptance rate: 26.0%)
preprint   Published version   abstract   bibtex   code

Successful land cover prediction can provide promising insights in applications where manual labeling is extremely difficult. However, traditional machine learning models are plagued by temporal variation and noisy features when directly applied to land cover prediction. Moreover, these models cannot take full advantage of the spatio-temporal relationship involved in land cover transitions. In this paper, we propose a novel spatio-temporal framework to discover the transitions among land covers and at the same time conduct classification at each time step. Based on the proposed model, we incrementally update the model parameters in the prediction process, thereby mitigating the impact of the temporal variation. Our experiments in two challenging land cover applications demonstrate the superiority of the proposed method over multiple baselines. In addition, we show the efficacy of spatio-temporal transition modeling and incremental learning through extensive analysis.

@inproceedings{jia2017predict,
  title={Predict land covers with transition modeling and incremental learning},
  author={Jia, Xiaowei and Khandelwal, Ankush and Nayak, Guruprasad and Gerber, James and Carlson, Kimberly and West, Paul and Kumar, Vipin},
  booktitle={Proceedings of the 2017 SIAM International Conference on Data Mining},
  pages={171--179},
  year={2017},
  organization={SIAM}
}

Multiple Instance Learning for burned area mapping using multitemporal reflectance data
Guruprasad Nayak, Varun Mithal, Vipin Kumar
International Workshop on Climate Informatics, 2016
Published version   abstract   code

Mapping burned area on a global scale typically requires the use of a weak signal like Active Fire for training the burned scar classification model. Since these weak signals are typically inaccurate with respect to temporal and spatial pinpointing of the event occurrence, using the multiple instance learning (MIL) paradigm to model the occurrence of the event in a wider spatio-temporal window is demonstrably more beneficial than using the exact date of the weak signal. In this work, we demonstrate the use of an MIL algorithm to model the temporal uncertainty of the weak signal. We further propose a noise-robust extension to the MIL paradigm for learning on sequence data.
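
The windowing idea in the abstract above can be sketched in a few lines. This is a minimal illustration of the standard multiple instance learning assumption (a bag is positive if at least one of its instances is positive), not the algorithm from the paper. The function name `bag_score`, the per-date scores, and the window size are hypothetical values chosen for the example.

```python
def bag_score(instance_scores, event_index, half_window):
    """Score a temporal bag centered on the date of the weak signal.

    instance_scores: per-date classifier scores for one location.
    event_index: index of the date reported by the weak signal.
    half_window: number of dates to include on each side of that date.

    Under the standard MIL assumption, the bag score is the maximum
    instance score inside the window (illustrative sketch only).
    """
    if not 0 <= event_index < len(instance_scores):
        raise ValueError("event_index is outside the time series")
    start = max(0, event_index - half_window)
    stop = min(len(instance_scores), event_index + half_window + 1)
    return max(instance_scores[start:stop])


# Hypothetical scores: the burn signature peaks one date before the
# date reported by the weak signal (index 3).
daily_scores = [0.1, 0.2, 0.9, 0.3, 0.1]
exact_date_score = bag_score(daily_scores, event_index=3, half_window=0)
windowed_score = bag_score(daily_scores, event_index=3, half_window=1)
```

With a window of zero the bag only sees the reported date, while a wider window lets the bag pick up the strongest response near that date, which is the intuition behind modeling the temporal uncertainty of the weak signal.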
