Experience

July 2022 - Present Apple, Cambridge, MA
Researcher, AIML Resident

User-led Semi-supervised Learning via Noisy Signals

Ground-truth labels are often noisy or biased, leading to a disconnect between model training and user preferences. To address this, we develop novel methods that detect and mitigate uncertainty and bias in labeled data. Characterizing the label noise can further improve semi-supervised learning for Seq2seq models, e.g., ASR and LLMs.

Understanding Adaptive Optimizers in Federated Learning for E2E ASR

To address research gaps in Federated Learning (FL) for End-to-End Automatic Speech Recognition (E2E ASR), we review conventional FL research and show that adaptive optimizers enhance coordination among heterogeneous client updates, and that some optimizers induce a smoother optimization subspace, thus improving the performance of the FL system (see Publications - C6).

Federated Learning with Differential Privacy for End-to-End Speech Recognition

We take first steps towards on-device ASR training with federated learning and privacy preservation, aiming to deliver privacy to users while improving ASR by using more data and context from on-device training (see Publications - P1, W2).

August 2019 - July 2022 Purdue University
Graduate Research Assistant (RA)

Learning from Partially-observed Multimodal Data (Sponsored by HP Lab)

Developing unsupervised techniques to learn from partially observed multimodal datasets. The aim is to learn high-quality latent representations of datasets with missing modalities using self-supervised and unsupervised techniques.

Efficient Clustering of Documents in Clustered Vector Spaces (Sponsored by HP Lab)

Developed a novel technique for interpretable document clustering to help curate data for downstream applications such as personalization of marketing campaigns for focus groups.

Federated Private Representation Learning (Sponsored by Northrop Grumman)

Private representation learning using Generative Adversarial Networks (GANs) on distributed datasets using differentially private parameter sharing (see Publications - C3).

Link Prediction in Social Learning Networks

Graph neural network-based link prediction in social learning networks for connection recommendations, using both learnt and explicit network metrics.

Publications and projects

Conference publications: C4 - C2
Journal publications: J3 - J2
Workshop publications: W1
Projects: P8 - P5

June 2021 - August 2021 Zillow Group, Seattle, WA
Applied Scientist-Intern

Unsupervised Multimodal Representation Learning and Finetuning for Document Understanding and Scene Attribute Recognition

Developed an unsupervised multimodal (visual, linguistic, spatial, etc.) representation learning framework that leverages unlabeled raw documents (e.g., property documents) and weakly labeled image datasets (e.g., listing images and descriptions).
The learned transformations help improve performance on several downstream few-shot learning tasks, including sequence classification, token classification, image attribute detection, and localization.

September 2018 - August 2019 Foundation AI, Los Angeles, CA
Research Scientist

Computer Vision and NLP for Document Understanding

Developed novel CV methods for document analysis and OCR using GANs, CNNs, and graph convolutions.
Developed an NLP model for key-value pair extraction from unstructured documents, leveraging link prediction techniques.

Publications and projects

Conference publications: C1
Projects: O5

June 2015 - August 2018 Practo Technologies, Bangalore, India
Data Scientist, Senior Software Engineer, Software Engineer

Computer Vision for Medical Imaging

Developed novel CV models for diagnosing lung cancer, brain tumors, and diabetic retinopathy using radiology images.
Developed NLP solutions using LSTM and attention-based deep learning methods for 90,000-class classification.
Developed a semi-supervised text classifier for highlighting important phrases in clinical documents.

NLP for Automated Medical Document Annotation

Developed methods for automated ICD-10 (International Classification of Diseases) and CPT (Current Procedural Terminology) coding of clinical documents, including Q-Map for fast retrieval of concepts from text documents (see J1) and CASCADENET, a hierarchical deep neural network for massively categorical classification (see C1).
Q-Map and CASCADENET play a vital role in the development of subsequent clinical decision support systems and other solutions, with applications in insurance claims automation, disease trend and epidemic outbreak characterization.

Publications and projects

Journal publications: J1
Projects: O4, O1

Sheikh Shams Azam