Open to Summer & Fall 2026 AI/ML internships · relocation welcome

Aman Pandey

Master's candidate in Data Science at Arizona State (4.00 GPA) with 4+ years shipping machine-learning and data systems in production. Building toward agentic AI and large-scale ML.

4.00 GPA @ ASU
2M+ DAILY TXNS
114 LANGUAGES · NLP
MS · DEC 2026

TEMPE, AZ

About

Four years of shipping production ML,
now building toward research depth.

Before graduate school I spent four years as a software and ML engineer in Delhi, architecting distributed task queues that processed two million transactions a day, then building a 114-language NLP pipeline that reshaped how an entertainment studio localized its catalog.

At ASU I've carried that practitioner instinct into research-leaning work: 450+ experiments characterizing Temporal Fusion Transformers on financial time series, a diffusion framework for privacy-preserving continuous glucose monitoring (CGM) data, and a market-basket system on 32.4M transactions.

Off the clock I'm usually deep in a paper I don't strictly need to read (lately, why a Temporal Fusion Transformer refuses to stabilize on noisy financial returns), or annotating a plot that already works. I keep a shortlist of side projects I'll probably never finish; the good ones tend to finish themselves once the right question shows up.

Education

Coursework that backs up the GPA.

Arizona State University

M.S., Data Science, Analytics and Engineering

Jan 2025 – Dec 2026 · Tempe, AZ

GPA: 4.00 / 4.00
Credits: 28
Grad: Dec 2026

Coursework · 9 completed · 1 in progress

CSE 511 · Data Processing at Scale · 2025 Spring · A+
CSE 572 · Data Mining · 2025 Spring · A+
DSE 501 · Statistics for Data Analysts · 2025 Spring · A
CSE 575 · Statistical Machine Learning · 2025 Fall · A
DSE 506 · Computing Data-Driven Optimization · 2025 Fall · A+
EEE 598 · Deep Learning: Foundations & Applications · 2025 Fall · A
CSE 543 · Information Assurance & Security · 2026 Spring · A
CSE 571 · Artificial Intelligence · 2026 Spring · A-
EEE 515 · Machine Vision & Pattern Recognition · 2026 Spring · A
FSE 570 · Data Science Capstone · 2026 Fall · in progress

Amity University

B.Tech., Computer Science

Jul 2016 – May 2020 · Noida, India

Semester abroad

Adelphi University · Garden City, NY
Birkbeck, University of London · London, UK

Where I've shipped

Four years of shipped production systems.

Distributed queues, multilingual NLP pipelines, and churn analytics: the places where a model's RMSE matters far less than whether the pipeline stayed up.

Software Engineer · My Next Film
Apr 2023 – Dec 2024 · New Delhi, India
- Engineered a multilingual NLP pipeline supporting 114 languages using seq2seq Transformers on AWS with Google and Azure speech APIs, improving translation accuracy by 76% and reducing manual review costs.
- Built a reviewer web app with automated task allocation that cut project cycle time by 41% and lifted translation quality by 20%.
- Automated 400+ voice narrations with Amazon Polly, matching accent and timbre to character profiles across markets.
Python, PyTorch, seq2seq Transformers, AWS (EC2, S3, Lambda), Google/Azure Speech, Tableau
Data Analyst · Youth Buzz
Sep 2022 – Mar 2023 · Noida, India
- Lifted Net Promoter Score by 10 points through analytics-driven strategy; automated survey reporting with zero-shot NLP classification and shipped Power BI dashboards for leadership.
- Built churn prediction models (logistic regression, 0.84 AUC) on 50K+ customer records and ran RFM clustering to identify fee-driven attrition.
- Advised on a retention strategy that cut attrition by ~50% in fee-sensitive cohorts within one quarter.
Python, pandas, scikit-learn, SQL, Power BI, Zero-shot NLP
Software Developer · Invesca Technology
Dec 2020 – Jul 2022 · Noida, India
- Architected Celery and Redis distributed task queues processing 2M+ daily transactions, cutting pipeline latency by 40% and sustaining a 99.9% SLA across peak windows at 10x normal traffic.
- Built log analytics dashboards and multithreaded Python services that raised backend throughput by 35%.
- Automated anomaly detection alerts that prevented overload incidents during peak campaign operations.
Python, Celery, Redis, Distributed systems, Multithreading, Log analytics

Things I've built

A research-leaning portfolio that ships.

Three projects that each demonstrate something different: graduate-grade rigor, end-to-end analytics at scale, and generative modeling where data can't leave the room.

01 · Deep learning · Time series · Finance

2026

FinFusion

Result: 59.1% directional accuracy · weekly · 9-fold rolling.

Temporal Fusion Transformers for S&P 500 return forecasting, with 450+ experiments and documented negative results.

PyTorch Lightning, pytorch-forecasting, Python, FRED API, yfinance

Case study · GitHub

02 · Generative modeling · Biomedical time series

2026

GlucoCast · In progress

Result: 18% RMSE improvement vs. LSTM / CNN baselines.

Conditional diffusion framework for privacy-preserving blood glucose forecasting.

PyTorch, Diffusion models, OhioT1DM dataset, Conditional generation

Case study

03 · Analytics · Recommendation · Unsupervised

2026

BasketIQ

Result: 32.4M Instacart transactions analyzed.

Market-basket analysis and customer segmentation at 32.4M-transaction scale.

Python, pandas, mlxtend (Apriori), scikit-learn, Chart.js

Case study · GitHub

Further projects

NLP · Personality computing

Traitlytics

Predicting Big-Five personality traits from LinkedIn text with fine-tuned BERT.

Biosignals · Generative audio · Mobile

Pulse2Symphony · In progress

Biosignal-conditioned music generation from smartphone-camera PPG.

Computer vision · Real-time systems

Gaze-Tracker

Real-time driver drowsiness detection from facial landmarks on CPU.

What I reach for

Tools I reach for.

Grouped honestly. Chips I haven't shipped to production recently are tagged as such elsewhere on this site. Nothing inflated to fill a keyword list.

Languages & core

Python, SQL, R, Java, Git, Shell

NLP & generative

LLMs, BERT · RoBERTa, seq2seq Transformers, Zero-shot classification, Agentic AI, Multi-agent systems, Agentic pipelines, LangChain, Retrieval-Augmented Generation, RAG orchestration, Vector embeddings, Model Context Protocol (MCP)

MLOps & cloud

Docker, Kubernetes, CI/CD, FastAPI, Celery · Redis, AWS (EC2, S3, Lambda, API Gateway), Azure, Distributed systems

Machine learning & deep learning

PyTorch, TensorFlow, Transformers, LSTM · CNN · RNN, Diffusion models, scikit-learn, Feature engineering, Statistical inference

Data & analytics

pandas, NumPy · SciPy, PySpark, PostgreSQL, BigQuery, Snowflake, ETL / ELT, Data modeling

Visualization & BI

Tableau, Power BI, Looker Studio, Matplotlib · Seaborn, Chart.js, Google Analytics

Get in touch

Let’s talk about ML engineering, research, or a problem you’re stuck on.

The fastest path is email. For recruiter outreach, LinkedIn works too. For code-flavored conversations, GitHub issues and DMs are fine.

Email · preferred

amanpandey.ds@gmail.com

I reply within a day or two. Include what you’re working on. The more specific the question, the better the answer.

LinkedIn

in/amanpandeyy

GitHub

aman-720

Phone

+1 (508) 373-8918

X / Twitter

@aman_720

Or download the PDF:

Résumé