CV
Work Experience
- Roblox, Senior Machine Learning Engineer (08/2025 - Present)
- End-to-end LLM post-training and fine-tuning for generative AI models for game development, powering Roblox developers.
- Training agentic AI models focused on code generation for game development in Lua.
- Designing and optimizing the full post-training stack: large-scale data extraction and cleaning, synthetic data generation, supervised fine-tuning, and reinforcement learning with verifiable rewards and rubric-based scoring systems.
- Profiling and optimizing multi-node distributed RL training pipelines for large-scale model training.
- Building evaluation frameworks to measure model quality across diverse code-generation and agentic tasks, including trajectory- and tool-use-based evaluation design.
- Maintaining and developing Open Game Eval, Roblox’s open-source LLM evaluation framework for game-development tasks.
- Google, Senior Research Data Scientist (06/2022 - 08/2025), Mountain View, CA
- LLM Evaluation Research, Google Search (AI Overviews):
- Researched and built algorithms for LLM-as-a-judge evaluation systems powering Google Search’s generative AI products.
- Trained and optimized judge models for automated scoring of LLM outputs at scale.
- Developed novel approaches to uncertainty quantification and calibration for LLM judges.
- Designed hybrid human-LLM evaluation algorithms that improved evaluation efficiency while maintaining quality alignment with human preferences.
- Monitored and analyzed judge-model performance across production deployments.
- Central Causal Inference Team:
- Google Ads: tech lead on ML models predicting effects of sales interventions on advertisers; work used to personalize and prioritize sales interventions, affecting millions of dollars in sales costs and operations.
- Google Maps: Bayesian ML models for understanding routing interventions, evaluated across the 10 biggest US cities via a novel crossover experiment design. Two co-authored papers with the Google Maps and Mobility AI Research team.
- YouTube: designed the YouTube Hype Small Creator bonus mechanism (covered by The Verge); creator-intervention models.
- Google Play: price-experimentation systems.
- Implemented and maintained internal R and Python packages for ML modeling and causal inference, used across multiple partner organizations.
- Talks at Joint Statistical Meetings, INFORMS, and the Causal AI Conference (invited speaker).
- Uber, Applied Scientist II (10/2021 - 06/2022), San Francisco, CA
- Experimentation Science Team. Maintained and enhanced Uber’s central experimentation platform processing hundreds of A/B tests daily.
- Implemented scalable variance-reduction techniques from recent statistics research, improving statistical power for experiment analysis at scale.
- Built features in Python and PySpark for Uber’s experimentation engine; conducted research on identifying spillovers in switchback experiments.
- Translated cutting-edge statistical methodology into production-grade software serving the entire organization.
- Facebook, Research Scientist Intern (06/2020 - 09/2020), Menlo Park, CA
- Designed, scoped, and executed an independent research project evaluating a major ads product launch using causal-inference and quasi-experimental methods on a mixture of observational and experimental data.
- Stanford University, Researcher and Teaching Assistant (09/2017 - 09/2021)
- Ph.D. research at Stanford GSB combining machine learning, statistical modeling, and causal inference, focused on extracting reliable signals from noisy, high-dimensional real-world data and assigning credit correctly under selection and confounding.
- Built an end-to-end NLP pipeline processing ~6M newspaper articles using dynamic topic models and a novel influence metric to measure journalistic news content (PNAS 2021; dataset released on Harvard Dataverse).
- Developed hierarchical Bayesian ML models for predicting advertiser valuations in online auction platforms (ACM WWW 2022).
- Research assistantships with Shoshana Vasserman, Gregory J. Martin, Steven Callander, Takuo Sugaya, and Avidit Acharya.
- Teaching: Data-Driven Impact (Susan Athey & Niall Keleher — neural networks, word embeddings, topic models, matrix completion, recommendation systems); Data and Decisions (Peter Reiss — master’s-level statistics).
- Ph.D. Affiliate, Golub Capital Social Impact Lab.
- J.P. Morgan Chase & Co., Business Analyst (08/2016 - 09/2017), London
- Analysis to inform decisions and coordinate cross-regional response to client needs in preparation for Brexit.
- Boğaziçi University, Teaching Assistant (09/2013 - 05/2015)
- Mathematical statistics at introductory and advanced levels.
Education
- Stanford University, Ph.D., Graduate School of Business, 2021
- Stanford University, M.S., Statistics, 2020
- University of Cambridge, M.Phil. (Research), Economics, 2016
- Boğaziçi University, B.A., Economics (High Honors), 2015
Skills
- LLM post-training: synthetic data generation, supervised fine-tuning (SFT), reinforcement learning with verifiable and rubric-based rewards, reward function design, agentic AI & code generation.
- LLM evaluation: LLM-as-a-judge systems, uncertainty quantification and calibration for judges, hybrid human-LLM evaluation, trajectory and tool-based evaluation methods, conformal inference.
- Causal inference & statistics: Bayesian statistics, experimentation and variance reduction, quasi-experimental methods, machine-learning-based synthetic control and panel data methods, nonparametric & ML-based doubly-robust estimation, switchback / crossover experiment design, dose-response function and continuous and marginal treatment effect estimation.
- Languages & tools: Python, R, Lua, SQL, Spark / PySpark, Julia, Unix; PyTorch, TensorFlow, Keras.
- Selected coursework (Ph.D. / M.S.): machine learning, statistical learning, Bayesian statistics, applied statistics, stochastic processes, design and analysis of algorithms, experiment design, causal inference, econometrics.
- Languages: English, Turkish, French, Japanese.