CV

Work Experience

Roblox, Senior Machine Learning Engineer (08/2025 - Present)
- End-to-end LLM post-training and fine-tuning of generative AI models for game development, serving Roblox developers.
- Training agentic AI models focused on code generation for game development in Lua.
- Designing and optimizing the full post-training stack: large-scale data extraction and cleaning, synthetic data generation, supervised fine-tuning, and reinforcement learning with verifiable rewards and rubric-based scoring systems.
- Profiling and optimizing multi-node distributed RL training pipelines for large-scale models.
- Building evaluation frameworks to measure model quality across diverse code-generation and agentic tasks, including trajectory- and tool-use-based evaluation design.
- Maintaining and developing Open Game Eval, Roblox’s open-source LLM evaluation framework for game-development tasks.
Google, Senior Research Data Scientist (06/2022 - 08/2025), Mountain View, CA
- LLM Evaluation Research, Google Search (AI Overviews):
  - Researched and built algorithms for LLM-as-a-judge evaluation systems powering Google Search’s generative AI products.
  - Trained and optimized judge models for automated scoring of LLM outputs at scale.
  - Developed novel approaches to uncertainty quantification and calibration for LLM judges.
  - Designed hybrid human-LLM evaluation algorithms that improved evaluation efficiency while maintaining quality alignment with human preferences.
  - Monitored and analyzed judge-model performance across production deployments.
- Central Causal Inference Team:
  - Google Ads: tech lead on ML models predicting effects of sales interventions on advertisers; work used to personalize and prioritize sales interventions, affecting millions of dollars in sales costs and operations.
  - Google Maps: Bayesian ML models for understanding routing interventions, evaluated across the 10 biggest US cities via a novel crossover experiment design. Two co-authored papers with the Google Maps and Mobility AI Research team.
  - YouTube: designed the YouTube Hype Small Creator bonus mechanism (covered by The Verge); creator-intervention models.
  - Google Play: price-experimentation systems.
  - Implemented and maintained internal R and Python packages for ML modeling and causal inference, used across multiple partner organizations.
  - Talks at Joint Statistical Meetings, INFORMS, and the Causal AI Conference (invited speaker).
Uber, Applied Scientist II (10/2021 - 06/2022), San Francisco, CA
- Experimentation Science Team. Maintained and enhanced Uber’s central experimentation platform processing hundreds of A/B tests daily.
- Implemented scalable variance-reduction techniques from recent statistics research, improving statistical power for experiment analysis at scale.
- Built features in Python and PySpark for Uber’s experimentation engine; conducted research on identifying spillovers in switchback experiments.
- Translated cutting-edge statistical methodology into production-grade software serving the entire organization.
Facebook, Research Scientist Intern (06/2020 - 09/2020), Menlo Park, CA
- Designed, scoped, and executed an independent research project evaluating a major ads product launch using causal-inference and quasi-experimental methods on a mixture of observational and experimental data.
Stanford University, Researcher and Teaching Assistant (09/2017 - 09/2021)
- Ph.D. research at Stanford GSB combining machine learning, statistical modeling, and causal inference, focused on learning reliable signals from noisy, high-dimensional real-world data and assigning credit correctly under selection and confounding.
- Built an end-to-end NLP pipeline processing ~6M newspaper articles using dynamic topic models and a novel influence metric to measure journalistic news content (PNAS 2021; dataset released on Harvard Dataverse).
- Developed hierarchical Bayesian ML models for predicting advertiser valuations in online auction platforms (ACM WWW 2022).
- Research assistantships with Shoshana Vasserman, Gregory J. Martin, Steven Callander, Takuo Sugaya, and Avidit Acharya.
- Teaching: Data-Driven Impact (Susan Athey & Niall Keleher — neural networks, word embeddings, topic models, matrix completion, recommendation systems); Data and Decisions (Peter Reiss — master’s-level statistics).
- Ph.D. Affiliate, Golub Capital Social Impact Lab.
J.P. Morgan Chase & Co., Business Analyst (08/2016 - 09/2017), London
- Analysis to inform decisions and coordinate cross-regional response to client needs in preparation for Brexit.
Boğaziçi University, Teaching Assistant (09/2013 - 05/2015)
- Mathematical statistics at introductory and advanced levels.

Education

Stanford University, Ph.D., Graduate School of Business, 2021
Stanford University, M.S., Statistics, 2020
University of Cambridge, M.Phil. (Research), Economics, 2016
Boğaziçi University, B.A., Economics (High Honors), 2015

Skills

LLM post-training: synthetic data generation, supervised fine-tuning (SFT), reinforcement learning with verifiable and rubric-based rewards, reward function design, agentic AI & code generation.
LLM evaluation: LLM-as-a-judge systems, uncertainty quantification and calibration for judges, hybrid human-LLM evaluation, trajectory and tool-based evaluation methods, conformal inference.
Causal inference & statistics: Bayesian statistics, experimentation and variance reduction, quasi-experimental methods, machine-learning-based synthetic control and panel data methods, nonparametric & ML-based doubly-robust estimation, switchback / crossover experiment design, dose-response function and continuous and marginal treatment effect estimation.
Languages & tools: Python, R, Lua, SQL, Spark / PySpark, Julia, Unix; PyTorch, TensorFlow, Keras.
Selected coursework (Ph.D. / M.S.): machine learning, statistical learning, Bayesian statistics, applied statistics, stochastic processes, design and analysis of algorithms, experiment design, causal inference, econometrics.
Languages: English, Turkish, French, Japanese.

Eray Turkel, Ph.D.
