
Mechanistic Interpretability Machine Learning Expert

Job Description

  • This is a remote, project-based role for machine learning researchers with deep expertise in mechanistic interpretability. You will complete tasks at the frontier of interpretability research, including analyzing internal model representations, reverse-engineering learned circuits, and developing tools and techniques to understand how neural networks compute. Work takes place over the next 2–3 weeks, is asynchronous, and is assigned on a project-by-project basis, with an expected commitment of 10–20 hours per week for the projects you accept. This position offers exceptional pay, exposure to cutting-edge AI safety and interpretability research, and a strong addition to your research portfolio.


Why Apply

  • Flexible Time Commitment – Work on your schedule while tackling meaningful research challenges
  • Startup Exposure – Work directly with an early-stage Y Combinator-backed company, gaining hands-on experience that sets you apart
  • Exceptional Pay – Project-based pay ranges from $150 to $200 per hour
  • Portfolio Building – Gain experience on frontier interpretability and AI safety research problems
  • Professional Growth – Sharpen your skills on varied, challenging model analysis and reverse-engineering tasks


Responsibilities

  • Conduct mechanistic interpretability research on transformer-based and other neural network architectures
  • Identify, isolate, and analyze computational circuits responsible for specific model behaviors
  • Apply and extend techniques such as activation patching, probing, sparse autoencoders, and attention analysis (a minimal activation-patching sketch follows this list)
  • Develop tools and frameworks to automate or scale interpretability workflows across model families
  • Document methodologies, findings, and technical approaches clearly and reproducibly
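
To make the activation-patching workflow above concrete, here is a minimal sketch using TransformerLens, one of the libraries named under Preferred Qualifications. The model choice (gpt2), the prompt pair, and the patched layer are illustrative assumptions, not details of the role.

    from transformer_lens import HookedTransformer, utils

    model = HookedTransformer.from_pretrained("gpt2")  # assumed model choice

    # Clean and corrupted prompts that differ in one factual detail (assumed pair).
    clean_tokens = model.to_tokens("The Eiffel Tower is in the city of")
    corrupt_tokens = model.to_tokens("The Colosseum is in the city of")

    # Cache every activation from the clean run.
    _, clean_cache = model.run_with_cache(clean_tokens)

    layer = 6  # assumed layer to patch
    hook_name = utils.get_act_name("resid_pre", layer)

    def patch_final_resid(resid, hook):
        # Overwrite the residual stream at the final position with the clean value.
        resid[:, -1, :] = clean_cache[hook.name][:, -1, :]
        return resid

    patched_logits = model.run_with_hooks(
        corrupt_tokens, fwd_hooks=[(hook_name, patch_final_resid)]
    )

    # If the patch restores the clean behavior, the " Paris" logit should rise.
    paris = model.to_single_token(" Paris")
    print(patched_logits[0, -1, paris].item())

Sweeping this patch over layers and positions, and measuring how much of the clean behavior each patch restores, is the standard way such circuits are localized.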


Required Qualifications

  • Published researcher with at least one first-author publication in a peer-reviewed venue (e.g., NeurIPS, ICML, ICLR, or equivalent)
  • Master's or PhD in Machine Learning, Artificial Intelligence, Computer Science, or a related quantitative field
  • Demonstrated expertise in mechanistic interpretability, model analysis, or AI safety research
  • Deep familiarity with transformer architectures and modern large language model internals
  • Strong problem-solving skills and ability to work independently on open-ended research tasks


Preferred Qualifications

  • Hands-on experience with interpretability tools and libraries (e.g., TransformerLens, baukit, or similar)
  • Familiarity with sparse autoencoders, superposition, and feature geometry research (see the sketch after this list)
  • Teaching or TA experience in deep learning, NLP, or AI safety courses
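
For context on the sparse-autoencoder work mentioned above, here is a minimal PyTorch sketch of the standard reconstruction-plus-L1 training objective used to extract sparse features from model activations. All dimensions and the sparsity coefficient are illustrative assumptions.

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        # Overcomplete dictionary: encode activations into many sparse features.
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            self.encoder = nn.Linear(d_model, d_hidden)
            self.decoder = nn.Linear(d_hidden, d_model)

        def forward(self, acts):
            features = torch.relu(self.encoder(acts))  # non-negative, sparse codes
            return self.decoder(features), features

    sae = SparseAutoencoder(d_model=768, d_hidden=8 * 768)  # assumed sizes
    acts = torch.randn(64, 768)  # stand-in for cached residual-stream activations
    recon, features = sae(acts)

    # Reconstruction loss plus an L1 penalty that encourages sparse features.
    l1_coeff = 1e-3  # assumed sparsity coefficient
    loss = ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
    loss.backward()

The L1 term trades reconstruction fidelity for sparsity, which is what lets individual learned features resolve the superposed directions that feature geometry research studies.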


Company Description

  • AfterQuery is a research lab investigating the boundaries of artificial intelligence through novel datasets and experimentation. We're backed by top investors, including Y Combinator and Box Group, and our work supports all of the leading AI labs.