
Mechanistic Interpretability Machine Learning Expert

Job Description

  • This is a remote, project-based role for machine learning researchers with deep expertise in mechanistic interpretability. You will complete tasks at the frontier of interpretability research, including analyzing internal model representations, reverse-engineering learned circuits, and developing tools and techniques to understand how neural networks compute. Work takes place over the next 2–3 weeks, is asynchronous, and is assigned on a project-by-project basis, with an expected commitment of 10–20 hours per week for the projects you accept. This position offers exceptional pay, exposure to cutting-edge AI safety and interpretability research, and a strong addition to your research portfolio.


Why Apply

  • Flexible Time Commitment – Work on your schedule while tackling meaningful research challenges
  • Startup Exposure – Work directly with an early-stage Y Combinator-backed company, gaining hands-on experience that sets you apart
  • Exceptional Pay – Project-based pay ranges from $150 to $200 per hour
  • Portfolio Building – Gain experience on frontier interpretability and AI safety research problems
  • Professional Growth – Sharpen your skills on varied, challenging model analysis and reverse-engineering tasks


Responsibilities

  • Conduct mechanistic interpretability research on transformer-based and other neural network architectures
  • Identify, isolate, and analyze computational circuits responsible for specific model behaviors
  • Apply and extend techniques such as activation patching, probing, sparse autoencoders, and attention analysis (a minimal activation-patching sketch follows this list)
  • Develop tools and frameworks to automate or scale interpretability workflows across model families
  • Document methodologies, findings, and technical approaches clearly and reproducibly
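
To make the activation-patching workflow above concrete, here is a minimal sketch using TransformerLens, one of the libraries named under Preferred Qualifications. The model choice (gpt2), the prompt pair, and the patched layer are illustrative assumptions, not details of the role.

    from transformer_lens import HookedTransformer, utils

    model = HookedTransformer.from_pretrained("gpt2")  # assumed model choice

    # Clean and corrupted prompts that differ in one factual detail (assumed pair).
    clean_tokens = model.to_tokens("The Eiffel Tower is in the city of")
    corrupt_tokens = model.to_tokens("The Colosseum is in the city of")

    # Cache every activation from the clean run.
    _, clean_cache = model.run_with_cache(clean_tokens)

    layer = 6  # assumed layer to patch
    hook_name = utils.get_act_name("resid_pre", layer)

    def patch_final_resid(resid, hook):
        # Overwrite the residual stream at the final position with the clean value.
        resid[:, -1, :] = clean_cache[hook.name][:, -1, :]
        return resid

    patched_logits = model.run_with_hooks(
        corrupt_tokens, fwd_hooks=[(hook_name, patch_final_resid)]
    )

    # If the patch restores the clean behavior, the " Paris" logit should rise.
    paris = model.to_single_token(" Paris")
    print(patched_logits[0, -1, paris].item())

Sweeping this patch over layers and positions, and measuring how much of the clean behavior each patch restores, is the standard way such circuits are localized.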


Required Qualifications

  • Published researcher with at least one first-author publication in a peer-reviewed venue (e.g., NeurIPS, ICML, ICLR, or equivalent)
  • Master's or PhD in Machine Learning, Artificial Intelligence, Computer Science, or a related quantitative field
  • Demonstrated expertise in mechanistic interpretability, model analysis, or AI safety research
  • Deep familiarity with transformer architectures and modern large language model internals
  • Strong problem-solving skills and ability to work independently on open-ended research tasks


Preferred Qualifications

  • Hands-on experience with interpretability tools and libraries (e.g., TransformerLens, baukit, or similar)
  • Familiarity with sparse autoencoders, superposition, and feature geometry research (see the sketch after this list)
  • Teaching or TA experience in deep learning, NLP, or AI safety courses
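
For context on the sparse-autoencoder work mentioned above, here is a minimal PyTorch sketch of the standard reconstruction-plus-L1 training objective used to extract sparse features from model activations. All dimensions and the sparsity coefficient are illustrative assumptions.

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        # Overcomplete dictionary: encode activations into many sparse features.
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            self.encoder = nn.Linear(d_model, d_hidden)
            self.decoder = nn.Linear(d_hidden, d_model)

        def forward(self, acts):
            features = torch.relu(self.encoder(acts))  # non-negative, sparse codes
            return self.decoder(features), features

    sae = SparseAutoencoder(d_model=768, d_hidden=8 * 768)  # assumed sizes
    acts = torch.randn(64, 768)  # stand-in for cached residual-stream activations
    recon, features = sae(acts)

    # Reconstruction loss plus an L1 penalty that encourages sparse features.
    l1_coeff = 1e-3  # assumed sparsity coefficient
    loss = ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
    loss.backward()

The L1 term trades reconstruction fidelity for sparsity, which is what lets individual learned features resolve the superposed directions that feature geometry research studies.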


Company Description

  • AfterQuery is a research lab investigating the boundaries of artificial intelligence through novel datasets and experimentation. We're backed by top investors, including Y Combinator and Box Group, and our work supports all of the leading AI labs.