Job Hunt

LeetCode resources and ML interview prep — for SWE and research roles.

āœļø

Notes on the Job Hunt

Timing, information, connection & mindset

LeetCode / åŠ›ę‰£

Job Openings / Positions

Career pages for major AI labs — US and China.

Community Posts

Submitted by the community. Want to post? Reach out.

Urgent Ā· DeepSeek Ā· Research Intern — Reasoning & RL
Internship Ā· Hangzhou šŸ‡ØšŸ‡³

Working on LLM reasoning, post-training, and RL-based alignment. Background in math/CS preferred. 3–6 months, potential for return offer.

šŸ“© Apply via WeChat — contact us through the official account's message inbox

2025-04
ByteDance Ā· Seed Ā· LLM Pre-training Intern
Internship Ā· Beijing šŸ‡ØšŸ‡³

Joining the Seed foundation model team. Focus on data pipeline, training efficiency, and scaling experiments. Strong Python + distributed training background needed.

šŸ“© Email your resume to seed-intern@bytedance.com

AnthropicĀ·Research Engineer Intern
Internship Ā· San Francisco, CA šŸ‡ŗšŸ‡ø

Summer internship on the interpretability or alignment team. Strong ML fundamentals required. US work authorization needed.

ML Interview / é¢ē»Ā·å…«č‚”

Core concepts with rigorous answers — for LLM researchers and practitioners.

01

Transformer & Attention

When d_k is large, the dot products in QKįµ€ grow in magnitude: assuming the entries of Q and K are independent with zero mean and unit variance, each dot product has variance d_k. Large scores push softmax into its saturated region, where gradients are vanishingly small. Dividing by √d_k restores unit variance, keeping softmax in a stable gradient regime.

Scaled dot-product attention
$$\text{Attention}(Q,K,V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

Concrete example: with d_k = 64 and unit-variance inputs, unscaled dot products have std ā‰ˆ √64 = 8; after scaling by 1/√64 = 1/8, std ā‰ˆ 1.
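The variance argument above is easy to check empirically. Below is a minimal NumPy sketch (illustrative only, not a production implementation) that measures the std of dot products before and after scaling, plus a bare-bones scaled dot-product attention; all names here are ours, not from any library.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 64

# Unit-variance queries and keys, matching the assumption in the argument above.
Q = rng.standard_normal((1000, d_k))
K = rng.standard_normal((1000, d_k))

scores = Q @ K.T                           # unscaled dot products
print(scores.std())                        # ā‰ˆ sqrt(d_k) = 8
print((scores / np.sqrt(d_k)).std())       # ā‰ˆ 1 after scaling

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    s = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V
```

Without the 1/√d_k factor, the softmax input has std ā‰ˆ 8, so one key tends to dominate each row and the gradient through softmax nearly vanishes; with it, the scores stay in a well-conditioned range.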

02

Training & Optimization

03

Architecture Design

04

Training & Alignment

05

Inference & Deployment

20 questions across 5 categories. More coming soon.