May 8, 2026 · Applied project

Reasoning Data Distillation and Post-training Optimization

A sanitized project note about constructing reasoning data and improving model behavior through post-training.

Role: Data distillation, rule-based cleaning, training, evaluation, and failure analysis

Confidentiality: Domain details, internal data, and exact metrics are removed.

Tags: Data Distillation · CoT · SFT · DPO · Reasoning

Problem

For complex domain tasks, plain single-pass text generation often produces shallow answers: the model may miss hidden constraints, make weak logical jumps, or fail to keep its reasoning consistent across steps.

My Role

I worked across the full pipeline: data distillation, rule cleaning, training, evaluation, and failure analysis.

Approach

The workflow included:

  1. Using teacher models to generate reasoning-rich examples.
  2. Applying domain rules and APIs to clean invalid data.
  3. Building supervised fine-tuning data.
  4. Applying preference optimization to improve answer quality.
  5. Analyzing remaining reasoning failures.
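Steps 1–3 above can be sketched in miniature. The teacher call, the cleaning rules, and the record format below are illustrative stand-ins (the project's actual domain rules and APIs are withheld), but the shape of the pipeline is the same: generate reasoning-rich examples, drop invalid ones by rule, then flatten survivors into supervised fine-tuning pairs.

```python
def teacher_generate(question: str) -> str:
    # Stand-in for a teacher model that emits a chain of thought plus an answer.
    return f"Reasoning: consider the constraints of '{question}'.\nAnswer: 42"

def passes_rules(example: dict) -> bool:
    # Illustrative rule cleaning: drop examples with no reasoning trace
    # or with an empty / placeholder final answer.
    cot, answer = example["cot"], example["answer"]
    return bool(cot.strip()) and bool(answer.strip()) and answer != "N/A"

def distill(questions):
    # Step 1: teacher generation.
    records = []
    for q in questions:
        raw = teacher_generate(q)
        cot, _, answer = raw.partition("\nAnswer:")
        records.append({"question": q, "cot": cot.strip(), "answer": answer.strip()})
    # Step 2: rule-based cleaning before any training data is built.
    kept = [r for r in records if passes_rules(r)]
    # Step 3: flatten into (prompt, target) pairs for supervised fine-tuning.
    return [{"prompt": r["question"],
             "target": f"{r['cot']}\nAnswer: {r['answer']}"}
            for r in kept]
```

The key design point is that cleaning happens on structured fields (reasoning trace, final answer) rather than on raw text, so each rule can target one failure mode.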

What I Learned

Post-training is not just about adding more data. The hard part is deciding what kind of reasoning behavior should be rewarded, rejected, or rewritten.
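One way to make that decision concrete is to encode it as a preference-labeling rule that turns pairs of responses into DPO-style (chosen, rejected) triples. The scoring heuristics below are hypothetical placeholders, not the project's actual criteria; the point is that "what gets rewarded" becomes an explicit, auditable function.

```python
def score(response: str) -> int:
    # Hypothetical reward rules: visible reasoning and a committed
    # final answer each earn a point.
    s = 0
    if "because" in response or "therefore" in response:
        s += 1  # explicit reasoning step is rewarded
    if "Answer:" in response:
        s += 1  # a committed final answer is rewarded
    return s

def build_preference_pair(prompt: str, a: str, b: str):
    # Returns a {prompt, chosen, rejected} record for preference
    # optimization, or None when the responses tie and the pair
    # would teach the model nothing.
    sa, sb = score(a), score(b)
    if sa == sb:
        return None
    chosen, rejected = (a, b) if sa > sb else (b, a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

Ties are dropped rather than labeled arbitrarily, since a noisy preference pair is worse for optimization than no pair at all.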