CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy

Accepted to ICCV 2025

Jiakai Zhang¹,², Shouchen Zhou¹,², Haizhao Dai¹,², Xinhang Liu³, Peihao Wang⁴, Zhiwen Fan⁴, Yuan Pei¹, and Jingyi Yu¹

¹ShanghaiTech University ²Cellverse Co., Ltd. ³Hong Kong University of Science and Technology ⁴The University of Texas at Austin


TL;DR: CryoFastAR turns noisy cryo-EM particle stacks into high-fidelity ab initio reconstructions by predicting particle poses in a single feed-forward pass, with no iterative pose search required.

CryoFastAR overview illustration showing feed-forward cryo-EM reconstruction.

CryoFastAR performs feed-forward pose estimation and reconstruction. By leveraging multi-view integration and progressive training on large-scale simulated cryo-EM data, CryoFastAR predicts particle poses directly and dramatically accelerates ab initio reconstruction compared with iterative solvers.
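The multi-view integration idea can be illustrated with a generic global-attention step in which every token of every view attends to the tokens of all views, so each particle image is interpreted in the context of the others. This is a minimal NumPy sketch under our own assumptions (a single attention head, random untrained projection matrices, a plain residual update); `integrate_views` is a hypothetical name, and the block is not CryoFastAR's actual decoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def integrate_views(tokens, w_q, w_k, w_v):
    """Hypothetical view-integration step (illustrative only).

    tokens: (V, T, D) array holding V views with T tokens of width D each.
    w_q, w_k, w_v: (D, D) projection matrices (random here, not trained).
    Every token attends to the tokens of all V views, then a residual
    update is applied; the output has the same shape as the input.
    """
    V, T, D = tokens.shape
    flat = tokens.reshape(V * T, D)          # pool tokens from all views
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v
    attn = softmax(q @ k.T / np.sqrt(D))     # (V*T, V*T) cross-view weights
    updated = flat + attn @ v                # residual "updating" step
    return updated.reshape(V, T, D)

rng = np.random.default_rng(0)
V, T, D = 4, 16, 32                          # 4 views, 16 patch tokens each
tokens = rng.standard_normal((V, T, D))
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
out = integrate_views(tokens, w_q, w_k, w_v)
print(out.shape)
```

Because attention spans the pooled tokens of all views, perturbing one view changes the updated features of every other view, which is the property a view-integration block relies on.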

Abstract

Pose estimation from unordered images is fundamental to 3D reconstruction, robotics, and scientific imaging. Recent geometric foundation models enable end-to-end dense 3D reconstruction but remain underexplored for cryo-electron microscopy (cryo-EM), where reconstruction still depends on slow iterative optimization. We introduce CryoFastAR, the first geometric foundation model that directly predicts poses from cryo-EM particle images for fast ab initio reconstruction. CryoFastAR integrates features across multiple views and is trained on large-scale simulated cryo-EM data with realistic noise and contrast transfer function (CTF) modulation. A progressive training strategy stabilizes learning by starting from simplified settings and gradually increasing difficulty. Experiments on synthetic and real datasets show that CryoFastAR achieves reconstruction quality comparable to or better than traditional pipelines while significantly accelerating inference. Code, models, and datasets are released to spur further research.
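To make "realistic noise and CTF modulation" concrete, here is a minimal sketch of how a clean projection can be turned into a simulated particle image: multiply its Fourier transform by a CTF, then add white Gaussian noise at a target signal-to-noise ratio. Assumptions: an isotropic CTF with no astigmatism, envelope, or phase plate; one common sign convention (positive defocus means underfocus); SNR defined as a variance ratio; and a toy Gaussian blob in place of a real projection. The function names and parameter values are illustrative, not taken from the CryoFastAR code.

```python
import numpy as np

def electron_wavelength(kv):
    """Relativistic electron wavelength in Angstrom for an accelerating voltage in kV."""
    v = kv * 1e3
    return 12.2643 / np.sqrt(v * (1.0 + 0.978466e-6 * v))

def ctf_2d(n, apix, defocus_a, kv=300.0, cs_mm=2.7, amp_contrast=0.1):
    """Isotropic CTF on an n x n FFT grid (no astigmatism, no envelope).

    chi = pi*lam*df*k^2 - (pi/2)*Cs*lam^3*k^4
    CTF = -(sqrt(1 - A^2) * sin(chi) + A * cos(chi)), with df > 0 meaning underfocus.
    """
    lam = electron_wavelength(kv)
    cs = cs_mm * 1e7                                 # mm -> Angstrom
    freqs = np.fft.fftfreq(n, d=apix)                # spatial frequency in 1/Angstrom
    kx, ky = np.meshgrid(freqs, freqs, indexing="ij")
    k2 = kx**2 + ky**2
    chi = np.pi * lam * defocus_a * k2 - 0.5 * np.pi * cs * lam**3 * k2**2
    return -(np.sqrt(1 - amp_contrast**2) * np.sin(chi) + amp_contrast * np.cos(chi))

def simulate_particle(projection, apix, defocus_a, snr, rng):
    """CTF-modulate a clean projection, then add white Gaussian noise at the given SNR."""
    ctf = ctf_2d(projection.shape[0], apix, defocus_a)
    modulated = np.real(np.fft.ifft2(np.fft.fft2(projection) * ctf))
    noise_std = np.sqrt(modulated.var() / snr)       # SNR = signal variance / noise variance
    return modulated + rng.normal(0.0, noise_std, modulated.shape), ctf

rng = np.random.default_rng(0)
n, apix = 64, 2.0
yy, xx = np.mgrid[:n, :n] - n / 2
clean = np.exp(-(xx**2 + yy**2) / (2 * 6.0**2))     # toy blob standing in for a projection
noisy, ctf = simulate_particle(clean, apix, defocus_a=15000.0, snr=0.1, rng=rng)
print(noisy.shape, round(float(ctf[0, 0]), 3))
```

Because the CTF depends only on k², it is symmetric under k → -k on the FFT grid, so the modulated image stays real up to rounding error; taking the real part simply discards that residue.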


Pipeline

CryoFastAR employs a ViT-Large encoder, a decoder with view integration and updating blocks, and downstream heads for pose regression. Training progresses from clean two-view projections to noisy, CTF-augmented multi-view images, and finally to real particle data, to promote robust generalization to experimental data. The model processes large batches of views simultaneously and produces pose predictions in a single forward pass, which can then be refined.
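The three training stages can be expressed as a step-indexed schedule. This is a minimal sketch under stated assumptions: the stage boundaries (100k and 300k steps), the number of views per sample, and the names `Stage`, `SCHEDULE`, and `stage_for_step` are all hypothetical and only mirror the ordering of stages described in the paragraph above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    num_views: int              # views per training sample (illustrative)
    source: str                 # "simulated" or "real"
    simulate_noise_ctf: bool    # add synthetic noise and CTF to projections

# Hypothetical schedule: (first training step, stage). Real particles already
# carry noise and CTF, so nothing is simulated in the last stage.
SCHEDULE = [
    (0,       Stage("clean two-view",       2,  "simulated", False)),
    (100_000, Stage("noisy CTF multi-view", 16, "simulated", True)),
    (300_000, Stage("real particles",       16, "real",      False)),
]

def stage_for_step(step: int) -> Stage:
    """Return the latest stage whose start step is <= step."""
    current = SCHEDULE[0][1]
    for start, stage in SCHEDULE:
        if step >= start:
            current = stage
    return current

for step in (0, 150_000, 400_000):
    print(step, stage_for_step(step).name)
```

Keeping the schedule as data rather than branching logic makes it easy to drop or reorder stages, which is the kind of change an ablation over curriculum steps requires.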

Qualitative Results

Synthetic Results

Synthetic reconstructions comparing CryoFastAR with traditional iterative solvers across multiple molecules.

CryoFastAR preserves fine structural details across simulated complexes while avoiding the artifacts that iterative pose-bootstrapping methods sometimes introduce. Visual comparisons highlight sharper density recovery for flexible regions and cleaner background suppression on challenging molecules.

Experimental Results

Qualitative comparison on the experimental spliceosome dataset showing CryoFastAR capturing more complete structures.

On the experimental spliceosome dataset, CryoFastAR reconstructs coherent densities that trace helices and peripheral domains, while iterative baselines struggle with missing or noisy regions. The feed-forward pose predictions yield a clean initial volume that downstream refinement can improve further.

Ablations & Training Insights

Ablation study results showing gains from curriculum stages and architectural choices.

Ablation studies demonstrate that staged training, multi-view integration, and high-capacity encoders collectively reduce rotation and translation errors. Removing curriculum steps or view-fusion modules leads to noticeable drops in accuracy, validating the end-to-end design decisions reported in the paper.
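Rotation error is commonly reported as the geodesic angle between predicted and ground-truth rotations. Because ab initio poses are determined only up to a global rotation, predictions are usually aligned to the ground truth before scoring. The sketch below assumes that protocol: a single global rotation is fitted with the orthogonal Procrustes solution via SVD, handedness ambiguity is ignored, and translations are omitted. It illustrates a standard metric and is not necessarily the evaluation code used in the paper; the helper names are hypothetical.

```python
import numpy as np

def rotation_error_deg(r_pred, r_gt):
    """Geodesic angle in degrees between (N, 3, 3) stacks of rotation matrices."""
    rel = np.einsum("nji,njk->nik", r_pred, r_gt)        # R_pred^T @ R_gt per pose
    cos = (np.trace(rel, axis1=1, axis2=2) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def align_global_rotation(r_pred, r_gt):
    """Rotation R minimizing sum_i ||R @ P_i - G_i||_F (orthogonal Procrustes)."""
    m = np.einsum("nij,nkj->ik", r_pred, r_gt)            # sum_i P_i @ G_i^T
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(vt.T @ u.T))                # force det(R) = +1
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def random_rotation(rng):
    """Random 3x3 rotation from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))                  # make the factorization unique
    if np.linalg.det(q) < 0:                              # reflect to obtain det = +1
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
gt = np.stack([random_rotation(rng) for _ in range(100)])
gauge = random_rotation(rng)
pred = gauge @ gt            # correct poses, but expressed in a rotated frame

raw = rotation_error_deg(pred, gt).mean()
r_align = align_global_rotation(pred, gt)
aligned = rotation_error_deg(r_align @ pred, gt).mean()
print(f"raw {raw:.1f} deg, aligned {aligned:.4f} deg")
```

Without alignment the poses look badly wrong even though they are consistent with each other; after fitting the global rotation the error drops to numerical noise.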

BibTeX

@inproceedings{zhang2025cryofastar,
  author    = {Zhang, Jiakai and Zhou, Shouchen and Dai, Haizhao and Liu, Xinhang and Wang, Peihao and Fan, Zhiwen and Pei, Yuan and Yu, Jingyi},
  title     = {CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2025}
}