Reconstructing Vision With Minimal fMRI Data: Cross-Subject Pretraining With MindEye2
We then map from CLIP space to pixel space by fine-tuning Stable Diffusion XL to accept CLIP latents as inputs instead of text. This approach ...
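To make the conditioning swap concrete, here is a minimal sketch of cross-attention in which the conditioning context comes from CLIP latents rather than text-encoder tokens. It is a toy illustration, not the MindEye2 or Stable Diffusion XL implementation. All names (`cross_attention`, `w_adapter`), the tiny dimensions, the random weights, and the linear adapter that projects CLIP latents to the context width are assumptions made for this example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or features."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def cross_attention(pixel_tokens, context, w_q, w_k, w_v):
    """Single-head cross-attention: pixel tokens query the conditioning context."""
    q = matmul(pixel_tokens, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v), weights


rng = random.Random(0)

# Toy sizes, chosen for readability only (hypothetical, not the real model's).
N_PIXEL_TOKENS, PIXEL_DIM = 6, 8
N_CLIP_TOKENS, CLIP_DIM = 4, 5
CONTEXT_DIM, HEAD_DIM = 8, 8

pixel_tokens = random_matrix(N_PIXEL_TOKENS, PIXEL_DIM, rng)  # denoiser features
clip_latents = random_matrix(N_CLIP_TOKENS, CLIP_DIM, rng)    # stand-in for CLIP latents

# Assumed adapter: project CLIP latents to the width the attention layer
# would otherwise receive from a text encoder.
w_adapter = random_matrix(CLIP_DIM, CONTEXT_DIM, rng)
context = matmul(clip_latents, w_adapter)

w_q = random_matrix(PIXEL_DIM, HEAD_DIM, rng)
w_k = random_matrix(CONTEXT_DIM, HEAD_DIM, rng)
w_v = random_matrix(CONTEXT_DIM, HEAD_DIM, rng)

attended, weights = cross_attention(pixel_tokens, context, w_q, w_k, w_v)
print(len(attended), len(attended[0]))  # one output row per pixel token
```

The attention layer only sees a sequence of context vectors, so in this sketch the same code path serves either source of conditioning. What changes during fine-tuning is which weights have to adapt to the new context distribution.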









