2602.13902v1
J-PAS: Semi-Supervised Sim-to-Obs Transfer for Robust Star--Galaxy--Quasar Classification
First listed 2026-02-14 | Last updated 2026-02-14
Abstract
Modern studies in astrophysics and cosmology increasingly rely on simulations and cross-survey analyses, yet differences in data generation, instrumentation, calibration, and unmodeled physics introduce distribution mismatches between datasets (``domain shift''). In machine-learning pipelines, this occurs when the joint distribution of inputs and labels differs between the training (source) and application (target) domains, causing source-trained models to underperform on the target. Transfer learning and domain adaptation provide principled ways to mitigate this effect. We study a concrete simulation-to-observation case: semi-supervised domain adaptation (SSDA) to transfer a four-class spectral classifier -- high-redshift quasars, low-redshift quasars, galaxies, and stars -- from J-PAS mock catalogs based on DESI spectra to real J-PAS observations. Our pipeline pretrains on abundant labeled DESI$\rightarrow$J-PAS mocks and adapts to the target domain using a small labeled J-PAS subset. We benchmark SSDA against two baselines: a J-PAS--only supervised model trained with the same target-label budget, and a mocks-only model evaluated on held-out J-PAS data. On this held-out J-PAS data, SSDA achieves a macro-F1 score (balancing precision and recall) of $0.82$ and an overall true positive rate of $0.89$, compared to $0.79/0.85$ for the J-PAS--only baseline and $0.73/0.87$ for the mocks-only model. The gains are driven primarily by improved quasar classification, especially in the high-redshift subclass ($\mathrm{F1}=0.66$ vs.\ $0.55/0.37$), yielding better-calibrated candidate lists for spectroscopic targeting (e.g., WEAVE-QSO) and AGN searches. This study shows how modest target supervision enables robust, data-efficient simulation-to-observation transfer when simulations are plentiful but target labels are scarce.
Short digest
Presents a semi-supervised domain adaptation pipeline that transfers a four-class spectral classifier (stars, galaxies, low‑z QSOs, high‑z QSOs) from DESI→J‑PAS mocks to real J‑PAS using a small labeled target set. On held‑out J‑PAS data it attains a macro‑F1 of 0.82 and an overall TPR of 0.89, outperforming a J‑PAS‑only baseline (0.79/0.85) in both metrics and a mocks‑only model (0.73/0.87), with the largest gains for high‑z quasars (F1=0.66 vs 0.55/0.37). The approach yields better‑calibrated quasar candidate lists for spectroscopic follow‑up (e.g., WEAVE‑QSO) and AGN searches when target labels are scarce. Results indicate efficient sim‑to‑obs transfer that boosts quasar purity at low FPR while keeping galaxy/star performance saturated.
Key figures to inspect
- Figure 1: Inspect SED pairs (real J‑PAS solid vs DESI→J‑PAS mock dashed) per class to see band‑by‑band systematics and missing‑band behavior that drive domain shift.
- Figure 2: Check per‑magnitude class balance differences between the full mock catalog and the labeled J‑PAS subset to understand label scarcity and potential magnitude‑dependent bias.
- Figure 3: Compare the four confusion matrices to pinpoint which misclassifications are fixed by SSDA—especially leakage between high‑z QSO and galaxies—and read off per‑class TPR/PPV/F1.
- Figure 4: Use the radar plot and per‑class ROC curves to quantify that SSDA primarily lifts both AUC and F1 for the quasar subclasses while leaving GALAXY/STAR essentially saturated in performance.
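When reading per-class TPR/PPV/F1 off the confusion matrices, it can help to recompute them by hand. The sketch below is a minimal illustration, not the authors' code: it assumes rows are true classes and columns are predicted classes, and it assumes "overall TPR" means the fraction of all objects classified correctly (the abstract does not spell out the definition). The function name `per_class_metrics` and the toy counts are hypothetical.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class TPR, PPV, F1 plus macro-F1 from a confusion matrix.

    Assumes cm[i, j] counts objects of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm).copy()          # correctly classified objects per class
    n_true = cm.sum(axis=1)          # objects that truly belong to each class
    n_pred = cm.sum(axis=0)          # objects assigned to each class

    # Guard against empty classes: metric is set to 0 where undefined.
    tpr = np.divide(tp, n_true, out=np.zeros_like(tp), where=n_true > 0)
    ppv = np.divide(tp, n_pred, out=np.zeros_like(tp), where=n_pred > 0)
    denom = tpr + ppv
    f1 = np.divide(2 * tpr * ppv, denom, out=np.zeros_like(tp), where=denom > 0)

    return {
        "tpr": tpr,                          # recall / completeness per class
        "ppv": ppv,                          # precision / purity per class
        "f1": f1,
        "macro_f1": float(f1.mean()),        # unweighted mean over classes
        # Assumption: "overall TPR" taken as the fraction classified correctly.
        "overall_tpr": float(tp.sum() / cm.sum()),
    }

if __name__ == "__main__":
    # Made-up counts for illustration only; NOT values from the paper.
    classes = ["QSO_high", "QSO_low", "GALAXY", "STAR"]
    toy_cm = [
        [30, 5, 10, 5],
        [4, 80, 12, 4],
        [3, 6, 480, 11],
        [1, 2, 9, 288],
    ]
    result = per_class_metrics(toy_cm)
    for name, t, p, f in zip(classes, result["tpr"], result["ppv"], result["f1"]):
        print(f"{name:9s} TPR={t:.2f} PPV={p:.2f} F1={f:.2f}")
    print(f"macro-F1={result['macro_f1']:.2f} overall TPR={result['overall_tpr']:.2f}")
```

Because macro-F1 averages the classes without weighting by size, a rare class such as high-z quasars counts as much as the abundant galaxy and star classes. This is why a model can have a higher overall TPR yet a lower macro-F1, as with the mocks-only model relative to the J-PAS-only baseline.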