2602.13902v1

J-PAS: Semi-Supervised Sim-to-Obs Transfer for Robust Star–Galaxy–Quasar Classification

Theme match 2/5


Daniel López-Cano, L. Raul Abramo, L. Nakazono, I. Pérez-Ràfols, G. Martínez-Solaeche, J. Chaves-Montero, Matthew M. Pieri, Jailson Alcaniz, Narciso Benitez, Silvia Bonoli, Saulo Carneiro, Javier Cenarro, David Cristóbal-Hornillos, Simone Daflon, Renato Dupke, Alessandro Ederoclite, Rosa González Delgado, Antonio Hernán-Caballero, Carlos Hernández-Monteagudo, Jifeng Liu, Carlos López-Sanjuan, Antonio Marín-Franch, Claudia Mendes de Oliveira, Mariano Moles, Fernando Roig, Laerte Sodré, Keith Taylor, Jesús Varela, Héctor Vázquez Ramió, Jose Vilchez, Javier Zaragoza-Cardiel

First listed 2026-02-14 | Last updated 2026-02-14

Abstract

Modern studies in astrophysics and cosmology increasingly rely on simulations and cross-survey analyses, yet differences in data generation, instrumentation, calibration, and unmodeled physics introduce distribution mismatches between datasets ("domain shift"). In machine-learning pipelines, this occurs when the joint distribution of inputs and labels differs between the training (source) and application (target) domains, causing source-trained models to underperform on the target. Transfer learning and domain adaptation provide principled ways to mitigate this effect. We study a concrete simulation-to-observation case: semi-supervised domain adaptation (SSDA) to transfer a four-class spectral classifier (high-redshift quasars, low-redshift quasars, galaxies, and stars) from J-PAS mock catalogs based on DESI spectra to real J-PAS observations. Our pipeline pretrains on abundant labeled DESI→J-PAS mocks and adapts to the target domain using a small labeled J-PAS subset. We benchmark SSDA against two baselines: a J-PAS-only supervised model trained with the same target-label budget, and a mocks-only model evaluated on held-out J-PAS data. On this held-out J-PAS data, SSDA achieves a macro-F1 score (balancing precision and recall) of 0.82 and an overall true positive rate of 0.89, compared to 0.79/0.85 for the J-PAS-only baseline and 0.73/0.87 for the mocks-only model. The gains are driven primarily by improved quasar classification, especially in the high-redshift subclass (F1 = 0.66 vs. 0.55/0.37), yielding better-calibrated candidate lists for spectroscopic targeting (e.g., WEAVE-QSO) and AGN searches. This study shows how modest target supervision enables robust, data-efficient simulation-to-observation transfer when simulations are plentiful but target labels are scarce.

Short digest

Presents a semi-supervised domain adaptation (SSDA) pipeline that transfers a four-class spectral classifier (stars, galaxies, low-z QSOs, high-z QSOs) from DESI→J-PAS mocks to real J-PAS observations using a small labeled target set. On held-out J-PAS data, SSDA attains a macro-F1 of 0.82 and an overall true positive rate (TPR) of 0.89, outperforming both a J-PAS-only baseline trained with the same target-label budget (0.79/0.85) and a mocks-only model (0.73/0.87). The largest gains are for high-z quasars (F1 = 0.66 for SSDA vs. 0.55 for J-PAS-only and 0.37 for mocks-only). The approach yields better-calibrated quasar candidate lists for spectroscopic follow-up (e.g., WEAVE-QSO) and AGN searches when target labels are scarce. Results indicate efficient sim-to-obs transfer that boosts quasar purity at low FPR while keeping galaxy/star performance saturated.

Key figures to inspect

Figure 1: Inspect SED pairs (real J‑PAS solid vs DESI→J‑PAS mock dashed) per class to see band‑by‑band systematics and missing‑band behavior that drive domain shift.
Figure 2: Check per‑magnitude class balance differences between the full mock catalog and the labeled J‑PAS subset to understand label scarcity and potential magnitude‑dependent bias.
Figure 3: Compare the four confusion matrices to pinpoint which misclassifications are fixed by SSDA—especially leakage between high‑z QSO and galaxies—and read off per‑class TPR/PPV/F1.
Figure 4: Use the radar plot and per‑class ROC curves to quantify that SSDA primarily lifts both AUC and F1 for the quasar subclasses while leaving GALAXY/STAR essentially saturated in performance.
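
When reading the confusion matrices in Figure 3, it can help to recompute the summary metrics by hand. The following is a minimal Python sketch, not the authors' code. It assumes rows are true classes and columns are predicted classes, and it assumes that "overall true positive rate" means the fraction of all objects assigned to their correct class. The function names, class labels, and the example matrix are all hypothetical; the counts are made up for illustration and are not taken from the paper.

```python
# Sketch: per-class TPR/PPV/F1, macro-F1, and overall TPR from a confusion
# matrix. Assumed convention: cm[i][j] = number of objects whose true class
# is i and whose predicted class is j.

def per_class_metrics(cm, labels):
    """Return {label: {"TPR", "PPV", "F1"}} for a square confusion matrix."""
    n = len(labels)
    out = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # true k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp    # predicted k, true other
        tpr = tp / (tp + fn) if (tp + fn) else 0.0   # recall / completeness
        ppv = tp / (tp + fp) if (tp + fp) else 0.0   # precision / purity
        f1 = 2 * tpr * ppv / (tpr + ppv) if (tpr + ppv) else 0.0
        out[name] = {"TPR": tpr, "PPV": ppv, "F1": f1}
    return out


def macro_f1(metrics):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    return sum(m["F1"] for m in metrics.values()) / len(metrics)


def overall_tpr(cm):
    """Fraction of all objects on the diagonal (assumed definition)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


# Made-up example counts (NOT from the paper), 100 objects in total.
labels = ["QSO_high", "QSO_low", "GALAXY", "STAR"]
cm = [
    [7, 2, 1, 0],    # true high-z QSO
    [1, 16, 3, 0],   # true low-z QSO
    [1, 2, 46, 1],   # true galaxy
    [0, 0, 1, 19],   # true star
]

metrics = per_class_metrics(cm, labels)
for name in labels:
    m = metrics[name]
    print(f"{name:9s} TPR={m['TPR']:.2f} PPV={m['PPV']:.2f} F1={m['F1']:.2f}")
print(f"macro-F1={macro_f1(metrics):.3f} overall TPR={overall_tpr(cm):.3f}")
```

Because macro-F1 averages the classes without weighting by their size, a weak rare class such as high-z quasars pulls it down more than it pulls down the overall TPR. This is consistent with the mocks-only model having a competitive overall TPR (0.87) but a much lower macro-F1 (0.73).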
