
Background: Imaging Mass Cytometry (IMC) enables high-dimensional spatial profiling of complex tissue microenvironments, yet current analysis pipelines still rely heavily on unsupervised clustering followed by manual post-hoc annotation. While expert annotation is essential for assigning biologically meaningful populations, the upstream clustering step is performed without biological priors and can be strongly influenced by technical variation, yielding clusters that are not immediately interpretable. This workflow is labour-intensive and difficult to scale. It becomes increasingly challenging in large multi-sample datasets, where batch effects and data heterogeneity can obscure rare but biologically important cell populations and reduce the biological meaningfulness of the clustering.
Objectives: To develop and evaluate a semi-supervised clustering tool that aligns more closely with biological knowledge and scales to larger datasets, potentially reducing the drawbacks of the classical workflow.
Methods: We developed AltraFlowSOM, a semi-supervised extension of the FlowSOM framework. It incorporates Supervised Self-Organizing Maps (SSOMs), an existing supervised extension of the SOM, to integrate partial expert annotations directly into the clustering process. The framework merges raw marker expression with biological knowledge layers (e.g., manual gating labels) using a multi-layer SSOM architecture. This guides cluster formation toward biologically relevant phenotypes while preserving the ability to discover previously unrecognized populations.
We benchmarked performance across the major analytical strategies:
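The multi-layer idea can be illustrated with a minimal sketch. This is not the AltraFlowSOM implementation; it is a toy two-layer SOM written under stated assumptions: one codebook for marker expression, one for one-hot label indicators, a hypothetical `label_weight` parameter that balances the two layers, and a rule that the label layer is used only for annotated cells. All function and parameter names are illustrative.

```python
import numpy as np

def train_multilayer_som(expr, labels_onehot, annotated, grid=(5, 5),
                         label_weight=0.5, n_iter=2000, seed=0):
    """Toy two-layer SOM (illustrative only, not the authors' method).

    expr:          (n_cells, n_markers) marker expression
    labels_onehot: (n_cells, n_classes) one-hot labels, zeros if unlabelled
    annotated:     (n_cells,) boolean mask of cells with expert labels
    label_weight:  hypothetical weight of the label layer in the distance
    """
    rng = np.random.default_rng(seed)
    n_nodes = grid[0] * grid[1]
    n_classes = labels_onehot.shape[1]
    # Grid coordinates of each node, used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(grid[0])
                       for j in range(grid[1])], dtype=float)
    # Expression codebook starts at random cells; label codebook is uniform.
    w_expr = expr[rng.integers(0, len(expr), n_nodes)].astype(float)
    w_lab = np.full((n_nodes, n_classes), 1.0 / n_classes)

    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1.0 - frac)                          # decaying learning rate
        sigma = max(grid) / 2.0 * (1.0 - frac) + 0.5     # shrinking neighbourhood
        i = rng.integers(0, len(expr))
        # Distance in the expression layer.
        d = ((w_expr - expr[i]) ** 2).sum(axis=1)
        if annotated[i]:
            # Annotated cells also contribute distance in the label layer.
            d_lab = ((w_lab - labels_onehot[i]) ** 2).sum(axis=1)
            d = (1.0 - label_weight) * d + label_weight * d_lab
        bmu = int(np.argmin(d))                          # best-matching unit
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        w_expr += lr * h[:, None] * (expr[i] - w_expr)
        if annotated[i]:
            w_lab += lr * h[:, None] * (labels_onehot[i] - w_lab)
    return w_expr, w_lab

def assign_nodes(expr, w_expr):
    """Map every cell to its nearest node using expression only."""
    d = ((expr[:, None, :] - w_expr[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)
```

In this sketch, unannotated cells still shape the expression codebook, so nodes can form around populations that no expert label covers, while annotated cells pull nearby nodes toward a gated phenotype.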
Unsupervised (FlowSOM & Phenograph, with and without batch-effect correction)
Supervised learning (Random Forest)
AltraFlowSOM (semi-supervised)
Two independent IMC datasets were used for evaluation: Lupus Nephritis (n=22) and Sjögren’s Disease (n=10).
Results: AltraFlowSOM achieved the highest concordance with manually gated ground truth, outperforming FlowSOM, Phenograph (with and without batch correction), and Random Forest on the Adjusted Rand Index (ARI), purity, and F1-score. Performance remained high both under full supervision and when only partial annotations were available: models trained on only a subset of samples performed comparably to those trained on fully annotated datasets. Using a leave-one-out strategy, training on (n-1) samples and testing on the remaining held-out sample, AltraFlowSOM demonstrated strong generalizability across unseen patient samples and across disease indications. The method improved resolution of rare and biologically meaningful cell populations and produced phenotypes that aligned more closely with known immunobiology.
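For readers unfamiliar with the evaluation, the sketch below shows how purity and ARI can be computed from a contingency table of gated labels against cluster assignments, and how leave-one-sample-out splits can be generated. It is a minimal illustration using the standard definitions of both metrics, not the authors' evaluation code; the function names are hypothetical, and the F1-score is omitted because it additionally requires mapping each cluster to a label.

```python
import numpy as np

def contingency(truth, pred):
    """Count table: rows are ground-truth labels, columns are clusters."""
    _, t = np.unique(np.asarray(truth).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(pred).ravel(), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    return table

def purity(truth, pred):
    """Fraction of cells that belong to the majority label of their cluster."""
    table = contingency(truth, pred)
    return table.max(axis=0).sum() / table.sum()

def adjusted_rand_index(truth, pred):
    """Chance-corrected pair-counting agreement between two partitions."""
    table = contingency(truth, pred)
    comb2 = lambda x: x * (x - 1) / 2.0          # number of pairs, "x choose 2"
    sum_cells = comb2(table).sum()
    sum_rows = comb2(table.sum(axis=1)).sum()
    sum_cols = comb2(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / comb2(table.sum())
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:                    # degenerate case: trivial partitions
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

def leave_one_sample_out(sample_ids):
    """Yield (held_out_sample, train_indices, test_indices) per sample."""
    sample_ids = np.asarray(sample_ids)
    for held_out in np.unique(sample_ids):
        is_test = sample_ids == held_out
        yield held_out, np.flatnonzero(~is_test), np.flatnonzero(is_test)
```

Both metrics are invariant to how clusters are numbered, so they can be applied directly to unsupervised output without first matching clusters to gated populations.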
Conclusions: AltraFlowSOM integrates expert knowledge directly into cluster formation, bridging the gap between manual gating and automated phenotyping. It enhances biological interpretability, increases sensitivity to rare populations, and provides a scalable, generalizable tool for high-dimensional IMC data. This framework supports reproducible, biologically informed cellular phenotyping in autoimmune disease research and is broadly applicable across spatial and single-cell cytometry platforms.
REFERENCES: [1] Van Gassen, S., Callebaut, B., Van Helden, M. J., Lambrecht, B. N., Demeester, P., Dhaene, T., & Saeys, Y. (2015). FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry Part A, 87(7), 636–645.
[2] Schapiro, D., Jackson, H. W., Raghuraman, S., Fischer, J. R., Zanotelli, V., Schulz, D., ... & Bodenmiller, B. (2017). histoCAT: Analysis of cell phenotypes and interactions in multiplex image cytometry data. Nature Methods, 14(9), 873–876.
Acknowledgments: NIL.
Disclosure of Interests: None declared.