Significance: Multi-laboratory initiatives are essential in performance assessment and standardization, which are crucial for bringing biophotonics to mature clinical use, in order to establish protocols and develop reference tissue phantoms that together will allow universal instrument comparison. Aim: The largest multi-laboratory comparison of performance assessment in near-infrared diffuse optics is presented, involving 28 instruments and 12 institutions in a total of eight experiments based on three consolidated protocols (BIP, MEDPHOT, and NEUROPT) as implemented on three kits of tissue phantoms. A total of 20 synthetic indicators were extracted from the dataset, some of them newly defined here. Approach: The exercise stems from the Innovative Training Network BitMap, funded by the European Commission, and was expanded to include other European laboratories. A large variety of diffuse optics instruments were considered, based on different approaches (time domain, frequency domain, and continuous wave), at various stages of maturity, and designed for different applications (e.g., oximetry, spectroscopy, and imaging). Results: This study highlights substantial differences in hardware performance (e.g., nine decades in responsivity, four decades in dark count rate, and one decade in temporal resolution). Agreement in the estimates of homogeneous optical properties was within 12% of the median value for half of the systems, with a temporal stability of <5% over 1 h and a day-to-day reproducibility of <3%. Other tests encompassed linearity, crosstalk, uncertainty, and detection of optical inhomogeneities. Conclusions: This extensive multi-laboratory exercise provides a detailed assessment of near-infrared diffuse optical instruments and can be used for reference grading. The dataset, available soon in an open data repository, can be evaluated in multiple ways, for instance, to compare different analysis tools or to study the impact of hardware implementations.