Cerebral tissue oximetry (CTO) based on near-infrared spectroscopy provides clinically relevant information on tissue perfusion and has been used in particular for infant monitoring and during cardiac surgery. Commercially available CTO devices differ in sensor design (the selection and arrangement of light sources and detectors) and in the algorithms they use to extract tissue oxygen saturation (StO2), and substantial differences have been observed between their measurements. Tissue models (phantoms) that mimic the human head both optically and anatomically provide a controllable, safe, universally agreed-upon, and reproducible environment for evaluating CTO device performance. In this article, we applied our realistic, dynamic, multilayer phantom design, which combines solid and liquid components, together with controlled and repeatable test procedures to evaluate multiple commercially available CTO devices. Each CTO device was evaluated individually, against the other devices, and against a reference device, with all devices measuring simultaneously under continuously changing oxygen saturation conditions. The results demonstrate the feasibility of using our phantoms for CTO device performance testing and suggest that, in general, CTO device measurements were very precise ($S_{\mathrm{res}} < 1.48\%$) and highly correlated ($R^2 > 0.89$, $p < 0.0001$) but showed varying levels of accuracy, sensitivity, and static and proportional bias ($15.4\% < A_{\mathrm{rms}} < 21.1\%$; $0.31 < \text{sensitivity (regression slope)} < 0.43$; $35.94 < \text{bias (regression intercept)} < 49.74$). In addition, gauge repeatability and reproducibility (Gauge R&R) analysis showed that most of the CTO devices were within acceptable ranges, with a precision-to-tolerance (P/T) ratio < 10%, a number of distinct categories (NDC) > 5, and an intraclass correlation coefficient (ICC) > 80%.
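The abstract reports several agreement and measurement-system metrics without spelling out their formulas. The sketch below shows how such metrics are conventionally computed for paired device and reference StO2 readings; it is an illustration under assumed definitions, not the authors' analysis code. Assumptions: the slope and intercept come from an ordinary least-squares fit of device readings on reference readings, $S_{\mathrm{res}}$ is the residual standard deviation of that fit, $A_{\mathrm{rms}}$ is the root-mean-square of the paired differences, P/T uses a 6-sigma spread over a tolerance width, and NDC follows the AIAG factor of 1.41. The function names (`agreement_metrics`, `precision_to_tolerance`, `distinct_categories`), the tolerance limits, and the sample data are hypothetical. ICC is omitted because it requires a full variance-components model.

```python
import math
from typing import Dict, Sequence


def agreement_metrics(device: Sequence[float], reference: Sequence[float]) -> Dict[str, float]:
    """Compare paired StO2 readings (in %) from a device under test and a reference.

    Assumed definitions (common oximetry conventions, not taken from the article):
      slope, intercept -- least-squares fit: device = slope * reference + intercept
      r2               -- coefficient of determination of that fit
      s_res            -- residual standard deviation, sqrt(SSE / (n - 2))
      a_rms            -- root-mean-square of the paired differences (device - reference)
    """
    n = len(device)
    if n != len(reference) or n < 3:
        raise ValueError("need at least 3 paired readings")
    mean_ref = sum(reference) / n
    mean_dev = sum(device) / n
    sxx = sum((x - mean_ref) ** 2 for x in reference)
    sxy = sum((x - mean_ref) * (y - mean_dev) for x, y in zip(reference, device))
    syy = sum((y - mean_dev) ** 2 for y in device)
    slope = sxy / sxx  # sensitivity (proportional bias when != 1)
    intercept = mean_dev - slope * mean_ref  # static bias
    # Sum of squared residuals around the fitted line
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(reference, device))
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - sse / syy,
        "s_res": math.sqrt(sse / (n - 2)),
        "a_rms": math.sqrt(sum((y - x) ** 2 for x, y in zip(reference, device)) / n),
    }


def precision_to_tolerance(sigma_grr: float, upper: float, lower: float, k: float = 6.0) -> float:
    """P/T ratio = k * sigma_GRR / (upper - lower); k = 6 here (some texts use 5.15)."""
    return k * sigma_grr / (upper - lower)


def distinct_categories(sigma_part: float, sigma_grr: float) -> int:
    """NDC = 1.41 * sigma_part / sigma_GRR, truncated to an integer (AIAG convention)."""
    return int(1.41 * sigma_part / sigma_grr)


if __name__ == "__main__":
    # Hypothetical readings from a device that reports 0.4 * reference + 40
    reference = [40, 50, 60, 70, 80]
    device = [56, 60, 64, 68, 72]
    print(agreement_metrics(device, reference))
    print(precision_to_tolerance(0.5, 100.0, 0.0))  # hypothetical tolerance limits
    print(distinct_categories(10.0, 2.0))
```

With these hypothetical readings the fit gives a slope of 0.4, an intercept of 40, $R^2 = 1$, and $S_{\mathrm{res}} = 0$, yet $A_{\mathrm{rms}} = \sqrt{88} \approx 9.4$ percentage points. This illustrates how a device can be perfectly precise and correlated with the reference while still being inaccurate because of low sensitivity and a large static bias, the same qualitative pattern summarized above.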