Style4D-Bench: A Benchmark Suite for 4D Stylization
- Beiqi Chen *1,2
- Shuai Shao *2
- Haitang Feng 3,2
- Jianhuang Lai 4
- Jianlou Si 5†
- Guangcong Wang 2†
3Nanjing University, 4Sun Yat-Sen University, 5Alibaba Group
Abstract
We introduce Style4D-Bench, the first benchmark suite specifically designed for 4D stylization, with the goal of standardizing evaluation and facilitating progress in this emerging area. Style4D-Bench comprises: 1) a strong baseline that makes an initial attempt at 4D stylization, 2) a comprehensive evaluation protocol measuring spatial fidelity, temporal coherence, and multi-view consistency through both perceptual and quantitative metrics, and 3) a curated collection of high-resolution dynamic 4D scenes with diverse motions and complex backgrounds. To establish a strong baseline, we present Style4D, a novel framework built upon 4D Gaussian Splatting. It consists of three key components: a basic 4DGS scene representation to capture reliable geometry, a Style Gaussian Representation that leverages lightweight per-Gaussian MLPs for temporally and spatially aware appearance control, and a Holistic Geometry-Preserved Style Transfer module designed to enhance spatio-temporal consistency via contrastive coherence learning and structural content preservation. Extensive experiments on Style4D-Bench demonstrate that Style4D achieves state-of-the-art performance in 4D stylization, producing fine-grained stylistic details with stable temporal dynamics and consistent multi-view rendering. We expect Style4D-Bench to become a valuable resource for benchmarking and advancing research in stylized rendering of dynamic 3D scenes.
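The exact metric suite is defined in the paper; as a rough illustration of how a perceptual temporal-coherence score in such a protocol could be computed, the sketch below averages LPIPS distances between consecutive stylized frames. The function name temporal_coherence_lpips and the assumed input format are placeholders for illustration, not the benchmark's official implementation.

# Illustrative sketch only: one plausible way to score temporal coherence of a
# stylized rendering with a perceptual metric (LPIPS between consecutive frames).
# `frames` is assumed to be a list of HxWx3 uint8 numpy arrays.
import numpy as np
import torch
import lpips  # pip install lpips

def temporal_coherence_lpips(frames, device="cpu"):
    loss_fn = lpips.LPIPS(net="alex").to(device)

    def to_tensor(img):
        # HWC uint8 in [0, 255] -> NCHW float in [-1, 1], as LPIPS expects
        t = torch.from_numpy(img).float().permute(2, 0, 1) / 127.5 - 1.0
        return t.unsqueeze(0).to(device)

    scores = []
    with torch.no_grad():
        for prev, curr in zip(frames[:-1], frames[1:]):
            scores.append(loss_fn(to_tensor(prev), to_tensor(curr)).item())
    # Lower means adjacent stylized frames are perceptually closer (more coherent)
    return float(np.mean(scores))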
Framework
Framework Overview: Style4D consists of three key components: a basic 4DGS representation, a Style Gaussian Representation, and a Holistic Geometry-Preserved Style Transfer module. We first train the basic 4DGS representation on the content images to obtain the 4D scene geometry. We then propose a new Style Gaussian Representation for 4D stylization, and introduce the Holistic Geometry-Preserved Style Transfer module to improve the consistency and quality of the stylization.
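The precise architecture is given in the paper; as a minimal sketch of how a lightweight per-Gaussian MLP can provide temporally and spatially aware appearance control, the snippet below predicts a per-Gaussian RGB offset from a per-Gaussian latent feature, a global style code, and a timestamp. All names here (StyleGaussianHead, gauss_feat, style_code) are hypothetical and do not correspond to the released code.

# Minimal sketch, assuming each Gaussian stores a small latent feature and that
# the stylized color is predicted from (feature, style code, time).
import torch
import torch.nn as nn

class StyleGaussianHead(nn.Module):
    def __init__(self, feat_dim=32, style_dim=64, hidden=64):
        super().__init__()
        # One shared lightweight MLP applied independently to every Gaussian
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + style_dim + 1, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),  # RGB offset added to the content color
        )

    def forward(self, gauss_feat, style_code, t):
        # gauss_feat: (N, feat_dim) per-Gaussian latent features
        # style_code: (style_dim,) global code of the reference style image
        # t:          scalar normalized timestamp in [0, 1]
        n = gauss_feat.shape[0]
        style = style_code.unsqueeze(0).expand(n, -1)
        time = torch.full((n, 1), float(t), device=gauss_feat.device)
        return self.mlp(torch.cat([gauss_feat, style, time], dim=-1))

# Usage: predict per-Gaussian color residuals for 10k Gaussians at t = 0.3
head = StyleGaussianHead()
rgb_offset = head(torch.randn(10_000, 32), torch.randn(64), 0.3)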
Demo Video
Results: more stylized results produced by Style4D.
Citation
@misc{chen2025style4dbenchbenchmarksuite4d,
  title         = {Style4D-Bench: A Benchmark Suite for 4D Stylization},
  author        = {Beiqi Chen and Shuai Shao and Haitang Feng and Jianhuang Lai and Jianlou Si and Guangcong Wang},
  year          = {2025},
  eprint        = {2508.19243},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2508.19243},
}
Related Links
SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections.
CaG: Traditional Classification Neural Networks are Good Generators: They are Competitive with DDPMs and GANs.
Text2light: Zero-Shot Text-Driven HDR Panorama Generation.
StyleLight generates an HDR indoor panorama from a limited-FOV image.
Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis.
AvatarCLIP proposes a zero-shot text-driven framework for 3D avatar generation and animation.
Text2Human proposes a text-driven controllable human image generation framework.
Relighting4D can relight human actors using the HDRI generated by us.
Acknowledgements
The computational resources are supported by the SongShan Lake HPC Center (SSL-HPC) at Great Bay University. This work was also supported by the Guangdong Research Team for Communication and Sensing Integrated with Intelligent Computing (Project No. 2024KCXTD047).
The website template is borrowed from SparseNeRF.