TL;DR: GuardSplat is an efficient watermarking framework for protecting the copyright of 3DGS assets. It outperforms state-of-the-art watermarking approaches in capacity, invisibility, robustness, security, and training efficiency.
Application scenarios of GuardSplat. To protect the copyright of 3D Gaussian Splatting (3DGS) assets, (a) the owners (Alice) can use our GuardSplat to embed the secret message (blue key) into these models. (b) If malicious users (Bob) render views for unauthorized uses, (c) Alice can use the private message decoder to extract messages (purple key) for copyright identification.
3D Gaussian Splatting (3DGS) has recently created impressive assets for various applications. However, the copyright of these assets is not well protected, as existing watermarking methods are ill-suited to 3DGS in terms of security, capacity, and invisibility. Besides, these methods often require hours or even days for optimization, limiting the application scenarios. In this paper, we propose GuardSplat, an innovative and efficient framework that effectively protects the copyright of 3DGS assets. Specifically, 1) we first propose a CLIP-guided Message Decoupling Optimization module for training the message decoder, leveraging CLIP's aligning capability and rich representations to achieve high extraction accuracy with minimal optimization costs, presenting exceptional capability and efficiency. 2) Then, we propose a Spherical-harmonic-aware (SH-aware) Message Embedding module tailored for 3DGS, which employs a set of SH offsets to seamlessly embed the message into the SH features of each 3D Gaussian while maintaining the original 3D structure. This enables 3DGS assets to be watermarked with minimal fidelity trade-offs and prevents malicious users from removing the messages from the model files, meeting the demands for invisibility and security. 3) We further propose an Anti-distortion Message Extraction module to improve robustness against various visual distortions. Extensive experiments demonstrate that GuardSplat outperforms the state-of-the-art methods and achieves fast optimization speed.
(a) Bit accuracy and PSNR versus state-of-the-art methods with $N_L=32$ bits. The radius of each circle is proportional to the method's total training time, evaluated on an RTX 3090 GPU. Our GuardSplat surpasses the competitors in bit accuracy and reconstruction quality. (b) Training accuracy curves with $N_L=32$ bits. Compared to existing methods that require hours or even days for optimization, our GuardSplat is much more efficient, taking only 5 minutes to train the message decoder and 20 minutes to watermark a 3DGS asset.
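The two metrics in panel (a) can be illustrated with a minimal sketch. It assumes the standard definitions: bit accuracy as the fraction of correctly recovered bits, and PSNR computed from the mean squared error between an original and a watermarked view. The function names and the flat-list image representation are illustrative assumptions, not the authors' evaluation code.

```python
import math


def bit_accuracy(message, decoded):
    """Fraction of bit positions where the decoded message matches the embedded one."""
    if len(message) != len(decoded):
        raise ValueError("messages must have the same length")
    matches = sum(1 for m, d in zip(message, decoded) if m == d)
    return matches / len(message)


def psnr(original, watermarked, max_value=1.0):
    """PSNR in dB between two equally sized, flattened pixel lists.

    Assumes pixel values lie in [0, max_value]; identical inputs give infinity.
    """
    if len(original) != len(watermarked):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under these definitions, higher bit accuracy indicates more reliable message extraction, and higher PSNR indicates that the watermarked renders stay closer to the original ones.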
Comparison of four 3D watermarking frameworks. These frameworks differ in how they embed messages and how they train message decoders. (a) Directly training 3D models on the watermarked images. (b) Directly embedding messages and training a message decoder for extraction. (c) Employing the message decoder from a 2D watermarking network for optimization. (d) GuardSplat first trains a message decoder to extract messages from CLIP textual features. This message decoder can then be applied to the CLIP visual features for watermarking 3D models via optimization.
Overview of GuardSplat. (a) Given a binary message $M$, we first transform it into CLIP tokens $T$ using the proposed message tokenization. We then employ CLIP's textual encoder $\mathcal{E}_\mathcal{T}$ to map $T$ to the textual feature $F_\mathcal{T}$. Finally, we feed $F_\mathcal{T}$ into message decoder $\mathcal{D}_\mathcal{M}$ to extract the message $\hat{M}$ for optimization. (b) For each 3D Gaussian, we freeze all the attributes and build a learnable spherical harmonic (SH) offset $\boldsymbol{h}^o_i$ as the watermarked SH feature, which can be added to the original SH features as $\boldsymbol{h}_i + \boldsymbol{h}^o_i$ to render the watermarked views. (c) We first feed the 2D rendered views to CLIP's visual encoder $\mathcal{E}_\mathcal{V}$ to acquire the visual feature $F_\mathcal{V}$ and then employ the pre-trained message decoder to extract the message $\hat{M}$. A differentiable distortion layer is used to simulate various visual distortions during optimization. $\mathcal{D}_\mathcal{M}$ and $\boldsymbol{h}^o_i$ are optimized by the corresponding losses, respectively.
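Step (b) can be sketched in a few lines to make the data flow concrete: each Gaussian keeps its frozen attributes, and only its SH feature changes, to $\boldsymbol{h}_i + \boldsymbol{h}^o_i$. The `Gaussian` class, its fields, and `embed_offsets` are hypothetical names for illustration. In the actual method the offsets are learned by optimization through CLIP's visual encoder and the message decoder, whereas the sketch below simply takes them as given.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical, simplified 3D Gaussian: only a few attributes are modeled."""
    position: Tuple[float, float, float]
    opacity: float
    sh: Tuple[float, ...]  # flattened SH coefficients h_i


def embed_offsets(
    gaussians: Sequence[Gaussian],
    offsets: Sequence[Sequence[float]],
) -> List[Gaussian]:
    """Return copies of the Gaussians whose SH features are h_i + h^o_i.

    Position and opacity are copied unchanged, mirroring the frozen attributes.
    """
    if len(gaussians) != len(offsets):
        raise ValueError("need exactly one SH offset per Gaussian")
    watermarked = []
    for gaussian, offset in zip(gaussians, offsets):
        if len(gaussian.sh) != len(offset):
            raise ValueError("offset must match the SH feature length")
        new_sh = tuple(h + d for h, d in zip(gaussian.sh, offset))
        watermarked.append(replace(gaussian, sh=new_sh))
    return watermarked
```

Because only the SH features change, the geometry of the asset is untouched, which is consistent with the caption's statement that all other attributes are frozen.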
Scenes: Lego, Chair, Fern, Room.
@article{chen2024guardsplat,
author={Chen, Zixuan and Wang, Guangcong and Zhu, Jiahao and Lai, Jian-Huang and Xie, Xiaohua},
title={GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting},
year={2024},
journal={arXiv preprint},
}
Our work is built on gaussian-splatting and CLIP, and this project page is based on the website template provided by Lior Yariv. We sincerely appreciate their generous contributions.