SAVE: Protagonist Diversification with Structure Agnostic Video Editing
We bring motion personalization to video editing, isolating the motion from a single source video so that the protagonist can be modified while the original motion is preserved.
I am a Ph.D. student at Seoul National University under the supervision of Prof. Nojun Kwak.
My primary focus is on video and image generation, aiming to push the boundaries of their applications in real-world scenarios. In particular, my central goal is to develop generative models that provide users with more diverse experiences. My research interests also span broader areas of computer vision, with experience in diffusion models, video rendering, segmentation, and 3D object detection.
In zero-shot text-to-image (T2I) customization, contextual embeddings conflict with one another when the subject's pose is varied. We resolve this conflict through orthogonalization and an attention swap.
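To make the orthogonalization step concrete, here is a minimal sketch; the function names and the exact point in the T2I pipeline where it would be applied are illustrative assumptions, not the paper's implementation.

```python
import torch

def orthogonalize_context(context_emb: torch.Tensor, subject_emb: torch.Tensor) -> torch.Tensor:
    """Remove from the context embedding its component along the subject embedding.

    Both inputs are (..., d) token embeddings; names and the insertion point
    in the pipeline are illustrative assumptions.
    """
    # Unit vector along the subject direction.
    subject_dir = subject_emb / subject_emb.norm(dim=-1, keepdim=True)
    # Projection of the context embedding onto that direction.
    proj = (context_emb * subject_dir).sum(dim=-1, keepdim=True) * subject_dir
    # The residual is orthogonal to the subject embedding, so pose/context
    # cues no longer compete with the subject's identity.
    return context_emb - proj
```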
As discriminating machine-generated text from human-written text grows in importance, we show the existence of a backdoor path that confounds the relationship between a text and its detection score.
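In causal terms, with illustrative symbols (T the input text, S the detection score, and Z a confounder such as topic or writing domain), the standard backdoor adjustment removes the confounded path:

```latex
% T = text, S = detection score, Z = confounder (e.g., topic or domain)
% opening the backdoor path T <- Z -> S; symbols are illustrative.
P(S \mid \mathrm{do}(T)) = \sum_{z} P(S \mid T, Z = z)\, P(Z = z)
```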
We advance dynamic pruning by employing refined gradients to update the pruned weights, enhancing both training stability and model performance.
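A minimal sketch of this idea follows, assuming a magnitude-based dynamic mask and a straight-through estimator as one way to route gradients to pruned weights; the paper's exact gradient refinement may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicPrunedLinear(nn.Linear):
    """Linear layer with a dynamic magnitude-based pruning mask (illustrative)."""

    def __init__(self, in_features: int, out_features: int, sparsity: float = 0.5):
        super().__init__(in_features, out_features)
        self.sparsity = sparsity

    def current_mask(self) -> torch.Tensor:
        # Keep the largest-magnitude weights; the mask is recomputed each step,
        # so weights can move in and out of the pruned set during training.
        k = max(1, int(self.weight.numel() * (1.0 - self.sparsity)))
        threshold = self.weight.abs().flatten().topk(k).values[-1]
        return (self.weight.abs() >= threshold).float()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = self.current_mask()
        # Straight-through trick: the forward pass uses the pruned weights, but
        # the backward pass delivers the full gradient to the dense weights, so
        # pruned entries keep receiving updates and can later be revived.
        w = self.weight + (self.weight * mask - self.weight).detach()
        return F.linear(x, w, self.bias)
```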
In video scene rendering, we reformulate neural radiance fields to additionally model consistency fields, enabling more efficient and controllable scene manipulation.
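The exact formulation is not reproduced here; as a purely illustrative sketch, one way to read a "consistency field" is an extra per-point head alongside the usual density and color outputs, whose feature would be rendered like color and supervised to agree across frames. All layer sizes and names below are assumptions.

```python
import torch
import torch.nn as nn

class ConsistencyNeRF(nn.Module):
    """NeRF-style MLP with an extra per-point consistency head (illustrative)."""

    def __init__(self, pos_dim: int = 63, hidden: int = 256, consist_dim: int = 8):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)                 # volume density
        self.color_head = nn.Linear(hidden, 3)                   # RGB radiance
        self.consistency_head = nn.Linear(hidden, consist_dim)   # consistency field

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)
        return self.density_head(h), self.color_head(h), self.consistency_head(h)
```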
We utilize a Gaussian mixture model (GMM) in 3D object detection to predict the distribution of 3D bounding boxes, eliminating the need for laborious, hand-crafted anchor design.
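A compact sketch of a mixture-density head over box parameters is shown below; the 7-value box parameterization, component count, and negative log-likelihood loss are common choices assumed for illustration, not necessarily the paper's design.

```python
import math
import torch
import torch.nn as nn

class GMMBoxHead(nn.Module):
    """K-component Gaussian mixture over 3D box parameters
    (x, y, z, w, l, h, yaw), replacing fixed hand-crafted anchors. Illustrative."""

    def __init__(self, feat_dim: int, num_components: int = 4, box_dim: int = 7):
        super().__init__()
        self.K, self.D = num_components, box_dim
        # Per component: D means + D diagonal log-variances + 1 mixture logit.
        self.out = nn.Linear(feat_dim, num_components * (2 * box_dim + 1))

    def forward(self, feats: torch.Tensor):
        params = self.out(feats).view(-1, self.K, 2 * self.D + 1)
        mean, log_var, logit = params.split([self.D, self.D, 1], dim=-1)
        return mean, log_var, logit.squeeze(-1)

def gmm_nll(mean, log_var, logit, target):
    """Negative log-likelihood of ground-truth boxes (N, D) under the mixture."""
    log_pi = torch.log_softmax(logit, dim=-1)              # (N, K)
    diff = target.unsqueeze(1) - mean                      # (N, K, D)
    comp_ll = -0.5 * ((diff.pow(2) / log_var.exp()) + log_var).sum(-1)
    comp_ll = comp_ll - 0.5 * mean.size(-1) * math.log(2 * math.pi)
    return -torch.logsumexp(log_pi + comp_ll, dim=-1).mean()
```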
We delve into data augmentation for 3D object detection, leveraging the rich structural information present in 3D labels.
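As one hedged illustration of structure-aware augmentation, the sketch below splits the points inside a ground-truth box into sub-parts along the box's length axis and randomly drops one part; the partitioning scheme, function name, and defaults are assumptions for illustration.

```python
import numpy as np

def part_aware_dropout(local_pts: np.ndarray, length: float,
                       num_parts: int = 4, drop_prob: float = 0.3) -> np.ndarray:
    """Drop one sub-part of an object's points (illustrative sketch).

    local_pts: (N, 3) points already transformed into the box's local frame,
    with x in [-length/2, length/2]. Scheme and defaults are assumptions.
    """
    if np.random.rand() > drop_prob:
        return local_pts
    # Partition the box along its length axis into equal slabs.
    edges = np.linspace(-length / 2, length / 2, num_parts + 1)
    part_ids = np.clip(np.digitize(local_pts[:, 0], edges) - 1, 0, num_parts - 1)
    # Remove all points belonging to one randomly chosen slab.
    dropped = np.random.randint(num_parts)
    return local_pts[part_ids != dropped]
```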