ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation
We developed a new real-image editing approach for ReFlow, leveraging analysis of intermediate representations.
I am a Ph.D. student at Seoul National University, under the supervision of Prof. Nojun Kwak.
My primary focus is video and image generation, where I aim to push the boundaries of their applications in real-world scenarios. Specifically, my central goal is to develop generative models that offer users more diverse experiences. My research interests also extend to the broader computer vision area, with experience spanning diffusion, video rendering, segmentation, and 3D object detection.
In zero-shot T2I customization, contextual embeddings conflict when the subject's pose varies. We resolve this conflict through orthogonalization and attention swap.
We aim to prevent unauthorized T2I customization by guiding the weight trajectory of protected data to ensure it outputs only the precise targeted image.
We adopt a personalization framework for video editing tasks, isolating the motion as a concept from a source video and subsequently modifying the protagonist as a novel context.
With the increasing importance of discriminating machine-generated text from human text, we show the existence of a backdoor path that confounds the relationship between a text and its detection score.
We advance dynamic pruning by employing refined gradients to update the pruned weights, enhancing both training stability and model performance.
In video scene rendering, we reformulate neural radiance fields to additionally consider consistency fields, enabling more efficient and controllable scene manipulation.
We utilize the Gaussian Mixture Model (GMM) in the 3D object detection task to predict the distribution of 3D bounding boxes, eliminating the need for laborious, hand-crafted anchor design.
We delve into data augmentation in 3D object detection, leveraging sophisticated and rich structural information present in 3D labels.