Netflix has officially unveiled VOID (Video Object and Interaction Deletion), an AI-powered video editing model aimed at film post-production.
Developed in collaboration with researchers from Sofia University, the model departs from traditional video cropping and frame-interpolation techniques. VOID identifies and removes specific objects from a video, then leverages a vision-language model (VLM) to automatically fill in the resulting visual gaps. This "physics-driven" approach is meant to keep the remaining scene logically consistent: if a car involved in a collision is removed, for instance, the model generates a seamless road surface and surrounding environment in its place.
Dynamic Reconstruction of Complex Scenes
In their preprint paper, the research team highlighted that VOID’s core strength lies in its ability to handle complex, dynamic scenes. During testing, VOID not only erased targets but also inferred how remaining objects would behave in the absence of the removed subject. For example, when a person is removed from a video of someone jumping into a pool, the model can generate footage of a calm water surface, free of any splashes.
To validate its performance, researchers compared VOID against existing editing tools such as Runway, Generative Omnimatte, and DiffuEraser. In a subjective preference survey involving 25 participants, VOID's results were chosen 64.8% of the time, compared with just 18.4% for Runway.
Netflix has now made the model available to the public on Hugging Face. The project’s development team, which includes Saman Motamed, William Harvey, and Benjamin Klein, noted that VOID has demonstrated exceptional modeling capabilities for complex dynamic scenes across both synthetic and real-world datasets.
While the film industry remains divided on the role of AI in creative work, VOID offers a way to modify scenes in post-production without reshoots. Whether the task is removing a single obstructing object or rewriting a scene at large scale, the tool gives video creators a new technical option.