Text-to-image AI art generators, be it DALL-E 2 or Midjourney, have become the talk of the internet. But generating art with AI is not restricted to images. Pushing the boundaries of ‘text-to-image’ art, several easy-to-use tools with video- and audio-enhancing abilities are hitting the market.
Here’s a curated list of such tools that go beyond just creating images from textual prompts.
Lucid Sonic Dreams – StyleGAN
It is a Python package that syncs visuals generated by generative adversarial networks (GANs) with music using only a few lines of code.
The tutorial notebook on Google Colab details all the parameters one can modify and provides sample code templates.
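As a rough illustration of how little code is involved, a minimal sketch of the workflow might look like the snippet below. It assumes the package’s LucidSonicDream class as documented in its README; the file names and style are placeholders.

# Minimal sketch of the Lucid Sonic Dreams workflow (assumes the package's
# documented LucidSonicDream interface; file names and style are placeholders).
from lucidsonicdreams import LucidSonicDream

# Pair an audio track with a pre-trained StyleGAN style.
dream = LucidSonicDream(song="track.mp3", style="abstract photos")

# Render a music-synced video; the visuals pulse and move with the audio.
dream.hallucinate(file_name="track.mp4", resolution=360)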
For more information, click here.
FILM Colab
Developed by Stephen Young, FILM transforms near-duplicate photos into slow-motion footage that looks as if it were shot with a video camera.
It is a TensorFlow 2 implementation of a high-quality frame interpolation neural network. FILM follows a unified single-network approach that achieves state-of-the-art results without relying on additional pre-trained networks such as optical flow or depth estimation.
The model uses a multi-scale feature extractor that shares the same convolution weights across the scales and is trainable from frame triplets alone.
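For a quick test, a released FILM model can be loaded from TensorFlow Hub. The sketch below is a hedged example: it assumes the tfhub.dev/google/film/1 handle and its time/x0/x1 input signature, and uses random tensors as stand-ins for two near-duplicate photos.

# Sketch of frame interpolation with the published FILM model on TensorFlow Hub
# (assumes the tfhub.dev/google/film/1 handle and its time/x0/x1 signature).
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/google/film/1")

# Two near-duplicate frames, batched, float32 RGB in [0, 1] (placeholders here).
frame0 = tf.random.uniform((1, 256, 256, 3))
frame1 = tf.random.uniform((1, 256, 256, 3))

# time = 0.5 requests the midpoint frame; sweeping it from 0 to 1 yields a slow-motion clip.
result = model({"time": np.array([[0.5]], dtype=np.float32),
                "x0": frame0, "x1": frame1})
mid_frame = result["image"]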
For more information, click here.
AnimationKit.ai
It is an upscaling and interpolation processing tool that combines Real-ESRGAN video upscaling to raise the resolution 4x, RIFE interpolation to smooth the motion, and FFmpeg hevc_nvenc (H.265) compression.
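The final compression stage can be reproduced on any machine with an NVENC-capable GPU and an FFmpeg build that includes the hevc_nvenc encoder. The snippet below is only a sketch of that step; file names and the quality setting are placeholders.

# Sketch of the H.265 compression step via FFmpeg's hevc_nvenc encoder
# (assumes an FFmpeg build with NVENC support; file names are placeholders).
import subprocess

subprocess.run(
    ["ffmpeg", "-i", "upscaled_interpolated.mp4",   # output of the upscale + RIFE stages
     "-c:v", "hevc_nvenc",                          # NVIDIA hardware H.265 encoder
     "-cq", "28",                                   # constant-quality target
     "compressed.mp4"],
    check=True,
)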
For more information, click here.
3D Photography using Context-aware Layered Depth Inpainting
It is a tool for converting a single RGB-D input image into a 3D photo.
It uses a Layered Depth Image with explicit pixel connectivity as the underlying representation and presents a learning-based model that iteratively synthesises new local colour-and-depth content into the occluded region.
Using standard graphics engines, the resulting 3D photos can be efficiently rendered with motion parallax.
For more information, click here.
Wiggle Standalone 5.0
Wiggle Standalone generates semi-random animation keyframes for parameters such as zoom or spin.
Wiggle is based on ‘episodes’ of motion. Each episode is made of three distinct phases: attack (ramp up), decay (ramp down), and sustain (hold level steady). This is similar in concept to an ADSR envelope in a musical synthesiser.
The parameters allow you to set the overall duration of each episode, the time split between phases, and the relative levels of the parameters in each phase.
Wiggle can also be integrated directly into Diffusion notebooks.
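The episode logic is simple enough to sketch in a few lines of Python. The generator below is a hypothetical illustration of the attack/decay/sustain split described above, not Wiggle’s actual code; the parameter names and defaults are made up for the example.

# Hypothetical sketch of episode-based keyframe generation in the spirit of Wiggle:
# each episode ramps a parameter up (attack), down (decay), then holds it (sustain).
import random

def wiggle_episode(n_frames, attack=0.3, decay=0.3, peak=1.0, hold=0.2):
    """Return one episode of keyframe values for a parameter such as zoom or spin."""
    a = int(n_frames * attack)          # frames spent ramping up
    d = int(n_frames * decay)           # frames spent ramping down
    s = n_frames - a - d                # remaining frames hold the sustain level
    ramp_up = [peak * i / max(a, 1) for i in range(a)]
    ramp_down = [peak - (peak - hold) * i / max(d, 1) for i in range(d)]
    sustain = [hold] * s
    return ramp_up + ramp_down + sustain

# Chain a few semi-random episodes into a keyframe track, e.g. for a zoom parameter.
keyframes = []
for _ in range(4):
    keyframes += wiggle_episode(n_frames=60, peak=random.uniform(0.5, 1.5))
print(len(keyframes), keyframes[:5])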
For more information, click here.
Audio reactive videos notebook
With this notebook, you can make any video audio-reactive.
The volume of the audio drives the speed of the generated video; the original video can therefore be slowed down if there are not enough frames left.
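The core idea, mapping loudness to playback speed, can be sketched with librosa. The frame-index resampling below is a simplified, hypothetical stand-in for what the notebook does; the audio file and frame count are placeholders.

# Sketch of mapping audio volume to playback speed (assumes librosa is installed;
# the frame-index resampling is a simplified, hypothetical stand-in for the notebook).
import numpy as np
import librosa

audio, sr = librosa.load("track.wav")                  # placeholder audio file
rms = librosa.feature.rms(y=audio)[0]                  # per-window loudness envelope
speed = 0.5 + rms / rms.max()                          # louder audio -> faster playback

# Convert per-window speeds into source-frame indices for an n_frames-long video.
n_frames = 900                                         # placeholder frame count
positions = np.cumsum(speed)
positions = positions / positions[-1] * (n_frames - 1)
frame_indices = positions.astype(int)                  # frames to sample from the original video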
For more information, click here.
Zero-Shot Text Guided Object Generation with Dream Fields
It combines neural rendering with multi-modal image and text representations to synthesise diverse 3D objects from language descriptions alone.
This notebook demonstrates a scaled-down version of Dream Fields, a method for synthesising 3D objects from natural language descriptions. Dream Fields trains a 3D Neural Radiance Field (NeRF) so that 2D renderings from any perspective are semantically consistent with a given description. The loss is based on the OpenAI CLIP text-image model.
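At the heart of the method is a CLIP-based loss that compares a rendered view with the text prompt. The PyTorch sketch below illustrates that loss only; it assumes the OpenAI clip package, and a random tensor stands in for the NeRF rendering (the actual Dream Fields code is written in JAX and averages this loss over random camera poses).

# Simplified sketch of a CLIP-guided loss in the spirit of Dream Fields
# (assumes the OpenAI `clip` package; the random tensor stands in for a NeRF rendering).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

text = clip.tokenize(["a bouquet of flowers sitting in a clay pot"]).to(device)
text_emb = model.encode_text(text)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# A rendered view from the NeRF, resized to CLIP's 224x224 input resolution.
rendering = torch.rand(1, 3, 224, 224, device=device)

image_emb = model.encode_image(rendering)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)

# Minimising this loss maximises image-text similarity, pushing every rendered
# view towards the description; averaging over camera poses gives the full objective.
loss = -(image_emb * text_emb).sum(dim=-1).mean()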
For more information, click here.
‘BLIP’: Bootstrapping Language-Image Pre-training
BLIP achieves state-of-the-art results on seven vision-language tasks, including image-text retrieval, image captioning, visual question answering, visual reasoning, visual dialogue, zero-shot text-video retrieval, and zero-shot video question answering.
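One convenient way to try BLIP for image captioning is through the Hugging Face transformers integration. The sketch below assumes that integration and the Salesforce/blip-image-captioning-base checkpoint; the image path is a placeholder.

# Sketch of BLIP image captioning via Hugging Face transformers
# (assumes the Salesforce/blip-image-captioning-base checkpoint is available).
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("photo.jpg").convert("RGB")   # placeholder path to any RGB image

inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))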
For more information, click here.