
Text to Video Generation

Welcome to my ongoing journey with the Open-Sora AI video generator model! Open-Sora uses artificial intelligence to turn text into video, and I find that genuinely exciting. It's particularly useful for those of us who aren't pros in video editing but need to produce high-quality educational or promotional videos. The AI takes over the heavy lifting, managing everything from scene transitions to text overlays, and makes video production a breeze for a broad audience.

Why Dive Deeper?

After dabbling with several models available on platforms like Hugging Face and GitHub, I found the initial excitement of generating videos started to fade. This led me to a pivotal decision: to dive deeper into the inner workings of video generation models. My goal? To not only understand but also to potentially enhance the architecture of these AI models. This exploration isn't just for kicks; it's aimed at making substantive contributions to the field and to my own work.

My Roadmap

Here’s how I plan to tackle this challenge:

In-Depth Exploration: I’m dedicating time to studying various models, starting with Open-Sora, to grasp their intricacies.
Strengthening the Model: By diversifying the training data, I hope to enhance the model’s robustness and output quality.
Boosting Capabilities: More GPUs are on my shopping list so I can generate higher-resolution videos and speed up generation.
Adding Sound: The final touch would be integrating audio to bring these videos to life.

Challenges Along the Way

My journey has been exciting but not without hurdles. The most significant has been GPU capacity. High-quality video generation demands powerful GPUs, and despite leveraging resources like Google Cloud, Colab Enterprise, and Vertex AI, it remains a bottleneck. Nonetheless, I've managed to run the model at lower resolutions, and I've been tinkering with the outputs, experimenting, and learning with each step. Feel free to play with the code in my repository, but remember that you will need access to a GPU; without one, the model will not run.
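Since the code needs a GPU, it can save some time to check for one before setting anything else up. Here is a minimal sketch of such a check, not something taken from the repository. It assumes an NVIDIA card whose driver puts the `nvidia-smi` tool on the PATH, and the helper name `list_nvidia_gpus` is my own invention for illustration.

```python
import shutil
import subprocess
from typing import List


def list_nvidia_gpus(timeout_s: float = 10.0) -> List[str]:
    """Return one "name, total memory" string per GPU that nvidia-smi reports.

    Returns an empty list when nvidia-smi is missing, fails, or times out.
    """
    # Assumption: the NVIDIA driver install put `nvidia-smi` on the PATH.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Covers a non-zero exit code, a timeout, or a failure to launch the tool.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        print("GPUs found:")
        for gpu in gpus:
            print("  " + gpu)
    else:
        print("No NVIDIA GPU detected; the model will most likely not run here.")
```

An empty result only means this particular check found nothing. A hosted notebook may still need a GPU runtime to be enabled in its settings before the card shows up.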

Generated Videos

Bridge:

Fashion show:

Whale:

Looking Ahead

The videos I've been able to generate so far are just the beginning. They've been instrumental in comparing different approaches, like those from Midjourney, and in thinking critically about how to improve the Open-Sora model. This project is more than just a technical challenge; it’s a creative adventure that pushes the boundaries of AI video generation.

Stay tuned as I continue to navigate this exciting field, break down barriers, and hopefully contribute to an ever-evolving landscape of AI technology.