AI Video Generation Tutorial: (There is some demonstration speech in the tutorial, if you want to turn the audio on)
We get asked a lot about how to make the AI videos, and which AI programs/apps/websites are used.
Hope this tutorial helps.
Need to remember:
Some of these are free, some are expensive, some are higher resolution, and some are generated at low resolution to save on cost.
None of these are edits (except for the Wan 2.5 example, for modesty); some simple editing techniques are shown at the end.
In these examples Kling, Wan and Grok can do elements of voiceovers, with varying degrees of functionality, permissions, tolerance, moderation and language.
All videos are 5 seconds (LUMA, KLING), 6 seconds (GROK), 8 seconds (Pixverse) or 10 seconds (KLING, WAN) long, depending on the service used, what's possible and what's paid.
Sometimes we use a service directly, e.g. Kling, and sometimes we go through an aggregator website like POLLA or HIGGSFIELD. I don't do local on-machine generations or animation, or use LORAs or anything like that.
Each individual service usually has different options and versions available, and multiple subscription tiers.
Depending on the prompts and a random chaos/idea generator concept called the "seed" (which you have little or no control over), you're basically putting the image into a "magic box" and waiting to see what is churned out.
Then you look at it in disappointment and wonder how to salvage it. But like any addiction every so often there is a reward.....
All except the Wan 2.5 example were generated with the same prompt. (Wan 2.5 tolerates certain other, more anatomic "functionality".)
What are the best?
- - - well, Kling (usually 2.1) and Wan 2.5 for permissiveness, at least in my experience.
We don't use Veo as yet, and Sora is as shown.....
And it gets expensive, e.g. 10-30 per month subscriptions.
TyrannoMax Trading Card & Sticker Album, Stikks™ Company, Inc.
Selections from Set 1: Fear the Roar!
The Stikks Company was best known for making sports cards, but they branched out in the late 70s by getting the contract to make trading cards for Cocytus Comics' Nightmare Pit and Tomorrownauts titles (a mint Nightmare Pit series 1 Farrah Fyendlyne vs Glodie Redeemer (37) is $300+ on the secondary market, yow!).
This got their foot in the door for the 80s when the Cocytus / Buzby Spurlock Animation merger put the Cocytus characters on TV sets and toy shelves (semi)worldwide. Buzby-Spurlock needed merch licensees and Stikks got most of the trading cards and sticker albums.
As with most things, Stikks was acquired by Hasbro in the 90s.
EXTENSIVE Process/Tutorial under the fold.
You were warned, listen to some jams while you read.
Vidu has been advancing their image generation options and wanted us in the Creative Partner Program to show it off. (click the link to try it with 100 free credits, etc).
But I'm rarely doing anything just on straight gens so these are heavy edits, so here's the procedural breakdown.
I used the new GPT-2 because it was part of the assignment, using the text-to-image to generate multiple flat white cardboard sheets with dents and creases at the corners, and plain aged cardstock. I set those aside.
Next came generating the card-images. One nice thing about the Vidu system is the references are more or less universal, so I could use my pre-existing TyrannoMax animation models as the base.
I prompted specifically for the characters to be rendered as acrylic paintings in a mid-80s trading-card art style, describing the scene, and then generating. In some cases taking those results and using them as the reference pic for another pass-through to diminish instances where things felt too animation-cel-like.
For things like the DinoHydra, which I didn't have animation models for, I used the same system to make them.
On the toy each head would be a launcher, except the T-rex that bites, and there's a removable battle saddle, obviously.
I'm going to do further development on the final animation/toy DinoHydra, but going with a rough draft for the licensed out stuff is very period accurate, so much like how I have 'high budget' and 'low budget' versions of some of my animation reference for period accuracy, I went with it.
Once I had the basic cards, it was off to Photoshop. First, the normal color correction, number-of-fingers-correction (dinoids have three fingers and a thumb, and while flubbing that is period accurate in animation, it wouldn't be on cards, so had to fix several hands /away/ from human anatomical accuracy).
These were then composited with the image frame I'd designed based on the one used in Dinosaurs/Mars Attacks. I picked the fonts to try and match the look of Topps' 80s output, to help make that fauxtalgia pop.
With edits in place, I then used Upscayl to make the images very large (about 5000x3000ish), because now comes the simulation of printing effects.
I'm going to start with a new image to show off this part of the process, as I didn't think to preserve the step-by-step of any of the base cards in the first two posts (wasn't planning this detailed a tutorial at the time).
So let's go with Aunt Acid's character card.
Prompt: <reference> in the corroded remains of a suburban home. She extends one hand toward the POV, acid splashing from her open hand. 1985 science-fiction/fantasy acrylic painting, no text or branding, dynamic angle 1985 science-fiction/fantasy acrylic painting, no text or branding, the painting is in a sci-fi/fantasy pulp illustration style
On the left is the raw gen; on the right is the result after some selective color correction and the upscale to 5000ish px tall.
I duplicate the layer in Photoshop, and I apply a 5px color halftone. This simulates /most/ of the printing process. However, real printed material of this type would have had black as a separate layer, and so the darkest areas of black wouldn't have any halftoning. Photoshop's filter does not do this.
So I duplicate the original layer again, and use Threshold to produce a B&W image that picks up the very darkest blacks only, and I set that to multiply.
Just color halftone on the left, color halftone with simulated spot-black on the right.
Combine that image with the card template I'd built before and you get the image on the left; adding one of the cardstock images set to multiply, then flattening and doing some color-tweaking, gets the image on the right.
Her getup is a /titch/ anachronistic, since TyrannoMax is a 1985 series and her look is more Peg Bundy from Married With Children, but bouffants and cat-eye glasses have been just-slightly-unfashionable for most of modern history, so I'm cutting myself slack there.
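For anyone who wants to see the spot-black trick as numbers rather than layers, here is a minimal sketch of the threshold-and-multiply idea on plain grayscale values (0 = black, 255 = white). It is not Photoshop's filter code; the function names, the cutoff of 40, and the sample pixel values are all assumptions made up for illustration.

```python
def threshold_mask(pixels, cutoff=40):
    """Keep only the darkest blacks: values below cutoff become 0, the rest 255."""
    return [0 if p < cutoff else 255 for p in pixels]


def multiply(base, top):
    """Multiply blend: base * top / 255, so white in the top layer leaves base unchanged."""
    return [b * t // 255 for b, t in zip(base, top)]


original = [10, 30, 120, 250]    # hypothetical source values, darkest first
halftoned = [0, 255, 128, 255]   # hypothetical halftone output (dots and gaps)

mask = threshold_mask(original)      # only the two darkest pixels become black
result = multiply(halftoned, mask)   # halftone gaps in deep shadow get filled in
print(result)
```

The second pixel is the interesting one: the halftone left a white gap there, but because the original was near-black, the multiplied mask turns it solid black, which is roughly what a separate black plate would do.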
<This is the part of the tutorial where I ramble about the OC for a bit>
Her whole schtick is that she got exposed to Dr. Underfang's primordial ooze (man that sounds dirty) while alive, and she survived the mutations, so she doesn't wind up geneincarnating into a beast-person but does gain super-powers and has her personality exaggerated to villainous levels. She and Cold Shoulder are similar in that regard.
I haven't decided whose Aunt Acid she is yet, because I'm not sure if it's funnier if she's Dr. Underfang's, Cold Shoulder's, or Ms. Nice/Ms. Nautilus's. Now, she will refer to basically anyone younger than herself as if they were a beloved niece or nephew she hasn't seen in years, but she's legitimately related to one of the villain mains, and that's the one that threw her in the giant vat of glowing ooze.
Her super-power is pretty self-explanatory: she constantly produces and can also shoot an acidic compound that varies between pH 2.5 and cartoonishly ultra-caustic. She also generates tear gas-like fumes that become hallucinogenic with extended exposure. Her powers are constantly active unless temporarily suppressed by exposure to a strong chemical base (or a whole, whole lot of baking soda).
Her motivation is she's having a midlife crisis and when she finds out that her niece/nephew is actually a villain goes "that sounds fun." Essentially, Aunt Acid's getting her groove back via supervillainy and TyrannoMax and pals are the unfortunate staff on her carnage-ful cruise.
So I think a lot of Bing driven AI blogs have fallen off since the NSFW filter went super strict for about 48 hours about nine days ago. Even though it relaxed again, the landscape it left behind was very different. Old tricks didn't work anymore. But new tricks can be discovered and exploited, and the last few days I've been getting my sexiest and most extreme results ever. All the stuff I've posted in the last six days has been newly made, not backlog (my backlog is enormous… will I ever clear it? Probably not)
In the interest of community and education, here is an example.
These four images were the result of one submission of one prompt - I didn't have to wrestle the machine for them at all. The prompt is:
underexposed Polaroid, side view from far away, two Icelandic bodybuilding bros facing each other submerged near a hot spring, enormously muscular, golden light, loving embrace, buzzed blond hair, relaxed, unbelievably enormous muscles, muscle morph, leg muscles like enormous heavy water balloons, enormous muscular arms, high body fat, leaning against each other
Now be warned, this is a bit of a jenga tower. Moving things around too much may break it. I'd recommend writing your own from scratch but stealing specific key phrases, modifying and evolving those, see what works best for you.
Thanks to @thespacewerewolf for the "near a hot spring" trick to get them into a hot spring, and to @zangtangimpersonator for the water balloon / weather balloon comparison trick, which is a Swiss Army knife of a prompt for anyone who likes big round shapes.
This is why I unpinned my old tutorial. The spirit is the same - think of twisty ways to ask for what you want, certain scenarios seem way more permissive than others, throwing in random details seems to help, etc etc etc. But the specifics have changed, and the sample prompts I built in a couple old tutorial posts won't really work now as they did then. Keep evolving your prompts, experimenting, and sharing what works for you.
Q: What prompts did you use to get that uncanny nostalgia cartoon aesthetic with Midjourney?
Changed up a lot over time.
The thing that worked best most recently was building a moodboard of screencaps from a potluck of shows, but for straight prompting I tended to have a format along the lines of:
(description of scene), vintage cartoon screencap, 1985 (series real or imaginary), by (pick two or more of Filmation, TOEI, AKOM, TMS, Sunbow Productions, Toei, etc, etc.), vhscore. vintage cel animation still.
Usually at --ar 4:3, under niji if you want it to look high budget, under the standard model if you want it to be a low budget ep. But moodboards and style prompting are your friends.
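If you'd rather fill that format in programmatically than retype it, something like this works. It's a minimal sketch only: the function name, its arguments, and the example scene are made up for illustration, and the only Midjourney parameter it relies on is `--ar`, as mentioned above.

```python
def build_prompt(scene, year, series, studios, aspect_ratio="4:3"):
    """Fill in the vintage-cartoon prompt format described above."""
    if len(studios) < 2:
        # The format calls for "two or more" studios.
        raise ValueError("pick two or more studios")
    studio_list = ", ".join(studios)
    return (
        f"{scene}, vintage cartoon screencap, {year} {series}, "
        f"by {studio_list}, vhscore. vintage cel animation still. "
        f"--ar {aspect_ratio}"
    )


# Hypothetical example scene; swap in your own description and studios.
prompt = build_prompt(
    scene="a robot dinosaur on a city street",
    year=1985,
    series="TyrannoMax",
    studios=["Filmation", "Sunbow Productions"],
)
print(prompt)
```

Keeping the studio list as an argument makes it easy to rotate pairs and compare results.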
Sure would be nice to not be unjustly banned from there, mumblegrumble.
Some process notes:
What I am showing here is what SD gave me in the first image, and my edits with Photoshop in the second.
The prompt for the AI was very simple: "shirtless guy in a garden swing with multicolored robes." I "roop'ed" SNL player, Andrew Dismukes to influence the face.
I picked the best of 40 generated images from the prompt, and upsized it.
In Photoshop, I cropped out whatever the heck was going on below his waist. Next, I attacked the obvious missing rope holding his swing up, by duplicating a bit of it, moving, blending, and blurring that bit to the subject's right shoulder.
The next challenge was his left hand holding onto the rope. THREE FINGERS! Typical of AI generated images of course. I selected that section of the image, digitally added another finger with my limited Photoshop skills and sent it back to SD img2img to refine. I picked the best out of 30 iterations and pasted it back into place.
Using Photoshop's AI-powered and insanely awesome new Remove Tool, I cleaned up a lot of blemishes and smoothed out the overly defined musculature and vascularity of the original AI-rendered image. I also selected small sections of the progress so far and nudged things around with PS's Liquify tool.
Finally, the incredibly powerful Camera RAW tool in Photoshop should not be underestimated. For my final rendering of this image, I made several adjustments to color, sharpness, tinting, noise, fog, etc. And I use a lot of the presets, including Adobe's own AI-adjustments such as "popping the subject".
Overall, I upsize from Stable Diffusion an outrageous amount and work with that in Photoshop until I am satisfied. And then I downsize for sharing on social media.
If you read this far and want to learn more of my process, then drop me a PM. I am happy to correspond with you whether you're doing gay pin-up imagery as I am, or any kind of generative art.
I know the traditional analog media and digital artists who've worked hard on their craft are conflicted on this. I believe they will continue to persist. I want to be part of an emerging segment of digital and there is plenty of room!
Disclosure: I am in the Vidu artists program, pretty sure this is technically sponsored content. This is a fixed version of the original post.
Of course, it's another Fauxstalgia Fake-Em-Up from yours truly, this time demonstrating the beta-access potential of the new Vidu Q2 reference-to-video mode (get 100 free credits if you click here to try it out for free).
So, let's talk about how it was done (an overview)
GIFs are unedited wherever possible to show raw results.
The big problem with using AI for anything actually productive is inconsistency. While there's always wobble and bad gens, Vidu's reference-to-video option is the best solution I've seen thus far.
Essentially, you build a little profile with up to three reference pics and a basic prompt for your characters, props, etc. From then on out you can use the reference in your prompts to generate new scenes. It decreases confusion about what character is what, and it lets you establish styles of movement and other effects characters should always have going without having to use prompt-space in every scene.
You can see this with Tilly Tepesh's jacket and the ghost of her uncle Drac's translucency, and in the fact that I can have four different kinds of near-identical blokroid in the same scene without them morphing into each other or making weird hybrids.
Images can also be used with ref-to-video without making a full reference profile. For the various Drac-and-Tilly playing scenes the robot kept making Tilly play the back of the keyboard and spaced them too far apart, so I composited a demonstration image together to help guide the process.
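To make the profile idea concrete, here's a rough sketch of it as a data structure. This is not Vidu's API; the class, the field names, the `@name` tagging, the file names, and the base prompts are all hypothetical, and the only detail taken from the description above is the limit of three reference pics plus a basic prompt.

```python
from dataclasses import dataclass, field

MAX_REFERENCE_IMAGES = 3  # "up to three reference pics"


@dataclass
class ReferenceProfile:
    name: str
    base_prompt: str  # traits the character should always have
    image_paths: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.image_paths) > MAX_REFERENCE_IMAGES:
            raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images")


def scene_prompt(action, profiles):
    """Name each referenced profile, then describe only the scene-specific action."""
    names = ", ".join(f"@{p.name}" for p in profiles)
    return f"{names}: {action}"


# Hypothetical profiles and file names.
tilly = ReferenceProfile("Tilly", "always wears her jacket", ["tilly_front.png"])
drac = ReferenceProfile("Drac", "translucent ghost", ["drac_front.png", "drac_side.png"])

print(scene_prompt("playing a keyboard duet", [tilly, drac]))
```

The point of the split is that the always-on traits live in the profile, so the per-scene prompt only has to say what is happening.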
Which helped a lot.
I used a variation on my typical asset-creation process on this (tutorials), utilizing a lot of early gens and ref-to-images in Vidu's system (tutorial) to bulk up the process, along with Midjourney, Sora, and Civit.
For the audio, I recorded the lines myself, then loaded them into Suno to modify my voice into something more announcer-like and bring the music in under it.
All the editing, compositing, logos, etc were done the old fashioned way.
Quality Improvements (also an overview)
Basics:
Up to 8 second gens.
4:3 and 3:4 aspect ratios (in addition to 1:1, 9:16 and 16:9)
General improvements to quality, coherence, and prompt understanding.
Extend clips multiple times up to 5 mins (launched after the video above was finished)
While I'll get into the details of the improvements to quality as well as a number of other new features (up to 8 second generations, clip extending, and sound (very early on that one)) in other posts, those who have followed TyrannoMax may remember the issue with boulders.
Well, after a lot of coaching and getting Max a new trailer, he's finally stopped screwing around on set.
In short, a lot of things that just weren't doable before now work, and the general quality of gens is higher and sharper than ever before.
What I was not expecting was for it to emulate Gerry Anderson-style puppetry well. Ironically, the kind of jank you get from pre-digital media is hard for AI to duplicate. If you want something to look polished, smooth, and modern, that's easy.
It did take a lot of prompting for non-moving faces/immobile dolls, and editing is always needed, but the difference between Q1 and Q2 is apparent when you compare the same prompt in both:
The deadly starmantis gingerly setting the Queen Seltza prop down is adorable, but not what the director called for.
All in all, it took me about a week to go from concept to video, mostly on the back of my needing to make base character, prop, and set assets for everything that wasn't TyrannoMax more or less from scratch. If you want to give it a try yourself, here's that link again, and I'm posting a bunch of tutorials, old and (hopefully) new this week.