EG "make me a 120 minute Star Trek / Star Wars crossover". It should be more or less comparable to a big-budget studio film, although it doesn't have to pass a full Turing Test as long as it's pretty good. The AI doesn't have to be available to the public, as long as it's confirmed to exist.
A lot of people have posted examples, but this one is really spooky good:
https://x.com/simone_m_romeo/status/1926017107851960453?t=k_kxuTMAErVn4S1FFZfJHQ&s=19
Arguably, are we now already at the stage where this could be done? Like, couldn't someone with enough money to pay for all the best model subscriptions just write a simple Python program that:
1. Takes the prompt and asks the best language model to generate a movie synopsis, then from that a list of scene summaries, then from those the scene scripts, then splits those by shot, and from that writes descriptions of each shot's starting frame.
2. Asks the best image model with character consistency to generate the starting frames.
3. Asks the best video model with built-in audio to generate the shots from the starting frames and the shot-level scripts.
4. Stitches all the shots together sequentially.
If I had the money and free time I would genuinely attempt this. Are there any Manifold users who do have the time, money, and willingness to try this? Is there a major reason it would fail?
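For concreteness, here is a minimal Python sketch of that pipeline. `call_llm`, `call_image_model`, and `call_video_model` are hypothetical stand-ins for whatever the best available model APIs turn out to be (none of them are real endpoints); the only real tool used is ffmpeg's concat demuxer for the final stitching step.

```python
# Illustrative sketch only: call_llm / call_image_model / call_video_model are
# hypothetical placeholders for "the best available model APIs", not real
# libraries. The only real tool used here is ffmpeg's concat demuxer at the end.
import json
import subprocess
from pathlib import Path


def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real text-model API here")


def call_image_model(prompt: str, out: Path) -> Path:
    raise NotImplementedError("plug in a real image-model API here")


def call_video_model(script: str, start_frame: Path, out: Path) -> Path:
    raise NotImplementedError("plug in a real video-model API here")


def make_movie(user_prompt: str, workdir: Path = Path("movie")) -> Path:
    workdir.mkdir(exist_ok=True)

    # 1. Expand the prompt hierarchically: synopsis -> scene summaries -> shots.
    synopsis = call_llm(f"Write a feature-film synopsis for: {user_prompt}")
    scenes = json.loads(call_llm(
        f"Split this synopsis into a JSON list of scene summaries:\n{synopsis}"))
    shots = []
    for scene in scenes:
        shots += json.loads(call_llm(
            "Turn this scene into a JSON list of shots, each with a 'script' "
            f"and a 'start_frame_description':\n{scene}"))

    # 2. For every shot: generate a starting frame, then an audio+video clip.
    clips = []
    for i, shot in enumerate(shots):
        frame = call_image_model(shot["start_frame_description"],
                                 workdir / f"frame_{i:04d}.png")
        clips.append(call_video_model(shot["script"], frame,
                                      workdir / f"clip_{i:04d}.mp4"))

    # 3. Stitch all the clips together sequentially with ffmpeg.
    concat_list = workdir / "clips.txt"
    concat_list.write_text("".join(f"file '{c.resolve()}'\n" for c in clips))
    output = workdir / "movie.mp4"
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                    "-i", str(concat_list), "-c", "copy", str(output)],
                   check=True)
    return output
```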
@TheAllMemeingEye we don't yet have consistent audio over multiple clips (voices would change). Expanding from a synopsis isn't actually that good yet; I think you'd also need to script out that step (in Python) to get the consistency and subtlety that good stories have.
@TheAllMemeingEye
> If I had the money and free time I would genuinely attempt this. Are there any Manifold users who do have the time, money, and willingness to try this? Is there a major reason it would fail?
If someone gave me $3,000 in Google Cloud credits, I have a Python script that does this (once Veo 3 is available via API), but it's probably not worth the money. Veo does not yet have support for consistent characters or voices, to say nothing of all the minor details like "it takes a director's eye to turn a screenplay into a list of shots that make an actually watchable movie".
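As a rough sense of the scale involved (my numbers, not the commenter's): at roughly 8 seconds per generated clip, a 120-minute film needs about 900 clips, so the total cost is basically 900 times whatever a single clip ends up costing via the API. The per-clip price in the sketch below is a made-up placeholder, not real pricing.

```python
# Back-of-envelope only: ~8 s per clip matches the clip lengths discussed in
# this thread; price_per_clip is a made-up placeholder, not real API pricing.
def clips_needed(movie_minutes: int = 120, clip_seconds: int = 8) -> int:
    return movie_minutes * 60 // clip_seconds

def rough_cost(price_per_clip: float, movie_minutes: int = 120) -> float:
    return clips_needed(movie_minutes) * price_per_clip

print(clips_needed())    # 900 clips for a 120-minute film at 8 s per clip
print(rough_cost(3.0))   # 2700.0 -- e.g. $2700 if a clip cost $3 (illustrative)
```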
> minor details like "it takes a director's eye to turn a screenplay into a list of shots that make an actually watchable movie".
I'm no expert, and for now the statement might still be somewhat true, but this kinda feels like what directors would be telling people and themselves as a cope in the face of automation, similar to visual artists in recent years, when the truth is that the bulk of laypeople don't actually notice or care about that connoisseurial finesse.
@TheAllMemeingEye
I'm not talking about fine-art here. I'm talking about much more basic problems. Films have a ton of unwritten conventions that help the viewer know e.g. "What am I supposed to look at?" that current AI simply doesn't grok.
I'm not saying the problem is unsolvable or it will never be automated. I'm saying that current AI models are not even in the right ballpark of solving the problem. At a minimum, it's going to require fully multimodal video-in video-out models (the video equivalent of 4o imagegen).
Absolutely bonkers what Veo 3 can do - https://www.reddit.com/r/Bard/comments/1krla4b/veo_3_is_just_insanely_good/
There's so much time between now and 2028, and you may assume these models will start improving their own capabilities soon; at that point I'm not sure what can stop a full-length movie from happening. Animated movies in particular seem like a more natural fit.
@GauravYadav Yes, language models are already contributing to AI researcher productivity, and will do so at a rapidly increasing rate.
Also, an important fact is that it's much easier to go from generating a movie of length 1 hour to length 2 hours than it is to go from one second to two seconds.
In other words, increasing movie gen length gets easier at an exponential rate.
Going from one second to one minute has taken us a couple of years. Going from one minute to an hour will take much less time, even before we take into account the increasingly hyper-exponential nature of AI progress in general (i.e. the fact that AI will increasingly amplify the productivity of AI researchers).
@TheAllMemeingEye just think about it in the extreme. Going from 0.1 seconds to 1 second requires learning a massive amount about physics. Going from 1 year to 10 years requires learning only a little.
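To put rough numbers on that doubling intuition (my framing, not the commenter's exact math): if each doubling of clip length takes roughly constant, or shrinking, effort, then second-to-minute and minute-to-hour are jumps of about the same size, and hour-to-two-hours is a single doubling.

```python
# Counting length doublings between milestones; under a "roughly constant (or
# shrinking) effort per doubling" model, 1 min -> 1 hr is no bigger a jump than
# 1 s -> 1 min, and 1 hr -> 2 hrs is a single doubling.
import math

def doublings(short_seconds: float, long_seconds: float) -> float:
    return math.log2(long_seconds / short_seconds)

print(round(doublings(1, 60), 1))       # ~5.9 doublings: 1 second -> 1 minute
print(round(doublings(60, 3600), 1))    # ~5.9 doublings: 1 minute -> 1 hour
print(round(doublings(3600, 7200), 1))  # 1.0 doubling:   1 hour   -> 2 hours
```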
@jim at some point, length is just a parameter. Does a 120-minute film take any skill that a 90-minute film does not?
Veo 3 definitely updated me in a positive direction; it's pretty impressive. That said, I still think there's a lot of distance between Veo 3 and a full-length movie. The current 8-second clips are good, but still recognizable as AI. There are lots of minor issues with emotion, consistency, etc., and then we still have to scale it up by ~1000x in length? Honestly, I think the most likely way for this to resolve yes is if text models get good enough that they can basically use a 30-second video model as a tool and prompt it in a very detailed way a bunch of times. But 2.5 years just isn't that long a time; it's about the time between ChatGPT's release and today.
@dominic Regarding the issues of consistency and emotion: the market description says, "although it doesn't have to pass a full Turing Test as long as it's pretty good." To me, this sounds like it means the video doesn't need to be perfect, as long as there are no major hallucinations, such as severe character inconsistency that leaves the viewer confused about the plot or what’s actually happening.
In my opinion, it shouldn't matter if there are slight morphing effects in the background or minor inconsistencies in composition that reveal the movie was AI-generated. What really matters is whether the generator maintains enough consistency to keep the movie watchable and coherent, rather than turning it into a confusing, morphing mess.
@LukaChrelashvili Hmm, I read that sentence as meaning that it’s okay if it has some way to tell that it’s been AI generated (like maybe AI loves a particular type of camera shot), but it still has to be high quality and seem like a real movie. I think the bar for quality needs to be higher than “doesn’t confuse the viewer about the plot” or “maintains visual coherence”.
@dominic part of this market is predicting how @ScottAlexander will resolve it. There are various meta markets about that; e.g. here is mine:
@ScottAlexander Is there any limit on the prompt for resolving yes (I would assume so, but it doesn't seem airtight from the description)? Or does anything that fits into a potentially massive context window count as "a prompt"?
@AndreW50f5 from the description, 'EG "make me a 120 minute Star Trek / Star Wars crossover"', I don't think feeding in a full script would resolve this yes.
@AndreW50f5 I think you are revealing a tension between the implied resolution criteria and the bare wording. Probably would be good to get a clarification.