共计 7760 个字符,预计需要花费 20 分钟才能阅读完成。
本文是对于如何应用 cuda 和 Stable-Diffusion 生成视频的残缺指南,将应用 cuda 来减速视频生成,并且能够应用 Kaggle 的 TESLA GPU 来收费执行咱们的模型。
#install the diffuser package
#pip install --upgrade pip
!pipinstall--upgradediffuserstransformersscipy
#load the model from stable-diffusion model card
importtorch
fromdiffusersimportStableDiffusionPipeline
fromhuggingface_hubimportnotebook_login
模型加载
模型的权重是是在 CreateML OpenRail- M 许可下公布的。这是一个凋谢的许可证,不要求对生成的输入有任何权力,并禁止咱们成心生产非法或无害的内容。如果你对这个许可有疑难,能够看这里
https://huggingface.co/CompVi…
咱们首先要成为 huggingface Hub 的注册用户,并应用拜访令牌能力使代码工作。咱们应用是 notebook,所以须要应用 notebook_login() 来进行登录的工作
执行完代码上面的单元格将显示一个登录界面,须要粘贴拜访令牌。
ifnot (Path.home()/'.huggingface'/'token').exists(): notebook_login()
而后就是加载模型
model_id="CompVis/stable-diffusion-v1-4"
device="cuda"
pipe=StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe=pipe.to(device)
显示依据文本生成图像
%%time
#Provide the Keywords
prompts= [
"a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece by artgerm by wlop by alphonse muhca",
"detailed portrait beautiful Neon Operator Girl, cyberpunk futuristic neon, reflective puffy coat, decorated with traditional Japanese ornaments by Ismail inceoglu dragan bibin hans thoma greg rutkowski Alexandros Pyromallis Nekro Rene Maritte Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face",
"symmetry!! portrait of minotaur, sci - fi, glowing lights!! intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k",
"Human, Simon Stalenhag in forest clearing style, trends on artstation, artstation HD, artstation, unreal engine, 4k, 8k",
"portrait of a young ruggedly handsome but joyful pirate, male, masculine, upper body, red hair, long hair, d & d, fantasy, roguish smirk, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha",
"Symmetry!! portrait of a sith lord, warrior in sci-fi armour, tech wear, muscular!! sci-fi, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha",
"highly detailed portrait of a cat knight wearing heavy armor, stephen bliss, unreal engine, greg rutkowski, loish, rhads, beeple, makoto shinkai and lois van baarle, ilya kuvshinov, rossdraws, tom bagshaw, tom whalen, alphonse mucha, global illumination, god rays, detailed and intricate environment",
"black and white portrait photo, the most beautiful girl in the world, earth, year 2447, cdx"
]
显示
%%time
#show the results
images=pipe(prompts).images
images
#show a single result
images[0]
第一个文本:a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece 的图像如下
将生成的图像显示在一起
#show the results in grid
fromPILimportImage
defimage_grid(imgs, rows, cols):
w,h=imgs[0].size
grid=Image.new('RGB', size=(cols*w, rows*h))
fori, imginenumerate(imgs): grid.paste(img, box=(i%cols*w, i//cols*h))
returngrid
grid=image_grid(images, rows=2, cols=4)
grid
#Save the results
grid.save("result_images.png")
如果你的 GPU 内存无限(可用的 GPU RAM 小于 4GB),请确保以 float16 精度加载 StableDiffusionPipeline,而不是如上所述的默认 float32 精度。这能够通过通知扩散器冀望权重为 float16 精度来实现:
%%time
importtorch
pipe=StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe=pipe.to(device)
pipe.enable_attention_slicing()
images2=pipe(prompts)
images2[0]
grid2=image_grid(images, rows=2, cols=4)
grid2
如果要更换噪声调度器,也须要将它传递给 from_pretrained:
%%time
fromdiffusersimportStableDiffusionPipeline, EulerDiscreteScheduler
model_id="CompVis/stable-diffusion-v1-4"
# Use the Euler scheduler here instead
scheduler=EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe=StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe=pipe.to("cuda")
images3=pipe(prompts)
images3[0][0]
#save the final output
grid3.save("results_stable_diffusionv1.4.png")
看看这图就是更换不同调度器的后果
#results are saved in tuple
images3[0][0]
grid3=image_grid(images3[0], rows=2, cols=4)
grid3
#save the final output
grid3.save("results_stable_diffusionv1.4.png")
查看全副图片
创立视频。
根本的操作曾经实现了,当初咱们来应用 Kaggle 生成视频
首先进入 notebook 设置: 在加速器抉择 GPU,
而后装置所需的软件包
pipinstall-Ustable_diffusion_videos
fromhuggingface_hubimportnotebook_login
notebook_login()
#Making Videos
fromstable_diffusion_videosimportStableDiffusionWalkPipeline
importtorch
#"CompVis/stable-diffusion-v1-4" for 1.4
pipeline=StableDiffusionWalkPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
revision="fp16",
).to("cuda")
#Generate the video Prompts 1
video_path=pipeline.walk(
prompts=['environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000',
'environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000',
'environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000',
'environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000',
'environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000'],
seeds=[42,333,444,555],
num_interpolation_steps=50,
#height=1280, # use multiples of 64 if > 512. Multiples of 8 if < 512.
#width=720, # use multiples of 64 if > 512. Multiples of 8 if < 512.
output_dir='dreams', # Where images/videos will be saved
name='imagine', # Subdirectory of output_dir where images/videos will be saved
guidance_scale=8.5, # Higher adheres to prompt more, lower lets model take the wheel
num_inference_steps=50, # Number of diffusion steps per image generated. 50 is good default
)
将图像扩充到 4k,这样能够生成视频
fromstable_diffusion_videosimportRealESRGANModel
model=RealESRGANModel.from_pretrained('nateraw/real-esrgan')
model.upsample_imagefolder('/kaggle/working/dreams/imagine/imagine_000000/', '/kaggle/working/dreams/imagine4K_00')
为视频增加音乐
为视频减少音乐能够通过提供音频文件的将音频增加到视频中。
%%capture
!pipinstallyoutube-dl
!youtube-dl-fbestaudio--extract-audio--audio-formatmp3--audio-quality0-o"music/thoughts.%(ext)s"https://soundcloud.com/nateraw/thoughts
fromIPython.displayimportAudio
Audio(filename='music/thoughts.mp3')
这里咱们应用 youtube-dl 下载音频(须要留神该音频的版权),而后将音频退出到视频中
# Seconds in the song.
audio_offsets= [7, 9]
fps=8
# Convert seconds to frames
num_interpolation_steps= [(b-a) *fpsfora, binzip(audio_offsets, audio_offsets[1:])]
video_path=pipeline.walk(prompts=['blueberry spaghetti', 'strawberry spaghetti'],
seeds=[42, 1337],
num_interpolation_steps=num_interpolation_steps,
height=512, # use multiples of 64
width=512, # use multiples of 64
audio_filepath='music/thoughts.mp3', # Use your own file
audio_start_sec=audio_offsets[0], # Start second of the provided audio
fps=fps, # important to set yourself based on the num_interpolation_steps you defined
batch_size=4, # increase until you go out of memory.
output_dir='dreams', # Where images will be saved
name=None, # Subdir of output dir. will be timestamp by default
)
本文代码你能够在这里找到:
https://avoid.overfit.cn/post/781a2bd8a4534f7cb2d223c141d37df8
作者:Bob Rupak Roy