Reaching the Summit, Learning from Nature: Painting in Practice with ControlNet, the PyTorch AI Image Enhancement Framework, on Python 3.10

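AI has gone wild: traditional labor and content-creation platforms have been shot dead by AI and left in the dust. This is neither groundless rumor nor alarmism. The AI image enhancement framework ControlNet is frantically rewriting the course of painting as an art. Ask me what the painting industry will look like in the future, and I can only point in ControlNet's direction. This time we will experience the pinnacle of AI painting on macOS with an M1/M2 chip.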

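ControlNet also has a demo on HuggingFace; see: https://huggingface.co/spaces…

However, the public platform has limited compute, and the input parameters are restricted by the platform as well: only one image can be processed at a time, which hardly lets one drink one's fill.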

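To get up close with ControlNet, the greatest image enhancement framework in history, we choose to set up a ControlNet environment locally. First, run the Git command to pull the official code: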

git clone https://github.com/lllyasviel/ControlNet.git

After the clone succeeds, enter the project directory:

cd ControlNet

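Because GitHub limits file sizes, ControlNet's pretrained models have to be downloaded separately. They are all hosted on HuggingFace (https://huggingface.co/lllyas…). Note that each model is enormous, weighing in at 5.71 GB, which is staggering.

Once a model is downloaded, place it in ControlNet's models directory: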

├── models  
│ ├── cldm_v15.yaml  
│ ├── cldm_v21.yaml  
│ └── control_sd15_canny.pth

Here I downloaded the control_sd15_canny.pth model and put it in the models directory; the other models work the same way.
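Since a missing or misplaced model file only shows up later as a load error, it can help to check the directory first. The following is a minimal sketch, assuming it is run from the ControlNet project root; the helper name `missing_model_files` is made up for illustration, and the file list simply mirrors the directory listing above.

```python
from pathlib import Path

# File names taken from the directory listing above; adjust if you downloaded other models.
EXPECTED_FILES = ["cldm_v15.yaml", "cldm_v21.yaml", "control_sd15_canny.pth"]


def missing_model_files(models_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumption: the current working directory is the ControlNet project root.
    missing = missing_model_files("models")
    if missing:
        print("Missing from models/:", ", ".join(missing))
    else:
        print("All expected model files are in place.")
```

The check only looks at file names, not at file sizes or checksums, so a partially downloaded model would still pass.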

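Next, install the runtime environment. The official project recommends a conda virtual environment; once conda is installed, run the following commands to create and activate the environment: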

conda env create -f environment.yaml  
conda activate control

However, I took a look at the official environment.yaml configuration file:

name: control  
channels:  
  - pytorch  
  - defaults  
dependencies:  
  - python=3.8.5  
  - pip=20.3  
  - cudatoolkit=11.3  
  - pytorch=1.12.1  
  - torchvision=0.13.1  
  - numpy=1.23.1  
  - pip:  
      - gradio==3.16.2  
      - albumentations==1.3.0  
      - opencv-contrib-python==4.3.0.36  
      - imageio==2.9.0  
      - imageio-ffmpeg==0.4.2  
      - pytorch-lightning==1.5.0  
      - omegaconf==2.1.1  
      - test-tube>=0.7.5  
      - streamlit==1.12.1  
      - einops==0.3.0  
      - transformers==4.19.2  
      - webdataset==0.2.5  
      - kornia==0.6  
      - open_clip_torch==2.0.2  
      - invisible-watermark>=0.1.5  
      - streamlit-drawable-canvas==0.8.0  
      - torchmetrics==0.6.0  
      - timm==0.6.12  
      - addict==2.4.0  
      - yapf==0.32.0  
      - prettytable==3.6.0  
      - safetensors==0.2.7  
      - basicsr==1.4.2

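It is clear at a glance that the Python version is the aging 3.8, and the environment is built around CUDA (cudatoolkit 11.3 alongside PyTorch 1.12), which is of no use on a Mac. Apple's MPS backend first shipped in PyTorch 1.12 and was still at an early stage there, so this setup is a poor fit for the Mac-only MPS mode.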

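Conda environments also have some drawbacks:

Environment isolation can itself cause problems. Although a virtual environment lets you manage package versions and dependencies, it can sometimes lead to environment conflicts and strange errors.

Conda environments can take up a lot of disk space. Each environment needs its own copies of packages and dependencies, so creating several environments may exhaust the disk.

Package availability and compatibility can also be an issue. Conda may not carry certain packages or libraries, or may not support a particular operating system or hardware architecture.

In some cases, creating and managing Conda environments becomes complex and time-consuming, particularly when you have to maintain several environments and switch between them frequently.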

So we can instead build the ControlNet environment on the newer Python 3.10 by writing a requirements.txt file:

torch==1.13.0  
gradio==3.16.2  
albumentations==1.3.0  
opencv-contrib-python==4.3.0.36  
imageio==2.9.0  
imageio-ffmpeg==0.4.2  
pytorch-lightning==1.5.0  
omegaconf==2.1.1  
test-tube>=0.7.5  
streamlit==1.12.1  
einops==0.3.0  
transformers==4.19.2  
webdataset==0.2.5  
kornia==0.6  
open_clip_torch==2.0.2  
invisible-watermark>=0.1.5  
streamlit-drawable-canvas==0.8.0  
torchmetrics==0.6.0  
timm==0.6.12  
addict==2.4.0  
yapf==0.32.0  
prettytable==3.6.0  
safetensors==0.2.7  
basicsr==1.4.2

Then run:

pip3 install -r requirements.txt
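After the install finishes, the pinned versions can be compared with what actually landed in the environment. The sketch below uses only the standard library; `parse_pins` and `check_pins` are hypothetical helper names, and it assumes requirements.txt sits in the current directory. The comparison is a plain string match, so a pin such as `0.6` against an installed `0.6.0`, or a build with a local suffix, will be reported even though it may be fine.

```python
from importlib import metadata


def parse_pins(lines):
    """Collect 'name==version' pins; lines using other specifiers (e.g. '>=') are skipped."""
    pins = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {package: (wanted, installed_or_None)} for every pin that does not match."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


if __name__ == "__main__":
    try:
        with open("requirements.txt", encoding="utf-8") as fh:
            pins = parse_pins(fh)
    except FileNotFoundError:
        pins = {}
        print("requirements.txt not found in the current directory")
    for name, (wanted, installed) in check_pins(pins).items():
        print(f"{name}: wanted {wanted}, found {installed or 'nothing'}")
```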

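At this point, the Python 3.10 based ControlNet environment is ready. For installing Python 3.10 itself, please see the article "All in One Net, Every Platform Covered: Installing and Configuring a Python 3.10 Development Environment on Different Architectures (Intel x86/Apple M1 silicon) and Platforms (Win10/Win11/Mac/Ubuntu)"; I won't repeat it here.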

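ControlNet's code hardcodes the device as CUDA. CUDA is a parallel computing platform and programming model developed by NVIDIA, so a system without an NVIDIA GPU cannot run in CUDA mode.

Beyond that, systems that cannot use CUDA mode may include:

Systems without the NVIDIA GPU driver installed

Systems without the CUDA toolkit installed

Systems whose NVIDIA GPU does not support CUDA (older GPU models may not)

Systems without enough GPU memory to run in CUDA mode (training large deep neural networks in particular needs a lot of GPU memory)

Note that even if the system supports CUDA, you must also make sure the machine learning framework you use supports CUDA; otherwise CUDA cannot be used.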

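We can modify the code to switch the device to MPS, which the Mac supports; see the article "Hearing the Voice and Knowing the Meaning: Whisper, a Local AI Speech Recognition Library on M1 Mac Based on PyTorch (mps/cpu/cuda) (Python3.10)"; I won't repeat it here.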
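Choosing between the three devices can be sketched as below. This is an illustration rather than code from the ControlNet repository: `choose_device` and `detect_device` are hypothetical names, and the only PyTorch calls used are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. Whether every operation ControlNet needs actually runs on mps is not verified here.

```python
def choose_device(cuda_ok, mps_ok):
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device():
    """Ask PyTorch which backends are usable; report 'cpu' if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    cuda_ok = torch.cuda.is_available()
    # torch.backends.mps exists from PyTorch 1.12 onwards; guard for older builds.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return choose_device(cuda_ok, mps_ok)


if __name__ == "__main__":
    print("Selected device:", detect_device())
```

The returned string can be passed to `torch.device(...)` or `.to(...)` in place of a hardcoded device name.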

If the following error is reported while the code is running:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

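This error means the current system does not support CUDA, and a few places need to be modified. Taking gradio_canny2image.py in the project as an example: replace cuda with cpu in gradio_canny2image.py, replace cuda with cpu in /ControlNet/ldm/modules/encoders/modules.py, and replace cuda with cpu in /ControlNet/cldm/ddim_hacked.py. With that, the device is switched to cpu.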
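Before editing by hand, it can be useful to list every line in those three files that mentions cuda. The sketch below does that and also offers a simple word-level replacement; `find_cuda_lines` and `replace_cuda` are hypothetical helpers, and the paths assume the script is run from the ControlNet project root. The replacement is deliberately naive: it would also rewrite an expression such as `torch.cuda.is_available()`, so review the listed lines before applying it.

```python
import re
from pathlib import Path

# Files named in the paragraph above, relative to the ControlNet project root.
FILES = [
    "gradio_canny2image.py",
    "ldm/modules/encoders/modules.py",
    "cldm/ddim_hacked.py",
]

CUDA_WORD = re.compile(r"\bcuda\b")


def find_cuda_lines(text):
    """Return (line_number, line) pairs for every line that mentions 'cuda'."""
    return [(no, line) for no, line in enumerate(text.splitlines(), start=1)
            if CUDA_WORD.search(line)]


def replace_cuda(text, device="cpu"):
    """Replace the standalone word 'cuda' with the given device name."""
    return CUDA_WORD.sub(device, text)


if __name__ == "__main__":
    # Dry run: print the matching lines only, without modifying any file.
    for rel in FILES:
        path = Path(rel)
        if not path.is_file():
            print(f"skip {rel}: not found")
            continue
        for no, line in find_cuda_lines(path.read_text(encoding="utf-8")):
            print(f"{rel}:{no}: {line.strip()}")
```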

After modifying the code, run gradio_canny2image.py directly in the terminal:

python3 gradio_canny2image.py

The program returns:

➜  ControlNet git:(main) ✗ /opt/homebrew/bin/python3.10 "/Users/liuyue/wodfan/work/ControlNet/gradio_canny2image.py"  
logging improved.  
No module 'xformers'. Proceeding without it.  
/opt/homebrew/lib/python3.10/site-packages/pytorch_lightning/utilities/distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.  
  rank_zero_deprecation(  
ControlLDM: Running in eps-prediction mode  
DiffusionWrapper has 859.52 M params.  
making attention of type 'vanilla' with 512 in_channels  
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.  
making attention of type 'vanilla' with 512 in_channels  
Loaded model config from [./models/cldm_v15.yaml]  
Loaded state_dict from [./models/control_sd15_canny.pth]  
Running on local URL:  http://0.0.0.0:7860  
  
To create a public link, set `share=True` in `launch()`.

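At this point, ControlNet's web service is running on port 7860 of the local machine.

Visit http://localhost:7860 and you can upload an image and run it through the model right away.

Here this site's logo image is used as an example:

By entering a prompt and the other parameters, you can apply diffusion-model enhancement to an existing image. The prompt here means: ruby, gold, oil painting. The result truly says more than words can convey.

Besides the main prompt, the system adds some auxiliary prompts by default, such as the image-quality terms best quality, extremely detailed and so on. The complete code: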

from share import *  
import config  
  
import cv2  
import einops  
import gradio as gr  
import numpy as np  
import torch  
import random  
  
from pytorch_lightning import seed_everything  
from annotator.util import resize_image, HWC3  
from annotator.canny import CannyDetector  
from cldm.model import create_model, load_state_dict  
from cldm.ddim_hacked import DDIMSampler  
  
  
apply_canny = CannyDetector()  
  
model = create_model('./models/cldm_v15.yaml').cpu()  
model.load_state_dict(load_state_dict('./models/control_sd15_canny.pth', location='cpu'))  
model = model.cpu()  
ddim_sampler = DDIMSampler(model)  
  
  
def process(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, ddim_steps, guess_mode, strength, scale, seed, eta, low_threshold, high_threshold):  
    with torch.no_grad():  
        img = resize_image(HWC3(input_image), image_resolution)  
        H, W, C = img.shape  
  
        detected_map = apply_canny(img, low_threshold, high_threshold)  
        detected_map = HWC3(detected_map)  
  
        control = torch.from_numpy(detected_map.copy()).float().cpu() / 255.0  
        control = torch.stack([control for _ in range(num_samples)], dim=0)  
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()  
  
        if seed == -1:  
            seed = random.randint(0, 65535)  
        seed_everything(seed)  
  
        if config.save_memory:  
            model.low_vram_shift(is_diffusing=False)  
  
        cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ',' + a_prompt] * num_samples)]}  
        un_cond = {"c_concat": None if guess_mode else [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}  
        shape = (4, H // 8, W // 8)  
  
        if config.save_memory:  
            model.low_vram_shift(is_diffusing=True)  
  
        model.control_scales = [strength * (0.825 ** float(12 - i)) for i in range(13)] if guess_mode else ([strength] * 13)  # Magic number. IDK why. Perhaps because 0.825**12<0.01 but 0.826**12>0.01  
        samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,  
                                                     shape, cond, verbose=False, eta=eta,  
                                                     unconditional_guidance_scale=scale,  
                                                     unconditional_conditioning=un_cond)  
  
        if config.save_memory:  
            model.low_vram_shift(is_diffusing=False)  
  
        x_samples = model.decode_first_stage(samples)  
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)  
  
        results = [x_samples[i] for i in range(num_samples)]  
    return [255 - detected_map] + results  
  
  
block = gr.Blocks().queue()  
with block:  
    with gr.Row():  
        gr.Markdown("## Control Stable Diffusion with Canny Edge Maps")  
    with gr.Row():  
        with gr.Column():  
            input_image = gr.Image(source='upload', type="numpy")  
            prompt = gr.Textbox(label="Prompt")  
            run_button = gr.Button(label="Run")  
            with gr.Accordion("Advanced options", open=False):  
                num_samples = gr.Slider(label="Images", minimum=1, maximum=12, value=1, step=1)  
                image_resolution = gr.Slider(label="Image Resolution", minimum=256, maximum=768, value=512, step=64)  
                strength = gr.Slider(label="Control Strength", minimum=0.0, maximum=2.0, value=1.0, step=0.01)  
                guess_mode = gr.Checkbox(label='Guess Mode', value=False)  
                low_threshold = gr.Slider(label="Canny low threshold", minimum=1, maximum=255, value=100, step=1)  
                high_threshold = gr.Slider(label="Canny high threshold", minimum=1, maximum=255, value=200, step=1)  
                ddim_steps = gr.Slider(label="Steps", minimum=1, maximum=100, value=20, step=1)  
                scale = gr.Slider(label="Guidance Scale", minimum=0.1, maximum=30.0, value=9.0, step=0.1)  
                seed = gr.Slider(label="Seed", minimum=-1, maximum=2147483647, step=1, randomize=True)  
                eta = gr.Number(label="eta (DDIM)", value=0.0)  
                a_prompt = gr.Textbox(label="Added Prompt", value='best quality, extremely detailed')  
                n_prompt = gr.Textbox(label="Negative Prompt",  
                                      value='longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality')  
        with gr.Column():  
            result_gallery = gr.Gallery(label='Output', show_label=False, elem_id="gallery").style(grid=2, height='auto')  
    ips = [input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, ddim_steps, guess_mode, strength, scale, seed, eta, low_threshold, high_threshold]  
    run_button.click(fn=process, inputs=ips, outputs=[result_gallery])  
  
  
block.launch(server_name='0.0.0.0')
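Three small calculations inside process() are easy to miss in the full script: how the main and added prompts are joined, how the 13 control strengths are derived, and how the latent shape follows from the image size. The sketch below restates them as standalone functions so they can be tried without loading the model. The function names are made up for illustration, and the sample prompt is only an English rendering of the prompt described in the article, not necessarily the exact text that was typed in.

```python
def build_prompt(prompt, a_prompt):
    """Join the main prompt and the added prompt the same way process() does."""
    return prompt + ',' + a_prompt


def control_scales(strength, guess_mode):
    """Per-layer control strengths: a geometric ramp in guess mode, constant otherwise."""
    if guess_mode:
        return [strength * (0.825 ** float(12 - i)) for i in range(13)]
    return [strength] * 13


def latent_shape(height, width):
    """Shape handed to the sampler: 4 channels, spatial size divided by 8."""
    return (4, height // 8, width // 8)


if __name__ == "__main__":
    # Hypothetical prompt text; the added prompt is the default from the script.
    print(build_prompt("ruby, gold, oil painting", "best quality, extremely detailed"))
    print(latent_shape(512, 512))
    print([round(s, 3) for s in control_scales(1.0, guess_mode=True)])
```

In guess mode the first entry of the list is the smallest and the last entry equals the chosen strength; with guess mode off, all 13 entries equal the strength.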

Other models work as well, for example gradio_hed2image.py, which preserves many details of the input image and suits recoloring and stylization:

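Remember the AnimeGANv2 model? See "Uncanny Craftsmanship, Lifelike to the Last Detail: Quick Practice with AnimeGANv2+Ffmpeg, an Anime Style-Transfer Filter on the PyTorch Deep Learning Framework for M1 Mac (Images + Video)". Back then we could only convert images through a single fixed model filter; now, merely by changing the prompt, we can conjure up different filters at will. The development of AI technology is like a sea in heat, vast and surging.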

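“When will humans be replaced by AI?”

“Right now! This very day!”

Even if Da Vinci came back to life and Qi Baishi were reborn, they too would be shaken by today's AI technology: the unrestrained brushwork, the rising and falling forms, the swaying moods, the radiant, soaring spirit! At this moment in the long river of history, at this spot in the boundless world, it drives us wild!

Finally, here is the modified Python 3.10, CPU version of ControlNet, shared with everyone: https://github.com/zcxey2911/…_py3.10_cpu_NoConda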
