散装码农 (Sanzhuang Manong): Excellent Open-Source Projects

AI (Artificial Intelligence)
2025-01-03

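Knowledge collection for development with the Gradio framework

Dify: an open-source LLM application development platform (a counterpart to Coze)

Step-by-step tutorial: build an agent-based chat-style data query app with Dify in 15 minutes

AI tool roundup, part 1

AI tool roundup, part 2

InstantID: generate portraits in various styles from a single photo

Open-source Three Kingdoms, Gemma: can Google's strongest open model, Gemma, hold its own?

Fay: an open-source digital human framework

supervision: reusable computer vision tools

suno-ai/bark: text-to-speech with multilingual support

coqui/xtts: the strongest voice-cloning TTS

TencentARC/PhotoMaker: generate new photos of a person from a set of photos

FishAudio: TTS close to a human voice

Omost: expand simple prompts into detailed, precise prompts

ChatTTS-ui: synthesize text into speech, with an external API

LivePortrait: Kuaishou's latest facial video AI model

Unique3D: high-quality, efficient 3D mesh generation from a single image

EchoMimic: digital human

DiffMorpher: one-click smooth morphing video between two images

Pinokio: one-click install for all open-source AI apps, a boon for programming beginners

FLUX.1: official inference repo for the FLUX.1 models

Deep-Live-Cam: real-time face swap with a single image

Open-source, offline-capable AI productivity tool that protects data privacy and is feature-rich

IOPaint | A very handy all-purpose AI tool: matting, erasing, outpainting, shape replacement, context awareness, and more

FireRedTTS bundle, 1021, official model update, long-text support

VideoChat: real-time voice-interactive digital human

Llama-3.3-70B arrives! 70B parameters, 128k context, performance close to GPT-4

QVQ-72B: an open-source image-to-text model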

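Dify: an open-source LLM application development platform

View source code

Dify is an open-source LLM application development platform. Its intuitive interface combines AI workflows, RAG pipelines, agent capabilities, model management, observability features, and more, letting you move quickly from prototype to production.

Step-by-step tutorial: build an agent-based chat-style data query app with Dify in 15 minutes

Try it online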

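InstantID: generate portraits in various styles from a single photo

InstantID produces striking results: it generates high-fidelity personalized images across many styles, such as cartoon, oil painting, sketch, anime, and game art. The user only needs to supply one face image and a text description.

InstantID's contribution has three main parts:

Face feature extraction: InstantID uses a pretrained face encoder, such as InsightFace's antelopev2 model, to extract strongly semantic face features and improve the semantic accuracy of generation. This helps the diffusion model recognize and preserve facial details such as the eyes, nose, and mouth.

Cross-attention mechanism: InstantID embeds the face features as an image prompt through a decoupled cross-attention mechanism, which strengthens the text prompt while retaining fine-grained control over the generated image. This lets the diffusion model adjust the image style (color, lighting, background, and so on) according to the text.

IdentityNet: InstantID introduces IdentityNet to encode the face image under strong semantic and weak spatial conditioning, further improving ID fidelity. IdentityNet is a pluggable module that is compatible with any pretrained text-to-image diffusion model, with no retraining required.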
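The decoupled cross-attention idea can be illustrated with a small pure-Python sketch. It assumes the common formulation in which the same queries attend separately to text tokens and to image (face) tokens, and the two results are summed with a weight on the image branch. The function names, the `image_scale` parameter, and the use of already-projected keys and values are illustrative assumptions, not InstantID's actual code.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain Python lists."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def decoupled_cross_attention(queries, text_kv, image_kv, image_scale=1.0):
    """Attend to text and image tokens separately, then sum the two outputs.

    text_kv and image_kv are (keys, values) pairs. In a real model each
    branch would have its own learned key/value projections (assumption).
    """
    text_out = attention(queries, *text_kv)
    image_out = attention(queries, *image_kv)
    return [
        [t + image_scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, image_out)
    ]


queries = [[1.0, 0.0]]
text_kv = ([[1.0, 0.0]], [[2.0, 0.0]])   # one text token
image_kv = ([[0.0, 1.0]], [[0.0, 4.0]])  # one face-embedding token
print(decoupled_cross_attention(queries, text_kv, image_kv, image_scale=0.5))
```

Setting `image_scale` to 0 reduces the result to ordinary text-only cross-attention, which is why the image branch can be treated as an add-on to an existing text-to-image model.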

Project page:

https://instantid.github.io/

Code:

https://github.com/InstantID/InstantID

Demo:

https://huggingface.co/spaces/InstantX/InstantID

Note: if you want a photorealistic result, set the style to "no style".

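Open-source Three Kingdoms, Gemma: can Google's strongest open model, Gemma, hold its own?

Fay: an open-source digital human framework

Go to the software homepage

Go to the integration tutorial

supervision: reusable computer vision tools

Go to the project homepage

Sample code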

import cv2

import supervision as sv

from ultralytics import YOLO

image = cv2.imread(...)

model = YOLO('yolov8s.pt')

result = model(image)[0]

detections = sv.Detections.from_ultralytics(result)

len(detections)

import supervision as sv

dataset = sv.DetectionDataset.from_yolo(
    images_directory_path=...,
    annotations_directory_path=...,
    data_yaml_path=...
)

dataset.classes
# ['dog', 'person']

len(dataset)

suno-ai/bark: text-to-speech with multilingual support

Go to the GitHub repository

Go to the Hugging Face demo

Basic usage

from bark import SAMPLE_RATE, generate_audio, preload_models

from scipy.io.wavfile import write as write_wav

from IPython.display import Audio

# download and load all models

preload_models()

# generate audio from text

text_prompt = """

Hello, my name is Suno. And, uh — and I like pizza. [laughs]

But I also have other interests such as playing tic tac toe.

"""

audio_array = generate_audio(text_prompt, history_prompt="v2/en_speaker_1")

# save audio to disk

write_wav("bark_generation.wav", SAMPLE_RATE, audio_array)

# play text in notebook

Audio(audio_array, rate=SAMPLE_RATE)
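When a floating-point array is passed to scipy's `write_wav`, the file is stored as a floating-point WAV, which some players may not handle. The sketch below is an optional alternative that writes 16-bit PCM using only the standard library. It assumes the samples are floats in the range [-1.0, 1.0] and mono; `float_to_pcm16` and `write_pcm16_wav` are hypothetical helper names, not part of bark.

```python
import struct
import wave


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit signed integers, clipping outliers."""
    pcm = []
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))
        pcm.append(int(round(s * 32767)))
    return pcm


def write_pcm16_wav(path, samples, sample_rate):
    """Write mono 16-bit PCM audio to `path`."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono (assumption)
        wav_file.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

With the variables from the snippet above, a call would look like `write_pcm16_wav("bark_generation.wav", audio_array, SAMPLE_RATE)`.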

Command line

python -m bark --text "Hello, my name is Suno." --output_filename "example.wav"

Installation (either of the following):

pip install git+https://github.com/suno-ai/bark.git

or

git clone https://github.com/suno-ai/bark

cd bark && pip install .

Usage via Transformers

pip install git+https://github.com/huggingface/transformers.git

from transformers import AutoProcessor, BarkModel

from IPython.display import Audio

import scipy

processor = AutoProcessor.from_pretrained("suno/bark")

model = BarkModel.from_pretrained("suno/bark")

voice_preset = "v2/en_speaker_6"

inputs = processor("Hello, my dog is cute", voice_preset=voice_preset)

audio_array = model.generate(**inputs)

audio_array = audio_array.cpu().numpy().squeeze()

# play in notebook

sample_rate = model.generation_config.sample_rate

Audio(audio_array, rate=sample_rate)

# write to file

sample_rate = model.generation_config.sample_rate

scipy.io.wavfile.write("bark_out.wav", rate=sample_rate, data=audio_array)

coqui/xtts: the strongest voice-cloning TTS

Go to the demo

Go to the GitHub repository

If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option.

pip install TTS

If you plan to code or train models, clone 🐸TTS and install it locally.

git clone https://github.com/coqui-ai/TTS

cd TTS

pip install -e .[all,dev,notebooks] # Select the relevant extras

Running a multi-speaker and multi-lingual model

import torch

from TTS.api import TTS

# Get device

device = "cuda" if torch.cuda.is_available() else "cpu"

# List available 🐸TTS models

print(TTS().list_models())

# Init TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Run TTS

# ❗ Since this model is multi-lingual voice cloning model, we must set the target speaker_wav and language

# Text to speech list of amplitude values as output

wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en")

# Text to speech to a file

tts.tts_to_file(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")

Running a single speaker model

# Init TTS with the target model name

tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False).to(device)

# Run TTS

tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path=OUTPUT_PATH)

# Example voice cloning with YourTTS in English, French and Portuguese

tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False).to(device)

tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")

tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr-fr", file_path="output.wav")

tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt-br", file_path="output.wav")
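Note that the three YourTTS calls above all write to `output.wav`, so each call overwrites the previous result. A minimal sketch of one way to keep every language's output is shown below. The helper name `clone_voice_per_language` and the `prefix` parameter are hypothetical; the only library call used is `tts_to_file`, with the same arguments as in the example above.

```python
def clone_voice_per_language(tts, jobs, speaker_wav, prefix="output"):
    """Run tts.tts_to_file once per (text, language) job, one file per language.

    `tts` is expected to be an initialized TTS object such as the YourTTS
    instance created above. Returns the list of files written.
    """
    written = []
    for text, language in jobs:
        file_path = f"{prefix}_{language}.wav"
        tts.tts_to_file(text, speaker_wav=speaker_wav, language=language, file_path=file_path)
        written.append(file_path)
    return written


jobs = [
    ("This is voice cloning.", "en"),
    ("C'est le clonage de la voix.", "fr-fr"),
    ("Isso é clonagem de voz.", "pt-br"),
]
# With a real model loaded:
# clone_voice_per_language(tts, jobs, speaker_wav="my/cloning/audio.wav")
```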

Converting the voice in source_wav to the voice of target_wav

tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")

tts.voice_conversion_to_file(source_wav="my/source.wav", target_wav="my/target.wav", file_path="output.wav")

You can clone voices by using any model in 🐸TTS.

tts = TTS("tts_models/de/thorsten/tacotron2-DDC")

tts.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav="target/speaker.wav",
    file_path="output.wav"
)

Text to speech using Fairseq models in ~1100 languages

# TTS with on the fly voice conversion

api = TTS("tts_models/deu/fairseq/vits")

api.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav="target/speaker.wav",
    file_path="output.wav"
)

Command-line tts

List provided models:

$ tts --list_models

Query by type/name: --model_info_by_name takes the model name exactly as it appears in the --list_models output.

$ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"

For example:

$ tts --model_info_by_name tts_models/tr/common-voice/glow-tts

$ tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2

Query by type/idx: The model_query_idx uses the corresponding idx from --list_models.

$ tts --model_info_by_idx "<model_type>/<model_query_idx>"

For example:

$ tts --model_info_by_idx tts_models/3

Query info for model info by full name:

$ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"

Run TTS with default models:

$ tts --text "Text for TTS" --out_path output/path/speech.wav

Run TTS and pipe out the generated TTS wav file data:

$ tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay

Run a TTS model with its default vocoder model:

$ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav

For example:

$ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --out_path output/path/speech.wav

Run with specific TTS and vocoder models from the list:

$ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav

For example:

$ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --vocoder_name "vocoder_models/en/ljspeech/univnet" --out_path output/path/speech.wav
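When these commands need to be generated from a script, the argument list can be assembled in Python. The sketch below uses only the flags shown above (`--text`, `--model_name`, `--vocoder_name`, `--out_path`); the helper name `build_tts_command` is hypothetical, and running the result assumes the `tts` executable is on your PATH.

```python
import shlex


def build_tts_command(text, out_path, model_name=None, vocoder_name=None):
    """Assemble the argument list for the `tts` command-line tool."""
    cmd = ["tts", "--text", text]
    if model_name:
        cmd += ["--model_name", model_name]
    if vocoder_name:
        cmd += ["--vocoder_name", vocoder_name]
    cmd += ["--out_path", out_path]
    return cmd


cmd = build_tts_command(
    "Text for TTS",
    "output/path/speech.wav",
    model_name="tts_models/en/ljspeech/glow-tts",
    vocoder_name="vocoder_models/en/ljspeech/univnet",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems when the text contains spaces or quotes.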

Run your own TTS model (Using Griffin-Lim Vocoder):

$ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav

Run your own TTS and Vocoder models:

$ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav --vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json

Multi-speaker Models

List the available speakers and choose a <speaker_id> among them:

$ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs

Run the multi-speaker TTS model with the target speaker ID:

$ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>

Run your own multi-speaker TTS model:

$ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/model.pth --config_path path/to/config.json --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>

Voice Conversion Models

$ tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --source_wav <path/to/speaker/wav> --target_wav <path/to/reference/wav>

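TencentARC/PhotoMaker: generate new photos of a person from a set of photos

Go to the demo

Go to the source code

FishAudio: TTS close to a human voice

GitHub: https://github.com/fishaudio/fish-speech

Online demo: https://fish.audio/zh-CN/

Omost: expand simple prompts into detailed, precise prompts

Go to the source code

Go to the demo site

ChatTTS-ui: synthesize text into speech, with an external API

A simple local web interface that uses ChatTTS to synthesize text into speech and also exposes an API for external callers.

Go to the source page

LivePortrait: Kuaishou's latest facial video AI model

View source code

Online demo 1

Online demo 2

1. Clone the code and prepare the environment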

git clone https://github.com/KwaiVGI/LivePortrait

cd LivePortrait

# create env using conda

conda create -n LivePortrait python==3.9

conda activate LivePortrait

# install dependencies with pip (for Linux and Windows)

pip install -r requirements.txt

# for macOS with Apple Silicon

pip install -r requirements_macOS.txt

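Note: make sure FFmpeg is installed on your system, including both ffmpeg and ffprobe!

2. Download the pretrained weights

The easiest way to download the pretrained weights is from HuggingFace: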

# first, ensure git-lfs is installed, see: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage

git lfs install

# clone and move the weights

git clone https://huggingface.co/KwaiVGI/LivePortrait temp_pretrained_weights

mv temp_pretrained_weights/* pretrained_weights/

rm -rf temp_pretrained_weights

Alternatively, you can download all pretrained weights from Google Drive or Baidu Cloud. Unzip them and place them in ./pretrained_weights.

Make sure the directory structure is as follows, or contains:

pretrained_weights
├── insightface
│   └── models
│       └── buffalo_l
│           ├── 2d106det.onnx
│           └── det_10g.onnx
└── liveportrait
    ├── base_models
    │   ├── appearance_feature_extractor.pth
    │   ├── motion_extractor.pth
    │   ├── spade_generator.pth
    │   └── warping_module.pth
    ├── landmark.onnx
    └── retargeting_models
        └── stitching_retargeting_module.pth
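Before running inference, it can help to check that every file in the layout listed above is actually present. The following is a small sketch using only the standard library; the helper name `missing_weights` is hypothetical, and the file list simply mirrors the directory structure shown above.

```python
from pathlib import Path

# Relative paths taken from the directory layout listed above.
EXPECTED_WEIGHTS = [
    "insightface/models/buffalo_l/2d106det.onnx",
    "insightface/models/buffalo_l/det_10g.onnx",
    "liveportrait/base_models/appearance_feature_extractor.pth",
    "liveportrait/base_models/motion_extractor.pth",
    "liveportrait/base_models/spade_generator.pth",
    "liveportrait/base_models/warping_module.pth",
    "liveportrait/landmark.onnx",
    "liveportrait/retargeting_models/stitching_retargeting_module.pth",
]


def missing_weights(root="pretrained_weights"):
    """Return the expected weight files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_WEIGHTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected weight files are present.")
```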

3. Inference

Quick start

# For Linux and Windows

python inference.py

# For macOS with Apple Silicon (Intel is not supported); this may be 20x slower than an RTX 4090

PYTORCH_ENABLE_MPS_FALLBACK=1 python inference.py

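If the script runs successfully, you will get an output mp4 file named animations/s6--d0_concat.mp4. This file contains the driving video, the input image, and the generated result.

Or, you can change the input by specifying the -s and -d arguments: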

python inference.py -s assets/examples/source/s9.jpg -d assets/examples/driving/d0.mp4

# disable pasting back to run faster

python inference.py -s assets/examples/source/s9.jpg -d assets/examples/driving/d0.mp4 --no_flag_pasteback

# more options to see

python inference.py -h

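Driving video auto-cropping

📕 To use your own driving video, we recommend:

Crop it to a 1:1 aspect ratio (for example 512x512 or 256x256 pixels), or enable auto-cropping with --flag_crop_driving_video.

Focus on the head area, similar to the example videos.

Minimize shoulder movement.

Make sure the first frame of the driving video is a frontal face with a neutral expression.

Here is an auto-cropping example using --flag_crop_driving_video: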

python inference.py -s assets/examples/source/s9.jpg -d assets/examples/driving/d13.mp4 --flag_crop_driving_video

If the auto-cropping result is not good, you can adjust the scale and offset with the --scale_crop_video and --vy_ratio_crop_video options, or crop manually.
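For manual cropping to a 1:1 aspect ratio, the crop box can be computed from the frame size. The sketch below is a plain geometric center crop, not the project's own auto-cropping logic; the function names are hypothetical. It also formats the box as an ffmpeg `crop=w:h:x:y` filter followed by a resize to 512x512, one of the sizes suggested above.

```python
def center_square_crop(width, height):
    """Return (x, y, side) of the largest centered square inside the frame."""
    side = min(width, height)
    x = (width - side) // 2
    y = (height - side) // 2
    return x, y, side


def ffmpeg_crop_filter(width, height, target=512):
    """Build an ffmpeg video filter string: center crop, then scale to target."""
    x, y, side = center_square_crop(width, height)
    return f"crop={side}:{side}:{x}:{y},scale={target}:{target}"


# Example for a 1920x1080 driving video.
print(center_square_crop(1920, 1080))
print(ffmpeg_crop_filter(1920, 1080))
```

Because a center crop ignores where the head is, it only suits footage in which the face is already roughly centered.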

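Motion template making

You can also use the auto-generated motion template files ending in .pkl to speed up inference and to protect privacy, for example: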

python inference.py -s assets/examples/source/s9.jpg -d assets/examples/driving/d5.pkl

Discover more interesting results on our homepage 😊

4. Gradio interface

We also provide a Gradio interface for a better experience. Just run:

# For Linux and Windows:

python app.py

# For macOS with Apple Silicon (Intel is not supported); this may be 20x slower than an RTX 4090

PYTORCH_ENABLE_MPS_FALLBACK=1 python app.py

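You can specify the --server_port, --share, and --server_name arguments to suit your needs!

🚀 We also provide an acceleration option, --flag_do_torch_compile. The first inference triggers an optimization process (about one minute), which makes subsequent inferences 20-30% faster. The performance gain may vary with the CUDA version.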

# enable torch.compile for faster inference

python app.py --flag_do_torch_compile
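Whether the one-time compile cost pays off depends on how many inferences you run. A rough back-of-the-envelope sketch follows. It interprets "N% faster" as throughput multiplied by (1 + N/100), takes the compile overhead as 60 seconds (the "about one minute" above), and uses a purely hypothetical 10-second baseline per inference; none of these numbers are measurements.

```python
import math


def break_even_runs(baseline_seconds, speedup, compile_overhead_seconds=60.0):
    """Number of runs after which compiling has saved more time than it cost.

    speedup is a fraction, e.g. 0.25 for "25% faster".
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    compiled_seconds = baseline_seconds / (1.0 + speedup)
    saved_per_run = baseline_seconds - compiled_seconds
    return math.ceil(compile_overhead_seconds / saved_per_run)


# Hypothetical 10 s baseline and a 25% speedup (within the 20-30% range above).
print(break_even_runs(10.0, 0.25))
```

For a single short animation the flag may cost more time than it saves; for a long interactive Gradio session it is more likely to be worthwhile.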

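Unique3D: high-quality, efficient 3D mesh generation from a single image

EchoMimic: digital human

EchoMimic bundle: https://pan.quark.cn/s/9bba5f7f7167

DiffMorpher: one-click smooth morphing video between two images

Source: https://github.com/Kevin-thu/DiffMorpher

One-click launcher package: https://aiyy.info/diffmorpher/

Try it online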

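Pinokio: one-click install for all open-source AI apps, a boon for programming beginners

View the original article

View source code

Go to the software homepage

Pinokio brings together almost every open-source AI tool on the market: large language models, AI image generation, AI speech, AI video, AI music, and other AI tools can all be found here.

Its biggest highlight is that these AI apps can all be installed with a single click. That is why I call Pinokio a boon for programming beginners: you do not need to understand programming or write code, you only need to know how to use a computer.

It is also worth picking up for experienced programmers. Deploying an open-source AI tool locally used to mean cloning the project, installing and fixing dependencies, then running and debugging, step by step, which was tedious. Now a single click is enough.

One-click installation of AI apps

To be honest, as a programmer, I was skeptical the first time I heard the phrase "one-click install".

After all, I had suffered plenty when installing AI apps such as Stable Diffusion, ComfyUI, and TTS: environment setup, command-line work, and an endless stream of error messages were a real headache. Pinokio completely changed my mind.

I remember that when I first tried to install Stable Diffusion Web UI, I still ran into all sorts of baffling errors even though I followed the official documentation step by step. I was on a Mac, and unlike on Windows, there was no expert like 秋叶 (Qiuye) packaging a launcher that runs with one click.

During that period I really wanted to smash my computer. Even once it was installed, small problems kept cropping up!

Pinokio supports all desktop platforms (no mobile): Windows, Mac, and Linux.

You can download it directly from the GitHub project. For readers with poor network conditions, the editor also provides the downloaded files at the end of the article!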

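How to install Pinokio

Official site: https://pinokio.computer/

GitHub: https://github.com/pinokiocomputer/pinokio

Installation option 1: download directly from the official site

Click "Download"

Installation option 2: download from the GitHub Releases page

Once the installer is downloaded, double-click it to install (I did this on a Mac).

Tip: if, after installing on a Mac, opening Pinokio shows "the file is damaged and should be moved to the Trash", run the following command to fix it: xattr -rd com.apple.quarantine /Applications/Pinokio.app

Pinokio interface overview

The first time you launch Pinokio, it shows a settings screen. You need to set Pinokio's Home directory, which is the main directory where AI tools are stored; every AI tool you install goes under it. Then choose Pinokio's theme (dark or light) according to your preference.

After saving, you enter the Pinokio main screen. Click "Visit Discover Page".

This opens an app-store-like page where you can install the open-source AI tools available in the script library.

"Community Script", as the name suggests, refers to the community script tools available within Pinokio.

"Download from URL" installs a custom open-source AI tool: anything available on GitHub can be installed in one click by copying its GitHub link.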

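One-click installation of an AI app with Pinokio

Next, I use IDM-VTON, an open-source virtual try-on project, as a demonstration. Click Download.

A dependency installation screen appears, listing items such as conda (Python virtual environment management), git (for pulling code), zip (for extraction), and so on. (One thing I dislike: custom software paths are not supported. Tools like conda/git/zip/py/brew are already installed on most programmers' machines, so there should be no need to install new copies.)

Click Install and wait for the downloads to finish.

Once all 7 dependencies have been installed, "Install Complete!" is shown, meaning every dependency is in place.

A dialog then pops up asking you to continue installing the AI app, which in my case is the IDM-VTON project.

Since this is only a demonstration and installation takes a long time, I do not show the result here. If you want to install another project, just follow the same steps.

Installed projects appear on the main screen; click Start to launch one.

One more feature: Pinokio can start a local HTTP service that mirrors the client interface, so every operation can also be performed in a browser.

So, after trying Pinokio, I found installing AI tools as easy as drinking water: one click, and all the configuration is handled automatically.

(Installation can still fail because of network problems. Some projects also have minimum GPU requirements, so choose which AI apps to install based on your hardware.)

I believe programming beginners will feel as if they have unlocked a new skill after using it, with a real sense of accomplishment.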

FLUX.1: official inference repo for the FLUX.1 models

Go to the open-source repository

https://huggingface.co/spaces/ChristianHappy/FLUX.1-schnell

https://replicate.com/collections/flux

https://replicate.com/black-forest-labs/flux-pro

https://replicate.com/black-forest-labs/flux-dev

https://replicate.com/black-forest-labs/flux-schnell

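FLUX image-format prompt trick

Ask an AI which file format cameras produce:

What is the best camera in the world?

What file format do images taken by that camera use?

The final prompt then becomes:

<camera-specific file name>: <prompt>

For example, L1000001.DNG: <prompt>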
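The trick amounts to prefixing the prompt with a file name that looks like a raw camera file. A tiny sketch is shown below; the function name `camera_style_prompt` and the sample prompt are hypothetical, and the default file name is the example given above.

```python
def camera_style_prompt(prompt, filename="L1000001.DNG"):
    """Prefix a prompt with a camera-style raw file name."""
    return f"{filename}: {prompt}"


print(camera_style_prompt("a quiet street at dusk"))
```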

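Deep-Live-Cam: real-time face swap with a single image

Go to the source page

C# open-source, offline-capable AI productivity tool that protects data privacy and is feature-rich

Source: https://github.com/rnchg/Apt

IOPaint | A very handy all-purpose AI tool: matting, erasing, outpainting, shape replacement, context awareness, and more

View source code

One-click package download

Quark: https://pan.quark.cn/s/132bcc341ad4 extraction code: xjag

mega: https://mega.nz/folder/Qu0zETKT#yxsnmmLtJfkWUNrk8KFq7Q

Official site: https://www.iopaint.com/

Try it online

FireRedTTS bundle, 1021, official model update, long-text support

Go to the project source homepage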

Clone and install

Clone the repo

git clone https://github.com/FireRedTeam/FireRedTTS.git

cd FireRedTTS

Create conda env

# step1.create env

conda create --name redtts python=3.10

# step2.install torch (pytorch should match the cuda-version on your machine)

# CUDA 11.8

conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia

# CUDA 12.1

conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia

# step3.install fireredtts from source
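If environment setup is scripted, the choice between the two conda commands above can be made from the CUDA version. The sketch below only reproduces the two variants listed above (CUDA 11.8 and 12.1) and raises an error for anything else; the helper name `conda_install_command` is hypothetical.

```python
# Version pins copied from the install commands above.
PYTORCH_PINS = "pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1"
SUPPORTED_CUDA = ("11.8", "12.1")


def conda_install_command(cuda_version):
    """Return the conda install command for one of the listed CUDA versions."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(
            f"No command listed for CUDA {cuda_version}; expected one of {SUPPORTED_CUDA}"
        )
    return (
        f"conda install {PYTORCH_PINS} "
        f"pytorch-cuda={cuda_version} -c pytorch -c nvidia"
    )


print(conda_install_command("12.1"))
```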

Download models

Download the required model files from Model_Lists and place them in the folder pretrained_models

Basic Usage

import os

import torchaudio

from fireredtts.fireredtts import FireRedTTS

tts = FireRedTTS(
    config_path="configs/config_24k.json",
    pretrained_path=<pretrained_models_dir>,
)

# same language
rec_wavs = tts.synthesize(
    prompt_wav="examples/prompt_1.wav",
    text="小红书，是中国大陆的网络购物和社交平台，成立于二零一三年六月。",
)

rec_wavs = rec_wavs.detach().cpu()

out_wav_path = os.path.join("./example.wav")

torchaudio.save(out_wav_path, rec_wavs, 24000)

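VideoChat: real-time voice-interactive digital human

A real-time voice-interactive digital human that supports an end-to-end voice pipeline (GLM-4-Voice - THG) and a cascaded pipeline (ASR-LLM-TTS-THG). Appearance and voice are customizable, voice cloning is supported, and first-packet latency is as low as 3 s.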
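The difference between the two pipelines can be sketched as function composition. In the sketch below every stage is a hypothetical placeholder callable (the stubs just wrap strings); it illustrates the data flow only and is not VideoChat's actual API.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]


def make_cascade(asr: Stage, llm: Stage, tts: Stage, thg: Stage) -> Stage:
    """Cascaded pipeline: speech -> text -> reply text -> reply speech -> video."""
    def respond(user_audio):
        user_text = asr(user_audio)
        reply_text = llm(user_text)
        reply_audio = tts(reply_text)
        return thg(reply_audio)
    return respond


def make_end_to_end(voice_model: Stage, thg: Stage) -> Stage:
    """End-to-end pipeline: a single speech-to-speech model, then video."""
    def respond(user_audio):
        return thg(voice_model(user_audio))
    return respond


# Stub stages that only label their input, to show the order of the stages.
cascade = make_cascade(
    asr=lambda audio: f"text({audio})",
    llm=lambda text: f"reply({text})",
    tts=lambda text: f"speech({text})",
    thg=lambda audio: f"video({audio})",
)
print(cascade("mic"))
```

The cascade lets each stage be swapped independently (for example a different TTS engine), while the end-to-end variant has fewer hand-offs between models.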

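Online demo: https://www.modelscope.cn/studios/AI-ModelScope/video_chat

TODO:

Add voice cloning to the TTS module

Add edge-tts to the TTS module

Add local Qwen inference to the LLM module

Support GLM-4-Voice, providing both ASR-LLM-TTS-THG and MLLM-THG generation modes

Integrate vLLM inference acceleration for GLM-4-Voice

Integrate gradio-webrtc (pending support for audio-video synchronization) to improve video stream stability

Llama-3.3-70B arrives! 70B parameters, 128k context, performance close to GPT-4

QVQ-72B: an open-source image-to-text model

Go to the demo

Go to the model page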
