From aaf9cb90baa5a15dbd9d9b65f56c1cfcbc13d437 Mon Sep 17 00:00:00 2001
From: Peng Liu <pliutd7308@gmail.com>
Date: Tue, 8 Apr 2025 20:11:28 +0800
Subject: [PATCH] =?UTF-8?q?Update=20=E2=80=983.2=20Vision=20Foundation=20M?=
 =?UTF-8?q?odels=E2=80=99=20=20and=20=20=E2=80=983.8=20Multimodal=20Models?=
 =?UTF-8?q?=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/README.md b/README.md
index 03403b1..b82aa89 100644
--- a/README.md
+++ b/README.md
@@ -186,6 +186,8 @@
 
 * Grounding-DINO: [repo](https://github.com/IDEA-Research/GroundingDINO), [在线尝试](https://deepdataspace.com/playground/grounding_dino), **这个DINO与上面Meta的DINO没有关系**, 是一个由IDEA研究院(做了很多不错开源项目的机构)开发集成的图像目标检测的框架, 很多时候需要对目标物体进行检测的时候可以考虑使用。
 
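+* OmDet-Turbo: [repo](https://github.com/om-ai-lab/OmDet), an open-source research project from OmAI Lab that provides open-vocabulary object detection (OVD). Its main strength is very fast inference (100+ FPS), which suits custom object detection scenarios that need a high frame rate.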
+
 * Grounded-SAM: [repo](https://github.com/IDEA-Research/Grounded-SAM-2), 比Grounding-DINO多了一个分割功能, 也就是支持检测后分割, 也有很多下游应用, 具体可以翻一下README。
 
 * FoundationPose: [website](https://github.com/NVlabs/FoundationPose), 来自Nvidia的研究, 物体姿态追踪模型。
@@ -456,6 +458,7 @@ CS231n (斯坦福计算机视觉课程): [website](https://cs231n.stanford.edu/s
 * 最经典的工作CLIP: [知乎](https://zhuanlan.zhihu.com/p/493489688)<br>
 * 多模态大语言模型的经典工作 LLaVA: [website](https://llava-vl.github.io/)<br>
 * 多模态生成模型综述: [pdf](https://arxiv.org/pdf/2503.04641)<br>
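+* Reinforcement learning for multimodal large language models, VLM-R1: [repo](https://github.com/om-ai-lab/VLM-R1), an open-source project from OmAI Lab that applies DeepSeek R1-style reinforcement learning to multimodal large language models. It optimizes the model with the GRPO algorithm, is reported to outperform conventional SFT, and offers a new direction for training embodied AI models.<br>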
 
 <section id="navigation"></section>