multimodal-agents-course
An MCP Multimodal AI Agent with eyes and ears!
<h1 align="center">Kubrick Course</h1>
<p align="center">
  <img alt="logo" src="static/hal_9000.png" width=100 />
  <h4 align="center">Hi Dave...</h4>
</p>
<hr>
<p align="center">
  <h4 align="center">Learn to build AI Agents that can understand images, text, audio and videos.</h4>
</p>
<p align="center">
  A <b>free, open-source</b> course by <a href="https://theneuralmaze.substack.com">The Neural Maze</a> and <a href="https://neuralbits.substack.com">Neural Bits</a>, in collaboration with <a href="https://github.com/pixeltable">Pixeltable</a> and <a href="https://github.com/comet-ml/opik">Opik</a>
</p>
<br>
<img alt="agent architecture" src="kubrick-api/static/agent_architecture.gif" width=1000 />
<br>

---

## 📖 About This Course

Tired of tutorials that just walk you through connecting an existing MCP server to Claude Desktop? Yeah, us too.

That's why we built **Kubrick AI**, an MCP Multimodal Agent for video-processing tasks. Yes, you read that right.

> 💡 Agents + Video Processing ... and MCP!

This course is a collaboration between The Neural Maze and Neural Bits (from now on, "The Neural Bros"), and it's built for developers who want to go beyond the basics and build serious, production-ready AI systems.
In particular, you'll:

* Learn how to build an MCP server for video processing using Pixeltable and FastMCP
* Design a custom, Groq-powered agent, connected to your MCP server through its own MCP client
* Integrate your agentic system with Opik for full observability and prompt versioning

## 🖊️ What You'll Learn

* How to use Pixeltable for multimodal data processing and stateful agents
* How to create complex MCP servers using FastMCP: expose resources, prompts, and tools
* How to apply prompt versioning to your MCP server (instead of defining the prompts in the Agent API)
* How to implement custom MCP clients for your agents
* How to implement an MCP Tool Agent from scratch, using Llama 4 Scout and Maverick as the LLMs
* How to use Opik for MCP prompt versioning
* How to implement custom tracing and monitoring with Opik

> 🚀 No shortcuts. No fluff. Let's learn by doing.

---

## 💻 What You'll Do

By completing this course, you'll learn how to design Agents that understand multimodal data (images, video, audio, and text) within a single system.

Specifically, you'll get to:

- Build a complex multimodal processing pipeline
- Build a video search engine and expose its functionality to an Agent via MCP (Model Context Protocol)
- Build a production-ready API to power the Agent
- Integrate LLMOps principles and software engineering best practices
- Learn about video, embeddings, streaming APIs, Vision Language Models (VLMs), and more

After completing this course, you'll have built your own Kubrick Agent, a HAL-themed spin-off that plays the role of a new set of eyes and ears:

<video src="https://github.com/user-attachments/assets/ef77c2a9-1a77-4f14-b2dd-e759c3f6db72"></video>

---

## Getting Started

Kubrick is **not** a simple tutorial, so there are a few things you need to do before you can get this system up and running. We have detailed the steps in this [GETTING_STARTED.md](GETTING_STARTED.md) file.
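To give a flavor of the pattern the course builds on, here is a minimal, dependency-free sketch of how an MCP-style server can register tools and answer a JSON-RPC `tools/call` request from a custom client. This is a simplified stand-in, not the course's actual code: the real Kubrick server uses FastMCP and Pixeltable, and the `clip_video` tool with its arguments is purely hypothetical.

```python
import json

# Registry of tools exposed by this toy MCP-style server.
TOOLS = {}

def tool(fn):
    """Register a function as a callable tool. FastMCP offers a similar
    @mcp.tool decorator; this hand-rolled version is for illustration only."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def clip_video(start: float, end: float) -> str:
    # Hypothetical tool: a real implementation would cut the video here.
    return f"clip from {start}s to {end}s"

def handle_request(raw: str) -> str:
    """Dispatch a JSON-RPC 2.0 'tools/call' request to the matching tool."""
    req = json.loads(raw)
    name = req["params"]["name"]
    args = req["params"].get("arguments", {})
    result = TOOLS[name](**args)
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# A custom MCP client (like the one the agent uses) would send something like:
request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "clip_video", "arguments": {"start": 3.0, "end": 7.5}},
})
print(handle_request(request))
```

The decorator-plus-dispatcher shape is the core idea; FastMCP layers schema generation, transport handling, and resource/prompt endpoints on top of it.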
> 💡 Having Kubrick running is just the first step! Now that it's up and running, it's time to actually understand how it works (see [Course Syllabus](#-course-syllabus)).

---

## Watch the Full Video Course

<p align="center">
  <a href="https://www.youtube.com/watch?v=_iYB1z1_Xgs&t=316s"><img src="static/video_thumbnail.png" alt="Kubrick Multimodal Agent" width="500"></a>
</p>

---

## 🧑‍🎓 Who Is This Course For?

You'll get the most out of this course by building it yourself, from the ground up. The course components are structured to cover key concepts and show how to build on them, ultimately leading to complete AI systems.

| Target Audience | Skills You'll Get |
| --------------- | ----------------- |
| ML/AI Engineers | Build complex MCP servers; learn to apply AI models to video, images, and speech. |
| Software Engineers | Learn to connect AI components with APIs, building end-to-end agentic applications. |
| Data Engineers/Scientists | Learn to design an AI system, managing video/audio/image data processing and structure. |

Regardless of your experience or title, this course aims to unpack complex topics in practical terms you can understand, learn, and apply, helping you build a complete AI system.

## 🎓 Prerequisites

Below are a few requirements and nice-to-haves that will improve your learning experience while taking this course.

| Category | Label | Description |
| -------- | ----- | ----------- |
| Programming Skills (Beginner) | Requirement | Understanding of programming in general and the Python language syntax. |
| AI/ML Concepts (Beginner) | Nice to Have | Understanding of the basic concepts behind AI, AI models, and AI systems. |
| LLMs, MCP, Agents | Nice to Have | Perfect if you know about them, not a requirement. |