<div align="center"> <picture> <img alt="Doctor Logo" src="doctor.png" height="30%" width="30%"> </picture> <br> <h2>🩺 Doctor</h2> [](https://github.com/sisig-ai/doctor) [](LICENSE.md) [](https://github.com/sisig-ai/doctor/actions/workflows/pytest.yml) [](https://codecov.io/gh/sisig-ai/doctor) A tool for discovering, crawl, and indexing web sites to be exposed as an MCP server for LLM agents for better and more up-to-date reasoning and code generation. </div> --- ### 🔍 Overview Doctor provides a complete stack for: - Crawling web pages using crawl4ai with hierarchy tracking - Chunking text with LangChain - Creating embeddings with OpenAI via litellm - Storing data in DuckDB with vector search support - Exposing search functionality via a FastAPI web service - Making these capabilities available to LLMs through an MCP server - Navigating crawled sites with hierarchical site maps --- ### 🏗️ Core Infrastructure #### 🗄️ DuckDB - Database for storing document data and embeddings with vector search capabilities - Managed by unified Database class #### 📨 Redis - Message broker for asynchronous task processing #### 🕸️ Crawl Worker - Processes crawl jobs - Chunks text - Creates embeddings #### 🌐 Web Server - FastAPI service exposing endpoints - Fetching, searching, and viewing data - Exposing the MCP server --- ### 💻 Setup #### ⚙️ Prerequisites - Docker and Docker Compose - Python 3.10+ - uv (Python package manager) - OpenAI API key #### 📦 Installation 1. Clone this repository 2. Set up environment variables: ``` export OPENAI_API_KEY=your-openai-key ``` 3. Run the stack: ``` docker compose up ``` --- ### 👁 Usage 1. Go to http://localhost:9111/docs to see the OpenAPI docs 2. Look for the `/fetch_url` endpoint and start a crawl job by providing a URL 3. Use `/job_progress` to see the current job status 4. Configure your editor to use `http://localhost:9111/mcp` as an MCP server --- ### ☁️ Web API #### Core Endpoints - `POST /fetch_url`: Start crawling a URL - `GET /search_docs`: Search indexed documents - `GET /job_progress`: Check crawl job progress - `GET /list_doc_pages`: List indexed pages - `GET /get_doc_page`: Get full text of a page #### Site Map Feature The Maps feature provides a hierarchical view of crawled websites, making it easy to navigate and explore the structure of indexed sites. **Endpoints:** - `GET /map`: View an index of all crawled sites - `GET /map/site/{root_page_id}`: View the hierarchical tree structure of a specific site - `GET /map/page/{page_id}`: View a specific page with navigation (parent, siblings, children) - `GET /map/page/{page_id}/raw`: Get the raw markdown content of a page **Features:** - **Hierarchical Navigation**: Pages maintain parent-child relationships, allowing you to navigate through the site structure - **Domain Grouping**: Pages from the same domain crawled individually are automatically grouped together - **Automatic Title Extraction**: Page titles are extracted from HTML or markdown content - **Breadcrumb Navigation**: Easy navigation with breadcrumbs showing the path from root to current page - **Sibling Navigation**: Quick access to pages at the same level in the hierarchy - **Legacy Page Support**: Pages crawled before hierarchy tracking are grouped by domain for easy access - **No JavaScript Required**: All navigation works with pure HTML and CSS for maximum compatibility **Usage Example:** 1. Crawl a website using the `/fetch_url` endpoint 2. Visit `/map` to see all crawled sites 3. Click on a site to view its hierarchical structure 4. 
---

### 🔧 MCP Integration

Ensure that your Docker Compose stack is up, and then add to your Cursor or VSCode MCP Servers configuration:

```json
"doctor": {
  "type": "sse",
  "url": "http://localhost:9111/mcp"
}
```

---

### 🧪 Testing

#### Running Tests

To run all tests:

```bash
# Run all tests with coverage report
pytest
```

To run specific test categories:

```bash
# Run only unit tests
pytest -m unit

# Run only async tests
pytest -m async_test

# Run tests for a specific component
pytest tests/lib/test_crawler.py
```

#### Test Coverage

The project is configured to generate coverage reports automatically:

```bash
# Run tests with detailed coverage report
pytest --cov=src --cov-report=term-missing
```

#### Test Structure

- `tests/conftest.py`: Common fixtures for all tests
- `tests/lib/`: Tests for library components
  - `test_crawler.py`: Tests for the crawler module
  - `test_crawler_enha
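Beyond the pytest suite, the MCP endpoint itself can be smoke-tested with a small client. Here is a minimal sketch using the MCP Python SDK's SSE client (`pip install mcp`); the SDK usage shown is an assumption, and any SSE-capable MCP client should work equally well:

```python
# Hypothetical smoke test for the Doctor MCP endpoint using the MCP Python SDK.
# Assumes the Docker stack is running and exposes SSE at /mcp (see config above).
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main() -> None:
    async with sse_client("http://localhost:9111/mcp") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # Expect the search/list/get document tools exposed by Doctor.
            print([tool.name for tool in tools.tools])


asyncio.run(main())
```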