You can integrate Runpod with any system that supports custom endpoint configuration. Integration is usually straightforward: any library or framework that accepts a custom base URL for API calls will work with Runpod without specialized adapters or connectors. This means you can use Runpod with tools like n8n, CrewAI, LangChain, and many others by simply pointing them to your Runpod endpoint URL.

Endpoint integration options

Runpod offers four deployment options for endpoint integrations:

Public Endpoints

Public Endpoints are pre-deployed AI models that you can use without setting up your own Serverless endpoint. They’re vLLM-compatible and return OpenAI-compatible responses, so you can get started quickly or test things out without deploying infrastructure. The following Public Endpoint URLs are available for OpenAI-compatible models:
# Public Endpoint for Qwen3 32B AWQ
https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1

# Public Endpoint for IBM Granite 4.0 H Small
https://api.runpod.ai/v2/granite-4-0-h-small/openai/v1
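
For example, here's a minimal sketch that calls the Qwen3 32B AWQ Public Endpoint with the official openai Python SDK. The model identifier passed to model is an assumption; check the endpoint's page for the exact name:

from openai import OpenAI

# Point the OpenAI client at the Public Endpoint's OpenAI-compatible URL
client = OpenAI(
    base_url="https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1",
    api_key="YOUR_RUNPOD_API_KEY",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-AWQ",  # assumed model identifier for this endpoint
    messages=[{"role": "user", "content": "Explain what a Serverless endpoint is in one sentence."}],
)
print(response.choices[0].message.content)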

vLLM workers

vLLM workers provide an inference engine that returns OpenAI-compatible responses, making it ideal for tools that expect OpenAI’s API format. When you deploy a vLLM endpoint, access it using the OpenAI-compatible API at:
https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1
Where ENDPOINT_ID is your Serverless endpoint ID. For a full walkthrough of how to integrate a vLLM endpoint with an agentic framework, see the n8n integration guide.
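
For example, here's a minimal sketch with the openai Python SDK that streams tokens from a vLLM endpoint. Replace ENDPOINT_ID with your endpoint ID and pass whichever model you deployed on the worker:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
    api_key="YOUR_RUNPOD_API_KEY",
)

# Stream the completion token by token
stream = client.chat.completions.create(
    model="Qwen/Qwen3-32B-AWQ",  # the model your vLLM worker serves
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)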

SGLang workers

SGLang workers also return OpenAI-compatible responses, offering optimized performance for certain model types and use cases.

Load balancing endpoints

Load balancing endpoints let you create custom endpoints where you define your own inputs and outputs. This gives you complete control over the API contract and is ideal when you need custom behavior beyond standard inference patterns.
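
For example, here's a hypothetical sketch using the requests library. The route, payload, and response shape are placeholders, since a load balancing endpoint's API contract is entirely defined by your worker code; substitute the endpoint URL shown in the console:

import requests

# Hypothetical /generate route defined by your own worker; the path and
# JSON payload below are illustrative, not a fixed Runpod contract.
url = "https://ENDPOINT_ID.api.runpod.ai/generate"
headers = {"Authorization": "Bearer YOUR_RUNPOD_API_KEY"}
payload = {"prompt": "Hello!", "max_tokens": 128}

response = requests.post(url, json=payload, headers=headers, timeout=60)
response.raise_for_status()
print(response.json())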

Model configuration and compatibility

Some models require specific vLLM environment variables to work with external tools and frameworks. You may need to set a custom chat template or tool call parser to ensure your model returns responses in the format your integration expects. For example, you can configure the Qwen/Qwen3-32B-AWQ model for OpenAI compatibility by adding these environment variables in your vLLM endpoint settings:
ENABLE_AUTO_TOOL_CHOICE=true
REASONING_PARSER=qwen3
TOOL_CALL_PARSER=hermes
These settings enable automatic tool choice selection and set the right parsers for the Qwen3 model to work with tools that expect OpenAI-formatted responses. For more information about tool calling configuration and available parsers, see the vLLM tool calling documentation.
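
With those variables set, the endpoint should handle OpenAI-style tool calling. Here's a minimal sketch using the openai Python SDK, where the tool definition is illustrative:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
    api_key="YOUR_RUNPOD_API_KEY",
)

# Illustrative tool definition in OpenAI's function-calling format
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-AWQ",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)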

Compatible frameworks

The same integration pattern works with any framework that supports custom OpenAI-compatible endpoints, including:
  • n8n: A workflow automation tool with AI integration capabilities.
  • CrewAI: A framework for orchestrating role-playing autonomous AI agents.
  • LangChain: A framework for developing applications powered by language models.
  • AutoGen: Microsoft’s framework for building multi-agent conversational systems.
  • Haystack: An end-to-end framework for building search systems and question answering.
Configure these frameworks to use your Runpod endpoint URL as the base URL, and provide your Runpod API key for authentication.
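
For example, here's a sketch of that configuration with LangChain, assuming the langchain-openai package; other frameworks follow the same base URL plus API key pattern:

from langchain_openai import ChatOpenAI

# Point LangChain's OpenAI-compatible chat model at a Runpod endpoint
llm = ChatOpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
    api_key="YOUR_RUNPOD_API_KEY",
    model="Qwen/Qwen3-32B-AWQ",  # the model deployed on your endpoint
)

print(llm.invoke("Write one sentence about GPUs.").content)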

Third-party integrations

For infrastructure management and orchestration, Runpod also integrates with:
  • dstack: Simplified Pod orchestration for AI/ML workloads.
  • SkyPilot: Multi-cloud execution framework.
  • Mods: AI-powered command-line tool.