Chat Completions API

OpenAI-compatible API with Konnect extensions for multi-model patterns.

POST /v1/chat/completions

Creates a chat completion. Fully compatible with the OpenAI API format, with Konnect extensions for multi-model patterns like Ensemble, Debate, and Council.

Authentication

All API requests require authentication via Bearer token.

Authorization: Bearer YOUR_API_KEY

Request Body

model (required)

Model ID to use. Use konnect-ensemble, konnect-debate, or konnect-council for multi-model patterns.

Type: string

messages (required)

Array of messages in the conversation. Each message has a role and content.

Type: array

stream (optional)

Whether to stream the response via Server-Sent Events. Defaults to false.

Type: boolean

Konnect Extensions

konnect.pattern

Pattern type: ensemble, debate, council

konnect.models

Array of model IDs to use in the pattern.

konnect.aggregation

Aggregation method for ensemble: synthesis, best_of_n, consensus, union

konnect.personas

Array of persona configurations for each model slot.

konnect.enable_tools

Enable web search for models. Defaults to true. When enabled, models can perform web searches and timeouts adjust dynamically.

Example Requests

Standard Chat

cURL
curl https://api.konnect.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "stream": true
  }'

Ensemble Mode

cURL
curl https://api.konnect.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "konnect-ensemble",
    "messages": [
      {"role": "user", "content": "What are the pros and cons of microservices?"}
    ],
    "stream": true,
    "konnect.pattern": "ensemble",
    "konnect.models": ["gpt-4o", "claude-sonnet-4-5-20250929", "gemini-2.0-flash"],
    "konnect.aggregation": "synthesis"
  }'

Debate Mode

cURL
curl https://api.konnect.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "konnect-debate",
    "messages": [
      {"role": "user", "content": "Should AI development be regulated?"}
    ],
    "stream": true,
    "konnect.pattern": "debate",
    "konnect.models": ["gpt-4o", "gemini-2.0-flash", "claude-sonnet-4-5-20250929"],
    "konnect.personas": [
      {"personaId": "advocate", "modelId": "gpt-4o"},
      {"personaId": "critic", "modelId": "gemini-2.0-flash"},
      {"personaId": "impartial_judge", "modelId": "claude-sonnet-4-5-20250929"}
    ]
  }'

The three personas correspond to Pro (advocate), Con (critic), and Judge roles. Personas are optional—defaults are used if omitted.

Streaming Response

When stream: true, responses are sent via Server-Sent Events (SSE).

Standard Chunk

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

Konnect Metadata Chunk

For pattern responses, Konnect includes additional metadata in konnect.metadata:

Ensemble model_complete event
data: {
  "konnect.metadata": {
    "event": "model_complete",
    "model_completed": "gpt-5.2",
    "model_success": true,
    "models_completed": 1,
    "models_total": 3,
    "models_successful": 1,
    "models_failed": 0,
    "tool_calls_made": 2
  }
}
Debate round event
data: {
  "konnect.metadata": {
    "pattern_used": "debate",
    "debate_event": "round",
    "debate_new_round": {
      "round_number": 1,
      "pro_argument": "...",
      "con_argument": "..."
    }
  }
}

Ensemble Streaming Events

model_complete

Fired when each model finishes. Includes model name, success/failure status, content (if successful), and progress count.

aggregation

Final aggregated response from all successful models, streamed in chunks.

Debate Streaming Events

setup

Opening positions from Pro and Con are ready. Includes pro_success and con_success flags.

round

A debate round completed. Contains the round number and both arguments. Rounds run sequentially: Pro argues first, then Con responds.

judge

Judge's verdict is ready. Contains winner and reasoning.

Response Format

JSON
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705312345,
  "model": "konnect-ensemble",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Based on analysis from multiple AI models..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 450,
    "total_tokens": 475
  },
  "konnect.metadata": {
    "pattern_used": "ensemble",
    "aggregation_method": "synthesis",
    "models_successful": 3,
    "models_failed": 0,
    "model_responses": [
      {
        "model": "gpt-4o",
        "content": "...",
        "latency_ms": 1250,
        "success": true,
        "tool_calls_made": 2,
        "tool_call_details": [
          {"tool": "web_search", "query": "..."},
          {"tool": "web_search", "query": "..."}
        ]
      },
      {
        "model": "claude-sonnet-4-5-20250929",
        "content": "...",
        "latency_ms": 980,
        "success": true,
        "tool_calls_made": 0
      },
      {
        "model": "gemini-2.0-flash",
        "content": "...",
        "latency_ms": 1100,
        "success": true,
        "tool_calls_made": 1
      }
    ],
    "total_latency_ms": 1250,
    "estimated_cost": 0.0045
  }
}

The tool_calls_made field shows how many web searches each model performed. The tool_call_details array provides specifics about each tool call.
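The metadata lends itself to programmatic inspection. A small sketch that summarizes `model_responses`, with field names taken from the example response above:

```python
def summarize_ensemble(metadata: dict) -> dict:
    """Aggregate per-model results from konnect.metadata."""
    responses = metadata.get("model_responses", [])
    return {
        "succeeded": [r["model"] for r in responses if r.get("success")],
        "failed": [r["model"] for r in responses if not r.get("success")],
        "slowest_ms": max((r["latency_ms"] for r in responses), default=0),
        "total_tool_calls": sum(r.get("tool_calls_made", 0) for r in responses),
    }
```

Note that `total_latency_ms` in the response equals the slowest single model, since ensemble models run in parallel rather than sequentially.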

Available Models

Provider Models

gpt-5.2 (OpenAI)
gpt-5.2-chat-latest (OpenAI)
gpt-4o (OpenAI)
gpt-4-turbo (OpenAI)
claude-sonnet-4-5-20250929 (Anthropic)
claude-3-haiku-20240307 (Anthropic)
gemini-2.0-flash (Google)

Konnect Virtual Models

konnect-ensemble (Ensemble mode)
konnect-debate (Debate mode)
konnect-council (Council mode)

SDK Compatibility

Konnect is compatible with the official OpenAI SDKs. Point the client at Konnect by setting the base URL (base_url in Python, baseURL in Node.js).

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.konnect.ai/v1",
    api_key="your-konnect-api-key"
)

# Ensemble mode with Konnect extensions
response = client.chat.completions.create(
    model="konnect-ensemble",
    messages=[{"role": "user", "content": "Explain AI"}],
    extra_body={
        "konnect.pattern": "ensemble",
        "konnect.models": ["gpt-4o", "claude-sonnet-4-5-20250929"],
        "konnect.aggregation": "synthesis"
    }
)

Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.konnect.ai/v1',
  apiKey: 'your-konnect-api-key'
});

const response = await client.chat.completions.create({
  model: 'konnect-debate',
  messages: [{ role: 'user', content: 'Is remote work better?' }],
  stream: true,
  // Konnect extensions
  'konnect.pattern': 'debate',
  'konnect.models': ['gpt-4o', 'gemini-2.0-flash', 'claude-sonnet-4-5-20250929']
});

Graceful Degradation

Multi-model patterns are designed for reliability. If some models fail (rate limits, timeouts, errors), you still get results from the working models.

Partial Success (200 OK)

If at least one model succeeds, the API returns 200 OK with aggregated results from working models.

*Note: 1 model(s) failed (claude-sonnet-4-5). Results are from 2 successful model(s).*

All Models Failed (200 OK with error message)

If all models fail, you get a 200 OK with a detailed error message listing each model's specific error and troubleshooting suggestions.

Dynamic Timeouts

When models use web search, timeouts adjust automatically to accommodate the extra time needed for tool calls.

Base timeout: 60s

Default timeout for model responses without tool calls.

Per tool call: +20s

Additional time allocated for each web search or tool call made.

Maximum timeout: 180s

Hard limit to prevent indefinite waiting.

Example: A model making 3 web searches gets 60 + (3 × 20) = 120s timeout.
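The schedule above reduces to a one-line formula, sketched here:

```python
BASE_TIMEOUT_S = 60    # default, no tool calls
PER_TOOL_CALL_S = 20   # added per web search / tool call
MAX_TIMEOUT_S = 180    # hard cap

def effective_timeout(tool_calls: int) -> int:
    """Timeout a model receives after making `tool_calls` tool calls."""
    return min(BASE_TIMEOUT_S + PER_TOOL_CALL_S * tool_calls, MAX_TIMEOUT_S)
```

So a model making 3 web searches gets 120s, and the cap is reached at 6 or more tool calls.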

Error Handling

{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
401: Invalid or missing API key
400: Invalid request or model not found
429: Rate limit exceeded
500: Internal server error (unrecoverable)

Note: Multi-model patterns use graceful degradation. Partial model failures return 200 OK with results from working models, not 502.
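Given the table above, only 429 is usually worth retrying client-side: 401 and 400 require fixing the request, and 500 is documented as unrecoverable. A hedged sketch of a retry policy (exponential backoff with jitter is a common client convention, not something this API mandates):

```python
import random

def should_retry(status: int) -> bool:
    """Only rate limits warrant an automatic retry."""
    return status == 429

def retry_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with jitter: base * 2^attempt, capped,
    then scaled by a random factor in [0.5, 1.0]."""
    return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
```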

Related