Content validity: 2026-04 | Sources: OpenAI compatibility · Qwen API · Function calling · Models
Qwen text generation models accessed through an OpenAI-compatible interface. Migrate existing OpenAI code by updating three values: base_url, api_key, and model. Supports text generation, multi-turn conversations, code writing, reasoning, and function calling.
| Scenario | Recommended Model | Notes |
|----------|------------------|-------|
| General conversation / content generation | qwen3.6-plus | Latest flagship. Best balance of performance, cost, and speed. 1M context. Recommended default. |
| General conversation (alt) | qwen3.5-plus | Balanced performance, cost, speed, 1M context, thinking on by default. |
| Low-latency real-time interaction | qwen3.5-flash / qwen-turbo | Fastest response time. Suitable for chatbots. |
| Complex tasks / strongest capability | qwen3-max | Largest model. Best for complex reasoning. |
| Code generation / completion | qwen3-coder-next | Top recommendation. qwen3-coder-plus for highest quality, qwen3-coder-flash for speed. |
| Deep reasoning / math | qwq-plus | Chain-of-thought (CoT) reasoning. |
| Ultra-long document processing | qwen-long | 10M token context. |
| Agent / tool calling | qwen3.6-plus / qwen3.5-plus / qwen-plus | Most complete function calling support. |
| Machine translation | qwen-mt-plus | Best quality, 92 languages. qwen-mt-flash for speed, qwen-mt-lite for real-time chat. Uses translation_options parameter. |
| Role-playing / character dialog | qwen-plus-character-ja | Character restoration, empathetic dialog. |
| Region | base_url |
|--------|----------|
| Beijing (default) | https://dashscope.aliyuncs.com/compatible-mode/v1 |
from openai import OpenAI
import osclient = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
resp = client.chat.completions.create(
model="qwen3.6-plus",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"},
],
)
print(resp.choices[0].message.content)
stream = client.chat.completions.create(
model="qwen3.6-plus",
messages=[{"role": "user", "content": "Write a haiku."}],
stream=True,
stream_options={"include_usage": True},
)
for chunk in stream:
if chunk.choices and chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
Workflow: Define tools → Model returns tool call instruction → Execute tool → Send result back → Get final answer.
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {
"type": "object",
"properties": {"location": {"type": "string"}},
"required": ["location"],
},
},
}]resp = client.chat.completions.create(
model="qwen3.6-plus",
messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
tools=tools,
)
resp.choices[0].message.tool_calls contains function name and arguments
Execute the function, then send result back with role="tool"
Supported models: Qwen-Max/Plus/Flash/Turbo, Qwen3.5/3 series, qwen3-vl-plus/flash, qwen3-omni-flash.
Model defaults apply: qwen3.6-plus, qwen3.5-plus and qwen3.5-flash have thinking mode enabled by default. For these models, do NOT set enable_thinking unless you want to override the default behavior.
For other models (qwen3-max, qwen-plus, qwen-turbo, etc.), thinking mode is off by default. Only enable when the user explicitly requests step-by-step reasoning:
For qwen3.6-plus/qwen3.5-plus/flash: thinking is ON by default, no need to set
resp = client.chat.completions.create(
model="qwen3.6-plus",
messages=[{"role": "user", "content": "Solve this problem."}],
)For other models: enable thinking only when user explicitly requests it
resp = client.chat.completions.create(
model="qwen3-max",
messages=[{"role": "user", "content": "Solve 17 × 23 step by step."}],
extra_body={"enable_thinking": True}, # Only for non-default models
)Script usage: add --enable-thinking flag to override defaults
python scripts/text.py --request '{"messages":[...]}' --enable-thinking
When to disable thinking for qwen3.6-plus/qwen3.5-plus/flash: Set enable_thinking: false for simple chat, real-time interaction, or when you want faster responses without extended reasoning.
| Parameter | Type | Description |
|-----------|------|-------------|
| model | string | Required. Model ID. |
| messages | array | Required. Conversation history. Format: {"role": "...", "content": "..."}. Roles: system, user, assistant. system can only appear at messages[0]. Last element must have user role. |
| temperature | float | Controls randomness. Range: [0, 2). Higher values produce more diverse output. |
| top_p | float | Nucleus sampling threshold. Range: (0, 1.0). |
| max_tokens | int | Maximum number of output tokens. |
| stream | bool | Enable streaming output. |
| tools | array | Tool definitions for function calling. |
| stop | string/array | Stop generation when specified string or token is about to be output. |
| Field | Description |
|-------|-------------|
| choices[0].message.content | Generated text. |
| choices[0].message.tool_calls | Tool call instructions (if applicable). |
| choices[0].finish_reason | stop = normal completion; length = max_tokens reached. |
| usage.prompt_tokens / completion_tokens | Token consumption. |
1. Prefer streaming. Non-streaming blocks until the full response is generated (10–60s+ for long outputs). Always use stream=True for interactive scenarios.
2. API keys are region-specific. Use the cn-beijing (Beijing) endpoint with your API key.
3. openai SDK version: Requires ≥1.55.0. Older versions conflict with httpx ≥0.28, causing a proxies TypeError.
4. Thinking mode varies by model. qwen3.6-plus, qwen3.5-plus and qwen3.5-flash have thinking mode enabled by default; other models have it off. Only override with enable_thinking when you want to change the default behavior.
5. Function calling constraints. tools cannot be used with stream=True (older limitation; some newer models support it). Also incompatible with n > 1.
6. messages format. system role can only appear at messages[0]. The last message must have the user role.
7. Some models have limited regional availability. qwen-long (10M context), qwen-math-plus, and third-party
models are not available in cn-beijing. Check
the Model List for the latest availability.
Q: How do I migrate from OpenAI?
A: Change three values: api_key to your DASHSCOPE_API_KEY, base_url to the corresponding regional endpoint, and model to a Qwen model name. All other code remains compatible.
Q: When should I use streaming vs. non-streaming?
A: Use streaming for interactive scenarios (chat, real-time output). Use non-streaming for batch processing or when you need the complete JSON response at once. With streaming, set stream_options={"include_usage": True} to receive token usage in the last chunk.
Q: Which models support function calling? A: Qwen-Max/Plus/Flash/Turbo series, Qwen3.5/3/2.5 series, qwen3-vl-plus/flash, qwen3-omni-flash, and third-party models (deepseek, kimi, glm).
Q: What is the difference between qwen3.6-plus and qwen3.5-plus?
A: qwen3.6-plus is the latest flagship model (2026-04-02) with 1M context and best balance of quality, speed, and cost. qwen3.5-plus is the previous generation. qwen3.6-plus is recommended for new projects.
Q: How do I control output length?
A: Use max_tokens to limit output token count. Use stop to set stop sequences. Each model has its own default output limit.
Q: What should I do when I get a 429 error? A: 429 indicates QPS/QPM rate limit exceeded or insufficient quota. Implement exponential backoff retry, or check remaining quota in the console.