Content validity: 2026-05 | Sources: Text-to-Image API · Image Generation Guide · Wan2.6-Image API · Z-Image-Turbo
Generate and edit images using Wan and Qwen-Image models. The Wan series excels at realistic photography and diverse artistic styles. The Qwen-Image series excels at rendering complex Chinese and English text (posters, layouts). wan2.7-image-pro / wan2.7-image are multi-function models supporting text-to-image, image editing (0–9 references), sequential multi-image, interactive editing, and thinking mode. wan2.6-t2i supports synchronous HTTP calls for text-to-image. wan2.6-image supports image editing, style transfer, subject consistency, and interleaved text-image output. Older models use asynchronous invocation (submit task → poll result).
| Scenario | Recommended Model | Notes |
|----------|------------------|-------|
| General creative / realistic photography | wan2.7-image-pro / wan2.7-image | Multi-function: t2i + editing, thinking mode, 4K (pro). |
| General creative (dedicated t2i) | wan2.6-t2i | Dedicated t2i model, best quality, supports synchronous calls. |
| Posters / complex text rendering | qwen-image-2.0-pro or wan2.6-t2i | Strongest Chinese/English text rendering. |
| Fast drafts / batch generation | wan2.2-t2i-flash | Lowest latency. |
| Custom resolutions | qwen-image-2.0 or Wan series | Flexible aspect ratios. |
| Image editing / style transfer | wan2.7-image-pro / wan2.7-image / wan2.6-image | wan2.7: 0–9 images, bbox editing, thinking mode. wan2.6: 1–4 images. |
| Text editing in images / element manipulation | qwen-image-edit-max or qwen-image-2.0-pro | Precise text modification, element add/delete/replace. |
| Subject consistency across images | wan2.7-image-pro / wan2.7-image / wan2.6-image | Maintain subject identity across generated images. |
| Multi-image composition | wan2.7-image-pro / wan2.7-image / wan2.6-image | Combine style from one image with background from another. |
| Sequential multi-image (same character/story) | wan2.7-image-pro / wan2.7-image | Coherent image sequences (1–12 images), same subject across scenes. |
| Interactive editing (region-based) | wan2.7-image-pro / wan2.7-image | Use bbox_list for precise region editing. |
| Multi-image fusion with text rendering | qwen-image-2.0-pro | 1–3 input images, text rendering, realistic textures. |
| Interleaved text-image output | wan2.6-image | Generate mixed text+image content (tutorials, guides). |
| Fixed-resolution batch text-to-image | qwen-image-plus | 5 fixed resolutions, async API, good for batch workflows. |
Synchronous call: returns the result in a single request. Suitable for most scenarios.
curl -sS 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "wan2.6-t2i",
"input": {
"messages": [{"role": "user", "content": [{"text": "A cozy flower shop with wooden door and flowers on display"}]}]
},
"parameters": {"size": "1280*1280", "n": 1, "prompt_extend": true}
}'
The image URL is at output.choices[0].message.content[0].image in the response.
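The response path above can be walked defensively rather than indexed directly. The following is a minimal sketch, assuming the response body has already been parsed into a dict; the helper name `extract_image_urls` and the sample body are hypothetical and only include the fields named above.

```python
from typing import Any


def extract_image_urls(response: dict[str, Any]) -> list[str]:
    """Collect every image URL from a choices-style response body."""
    urls: list[str] = []
    for choice in response.get("output", {}).get("choices", []):
        for item in choice.get("message", {}).get("content", []):
            # Image items carry their URL under the "image" key.
            if isinstance(item, dict) and "image" in item:
                urls.append(item["image"])
    return urls


# Hypothetical response body, trimmed to the fields described above.
sample = {
    "output": {
        "choices": [
            {"message": {"content": [{"image": "https://example.com/result.png"}]}}
        ]
    }
}
print(extract_image_urls(sample))
```

Returning an empty list for a body without choices lets the caller decide whether a missing image is an error.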
from dashscope import ImageSynthesis
import dashscope, os
dashscope.base_http_api_url = 'https://dashscope.aliyuncs.com/api/v1'
rsp = ImageSynthesis.call(
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen-image-plus",
prompt="A healing-style hand-drawn poster featuring three puppies playing with a ball on green grass",
n=1, size='1664*928', prompt_extend=True, watermark=False,
)
print(rsp.output.results[0].url) # Image URL, valid for 24 hours
| Parameter | Default | Description |
|-----------|---------|-------------|
| size | 1280*1280 | Resolution as width*height. Total pixels must be between 1280×1280 and 1440×1440. Aspect ratio must be between 1:4 and 4:1. |
| n | 4 | Number of images to generate (1–4). Billed per image. Set to 1 for testing. |
| prompt_extend | true | LLM rewrites the prompt. Significantly improves results for short prompts but adds 3–4 seconds of latency. |
| negative_prompt | — | Content to exclude from the image. Max 500 characters. |
| watermark | false | Adds an "AI-generated" watermark in the lower-right corner. |
| seed | random | Fixed seed produces more consistent results (not guaranteed identical). Range: [0, 2147483647]. |
| Aspect Ratio | Size |
|-------------|------|
| 1:1 | 1280×1280 |
| 4:3 | 1472×1104 |
| 3:4 | 1104×1472 |
| 16:9 | 1696×960 |
| 9:16 | 960×1696 |
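The recommended sizes can be turned into a small lookup so callers pick an aspect ratio instead of typing pixel values. This is a sketch only: `build_t2i_payload` is a hypothetical helper, it defaults `n` to 1 (the testing advice above) rather than the API default of 4, and it builds the same JSON body as the curl example without sending it.

```python
import json

# Recommended sizes from the table above, keyed by aspect ratio.
RECOMMENDED_SIZES = {
    "1:1": "1280*1280",
    "4:3": "1472*1104",
    "3:4": "1104*1472",
    "16:9": "1696*960",
    "9:16": "960*1696",
}


def build_t2i_payload(prompt: str, aspect_ratio: str = "1:1", n: int = 1,
                      prompt_extend: bool = True) -> dict:
    """Build a wan2.6-t2i request body for the sync endpoint."""
    if aspect_ratio not in RECOMMENDED_SIZES:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    return {
        "model": "wan2.6-t2i",
        "input": {"messages": [{"role": "user", "content": [{"text": prompt}]}]},
        "parameters": {
            "size": RECOMMENDED_SIZES[aspect_ratio],
            "n": n,
            "prompt_extend": prompt_extend,
        },
    }


print(json.dumps(build_t2i_payload("A cozy flower shop", "16:9"), indent=2))
```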
The wan2.6-image model operates in two modes controlled by the enable_interleave parameter:
- Image editing mode (enable_interleave=false, default): Takes 1–4 reference images + text prompt. Performs style transfer, subject consistency, composition, and image editing.
- Interleaved output mode (enable_interleave=true): Takes 0–1 reference image + text prompt. Generates mixed text and image content (e.g., tutorials, step-by-step guides).

Same as other Wan models:
- Synchronous: POST /api/v1/services/aigc/multimodal-generation/generation (editing mode only; interleave requires streaming or async)
- Asynchronous: POST /api/v1/services/aigc/image-generation/generation with X-DashScope-Async: enable

Style transfer (single reference):
curl -sS 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "wan2.6-image",
"input": {
"messages": [{"role": "user", "content": [
{"text": "Convert this photo to a watercolor painting style"},
{"image": "https://img.alicdn.com/imgextra/i1/NotRealJustExample/photo.jpg"}
]}]
},
"parameters": {"size": "1K", "n": 1, "prompt_extend": true, "watermark": false, "enable_interleave": false}
}'
Multi-image composition (style from image 1 + background from image 2):
curl -sS 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "wan2.6-image",
"input": {
"messages": [{"role": "user", "content": [
{"text": "Generate a sunset scene based on the style of image 1 and the background of image 2"},
{"image": "https://img.alicdn.com/imgextra/i1/NotRealJustExample/style_ref.jpg"},
{"image": "https://img.alicdn.com/imgextra/i1/NotRealJustExample/background_ref.jpg"}
]}]
},
"parameters": {"size": "1K", "n": 1, "prompt_extend": true, "enable_interleave": false}
}'
Step 1: Submit task
curl -sS 'https://dashscope.aliyuncs.com/api/v1/services/aigc/image-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-Async: enable' \
-d '{
"model": "wan2.6-image",
"input": {
"messages": [{"role": "user", "content": [
{"text": "Give me a three-image tutorial for making latte art"}
]}]
},
"parameters": {"enable_interleave": true, "max_images": 3, "size": "1280*1280"}
}'
Step 2: Poll with task_id (repeat every 10s)
curl -sS "https://dashscope.aliyuncs.com/api/v1/tasks/TASK_ID" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY"
The response contains interleaved {type: "text", text: "..."} and {type: "image", image: "URL"} content items in output.choices[0].message.content.
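One way to consume the interleaved items is to flatten them into Markdown, preserving their order. The sketch below assumes the content list has already been pulled out of `output.choices[0].message.content`; the function name and sample items are hypothetical.

```python
def interleaved_to_markdown(content: list[dict]) -> str:
    """Render interleaved text and image items as a Markdown document."""
    parts: list[str] = []
    image_index = 0
    for item in content:
        kind = item.get("type")
        if kind == "text":
            parts.append(item["text"].strip())
        elif kind == "image":
            image_index += 1
            # Image URLs expire after 24 hours, so download them if the file must last.
            parts.append(f"![Image {image_index}]({item['image']})")
        # Items of any other type are skipped.
    return "\n\n".join(parts)


# Hypothetical content list in the shape described above.
content = [
    {"type": "text", "text": "Step 1: Steam the milk."},
    {"type": "image", "image": "https://example.com/step1.png"},
    {"type": "text", "text": "Step 2: Pour slowly."},
]
print(interleaved_to_markdown(content))
```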
| Parameter | Default | Editing Mode | Interleave Mode | Description |
|-----------|---------|:---:|:---:|-------------|
| enable_interleave | false | false | true | Switches between image editing and interleaved output modes |
| size | 1K | 1K/2K or W*H (total px [768², 2048²]) | W*H (total px [768², 1280²]) | Output resolution |
| n | 4 (editing) / 1 (interleave) | 1–4 | Must be 1 | Number of output images. Billed per image. |
| max_images | 5 | — | 1–5 | Max images in interleaved output. Billed per image. |
| prompt_extend | true | Yes | — | Intelligent prompt rewriting (editing mode only) |
| negative_prompt | — | Yes | Yes | Content to exclude |
| watermark | false | Yes | Yes | "AI Generated" watermark |
| seed | random | Yes | Yes | Reproducibility seed [0, 2147483647] |
- Formats: JPEG, JPG, PNG (no alpha), BMP, WEBP
In editing mode (enable_interleave=false): use 1K (default, ~1280×1280) or 2K (~2048×2048), or specify pixel dimensions with total pixels in [768×768, 2048×2048].
In interleave mode (enable_interleave=true): use pixel dimensions with total pixels in [768×768, 1280×1280].
| Aspect Ratio | Size |
The wan2.7-image-pro and wan2.7-image models are multi-function models that support both text-to-image and image editing in a single model. No reference images are required for text-to-image, and up to 9 reference images are supported for editing.
Key capabilities:
- Sequential multi-image (enable_sequential=true): generate coherent image sequences (1–12 images)
- Interactive editing: bbox_list for precise region-based editing

wan2.7-image-pro vs wan2.7-image:
- wan2.7-image-pro: supports 4K resolution for t2i, higher quality. $0.075/image (international)
- wan2.7-image: max 2K resolution, faster. $0.03/image (international)

Same as wan2.6 series:
- POST /api/v1/services/aigc/multimodal-generation/generation

Sequential multi-image (text-to-image):
curl -sS 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "wan2.7-image-pro",
"input": {
"messages": [{"role": "user", "content": [{"text": "A stray orange cat through four seasons: spring cherry blossoms, summer beach, autumn leaves, winter snow"}]}]
},
"parameters": {"size": "2K", "enable_sequential": true, "n": 4, "watermark": false}
}'
Multi-image editing (two references):
curl -sS 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "wan2.7-image-pro",
"input": {
"messages": [{"role": "user", "content": [
{"text": "Apply the graffiti pattern from image 2 onto the car body in image 1"},
{"image": "https://img.alicdn.com/imgextra/i1/NotRealJustExample/car.jpg"},
{"image": "https://img.alicdn.com/imgextra/i1/NotRealJustExample/graffiti.jpg"}
]}]
},
"parameters": {"size": "2K", "n": 1, "watermark": false}
}'
Interactive editing (bbox_list):
curl -sS 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "wan2.7-image-pro",
"input": {
"messages": [{"role": "user", "content": [
{"text": "Place the clock from image 1 at the marked location in image 2"},
{"image": "https://img.alicdn.com/imgextra/i1/NotRealJustExample/clock.jpg"},
{"image": "https://img.alicdn.com/imgextra/i1/NotRealJustExample/room.jpg"}
]}]
},
"parameters": {"size": "2K", "n": 1, "bbox_list": [[], [[989,515,1138,681]]], "watermark": false}
}'
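The bbox_list in the request above has one entry per input image, with an empty list for images that carry no edit region. A small builder can enforce that shape. This is a sketch: `build_bbox_list` is a hypothetical helper, and the check that x1 < x2 and y1 < y2 assumes the coordinates are top-left and bottom-right corners, which the source does not state.

```python
def build_bbox_list(image_count: int,
                    boxes_by_image: dict[int, list[list[int]]]) -> list[list[list[int]]]:
    """Return a bbox_list with one entry per image, in image order.

    boxes_by_image maps a zero-based image index to its [x1, y1, x2, y2] boxes.
    """
    unknown = [i for i in boxes_by_image if not 0 <= i < image_count]
    if unknown:
        raise ValueError(f"image index out of range: {unknown}")
    bbox_list = []
    for index in range(image_count):
        boxes = boxes_by_image.get(index, [])
        for box in boxes:
            if len(box) != 4:
                raise ValueError(f"box must have 4 values: {box}")
            x1, y1, x2, y2 = box
            # Assumption: (x1, y1) is the top-left corner, (x2, y2) the bottom-right.
            if x1 >= x2 or y1 >= y2:
                raise ValueError(f"box has no area: {box}")
        bbox_list.append([list(box) for box in boxes])
    return bbox_list


# Image 1 (clock) has no region; image 2 (room) has one marked location.
print(build_bbox_list(2, {1: [[989, 515, 1138, 681]]}))
```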
| Parameter | Default | Description |
|-----------|---------|-------------|
| size | 2K | Resolution: 1K, 2K (default), 4K (pro only, t2i mode). Or pixel dimensions. |
| n | 4 (non-seq) / 12 (seq) | Number of images. Non-sequential: 1–4. Sequential: 1–12. Billed per image. |
| thinking_mode | true | Enhanced reasoning for better quality. Only for t2i (no images, non-sequential). Increases latency. |
| enable_sequential | false | Sequential multi-image mode: coherent image sequences (e.g., same character across scenes). |
| reference_images | — | 0–9 image URLs for editing. Not required for t2i. |
| bbox_list | — | Interactive editing regions. Format: [[[x1,y1,x2,y2],...], ...]. List length = image count. Empty [] for images without edits. |
| color_palette | — | Custom color theme (3–10 colors). Each: {"hex":"#C2D1E6","ratio":"23.51%"}. Sum of ratios = 100%. Non-sequential mode only. |
| negative_prompt | — | Content to exclude. Max 500 chars. |
| watermark | false | "AI Generated" watermark. |
| seed | random | Reproducibility seed [0, 2147483647]. |
- Formats: JPEG, JPG, PNG (no alpha), BMP, WEBP
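The color_palette rules in the table (3–10 colors, ratios summing to 100%) are easy to get wrong by a rounding error, so a local check before sending can help. The following is a sketch under assumptions: `validate_color_palette` is a hypothetical helper, and the six-digit `#RRGGBB` pattern is inferred from the single example in the table.

```python
import re
from decimal import Decimal, InvalidOperation

# Assumption: colors are six-digit hex values like "#C2D1E6".
HEX_RE = re.compile(r"^#[0-9A-Fa-f]{6}$")


def validate_color_palette(palette: list[dict]) -> None:
    """Raise ValueError if the palette breaks the documented rules."""
    if not 3 <= len(palette) <= 10:
        raise ValueError("color_palette needs 3 to 10 colors")
    total = Decimal("0")
    for entry in palette:
        if not HEX_RE.match(entry.get("hex", "")):
            raise ValueError(f"bad hex value: {entry.get('hex')!r}")
        ratio = entry.get("ratio", "")
        if not ratio.endswith("%"):
            raise ValueError(f"ratio must end with %: {ratio!r}")
        try:
            # Decimal keeps sums like 23.51 + 36.49 exact, unlike float.
            total += Decimal(ratio[:-1])
        except InvalidOperation:
            raise ValueError(f"ratio is not a number: {ratio!r}") from None
    if total != Decimal("100"):
        raise ValueError(f"ratios sum to {total}%, expected 100%")


palette = [
    {"hex": "#C2D1E6", "ratio": "23.51%"},
    {"hex": "#1F3A5F", "ratio": "36.49%"},
    {"hex": "#F4E9D8", "ratio": "40.00%"},
]
validate_color_palette(palette)  # passes silently
```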
The wan2.5-i2i-preview model provides image-to-image editing via a simpler prompt+images API. It preserves subject consistency during edits and supports multi-image fusion with up to 3 reference images.
Key differences from wan2.6-image:
- Uses input.prompt + input.images[] format (not messages)
- Endpoint: /api/v1/services/aigc/image2image/image-synthesis
- Response uses output.results[].url format (vs choices format)
- China (Beijing) region (cn-beijing) only

Async only:
- China (Beijing): POST https://dashscope.aliyuncs.com/api/v1/services/aigc/image2image/image-synthesis

curl -sS 'https://dashscope.aliyuncs.com/api/v1/services/aigc/image2image/image-synthesis' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-Async: enable' \
-d '{
"model": "wan2.5-i2i-preview",
"input": {
"prompt": "Place the alarm clock from Image 1 next to the vase on the dining table in Image 2",
"images": [
"https://img.alicdn.com/imgextra/i3/O1CN0157XGE51l6iL9441yX_!!6000000004770-49-tps-1104-1472.webp",
"https://img.alicdn.com/imgextra/i3/O1CN01SfG4J41UYn9WNt4X1_!!6000000002530-49-tps-1696-960.webp"
]
},
"parameters": {"prompt_extend": true, "n": 1}
}'
Then poll with GET /api/v1/tasks/{task_id}. Response contains output.results[].url.
| Parameter | Default | Description |
|-----------|---------|-------------|
| prompt | — | Required. Text instruction for the edit. |
| images | — | Required. Array of 1–3 image URLs. Array order = image numbering in prompt. |
| negative_prompt | — | Content to exclude. Max 500 chars. |
| size | auto (1280*1280 px) | Output resolution. Total pixels [768*768, 1280*1280], aspect ratio [1:4, 4:1]. |
| n | 4 | Number of images (1–4). Billed per image. Set to 1 for testing. |
| prompt_extend | true | Smart prompt rewriting. |
| watermark | false | "AI Generated" watermark. |
| seed | random | Reproducibility. If n>1, images use seed, seed+1, seed+2, etc. |
- Formats: JPEG, JPG, PNG (alpha ignored), BMP, WEBP
The Qwen Image series consists of two sub-families with different API endpoints:
- Editing models (qwen-image-2.0-pro, qwen-image-2.0, qwen-image-edit-max/plus/edit): Use the same multimodal-generation/generation endpoint as wan2.6-image. Support image editing with 1–3 reference images, and qwen-image-2.0-pro/2.0 also support pure text-to-image.
- Text-to-image models (qwen-image-plus, qwen-image-max): Use the text2image/image-synthesis endpoint (async-only). Fixed resolutions, input.prompt format.

Endpoint (same as wan2.6-image): POST /api/v1/services/aigc/multimodal-generation/generation
Payload format: messages with image + text content items (same as wan2.6-image).
curl -sS 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \

| Parameter | Default | Description |
|-----------|---------|-------------|
| n | 1 | Output images: 1–6 (qwen-image-edit: 1 only). Billed per image. |
| size | 1024*1024 | width*height. Total pixels 512×512–2048×2048. |
| prompt_extend | true | Intelligent prompt rewriting. |
| watermark | false | "AI Generated" watermark. |
| negative_prompt | — | Content to exclude. |
| seed | random | Reproducibility seed [0, 2147483647]. |
Input image constraints: JPG, JPEG, PNG, BMP, TIFF, WEBP, GIF. 384–3072px per dimension. ≤10MB per image. Max 3 images.
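These input constraints can be checked before upload. The sketch below works on metadata the caller has already gathered (format, dimensions, byte size), so it needs no imaging library; `ImageInfo` and `find_violations` are hypothetical names, and treating 10MB as 10 × 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"JPG", "JPEG", "PNG", "BMP", "TIFF", "WEBP", "GIF"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_IMAGES = 3


@dataclass
class ImageInfo:
    fmt: str
    width: int
    height: int
    size_bytes: int


def find_violations(images: list[ImageInfo]) -> list[str]:
    """Return human-readable problems; an empty list means the inputs look valid."""
    problems: list[str] = []
    if not 1 <= len(images) <= MAX_IMAGES:
        problems.append(f"expected 1 to {MAX_IMAGES} images, got {len(images)}")
    for number, info in enumerate(images, start=1):
        if info.fmt.upper() not in ALLOWED_FORMATS:
            problems.append(f"image {number}: unsupported format {info.fmt}")
        for side in (info.width, info.height):
            if not 384 <= side <= 3072:
                problems.append(f"image {number}: side {side}px outside 384-3072")
        if info.size_bytes > MAX_BYTES:
            problems.append(f"image {number}: larger than 10MB")
    return problems


print(find_violations([ImageInfo("png", 1024, 300, 2_000_000)]))
```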
Key differences from wan2.6-image:
- n supports up to 6 (vs wan2.6-image's 4)
- No enable_interleave support
- qwen-image-2.0-pro and qwen-image-2.0 can also do pure text-to-image (text-only message, no reference images)

Endpoint: POST /api/v1/services/aigc/text2image/image-synthesis (async-only)
Payload format: input.prompt (NOT messages format).
curl -sS -X POST 'https://dashscope.aliyuncs.com/api/v1/services/aigc/text2image/image-synthesis' \

Then poll with GET /api/v1/tasks/{task_id}. Response contains output.results[].url.
| Parameter | Default | Description |
|-----------|---------|-------------|
| n | 1 | Output images (1–4). Billed per image. |
| size | 1328*1328 | Fixed resolutions only (see below). |
| prompt_extend | true | Intelligent prompt rewriting. |
| watermark | false | "AI Generated" watermark. |
| negative_prompt | — | Content to exclude. |
| seed | random | Reproducibility seed. |
Fixed resolutions for qwen-image-plus/max:
| Aspect Ratio | Size |
|-------------|------|
| 16:9 | 1664×928 |
| 4:3 | 1472×1104 |
| 1:1 | 1328×1328 |
| 3:4 | 1104×1472 |
| 9:16 | 928×1664 |
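Because only these five resolutions are accepted, a batch workflow that starts from arbitrary target dimensions has to snap each request to one of them. One possible approach, sketched below, picks the fixed size whose aspect ratio is closest on a log scale; `closest_fixed_size` is a hypothetical helper, not part of any SDK.

```python
import math

# Fixed resolutions for qwen-image-plus/max, from the table above.
FIXED_SIZES = [(1664, 928), (1472, 1104), (1328, 1328), (1104, 1472), (928, 1664)]


def closest_fixed_size(width: int, height: int) -> str:
    """Return the fixed size (as width*height) nearest to the requested aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)
    # Comparing log ratios treats 2:1 and 1:2 as equally far from 1:1.
    best = min(FIXED_SIZES, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))
    return f"{best[0]}*{best[1]}"


print(closest_fixed_size(1920, 1080))  # a 16:9 request maps to 1664*928
```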
1. Image URLs are valid for only 24 hours. Download and save images immediately after generation.
2. Cost = unit price × number of images. The n and max_images parameters directly affect cost. Always set n=1 during testing.
3. prompt_extend trade-off. Significantly improves short prompts, but adds 3–4s latency and may drift from original intent. Set prompt_extend=false when you need precise control over composition.
4. Synchronous vs. asynchronous. wan2.6-t2i, wan2.6-image (editing mode), and qwen-image-edit series support synchronous calls. Interleaved text-image sync requires streaming; use async mode. wan2.5 and earlier models use async only. qwen-image-plus and qwen-image-max use async text2image endpoint only.
5. Prompt length limit. Supports both Chinese and English. Maximum 2,000 characters for wan2.6-image, 2,100 characters for wan2.6-t2i; excess is automatically truncated.
6. Region isolation. API key, endpoint, and model must belong to the same region. Cross-region calls result in authentication failures.
7. Async task_id is valid for 24 hours. Do not create duplicate tasks; use polling to retrieve the result.
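The submit-then-poll pattern from point 7 can be sketched as a loop with an injected fetch function, which keeps it testable without network access. Several things here are assumptions not stated above: the status is read from `output.task_status`, and the terminal values are taken to be SUCCEEDED, FAILED, CANCELED, and UNKNOWN. `wait_for_task` and `fetch_task` are hypothetical names; in real use `fetch_task` would issue GET /api/v1/tasks/{task_id}.

```python
import time
from typing import Any, Callable

# Assumption: statuses that mean the task will never succeed.
FAILURE_STATUSES = {"FAILED", "CANCELED", "UNKNOWN"}


def wait_for_task(fetch_task: Callable[[str], dict[str, Any]], task_id: str,
                  interval_s: float = 10.0, max_attempts: int = 60,
                  sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Poll one existing task until it finishes; never resubmits the task."""
    for attempt in range(max_attempts):
        body = fetch_task(task_id)
        status = body.get("output", {}).get("task_status")
        if status == "SUCCEEDED":
            return body
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"task {task_id} ended with status {status}")
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} polls")


# Offline demo: a scripted fetch function stands in for the HTTP call.
scripted = iter([
    {"output": {"task_status": "PENDING"}},
    {"output": {"task_status": "RUNNING"}},
    {"output": {"task_status": "SUCCEEDED", "results": [{"url": "https://example.com/a.png"}]}},
])
result = wait_for_task(lambda _task_id: next(scripted), "demo-task", sleep=lambda _s: None)
print(result["output"]["results"][0]["url"])
```

Reusing the same task_id across polls, rather than resubmitting on a slow response, avoids paying for duplicate tasks.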
Q: How do I choose between wan2.6-t2i and qwen-image-2.0-pro?
A: Use wan2.6-t2i for realistic photography and diverse artistic styles. Use qwen-image-2.0-pro for complex text rendering tasks (posters, PPT illustrations, coupons). Both handle text well, but Qwen-Image is stronger for complex layouts.
Q: When should I use wan2.6-image vs wan2.6-t2i?
A: Always use wan2.6-t2i for pure text-to-image (prompt only, no reference images). wan2.6-image is an image editing model — it requires either reference_images (1–4 images for style transfer, subject consistency, editing) or enable_interleave: true (for interleaved text-image output). Using wan2.6-image without either will error or auto-fallback to wan2.6-t2i.
Q: How do I get more consistent results?
A: Use the seed parameter to fix the random seed. Disable prompt_extend to prevent the LLM from rewriting and drifting from your intent. Use negative_prompt to exclude unwanted elements.
Q: When should I use wan2.7-image-pro vs wan2.6-t2i for text-to-image?
A: wan2.7-image-pro is a multi-function model — it supports both t2i and image editing in one model, with thinking mode for higher quality and 4K support. Use it when you want the highest quality or may later need editing. wan2.6-t2i is a dedicated t2i model — slightly faster for simple text-to-image tasks since it doesn't carry editing overhead.
Q: What is sequential multi-image mode?
A: Set enable_sequential=true with wan2.7-image-pro or wan2.7-image to generate coherent image sequences (1–12 images) with the same subject across different scenes. Useful for storyboards, character sheets, or seasonal series. Note: thinking_mode is disabled in sequential mode.
Q: Does the API support image-to-image / reference images?
A: Yes. wan2.7-image-pro / wan2.7-image support 0–9 reference images with advanced features (bbox editing, sequential mode). wan2.6-image supports 1–4 reference images for style transfer, subject consistency, and interleaved output. qwen-image-edit series supports 1–3 reference images. Use the reference_images field in the script (URLs or local paths; local files are auto-uploaded). For multi-image composition, reference images by order in the prompt: "the style of image 1 and the background of image 2".
Q: How does interleaved text-image output work?
A: Set enable_interleave=true with wan2.6-image. The model generates mixed text and image content. Use async mode (the script auto-enables it). The response contains interleaved text and image items in output.choices[0].message.content. The script saves images and a markdown file. Note: qwen-image-edit series does not support interleaved output.
Q: How do I optimize costs for batch generation?
A: Set n=1 to generate and evaluate one image at a time. Increase n after confirming quality. wan2.2-t2i-flash has the lowest per-image price and is suitable for batch testing.
Q: When to use qwen-image-plus vs qwen-image-2.0-pro for text-to-image?
A: qwen-image-plus uses the text2image endpoint with fixed resolutions — good for batch workflows with standard aspect ratios. qwen-image-2.0-pro uses the multimodal-generation endpoint with flexible resolutions and can also do image editing. Use qwen-image-plus for simple text-to-image; use qwen-image-2.0-pro when you need text rendering precision or image editing.
Q: What's the difference between qwen-image-edit series and wan2.6-image?
A: Both use the multimodal-generation endpoint with messages format. Key differences: qwen-image-edit supports max 3 input images (vs wan's 4), n up to 6 (vs wan's 4), and does not support interleaved output. qwen-image-edit excels at precise text editing in images and element manipulation.
z-image-turbo is an open-source 6B-parameter text-to-image model with 8-step inference, hosted on the QianWen platform. It targets fast, low-cost text-to-image generation. The model has stricter payload constraints than the Wan/Qwen-Image series:
- Sync-only — uses the standard multimodal-generation sync endpoint; async invocation is not supported.
- Text-only input: messages[0].content must contain exactly one {text} item.
- No n parameter: multi-image output is not supported (the server returns 400 if n is provided).
- Supported parameters: size, prompt_extend, seed.

POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

| Parameter | Default | Description |
|------------------|---------|----------------------------------------------------------------------------|
| size | 1024*1024 | Output resolution as width*height. Common: 1024*1024, 1280*720, 720*1280. |
| prompt_extend | true | Auto-rewrite the prompt for better quality (adds 2–4s latency). |
| seed | random | Reproducibility seed [0, 2147483647]. |
[!IMPORTANT]
Do not include n, negative_prompt, watermark, reference_images, reference_image, enable_interleave, enable_sequential, thinking_mode, bbox_list, or color_palette. None of these are supported by z-image-turbo, and they may cause 400 errors or be silently ignored.
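A payload builder can enforce this allow-list so that parameters carried over from Wan or Qwen-Image calls never reach the API. This is a sketch, not the behavior of any existing script: `build_z_image_payload` is a hypothetical helper that returns both the request body and the names it dropped, leaving it to the caller to log a warning.

```python
SUPPORTED_PARAMETERS = {"size", "prompt_extend", "seed"}


def build_z_image_payload(prompt: str, parameters: dict) -> tuple[dict, list[str]]:
    """Return (request body, sorted names of dropped unsupported parameters)."""
    dropped = sorted(set(parameters) - SUPPORTED_PARAMETERS)
    kept = {key: value for key, value in parameters.items() if key in SUPPORTED_PARAMETERS}
    payload = {
        "model": "z-image-turbo",
        # Exactly one text item: no reference images, no extra content entries.
        "input": {"messages": [{"role": "user", "content": [{"text": prompt}]}]},
        "parameters": kept,
    }
    return payload, dropped


payload, dropped = build_z_image_payload(
    "A cyberpunk street at night",
    {"size": "1024*1024", "n": 2, "watermark": False, "seed": 42},
)
print(payload["parameters"], dropped)
```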
curl (sync)
curl -sS 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "z-image-turbo",
"input": {
"messages": [{"role": "user", "content": [{"text": "A cyberpunk street at night, neon signs, rain"}]}]
},
"parameters": {"size": "1024*1024", "prompt_extend": true}
}'
The image URL is at output.choices[0].message.content[0].image in the response.
Python (urllib, sync)
import json, os
from urllib import request

# Build the sync request: one text item, and only size / prompt_extend / seed as parameters.
req = request.Request(
    "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation",
    data=json.dumps({
        "model": "z-image-turbo",
        "input": {"messages": [{"role": "user", "content": [{"text": "A cyberpunk street at night"}]}]},
        "parameters": {"size": "1024*1024", "prompt_extend": True},
    }).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
        "Content-Type": "application/json",
    },
    method="POST",
)
with request.urlopen(req, timeout=120) as resp:
    data = json.loads(resp.read())
# The image URL is in the first content item of the first choice.
print(data["output"]["choices"][0]["message"]["content"][0]["image"])
The script auto-routes z-image-turbo to the dedicated payload builder and forces sync mode:
python3 scripts/image.py \
--model z-image-turbo \
--request '{"prompt":"A cyberpunk street at night","size":"1024*1024"}' \
--output output/z-image/output.png
If reference_images or n>1 is passed, the script logs a warning and drops them before calling the API.