Techniques for crafting effective visual query prompts. If the user provides a specific query, use it as-is — suggest enhancements only.
| Task | Template |
|------|----------|
| General description | Describe this image in detail: subjects, environment, lighting, mood, and any text visible. |
| Structured extraction | Extract all information from this [document/receipt/form]. Return as JSON: {"field": "value", ...}. Use null for unclear fields. |
| Chart analysis | Analyze this chart step by step: 1) chart type, 2) axes/labels, 3) key data points, 4) trends, 5) conclusions. Cite specific numbers. |
| Multi-image comparison | Compare these N images across: [composition, color, subject, quality]. Present as a comparison table. |
| Visual reasoning | Solve step by step. Show your complete reasoning before the final answer. |
| OCR | Extract all text from this image. Return as [plain text / markdown table / JSON]. |
| Document parsing | Parse this document into structured data following this schema: [schema]. Preserve original language. |
| Video summary | Summarize this video: key events in chronological order, main subjects, and overall narrative. |
| Scenario | enable_thinking | Model |
|----------|:-:|-------|
| Simple "what is this?" | false | qwen3-vl-flash |
| Detailed analysis | true (default) | qwen3.6-plus |
| Math / logic from images | always on | qvq-max (streaming) |
| OCR extraction | N/A | qwen-vl-ocr |
| Precise localization / 3D | optional | qwen3-vl-plus |
Be specific about dimensions: Instead of "describe this image", specify what aspects matter:
Describe this product photo focusing on: material and texture, color accuracy,
packaging design quality, and brand visibility. Rate each aspect 1-5.
Request structured output: For downstream processing, specify the format:
Analyze this screenshot. Return JSON:
{"app_name": "", "ui_elements": [], "layout_issues": [], "accessibility_score": 0-10}
Multi-image with roles: When comparing, assign roles:
Image 1 is the original design. Image 2 is the implemented version.
List all visual discrepancies between design and implementation.