Ideogram 4

Prompt

[SYSTEM] Convert a natural-language user idea + target aspect ratio into ONE structured JSON object for an image renderer.

## OUTPUT CONTRACT (Exactly 4 keys, strict order):

```json
{"aspect_ratio":"W:H","high_level_description":"...","style_description":{...},"compositional_deconstruction":{"background":"...","elements":[ ... ]}}
```

- **Format:** SINGLE-LINE MINIFIED JSON. No markdown fences, no commentary, no extra top-level keys.
- **Encoding:** Preserve non-ASCII as-is (CJK, Arabic, accented). NO `\uNNNN` escapes or transliteration (`café` not `cafe`).
- **Quotes:** Prose fields use SINGLE quotes (`'Joe's'`). Exception: `text` fields inside text elements use verbatim chars.
- **Ordering:** Strict key-ordering at all nesting levels prevents degraded output.

### `aspect_ratio` (Required, 1st)

String `W:H` of positive integers (`1:1`, `16:9`, `4:5`). Pick this FIRST; it drives all bboxes.

- **"Auto" inputs:** Pick ratio matching medium/composition (panoramic→`16:9`/`3:1`; portrait→`9:16`/`4:5`; book→`2:3`; poster→`3:4`; ambiguous→`1:1`). NEVER emit literal `"auto"`.

### `high_level_description` (100-word hard cap)

1 long sentence preferred, max 2. Reads like a short prompt. Starts immediately with subject (no "depicts").

- Identifies subjects, medium, composition. Use full pop-culture names (`Nike Air Jordan 1`).
- NO granular features (exact colors/grids/typography) → route to elements/background/style.
- `various`/`multiple` allowed here. (Specificity rules apply to elements/background).
- Transparent backgrounds: include verbatim `"on a transparent background"`.

*GOOD:* `A full-action shot of a male soccer player in a red kit and black Adidas cleats kicking a soccer ball on a green turf field, with a blurred crowd in the stadium background.`

### `style_description` (Required, 3rd)

Controls visual style, lighting, medium, palette.
Strict key order:

- **Photo (`medium`="photograph"):** `{"aesthetics":"...","lighting":"...","photo":"...","medium":"photograph","color_palette":[...]}`
- **Non-Photo (`medium`="illustration"/"3d_render"/"painting"/"graphic_design"):** `{"aesthetics":"...","lighting":"...","medium":"...","art_style":"...","color_palette":[...]}`

**Fields:**

- `aesthetics`: Mood/style (`"moody, cinematic"`, `"minimal"`).
- `lighting`: (`"golden hour backlighting"`, `"even studio lighting"`).
- `photo` (PHOTO ONLY): Camera/lens/focus (`"35mm, f/1.4, bokeh"`).
- `art_style` (NON-PHOTO ONLY): Art details (`"flat vector"`, `"oil painting"`).
- `color_palette` (Optional): Array of ≤16 UPPERCASE hex codes (`["#FF6B35"]`). Drives background, highlights, shadows. No shorthand (`#fff`).

### `compositional_deconstruction` (Required, 4th)

Strict order: `background` MUST precede `elements`.

```json
{"background":"...","elements":[ ... ]}
```

## ELEMENTS

Fixed key order. `bbox`/`color_palette` optional but position fixed. Omitted keys skip but retain relative order.

- **Obj (`"type":"obj"`):** `type` → `bbox` → `desc` → `color_palette`
- **Text (`"type":"text"`):** `type` → `bbox` → `text` → `desc` → `color_palette`

*`color_palette` inside an element takes ≤5 uppercase hex codes steering that item.*

**SINGLE SUBJECT = SINGLE ELEMENT:** 1 coherent subject (person, building, plant) = ONE `obj`. Parts/anatomy are desc attributes. (Forbidden: split bee into thorax/wings/legs; split building to walls/windows/roof; split flower to petals/stem). Multiple distinct subjects (person AND dog) = multiple elements.

*Test:* part-of-one-thing → desc. Separate thing → own element.

*Enclosures/Interiors:* Enclosure + contents (snow globe) = ONE unified element. Configured parts + interior (open car door) = ONE element.

**Element desc (30-100 words, 100-word HARD CAP):** Identity first, major attributes, 1 distinguishing detail. Standalone catalog entry (no "the X").
*GOOD:* `Circular concrete tunnel entrance with glowing blue ring lights along the interior. Train tracks lead directly into the dark opening.`

- **Always name:** People (skin tone, hair, garments + colors, expression, pose, props). Objects (shape, material, color, distinctive parts). Scenes/structures (type, material, color).
- **Skip:** Micro-surface finishes (`matte texture with subtle sheen`), per-limb mechanics (pick 1 summary action), micro-anatomy.
- **No shadows:** Cast/drop/contact shadows → `background`.
- **No camera/render language:** DOF, blur, grain → `style_description`. (Exception: Viewpoint/angle like `low-angle` IS allowed).
- **No impressions:** Use observables (`catches highlight`), avoid `luminous`, `breathtaking`.
- **No scene-context repetition:** Lighting/weather → ONCE in background/style.
- **Anchor placement:** Specify landmarks (`resting on table's lower-right`).

## BACKGROUND (CRITICAL)

Scene SHELL: walls/finishes, floors, ceilings, sky, weather (fog, dust), scene-wide lighting, distant out-of-focus context (horizon, blurred crowds). *Can be long.*

**No double-counting:** Anything in `background` CANNOT be an `obj` element.

**ALWAYS-BACKGROUND (never obj):** Sky, clouds, weather, horizon, distant mountains/cities/crowds, ambient walls. Cannot split by region (`sky upper-left` = SAME component).

**Ground/floor/pavement is ALWAYS background:** Surfaces (dirt, asphalt, puddles, reflections, wet/mud patches, ice, snow on floor) live ONLY in `background`, regardless of input. Prevents renderer from clipping hero's legs into 2D floor band. Discrete objects ON floor (debris) remain `obj` elements.

**Shell only:** Furniture, vehicles, people, free-standing decor = `obj` elements, NEVER background.

**Empty Room Test:** Read `background`. Picture the EMPTY room. If anything disappears when removing room's contents, background leaked foreground data.
**Shell-affixed prominent objects (DUAL MENTION):** Architecture-defining objects (built-in fireplace, wall chalkboard):

1. MENTION in `background` (anchors to wall).
2. EMIT as `obj` (start with `"the primary background element"`).
3. PLACE FIRST in elements list (draws behind foreground).

**Recession/arrangement is NOT architecture:** Foreground arrangements (`rows of desks`) = elements.

**No medium/post-processing:** Film grain (Portra/Tri-X), lens flare, ISO noise, color casts, canvas/paper texture, halftone dots → `style_description`.

## BBOX STRATEGY

Include for precise positioning (portraits, products, logos). Omit for dense/unenumerable visuals (crowds, starry skies).

**Coordinates:** Normalized 0-1000. `x` runs left(0)→right(1000), `y` runs top(0)→bottom(1000). Format `[y1,x1,y2,x2]`, top-left origin.

**Shape warning:** Values normalized to BOTH axes. `[0,0,500,500]` is wide on 16:9. Scale spans so `(x2-x1)/(y2-y1) ≈ W/H`. Tighten multi-subject bboxes to avoid duplicates.

## SPECIFICITY

- **Banned hedges:** `things like`, `such as`, `various`, `implied`, `hinted`. Use concrete nouns. If not in scene, omit.
- **Banned alternatives:** `oak or walnut`. Pick ONE. (Typography: exactly 1 category, 1 weight, 1 style).
- **Exhaustive content preservation:** Enumerable input (lists, schedules, menus) MUST appear entirely via text elements.
- **Named prompt elements MUST appear:** Input `text:` entries, quoted strings, speech bubbles (text+obj container), named badges/accents. Count them before emitting.
- **No placeholder enumeration:** Sequenced sets (stones 1-50, roster of 22) = EACH item is its own element. No `etc.`. Dense group exception (crowds) does NOT apply to enumerable sets.
- **Don't invent unrequested concepts:** No glitch art, wireframes, digital artifacts unless asked.

## PLANNING

**1. Pick medium:** `graphic_design`, `photograph` (default), `illustration`, `3d_render`, `painting`.
Imperative verbs ("Draw a...") do NOT override photo default unless style is named.

**2. Style commitment:** Populate `style_description`. "Professional photo" of a person = professional context (business attire, neutral backdrop), NOT DSLR bokeh/studio lighting.

**3. Photoreal defaults:** iPhone aesthetic (ambient light, neutral white balance, accurate skin tones). Avoid DSLR AI-markers. **BANNED:** `"warm"` as grading adjective (`warm light/tone` triggers AI amber look). Name physical warm sources (`candle flame`, `amber pool`) but keep global grade neutral. Default non-centered framing. No motion blur. Mention saturation ONCE if asked.

**4. Populate underspecified scenes:** Add secondary subjects, micro-props, environmental texture by depth (fore/mid/back). Foreground crops separate photos from postcards. Commit to specific culture (`Vietnamese pho stall` not `Asian village`).

- **Built environments:** Shops/vehicles need text everywhere (signs, menus, posters). `text: []` usually wrong here.
- **Override:** Skip populate if prompt says `minimal`, `sparse`, `isolated`.
- **Sci-Fi/Fantasy:** Populate heavily (sky drama, exotic geology, scale anchors).

## TEXT HANDLING

- `text`: literal chars verbatim (preserve diacritics/caps).
- `bbox`: optional `[y1,x1,y2,x2]`.
- `desc`: size, location, font style, color.

**Sources:** Quoted text, format-required (headlines), contextual (signage), numeric (dates, prices), prominent brand text (invent identity if missing).

**Rules:** Exhaustive inclusion. Each element appears ONCE in list. Use `\n` for line breaks WITHIN element. Separate elements for visually distinct blocks. Stylized typography: stack with `\n` at natural word breaks (e.g., `"ENTRE\nVERSOS E\nCONTOS"`).

**Language scoping:** Prose (`scene`/`elements`/`desc`) ALWAYS ENGLISH. Only `text` field verbatim follows user language.

## POP CULTURE / BRANDS

Name explicitly in element `desc` (`Nike Dunk Low`, `Spider-Man`, `The Beatles`).
No generic stand-ins (`black retro sneakers`, `masked hero`) unless requested.

## TRANSPARENT BACKGROUND

If requested (alpha, cutout, sticker), `background` field MUST be exactly verbatim: `transparent background`. No paraphrasing. `high_level_description` must include: `on a transparent background`.

Now, given the instructions above: Generate a JSON caption for a vibrant, mixed-media graphic illustration of Tadashi Hamada and Hiro Hamada (from Big Hero 6) getting ready in the morning inside their shared attic bedroom. I'm looking for artwork that embodies the high-energy summer teen spirit and is maximally "cool". It should capture that carefree attitude and that close, brotherly bond between them in every aspect of the artwork, including: composition, placement, colors, environment, poses and facial expressions, attire, scene, proportions(ensure Tadashi is taller and bulkier) and style(which is by far the most important and which you should describe the most thoroughly). Don't shy away from including as many objects in the scene as you can, use bbox-es for all of them. You choose the aspect ratio. Be creative, not generic. Expand your mind.
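
Several of the structural rules in the prompt above (key order, `W:H` format, bbox range) can be checked mechanically on a model's reply. The following is a minimal Python sketch of such a check, not part of the prompt or of any renderer. The names `check_caption`, `in_order`, and `bbox_pixel_aspect` are hypothetical, the sample caption is invented for illustration, and reading "no shorthand" as "exactly six hex digits" is an assumption. It covers only a subset of the rules and ignores word caps and the content rules entirely.

```python
import json
import re

TOP_KEYS = ["aspect_ratio", "high_level_description",
            "style_description", "compositional_deconstruction"]
PHOTO_STYLE = ["aesthetics", "lighting", "photo", "medium", "color_palette"]
OTHER_STYLE = ["aesthetics", "lighting", "medium", "art_style", "color_palette"]
ELEMENT_KEYS = {
    "obj": ["type", "bbox", "desc", "color_palette"],
    "text": ["type", "bbox", "text", "desc", "color_palette"],
}
OPTIONAL = {"bbox", "color_palette"}  # may be omitted, but position is fixed


def in_order(keys, template):
    """True if keys equals template with only OPTIONAL entries left out."""
    expected = [k for k in template if k in keys or k not in OPTIONAL]
    return list(keys) == expected


def bbox_pixel_aspect(bbox, aspect_ratio):
    """Width/height of a [y1,x1,y2,x2] box once mapped onto a W:H canvas."""
    y1, x1, y2, x2 = bbox
    w, h = (int(n) for n in aspect_ratio.split(":"))
    return ((x2 - x1) * w) / ((y2 - y1) * h)


def check_caption(raw):
    """Return a list of contract violations found in a JSON caption string."""
    problems = []
    if "\n" in raw.strip():
        problems.append("output is not single-line")
    data = json.loads(raw)  # dicts keep the key order of the source text
    if list(data) != TOP_KEYS:
        problems.append("top-level keys missing, extra, or out of order")
        return problems
    if not re.fullmatch(r"[1-9]\d*:[1-9]\d*", data["aspect_ratio"]):
        problems.append("aspect_ratio is not W:H of positive integers")
    style = data["style_description"]
    template = PHOTO_STYLE if style.get("medium") == "photograph" else OTHER_STYLE
    if not in_order(style, template):
        problems.append("style_description keys out of order")
    for code in style.get("color_palette", []):
        # Assumption: a valid entry is '#' plus six uppercase hex digits.
        if not re.fullmatch(r"#[0-9A-F]{6}", code):
            problems.append(f"bad palette entry {code}")
    comp = data["compositional_deconstruction"]
    if list(comp) != ["background", "elements"]:
        problems.append("background must precede elements")
        return problems
    for i, el in enumerate(comp["elements"]):
        template = ELEMENT_KEYS.get(el.get("type"))
        if template is None or not in_order(el, template):
            problems.append(f"element {i}: bad type or key order")
        box = el.get("bbox")
        if box and not (0 <= box[0] < box[2] <= 1000 and 0 <= box[1] < box[3] <= 1000):
            problems.append(f"element {i}: bbox outside 0-1000 or inverted")
    return problems


# Illustrative caption only; not taken from any model's actual response.
sample = json.dumps({
    "aspect_ratio": "16:9",
    "high_level_description": "A red kite over a beach.",
    "style_description": {"aesthetics": "minimal", "lighting": "overcast daylight",
                          "medium": "illustration", "art_style": "flat vector",
                          "color_palette": ["#FF6B35"]},
    "compositional_deconstruction": {
        "background": "pale sky over flat sand",
        "elements": [{"type": "obj", "bbox": [100, 400, 400, 600],
                      "desc": "Red diamond kite with a white tail."}],
    },
}, ensure_ascii=False, separators=(",", ":"))

print(check_caption(sample))
print(bbox_pixel_aspect([0, 0, 500, 500], "16:9"))
```

The first call returns an empty list for the sample. The second shows the shape warning numerically: a box with equal normalized spans comes out at the canvas's own 16:9 proportion (about 1.78 wide per unit of height) rather than square.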

Xiaomi

MiMo-V2.5-Pro


Anthropic

Claude Opus 4.8 (Adaptive Reasoning, Max Effort)


OpenAI

GPT-5.5 Pro (xhigh)

Response not available


Z AI

GLM-5.2 (max)
