Table of Contents
- The short answer: how to create a multi-character scene
- Why single-prompt generation fails with 3+ characters
- Step 1: Generate each character on a plain background
- Step 2: Put each character in the action your scene needs
- Step 3: Generate a matching background
- Step 4: Composite everything in Canvas
- Step 5: Add speech bubbles and narrative text
- When to use Canvas vs. Story Scene Pro
- FAQ
- How many characters can you put in one Neolemon Canvas scene?
- Do the characters stay consistent if I edit them in Canvas later?
- Can I use this workflow for KDP or print publishing?
- What's the difference between speech bubbles and thought clouds in storytelling?
- Do I need to generate a new background for every scene?
- Ready to build your first scene?
This guide walks you through how to build multi-character story scenes with three, four, or more characters who all stay consistent, on a background you control, with speech bubbles and narrative text. The trick is to stop trying to generate the whole scene in one prompt and instead build it in layers, the way an animator would.
We'll cover the five-step workflow: generate each character, give them the right action, generate a matching background, composite everything in Canvas, and add speech bubbles and narrative text. By the end you'll have a finished story scene that looks like one artist made it.

The short answer: how to create a multi-character scene
To create a multi-character story scene with consistent AI cartoon characters, generate each character separately in Character Turbo (full body, front view, plain white background, same illustration style for all). Use the Action Editor to put each one in the pose your scene needs, generate a matching background in the same style, then composite everything in Canvas, where you can position and resize each character and add speech bubbles. This layered workflow handles 3+ characters reliably, which is where most single-prompt AI generators break.

Why single-prompt generation fails with 3+ characters
When you ask an AI to render four named characters interacting in one shot, you're asking it to do four hard things at once: keep four faces consistent, give them four distinct actions, place them correctly in space, and hold the style. Even the best models split their attention and start blending features, swapping outfits, or simplifying everyone into "generic cartoon kid."
This is the moment most people give up on AI illustration for their book or comic. The fix isn't a better prompt. The fix is to stop asking the AI to do all four things in one pass. We go deeper on why this happens in our guide on keeping multiple characters consistent in storybooks with AI, which covers the AI-generation approach for two characters. The Canvas workflow you're about to learn is for the next step up: scenes with three or more characters where you need exact positioning and speech bubbles.
Step 1: Generate each character on a plain background
Open Character Turbo and create each character individually, in its own session. Keep these settings the same for every character so they read as belonging to the same world:
- Action / expression: Full body, front view, neutral pose. This is your anchor image, not a story shot.
- Background: Plain white. You'll add the real background later.
- Aspect ratio: 1:1 square. Easy to work with in Canvas.
- Style: Pick one and lock it in. Pixar-inspired, watercolor, anime, chibi, modern western cartoon, whatever fits your story. The 12+ styles available are walked through in our children's book illustration styles guide if you're not sure which to pick.
Now write a short, specific description for each character. Keep it to 2 or 3 memorable features:
"Curly red hair girl in a ponytail, athletic tracksuit, white sneakers with lightning bolts."
"Boy, spiky black hair, yellow sports tracksuit, white running shoes."
"African American boy with curly brown hair, red athletic tracksuit, white sneakers."
"Asian girl, curly brown hair, purple athletic tracksuit, white sneakers."
Notice the pattern. Each description has the same anatomy: hair, defining outfit element, footwear. That structural symmetry is what makes them feel like a cast instead of four unrelated characters dropped into a frame. For more on writing character descriptions that hold up, see our step-by-step guide to consistent cartoon characters.
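If you plan your cast in a script or spreadsheet before opening Character Turbo, the shared anatomy can be sketched like this. It is a planning aid only: `CharacterSpec` and its field names are hypothetical and not part of any Neolemon feature or API. You would still paste each resulting description into Character Turbo by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterSpec:
    """Hypothetical planning record; not a Neolemon data format."""
    who: str       # e.g. "Girl", "Boy"
    hair: str
    outfit: str    # the defining outfit element
    footwear: str

    def prompt(self) -> str:
        # Same slot order for every character: who, hair, outfit, footwear.
        return f"{self.who}, {self.hair}, {self.outfit}, {self.footwear}."


CAST = [
    CharacterSpec("Girl", "curly red hair in a ponytail",
                  "athletic tracksuit", "white sneakers with lightning bolts"),
    CharacterSpec("Boy", "spiky black hair",
                  "yellow sports tracksuit", "white running shoes"),
]

for spec in CAST:
    print(spec.prompt())
```

Because every description is built from the same slots, a character missing footwear or an outfit detail stands out before you spend a generation on it.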
Hit Generate. Repeat for each character. Don't move on until you have all of them.
Step 2: Put each character in the action your scene needs
This is where most AI cartoon scenes start to look stiff. Generic characters standing in plain poses make for a flat story. Use the Edit Action button on each character to put them in the pose that matches your scene.
For a racetrack scene, the action prompt looks like this:
"Side view standing, running fast, mid-stride, both arms pumping, legs stretched."
Generate. The character will stay identical (same face, same outfit, same hair) but now they're running, side view, ready to drop into a side-facing background. Do this for every character.
A few things that matter here:
- Match the camera angle across characters. If your background is a side view, every character needs to be in side view too. Mixing front-facing characters into a side-view background instantly looks composited.
- Match the energy. All four runners need to look like they're running. One standing still in a running scene reads as broken.
- Use the original character as the reference, not the previous action shot. Each new action should start from the clean anchor image. This prevents the small drift that can creep in across edits. The same principle applies when you're building a full story sequence from one character.
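Two of those checks (camera angle and starting from the anchor image) are mechanical enough to write down as a pre-flight checklist. A minimal sketch, assuming you track your shots in a simple list; `ActionShot`, `check_shots`, and the runner name "Leo" are made up for illustration, and "matching the energy" remains a judgment call you make by eye.

```python
from dataclasses import dataclass


@dataclass
class ActionShot:
    """Hypothetical note about one action image you plan to generate."""
    character: str
    camera: str      # "side", "front", ...
    action: str
    reference: str   # "anchor" or "previous_action"


def check_shots(shots, background_camera):
    """Return a list of human-readable problems; empty means the plan is consistent."""
    problems = []
    for shot in shots:
        if shot.camera != background_camera:
            problems.append(
                f"{shot.character}: camera is {shot.camera}, "
                f"background is {background_camera}"
            )
        if shot.reference != "anchor":
            problems.append(
                f"{shot.character}: start from the anchor image, "
                f"not {shot.reference}"
            )
    return problems


plan = [
    ActionShot("Maya", "side", "running fast, mid-stride", "anchor"),
    ActionShot("Leo", "front", "running fast, mid-stride", "previous_action"),
]
for problem in check_shots(plan, background_camera="side"):
    print(problem)
```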
Once you're happy with each action shot, hover over the image and click Transparent Background. Download each character as a PNG. You now have four cut-out characters ready to drop onto a scene.

Step 3: Generate a matching background
Go to Background Generation and describe the scene:
"Athletic racetrack, four lanes, side view, blue sky."
Set the aspect ratio to 16:9 landscape (this will be your page spread or screen-friendly scene), and crucially, pick the same style as your characters. If your characters are Pixar-inspired 3D, your background needs to be Pixar-inspired 3D too. Style mismatch is the fastest way to make a composite look fake.
Generate. Download. If you're planning multiple scenes for the same story (the backyard scene, the classroom scene, the finish line celebration), generate them all now while the style is fresh in your mind.
Step 4: Composite everything in Canvas
This is where the scene actually comes together. Open Canvas from the homepage. Pick a canvas size that matches your background aspect ratio (16:9 presentation works for most landscape story scenes; for a children's book page you'd pick a custom size matching your trim).
The workflow:
- Upload the background image. Hover over it and click Set as background. Now it fills the canvas perfectly without you needing to stretch it.
- Upload each character PNG. They drop in as separate layers you can move, resize, and rotate.
- Position each character. For the racetrack, that means one runner per lane, all facing the same direction, with the leader furthest right. Resize so the proportions feel real (smaller characters further back read as more distant).
- Use the top toolbar as needed. Flip horizontal if a character is running the wrong way. Add a soft shadow if a character looks too "pasted on." Send to back or bring to front to fix layering when characters overlap.
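If you'd like starting coordinates before you begin dragging, the positioning rules above can be worked out as rough arithmetic. This is a sketch under stated assumptions: a 1920x1080 canvas, the back lane drawn highest on the canvas, and arbitrary percentages for the track area and spacing. `lane_layout` is a hypothetical helper, and the values are numbers to eyeball against in Canvas, not something Canvas imports.

```python
def lane_layout(canvas_w, canvas_h, ranks, far_scale=0.7):
    """ranks[i] is the race position (1 = leader) of the runner in lane i.

    Lane 0 is assumed to be the back lane (furthest from the viewer).
    """
    n = len(ranks)
    layout = []
    for lane, rank in enumerate(ranks):
        # depth runs from 0.0 (back lane) to 1.0 (front lane)
        depth = lane / (n - 1) if n > 1 else 1.0
        # Back-lane runners are drawn smaller so they read as more distant.
        scale = far_scale + (1.0 - far_scale) * depth
        # Feet baseline: back lane at 55% of canvas height, front lane at 90%.
        feet_y = round(canvas_h * (0.55 + 0.35 * depth))
        # Leader furthest right: rank 1 at 80% of the width,
        # each later rank 15% further left.
        x = round(canvas_w * (0.80 - 0.15 * (rank - 1)))
        layout.append({"lane": lane, "x": x, "feet_y": feet_y,
                       "scale": round(scale, 2)})
    return layout


for placement in lane_layout(1920, 1080, ranks=[2, 1, 4, 3]):
    print(placement)
```

Treat the output as a first pass. The final call on whether proportions "feel real" is still made visually.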
A few practical tips from doing this a lot:
- If you forgot to remove the background on a character, Canvas has a background remover built into the toolbar. You don't need to go back to Character Turbo.
- Duplicate is your friend for crowd scenes. One generated character can become five townspeople in a market scene with small position and flip variations.
- Layer order matters in overlapping scenes. A character in the front lane should be on top of the character in the back lane.
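The duplicate and layer-order tips combine naturally when you plan a crowd. A minimal sketch, assuming that figures lower on the canvas are nearer the viewer (a common convention, not a Canvas rule). `crowd_from_one` is hypothetical and only produces a plan; you would still duplicate, flip, and reorder the layers by hand.

```python
import random


def crowd_from_one(asset, count, canvas_w, feet_range=(600, 1000), seed=7):
    """Plan `count` copies of one character with varied position and flip."""
    rng = random.Random(seed)  # fixed seed so the plan is repeatable
    copies = [
        {
            "asset": asset,
            "x": rng.randint(0, canvas_w),
            "feet_y": rng.randint(*feet_range),
            "flip": rng.random() < 0.5,
        }
        for _ in range(count)
    ]
    # Back-to-front order: smaller feet_y (further back) comes first,
    # so nearer figures end up on top when layers are stacked in this order.
    return sorted(copies, key=lambda c: c["feet_y"])


for layer in crowd_from_one("townsperson.png", 5, canvas_w=1920):
    print(layer)
```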
Step 5: Add speech bubbles and narrative text
This is the part that turns a still illustration into a story panel. Canvas includes speech bubbles, thought clouds, and narrative blobs.
- Speech bubbles for what a character is saying out loud. Place them above or beside the character's head, pointing toward them.
- Thought clouds for internal monologue. The cloud shape signals "this is what they're thinking" without any extra explanation.
- Narrative blobs or text boxes for the narrator's voice. Use these to set the scene ("It was the last lap. Maya was ahead, and she knew it.").
Keep speech short. One or two sentences per bubble is the sweet spot. A bubble crammed with a paragraph reads as text-heavy and breaks the visual flow. For a racetrack scene, "I'm too fast" works better than "I can't believe how fast I'm running, this is amazing."
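If you draft dialogue in a script before placing bubbles, a rough length check catches the crammed ones early. This is a heuristic sketch: the two-sentence limit comes from the guideline above, while the eight-word cap is an arbitrary assumption you should tune to your own pages. `sentence_count` and `too_long` are hypothetical helpers.

```python
import re


def sentence_count(text):
    """Rough count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def too_long(text, max_sentences=2, max_words=8):
    """Flag bubble text that exceeds either limit (max_words is an assumption)."""
    return sentence_count(text) > max_sentences or len(text.split()) > max_words


for line in ["I'm too fast",
             "I can't believe how fast I'm running, this is amazing."]:
    print(too_long(line), line)
```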
You can change font, alignment, size, and color in the text toolbar. For children's book work, stick to one or two readable fonts across all your scenes for visual consistency, the same way you stick to one illustration style.
Export when you're done. Then clear the canvas and start the next scene. Because all your characters and backgrounds are reusable assets, the second scene takes a fraction of the time of the first.
When to use Canvas vs. Story Scene Pro
Honest answer: both have a place.
| Approach | Use when | Drawbacks |
| --- | --- | --- |
| Story Scene Pro (AI generates the scene with two or three characters together on an uploaded background) | You have 2 or 3 characters interacting closely in the provided background (holding hands, hugging, fighting), and you want the AI to handle natural lighting and shadows between them. | Each generation is standalone, so character proportions and small details can shift between scenes. No continuity from scene 1 to scene 2. Capped at 2 or 3 characters in one frame. No built-in speech bubbles. |
| Canvas compositing (this guide) | You have 2+ characters, you need exact positioning, you want speech bubbles, or the scene is more "stage with separate actors" than "intimate interaction." | More manual steps upfront: generate characters and backgrounds separately, then compose. You handle composition decisions yourself. Inter-character lighting won't blend as naturally as a single AI generation because each character is a layered image. |
Most children's book pages are actually the Canvas case. You have a clear background, characters placed deliberately around it, and dialogue. Comic-style panels are Canvas. Group scenes (classrooms, races, parties) are Canvas. Two characters making eye contact across a table is Story Scene Pro. The trade-off with Story Scene Pro is that every generation is a fresh roll: proportions, expressions, and small character details can shift between scenes, and there's no continuity from scene 1 to scene 2 the way there is when you reuse the same character PNGs in Canvas. That's why Canvas wins the moment your project is a multi-page book or comic where each scene has to feel like it belongs with the others.
If you're building a full book project end to end, our 7-day children's book workflow covers how to plan your scenes before you start generating, which saves real time when you're combining both workflows across 20+ pages.
FAQ
How many characters can you put in one Neolemon Canvas scene?
There's no hard cap. The practical limit is composition, not the tool. Four to six clearly distinct characters work well for a story page. Beyond that, the eye starts to lose track of who's who, and the speech bubbles get crowded. For crowd scenes, generate a few characters and duplicate them with position and flip variations.
Do the characters stay consistent if I edit them in Canvas later?
Canvas treats each character as an image layer, so visual consistency is locked in once you've generated them. What you adjust in Canvas is position, size, rotation, flip, and shadow. The character's face, hair, outfit, and style don't change because Canvas isn't regenerating them.
Can I use this workflow for KDP or print publishing?
Yes. Set your Canvas size to your book's trim dimensions and use Neolemon's upscaler on your final exports to hit print-ready 300 DPI quality. The Canvas workflow is actually more print-friendly than single-prompt generation because you control the composition exactly, which matters when your art bleeds to the page edge.
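To see what 300 DPI means in pixels, multiply each dimension in inches by 300. The sketch below uses an 8.5 x 8.5 inch trim purely as an example. It also assumes the commonly cited bleed of 0.125 inch on the outside edge plus the top and bottom, so width grows by one bleed and height by two. Confirm both against your printer's current specifications. `print_pixels` is a hypothetical helper.

```python
import math


def print_pixels(trim_w_in, trim_h_in, dpi=300, bleed_in=0.0):
    """Pixel size needed for one page at the given DPI.

    Assumes bleed on the outside edge (width + 1 bleed) and on the
    top and bottom (height + 2 bleeds). Rounds up so you never fall short.
    """
    width_px = math.ceil((trim_w_in + bleed_in) * dpi)
    height_px = math.ceil((trim_h_in + 2 * bleed_in) * dpi)
    return width_px, height_px


print(print_pixels(8.5, 8.5))                  # (2550, 2550)
print(print_pixels(8.5, 8.5, bleed_in=0.125))  # (2588, 2625)
```

If your exported scene is smaller than these numbers, that gap is what the upscaler has to close.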
What's the difference between speech bubbles and thought clouds in storytelling?
Speech bubbles show dialogue (what a character says out loud). Thought clouds (the puffy cloud-shaped variant) show internal monologue (what a character is thinking but not saying). Using both in the same scene lets you show what characters say versus what they actually mean, which is a basic storytelling tool kids' books often underuse.
Do I need to generate a new background for every scene?
Not always. If multiple pages of your story happen in the same setting, reuse the same background and change only the character positions and dialogue. This is exactly how animation studios reuse background plates across dozens of shots, and it's a fast way to make a book feel visually coherent.
Ready to build your first scene?
The hardest part of multi-character storytelling isn't the drawing. It's keeping every character looking like themselves from page to page. Once that's solved, you're just directing.
Start with Neolemon's free trial: 20 credits, no card required. That's enough to generate three characters and two backgrounds, then composite your first multi-character scene in Canvas. If it works for one scene, the workflow scales to a whole book.
