Wan 2.7 Hands-On Review: The Ultimate Performance Test
"Every few months, a model ships that reshapes how you think about AI video. Wan 2.7 is the first one in 2026 that made me reconsider a workflow I had already settled into."
That's not a compliment handed out easily. After Sora's abrupt shutdown on March 24 left an enormous gap in the market, a lot of tools rushed to fill the vacuum with marketing rather than substance. Wan 2.7 — released publicly by Alibaba's Tongyi Lab in late March 2026 — arrived in that same frenzied window. The noise was considerable. The question worth answering: does the product justify it?
The short answer: mostly yes, with specific caveats you need to understand before committing a workflow to it.
*Figure: Wan 2.7 visual detail in cinematic mode.*
Background: What Wan 2.7 Actually Is
Wan 2.7 is the latest iteration of Alibaba Cloud's Tongyi Wanxiang video generation series — a model family that began gaining serious traction with Wan 2.1 and 2.2, both of which shipped open-source under Apache 2.0, earned ComfyUI integration within days of release, and became staples in independent creators' local pipelines. Whether 2.7 will follow that open-weight path has not been officially confirmed at time of writing, though Alibaba's historical pattern suggests open weights within four to eight weeks of cloud launch. If you're planning self-hosted deployment, watch the official Wan-Video GitHub closely.
The core of the model is a Diffusion Transformer (DiT) architecture with Full Attention processing. In practical terms: the model evaluates spatial and temporal relationships across an entire sequence simultaneously, rather than predicting frame by frame. This is why character identity in Wan outputs tends to remain stable over longer clips where older diffusion-based models would drift. Wan 2.7 keeps this foundation and expands what the model can accept in a single generation call — which is ultimately the story of this release.
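To make that distinction concrete, here is a minimal sketch of the full-attention idea: every spatio-temporal token attends to every other token in the clip, rather than attention running one frame at a time. The dimensions, head count, and single attention layer are illustrative assumptions, not Wan 2.7's actual configuration.

```python
# Sketch of "Full Attention" in a DiT-style video model: the whole clip is
# flattened into one token sequence, so every token sees every frame.
# All shapes here are illustrative, not Wan 2.7's real configuration.
import torch
import torch.nn as nn

B, T, H, W, D = 1, 16, 8, 8, 64          # batch, frames, latent height/width, channels
tokens = torch.randn(B, T * H * W, D)    # flatten the entire clip into one sequence

attn = nn.MultiheadAttention(embed_dim=D, num_heads=8, batch_first=True)
out, _ = attn(tokens, tokens, tokens)    # each token attends across all frames at once

# A frame-by-frame model would instead attend only within each frame's tokens,
# with no cross-frame context -- which is where identity drift creeps in:
per_frame = tokens.view(B * T, H * W, D)
out_local, _ = attn(per_frame, per_frame, per_frame)
```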
Baseline specs:
| Spec | Detail |
|---|---|
| Output | 1080p, up to 15 seconds |
| Modes | T2V, I2V, First/Last Frame, 9-Grid I2V, Subject + Voice Reference |
| Access | DashScope, WaveSpeed, fal.ai |
| Pricing | Varies by platform (see below) |
What's New: Five Structural Improvements
Before getting into the new features, it's worth noting where the foundation itself improved. Because if 2.6 already met your needs, the question of whether 2.7 is worth switching to rests partly on these baseline changes.
1. Visual Fidelity
Skin textures, fabric movement, and lighting gradients all land closer to commercial 1080p standards in 2.7. The earlier versions occasionally produced what creators in the community called "video game CGI" — particularly on human subjects, where subsurface light scattering looked wrong and hair moved as a single mass. Most of that is addressed. Close-up portrait shots now hold up well without additional upscaling.
That said, if you're evaluating against Sora 2 Pro or Veo 3.1, raw photorealism is still not where Wan's competitive edge sits. Those models are more expensive for a reason.
2. Motion Coherence
Fast movement has historically been the most common failure mode for Wan. Characters drift, limbs detach from their physical context, and physics break in subtle ways that make footage unusable without extensive cleanup. Wan 2.7 reduces this failure rate measurably.
Action sequences — running, turning, cloth movement, camera tracking shots — are more physically grounded. Not perfect, but workable without feeling like you're fighting the model's physics engine.
3. Native Audio Integration
Background music, ambient sound, and synchronized character dialogue are generated alongside the video in a single pass. For anyone who has spent time manually matching audio to AI-generated footage frame by frame, this is the most immediately practical change in the release.
Caveat: for professional audio quality — particularly dialogue in complex scenes or content requiring specific voice direction — Veo 3.1 and Kling 3.0 still have the edge. Wan 2.7's audio is good for a single-pass, cloud-only workflow. It's not a replacement for dedicated audio production.
4. Style Consistency & 5. Temporal Coherence
Wan 2.7 holds cinematic realism, anime, and illustrated styles across a full clip more reliably than 2.6 did. This matters specifically for content that requires a recognizable visual identity: brand content, recurring characters, serialized social posts.
More importantly, the temporal consistency improvement is the single most significant qualitative jump in this release. There are vastly fewer flickering faces, fewer mid-clip wardrobe changes, and fewer scenes where a subject's facial geometry shifts between frames.
The Four New Features — Honest Assessment
First and Last Frame Control
You provide two images: how the video begins, how it ends. The model generates everything in between — motion, transitions, character movement — while keeping both endpoints visually anchored.
This is the headline feature, and it earns that position. Instead of generating until something usable surfaces, you define the narrative boundaries and let the model fill the interior. The transitions between anchored endpoints are smooth and the subject identity holds reliably across the duration.
Where it still struggles: If your two endpoint images contain significant environmental differences, the model sometimes makes physically unconvincing bridging decisions. The feature works best when your endpoints are contextually adjacent rather than dramatically different.
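For a sense of what an endpoint-anchored generation call looks like in practice, here is a hypothetical HTTP sketch. The endpoint URL, parameter names, and response shape are all assumptions for illustration; check your provider's (DashScope, WaveSpeed, fal.ai) documentation for the real schema.

```python
# Hypothetical first/last-frame generation request over HTTP.
# Endpoint, parameter names, and auth header are illustrative assumptions,
# NOT a documented Wan 2.7 API -- consult your provider's docs.
import requests

payload = {
    "model": "wan2.7",                       # assumed model identifier
    "prompt": "a chef plating a dish, slow push-in",
    "first_frame_url": "https://example.com/start.png",
    "last_frame_url": "https://example.com/end.png",
    "duration_seconds": 5,
    "resolution": "1080p",
}
resp = requests.post(
    "https://api.example.com/v1/video/generate",  # placeholder endpoint
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # providers typically return a task ID to poll for the video
```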
9-Grid Image-to-Video
Upload a 3×3 arrangement of nine reference images. The model converts them into a single continuous video, treating each panel as a scene with smooth transitions between them.
The use case this serves is storyboarding at scale. The grid reads left-to-right, top-to-bottom, so panel arrangement maps directly to scene order. Maintaining visual consistency across all nine panels is harder when the input images vary significantly in color temperature, lighting, or subject distance. For tightly controlled studio inputs, the results are impressive.
*Figure: Multi-panel input and consistency - a 3×3 grid of reference images processed into a single coherent video sequence. Keeping lighting and clothing consistent across all nine panels markedly improves the model's temporal stability.*

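If your panels start as nine separate images, assembling the 3×3 grid is straightforward. A minimal sketch with Pillow, where the cell size and file names are arbitrary choices rather than anything Wan specifies:

```python
# Assemble nine reference frames into the 3x3 grid that 9-grid mode expects.
# Uses Pillow; the 1024-px cell size is an arbitrary choice, not a Wan spec.
from PIL import Image

def build_grid(paths, cell=1024):
    assert len(paths) == 9, "9-grid mode needs exactly nine panels"
    grid = Image.new("RGB", (cell * 3, cell * 3))
    for i, path in enumerate(paths):
        img = Image.open(path).convert("RGB").resize((cell, cell))
        row, col = divmod(i, 3)               # reads left-to-right, top-to-bottom
        grid.paste(img, (col * cell, row * cell))
    return grid

panels = [f"panel_{i}.png" for i in range(1, 10)]  # panel order = scene order
build_grid(panels).save("storyboard_grid.png")
```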
Subject and Voice Reference Cloning
Upload a reference image of a character and a short audio clip. Wan 2.7 replicates both the visual appearance and the vocal characteristics in the generated video.
The visual half is strong. Character appearance is reliably replicated across multiple generations from the same reference. The voice half is more variable — lip sync accuracy is generally good on clean, single-speaker audio, but degrades with fast speech, heavy accents, or audio with significant background noise in the source clip.
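Given that noisy or uneven source audio degrades the voice half, it may be worth cleaning the reference clip before upload. A minimal normalization pass, assuming pydub is installed (it requires ffmpeg) and using a hypothetical file name:

```python
# Hedged pre-processing for a voice reference clip: mono, normalized levels,
# trimmed to a short clean segment. Assumes pydub + ffmpeg are available.
from pydub import AudioSegment
from pydub.effects import normalize

clip = AudioSegment.from_file("reference_voice.wav")
clip = clip.set_channels(1)            # single-speaker mono tends to sync best
clip = normalize(clip)                 # even out levels
clip = clip[:10_000]                   # keep the first 10 seconds (milliseconds)
clip.export("reference_clean.wav", format="wav")
```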
Instruction-Based Editing
Upload an existing video clip, type a natural language instruction ("change the background to night," "swap the jacket to red"), and Wan 2.7 applies the edit while preserving the rest of the clip.
Instruction-based editing on moving video is technically harder than editing static images. Early community testing shows good results on background and lighting changes, where the model has more freedom. Changes to moving elements produce less consistent results.
Pricing: The Real Numbers
Wan 2.7 is available through multiple platforms with different pricing structures. No single canonical rate applies.
| Platform | Model | Approx. Cost |
|---|---|---|
| DashScope | Wan 2.7 | ~$0.05–0.08/sec |
| fal.ai | Wan 2.6/2.7 | $0.10 (720p) / $0.15 (1080p) |
| wan27.co | Wan 2.7 | $19.90 / $49.90 |
| Open Weights | Self-hosted | GPU Cost Only |
A 5-second 1080p generation at approximately $0.07/sec costs around $0.35 through API access. That's competitive positioning: Wan 2.6 ran at approximately $0.05/sec, Kling 3.0 runs at approximately $0.10/sec, and Sora 2 Pro — before its shutdown — ran at $0.30–$0.50/sec for 720p–1080p.
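For quick budgeting, the arithmetic is simple enough to script. A back-of-envelope comparison using the approximate rates above (Sora 2 Pro shown at the midpoint of its quoted $0.30–$0.50 range):

```python
# Per-clip cost comparison from the approximate per-second rates cited above.
per_second = {
    "Wan 2.7": 0.07,
    "Wan 2.6": 0.05,
    "Kling 3.0": 0.10,
    "Sora 2 Pro": 0.40,  # midpoint of the quoted $0.30-$0.50/sec range
}
clip_seconds = 5
for model, rate in per_second.items():
    print(f"{model}: ${rate * clip_seconds:.2f} per {clip_seconds}s clip")
# Wan 2.7 lands around $0.35 per 5-second 1080p clip.
```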
The open-weight factor is Wan's pricing moat. If weights ship under Apache 2.0, teams with access to a capable GPU can run generations for the cost of electricity. At scale, that gap against closed-source competitors becomes significant.
Head-to-Head: Wan 2.7 vs. The Competition
The AI video market in late March 2026 is crowded. Here is where Wan 2.7 actually sits relative to the models most creators are choosing between.
Wan 2.7 vs. Kling 3.0
Kling 3.0 (released February 2026) excels at multi-shot cinematic sequences — up to 6 camera cuts per generation — with strong subject consistency across angles and native audio in five languages. Its aesthetic leans artistic and cinematic, which suits some content types and limits others. Wan 2.7's counterweight is price (~$0.07/sec versus Kling's ~$0.10/sec) and the prospective open-weight path; if you need single generations longer than 15 seconds, Kling is the stronger pick.
Wan 2.7 vs. Google Veo 3.1
Veo 3.1 remains the benchmark for native audio quality — specifically for natural lip synchronization, lifelike body language, and full sound design generated in a single pass. It supports up to 4K resolution and has the most mature API infrastructure.
Wan 2.7 does not compete with Veo 3.1 on audio fidelity or resolution ceiling. It does compete on price (Veo 3.1 at $0.40/sec with audio versus Wan 2.7 at ~$0.07/sec).
Wan 2.7 vs. Runway Gen-4.5
Runway Gen-4.5 leads ELO-based quality benchmarks for overall visual output and is the go-to for professional post-production workflows. Runway operates on a subscription model starting at $12/month, with generation costs scaling aggressively from there. Wan 2.7's case against Runway is economic: per-clip API rates and a likely self-hosted path, versus subscription pricing that scales with volume.
What Wan 2.7 Gets Right
The control architecture is coherent. First/last frame, 9-grid, voice reference, and instruction editing reflect a consistent design philosophy: give the user more anchoring inputs per generation call so fewer generations are needed to reach a usable result.
Temporal consistency at 1080p is production-grade. The flickering faces and mid-clip morphing that plagued Wan 2.5 and 2.6 are largely resolved.
The open-weight model path. Building on a closed-source, commercial-only model is a strategic vulnerability. Wan's history of open releases is its most underrated advantage.
Where Wan 2.7 Falls Short
Instruction-based editing is underdeveloped. Editing moving elements — clothing on a walking character, hair during motion — degrades temporal consistency in ways that make the feature unreliable for production use right now.
Raw photorealism still trails Sora 2 and Veo 3.1. In direct comparison on complex scenes, the gap is visible. It's a gap justified by price difference, but worth knowing.
The 15-second duration ceiling is a real constraint. Any content that needs to run longer requires external sequencing and editing, as sketched below.
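For reference, that external sequencing doesn't have to be elaborate. One minimal approach is ffmpeg's concat demuxer, assuming ffmpeg is installed and the clips share codec and resolution:

```python
# Stitch multiple 15-second outputs into a longer cut with ffmpeg's
# concat demuxer (ffmpeg must be installed and on PATH).
import subprocess

clips = ["shot_01.mp4", "shot_02.mp4", "shot_03.mp4"]
with open("clips.txt", "w") as f:
    for c in clips:
        f.write(f"file '{c}'\n")

# -c copy avoids re-encoding; clips must share codec/resolution for this to work
subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0",
     "-i", "clips.txt", "-c", "copy", "out.mp4"],
    check=True,
)
```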
Who Should Use Wan 2.7
Highly recommended for:
- Creators building repeatable character-driven content (brand mascots, YouTube series, spokesperson videos)
- Developers building video generation pipelines who need API access and open-weight flexibility
- Small teams doing high-volume production where per-generation cost directly affects margins
- Creators who currently use Wan 2.6 and regularly hit character consistency limitations
Consider alternatives if:
- Your primary bottleneck is audio quality in dialogue-heavy content (Veo 3.1, Kling 3.0)
- You need clips longer than 15 seconds without manual sequencing (Kling)
- You want the absolute highest visual quality and price is secondary (Runway Gen-4.5, Veo 3.1)
- You need instruction-based editing to be reliable on moving elements today (wait for model maturation)
Final Verdict
Wan 2.7 is not the best AI video model in every dimension. It was never going to be. But it doesn't need to be. What it is — and what very few models in this space are — is a model that solves the right problem for the right audience at the right price point.
The right problem: not "generate something impressive," but "generate something I can control, reproduce, and build a production workflow around." Endpoint anchoring, multi-reference grids, character cloning — these features address the practical frustrations of creators doing repeatable, professional-grade work rather than one-off demos.
The right audience: independent creators, small agencies, and developers who need control inputs without enterprise pricing.
The right price point: meaningfully cheaper per generation than most of its feature-equivalent competitors, with a realistic path to open-weight self-hosting that eliminates API dependency risk entirely.
If Wan 2.7 ships open weights under Apache 2.0 — which the Wan family's history suggests is likely within the next two months — it becomes the most practically valuable model in this tier of the market. That's a meaningful position.
| Dimension | Score | Notes |
|---|---|---|
| Visual quality | 4.1 / 5 | Strong at 1080p; trails Sora/Veo on raw realism |
| Motion coherence | 4.2 / 5 | Measurably better than 2.6; occasional fast-motion issues remain |
| Audio quality | 3.8 / 5 | Good for single-pass production; not Veo 3.1-tier |
| Control features | 4.5 / 5 | First/last frame and 9-grid are genuinely differentiated |
| Instruction editing | 3.0 / 5 | Promising but not production-ready for all edits |
| Pricing | 4.6 / 5 | API rates competitive; open-weight path changes the economics |
| API / developer | 3.5 / 5 | Documentation lag at launch is a recurring issue |
| Overall | 4.0 / 5 | Best-in-class for control-focused, cost-efficient workflows |
Frequently Asked Questions
Is Wan 2.7 open source?
Not confirmed yet. Earlier versions (2.1, 2.2) shipped under Apache 2.0 on GitHub. Whether 2.7 follows the same path has not been officially stated. Based on the Wan family's historical release cadence, open weights typically follow cloud launch by four to eight weeks. Watch the official Wan-Video GitHub repository for confirmation.
How does Wan 2.7 compare to Wan 2.6?
Wan 2.6 is faster and better for high-volume exploration. Wan 2.7 adds first/last frame control, 9-grid I2V, voice reference cloning, and instruction-based editing — features that make it better for production workflows requiring repeatability and character consistency. The practical framing: use 2.6 to experiment quickly, use 2.7 to produce content you actually publish.
What happened to Sora, and is Wan 2.7 a replacement?
OpenAI discontinued Sora on March 24, 2026. Wan 2.7 is one of the most capable alternatives currently available, particularly for creators who valued Sora's temporal consistency and needed a cost-efficient option. For raw photorealism, Veo 3.1 and Runway Gen-4.5 are closer comparisons; for pricing and open-weight access, Wan 2.7 is stronger than either.
Can I self-host Wan 2.7?
Not yet officially. Cloud API access via DashScope and third-party platforms is currently the only confirmed path. Open weights are expected to follow, as with previous Wan releases.
What is the 9-grid feature and when should I use it?
9-grid I2V lets you upload a 3×3 arrangement of nine reference images in a single generation call. The model converts them into a continuous video with smooth transitions. It's most effective for storyboard-to-video workflows, sequential product presentations, and multi-angle character reference generation. Inputs should share a consistent aspect ratio and resolution for best results.
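A quick pre-flight check for that last point, using Pillow with hypothetical file names:

```python
# Verify that all nine panels share the same resolution (and therefore
# aspect ratio) before building the grid. File names are illustrative.
from PIL import Image

def check_panels(paths):
    sizes = {Image.open(p).size for p in paths}
    if len(sizes) != 1:
        raise ValueError(f"Panels differ in size: {sizes}")

check_panels([f"panel_{i}.png" for i in range(1, 10)])
```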