Crafting Dynamic OTS Shots with Midjourney V5 Techniques
The over-the-shoulder (OTS) shot is a powerful visual storytelling method that is prevalent in filmmaking. It draws viewers into a conversation by placing the camera behind one character, offering a perspective that enhances the narrative.
With the advent of Midjourney’s V5 model, crafting these shots has become significantly more straightforward. Here's a guide on how to do it effectively.
What Purpose Do OTS Shots Serve?
OTS shots primarily illustrate dialogues between two individuals, with the camera positioned behind one character to capture their viewpoint. This perspective often includes part of their shoulder and head out of focus, while the other character is prominently featured in the frame. This technique frequently alternates between the two characters, creating a lively exchange that vividly conveys their dialogue and reactions.
How to Generate OTS Shots Using Midjourney
Previously, in Midjourney V4, creating OTS shots was more challenging due to its limitations. The newer V5 model, however, leans towards photorealism, making the process easier. Let's explore how to set up our scene.
We begin by defining the setting and the characters:
cinematic shot, two women in a hotel lobby scene --ar 16:9 --seed 4000
(If you are unfamiliar with the “--ar 16:9” and “--seed 4000” specifications, additional guidance can be found here.)
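Every prompt in this walkthrough ends with the same two parameters, so it can help to assemble prompts from parts rather than retyping them. The following is a minimal Python sketch under that assumption. The name `build_prompt` is hypothetical and not part of Midjourney; the function only produces the text you would paste into Midjourney yourself.

```python
def build_prompt(descriptors, aspect_ratio="16:9", seed=4000):
    """Join comma-separated descriptors and append the --ar and --seed parameters.

    Hypothetical helper for illustration only; it just builds a string.
    """
    # Drop empty entries and stray whitespace so the commas stay clean.
    cleaned = [d.strip() for d in descriptors if d.strip()]
    text = ", ".join(cleaned)
    return f"{text} --ar {aspect_ratio} --seed {seed}"


# The opening prompt of this walkthrough, rebuilt from its parts.
print(build_prompt(["cinematic shot", "two women in a hotel lobby scene"]))
```

Keeping the parameters in one place makes it harder to change the aspect ratio or seed by accident between iterations.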
Once the scene and characters are established, we need to adjust the shot size to a close-up to better achieve the OTS effect.
Adding this to our prompt reveals a common issue we might encounter:
cinematic close-up shot, two women in a hotel lobby scene --ar 16:9 --seed 4000
Images 1, 2, and 4 do not effectively capture the conversation. However, Image 3 is almost there.
This challenge is typical when generating AI images with Midjourney. The algorithms interpret the broader context of words, not just their literal meanings.
In this case, the phrase “two women in a hotel lobby scene” initially led to a common depiction of two women conversing, as that’s a typical scenario in such settings. When we shifted to a close-up, the model's interpretations varied, leading to different outcomes, including "standing side by side" or "watching something."
From our experiences with V4, we realized that using the phrase “having a conversation” can help:
- Position the characters appropriately for our OTS shot.
- Facilitate the close-up view we desire.
To refine our approach, let’s start again:
cinematic shot, two women having a conversation, hotel lobby scene --ar 16:9 --seed 4000
Next, we proceed to the close-up:
cinematic close-up shot, two women having a conversation, hotel lobby scene --ar 16:9 --seed 4000
Now, all four images are aligned with our vision.
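When iterating this way, it is useful to see exactly which descriptors changed between two attempts, since the parameters stay fixed and only the wording moves. Here is a small Python sketch of that comparison. The function names are hypothetical, and the sketch assumes prompts follow the comma-separated style used in this article, with parameters starting at the first " --".

```python
def split_prompt(prompt):
    """Split a prompt into its comma-separated descriptors and its parameter tail."""
    # Everything from the first " --" onward is treated as parameters (an assumption).
    text, separator, tail = prompt.partition(" --")
    descriptors = [d.strip() for d in text.split(",") if d.strip()]
    params = "--" + tail if separator else ""
    return descriptors, params


def changed_descriptors(before, after):
    """Return (added, removed) descriptors when going from one prompt to another."""
    old, _ = split_prompt(before)
    new, _ = split_prompt(after)
    added = [d for d in new if d not in old]
    removed = [d for d in old if d not in new]
    return added, removed


BEFORE = "cinematic close-up shot, two women in a hotel lobby scene --ar 16:9 --seed 4000"
AFTER = "cinematic close-up shot, two women having a conversation, hotel lobby scene --ar 16:9 --seed 4000"

added, removed = changed_descriptors(BEFORE, AFTER)
print("added:", added)
print("removed:", removed)
```

Changing one descriptor at a time, as the steps above do, keeps this comparison short and makes it easier to attribute a change in the image grid to a change in wording.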
One key point: In Midjourney V4, I typically placed shot types at the end of the prompt. However, in V5, it’s advisable to position them at the start to avoid unnecessary complications.
Consider this example, where I erroneously placed “close-up” at the end:
cinematic shot, two women having a conversation, hotel lobby scene, close-up shot --ar 16:9 --seed 4000
No close-ups appear in the output.
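One way to avoid misplacing the shot type is to build the prompt from named fields so that the shot description always comes first. The sketch below is a hypothetical Python helper, not a Midjourney feature. The field names (`shot_size`, `subject`, `setting`, `extras`) are assumptions chosen for illustration.

```python
def shot_first_prompt(shot_size, subject, setting, extras=(), params="--ar 16:9 --seed 4000"):
    """Build a prompt whose first descriptor is always the cinematic shot type."""
    # With no shot size given, fall back to the plain "cinematic shot" prefix.
    lead = f"cinematic {shot_size} shot" if shot_size else "cinematic shot"
    parts = [lead, subject, setting, *extras]
    return ", ".join(parts) + " " + params


# The shot size can only ever land at the start of the prompt.
print(shot_first_prompt("close-up", "two women having a conversation", "hotel lobby scene"))
```

Because the shot size is a separate argument, it cannot drift to the end of the descriptor list the way it did in the failed example above.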
This may reflect Midjourney's enhanced comprehension of natural language, a capability that is often misunderstood. V5 still accommodates traditional comma-separated prompts, which can be beneficial in certain contexts, but knowing when to use each style is crucial.
V5’s inclination towards realism is advantageous, but it does not eliminate the need for cinematic language. For example, if we neglect a cinematic prefix:
close-up shot, two women having a conversation, hotel lobby scene --ar 16:9 --seed 4000
The resulting grid diverges even further from our goal.
Returning to our OTS shot:
cinematic close-up shot, two women having a conversation, hotel lobby scene --ar 16:9 --seed 4000
We can observe that Image 4 aligns more closely with our desired OTS aesthetic compared to the others.
Why is this?
1. It resembles a “close-up” shot more closely than the alternatives.
2. It effectively utilizes depth of field.
Let’s analyze them one by one, starting by adding “shallow depth of field” to emphasize our intended effect:
cinematic close-up shot, two women having a conversation, hotel lobby scene, shallow depth of field --ar 16:9 --seed 4000
Next, we will tighten the close-up to give it more weight. Interestingly, V5 often renders one shot size wider than requested (for instance, "close-up" may yield a medium shot, while "extreme close-up" results in a close-up):
cinematic extreme close-up shot, two women having a conversation, hotel lobby scene --ar 16:9 --seed 4000
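The one-size offset described above can be written down as a simple lookup: decide which framing you want, then ask for the next tighter one. This is a minimal Python sketch of that rule of thumb, not a documented Midjourney behavior. The ladder contains only the three sizes mentioned in this article, and the name `size_to_request` is hypothetical.

```python
# Shot sizes from this article, ordered from wider to tighter.
SHOT_SIZES = ["medium", "close-up", "extreme close-up"]


def size_to_request(desired):
    """Return the shot size to put in the prompt in order to get the desired one.

    Applies the article's observation that V5 tends to render one size wider
    than requested. The tightest size has nothing tighter, so it maps to itself.
    """
    index = SHOT_SIZES.index(desired)  # raises ValueError for unknown sizes
    tighter = min(index + 1, len(SHOT_SIZES) - 1)
    return SHOT_SIZES[tighter]


for desired in SHOT_SIZES:
    print(f"want {desired!r} -> ask for {size_to_request(desired)!r}")
```

Treat the mapping as a starting point for experiments, since the offset is an observed tendency and will not hold for every prompt or seed.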
Finally, let’s merge these two enhancements to achieve the OTS aesthetic across all images in the original grid:
cinematic extreme close-up shot, two women having a conversation, hotel lobby scene, depth of field --ar 16:9 --seed 4000
From here, you can begin experimenting further, such as refining shot descriptions:
cinematic extreme close-up shot, two women having a conversation, hotel lobby scene, shallow depth of field --ar 16:9 --seed 4000
Alternatively, adjust the emotional tone of the scene:
Exploring Beyond Conversations
What about scenarios that don’t involve dialogue? Can we reliably create shots like this:
cinematic over-the-shoulder shot, a man stealing a bagel in a bakery, depth of field --ar 16:9 --seed 4000
(Does Image 4 represent a shot over a bagel's shoulder?)
We will delve into this topic in future installments of Mastering Midjourney V5. If you missed earlier episodes, don't worry; you can catch up here, here, and here.
I hope you found this article insightful. Continue your exploration!
For more insights on AI & Creativity, follow me, Tristan Wolff, on Twitter or Medium (use my referral link for full access to all my articles and those of thousands of other writers).
If you enjoyed this content, consider giving a “clap” at the end of this article to help others discover it!