The Essential Video Semiotics Glossary

29 min read

This glossary accompanies the two-part article series on the semiotic grammar of video.

Every term from classical semiotics, film theory, multimodal discourse analysis, and platform studies that appears in the articles is defined here in three registers: 

  1. The academic definition.

  2. A plain-language translation.

  3. A concrete example drawn from the world of YouTube, TikTok, Reels, and Shorts. 

Use it as a reference while reading the articles, as a training resource for your team, or as a standalone introduction to the semiotic vocabulary of video marketing.

FILM SEMIOTICS (Metz, Deleuze)

Grande Syntagmatique

  • Academic: Christian Metz's taxonomy of eight autonomous segment types constituting the deep grammatical structure of narrative film editing.

  • In plain language: Think of it as the periodic table of film editing. Just as chemistry has a finite set of elements that combine to make everything, Metz showed that all of cinema's editing choices boil down to eight basic building blocks, and every movie you've ever seen is assembled from combinations of them.

  • Example: A TikTok "outfit check" montage uses what Metz called a bracket syntagma (thematically linked shots with no story order), while a "day in my life" vlog uses an episodic sequence (time compressed into highlights). Same platform, completely different grammar.

Autonomous Shot

  • Academic: A single, uninterrupted take that constitutes a complete narrative segment independent of surrounding shots.

  • In plain language: One continuous shot that tells its own little story with no cuts needed. It stands on its own.

  • Example: The classic face-to-camera video where a creator talks directly to the audience without a single edit. One shot = one complete message.

Bracket Syntagma

  • Academic: A series of brief scenes with no temporal or spatial relationship, unified only by a shared conceptual or thematic thread.

  • In plain language: A collection of clips held together not by a storyline but by a vibe or idea. They don't happen in any particular order; they just live in the same mental neighborhood.

  • Example: An Instagram Reel showing "things that make me happy": a coffee cup, a sunset, a dog, a bookshelf. No timeline, no story: just a mood board in motion.

Episodic Sequence

  • Academic: An organized series of brief scenes representing stages of a chronological development, with temporal compression between stages.

  • In plain language: Time-lapse storytelling. You see the beginning, a few highlights in the middle, and the end, with all the boring parts edited out.

  • Example: A before/after home renovation video showing Day 1, Day 15, Day 30, and the final reveal. Weeks of work compressed into 60 seconds.

Parallel Syntagma

  • Academic: Two or more alternating motifs juxtaposed without precise temporal or spatial relationship, creating meaning through symbolic comparison.

  • In plain language: Cross-cutting between two things not because they happen at the same time, but because putting them side by side makes a point.

  • Example: A brand video alternating between shots of the old way (frustration, mess, slow) and the new way (ease, clarity, speed). The contrast IS the message.

Descriptive Syntagma

  • Academic: A sequence of shots depicting coexisting aspects of a single subject or space, without temporal progression.

  • In plain language: Showing different angles or facets of the same thing, like walking around a sculpture. Time isn't passing; you're just looking more closely.

  • Example: A product showcase video: the shoe from the front, the side, the sole, a close-up of the stitching. No story, just thorough seeing.

Alternating Syntagma

  • Academic: Two or more series of events shown in alternation, implying simultaneity of action across different spaces.

  • In plain language: Cutting back and forth between two things happening at the same time, like a phone conversation shown from both ends.

  • Example: A YouTube documentary cutting between an interview with the CEO and footage of the factory floor, implying both exist in the same present moment.

Movement-Image

  • Academic: Deleuze's concept describing cinema's regime of sensory-motor schemas, where perception leads to affect leads to action within rational narrative logic.

  • In plain language: The kind of image where something happens: you see a situation, you feel something about it, and then someone acts. It's the engine of every story that has a beginning, middle, and end.

  • Example: A YouTube tutorial: you see the problem (perception), feel the frustration (affect), then watch the solution being demonstrated (action). Classical cause-and-effect storytelling.

Time-Image

  • Academic: Deleuze's concept describing a cinematic regime where time is presented directly rather than through action, breaking sensory-motor chains and creating pure optical and sonic situations.

  • In plain language: The kind of image where nothing "happens" in the plot sense, but time itself becomes visible. You're not watching a story unfold but experiencing duration, memory, or thought.

  • Example: A YouTube video essay layering archival footage, film clips, and philosophical commentary until you lose track of which era you're in. The thinking IS the content, not the action.

Affection-Image

  • Academic: A Deleuzian image-type defined by the close-up of the face, expressing pure quality and potentiality before action is taken.

  • In plain language: The face filling the screen, registering emotion: not doing anything yet, just feeling. It's the close-up as a window into someone's inner state.

  • Example: A TikTok reaction video: the entire screen is a face responding to something. No action, no plot, only the raw registration of surprise, delight, or horror.

Scene

  • Academic: In Metz's taxonomy, an autonomous segment depicting continuous time and continuous space; the closest cinematic equivalent to a theatrical scene.

  • In plain language: What it sounds like: one unbroken stretch of time in one place, captured through multiple shots. The camera may change angles, but the clock keeps ticking and nobody teleports.

  • Example: A talking-head YouTube segment where the creator explains a concept from their desk, with multiple camera angles cutting between wide and close-up, but continuous time in the same room. That's a scene in Metz's terms.

Ordinary Sequence

  • Academic: A syntagmatic type presenting continuous action with minor temporal ellipses; insignificant moments are skipped, but the logical chain of events remains unbroken.

  • In plain language: Real-time storytelling with the boring bits snipped out. You see someone walk to the door, then they're already inside. You didn't need to watch them turn the handle. The story flows naturally, and you just skip the dead air.

  • Example: A tutorial video showing a recipe: the creator chops onions, then we cut to the onions already in the pan (skipping the walk to the stove). Continuous logic, minor time jumps. That's an ordinary sequence.

Jump Cut

  • Academic: An edit within continuous footage that removes a temporal segment, creating a visible discontinuity in the subject's position or action within the same spatial framing.

  • In plain language: A cut that breaks the rules of smooth editing on purpose: the person stays in frame but visibly jumps forward in time. In cinema it was once a mistake; on YouTube it became the default style.

  • Example: A creator recording a monologue in one take, then editing out every pause and "um." The result: they visibly jump and shift between sentences. It's technically a continuity error elevated to a platform convention.

Neuro-Image

  • Academic: Patricia Pisters's extension of Deleuze proposing a third image-type for digital screen culture, operating through database logic, mental landscapes, and networked associations rather than linear narrative.

  • In plain language: The kind of image native to the internet age, where you move through a web of connections and associations rather than following a straight line. Think of it as the image that works like your brain's browsing history, not like a train on a track.

  • Example: TikTok's algorithmic feed itself. Each video connects to the next not by story or chronology but by an invisible associative logic, a kind of pattern-matching that feels more like dreaming than watching.

Crystal-Image

  • Academic: Deleuze's concept of the point where actual (present) and virtual (past/possible) images become indiscernible, creating a circuit of mutual reflection.

  • In plain language: When you can't tell where the present ends and memory or imagination begins, and the real and its reflection fuse into one.

  • Example: A perfectly looping TikTok video where the ending seamlessly becomes the beginning. Linear time dissolves: you can't tell if you're watching it for the first time or the fifth.

STRUCTURAL SEMIOTICS (Saussure)

Signifier / Signified

  • Academic: Saussure's two-part model of the sign: the signifier is the material form (sound, image, mark) and the signified is the mental concept it evokes. Their relationship is arbitrary and held in place by social convention, not natural resemblance.

  • In plain language: Every sign has two halves: the thing you perceive (the sound of the word "dog," the image of a red octagon) and the idea it triggers in your mind (a four-legged animal, "stop"). These two halves are glued together by nothing but shared agreement. Change the culture, and the glue dissolves.

  • Example: A blue checkmark on social media. The signifier is a small blue icon. The signified used to be "verified, trustworthy." After platform policy changes, the same signifier now signifies "paid subscriber" for many users. The icon didn't change, but the convention did.

Langue / Parole

  • Academic: Saussure's distinction between langue (the abstract system of rules and conventions shared by a language community) and parole (the individual act of speech or expression within that system).

  • In plain language: Langue is the rulebook everyone shares: the grammar, the vocabulary, the conventions. Parole is what you actually say using those rules. You can break the rules creatively (that's style), but you need the shared system for anyone to understand you at all.

  • Example: TikTok trending sounds function as langue: a shared code system where each sound carries accumulated meaning from thousands of prior uses. Your specific video using that sound is parole: your individual expression within the shared system. If you use the "Oh No" sound over a happy moment, you're breaking the langue, and the dissonance is either confusing or brilliantly ironic.

Syntagmatic Axis

  • Academic: The axis of combination: the sequential chain along which signs are arranged in a specific order, where meaning arises from the relationships between elements placed next to each other.

  • In plain language: The "and then" axis. It's about sequence — what comes first, what follows, and how the order creates meaning. In film, it's the edit timeline. In language, it's the sentence. Change the order and you change the meaning.

  • Example: In a YouTube video: hook → context → evidence → conclusion → CTA. That specific sequence is a syntagmatic chain. Rearrange it — put the conclusion first and the hook last — and you've created an entirely different viewing experience (and probably killed your retention).

Paradigmatic Axis

  • Academic: The axis of selection: the set of elements that could substitute for one another at any given point in a syntagmatic chain, where meaning arises from the choice of one element over its alternatives.

  • In plain language: The "instead of" axis. At every point in a sequence, you chose one thing from a menu of options. The meaning of your choice comes partly from what you picked and partly from everything you didn't pick. A red dress means something because it's not the black one, the white one, or no dress at all.

  • Example: For a YouTube intro, you chose upbeat electronic music instead of acoustic guitar, orchestral, or silence. Each option would have placed the video in a different semantic field. The paradigmatic choice — what you selected from the available alternatives — shapes meaning before a single word is spoken.

Denotation / Connotation

  • Academic: Barthes's two orders of signification: denotation is the literal, descriptive meaning of a sign; connotation is the secondary, culturally contingent meaning layered on top of it.

  • In plain language: Denotation is what a picture literally shows. Connotation is what it makes you feel or think. A photo of a beach denotes sand, water, and sky. It connotes vacation, freedom, escape. The first meaning is in the image; the second meaning is in the culture.

  • Example: A video of a person drinking coffee in slow motion. Denotation: someone consuming a beverage. Connotation (depending on lighting, music, and setting): luxury, morning ritual, self-care, sophistication, or hipster culture. Same literal content, wildly different connotative readings.

MULTIMODAL DISCOURSE ANALYSIS (Kress & van Leeuwen)

Visual Grammar

  • Academic: Kress and van Leeuwen's systematic framework treating visual compositions as structured meaning-making systems governed by grammar-like rules, organized around representational, interactive, and compositional metafunctions.

  • In plain language: Images aren't random; they have a grammar just like sentences do. Where you put things in a frame, how big they are, what's sharp and what's blurred: all of these are choices that carry meaning, whether the creator knows it or not.

  • Example: A YouTube thumbnail with a face on the left and a product on the right isn't just "a layout." In visual grammar, the face is Given (familiar) and the product is New (the news). The layout itself communicates.

Information Value

  • Academic: The semiotic significance assigned to elements based on their placement within a composition, specifically along the Given/New (left/right), Ideal/Real (top/bottom), and Centre/Margin axes.

  • In plain language: Where something sits in the frame tells you what it means. Left = what you already know. Right = the new thing. Top = the dream. Bottom = the details. Center = what matters most.

  • Example: On a vertical TikTok video, text overlays at the top ("Ideal" zone) tend to carry the aspirational hook ("The trick nobody talks about"), while specific details and platform UI sit at the bottom ("Real" zone).

Given/New Axis

  • Academic: The horizontal compositional axis where left-positioned elements carry established, familiar information and right-positioned elements carry novel, noteworthy information, derived from Western left-to-right reading conventions.

  • In plain language: In a horizontal frame, the left side is home base: the familiar, agreed-upon starting point. The right side is where the surprise lives. Your eye naturally travels left to right, and meaning flows with it.

  • Example: In a YouTube interview, the host sits screen-left (Given — you know them) while the guest sits screen-right (New — they're the reason you're watching). On vertical video, this axis barely functions because the frame is too narrow.

Ideal/Real Axis

  • Academic: The vertical compositional axis where upper elements carry generalized, aspirational, or emotionally appealing content and lower elements carry specific, practical, grounded information.

  • In plain language: Top of the frame = the promise, the dream, the big idea. Bottom of the frame = the facts, the proof, the fine print. This is why movie posters put the hero's face up high and the credits down low.

  • Example: In a YouTube thumbnail: dramatic facial expression at top (Ideal — emotional hook), specific text like "$0 Budget" at bottom (Real — practical detail). In TikTok, platform buttons colonize the Real zone.

Salience

  • Academic: The degree to which a visual element attracts the viewer's attention, determined by size, sharpness, tonal contrast, color saturation, foregrounding, and cultural factors.

  • In plain language: Whatever your eye hits first, that's the most salient element. Creators control this through size, color, focus, and position. It's the visual equivalent of someone shouting in a quiet room.

  • Example: A YouTube thumbnail where everything is slightly blurred except one sharply focused face with a bright red circle around it. The face is maximally salient, so it dominates your attention before you process anything else.

Framing

  • Academic: The visual devices (lines, white space, color boundaries, discontinuities) that connect or disconnect compositional elements, signaling whether they belong together or are separate units.

  • In plain language: The invisible walls and bridges in a composition. Strong framing separates elements ("these are different things"). Weak framing connects them ("these belong together").

  • Example: A split-screen TikTok with a clear dividing line between two clips has strong framing: the line says, "compare these two separate things." A vlog with seamless background blur has weak framing: subject and environment flow together.

Modality (Visual)

  • Academic: The degree to which a visual representation claims to depict reality, ranging from high modality (photorealistic) to low modality (abstract or stylized), evaluated against genre-specific coding orientations.

  • In plain language: How "real" does the image claim to be? A photograph says, "this actually happened." A cartoon says, "this is an idea." Every visual sits somewhere on this spectrum, and where it sits shapes how much you trust it.

  • Example: Raw, unfiltered iPhone footage on TikTok has high naturalistic modality: it claims authenticity. A heavily color-graded cinematic YouTube video has lower naturalistic modality but higher sensory modality: it claims emotional truth instead.

Modal Affordance

  • Academic: Kress's concept describing the inherent potentials and limitations of a specific semiotic mode (image, sound, text, gesture) for making particular kinds of meaning.

  • In plain language: Every communication channel is good at some things and bad at others. Music is great at conveying emotion but terrible at giving directions. Text is great at precision but weak at showing spatial relationships. Each mode has built-in strengths and limits.

  • Example: TikTok's sound-first affordance makes it ideal for emotional storytelling through music. YouTube's search-first affordance makes it ideal for informational depth through spoken language. Same video concept, different platform: a different mode should lead.

Intersemiotic Complementarity

  • Academic: Terry Royce's framework describing how different semiotic modes (text, image, sound) complement each other in projecting cohesive meaning within a multimodal text.

  • In plain language: When words, images, and sounds work together in a way where each adds something the others can't — like instruments in a band, where the whole is greater than any solo.

  • Example: A TikTok where the visual shows a calm face, the text overlay reads "I'm fine," but the audio plays a melancholic song. The three modes contradict each other, and that tension IS the meaning. None of them alone conveys the irony.

Social Distance (Shot Type as Sign)

  • Academic: Kress and van Leeuwen's adaptation of Edward Hall's proxemics, where camera distance (close-up, medium, long shot) semiotically encodes the social relationship between depicted participant and viewer.

  • In plain language: How close the camera gets to someone isn't just a technical choice but a relationship choice. Close-up = intimacy ("we're in this together"). Long shot = detachment ("observe this from a distance").

  • Example: A skincare creator filming in extreme close-up creates an intimate, confessional feeling: you're inside their personal space. A drone establishing shot of a resort creates social distance: you're an observer, not a participant.

Demand vs. Offer (Gaze)

  • Academic: Kress and van Leeuwen's distinction between images where the depicted participant looks directly at the viewer ("demand": engaging or interpellating the viewer) versus away from the viewer ("offer": presenting the viewer with something to contemplate).

  • In plain language: When someone on screen looks straight into the camera, they're talking TO you and demanding your engagement. When they look away, they're being observed BY you, offering themselves as a subject for your consideration.

  • Example: A creator staring directly into the camera saying "You need to hear this" creates demand; it's a direct address that's hard to scroll past. A documentary subject filmed in profile, unaware of the camera, creates offer; you're watching, not being spoken to.

Kineikonic Mode

  • Academic: Andrew Burn's extension of Kress and van Leeuwen's framework to moving image, analyzing how film and video construct meaning through the integrated interaction of image track, sound track, gestural performance, and editing rhythm as a unified multimodal system.

  • In plain language: A grammar specifically designed for video rather than still images. It recognizes that a moving image isn't just a photograph plus time, but a completely different kind of sign-system where camera movement, editing, sound, and human gesture all generate meaning simultaneously, in ways that static visual grammar can't fully describe.

  • Example: A slow zoom into a speaker's face while the background music drops out and their voice softens. None of these elements — the zoom, the silence, the vocal shift — mean much alone. But combined in motion and time, they create an unmistakable signal: "this is the important part." That's the kineikonic mode at work.

Encoding / Decoding

  • Academic: Stuart Hall's communication model proposing that media messages are encoded by producers within specific frameworks of meaning and decoded by audiences who may adopt dominant, negotiated, or oppositional reading positions. Meaning is not transmitted but constructed at both ends.

  • In plain language: The creator puts meaning IN; the viewer takes meaning OUT, but those two meanings don't have to match. The viewer might read the message exactly as intended (dominant), partly accept and partly resist it (negotiated), or reject it entirely and read it against the grain (oppositional). Communication is not a pipeline; it's a negotiation.

  • Example: A luxury brand encodes a video with aspirational signifiers: golden light, slow motion, premium materials. Most viewers decode it as intended ("this is desirable"). But some decode it oppositionally: "this is performative excess designed to make me feel inadequate, so I'll buy something." Same video, opposite meanings.

Multimodal Redundancy

  • Academic: A design strategy where the same core information is conveyed simultaneously across multiple semiotic modes (visual, verbal, textual, auditory), ensuring message reception even when one mode is inaccessible.

  • In plain language: Saying the same thing in multiple ways at once, so even if you miss one channel, another one catches you. It's the communication equivalent of a safety net: belt AND suspenders.

  • Example: A YouTube creator explaining a statistic while simultaneously showing the number as on-screen text AND displaying a chart. If a CTV viewer glances at their phone during the voiceover, they can still catch the number visually when they look back up. The information survives the attention gap.

NARRATIVE SEMIOTICS (Greimas, Barthes)

Actantial Model

  • Academic: Greimas's structural model identifying six functional narrative roles — Subject, Object, Sender, Receiver, Helper, Opponent — organized along three axes: desire (Subject→Object), communication (Sender→Receiver), and power (Helper vs. Opponent).

  • In plain language: Every story, from the Odyssey to a 30-second ad, has the same six invisible characters playing the same six roles. Someone wants something, someone helps them, someone blocks them, someone sent them on the quest, and someone receives the result. Map these six roles and you've decoded the skeleton of any narrative.

  • Example: A skincare brand video: the viewer (Subject) wants clear skin (Object). Acne (Opponent) blocks them. The brand (Sender) introduces its product (Helper). The before/after transformation is delivered to the viewer (Receiver). Six roles, 30 seconds.

Canonical Narrative Schema

  • Academic: Greimas's four-phase narrative structure: manipulation (establishing the contract/desire), competence (acquiring means to act), performance (the decisive transformation), and sanction (evaluation of the result).

  • In plain language: Every complete story moves through four acts: first you're given a reason to care, then you gather what you need, then you do the thing, then you see what happened. This sequence is as old as storytelling itself, and it still runs underneath every video hook.

  • Example: In a 15-second TikTok: "You've been washing your face wrong" (manipulation, in 1 second). Competence is skipped. Performance is the product demonstration (seconds 2–12). Sanction is the glowing result (seconds 13–15). Almost the whole schema, radically compressed.

Semiotic Square

  • Academic: Greimas's logical model mapping the contrary, contradictory, and complementary relationships within a semantic opposition, generating four positions from a single binary axis.

  • In plain language: Take any either/or opposition — say, luxury vs. economy — and the semiotic square reveals two hidden positions you weren't seeing: not-luxury and not-economy. This turns a flat line into a map with four quadrants, each representing a different strategic territory.

  • Example: If your competitors all make "practical/utilitarian" product videos (one corner of the square), the model reveals that "ludic/playful" and "utopian/aspirational" content territories are wide open. It's a positioning compass.

Hermeneutic Code

  • Academic: Barthes's narrative code governing the sequential construction and resolution of enigmas: the process of posing, thematizing, delaying, partially answering, and finally disclosing a mystery within the text.

  • In plain language: The mystery engine. It's the code that makes you ask, "what happens next?" and keeps you watching to find out. Every unanswered question in a video, every "wait for it," every cliffhanger runs on this code.

  • Example: A YouTube video titled "I Spent 100 Days in Minecraft Hardcore" — the title poses the enigma. The opening shows a dramatic moment from Day 97 — that's a partial reveal. The full answer is withheld until minute 18. Every second of watch time in between is the hermeneutic code doing its job.

Proairetic Code

  • Academic: Barthes's code of sequential actions that implies logical consequence, with each action creating expectations about what must follow, generating narrative momentum.

  • In plain language: The "and then what?" code. When you see someone pick up a phone, you expect them to dial. When someone opens a door, you expect to see what's on the other side. Each action implies the next, and that chain of implications pulls you forward.

  • Example: In a cooking video: chopping → sautéing → plating. Each step implies the next. If you cut from chopping to the finished plate and skip the sautéing, it feels wrong. The proairetic code is the logic of sequential expectation.

Cultural/Referential Code

  • Academic: Barthes's code referring to shared cultural knowledge, stereotypes, scientific references, and common wisdom that the text invokes without explanation, assuming the audience's competence.

  • In plain language: The code of "you know what I mean." Every time a video relies on shared cultural knowledge — a meme format, a trending sound, a historical reference — it's activating this code. It's the invisible wink between creator and audience.

  • Example: Using the "Oh No" sound on TikTok: the creator doesn't need to explain that something went wrong. The sound carries that meaning from thousands of prior uses. If you don't know the sound, you miss the meaning: that's the cultural code at work.

Symbolic Code

  • Academic: Barthes's code organizing meaning through binary oppositions and antitheses: the deep structure of contrast (life/death, nature/culture, constraint/freedom) underlying the surface narrative.

  • In plain language: The code of deeper meaning through contrasts. Whenever a video sets up an opposition — before/after, chaos/order, old way/new way — it's tapping into this code. The contrast itself carries the message.

  • Example: A transformation Reel showing a cluttered desk (chaos/constraint) → clean desk (order/freedom). The video never says, "buy our organizer for a better life," but the symbolic opposition says it for them.

Semic Code

  • Academic: Barthes's code of connotation: the system by which visual and textual details accumulate to construct character traits, atmospheric qualities, and associative meanings beyond their denotative content.

  • In plain language: The code of vibes. Warm lighting, leather textures, and jazz music don't literally mean "sophisticated," but together they connote it. Every visual detail whispers something beyond what it literally shows.

  • Example: A brand video shot in a minimalist white studio with sans-serif typography and ambient electronic music. None of these elements literally means "innovative tech company," but the semic accumulation builds that connotation.

Isotopy

  • Academic: Greimas's concept of a redundant set of semantic categories distributed along a text's syntagmatic chain, creating thematic coherence through the recurrence of compatible meaning elements.

  • In plain language: The "red thread" of meaning running through a text. When every visual, sound, and word choice points in the same semantic direction, you have a strong isotopy. When they point in different directions, the text feels incoherent.

  • Example: A travel brand video where every element — golden light, flowing fabrics, slow motion, acoustic guitar, the word "discover" — belongs to the same semantic field of "freedom and escape." That consistency is isotopic coherence.

Propp's Narrative Functions

  • Academic: Vladimir Propp's identification of 31 invariant narrative functions (e.g., Departure, Interdiction, Violation, Villainy, Lack, Mediation, Task, Solution, Recognition, Reward) constituting the morphological deep structure of folk narrative.

  • In plain language: Before Greimas, Propp found that the Russian fairy tales he studied (and, by extension, many popular stories) are built from the same 31 plot moves arranged in the same order. Not every story uses all 31, but the ones it does use appear in sequence. It's the original narrative DNA map.

  • Example: A brand challenge video follows Propp's structure: the creator receives a task (function: Difficult Task), attempts it (Struggle), succeeds or fails (Victory/Defeat), and receives recognition (Recognition). The same functions that drive fairy tales drive YouTube engagement. If the Hero's Journey popped into your mind, that's because it closely resembles Propp's narrative functions.

Narrative Program

  • Academic: In Greimasian semiotics, a syntagmatic unit representing a state transformation: the passage of a subject from one state (e.g., disjunction from an object of value) to another (conjunction with that object).

  • In plain language: The smallest complete unit of story: someone goes from not-having to having, from not-knowing to knowing, from stuck to unstuck. Every story is built from one or more of these state-change units stacked together.

  • Example: A 15-second tutorial TikTok is a single narrative program: the viewer goes from disjunction with knowledge ("I didn't know this trick") to conjunction with it ("now I do"). A 20-minute YouTube video chains multiple narrative programs together, each building on the last.

PEIRCEAN SEMIOTICS (Peirce)

Icon

  • Academic: A sign that signifies through resemblance or similarity to its object. In other words, it looks, sounds, or feels like the thing it represents.

  • In plain language: A sign that works by looking like what it means. A photograph of a dog is an icon of a dog. A map is an icon of a territory. The sign and the thing share a visible family resemblance.

  • Example: In a product video, showing the actual product being used is iconic: the image resembles the real object. An animated logo, by contrast, is symbolic.

Index

  • Academic: A sign that signifies through a causal, physical, or existential connection to its object — it points to something by being genuinely affected by or connected to it.

  • In plain language: A sign that works as evidence; not because it looks like the thing, but because it was caused by or connected to the thing. Smoke is an index of fire. A footprint is an index of a foot. The sign and the thing are physically linked.

  • Example: In a video, steam rising from a coffee cup indexes heat and freshness. Shadows lengthening index the passage of time. Tears index genuine emotion. These aren't symbols: they're caused by the things they point to.

Symbol

  • Academic: A sign that signifies through arbitrary cultural convention: the connection between sign and meaning is learned, not natural.

  • In plain language: A sign that works only because we all agreed it would. There's nothing inherently "stop-like" about the color red or the shape of an octagon; we all just learned the convention. If you don't know the convention, you miss the meaning entirely.

  • Example: In a video, a ticking clock symbolizes urgency (convention, not causation). A wedding ring symbolizes commitment. A red "Subscribe" button symbolizes channel membership. None of these resemble what they mean; you have to know the code.

COMMERCIAL & APPLIED SEMIOTICS

Residual, Dominant, Emergent (RDE) Mapping

  • Academic: Malcolm Evans's framework, adapted from Raymond Williams's cultural theory, classifying cultural codes as Residual (fading from relevance), Dominant (mainstream current conventions), or Emergent (signals of where culture is heading).

  • In plain language: A trend radar for meaning. Residual codes are yesterday's news (still around but losing power). Dominant codes are today's mainstream. Emergent codes are tomorrow's conventions — still niche but gaining traction. Brands that spot emergent codes early own the future.

  • Example: In travel video: the heavily filtered, oversaturated "wanderlust" aesthetic is Residual. Clean, documentary-style footage with natural color is Dominant. Raw, imperfect, community-sourced content is Emergent. If your brand is still producing Residual content, you're behind.

Semiotic Audit

  • Academic: A systematic analysis evaluating the sign-systems deployed across a brand's communications at three levels: the brand itself, the competitive category, and the broader cultural context.

  • In plain language: A full diagnostic of what your brand's visual and verbal language actually says — not what you think it says. It examines your content, compares it to competitors, and maps it against what culture is currently rewarding. Think of it as getting your brand's semiotics bloodwork done.

  • Example: An audit might reveal that a luxury hotel brand is using the same color palette, font, and music as three budget competitors — meaning their content semiotically signals "mid-range" regardless of their pricing or positioning.

Coding Orientation

  • Academic: Kress and van Leeuwen's concept that different social groups evaluate visual modality (truth-value) against different standards: naturalistic, sensory, technological, or abstract.

  • In plain language: Different audiences judge "realness" by different standards. Scientists trust diagrams. Fashion editors trust mood. Engineers trust blueprints. Instagram audiences trust aesthetics. Each group has its own truth-detector for images.

  • Example: On TikTok, raw iPhone footage codes as more "true" (naturalistic orientation) than polished studio footage. On Instagram, the opposite applies — high aesthetic quality signals credibility (sensory orientation). Same content, different platform, different truth standard.

Floch's Four Axiologies of Consumption

  • Academic: Jean-Marie Floch's application of the semiotic square to consumer valuation, generating four positions: practical (utilitarian features), utopian (existential meaning), critical (value-for-money), and ludic (aesthetic pleasure).

  • In plain language: Four fundamental reasons people value anything — and four corresponding ways to position your content. Are you selling usefulness, meaning, value, or delight? Most brands default to one. The smart ones map where competitors cluster and move to the open quadrant.

  • Example: If every competitor in your category makes "practical" how-to videos, Floch's framework reveals three unoccupied territories: utopian (aspirational brand films), critical (comparison/value content), and ludic (entertaining, playful content).
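
The gap-spotting step described above can be sketched in a few lines of Python. This is only an illustration, assuming you have already hand-tagged each competitor video with one of Floch's four positions; the function name `open_quadrants` and the sample tags are hypothetical, not part of Floch's framework.

```python
from collections import Counter

# The four positions named in Floch's framework.
AXIOLOGIES = ("practical", "utopian", "critical", "ludic")

def open_quadrants(competitor_tags, threshold=0):
    """Return the axiologies that appear at most `threshold` times in the tags."""
    counts = Counter(competitor_tags)  # missing keys count as 0
    return [a for a in AXIOLOGIES if counts[a] <= threshold]

# Hypothetical audit: every competitor video was tagged "practical".
tags = ["practical", "practical", "practical", "practical"]
print(open_quadrants(tags))
```

Raising `threshold` treats a thinly occupied quadrant as still open, which may be closer to how a real category looks.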

PLATFORM & DIGITAL SEMIOTICS

Semiotic Technology

  • Academic: Djonov and van Leeuwen's concept describing digital platforms as technologies that provide specific sets of semiotic resources and constrain how meaning can be made, functioning as both tools and shapers of communication.

  • In plain language: A platform isn't just a pipe that delivers your content — it's a grammar that shapes what your content can mean. TikTok, Instagram, and YouTube each offer different meaning-making tools and impose different meaning-making rules. The platform is part of the message.

  • Example: TikTok's Duet feature is a semiotic technology: it creates a specific kind of meaning (dialogic response) that cannot exist outside the platform. The tool itself structures what can be said.

Paratext

  • Academic: Gérard Genette's concept of the textual and visual elements that surround and frame a primary text — titles, prefaces, covers, blurbs — shaping its reception without being part of the text itself.

  • In plain language: Everything around the main content that tells you how to approach it. A book's cover, a movie's poster, a video's thumbnail and title — these aren't the content, but they powerfully shape how you interpret the content before you even start.

  • Example: A YouTube thumbnail showing a shocked face + the title "This Changed Everything" = a paratextual system that activates the hermeneutic code (curiosity) before a single frame of the actual video plays.

Retention Editing

  • Academic: A platform-native editing grammar optimized for maximizing the percentage of a video watched, deploying pattern interrupts, open loops, and visual variety at empirically determined intervals.

  • In plain language: An editing style designed not for beauty or storytelling but for one specific metric: keeping you watching. Every cut, graphic, sound effect, and camera angle change is placed where the data says people tend to leave.

  • Example: A YouTube video where the editor inserts a B-roll cutaway every 15 seconds, drops a sound effect on every key point, and changes the background music at the 8-minute mark — all because retention data shows those are the moments viewers tend to click away.

Open Loop

  • Academic: A narrative device where a question, promise, or incomplete action is introduced and deliberately left unresolved, creating sustained cognitive tension that motivates continued viewing.

  • In plain language: A question your brain can't let go of until it gets an answer. Creators plant these throughout videos like breadcrumbs: "I'll show you the trick that changed everything... but first." Your brain stays tuned in because it hates unfinished business.

  • Example: A creator says in minute 2: "The fourth thing on this list literally doubled my revenue — I'll get to it." You're now mentally committed to watching through items 1, 2, and 3. The open loop is holding you hostage, and you're a willing captive.

Cold Open

  • Academic: A narrative structure beginning in medias res — dropping the viewer into a dramatic, emotionally charged moment from later in the content before reverting to chronological exposition.

  • In plain language: Starting with the climax, then rewinding to the beginning. It's the video equivalent of a novel's first line: "I never should have opened that door." You're hooked by the outcome before you know the story.

  • Example: A YouTube video opens with: "This is the moment we realized everything went wrong" — 3 seconds of dramatic footage — then cuts to "48 hours earlier..." and starts the real story. The ending came first to guarantee you'd stay for the beginning.

Dual-Screen Semiotic Environment

  • Academic: The viewing context where a primary screen (television) provides an audio-visual stream while a secondary screen (smartphone) provides a parallel text-and-image stream, creating split-attention reception conditions.

  • In plain language: The modern living room reality: YouTube plays on the TV, but your phone is in your hand. Your attention bounces between two screens, which means the TV content needs to work through audio alone during the moments you're looking at your phone.

  • Example: During a 20-minute YouTube video on a TV, a viewer might glance at their phone 30+ times. If the video's key message is delivered only through an on-screen graphic with no voiceover, anyone in dual-screen mode misses it entirely.
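
A quick way to check a script for this problem is to list every key message that is never spoken aloud. The sketch below assumes a hypothetical shot list in which each message records whether it appears on screen and whether it is voiced; the field names and sample messages are invented for illustration.

```python
# Hypothetical shot list: each key message records how it is delivered.
key_messages = [
    {"text": "Book before June 1", "on_screen": True, "spoken": False},
    {"text": "Free cancellation", "on_screen": True, "spoken": True},
    {"text": "Breakfast included", "on_screen": False, "spoken": True},
]

def audio_blind_spots(messages):
    """Return the messages a viewer would miss while looking at a phone."""
    return [m["text"] for m in messages if not m["spoken"]]

print(audio_blind_spots(key_messages))
```

Any message this returns depends on the viewer's eyes being on the TV at the right moment.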

FRESCO Framework

  • Academic: Morra et al.'s (2024) computational semiotics framework bridging computer vision and visual semiotics by analyzing images across three semiotic levels: the plastic level (formal visual features), the figurative level (identifiable entities and concepts), and the enunciation level (point of view and spectator construction).

  • In plain language: A bridge between how computers see images and how semioticians read them. It breaks every image into three layers: the raw visual stuff (colors, lines, shapes), the recognizable things (faces, objects, places), and the invisible hand of the director (where the camera is, what angle it takes, what it includes or excludes). These three layers map directly to what you specify in an AI video prompt.

  • Example: When you prompt Sora with "close-up of a woman in warm golden light, 35mm lens, shallow depth of field" — you're specifying the plastic level (golden light, shallow DOF), the figurative level (woman), and the enunciation level (close-up = intimacy, 35mm = naturalistic). FRESCO formalizes what good prompt engineers do intuitively.
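
The three-layer split can be made explicit as a small data structure. The sketch below is a note-taking aid for prompt writers, not the FRESCO authors' implementation; the class name `LayeredPrompt`, its methods, and the ordering used in `render` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LayeredPrompt:
    """A video prompt split into the three levels described above."""
    plastic: List[str] = field(default_factory=list)      # light, color, focus
    figurative: List[str] = field(default_factory=list)   # recognizable entities
    enunciation: List[str] = field(default_factory=list)  # framing, point of view

    def missing_levels(self) -> List[str]:
        """Name the levels that the prompt leaves unspecified."""
        levels = {
            "plastic": self.plastic,
            "figurative": self.figurative,
            "enunciation": self.enunciation,
        }
        return [name for name, items in levels.items() if not items]

    def render(self) -> str:
        """Join the levels into one prompt string (ordering is an arbitrary choice)."""
        return ", ".join(self.enunciation + self.figurative + self.plastic)

prompt = LayeredPrompt(
    plastic=["warm golden light", "shallow depth of field"],
    figurative=["a woman"],
    enunciation=["close-up", "35mm lens"],
)
print(prompt.render())
print(prompt.missing_levels())
```

A prompt that leaves a level empty hands that decision to the model, so `missing_levels` works as a simple checklist before generating.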

Pattern Interrupt

  • Academic: An editing or compositional device that breaks an established visual, auditory, or narrative rhythm, resetting viewer attention through unexpected stimulus change at empirically determined intervals.

  • In plain language: A deliberate surprise that snaps your brain back to attention. When a video establishes a rhythm — same shot, same voice, same pace — your brain starts to tune out. A pattern interrupt (a sound effect, a camera change, a graphic popping up) breaks that drift and forces you to re-engage.

  • Example: A YouTube creator talking to camera for 20 seconds suddenly cuts to B-roll with a whoosh sound effect, then cuts back. That 3-second interruption resets the viewer's attention clock. Retention data shows these need to happen every 15-40 seconds depending on the audience's age.
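
To see where an edit goes too long without an interrupt, you can scan the timestamps of its pattern interrupts for oversized gaps. This is a minimal sketch: the function name `long_gaps` and the sample timestamps are hypothetical, and the 40-second default simply takes the upper end of the range mentioned above.

```python
def long_gaps(interrupt_times, duration, max_gap=40.0):
    """Return (start, end) stretches longer than max_gap seconds with no interrupt."""
    # Treat the start and end of the video as boundaries.
    points = [0.0] + sorted(float(t) for t in interrupt_times) + [float(duration)]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > max_gap]

# Hypothetical 3-minute video with interrupts at these timestamps (in seconds).
print(long_gaps([20, 45, 110, 130], duration=180))
```

Each returned pair marks a stretch where attention is likely to drift. A younger audience might call for a smaller `max_gap`.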

Sound Meme

  • Academic: A short audio clip that accumulates semiotic meaning through iterative reuse across thousands of user-generated videos, functioning as a shared cultural sign whose signified is constituted by the collective history of its prior uses.

  • In plain language: A sound that means something because of how everyone else has used it. The audio clip isn't just music or a voice — it's a template carrying meaning accumulated from every previous video that used it. Choosing a sound is choosing a meaning-field.

  • Example: The "Oh No" sound on TikTok: the clip itself is just a few seconds of a 1960s song. But through thousands of uses, it now pre-structures meaning — whatever visual follows MUST depict something going wrong. The sound has become a sign with a fixed signified, created entirely through collective repetition.

Algorithmic Model Viewer

  • Academic: An extension of Eco's "model reader" concept to platform video, describing the ideal viewer profile computationally constructed by recommendation algorithms — the hypothetical audience member whose predicted engagement patterns determine a video's distribution.

  • In plain language: The algorithm doesn't just distribute your video — it imagines the perfect viewer for it, then goes looking for people who match that profile. This invisible "model viewer" is constructed from engagement data, not demographics. Your content is competing for the attention of a viewer who exists primarily as a statistical prediction.

  • Example: YouTube's algorithm determines that your video about camera gear is most likely to be watched through by males aged 25-34 who previously watched three specific tech channels. That predicted viewer profile — the algorithmic model viewer — determines which of the platform's 2 billion users ever sees your thumbnail.

Article by Gianluca Fiorelli

With almost 20 years of experience in web marketing, Gianluca Fiorelli is a Strategic and International SEO Consultant who helps businesses improve their visibility and performance on organic search. Gianluca has collaborated with clients from various industries and regions, such as Glassdoor, Idealista, Rastreator.com, Outsystems, Chess.com, SIXT Ride, Vegetables by Bayer, Visit California, Gamepix, James Edition and many others.

A very active member of the SEO community, Gianluca shares his insights and best practices on SEO, content, Search marketing strategy and the evolution of Search daily on social media channels such as X, Bluesky and LinkedIn, and through the blog on his website: IloveSEO.net.
