The Musings of Jaime David

The writings of some random dude on the internet

Tag: user experience

  • The Absurd Wall Around Picture‑in‑Picture for Music on YouTube

    There is something uniquely frustrating about running headfirst into a limitation that feels completely artificial. Not a technical constraint. Not a hardware shortcoming. Not even a genuine legal impossibility. Just a wall, quietly erected, that exists because someone decided it should. YouTube’s refusal to allow music content to run in Picture‑in‑Picture mode on iPhone and iPad is one of those walls. It stands there, immovable, while everything around it suggests that it should not exist at all. Videos can do it. Movies can do it. Shows can do it. Long‑form documentaries, podcasts with visuals, talking heads, gaming streams, and even YouTube’s own ads can shrink down and float obediently in a corner of the screen. But the moment the content is classified as music, suddenly it becomes impossible, impractical, or “too complex” to allow the same basic functionality.

    Picture‑in‑Picture, at its core, is not some experimental or bleeding‑edge feature anymore. On iOS and iPadOS, it has been a standard part of the operating system for years. Apple provides native APIs for it. Developers do not have to invent it from scratch, reverse‑engineer obscure behavior, or hack together unstable workarounds. It is documented, supported, and widely used. Countless apps implement it with relative ease. Video players use it. Streaming platforms use it. Even YouTube itself uses it extensively, as long as the content being played falls into the “acceptable” category. Which makes the exclusion of music content not just annoying, but baffling.

    What makes this restriction feel especially absurd is how arbitrary the distinction actually is. A music video is still a video. From a technical standpoint, there is no magical difference between a music video and any other video file hosted on YouTube’s servers. The codec is the same. The playback pipeline is the same. The streaming infrastructure is the same. The app already knows how to keep a video playing while the user switches apps. The floating window already exists. The only thing that changes is a label, a category, a business rule. Suddenly, a video that could float freely a moment ago is now locked in place, demanding your full attention or nothing at all.
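
    To make that concrete, here is a minimal sketch in Python of what such a gate could look like. It is purely illustrative: the `Video` class, its `category` field, the `PIP_BLOCKED_CATEGORIES` set, and the `can_enter_pip` function are all hypothetical names, not anything taken from YouTube's actual code. The point it tries to show is that nothing about the codec or the playback path is consulted; only the label is.

```python
from dataclasses import dataclass


# Hypothetical model of a hosted video; the field names are assumptions.
@dataclass(frozen=True)
class Video:
    title: str
    codec: str
    category: str  # e.g. "music", "podcast", "gaming"


# Hypothetical business rule: categories excluded from the floating window.
PIP_BLOCKED_CATEGORIES = frozenset({"music"})


def can_enter_pip(video: Video, blocked=PIP_BLOCKED_CATEGORIES) -> bool:
    """Return True if the policy allows Picture-in-Picture for this video.

    The codec and title are never inspected; the decision rests on the label.
    """
    return video.category not in blocked


podcast = Video("Two-hour interview", codec="h264", category="podcast")
song = Video("Four-minute music video", codec="h264", category="music")

print(can_enter_pip(podcast))                   # True
print(can_enter_pip(song))                      # False
print(can_enter_pip(song, blocked=frozenset())) # True once the rule is removed
```

    Under these assumptions, the two videos are identical in every field that matters for playback, and emptying the blocked set is the entire change.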

    This is where the frustration really sets in for users, especially on mobile devices like the iPhone and iPad. These are inherently multitasking devices. Apple markets them that way. People use them that way. You listen to music while reading, while writing, while scrolling, while checking messages, while doing literally anything else. Picture‑in‑Picture fits perfectly into that reality. It allows content to remain present without monopolizing the entire screen. It respects the user’s time, attention, and workflow. Blocking music from that experience feels less like a technical oversight and more like a deliberate act of control.

    The irony is that YouTube clearly understands the value of Picture‑in‑Picture. They did not reluctantly implement it under pressure. They actively promote it as a premium feature, especially on iOS. They advertise it as part of the YouTube Premium experience, a way to keep videos playing while you use other apps. They know users want this. They know it improves usability. They know it aligns with how people actually use their devices. And yet, when it comes to music, they draw an arbitrary line and pretend that crossing it would somehow break the universe.

    From the user’s perspective, this makes absolutely no sense. Music is arguably the most natural candidate for Picture‑in‑Picture. If anything, music needs visuals less than other content. Most people are not actively watching a music video the entire time it plays. The visuals are often secondary, symbolic, or simply background flair. The primary purpose is the audio. If YouTube can keep a floating video window active for a two‑hour podcast where the visuals are a static shot of someone talking into a microphone, then claiming that a four‑minute music video is somehow incompatible with Picture‑in‑Picture strains credibility.

    The situation becomes even more absurd when you consider that YouTube already allows background playback for music under certain conditions, again usually tied to Premium. The audio can continue when the screen is off. The audio can continue when you leave the app. The system clearly has no problem handling continuous music playback. So what exactly is the obstacle to letting that same content play in a small, floating window? There isn’t one, at least not a technical one. The infrastructure is already there. The behavior already exists in slightly different forms. The restriction is selective and intentional.

    This selective limitation feeds into a broader pattern that many users have noticed over the years with YouTube’s mobile apps, especially on iOS. Features are not withheld because they are impossible. They are withheld because they are useful. The more useful a feature is, the more likely it is to be gated, restricted, or segmented into a paid tier. Picture‑in‑Picture for music feels like a textbook example of this philosophy. By making music playback more inconvenient, YouTube nudges users toward YouTube Music, YouTube Premium, or alternative listening habits that better serve the company’s revenue goals.

    But even from a business perspective, the logic feels shortsighted. Frustrating users does not necessarily push them toward paid subscriptions. In many cases, it pushes them away from the platform entirely. When people realize that a basic, expected feature is being withheld for no defensible reason, resentment builds. That resentment does not always translate into loyalty or conversion. Sometimes it translates into users seeking out other platforms, other apps, or other ways of consuming the same content with fewer artificial barriers.

    There is also an accessibility dimension to this issue that rarely gets discussed. Picture‑in‑Picture is not just a convenience feature. For some users, it is a usability necessity. People with attention differences, neurodivergent users, or those who rely on multitasking to stay engaged often benefit from having content present without dominating their screen. Music, in particular, can be grounding, regulating, or focus‑enhancing. Denying Picture‑in‑Picture for music disproportionately affects these users, all in service of a categorization rule that exists purely at the platform level.

    On iPad especially, the restriction feels almost comical. The iPad is designed for multitasking. Split View, Slide Over, Stage Manager, and Picture‑in‑Picture are core features of the device’s identity. Using YouTube on an iPad and discovering that a lecture, a movie, or a random vlog can float neatly in the corner, while a music video stubbornly refuses to do so, highlights how unnatural the limitation really is. The device is capable. The OS is capable. The app is capable. The content is capable. Only the policy is not.

    Some defenders of the status quo might argue that music licensing complicates things, that record labels impose restrictions, or that contracts somehow prohibit Picture‑in‑Picture. But this argument quickly falls apart under scrutiny. Music already plays in the background. Music already streams across devices. Music already appears in countless contexts where the visuals are incidental. If licensing were the real obstacle, we would see far more consistent limitations across playback modes. Instead, what we see is a finely tuned set of restrictions that align suspiciously well with monetization strategies.

    The inconsistency becomes even clearer when you compare YouTube to other platforms. Many music and video apps have no problem allowing Picture‑in‑Picture or equivalent behavior for audio‑focused content. Some apps go even further, integrating mini players, persistent controls, and seamless transitions between visual and audio modes. These apps demonstrate, again and again, that there is nothing inherently difficult about letting music coexist with multitasking. YouTube’s refusal to do the same stands out precisely because it is an outlier, not a norm.

    There is also a philosophical question at the heart of this issue: who controls how content is consumed? When a platform decides that certain types of content must be consumed in a specific, constrained way, it sends a message about ownership and agency. YouTube hosts the content, but users experience it. Blocking Picture‑in‑Picture for music is a subtle assertion of control, a way of saying that even if your device can do this, even if the OS encourages it, even if it would improve your experience, the platform gets the final say.

    This tension between platform control and user autonomy is not new, but it becomes especially visible in cases like this because the justification is so thin. If Picture‑in‑Picture for music genuinely broke something, degraded quality, or introduced instability, users might accept it reluctantly. But when everything else works perfectly and only music is excluded, the explanation rings hollow. It feels less like a technical decision and more like a power move.

    The end result is a worse experience for everyone except, perhaps, the balance sheet. Users are forced to keep the YouTube app in the foreground just to listen to a song. They are discouraged from multitasking. They are subtly punished for using YouTube as a music platform rather than switching to a separate, branded music app. All of this friction accumulates, turning what should be a seamless, modern experience into something clunky and outdated.

    What makes this particularly frustrating is how easily it could be fixed. There is no need for groundbreaking engineering. No need for new standards. No need for radical redesigns. The feature already exists. The app already supports it. The only thing required is the decision to allow it. Flip the switch. Remove the arbitrary exception. Treat music videos like the videos they are. Respect the reality of how people use their devices.

    Until that happens, YouTube’s handling of Picture‑in‑Picture for music will remain a symbol of a broader problem in modern platforms: the tendency to prioritize control and monetization over user experience, even when doing so makes the product objectively worse. It is a reminder that many of the frustrations people feel with large tech platforms are not about bugs or limitations, but about choices. Choices to restrict, to gate, and to complicate things that should be simple.

  • Musing Mondays #5: The Cost of Convenience: How AI Voice Assistants Are Changing Customer Experience

    Technology is evolving at a rapid pace, and with it comes a slew of innovations that promise to make our lives easier. One area where this is particularly visible is in the realm of customer service, where automated voice assistants are increasingly replacing human operators. While these systems are designed to streamline processes and improve efficiency, they can also introduce a host of new challenges — particularly for users who rely on certain accommodations or prefer more personalized interactions.

    Take Capital One’s recent change to its phone-based voice assistant system, for example. The company has moved from a human-like, slower-paced voice to a more robotic-sounding one that speeds through instructions. While the change is likely meant to improve efficiency, it has left many users, especially those with accessibility needs, frustrated.

    This shift is more than just a matter of convenience; it brings to light critical questions about how technology serves its users. As AI becomes more integrated into our daily lives, we must consider the ways it impacts accessibility, inclusivity, and user experience. What happens when the “smart” systems we rely on start to overlook the diverse ways in which people interact with technology?


    Accessibility and the Hidden Costs of “Efficiency”

    When a company like Capital One rolls out a new AI voice assistant, the goal is often to create a system that can handle more users faster. And, on the surface, this seems like a win for efficiency. However, for those who are neurodivergent, have sensory sensitivities, or simply need a little extra time to process spoken information, the faster, more robotic assistant is anything but a win.

    For many, using keypad inputs or interacting with slower, more human-like assistants was a much more comfortable and effective way to manage tasks like paying bills or checking balances. But the shift to a voice-only system with no alternative can feel alienating. Users are forced into a style of interaction that may not suit their needs, and without proper accommodations, they’re left to adapt — or struggle.

    This isn’t an isolated issue. Across the tech industry, from customer service lines to smartphone apps, companies are increasingly opting for voice-first or AI-driven solutions. Yet, in this push for automation, the subtle human element of customer service is often lost — along with the empathy that comes with it.


    The Pushback: How Users Are Reacting

    As the AI assistant landscape shifts, many users are vocal about their dissatisfaction with these changes. Some argue that AI can never truly replace human interaction, especially when it comes to understanding the needs of a diverse user base.

    From Reddit:
    One user said:

    “The older system let me use the keypad for everything, and I didn’t have to speak at all. Now it forces me to talk even when I don’t want to.”
    This user’s frustration reveals the key problem with forcing voice-based interactions: it ignores the reality that some users are not comfortable speaking or may find it difficult to process information quickly.

    From X (formerly Twitter):
    Another user tweeted:

    “I miss the old voice — it felt like it understood I needed time. This new one just speeds through everything.”
    Here, the user is expressing a need for more time and a slower pace, something the new, faster assistant does not provide.

    From Trustpilot:
    A user posted:

    “It talks too fast and I can’t even understand the menu options half the time.”
    This user points out the speed of the new voice and how it affects comprehension — something especially concerning for those with auditory processing challenges.

    From Reddit (again):
    One more comment shared:

    “This new robot voice is annoying AF. Bring back the old assistant!”
    For this user, the problem isn’t just about speed — it’s about how the assistant’s robotic tone makes the experience feel less human and more disconnected.

    These reactions aren’t simply complaints; they are signals that AI systems need to evolve alongside the diverse ways people interact with technology. It’s not just about functionality; it’s about understanding the needs of users in a nuanced, empathetic way.


    How Tech Companies Can Do Better

    While it’s clear that AI and voice assistants are here to stay, it’s essential that companies make their services more inclusive and accessible. The rapid adoption of AI shouldn’t come at the expense of those who rely on alternative methods of interaction.

    Here are a few suggestions for how companies like Capital One (and others in the banking and tech sectors) can better serve their customers:

    • Offer a Choice of Interaction Methods: Companies should allow users to choose between keypad inputs, voice prompts, and other modes of interaction, ensuring that users can find the method that works best for them.
    • Slow Down AI Speech: Slowing the speech rate would improve the experience for users who need extra time to process information.
    • Involve Diverse User Groups in Testing: When developing AI systems, companies should include a range of neurodivergent users and others with accessibility needs in the testing phase, ensuring that the system works for everyone.
    • Don’t Assume Faster Is Better: Faster is not better for everyone. In pursuing speed, companies need to take care not to alienate the people who rely on more thoughtful, human-paced interactions.
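
    The first two suggestions can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a description of how Capital One's or any real phone system works: `MenuOption`, `CallerPreferences`, `build_prompt`, and `resolve_input` are made-up names, and the `speech_rate` value stands in for whatever pacing control a real text-to-speech engine exposes. The idea is that the keypad always works, voice is optional, and the caller's preferred pace travels with the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MenuOption:
    label: str
    keypad_digit: str       # digit the caller can press
    spoken_phrases: tuple   # phrases a speech recognizer might hand back


@dataclass(frozen=True)
class CallerPreferences:
    speech_rate: float = 1.0    # 1.0 = default pace; lower means slower
    voice_enabled: bool = True  # the keypad is always accepted


MENU = (
    MenuOption("Check balance", "1", ("balance", "check balance")),
    MenuOption("Pay bill", "2", ("pay", "pay bill")),
)


def build_prompt(menu, prefs):
    """Build the prompt text plus the rate at which it should be spoken."""
    sentences = []
    for option in menu:
        ways = [f"press {option.keypad_digit}"]
        if prefs.voice_enabled:
            ways.append(f"say '{option.spoken_phrases[0]}'")
        sentences.append(f"For {option.label.lower()}, {' or '.join(ways)}.")
    return {"text": " ".join(sentences), "rate": prefs.speech_rate}


def resolve_input(menu, prefs, raw):
    """Map a keypad digit or a recognized phrase to a menu option, or None."""
    cleaned = raw.strip().lower()
    for option in menu:
        if cleaned == option.keypad_digit:
            return option
        if prefs.voice_enabled and cleaned in option.spoken_phrases:
            return option
    return None


# A caller who wants a slower voice and never wants to speak.
quiet_caller = CallerPreferences(speech_rate=0.8, voice_enabled=False)
print(build_prompt(MENU, quiet_caller)["text"])
# For check balance, press 1. For pay bill, press 2.
print(resolve_input(MENU, quiet_caller, "2").label)
# Pay bill
```

    In this sketch, supporting the keypad costs one extra comparison per option, and the slower pace is a single number stored with the caller's preferences.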

    Tech for All: Striving for Inclusivity

    As AI technology continues to evolve, we must ask ourselves: Who is it really benefiting? A new, faster system may improve efficiency, but if it alienates users who need slower, more customizable options, is it really an improvement?

    In a world where we are increasingly dependent on technology for day-to-day tasks, it’s essential that we strive for solutions that are inclusive and accessible for everyone. After all, the most efficient technology is the one that works for everyone, not just those who fit a particular mold.


    Have you encountered similar frustrations with voice assistants? Share your experience in the comments below — let’s keep the conversation going about accessibility in AI.