A Theory of Zoom Fatigue, Hacker News

Why is video-conferencing so exhausting?

The question is worth asking because it might be helpful for us to understand what is involved in an activity many of us cannot now avoid. It’s also worth asking because of how it helps us think about human communication under any circumstances.

Here’s how I came across the question this morning:

There are a number of factors at play here, but many of them do tend to coalesce around one important consideration: the body matters to the work of communicating and sense-making.

I’ll get back to that in just a moment, but first some other factors, briefly noted.

Communication is always hard work, and the fact that we are pretty good at it in certain contexts makes it easy to forget this. But think for a moment about the wonder of it all. You have things going on inside of you, sometimes very complex things, and you manage to get some sense of those things across to someone who is not you through the use of sounds, gestures, facial expressions, inscriptions, images, etc. From a certain perspective, it’s a small miracle whenever it happens.

The media we use to communicate with one another always make it easier to do so in some respects and harder in others. This may seem like a banal observation, but, again, I think we tend to forget this. It is especially useful to consider what a medium makes possible by way of expression, what it makes impossible, what it encourages, what it discourages, etc. I would suggest, too, that there is an art to using each medium well.

With regard to video-conferencing specifically, it’s much too tempting to multi-task while we do so. But as we should all know by now, nobody multi-tasks well. It’s especially exhausting to be continuously dropping a conversational thread and picking it up again. Something as seemingly benign as a notification flashing on the screen, even if we don’t attend to it for more than a split second, can throw us off the thread of thought, and the momentary work of trying to pick it up again takes a mental toll.

We should bear in mind the physical strain of video-conferencing, which we might associate with the poor posture required by less than ideal set-ups, having to remain relatively still in front of a camera, staring at a screen for long periods of time, etc. All of this can be enough to make the experience challenging.

When video-conferencing, it’s not just that we are tempted to check email or social media while simultaneously attending to meeting participants. It is also the case that we are paying attention to ourselves in an odd way. Thanks to my image on the screen, I’m conscious of myself not only from within but also from without. We are always to some degree internally conscious of ourselves, of course, but this is the usual “I” in the “I-Thou” relation. Here we are talking about something like an “I-Me-Thou” relation. It would be akin to having a mirror of ourselves that only we could see present whenever we talked with others in person. This, too, amounts to a persistent expenditure of social and cognitive labor as I inadvertently mind my image as well as the images of the other participants.

Now on to the main point, which is not altogether unrelated to some of the foregoing considerations. The body matters in communication, but perceiving an image of a body in virtual space rather than perceiving a body itself in shared space may be worse than not perceiving a body at all.

Several years ago I drew on the work of the philosopher Maurice Merleau-Ponty to explore how a smartphone mediates conversations. I’ll do the same here to think about video-conferencing.

It starts with a particular theory of perception. Hubert Dreyfus helpfully explained an important element of how Merleau-Ponty thought about what was going on when we perceive the world:

“It is crucial that the agent does not merely receive input passively and then process it. Rather, the agent is already set to respond to the solicitations of things. The agent sees things from some perspective and sees them as affording certain actions. What the affordances are depends on past experience with that sort of thing in that sort of situation.”

In other words, when we perceive the world our senses are not just objectively reporting facts about the world to our minds. Rather, our mind is already at work interpreting and construing the world according to its store of past experience. It’s always seeing-as, not simply seeing.

Consider how Charles Taylor explains this dynamic:

“As I navigate my way along the path up the hill, my mind totally absorbed anticipating the difficult conversation I’m going to have at my destination, I treat the different features of the terrain as obstacles, supports, openings, invitations to tread more warily or run freely, and so on. Even when I’m not thinking of them, these things have those relevances for me; I know my way about among them.”

And in order to make our way in the world in this way, our bodies seek out a position from which to achieve an “optimal grip” on our environment. One common example is how we position ourselves before a painting in a gallery in order to take it in. Ordinarily, there’s no sign that says “Stand Here,” but we naturally find the place anyway without giving it much conscious thought.

Likewise, in face-to-face conversation we are constantly seeking out the components of meaning afforded by the body of our interlocutors; we are seeking an optimal grip on the communicative process. While our conscious attention is focused on words and their meaning, our fuller perceptive capabilities are engaged in reading the whole environment. In conversation, then, each person becomes a part of a field of communication that includes, but is not limited to, verbal expression.

The problem with video-conferencing is that the body is there but isn’t. This means that our minds are at least partly frustrated as they deploy their non-conscious repertoire of perceptive skills. The situation is more like a face-to-face encounter than most any other medium, but, for that very reason, it frustrates us because it is not quite the same. I suppose we might think of it as something like a conversational uncanny valley. The full range of what the mind assumes should be available to it when it perceives a body simply isn’t there.

The body is there as a two-dimensional image before us, interacting with us in something approaching simultaneity. But because we are not actually sharing the same physical space, we can’t quite achieve the optimal grip we’re searching for. For one thing, unless someone is making a determined effort, eye contact is never quite right. And if someone is making a determined effort to look into the camera, then that means they are not seeing your eyes. It’s a no-win situation.

Other factors make the video-conferencing exchange slippery to a mind seeking optimal grip:

Even with a very good connection, there can be an almost imperceptible lag time that adds another layer of slipperiness resisting our mind’s efforts to achieve an optimal grip. This is to say nothing of the sometimes frequent and obvious lags created by unstable internet connections.

Faces are present to us, but typically at a less than ideal distance, making it difficult to perceive the subtle cues we rely on to gauge whether someone is following along, interested, confused, disengaged, etc. The more participants there are and the smaller the screen, the harder it is to pick up such cues.

Participants are not, in fact, sharing the same physical space, making it difficult to perceive our conversation partners as part of a cohesive perceptive field. They lose their integrity as objects of perception, which is to say they don’t appear whole and independent; they appear truncated and as parts of a representation within another object of perception, the screen.

What all of this amounts to, then, is a physically, cognitively, and emotionally taxing experience for many users as our minds undertake the work of making sense under such circumstances. Which is not to say that one should avoid video-conferencing altogether or that it does not have certain virtues. Right now most everything is operating in a less than ideal manner, and we’re fumbling our way toward some version of “good enough.” But in order to use these tools well, it’s worth reckoning with what Zoom or Skype can and cannot do. We should understand, too, how they might be undermining our stated objectives and what we are asking of others when we mandate their use.