Virtual communication curbs creative idea generation

Laboratory experiment

Following the methodological recommendation38 of “generaliz[ing] across stimuli by replicating the study across different stimuli within a single experiment”, we collected our laboratory data with stimulus replicates in two batches. Where possible, we combined the data from the two batches to increase statistical power. When the two batches of data were combined, our power to detect a difference in conditions at our effect size was 89%. Below, we outline the methods for each stimulus batch.
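For reference, a power calculation of this kind can be run in R with the pwr package; the sketch below is illustrative only, and the per-condition n and the effect size d are placeholders rather than the values used for this study.

```r
# Illustrative power calculation for a two-sample comparison of pair-level
# outcomes. The per-condition n and effect size d below are placeholders,
# not the study's actual planning values.
library(pwr)  # install.packages("pwr") if needed

pwr.t.test(
  n = 150,          # pairs per condition after combining both batches (approximate, hypothetical)
  d = 0.35,         # assumed standardized effect size (hypothetical)
  sig.level = 0.05,
  type = "two.sample",
  alternative = "two.sided"
)
```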

Stimulus 1: frisbee

Procedure

Three hundred participants (202 female, 95 male, Mage = 26.1; s.d.age = 8.61; three participants did not complete the survey and are therefore missing demographic information) from a university student and staff pool in the United States participated in the study in exchange for US$10. We posted timeslots in an online research portal that allowed each participant to enroll anonymously into a pair. The participants provided consent before beginning the study. This study was approved by the Stanford University Human Subjects Ethics Board (protocol 35916). The laboratory study was conducted by university research assistants blinded to the hypothesis who were not present during the group interaction.

On arrival, the pairs were informed that their first task was to generate creative alternative uses for a Frisbee and that their second task was to select their most creative idea. These tasks were incentive-aligned: each creative idea that was generated (as scored by outside judges) earned the pair one raffle ticket for a US$200 raffle, and selecting a creative idea earned the pair five additional raffle tickets. Half of the teams (n = 75) learned that they would be working together on the task in the same room, whereas the other half (n = 75) were told that they would be working in separate rooms and communicating using video technology (WebEx, v.36.6–36.9). Groups were assigned in an alternating order, such that the first group was in-person, the second group was virtual and so on. This ensured an equal and unbiased recruitment of each condition.

Before being moved to the task room(s), one participant was randomly selected to be the typist (that is, to record the ideas during the idea-generation stage and indicate the selected idea in the idea-selection stage for the pair) by drawing a piece of paper from a mug. In both communication modalities, each team member had an iPad with a blank Google sheet open (accessed in 2016). The typist had a wireless keyboard and editing capabilities, whereas the other team member could only view the ideas on their iPad. Thus, only the typist could record the generated ideas and select the pair’s top idea, but both members had equal information about the team’s performance (that is, the generated ideas and the selected idea). In-person pairs sat at a table across from each other. Virtual pairs sat at identical tables in separate rooms with their partner displayed on video across from them. The video display was a full-screen video stream of only their partner (the video of the self was not displayed) on a 15-inch retina-display MacBook Pro.

Each pair generated ideas for 5 min and spent 1 min selecting their most creative idea. They indicated their top creative idea by putting an asterisk next to the idea on the Google sheet. Nine groups did not indicate their top idea on the Google sheet; in the second batch of data collection, we used an online survey that required a response to prevent this issue. Finally, as an exploratory measure, each pair was given 5 min to evaluate each of their ideas on a seven-point scale (1 (least creative) to 7 (most creative)).

Once pairs completed both the idea-generation and the idea-selection task, each team member individually completed a survey on Qualtrics (accessed in 2016) in a separate room.

Stimulus 2: bubble wrap

Procedure

Three hundred and thirty-four participants from a university student and staff pool in the United States participated in the study in exchange for US$15. We also recruited 18 participants from Craigslist in an effort to accelerate data collection. However, the students reported feeling uncomfortable, and idea-generation performance dropped substantially in student–Craigslist pairs, so we removed these pairs from the analysis. Our final participant list did not overlap with participants in the first batch of data collection in the laboratory. The participants provided consent before beginning the study. This study was approved by the Stanford University Human Subjects Ethics Board (protocol 35916). The laboratory study was conducted by university research assistants blinded to the hypothesis who were not present during the group interaction.

We a priori excluded any pairs who experienced technical difficulties (such as screen-share issues, audio feedback or dropped video calls) and aimed to collect 150 pairs in total. Our final sample consisted of 302 participants (177 females, 119 males, 2 non-binary, Mage = 23.5, s.d.age = 7.09; we are missing demographic and survey data from four of the participants). Mimicking the design of the first batch of data collection in the laboratory, pairs generated uses for bubble wrap for 5 min and then spent 1 min selecting their most creative idea. As before, half of the teams learned that they would be working together on the task in the same room (n = 74), whereas the other half (n = 77) were told that they would be working in separate (but identical) rooms and communicating using video technology (Zoom v.3.2). The groups were assigned in an alternating order, such that the first group was in-person, the second group was virtual and so on. This ensured an equal and unbiased recruitment of each condition. Again, one partner was randomly assigned to be the typist. The tasks were incentivized using the same structure as the first batch of data collection.

For in-person pairs, each participant had a 15-inch task computer directly in front of them, with their partner seated across from them and to their right. For virtual pairs, each participant had two 15-inch computers: a task computer directly in front of them and a computer displaying their partner’s face to their right (again, self-view was hidden). This set-up enabled us to unobtrusively measure gaze by using the task computer to record each participant’s face during the interaction: in both conditions, the task was directly in front of each participant and the partner was to each participant’s right.

In contrast to the first batch of data collection, we used Qualtrics (accessed in 2018) to collect task data. Pairs first generated alternative uses for bubble wrap. After 5 min, the page automatically advanced. We next asked each pair to select their most creative idea and defined a creative idea as both novel (that is, different from the normal uses of bubble wrap) and functional (that is, useful and easy to implement). The pair had exactly 1 min to select their most creative idea. After 1 min, the page automatically advanced. If the pair still had not selected their top idea, the survey returned to the selection page and marked that the team went over time. Virtual and in-person pairs did not significantly differ in the percentage of teams that went over time (that is, took longer than a minute); 17.6% of in-person pairs and 16.9% of virtual pairs went over time (Pearson’s χ2(1) = 0.001, P = 0.926). Finally, as exploratory measures, each pair (1) selected an idea from another idea set and then (2) evaluated how novel and functional their selected idea was on a seven-point scale.
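The over-time comparison reported above is a Pearson chi-squared test on the 2 × 2 table of condition by over-time status; a minimal sketch in R, with hypothetical data-frame and column names, is:

```r
# Pearson chi-squared test (1 df, no continuity correction) of whether the
# proportion of pairs exceeding the 1-min selection window differs by modality.
# `pairs_df`, `condition` and `over_time` are hypothetical names.
tab <- table(pairs_df$condition, pairs_df$over_time)
tab
chisq.test(tab, correct = FALSE)
```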

Importantly, in both conditions, the task rooms were populated with ten props: five expected props (that is, props consistent with a behavioural laboratory schema (a filing cabinet, folders, a cardboard box, a speaker and a pencil box)) and five unexpected props (a skeleton poster, a large house plant, a bowl of lemons, blue dishes and yoga ball boxes; Extended Data Fig. 1, inspired by ref. 39). Immediately after the task, we moved the participants into a new room, separated them and asked the participants to individually recreate the task room on a sheet of paper39.

After the room recall, to measure social connection, each participant responded to an incentive-aligned trust game40. Specifically, each participant read the following instructions: “Out of the 150 groups in this study, 15 groups will be randomly selected to win $10. This is a REAL bonus opportunity. Out of the $10, you get to choose how much to share with your partner in the study. The amount of money you give to your partner will quadruple, and then your partner can choose how much (if any) of that money they will share back with you.”

The participants then selected how much money they would entrust to their partner in US$1 increments, between US$0 and US$10. Finally, the participants completed a survey with exploratory measures.

Dependent measures

Measure of idea generation performance

Researchers conducting the analyses were not blinded to the hypothesis, and all data were analysed using R (v.4.0.1). We first computed total idea count by summing the total number of ideas generated by each pair. Then, for the key dependent measure of creative ideas, we followed the consensual assessment technique41 and had two undergraduate judges (from the same population and blind to condition and hypothesis) evaluate each idea on the basis of novelty. Specifically, each undergraduate judge was recruited by the university’s behavioural laboratory to help code data from a study. Each judge was given an Excel sheet with all of the ideas generated by all of the participants in a randomized order and was asked to evaluate each idea for novelty on a scale of 1 (not at all original/innovative/creative) to 7 (very original/innovative/creative) in one column of the Excel sheet and to evaluate each idea for value on a scale of 1 (not at all useful/effective/implementable) to 7 (very useful/effective/implementable) in an adjacent column. Anchors were adopted from ref. 42.

Judges demonstrated satisfactory inter-rater agreement (stimulus 1: αnovelty = 0.64, αvalue = 0.68; stimulus 2: αnovelty = 0.75, αvalue = 0.67) on the basis of intraclass correlation criteria delineated previously43. The two judges’ scores were averaged to produce one creativity score for each idea. We computed the key measure of creative idea count by summing the number of ideas generated by each pair that surpassed the average creativity score of the study (that is, the grand mean across both conditions for each stimulus). Information about average creativity is provided in Supplementary Information R.
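For concreteness, the sketch below outlines this scoring pipeline in R. The data frame `ideas` and its columns (pair_id, judge1_novelty, judge2_novelty) are hypothetical names, and ICC() from the psych package is one way of obtaining reliability statistics; this illustrates the steps rather than reproducing the study’s actual code.

```r
# Sketch of the idea-scoring pipeline. `ideas` holds one row per generated idea;
# the column names (pair_id, judge1_novelty, judge2_novelty) are illustrative.
library(dplyr)
library(psych)  # ICC() for inter-judge reliability

# Inter-judge reliability on the novelty ratings
ICC(ideas[, c("judge1_novelty", "judge2_novelty")])

# Average the two judges into a single creativity score per idea
ideas <- mutate(ideas, creativity = (judge1_novelty + judge2_novelty) / 2)

# Creative idea count: ideas scoring above the grand mean for the stimulus
grand_mean <- mean(ideas$creativity)
pair_scores <- ideas %>%
  group_by(pair_id) %>%
  summarise(
    total_ideas    = n(),
    creative_ideas = sum(creativity > grand_mean)
  )
```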

Measure of selection performance

We followed previous research and calculated idea-selection performance using two different methods23,24. First, we examined whether the creativity score of the idea selected by each pair differed by communication modality (both with and without controlling for the creativity score of the pair’s top idea). Second, we calculated the difference between the creativity score of the top idea and the creativity score of the selected idea. A score of 0 indicates that the pair selected its top-scoring idea, and a higher score reflects a poorer selection decision.
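A minimal sketch of these two selection measures in R, assuming a pair-level data frame `pairs_df` with hypothetical columns selected_score (creativity of the chosen idea), top_score (creativity of the pair’s highest-scoring idea) and condition:

```r
# Selection measure 1: does the creativity of the selected idea differ by
# modality, with and without controlling for the pair's best idea?
summary(lm(selected_score ~ condition, data = pairs_df))
summary(lm(selected_score ~ condition + top_score, data = pairs_df))

# Selection measure 2: decrement between the best idea and the selected idea.
# 0 = the pair picked its top-scoring idea; larger values = poorer selection.
pairs_df$selection_decrement <- pairs_df$top_score - pairs_df$selected_score
```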

Stimulus 2 process measures

Room recall

The room contained five expected props and five unexpected props. If virtual participants are more visually focused on the screen, they should recall fewer props overall and, in particular, fewer of the unexpected props, which cannot be guessed from the schema of a typical behavioural laboratory. To test this, we counted the number of total props (out of ten) and unexpected props (out of five) that participants drew and labelled when sketching the room from memory. We did not include other objects in the room (such as the computer and door) in our count.

Eye gaze

We used OpenFace (v.2.2.0), an open-source software package, to automatically extract and quantify eye-gaze angles from the recording of each participant taken from their task computer34. We then had at least two independent coders (blinded to the hypothesis and condition) view video frames of the eye-gaze angles extracted by the software and indicate the idiosyncratic threshold at which each participant’s eye gaze shifted horizontally (from left to centre, and centre to right; α = 0.98) and vertically (from up to centre, and centre to down; α = 0.85). Out of 302 participants, 275 videos yielded usable gaze data. Nine videos were not saved, six videos cut off participants’ eyes, four videos were too dark to reliably code, two videos were corrupted and could not load, two videos contained participants with glasses that resulted in eye-gaze misclassification, two videos (one team) did not have the partner to the participant’s right, and two videos were misclassified by OpenFace.

Using these thresholds, we calculated how often each participant looked at their partner, the task and the surrounding room. To repeat, the recording came from the task computer, and the partner was always situated to the participant’s right (or from the perspective of a person viewing the video, to the left). As human coders marked the thresholds (blind to the hypothesis and condition), we report the categorizations from the perspective of an observer of the video. Specifically, looking either (1) horizontally to the left and vertically centre or (2) horizontally to the left and vertically down was categorized as ‘partner gaze’; looking either (1) horizontally centre and vertically centre or (2) horizontally centre and vertically down was categorized as ‘task gaze’; and the remaining area was categorized as ‘room gaze’, which encompassed looking (1) horizontally left and vertically upward, (2) horizontally centre and vertically upward, (3) horizontally right and vertically upward, (4) horizontally right and vertically centre, and (5) horizontally right and vertically down (Fig. 2; consent was obtained to use these images for publication). We chose this unobtrusive methodology instead of more cumbersome eye-tracking hardware to maintain organic interactions—wearing strange headgear could make participants consciously aware of their eye gaze or change the natural dynamic of conversation.
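This categorization can be expressed compactly in code. The sketch below assumes OpenFace’s per-frame gaze-angle output (gaze_angle_x, gaze_angle_y) and per-participant thresholds supplied by the coders; the function, the threshold argument names and the sign conventions (smaller angles read as further left or further up) are assumptions for illustration, not the study’s actual pipeline.

```r
# Classify each video frame as partner, task or room gaze, from the observer's
# perspective. `frames` is OpenFace per-frame output; x_left, x_right, y_up and
# y_down are the coder-set idiosyncratic thresholds for one participant.
# Sign conventions here (smaller angle = further left / further up) are assumed.
classify_gaze <- function(frames, x_left, x_right, y_up, y_down) {
  horiz <- cut(frames$gaze_angle_x, breaks = c(-Inf, x_left, x_right, Inf),
               labels = c("left", "centre", "right"))
  vert  <- cut(frames$gaze_angle_y, breaks = c(-Inf, y_up, y_down, Inf),
               labels = c("up", "centre", "down"))

  ifelse(horiz == "left"   & vert %in% c("centre", "down"), "partner",
  ifelse(horiz == "centre" & vert %in% c("centre", "down"), "task",
                                                            "room"))
}

# Seconds spent in each gaze category, assuming an (illustrative) 30 fps recording
gaze <- classify_gaze(frames, x_left = -0.15, x_right = 0.15, y_up = -0.10, y_down = 0.10)
table(gaze) / 30
```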

We excluded six videos that were less than 290 s long. The effects do not change in significance when these videos are included in the analyses. With these videos excluded, as before, virtual groups spent significantly more time looking at their partner (Mvirtual = 90.6 s, s.d. = 58.3, Min-person = 52.6 s, s.d. = 54.3; linear mixed-effects regression, n = 276 participants, b = 38.00, s.e. = 6.95, t(139) = 5.46, P < 0.001, Cohen’s d = 0.68, 95% CI = 0.43 to 0.92) and significantly less time looking at the surrounding room (Mvirtual = 32.4 s, s.d. = 34.6, Min-person = 60.9 s, s.d. = 43.7; linear mixed-effects regression, n = 276 participants, b = 28.44, s.e. = 4.96, t(145) = 5.74, P < 0.001, Cohen’s d = 0.73, 95% CI = 0.48 to 0.98; Fig. 2). There was again no evidence that time spent looking at the task differed by modality (Mvirtual = 176 s, s.d. = 63.6, Min-person = 184 s, s.d. = 63.0; linear mixed-effects regression, n = 276 participants, b = 7.39, s.e. = 7.63, t(274) = 0.97, P = 0.334, Cohen’s d = 0.12, 95% CI = −0.12 to 0.35). Importantly, gaze around the room was significantly associated with an increased number of creative ideas (negative binomial regression, n = 146 pairs, b = 0.003, s.e. = 0.001, z = 3.10, P = 0.002). Furthermore, gaze around the room mediated the effect of modality on idea generation (5,000 nonparametric bootstraps, 95% CI = 0.05 to 1.15).
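For concreteness, the models named above could be fit along the following lines in R; the data frames (gaze_df at the participant level, pairs_df at the pair level), their columns and the 0/1 coding of condition are hypothetical, so this is a sketch of the model structure rather than the study’s analysis script.

```r
# Sketch of the models reported above; variable and data-frame names are illustrative.
library(lme4)      # linear mixed-effects regression
library(lmerTest)  # degrees of freedom and p-values for lmer models
library(MASS)      # glm.nb() for negative binomial regression
library(mediation) # mediate() for the bootstrapped mediation analysis

# Participant-level gaze (in seconds) by modality, with a random intercept per pair
summary(lmer(partner_gaze_s ~ condition + (1 | pair_id), data = gaze_df))
summary(lmer(room_gaze_s    ~ condition + (1 | pair_id), data = gaze_df))

# Pair-level: creative idea count predicted by room gaze
summary(glm.nb(creative_ideas ~ room_gaze_s, data = pairs_df))

# Mediation: modality -> room gaze -> creative ideas, nonparametric bootstrap
med_fit <- lm(room_gaze_s ~ condition, data = pairs_df)
out_fit <- glm.nb(creative_ideas ~ condition + room_gaze_s, data = pairs_df)
summary(mediate(med_fit, out_fit, treat = "condition", mediator = "room_gaze_s",
                boot = TRUE, sims = 5000))
```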
