Voluntary Modulations of Attention in a Semantic Auditory-Visual Matching Task: An ERP Study

The present study explores the neural correlates of voluntary modulations of attention in an auditory-visual matching task. Visual stimuli (a female or a male face) were preceded in close temporal proximity by auditory stimuli consisting of the Spanish words for "man" and "woman" ("hombre" and "mujer"). In 80% of the trials the gender of the two stimuli coincided. Participants were asked to mentally count the specific instances in which a female face appeared after hearing the word "man" (10% of the trials). Our results show attention-related amplitude modulation of the early visual ERP components N1 and anterior P2, as well as amplitude modulations of (i) the N270 potential usually associated with conflict detection, (ii) a P300 wave related to infrequency, and (iii) an N400 potential related to semantic incongruence. The elicitation of these latter components varied according to task manipulations, evidencing the role of voluntary allocation of attention in fine-tuning cognitive processing, including basic processes such as detection of infrequency or semantic incongruity that are often considered to be volition-independent.


INTRODUCTION
Selective attention is known to exert a top-down modulation of cognitive mechanisms at different levels. The system described as the voluntary network of attention constitutes one of the main mechanisms for redistributing attentional resources according to a macrocontext or an experimental setting, for example by determining the relevance of a specific stimulation condition in a task-solving strategy (Kok, 2001; Corbetta and Shulman, 2002). In this context, voluntary attention has been shown to modify the timing and allocation of resources related to processes usually considered to be automatic (Yantis and Jonides, 1990). Attentional allocation is usually inferred from improved behavioral performance and increased amplitude in attention-sensitive ERP components (Posner, 1980; Hillyard and Anllo-Vento, 1998).
Conflict detection, on the other hand, is a well-characterized process that produces specific electrophysiological and hemodynamic responses that depend on stimulus complexity (van Veen and Carter, 2002; Mao and Wang, 2007). Recently, there has been increasing interest in conflict detection mechanisms during multimodal incongruity tasks (Larsen et al., 2003; Teder-Salejarvi et al., 2005). Although the relationship between attention and conflict detection has been explored in single-modality tasks (Wang et al., 2003), their interplay in the case of crossmodal integration needs further research. Our goal in this study is to address the effects of voluntary attention on conflict detection mechanisms, using a bimodal auditory-visual gender-matching task.
Event-related potentials (ERPs) have been widely used to study attention, crossmodal integration and conflict detection. Recently, Wang et al. (2002b) demonstrated the elicitation of a negative component around 270 ms related to incongruity (N270) in a visual-auditory intermodal integration task in which male or female faces were contrasted with the voice of a male or a female pronouncing a vowel. The amplitude of this component increased when there was gender incongruence between the face and the voice. Along the same lines, several studies report a family of negative ERP components associated with the detection and processing of conflicting information, which are elicited depending on the number and nature of stimulus attributes in conflict (Wang et al., 2004). The N270 potential can be elicited in tasks where two visual stimuli differ in form (Cui et al., 2000), color (Tian et al., 2001), spatial localization (Yang and Wang, 2002) or other characteristics (Zhang et al., 2001; Wang et al., 2002a; Zhang et al., 2002; Zhang et al., 2003). With more complex conflicts, for example when stimuli differ in more than one attribute, the N270 wave is also elicited but is followed by a later negative deflection peaking around 400 ms (Wang et al., 2004). This late negative component has been widely reported in association with semantic incongruence (Kutas and Hillyard, 1980a; Kutas and Federmeier, 2000). In this context, the absence of an N400 ERP component in the study by Wang et al. (2002b) described above may reflect the fact that only phonological, and not semantic, processing of the auditory stimuli was necessary to solve the task.
The task relevance of the conflict has also been explored by presenting relevant and irrelevant incongruent conditions (Wang et al., 2001). A larger N270 for the relevant condition has been reported and interpreted as evidence of an attentional enhancement of conflict processing activity when the conflict is relevant to the task, in spite of the usually automatic operation of this process (Wang et al., 2003). The present work was intended to address the following questions: (a) what is the role of voluntary allocation of attention in the conflict detection system? (b) how do the relative salience of the conflict (relevant vs. irrelevant for the task) and its semantic nature (male vs. female) modify the activity of this system?

METHODS
Participants: Ten right-handed healthy volunteers, native Spanish speakers (five of them women), between 18 and 30 years of age, with normal or corrected-to-normal vision and no reported auditory problems, were recruited. Informed consent was obtained from all participants.
Stimuli: Auditory and visual stimulation was delivered using the STIM™ system (NeuroScan-Compumedics) synchronized with a digital electroencephalograph. Auditory stimuli (the words "man" and "woman" in Spanish) were presented binaurally through insertion earphones at 90 dB SPL. Visual stimuli consisted of images of a male or a female face presented at the center of a 21-inch computer screen (5° of visual field). The monitor was placed at a distance of 50 cm in front of the subject.
Procedure: Each trial began with a fixation cross presented in the center of the screen for 200 ms. This was followed by the auditory stimulus, which had an approximate duration of 500 ms. A silent, blank screen with a random duration of 50-150 ms separated the auditory stimulus from the visual presentation. Visual stimuli were presented for 250 ms and followed by a blank screen for 600 ms that preceded the next trial (Fig. 1).
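The trial timing described above can be laid out as a short sketch. It only adds up the durations given in the text; the auditory duration is treated as exactly 500 ms although the text calls it approximate, and the function and constant names are illustrative.

```python
# Durations in milliseconds, taken from the Procedure paragraph.
FIXATION_MS = 200
AUDITORY_MS = 500          # described as approximate in the text
GAP_RANGE_MS = (50, 150)   # silent blank interval, random on each trial
VISUAL_MS = 250
POST_BLANK_MS = 600


def trial_timeline(gap_ms):
    """Return event onsets (ms from trial start) for one trial."""
    lo, hi = GAP_RANGE_MS
    if not lo <= gap_ms <= hi:
        raise ValueError(f"gap must be within {lo}-{hi} ms, got {gap_ms}")
    auditory_onset = FIXATION_MS
    gap_onset = auditory_onset + AUDITORY_MS
    visual_onset = gap_onset + gap_ms
    blank_onset = visual_onset + VISUAL_MS
    return {
        "fixation": 0,
        "auditory": auditory_onset,
        "gap": gap_onset,
        "visual": visual_onset,
        "blank": blank_onset,
        "end": blank_onset + POST_BLANK_MS,
    }


shortest = trial_timeline(50)
longest = trial_timeline(150)
print(shortest["visual"], shortest["end"])  # 750 1600
print(longest["visual"], longest["end"])    # 850 1700
```

Under these assumptions the face appears 750-850 ms after trial onset and a full trial lasts 1600-1700 ms.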
The experimental design consisted of 240 trials. In 80% of the trials, auditory and visual input were coincident (either the word "man" was followed by a male face, or the word "woman" was followed by a female face), and in the other 20% of the trials, stimuli were incongruent (either the word "man" was followed by a female face, or the word "woman" was followed by a male face). Although there were two incongruent conditions, each appearing 10% of the time, participants were asked to pay attention to and mentally count only the occurrences of the female face after hearing the word "man". Thus, there were two equally incongruent and equally infrequent conditions, but one was being voluntarily attended (relevant condition) while the other was irrelevant to the task.
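As an illustration of these proportions, the following sketch builds a randomized list of 240 trials. The even split of the matching trials between the two matching pairs, the condition labels and the seeded shuffle are assumptions made for the example; the text does not describe how the actual sequence was generated.

```python
import random
from collections import Counter

N_TRIALS = 240

# (word, face, condition label, percentage of trials).
# Splitting the 80% of matching trials evenly between the two matching
# pairs is an assumption; the text gives only the 80/10/10 proportions.
CONDITIONS = [
    ("man", "male", "match", 40),
    ("woman", "female", "match", 40),
    ("man", "female", "mismatch_relevant", 10),    # counted target
    ("woman", "male", "mismatch_irrelevant", 10),
]


def build_trials(n_trials=N_TRIALS, seed=0):
    """Return a shuffled list of trial dicts with the stated proportions."""
    trials = []
    for word, face, label, percent in CONDITIONS:
        count = n_trials * percent // 100
        trials.extend(
            {"word": word, "face": face, "condition": label}
            for _ in range(count)
        )
    if len(trials) != n_trials:
        raise ValueError("percentages do not divide n_trials evenly")
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials()
counts = Counter(trial["condition"] for trial in trials)
print(counts["match"], counts["mismatch_relevant"], counts["mismatch_irrelevant"])
# 192 24 24
```

With 240 trials, each incongruent condition therefore contributes 24 trials, compared with 192 matching trials.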
The task was conducted in two blocks of equal duration, separated by a resting period of at least 5 minutes. After each block, the participants reported the number of occurrences of the target conflicting pair, that is, the female face after the word "man". All participants reported counts with an accuracy above 90% in both blocks. The experiment was conducted in the afternoon for all subjects. They were asked to relax while they were prepared for the EEG recording session.
Fig. 1: Trial sequence and experimental conditions. The auditory stimulus was followed by a visual stimulus that could match it or not regarding gender (match vs. mismatch conditions). Subjects were asked to pay attention to and mentally count only the mismatch condition in which the word "man" was followed by a female face (task-relevant vs. task-irrelevant conditions).
EEG Recording: Electrophysiological signals were recorded using a NeuroScan™ 80-channel digital electroencephalograph with high-resolution NuAmps™ amplifiers. An 80-channel cap (QuickCap™) from the same company was used for electrode placement. Impedances were kept below 5 kΩ throughout the recordings. A/D sampling frequency was set at 250 Hz. A digital band-pass filter between 0.5 and 30 Hz was later applied to remove unwanted frequency components. Two additional bipolar derivations were used to monitor vertical and horizontal ocular movements (EOG). Continuous EEG data were segmented between 200 ms prior to each visual stimulus and 800 ms after it. All segments with eye movement contamination, or any other technical or biological artifact, were excluded from further analysis. Artifact-free segments were averaged to obtain the ERPs. The EEGLAB (Delorme and Makeig, 2004) Matlab toolbox was used for off-line EEG processing and analysis.
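The segmentation and averaging steps can be sketched in a few lines. This is not the EEGLAB pipeline used in the study: it is a plain-Python illustration for a single channel, it omits filtering and artifact rejection, and it assumes a half-open segment that excludes the sample at +800 ms. All names are illustrative.

```python
SAMPLING_RATE_HZ = 250   # from the text
PRE_MS = 200             # segment start relative to visual onset
POST_MS = 800            # segment end relative to visual onset


def ms_to_samples(ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return round(ms * rate_hz / 1000)


def extract_epochs(signal, onsets, pre_ms=PRE_MS, post_ms=POST_MS):
    """Cut one segment per stimulus onset (given as a sample index).

    Onsets too close to the edges of the recording are skipped.
    """
    pre = ms_to_samples(pre_ms)
    post = ms_to_samples(post_ms)
    epochs = []
    for onset in onsets:
        start, stop = onset - pre, onset + post
        if start >= 0 and stop <= len(signal):
            epochs.append(signal[start:stop])
    return epochs


def average_epochs(epochs):
    """Point-by-point average across segments, i.e. the ERP."""
    if not epochs:
        raise ValueError("no epochs to average")
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


# Toy single-channel recording: flat signal with a 100 ms step
# (25 samples) starting at each stimulus onset.
signal = [0.0] * 1000
onsets = [300, 600]
for onset in onsets:
    for k in range(25):
        signal[onset + k] = 2.0

erp = average_epochs(extract_epochs(signal, onsets))
print(len(erp))          # 250
print(erp[49], erp[50])  # 0.0 2.0
```

At 250 Hz the 200 ms pre-stimulus interval corresponds to 50 samples, so index 50 of each segment is the sample at visual onset.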
Statistical Analysis: All statistical calculations were performed using individual waveforms. A repeated measures ANOVA with two factors (gender match × task relevance) was conducted, using the amplitude values of ERP components that differed between conditions. Each of these factors had two levels (match vs. mismatch, relevant vs. irrelevant). Latency windows for statistical analysis of ERP effects were: N1 [170-200 ms], anterior P2 [170-200 ms], N270 [260-290 ms], P300 [360-400 ms] and N400 [450-500 ms]. Results were corrected with the Greenhouse-Geisser and Huynh-Feldt methods to adjust the univariate output of the repeated measures ANOVA for violations of the compound symmetry assumption.
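A minimal sketch of this kind of analysis is given below, assuming that component amplitude is measured as the mean voltage within the latency window and that each participant contributes one value per cell of the 2 × 2 design. Neither assumption is stated in the text, and the mapping of trials to cells is not spelled out beyond the factor names. The sketch relies on the fact that, for two-level within-subject factors, each effect reduces to a single contrast per participant whose F(1, n − 1) equals the squared one-sample t statistic. The data are invented for illustration.

```python
import math
from statistics import mean, variance

SAMPLING_RATE_HZ = 250   # from the EEG Recording paragraph
PRE_MS = 200             # each segment starts 200 ms before the face

WINDOWS_MS = {
    "N1": (170, 200),
    "anterior P2": (170, 200),
    "N270": (260, 290),
    "P300": (360, 400),
    "N400": (450, 500),
}


def window_mean(erp, window_ms):
    """Mean amplitude over the samples that fall inside the window."""
    start_ms, end_ms = window_ms
    first = math.ceil((start_ms + PRE_MS) * SAMPLING_RATE_HZ / 1000)
    last = math.floor((end_ms + PRE_MS) * SAMPLING_RATE_HZ / 1000)
    return mean(erp[first:last + 1])


def contrast_f(scores):
    """F(1, n - 1) for one contrast score per participant (equals t squared)."""
    n = len(scores)
    return n * mean(scores) ** 2 / variance(scores)


def rm_anova_2x2(subjects):
    """F values for both main effects and the interaction.

    `subjects` is a list of dicts keyed by (match level, relevance level).
    """
    match, relevance, interaction = [], [], []
    for s in subjects:
        mr, mi = s["match", "relevant"], s["match", "irrelevant"]
        xr, xi = s["mismatch", "relevant"], s["mismatch", "irrelevant"]
        match.append((mr + mi) - (xr + xi))
        relevance.append((mr + xr) - (mi + xi))
        interaction.append((mr - mi) - (xr - xi))
    return {
        "match": contrast_f(match),
        "relevance": contrast_f(relevance),
        "interaction": contrast_f(interaction),
    }


# Invented amplitudes (microvolts) for four hypothetical participants.
toy = [
    {("match", "relevant"): -4.1, ("match", "irrelevant"): -4.6,
     ("mismatch", "relevant"): -2.0, ("mismatch", "irrelevant"): -2.9},
    {("match", "relevant"): -3.5, ("match", "irrelevant"): -3.9,
     ("mismatch", "relevant"): -2.2, ("mismatch", "irrelevant"): -2.1},
    {("match", "relevant"): -5.0, ("match", "irrelevant"): -5.2,
     ("mismatch", "relevant"): -3.1, ("mismatch", "irrelevant"): -3.8},
    {("match", "relevant"): -4.4, ("match", "irrelevant"): -4.0,
     ("mismatch", "relevant"): -2.7, ("mismatch", "irrelevant"): -3.0},
]
for effect, f_value in rm_anova_2x2(toy).items():
    print(f"{effect}: F(1, {len(toy) - 1}) = {f_value:.2f}")
```

The contrast formulation keeps the example dependency-free; a statistics package would normally be used to obtain p-values and sphericity corrections.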

RESULTS
The ERPs recorded in the present experiment showed a series of negative and positive deflections that, according to their peak latency, polarity, scalp topography and resemblance to classical effects reported in the literature, were named N1, P2, N270, P300 and N400 (Fig. 2). The amplitude of the N1 component was modulated mainly by the gender match or mismatch between word and face, whereas the gender of the participants yielded no significant effect. This component exhibited larger amplitude at posterior sites when elicited by a visual stimulus semantically congruent with the preceding auditory stimulus. This effect was less conspicuous when the same combination was presented in a task-relevant condition. The ANOVA showed a significant main effect of match, F(1, 9) = 14.3, p < 0.001. Task relevance had marginal statistical significance, F(1, 9) = 5.28, p = 0.04, but there was no significant interaction between these two factors, F(1, 9) = 0.92, p = 0.36. In the same time window, but at anterior sites, a positivity became evident in the waveform corresponding to the task-relevant, gender-mismatch condition, i.e. the instructed target. This effect, named anterior P2, was statistically significant, F(2, 8) = 15.3, p < 0.001.
A negative deflection was observed around 270 ms in both incongruent conditions, relevant and irrelevant. This N270 negativity had an increased amplitude in the gender mismatch condition in comparison to the match condition, F(2, 8) = 16.9, p < 0.001. Task relevance had no significant effect on N270 amplitude, F(2, 8) = 0.74, p = 0.50, and there were no significant interactions, F(2, 8) = 1.02, p = 0.39.
Additionally, two significant effects were found in the ERP waveforms corresponding to the task-relevant, gender-mismatch condition. The first was a positivity with a mean latency around 380 ms and a central-frontal scalp distribution (Fig. 3). This P300 component had significantly larger amplitude in the task-relevant mismatch condition, F(3, 7) = 31.02, p < 0.001. The second was a negative deflection with an approximate latency of 470 ms that was also elicited only in the task-relevant mismatch condition, F(3, 7) = 15.26, p < 0.001 (see Table I).

DISCUSSION
The present results show attention-related modulations at several levels of cortical processing, as indexed by the ERP components N1, P2, N270, P300 and N400. The cortical generators of these components have been extensively studied: while the early components are generated in primary sensory cortices and nearby association areas (Di Russo et al., 2003), the N270 seems to be generated in the anterior cingulate (Li et al., 2003), and the late components (P300 and N400) have a more complex pattern of generation involving multiple areas (Kutas and Federmeier, 2000; Kok, 2001). The early N1 component exhibited larger amplitudes in conditions of match between visual and auditory information about gender. This effect was evident regardless of stimulus relevance, but in approximately the same time window a positive component became evident at anterior sites, and was elicited only by the relevant targets (task-relevant + gender mismatch). These effects are in line with previous results reported in single-modality tasks that have been interpreted as enhancements due to an attentional set established prior to stimulus delivery (Potts et al., 2004). They also complement recent reports about early attentional modulation depending on the spatial location of auditory and visual stimuli (Nager et al., 2006; Schaefer et al., 2006). Nevertheless, our results show that this modulation occurs not only for spatially coincident stimuli but also when they concur in more complex features, including meaning.
The evidence presented here supports the view that early attentional selection depends not only on the activity within a single sensory modality, but also on the activity of cortical areas belonging to other sensory systems and of multimodal integration areas, as has recently been argued by others (Talsma et al., 2007). Thus, early attentional selection can be influenced by previously established wider-frame cognitive sets, such as the explicit instructions for solving the task.
At later stages of processing, amplitude modulation of ERP components seems to reflect the involvement of other attention-related processes. The task-relevant condition elicited N270, P300 and N400 effects, but in the task-irrelevant condition only the N270 was observed. The elicitation of the N270 by incongruent pairs of stimuli, independently of their relevance for solving the task, suggests that this component reflects the activity of a conflict detection process of automatic nature, rather resistant to top-down influences from voluntary attention networks. This result is in disagreement with a previous report by Wang et al. (2001), who found that although the N270 was elicited independently of task relevance, its amplitude increased when the conflicting attribute was relevant to the task. The differences between the two experimental designs, especially the complexity of the task and the sensory modality, preclude further comparison on this point. The impact of task complexity on the operation of this conflict detection system and its associated ERP effects has been documented before (Wang et al., 2004). It has also been argued that the N270 could be functionally similar to other ERP components, such as the N2b or N220, known to be elicited in tasks involving stimulus comparisons, but recent studies reaffirm its direct relation to conflict detection (Zhang et al., 2005; Szucs et al., 2007).
The presence of ERP effects such as the P300, usually related to stimulus infrequency (for a review see Kok, 2001), and the N400, classically associated with semantic incongruence (Kutas and Hillyard, 1980b; Kutas and Federmeier, 2000), only in the incongruent-infrequent condition relevant to the task is in agreement with the high sensitivity to attentional factors previously reported for these components (Kutas and Federmeier, 2000; Kok, 2001). The salience of a stimulus due to other characteristics, for example frequency of presentation, has not been studied in this context before. Infrequency is known to capture and reorient attention and to evoke the P300. All the previous reports that associate conflict detection with the elicitation of a sequence of negativities (N270 and N400) used the same frequency of presentation for coincident and non-coincident stimuli, perhaps in order to avoid confounding factors, but this precluded the exploration of a combination that is highly common in real life: conflicting information also tends to be infrequent. The present design reaffirms the relevance of top-down attentional mechanisms in the elicitation or suppression of these components: the same type of incongruence and the same infrequency might or might not elicit N400 and P300, depending on a previously set "allocation policy" for cognitive resources. This pattern of complex resource allocation seems to be fully in accordance with a goal-directed strategy.
Regarding the appearance of the sequence N270 and N400, our results suggest that besides the complexity of conflict and the number of conflicting attributes, the need for semantic processing and the relevance of conflict detection to solve the task might be additional factors influencing the elicitation of the second negativity.
All these findings suggest that, aside from automatic conflict detection, voluntary attention is necessary for cognitive processing oriented to task resolution. Top-down modulations seem to modify the processing of stimulus-related information not only at late stages but also at very early stages of attentional selection, as reflected by the elicitation of the anterior P2. Further research is needed to clarify how these multiple processes come together in a coordinated manner, but the present work emphasizes the need to consider task instructions, a priori information and the task-solving strategy implemented by the participants as important elements for understanding attentional modulations in ERP experiments. Experience and available information play an important role in tuning conflict detection mechanisms according to task demands.