The mechanism by which a complex auditory scene is parsed into coherent objects depends on poorly understood interactions between task-driven and stimulus-driven attentional processes. We illuminate these interactions in a simultaneous behavioral-neurophysiological study in which we manipulate participants' attention to different features of an auditory scene (with a regular target embedded in an irregular background). Our experimental results reveal that attention to the target, rather than to the background, correlates with a sustained (steady-state) increase in the measured neural target representation over the entire stimulus sequence, beyond auditory attention's well-known transient effects on onset responses. This enhancement, in both power and phase coherence, occurs exclusively at the frequency of the target rhythm, and is only revealed when contrasting two attentional states that direct participants' focus to different features of the acoustic stimulus. The enhancement originates in auditory cortex and covaries with both behavioral task and the bottom-up saliency of the target. Furthermore, the target's perceptual detectability improves over time, correlating strongly, within participants, with the target representation's neural buildup. These results have substantial implications for models of foreground/background organization, supporting a role of neuronal temporal synchrony in mediating auditory object formation.