Bregman, A. S. (1993). Auditory scene analysis: Hearing in complex environments. In S. McAdams & E. Bigand (Eds.), Thinking in Sound: The Cognitive Psychology of Human Audition (pp. 10–36). Oxford: Clarendon Press.  
      "[It] is reasonable to conclude that the principles of grouping that were discovered and named by the Gestalt psychologists exist in order to perform the role of scene analysis."
Chertoff, D. B., Schatz, S. L., McDaniel, R., & Bowers, C. A. (2008). Improving presence theory through experiential design. Presence: Teleoperators and Virtual Environments, 17(4), 405–413.  
      "presence is an emergent factor due to the interaction of many components [...] it is a result that is greater than the sum of its parts."
Kramer, G. (1994). Some organizing principles for representing data with sound. In G. Kramer (Ed.), Auditory Display: Sonification, Audification, and Auditory Interfaces (pp. 185–221). Reading, MA: Addison-Wesley.  
      "When a complex multivariate auditory stream is used to convey data, an important perceptual process comes into play. In addition to the system user's ability to scan his or her attention through the sound, relationships between variables and entire system states are perceived 'at a glance.' Which is to say, without attention-directed effort, all the auditory variables are perceived as a whole, or a 'gestalt.'"
Schafer, R. M. (1994). The soundscape: Our sonic environment and the tuning of the world. Rochester, VT: Destiny Books.  
      "When one travels, new sounds snap at the consciousness and are thereby lifted to the status of figures."
      A sound is perceived as figure (signal or soundmark) or ground (keynote ambient sound) on the basis of acculturation, training, mood, and social relation to the soundscape.
Scruton, R. (2009). Sounds as secondary objects and pure events. In M. Nudds & C. O'Callaghan (Eds.), Sounds & Perception (pp. 50–68). Oxford: Oxford University Press.  
      Scruton uses examples and explanations of sound grouping/streaming (cf. Bregman) to support his view of sounds as pure events, because such auditory grouping needs no "bridges to the physical world" in the way that visual Gestalt figures do.
Slater, M., & Steed, A. (2000). A virtual presence counter. Presence: Teleoperators and Virtual Environments, 9(5), 413–434.  
      "We can think of presence as a selector among environments to which to respond, which operates dynamically from moment to moment [...] A fundamental proposal of this paper is that the set of stimuli of the present environment forms an overall gestalt, providing a consistent believable world in itself."
Williams, S. M. (1994). Perceptual principles in sound grouping. In G. Kramer (Ed.), Auditory Display: Sonification, Audification, and Auditory Interfaces (pp. 95–125). Reading, MA: Addison-Wesley.  
      "Sounds are allocated to perceptual groups, or streams, depending on their perceived attributes rather than as a direct result of the attributes of the acoustic signal, so the resulting percept may depend on attentional factors or previous training or familiarity with other sounds."
      Using gestalt theory to formalise auditory grouping and quoting from Bregman:

"A stream may be defined as a sequence of auditory events whose elements are related perceptually to one another, the stream being segregated perceptually from other co-occurring auditory events ... A source is a physical event. ... A stream, on the other hand, is a psychological organization whose function is to mentally represent acoustic activity of a single source over time+. ... Timbre seems to be a perceptual description of a stream, not an acoustic waveform*."

+ Bregman, A. S., & Campbell, J. (1971). Primary auditory stream segregation and perception of order in rapid sequences of tones. Journal of Experimental Psychology, 89(2), 244–249.
* Bregman, A. S., & Pinker, S. (1978). Auditory streaming and the building of timbre. Canadian Journal of Psychology, 32, 19–31.
      A long section in which Gestalt principles of visual perception are applied to auditory perception. The examples provided are for musical tones rather than everyday sounds.
