From analogue-request@magnus.acs.ohio-state.edu Tue May 25 20:58:26 1993
Received: by quark.magnus.acs.ohio-state.edu (5.65/3.910213)
	id AA06463; Tue, 25 May 93 20:57:52 -0400
Errors-To: analogue-request@magnus.acs.ohio-state.edu
Sender: analogue-request@magnus.acs.ohio-state.edu
Received: from SPEECH1.CS.CMU.EDU by quark.magnus.acs.ohio-state.edu (5.65/3.910213)
	id AA06458; Tue, 25 May 93 20:57:51 -0400
Received: from SPEECH1.CS.CMU.EDU by SPEECH1.CS.CMU.EDU id aa20626;
          25 May 93 20:57:04 EDT
To: Andrea TONI <andrea@SIHP03.SI.ESTEC.ESA.NL>
Cc: analogue@magnus.acs.ohio-state.edu
Subject: Re: -- vocoder -- 
In-Reply-To: Your message of "Mon, 24 May 93 10:48:40 +0700."
             <9305240842.AA02172@quark.magnus.acs.ohio-state.edu> 
Date: Tue, 25 May 93 20:56:57 -0400
Message-Id: <20622.738377817@SPEECH1.CS.CMU.EDU>
From: Yoshiaki_Ohshima@SPEECH1.CS.CMU.EDU
Status: OR


hi:

dan wiebe's message quoted by andrea toni contains some misconceptions and
also misses a few important points from the viewpoint of the acoustic
theory of speech production, speech perception, and timbre perception,
all of which bear on using the channel vocoder for musical purposes.

					--aki	(aki@speech1.cs.cmu.edu)

----------------------------------------------------------------------------
>        A vocoder is a device that combines the frequency distribution of
>one signal with the waveform of another to produce a single output signal.

this should be rectified.  it's the quasi-stationary envelope of the
energy spectrum that controls the carrier, not the frequency distribution.
anyone who knows the source-filter theory of speech production would see
the difference.

>The frequency bands on the equalizer are slaved to the corresponding frequency
>bands on the analyzer...so that if the high-frequency content of the
>spectrum-control signal suddenly goes up, the high end of the equalizer is
>instantly boosted a corresponding amount, and you end up hearing more of the
>high end of the waveform-control signal.

well, not really. it's again the quasi-stationary nature of the vocal tract
filter envelope that controls the filter bank. the typical update rate
should be 5~10ms, and each set of time-varying filter control signals must
represent a 20~30ms segment of speech in an overlapping manner. it's not
instantaneous.
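
to make the framing concrete, here is a sketch in python. the 8kHz sample
rate and the exact window/hop sizes are illustrative assumptions of mine,
not anything from the quoted text:

```python
# Sketch of overlapped analysis framing: control values are measured
# over ~20-30ms windows, updated every ~5-10ms, so successive frames
# overlap. The 8kHz rate and exact sizes are illustrative assumptions.

def frame_signal(signal, sample_rate=8000, window_ms=25, hop_ms=5):
    """Split a list of samples into overlapping analysis frames."""
    win = int(sample_rate * window_ms / 1000)   # 200 samples at 8kHz
    hop = int(sample_rate * hop_ms / 1000)      # 40 samples at 8kHz
    return [signal[i:i + win]
            for i in range(0, len(signal) - win + 1, hop)]

def frame_energy(frame):
    """Short-term energy of one frame: one vocoder control value."""
    return sum(x * x for x in frame) / len(frame)
```

with a 25ms window and a 5ms hop, one second of 8kHz audio yields 196
overlapping frames: a fresh control value every 5ms, each one summarizing
25ms of signal.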


>waveform-control input.  Since in a lot of (non-Oriental) languages,
                                   ^ ^^^ ^^  ^^^^^^^^^^^^
>spoken words depend more on dynamic filtering than on pitch, you can speak
>into the microphone in such a system and, by imposing the spectral character-
                                                           ^^^^^^^^ ^^^^^^^^^
>istics of your voice on the output from a wildly-fuzzed electric guitar, make
 ^^^^^^
>the guitar seem to sing words.

this is doubly wrong. if it refers to intelligibility at the phonetic
identification level, it is indeed SO in all languages, and there is no
difference between oriental and non-oriental languages. on the other hand,
if it refers to word intelligibility or the perception of speech in
general, it is NOT so in any language.

it was actually marginally better to describe it as "spectral
characteristics" than as "frequency distribution". but again, the spectral
characteristics, being the short-term energy spectrum of human speech,
combine glottal spectral features, the spectral envelope of the vocal
tract filter, the harmonic fine structure of the voicing (or the energy
spectrum of the turbulent unvoiced source), and the radiation
characteristics. among them, what the channel vocoder tries to extract
and use as the control signal is mostly the spectral envelope of the
vocal tract filter, which is what makes the bpf'ed carrier sound like
"talking".
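
to make this concrete, here's a toy channel vocoder sketch in python. the
band layout, the Q, and the one-pole envelope smoother are illustrative
choices of mine, not a reference design:

```python
import math

def biquad_bandpass(f0, fs, q=4.0):
    """Band-pass biquad coefficients (b, a), normalized by a0
    (standard RBJ audio-EQ-cookbook form)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a

def filt(b, a, x):
    """Direct-form-I filtering of the whole sequence x."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0]*xn + b[1]*x1 + b[2]*x2 - a[1]*y1 - a[2]*y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def vocode(modulator, carrier, fs, centers=(300, 600, 1200, 2400)):
    """Per band: smooth the modulator band's envelope and use it to
    scale the same band of the carrier, then sum the bands."""
    out = [0.0] * min(len(modulator), len(carrier))
    coef = math.exp(-1.0 / (0.010 * fs))        # ~10ms envelope smoother
    for f0 in centers:
        b, a = biquad_bandpass(f0, fs)
        mod_band = filt(b, a, modulator)
        car_band = filt(b, a, carrier)
        env = 0.0
        for n in range(len(out)):
            env = coef * env + (1 - coef) * abs(mod_band[n])
            out[n] += car_band[n] * env
    return out
```

note that the smoothed band envelopes, not the raw band signals, do the
controlling: that is the quasi-stationary point again.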

>        A more flexible arrangement than the one illustrated above would
>allow you to move the center frequencies of the analyzer and equalizer
>bands, so that, for instance, you could modulate the entire 20-20KHz
>frequency range of the equalizer with only the lower half (20-10KHz) of the
>spectrum of the input signal, with higher-frequency information being
>discarded.  An even more flexible arrangement would allow you to change

i have no idea how letting the cf's move around has anything to do with
discarding the band above 10kHz. these are actually two different threads:
one is the codec principle of using the transmission bandwidth more
efficiently; the other is the fact that we don't use information above
10kHz to figure out what was spoken, which means a bandwidth of roughly
200Hz~10kHz is all that's required to make the vocal tract filter decent.
besides, only about 350Hz to 3.4kHz (remember the telephone?) is required
for speech to be truly understandable. before blindly going for more
channels and finer resolution, or, on the contrary, bandlimiting the
signal, we should first assess the nature of the problem and the desired
quality of the end result by understanding them better.

also, if we look at vocoding as signal processing applied to musical
instruments, the perceptual effect on the "carrier" instrument is also
very important. this should be discussed in terms of psychophysical
findings in timbre perception, and we should keep in mind that the
resolution of the bpf's and their phase alignment are extremely
important, not to mention the bandwidth of the filters.

>the control connections between analyzer and equalizer frequency bands--
>so that you could reverse them, and have high-frequency material control
>low-frequency equalization, and vice versa.  (Wonder what that would sound
>like.  Any of you transistor jockeys out there have the equipment (and the
>motivation) to try it?)

it's just some weird sort of bpf bank controlled by irrelevant band-limited
signals. someone may find it creative, but it's no longer vocoding, so i'd
disregard it as nonsense. mind you, it may still sound cool inasmuch as it's
a modulation effect whose control signal comes from human activity.
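
for what it's worth, the "reversed" patching amounts to nothing more than
permuting which analysis band drives which synthesis band. a sketch in
python, operating on hypothetical per-band envelope values (not audio):

```python
# The "reversed" patching permutes the band-to-band control routing.
# band_envelopes are hypothetical per-band control values (e.g.
# short-term band energies), not audio samples.

def remap_controls(band_envelopes, mapping=None):
    """Route analysis-band envelopes to synthesis bands.

    mapping[i] is the index of the analysis band that drives
    synthesis band i. The default reverses the bands, as in the
    quoted proposal; the identity mapping is ordinary vocoding.
    """
    if mapping is None:
        mapping = range(len(band_envelopes) - 1, -1, -1)
    return [band_envelopes[j] for j in mapping]
```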


From analogue-request@magnus.acs.ohio-state.edu Wed May 19 18:59:44 1993
Received: by quark.magnus.acs.ohio-state.edu (5.65/3.910213)
	id AA19385; Wed, 19 May 93 18:57:19 -0400
Errors-To: analogue-request@magnus.acs.ohio-state.edu
Sender: analogue-request@magnus.acs.ohio-state.edu
Received: from relay2.UU.NET by quark.magnus.acs.ohio-state.edu (5.65/3.910213)
	id AA19376; Wed, 19 May 93 18:57:17 -0400
Received: from spool.uu.net (via LOCALHOST) by relay2.UU.NET with SMTP 
	(5.61/UUNET-internet-primary) id AA15752; Wed, 19 May 93 18:57:21 -0400
Received: from island.UUCP by spool.uu.net with UUCP/RMAIL
	(queueing-rmail) id 185540.26751; Wed, 19 May 1993 18:55:40 EDT
Received: from guam.island.com by island.COM (4.1/SMI-4.1)
	id AA10468; Wed, 19 May 93 15:27:19 PDT
Received: by guam.island.com (4.1/SMI-4.1)
	id AA00237; Wed, 19 May 93 15:29:24 PDT
Date: Wed, 19 May 93 15:29:24 PDT
From: kin@guam.island.COM (Kin Blas)
Message-Id: <9305192229.AA00237@guam.island.com>
To: analogue@magnus.acs.ohio-state.edu
Subject: Another Question
Status: OR


Hi,

  Thanks to all of you who replied to my Vocoder question!

  Some of you on this list seem to have a lot of knowledge about
sounds and synth architectures ... I'm a guitar player trying to play
and learn about keyboards, with no background in sound at all, and I
was wondering: is it possible to simulate a Leslie effect using an LFO,
TVA, and TVF?  I've tried doing this, and it just doesn't sound right.
Am I on the right track?  Should I give up?
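
(An aside on why this tends to fall short: a rotating Leslie horn produces
tremolo, Doppler vibrato, and a moving timbre all at once, and an LFO
driving only a TVA and TVF misses the Doppler part. A rough sketch of
adding an LFO-driven delay line next to the LFO-driven gain; the rates
and depths are purely illustrative:)

```python
import math

def leslie_sketch(samples, fs, rate_hz=6.0, depth_ms=0.8):
    """Tremolo (LFO on gain) plus vibrato (the same LFO on a short
    delay line, a crude stand-in for Doppler). Values illustrative."""
    max_delay = int(fs * depth_ms / 1000) + 1
    out = []
    for n, _ in enumerate(samples):
        lfo = math.sin(2 * math.pi * rate_hz * n / fs)
        gain = 0.7 + 0.3 * lfo                   # the TVA/tremolo half
        d = (0.5 + 0.5 * lfo) * (max_delay - 1)  # modulated delay, samples
        i = n - int(d)
        out.append(gain * (samples[i] if i >= 0 else 0.0))
    return out
```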

Thanks,
--== Kin Blas ==--
kin@island.com
