%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Probabilities for observing mixed quantum states
%%% given limited prior information.
%%%
%%% From ``Quantum Communications and Measurement'',
%%% pages 411 - 418,
%%% edited by V.P. Belavkin, O. Hirota, and R.L. Hudson,
%%% Plenum (1995).
%%%
%%% PlainTeX, 8 pages
%%%
%%% Matthew J. Donald
%%%
%%% web site: http://people.bss.phy.cam.ac.uk/~mjd1014
%%%
%%% e-mail : mjd1014@cam.ac.uk
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\count17=0 %%to use pdfTeX comment out this line and uncomment the next
\count17=1 \pdfoutput=\count17 %%to use plain TeX comment out this line and
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% uncomment the previous
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\ifnum\count17=1
\def\cmykBlue{1 1 0 0}
\def\cmykBlack{0 0 0 1}
\def\Blue{\pdfsetcolor{\cmykBlue}}
\def\Black{\pdfsetcolor{\cmykBlack}}
\def\pdfsetcolor#1{\pdfliteral{#1 k}}
\def\setcolor#1{\mark{#1}\pdfsetcolor{#1}}
\def\maincolor{\cmykBlack}
\pdfsetcolor{\maincolor}
\pdfinfo {
/Title (Probabilities for observing mixed quantum states given limited prior information.)
/Author (Matthew J.
Donald) /CreationDate (July 1994) /ModDate (\number\year/\number\month/\number\day) /Subject (quantum probability) /Keywords (quantum measurement theory, relative entropy)} \pdfcompresslevel=9 \fi \magnification=1200 \hsize=13cm \headline={\hfil} \footline={\ifnum\count0 = 1 \hfil \else\hss\tenrm\folio\hss \fi} \topskip10pt plus30pt \def\proclaim#1#2{\medskip\noindent{\bf #1}\quad \begingroup #2} \def\endproclaim{\endgroup\medskip} \def\<{{<}} \def\>{{>}} \def\Prob{\mathop{\rm Prob}\nolimits} \def\tr{\mathop{\rm tr}\nolimits} \def\app#1#2#3{{\mathop{\rm app}\nolimits}_{#1}(#2|#3)} \def\newline{\hfil\break} \abovedisplayskip=3pt plus 1pt minus 1pt \belowdisplayskip=3pt plus 1pt minus 1pt \def\hcrh{\hfill \cr \hfill} \def\crh{\cr \hfill} \def\hcr{\hfill \cr} \font\brm=cmbx12 \def\til{\lower 1.1 ex\hbox{\brm \char'176}} \ifnum\count17=1 \def\link#1#2{\leavevmode\pdfstartlink attr{/Border [0 0 0]} goto name{#1}\setcolor\cmykBlue #2\pdfendlink\setcolor\cmykBlack} \def\outlink#1#2{\leavevmode \pdfstartlink attr{/Border [0 0 0]} user{/Subtype /Link /A << /S /URI /URI (#1) >>} \setcolor\cmykBlue #2\pdfendlink\setcolor\cmykBlack} \def\name#1{\pdfdest name{#1} xyz} \def\pdfproclaim#1#2#3{\medskip\name{#3}\noindent{\bf #1}\quad \begingroup #2} \def\pdfeject{\eject} \else \def\link#1#2{{#2}} \def\outlink#1#2{{#2}} \def\name#1{} \def\pdfproclaim#1#2#3{\proclaim{#1}{#2}} \def\pdfeject{} \fi \centerline{\bf Probabilities for Observing Mixed Quantum States} \centerline{\bf given Limited Prior Information.} \vfill \centerline{\bf Matthew J. 
Donald} \medskip {\bf \hfill The Cavendish Laboratory, JJ Thomson Avenue, \hfill Cambridge CB3 0HE, Great Britain.} \medskip {\bf \hfill e-mail:\quad mjd1014@cam.ac.uk} \bigskip {\bf \hfill web site:\quad \catcode`\~=12 \outlink{http://people.bss.phy.cam.ac.uk/~mjd1014} {http://people.bss.phy.cam.ac.uk/\til mjd1014}} \vfill \proclaim{abstract}{} The original development of the formalism of quantum mechanics involved the study of isolated quantum systems in pure states. Such systems fail to capture important aspects of the warm, wet, and noisy physical world which can better be modelled by quantum statistical mechanics and local quantum field theory using mixed states of continuous systems. In this context, we need to be able to compute quantum probabilities given only partial information. Specifically, suppose that ${\cal B}$ is a set of operators. This set need not be a von Neumann algebra. Simple axioms are proposed which allow us to identify a function which can be interpreted as the probability, per unit trial of the information specified by ${\cal B}$, of observing the (mixed) state of the world restricted to ${\cal B}$ to be $\sigma$ when we are given $\rho$ -- the restriction to ${\cal B}$ of a prior state. This probability generalizes the idea of a mixed state ($\rho$) as being a sum of terms ($\sigma$) weighted by probabilities. The unique function satisfying the axioms can be defined in terms of the relative entropy. The analogous inference problem in classical probability would be a situation where we have some information about the prior distribution, but not enough to determine it uniquely. In such a situation in quantum theory, because only what we observe should be taken to be specified, it is not appropriate to assume the existence of a fixed, definite, unknown prior state, beyond the set ${\cal B}$ about which we have information. 
The theory was developed for the purposes of a fairly radical attack on the interpretation of quantum theory, involving many-worlds ideas and the abstract characterization of observers as finite information-processing structures, but deals with quantum inference problems of broad generality. \endproclaim \noindent Keywords: {\it quantum measurement theory, relative entropy.} \vfill \noindent{\bf From ``Quantum Communications and Measurement'', pages 411 - 418, edited by V.P. Belavkin, O. Hirota, and R.L. Hudson, Plenum (1995).} \vfill \eject Quantum mechanics started with the study of simple isolated systems like an electron and a proton in empty space. Such systems can be described by pure states --- by wavefunctions. When we turn to non-isolated systems, however, in quantum statistical mechanics, or many body theory, or quantum optics, or local quantum field theory, description by wavefunctions is often inappropriate. Density matrix descriptions are called for not least because our observations of complex systems are always incomplete. In this paper, I shall present a mathematical tool for dealing with incomplete information in quantum systems. I shall start by commenting briefly on the foundational questions which haunt any serious consideration of ignorance in quantum mechanics. I have expressed my views on these matters more fully in [1--3], in which I attempt to start from first principles and give a complete quantum measurement theory, but the mathematics given here does not depend on the details of those papers, and it is my hope that it might be of more general use. Indeed, it is that hope which motivates this attempt to bring the mathematics to a wider audience. In spite of the importance of infinite-dimensional spaces, I shall leave aside some mathematical technicalities by assuming for most of this paper that we are working with a finite-dimensional Hilbert space. 
This assumption is merely for convenience of exposition and may be ignored by those who normally ignore mathematical details. Such details are to be found in [2, 4, and 5]. The formalism will be introduced here through some elementary examples which are sufficient to demonstrate the essential ideas. These examples are given in terms of probability distributions which correspond to quantum states on sets of mutually commuting operators. Following the examples, a set of simple axioms is given which allows the generalization to arbitrary states and sets of operators. Finally, the full definition and some of its properties are presented in a theorem which also states the extension to infinite dimensions. According to theory, time propagation in a closed quantum system is governed by a deterministic evolution defined by unitary maps of the form $e^{-itH/\hbar}$. This evolution is well understood and experimentally confirmed in many circumstances. Nevertheless, the fundamental problem of the interpretation of quantum theory remains. This is that such Hamiltonian evolution does not seem sufficient to describe the world that we see. From time to time, the quantum state of the world appears to change abruptly. These changes are referred to as ``collapses''. Orthodox wave mechanics describes ``collapse'' as the replacement of a wave-function by an eigenfunction of some ``measured'' operator. However, there are several problems with this idea. In a real experiment it is often hard to see how the operator being measured can be unambiguously defined. For example, in a bubble chamber experiment, are we measuring bubble positions or particle positions? For practical purposes, it doesn't make much difference, but, if ``collapse'' is a genuine physical process, then it ought to be precisely definable. I have suggested in [1--3] that this problem can be dealt with by working at the level of the observer and by characterizing an observer as an abstract processor of information.
Observers, however, are complex, localized, thermal systems and so should be described by density matrices rather than simply by pure states. By working with local states, it is also possible to deal with the problem that an instantaneous collapse of a wave-function for the whole universe would be manifestly non-relativistic. Suppose then that we are given a density matrix $\rho$. We want to develop a formalism to measure the a priori probability of a ``generalized state collapse'' taking $\rho$ to some other state $\sigma$. The physical interpretation I propose for such a collapse is that, although $\rho$ may be the ``true'' state, it is not a state we are capable of seeing. It is not possible for us, because of our natures, to see a macroscopic system in a mixture of macroscopically distinct states. We cannot see a cat except as something which is either dead or alive. $\sigma$ --- the state collapsed to --- is one of the states which we are capable of seeing. A weaker interpretation just says that measurement in quantum mechanics involves state change. We have information, which is far from complete, about the state before the measurement. We also have information about the possible states after the measurement. For example, we know that, as long as the experiment works, our macroscopic measuring device is going to give us a definite result. At this general level, we want to be able to compute probabilities. Our preliminary goal, then, is a function, which we shall denote by $\app{}\sigma\rho$, which we can interpret as the probability of being able to mistake the state of the world for $\sigma$, despite the fact that it is actually $\rho$. At the simplest level, a density matrix $\rho$ can be expanded, in terms of eigenstates, into a sum $\rho = \sum_{m=1}^M r_m |\psi_m\>\<\psi_m|$ and the number $r_m$ is interpreted as the probability of observing state $\sigma_m = |\psi_m\>\<\psi_m|$ given the prior state $\rho$.
The first question in generalizing this idea is to ask for the probability of observing a general state $\sigma$. The second question asks whether it is appropriate to assume that our observations are sufficient to give us complete knowledge of either $\rho$ or $\sigma$. The second question goes to the heart of this contribution. In general, we can have very limited information about a complex system. But this is quantum mechanics! Nothing which is not observed should be taken for granted. For example, the EPR experiment may be interpreted as telling us that we must not assume that the spin of an electron is determined except through the complete circumstances of its observation. What one sees depends on how one looks. Suppose then that we have knowledge of only a few expectation values for $\rho$. We want to know the probability of observing different values for those expectations. In a classical inference problem, we would assume that there was a definite fixed unknown prior distribution. In quantum theory, because only what we observe should be taken to be definite, this assumption is inappropriate. It is possible to express the idea of an incompletely known state either in terms of sets of states or in terms of restrictions of states. I find the latter to be conceptually simpler. The former is slightly more general but can be easily accommodated in the present framework [2].
\medskip
\noindent{\bf Definition.}

\noindent 1) Let ${\cal H}$ be a Hilbert space and ${\cal B}({\cal H})$ be the set of all bounded operators on ${\cal H}$. Then a state $\rho$ on ${\cal B}({\cal H})$ is defined to be a density matrix. Such a state can be considered to be a complex-valued function on ${\cal B}({\cal H})$.
Thus, if $\rho = \sum_{m=1}^M r_m |\psi_m\>\<\psi_m|$ and $B \in {\cal B}({\cal H})$ then $\rho(B) = \sum_{m=1}^M r_m \<\psi_m| B |\psi_m\>$.

\noindent 2) If ${\cal B} \subset {\cal B}({\cal H})$ then a state $\rho$ on ${\cal B}$ will be the restriction to ${\cal B}$ of a state on ${\cal B}({\cal H})$. In other words, $\rho$ is a complex-valued function on ${\cal B}$ which takes the form $\rho(B) = \rho'(B)$, for $B \in {\cal B}$, for some density matrix $\rho'$.
\medskip
Definition 2 may seem pernickety, but the central point of the present work is that if we only have information identifying a state on ${\cal B}$ then we should not identify it with any particular possible density matrix $\rho'$, but rather with the set of all possible extensions. In other words, $\rho$ should be identified with $\{\rho': \rho'|_{{\cal B}} = \rho\}$.
\medskip
\noindent{\bf Classical example 1.} \quad Many of the conceptual issues in the present theory can be understood by considering how the theory applies to inference problems in classical probability theory. Such problems arise when we consider a set ${\cal B}$ of commuting operators. A state on ${\cal B}$ is then a set of expectations compatible with some probability distribution corresponding to a state on an Abelian von Neumann algebra containing ${\cal B}$. For example, when ${\cal B}$ takes the form $\{ A_n = \sum_{m=1}^M a_{nm} P_m: n = 1, \dots, N\}$ for a sequence $(P_m)_{m=1}^M$ of commuting projections and some given matrix $(a_{nm})$, then a state $\rho$ on ${\cal B}$ is determined by given values $\rho(A_n) : n = 1, \dots, N$ and $$\displaylines{ \rho = \{(r_m)_{m=1}^M: 0 \leq r_m \leq 1, \sum_{m=1}^M r_m =1, \hbox{ and } \sum_{m=1}^M a_{nm} r_m = \rho(A_n) \hcrh \hbox{ for } n = 1, \dots, N\}. }$$ In general, we shall work with an arbitrary set of operators ${\cal B}$. The specification of ${\cal B}$ may well be the most difficult task in using the present theory for a practical quantum inference problem.
${\cal B}$ should be a set of operators about which the observer has direct information. In general, this will be a set of observables for the macroscopic experimental device rather than for the microscopic system being investigated. $\rho$ will be the restriction to ${\cal B}$ of the initial quantum state. With ${\cal B}$, we now have another fundamental element in our interpretation and our goal must be revised. The aim now is to define a function $\app{\cal B}\sigma\rho$ which is to be interpreted as the probability per unit trial of the information in ${\cal B}$ of being able to mistake the state of the world on ${\cal B}$ for $\sigma$, despite the fact that it is actually $\rho$. A von Neumann algebra represents a complete subsystem. Specifying a state on a von Neumann algebra corresponds on the classical level to the complete specification of a probability distribution. Having a state on a non-algebra corresponds to having one of an unknown range of probability distributions.
\medskip
\noindent{\bf Classical example 2.} \quad Suppose that ${\cal B} = {\cal Z}$, a finite Abelian von Neumann algebra generated by a sequence $(P_m)_{m=1}^M$ of commuting projections. States $\rho$ and $\sigma$ on ${\cal Z}$ correspond to probability distributions $(r_m)_{m=1}^M$ and $(s_m)_{m=1}^M$ on $\{1, \dots, M\}$. If the state of the world on ${\cal Z}$ is actually $\rho$ then, when we choose a sequence of points from $\{1, \dots, M\}$, the probability of getting $m$ is given by $r_m$. We would mistake this state for $\sigma$ if, in $N$ trials, we found that, for each $m$, we got $m$ roughly $s_mN$ times. The probability $A$ of such a result can be explicitly calculated using the multinomial distribution. Indeed, if each $s_mN$ is an integer, then $$A = N!
\prod^M_{m=1} {r_m^{s_mN} \over (s_mN)!}.$$ As $N \rightarrow \infty$, by Stirling's formula, $\log A$ is asymptotic to $$N\{\sum^M_{m=1}(-s_m \log s_m + s_m \log r_m)\}.$$ This suggests that it would be appropriate for ${\mathop{\rm app}\nolimits}$ to satisfy $$\app{\cal Z}\sigma\rho = \exp\{\sum^M_{m=1}(-s_m \log s_m + s_m \log r_m)\}.$$

\noindent{\bf Classical example 3.} \quad In a specific case of example 1, we take \newline ${\cal B} = \{ P = P_1 + P_2, Q = P_1 + P_3 \}$ where $(P_m)_{m=1}^4$ are orthogonal projections such that $\sum_{m=1}^4 P_m =1$. A state $\rho$ on ${\cal B}$ is a set of probability distributions on $\{1, \dots, 4\}$ with $\Prob\{1, 2\} = \rho(P)$ and $\Prob\{1, 3\} = \rho(Q)$ determined: $$\rho = \{(r_m)_{m=1}^4 : 0 \leq r_m \leq 1, \sum_{m=1}^4 r_m =1 \hbox{ and } r_1 + r_2 = \rho(P), r_1 + r_3 = \rho(Q) \}.$$ It is possible to make a complete computation of ${\mathop{\rm app}\nolimits}_{\cal B}$ when ${\cal B}$ has the form given in this example [5]. It would be interesting to find a physical situation where such a set was relevant. More typical, physically, might be a case where we have non-commuting projections $P$ and $Q$ and know $\rho(P)$, $\rho(Q)$, and $\rho(P \wedge Q)$, but not $\rho(P^m Q^n)$ for $m, n > 1$.
\medskip
\noindent{\bf Classical example 4.} \quad In the situation of example 3, knowledge of $\rho$ on ${\cal B}$ does not yield a complete prior distribution $(r_m)_{m=1}^4$. In example 2, we considered observing a sequence of $N$ trials of the set $\{1, \dots, M\}$. Each trial can be thought of as an evaluation of the $M$ random variables $(X^m)$ on $\{1, \dots, M\}$, where $X^m$ is the characteristic function of $\{m\}$. In example 3, however, we only have access at a trial to two random variables: $X^P$ --- the characteristic function of $\{1,2\}$ and $X^Q$ --- the characteristic function of $\{1,3\}$. All that is known about the distribution of these variables is that $E(X^P) = \rho(P)$ and that $E(X^Q) = \rho(Q)$.
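\medskip
\noindent{\bf Illustration.} \quad The following sketch is an added illustration rather than part of the original argument; the parameter $t$ is introduced only for this purpose. In example 3, the two constraints together with normalization leave one free parameter. Writing $r_1 = t$, every compatible distribution has the form
$$r_1 = t, \quad r_2 = \rho(P) - t, \quad r_3 = \rho(Q) - t, \quad r_4 = 1 - \rho(P) - \rho(Q) + t,$$
with $\max\{0, \rho(P) + \rho(Q) - 1\} \leq t \leq \min\{\rho(P), \rho(Q)\}$. For instance, if $\rho(P) = \rho(Q) = {1 \over 2}$, then $(r_m)_{m=1}^4 = (t, {1 \over 2} - t, {1 \over 2} - t, t)$ for $0 \leq t \leq {1 \over 2}$. The value $t = {1 \over 4}$ makes $X^P$ and $X^Q$ independent, while $t = 0$ and $t = {1 \over 2}$ make them perfectly anticorrelated and perfectly correlated, respectively. The state $\rho$ on ${\cal B}$ does not distinguish between these cases.
\medskip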
We want to find a value for the probability per trial that some distribution compatible with $\rho$ gives values compatible with $\sigma$. If we make a choice of $(r_m)_{m=1}^4$, then an appropriate value would be given by the asymptotic probability per trial that, under the distribution $(r_m)_{m=1}^4$, the set $\{1,2\}$ is visited with relative frequency $\sigma(P)$ and the set $\{1,3\}$ is visited with relative frequency $\sigma(Q)$. A mathematical expression for this value is given by $$ V = \lim_{\epsilon \rightarrow 0} \lim_{N \rightarrow \infty}\Bigl(\Prob(|{1 \over N}(\sum_{n=1}^N X^P_n) - \sigma(P)| \leq \epsilon, |{1 \over N}(\sum_{n=1}^N X^Q_n) - \sigma(Q)| \leq \epsilon)\Bigr)^{1/N} $$ where $\Prob$ is defined by $(r_m)_{m=1}^4$, and $(X^P_n)_{n=1}^N$ and $(X^Q_n)_{n=1}^N$ are the relevant sequences of independent identically distributed random variables with distributions determined by $\Prob$. According to Sanov [6, thm. 2], $$\displaylines{ V = \sup\{\exp[\sum_{m=1}^4 -s_m \log (s_m/r_m)] : 0 \leq s_m \leq 1, \sum_{m=1}^4 s_m =1 \hbox{ and } \hcrh s_1 + s_2 = \sigma(P), s_1 + s_3 = \sigma(Q) \}. }$$ The definition I am proposing suggests that, in this situation, given no other information, it is appropriate to choose $(r_m)_{m=1}^4$ so as to maximize this value. This suggestion is definitely a quantum mechanical one. It is appropriate because what we do not measure has no influence, or even, no reality. We see what we are capable of seeing, in the form which our apparatus has been set up to allow us to see. The situation pulls out one of the states which is possible for us with probabilities determined only by the information available to us.
\medskip
These examples have demonstrated the concepts at issue. The generalization to states on arbitrary sets of operators will now be given by the following set of axioms. The first three axioms are entirely self-explanatory.
It should be noted that the arguments for the other axioms are merely arguments --- they are not proofs. A much more extensive discussion is given in [2], but the reader of this paper may be more impressed just by the simplicity of the axioms and by the fact that they are consistent. Ultimately, the axioms are justified if they provide a definition which is useful and compatible with observation.
\medskip
\noindent{\bf Axiom 1.} \quad $0 \leq \app{\cal B}\sigma\rho \leq 1.$
\medskip
\noindent{\bf Axiom 2.} \quad $\app{\cal B}\sigma\rho = 1$ if and only if $\sigma = \rho$.
\medskip
\noindent{\bf Axiom 3.} \quad Let $U \in {\cal B}({\cal H})$ be unitary and define $\tau: {\cal B}({\cal H}) \rightarrow {\cal B}({\cal H})$ by $\tau(B) = UBU^*$. Then $$\app{\cal B}\sigma\rho = \app{\tau^{-1}({\cal B})}{\sigma \circ \tau}{\rho \circ \tau}.$$

\noindent{\bf Axiom 4.} \quad Suppose that $\rho = p_1 \rho_1 + p_2 \rho_2$ where $p_1 , p_2 \in [0, 1]$ with $p_1 + p_2 = 1$. Suppose that there exists a projection $P \in {\cal B}$ with $\rho_1(P) = 1$ and $\rho_2(P) = 0$. Then $\app{\cal B}{\rho_1}\rho = p_1$.
\medskip
In this case, $\rho$ is a mixture of the distinct states $\rho_1$ and $\rho_2$ so axiom 4 means that ${\mathop{\rm app}}$ is a generalization of the idea that a mixed state can be written as a sum of terms weighted by probabilities.
\medskip
\noindent{\bf Property 4.} \quad Suppose that $\rho = p_1 \rho_1 + p_2 \rho_2$ where $p_1 , p_2 \in [0, 1]$ with $p_1 + p_2 = 1$. Suppose that there exists a projection $P \in {\cal B}$ with $\rho_1(P) \geq 1 - \epsilon$ and $\rho_2(P) \leq \epsilon$ for some $\epsilon \in [0, {1\over 2}]$. Then $p_1 \leq \app{\cal B}{\rho_1}\rho \leq p_1 - 3\epsilon\log \epsilon$.
\medskip
This is a significant improvement on axiom 4. Quantum measurement theory has long been concerned with providing models of ``state decomposition with negligible interference effects''.
Recently, for example, this has been done under the banner of ``environment-induced decoherence'' [7]. Such models demonstrate the validity of the sort of state decomposition proposed in property 4 at the macroscopic level with $p_1$ playing the role of a physically significant and experimentally measurable probability. They can be taken as providing justification and experimental confirmation for the theory proposed here. The central problem in measurement theory is to justify the coarse-graining involved. Here that coarse-graining is represented by the restriction of attention to the set ${\cal B}$. An appropriate fundamental choice for ${\cal B}$ is justified in [3]. \medskip \noindent{\bf Property 5.} \quad Let ${\cal B}_1 \subset {\cal B}_2$ and $\sigma_2 , \rho_2$ be extensions to ${\cal B}_2$ of states $\sigma_1 , \rho_1$ on ${\cal B}_1$. Then $\app{{\cal B}_2}{\sigma_2}{\rho_2} \leq \app{{\cal B}_1}{\sigma_1}{\rho_1}$. \medskip This is justified in terms of the proposed interpretation of $\app{{\cal B}}{\sigma}{\rho}$ because an observer who can draw finer distinctions is less likely to make mistakes. \medskip \noindent{\bf Axiom 5.} \quad Let ${\cal B}_1 \subset {\cal B}_2$. Then $$\app{{\cal B}_1}{\sigma_1}{\rho_1} = \sup\{\app{{\cal B}_2}{\sigma_2}{\rho_2} : \sigma_2|_{{\cal B}_1} = \sigma_1, \rho_2|_{{\cal B}_1} = \rho_1\}.$$ This improvement on property 5 is justified because although we know nothing else about the extension from ${\cal B}_1$ to ${\cal B}_2$, we do know for certain, as a consequence of definition 2, that $\sigma_1$ and $\rho_1$ are restrictions of states on ${\cal B}_2$. By axiom 5 and definition 2, $\mathop{\rm app}\nolimits$ on an arbitrary set ${\cal B}$ is determined by $\mathop{\rm app}\nolimits$ on ${\cal B}({\cal H})$. \medskip \noindent{\bf Property 6.} \quad Let $\sigma_1, \sigma_2$ and $\rho$ be states on a set ${\cal B}$. Let $p_1 , p_2 \in [0, 1]$ with $p_1 + p_2 = 1$. Let $\sigma = p_1 \sigma_1 + p_2 \sigma_2$. 
Then $$\app{{\cal B}}{\sigma}{\rho} \geq {\app{{\cal B}}{\sigma_1}{\rho}}^{p_1} {\app{{\cal B}}{\sigma_2}{\rho}}^{p_2}.$$ This is plausible because one of the ways for an observer to observe the state $\sigma$ is to observe $\sigma_1$ a fraction $p_1$ of the time and $\sigma_2$ a fraction $p_2$ of the time. \medskip \noindent{\bf Axiom 6.} \quad Let $\sigma_1, \sigma_2$ and $\rho$ be states on a finite-dimensional von Neumann algebra ${\cal A}$. Let $p_1 , p_2 \in [0, 1]$ with $p_1 + p_2 = 1$. Let $\sigma = p_1 \sigma_1 + p_2 \sigma_2$. Then $$\Lambda = {\app{{\cal A}}{\sigma_1}{\rho}}^{p_1} {\app{{\cal A}} {\sigma_2}{\rho}}^{p_2}/\app{{\cal A}}{\sigma}{\rho}$$ is independent of $\rho$. \medskip Consider sequences of trials of the information in ${\cal A}$ performed on a system which is in the state $\rho$. Suppose that all the results from all the sequences are compatible with the state $\sigma$. Some of these sequences may have the property that the results are compatible with the state $\sigma_1$ a fraction $p_1$ of the time and with the state $\sigma_2$ a fraction $p_2$ of the time. Because ${\cal A}$ is a von Neumann algebra and all the states involved are completely specified, $\Lambda$ is a measure of the relative probability of this property. Whether a sequence of trials compatible with $\sigma$ has the property in question or not depends only on $\sigma$. On a non-algebra, however, it is possible for $\Lambda$ to depend on $\rho$ because $\rho$ is not a complete specification of the situation being tested by a trial. In the classical context of examples 1--4, this argument is a correct factual statement. In the wider context of non-commutative probability theory, it tells us not just about the definition of ${\mathop{\rm app}}$, but also about the meaning of a ``trial'' and of a state being ``completely specified''. It is, perhaps, remarkable that it is possible to extend such an argument to the non-commutative situation. 
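\medskip
\noindent{\bf Illustration.} \quad As a sketch of the classical case, added here as a check rather than taken from the original argument, assume that ${\cal A} = {\cal Z}$ is the finite Abelian algebra of example 2 and that ${\mathop{\rm app}\nolimits}_{\cal Z}$ is given by the formula suggested there. Write $(s^{(1)}_m)$, $(s^{(2)}_m)$, $(s_m)$, and $(r_m)$ for the probability distributions corresponding to $\sigma_1$, $\sigma_2$, $\sigma$, and $\rho$. The superscript notation is introduced only for this illustration, every $r_m$ is assumed to be strictly positive, and $0 \log 0$ is read as $0$. Then $s_m = p_1 s^{(1)}_m + p_2 s^{(2)}_m$ and
$$\eqalign{\log \Lambda = {}&\sum_{m=1}^M \bigl(s_m \log s_m - p_1 s^{(1)}_m \log s^{(1)}_m - p_2 s^{(2)}_m \log s^{(2)}_m\bigr) \cr &{}+ \sum_{m=1}^M \bigl(p_1 s^{(1)}_m + p_2 s^{(2)}_m - s_m\bigr) \log r_m. \cr}$$
The second sum vanishes term by term, so that, in this case, $\Lambda$ depends on $\sigma_1$, $\sigma_2$, $p_1$, and $p_2$, but not on $\rho$.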
\medskip
\noindent{\bf Theorem.} \quad Let ${\cal H}$ be a finite-dimensional Hilbert space and ${\cal B} \subset {\cal B}({\cal H})$. Then there is a unique function $\app{\cal B}\sigma\rho$ satisfying axioms 1--6. $\app{\cal B}\sigma\rho$ also satisfies properties 4, 5, and 6. When ${\cal B}$ is ${\cal B}({\cal H})$ --- the set of all bounded operators --- then $$\app{{\cal B}({\cal H})}\sigma\rho =\exp(\tr(-\sigma \log \sigma + \sigma \log \rho)).$$ For general ${\cal B} \subset {\cal B}({\cal H})$, $\app{\cal B}\sigma\rho$ is then defined by axiom 5. If ${\cal B}$ is a von Neumann algebra then $\app{\cal B}\sigma\rho$ is the exponential of the relative entropy of $\sigma$ with respect to $\rho$. The formula given in example 2 is correct as is the value proposed for example 4. There is also a unique function $\app{\cal B}\sigma\rho$ on infinite-dimensional systems. Axioms 1--5 and properties 4, 5, and 6 are satisfied as stated, except that the supremum in axiom 5 has to be taken over all states (non-normal as well as normal). Axiom 6 holds for normal states on an injective von Neumann algebra. Uniqueness is achieved by requiring that $\app{\cal B}\sigma\rho$ be the minimal w$^*$ upper semicontinuous function having these properties.
\bigskip
\noindent{\bf References.}

{\frenchspacing

\noindent [1]\quad M.J. Donald, ``Quantum theory and the brain''. {\sl Proc. R. Soc. Lond. A} {\bf 427}, 43--93 (1990).

\noindent [2]\quad M.J. Donald, ``A priori probability and localized observers''. {\sl Found. Phys. } {\bf 22}, 1111--1172 (1992).

\noindent [3]\quad M.J. Donald, ``A mathematical characterization of the physical structure of observers''. {\sl Found. Phys. } (to appear).

\noindent [4]\quad M.J. Donald, ``On the relative entropy''. {\sl Commun. Math. Phys. } {\bf 105}, 13--34 (1986).

\noindent [5]\quad M.J. Donald, ``Further results on the relative entropy''. {\sl Math. Proc. Camb. Phil. Soc. } {\bf 101}, 363--373 (1987).

\noindent [6]\quad I.N. Sanov, ``On the probability of large deviations of random variables''. {\sl Mat. Sbornik } {\bf 42}, 11--44 (1957) and {\sl Selected Translations in Math. Stat. and Prob. } {\bf 1}, 213--244 (1961).

\noindent [7]\quad W.H. Zurek, ``Decoherence and the transition from quantum to classical''. \newline {\sl Physics Today} 36--44 (October, 1991).\par}
\end{document}