%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% A Priori Probability and Localized Observers
%%% Foundations of Physics,
%%% Volume 22, pages 1111-1172 (1992).
%%%
%%% Plain TeX, 53 pages
%%%
%%% Matthew J. Donald
%%%
%%% web site: http://people.bss.phy.cam.ac.uk/~mjd1014
%%%
%%% e-mail : mjd1014@cam.ac.uk
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\count17=0 %%to use pdfTeX comment out this line and uncomment the next
\count17=1 \pdfoutput=\count17 %%to use plain TeX comment out this line and
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% uncomment the previous
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\ifnum\count17=1
\def\cmykBlue{1 1 0 0}
\def\cmykBlack{0 0 0 1}
\def\Blue{\pdfsetcolor{\cmykBlue}}
\def\Black{\pdfsetcolor{\cmykBlack}}
\def\pdfsetcolor#1{\pdfliteral{#1 k}}
\def\setcolor#1{\mark{#1}\pdfsetcolor{#1}}
\def\maincolor{\cmykBlack}
\pdfsetcolor{\maincolor}
\pdfinfo {
/Title (A Priori Probability and Localized Observers)
/Author (Matthew J. Donald)
/CreationDate (November 1991)
/ModDate (\number\year/\number\month/\number\day)
/Subject (A Many-Minds Interpretation of Quantum Theory)
/Keywords (quantum theory,many minds,philosophy of physics)}
\pdfcompresslevel=9
\fi
\magnification=1200
\hsize=13cm
\headline={\hfil}
\footline={\ifnum\count0 = 1 \hfil \else\hss\tenrm\folio\hss \fi}
\topskip10pt plus30pt
\def\<{{<}}
\def\>{{>}}
\def\dsize{\displaystyle}
\def\text{\hbox}
\def\newline{\hfil\break}
\def\tr{\mathop{\rm tr}}
\def\ent#1#2#3{\mathop{\rm ent}\nolimits_{#1}(#2\,|\,#3)}
\def\app#1#2#3{\mathop{\rm app}\nolimits_{#1}(#2\,|\,#3)}
\def\implies{\Rightarrow}
\def\A{{\cal A}}
\def\B{{\cal B}}
\def\H{{\cal H}}
\def\shoveright{\hfill}
\def\allowdisplaybreaks{}
\def\nomultlinegap{}
\def\multline{\displaylines\bgroup}
\def\endmultline{\egroup}
\def\hcrh{\hfill \cr \hfill}
\def\crh{\cr \hfill}
\def\hcr{\hfill \cr}
\abovedisplayskip=3pt plus 1pt minus 1pt
\belowdisplayskip=3pt plus 1pt minus 1pt
\def\proclaim#1#2{\medskip\noindent{\bf #1}\quad \begingroup #2}
\def\endproclaim{\endgroup\medskip}
\def\proof{\noindent{\sl proof}\quad}
\def\K{{\cal K}}
\def\H{{\cal H}}
\def\parasign{\S}
\def\blacksquare{\vrule height4pt width3pt depth2pt}
\font\smc =cmcsc10
\font\Bbb =msbm10
\def\Real{{\hbox{\Bbb R}}}
\def\Complex{{\hbox {\Bbb C}}}
\font\bs=cmmib10
\def\bomega{\text{\bs \char'41}}
\font\brm=cmbx12
\def\til{\lower 1.1 ex\hbox{\brm \char'176}}
\ifnum\count17=1
\def\link#1#2{\leavevmode\pdfstartlink attr{/Border [0 0 0]} goto name{#1}\setcolor\cmykBlue #2\pdfendlink\setcolor\cmykBlack}
\def\name#1{\pdfdest name{#1} xyz}
\def\pdfproclaim#1#2#3{\medskip\name{#3}\noindent{\bf #1}\quad \begingroup #2}
\def\outlink#1#2{\leavevmode \pdfstartlink attr{/Border [0 0 0]} user{/Subtype /Link /A << /S /URI /URI (#1) >>} \setcolor\cmykBlue #2\pdfendlink\setcolor\cmykBlack}
\def\pdfeject{\eject}
\else
\def\link#1#2{{#2}}
\def\outlink#1#2{{#2}}
\def\name#1{}
\def\pdfproclaim#1#2#3{\proclaim{#1}{#2}}
\def\pdfeject{}
\fi
\vbox to
4cm{} \centerline{\bf A Priori Probability and Localized Observers.} \vbox to 2cm{} \centerline{\bf Matthew J. Donald} \vbox to 2cm{} {\bf \hfill The Cavendish Laboratory, JJ Thomson Avenue, \hfill Cambridge CB3 0HE, Great Britain.} \vbox to 1cm{} {\bf \hfill e-mail:\quad mjd1014@cam.ac.uk} \vbox to 0.75cm{} {\bf \hfill web site:\quad \catcode`\~=12 \outlink{http://people.bss.phy.cam.ac.uk/~mjd1014} {http://people.bss.phy.cam.ac.uk/\til mjd1014}} \vbox to 4cm{} {\bf \hfill November 1991 \hfill Revised: July 1992 \hfill Appears: Foundations of Physics 22, 1111-1172 (1992)} \vfill \eject {\parindent = 0pt {\bf Abstract.} \quad A physical and mathematical framework for the analysis of probabilities in quantum theory is proposed and developed. One purpose is to surmount the problem, crucial to any reconciliation between quantum theory and space-time physics, of requiring instantaneous "wave-packet collapse" across the entire universe. The physical starting point is the idea of an observer as an entity, localized in space-time, for whom any physical system can be described at any moment, by a set of (not necessarily pure) quantum states compatible with his observations of the system at that moment. The mathematical starting point is the theory of local algebras from axiomatic relativistic quantum field theory. A function defining the a priori probability of mistaking one local state for another is analysed. This function is shown to possess a broad range of appropriate properties and to be uniquely defined by a selection of them. Through a general model for observations, it is argued that the probabilities defined here are as compatible with experiment as the probabilities of conventional interpretations of quantum mechanics but are more likely to be compatible, not only with modern developments in mathematical physics, but also with a complete and consistent theory of measurement. 
\bigskip {\smc CONTENTS} } {\bf \item{1.} \link{sec 1}{Introduction and synopsis.} \item{2.} \link{sec 2}{What are the fundamental concepts for interpretations of universally valid quantum theories?} \item{3.} \link{sec 3}{Mathematical background.} \item{4.} \link{sec 4}{Postulates about observers.} \item{5.} \link{sec 5}{The a priori probability function and measurement theory.} \item{6.} \link{sec 6}{A priori probability and observations.} \item{7.} \link{sec 7}{Properties of a priori probability.} \item{8.} \link{sec 8}{The definition of the a priori probability function.} \item{9.} \link{sec 9}{A priori probability for a succession of collapses.} \item{10.} \link{sec 10}{Elementary models.} \item{11.} \link{sec 11}{Consistency.} \link{Ref}{References} \bigskip \name{sec 1} \noindent 1.\quad Introduction and synopsis.} \medskip There seems to be a wide gap between the entities traditionally seen as central in the interpretation of quantum mechanics (particles, eigenvalues, wave-functions) and the von Neumann algebras associated with localized regions of space-time, which are focused on by mathematical physicists interested in the conceptual foundations of relativistic quantum field theories. The purposes of this paper are: first, to draw attention to this gap; second, to discuss a function measuring relative probabilities between local algebra states which is intended to be a tool for interpretations based on field theory; and third, to develop a framework for defining the a priori probability of a localized observer with a given quantum-mechanically specified life history up to a given time observing a given set of local quantum states. Most of the paper should be accessible to physicists unfamiliar with the mathematics of local algebras. The characterization of ``wave-packet collapse'' might well be thought of as the primary problem of the foundations of quantum theory. 
We understand very well how to model many physical situations, at a given moment, by an appropriate wave-function. The problem is that the wave-function appropriate at one moment appears to change abruptly whenever an act of measurement or an observation occurs. There is no widely-accepted detailed theory of such acts. In this paper, as in \link{otre}{Donald (1990)}, we shall consider a more general idea of ``collapse'' as being some process, to be characterized, which results in the discontinuous replacement of one quantum state by another, whether or not these states are pure. This process must, of course, be distinguished from Hamiltonian time propagation, which is continuous, and, indeed, does not cause any change in state if we work, as we always shall here, in the Heisenberg picture. Working in a local space-time region, we are not given wave-functions, but rather states in the operator algebra sense, which correspond, in general, to mixed-state density matrices. Local information is not sufficient to distinguish whether, at the global level, a given state is pure or not. In the conventional approach to quantum theory, where we work globally, we are given an initial wave-function $\psi$ and a choice $(\varphi_n)^N_{n=1}$ of wave-functions for the result of a collapse. It is then claimed that the a priori probability of the collapse to $\varphi_n$ is given by the squared amplitude $|\<\varphi_n|\psi\>|^2$. The most fundamental unsolved problems in this approach lie in giving a detailed specification of the sequence $(\varphi_n)^N_{n=1}$ and of the times at which collapse occurs. Of course, the $(\varphi_n)^N_{n=1}$ are usually taken to be the eigenvectors of some ``measured operator''. The difficulty lies in identifying this operator without ambiguity, given a real physical measuring apparatus. 
It is also far from easy to see how such an approach can be made compatible with relativity theory, because of the implicit assumption of a globally simultaneous collapse. The desirability of working locally comes partly from this problem and partly from the manifest fact that observers and observations are intrinsically localized. In this paper, at the simplest level, the aim is to develop the possibility of considering an initial state $\rho$ to be given on some local region specified by a set of operators $\B$, and to give an appropriate definition for a function $\app{\B}{\sigma}{\rho}$ to measure the a priori probability of collapse occurring to a new state $\sigma$. I have previously considered, in \link{otre}{Donald (1986, 1987a)}, the mathematics of this definition, but, here, I emphasize its physical motivation. By itself, however, the function $\app{\B}{\sigma}{\rho}$ is merely a tool, albeit one which might be useful to any interpreter of quantum mechanics wishing to work with density matrices rather than wave-functions. The really challenging question is how that tool might be used in progress towards an interpretation of quantum mechanics based on the idea of locally defined states. This paper, therefore, is mainly concerned with building a framework for such an interpretation. Here too there is a problem of detailed specification, but I believe that the framework proposed here is more likely to permit the construction of such a specification than is the framework of conventional quantum mechanics. Indeed, I began such a construction in \link{otre}{Donald (1990)}, based on the idea of the physical structure of ``an observer'' (or equivalently, of ``a consciousness''), as being constituted by an information processor operating through a localized family of abstractly definable quantum switches. The definition of a quantum switch and a demonstration that such switches exist in the human brain are given in \link{otre}{Donald (1990)}. 
As an interpretation, even the combination of that paper and this will still be incomplete. This paper gives an analysis of observation processes which allows most of the problems of detailed specification to be passed back to the characterization of the observer. However, only an outline description of an observer is given here. I intend to continue this work in a further paper, already in progress, which will look at observers more closely. Although the ideas in this paper and in \link{otre}{Donald (1990)} were originally developed together, this paper may be read independently of its predecessor and of its sequel. I hope, in fact, that the material in much of this paper is sufficiently general that it might be possible to use it to develop more than one route towards a complete interpretation of quantum theory. It is clearly not possible to develop interpretations of quantum theory from universally accepted first principles; comparing at each stage with all the alternatives. Instead, one must choose the first principles which one wishes to use as the fundamental ingredients of an interpretation. One should try to make those ingredients explicit in the form of premises which cannot be totally superseded without the abandonment of the interpretation. Then one can begin to evaluate the consequences of these premises. Disagreements about premises are inevitable, but they tend not to be as fruitful as discussions of their consequences. In this paper, I wish to develop techniques for the calculation of a priori probability, in the context of the premise that quantum field theory is universally valid and the premise that observers are individual, localized, animate repositories of information gained about the universe which are describable in quantum mechanical terms. The emphasis on individual observers is a direct reflection of the idea that in relativity theory we assign separate proper times to separate observers. 
This leads to a many-worlds type theory in which, at each moment, each observer assigns his own separate quantum state to the ``reality'' with which he is interacting. In such a theory, compatibility with relativity is not a problem essentially because ``collapse'' is an alteration of the state currently assigned by a given observer to a given system rather than a message sent to or from that system. For the same reason, the violation of Bell's inequality is also not a problem. One must, of course, answer questions about the relationship between the observations of different observers, but, as we shall see, this is no more difficult than giving a general account of observation. The problem which is difficult, on the other hand, is the development and understanding of a coherent formalism for such a theory. This paper forms part of a serious attack on that problem. Without alternative proposals reconciling relativity and quantum theory, merely to dismiss such a many-worlds theory as being philosophically unacceptable is, in my opinion, the least fruitful of all the potential ways of disagreeing with this paper. For myself, I see it as being quite natural to allow each observer his own independent objective existence, his own independent observations, and his own independent a priori probabilities. The development of these vague premises into a detailed interpretation will come in the form of ``postulates'' or ``hypotheses''. These will convert the premises into specific formalism; providing a model with consequences which may be ascertained and argued over. They will be open to modification and refinement as the theory develops. In this paper, as in \link{otre}{Donald (1990)}, I shall only be presenting a partial set of postulates. 
While this is dangerous, since over-all consistency is one of the hardest goals for an interpretation to achieve, it should, nevertheless, be permitted, in view of the difficulty of the problem and in order to encourage variant developments. Widespread dissatisfaction has been expressed about every interpretation of quantum theory proposed in the course of the last sixty years. I suspect that any successful interpretation will have to introduce so many difficult ideas as to be, in its entirety, well-nigh incomprehensible at first sight. It seems to me, therefore, that it is worthwhile trying to discuss separately possible postulates for potential interpretations. Despite its incompleteness, it may well appear that this paper rapidly becomes swamped by a welter of abstract technical detail when the postulates finally appear in section \link{sec 4}{4} and thereafter. This is regrettable, although I believe that it is also inevitable. Much of the abstraction arises because we shall be working not with individual quantum states but with sets of states. An analogous abstraction allowed tremendous progress in probability theory when the introduction of measure theory required attention to pass from individual events to sets of events. Of course, in the present context, there is a philosophical question about the desirability of ascribing ontological priority to sets of states: Is it desirable to describe an observer as existing as a set of sequences of density matrices? This question is left for the reader to ponder. However, even leaving aside the claim that locally defined neighbourhoods of states are more natural entities within relativistic quantum field theories than globally defined eigenstates of specified operators, it may be noted that traditional measurement theory is at a dead end, and so any theory allowing greater flexibility should be explored. 
\proclaim{Synopsis}{} In section \link{sec 2}{2}, it is proposed that the local quantum state is the natural fundamental entity in quantum theory, the idea of working with sets of such states is motivated, seven different \link{notions15}{notions} of quantum probability are introduced, and the importance of correlations between different observables is stressed. Section \link{sec 3}{3} discusses the mathematical background necessary for the analysis of local states and gives a formal definition of correlation. Attention is drawn to an important but troublesome property of local algebras without which the mathematics which we shall need might be considerably more straightforward. In order to indicate how a complete interpretation of quantum theory might be based on the concepts central to this paper, some fairly broad postulates about observers are put forward in section \link{sec 4}{4}. Section \link{sec 5}{5} presents a general model for an observation in terms of a state decomposition with negligible interference effects. This imposes an important property on the function $\app{\B}{\sigma}{\rho}$. Section \link{sec 6}{6} extends the model of section \link{sec 5}{5} to the context of the postulates of section \link{sec 4}{4}. In section \link{sec 7}{7}, properties for $\app{\B}{\sigma}{\rho}$ are proposed which are appropriate for a definition of a priori probability and which include the property imposed in section \link{sec 5}{5}. Much of this work involves interpreting the notion of a quantum state. In section \link{sec 8}{8}, a selection of the properties is shown to be sufficient to provide a unique definition which satisfies all the other properties. Section \link{sec 9}{9} discusses the mathematics of multiple-observation probabilities, section \link{sec 10}{10} sketches mathematically elementary models of some of the structures introduced, and the \link{sec 11}{final} section considers the various consistency issues which have arisen. 
\endproclaim \pdfproclaim{2. \quad What are the fundamental concepts for interpretations of universally valid quantum theories?}{}{sec 2} \endproclaim Sometimes, quantum mechanics can seem no more than a disparate collection of calculational recipes. For example, we can calculate the frequency-dependent susceptibility of a family of atoms in a cavity, we can calculate electron-positron annihilation cross-sections, and we can relate the zero-temperature energy gap in weak-coupling superconductivity to the transition temperature. The glory of quantum mechanics, however, is the unity it brings to such disparate topics. It is not just that the methods for calculations are often similar; involving perturbation theory and phenomenological, approximate Hamiltonians. It is also the suspicion that there is a true universal Hamiltonian in some wonderful super-theory of everything, and that all our calculations are valid approximations for different situations in that universal theory. Of course, it is not known whether footballs, planets, and steam engines are solely governed by quantum mechanical laws. The Copenhagen interpretation of quantum theory demands the existence of a ``classical regime'' in which macroscopic objects can be described using Newtonian mechanics. On the other hand, \link{otre}{Everett (1957)} developed the many-worlds interpretation, which he referred to as ``The Theory of the Universal Wave Function'', precisely in order to be able to apply quantum theory to the entire universe. In this paper the universal validity of quantum theory will also be postulated. However, the framework of the development here will be different. In constructing a universal quantum theory, Everett conceived a brilliant intuition about the nature of observers which will be discussed below. He expounded his intuition in the framework of elementary quantum mechanics. 
I shall use Everett's intuition, but I shall try to expound it in a framework compatible with quantum field theory. In most interpretations of quantum mechanics, the concept of the individual particle is fundamental. In interacting relativistic quantum field theories, however, local particle number becomes indefinite, because of the possibility of particle (or particle-antiparticle) creation. Only in scattering theories are particle states clearly definable. In other words, either we deal with an idealized infinite-volume theory, or we work locally and accept that the concept of ``an individual particle'' is as ``classical'' in relativistic quantum theory as the concept of ``a particle with precisely defined position'' in non-relativistic quantum theory. Indeed, in a way, the former concept is even more ``classical'', since, charge quantum numbers apart, it is not even plausible that it is possible to specify, for example by appropriate renormalizations or dressing transformations, operators unambiguously ``measuring'' the expected number of particles within a given region. The mathematical construction of quantum field theory models in two-dimensional space-time has provided models in which this sort of problem could be made explicit (\link{GarHaag}{Glimm and Jaffe (1971, 1972, 1979)}). It is a first principle in this paper that the quantum measurement problem should be discussed and solved in terms of localized entities -- entities that exist and are defined within bounded regions of space-time. In particular, observers and experiments are assumed to be localized in this sense. In order to define local entities, one must be able to define ``local quantum states''. Fortunately, a mathematical theory of such states already exists. This theory will be reviewed briefly below. Of course, the idea of the ``local quantum state'' is natural, and does not need any deep mathematics for an initial physical perspective. 
From such a perspective, we would say that the quantum state in a space-time region $\Lambda$ (by convention, a region is an open bounded connected set) is simply the state that one assigns to the objects within $\Lambda$. In view of quantum measurement problems, however, this statement does require considerable elucidation and modification. In particular, the word ``one'' presumably refers to an observer and must require specification of that observer and his time frame. Also, since we have assumed that $\Lambda$ is extended in time (and this may well be necessary in relativistic field theories (\link{Strat}{Streater and Wightman (1964)} \parasign 3.1)), we should expect that it is possible for a process analogous to ``wave-packet collapse'' to require the assignment of more than one state to $\Lambda$. This paper provides tools for attacking these difficulties. Despite being bounded, the space-time regions that we shall be working with are to be thought of as macroscopic; perhaps, for example, big enough to contain an animate observer over an extended time period. This implies, in particular, the consideration of multiple observations and the assignment to the region of a sequence of several or many states. The theory for single observations is considerably simpler than that for multiple observations. Much of the mathematics of the single observation theory is presented in \link{otre}{Donald (1986, 1987a)}. This paper provides a broader physical framework for that mathematics and also tackles the multiple observation theory. As already mentioned, a local state is to be thought of as a density matrix, which is, in general, mixed; rather than as a pure state wave-function. There are several grounds for wishing to base interpretations on density matrices rather than on wave-functions. Of course, quantum theory often provides wave-functions as initial models for a given physical system, but it is elementary to use these to construct density matrices. 
For a first ground, at the mathematical level, it is worth mentioning for the experts, that on a type III von Neumann algebra (the sort relevant for quantum statistical mechanics and local quantum field theory), there is no such thing as a pure normal state. Secondly, at the physical level, we shall be dealing with the best guess that a given observer can make for the current state of a given macroscopic subsystem of the universe. It is certainly more natural for him to assign thermodynamical properties, for example, entropy and temperature, to such subsystems than to try to assign, for example, an exact eigenstate of some, not necessarily well-defined, energy operator. Finally, at the formal level, it is often claimed that one can simply replace a density matrix by an element of its eigenexpansion. However, this is difficult to justify because the relationship between the two is both ambiguous (e.g. ${1\over2}|\psi_1\>\<\psi_1| + {1\over2}|\psi_2\>\<\psi_2| = {1\over4}|\psi_1+\psi_2\>\<\psi_1+\psi_2| + {1\over4}|\psi_1-\psi_2\>\<\psi_1-\psi_2|$ for orthogonal $\psi_1$ and $\psi_2$) and unstable (e.g. $({1\over2}+\varepsilon)|\psi_1\>\<\psi_1| + ({1\over2}-\varepsilon)|\psi_2\>\<\psi_2|$ is arbitrarily close to $({1\over4}+\varepsilon)|\psi_1+\psi_2\>\<\psi_1+\psi_2| + ({1\over4}-\varepsilon)|\psi_1-\psi_2\>\<\psi_1-\psi_2|$ for sufficiently small $\varepsilon > 0$). The assumption that this relationship is easily dealt with is one of the major problems with, for example, Everett's formalism. By always working at the density matrix level, we can give a much more sophisticated analysis and a generalization of this relationship; thus, allowing, for example, for imprecision. This is one of the central tasks of this paper. 
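
\medskip
As a check on the two examples just given, and purely as a notational convenience which is not used elsewhere in this paper, write $P_j = |\psi_j\>\<\psi_j|$ for $j = 1, 2$ and $X = |\psi_1\>\<\psi_2| + |\psi_2\>\<\psi_1|$. For orthonormal $\psi_1$ and $\psi_2$,
$$|\psi_1\pm\psi_2\>\<\psi_1\pm\psi_2| = P_1 + P_2 \pm X,$$
so that
$${1\over4}(P_1 + P_2 + X) + {1\over4}(P_1 + P_2 - X) = {1\over2}P_1 + {1\over2}P_2,$$
which is the ambiguity. For the instability,
$$({1\over4}+\varepsilon)(P_1 + P_2 + X) + ({1\over4}-\varepsilon)(P_1 + P_2 - X) = {1\over2}(P_1 + P_2) + 2\varepsilon X,$$
and this differs from $({1\over2}+\varepsilon)P_1 + ({1\over2}-\varepsilon)P_2$ by $\varepsilon(P_1 - P_2 - 2X)$. On the span of $\psi_1$ and $\psi_2$, this difference has eigenvalues $\pm\sqrt5\,\varepsilon$. If closeness is measured in trace norm (an assumption made here only for definiteness), the two density matrices are therefore at distance $2\sqrt5\,\varepsilon$, although their eigenvectors, $\psi_1, \psi_2$ in the one case and $(\psi_1\pm\psi_2)/\sqrt2$ in the other, remain fixed and distinct for every $0 < \varepsilon \leq {1\over4}$.
\medskip
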
Our starting point, nevertheless, is that for a density matrix \name{eq2.1} $$\rho = \sum_{i = 1}^\infty r_i|\psi_i\>\<\psi_i| \eqno{(2.1)}$$ where $(\psi_i)_{i = 1}^\infty$ is an orthonormal basis of $\H$ and $0 \leq r_i \leq 1$ with $\sum_{i = 1}^\infty r_i = 1$, $r_i$ should be the a priori probability of going from $\rho$ to $|\psi_i\>\<\psi_i|$ when such a collapse is warranted. A decomposition of the form (\link{eq2.1}{2.1}) will be recognized as a decomposition ``without interference effects''. Justifying such a decomposition for ``the true physical state'' in appropriate circumstances has long been seen by many to be a solution to the problems of measurement theory. However, while it seems to be widely accepted that, in as far as interference effects can often be physically negligible, such decompositions can often be plausible models for physical situations, it also seems that justification from first principles has been lacking or unsatisfactory. The validity of such a decomposition depends on an appeal to non-unitary, irreversible time propagation or to some sort of ``coarse-graining''. In this paper, coarse-graining comes from the restriction to localized regions, and the localization of observers is a first principle. The realization that coarse-graining arises naturally from the incomplete, non-universal, structure of an observer is precisely the brilliant intuition which I ascribed above to Everett. \name{notions15} No less than seven different notions of quantum probability arise in this paper or are in common usage. The first notion is that of the a priori probability of observation of a subsystem occupying a given set of states by an observer with a given life history up to a particular time. This notion is presented in section \link{sec 4}{4}. The second notion is the function $\app{\B}{\sigma}{\rho}$ measuring the a priori probability of the collapse of the state $\rho$ on $\B$ to the state $\sigma$. 
The third notion involves taking a density matrix as a mixture of components weighted by probabilities, as exemplified by (\link{eq2.1}{2.1}). $\app{\B}{\sigma}{\rho}$ will be constructed as a generalization of this notion. The fourth notion is the idea of the expected value of a projection $P$ in a state $\rho$. \name{eq2.2} Recall that if $\rho = \sum_{i = 1}^\infty r_i|\psi_i\>\<\psi_i|$ is a state and $P = \sum_{n = 1}^\infty |\varphi_n\>\<\varphi_n|$ is a projection, for suitable (orthonormal) wave-functions $\psi_i$ and $\varphi_n$, then this expected value is defined as $$\rho(P) = \sum_{n,i} r_i|\<\psi_i|\varphi_n\>|^2. \eqno{(2.2)}$$ The left hand side of (\link{eq2.2}{2.2}) uses an alternative notation for $\tr(\rho P)$, which is basic to the mathematical analysis of states on von Neumann algebras and which will be used throughout this paper (see \link{def3.1-3.3}{3.3}). Notion four is also a generalization of notion three, since $r_i$ is the expected value of the projection $|\psi_i\>\<\psi_i|$ in the state given by (\link{eq2.1}{2.1}). This notion is fundamental to textbook quantum theory; being at the heart of such calculated quantities as scattering cross sections and decay rates where it often appears in the guise of the fifth notion which is that of probability as amplitude squared. Indeed, whenever this arises as a ``transition probability'' in textbook calculations, it can always be interpreted as a special case of notion four. \name{notions67} The sixth notion may be referred to as ``relative frequency'', or, more generally, as ``statistical probability''. A particle lifetime, for example, equivalent to a probability per unit time for decay, is not directly observable by a single measurement on one particle. Instead, statistics have to be gathered from many measurements. The quoted ``observed'' lifetime is found by an analysis of these statistics. 
Statistical probability itself comes in many different varieties, depending not only on the analysis made of a given set of measurements, but also on the particular methods of observation. The empirical justification of conventional quantum theory lies in the observed agreement between these statistical probabilities and theoretical calculations of probabilities of the third, fourth, and fifth kinds. In section \link{sec 6}{6}, it will be argued that this empirical evidence yields just as strong a justification for a theory based on the first notion. Ultimately, each of these six notions depends on, and is consistent with, the seventh, and fundamental, notion of the ``typical'' observer. The typical observer observes a world in which, when suitably applied, the first six notions are almost always consistent. The typical observer also lies at the heart of the justification of classical probability theory, because it is only for the typical observer that relative frequencies measured over long periods usually come very close to predicted probabilities. The typical observer may occasionally win a lottery, but cannot win every lottery he enters. The deepest idea in this paper is a definition of a priori probability for the families of sets of states occupied by individual observers. Typical observers are those for which this a priori probability is relatively high. In quantum theory, as in classical theory, any method of calculating probabilities will be justified in as far as it can be argued that typical observers will tend to observe relative frequencies which agree, in the long run, with those calculations. The first five notions of quantum probability described \link{notions15}{above} provide tools with which, in appropriate circumstances, such calculations can be made. 
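
\medskip
The relations between the third, fourth, and fifth notions can be checked directly from (\link{eq2.2}{2.2}); what follows is only an illustrative verification under the orthonormality assumptions already stated there. Taking $P = |\psi_j\>\<\psi_j|$ for the state $\rho$ of (\link{eq2.1}{2.1}) gives
$$\rho(P) = \sum_{i} r_i|\<\psi_i|\psi_j\>|^2 = \sum_{i} r_i \delta_{ij} = r_j,$$
which is notion three, while taking $\rho = |\psi\>\<\psi|$ and $P = |\varphi\>\<\varphi|$ for unit vectors $\psi$ and $\varphi$ gives
$$\rho(P) = |\<\psi|\varphi\>|^2,$$
which is notion five.
\medskip
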
By choosing appropriate local quantum states for a given physical situation, it is possible to model the ``different situations'' which distinguish between, for example, regions where there are lasers, and laser theory applies, and regions where there are superconductors, and superconductivity theory applies. Alternatively, appropriate local quantum states might be chosen to be approximate scattering states and could, say, represent ``close to'' five ``reasonably well-separated'' particles initially at ``around'' five given places and with ``roughly'' five given momenta. Elementary quantum mechanics tells us how to define ``wave-packets'' and more sophisticated theories build on the same intuition. There is, however, a central dilemma here, exemplified by the fact that a wave-packet is neither a position nor a momentum eigenstate (of course, it could not be so, because none such exist within the space of normable wave-functions). The dilemma is that, while a range of approximating wave-packets is possible, with limits set only by the uncertainty principle, there is no definitive theory for choosing within that range. The essence of the problem of state collapse is that we cannot merely assume that some state within the range is given to us by an ultimately deterministic dynamics -- we need to understand the processes by which the states that we do see come to be seen. In the standard interpretation of quantum mechanics, the resolution of this dilemma is supposed, somehow, to lie in the correct description of the measurement process and its equivalent, the state preparation process. Any measurement is supposed, somehow, to be a measurement of some given, predetermined, operator with a given discrete spectrum. The association between measuring apparatus and measured operator, however, is seldom made explicit, and there seem to be no generally applicable techniques for defining such an association. 
It is not clear, for example, that a bubble chamber experiment is not just as much a measurement of bubble positions as of elementary particle positions. As a first step in reaction to this state of affairs, we shall defocus; allowing that there is a wide variety of operators that one is, to some approximation, measuring. We shall, therefore, consider a measuring apparatus or an observation as describable not in terms of eigenvectors of a given operator, but merely in terms of a given family of neighbourhoods of local quantum states. For example, in the traditional approach, the results of an energy measurement will be the eigenvalues $\{E_i : i\in I\}$ of a prescribed Hamiltonian $H$. These correspond to density matrices $\{\sigma_i : i \in I\}$ such that $\sigma_i((H-E_i)^2) = 0$. The methods introduced in this paper would allow the results of an approximate energy measurement, or a measurement in which $H$ is an approximate Hamiltonian, to correspond to sets of density matrices -- say, \name{eq2.3} $$\{S_i : i\in I\} \text{ where } S_i = \{\sigma : \sigma((H-E_i)^2) < \varepsilon\} \text{ for some } \varepsilon > 0. \eqno{(2.3)}$$ While an appropriate formalism for any sort of approximate measurement could be developed from this paper, the focus here will be on allowing for small variations in the states assigned by conventional measurement theory. Indeed, we shall focus on unobservably small variations, because these will still be enough to avoid the assumption, for example, that a particle must occupy an exactly specified wave-packet. It is possible to allow such variations because there is no need to try to associate definitive structure with each measuring apparatus in a theory which gives the cardinal role to the observer. The measurement made by the apparatus need be specified only to the extent that the observer's observations of the apparatus require the apparatus states to be defined. 
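\medskip\noindent{\sl Illustration}\quad As a minimal sketch of the sets in (\link{eq2.3}{2.3}), assume, for illustration only, a two-level system with $H = E_1 |\psi_1\>\<\psi_1| + E_2 |\psi_2\>\<\psi_2|$ and $\Delta = E_2 - E_1 > 0$. For a density matrix $\sigma$, write $q = \sigma(|\psi_2\>\<\psi_2|)$. Since $(H-E_1)^2 = \Delta^2 |\psi_2\>\<\psi_2|$ and $(H-E_2)^2 = \Delta^2 |\psi_1\>\<\psi_1|$, we obtain
$$\sigma((H-E_1)^2) = q\,\Delta^2, \qquad \sigma((H-E_2)^2) = (1-q)\,\Delta^2,$$
so that
$$S_1 = \{\sigma : q < \varepsilon/\Delta^2\} \quad\text{ and }\quad S_2 = \{\sigma : 1 - q < \varepsilon/\Delta^2\}.$$
In this toy case the two neighbourhoods are disjoint precisely when $\varepsilon \leq \Delta^2/2$, and the off-diagonal matrix elements of $\sigma$ are constrained only by positivity. \medskip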
Ultimately, the ``observed apparatus'' in which we shall be most interested and to which these remarks will still apply, is the observer's own brain. The distinction between different neighbourhoods of states is made by the expected values which these states assign to various operators. These operators could include any of the possible operators which we might traditionally think that we were measuring. Thus, in a bubble chamber experiment, the results are distinguished both by the original elementary particle positions and by the bubble positions, because the entire purpose of the apparatus is to correlate these variables. More generally, the purpose of quantum mechanical experiments is to correlate variables linked to microscopic objects with variables linked to macroscopic objects. Bohr saw such macroscopic variables as constituting a ``classical regime''. In my opinion, it is their observed behaviour which is most important, rather than any direct relation to Newtonian mechanics. A macroscopic variable is one which is always seen by observers to take definite values in such a way that different values are only seen to be taken on states which differ extensively throughout macroscopic space-time regions. Explaining the existence of such variables is a major task for an interpretation of quantum theory; one which will need a considerable input from statistical mechanics as well as a theory of observers. Macroscopic variables are always extensively correlated with other macroscopic variables. At a descriptive level, any one of a wide variety of correlated macroscopic variables will suffice to distinguish the different possible results of an experiment. The a priori probability of a given result will be determined, to a good approximation, by the initial expectation value of any of a wide variety of projection operators correlated to these macroscopic variables. 
The compatibility between these various approximations suggests the possibility of replacing textbook quantum mechanics by an alternative theory without violating the compatibility between theory and observation. We shall give this a more formal treatment in sections \link{sec 5}{5} and \link{sec 6}{6}. Macroscopic situations can be of great complexity. For example, one can observe not only an experimental result, but also the reaction of a colleague to that result. The fact that different observers will, in general, agree between themselves about the result of an experiment, is reflected, at the theoretical level, in correlations between appropriate macroscopic variables. Thus a theory which deals adequately with correlations will automatically be a theory in which the observations of different observers appear to be compatible. The range of correlated macroscopic variables makes it difficult to choose definitive observables for each particular experiment. Ultimately, however, the purpose of any experiment is to cause changes in the brain of the observer. For this reason, it is possible to reduce all observations to neural observations. Indeed, this is the only way that I can see of finding ``simple'', ``natural'' descriptions, sufficient to specify arbitrary observations. Moreover, I think it is then necessary to use abstract definitions and to give an abstract definition of an observer as an information processor. All entities satisfying such a definition are to be allowed equivalent status as possible manifestations of the observer, each with its own a priori probability. This led in \link{otre}{Donald (1990)} to the introduction of a ``quantum switch'' as an information processing primitive. 
Abstractly specified neighbourhoods of states were used in that definition partly in order to allow for perturbations and yield a ``structurally stable'' theory and partly in order to deal with the fact that there do seem to be many alternative sets of equivalent quantum switches with which a human brain can be seen as functioning as an information processor. In section \link{sec 4}{4}, I introduce a preliminary model, compatible with that work, of an observer as a neighbourhood of sequences of quantum states with correlations expressed on a distinguished set of observables. In the sequel to this paper, the ideas in section \link{sec 4}{4} and in \link{otre}{Donald (1990)} will be developed to give a full abstract model of an observer. Leaving the details of all this to one side, however, there remains plenty to do in this paper in finding a method for calculating probabilities in such a context. \eject \pdfproclaim{3. \quad Mathematical Background.}{}{sec 3} \endproclaim At an elementary level, a quantum mechanical system $S$ is described by a Hilbert space $\H_S$ of wave functions defined in terms of variables appropriate to that system. If the Hilbert space for the entire universe is taken to be $\H$, then it is assumed that there is another system $S'$ describing the rest of the universe such that $\H$ can be written as a tensor product: $\H = \H_S\otimes\H_{S'}$. Observables on the system $S$ are defined as operators on $\H_S$, that is as elements of $\B(\H_S)$ -- the space of all bounded operators on $\H_S$. For simplicity of language, I shall use the word ``observable'' to refer to any bounded operator, not just to a Hermitian operator. Also, I shall not distinguish between an operator $A \in \B(\H_S)$ and the operator $A\otimes1_{S'} \in \B(\H)$, where $1_{S'}$ is the identity operator on $\H_{S'}$. The states of $S$ are defined to be the density matrices of $\H_S$. 
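\medskip\noindent{\sl Illustration}\quad A minimal example may help to fix these conventions. Assume, for illustration only, that $\H_S$ and $\H_{S'}$ are both two-dimensional, with orthonormal bases $\{e_1, e_2\}$ and $\{f_1, f_2\}$, and write $1_S$ for the identity operator on $\H_S$. Let $\Psi = 2^{-1/2}(e_1\otimes f_1 + e_2\otimes f_2) \in \H$. Then, for any $A \in \B(\H_S)$,
$$\<\Psi|A\otimes 1_{S'}|\Psi\> = {1\over2}\bigl(\<e_1|A|e_1\> + \<e_2|A|e_2\>\bigr) = \tr\bigl({\textstyle{1\over2}}1_S A\bigr).$$
As far as observables on $S$ are concerned, the pure global state $\Psi$ is therefore represented by the density matrix ${1\over2}1_S$ on $\H_S$. The mixed density matrix ${1\over2}\bigl(|e_1\otimes f_1\>\<e_1\otimes f_1| + |e_2\otimes f_2\>\<e_2\otimes f_2|\bigr)$ on $\H$ gives the same values on every $A\otimes 1_{S'}$, so distinct global states can yield the same state of $S$. \medskip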
A mathematically more sophisticated approach allows for more general sets of operators to correspond to the observables of $S$. This approach says that to $S$ corresponds a von Neumann algebra $\A_S$, and that physical states of $S$ correspond to normal von Neumann algebra states on $\A_S$. It will always be assumed here that there is a global Hilbert space $\H$ describing the entire universe. Every subsystem von Neumann algebra $\A$ will then be a subset of $\B(\H)$. \medskip \name{def3.1-3.3} {\parindent=0pt {\bf Definition} 3.1) Given $\B \subset \B(\H)$, we define $\B' = \{ A\in\B(\H) : [A,B] = 0 \hbox{ for all } B\in\B \}$ and $\B'' = (\B')'$. 3.2) A von Neumann algebra $\A$ is a subset of $\B(\H)$ such that i) $\lambda_1 A_1 + \lambda_2 A_2$, $A_1 A_2$, and $A_1^* \in \A$ for all $\lambda_1, \lambda_2 \in \Complex$, $A_1, A_2 \in \A$, ii) $\A'' = \A$. 3.3) Mathematicians define a normal state $\rho$ on a von Neumann algebra $\A$ to be a $\sigma$-weakly continuous, positive, linear function on $\A$ such that $\rho(1) = 1$ where $1$ is the identity operator. There is an equivalent but less sophisticated definition. First, a density matrix $\omega = \sum_{i = 1}^\infty p_i |\psi_i\>\<\psi_i|$ on $\H$ is identified with a function on $\B(\H)$ by writing $\omega(A) = \sum_{i = 1}^\infty p_i \<\psi_i|A|\psi_i\>$ for $A \in \B(\H)$. Then a normal state $\rho$ on a subsystem corresponding to an algebra $\A$ is defined to be a function on $\A$ which is the restriction of some (not necessarily unique) density matrix $\rho'$ in the sense that $\rho(A) = \rho'(A)$ for all $A \in \A$. } Von Neumann algebras are useful tools in quantum statistical mechanics (\link{Bor}{Bratteli \& Robinson (1979, 1981)}), and they are essential for describing local properties in quantum field theory. It is therefore necessary to be able to develop our ideas at the von Neumann algebra level. 
While this will be done in the sequel, physicists who are not familiar with the notation will not lose any of the fundamental content of the paper by reading throughout ``density matrix'' for ``von Neumann algebra state'' and ``set of all bounded operators on a Hilbert space defining a subsystem'' for ``von Neumann algebra''. Attention, however, must be paid to the relationship between subsystems and regions of space-time. According to the theory of local algebras in quantum field theory, to each region $\Lambda$ of space-time there is associated a von Neumann algebra $\A(\Lambda)$ consisting of the set of all observables defining properties within that region. We shall refer to a state on $\A(\Lambda)$ as being a state in the region $\Lambda$. The theory of local algebras has been most completely developed in the context of Wightman fields, but there seems little doubt that it can be extended to cover gauge field theories (\link{RoosShore}{Seiler (1982)}). In my opinion, analogous structures will also exist in the context of the quantization of gravity. Various properties have been postulated for local algebras (\link{GarHaag}{Haag and Schroer (1962)}, \link{GarHaag}{Haag and Kastler (1964)}, \link{otre}{Driessler, Summers, and Wichmann (1986)}, \link{BucDew}{Buchholz and Wichmann (1986)}). Two which will be relevant below are: \name{3.4,3.5} \noindent 3.4) If $\Lambda_1$ and $\Lambda_2$ are spacelike separated then $[A_1, A_2] = 0$ for all $A_1 \in \A(\Lambda_1)$ and $A_2 \in \A(\Lambda_2)$. \noindent 3.5) If $\Lambda_1$ and $\Lambda_2$ are spacelike separated by a strictly positive distance, and $\rho_1$ and $\rho_2$ are arbitrary states on $\A(\Lambda_1)$ and $\A(\Lambda_2)$, then there exists a state $\rho$ on $\B(\H)$ with $\rho|_{\A(\Lambda_1)} = \rho_1$ and $\rho|_{\A(\Lambda_2)} = \rho_2$. (\link{3.4,3.5}{3.5}) is an expression of the physical independence of strictly spacelike separated regions. 
It is related, but not equivalent, to (\link{3.4,3.5}{3.4}) (\link{GarHaag}{Haag and Kastler (1964)}, \link{RoosShore}{Roos (1970)}, \link{otre}{Ekstein (1969}, Appendix C), \link{BucDew}{De Facio and Taylor (1973)}, \link{BucDew}{Buchholz, D'Antoni, and Fredenhagen (1987)}). $\A(\Lambda)$ is the set of all operators that could conceivably be measured within the region $\Lambda$. For many purposes this set might appear to be much too large. However, the $\A(\Lambda)$ are the only fundamental sets of observables supplied by quantum field theory, and, in my view, they are the only fundamental sets of observables on which to base interpretations of quantum theory. Fundamental sets of observables are called for, in particular, in the task, mentioned in section \link{sec 2}{2}, of finding abstract definitions for observers. At the elementary level, if we have two subsystems $S_1$ and $S_2$ then we have, correspondingly, two Hilbert spaces $\H_1$ and $\H_2$, and two sets of observables $\B(\H_1)$ and $\B(\H_2)$. The observables $\B(\H_1\otimes \H_2)$ of the combined system on $\H_1 \otimes \H_2$ do not consist just of the union of $\B(\H_1)\otimes 1_2$ and $1_1\otimes\B(\H_2)$. There are also correlations between $S_1$ and $S_2$. Similarly, for two space-time regions $\Lambda_1$ and $\Lambda_2$, $\A(\Lambda_1\cup\Lambda_2)$ is, in general, strictly larger than $\A(\Lambda_1) \cup \A(\Lambda_2)$. In what follows, we shall be considering sets of observables like, for example, $\B = \cup_{n=1}^N \A(\Lambda_n) \cup {\cal C}$ for some set $\{\Lambda_n: n = 1, \dots, N\}$ of relevant space-time regions and some subset ${\cal C}$ of $\A(\cup_{n=1}^N\Lambda_n)$ expressing a physically relevant selection of the correlations between the $\Lambda_n$. In general, $\B$ will not be a von Neumann algebra because it will not be true for every pair $B_1$, $B_2 \in \B$ that we also have $B_1 B_2 \in \B$. 
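\medskip\noindent{\sl Illustration}\quad The failure of closure under products can already be seen at the elementary level. Assume, for illustration only, that $\H_1$ and $\H_2$ are both two-dimensional, and let
$$X = \pmatrix{0&1\cr 1&0\cr}.$$
Then $X\otimes 1_2$ and $1_1\otimes X$ both belong to the union of $\B(\H_1)\otimes 1_2$ and $1_1\otimes\B(\H_2)$, but their product $X\otimes X$ does not: it is neither of the form $A\otimes 1_2$ nor of the form $1_1\otimes B$, because $X$ is not a multiple of the identity. In this toy case, such a product would enter the set of observables only if it were included, as physically relevant, in a set playing the role of ${\cal C}$. \medskip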
Physically, we shall be dealing with sets of subsystems, rather than with a single all-embracing subsystem. This leads to mathematical complications, so an explanation of why it would be inappropriate to use $\A(\cup_{n=1}^N\Lambda_n)$ is called for. {\parindent=0pt \name{def3.6,3.7} {\bf Definition} 3.6) A double cone $C(x,y)$ is a region in space-time, defined by two points $x$ and $y$ such that $x - y$ is timelike and future directed, which takes the form $$C(x,y) = \{z : x - z \hbox{ and } z - y \hbox{ are timelike and future directed} \}.$$ 3.7) The causal shadow $\Lambda^{sh}$ of a region $\Lambda$ is defined as $$\allowdisplaybreaks\displaylines{ \Lambda^{sh} = \{x: \text{ every future directed timelike path from $x$ meets $\Lambda$}\} \hcrh \cup \ \{y: \text{ every past directed timelike path from $y$ meets $\Lambda$}\}. }$$ } In terms of (\link{def3.6,3.7}{3.7}), it is very natural to assume that our local algebras should possess the property that, for any region $\Lambda$, \name{3.8} $$\A(\Lambda) = \A(\Lambda^{sh}). \eqno{(3.8)}$$ This property, proposed by \link{GarHaag}{Haag and Schroer (1962)}, amounts to the assumption of determinism for quantum field theory, in the sense that the Cauchy problem is well-posed. If (\link{3.8}{3.8}) holds and we know the state of the world in the region $\Lambda$, then we also know the state of the world in the causal shadow of $\Lambda$. Haag and Schroer exhibited a certain Wightman field theory (a generalised free field) which did not have property (\link{3.8}{3.8}). This failure, which should lead us to discard such fields as being unphysical, has been examined further by \link{GarHaag}{Garber (1975)}. On the other hand, property (\link{3.8}{3.8}) has been proved for ordinary free fields and for $P(\varphi)_2$ and $Y_2$ fields (see \link{GarHaag}{Glimm and Jaffe (1971, 1972)}). 
The following property is closely related, but takes a different angle on the causal shadow: \pdfproclaim{Property 3.9}{}{3.9} Suppose that $x, y \in \Lambda$ with $x - y$ timelike and future directed, and suppose that $\Lambda$ contains a neighbourhood of some timelike path from $y$ to $x$. Then $\A(C(x,y)) \subset \A(\Lambda)$, where $C(x,y)$ is given by (\link{def3.6,3.7}{3.6}). \endproclaim It has been proved by \link{Bor}{Borchers (1961)} and \link{Araki}{Araki (1963)} that this property holds for all Wightman fields. I shall assume that it holds for all quantum field theories. Property \link{3.9}{3.9} is troublesome: Suppose that local algebras do provide the fundamental variables for describing physical systems. Consider a small physical object on the surface of the Earth. Suppose that we study that object for three seconds, and that we learn its state precisely over that entire period. If we think of that state as being a state on the local algebra of the space-time region swept out by the object, then according to property \link{3.9}{3.9}, we would know the precise state of everything from the Earth to the Moon at some time within that interval. This is essentially because the amount of information and degree of precision being required is absurd. Events on the Moon affect the precise correlations between two events on the Earth separated by a time interval of the appropriate length. Indeed, property \link{3.9}{3.9} says that so many correlation variables can be generated that correlations of Earth events can reveal Moon events. This would appear to destroy the potential of localization for providing a mechanism for coarse-graining. The problem is to loosen the force of the requirement that a local state be known precisely over an extended period, while retaining a theory compatible with relativistic quantum field theory and definable in abstract terms. One way of doing this is to consider states of physical objects to be defined on non-algebras. 
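\medskip\noindent{\sl Illustration}\quad The scale involved in the Earth--Moon example can be checked by a rough calculation. Assume, for illustration only, flat space-time with coordinates $(t, {\bf r})$, an object at rest at ${\bf r} = 0$, and the speed of light $c \approx 3.0\times10^5$ km/s. Taking $y = (-\tau, 0)$ and $x = (\tau, 0)$, the two conditions of definition \link{def3.6,3.7}{3.6} become $c(\tau - t) > |{\bf r}|$ and $c(\tau + t) > |{\bf r}|$, so that
$$C(x,y) = \{(t, {\bf r}) : |{\bf r}| < c\,(\tau - |t|)\}.$$
For an observation lasting three seconds, $\tau = 1.5$ s, and the slice of $C(x,y)$ at $t = 0$ is a ball of radius $c\tau \approx 4.5\times10^5$ km. This exceeds the mean Earth--Moon distance of about $3.8\times10^5$ km, which is the sense in which the double cone reaches the Moon at some time within the interval. \medskip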
We shall throw away information about all except a specified subset of the correlations between time-like separated events. Simplicity of language will be maintained by introducing the following, to some extent non-standard, definition: \pdfproclaim{Definition 3.10}{}{def3.10} A normal state $\sigma$ on a subset $\B \subset \B(\H)$ is the restriction of a density matrix $\sigma'$ on $\H$ to $\B$. \endproclaim As noted above, this definition agrees with the standard definition of a normal von Neumann algebra state when $\B$ is a von Neumann algebra. If $\B \subset \A(\Lambda)$ for some bounded space-time region $\Lambda$, then the set of states defined by \link{def3.10}{3.10} is exactly the same as the set of restrictions to $\B$ of pure normal states on $\H$. This is a consequence of the fact that the vacuum may be assumed to be a separating vector for the algebra $\A(\Lambda)$ (\link{Strat}{Streater and Wightman (1964}, theorem 4.3), \link{otre}{Driessler, Summers, and Wichmann (1986)}, \link{Strat}{Str\u atil\u a and Zsid\'o (1979}, \parasign 5.24)). When appropriate, mathematical readers should extend definition \link{def3.10}{3.10} and consider also restrictions of non-normal states. Non-normal states are states with inferior continuity properties which are related to normal states very much as irrational numbers are to rational numbers. They arise, for example, in mathematically natural limiting processes. The existence of such states is largely irrelevant to the physics of this paper because the formalism will allow us to impose whatever properties we find physically necessary on the sets of local states we want to use. For physicists, it is much more important to bear in mind that ``state'' here means ``density matrix'' rather than ``wave function''. At the technical level, however, when I need to restrict myself to normal states, I shall do so explicitly. 
In particular, the supremum in \link{sec 8}{8.4} and all related suprema are intended to be over all states rather than just over normal states (cf. the remarks following \link{7.5,7.6}{7.6} and \link{otre}{Donald (1986}, example 6.6) and (\link{otre}{1987a}, lemma 4.3)). \pdfproclaim{Remark 3.11}{}{rmk3.11,3.12} It is an immediate consequence of \link{def3.10}{3.10} that every state on a subset $\B$ does have extensions to $\B(\H)$. These extensions will usually not be unique. It will often be convenient in the sequel to allow ambiguity in notation by not distinguishing between a set $C$ of states on $\B$ and the set $C'$ of states such that $\sigma'|_\B \in C$. (For a function $f$ on $\B(\H)$, we write $f|_\B$ -- the restriction of $f$ to $\B$ -- to denote the function on $\B$ defined by $(f|_\B)(B) = f(B)$ for $B \in \B$.) \endproclaim \proclaim{Remark 3.12}{} A zealous logician would now notice that definition \link{def3.10}{3.10} really refers to the set of states on $\B(\H)$ given by $\{\sigma': \sigma'|_\B = \sigma\}$, and would suggest that everything below be rephrased entirely in terms of sets of states on $\B(\H)$. This would be possible and we shall need to return to this idea in section \link{sec 7}{7}, but the sets referred to by definition \link{def3.10}{3.10} are of such central significance that it is useful to adopt a special notation for them. As these sets do define (single-valued) functions on $\B$, the notation is not inappropriate. \endproclaim \pdfproclaim{Definition 3.13}{}{def3.13} As we shall only be interested in the set $\B$ through the states definable on it, it will always be possible to replace $\B$ by the largest set with the same states. This will be denoted by $c(\B)$ and is the norm closure of the linear span of $\{1\} \cup \B \cup \{B^*: B \in \B\}$. 
$c(\B)$ is the largest set to which every state on $\B$ has a unique extension in the sense that for every pair of states $\sigma$ and $\rho$ on $\B(\H)$ if $\sigma|_\B = \rho|_\B$ then $\sigma|_{c(\B)} = \rho|_{c(\B)}$. \endproclaim Throughout, this paper focuses on states rather than on operators. Traditionally, measurement theory has focused on operators and the assignment to them of definite values, but I think that this may have been a mistake; resulting in a narrow theory which is viewed as irrelevant by most physicists. The versatility of quantum mechanics is demonstrated in the provision of state descriptions for every physical situation. The measurement problem arises because such descriptions do not remain valid over extended time periods. The idea of correlation is central to this paper. This idea is given a mathematical translation by the following lemma and definition, which demonstrate in particular that correlation is state dependent. \pdfproclaim{Lemma 3.14}{\sl}{l3.14} Let $\rho$ be a normal state on a von Neumann algebra $\A$. Let $P, Q \in \A$ be projections and $R = P \wedge Q$. Suppose that $\rho(P) > 0$, $\rho(Q) > 0$. Then the following are equivalent: \item{(i)} $\rho(P) = \rho(Q) = \rho(PQ)$. \item{(ii)} $P \rho P / \rho(P) = Q \rho Q / \rho(Q)$. \item{(iii)} $\rho(P-R) + \rho(Q-R) = 0$. \endproclaim \proof Let $s(\rho)$ be the support projection of $\rho$, which, by definition, is the smallest projection $P$ such that $\rho(P) = 1$. We shall use the fact that $\rho(AA^*) = 0 \iff s(\rho)A = 0$ (\link{Strat}{Str\u atil\u a and Zsid\'o (1979}, \parasign 5.15)). 
$$\allowdisplaybreaks\displaylines{ (i) \implies \rho(QP) = \overline{\rho(PQ)} = \overline{\rho(P)} = \rho(P) \implies \rho((P-Q)^2) = 0 \implies s(\rho)(P-Q) = 0 \hcr \implies P \rho P = Ps(\rho) \rho s(\rho)P = Qs(\rho) \rho s(\rho)Q = Q \rho Q \implies (ii) \hcr \implies \rho(PQP) = \rho(P) \implies 0 = \rho((P-PQ)(P-QP)) \implies s(\rho)(P-PQ) = 0 \hcr \implies s(\rho)P = s(\rho)PQ= s(\rho)PQP = s(\rho)PQPQ = \dots = s(\rho)(PQ)^n = s(\rho)R \hcrh \text{(since $R = \mathop{\rm{s-lim}}_{n\rightarrow\infty} (PQ)^n$)} \cr \implies \rho(P-R) = 0. \qquad \text{ (iii) follows by exchanging $P$ and $Q$.} \hfill }$$ Suppose (iii). Since $P-R \geq 0$, we have $s(\rho)P = s(\rho)R = s(\rho)Q = s(\rho)PQ \implies (i)$. \leavevmode \hfill $\blacksquare$ \pdfproclaim{Definition 3.15}{}{d3.15} We shall say that projections $P$ and $Q$ are exactly correlated by the state $\rho$ if the conditions of lemma \link{l3.14}{3.14} hold. More loosely, we shall say that $P$ and $Q$ are correlated by $\rho$ when the conditions hold to an approximation which is adequate for a particular context. \endproclaim For an example of the type of situation in which correlations arise in this paper, suppose that we are given two von Neumann algebras $\A_1$ and $\A_2$. Suppose that an observer knows the state on $\A_1$ and that he is observing the subsystem defined on $\A_2$. Suppose that $P_1 \in \A_1$ and $P_2 \in \A_2$ are projections. In order to discover whether $P_1$ and $P_2$ are exactly correlated in a given state it will be sufficient (by \link{l3.14}{3.14}(i)) to specify that state on $P_1$, $P_2$, and $P_1 P_2$. It may not be sufficient simply to specify the state on $\A_1 \cup \A_2$. To express these pair correlations, it is appropriate to use the set \name{eq3.16} $$c\{ A_1A_2: A_1 \in \A_1 \text{ and } A_2 \in \A_2 \} \eqno{(3.16)}$$ where $c$ is the closure operation defined by (\link{def3.13}{3.13}). 
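\medskip\noindent{\sl Illustration}\quad A minimal example, assumed here only for illustration, shows both the state dependence of correlation and the need for products. Take $\H = \Complex^2\otimes\Complex^2$ with orthonormal bases $\{e_1, e_2\}$ and $\{f_1, f_2\}$ for the two factors, $\A_1 = \B(\Complex^2)\otimes 1$, $\A_2 = 1\otimes\B(\Complex^2)$, $P_1 = |e_1\>\<e_1|\otimes 1$, and $P_2 = 1\otimes|f_1\>\<f_1|$. Let $\rho$ be the state given by the vector $\Psi = 2^{-1/2}(e_1\otimes f_1 + e_2\otimes f_2)$ and let $\rho'$ be the state given by the density matrix ${1\over4}1$. Then
$$\rho(P_1) = \rho(P_2) = \rho(P_1P_2) = {1\over2}, \qquad \rho'(P_1) = \rho'(P_2) = {1\over2}, \quad \rho'(P_1P_2) = {1\over4}.$$
By \link{l3.14}{3.14}(i), $P_1$ and $P_2$ are exactly correlated by $\rho$ but not by $\rho'$. Nevertheless, $\rho$ and $\rho'$ agree on $\A_1 \cup \A_2$, since both restrict to the normalized trace on each factor. In this toy case, the two states are distinguished only on products such as $P_1P_2$, which belong to the set (\link{eq3.16}{3.16}). \medskip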
In general below (for example, (\link{posFour}{4.2}), (\link{posEightpv}{4.3})), we shall be considering unions of sets of the form (\link{eq3.16}{3.16}). These unions could possibly be taken instead over components corresponding to the von Neumann algebra generated by $\A_1$ and $\A_2$, without running into problems with property \link{3.9}{3.9}. However, I have chosen, as a general principle, always to look for minimal sets of operators on which to define states. If $\A_1$ and $\A_2$ are mutually commuting algebras, then the von Neumann algebra generated by $\A_1$ and $\A_2$ is the weak closure of the set given by (\link{eq3.16}{3.16}). \pdfproclaim{4. \quad Postulates about observers.}{}{sec 4} \endproclaim \pdfproclaim{Postulate One}{}{posOne} Quantum theory is the correct theory for all forms of matter and applies to macroscopic systems as well as to microscopic ones. \endproclaim \pdfproclaim{Postulate Two}{}{posTwo} For any given observer, any physical system can be described, at any time, by a neighbourhood of quantum states, which is the set of states compatible with his observations of the system at that time. A function can be defined on such a neighbourhood to measure the a priori probability, or likelihood, of any state or set of states, within the neighbourhood given the observer's prior knowledge. \endproclaim The task in this paper is to find an appropriate definition for ``a priori probability'' in this postulate. Note that technically a neighbourhood of a state $\sigma$ is a set containing an open set containing $\sigma$. The word is used here simply to indicate that there will always be some variations of any given state which the observer will be unable to perceive. The precise nature of these variations, however, may be quite subtle as will be explained in example \link{ex9.8}{9.8}. 
\pdfproclaim{Postulate Three}{}{posThree} An observer $O$, existing between times $t_0$ and $T$, is localized in a succession $\{\Lambda(t) : t \in [t_0, T]\}$ of space-time regions. $O$ is characterized by the neighbourhoods of quantum states which he assigns to the physical systems $\A(\Lambda(t))$ for $t \in [t_0 , T]$. There is a distinguished subset $\Gamma(T)$ of $\{A : A \in \A(\Lambda(t)), t \in [t_0, T]\}$ with respect to which $O$ defines correlations. \endproclaim The simplest version of this postulate would involve a single space-time region; taking $\Lambda(t)$ to be independent of $t$. However, in view of the problem raised by property \link{3.9}{3.9}, it is necessary to restrict the temporal extension of the algebras on which we define the observer's states. Thus, for each $t$, $\Lambda(t)$ will only be extended over time sufficiently to allow the momentary properties of the observer, at time $t$, to be defined. The time co-ordinate here is simply an observer-dependent parameter. It may be assumed to be an approximate local proper time. Postulate \link{posThree}{three} assigns to an observer space-time regions which may be thought of as containing his body or his brain, and insists that he is characterized by the states which he assigns to those regions. A generalization of postulate \link{posThree}{three} would allow for several ``body parts''. Another generalization would involve neighbourhoods of geometrical structures. Indeed, this is probably necessary in order to make the theory ``structurally stable''. It is also necessary to consider variations in the sets $\Gamma(T)$. These generalizations will largely be ignored in this paper since they introduce additional levels of complexity without essentially modifying the central concepts. For example, it is straightforward to allow for both small and large variations in the geometrical structures and in the $\Gamma(T)$ within the general process of finding entities of highest a priori probability. 
Large variations would be called for in the project of considering as equivalent all entities acting as equivalent information processors of the required nature. \pdfeject \name{posFour} In \link{otre}{Donald (1990)}, I have proposed that observers are composed of elementary body parts which function as quantum switches, and I have given a characterization of the states of a quantum switch on a family of local algebras of the form $\{\A(\Lambda(t)) : t \in [t_0, T]\}$. This proposal allows one specific implementation of postulate \link{posThree}{three}. In this implementation, with the notation of \link{otre}{Donald (1990)} hypothesis V, $\Gamma(T)$ will be the set of time translates of the switch projections $P$ and $Q$: $$\Gamma(T) = \{ \tau_{y(t_k)}(P), \tau_{y(t_k)}(Q) : k =1, \dots, K \}. \eqno{(4.1)}$$ \proclaim{Postulate Four}{} There is a finite sequence of times $t_0 < t_1 < \dots < t_M = T$ and a neighbourhood ${\cal N}_T$ of $M$-component sequences of states such that for any sequence $(\sigma_m)_{m=1}^M \in {\cal N}_T$, $O$ views $\sigma_M$ as a possible present state and the sequence $(\sigma_m)_{m=1}^{M-1}$ as comprising a possible past. In this sequence, $\sigma_m$ is the state which $O$ sees himself as having occupied in the time interval $[t_{m-1} , t_m)$. Thus $(\sigma_m)_{m=1}^M \in {\cal N}_T$ implies $(\rho_m)_{m=1}^M \in {\cal N}_T$ for all sequences $(\rho_m)_{m=1}^M$ such that, for each $m \leq M$, $\rho_m(A) = \sigma_m(A)$ for all $A \in \A(\Lambda(t))$, $t \in [t_{m-1}, t_m)$. The states $\sigma_m$ are all taken to be states, in the sense of \link{def3.10}{3.10}, on $$\B_M = c\{ AC : A \in \A(\Lambda(t)), t \in [t_0, T], C \in {\cal C}_M \} \eqno{(4.2)}$$ where ${\cal C}_M$ is the von Neumann algebra generated by $\Gamma(T)$ (compare (\link{eq3.16}{3.16})). 
\endproclaim The finiteness of the sequence in this postulate is necessary if we are to maintain the idea that Hamiltonian evolution is fundamental with quantum measurement providing merely occasional interruptions to reset that evolution. This finiteness also allows us to evade the ``quantum Zeno paradox'' to which we shall return in section \link{sec 9}{9}. The change in state from $\sigma_{m-1}$ to $\sigma_m$ is a ``collapse''. Collapse is required to occur because the observer, by his nature, can only occupy certain types of state. This means that ${\cal N}_T$ has to be defined in such a way as to forbid superpositions, or mixtures, of $O$ observing different events (distinguishable by $O$ himself), in order to reflect the fact that $O$ only ever sees himself as observing a single reality. A choice of the neighbourhood ${\cal N}_T$ corresponds to a choice of a single possible reality history seen by $O$. In many worlds language, ``different worlds'' would correspond to different neighbourhoods -- each different sequence within a single neighbourhood corresponds to a different possible structure through which $O$ may observe one single ``world''. Postulates \link{posThree}{three} and \link{posFour}{four} introduce an arrow of time and propose that by time $t_m$, $O$ is gaining information about the set $\B_m$. The fact that, at time $T$, $O$ considers all of the members of ${\cal N}_T$ as being sequences of states on $\B_M$ is possible in view of \link{rmk3.11,3.12}{3.11} and will be required both so that ``collapse'' may be between states defined on a common domain, and, more importantly, as a reflection of the fact that $\B_M$ is the set of operators of which $O$ is then aware. The use of extensions of states from $\B_m$ to $\B_M$ is unrestricted, because, in calculating probabilities, we shall always take suprema over all extensions. 
\pdfproclaim{Postulate Five}{}{posFive} There is an initial quantum state $\omega$ describing the universe prior to the existence of any observer. \endproclaim $\omega$ is far removed, by a long sequence of collapses, from the state of the universe that we see. Underlying this work is a picture of observers as information processors of a particular type of structure. That structure defines the possible neighbourhoods ${\cal N}_T$ of postulate \link{posFour}{four}. The aim here is to define an a priori probability for each such neighbourhood given a particular initial state $\omega$. It must be required that this can be done in such a way that the only observers which exist with relatively high a priori probability are those which are processing information giving them an accurate picture of reality and that we ourselves are such observers. This implies that $\omega$ should be such that the most likely sequence of information-carrying collapses would result in states which model the flesh and blood brains of apparently evolved life. Of course, there is an unconventional viewpoint involved in postulating that collapse is observer-dependent. Thus, in the framework of postulate \link{posFour}{four}, the collapses implicit in the usual idea of the past must be seen as being made indirectly by each separate observer as he learns about the apparent consequences of such a past. For example, consider an observer discovering a fossil. The conventional view would be that the fossil must essentially have been fixed from not long after the moment of death. The suggestion here is that the observer in looking at a rock surface is now collapsing out one single image from all the possible images that that surface might reflect. Only a priori probabilities for those images were fixed just before the observer studied the surface, and even they depended on the observer. 
This is related to Everett's answer to Einstein's difficulty with the idea of a mouse bringing about drastic changes in the universe simply by looking at it (\link{otre}{Everett (1957)} p.116) -- if it is the mouse and not the universe which changes then the universe has a state which always remains uncollapsed. That state is $\omega$. As far as I can see, despite the change of viewpoint, the speculations of cosmologists are still appropriate in identifying $\omega$. In particular, on a space-time region around what we see as the solar system over a recent period, $\omega$ will approximate a $2.7^\circ$K thermal equilibrium state. \pdfproclaim{Postulate Six}{}{posSix} The a priori probability of $O$ existing at time $T$ in a specific present state $\sigma_M$ on $\B_M$ with a specific past sequence $(\sigma_m)_{m=1}^{M-1}$ of states on $\B_M$ is a function $\app{\B_M}{(\sigma_m)_{m=1}^M}{\omega}$ of $\B_M$, of $(\sigma_m)_{m=1}^M$, and of $\omega|_{\B_M}$. \endproclaim \proclaim{Postulate Seven}{} $\app{\B_M}{(\sigma_m)_{m=1}^M}{\omega} = \prod_{m=1}^M \app{\B_M}{\sigma_m}{\sigma_{m-1}}$ where we define $\sigma_0 = \omega|_{\B_M}$, and where $\app{\B}{\sigma}{\rho}$ is a function of states $\sigma$ and $\rho$ on $\B$. \endproclaim This postulate expresses the intuition that, for the observer $O$ at time $T$, aware of the set $\B_M$, the world starts in the state $\omega|_{\B_M}$, and then undergoes a finite succession of ``collapses'' through a sequence of states $\sigma_1, \sigma_2, \dots, \sigma_M$ on $\B_M$. Postulate \link{posSix}{seven} requires that the collapse from $\sigma_{m-1}$ to $\sigma_m$ depends only on those two states. It introduces the function $\app{\B}{\sigma}{\rho}$ of just two states which is to be interpreted as measuring the a priori probability of the collapse of the state $\rho$ on $\B$ to the state $\sigma$.
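\medskip As a purely illustrative sketch of this product structure, take the shortest non-trivial case $M = 2$ (a choice made here only for brevity). Postulate \link{posSix}{seven} then reads $$\app{\B_2}{(\sigma_1, \sigma_2)}{\omega} = \app{\B_2}{\sigma_1}{\sigma_0} \, \app{\B_2}{\sigma_2}{\sigma_1}, \qquad \sigma_0 = \omega|_{\B_2},$$ so that the a priori probability of the two-step history is the a priori probability of the first collapse from the restricted initial state multiplied by that of the second collapse from the state then reached. Each factor involves only two consecutive states, both taken as states on the final set $\B_2$. \medskip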
Perhaps the most striking difference, at the conceptual level, between postulates \link{posFour}{four} to \link{posSix}{seven} and the conventional framework in physics, is that $O$ is described, at a given moment, not just by the state which he has reached at that moment but by the family of states which he has passed through. The conventional framework requires that past information is used to specify a unique global state at the present, but this is not appropriate in the present framework because we are working locally and with neighbourhoods of states. As we shall see, even within such neighbourhoods, there may not be any states of maximal a priori probability, and, even when there are, they need not be unique. This underscores the proposal of postulate \link{posThree}{three} that it is the entire set ${\cal N}_T$ which characterizes $O$. As an aside, I note that one of the fundamental long-term goals of my work, expressed in part in \link{otre}{Donald (1990)} and in this paper, is to identify the (objective) physical substrate of consciousness. Part of the idea behind postulate \link{posFour}{four} is that, even at a given moment, that physical substrate is not just the instantaneous state of a brain but instead involves the history of that brain. For example, we do not understand the meaning of a word like ``blue'' because it can make us remember clear skies -- that would merely be a functional definition of consciousness -- but because the meaning for us is, in part, our previous experience of those skies. More precisely, the neural structures brought into play by hearing the word ``blue'' are closely linked to structures formed on sunny days. The meaning of those structures lies in the past. Consciousness reads, for itself, the present neural excitement through being all its previous patterns of excitement. 
The \link{posEightpv}{next} postulate demonstrates how the probabilities of individual sequences of states are to be combined in order to give a total a priori probability for ${\cal N}_T$. This postulate will be given in a form which is more general than is strictly necessary. It defines, in terms of ${\cal N}_T$ and the subsidiary function $\app{\B}{\sigma}{\rho}$, an a priori probability for $O$ to observe a given physical system occupying a given set of states. At the fundamental level, it would be sufficient to define the a priori probability of $O$ observing ${\cal N}_T$; indeed, ultimately, we only need an appropriate relationship between the \link{notions67}{sixth} and the \link{notions67}{seventh} notions of probability given in section \link{notions67}{2}. The more general definition introduced by postulate \link{posEightpv}{eight} will provide a model making that relationship plausible and will be useful in motivating the definition of $\app{\B}{\sigma}{\rho}$. While the work of subsequent sections will confirm the broad consistency of the general version of postulate \link{posEightpv}{eight} with other methods of calculation in quantum mechanics, at the deepest level, when applied to arbitrary sets of states, it is only a tool, like \link{notions15}{notions 3, 4, and 5} of section \link{notions15}{2}. There are many circumstances in which the definition provides a useful tool, but there may be other circumstances in which it is not useful. In view of these remarks, the theory of multiple observations given here will be restricted to the succession of observations by the observer of his own structure. This succession will extend over his entire life to date. Two versions of postulate eight will be given. The \link{posEightpv}{preliminary} version expresses the fundamental physical idea of a succession of collapses to states of maximal a priori probability given the state already reached.
The \link{posEightfin}{second} version allows for the fact that, while we can always find states of close to maximal a priori probability, there may be no states for which the maxima are actually attained. In this version, we look for (infinite) sequences of elements of ${\cal N}_T$, (these elements are themselves, of course, finite sequences), which come arbitrarily close to maximal a priori probability at each successive moment of collapse. Maximizing a priori probability will be a central technique used throughout this paper. This might seem strange, since it is obvious that it is not only the most likely events which happen. The idea is that the physical structure of a given observer observing a given event can be manifested by many different but equivalent forms -- in particular, by any one of the elements of a single set ${\cal N}_T$. The a priori probability of that observer is then measured by a series of steps each maximizing a priori probability over equivalent and, as far as the observer is concerned, indistinguishable structures. No restriction is placed on the different events which the observer may come to find himself observing, which will correspond to different ${\cal N}_T$, beyond the restrictions imposed by his own nature. These are to be expressed through the definition of the ${\cal N}_T$. (It is because its statement does not take full account of these restrictions that postulate \link{posEightpv}{eight} is ``over-general''.) \pdfproclaim{Postulate Eight (preliminary version)}{}{posEightpv} Suppose that, at time $T$, $O$ observes a physical system defined on a set of observables $\B_S$. $$\text{Define} \quad \B_{SM} = c(\B_M \cup \{ BC : B \in \B_S, C \in {\cal C}_M\}). 
\eqno{(4.3)}$$ For $m = 1,\dots, M$, define sets of initial sequences $${\cal N}^m = \{(\sigma_i)_{i=1}^m: \exists (\sigma_i)_{i=m+1}^M \text{ with } (\sigma_i)_{i=1}^M \in {\cal N}_T\},$$ where, in view of \link{rmk3.11,3.12}{3.11}, these are considered to be sets of sequences of states on $\B_{SM}$. Define, by induction on $m$, the following maximal a priori probabilities and corresponding sets. Start with $$\displaylines{ \mathop{\rm app}\nolimits^0({\cal N}_T, \B_{SM}, 1, \omega) = \sup\{ \app{\B_{SM}}{\sigma}{\omega} : \sigma \in {\cal N}^1\} \quad \text{and} \cr \widetilde{\cal N}_0^1(\B_{SM}) = \{\sigma \in {\cal N}^1: \app{\B_{SM}}{\sigma}{\omega} = \mathop{\rm app}\nolimits^0({\cal N}_T, \B_{SM}, 1, \omega)\}. }$$ Then, for $1 < m+1 \leq M$, set $\mathop{\rm app}\nolimits^0({\cal N}_T, \B_{SM}, m+1, \omega) = 0$ and $\widetilde{\cal N}_0^{m+1}(\B_{SM}) = \emptyset$ if $\widetilde{\cal N}_0^m(\B_{SM}) = \emptyset$ and if not set $$\displaylines{ \mathop{\rm app}\nolimits^0({\cal N}_T, \B_{SM}, m+1, \omega) = \sup\{ \app{\B_{SM}}{(\sigma_i)_{i=1}^{m+1}}{\omega} : (\sigma_i)_{i=1}^{m+1} \in {\cal N}^{m+1} \hcrh \text{and } (\sigma_i)_{i=1}^m \in \widetilde{\cal N}_0^m (\B_{SM}) \} \quad \text{and} \cr \widetilde{\cal N}_0^{m+1}(\B_{SM}) = \{ (\sigma_i)_{i=1}^{m+1} \in {\cal N}^{m+1} : (\sigma_i)_{i=1}^m \in \widetilde{\cal N}_0^m(\B_{SM}) \hcrh \text{and } \app{\B_{SM}}{(\sigma_i)_{i=1}^{m+1} }{\omega} = \mathop{\rm app}\nolimits^0({\cal N}_T, \B_{SM}, m+1, \omega)\}. \crh \llap{(4.4)} }$$ (This says that $\widetilde{\cal N}_0^{m+1}$ is the set of partial sequences with maximal a priori probability given an initial sequence in $\widetilde{\cal N}_0^m$.)
Then, the a priori probability of $O$, at time $T$, observing the subsystem to occupy a set of states $C$ is defined by $$\displaylines{ \mathop{\rm app}\nolimits^0(O, T, \B_S, C|\omega) = 0 \quad \text{if } \widetilde{\cal N}_0^M(\B_{SM})= \emptyset \qquad \text{and otherwise by} \hcr \mathop{\rm app}\nolimits^0(O, T, \B_S, C|\omega) \hcrh = \sup\{ \app{\B_{SM}}{(\sigma_m)_{m=1}^{M+1}}{\omega} : (\sigma_m)_{m=1}^M \in \widetilde{\cal N}_0^M(\B_{SM}), \text{ and } \sigma_{M+1} \in C \}. \hfill \llap{(4.5)} }$$ \endproclaim \eject \pdfproclaim{Postulate Eight (final version)}{}{posEightfin} Suppose that, at time $T$, $O$ observes a physical system defined on a set of observables $\B_S$. Define $\B_{SM}$ and ${\cal N}^m$ as \link{posEightpv}{above}. Define, by induction on $m$, the following a priori probabilities. Start once again with $$\mathop{\rm app}\nolimits({\cal N}_T, \B_{SM}, 1, \omega) = \sup\{ \app{\B_{SM}}{\sigma}{\omega} : \sigma \in {\cal N}^1\}.$$ Then, for $1 < m+1 \leq M$, set $$\displaylines{ \mathop{\rm app}\nolimits({\cal N}_T, \B_{SM}, m+1, \omega) = \sup\{ \limsup_{n\rightarrow\infty} \app{\B_{SM}}{(\sigma_i^n)_{i=1}^{m+1}}{\omega} \hcrh : ((\sigma_i^n )_{i=1}^{m+1} )_{n\geq1} \text{ is a sequence of elements of } {\cal N}^{m+1} \crh \text{and, for }1 \leq k \leq m, \ \app{\B_{SM}}{(\sigma_i^n)_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}\nolimits({\cal N}_T, \B_{SM}, k, \omega)\}. \crh \llap{(4.6)} }$$ (This says that the partial sequences $(\sigma_i^n )_{i=1}^k$ approach the successively maximal a priori probabilities as $n \rightarrow \infty$.) 
\name{eq4.7} Then, the a priori probability of $O$, at time $T$, observing the subsystem to occupy a set of states $C$ is defined by $$\displaylines{ \mathop{\rm app}\nolimits(O, T, \B_S, C|\omega) = \sup\{ \limsup_{n\rightarrow\infty} \app{\B_{SM}}{(\sigma^n_m)_{m=1}^{M+1}}{\omega} : \hcrh ((\sigma^n_m)_{m=1}^M)_{n\geq1} \subset {\cal N}^M, (\sigma^n_{M+1})_{n\geq1} \subset C, \text{ and, for } 1 \leq k \leq M, \crh \app{\B_{SM}}{(\sigma_i^n)_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}\nolimits({\cal N}_T, \B_{SM}, k, \omega)\}. \crh \llap{(4.7)} }$$ \endproclaim The mathematics of this postulate is explored in section \link{sec 9}{9}. The inductive definition expressed by (\link{posEightfin}{4.6}) imposes a natural causal framework on $O$'s present description of his past as a developing sequence of collapses. In particular, note that $\mathop{\rm app}\nolimits({\cal N}_T, \B_{SM}, m, \omega)$ does not depend on the specific set of states $C$ which $O$ will observe on $\B_{SM}$. There is no particular sequence of states which we can point to as the sequence which $O$ actually occupies or observes. However, for $\delta > 0$, it is useful to define \name{eq4.8} $$\displaylines{ \widetilde{\cal N}_\delta^m(\B_{SM}) = \{ (\sigma_i)_{i=1}^m \in {\cal N}^m : \app{\B_{SM}}{(\sigma_i)_{i=1}^k}{\omega} \geq \mathop{\rm app}\nolimits({\cal N}_T, \B_{SM}, k, \omega) - \delta, \hcrh \text{for } k = 1, \dots, m\} \text{ and} \cr \widetilde{\cal N}_\delta^{M+1}(\B_{SM}, C) = \{ (\sigma_m)_{m=1}^{M+1} : (\sigma_m)_{m=1}^M \in \widetilde{\cal N}_\delta^M(\B_{SM}), \sigma_{M+1} \in C, \text{ and } \hcrh \app{\B_{SM}}{(\sigma_m)_{m=1}^{M+1}}{\omega} \geq \mathop{\rm app}\nolimits(O, T, \B_S, C|\omega) - \delta\}. \crh \llap{(4.8)} }$$ For sufficiently small $\delta$, elements of $\widetilde{\cal N}_\delta^m(\B_{SM})$ will successively have close to maximal a priori probability (lemma \link{l9.10,p9.11}{9.10}). However, these sets do depend on $\delta$, on $m$, on $M$, and on $S$.
The validity of the approach taken to quantum mechanics in this paper depends on the claim that, for $\delta$ sufficiently small and for suitable systems $S$, elements of $\widetilde{\cal N}_\delta^m(\B_{SM})$ will be sequences of states close to those which would be assigned, for the appropriate time intervals, to $O$ or to $S$ in conventional quantum theory. This claim is fundamental and we shall return to it in section \link{sec 6}{6}. It implies, for example, that the various dependencies of $\widetilde{\cal N}_\delta^m(\B_{SM})$ are not problematical. It will only be justified in as far as it is demonstrated that it leads to a consistent and plausible theory. Considerable philosophical heresy is inherent in postulate \link{posEightfin}{eight}. It defines a theory of the world as it is observed which, although postulating the existence of a ``real world'' outside the observer, also claims that how that world is likely to be seen depends essentially on the structure of the individual observer. The ``real'' ``fixed'' ``external'' world enters through the state $\omega$, defined by postulate \link{posFive}{five}, through the Hamiltonian time propagation implicit in our continual reliance on the Heisenberg picture, and through the use of an observer-independent a priori probability. The observer dependence enters through the sets ${\cal N}_T$ and $\B_M$. A full definition of these sets will also make objective the class of physical structures through which an observer may be embodied. Even when all these definitions are given, we shall not have a full elucidation of the meaning of the defined term ``a priori probability''. Philosophers have written voluminously about the meaning of conventional (measure-theoretic) probability theory and without reaching a consensus (see \link{BucDew}{Cohen (1989}, chapter II) for a review). 
The probability introduced by postulate \link{posEightfin}{eight} is not even conventional because, for example, it does not define a measure on the space of sets of states. In particular, the a priori probability of a union of sets of states is not, in general, the sum of the separate a priori probabilities (see property \link{propFG}{G} of section \link{sec 7}{7} and lemma \link{l8.9}{8.9}). Indeed, as anticipated in postulate \link{posTwo}{two}, if the set $C$ consists of a single state, (\link{eq4.7}{4.7}) will be meaningful and may well not vanish. It is for this reason that it is necessary to introduce a postulate showing how to combine probabilities of individual sequences of states, rather than expecting to be able to define general a priori probabilities by sums of terms of the form $\app{\B_M}{(\sigma_m)_{m=1}^M}{\omega}$. Thus, philosophical questions both old and new are raised by this paper. Perhaps some of the old questions are made more interesting by providing an alternative framework in which they can be discussed. In this paper, these questions will not be addressed directly. The goal is a method of calculating objective probabilities for individual observations. The meaning of ``a priori probability'' will be left at the intuitive (and circular) level of ``a number measuring (relative) likelihood''. The next two sections provide examples of circumstances in which this intuition is satisfactory. The numbers defined by (\link{eq4.7}{4.7}) are trivially Lorentz invariant, by property \link{propDE}{D} of section \link{sec 7}{7}, if the underlying quantum field theory also is. This is an important step in establishing the compatibility of the proposed theory with special relativity.
Indeed, given that collapse is observer dependent, it only remains to note that the relation of the observer to space-time geometry is supposed, through postulates \link{posThree}{three} and \link{posFour}{four}, to be explicit, so that the causal framework in which collapse appears to occur can be made specific. One way in which relativity might lead to difficulties in conceivable variants of the theory proposed here stems from the fact that the number given by (\link{eq4.7}{4.7}) will depend on the ordering of the underlying collapses. This might be a problem, particularly with the generalization of postulate \link{posThree}{three} to allow for several body parts, if some of those collapses refer to mutually simultaneous changes in structure (for example, ``switchings'' in the sense of \link{otre}{Donald (1990)}). For a single observer, we can deal with this dependence, either by extending the supremum in (\link{eq4.7}{4.7}) over all re-orderings of such mutually simultaneous changes, or by assuming a fixed ordering as part of the observer's fundamental structure. However, this ordering problem does seem to me to rule out any attempt to extend the theory to include all observations made by all observers within a single structure analogous to ${\cal N}_T$. Analogous remarks might be made about the conventional interpretation of quantum mechanics. The Lorentz invariance of probability amplitudes is an automatic consequence of a unitary representation of Lorentz transformations but observer-independent global collapse gives rise to an intractable ordering problem. In the conventional interpretation, global collapse also requires global specification of the eigenfunctions of each measured operator -- or equivalently global specification of the operator itself. This once again emphasizes how difficult the specification problem is for the conventional interpretation. 
\pdfproclaim{5.\quad The a priori probability function and measurement theory.}{}{sec 5} \endproclaim (\link{eq4.7}{4.7}) introduces a function measuring the a priori probability of observation of a given set of states by an observer with a given life history up to a particular time. Through postulate \link{posSix}{seven}, that function is defined in terms of the subsidiary function $\app{\B}{\sigma}{\rho}$. In this section, we link the values of this latter function to conventional measurement theory, first, by requiring that $\app{\B}{\sigma}{\rho}$ should generalize the notion that the weight of a component in a density matrix is the a priori probability of that component, and, second, by examining the link between that notion and measurement theory. The requirement of the first stage is expressed by the following property, which we shall be able to impose on $\app{\B}{\sigma}{\rho}$: \medskip \name{5.1} \noindent 5.1) Suppose that $\rho = p_a\rho_a + (1-p_a)\rho_d$ for $0 \leq p_a \leq 1$ and $\rho$, $\rho_a$, and $\rho_d$ states on some set $\B$. Suppose that there exists a projection $Q_a \in \B$ such that $\rho_a(Q_a) \geq 1-\varepsilon$ and $\rho_d(Q_a) \leq \varepsilon$ for some $\varepsilon \in [0, {1\over2}]$. Then $$p_a \leq \app{\B}{\rho_a}{\rho} \leq p_a - 3\varepsilon \, \log \varepsilon.$$ This implies that $\app{\B}{\rho_a}{\rho}$ is close to $p_a$ when $\varepsilon$ is sufficiently small and $\rho_a$ and $\rho_d$ are, in an obvious sense, close to disjoint. For the second stage, we introduce: \medskip \noindent 5.2) {\bf A General Model of an Observation.} Suppose that an observer $O$ observes the outcome of a measurement on a macroscopic system defined by a set of observables $\B_S$. Let $C$ be the set of states on $\B_S$ compatible with the observations of $O$ just before he learns the outcome of the measurement and let $C_a$ be the set of states on $\B_S$ compatible with his observations when he has discovered that the outcome is $a$. 
Suppose that according to conventional quantum mechanics, outcome $a$ has probability $p_a$ and corresponds to a state $\sigma_a$ on $\B_S$. Then \smallskip \noindent A) there is a projection $Q_a$ in $\B_S$ such that $\sigma_a(Q_a) = 1$, and, for some small $\varepsilon \geq 0$, all the most likely states in $C$ will belong to $\{ \rho : p_a - \varepsilon \leq \rho(Q_a) \leq p_a + \varepsilon \}$ and all the most likely states in $C_a$ will belong to $\{ \rho : \rho(Q_a) \geq 1 - \varepsilon \}$. \smallskip \pdfeject \name{5.2B} \noindent B) Among the most likely states in $C$, there are states $\rho$ taking the form $$\rho = p_a\rho_a + (1-p_a)\rho_d \eqno{(5.3)}$$ where $\rho_a \in C_a$, $\rho_a(Q_a) \geq 1 - \varepsilon$, and $\rho_d$ is some state on $\B_S$ with $\rho_d(Q_a) \leq \varepsilon$. \medskip Some of the terms in this model are undefined. The words ``macroscopic system'' allow for the imposition of suitable conditions on the set $\B_S$. The scenario to be invoked is of a human experimenter studying $\B_S$ directly with his unaided senses. ``Small'' will not be defined, but, as will be explained, it is plausible that, in the present context, it can be taken to refer only to unmeasurably small effects. ``Most likely'' will be interpreted intuitively in this section. The appeal to ``most likely states'' is part of a consistency argument for the a priori probability being introduced. Postulate \link{posEightfin}{eight} permits us to restrict attention to such states. As we are working with a many worlds theory, in which there is no unique state which is the ``true'' observed state of the world at a given moment, it is necessary to allow for highly unlikely or bizarre states in which, for example, entropy decreases on a macroscopic scale during the experiment. There will be such states in $C_a$ and there will be other highly unlikely states in which the correlations invoked below do not hold.
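\medskip To indicate the scale involved when property \link{5.1}{5.1} is applied to a decomposition of the form (\link{5.2B}{5.3}), suppose, purely for illustration, that $\varepsilon = 10^{-6}$ and that the logarithm is natural. Both are assumptions made only for this example and neither is a claim about any particular experiment. Then $$- 3\varepsilon \, \log \varepsilon = 3 \times 10^{-6} \times 6 \log 10 \approx 4.1 \times 10^{-5},$$ so that \link{5.1}{5.1} would give $$p_a \leq \app{\B_S}{\rho_a}{\rho} \leq p_a + 4.1 \times 10^{-5}.$$ On these assumptions, the a priori probability of the component $\rho_a$ would differ from its weight $p_a$ by less than one part in $10^4$ of the unit interval. \medskip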
In as far as it is a state decomposition with negligible interference effects, (\link{5.2B}{5.3}) is like (\link{eq2.1}{2.1}), and so is very much part of the normal focus of measurement theory. For example, \link{BucDew}{Daneri, Loinger, and Prosperi (1962)}, \link{HeppRS}{Hepp (1972)}, \link{Strat}{Whitten-Wolfe and Emch (1976)}, \link{HeppRS}{Machida and Namiki (1980)}, \link{Araki}{Araki (1980, 1986)}, and \link{Strat}{Zurek (1981, 1982)}, provide a variety of models and arguments all giving plausible evidence that interference effects can be neglected in practice because of the macroscopic nature of measuring devices. These references mainly rely, or can be read as relying, on coarse-graining and, in particular, on the fact that we cannot have perfect and complete knowledge of the states of macroscopic systems. The aim in this paper is to take a step beyond these arguments and to introduce a formalism within which the limited nature of the knowledge of an observer can be expressed. This will be a formalism appropriate for a universe in which, as far as any observer is concerned, interference effects are indeed negligible, in the appropriate circumstances, but over which, nevertheless, a global, unitary, time propagation still holds sway. The central task in this section and the \link{sec 6}{next} is to make explicit plausible minimal assumptions under which (\link{5.2B}{5.3}) will be valid in the framework of section \link{sec 4}{4}. As we shall see, a theory based on the function app is a theory in which information only becomes definite in circumstances where interference effects can be neglected. The coarse-graining implicit in (\link{5.2B}{5.3}) comes from restricting $\B_S$ to be only part of the total local algebra affected by the measurement. This section gives an elementary explanation of how such coarse-graining works as a preliminary to the \link{sec 6}{next} section in which the scenario will be less conventional and much harder to handle explicitly.
In particular, in this section we shall consider states on $\B_S$ but not the extensions to $\B_{SM}$ required by postulate \link{posEightfin}{eight}. If we put $\varepsilon = 0$ then \link{5.1}{5.2A} would say that the most likely states in $C_a$ are eigenvectors of $Q_a$. As it is claimed that $\varepsilon$ is negligible, it might be suggested that there is not much difference between the present model and conventional quantum mechanics -- we are considering eigenvectors of a less than completely defined operator. However, the framework is quite different. Any macroscopically observable situation allows for the definition, in many different possible ways, of projections $Q_a$. Being able to deal in the same way with any of the enormous variety of different projections is a mark of consistency for the present theory, rather than an ambiguity problem, as it is for conventional quantum theory. The reason for this is that in the conventional theory, the fundamental entity is supposed to be exactly one of these projections, but we have not been told which one, while in the present theory, the fundamental entities are to be defined by the structure of an observer, and the measurement model merely allows that observer to make a variety of mutually consistent predictions. Ultimately, also, there is a tremendous difference at the theoretical level between an $\varepsilon$ which vanishes and an $\varepsilon$ which is unobservably small. Only in the latter case, do we really avoid having to specify definitive observables for each measuring apparatus. The closeness to conventional quantum mechanics otherwise is, of course, no bad thing, because of the success of that theory. It is a reflection of that success that the model of this section can be claimed to be ``general'' even although it does not use the full generality of postulate \link{posEightfin}{eight}, according to which $C_a$ might be any set of states. 
Nevertheless, the existence of that further generality is important, refuting any suggestion that (\link{5.1}{5.2}) is more than a ``general model''. In particular, the further generality provides consistency between postulates \link{posEightfin}{eight} and \link{posFour}{four} and allows for situations like (\link{eq2.3}{2.3}) to which the present model will apply only indirectly. We begin the justification of the model with our assumptions about $\B_S$. We assume that it is restricted in temporal and spatial extension, in the sense that $\B_S \subset \A(\Lambda_S)$ for some restricted space-time region $\Lambda_S$. $\Lambda_S$ could, for example, be a region containing a photograph, which would constitute the result of the experiment for the observer, at around the moment when he first begins to glance at that photograph. We shall further assume that the result of the experiment could be observed in a different restricted region $\Lambda_S{}'$ which is space-like separated from $\Lambda_S$. In the example, $\Lambda_S{}'$ might be a region simultaneous with $\Lambda_S$ containing light which had reflected from the photograph just prior to $\Lambda_S$, or it might be a region containing the photographic negative. The existence of projections $Q_a$ such that $Q_a$ projects onto the wave functions modelling the outcome $a$ is an essential part of the content of traditional measurement theory. For example, in a bubble chamber experiment, $Q_a$ could be a projection onto an orthonormal basis of wave-functions modelling the scattering particle heading in a given direction, or it could be a projection onto wave-functions of local regions of gas (bubbles) in the chamber along the given path, or it could involve wave-functions of the photographic plate, or of the light moving away from a developed photograph. $Q_a$ could even involve suitably chosen wave-functions in the brain of the observer.
These projections are not unambiguously defined, but that is merely a reflection of the fact that the ``given result'' cannot be precisely defined; except, indeed, as a class of correlated projections. The variation in expected value between these projections in likely states is one of the unmeasurably small effects referred to earlier. It will be possible to choose $Q_a$ to be in $\B_S$ by the assumption that $\B_S$ is large enough to define a physical system. In this case, the statement that $Q_a$ projects onto the wave functions modelling the outcome $a$ implies that the most likely states in $C_a$ belong to $\{ \rho : \rho(Q_a) \geq 1 - \varepsilon \}$ for some small $\varepsilon$, because a state in $C_a$ will not belong to that set only if, in that state, $Q_a$ is not correlated with the projections with which $O$ interacts most directly when making his observations. Such states are highly unlikely. There are two quite different aspects to the claim that, for $\rho$ a state of high a priori probability in $C$, we must have $\rho(Q_a)$ close to $p_a$. On the one hand, we are requiring that $O$ have sufficient information about the experimental set-up, whether or not he can translate that information into numbers, to pre-determine the probabilities of the various possible outcomes. On the other hand, we are assuming that conventional calculations correctly provide expectations of appropriate projections in appropriate states. This is a consistent assumption as the relevant calculations only use quantum dynamics prior to collapse. It is possible to find two of these result-determining projections $Q_a \in \B_S$ and $Q_a{}' \in \B_S{}{}'$ where $\B_S{}'$ is the commutant of $\B_S$ (see (\link{def3.1-3.3}{3.1})). This follows from the assumption about the existence of two space-like separated regions $\Lambda_S$ and $\Lambda_S{}'$ with $\B_S \subset \A(\Lambda_S)$ in each of which the result of the experiment could be determined.
We can choose $Q_a{}' \in \A(\Lambda_S{}')$ and, from (\link{3.4,3.5}{3.4}), $\A(\Lambda_S{}') \subset \A(\Lambda_S)' \subset \B_S{}'$. Among the most likely states in $C$ there will be some which can be extended from $\B_S$ to $\A(\Lambda_S \cup \Lambda_S{}')$ in such a way that they are good models on the whole of that set of the physical situation as $O$ sees it. This is because the same physical causes which $O$ observes giving rise to the state on $\B_S$ act with just the same probability on the whole region $\Lambda_S \cup \Lambda_S{}'$. In such a state $\rho$, the commuting projections $Q_a$ and $Q_a{}'$ will be correlated, in the sense of \link{d3.15}{3.15} because we have three projections $Q_a$, $Q_a{}'$, and $Q_a Q_a{}'$ each of which serves to pick out the observed result with the same pre-determined probability, so that (to order $\varepsilon$) $\rho(Q_a) = \rho(Q_a{}') = \rho(Q_a Q_a{}') = p_a$. Then, for $B \in \B_S$, $$\eqalignno{ \rho(B) &= \rho(Q_a{}'B) + \rho((1-Q_a{}')B) \cr &= \rho(Q_a{}'BQ_a{}') + \rho((1-Q_a{}')B(1-Q_a{}')) \qquad (\text{since } Q_a{}' \in \B_S{}') \cr &= \rho(Q_a{}') \rho(Q_a{}'BQ_a{}')/\rho(Q_a{}') + \rho(1-Q_a{}') \rho((1-Q_a{}')B(1-Q_a{}'))/\rho(1-Q_a{}') \cr &= p_a\rho_a(B) + (1-p_a) \rho_d(B) }$$ where $\rho_a$ and $\rho_d$ are defined by $$\rho_a(B) = \rho(Q_a{}'BQ_a{}')/\rho(Q_a{}') \hbox{ and } \rho_d(B) = \rho((1-Q_a{}')B(1-Q_a{}'))/\rho(1-Q_a{}').$$ But $\rho(Q_a) = \rho(Q_a{}') = \rho(Q_aQ_a{}') = p_a$ implies that $\rho_a(Q_a) = 1$ and that $\rho_d(Q_a) = 0$ (all to order $\varepsilon$). $\rho_a$ is constructed from wave-functions modelling outcome $a$ of the experiment so that $\rho_a \in C_a$. There is a shortcut to the justification of (\link{5.2B}{5.3}). This involves the idea of entropy as a measure of the number of available states. 
For example, the entropy $S$ of a glass of water can be written as $k \log N$ where $k$ is Boltzmann's constant, and the experimentally determined value for $S$ gives $N$ as roughly $10^{3\times10^{25}}$. Identifying $S$ with $k \tr(-\sigma \log \sigma)$ for some density matrix $\sigma$, $N$ is a lower bound on the number of orthogonal pure states into which $\sigma$ decomposes (i.e. writing $\tr(-\sigma \log \sigma) = -\sum_{i=1}^M p_i \log p_i$ in the usual way, we have $\tr(-\sigma \log \sigma) \leq \log M$). $\sigma$ can be decomposed into two disjoint states in at least $2^{N-1} -1$ ways, since this is the number of ways that the set $\{1, \dots, N\}$ can be split into two disjoint non-empty subsets. Because the measured entropies of macroscopic systems are so large, when interpreted in the way described, it is not implausible to assert that relation (\link{5.2B}{5.3}) automatically holds for any real physical measurement. For conventional quantum statistical mechanics, the set $\B_S$ would be taken to be a fixed-time local algebra for the system in question. The argument is that any state of a macroscopic system at normal temperature can be decomposed with an astonishing fineness. According to statistical mechanical theory, the decomposition is into energy eigenstates of the system. Those energy eigenstates however are enormously degenerate, so that even the pure states of the finest decomposition are far from unique. To assert (\link{5.2B}{5.3}) is then to claim that one can rebuild some decomposition on the left-hand side into the decomposition required on the right. Thus (\link{5.2B}{5.3}) will be a simple consequence of the fact that the state splits in so many different ways and the fact that the required splitting is compatible with everything which has been observed about the system. What makes this argument most plausible is the fact that $\rho$ in equation \link{5.2B}{5.3} is only defined up to the precision of human observation.
This level of precision is one on which the laws of classical thermodynamics appear to be obeyed absolutely regardless of finite volume or time effects. Classical thermodynamics tells us that interference effects are negligible, even if it does not explain why. The well-known work of \link{BucDew}{Daneri, Loinger, and Prosperi (1962)}, which uses time-averaging, can be interpreted as attempting an explanation by suggesting that a decomposition like (\link{5.2B}{5.3}) happens with high ergodic-theory probability. In this paper, localization, a physically much better-founded method of coarse-graining, is used, and instead of appealing to ergodic theory, we can invoke a direct relationship between entropy, or more precisely negative free energy, and the a priori probability function to be defined (see the discussion following property \link{propH}{H} of section \link{sec 7}{7}). This will make thermodynamically plausible states more likely than other states, and, in particular, other things being equal, states of higher entropy will tend to have higher a priori probability. The first argument for (\link{5.2B}{5.3}) is more satisfactory because of its directness. The argument based on entropy is weaker, but it does at least suggest that the smallness of Boltzmann's constant in practical units may be quite as important in explaining the fact that the macroscopic world appears to behave in a classical fashion as the smallness of Planck's constant. It also provides an upper bound on the number of macroscopically distinguishable results for any given experiment and a quantitative measure of macroscopicness. \pdfproclaim{6.\quad A priori probability and observations.}{}{sec 6} \endproclaim In section \link{notions15}{2}, seven different \link{notions15}{notions} of quantum probability were mentioned.
In this section, we consider the link between the a priori probability defined by postulate \link{posEightfin}{eight} and the statistical probability which an observer assigns to the outcome of his next experiment following a long sequence of prior trials. Empirical justification for a theory based on the postulates of section \link{sec 4}{4} will be provided, by arguing that all the empirical evidence normally used to justify conventional quantum mechanics can be transferred to the framework of these postulates. The empirical evidence for conventional quantum theory lies in the observed agreement between measured statistical probabilities and theoretically defined probabilities. In order to transfer this evidence, it is only necessary to demonstrate that the new theory defines similar probabilities in similar circumstances. Once this has been done, the choice between the theories can only be made on such grounds as completeness, internal consistency, and aesthetic appeal. The circumstances in which conventional quantum theory defines measurable probabilities can essentially all be subsumed under the general measurement model proposed in section \link{sec 5}{5}. This, of course, is apparently to ignore the distinction between discrete and continuous observables. However, this is acceptable, since we are modelling only circumstances in which information has become observable at a macroscopic level. At such a level, not only does thermal noise inevitably wash out the finest gradations of an observable property, but also, those gradations which are observable, are observable precisely because there is a difference in the macroscopic properties to which they are correlated. The measurement model can be extended to the context of section \link{sec 4}{4} by the following postulate: \pdfproclaim{Postulate Nine}{}{posNine} Suppose that $O$ observes the outcome of an experiment on a macroscopic system defined by a set of observables $\B_S$. 
Suppose that model \link{5.1}{5.2} applies, and adopt the notation of that model. Recall definition (\link{eq4.8}{4.8}). Then, for every sufficiently small positive $\delta$, \smallskip \noindent A) every sequence $(\sigma_m)_{m=1}^M \in \widetilde{\cal N}_\delta^M(\B_{SM})$ will satisfy $p_a - \varepsilon \leq \sigma_M(Q_a) \leq p_a +\varepsilon$, and every sequence $(\sigma_m)_{m=1}^{M+1} \in \widetilde{\cal N}_\delta^{M+1}(\B_{SM}, C_a)$ will satisfy $\sigma_{M+1}(Q_a) \geq 1 -\varepsilon$. \smallskip \noindent B) There is a sequence $(\sigma^\delta_m)_{m=1}^M \in \widetilde{\cal N}_\delta^M(\B_{SM})$ which is such that $\sigma^\delta_M$ takes the form $$\sigma^\delta_M = p_a\rho^\delta_a + (1-p_a)\rho^\delta_d \eqno{(6.1)}$$ where $\rho^\delta_a |_{\B_S} \in C_a$, $\rho^\delta_a(Q_a) \geq 1-\varepsilon$, and $\rho^\delta_d$ is some state on $\B_{SM}$ with $\rho^\delta_d(Q_a) \leq \varepsilon$. \endproclaim Once again, we have the phrase ``macroscopic system'', in \link{posNine}{this} postulate, referring to the need to impose the same conditions on $\B_S$ here as were needed for model \link{5.1}{5.2}. $\varepsilon$ will again be left unspecified. \link{posNine}{A} gives an explicit interpretation of the phrase ``most likely states'' used in \link{5.1}{5.2A}. As will be shown in proposition \link{l9.10,p9.11}{9.11}, it is a consequence of postulate \link{posNine}{nine} and the definition of app to be introduced below that the conditional a priori probability of $O$ observing $a$ is close to $p_a$ in the sense that \name{6.2,6.3} $$\mathop{\rm app}\nolimits(O, T, \B_S, C_a|\omega)/\mathop{\rm app}\nolimits(O, T, \B_S, \Sigma|\omega) \sim p_a \eqno{(6.2)}$$ where $\Sigma$ is the set of all states on $\B_S$. (This means that the denominator is the a priori probability of the vacuous experiment in which $O$ observes any result on $\B_S$.) 
It follows that, if $b$ is a second possible result of the experiment, with conventionally calculated probability $p_b$, then the ratio of the a priori probability of $O$ observing $a$ to the a priori probability of $O$ observing $b$ will be close to the conventionally calculated ratio $p_a/p_b$ in the sense that $$\mathop{\rm app}\nolimits(O, T, \B_S, C_a|\omega)/\mathop{\rm app}\nolimits(O, T, \B_S, C_b|\omega) \sim p_a/p_b. \eqno{(6.3)}$$ \pdfeject \name{eq6.4} It is also demonstrated in proposition \link{l9.10,p9.11}{9.11} that, if we set $\sigma^\delta_{M+1} = \rho^\delta_a$, then, for $\delta$ sufficiently small, $\app{\B_{SM}}{(\sigma^\delta_m)_{m=1}^{M+1}}{\omega}$ is close to $\mathop{\rm app}\nolimits(O, T, \B_S, C_a|\omega)$ and that $$\app{\B_{SM}}{\rho^\delta_a}{\sigma^\delta_M} \sim \app{\B_{SM}}{\rho^\delta_a|_{\B_S}}{\sigma^\delta_M|_{\B_S}} \sim p_a, \eqno{(6.4)}$$ so that, in this particular circumstance, $\app{\B_{SM}}{\rho^\delta_a|_{\B_S}}{\sigma^\delta_M|_{\B_S}}$ is indeed an appropriate ``collapse'' probability. (\link{6.2,6.3}{6.2}) and (\link{6.2,6.3}{6.3}) are empirically justified because they make the same predictions as conventional quantum mechanics. They can therefore be used to annex the empirical evidence which is normally taken to support conventional quantum mechanics. What is more, they seem to me to be sufficient to annex all of that empirical support. My disagreement with conventional quantum mechanics is based on that theory's inexpiable incompleteness rather than on its predictions. Unlike postulate \link{posEightfin}{eight}, postulate \link{posNine}{nine} is not a definition. Indeed, at the deepest level, it is merely a model -- in other words, something which is a useful general picture rather than a precise and complete description of any particular experiment. Such a complete description would require a much more detailed analysis of the observer and of observation. 
Postulate \link{posNine}{nine} makes a hypothesis about the nature of the states most likely to be experienced by $O$ in the given circumstances. In judging the validity of this hypothesis, there are various aspects to be considered. The fundamental claim, already mentioned in section \link{sec 4}{4}, is that we can find a mathematical definition for a priori probability which is appropriate in the sense of yielding likelihoods for quantum states which are consonant with intuition. In other words, a definition which is such that the most likely states modelling a given situation are states which conventional quantum theorists would allow to be assigned to that situation. This will then imply, for example, that, for $\delta$ sufficiently small, $\sigma^\delta_M$ can be taken to be a standard quantum state modelling an observer, or his brain, on $\B_M$ and an experimental apparatus on $\B_S$, just before the observer becomes aware of the result of the experiment. In particular, this is compatible with the analysis in \link{otre}{Donald (1990)}, where the brain was considered from the viewpoint of conventional neurophysiology and conventional statistical physics. It will also imply that $\sigma^\delta_M$ does provide the correlations which we would expect between $\B_M$ and $\B_S$, and that this holds for any of the variety of possible sets $\B_S$, compatible with model \link{5.1}{5.2}. The claim implies part A of postulate \link{posNine}{nine} by underpinning the argument given in section \link{sec 5}{5} for \link{5.1}{5.2A}. Another implication, which could fail for pathological choices of $\B_S$ -- in particular, were we to take $\B_S = \B(\H)$ -- is that $(\sigma^\delta_m)_{m=1}^M$ has a priori probability close to $\sup\{ \app{\B_M}{(\sigma_m)_{m=1}^M}{\omega} : (\sigma_m)_{m=1}^M \in \widetilde{\cal N}_\delta^M(\B_M) \}.$ This implication reflects the expectation that all the constraints on $(\sigma^\delta_m)_{m=1}^M$ are imposed through $\B_M$.
The considerable attention devoted below to motivating the definition of app is with the purpose of making this fundamental claim plausible. According to (\link{posNine}{6.1}), certain states of highest a priori probability have a property of being mixed. (\link{posNine}{6.1}) would be justified, if the splitting of $\rho$ in (\link{5.2B}{5.3}) could be extended from $\B_S$ to $\B_{SM}$. By (\link{5.2B}{5.3}) and the fundamental claim just made, we can assume that $\sigma^\delta_M|_{\B_S}$ takes the form $p_a\hat \rho_a + (1-p_a)\hat \rho_d$ for suitable $\hat \rho_a$ and $\hat \rho_d$. We can also assume that we know the state $\sigma^\delta_M|_{\B_M}$. Even with these assumptions, however, a considerable step is required to reach (\link{posNine}{6.1}). This step depends on a degree of physical independence between the observer and the observed system. That there is a real problem here may not seem obvious, but it must be remembered that, in general, particularly at the macroscopic level, individual quantum states only provide ``realistic'' descriptions over short time intervals. All our intuition is based on such intervals because the world always appears as if state collapse was occurring with high frequency. Imagine, for example, smoke rising from a fire and then ask how many collapses, in the generalized sense used throughout this paper, are required to produce a particular observed pattern. When, as now, we are required \name{eq6.5} to consider quantum states over extended time intervals without permitting collapse, we need to be extremely careful about existence. It need not be true that there exists any pair of states $\rho^\delta_a$ and $\rho^\delta_d$ such that $$\rho^\delta_a|_{\B_S} = \hat \rho_a, \quad \rho^\delta_d|_{\B_S} = \hat \rho_d, \quad \text{and } \rho^\delta_a|_{\B_M} = \rho^\delta_d|_{\B_M} = \sigma^\delta_M|_{\B_M}. 
\eqno{(6.5)}$$ This is because $\B_M$ and $\B_S$ do not commute and are not independent in the sense of (\link{3.4,3.5}{3.5}), so that, given a state $\rho_1$ on $\B_M$ and a state $\rho_2$ on $\B_S$, there need not exist any state $\rho$ on $\B_{SM}$ such that $\rho|_{\B_M} = \rho_1$ and $\rho|_{\B_S} = \rho_2$. Nevertheless, if $\rho_1$ equals $\sigma^\delta_M|_{\B_M}$, which, by assumption, is a typical state for the body of the observer to have reached just before he becomes aware of the result of the experiment, and if $\rho_2$ is any state modelling a given result of the experiment, then there does exist such an extension $\rho$. This extension is constructed simply by imagining circumstances, however unlikely, in which the state modelled by $\rho_2$ on $\B_S$ is produced deterministically without the knowledge of the observer. In a bubble chamber experiment, for example, it is possible to imagine a duplicitous graduate student slipping an old bubble chamber photograph, or even an artificially constructed one, into the pile marked ``output''. The first step in this argument is the assumption that there is an extension $\tilde \rho^\delta$ of $\sigma^\delta_M|_{\B_M}$ to a system $\B_{MG}$ such that $\tilde \rho^\delta|_{\B_{MG}}$ models not only $O$ but also the student. Even this assumption is not necessarily true, because of the temporal extension of $\B_M$ (see (\link{posFour}{4.2})), but from both mathematical and physical points of view I certainly find it extremely plausible. The second step is to note that, by construction, $\rho_2$ is the extension of $\tilde \rho^\delta$ to $\B_S$. It is consistent with section \link{sec 5}{5} to assume that $\hat \rho_d$ can be constructed as a mixture of states modelling possible results of the observation other than $a$, so this thought experiment yields states $\rho^\delta_a$ and $\rho^\delta_d$ which do satisfy (\link{eq6.5}{6.5}).
The mixture $\rho^\delta = p_a\rho^\delta_a + (1-p_a)\rho^\delta_d$ on $\B_{SM}$ is compatible with all the information available to $O$ just before he observes the outcome of his experiment and is a physically plausible state in the sense that in each separate spatial region it has typical thermodynamic properties. Of course, the fantasy required to discover this state is absurd and there is no suggestion that the graduate student is anything other than part of a mathematical construction. We have no interest in the extensions of $\rho^\delta$ away from $\B_{SM}$ to the operators with which the state of the graduate student would be described, were he to exist. Given that states satisfying (\link{eq6.5}{6.5}) do exist, (\link{posNine}{6.1}) reduces to the hypothesis that they are the most likely such states. This hypothesis can now be taken to be part of our fundamental claim, backed up by (\link{5.2B}{5.3}) and the mixing-enhancing property of app which will be given as property \link{propH}{H} in the \link{sec 7}{next} section. There is a problem with postulate \link{posNine}{nine} which ought at least to be mentioned. This is that insufficient allowance has perhaps been made for approximation. (\link{posNine}{6.1}) is claimed to hold for a component of a sequence of elements which approach arbitrarily close to a supremum of a priori probabilities. The mathematical structure is such, particularly as app will turn out not to be continuous, that, for example, small changes in $\B_M$ or $\B_S$ could conceivably cause large changes in $\widetilde{\cal N}_\delta^m(\B_{SM})$. I believe, however, that this is a mere mathematical pathology and that structural stability can be given to the theory by allowing variations in the precise definition of the system $S$ and in the geometric structures implicit in the definition of $\B_M$. 
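
As a consistency check on postulate \link{posNine}{nine}, the following short calculation may be helpful. It is only a sketch, and it assumes nothing beyond the bounds stated in part B of the postulate and the fact that $0 \leq \rho(Q_a) \leq 1$ for any state $\rho$ and projection $Q_a$. It shows that a state of the form (\link{posNine}{6.1}) satisfies the bound on $\sigma_M(Q_a)$ required by part A:
$$\eqalign{ \sigma^\delta_M(Q_a) &= p_a\rho^\delta_a(Q_a) + (1-p_a)\rho^\delta_d(Q_a) \geq p_a(1-\varepsilon) \geq p_a - \varepsilon, \cr \sigma^\delta_M(Q_a) &= p_a\rho^\delta_a(Q_a) + (1-p_a)\rho^\delta_d(Q_a) \leq p_a + (1-p_a)\varepsilon \leq p_a + \varepsilon. \cr}$$
The lower bound uses $\rho^\delta_a(Q_a) \geq 1-\varepsilon$ and $\rho^\delta_d(Q_a) \geq 0$, and the upper bound uses $\rho^\delta_a(Q_a) \leq 1$ and $\rho^\delta_d(Q_a) \leq \varepsilon$. Parts A and B of the postulate are thus compatible for the same $\varepsilon$.
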
\pdfproclaim{7.\quad Properties of a priori probability.}{}{sec 7} \endproclaim In this section, properties appropriate to an a priori probability which can play the role required, will be listed and justified. In the \link{sec 8}{following} section, it will be shown that certain of these properties are sufficient to give a unique definition which does indeed possess all the properties listed. In my opinion, the totality of properties satisfied by this definition does make it both suitable and hard to modify. Of course, this section has been written with hindsight in that it does proceed towards a pre-established definition: given by property \link{propBC}{B} and (\link{sec 8}{8.1 -- 8.4}). Thus its true purpose is to introduce and justify that definition. \pdfproclaim{Property A}{}{propA} $\app{\B}{(\sigma_m)_{m=1}^M}{\rho}$ is a function of a set $\B \subset \B(\H)$ and of $(\sigma_m)_{m=1}^M$ and $\rho$ -- restrictions of states to $\B$. \endproclaim This property is consistent in statement with postulate \link{posSix}{six}, but to be fully consistent in notation we should earlier have written $\app{\B_M}{(\sigma_m)_{m=1}^M}{\omega|_{\B_M}}$ rather than $\app{\B_M}{(\sigma_m)_{m=1}^M}{\omega}$. Although fundamental to the definition of app, property \link{propA}{A} is probably the most difficult property to motivate. In essence, it is a postulate about how the global state $\omega$ influences a localized observer. Mathematically, it may be noted that, since measuring a priori probability involves the comparison of states, it is desirable to define those states on a single set. There are two obvious alternatives. One of these, requiring that app depends on a given, fixed, globally-defined extension of each $\sigma_m$, contradicts the idea of locality. This alternative takes us straight back to the Copenhagen interpretation and to all its problems. The other althernative permits the set on which $\sigma_m$ is defined to increase with $m$. 
As will be shown in section \link{sec 9}{9}, under this alternative, $\omega$ would not necessarily influence $\sigma_m$ at all. More generally, the a priori probability of $(\sigma_m)_{m=1}^M$ could equal one without $\sigma_M$ being equal to $\omega$ wherever defined. \pdfproclaim{Property B}{}{propBC} $\app{\B}{(\sigma_m)_{m=1}^M}{\rho} = \prod_{m=1}^M \app{\B}{\sigma_m}{\sigma_{m-1}}$ where we define $\sigma_0 = \rho$. \endproclaim This property is postulate \link{posSix}{seven}. It allows us to confine attention in the rest of this section and in the \link{sec 8}{next} to the function $\app{\B}{\sigma}{\rho}$ of just two states. \proclaim{Property C}{} \noindent (i) \quad $0 \leq \app{\B}{\sigma}{\rho} \leq 1$ for all $\sigma$ and $\rho$. \smallskip \noindent (ii) \quad $\app{\B}{\sigma}{\rho} = 1$ if and only if $\sigma = \rho$. \endproclaim (i) is necessary if app is to be a probability in any sense. However, in general, app will not be a probability in the more technical sense of defining a measure. (ii) requires that a priori probability is always lost by a non-trivial collapse. \pdfproclaim{Property D}{}{propDE} Let $U \in \B(\H)$ be unitary, and define $\tau : \B(\H) \rightarrow \B(\H)$ by $\tau(B) = U B U^*$. Then $\app{\B}{\sigma}{\rho} = \mathop{\rm app}\nolimits_{\tau^{-1}(\B)}(\sigma\circ\tau\,|\,\rho\circ\tau)$. \endproclaim This is an obviously necessary isomorphism invariance. \proclaim{Property E}{} Suppose that $\rho = p_a\rho_a + (1-p_a)\rho_d$ for $0 \leq p_a \leq 1$ and suppose that there exists a projection $Q_a \in \B$ such that $\rho_a(Q_a) \geq 1-\varepsilon$, $\rho_d(Q_a) \leq \varepsilon$ for some $\varepsilon \geq 0$. Then, for $\varepsilon$ sufficiently small, $\app{\B}{\rho_a}{\rho} \sim p_a$. \endproclaim This is the essence of (\link{5.1}{5.1}). Properties \link{propBC}{C}, \link{propDE}{D}, and \link{propDE}{E} are not sufficient, by themselves, to yield a unique definition for $\app{\B}{\sigma}{\rho}$.
In order to proceed further, it is necessary to discover a general interpretation for the function being sought. I have little doubt but that the only way to achieve this is to work backwards; first finding a suitable definition and then inventing an interpretation for it. If the interpretation can subsequently be shown to lead uniquely to the definition, that is important justification. I have already (\link{otre}{Donald (1986)}) published one version of this process. In the remainder of this section, I aim to revise and deepen the interpretative analysis sketched in that paper. This will be done, in part, by arguing for a modified, although equivalent, set of defining properties; in part, by motivating a broad set of mutually compatible properties, rather than seeking a minimal set of axioms; and, in part, by the relevance of the rest of this paper. Readers who are prepared to take mathematical proofs on trust will be able to read this paper independently of \link{otre}{Donald (1986)}. The process of giving meaning to $\app{\B}{\sigma}{\rho}$ will largely be the process of interpreting the notion of a quantum state. Generalizing (\link{eq6.4}{6.4}) suggests that $\app{\B}{\sigma}{\rho}$ should be interpreted as the a priori probability that an observer, observing a subsystem defined by a set of observables $\B$, sees the state $\sigma$ on $\B$ as the outcome of an experiment when $\rho$ was the state on $\B$ just prior to that outcome being ascertained. I have not proposed this generalization as a postulate. This is because I am unable to specify circumstances more general than those given in postulate \link{posNine}{nine} under which (\link{eq6.4}{6.4}) can be justified. Nevertheless, the general interpretation is compatible with all the ideas of sections \link{sec 4}{4}, \link{sec 5}{5}, and \link{sec 6}{6}.
An alternative way of looking at this generalization of (\link{eq6.4}{6.4}) emphasizes that, from the point of view expressed in this paper, ``collapse'' is not a physical process external to, and independent of, the observer. Instead, ``collapse'' is required to occur because the observer can only occupy certain types of state (section \link{sec 4}{4}). In other words, collapse is, one might say, a mistake which the observer makes about the state of the world because he is physically incapable of seeing its true state. He cannot experience a cat as being both alive and dead, so he experiences it as being either one or the other. $\app{\B}{\sigma}{\rho}$ is to be interpreted as a measure of his falsity of vision in the following sense: \medskip \name{7one} \noindent 7.1) $\app{\B}{\sigma}{\rho}$ is the probability, per unit trial of the information in $\B$, of being able to mistake the state of the world on $\B$ for $\sigma$, despite the fact that it is actually $\rho$. \medskip In \link{otre}{Donald (1986)}, I used this interpretation to motivate four axioms which provide the complete definition of $\app{\B}{\sigma}{\rho}$. The first of these axioms derived app on an arbitrary set $\B$ from app on $\B(\H)$. This axiom appears below as (\link{7.5,7.6}{7.5}) in consequence of property \link{propFG}{G}. The two axioms in \link{otre}{Donald (1986)} which defined app$_{\B(\H)}$ for finite dimensional $\H$ are replaced here by properties \link{propBC}{C}, \link{propDE}{D}, \link{propDE}{E}, and \link{propI}{I}, which are both more general and more primitive. Finally, there has to be a technical property allowing the extension to infinite dimensional spaces. In this paper this is property \link{propK}{K}, which is once again more general than axiom IV of \link{otre}{Donald (1986)}. With the background of sections \link{sec 4}{4} and \link{sec 6}{6}, it is possible to be conceptually more sophisticated in this paper. 
Recall from remark \link{rmk3.11,3.12}{3.12} that a state $\sigma$ on $\B$ may be interpreted as the set of states on $\B(\H)$ given by $\{\sigma' : \sigma'|_\B = \sigma\}$. The set $\B$ is the maximal set of operators which the observer can use to distinguish between states. This idea of indistinguishability may be generalized. The neighbourhood ${\cal N}_T$ introduced in postulate \link{posFour}{four} may be viewed as a set of sequences of states which $O$ cannot distinguish among. In these terms, let $R$ and $S$ be arbitrary sets of states on $\B(\H)$. Consider an observer who collapses from some state in $R$ to some state in $S$ but who cannot distinguish between the states in $R$ or between the states in $S$. The a priori probability of this collapse will be a function $\app{}{S}{R}$ which may be interpreted as follows: \medskip \name{7.2,7.3} \noindent 7.2) For $R$ and $S$ sets of states on $\B(\H)$, $\app{}{S}{R}$ is the probability per unit trial by an observer who cannot distinguish between different states in $R$ or between different states in $S$, of being able to mistake the world for some state in $S$ despite the fact that it is actually some state in $R$. \medskip In these terms, (\link{7one}{7.1}) should be restated as: \medskip \noindent 7.3) $\app{\B}{\sigma}{\rho}$ is the probability, per unit trial of the information in $\B$, of being able to mistake the state of the world for a state compatible on $\B$ with $\sigma$, despite the fact that it is actually some state compatible on $\B$ with $\rho$. \medskip \pdfproclaim{Property F}{}{propFG} If $R \subset R'$ and $S \subset S'$ then $\app{}{S}{R} \leq \app{}{S'}{R'}$. \endproclaim This simply says that an observer who can draw finer distinctions is less likely to make mistakes. 
\proclaim{Property G}{} $$\app{}{S_1\cup S_2}{R_1\cup R_2} = \max\{ \app{}{S_1}{R_1}, \app{}{S_1}{R_2}, \app{}{S_2}{R_1}, \app{}{S_2}{R_2} \}.$$ \endproclaim Property \link{propFG}{F} implies that the left hand side is at least as large as the right hand side. Equality should obtain because no more information is given to the observer collapsing from $R_1\cup R_2$ to $S_1\cup S_2$ than that the collapse starts either in $R_1$ or $R_2$ and finishes either in $S_1$ or $S_2$. Notice that this is only an argument; \link{7.2,7.3}{7.2} is an interpretation rather than a definition and property \link{propFG}{G} is an axiom rather than a theorem. A simple extension of property \link{propFG}{G} implies, for any pair $R$, $S$, that $$\app{}{S}{R} = \sup\{ \app{\B(\H)}{\sigma}{\rho}: \sigma \in S, \rho \in R\}. \eqno{(7.4)}$$ In terms of (\link{def3.10}{3.10})--(\link{rmk3.11,3.12}{3.12}), (\link{propFG}{7.4}) implies that we should define \name{7.5,7.6} $$\app{\B}{\sigma}{\rho} = \sup\{ \app{\B(\H)}{\sigma'}{\rho'}: \sigma'|_\B = \sigma, \rho'|_\B = \rho\}. \eqno{(7.5)}$$ This allows app$_\B$ to be derived from app$_{\B(\H)}$. As another consequence, also a special case of property \link{propFG}{F}, we have the following ``monotonicity'' result: \medskip \noindent 7.6) Let $\B_1 \subset \B_2$ and $\sigma_2$, $\rho_2$ be extensions to $\B_2$ of states $\sigma_1$, $\rho_1$ on $\B_1$. Then $\app{\B_1}{\sigma_1}{\rho_1} \geq \app{\B_2}{\sigma_2}{\rho_2}$. \medskip It must be emphasized that no physical relevance is to be attached to states $\sigma'$, $\rho'$ on $\B(\H)$ at which the supremum in (\link{7.5,7.6}{7.5}) is attained. Such states are only guesses based on limited information and (\link{7.2,7.3}{7.3}) is only a way of giving meaning to a function depending on such limited information. 
In particular, (\link{7.2,7.3}{7.3}) is not to be thought of as referring to some physical $\sigma$-independent global state $\rho'$ about which the observer is making mistakes, because even when such a state exists, the observer has no access to it except through its restriction to $\B$. This applies specifically to the state $\omega$ given by postulate \link{posFive}{five}. The observer only interacts with $\omega|_{\B_M}$ and the set of extensions of that (partial) state is used in (\link{7.5,7.6}{7.5}) only as a mathematical equivalent to $\omega|_{\B_M}$. For different $\sigma$, we may well need to use different extensions of $\rho$ in calculating the supremum in (\link{7.5,7.6}{7.5}). This mathematics is compatible with a many-worlds theory in which separate possible worlds at a given instant are totally uncommunicating modes of experiencing the universe. In the present theory, different worlds may occupy overlapping quantum states. Everett required that different worlds use orthogonal wave-functions, implying a picture of physically different worlds in different physical dimensions. This would correspond to chopping up a single state given globally by $\omega$. I prefer to think of limited information about a single universe made sense of in many ways. The simplest case to which \link{7.2,7.3}{7.2}, or indeed \link{7one}{7.1}, can be applied arises when $\B = {\cal Z}$ -- a finite dimensional Abelian algebra. In this case, it is possible to interpret the notion of a state on ${\cal Z}$ in terms of conventional probability theory or information theory. This yields a complete definition (equation (\link{eq7.7}{7.7})) for $\app{\cal Z}{\sigma}{\rho}$. We review this analysis with the object of extending its elements to the non-Abelian situation: ${\cal Z}$ is generated by a finite sequence $(P_m)_{m=1}^M$ of orthogonal projections with $\sum_{m=1}^M P_m = 1$. 
A state $\rho$ on ${\cal Z}$ corresponds to a probability distribution $(\rho(P_m))_{m=1}^M$ on $\{1,2, \dots,M\}$ -- ($0 \leq \rho(P_m) \leq 1$ and $\sum_{m=1}^M \rho(P_m) = 1$). A trial of the state $\rho$ can be taken to be an observation of a random variable $X$ with values in $\{1,2, \dots,M\}$ which is distributed according to $\mathop{\rm Prob}\{X = m\} = \rho(P_m)$. If we perform $N$ trials on $\rho$, then, when $N$ is large, we shall be justified in mistaking $\rho$ for $\sigma$ if the frequency distribution of outcomes of those trials corresponds to the distribution defined by $\sigma$ (i.e. if $X = m$ in $N\sigma(P_m)$ of the trials). The outcome of a trial on $\rho$ is distributed according to the multinomial distribution, so the probability of the given frequency distribution arising can be explicitly calculated. Setting $s_m = \sigma(P_m)$, $r_m = \rho(P_m)$, and making the assumption that each $s_m N$ is an integer, it equals $N! \prod_{m=1}^M r_m^{s_m N} /(s_m N)!$. By Stirling's formula, the logarithm of this probability is asymptotic to $N \{ \sum_{m=1}^M(-\sigma(P_m) \log \sigma(P_m)/\rho(P_m))\}$ (\link{RoosShore}{Sanov (1957)}, \link{Bor}{Bratteli and Robinson (1981}, pp 425-427)). This indicates that it would be appropriate for app to satisfy \name{eq7.7} $$\app{\cal Z}{\sigma}{\rho} = \exp\{ \sum_{m=1}^M(-\sigma(P_m) \log \sigma(P_m)/\rho(P_m))\}. \eqno{(7.7)}$$ This analysis is particularly straightforward because it has been possible to treat states on ${\cal Z}$ as single entities rather than as sets. As will be discussed below, it turns out that this is possible for states on any injective von Neumann algebra. On non-algebras, however, states must, in general, be taken to be sets: \pdfproclaim{Example 7.8}{}{ex7.8} Let $(\psi_i)_{i=1}^4$ be an orthonormal basis for $\Complex^4$ and $$P = |\psi_1\>\<\psi_1| + |\psi_2\>\<\psi_2|, \quad Q = |\psi_1\>\<\psi_1| + |\psi_3\>\<\psi_3|.$$ Let $\B = c\{P, Q\}$. 
A complete mathematical analysis of $\app{\B}{\sigma}{\rho}$ can be given (see \link{otre}{Donald (1987a}, \parasign 5)). Conceptually, we are still working in terms of conventional probability theory because $P$ and $Q$ commute and because, as a matter of fact, we need only consider extensions of states on $\B$ to the Abelian algebra generated by $\{|\psi_i\>\<\psi_i| : i = 1, \dots,4\}$. The set of states on that algebra compatible with a given state $\rho$ on $\B$ is equivalent to the set of probability distributions on $\{1, 2, 3, 4\}$ given by $$\{(r_i)_{i=1}^4: 0 \leq r_i \leq 1, \sum_{i=1}^4 r_i = 1, \ r_1 + r_2 = \rho(P),\ r_1 + r_3 = \rho(Q)\}.$$ In these terms, (\link{7.2,7.3}{7.3}) does not pose a well-defined classical problem with a unique solution but an inference problem with incomplete information. Specifically, the problem is to estimate a maximal value, as $N \rightarrow \infty$, for $$(\mathop{\rm Prob}(|{1\over N}(\sum_{n=1}^N X^P_n) - \sigma(P)| < {1\over 2N}, \ |{1\over N}(\sum_{n=1}^N X^Q_n) - \sigma(Q)| < {1\over 2N}))^{1/N}$$ where $(X^P_n)_{n=1}^N$ (resp. $(X^Q_n)_{n=1}^N$) are independent identically distributed random variables on $\{1, 2, 3, 4\}$ such that $X^P_n(\omega) = 1$ if and only if $\omega \in \{1, 2\}$ (resp. $X^Q_n(\omega) = 1$ iff $\omega \in \{1, 3\}$) and all that is known about the distribution is that it is in the set above -- which is equivalent to saying that $E(X^P_n) = \rho(P)$ and $E(X^Q_n) = \rho(Q)$. Whether it is reasonable to assume properties F and G (or their Abelian analogues) for the solution to such an inference problem may depend on the context in which the problem arises, but, in the present context, it is my opinion that they are appropriate. \hfill $\blacksquare$ \endproclaim The analysis of app$_\B$ on sets other than Abelian algebras will focus on concavity properties of a priori probability.
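
Before leaving the Abelian case, it may help to check formula (\link{eq7.7}{7.7}) against property \link{propDE}{E} in the simplest instance. What follows is an illustrative sketch rather than part of the formal development; it assumes the usual convention $0 \log 0 = 0$ and reads $\log \sigma(P_m)/\rho(P_m)$ as the logarithm of the ratio. Suppose that $\sigma(P_1) = 1$, so that $\sigma(P_m) = 0$ for $m \geq 2$, and that $\rho(P_1) = p_a > 0$. Then only the $m = 1$ term contributes to the sum and
$$\app{\cal Z}{\sigma}{\rho} = \exp\{-\log(1/p_a)\} = p_a,$$
in agreement with property \link{propDE}{E} when $Q_a = P_1$, $\rho_a = \sigma$, and $\varepsilon = 0$. For instance, with $M = 2$ and $\rho(P_1) = \rho(P_2) = 1/2$, the state concentrated on $P_1$ has a priori probability $1/2$.
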
By definition \link{def3.10}{3.10}, a convex combination $\sigma = x_1\sigma_1+ x_2\sigma_2$ of two states $\sigma_1$ and $\sigma_2$ on any set $\B$ is also a state. Here $ x_1$ and $x_2$ satisfy $0 \leq x_1, x_2 \leq 1$ and $x_1+ x_2 = 1.$ Generalizing the standard interpretation, already invoked in section \link{sec 2}{2}, of a density matrix as a mixture of components weighted by probabilities, it is appropriate to assume that an observer given the information in $\B$ will have observed $\sigma$ if he has observed $\sigma_1$ a fraction $x_1$ of the time and $\sigma_2$ a fraction $x_2$ of the time. Suppose then that $N_1$ and $N_2$ are integers with $x_1 = N_1/(N_1+N_2)$ and $x_2 = N_2/(N_1+N_2)$. According to \link{7.2,7.3}{7.3}, $\app{\B}{\sigma_1}{\rho}^{N_1} \app{\B}{\sigma_2}{\rho}^{N_2}$ is the probability that an observer given the information in $\B$ will be able to mistake the state of the world for a state compatible with $\sigma_1$ in an initial sequence of $N_1$ trials and for $\sigma_2$ in a subsequent sequence of $N_2$ trials despite the fact that the state is actually some state compatible with $\rho$. Following the total sequence of $N_1+N_2$ trials, however, he will be prepared to believe that the state of the world is compatible with $x_1\sigma_1+ x_2\sigma_2$. This justifies the assumption of the following property: \pdfproclaim{Property H}{}{propH} Let $\sigma_1$, $\sigma_2$, and $\rho$ be states on a set $\B$. Let $x_1, x_2 \in [0,1]$ with $x_1+ x_2 = 1$. Let $\sigma = x_1\sigma_1+ x_2\sigma_2$. Then $$\app{\B}{\sigma}{\rho} \geq \app{\B}{\sigma_1}{\rho}^{x_1} \app{\B}{\sigma_2}{\rho}^{x_2}.$$ \endproclaim Property \link{propH}{H} is a mixing-enhancing or entropy increasing property because it implies that the a priori probability of the mixture $x_1\sigma_1 + x_2\sigma_2$ is at least as great as the minimum a priori probability of the component states $\sigma_1$ and $\sigma_2$. 
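\par As an illustrative check of this last implication (the abbreviations $a_1$ and $a_2$ are introduced here only for this sketch), write $a_i = \app{\B}{\sigma_i}{\rho} \geq 0$ for $i = 1, 2$. Since $x_1 + x_2 = 1$, $$a_1^{x_1} a_2^{x_2} \geq \min\{a_1, a_2\}^{x_1} \min\{a_1, a_2\}^{x_2} = \min\{a_1, a_2\},$$ so that property \link{propH}{H} gives $\app{\B}{\sigma}{\rho} \geq \min\{a_1, a_2\}$. Equivalently, taking logarithms (with the conventions $\log 0 = -\infty$ and $0 \cdot (-\infty) = 0$), property \link{propH}{H} reads $$\log \app{\B}{\sigma}{\rho} \geq x_1 \log \app{\B}{\sigma_1}{\rho} + x_2 \log \app{\B}{\sigma_2}{\rho},$$ which is to say that $\log \mathop{\rm app}\nolimits_\B$ is concave in its first argument. \par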
In fact, the link between app and entropy is much more direct and an alternative interpretation of app in thermodynamic terms is possible (see \link{otre}{Donald (1987b)}). For this interpretation, we again consider a pair $(S, R)$ of sets of states on $\B(\H)$, or, indeed, on any other algebra on which thermodynamic systems can be defined. We now interpret each state in $R$ as a possible equilibrium state at an arbitrary temperature $T$ -- any state will be such for some system. Then $- kT \log \app{}{S}{R}$ is the minimum free energy which would be sufficient to move one of these systems from its equilibrium state $\rho \in R$ to a state $\sigma \in S$. Thus far, the idea of a ``trial'' has been left at an entirely intuitive level. In conventional probability theory, this idea can be given an operational definition, but this is not appropriate in the more general context. A trial of a quantum state is to be thought of not as a physical experiment on that state, but rather as an imagined way in which that state may be experienced. It is postulate \link{posNine}{nine} that gives the link between app and experiments. The problem is that, in quantum theory, an experiment, or indeed any kind of acquisition of knowledge, involves the process of ``collapse'' and the resulting change of state. The experiencing of a state as being the state which one is occupying is different from investigating that state. The image of a trial as a mode of experiencing is introduced as a fundamental image of a quantum state. Ultimately, it is for the reader to consider whether that image is compatible with all the other interpretative structures used in this paper. On ${\cal Z}$, a single trial gave a choice from the set $\{1,2, \dots,M\}$ and a series of trials built up a picture of the state. Now we shall work by analogy on a finite-dimensional Hilbert space $\H$ and we shall take $\B = \B(\H)$. 
Suppose then that we are given an orthonormal basis $(\psi_m)_{m=1}^M$ for $\H$ and let $S = \{\sigma_m = |\psi_m\>\<\psi_m|: m = 1, \dots,M\}$ be the corresponding set of pure states on $\B(\H)$. Then imagine that one of the ways of experiencing a state $\rho$ on $\B(\H)$ is by an ``$S$-trial'' which is to give a choice from $S$. This notion will be shown to be useful by deriving various mutually consistent properties for app from plausible assumptions about what such trials would mean if the notion were useful. Once again, we are not proving theorems here about formally defined concepts, but trying to construct a consistent circle of ideas. It is because this is all that we are doing that the restriction to finite dimensions is permissible at this stage. This allows us to put various technicalities to one side for the moment. By \link{7.2,7.3}{7.3}, the a priori probability of choosing $\sigma_m$ in an $S$-trial should be $\app{\B(\H)}{\sigma_m}{\rho}$. A series of $S$-trials should allow us, if their results have an appropriate distribution, to mistake the state of the world for some state in the linear span of $S$. These are the only states that can be precisely identified by such a series of trials. This is reflected in the fact (lemma \link{l8.9}{8.9}) that $\sum_{m=1}^M \app{\B(\H)}{\sigma_m}{\rho} \leq 1$ with equality only if $\rho$ is in the span of $S$. $1 - \sum_{m=1}^M \app{\B(\H)}{\sigma_m}{\rho}$ may be interpreted as the a priori probability of an $S$-trial providing the experience that $\rho$ is not in that span. For example, suppose that the state $\sigma$ on $\B(\H)$ is in the span of $S$, having the form $\sigma = \sum_{m=1}^M s_m\sigma_m$ where $0 \leq s_m \leq 1$ and $\sum_{m=1}^M s_m = 1$. 
If, in accordance with our interpretation of (\link{eq2.1}{2.1}), we interpret $\sigma$ as being a mixture of the states $\sigma_m$ with probabilities $s_m$ then we would mistake $\rho$ for $\sigma$ in a series of $N$ $S$-trials, with $N$ large, if the result $\sigma_m$ occurred $s_m N$ times. Thus $(\app{\B(\H)}{\sigma}{\rho})^N$ should be asymptotic to the probability that $N$ trials of a random variable $X$ with values in $\{1, \dots, M\} \cup \{\infty \}$ and distribution $\mathop{\rm Prob}\{X = m\} = \app{\B(\H)}{\sigma_m}{\rho}$, $\mathop{\rm Prob}\{X = \infty \} = 1 - \sum_{m=1}^M \app{\B(\H)}{\sigma_m}{\rho}$ has the result $X = m$ in $s_m N$ of the trials. By the same calculation that gave (\link{eq7.7}{7.7}), this yields \name{propI} $$\app{\B(\H)}{\sigma}{\rho} = \prod_{m=1}^M (\app{\B(\H)}{\sigma_m}{\rho}/s_m)^{s_m}. \eqno{(7.9)}$$ \proclaim{Property I}{} Let $\sigma_1$, $\sigma_2$, and $\rho$ be states on a finite-dimensional algebra $\A$. Let $x_1, x_2 \in [0,1]$ with $x_1 + x_2 = 1$. Let $\sigma = x_1\sigma_1+ x_2\sigma_2$. Then $\app{\A}{\sigma_1}{\rho}^{x_1} \app{\A}{\sigma_2}{\rho}^{x_2} / \app{\A}{\sigma}{\rho}$ is independent of $\rho$. \endproclaim For simplicity, we shall only consider the cases of $\A = \B(\H)$, for $\H$ a finite-dimensional Hilbert space, and of $\A$ an Abelian algebra. Readers familiar with the structure theory for finite-dimensional algebras (\link{Strat}{Takesaki (1979}, \parasign I.11)) should have no difficulty in generalizing to the wider context. For $\A = \B(\H)$, suppose that, for $i = 1,2$, $\sigma_i = \sum_{m=1}^M s^i_m\sigma^i_m$ for some sets $S^i = \{\sigma^i_m: m = 1, \dots,M\}$ of disjoint pure states. A test for $\sigma$ may then be made by performing $N_1$ $S^1$-trials followed by $N_2$ $S^2$-trials where $N_1$ and $N_2$ are large and $N_1/(N_1 + N_2) = x_1$. $\app{\B(\H)}{\sigma}{\rho}^{N_1 + N_2}$ is to be interpreted as the a priori probability of $\rho$ being mistaken for $\sigma$ in this test. 
One set of test results justifying this mistake arises when all the $S^1$-trials have results compatible with $\sigma_1$ and all the $S^2$-trials have results compatible with $\sigma_2$. This set should have a priori probability $\app{\B(\H)}{\sigma_1}{\rho}^{N_1} \app{\B(\H)}{\sigma_2}{\rho}^{N_2}$. The ratio $$\displaylines{ \app{\B(\H)}{\sigma_1}{\rho}^{N_1} \app{\B(\H)}{\sigma_2}{\rho}^{N_2}/\app{\B(\H)}{\sigma}{\rho}^{N_1+ N_2} \hcrh = ( \app{\B(\H)}{\sigma_1}{\rho}^{x_1} \app{\B(\H)}{\sigma_2}{\rho}^{x_2}/\app{\B(\H)}{\sigma}{\rho})^{N_1+N_2} }$$ is to be interpreted as the relative probability of a test, compatible with $\sigma$, having a result in this particular set. This relative probability should be independent of $\rho$ because it depends only on the set of possible $\sigma$-compatible tests. \pdfeject \name{7ten} For example, if $\sigma_1 = \sigma_2$ then any $\sigma$-compatible sequence of trials will be both $\sigma_1$ and $\sigma_2$ compatible, so that $$\app{\B(\H)}{\sigma_1}{\rho}^{N_1} \app{\B(\H)}{\sigma_2}{\rho}^{N_2}/\app{\B(\H)}{\sigma}{\rho}^{N_1+ N_2} = 1.$$ On the other hand, if $\sigma_1$ and $\sigma_2$ are disjoint in the sense of having orthogonal support projections, then it is possible to choose $S^1 = S^2$. Then any $\sigma$-compatible test is some rearrangement of $N_1$ trials compatible with $\sigma_1$ and $N_2$ trials compatible with $\sigma_2$. We should then have $$\app{\B(\H)}{\sigma_1}{\rho}^{N_1} \app{\B(\H)}{\sigma_2}{\rho}^{N_2}/\app{\B(\H)}{\sigma}{\rho}^{N_1+ N_2} \sim N_1! N_2!/(N_1+N_2)! \eqno{(7.10)}$$ as the right hand side is the reciprocal of the number of such rearrangements. For $\A = {\cal Z}$ -- an Abelian algebra, property \link{propI}{I}, like (\link{eq7.7}{7.7}), is a provable statement concerning the asymptotic distribution of outcomes of repeated trials of a multinomial distribution and the justification just given can be seen to be sound. 
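\par As a consistency check on (\link{7ten}{7.10}), offered as a sketch in the simplest case rather than as part of the argument, suppose that $0 < x_1 < 1$ and that $\sigma_1 = \sum_{m \in I_1} s^1_m \sigma_m$ and $\sigma_2 = \sum_{m \in I_2} s^2_m \sigma_m$, where $I_1$ and $I_2$ are disjoint subsets of $\{1, \dots, M\}$ (notation introduced here only for illustration), the $\sigma_m$ belong to a common set $S$, and all the a priori probabilities involved are non-zero. Applying (\link{propI}{7.9}) to $\sigma_1$, to $\sigma_2$, and to $\sigma = \sum_{m \in I_1} x_1 s^1_m \sigma_m + \sum_{m \in I_2} x_2 s^2_m \sigma_m$, and using $\sum_{m \in I_i} s^i_m = 1$, gives $$\app{\B(\H)}{\sigma}{\rho} = x_1^{-x_1} x_2^{-x_2} \app{\B(\H)}{\sigma_1}{\rho}^{x_1} \app{\B(\H)}{\sigma_2}{\rho}^{x_2},$$ so that $$\app{\B(\H)}{\sigma_1}{\rho}^{N_1} \app{\B(\H)}{\sigma_2}{\rho}^{N_2}/\app{\B(\H)}{\sigma}{\rho}^{N_1+N_2} = (x_1^{x_1} x_2^{x_2})^{N_1+N_2}.$$ On the other hand, by Stirling's formula with $x_i = N_i/(N_1+N_2)$, $$\log (N_1! N_2!/(N_1+N_2)!) = (N_1+N_2)(x_1 \log x_1 + x_2 \log x_2) + O(\log (N_1+N_2)).$$ The two sides of (\link{7ten}{7.10}) therefore agree to leading exponential order, and the common value $x_1^{x_1} x_2^{x_2}$ per trial is manifestly independent of $\rho$. \par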
Indeed, with an obvious extension of the notation introduced in the discussion of (\link{eq7.7}{7.7}), the core of the justification is the simple fact that, when all the relevant numbers are integers, for $N = N_1+ N_2$ trials on the random variable $X$, the probability $$\displaylines{ \text{$\mathop{\rm Prob}$(For each $m$, $X = m$ in $N_1 s^1_m$ of the first $N_1$ trials and in $N_2 s^2_m$ of the last} \hcrh \text{$N_2$ trials, given that $X = m$ in ($N_1+N_2)(x_1 s^1_m+x_2 s^2_m$) of all the trials)} }$$ is independent of the underlying distribution of $X$. It is, perhaps, remarkable that this independence can be generalized to the non-Abelian situation, when one considers that on $\B(\H)$ a state $\sigma$ may split in many different ways, corresponding to $S^1 = \{|\psi^1_m\>\<\psi^1_m|: m = 1, \dots,M \}$ and $S^2 = \{|\psi^2_m\>\<\psi^2_m|: m = 1, \dots,M \}$ for many different mutually incompatible bases $(\psi^1_m)_{m=1}^M$ and $(\psi^2_m)_{m=1}^M$. Indeed, $S$-trials can be defined consistently for arbitrary sets of disjoint states. Property \link{propI}{I} generalizes further, applying for $\sigma_1$, $\sigma_2$, and $\rho$ normal states on any injective von Neumann algebra $\A$. An analogous justification can be given in this context. In particular, property \link{propI}{I} extends to normal states on $\B(\H)$ for $\H$ infinite-dimensional. However, such a generalization may fail if $\A$ is not an algebra; indeed, using \link{otre}{Donald (1987a)}, it can be seen to fail for suitable states when $\B$ is as given in example \link{ex7.8}{7.8}. For a non-algebra, the argument for $\rho$ independence fails because, in terms of \link{7.2,7.3}{7.3} and \link{7.5,7.6}{7.5}, the states on $\B(\H)$ compatible with $\rho$ which make $\sigma_1$ or $\sigma_2$ most likely may necessarily be different from the states which make $\sigma$ most likely. 
This means that the ratio $\app{\B}{\sigma_1}{\rho}^{N_1} \app{\B}{\sigma_2}{\rho}^{N_2}/\app{\B}{\sigma}{\rho}^{N_1 + N_2}$ cannot be interpreted as a relative probability because the trials for the numerator cannot be interpreted as trials on the same unique state extension as the trials for the denominator. On the other hand, for any injective von Neumann algebra $\A$, there exists a map (a conditional expectation) $\varepsilon : \B(\H) \rightarrow \A$ with the property that, for any pair of states on $\A$, $\app{\A}{\sigma}{\rho} = \app{\B(\H)}{\sigma\circ\varepsilon}{\rho\circ\varepsilon}$. This implies that whatever state $\sigma$ we are collapsing to, we can deal with the unique extension $\rho\circ\varepsilon$ of $\rho$. This mathematics is compatible with, and may be taken to explain, the common experience that physical subsystems defined on algebras can be treated as complete and independent entities. The point about property \link{propI}{I}, to be shown in the \link{8.6,8.7}{next} section, is that it is sufficient, in combination with properties \link{propBC}{C}, \link{propDE}{D}, and \link{propDE}{E}, to yield a complete definition of app on any finite-dimensional algebra. This definition is compatible with everything else which has been claimed for app and, in particular, with (\link{7.5,7.6}{7.5}), (\link{eq7.7}{7.7}), (\link{propI}{7.9}), and (\link{7ten}{7.10}). (\link{7.5,7.6}{7.5}) will give app on an arbitrary set $\B \subset \B(\H)$ once we have app on $\B(\H)$. Thus to complete our definition, we only need some method for extending from $\B(\H)$ with $\H$ finite-dimensional to the infinite-dimensional case. This is largely a technical problem. The properties already assumed are sufficient to define $\app{\B(\H)}{\sigma}{\rho}$ for a large set of pairs $\sigma$ and $\rho$ even if $\H$ is infinite-dimensional. 
For example, if $\sigma = \sum_{j=1}^J s_j|\varphi_j\>\<\varphi_j|$ and $\rho = \sum_{i=1}^I r_i |\psi_i\>\<\psi_i|$ where $(\varphi_j)_{j=1}^J$ and $(\psi_i)_{i=1}^I$ are both contained in a single finite-dimensional subspace $\H_1$ of $\H$, then, by (\link{7.5,7.6}{7.5}), $\app{\B(\H)}{\sigma}{\rho} = \app{\B(\H_1)}{\sigma|_{\B(\H_1)}}{\rho|_{\B(\H_1)}}$, as the extensions to $\B(\H)$ of $\sigma|_{\B(\H_1)}$ and $\rho|_{\B(\H_1)}$ are unique. In fact, the set of pairs $(\sigma, \rho)$ for which $\app{\B(\H)}{\sigma}{\rho}$ is already defined is dense in the w*-topology. Thus it is possible to extend our definition by imposing a continuity property on app. \pdfproclaim{Property J}{}{propJ} Let $((\sigma_\alpha, \rho_\alpha))_{\alpha\in I}$ be a net of pairs of states on some set $\B \subset \B(\H)$ with $\sigma_\alpha(B) \rightarrow \sigma(B)$ and $\rho_\alpha(B) \rightarrow \rho(B)$ for all $B \in \B$. Then $$\app{\B}{\sigma}{\rho} \geq \limsup_{\alpha\in I} \app{\B}{\sigma_\alpha}{\rho_\alpha}.$$ \endproclaim Note that a net is a generalization of a sequence (see \link{HeppRS}{Reed and Simon (1972}, \parasign IV.2)). Property \link{propJ}{J} is referred to as ``w* upper semicontinuity''. For a justification, suppose $\alpha$ to parametrize a small perturbation of a physical situation. A circumstance in which $\app{\B}{\sigma_\alpha}{\rho_\alpha} > \app{\B}{\sigma}{\rho}$ for all $\alpha \in I$, in other words in which property \link{propJ}{J} is violated, would mean that one could be arbitrarily close to $(\sigma, \rho)$ with a given a priori probability but that the limit point itself would be strictly less likely. We then make the assumption that the physical distinguishability of two states is determined entirely by the values those states take on bounded operators. 
As will be discussed in connection with example \link{ex9.8}{9.8}, this assumption may not always be appropriate, but no substitute seems sufficiently general to give a natural alternative to property \link{propJ}{J}. Anyway, under this assumption, however good our technology, there is always a point $(\sigma_\alpha, \rho_\alpha)$ so close to $(\sigma, \rho)$ that it would be physically impossible to distinguish the two points. This means that, allowing for arbitrarily small perturbations, $\limsup_{\alpha\in I} \app{\B}{\sigma_\alpha}{\rho_\alpha}$ would be a maximal measure of likelihood for a situation which would inevitably be described as $(\sigma, \rho)$. That measure, however, is precisely what $\app{\B}{\sigma}{\rho}$ is supposed to describe. If such a situation is never to arise, then property \link{propJ}{J} must hold. The contrary situation in which we have a net such that $\app{\B}{\sigma}{\rho} > \newline \limsup_{\alpha\in I} \app{\B}{\sigma_\alpha}{\rho_\alpha}$ does arise under the proposed definition. In such a situation, the pair $(\sigma, \rho)$ is strictly more likely than some arbitrarily close neighbours. This just means that small perturbations from $(\sigma, \rho)$ on to $((\sigma_\alpha, \rho_\alpha))_{\alpha\in I}$ are unlikely. An example is as follows: \pdfeject \pdfproclaim{Example 7.11}{}{ex7.11} Let $|\varphi\>\<\varphi|$ and $|\psi\>\<\psi|$ be pure states on $\B(\H)$. $$\eqalign{\text{Then} \quad \app{\B(\H)}{|\varphi\>\<\varphi|}{|\psi\>\<\psi|} &= 1 \cr &=0 \cr} \quad \eqalign{ \text{if } \varphi &= \psi \cr \text{if } \varphi &\ne \psi.}$$ \endproclaim This might, at first, seem surprising or even undesirable, but it can be proved just by property \link{propBC}{C} and by applying (\link{7.5,7.6}{7.6}) and (\link{eq7.7}{7.7}) to the algebra generated by the projection onto $\psi$. 
What it says is that if one has information about all the observables in $\B(\H)$ then, although it is possible to mistake a state which is a mixture for one of its components, it is not possible to make any mistake about a pure state. This indicates the importance of the set $\B$ and shows that, under the present theory, coarse-graining is necessary for non-trivial measurement. Once again, we must turn to (\link{eq2.2}{2.2}) for the role of the squared amplitude $|\<\psi|\varphi\>|^2$. W* upper semicontinuity, unlike continuity, is not sufficient by itself to give a unique extension of a definition from a w* dense subset to the set of all pairs of states. The problem is, for example, the possibility of isolated pairs at which app jumps up. Nevertheless, there is a unique minimal w* upper semicontinuous extension. For precisely this extension, the supremum in (\link{7.5,7.6}{7.5}) can be approximated with arbitrary accuracy by near-by states satisfying constraints like, for example, normality or finite expected energy, which we might wish to impose on physical grounds. Thus, let ${\cal D}$ be a dense vector subspace of $\H$ and define sets of states on $\B(\H)$ by $S({\cal D}) = \{\sigma : \sigma = \sum_{j=1}^J s_j|\varphi_j\>\<\varphi_j|$ for $J$ finite and $\varphi_j \in {\cal D}\}$ and on $\B$ by $S_\B({\cal D}) = \{\sigma : \sigma = \sigma'|_\B$ for some $\sigma' \in S({\cal D})\}$. \pdfproclaim{Property K}{}{propK} For any pair $(\sigma, \rho)$ of states on $\B$ and any dense vector subspace ${\cal D} \subset \H$, there exist nets $(\sigma_\alpha)_{\alpha\in I}, (\rho_\alpha)_{\alpha\in I} \subset S_\B({\cal D})$ such that $\sigma_\alpha(B) \rightarrow \sigma(B)$ and $\rho_\alpha(B) \rightarrow \rho(B)$ for all $B \in \B$ and such that $\app{\B}{\sigma_\alpha}{\rho_\alpha}\rightarrow \app{\B}{\sigma}{\rho}$. 
\endproclaim \pdfproclaim{8.\quad The definition of the a priori probability function.}{}{sec 8} \endproclaim The unique function satisfying the properties proposed in the \link{sec 7}{previous} section can be defined as follows: $$\app{\B}{\sigma}{\rho} = \exp\{ \ent{\B}{\sigma}{\rho} \} \eqno{(8.1)}$$ where $\ent{\B}{\sigma}{\rho}$ satisfies $$\displaylines{ \rlap{8.2)}\hfill \ent{\B(\H)}{\sigma}{\rho} = \sum_{i,j} (-s_j \log s_j + s_j \log r_i + s_j - r_i) |\<\varphi_j|\psi_i\>|^2 \hcr = \tr(-\sigma \log \sigma + \sigma \log \rho) }$$ for $\sigma = \sum_j s_j |\varphi_j\>\<\varphi_j|$ and $\rho = \sum_i r_i |\psi_i\>\<\psi_i|$ normal states on $\B(\H)$ expanded in orthonormal eigenvectors. $$\displaylines{ \rlap{8.3)} \hfill \ent{\B(\H)}{\sigma}{\rho} = \inf\{ F(\sigma, \rho) : F \text{ is w* upper semicontinuous, concave, and given} \crh \text{ by \link{sec 8}{8.2} for $\sigma$ and $\rho$ normal} \}. }$$ $$\ent{\B}{\sigma}{\rho} = \sup\{ \ent{\B(\H)}{\sigma'}{\rho'} : \sigma'|_\B = \sigma \text{ and } \rho'|_\B = \rho \}. \leqno{8.4)}$$ \medskip $\ent{\B}{\sigma}{\rho}$ is referred to as the relative entropy of $\sigma$ with respect to $\rho$ on the set $\B$. This function was introduced in \link{otre}{Donald (1986)} as a generalization of a widely-studied function defined for $\B$ an algebra by \link{Araki}{Araki (1976)}, who, in turn, was generalizing ideas of \link{Strat}{Umegaki (1962)} and \link{HeppRS}{Lindblad (1974)} about quantum information theory. Two alternative derivations of (\link{sec 8}{8.2}) have been given by \link{HeppRS}{Petz (preprint)} and \link{HeppRS}{Hiai and Petz (1991)}. These authors show that ent on finite-dimensional algebras satisfies two striking mathematical properties either of which could, perhaps, be used as the basis of an interpretation in place of property \link{propI}{I}. 
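\par As a consistency check on (\link{sec 8}{8.2}), given here only as a sketch under the additional assumption that $\sigma$ and $\rho$ commute, suppose that the two states share the eigenvectors $\varphi_j = \psi_j$, so that $|\<\varphi_j|\psi_i\>|^2 = \delta_{ij}$. Then (\link{sec 8}{8.2}) reduces to $$\ent{\B(\H)}{\sigma}{\rho} = \sum_j (-s_j \log s_j + s_j \log r_j + s_j - r_j) = - \sum_j s_j \log (s_j/r_j),$$ because $\sum_j s_j = \sum_j r_j = 1$. With $s_j$ and $r_j$ read as $\sigma(P_j)$ and $\rho(P_j)$, this has the same form as the exponent appearing in (\link{eq7.7}{7.7}). \par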
Equation \link{eq7.7}{7.7} is the exponential of the (negative) ``cross-entropy'', which has been extensively used for the solution of inference problems in classical probability. A review of this work and an axiomatic derivation are given by \link{RoosShore}{Shore and Johnson (1980, 1981)}. The mathematical properties of $\ent{\B}{\sigma}{\rho}$ have been studied at length in \link{otre}{Donald (1986, 1987a)}. These papers rely heavily on the work initiated by \link{Araki}{Araki}. Direct proofs, or reference to proofs, can be found there for many of the claims made in the \link{sec 7}{previous section}. In proving the remaining claims, in this entirely technical section, familiarity will be assumed with those papers. \proclaim{proof of Property \link{propDE}{E}}{} By \link{otre}{Donald (1987a}, lemma 2.11), $\ent{\B}{\rho_a}{\rho} \geq \log p_a$. But by (\link{7.5,7.6}{7.6}) and (\link{eq7.7}{7.7}) applied to the algebra generated by $Q_a$, \name{8.5} $$\displaylines{ \ent{\B}{\rho_a}{\rho} \hcr \leq - \rho_a(Q_a) \log \rho_a(Q_a)/\rho(Q_a) - \rho_a(1-Q_a) \log \rho_a(1-Q_a)/\rho(1-Q_a) \hfill \llap{(8.5)} \cr \sim \log p_a. \hfill }$$ Deriving from (\link{8.5}{8.5}) the upper bound presented in \link{5.1}{5.1} is a lengthy but prosaic exercise in mathematical analysis. This will be omitted here as that bound was only presented in order to give an explicit meaning to the symbol $\sim$ in property \link{propDE}{E}. \hfill$\blacksquare$ \endproclaim \pdfproclaim{Proposition 8.6}{\sl}{8.6,8.7} Properties \link{propBC}{C}, \link{propDE}{D}, \link{propDE}{E}, and \link{propI}{I} define a unique function (given by (\link{sec 8}{8.1})) on a finite-dimensional algebra $\A$. \endproclaim \proof Attention will be restricted to the case of $\A = \B(\H)$ for $\H$ finite-dimensional. Other cases may be handled similarly using the structure theory for finite-dimensional algebras (\link{Strat}{Takesaki (1979}, \parasign I.11)). 
Let ${\cal D}(\H)$ be the set of pairs of states on $\B(\H)$ on which any function satisfying properties \link{propBC}{C}, \link{propDE}{D}, \link{propDE}{E}, and \link{propI}{I} agrees with the function given by (\link{sec 8}{8.1}). That function does indeed satisfy these properties on all pairs. All that is required is to show that ${\cal D}(\H)$ is the set of all pairs of states. Let $\mathop{\rm A}\nolimits(\sigma\,|\,\rho)$ denote an arbitrary function satisfying \link{propBC}{C}, \link{propDE}{D}, \link{propDE}{E}, and \link{propI}{I}, and let $\mathop{\rm E}\nolimits(\sigma\,|\,\rho) = \log \mathop{\rm A}\nolimits(\sigma\,|\,\rho)$. By taking logarithms and using property \link{propBC}{C}, property \link{propI}{I} implies that $$x_1 \mathop{\rm E}\nolimits(\sigma_1\,|\,\rho) + x_2 \mathop{\rm E}\nolimits(\sigma_2\,|\,\rho) = \mathop{\rm E}\nolimits(\sigma\,|\,\rho) + x_1\mathop{\rm E}\nolimits(\sigma_1\,|\,\sigma) + x_2 \mathop{\rm E}\nolimits(\sigma_2\,|\,\sigma) \eqno{(8.7)}$$ for all $\sigma_1$, $\sigma_2$, and $\rho$, where $\sigma = x_1 \sigma_1+ x_2 \sigma_2$. \proclaim{lemma 8.8}{\sl} Let $(\psi_n)_{n=1}^N$ be an orthonormal basis for $\H$ and $\sigma = \sum\limits_{i=1}^N s_i |\psi_i\>\<\psi_i|$, $\rho = \sum\limits_{i=1}^N r_i |\psi_i\>\<\psi_i|$. Then $(\sigma, \rho) \in {\cal D}(\H)$. \endproclaim \proof By property \link{propDE}{E}, $\mathop{\rm E}\nolimits(|\psi_i\>\<\psi_i|\,|\,\rho) = \log r_i$. A unique value is then given to $\mathop{\rm E}\nolimits(\sigma\,|\,\rho)$ by induction on (\link{8.6,8.7}{8.7}). \hfill $\blacksquare$ The proof of Proposition \link{8.6,8.7}{8.6} now follows \link{otre}{Donald (1987a}, \parasign 3). \hfill $\blacksquare$ \pdfproclaim{lemma 8.9}{\sl}{l8.9} Let $\A$ be a finite-dimensional algebra. Let $(P_m)_{m=1}^M \subset \A$ be a sequence of orthogonal projections and $S = \{\sigma_m: m = 1, \dots,M\}$ be a set of states such that $\sigma_m(P_m) = 1$. 
Then, for any state $\rho$ on $\A$, $\sum\limits_{m=1}^M \app{\A}{\sigma_m}{\rho} \leq 1$ with equality if and only if $\rho$ is in the linear span of $S$. \endproclaim \proof Note that \link{otre}{Donald (1986)} and \link{Araki}{Araki (1977}, theorem 3.6) give (\link{propI}{7.9}) in its logarithmic form: $$\ent{\A}{\sigma}{\rho} = \sum_{m=1}^M (s_m \ent{\A}{\sigma_m}{\rho} - s_m \log s_m) \text{ for } \sigma = {\textstyle \sum\limits_{m=1}^M} s_m \sigma_m. \eqno{(8.10)}$$ Write $a_m = \app{\A}{\sigma_m}{\rho}$ and $\alpha = \sum\limits_{m=1}^M a_m$. Without loss of generality suppose that $\alpha > 0$. Then, setting $b_m = a_m/\alpha$, (\link{l8.9}{8.10}) gives $$\ent{\A}{{\textstyle \sum\limits_{m=1}^M} b_m \sigma_m}{\rho} = \sum_{m=1}^M (b_m \log a_m - b_m \log b_m) = \log \alpha.$$ By property \link{propBC}{C}, $\log \alpha \leq 0$ (which is $\sum\limits_{m=1}^M \app{\A}{\sigma_m}{\rho} \leq 1$) and equality is attained only if $\rho = \sum\limits_{m=1}^M b_m \sigma_m$. On the other hand, if $\rho = \sum\limits_{m=1}^M s_m \sigma_m$ for arbitrary $(s_m)_{m=1}^M$ then $$0 = \ent{\A}{{\textstyle \sum\limits_{m=1}^M} s_m \sigma_m}{\rho} = \sum_{m=1}^M (s_m \log a_m - s_m \log s_m) \leq \sum_{m=1}^M (a_m - s_m),$$ so that $\sum\limits_{m=1}^M a_m \geq 1$. This contradicts what has just been proved unless $\sum\limits_{m=1}^M a_m = 1$, so that equality does hold whenever $\rho$ is in the linear span of $S$. \hfill $\blacksquare$ \proclaim{proof of Property \link{propK}{K}}{} Let $(\psi_n)_{n=1}^\infty$ be an orthonormal basis for ${\cal D}$. This is constructable by Gram-Schmidt orthogonalization. Let $P_N$ be the projection onto the space spanned by $(\psi_n)_{n=1}^N$. Note that $P_N \mathop{\rightarrow}\limits^s 1$. 
Use axiom four of \link{otre}{Donald (1986)} to find states $\sigma'$ and $\rho'$ on $\B(\H)$ satisfying $\sigma'|_\B = \sigma$ and $\rho'|_\B = \rho$ and a net $((\sigma'_\beta , \rho'_\beta ))_{\beta \in J}$ of pairs of normal states on $\B(\H)$ such that $((\sigma'_\beta , \rho'_\beta ))_{\beta \in J} \mathop{\rightarrow}\limits^{w^*} (\sigma', \rho')$ and $\lim_{\beta \in J} \app{\B(\H)}{\sigma'_\beta }{\rho'_\beta } = \app{\B(\H)}{\sigma'}{\rho'} = \app{\B}{\sigma}{\rho}$. Choose $N_\beta$ such that $N \geq N_\beta \implies \sigma'_\beta (P_N) > 0$ and $\rho'_\beta (P_N) > 0$, and, for $N \geq N_\beta$ define $\sigma''_{(N,\beta)} = P_N \sigma'_\beta P_N/\sigma'_\beta (P_N)$, $\rho''_{(N,\beta)} = P_N \rho'_\beta P_N/\rho'_\beta (P_N)$. Note that $\sigma''_{(N,\beta)}$ and $\rho''_{(N,\beta)} \in S({\cal D})$. By \link{otre}{Donald (1987a}, lemma 2.5), as $N \rightarrow \infty$, $(\sigma''_{(N,\beta)} , \rho''_{(N,\beta)}) \mathop{\rightarrow}\limits^{w^*} (\sigma'_\beta , \rho'_\beta )$ and \newline $\app{\B(\H)}{\sigma''_{(N,\beta)}}{\rho''_{(N,\beta)}} \rightarrow \app{\B(\H)}{\sigma'_\beta}{\rho'_\beta}$. Now given $\varepsilon > 0$ and a w*-open neighbourhood $U$ of $(\sigma',\rho')$ we can choose $\beta$ such that $(\sigma'_\beta , \rho'_\beta) \in U$ and $|\app{\B(\H)}{\sigma'_\beta}{\rho'_\beta} - \app{\B(\H)}{\sigma'}{\rho'}| < \varepsilon/2$. Then we can choose $N \geq N_\beta$ and $(\sigma''_{(N,\beta)}, \rho''_{(N,\beta)}) \in U$ with $$|\app{\B(\H)}{\sigma''_{(N,\beta)}}{\rho''_{(N,\beta)}} - \app{\B(\H)}{\sigma'_\beta}{\rho'_\beta}| < \varepsilon/2.$$ This is sufficient to yield the required property. \hfill $\blacksquare$ \endproclaim \pdfproclaim{9.\quad A priori probability for a succession of collapses.}{}{sec 9} \endproclaim In this section, mathematics arising from properties \link{propA}{A} and \link{propBC}{B} of section \link{sec 7}{7} and from postulate \link{posEightfin}{eight}, is analysed. 
The most important result in this section is proposition \link{l9.10,p9.11}{9.11} which confirms the claimed consequences of postulate \link{posNine}{nine}. From (\link{sec 8}{8.1}), $\prod_{m=1}^M \app{\B}{\sigma_m}{\sigma_{m-1}} = \exp\{ \sum_{m=1}^M \ent{\B}{\sigma_m}{\sigma_{m-1}}\}$ so we shall also be considering sums of relative entropies. As far as property \link{propA}{A} is concerned, we need to compare $\prod_{m=1}^M \app{\B}{\sigma_m}{\sigma_{m-1}}$ with fixed $\B$ to $\prod_{m=1}^M \app{\B_m}{\sigma_m}{\sigma_{m-1}}$ with an increasing sequence of sets $\B_m$. A function of the latter form does not give an appropriate definition for a priori probability, because of the role of $\sigma_0$ in giving the only input from the global initial state of postulate \link{posFive}{five} (section \link{sec 4}{four}). In the extreme case, putting $\B_1 = \{1\}$ removes all input from $\omega$. More generally, note that $\prod_{m=1}^M \app{\B_m}{\sigma_m}{\sigma_{m-1}} = 1$ if $\sigma_m|_{\B_m} = \sigma_{m-1}|_{\B_m}$, which implies only that $\sigma_m|_{\B_1} = \sigma_0|_{\B_1}$ for $m = 1, \dots, M$. $\prod_{m=1}^M \app{\B}{\sigma_m}{\sigma_{m-1}} = 1$ only if $\sigma_m|_\B = \sigma_0|_\B$ for $m = 1, \dots,M$. It is immediate from the results for $\ent{\B}{\sigma}{\rho}$ that $\sum_{m=1}^M \ent{\B}{\sigma_m}{\sigma_{m-1}}$ has properties of monotonicity, concavity, w* upper-semicontinuity, non-positivity, non-triviality, the Uhlmann-Lindblad inequality, and Araki's property corresponding to those described in \link{otre}{Donald (1986)} for $\ent{\B}{\sigma}{\rho}$. The following proposition corresponds to Theorem 4.4 of \link{otre}{Donald (1987a)} and has a similar proof. We introduce the notation $\Sigma^*(\A)^{\times M}$ (resp. $\Sigma_*(\A)^{\times M}$) for the set of $M$-tuples of states (resp. normal states) on an algebra $\A$. \pdfproclaim{Proposition 9.1}{\sl}{Prop9.1} Let $\omega$ be a faithful normal state on an injective von Neumann algebra $\A$. 
Let $K \subset \Sigma^*(\A)^{\times M}$ be a w*-closed convex set with non-empty interior. Then there is a unique element $(\tilde \sigma(K,\omega)_m)_{m=1}^M \in K$ which attains \newline $\sup\{ \app{\B}{(\sigma_m)_{m=1}^M}{\omega} : (\sigma_m)_{m=1}^M \in K \}$. Moreover, $(\tilde \sigma(K,\omega)_m)_{m=1}^M \in \Sigma_*(\A)^{\times M}$. \hfill $\blacksquare$ \endproclaim This result is very satisfactory mathematically, and may be useful elsewhere, but, in the context of this paper, it is irrelevant in several different ways. In postulate \link{posEightfin}{eight}, an inductive sequence of suprema is considered rather than a single supremum over the entire set of sequences. Such an inductive sequence is physically more appropriate and helps to avoid the problem to be raised by example \link{ex9.3}{9.3}. Under the conditions of proposition \link{Prop9.1}{9.1}, Theorem 4.4 of \link{otre}{Donald (1987a)} can also be applied to show that such an inductive definition will lead to a unique best sequence of states in $K \subset \Sigma^*(\A)^{\times M}$. This too is irrelevant here since, because of property \link{3.9}{3.9}, $\B_{SM}$ cannot be assumed to be an algebra. Section 5 of \link{otre}{Donald (1987a)} demonstrates that on non-algebras uniqueness of suprema-attaining states cannot be claimed. Finally, no variant of proposition \link{Prop9.1}{9.1} is directly relevant because, as will be explained in example \link{ex9.8}{9.8}, the set ${\cal N}_T$ of section \link{sec 4}{4} cannot be assumed to be w*-closed. It is essential to the interpretation of $\app{\B}{(\sigma_m)_{m=1}^M}{\omega}$ that it should establish a non-trivial correlation between $\omega$ and $\sigma_M$. 
That it does is shown by the following extension of the non-triviality property: \proclaim{Proposition 9.2}{\sl} $$\sum_{m=1}^M \ent{\B}{\sigma_m}{\sigma_{m-1}} \leq \inf\{ -|\sigma_M(A)-\sigma_0(A)|^2/(2M||A||^2) : A \in \B \}.$$ \endproclaim \proof By concavity and property f of \link{otre}{Donald (1986)}, for all $A \in \B$, $$\displaylines{ \sum_{m=1}^M \ent{\B}{\sigma_m}{\sigma_{m-1}} = M( \sum_{m=1}^M 1/M \ent{\B}{\sigma_m}{\sigma_{m-1}}) \hcr \leq M \ent{\B}{1/M\, \sigma_M + 1/M \sum_{m=1}^{M-1} \sigma_m}{1/M \, \sigma_0 + 1/M \sum_{m=1}^{M-1} \sigma_m} \hcr \leq -M |1/M \, \sigma_M(A) - 1/M \, \sigma_0(A)|^2/(2||A||^2). \hfill \blacksquare }$$ \medskip The $M$ dependence in this result is important. The next example shows that by interpolating specified sequences of arbitrarily many ``collapses'' between fixed $\sigma_0$ and $\sigma_M$ we can make $\app{\B}{(\sigma_m)_{m=1}^M}{\omega}$ arbitrarily close to one. This is analogous for the present theory to the ``quantum Zeno paradox'' of the conventional interpretation (see \link{otre}{Exner (1985}, \parasign 2.4)). However, in the present theory the paradox is avoided, partly because $M$, the number of collapses, is forced to be finite by the natural device of linking it to a sequence of real changes in the brain of the observer, and partly because the inductive definition (\link{posEightfin}{4.6}) is such as to disallow a sequence of collapses precisely adjusted towards some future goal. In the conventional interpretation, with no unambiguous definition of measurement, there can be no telling how many measurements take place. \pdfproclaim{example 9.3}{\sl}{ex9.3} Let $\varphi, \psi \in \H$ be orthogonal and normalized vectors. Let $\sigma_m = {1\over2}(1 + {m\over M})|\varphi\>\<\varphi| + {1\over2} (1 - {m\over M}) |\psi\>\<\psi|$, so that $\sigma_0 = {1\over2}|\varphi\>\<\varphi| + {1\over2}|\psi\>\<\psi|$ and $\sigma_M = |\varphi\>\<\varphi|$. 
Then $\app{\B(\H)}{(\sigma_m)_{m=1}^M}{\sigma_0} \rightarrow 1$ as $M \rightarrow \infty$. \endproclaim \proof By (\link{sec 8}{8.2}), $$\displaylines{ \sum_{m=1}^M \ent{\B(\H)}{\sigma_m}{\sigma_{m-1}} \hcrh = -{\textstyle{1\over 2}} \sum_{m=1}^M\{(1 + {m\over M}) \log({1 + m/M \over 1 + m/M - 1/M}) + (1 - {m\over M}) \log({1 - m/M \over 1 - m/M + 1/M})\}. }$$ Set $x = {m\over M}$ and consider the sum as the integral of a step function on the interval $[0, 1]$. $\log y \leq y -1$ for $y \geq 0$, so, for $0 \leq x \leq 1$ and $M \geq 2$, $$\displaylines{ 0 \leq M\{(1 + x ) \log({1+x \over 1 + x - 1/M})\ +\ (1 - x ) \log({1 - x \over 1 - x + 1/M})\} \hcr\leq {1+x \over 1 + x - 1/M} - {1 - x \over 1 - x + 1/M} = {2 \over M(1 + x - 1/M)(1 - x + 1/M)} \leq 4. \hfill }$$ The dominated convergence theorem gives the result. \hfill \blacksquare \proclaim{example 9.4}{} Let $(\psi_n)_{n\geq1}$ be an orthonormal basis for $\H$ and let $$\textstyle\sigma_2 = \sum\limits_{n=1}^\infty 6/n^2\pi^2 |\psi_n\>\<\psi_n|, \ \sigma_1 = \sum\limits_{n=1}^\infty 90/n^4\pi^4 |\psi_n\>\<\psi_n|, \ \sigma_0 = \sum\limits_{n=1}^\infty 1/2^n |\psi_n\>\<\psi_n|.$$ Then $\app{\B(\H)}{(\sigma_m)_{m=1}^2}{\sigma_0} > 0$ while $\app{\B(\H)}{\sigma_2}{\sigma_0} = 0$. \hfill \blacksquare \endproclaim This should drive home the point that, under the proposed definition of a priori probability, it can be more probable to go from $\sigma_0$ to $\sigma_2$ via suitable $\sigma_1$ rather than directly. \medskip Turn now to the mathematical analysis of postulate \link{posEightfin}{eight}. The notation will be variously simplified, for example, by writing app$(k)$ (resp.~app$^0(k)$) \newline for app$({\cal N}_T, \B_{SM}, k, \omega)$ (resp.~app$^0({\cal N}_T, \B_{SM}, k, \omega))$ and by not mentioning $\B_{SM}$. It will always be assumed that ${\cal N}^M \ne \emptyset$. 
\pdfproclaim{example 9.5}{}{ex9.5} Let ${\cal N}^3 = \{ (\sigma^n_i )_{i=1}^3: n \geq 1 \} \cup \{(\rho_i)_{i=1}^3\}$ where the states $\omega$, $\rho_i$, and $\sigma^n_i$ are chosen to satisfy $\app{}{\sigma^n_1}{\omega} = {1\over2} - 1/2^n$, $\app{}{\rho_1}{\omega} = {1\over2}$, $\app{}{\sigma^n_2}{\sigma^n_1} = {1\over2}$, $\app{}{\rho_2}{\rho_1} = {1\over4}$, $\app{}{\sigma^n_3}{\sigma^n_2} = {1\over 8}$, and $\app{}{\rho_3}{\rho_2} = {1\over2}$. Then $\mathop{\rm app}(1) =\mathop{\rm app}^0(1) = {1\over2}$, $\mathop{\rm app}(2) = {1\over4} > {1 \over 8} =\mathop{\rm app}^0(2)$, and $\mathop{\rm app}(3) = {1 \over 32} < {1 \over 16} =\mathop{\rm app}^0(3)$. \hfill \blacksquare \endproclaim This example no doubt uses an entirely unphysical choice for ${\cal N}^3$. Although the preliminary version of postulate \link{posEightfin}{eight} does yield a non-trivial definition, that definition is unsatisfactory. At stage 2, the sequence $(\rho_i)_{i=1}^2$ is comparatively unlikely. Because of this the likelihood of $(\rho_i)_{i=1}^3$ at the next stage should be irrelevant. \proclaim{lemma 9.6}{\sl} For $\delta > 0$ and $1 \leq m \leq M$, $\widetilde{\cal N}_\delta^m$ is not empty and there exists a sequence $((\sigma^n_i )_{i=1}^M)_{n\geq1} \subset {\cal N}^M$ such that $\app{}{(\sigma^n_i )_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}(k)$ for $k = 1, \dots,M$. \endproclaim \proof There certainly exists a sequence $((\sigma_i^{1,n})_{i=1}^M)_{n\geq1} \subset {\cal N}^M$ such that \newline $\app{}{\sigma_1^{1,n}}{\omega} \rightarrow \mathop{\rm app}(1)$. Suppose then, that for some $m < M$ there exists a sequence $((\sigma_i^{m,n})_{i=1}^M)_{n\geq1} \subset {\cal N}^M$ such that $\app{}{(\sigma_i^{m,n})_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}(k)$ for $k = 1, \dots, m$. It follows that $\mathop{\rm app}(m+1)$ is well-defined and that, for $\delta > 0$, $\widetilde{\cal N}_\delta^m \ne \emptyset$. 
For $N \geq 1$, choose a sequence $((\rho_i^{N,n} )_{i=1}^M)_{n\geq1} \subset {\cal N}^M$ such that for $k = 1, \dots, m$, $\app{}{(\rho_i^{N,n})_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}(k)$ and such that \newline $\limsup_{n\rightarrow\infty} \app{}{(\rho_i^{N,n})_{i=1}^{m+1}}{\omega} \geq \mathop{\rm app}(m+1) - 1/(2N)$. Then there exists $N_0$ such that $|\app{}{(\rho_i^{N,N_0})_{i=1}^k}{\omega} - \mathop{\rm app}(k)| \leq 1/N$ for $k = 1, \dots, m+1$. Set $(\sigma_i^{m+1, N} )_{i=1}^M = (\rho_i^{N,N_0})_{i=1}^M$. Then $((\sigma_i^{m+1, n})_{i=1}^M)_{n\geq1} \subset {\cal N}^M$ and $\app{}{(\sigma_i^{m+1, n})_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}(k)$ for $k = 1, \dots,m+1$. The lemma follows by induction. \blacksquare \pdfproclaim{lemma 9.7}{\sl}{l9.7} If ${\cal N}^M$ is w*-closed then $\mathop{\rm app}^0(m) = \mathop{\rm app}(m)$ for $1 \leq m \leq M$. \endproclaim \proof Let $((\sigma^n_i )_{i=1}^M)_{n\geq1} \subset {\cal N}^M$ be a sequence, as given in lemma \link{ex9.5}{9.6}, such that $\app{}{(\sigma^n_i )_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}(k)$ for $k = 1, \dots, M$. Let $((\sigma^\alpha_i )_{i=1}^M)_{\alpha\in I}$ be a w*-convergent subnet with $(\sigma^\alpha_i)_{i=1}^M \rightarrow (\tilde \sigma_i)_{i=1}^M$. $(\tilde \sigma_i)_{i=1}^M \in {\cal N}^M$ and so $(\tilde \sigma_i)_{i=1}^m \in {\cal N}^m$ for $1 \leq m < M$. By w* upper-semicontinuity, $\app{}{(\tilde \sigma_i)_{i=1}^k}{\omega} \geq \mathop{\rm app}(k)$ for $k = 1, \dots, M$. Clearly, $\app{}{\tilde \sigma_1}{\omega} = \mathop{\rm app}(1) =\mathop{\rm app}^0(1)$. Suppose, for some $m < M$, that $\app{}{(\tilde \sigma_i)_{i=1}^k}{\omega} = \mathop{\rm app}(k) =\mathop{\rm app}^0(k)$ for $k = 1, \dots, m$. 
The definitions of app and $\mathop{\rm app}^0$ then give $$\mathop{\rm app}(m+1) \geq\mathop{\rm app}\nolimits^0(m+1) \geq \app{}{(\tilde \sigma_i)_{i=1}^{m+1}}{\omega},$$ and, as $\app{}{(\tilde \sigma_i)_{i=1}^{m+1} }{\omega} \geq \mathop{\rm app}(m+1)$, an inductive proof is complete. \hfill \blacksquare \medskip This lemma is valuable in relating the two versions of postulate \link{posEightpv}{eight}, and it may be useful in computations. However, I do not expect it to be directly relevant to the set ${\cal N}_T$ of postulate \link{posFour}{four}: \pdfproclaim{example 9.8}{}{ex9.8} Mathematically, it is simplest to define closeness of states $\sigma$ and $\rho$ on a set $\B$ by the closeness of the values $\sigma(B)$ and $\rho(B)$ for $B \in \B$. However, physically, it is necessary also to consider unbounded operators. This arises implicitly, for example, in the differentiability requirement of Hypothesis V(2) of \link{otre}{Donald (1990)}. The present example shows that app can yield a satisfactory theory dealing with such operators, and suggests that, despite lemma \link{l9.7}{9.7}, it can be important not to take closures of sets of states. Let $(\psi_n)_{n\geq1}$ be an orthonormal basis for $\H$ and let $\rho = \sum_{n=1}^\infty 2^{-n} |\psi_n\>\<\psi_n|$. Let $H = \sum_{n=1}^\infty n|\psi_n\>\<\psi_n|$ and consider $H$ as an energy operator. A typical set of states, which we might wish to consider collapse to, is $K = \{ \sigma : 3 \leq \sigma(H) < \infty \}$. $\sup\{ \app{\B(\H)}{\sigma}{\rho} : \sigma \in K \} = 27/32$ and this supremum is uniquely attained at $\tilde\sigma = {1\over2} \sum_{n=1}^\infty (2/3)^n |\psi_n\>\<\psi_n|$. However, the closure of $K$ in the w*-topology (resp. the norm topology) is the set of all states (resp. all normal states) on $\B(\H)$. On these closures, the supremum would be $1$ and would be uniquely attained at $\rho$. 
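The value $27/32$ can be checked directly. What follows is only a sketch: it assumes that the supremum may be sought among states $\sigma = \sum_{n=1}^\infty s_n |\psi_n\>\<\psi_n|$ which are diagonal in the basis $(\psi_n)_{n\geq1}$, and it reads (\link{sec 8}{8.2}) for such states as $\ent{\B(\H)}{\sigma}{\rho} = -\sum_{n=1}^\infty s_n \log(2^n s_n)$. Since $\rho(H) = \sum_{n=1}^\infty n 2^{-n} = 2 < 3$, the constraint $\sigma(H) \geq 3$ is active, and a Lagrange multiplier argument (with a multiplier $\beta$ introduced only for this sketch) suggests $s_n \propto 2^{-n} e^{\beta n}$, that is, $s_n = c\, q^n$ for constants $c$ and $q$. Normalization and $\sigma(H) = 3$ then require $$c\,q/(1-q) = 1 \quad\hbox{and}\quad c\,q/(1-q)^2 = 3, \quad\hbox{so that}\quad q = 2/3 \hbox{ and } c = {1\over2}.$$ With $\tilde s_n = {1\over2}(2/3)^n$, so that $2^n \tilde s_n = {1\over2}(4/3)^n$, $$\displaylines{ \ent{\B(\H)}{\tilde\sigma}{\rho} = -\sum_{n=1}^\infty \tilde s_n \bigl(n \log(4/3) - \log 2\bigr) \hcrh = \log 2 - 3 \log(4/3) = \log(27/32), }$$ and exponentiating gives $\app{\B(\H)}{\tilde\sigma}{\rho} = 27/32$, in agreement with the value stated.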
\hfill \blacksquare \endproclaim \proclaim{lemma 9.9}{\sl} Suppose that ${\cal N}(1)^M \subset {\cal N}(2)^M$ and that, for some $m \leq M$, \newline $\mathop{\rm app}({\cal N}(1), m) \ne \mathop{\rm app}({\cal N}(2), m)$. Then there exists $m' \leq m$ such that $\mathop{\rm app}({\cal N}(1), k) = \mathop{\rm app}({\cal N}(2), k)$ for $k = 1, \dots, m'-1$ and $\mathop{\rm app}({\cal N}(1), m') < \mathop{\rm app}({\cal N}(2), m')$. \endproclaim \proof Suppose not. Then there would exist $m' \leq m$ such that $\mathop{\rm app}({\cal N}(1), k) = \mathop{\rm app}({\cal N}(2), k)$ for $k = 1, \dots, m'-1$ and $\mathop{\rm app}({\cal N}(1), m') > \mathop{\rm app}({\cal N}(2), m')$. Let $((\sigma^n_i )_{i=1}^M)_{n\geq1} \subset {\cal N}(1)^M$ be a sequence given by lemma \link{ex9.5}{9.6} such that \newline $\app{}{(\sigma^n_i )_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}({\cal N}(1), k)$ for $k = 1, \dots, M$. Then $\app{}{(\sigma^n_i )_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}({\cal N}(2), k)$ for $k = 1, \dots, m'-1$. The definition of $\mathop{\rm app}({\cal N}(2), m')$ requires that \newline $\mathop{\rm app}({\cal N}(2), m') \geq \limsup_{n\rightarrow\infty} \app{}{(\sigma^n_i )_{i=1}^{m'}}{\omega} = \mathop{\rm app}({\cal N}(1), m')$. This contradiction proves the result. \hfill \blacksquare \medskip This lemma emphasizes the absolute priority given by the definitions to the maximization of a priori probability at an early stage over its maximization at a later stage. Similar results can be proved for decreasing sets $\B$ and, as exemplified by example \link{ex9.5}{9.5}, if $\mathop{\rm app}(m) \ne\mathop{\rm app}^0(m)$. \pdfproclaim{lemma 9.10}{\sl}{l9.10,p9.11} For $\delta > 0$, let $(\sigma^\delta_i )_{i=1}^m \in \widetilde{\cal N}_\delta^m$. Then, as $\delta \rightarrow 0$, $\app{}{(\sigma^\delta_i )_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}(k)$ for $k = 1, \dots, m$.
\endproclaim \proof By definition $\limsup_{\delta\rightarrow 0} \app{}{(\sigma^\delta_i )_{i=1}^k}{\omega} \geq \mathop{\rm app}(k)$ for $k = 1, \dots, m$. If the result does not hold then there exists $m' \leq m$ such that $\app{}{(\sigma^\delta_i )_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}(k)$ for $k = 1, \dots,m'-1$, while $\limsup_{\delta\rightarrow 0} \app{}{(\sigma^\delta_i )_{i=1}^{m'}}{\omega} > \mathop{\rm app}(m')$. This contradicts the definition of $\mathop{\rm app}(m')$. \hfill \blacksquare \proclaim{Proposition 9.11}{\sl} (\link{6.2,6.3}{6.2}) and (\link{eq6.4}{6.4}) hold under the conditions of postulate \link{posNine}{nine}. \endproclaim \proof Use the method of lemma \link{ex9.5}{9.6} to find a sequence $((\sigma^n_m)_{m=1}^{M+1})_{n\geq1}$ with \newline $((\sigma^n_m)_{m=1}^M)_{n\geq1} \subset {\cal N}^M$ and $(\sigma^n_{M+1})_{n\geq1} \subset \Sigma$ such that $\app{\B_{SM}}{(\sigma^n_i )_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}({\cal N}_T, \B_{SM}, k, \omega)$ for $k = 1, \dots, M$, and $\app{\B_{SM}}{(\sigma^n_m)_{m=1}^{M+1}}{\omega} \rightarrow \mathop{\rm app}\nolimits(O, T, \B_S, \Sigma\,|\,\omega)$. Then the sequence $((\rho^n_m)_{m=1}^{M+1})_{n\geq1}$ defined by $\rho^n_{M+1} = \sigma^n_M$ for $n \geq 1$ and $\rho^n_m = \sigma^n_m$ for $n \geq 1$ and $m = 1, \dots, M$ has the property that \name{eq9.12} $$\displaylines{ \mathop{\rm app}\nolimits(O, T, \B_S, \Sigma\,|\,\omega) \geq \limsup_{n\rightarrow\infty} \app{\B_{SM}}{(\rho^n_m)_{m=1}^{M+1}}{\omega} = \limsup_{n\rightarrow\infty} \app{\B_{SM}}{(\rho^n_m)_{m=1}^M}{\omega} \hcr = \mathop{\rm app}({\cal N}_T, \B_{SM}, M, \omega) \geq \limsup_{n\rightarrow\infty} \app{\B_{SM}}{(\sigma^n_m)_{m=1}^{M+1}}{\omega} = \mathop{\rm app}\nolimits(O, T, \B_S, \Sigma\,|\,\omega). \cr \text{This shows that } \mathop{\rm app}\nolimits(O, T, \B_S, \Sigma\,|\,\omega) = \mathop{\rm app}\nolimits({\cal N}_T, \B_{SM}, M, \omega). 
\hfill \llap(9.12) }$$ Let $(\sigma_m)_{m=1}^{M+1} \in \widetilde{\cal N}_\delta^{M+1}(\B_{SM}, C_a)$. Note that $(\sigma_m)_{m=1}^M \in \widetilde{\cal N}_\delta^M(\B_{SM})$. By (\link{7.5,7.6}{7.6}), (\link{eq7.7}{7.7}), and A of postulate \link{posNine}{nine}, for $\delta$ sufficiently small, $$\displaylines{ \app{\B_{SM}}{\sigma_{M+1}}{\sigma_M} \leq \exp\{-\sigma_{M+1}(Q_a) \log(\sigma_{M+1}(Q_a)/\sigma_M(Q_a)) \hcrh - \sigma_{M+1}(1 - Q_a) \log(\sigma_{M+1}(1-Q_a)/\sigma_M(1-Q_a))\} \sim p_a. }$$ Applying the method of lemma \link{ex9.5}{9.6} again, this yields $$\mathop{\rm app}\nolimits(O, T, \B_S, C_a|\omega) \leq (\sim) p_a \mathop{\rm app}({\cal N}_T, \B_{SM}, M, \omega) = p_a \mathop{\rm app}(O, T, \B_S, \Sigma\,|\,\omega).$$ Let $(\sigma^\delta_m)_{m=1}^M \in \widetilde{\cal N}_\delta^M(\B_{SM})$ be given by B of postulate \link{posNine}{nine} and set $\sigma^\delta_{M+1} = \rho^\delta_a$. By (\link{5.1}{5.1}) and (\link{posNine}{6.1}), $\app{\B_{SM}}{\rho^\delta_a}{\sigma^\delta_M} \sim p_a$ and $\app{\B_{SM}}{\rho^\delta_a|_{\B_S}}{\sigma^\delta_M|_{\B_S}} \sim p_a$, so that (\link{eq6.4}{6.4}) holds. By lemma \link{l9.10,p9.11}{9.10}, as $\delta \rightarrow 0$, $\app{\B_{SM}}{(\sigma^\delta_i)_{i=1}^k}{\omega} \rightarrow \mathop{\rm app}({\cal N}_T, \B_{SM}, k, \omega)$ for $k = 1, \dots, M$ and by (\link{eq4.7}{4.7}) and (\link{eq9.12}{9.12}), $$\displaylines{ \mathop{\rm app}\nolimits(O, T, \B_S, C_a|\omega) \geq \limsup_{\delta\rightarrow 0} \app{\B_{SM}}{(\sigma^\delta_m)_{m=1}^{M+1}}{\omega} \hcrh \sim p_a \mathop{\rm app}({\cal N}_T, \B_{SM}, M, \omega) = p_a \mathop{\rm app}\nolimits(O, T, \B_S, \Sigma\,|\,\omega). \hfill \blacksquare }$$ \eject \pdfproclaim{10.\quad Elementary Models.}{}{sec 10} \endproclaim In this section, three mathematically elementary models of the set $\B_M$ of postulate \link{posFour}{four} are discussed in order to explain the development of some of the abstract structures of earlier sections. 
The results used will be quite simple properties of the function defined by (\link{sec 8}{8.2}), so proofs have been omitted. We begin with the assumption that an observer comes into contact with a succession of independent physical subsystems. These are modelled as independent Hilbert spaces $(\H_m)_{m=1}^M$ forming a tensor product subspace $\otimes_{m=1}^M \H_m$ of the total universal Hilbert space $\H$. The set of operators that define states individually on these subsystems is $\B^0_M = c(\cup_{m=1}^M\B(\H_m))$, where, as usual, $\B(\H_m)$ is identified as a subalgebra of $\B(\H)$ and $c$ is defined in \link{def3.13}{3.13}. The independence assumption made here is a simplification of the more realistic situation considered earlier in which the $\B(\H_m)$ correspond to distinct, but not necessarily commuting, local algebras. It is not unreasonable, however, to claim on physical grounds that local algebras at distinct times are ``nearly'' independent, because of the variety of physically independent states that may be imposed on such regions. A version of this argument was used in section \link{sec 6}{6} in imagining the duplicitous graduate student. $\B^0_M$ contains no operators measuring correlations between the $M$ subsystems. Thus app$_{\B^0_M}$ is an inappropriate function for determining physical a priori probabilities. For example, take $M = 2$ and consider the problem of finding the state $\tilde \sigma_2$ on $\B(\H_2)$ maximizing $\app{\B^0_2}{(\sigma_i)_{i=1}^2}{\omega}$ for given $\omega$ and $\sigma_1|_{\B(\H_1)}$. This problem has the solution that $\tilde \sigma_2|_{\B(\H_2)} = \omega|_{\B(\H_2)}$. This is useless for present purposes as it takes no account of the information gained by the observer from his observations on $\B(\H_1)$. The most obvious way to include correlations is to replace $\B^0_M$ by the von Neumann algebra $\B^1_M = \B(\otimes_{m=1}^M \H_m) = \otimes_{m=1}^M \B(\H_m)$.
For particular choices of $\omega$, $\B^1_M$ would be satisfactory. For example, once again choose $M = 2$. Let $(\psi_i)^\infty_{i=1}$ be an orthonormal basis of $\H_1$ and $(\varphi_j)^\infty_{j=1}$ be an orthonormal basis of $\H_2$. Suppose that $\omega$ is diagonal in the product basis $(\psi_i\otimes\varphi_j)^\infty_{i=1}{}^\infty_{j=1}$, so that \name{eq10.1} $$\omega = \sum_{i,j = 1}^\infty r_{ij} |\psi_i\varphi_j\>\<\psi_i\varphi_j|. \eqno{(10.1)}$$ Then consider the problem of finding the state $\tilde \sigma_2$ on $\B(\H_2)$ maximizing \newline $\app{\B_2^1}{(\sigma_i)_{i=1}^2}{\omega}$ given that $\sigma_1|_{\B(\H_1)} = |\psi_1\>\<\psi_1|$. This problem has unique solution $\tilde \sigma_2|_{\B(\H_2)} = \sum_{j=1}^\infty r_{1j}|\varphi_j\>\<\varphi_j|/\sum_{j=1}^\infty r_{1j}$ as long as $\sum_{j=1}^\infty r_{1j} > 0$. As a model, this is fairly satisfactory. Indeed, the whole of Everett's analysis of quantum mechanics is based on this sort of idea. The problem lies in the justification of (\link{eq10.1}{10.1}) and in its generalization. The function app has been developed to generalize the idea of the $r_{ij}$ in (\link{eq10.1}{10.1}) being probabilities. This generalization, however, has the property demonstrated by example \link{ex7.11}{7.11}. Thus if $\omega$ happens to be a pure state on $\B^1_M$ then the problem considered has no solution except in the trivial case that $\sigma_1|_{\B(\H_1)} = \omega|_{\B(\H_1)}$. To resolve this difficulty, one either has to change the function app, and I know of no plausible alternatives; or argue that an assumption analogous to (\link{eq10.1}{10.1}) is physically natural, which, in view of the problem raised by \link{3.9}{3.9}, I cannot see how to do; or look for some appropriate set $\B^2_M$ between $\B^0_M$ and $\B^1_M$. Suppose then that ${\cal Z}_M$ is an Abelian algebra generated by suitable projections from $\cup_{m=1}^M \B(\H_m)$, and let $\B^2_M = c\{BC : B \in \cup_{m=1}^M \B(\H_m), C \in{\cal Z}_M\}$. 
${\cal Z}_M$ is a version of the algebra ${\cal C}_M$ of postulate \link{posFour}{four}, so that this model envisages that the projections defined in (\link{posFour}{4.1}) are mutually commuting. A further simplification, reducing the mathematics of this paper to classical probability theory, would result if it could be claimed that \name{eq10.2} $$\app{\B^2_M}{(\sigma_m)_{m=1}^M}{\omega} = \app{{\cal Z}_M}{(\sigma_m)_{m=1}^M}{\omega}. \eqno{(10.2)}$$ The theory presented has indeed been developed under the assumptions that ${\cal C}_M$ behaves as if it were Abelian, and that (\link{eq10.2}{10.2}) is a good analogy for the states of highest a priori probability of postulate \link{posEightfin}{eight} with $\B^2_M$ replaced by $\B_M$ and ${\cal Z}_M$ by ${\cal C}_M$. While the final result is a mathematical theory independent of the correctness of these assumptions, they remain plausible because the world appears to behave so classically. In the sequel to this paper, I intend to consider variations in geometrical structure for an observer. It follows from (\link{7.5,7.6}{7.6}) that maximizing a priori probability over such variations should make most likely those structures which satisfy (or effectively satisfy) both of the assumptions just mentioned. The commutativity of ${\cal C}_M$ is made likely because the algebra generated by the projections defined in (\link{posFour}{4.1}) will be smallest precisely when those projections commute. Because, by (\link{7.5,7.6}{7.6}), we always have $\app{\B_M}{(\sigma_m)_{m=1}^M}{\omega} \leq \app{{\cal C}_M}{(\sigma_m)_{m=1}^M}{\omega}$, the second assumption can be interpreted as the claim that the states in the neighbourhood ${\cal N}_T$ of postulate \link{posFour}{four} are determined by their values on ${\cal C}_M$ and can be varied freely off ${\cal C}_M$ until maximum a priori probability is achieved. 
A determining algebra of this sort would be a set of ``definitive observables'' in the sense discussed in section \link{sec 2}{2}. The idea then arises that one should throw away all of $\B_M$ except for ${\cal C}_M$. This, however, would be inappropriate, both because of the approximate nature of the second assumption and because, as can be seen from hypothesis V of \link{otre}{Donald (1990)}, the abstract theory which allows ${\cal C}_M$ to be defined depends on the behaviour of the states in ${\cal N}_T$ on the whole of $\B_M$. \pdfproclaim{11. \quad Consistency.}{}{sec 11} \endproclaim Three different consistency issues arise in this paper. These are: overall consistency of the underlying interpretation of quantum theory; consistency between the observations of distinct observers; and consistency between the various types of probability mentioned. Little can be said about the first issue until the full details of the interpretation have been presented. It is intended in a sequel to this paper to propose a complete characterization for the physical structure of observers. This will involve, in particular, developing a formalism to permit the claim that a given observer has only a finite number of possible distinct futures within a given bounded complexity. Consistency between the observations of distinct observers depends, in the first place, on each individual observer seeing other observers as observers rather than as superpositions or mixtures. This, in other words, is part of the general problem of what it is that an individual observer can observe. In \link{otre}{Donald (1990)}, it was proposed that the brain acts as an observer by processing definite information and that its quantum state, when so acting, is a state characterized by certain neural proteins having definite status. When these proteins do have definite status, the brain is processing information in a way which, in theory, can be interpreted by an external neurophysiologically-expert observer.
The neighbourhoods ${\cal N}_T$ of postulate \link{posFour}{four} are to be taken as consisting of sequences of brain states of this type. When this is done, the only sets of states $C$ in (\link{eq4.7}{4.7}) of high a priori probability will be states compatible with the prior information processing by the brain. In particular, when a human observer interacts with a colleague who occupies a macroscopically mixed quantum state, the brain of the original observer moves into a mixed quantum state. That mixture then must be disentangled by the set of possible neighbourhoods ${\cal N}_T$ at the new time $T$. The consistency between the observations of observer $A$ and the observations that observer $A$ observes observer $B$ to be making then stems from the correlations within the states of each ${\cal N}_T$. There is a correlation between the number I write down in my notebook as we look at the bottom line of our computer printout and the number I see you writing down, because otherwise, assuming neither of us is making mistakes, there must be additional ``collapses'' costing (logarithmically) large amounts of a priori probability in the quantum state of my brain between when I write down my number and when I look over your shoulder. This large logarithmic cost is argued for by the justification of (\link{6.2,6.3}{6.3}). The underlying fact is that single states in quantum theory provide good descriptions of most observed causal processes -- like light reflecting off the printout and carrying the same message to two different observers. Only intermittent collapse is required. The central focus of this paper has been on probability. A mathematical tool has been provided and its relationship to other types of probability has been discussed. Whether that tool has any use independent of the framework of \link{otre}{Donald (1990)} is up to the reader. 
\medskip \name{Ref}\noindent{\bf References.} \medskip \frenchspacing \parindent=0pt \everypar={\hangindent=1cm \hangafter=1} \name{Araki} Araki, H. 1963 ``A generalization of Borchers' theorem.'' {\sl Helv. Phys. Acta. \bf 36}, 132--139. Araki, H. 1976 ``Relative entropy of states of von Neumann algebras.'' {\sl Publ. Res. Inst. Math. Sci. (Kyoto) \bf11}, 809--833. Araki, H. 1977 ``Relative entropy for states of von Neumann algebras II.'' {\sl Publ. Res. Inst. Math. Sci. (Kyoto) \bf 13}, 173--192. \name{Bor} Araki, H. 1980 ``A remark on Machida-Namiki theory of measurement.'' {\sl Prog. Theor. Phys. \bf 64}, 719--730. Araki, H. 1986 ``A continuous superselection rule as a model of classical measuring apparatus in quantum mechanics.'' In {\sl Fundamental Aspects of Quantum Theory,} Gorini, V. and Frigerio, A. (eds) pp 23--33, (Plenum). Borchers, H.J. 1961 ``\"Uber die Vollst\"andigkeit lorentzinvarianter Felder in einer zeitartigen R\"ohre.'' {\sl Il Nuovo Cimento \bf 19}, 787--793. Bratteli, O. and Robinson, D.W. 1979 {\sl Operator Algebras and Quantum Statistical Mechanics.} (Springer-Verlag) Vol. I. Bratteli, O. and Robinson, D.W. 1981 {\sl Operator Algebras and Quantum Statistical Mechanics.} (Springer-Verlag) Vol. II. \pdfeject \name{BucDew} Buchholz, D. and Wichmann, E.H. 1986 ``Causal independence and the energy-level density of states in local quantum field theory.'' {\sl Commun. Math. Phys. \bf 106}, 321--344. Buchholz, D., D'Antoni, C., and Fredenhagen, K. 1987 ``The universal structure of local algebras.'' {\sl Commun. Math. Phys. \bf 111}, 123--135. Cohen, L.J. 1989 {\sl An Introduction to the Philosophy of Induction and Probability.} (Oxford). Daneri, A., Loinger, A., and Prosperi, G.M. 1962 ``Quantum theory of measurement and ergodicity conditions.'' {\sl Nucl. Phys. \bf 33}, 297--319. Reprinted in \link{Strat}{Wheeler and Zurek (1983)}. De Facio, B. and Taylor, D.C. 1973 ``Commutativity and causal independence.'' {\sl Phys. Rev. D \bf 8}, 2729--2731. 
DeWitt, B.S. and Graham, N. 1973 {\sl The Many-Worlds Interpretation of Quantum Mechanics.} (Princeton). \name{otre} Donald, M.J. 1986 ``On the relative entropy.'' {\sl Commun. Math. Phys. }{\bf 105}, 13--34. Donald, M.J. 1987a ``Further results on the relative entropy.'' {\sl Math. Proc. Camb. Phil. Soc. }{\bf 101}, 363--373. Donald, M.J. 1987b ``Free energy and the relative entropy.'' {\sl J. Stat. Phys. }{\bf 49}, 81--87. Donald, M.J. 1990 ``Quantum theory and the brain.'' {\sl Proc. R. Soc. Lond. A \bf 427}, 43--93. Driessler, W., Summers, S.J., and Wichmann, E.H. 1986 ``On the connection between quantum fields and von Neumann algebras of local operators.'' {\sl Commun. Math. Phys. }{\bf 105}, 49--84. Ekstein, H. 1969 ``Presymmetry II.'' {\sl Phys. Rev. \bf 184}, 1315--1337 with correction in {\sl Phys. Rev. D \bf 1}, 1851(E) (1970). Everett, H., III 1957 ``The theory of the universal wave function.'' In \link{BucDew}{DeWitt and Graham (1973)}. Exner, P. 1985 {\sl Open Quantum Systems and Feynman Integrals.} (Reidel). \name{GarHaag} Garber, W.-D. 1975 ``The connexion of duality and causal properties for generalized free fields.'' {\sl Commun. Math. Phys. \bf 42}, 195--208. Glimm, J. and Jaffe, A. 1971 ``Field theory models.'' In {\sl Statistical Mechanics and Quantum Field Theory,} DeWitt, C. and Stora, R. (eds) pp 1--108, (Gordon and Breach). Reprinted in Glimm and Jaffe (1985). Glimm, J. and Jaffe, A. 1972 ``Boson quantum field models.'' In {\sl Mathematics of Contemporary Physics,} Streater, R.F. (ed.) pp 77--143 (Academic). Reprinted in Glimm and Jaffe (1985). Glimm, J. and Jaffe, A. 1979 ``The resummation of one particle lines.'' {\sl Commun. Math. Phys. \bf 67}, 267--293. Glimm, J. and Jaffe, A. 1985 {\sl Quantum Field Theory and Statistical Mechanics -- Expositions.} (Birkh\"auser). Haag, R. and Kastler, D. 1964 ``An algebraic approach to quantum field theory.'' {\sl J. Math. Phys. }{\bf 5}, 848--861. Haag, R. and Schroer, B.
1962 ``Postulates of quantum field theory.'' {\sl J. Math. Phys. \bf 3}, 248--256. \pdfeject \name{HeppRS} Hepp, K. 1972 ``Quantum theory of measurement and macroscopic observables.'' {\sl Helv. Phys. Acta \bf 45}, 237--248. Hiai, F. and Petz, D. 1991 ``The proper formula for relative entropy and its asymptotics in quantum probability.'' {\sl Commun. Math. Phys. \bf 143}, 99--114. Lindblad, G. 1974 ``Expectations and entropy inequalities for finite quantum systems.'' {\sl Commun. Math. Phys. \bf 39}, 111--119. Machida, S. and Namiki, M. 1980 ``Theory of measurement in quantum mechanics.'' I -- {\sl Prog. Theor. Phys. \bf 63}, 1457--1473, II -- {\sl Prog. Theor. Phys. \bf 63}, 1833--1847. Petz, D. preprint ``Characterization of the relative entropy of states of matrix algebras.'' Reed, M. and Simon, B. 1972 {\sl Methods of Modern Mathematical Physics I: Functional Analysis.} (Academic). \name{RoosShore} Roos, H. 1970 ``Independence of local algebras in quantum field theory.'' {\sl Commun. Math. Phys. \bf 16}, 238--246. Sanov, I.N. 1957 ``On the probability of large deviations of random variables.'' {\sl Mat. Sbornik \bf 42}, 11--44. Translation for the Institute of Mathematical Statistics published by the American Mathematical Society in {\sl Selected Translations in Mathematical Statistics and Probability \bf 1}, 213--244 (1961). Seiler, E. 1982 {\sl Gauge Theories as a Problem of Constructive Quantum Field Theory and Statistical Mechanics.} (Springer-Verlag). Shore, J.E. and Johnson, R.W. 1980 ``Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy.'' {\sl IEEE Trans. Inform. Theory \bf 26}, 26--37. Shore, J.E. and Johnson, R.W. 1981 ``Principles of cross-entropy minimization.'' {\sl IEEE Trans. Inform. Theory \bf 27}, 472--482. \name{Strat} Str\u atil\u a, S. and Zsid\'o, L. 1979 {\sl Lectures on von Neumann Algebras.} (Abacus). Streater, R.F. and Wightman, A.S. 1964 {\sl PCT, Spin and Statistics, and All That.} (Benjamin).
Takesaki, M. 1979 {\sl Theory of Operator Algebras I.} (Springer-Verlag). Umegaki, H. 1962 ``Conditional expectation in an operator algebra, IV (entropy and information).'' {\sl Kodai Math. Sem. Rep. \bf 14}, 59--85. Wheeler, J.A. and Zurek, W.H. 1983 {\sl Quantum Theory and Measurement.} (Princeton). Whitten-Wolfe, B. and Emch, G.G. 1976 ``A mechanical quantum measuring process.'' {\sl Helv. Phys. Acta \bf 49}, 45--55. Zurek, W.H. 1981 ``Pointer basis of quantum apparatus: Into what mixture does the wave packet collapse?'' {\sl Phys. Rev. D \bf 24}, 1516--1525. Zurek, W.H. 1982 ``Environment-induced superselection rules.'' {\sl Phys. Rev. D \bf 26}, 1862--1880. \end