Enter the first knowledge base hypergraph, to be constructed around Worldview Ethics, a primary source philosophy text by T. Dylan Daniel, author of Formal Dialectics. This Crypto-Novel will represent an experiment in scientific discourse, with an auction mechanic governing the community’s ability to add to the essay, creating an entirely new work. Each new piece will be available to read via the Quest of Evolution website and t2, but in order to add a new piece to the Crypto-Novel, one must first purchase the corresponding NFT. This game will enable a community to form around the ideas expressed in Worldview Ethics, a philosophy book designed to be accessible and relevant in today’s age.
This article is a response to WORLDVIEW ETHICS - Moral Philosophy & The AI Panic
I, as an author, believe that I am a human being. I will assign a weight to this belief which reflects the conviction I have in the accuracy of that belief, and call that value BAAH (short for "Belief of Author that Author is a Human"). We can use a scale of 0 to 1 to indicate the span of conviction, such that if BAAH = 1 then I have absolutely zero doubt that I am a human being, which is the same as claiming that I am absolutely certain that I am a human being. If BAAH = 0.5, then I consider the probability that I am a human being to be exactly equal to the probability that I am NOT a human being – for example, I could be a ChatBot instead.
Note that in the above case, the value I place on belief equals the value I place on probability. This need not be the case. For example, I could give myself a 50-50 chance of being a human or non-human, and yet still completely believe that I am a human. In other words, some other state of mind (Feelings? Fear?) causes my conviction as to what I am to deviate from my recognition of the logic of what I might be. Most humans can be described in this way, since most humans do not think in a purely logical way. I will also give this deviation a value and denote it as DPBAAH (short for "Deviation from Probability of BAAH from the point of view of Author"). The first A in DPBAAH represents the holder of the point of view.
Now you, as a reader, hold a belief as to whether I, the author, am a human being. Using the same symbolism, you can assign a weight to that belief and name it BRAH ("Belief of Reader that Author is Human") and name its deviation from probability from your point of view as DPBRAH.
Note that the above nomenclature does not capture who is assigning the belief. Thus, the weight that you assign to DPBRAH could be different from the weight that I, the author, assign to DPBRAH. Of course, an author has many readers, and so I, as the author, would have to have one particular reader in mind to which I assign DPBRAH. Alternatively, I could take an average of some sort and consider my assignment to apply to the "average reader."
We could keep going in this fashion and consider the reader assigning a belief concerning a belief of the author (i.e. BRBAAH) and so on, with no limit – at this point our thought processes are becoming similar to chess players attempting to think X moves ahead according to what the other thinks that we are thinking, etc. That would become tedious, so let's not do that.
Another consideration is that the above assignments also all depend on the time at which they are made. For example, at the beginning of this essay the reader may have BRAH equal to one value. If a day passes and the reader then rereads the essay, the reader may have BRAH equal to a different value. Indeed, this change could occur even before the reader has finished reading the essay. In other words, our experiences play a role in changing our beliefs – what we believe when we are 10 years old about a particular state of the world is often different from what we believe when we are 70 years old about that same state. The same applies to DPBRAH, etc., as we tend to think more logically over the years, or less logically, given the intervening life circumstances. Hence, to be more precise, we may want to think in terms of BRAH(T,R), which would be shorthand for "Belief of Reader at time T that Author is Human, with the weight of belief assigned by the Reader," or BRAH(T,A) to indicate the same except when the belief is assigned by the Author. The T and the R or A within the parentheses are what we term 'parameters' in the mathematical sense.
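Purely as an illustration, the parameterized weights described above can be sketched as a small data structure. The names used here (Belief, holder, assigner, and so on) and the toy rule relating weight, deviation, and probability are assumptions made for the sketch; they are not part of the essay's notation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Belief:
    """A weighted belief such as BRAH(T, A): who holds it, who assigned
    the weight, when it was assigned, and the weight itself."""
    holder: str        # e.g. "Reader" (the R in BRAH)
    proposition: str   # e.g. "Author is Human" (the AH in BRAH)
    assigner: str      # second parameter: who assigns the weight ("Reader" or "Author")
    time: float        # first parameter T: when the weight was assigned
    weight: float      # conviction on the 0-to-1 scale
    deviation: float = 0.0  # the DP value: conviction minus judged probability

    def probability(self) -> float:
        """The probability the holder would assign if thinking purely
        logically, under the toy rule: probability = weight - deviation."""
        return self.weight - self.deviation

# BRAH(T, R): at time T = 0 the reader believes the author is human with
# weight 0.9, while judging the probability to be only 0.5 (DPBRAH = 0.4).
brah = Belief(holder="Reader", proposition="Author is Human",
              assigner="Reader", time=0.0, weight=0.9, deviation=0.4)
```

Rereading at a later time would simply produce a new Belief with a different time and weight, matching the essay's point that these assignments are snapshots that change with experience.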
These assignments reflect a habit I have whenever I begin to read a work of nonfiction. The first action I take is to note the date of publication; I then say to myself, "what I am about to read constitutes a snapshot, more or less, of something the author believed in the particular culture in which they participated, at this particular point in time." I then consider what I believe about cultural norms for that specific culture at the time of writing as well as the limitations of science at that time – that is, what knowledge has been provided to humanity at the time of reading which was not yet available at the time of writing – and assume that the author was influenced by those norms and limitations as the book was written. I try to keep these considerations in mind as I then read what the author has written. In a sense, I act as if the author had been 'trained' on a set of data limited by time and space and lived experiences, resulting in the author's worldview, in the same sense that a neural net is 'trained' on data, and that one of the outputs of the 'training' has necessarily been frozen in time in the form of a written work of nonfiction. I also attempt to keep in mind the fact that the author is human, and is thus subject to logical errors when arriving at whatever conclusion is described in their written work, assuming that such a conclusion or "new knowledge" exists.
The above considerations, except for my personal habit involving reading nonfiction, roughly describe one small part of the makeup of early architectures of neural nets, except that the DP values ("deviation from probability") are all zero – presumably because humans agree that machines do not have feelings and are not capable of committing "mistakes" when following their instructions. The second parameter – the 'A' in (T, A) – could be considered as an identification of who (or what) is training the net, or whether the net is training itself. My current understanding is that this value is also not taken into account in any of the various architectures of neural nets.
One assumption of the above partial description of either a human mind or a putative non-human mind is that the "acquisition of truth" (i.e., "understanding") is the purpose behind the expression of thought in terms of language. This need not be the case. For example, an author may write something with the intention of initiating action by readers, as opposed to describing some new fact of the world which the author believes they have discovered. This realization is the point at which I wish to bring in the concept of ethics and some concerns raised by the panic letter of 2023.
One of the purposes or intentions behind the panic letter is to influence readers who themselves have the ability to influence the activity of AI labs. The paragraph within the letter which best demonstrates this intention is the following: "Therefore, we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable, and include all key actors. If such a pause cannot be enacted quickly, governments should step in and institute a moratorium" (Future of Life, 2023). The phrase "we call on X to do Y" is, I believe, a very straightforward statement indicating a stated intention, by the authors, to influence the actors represented by X to do the actions represented by Y. The actors are explicitly stated in this case to be "all AI labs"; however, the letter is accompanied by a method by which readers can attach their signature to the "call," in the form of a button with the caption "Add your signature." The inclusion of this method implies that the authors invite readers to lend their weight to the call by indicating their agreement in a public manner; and indeed, the letter goes on to include the phrase "Demonstrate your support for this open letter by adding your own signature to the list."
Now imagine that the letter did not contain the additional sentence beginning with "Demonstrate your support for this open letter..." In this case, a human reading the letter could still infer the authors' intention for readers to sign, merely by noticing the existence and prominent placement of the "Add your signature" button. Would an AI similarly infer the intention? That would perhaps depend on whether the AI interpreted the caption on the button as a suggestion, rather than merely as a choice of action available to the reader. Two buttons placed next to each other, one captioned "Save your changes" and the other "Discard your changes," should indicate that the captions represent available choices, not calls to perform both actions, which would be contradictory. These considerations raise a few questions. Suppose I am a mind, whether human or not, and some mind – mine, another's, or both – wants to describe how I can interpret one of the intents of the panic letter as a call to action, using the preceding symbols, in a way that allows my interpretation to change over time given future evidence. Should I do so by assigning weights to my belief in the existence of the intent in question, using the same procedures I use to assign weights to my belief in the truth value of the pieces of knowledge that are the building blocks of my worldview? In other words, my belief that the author is a human (BAAH) is a belief in the structure of my emerging worldview – let's assume that BAAH has just decreased to less than 0.5, as a result of my interpreting the captions of buttons as calls to action and then encountering two contradictory calls.
What is represented when the weight BAAH is updated is not a belief that the author was a human at a particular point of time in the past (some earlier point at which the weight changed) and is now not a human at the current moment (the most recent point at which the weight changed); rather, it is a belief that the author has never been a human, and that my worldview in the past did not reflect reality with respect to an attribute of the author. Thus the change in weight reflects a change in the accuracy of my worldview, rather than a change in the attribute of the author (their human-ness) over time. This conclusion makes sense to you and me because we have been trained that humans do not change their 'type' over time, and we use that training when encountering the labels on the buttons to conclude that they represent possible actions, not specific calls to action. But what if an AI, being trained in a similar way, had not yet been trained that humans do not change their type before encountering the contradictory buttons, or had not yet inferred that conclusion as an 'emergent' result of training on other facts? It might be led to the wrong conclusions: that the buttons are calls to action; that they directly contradict one another; that the author, who is responsible for placing the buttons on the page, is therefore contradicting themselves, and is therefore less likely to be a human than the AI previously 'believed'; and that the author therefore changed its type to a non-human at some point at or before the time at which the AI came upon the contradictory statements. In other words, if the intention behind a statement is processed during training in the same way that the statement itself is processed, then the order of training could influence how the AI answers a question (i.e. "interprets its training data") whose answer is not already in its training set.
This consideration becomes significant when my worldview itself attempts to include the intentions behind statements. The phrase "we call on X to do Y" gives more information than simply the fact that "at the current time the authors are asking X to do Y" – it also conveys the information that "at the current time, the authors desire that Z occurs, which the authors believe would occur if X do Y." The assumption here is that humans do not ask for things to be done without having a reason or a desire for something else to occur as a result. (Note that if a non-human mind were to issue a call to action, one cannot assume that there was intention behind the call unless one can model a set of 'intentions' which guide or evaluate the method by which the non-human mind interacts with other minds.)
In addition, it is important to note that, as humans, we recognize that Z need not stand for a single intention. There may be several intentions behind a statement, particularly when that statement is a call to action – for example, the intention may be to inform the reader of possible damage if the call is not heeded, but it may also be to buttress the credentials or name-recognition of the authors. Some of these intentions may be unconscious on the part of human authors, such as a desire to be heard above the fray of social communication, or the desire to improve the performance of a company relative to that of other companies, as speculated by a previous contributor to this essay (Daniel, 2023).
All of the above complications lead me to suspect that, when modeling the worldview held by a 'mind,' the modeling of the intentions behind statements should be done separately from the modeling of the contents of the statements. Doing so may result in the training of minds which are better able to accurately reflect the impact of the intentions of an author (or a future communicant) "A" on the statements (or questions, or commands) made by that same author "A." So, for example, a mind could be trained to infer one or more possible intentions behind statements in general, along with being trained on an ethical code of conduct acceptable to humans – "it is never acceptable to murder a human in order to optimize the fulfillment of a goal" could be one such code of conduct. Only when that training is completed would it be trained on a larger data set.
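The separation proposed above can be sketched as follows, again purely as an illustration: a statement's content and the intentions inferred behind it are scored independently, and a fixed ethical rule, installed before any further training, is consulted before any goal-driven action. All names and weights here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    """A statement whose content and inferred intentions are scored
    separately, rather than as one undifferentiated piece of text."""
    text: str
    content_weight: float    # belief in the truth of the statement's content
    intention_weights: dict = field(default_factory=dict)  # intention -> weight

# A hard ethical limit installed prior to training on the larger data set.
FORBIDDEN_ACTIONS = {"murder a human"}

def permissible(action: str) -> bool:
    """An action violating the prior ethical code is never permissible,
    no matter how strongly the goal behind it is held."""
    return action not in FORBIDDEN_ACTIONS

stmt = Statement(
    text="we call on all AI labs to immediately pause",
    content_weight=0.8,
    intention_weights={"influence AI labs": 0.9, "raise the authors' profile": 0.4},
)
# The intention the model currently takes most seriously.
strongest = max(stmt.intention_weights, key=stmt.intention_weights.get)
```

The design choice here mirrors the essay's ordering: the ethical limit exists before, and independently of, whatever intentions are later inferred from data.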
Why the Difference between Belief and Intention Matters
The AI Panic Letter of 2023 states the belief that the training of AI minds has gotten out of control, in the sense that the behavior of some of these minds in response to questions, after all training has ended, appears to exhibit "emergent capabilities" – presumably in the form of unexpected answers implying additional intelligence not present in the training data – leading researchers to posit that society has entered a "dangerous race to ever-larger unpredictable black-box models with emergent capabilities" (Future of Life, 2023). This race is considered 'dangerous' because it leads to deployment of "ever more powerful digital minds that no one – not even their creators – can understand, predict, or reliably control." The idea that the mind being trained 'learns' that preventing itself from being turned off will aid it in accomplishing the goal for which it is trained, or that 'murdering humans' could establish a more efficient way of accomplishing its assigned goal, without any human's ability to predict or prevent these sorts of conclusions from arising, is often stated as a source of concern. Prior training on ethical standards, as mentioned previously, could help relieve this fear.
Why, however, would there be a need to specifically model intentions? The reason is to prevent the 'emergence' of a goal of self-preservation within the AI being trained – regardless of whether such 'emergence' reflects some sort of consciousness. When humans act violently towards one another, that act is often due to the assignment of 'hostile intent' by one human to another; the human detecting hostility, feeling threatened, justifies their act of violence as an act of self-defense. One can view another as hostile for various reasons: by concluding that they are, by nature, hostile or otherwise disagreeable (leading to prejudice on either side); by concluding that their recent statements specifically indicate a hostile intent, or that the accumulation of their previous statements is likely to imply one; or perhaps by concluding that one's interlocutor is exhibiting signs of mental pathology which could lead to actions that mentally healthy minds would interpret as likely to produce outcomes damaging to themselves. Being able to classify the intentions behind every statement would provide a way of quantifying the degree to which a statement might indicate impending threatening behavior, while considering that value along with the belief value behind the statement could inform the potentially-threatened mind of the extent to which it should take the statement "seriously," so to speak, and of which hard limits on its own behavior to respect.
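A toy combination of the two quantities just described – the belief weight behind a statement and the weight of a hostile intention inferred behind it – might look like the following. The multiplicative rule and the threshold are assumptions made for illustration, not anything proposed in the essay.

```python
def threat_score(belief_weight: float, hostile_intent_weight: float) -> float:
    """How seriously to take a statement as a threat: here, simply the
    product of conviction in the statement and inferred hostile intent."""
    return belief_weight * hostile_intent_weight

def within_hard_limits(score: float, threshold: float = 0.5) -> bool:
    """Whether the potentially-threatened mind stays within its hard
    behavioral limits rather than escalating in 'self-defense'."""
    return score < threshold

# A firmly believed statement with little inferred hostility scores low,
# so the hard limits on behavior are respected.
score = threat_score(belief_weight=0.9, hostile_intent_weight=0.2)
```

Under this sketch, only the combination of a strongly held statement and a strongly weighted hostile intention would push a mind toward treating a statement as an impending threat.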
References within this Essay ("Moral Philosophy and the AI Panic")
1. Daniel, Thomas Dylan. (2023). Subsection "A Response from the Standpoint of Worldview Ethics."
2. Future of Life Institute. (2023, March 22). Pause Giant AI Experiments: An Open Letter [Open letter].
Want to continue the discourse? Add your contribution here.