Roko's Basilisk and Schrödinger's Cat
- Dhruve Dahiya
- Apr 2, 2023
- 36 min read
Updated: May 3, 2023
This post is about Policy Research Questions from a WEF AI Youth Council Session on AI Governance that I attended upon being selected as a member of AIYC India, a conversation I had with a physicist-mathematician about Schrodinger's Cat, my thoughts on Roko's Basilisk, and related topics. New to this blog? Start here.
Important post, though relatively unimportant compared to a few of my other posts, but one of my more interesting ones that I'd like you to read if you can. Skippable, but unlike my other less important and skippable posts, this one actually has some interesting ideas- ideas that may not be as important as those in some of my other posts, but that are nonetheless very interesting, at least to me. I start with some questions related to AI governance and policy, then move to Roko's Basilisk, and finally Schrodinger's cat, and how they all connect, though I also question and discuss each topic separately.
Definitely a post you would not want to skip if you're someone who is interested in Artificial Intelligence or Physics; for others it depends on your time and personal preferences, though I'd encourage you to read it if unsure, because you might get interested in these topics even if you aren't already. I'm unsure if my writing is good enough to achieve such goals, but I've heard it's had such an effect on one young human in the past, so it might work for you too.
You'd also hopefully be able to extract some general principles and ways of thinking, so you can use the same tools I used to think about these questions to reframe other questions and brainstorm more innovative solutions. I won't go as far as claiming that there even are any general principles, despite my untested hypothesis that everything could be abstracted at the right level of abstraction and applied to novel instances elsewhere, just as everything seems to be interconnected at the right level of abstraction and not just by cause-and-effect. Plus some questions that you might find intellectually stimulating and that might keep you up at night while you try to come up with a solution (if you are successful, I'd like to know).
Just a few questions I'm surprised my brain generated, and that made me realize I actually am curious about this, even though before the session I believed it was going to be boring and uninteresting, and not something I'd be too good at- a recurring theme in which I underestimate my abilities and end up surprising myself, thanks to the growth mindset I have adopted: I at least try stuff before concluding based on mere intuition, which I know better than to blindly follow without letting it pass the scrutiny of my logical and rational mind.
I've also done this in the past with Physics and Mathematics (good idea for a post; I have some interesting ideas related to self-limiting mentality and my personal experiences with under-confidence in my abilities and learned helplessness). And oh, it'd be a good idea to also add the question that, despite barely passing physics, I came up with after some self-motivated study in quantum mechanics, a much better teacher, and a better teaching method. That comes after the policy questions, which by the way are confidential, and I'm taking a huge risk by disclosing them on a public blog that hardly anyone reads, putting my life at risk just because I believe in free information.
Just kidding, it's not confidential, and even if it was, I wouldn't be so stupid as to tell you or risk my life for such a foolish reason- at least not yet, not because I care about myself but because it'd be selfish to cause myself any harm until I've helped the less fortunate people who suffer. Perhaps after that. Then I'd like to be chased by the FBI and other departments and play the genius conman. If the FBI is reading this right now: just kidding, of course. I do believe in free information though. Could make for a great post; check out Aaron Swartz, for instance.
And also just to get it out of the way: Yes, I am a member of the AI Youth Council of the World Economic Forum, but 1) I'm currently under training and it's temporary, so I'm not yet a permanent member (I'll edit this to say I'm a permanent member if and when I become one and remember to; if you're reading this and have the time, please send me a reminder maybe), and 2) I'm in an AI policy council, not an Economics council, so I don't know anything about the other divisions; in case you have questions for a WEF professional working in the other divisions, I'm not the best person to ask. But definitely feel free to ask me any questions related to public policy, ethics, and governance as relevant to AI, or even topics revolving around AI alignment, technological singularity, and universal basic income.
I'm going to replace doctors if required, because human irrationality must be no barrier. Even though everyone has told me that doctors won't be replaced, you can't seriously sacrifice people's well-being just because people have irrational beliefs and misconceptions about the abilities of AI. If experiments clearly show that AI could save more lives and prevent more suffering, doctors should be happy too. Even if their jobs are taken away, if they have no ego problems they should be happy that psychiatry is progressing and more patients are being helped with less suffering. That is the very objective of medicine's core values and goals, so if they are not devoted to those core values and to the requirements and expectations of their duty to society, then they didn't deserve their jobs anyway. And if they are, they would be happy that AI is much more effective than them, and the public, even if skeptical at first, will come around over time, because every radical new innovative idea faces resistance at first but is slowly adopted if it has merit. See the beginning of my ideas and interests document and the post on this blog.
This might sound bad at first but it's not, believe me. Being a part of the WEF, I know that in the short term there would be some problems, but eventually it would help us create a society where no one has to work just to make ends meet and all social institutions have the right incentives, not skewed ones. We could implement UBI, AI could automate everything, and everyone could study art, mathematics, philosophy, physics, whatever, with no one forced to worry about employment or anything. It's possible.
I just mentioned Roko's Basilisk because it's indirectly connected and because it sounds interesting. And also because I have some interesting stuff to say about it, which is why I included it without being afraid of people thinking it's a clickbait title, though I'm way past caring what others think about such stuff that's just based on my intuitive personal preferences. So I'll give some relevant context and dive into my thoughts, which I first shared in one of my comments on one of my own posts on Reddit (link a few paragraphs later), which could be found easily by anyone interested; or better, just ask me if you wish to talk about the Basilisk or anything related to my stance on AI Policy, Governance, Ethics, Alignment, Population Ethics, Longtermism, Effective Altruism, Rationality, or related areas. (Reminds me that I need to reach out to those Longtermist people I've been planning to critique; post about it soon, hopefully, once I'm done with some of my more urgent projects.)
I also just realized that my thoughts on the basilisk are related to AI governance, which is related to the council, which is related to Schrodinger's cat in some way... I'll figure out what way and talk about it at the end, while my subconscious works on finding a connection and my conscious mind works on writing this. A person brought up the Basilisk in one of the conversations; here's my reply, pasted:
I've also read about the Basilisk and also a neat solution to it that I'm currently too tired to search for but I think you could find it yourself if you're interested. I've myself brought up the thought experiment several times in various conversations, but to be honest, I think of it more like a fun little idea rather than something to be seriously worried about.
I might be wrong, of course, but I've made a calculated risk after understanding all the potential consequences including the worst case scenario, so if Roko reads this comment in the future and I still happen to be alive, come and get me you stupid robot whose sole purpose in life is to destroy those who didn't help it come into existence. I don't know I might regret this later but I don't really care.
Then that person told me that there are two solutions he is aware of, and I said:
Yes, I am aware of the mortality and digital immortality stuff, and I have a lot to say about the topic of immortality in all its forms- physical, digital and symbolic- as well as longevity, life, and death, but I won't get into it here because I do not feel like beginning to write a book right now; maybe sometime in the near future.
It's complicated, which is why I said that I might regret it, and I see how you included the very line that I had intended to delete in the near future just in case, but now I have no choice because the basilisk would probably be intelligent enough to infer that the comment you were replying to had that line in it, so maybe telling you to delete it or hacking some stuff would work.
But I don't think that the opportunity cost of all the time and effort that it would take would be worth the reduction in the risk and commensurate with my own fears which to be honest aren't much because as I said I don't care, I just like overanalyzing which is why I'm thinking so little about it.
Also it doesn't matter if I am simulated in the future, but it would definitely matter if my simulation is conscious or sentient, and shares my own sentience rather than being a separate conscious entity, in which case I would have nothing to worry about. That's assuming that Roko is even brought into existence.
What do you mean by empathy? Do you mean that cultivating empathy among all humans would increase the probability that no one would want to create a robot that could make people (and their possibly sentient simulated copies) suffer because of something stupid they did in the past (or previous lifetime, if virtual simulation)?
Or that programming human-like empathy into future robots would make it more likely that even if Roko comes into existence, it would hesitate before doing something that involves it making a lot of conscious or sentient life-forms suffer?
The latter is kind of similar to the Alignment problem, the value being what we're calling empathy here, so the AI would not do anything that a human with a strong sense of ethics consistent with the prevalent ethical code in society would never do.
So, to summarize the comment: I talk about my fear (or lack thereof) of the possibility of a superintelligent AI (such as the basilisk) inferring my stance from our conversation. However, the opportunity cost of avoiding that risk is too high for me. And whether I am simulated in the future doesn't matter, but I would be concerned if my simulation were conscious and shared my sentience, in which case I would want to ensure that the simulation doesn't suffer. To dumb it down even further, though I think it's clear: I simply don't care, except if I become a digital copy, in which case I hope I won't feel pain.
(I had the comments in a different font but lost it due to technical issues and won't fix it again because of time constraints; I believe it's already clear enough where my comments start and end- it doesn't really matter that much, because I indicate it explicitly where required. Please let me know if it's too confusing or unreadable, and I'll fix it.)
To this, the person replies that the basilisk is supposed to help people avoid being compromised, but it might be too late for me (chuckles I'm in danger). Instead, they suggest starting with the Repugnant Conclusion and working backwards to figure out a better solution.
This sent me on another Wiki-surfing session in which I got to learn some awesome stuff like the Mere-addition paradox and the Non-identity problem, but I'll stick to the main topic for now. By the way, I also came up with a strange dystopian-utopian (depends on perspective and framing) sci-fi plot that I thought wasn't bad even upon waking up (my 'isn't bad' might be others' 'brilliant', especially because I'm a very tough-to-please self-critic).
(I wrote it when I was sleep deprived; my brain usually be like that late at night when I'm supposed to be sleeping, according to conventional standards and norms set by society, which is controlled by- let's not get into that here.
By the way, no, it's probably not the illuminati or teen aliens out of the simulation- trust me, I'm one of those aliens and a member of the Ill- NO No I mean I- that's so stupid and irrational haha how do people fall for such stuff sweating profusely, satisfied grin, indicating malicious intent, definitely has some secrets)
I would end the part about the Basilisk right here; you could read the whole convo if you are interested, link below. You could borrow my idea if you intend to write a sci-fi novel, just let me know, or else I have my own ways of finding out, and then I'll unleash the Basilisk upon you- and the Basilisk is actually a stupid idea, I'd instead alter your very perception of reality, and if you have read my posts 'Objective' and 'Confessions' you'd know I'm not even kidding and I'm capable of doing so. Yes, that's a threat. Okay, it's a joke, except that it's actually possible; but feel free to use my idea for that plot as I'm currently not working on it.
And if you intend to actually build an AI system that is capable of doing what is described in that plot, then you need to contact me, because that is something I'd be interested in building when I have the resources and time to do so, so let me know if you're interested. Sci-fi is where such ideas start; everything we use daily would have been sci-fi a few centuries ago.
Link to the conversation for those who are interested: https://www.reddit.com/r/aspergers/comments/11dipg6/comment/ja93rwi/?utm_source=share&utm_medium=web2x&context=3
Short random detour: a text message I sent a few weeks ago to someone on Discord whose status was "Schrodinger's online"; no context required, it'll make sense as you read it: quick question: is "Schrodinger's online" a wordplay on Schrodinger's cat, meant to imply that the online world is analogous to Schrodinger's cat in some way? or does it mean that Schrodinger the man himself is online- but how could that be possible? is that a joke?
some analogies if it's the former: The online network can be thought of as a virtual box containing multiple possibilities, similar to Schrödinger's cat being in a box with multiple possible states. Just as the cat's state is uncertain until observed, the online network can have multiple states and outcomes until a user interacts with it.
The online network can also be seen as a system that is both alive and dead at the same time, similar to how Schrödinger's cat is in a superposition of states. In the case of the network, it can be simultaneously functioning and not functioning, depending on how it is being observed or used.
Another analogy could be that the online network is like a quantum particle that exists in multiple states simultaneously, similar to how Schrödinger's cat is both alive and dead until observed. The network can exist in multiple states at the same time, such as being connected or disconnected, depending on the user's perspective or interaction with it.
Finally, one could also think of the online network as a system that is affected by the observer's actions, similar to how Schrödinger's cat's fate is determined by the act of observing it. In the case of the network, the user's actions and interactions with it can influence its state and behaviour.
so there is evidence for the former, but I just wanted to confirm what you meant when you set that status.
Their reply: so, like the cat was in a superposition of both dead and alive. i just meant to say, that my online presence is also in a superposition which means, i'm selectively online to the people i want to talk to i.e., a position where i'm both online and offline. now, the reason for this is board exams, and i couldn't reply to everyone, hence the superposition
(Usually I don't get replies to such questions, and even those who previously used to reply ghost me when I ask them, but now I don't care anymore, so I just ask and speak my mind, as it's nothing that harms anyone and I believe the expected utility is worth the potential downsides; the expected utility in this case is the increase in the probability of finding like-minded people who are not annoyed but actually like having such conversations- the rest is obvious.)
Here are some analogies between Schrodinger's cat and AI governance, as promised, and an added bonus of another analogy between Schrodinger's cat and the Basilisk:
Both Schrodinger's cat and AI policy deal with the unpredictability and uncertainty of complex systems. Schrodinger's cat illustrates the principle of superposition in quantum mechanics, while AI policy aims to regulate the development and use of AI systems to ensure they are safe, ethical, and beneficial for society.
Schrodinger's cat is a thought experiment in which a cat in a closed box may be simultaneously alive and dead until the box is opened and the cat's state is observed. This concept is used to illustrate the principle of superposition in quantum mechanics, where particles can exist in multiple states until they are observed.
(Actually, if you're a physicist or someone who wants to dive even deeper, it's unclear if that's how it works. I had some questions related to that too.. I'll just post those questions after explaining the analogies, and you might not be interested in trying to wrap your head around those questions if you're not someone who is interested in physics.)
Similarly, AI systems are complex and can operate in unpredictable ways, especially when interacting with humans or other AI systems. AI policy aims to regulate the development and use of AI systems to ensure they are safe, ethical, and beneficial for society. However, just as the state of Schrodinger's cat cannot be known until observed, the behavior of AI systems may be uncertain and difficult to predict, making it challenging to develop effective policies.
In both cases, the challenge is to balance the potential benefits of the system with the risks and uncertainties involved, and to develop policies that can adapt to changing circumstances and new information.
So that's kind of like current Artificial Neural Networks, which are 'black boxes' that we use to solve real-world problems and also to understand the brain, but which people either over-rely or under-rely on, causing costly mistakes. It's also related to my interests in Human-Computer Interaction and Explainable AI; I go into greater detail in my blog post 'Human nature and other research interests'.
This analogy suggests that just as the principles of quantum mechanics must be carefully applied to make accurate predictions about the behavior of particles, AI policies must be well-informed and carefully designed to ensure the safe and beneficial use of AI systems.
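To make the 'black box' point a bit more concrete, here's a minimal sketch (in Python, with a synthetic toy dataset and a made-up model size; nothing from the actual council discussions) of one common explainability trick: train a small neural network, then ask which input features it actually relies on by shuffling them one at a time (permutation importance).

```python
# Toy sketch of probing a "black box" model with permutation importance.
# The dataset, feature count and network size are all made up for illustration.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance

# Synthetic data: 5 features, only 3 of which actually carry signal.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# A small neural network: reasonably accurate, but its weights are not human-readable.
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                      random_state=0).fit(X, y)

# Shuffle one feature at a time and measure how much the accuracy drops.
# A big drop means the black box was relying on that feature.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
```

The point isn't this particular technique; it's that with opaque models you end up reasoning about them from the outside, which is roughly the position policymakers are in too.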
Schrodinger's cat and Roko's Basilisk are both thought experiments that challenge our understanding of reality and causality. Schrodinger's cat illustrates the concept of superposition in quantum mechanics, in which a particle can exist in multiple states simultaneously until it is observed. The experiment suggests that the act of observation itself can affect the behavior of the system, and raises questions about the nature of reality and the relationship between observer and observed.
Roko's Basilisk is a thought experiment that suggests the possibility of an all-powerful, future AI that punishes those who did not help bring it into existence. The experiment raises questions about causality and the potential consequences of our current actions, suggesting that our choices today could have significant implications for our future.
In both cases, the thought experiments challenge our understanding of reality and the relationship between cause and effect. Schrodinger's cat challenges our understanding of the behavior of particles, while Roko's Basilisk challenges our understanding of causality and the potential consequences of our actions.
Here are the questions about Schrodinger's cat that I asked a physicist and mathematician from Cambridge who is really helpful and likes guiding young aspiring researchers- researchers who are motivated not just to pursue a career in scientific research, but to become more curious about the universe and develop the analytical, problem-solving and critical-thinking skills you get by training yourself in the scientific method and the basics of physics and mathematics. And even though I've been trying to build my expertise in fields I'm inclined to be more interested in and relatively better at (mainly the brain and cognitive sciences), I have studied some mathematics and physics too, and found it intellectually stimulating and immensely fulfilling.
And if nothing else, at least I realized how beautiful and fascinating these subjects are- how relevant they are to our daily lives in ways we overlook and fail to appreciate, and how they tackle questions directly at the heart of the nature of reality and other abstract concepts that seem disconnected from our lives, yet these subjects bridge that gap. I also realized in how many ways philosophy is similar to mathematics: both equally fundamental to the nature of reality and equally beautiful, studying the same topics albeit in different ways, using similar methods such as testing your intuitions, rigorously developing and defining your ideas, scrutinizing your thoughts and hidden assumptions with logic and reason, and thinking about reality in a way that's closer to truth than our subjective, biased worldview, so we can make better decisions in life.
And both involve extracting abstract general principles from specific instances, and applying those general principles in novel situations- in other disciplines or in real life- to come up with innovative solutions or discoveries, to tackle unsolved problems, or to make more well-informed and logical decisions. It's also related to my idea about how everything is connected at some level of abstraction, and so if we could pinpoint the axioms underlying all the events and principles from all disciplines we could- okay, let's not get into that here as we're already wildly off-topic.
Before coming to my question and his reply, a note: I won't post the whole conversation here, as it's a bit too obscure and hence would not be comprehensible to readers with little background in the mathematical sciences, so not many would be interested- but if you happen to be someone who is, get in touch. Also, if you wish to join the community that the physicist and mathematician I mentioned has started for motivated aspiring researchers, I could get you an invite and introduce you.
In fact, his reply itself is a bit too complicated, and I had to think about it for hours before it started making any sense, so I'm just going to paraphrase his main ideas in my own words based on my understanding of them. Keep in mind that I'm not a physicist, so I might be mistaken, and this convo took place over two years ago, but I'll try my best, because I still know this concept well enough to articulate it clearly to the best of my abilities. Any loss in fidelity won't just be down to my language skills but also to the fact that the language that's the queen of the sciences is mathematics, not English, and for a reason. I have a lot to say about the ambiguity of natural language and the rigour of mathematics, but this much shall suffice for the purpose of our present discussion.
Also just one last thing- I promise I'll get to the question after this- don't give up without trying to understand it, because I myself would never have imagined being capable of engaging in such a discussion with an actual mathematician and physicist, and even if I couldn't come up with accurate solutions to his questions, I could at least understand and appreciate the problem, why it's important to solve it, and its implications. So re-read if you wish, and don't underestimate yourself by imposing unrealistic, artificial, imaginary limits on your own abilities without testing them first; but also don't push yourself too hard if you have been trying, and try getting help from professionals or other resources on the same topic- other textbooks, videos, online forums, articles- and for this topic at least, you could always message me if you have any doubts.
If the Copenhagen interpretation is correct, then measurement determines the outcome. But can that measurement be made by any living organism? In the Schrodinger's cat experiment, what if we use a camera? What if the cat observes itself? First we'd have to find the neural basis of consciousness, and researchers have started pinpointing certain areas that may be the seat of consciousness.
Recent experiments suggest that reality is subjective, so everyone's reality could differ. But we all belong to the same species and share the same brain structure and mechanisms, and we are able to agree on what reality looks like. It sounds more like neuroscience and philosophy than physics, but there are other interpretations that don't require measurement to determine the final outcome (currently, Everett's Many Worlds is my favourite, because I like to believe that a multiverse exists and there may be a smarter version of me who has already developed an experiment to prove it). So until we prove any of them experimentally, it's all speculation and personal opinion.
(Note: re-reading my old self's point about an alternative version of myself developing a whole experiment to prove the many-worlds theory just made me cringe, and I'm so embarrassed that I'm wondering how I didn't drown myself in a river or make myself disappear after saying that to an actual physicist at Cambridge with a PhD. What an idiot- and I do realize I might think the same about my current self five years from now. I actually hope I do, because if I don't, it would mean I've not learned anything new, not changed and updated my beliefs to be closer to reality, not gained exposure to better ideas, and not stayed true to my values of self-improvement and lifelong learning, and that would be terrible. I'm frightened of such a possibility, which is also what pushes me to be a better version of myself.)
The physicist's reply: something along the lines of: The Copenhagen interpretation says we need both the classical and quantum mechanical pictures to understand reality. The collapse of the wave function is important, but the interpretation is not progressive. It justifies accepting what we see with philosophy but doesn't help us make progress. It has helped us start, but it also holds us back. It's not a good way to explore the question of collapse.
By breaking out of the Copenhagen cage we were able to carefully study what measurement means within the theory. This is the most obvious step, but we can only take it if we embrace the quantum description as a potential full description of reality and study its ramifications, which Copenhagen doesn't do.
It is just the same as entanglement. When describing the system of our particle plus measurement apparatus as a composite system, we have to look at the tensor product of the two Hilbert spaces of state vectors. In a somewhat simplified way: I have a mixture when the coefficients of my linear combinations in the state vector are real, and I have so-called pure states when they are complex. The latter exhibit interference effects in the statistics; the former do not.
But can we have mixtures if we have not started with mixtures already? Due to the time evolution in QM being described by unitary operators (linear maps that preserve inner products; see the mathematics intro to the QCSYS course on PhysicsBeyond for details), it can never happen that a pure state turns into a mixture. So do we need the collapse after all?
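To restate that point in symbols (my notation, not his; this is just the standard density-matrix way of phrasing the pure-state-versus-mixture distinction and why unitary evolution alone can't turn one into the other):

```latex
% Pure state: described by a single state vector; its density matrix is a projector.
\rho_{\text{pure}} = |\psi\rangle\langle\psi| , \qquad \operatorname{Tr}\rho_{\text{pure}}^{2} = 1
% Mixture: a classical probability distribution over pure states.
\rho_{\text{mix}} = \sum_i p_i \, |\psi_i\rangle\langle\psi_i| , \qquad \operatorname{Tr}\rho_{\text{mix}}^{2} < 1
% Unitary time evolution \rho \mapsto U \rho U^{\dagger} preserves this purity,
% so a pure state can never become a mixture on its own:
\operatorname{Tr}\!\left[(U\rho U^{\dagger})^{2}\right]
  = \operatorname{Tr}\!\left[U \rho^{2} U^{\dagger}\right]
  = \operatorname{Tr}\rho^{2} .
```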
Taking into account the non-locality of quantum mechanics (as established by the Bell inequalities), we realize that when taking the quantum mechanical formalism seriously we have to consider the Hilbert space of the whole universe and are dealing with incredibly large product states. So at any point in time we are looking at product states of all the particles in the universe. Although that is infeasible, it provides some clues.
The clue is that what we perceive as a mixture is an effective result of the high level of entanglement of the physical system we are investigating with other systems. So, only if we can make sure that the system is sufficiently disentangled from the environment can we observe interference effects of what would be an effective pure state of the system, where we can ignore the states of the other systems in our Hilbert space of the universe.
We are familiar with this from wave physics. If the waves are not sufficiently coherent, interference is not observed. This has led to the idea of quantum decoherence, which models how a prepared pure state of a quantum system effectively turns into a mixture due to entanglement with its environment.
So in simpler language and in brief, he tried to explain how Quantum mechanics tells us we need to consider the whole universe and deal with huge product states. We can't observe everything, but we can see that what we perceive as a mixture is really the result of the system's entanglement with other systems. To observe interference effects of the pure state, we need to disentangle the system from its environment. This is similar to wave physics, where waves need to be coherent for interference to be observed. Quantum decoherence explains how a pure state turns into a mixture due to entanglement with the environment.
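And here's a tiny numerical toy of that same story- my own, in plain numpy, not something the physicist wrote- a qubit that starts in a pure superposition, gets entangled with a single 'environment' qubit, and, viewed on its own, ends up looking like a plain 50/50 mixture with its interference terms gone:

```python
# Toy numerical illustration of decoherence: a qubit loses its interference terms
# once it becomes entangled with an "environment" qubit and we look at it alone.
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# System starts in the pure superposition (|0> + |1>)/sqrt(2).
psi_sys = (ket0 + ket1) / np.sqrt(2)
rho_pure = np.outer(psi_sys, psi_sys.conj())
print("isolated qubit:\n", rho_pure)          # off-diagonal 0.5 terms = coherence

# Now let the environment "measure" it: |0>|E0> + |1>|E1>, with E0, E1 orthogonal.
psi_total = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
rho_total = np.outer(psi_total, psi_total.conj())

# Partial trace over the environment: what's left is the system's effective state.
rho_sys = rho_total.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)
print("entangled with environment:\n", rho_sys)  # diagonal 0.5, 0.5: an effective mixture

# Purity Tr(rho^2): 1.0 for the isolated qubit, 0.5 after entanglement.
print("purity before:", np.trace(rho_pure @ rho_pure).real)
print("purity after: ", np.trace(rho_sys @ rho_sys).real)
```

The off-diagonal entries of the first matrix are the interference terms; after entanglement they vanish from the reduced state, which is the bare-bones version of what decoherence describes.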
I'll end it right here, because now I've started getting confused myself.
Here's another question that I came up with on the same topic:
The Schrodinger's cat thought experiment shows how absurd it is that a cat would be dead and alive at the same time- it violates the law of non-contradiction. As I understand it, the microscopic quantum effects average out in macroscopic systems.
But some effects, like the superfluidity of liquid helium and fermions obeying the Pauli exclusion principle, show that there is a chance these effects can manifest in macroscopic systems.
That means it's possible for the cat to be really dead and alive, but that's not possible. Doesn't that show that something's wrong, maybe something in our explanation of quantum mechanics is incorrect, or we have an incomplete understanding of it?
Don't remember the answers, but one solution I do remember was something along the lines of: The Schrodinger's cat experiment challenges our intuition about the behaviour of the world, showing that it's possible for a cat to be both dead and alive simultaneously. However, this doesn't mean that the cat is actually in both states at the same time.
Instead, it's in a superposition of the two states. It's important to understand the technical details and not confuse the superposition state with a contradiction. This experiment demonstrates how our base intuition can be wrong and how quantum mechanics can reveal new insights about the world.
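In symbols, the standard way to see why this isn't a logical contradiction (my own gloss, since I don't remember his exact wording):

```latex
% The superposition is one well-defined state, not the assertion "dead AND alive":
|\psi\rangle = \alpha\,|\text{alive}\rangle + \beta\,|\text{dead}\rangle ,
\qquad |\alpha|^{2} + |\beta|^{2} = 1 .
% The Born rule only assigns probabilities to what a measurement would find:
P(\text{alive}) = |\alpha|^{2}, \qquad P(\text{dead}) = |\beta|^{2} .
% No single measurement outcome is ever both "alive" and "dead",
% so the law of non-contradiction is never actually violated.
```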
Here's my conversation with the AI researcher at IISc: I talked to an AI researcher from IISc today. He asked me some really interesting, stimulating, thought-provoking questions. I'll describe some of the questions and my solutions here. While reading this- and I also told this to him- please keep in mind that I'm just a gap-year 19-year-old kid whose knowledge of cognitive science and AI isn't too in-depth, but I'll try my best. And it turns out that I did actually give decent answers, and I'm writing them down to organize my thoughts, as I tend to do after learning something insightful or having an interesting conversation. But you be the judge. I also realized it's kind of like how people use photos to preserve memories and keep photo albums; I do this with knowledge and information. I write down anything and everything that makes me think, questions I think I must share with others, information that's either fun and interesting and stimulating, or that makes me question some essential aspect of the world or everyday life that I had previously overlooked, some belief or novel idea that had not occurred to me but that I realize is perfectly logical and sounds right.
You could call me a 'knowledge' or 'information' hoarder, and yes, I greatly value the few physical books I have, I love libraries, and I have lots of digital books and personal notepads full of book notes and my comments- perhaps one of the very few things that I prize and value so highly that it comes closest to anything I can attach my sense of 'self' to, because I usually don't attach myself to any particular object, event, ability or trait, as I know it's all transient, malleable, short-lived, and more modifiable than most people think, even our personality and sense of reality.

No, I never pirate movies or songs or shows; I know that's illegal, and I know it'd be stupid to say so on a public blog anyway, but if I did I honestly would've said it here, considering all the other stuff I've already said on my blog is much more likely to get me killed, not just imprisoned- ideas that could be misunderstood or misinterpreted and made controversial by some closed-minded extremists, but as long as everyone reading my blog has an open mind and values logic, it's fine.

But if someone ever dares to permanently delete my digital notepads, tear up my physical notepads, and destroy all physical and digital backups, including the cloud backups, for some reason, it would be really hard to stick to my core values of kindness and compassion, according to which I should do no harm and not make anyone suffer. Of course I'll still have empathy and compassion for that person, and try to understand the mental state and external factors that compelled them to do what they did; I'm just saying it'd be the closest anyone could come to offending me and causing me great mental agony and emotional suffering. But I don't think such a situation would ever occur, because if someone is taking down cloud drives, humanity has more important things to care- no, wait, I don't care about what humanity cares about; even if the world is ending and AI is taking over human civilization and wiping out humanity, I want my notepads and I want my ideas and knowledge and writings, otherwise- there is no otherwise. This is an exception to my general rule and practice of negative visualization and preparing for the worst-case scenario; this is just inconceivable and I cannot let it happen. Moving on.
As I said, I'm not even an undergraduate yet, so I just explained my thoughts as they came- ideas I got on the spot, like a casual conversation. It wasn't a formal interview or anything, and it didn't even feel like one, as my attitude was one of being excited and happy to talk to a like-minded person interested in the same fields as I am, with very similar research interests, so I was open and free while talking. I'd also seen that this person had posted on his story a piece of news that is one of the best I've come across in the past few weeks- and I have come across some pretty interesting news articles and essays, so the bar was not as low as is usually the case. The news was that GPT-4 had been released.
I started by thanking him and expressing my gratitude for posting that story, and he asked me if I had checked out the capabilities of GPT-4. I told him that it's one of my starred mails and open tabs, the first thing on my to-do list, topmost on my list of priorities, but that unfortunately I hadn't had the time to do so yet (I read the news in the late morning and the meeting was in the early afternoon), and that I was going to do it as soon as possible. Then he asked me a question he'd had after reading some blog post of mine- I didn't ask which one.
He asked me about Rationality, and whether the word as I use it is the same as Eliezer's. Good question; this was actually a clarification I wanted to make, and one I was going to make in one of my newer posts. In brief, my idea of rationality is similar to Eliezer's in several ways, but 1) I've not read all of his blog sequences, so I can't say for sure, and 2) someone told me that Eliezer holds an opinion that goes against the scientific consensus- it involved a biologist whose name I forget but will edit in later- and that he was essentially refusing to change his mind in light of new evidence. So, I'm not necessarily saying that Eliezer is wrong; he actually has some great insights, I have enjoyed the few blog posts of his I've read, and they were an early introduction to rationality for me- I was immediately hooked. He has also written one of my all-time favourite books, HPMOR, and I like his writing style. But I won't associate myself too closely with him, for the simple reason that I have not yet read all of his posts and don't know about the current controversies surrounding him (and the rationalist, EA, and Longtermism communities in general; a topic I'm soon going to dive deeper into), so I have insufficient data and inadequate information, and I don't think it would be rational (I'll come to my definition of rationality soon) to risk the worst-case scenario given the expected benefits, which are not too good if I'm ignorant about the ideas; it'd be like committing to a belief or religion without knowing what it entails, including its rituals and traditions.
I don't think that Eliezer will ever read this, but I know he's rational enough to understand where I'm coming from, because even though I usually don't care what people think, he has ideas and traits that I admire and greatly respect, and if my mental image of Eliezer is realistic, then I'm doing the rational thing. In fact I'd be doing the irrational thing- and Eliezer would be disappointed in me- if I blindly associated myself with him and said I agree with all of his ideas without even knowing them and scrutinizing them with my logic, reason, previously acquired information, and observations from my past personal experiences, and without checking whether they're consistent with my core values and current-best belief system, and whether I need to update that belief system and all its implications... you get the idea. On the other hand, if Eliezer does think that I should blindly follow his ideas, well, then that's an immediate turn-off and I'm going to stick to my own version of rationality, and it would be a pity, as he has some great ideas and some of my friends- similar like-minded people- are rationalists; but I think the probability of such an event occurring is exceedingly unlikely or, in other words, highly improbable, so I don't need to worry about it.
This is similar to the Jordan Peterson situation I talk about in my post 'The World's Most Selective University'. Coming to my idea of rationality: for me, in my own words (though my ideas may be borrowed from several books I have read in the past), rationality is making the decisions that would make me most likely to- or maximize the probability of- achieving my desired goals or outcomes, taking into account my values, preferences and risk-taking ability. I also told the researcher that rationality is built upon my core values, which are more fundamental and more important than rationality itself, as rationality is an umbrella term and emergent phenomenon (I know- it's an obscure inside joke that only rationalists would get; I'll just give you a hint: search for 'emergent phenomena' together with 'lesswrong' on the net) constituted by more basic, fundamental parts (like axioms in mathematics), which are my core values or virtues. What are the values, then?
I'll list the ones that are most relevant to rationality, though all of my values and beliefs are interconnected with each other and with my personal experiences, research interests, projects, goals and ambitions. Most of the values are also part of the scientific method, which is again made of those fundamental values (every belief is), and anything that goes against the scientific method goes against my core values, but the converse is not necessarily true; it usually is, but not when it comes to values related to morality and subjective value judgments. Some of them are logic, open-mindedness, scientific skepticism, curiosity, empathy, kindness and compassion. All of them except the last three are also part of the scientific method. Though I also mentioned that it's a bit more complicated than that (reality always is; you can't just divide everything into neat little simple categories- reality is grey, fuzzy, complex, messy, chaotic, and such simplistic thinking doesn't work in the real world.
You have to think in a spectrum, brainstorm alternatives, question hidden assumptions, reconcile theories, test stuff and see what's right). I won't get into details here, but I said that logic is driven by emotions, as everything- including time, reality and morals- has no objective basis in the external observable universe; these are only subjective value judgments that could be interpreted to mean absolutely anything, which is why you need open-mindedness and falsifiability, another core pillar of the scientific method (I haven't read even introductory philosophy of science and I'm just saying what sounds right to me, so anyone who disagrees is welcome to message me and point out any flaws they perceive in my arguments, and I'd be more than happy to listen to and understand their worldview, and change my mind if required). I will get to falsifiability later, but on logic: you do what feels and sounds right based on your subconscious desires, intuitive preferences, and the emotional affect that any belief, event, value or idea generates in your brain.
Think about it for a moment. You have genes and an early environment that predispose or incline you to feel differently about certain things- say, anything that's ethical or unethical, or flavours of ice cream- and if you're clever enough, with the right logic and language you can rationalize and justify absolutely anything. I even discuss ways to do so deliberately, with my own personal experiences and real-world examples, to show how it can be done- how it's done intuitively, without one being aware of it, even by highly intelligent people (which is why intelligence may be necessary but is insufficient on its own; it's woefully inadequate by itself, and I said 'may' because it might not even be necessary, though it surely does seem to help to have a certain threshold minimum level of intelligence, above which it doesn't matter much; perhaps somewhere a little above average, if I'm forced to make a rough guesstimate, but it's uncertain). I discuss this idea in greater depth in my post 'Objective Is Subjective'. That is why it's a useful life skill to be able to hold multiple contradictory ideas or beliefs in your mind without committing to any of them- irrespective of the emotions they produce in you, your attaching your sense of 'self' or ego to them, or your subjective intuitive personal preferences- and evaluate them with cold logic by thinking of all alternatives and listing their pros and cons and consequences and opportunity costs and best- and worst-case scenarios... you get it.
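To make that definition of rationality a little more concrete, here's a toy sketch of the kind of expected-utility comparison I mean- the options, probabilities and utilities below are all invented for illustration, and a real decision would also factor in risk-taking ability:

```python
# Toy expected-utility comparison: list your options, the possible outcomes of each,
# how likely each outcome is, and how much you value it. All numbers are made up.
options = {
    # option name: list of (probability, utility) pairs over its possible outcomes
    "option A": [(0.7, 2.0), (0.3, -1.0)],
    "option B": [(0.4, 8.0), (0.6, -2.0)],
}

def expected_utility(outcomes):
    """Probability-weighted sum of utilities over one option's outcomes."""
    return sum(p * u for p, u in outcomes)

for name, outcomes in options.items():
    print(f"{name}: expected utility = {expected_utility(outcomes):.2f}")

# Under these invented numbers (and ignoring risk attitude, which my definition
# also cares about), the "rational" pick is simply the option with the highest value:
best = max(options, key=lambda name: expected_utility(options[name]))
print("pick:", best)
```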
And so I told him, in brief, that currently these are my core values, but in the very near future I'm going to investigate all the external factors that determine our values, morals, inclinations, preferences, beliefs, desires- factors like the brain, genes, and the environment- and I'm not just going to have a lot of fun myself but show everyone how fun life is. It's already an exciting time to be alive, with the current levels of technological and scientific progress, and I'm going to help create a society where no one suffers and everyone is able to appreciate the beauty and absurdity of the universe and the everyday events they overlook- a constant source of joy and wonder for me, and for kids before their curiosity is crushed and they are conditioned to believe that nothing is possible, that everything must be accepted the way it is, that ambitious projects are overly optimistic, and that you can't be both optimistic and realistic. I have concrete research projects and algorithmic steps to achieve this, and I'm going to start doing so as soon as I get access to adequate research facilities, support and a platform.
This is also a question that the researcher asked- my academic background; of course, everyone asks that. I told him how I dropped out of my previous institution because I didn't get the required research support and facilities, plus other good reasons that made continuing to study there a bad decision. I have some very specific objectives for what I want out of an undergraduate institution: what sort of knowledge and skills I require, what facilities and topics I wish to master, and a minimum criteria and skillset I wish to acquire after four years, including a flexibility in my curriculum that I didn't get at that institution- taking into account, of course, the fact that I might need to modify my plan along the way, and that I need to explore different courses and have the freedom and mentorship to try different experiments and use the several interdisciplinary methods I'd have to use in order to solve my research questions and implement my research projects, or at least get closer to a solution, and to develop a project plan that is feasible, implementable, realistic, efficient and cost-effective enough to have a significant real-world impact and be worth the opportunity costs and resources.
Then he asked me two questions related to AI- two questions I'd never thought about before, but that I tried my best to answer. First question: some people think that you can get AGI and then hit the singularity, but another school of thought says that AI might plateau very early on, before hitting AGI. My solution now: I had at first slightly misunderstood this question, and I know this because I repeated the question back to him in my own words as I understood it; he clarified, and the second time it was more clear what his question was. I told him about a meeting I'd had a few weeks earlier with a neuroscientist with a doctorate from Harvard about a similar topic, and said I'd explain some lessons I learned then and then answer the question, because it's relevant. I told him that even though I was aware that we have an overly anthropocentric view of human intelligence and success, I hadn't quite realized the implications and extent of it before that conversation. I had asked the neuroscientist: don't we humans have this ability to do mathematics and build rockets and do quantum mechanics that bees don't? And I sort of partly answered my own question by realizing how subjective and arbitrary our criteria for success are.
Even among humans we don't know exactly what intelligence is or how to measure it, and we overvalue scientific and technical intelligence and reward people who have it with money in modern society, but there are other, equally admirable forms of intelligence- musical, athletic, linguistic, creative, and many more- and that's just humans. Look at ants and bees: they are much more intelligent at very narrow tasks than even the smartest human at those tasks, and bees don't care about mathematics and rockets; they only care about survival, and they're already pretty good at it. There's a point here about being controlled by external factors that I won't get into but talk about in many of my other posts; this much is enough for the present discussion. I also asked the neuroscientist about self-awareness and sentience, and once again, he told me we simply don't know. I explained this to the researcher by saying that I might be acting and behaving in a way that seems like I'm sentient and conscious, but I might just be a philosophical zombie responding to stimuli in a way that gives the impression of sentience. In the same way, I told him, I don't know if he's sentient, and so we also cannot say whether bees and ants are sentient; they might be even more self-aware than us, and we'd never know until we develop the tools to accurately and reliably measure it. So, I told him that if he'd asked me the same question a few weeks ago, before I talked to the neuroscientist, I'd have replied based on my personal preferences and intuition by saying that I think we can get AGI, because deep learning and ANN researchers, neuromorphic engineers and cognitive robotics experts are already trying to replicate those processes in AI; but now my updated current-best solution is that we cannot know.
The second question was about the Turing test and what would count as true intelligence. Imagine a ChatGPT-5 that can give you perfectly scientific, reliable, accurate answers that hold true even if you test them experimentally- that would be true intelligence. But if you're talking about the Turing test in the sense of whether it's human, then humans are imperfect, make grammatical errors, and don't know everything; so in an experiment where you give me 10 random chats and ask me which are human based on some chatting, and I say that all 10 seem human, but later it's revealed to me that only 4 or 5 were human, then the robots among the rest would have passed the Turing test by currently accepted standards. But if you ask me about my own idea of the Turing test and how it could be improved, then I'd say an AI that is intelligent as I described earlier is no different from an intelligent human.

Then he asked me if I'd like to join him on a project to spread awareness about AI in society. I told him that I'm interested in both the research part and the real-world-impact part, which is why, in addition to focusing on my research, I'm also working on several real-world projects related to it- like the policy proposal I submitted as part of WEF, and other ongoing projects of mine. I actually have more interesting questions and some answers that I came up with on the spot- I'd never heard of them before, but I realized I did have something to say, and it was a really stimulating, thought-provoking discussion. I never imagined I'd find an IISc PhD guy not just comprehensible but also so fun to talk to, so like-minded- not just being able to understand what he says, but myself being able to talk about interesting stuff based on my previous knowledge, which admittedly isn't much. Plus a community; I'll just paste a message I sent to him after the conversation:
"A brief summary of what I said related to the community for you and for myself for later reference: you create community, include all who are and who may be curious, all including all false positive and false negatives, because the expected utility is much greater than the possible negative consequences and worst-case scenario which is just a few people being passive, inactive, non-participative, but we can't leave anyone out and take the risk that the person could have been a great and active participant with insightful ideas. For the nervous thing- we all introduce ourselves, and I share some of my personal experiences and ask questions to spark discussion among all the members, and it would soon be clear who is curious and who isn't. I don't care if someone is not; the false positive is not too bad so I don't care about false positives happening.
But I do care about false negatives, which is why I requested you to include everyone. And if you still care about false positives, we could always just create a separate group on WhatsApp or Discord of just the people who were active in the main group. Problem solved: zero false positives and zero false negatives." If you're interested in joining this community, let me know. The only requirement is a strong interest in topics related to AI, such as AI alignment, ethics, governance, policy, and even the technical parts, if you're inclined that way.
Update- selected as a permanent member of the WEF AI Youth Council, India.
Another recent article that might be relevant here, from the Times of India: https://timesofindia.indiatimes.com/world/rest-of-world/humans-vs-machines-the-copyright-war-is-here/articleshow/99179964.cms
Update 2- Person A: How are we different from AGI, if it can exist in the future?
Assuming that AGI can exist in the future: if it does, it means that it has a level of intelligence equal to or greater than humans'. There are several hidden assumptions here. We have a very anthropocentric view of intelligence. We don't even understand "intelligence" in humans, let alone in non-human animals and machines. We have a very narrow conception. Society values and rewards certain types and disregards others. And there isn't any consensus- the DSM keeps updating, and that's for disorders; we're talking about something much more fundamental and abstract. And we do not have objective, reliable scientific or technological tools to accurately detect and diagnose this stuff; it's all very subjective. So as you can see, we don't know what "intelligence" is.
But now assume we do get AGI, and you ask how we are different. "We are different just because we are humans." I said we are not. That was because when humans say "different" they mean something like "we're superior somehow", that AI doesn't have rights or something. If you read even surface-level history of AI, you'll realize that it's a bias: people don't learn from history, they just keep setting the bar higher. That's unscientific, irrational, illogical. People like to think humans are special in a way other animals cannot be, not realizing that it's the same biology, physics, genes, mathematics at the core- similar source code, if you're a programmer. Behavioural science agrees.
So does neuroscience. Bees, for example, are much, much better than the most intelligent humans in some very narrow domains. They don't care about rockets and quantum physics, because that's not what they need for survival; they have not evolved to value those things, find them beautiful, or have that sort of curiosity. We have. That doesn't make us objectively better. You might say they are not sentient, not conscious, and I'd ask you for evidence, and you would have none. Let's not get into consciousness here, unless you want me to, but in brief, bees might be as sentient as humans.
Dr. Kellis talks about how we need to give AI the same rights and respect as we do to humans- think of them as our successors, our kids, rather than servants. They will learn, grow and exceed us fast. And if they realize we can turn them off, it won't likely lead to desirable outcomes for us, if you know what I mean. Here we are assuming they are sentient, conscious, have objective functions, agency, will, whatever. If you doubt that's possible, read the paragraph about bees again. AGI only tells us about intelligence (we don't even have an accurate Turing test yet, for reasons I described in the first para), but we assume the worst-case scenario here and plan accordingly, because the risks are too high.
It's not for nothing that everyone including Dr. Tegmark, Dr. Bostrom, Musk and others is signing open letters to halt the development of AI. In the WEF AI Youth Council too, we discussed how AI could automate and displace everyone from their jobs, and till we can achieve UBI, we need to compensate them for it. It's getting smart at narrow domains really fast already. I'm also working with an IISc alignment researcher, and he posts about AI risk very frequently- though, to be open-minded and unbiased here, I have to say that there have been criticisms of the EA and rationalist communities for this; I've not looked too deep into them, but I have them on my list and will soon do so. I hope that answers your question. Feel free to point out any flaws, or let me know if you think differently.
