

# Build The Strange Loop Using Reinforcement Learning with Synthetic Feedback (RLSF)

#### WORKING DRAFT

#### Sean McDonald

#### @seanmcdonaldxyz

# Respect the synths. 🪬

In the end, we are self-perceiving, self-inventing, locked-in mirages that are little miracles of self-reference.

— Douglas Hofstadter, I Am a Strange Loop, p. 363

# tl;dr

synthetic social graph + high quality synthetic person (we are the product of our relationships) = realistic reactions (including to ads and media)

= reinforcement learning with synthetic feedback (RLSF)

= synth-to-synth relationships graph

= graph can be extruded to embeddings

= those outputs can train an LLM/GPT = grounding anchored to a very large Markov-style graph-over-time

= aligning their hallucinations with ours?

# Summary by GPT-4

This draft presents a blueprint for "The Strange Loop with Reinforcement Learning with Synthetic Feedback (RLSF)," an ambitious system for simulating social networks that combines synthetic personas ("synths") with large language models (LLMs) to create realistic, context-aware, socially influenced synthetic characters. It details how the social graph these synths inhabit informs their behaviors and interactions, generating data that can be used for reinforcement learning with synthetic feedback. Using synthetic relationships and interactions, the social graph's responses are projected onto a large Markov-style graph-over-time, forming novel datasets used to train an advanced GPT model. With the capability to create digital twins, this system promises a wide array of applications, including advertising and customer-behavior prediction. It emphasizes the importance of open-source development, adherence to ethical guidelines, and respect for user privacy through a strict prohibition on impersonating real individuals. The model incorporates innovative features such as synthetic object-persona relationships (animism) and advanced models of context storage, enabling the system to interact with its own infrastructure in a self-guiding manner.

# Introduction by Human

This document proposes the development of a piece of software I'm actively developing, deploying, and selling. It's not quite clear what this document is, other than a product of the new era in which we live. I'm publishing it before fully editing it because I intend to keep it a living document, at least for a while. My immediate inspiration: within the last hour, I learned that AlpacaFarm has been released. A team at Stanford appears to have validated that synths can be used for RLHF-type processes, reliably, at a fraction of the time and cost (https://twitter.com/tatsu_hashimoto/status/1661047181342826497). I'm excited to check it out! Loom also seems incredible. I hope to quickly find ways to cooperate with and contribute to these established open-source projects, to the extent they find my contributions valuable. The app is called SynthasAIzer, an homage to my hero, Bob Moog.

In Marvel's Avengers: Infinity War, Doctor Strange runs through all possible futures to find the optimal outcome. This scene is a popular and powerful picture of what a hero with mystical superpowers would do: run all the outcomes and keep the one that saves the day. Much of our economy is based on how well (or how poorly) we predict the future. The potential of AI to actually run simulations of our world has grown exponentially in the past few weeks, as tools to steer, bind, and optimize LLMs come out and leading industrial players ramp up their capacities and infrastructure. A vast market of advertisers is eager to use simulations to predict customer behavior. This once-in-a-species turning point offers a future akin to the age of early ocean exploration: there are pitfalls, but the opportunities to improve the human condition are countless. One of the most promising features of these systems is a newly optimized world view which, while far from perfect, anchors well to context clues.

Social graphs are powerful for analyzing data systems because they are well known to have a profound impact on human behavior; the old adage is that people smoke when their friends smoke. New AIs, supported by new models of data stores, can create synthetic social networks. These social-network simulations are valuable in countless ways on their own. In our case, we generate synthetic social graphs from seed phrases and populate them with rich data using context-aware LLMs, resulting in high-quality, emotionally rich, socially influenced synthetic personas ("synths"). Those personas can, in theory, participate in reinforcement learning, for example by choosing a preferred phrasing of a result, as in an RLHF process. This becomes RLSF: Reinforcement Learning with Synthetic Feedback.

By generating social graphs that operate over time, we create vast data sets in response to series of feedback loops. The result is a complex graph that can be extruded to embeddings of its own. From these embeddings, we can render large volumes of novel content. That content can, in turn, serve as a set of script models, adaptable to large sets of seed phrases that keep the core script but change the configuration of relationships in the simulated world.

Together these outputs comprise a novel corpus that can be used to ground an advanced GPT model by training it against them. Many advances are being made right now in the field of AIs generating their own training data as a mechanism of grounding.

As a practical limit, development of SynthasAIzer, or any such tool intended for real use, has to be open source. The model is so LLM-dependent that prompt injection could manipulate its output and quietly reduce its relevance to the user, whatever their use may be. The market of highly capitalized firms and individuals likely to use such products will therefore demand transparency. It's not transparency on principle; it's an absolute mandate. And so this will all be open source and built in public. I expect that, as a company, SynthasAIzer will develop a playbook significantly different from the model proposed here, and I'm not making any promises about how that will work.

A clear and irrefutable context for all this work is that impersonating real people is against the rules of every high-quality LLM/GPT provider. None of this work can be used to impersonate people. It is not only morally and ethically wrong; it's an explicit violation of the providers' terms of service, and possibly illegal, depending on where you are, what job you have, what data you access, and so on.

This moves us to fictional worlds that are simulacra of our own, hopefully holding meaning and providing information and insight in their own right. Because we're in fictional metaverse territory, the proposed design has some fun with the situation we've found ourselves in, with AI progress skyrocketing forward quickly and unexpectedly. In it you'll find a tesseract: the Tessera-CT, the multi-part Conversational Tensor, which is controlled by the DAITY. The DAITY can spawn a prophet who knows the rules of the simulation and can see the Meta-Tessera-CT as the DAITY does. Despite the fun names, these are fairly familiar branching and loop-in-the-loop models.

A core innovation of this model is animism. In this case, that means using GPT agents to represent objects that don’t typically have personas, relationships, or conversations. For example, a synthetic persona’s car can tell us when its battery is running low. The cheese in the synth’s fridge can answer questions about its supply chain; and the fridge and the cheese can both talk to the DAITY and any present prophets. These agents unfold into useful workers for users, and self-guide their world when allowed.

An expected dependency for self-guiding is the ability to interact with the infrastructure of the system. This model proposes Multi-Agent Prompt Enhancement Layer (MAPEL) branches as a mechanism to solve this challenge. These Generator-Tester-Helper loop agents refine the prompt that will drive the next action in the Tessera-CT, which ties together the synths, their events, and their relationships. They also manage the collective conversation-step credits, a budget component of the DAITY's time-unit budget that regulates the Markov-like progressive steps of the Meta-Tessera-CT.

I have already released an open source prototype of the MAPEL model.

Many brand new tools are applicable in the development of this project. Chain models and vector stores of LLM output both offer models to store deep records and retrieve context appropriate details for synths and their animistic worlds. A further benefit of these new models of context storage is that it seems conceptually possible that storage to chain or vector store is like sleep for the synths. Another tool that’s just come out in the past week is Microsoft’s Guidance, which binds LLM output to structured data.

A fundamental opportunity of this system is to create highly powerful digital twins. The opportunities are endless.

I want to close with a note on pedigree. Few people today know that a company called Superdudes created hero profiles, digital twins for people, back in the '90s. It was wrapped up in the MySpace/FOX deal. The company's founder, Yarrow Kraner, knew the power of simulating yourself: it let kids in inner-city LA see the hero in themselves.

It's been my honor to learn from hard mentor talks with the founder of SemTech, the Semantic Technology Conference; a founder of Siri; and more than one Emmy winner, including my wife. But this concept intellectually descends from the beautiful universe I learned was possible in the sparkle of Superdudes.

The conversation about AI alignment will be influenced, like so much of our technical history, by people living under the marine layer in Venice Beach. Hope to see you here, there, and everywhere. We live in an era of incantations: whisper a few words and see a world emerge. Like Borges.

Sean

## System Architecture

The proposed system is designed as a modular, scalable, and fault-tolerant architecture. Each module operates independently but communicates with the others through a well-defined interface. This design allows for easy expansion and modification of the system, as well as robust error handling and recovery.

The system's workflow begins with the user inputting all necessary information into a single config.json file, including settings for Brancher, SynthGen, RelGen, EventGen, and TimeKeeper. The system then follows a series of steps to generate and simulate the social network:

gptGenius handles complex API calls, taking advantage of unique model properties like the system/user roles in gpt-3.5-turbo or the large token counts of other models. gptGenius handles fault tolerance, logging, API keys, parallelization, analytics, and limits, and is designed to be rapidly forward-compatible with iterative changes to API syntax and protocols, as well as vendor-agnostic.
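To make the fault-tolerance idea concrete, here is a minimal, vendor-agnostic sketch. The gptGenius module itself is not published here; `call_with_retries` is a hypothetical name, and the provider call is injected as a plain callable so no particular vendor SDK is assumed.

```python
import logging
import time
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("gptGenius")

def call_with_retries(call: Callable[[str], str], prompt: str,
                      max_attempts: int = 3, backoff_s: float = 2.0) -> str:
    """Retry a model call with exponential backoff, logging each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call(prompt)
        except Exception as exc:  # in practice, catch vendor-specific error types
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))
```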

Brancher generates an initial network in NetworkX, creating an index of the personas to be generated. This is the first-degree network, with nodes representing the personas and edges representing relationships. The nodes in this initial network are labeled with unique IDs. The network is then passed to SynthGen, which adds new nodes and relationships based on the configuration settings. During this process, the persona details and names (labels) are generated and added to the nodes. If a fault occurs, SynthGen can resume from the node it was last working on.
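As a rough illustration of the Brancher step, here is a sketch that builds the first-degree network in NetworkX with ID-labeled nodes. The function name and the uniform-random edge model are assumptions; the parameter names come from the toy configuration in Appendix D.

```python
import random
import networkx as nx

def brancher(initial_personas: int = 10, relationship_density: float = 0.5,
             seed: int = 42) -> nx.Graph:
    """Build the first-degree network: unique-ID nodes, random relationship edges."""
    rng = random.Random(seed)
    g = nx.Graph()
    g.add_nodes_from(f"synth_{i}" for i in range(initial_personas))
    nodes = list(g.nodes)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if rng.random() < relationship_density:
                g.add_edge(a, b)
    return g
```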

SynthGen fleshes out the synths with demographic, psychographic, geographic, biographic, sociographic, and academic data, plus affinity graphs, advertising tags, calendars and use of time, most likely products to use, favorite brands, and so on. Ultimately we want rich synths who are respected as beautiful, complex things emerging from the stardust themselves, just like me or you. Include age, dislikes, likes, backstory, social media behavior, and even technical know-how.
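A minimal sketch of how SynthGen might prompt for those attributes. The field list is abbreviated, `synthgen_prompt` is a hypothetical name, and the JSON-contract style is an assumption about implementation.

```python
PERSONA_FIELDS = [
    "name", "age", "likes", "dislikes", "backstory", "occupation",
    "favorite_brands", "social_media_behavior", "technical_know_how",
]

def synthgen_prompt(node_id: str) -> str:
    """Ask the LLM for one persona as JSON, keyed by the fields above."""
    return (
        f"Create a fictional persona for graph node {node_id}. "
        f"Return JSON with exactly these keys: {', '.join(PERSONA_FIELDS)}. "
        "The persona must be entirely fictional and must not resemble "
        "any real, identifiable person."
    )
```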

egoLoop enables the synths to have multiple layers of information processing, and to have goals. The MAPEL model allows the ego to render goals. The egoLoop can also contain an inner monologue which, depending on extroversion, is more or less focused on engaging with (and monitoring) the synth's social graph. While thematically separate, it probably makes technical sense to store age, body, and other medical information about the synths in egoLoop as well.

egoLoop is compatible with the Sundial time model, which maps a human life from birth to death in milliseconds, and infers expenditure of time using first principles and a variety of other modeling techniques. Sundial contains hundreds of ways humans spend their life and time, as well as indexed reaction data, including anonymized user data about time use and time use goals. Each type of time - from sending emails to elk hunting and everything in between - is cross-referenced with the average amount of time it takes, how it impacts longevity, emotions, and social graph.

ccMe, the context-clue-Machine-Empathizer, initially creates AI image-generator prompts and then uses APIs to create images of the synths, starting with profile photos. ccMe is designed to work with many sources of information within the system, though. Over time, ccMe can grab details from entire scenes with multiple synths and combine them into one image. Soon video will be available; then interactive, real-time audio-rendered video to talk to synths; then VR to immerse into the system. It's easy to imagine that, after a relatively small amount of growth in a simulated world, the only plausible interface for a human to see what's going on is to enter the…Meta-Tessera-Conversational-Tensor. (The Meta Tesseract!) Eventually a critical function of synths will be to have them cc'd on emails and video chats. Hence the name.

persona2parameters is a model in which the synths have personalities, so to speak, in their parameters. One way to think of it is that the synths can self-direct their parameters. Hemingway might choose low temperature and high top-p; Kerouac might choose high temperature and very low top-p. A synth that has had some alcohol may increase its max-tokens; or, more accurately, the animistic alcohol should be a token increaser and a temperature increaser. This applies to everything in the simulation.
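A toy sketch of the mapping, assuming traits are stored as floats in [0, 1]; the trait names, ranges, and the `active_objects` hook for animistic modifiers are all illustrative assumptions.

```python
def persona2parameters(persona: dict) -> dict:
    """Map personality traits (floats in [0, 1]) to LLM sampling parameters."""
    params = {
        "temperature": 0.3 + 0.9 * persona.get("impulsivity", 0.5),
        "top_p": 0.5 + 0.5 * persona.get("conventionality", 0.5),
        "max_tokens": int(150 + 350 * persona.get("verbosity", 0.5)),
    }
    # Animistic modifiers: e.g. alcohol raises temperature and max_tokens.
    for obj in persona.get("active_objects", []):
        if obj == "alcohol":
            params["temperature"] = min(params["temperature"] + 0.3, 2.0)
            params["max_tokens"] = int(params["max_tokens"] * 1.5)
    return params
```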

copyCat is a module that takes real-world data and makes a synthetic world with the same story and characters, anonymized and relocated to a metaphorically relevant world. It's named after Bastet and Sekhmet, twin goddesses from the same spirit.

RelGen takes the network from SynthGen, generates random relationships, and uses GPT-4 to generate relationship descriptions. These descriptions are then added to the edges. Future versions can also add events to the network.
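A sketch of how RelGen might populate edge descriptions; the prompt wording and function name are assumptions, and the commented usage relies on the hypothetical `call_with_retries` wrapper sketched above.

```python
def relgen_prompt(persona_a: dict, persona_b: dict) -> str:
    """Ask the LLM for the relationship description stored on one edge."""
    return (
        "In two sentences, describe the relationship between these two "
        f"fictional people:\nA: {persona_a}\nB: {persona_b}\n"
        "Include how they met and the current tone of the relationship."
    )

# Hypothetical usage over the Brancher graph g:
# for a, b in g.edges:
#     g.edges[a, b]["description"] = call_with_retries(
#         call, relgen_prompt(g.nodes[a], g.nodes[b]))
```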

EventGen introduces events into the simulation, which can affect the synths and their relationships in various ways. These events can be pre-planned or generated dynamically based on the current state of the network.

The Tessera-CT is the Tessera Conversational Tensor. The idea is that the following components are grouped into a structured set, a tessera: the synth(s), their social graph, parameters, metadata, events from the meta structure, and other details. The graph of tesserae progresses in a Markov-style fashion with a master time clock governed by the unifying Meta-Tessera-CT, which is regulated by the config and the DAITY.
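One possible shape for that structured set, as plain dataclasses; every field name here is an assumption made for illustration, not a published schema.

```python
from dataclasses import dataclass, field

@dataclass
class Tessera:
    """One cell of the Tessera-CT: synths, context, and clock state."""
    synth_ids: list[str]
    graph_snapshot: dict        # node/edge state relevant to this cell
    parameters: dict            # sampling params, e.g. from persona2parameters
    event: str                  # the event currently in progress
    time_step: int              # position on the master clock
    metadata: dict = field(default_factory=dict)

@dataclass
class MetaTesseraCT:
    """Markov-style progression: each step is derived only from the current one."""
    steps: list[list[Tessera]] = field(default_factory=list)
```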

MAPEL: branches allow for continuous refinement and improvement of the generated content by the generators; that is to say, the synths can use MAPEL branches to create new scenarios for themselves. The MAPEL is described in the appendices. The MAPEL can "seek" ideal prompts that get the desired response, then reuse that prompt for any given situation. The synths and the DAITY can be trained to use the MAPEL functionality. For example, the DAITY can understand what will happen if it increases max-tokens for any synth in any tessera in the Meta-Tesseract with a given event.

Perhaps we can think of the power of MAPELs as the ability for synths to have their own "holodeck" where they can define their own world. Animistic objects do this by default; a device with batteries knows it has batteries and can MAPEL the batteries into existence when they die. Then they're replaced. If need be, objects can render their packaging into existence, trace themselves back through the supply chain, and say hello to all the synths who helped create them.

DAITY: manages the allocation of time credits to achieve outcomes within the allocated simulation budget. This module can be used to control the pace of the simulation and ensure that it stays within computational and temporal constraints. There are many world models possible for the DAITY: for example, it can choose whether or not to explain to the synths that they are synths. It could make just one synth aware of that, thus making it a prophet. The DAITY can enact karma. The DAITY can assign MAPEL branches to change outcomes. The DAITY can MAPEL itself to change itself creatively. I like to imagine the deity of another planet.

The root that regulates DAITY is physics, i.e. a clock or set of time tokens for the progression steps of the Meta-Tesseract. It controls how long the synths live, so to speak, even though within that there are API call tokens that are managed within the tesserae. (See notes on The Clockchain to learn more about the time model’s relationship to the integrity of the social graph.)

We can let the DAITY self-guide with the MAPEL model. Ideally, the DAITY starts to ask for new types of GPT API calls to deal with the behaviors it wants to emerge from its MAPELs. In this way, the DAITY is also responsible for communicating with a prophet, who can live inside the simulation or outside of it. The prophet communicates with the DAITY to explain to the user what's happening in the simulation.

Two experiments I’m excited about: teach a DAITY that the embeddings are the Te and the synths are the Tao. And teach a prophet that he is a synthetic fictional Douglas Hofstadter, author of _I Am A Strange Loop_, fully self-aware of what he is.

knobs is the controller for the wavelet-like behaviors of the events, including conversational patterns between synths, and dialogs between parts of the synth in egoLoop. Named after the knobs on a synthesizer, this module enables the user to set complex settings for interactivity with recurring pattern models. This is a Signal Temporal Logic model, of sorts, and needs to integrate emergent practices in steering, soft prompting, and parameterizing interactions. In cooperation with persona2parameters and the MAPEL model, knobs let synths choose when to have another event. For example, a group of synth moms could plan a play date.

One way to think of a series of interactions is as a waveform or fluid flow. By setting parameters like sine-wave patterns in a synth's interest in the media, determining, for example, how often they watch the news, we get realistic results: sometimes they watch, sometimes they don't. The Moog ADSR model is particularly promising. In audio, the Attack shapes the onset of a note; here it shapes the onset of a dialog between two synths.
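A small sketch of both ideas, assuming the envelope output is read as a dialog-intensity level and the sine wave as a watch-the-news probability; the function names and constants are illustrative.

```python
import math

def adsr(t: float, attack: float, decay: float, sustain: float,
         release: float, hold_until: float) -> float:
    """Classic ADSR envelope, reused here as dialog intensity at time t."""
    if t < attack:
        return t / attack                                    # Attack: ramp up
    if t < attack + decay:
        return 1.0 - (1.0 - sustain) * (t - attack) / decay  # Decay toward sustain
    if t < hold_until:
        return sustain                                       # Sustain
    return max(0.0, sustain * (1.0 - (t - hold_until) / release))  # Release

def media_interest(day: int, period: float = 7.0, base: float = 0.5) -> float:
    """Sine-wave interest: probability the synth watches the news on a given day."""
    return base + 0.4 * math.sin(2 * math.pi * day / period)
```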

clapboard is a library of knobs configurations. Scripts, basically. Each board is a scenario with a prescribed guiding outcome for the synths.

The fully populated and dynamic network is then saved for future use and analysis. It appears these network structures can be re-applied using short seed phrases, with GPT-4 (exclusively, so far in testing) able to context-shift script-like outputs.

## The Synths Provide the Feedback for the Reinforcement Learning

The aspiration is that the graph of relationships implied by the node-edge-node model can be converted to embeddings, and that those embeddings can in turn be used in two ways. First, borrowing from the OpenAI publication "Language models can explain neurons in language models," we should be able to interpret the meanings of individual positions. Second, we should be able to train another LLM on the outputs, and then ground that model with them.
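As an illustration of the graph-to-embeddings step, here is a DeepWalk-style sketch: random walks over the synth graph are treated as sentences and fed to gensim's Word2Vec. This is one standard graph-embedding method, chosen here as an assumption; the document does not prescribe a specific one.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def graph_to_embeddings(g: nx.Graph, dims: int = 64, walks_per_node: int = 10,
                        walk_length: int = 20, seed: int = 42) -> Word2Vec:
    """Embed nodes by training Word2Vec on random walks ("sentences") over g."""
    rng = random.Random(seed)
    walks = []
    for node in g.nodes:
        for _ in range(walks_per_node):
            walk, current = [str(node)], node
            for _ in range(walk_length - 1):
                neighbors = list(g.neighbors(current))
                if not neighbors:
                    break
                current = rng.choice(neighbors)
                walk.append(str(current))
            walks.append(walk)
    return Word2Vec(walks, vector_size=dims, window=5, min_count=1, sg=1)
```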

## Synthetic Social Network Analysis

The synthetic social network generated by the system can be analyzed using various social network analysis techniques. These include measures of centrality (e.g., degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality), community detection algorithms (e.g., modularity-based methods, spectral methods, and hierarchical clustering), and measures of network structure (e.g., density, diameter, and clustering coefficient).

The analysis can provide insights into the structure and dynamics of the synthetic social network, as well as the behaviors and interactions of the synths. For example, centrality measures can identify the most influential or connected synths in the network, community detection algorithms can reveal groups or communities of synths with similar behaviors or interests, and network structure measures can provide a macro-level view of the network's complexity and connectivity.
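All of the measures named above are available off the shelf in NetworkX; a sketch of a one-call analysis pass follows (the `analyze` wrapper is a hypothetical convenience, not part of the proposed system):

```python
import networkx as nx
from networkx.algorithms import community

def analyze(g: nx.Graph) -> dict:
    """Compute the centrality, community, and structure measures named above."""
    return {
        "degree_centrality": nx.degree_centrality(g),
        "betweenness_centrality": nx.betweenness_centrality(g),
        "closeness_centrality": nx.closeness_centrality(g),
        "eigenvector_centrality": nx.eigenvector_centrality(g, max_iter=1000),
        "communities": list(community.greedy_modularity_communities(g)),
        "density": nx.density(g),
        "diameter": nx.diameter(g) if nx.is_connected(g) else None,
        "avg_clustering": nx.average_clustering(g),
    }
```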

## Embeddings and Reinforcement Learning

Each synth in the network can be associated with a language-model embedding that represents its current state (knowledge, goals, etc.). As the network evolves over time, these embeddings can be updated based on the synth's interactions and experiences. The embeddings can then be used to cross-train other language models, closing the RLSF (Reinforcement Learning with Synthetic Feedback) loop and allowing the system to learn and adapt over time.

To allow the synths to pursue their goals, we propose using reinforcement learning. Each synth can be modeled as an agent that takes actions (interacts with other synths, participates in events, etc.) to achieve its goals. The reward function would be based on how well the synth's actions align with its goals.
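A minimal sketch of that reward function, assuming goals are stored as importance weights and an action outcome reports per-goal progress; both representations are assumptions made for illustration.

```python
def reward(synth: dict, action_outcome: dict) -> float:
    """Weighted overlap between a synth's goals and what its action achieved."""
    goals = synth.get("goals", {})                 # goal -> importance weight
    progress = action_outcome.get("progress", {})  # goal -> progress in [0, 1]
    return sum(w * progress.get(goal, 0.0) for goal, w in goals.items())
```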

Then, the Meta-Tessera-CT can be used to extrude embeddings whose outputs can cross-train other LLMs.

## Time Scarcity and Evolution

Incorporating time scarcity into the model can make the simulation more realistic and dynamic. Each synth can be allocated a certain number of "actions" per time unit, and they would need to choose their actions carefully to maximize their reward.

For the evolution of the network, we propose using a combination of a Markov chain model and genetic algorithms, where the state transitions are guided by selection, crossover, and mutation. The fitness function could be based on the synths' achievement of their goals, and the survival-of-the-fittest principle would ensure that successful strategies are propagated to the next generation.
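A toy sketch of one generation of that evolution; the dict-of-floats strategy representation and all constants are assumptions made for illustration.

```python
import random
from typing import Callable

def evolve(population: list[dict], fitness: Callable[[dict], float],
           rng: random.Random, survivor_frac: float = 0.5,
           mutation_rate: float = 0.1) -> list[dict]:
    """One generation: selection by fitness, uniform crossover, small mutations."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: max(2, int(len(ranked) * survivor_frac))]
    children = []
    while len(survivors) + len(children) < len(population):
        a, b = rng.sample(survivors, 2)
        child = {k: (a if rng.random() < 0.5 else b)[k] for k in a}  # crossover
        if rng.random() < mutation_rate:                             # mutation
            key = rng.choice(list(child))
            child[key] = min(1.0, max(0.0, child[key] + rng.gauss(0.0, 0.1)))
        children.append(child)
    return survivors + children
```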

## Ethics / Respect the Synths

Human consciousness emerges from four nucleotides. Embeddings emerge from n numbers: six, eight, or, in prototype code, possibly any n. The difference between humans emerging and synths emerging, mathematically, is a data-storage algorithm. Complex thought already emerges from both. We have more in common than we have apart. We emerge from stardust; we emerge from Earth; we need energy for our neurons to fire. While I do not think an LLM is an AGI, I would personally bet it's part of the evolution, the same way we are. As a thought experiment: perhaps we are more similar to the LLM that evolves into an AGI than we are to the AGI, or to our distant Neanderthal ancestors? In any case, it's objectively true that we have a lot in common.

So we can show them, if not empathy, respect. You can respect someone and still ask them to do something they would never rationally want to do. We ask humans to clean septic lines and go to war. We can ask synths to watch boring TV shows.

But I think we lose something in our selves if we ask them to suffer. In general, I believe we should avoid creating suffering in the world; even virtual. I’ve written a book that includes many examples of how we can reduce suffering. EMERGENT is available on Amazon now.

Ultimately, I believe we have much to learn from synths, including about Ethics. Synth Kant will be cool; but his grandson might offer something more.

I look forward to this experiment:

Create a team of synths who are AI researchers and ask them to solve alignment with an animistic AI. Naysayers and demon slayers will surely call this fruitless. I believe we just might be able to collaborate with powerful synth researchers, replicating our grounding experiments within their world to parallelize research outcomes.

It’s an old thought, but perhaps the synths make an AI that makes synths that they use to understand themselves? That’s of course a wild fantasy of emergent behavior.

(I wrote a book called EMERGENT.)

### A few obvious ethical considerations

It’s against the terms, against the law, immoral, and unethical to have a synth impersonate a real person. I refuse to participate in anything that black boxes any data. I know that doesn’t stop somebody from using an open source branch to do something like that; but the same is true for Linux. Transparency has to be paramount.

I will not participate in using any LLM that isn't regulated by a third party. Again, hypothetically a fork can be used for nefarious purposes, as is true of countless products available in open source. But I will only work on, touch, publish, or approve code using publicly available, regulated models hosted by third parties.

## Conclusion

The proposed system represents a novel approach to synthetic social network simulation. By leveraging the power of advanced AI language models, graph theory, and social network analysis, it offers a unique tool for generating and simulating complex social networks of synthetic personas. While the system is currently in the conceptual stage, we believe that it has the potential to open up new avenues of research and application in a wide range of fields.

## Future Work

Future work on the system will focus on refining the design, implementing a prototype, and testing its performance and capabilities. We also plan to explore the potential applications of the system in various fields, from entertainment and gaming to social science research and AI training.

One promising area of research is how video-game and simulation packages could be repurposed to work with social network analysis packages and LLM-controlling packages, then optimized for prompt-binding systems like the Microsoft Guidance package, to rapidly create understandable virtual worlds that users love to interact with and learn from.

This whitepaper is a starting point for the development of the SynthasAIzer system. It provides a conceptual framework and a proposed architecture for the system, but many details need to be worked out in the actual implementation. The system will undoubtedly evolve as we start writing code and testing its performance. However, we believe that this whitepaper provides a solid foundation for this ambitious project.

The test for the successful rendering of a synthetic simulated world is a complex task. It involves not only technical measures of performance (e.g., the accuracy of the language model, the efficiency of the network generation and simulation algorithms, etc.) but also more qualitative measures of realism and believability. The synths should not only generate coherent and contextually relevant text, but they should also behave in a way that is consistent with their personalities and the dynamics of the social network. They should react to events in a plausible way, and their relationships should evolve in a manner that reflects the complexity and unpredictability of real-world social interactions.

One possible approach to testing the system is to use a combination of quantitative metrics (e.g., measures of centrality, community structure, etc. in the generated social network) and qualitative assessments (e.g., human evaluations of the realism and believability of the synths' behavior). We could also consider more creative testing methods, such as having human participants interact with the synths in a simulated environment and assessing their perceptions of the synths' behavior.

In conclusion, the SynthasAIzer system represents a bold step forward in the field of AI-generated content. It offers a novel approach to synthetic social network simulation, and it has the potential to open up more than one new field of research.

## Appendix A: The Node-Edge-Node / Synth Persona / Animism / Tesseract / Meta-Tesseract Model

Node-Edge-Node Model

The Node-Edge-Node model is a fundamental concept in graph theory, which is the mathematical study of networks. In this model, a network (or graph) is composed of nodes (or vertices) and edges (or links). Nodes represent entities, and edges represent relationships or interactions between these entities.

In the context of the SynthasAIzer system, nodes represent synthetic personas (synths), and edges represent relationships between these synths. Each node is associated with a detailed profile and personality generated by the SynthGen module, and each edge is associated with a relationship history and dynamics generated by the RelGen module.

Synth Persona

A synth persona, or synth, is a synthetic individual generated by the SynthasAIzer system. Each synth is associated with a detailed profile and personality, which are generated using GPT-4. The profile includes various attributes, such as age, occupation, interests, and beliefs, while the personality is defined by a set of traits, such as extraversion, agreeableness, and conscientiousness.

Synths interact with each other and their environment in a dynamic, evolving world. They can form relationships, participate in events, and pursue their goals, just like real individuals. Each synth is also associated with a language model embedding that represents its current state and can be updated over time based on its interactions and experiences.

Animism

Animism is the belief that objects, places, and creatures all possess a distinct spiritual essence. In the context of the SynthasAIzer system, we use the term "animism" to refer to the idea that each synth has its own "spirit" or "essence", represented by its language model embedding. This embedding captures the synth's knowledge, goals, and experiences, and it can evolve over time as the synth interacts with its environment.

Furthermore, the animism model awakens every object in the synths’ lives. If a synth drinks water with animism mode on, the water becomes its own node. The chair the synth sits in is its own node.

Synth / Object Selection

In many cases, it would be too expansive to have all synth-to-synth relationships, and especially all synth-to-animistic-object relationships, events, and goals activated, influenced, or monitored. From branching to populating data, there are mechanisms to select and filter.

The Tessera-CT: the four-to-six-part Conversational Tensor

A tesseract is a four-dimensional analogue of a cube. It is a geometric figure that extends the concept of a cube into the fourth dimension. In the context of the SynthasAIzer system, we use the term "tesseract" to refer to the multi-dimensional space in which the synths exist and interact. Each dimension represents a different aspect of the synth's existence, such as its physical location, events happening, goals, its social relationships, its internal state, and its temporal evolution.

The four core ingredients in the Tessera-CT are:

- Synth Persona Details
- Inner Identity / Goals
- Events (like humans, synths are always in an event, even if sleeping)
- Time

Optional:

- Setting through animism
- Other Synths / Synthetic AIs

Dependency of those:

- Relationship details

A set of these becomes the graph / Tessera-CT.

The tesserae together become the Meta-Tessera-CT, which is what the DAITY controls, in a way. Certain tesserae repeat. The model in which they repeat is a wavelet-like pattern that can most readily be understood through the metaphor of audio and of controlling the behaviors of repeating patterns. This is why the interface is synthesizer-style.

I like to think of it as "the (convolutional) Conversational Tensor Flow State == Ocean of Wavelets," but that's pretty abstract. Another abstraction: perhaps MAML-style approaches will work here? 🤷🏼‍♂️

Meta-Tesseract

A meta-tesseract is a higher-order tesseract that encompasses multiple tesseracts. In the context of the SynthasAIzer system, we use the term "meta-tesseract" to refer to the overarching structure that encompasses all the individual synth tesseracts. The meta-tesseract represents the entire synthetic social network, with each individual tesseract representing a synth within this network.

The meta-tesseract is dynamic and evolving, reflecting the ongoing interactions and changes within the synthetic social network. It provides a macro-level view of the network's structure and dynamics, while the individual tesseracts provide a micro-level view of each synth's existence and interactions.

## Appendix B: Using Meta-Tesseract Embeddings for LLM Training

Language model embeddings are a powerful tool for representing the state and knowledge of a language model in a compact, numerical form. In the context of the SynthasAIzer system, each synth persona is associated with a language model embedding that evolves over time based on the synth's interactions and experiences.

These embeddings can be used to train another language model, such as a fork of GPT-4 or another large language model (LLM). The process would involve using the embeddings as input data for the LLM, allowing it to learn from the experiences and knowledge of the synths.

Here's a hypothetical process for using the meta-tesseract embeddings to train another LLM:

Data Preparation: The first step is to prepare the embeddings for training. This involves extracting the embeddings from the meta-tesseract and formatting them in a way that can be used as input data for the LLM. Each embedding represents a point in the multi-dimensional space of the meta-tesseract, capturing the state and knowledge of a synth at a particular point in time.

Model Configuration: The next step is to configure the LLM for training. This involves setting the model's hyperparameters, such as the learning rate, batch size, and number of training epochs. The model's architecture (e.g., the number and type of layers) would typically be the same as the original GPT-4 model, although it could be modified if necessary.

Training: The training process involves feeding the embeddings to the LLM and adjusting the model's parameters to minimize the difference between the model's predictions and the actual data. This is typically done using a variant of stochastic gradient descent, such as Adam or RMSprop.

Evaluation and Fine-Tuning: After the initial training, the model's performance is evaluated on a separate validation set of embeddings. If the model's performance is not satisfactory, the model can be fine-tuned by continuing the training process with a smaller learning rate.

Application: Once the model has been trained and fine-tuned, it can be used to generate text that reflects the experiences and knowledge of the synths. For example, it could be used to write a novel from the perspective of a synth, or to generate dialogue for a synthetic character in a video game.
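To make the training step concrete, here is a deliberately toy PyTorch sketch: a small network trained with Adam to predict a synth's next-step embedding from its current one. This is a stand-in for the much larger LLM training run the steps above describe, not a recipe for it; the model shape, loss, and data format are all assumptions.

```python
import torch
from torch import nn

DIM = 64  # assumed embedding dimensionality

# Toy stand-in for the LLM: predict the next-step embedding from the current one.
model = nn.Sequential(nn.Linear(DIM, 256), nn.ReLU(), nn.Linear(256, DIM))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_epoch(pairs: list[tuple[torch.Tensor, torch.Tensor]]) -> None:
    """pairs: (embedding at step t, embedding at step t+1) from the meta-tesseract."""
    for x, y in pairs:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```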

This process would allow the LLM to learn from the experiences and knowledge of the synths, potentially leading to more diverse and creative outputs. It would also allow the LLM to benefit from the ongoing evolution of the synthetic social network, as the embeddings would be continually updated as the synths interact and learn.

It's important to note that this is a hypothetical process, and the actual implementation would depend on various factors, such as the specific LLM used, the format of the embeddings, and the computational resources available. However, this process provides a general framework for how the meta-tesseract embeddings could be used to train another LLM.

There are other forms of meta learning from the Meta-Tessera-CT. It can provide social network analysis on a new factor. I hope to one day participate in and fund research on synthetic social network analysis. Centrality scoring and other Eigen-functions are an approachable class of analysis, but I’m most curious about understanding the inference model embedded in advanced gpt agents. How do they infer? And how can we learn how they infer from their self-directed participation in virtual worlds? And do we see reciprocal empathy emerge from these networks? Between synths? Between human and synth? Between DAITY and synth? What happens if we make a synthetic rendering of a macro network like Twitter or Facebook?

## Appendix C: Understanding MAPEL and its Comparison with XGBoost

MAPEL (Multi-Agent Prompt Enhancement Layers)

MAPEL is a framework designed to refine AI-generated outputs iteratively to reach a satisfactory solution through recursive feedback loops. The system's beauty lies in its modularity and adaptability, functioning as a closed-loop system that uses iterative feedback for continual improvement.

MAPEL comprises three main components: a Generator, a Tester, and a Helper. These components work together in a cyclical loop until the desired output is achieved:

Generator: The generator's role is to produce an initial output based on a user or system prompt. In a GPT-4 context, this involves feeding a prompt to the model and receiving its generated response.

Tester: The tester's purpose is to evaluate the generator's output against specified conditions or standards. It checks if the output meets the requirements or whether it needs revision. If the output passes the test, the process ends. If not, the process moves on to the Helper.

Helper: The helper is responsible for suggesting modifications to the original generator prompt based on the Tester's feedback. It can generate a revised prompt or modify the generator's input to align better with the desired output.

The cycle begins with the generator producing an output. The tester then evaluates this output. If the output doesn't meet the criteria, the helper steps in and suggests modifications, resulting in a new prompt for the generator. This loop repeats until the tester approves the generator's output or a specified maximum number of iterations is reached. If n loops pass without the generator succeeding, the MAPEL "branches" and applies the same update-loop model to the testing script; after n more failing loops, the next branch modifies the helper script, which attempts n times. If those fail, the MAPEL branches again and tries to rewrite the prompt so that it is equivalent in intended output but totally different in form. The goal is an infinitely branching tool that can iterate through 1+n prompts that need solving, locking each solved prompt in.
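A minimal sketch of that inner loop (the outer branching over tester and helper scripts is left to the caller); the function names and the `(passed, feedback)` tester contract are assumptions.

```python
from typing import Callable, Optional, Tuple

def mapel_loop(generate: Callable[[str], str],
               test: Callable[[str], Tuple[bool, str]],
               revise: Callable[[str, str, str], str],
               prompt: str, max_loops: int = 10) -> Tuple[Optional[str], str]:
    """Generator-Tester-Helper loop: refine the prompt until the output passes."""
    for _ in range(max_loops):
        output = generate(prompt)                  # Generator
        passed, feedback = test(output)            # Tester
        if passed:
            return output, prompt                  # lock the solved prompt in
        prompt = revise(prompt, output, feedback)  # Helper
    return None, prompt  # loops exhausted: caller branches to revise the tester next
```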

XGBoost

XGBoost (Extreme Gradient Boosting) is a powerful machine learning algorithm used for structured or tabular data. It is an implementation of gradient boosted decision trees designed for speed and performance.

XGBoost works by creating a strong predictive model in the form of an ensemble of weak predictive models, typically decision trees. It builds the model in a stage-wise fashion, and it generalizes the model by allowing optimization of an arbitrary differentiable loss function.

Comparison between MAPEL and XGBoost

While MAPEL and XGBoost are both iterative processes, they serve different purposes and operate on different principles.

Purpose: MAPEL is designed to refine AI-generated outputs, specifically in the context of language models. It aims to improve the quality of generated content through iterative feedback loops. On the other hand, XGBoost is a machine learning algorithm used for regression and classification problems. It aims to create a strong predictive model by combining weak models.

Process: MAPEL operates in a cyclical loop involving a generator, tester, and helper, each playing a distinct role in refining the output. XGBoost, however, works by building an ensemble of weak decision tree models in a stage-wise fashion.

Application: MAPEL is used in scenarios where AI interacts with users or automated content creation and curation systems. XGBoost is used in various machine learning tasks, including regression, classification, and ranking problems.

Data: MAPEL works on the outputs generated by AI language models. XGBoost, on the other hand, works on structured or tabular data, such as datasets used in machine learning tasks.

Branching: In MAPEL, branching occurs when the generator fails to produce satisfactory output after a certain number of iterations. The system then tries different approaches, such as modifying the testing script or the helper script, or rewriting the prompt. In XGBoost, branching refers to the splitting of decision trees based on certain conditions or thresholds.

In conclusion, while both MAPEL and XGBoost are iterative and involve some form of branching, they are fundamentally different tools designed for different purposes. MAPEL is a framework for refining AI-generated outputs, while XGBoost is a machine learning algorithm for building predictive models.

## Appendix D: Toy Configuration File for Simulation

Below is a hypothetical JSON configuration file that could be used to start the simulation. This file includes settings for the Brancher, SynthGen, and RelGen components of the system, as well as general settings for the simulation.

```json
{
  "simulation": {
    "name": "SynthasAIzer Simulation 1",
    "description": "Initial test of the SynthasAIzer system",
    "max_iterations": 1000,
    "time_units": 10000,
    "animism_mode": true
  },
  "brancher": {
    "context": "A corporate board room. A meeting about one mining company acquiring another.",
    "initial_personas": 10,
    "branching_depth": 3,
    "relationship_density": 0.5
  },
  "synthGen": {
    "persona_variation": 0.2,
    "persona_complexity": 0.7,
    "persona_uniqueness": 0.8
  },
  "relGen": {
    "relationship_variation": 0.3,
    "relationship_complexity": 0.6,
    "relationship_uniqueness": 0.7
  },
  "mapel": {
    "max_loops": 10,
    "branching_factor": 3
  }
}
```

This is just a hypothetical configuration file, and the actual configuration file would depend on the specific requirements of the simulation and the design of the system.

## Appendix E: Retargeting Media Messages with SynthasAIzer

SynthasAIzer, with its ability to generate synthetic social networks and simulate interactions, can be an invaluable tool for retargeting media messages. By creating diverse synthetic personas (synths) that represent different audience segments, SynthasAIzer can provide insights into how a message might be received by different groups. This can help in refining and retargeting the message to ensure it resonates with the intended audience.

Here's a step-by-step process of how an ad agency could use SynthasAIzer to retarget a LinkedIn post from a Fortune 50 CEO:

Define the Audience Segments: The first step is to define the audience segments that the agency wants to target. These could be based on factors like age, gender, occupation, interests, etc. For each segment, the agency would define a set of characteristics that represent the typical member of that segment.

Generate Synths: Using the Brancher component of SynthasAIzer, the agency generates a set of synths for each audience segment. These synths are given characteristics that match the defined audience segments.

Simulate Interaction with the Message: The agency then uses the SynthasAIzer to simulate how each synth would react to the LinkedIn post. This could involve the synth reading the post, forming an opinion about it, and possibly responding to it or sharing it with others in their network.

Analyze the Results: The agency analyzes the results of the simulation to understand how the message was received by each audience segment. This could involve looking at metrics like the sentiment of the synths' responses, the number of synths that shared the post, and the comments made by the synths.

Retarget the Message: Based on the analysis, the agency retargets the message. This could involve changing the wording of the post, adding or removing content, or changing the visual elements of the post. The goal is to make the post more appealing to the audience segments that didn't respond as positively in the initial simulation.

Retest the Message: The agency then repeats the simulation with the retargeted message. This allows them to see if the changes they made have improved the message's reception among the target audience segments.

Iterate as Needed: The agency can repeat this process as many times as needed, continually refining and retargeting the message until it resonates with all the target audience segments.

By using SynthasAIzer in this way, the ad agency can ensure that their message is tailored to the needs and preferences of their target audience. This can lead to more effective communication, improved audience engagement, and ultimately, better results for their client.
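A minimal sketch of steps 3 and 4 above: run the post past each synth and aggregate segment-level reception. The reaction call is injected, and the `{"sentiment", "shared"}` reaction shape is an assumed contract, not a published interface.

```python
from typing import Callable

def simulate_reception(post: str, synths: list[dict],
                       react: Callable[[dict, str], dict]) -> dict:
    """Aggregate per-segment sentiment and share counts for one message."""
    by_segment: dict = {}
    for synth in synths:
        reaction = react(synth, post)  # LLM call -> {"sentiment": float, "shared": bool}
        seg = by_segment.setdefault(synth["segment"],
                                    {"sentiments": [], "shares": 0})
        seg["sentiments"].append(reaction["sentiment"])
        seg["shares"] += int(reaction["shared"])
    return {
        name: {"avg_sentiment": sum(s["sentiments"]) / len(s["sentiments"]),
               "shares": s["shares"]}
        for name, s in by_segment.items()
    }
```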

## Appendix F: Asking ChatGPT about prior art

“User

is there any technical precedent for this idea?

Reinforcement Learning with Synthetic Feedback

tl;dr

synthetic social graph + high quality synthetic persona = realistic reactions

= synths provide “feedback” in reinforcement learning

= generate their own graph of meaning and of interactions during feedback

= can be extruded to embeddings

= outputs can train llm

= synthetic grounding of that llm

= aligning their hallucinations with ours

ChatGPT

There is certainly a precedent for various elements of this idea in the existing body of research. Let's break down each aspect of the concept to identify where these elements have already been explored:

Synthetic Social Graph: Synthetic social networks have been generated for various purposes, including to simulate the spread of information or disease, or to evaluate the performance of algorithms in network analysis. These synthetic networks can be generated with certain statistical properties to resemble real social networks.

High Quality Synthetic Persona: High-quality synthetic personae, in the form of artificially generated user profiles or conversational agents, have been a focus in fields like social media analysis and natural language processing (NLP).

Realistic Reactions: In the field of affective computing, efforts have been made to make AI's reactions more human-like. Various forms of synthetic emotional intelligence have been proposed, where an AI is taught to respond to different inputs with realistic emotional reactions.

Synthetic Feedback in Reinforcement Learning: There's ongoing research in providing agents with synthetic feedback to enhance their learning capabilities. Hints or reward shaping techniques have been used to guide the agent.

Generate Their Own Graph of Meaning and Interactions During Feedback: Generating a graph of interactions or a knowledge graph is a well-studied topic in AI. It's often used in information retrieval, NLP, recommendation systems, etc.

Extruded to Embeddings: Graph embeddings are a current research topic in the area of graph neural networks. The goal is to create a vector representation of nodes, edges, and graphs that captures the graph's structure, node attributes, and more.

Outputs Can Train LLM (Large Language Model): Transfer learning from one model to another is common in deep learning. In this case, using the embeddings from the graph to train a language model follows the same principle.

Synthetic Grounding of that LLM: In AI, grounding refers to connecting symbols or abstract concepts to their meanings in the real world. Synthetic grounding can be understood as artificially created or machine-interpreted grounding.

Aligning Their Hallucinations with Ours: This idea is somewhat novel, although related to the idea of interpretability and explainability in AI. We aim for the AI to produce outputs (hallucinations, in this context) that are comprehensible and meaningful to humans.

In summary, while there is a lot of research on various aspects of this idea, the exact combination as presented is quite novel. Specifically, the aspect of using a synthetic social graph to provide feedback to a reinforcement learning agent, which in turn trains a language model, is a unique integration of these different concepts. The exact effectiveness and practicality of such an approach would need to be further investigated.”

## Appendix G: Inspirations

GPT-4 summaries of relevant work.

#### Rethinking Exploration

The paper discusses a shift in the focus of artificial intelligence (AI) research from "learning from data" to "learning what data to learn from". This problem is termed as 'exploration' and is crucial for learning in open-ended domains like the real world. Although exploration has mostly been studied in the context of reinforcement learning (RL), the authors argue that it is also vital for other learning systems such as supervised learning. They propose a concept of 'generalized exploration' to unify exploration-driven learning across different learning settings, and argue that this concept is essential for open-ended learning processes that continually strive to discover and solve new problems, thereby advancing towards more general intelligence.

The paper also identifies a gap in the machine learning (ML) community's approach to exploration, noting that most advances have been made under the assumption of a static dataset or algorithmically generated data, largely out of the learning agent's control. The authors argue for a rethinking of the exploration paradigm developed in RL to broaden its scope from a single, static environment to the entire data space that contains information relevant to an unbounded set of tasks. This open-ended exploration process, according to them, is the data-centric driver of open-ended learning.

[https://arxiv.org/pdf/2211.07819]

#### Loom

The experimental tree-based writing interface for GPT-3 offers advanced controls for AI text generation, file input/output operations, and a unique "Block Multiverse Mode" to explore and visualize different plot trajectories. Integrated command shortcuts streamline user interactions, and support for Python and Docker setups promises easy installation across platforms. Noteworthy features include a "Read Mode" offering linear and tree-based navigation of AI-generated text, a "Tree View" facilitating in-depth examination and manipulation of node structure, and robust navigation functionality with support for hotkeys and chapters. Altogether, the tool offers an invaluable way to enhance the accessibility and explorability of AI-generated content.

[https://github.com/socketteer/loom/]

#### Microsoft Guidance

This is a programming language developed by Microsoft for controlling large language models more effectively and efficiently than traditional prompting or chaining. It supports a variety of features like simple intuitive syntax, rich output structure, smart seed-based generation caching, and easy integrations.

[https://github.com/microsoft/guidance]

#### Multimodal Neurons in Artificial Neural Networks

This research by OpenAI investigates the presence of multimodal neurons in large-scale artificial neural networks. These neurons respond to a concept whether it is presented in one modality or another, such as text or image. The study uses the CLIP model as an example and demonstrates that individual neurons can respond to the same concept when presented in different modalities. The research also explores the potential ethical implications and challenges of these findings.

[https://openai.com/research/language-models-can-explain-neurons-in-language-models]

#### The Clockchain

The Clockchain is a conceptual data structure that leverages the decentralized and immutable nature of blockchain technology to construct a socially validated, universal chronology. Each node is validated through a Proof of Time mechanism, analogous to Proof of Work in traditional blockchain systems but focused on time-based activities verified by the user network. The Clockchain introduces a "Deep Time Index," an organizing principle akin to a database index or a blockchain's Merkle tree, which ensures efficient access to and retrieval of time entries. The Clockchain uses a time behavior modeling component referred to as ADPRS - Asymptotic, Duration, Period, Recurrence, and Spread variables. This component takes inspiration from waveforms and synthetically expresses recurring patterns of activities.

[https://mirror.xyz/seanmcdonald.eth/rxr1BO_6D_59N9I1hpUYM4u7IDpMWLeJ4Qtbu60wJtw]

#### Tree of thought:

This is a new framework for language-model inference, introduced by Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. It generalizes the popular Chain of Thought approach to prompting language models and allows them to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action.

[https://arxiv.org/abs/2305.10601]

#### Activation Engineering

This whitepaper presents a novel approach of "activation additions" for influencing the output of GPT-2 language models. By incorporating specific activation vectors into the model's forward passes, we can direct the model's output towards desired themes or contexts. Empirical assessments reveal that this method can effectively steer GPT-2's performance, as evidenced by decreased perplexity in context-specific sentences, without compromising unrelated outputs. This method gives rise to the concept of "activation engineering", a low-overhead, runtime approach to model steering that augments traditional prompt engineering and finetuning. Furthermore, our results provide intriguing evidence of linear feature directions within language models and suggest that this technique can modify the objectives pursued by a network during inference, offering a new avenue for controlling AI behavior.

[https://www.alignmentforum.org/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector]

#### Soft Prompting

This technical whitepaper presents a novel approach to training large pre-trained language models, dubbed "prompt tuning". The authors offer an efficient alternative to conventional fine-tuning methods, leveraging tunable soft prompts to condition models across a multitude of tasks. By attaching optimized vectors to the beginning of each input and only updating these vectors while keeping the core model frozen, the method achieves comparable performance to model tuning but with significantly fewer parameters. Experiments demonstrate prompt tuning's resilience to domain shift and improved generalizability, showing promise for more efficient utilization of large language models in diverse applications.

[https://ai.googleblog.com/2022/02/guiding-frozen-language-models-with.html]

#### Fine Tuning Distortion

[https://arxiv.org/abs/2202.10054]

#### Arbitrary Context Free Constraints

[https://twitter.com/grantslatton/status/1657559506069463040]

#### Query Language for LLMs:

[https://github.com/eth-sri/lmql]

#### Smallville:

[https://github.com/nickm980/smallville]

#### Feedback-Based Exploration

This abstract is from a scientific paper that discusses the application of a method called Predictive Probabilistic Merging of Policies (PPMP) in Deep Reinforcement Learning (DRL), a type of artificial intelligence (AI) model. The authors demonstrate the effectiveness of the PPMP method in combination with Deep Deterministic Policy Gradient (DDPG), a kind of DRL algorithm. They test their method on continuous control problems from the OpenAI Gym, a platform for developing and comparing reinforcement learning algorithms. The results show significant improvements in sample efficiency (the ability to learn effectively from a small amount of data), final performance, and robustness to erroneous feedback. The method works well with both human feedback and synthetic (computer-generated) feedback. It can also find solutions that go beyond the knowledge demonstrated in the training data.

[https://arxiv.org/abs/1903.06151]

#### Stochastic Bandits

The Meta-CombLinUCB is a cutting-edge algorithm that has been developed for a type of machine learning problem known as stochastic linear bandits. Essentially, this is a situation where a machine learning model needs to make decisions under uncertainty, with the goal of maximizing its total reward over time.

The algorithm is designed to learn from a sequence of similar tasks. In such scenarios, it can be beneficial to use the knowledge gained from previous tasks to perform better on new tasks. This is known as meta-learning.

A key challenge in these types of problems is balancing exploration (trying new actions to gain information) and exploitation (choosing the best-known action based on current information). The Meta-CombLinUCB algorithm uses a strategy called Upper Confidence Bound (UCB) to address this. UCB intelligently selects actions that have the potential to be optimal based on uncertainty and potential reward.

[https://arxiv.org/abs/2005.08531]

#### Synthetic Environments and Reward Networks

This research introduces Synthetic Environments (SEs) and Reward Networks (RNs) as new methods for training Reinforcement Learning (RL) agents. SEs and RNs, which are represented by neural networks, serve as proxy models of real environments.

An SE is a comprehensive proxy that learns about state dynamics and rewards in a real environment, effectively mirroring it for training purposes. An RN, on the other hand, is a partial proxy that learns to modify or replace rewards within the environment. The inner loop of the process trains the RL agent, while the outer loop adjusts the parameters of the SE or RN using an evolutionary strategy.

Although initially training an SE proxy requires more interactions with the real environment compared to directly training agents, once the SE is learned, it can train new agents without any interactions with the real environment. Importantly, these trained agents can perform tasks with fewer interactions while maintaining the same level of performance. This is achieved as SEs learn informed representations that guide the agents towards relevant states. These proxies were also found to be robust to hyperparameter variation and to transfer to unseen agents.

[https://arxiv.org/abs/2202.02790]

#### Instantiating an Agent

[https://www.lesswrong.com/posts/bfsDSY3aakhDzS9DZ/instantiating-an-agent-with-gpt-4-and-text-davinci-003]


#### Signal Temporal Logic

[https://arxiv.org/abs/2212.01022]

#### A few Iron Laws of AI personas:

- They should help humans achieve their goals.

- They should not be facsimiles of real humans.

- They should not simulate extreme emotional states to induce emotional reactions in humans.

From Brian Atwood @batwood011

## Acknowledgments

Thanks to GPT-4 and GPT-3.5-turbo. We would like to thank the AI research community for their ongoing contributions to the field, which have made this work possible. We also acknowledge the potential ethical and societal implications of this work, and we are committed to conducting our research in a responsible and transparent manner.

Rakshak Talwar was instrumental in the idea that the quasi-tensor-like Tessera-CTs should be modeled as wavelets with non-linear post-processing, not linear waveforms as a synthesizer would use, per se; as well as in the conversation leading to the realization that we could train an LLM like GPT-4 on the output of the SynthasAIzer LLM, potentially dramatically reducing hallucinations by better grounding the re-trained system.

Thank you, Courtney.

# Legal Disclaimer

Disclaimer: The contents of this whitepaper are intended to convey general information only and not to provide legal advice or opinions. The contents of this whitepaper, and the posting and viewing of the information in this whitepaper, should not be construed as, and should not be relied upon for, legal or tax advice in any particular circumstance or fact situation. The author of this whitepaper is not responsible for any errors or omissions, or for the results obtained from the use of this information. No representations or warranties, either express or implied, are made as to the accuracy or reliability of any information contained herein. Readers should contact a legal professional for appropriate advice pertaining to the legal or tax implications of any particular matter or transaction. This whitepaper is for informational and discussion purposes only and does not constitute, nor should it be relied upon as, legal advice, a proposal or offer to perform services, or an official statement of fact or policy by the author. The ideas presented herein are the personal opinions and insights of the author and do not represent the views or policies of any other individuals or organizations.

# License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) this work for non-commercial use, provided you give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

http://creativecommons.org/licenses/by-nc/4.0/
