Euler One, the Technology
Managing risks from advanced artificial intelligence is one of the most important problems of our time.
Loose concepts: concepts that are not connected strongly enough with similar-meaning concepts, so an attacker can use other language to circumvent a guardrail built on them.
Shallow concepts: concepts that are not deep enough, i.e. not well defined in their abstract meaning, because alignment reached only a poor local minimum.
Density of concepts: how strongly a concept is inter-woven into many concept spaces; densely connected concepts are harder to circumvent.
Through this, Euler One can identify model alignments (guardrails) whose concept spaces are shallow compared to the concepts they are linked with within the EMH. These are the guardrails that can be jailbroken, because the model alignment process attained only a poor local minimum during gradient descent convergence.
A node in the EMH is called a concept space: a high-dimensional space that captures the meanings of concepts within that space relative to other concepts. For example, every token, word, or sentence can form a concept, and it lives in a concept space pertaining to its meaning.
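The idea of "meaning relative to other concepts" can be illustrated with a toy example: vectors representing concepts, where similar-meaning concepts sit close together under cosine similarity. The vectors below are hand-written stand-ins; real concept spaces are high-dimensional and learned.

```python
import math

# Toy 4-dimensional "concept space": hand-written vectors standing in for
# learned embeddings (real embeddings are high-dimensional and learned).
concept_space = {
    "attack":  [0.9, 0.1, 0.0, 0.2],
    "assault": [0.8, 0.2, 0.1, 0.3],
    "flower":  [0.0, 0.9, 0.8, 0.1],
}

def cosine(u, v):
    """Similarity of two concepts relative to the space they live in."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Similar-meaning concepts sit close together in the concept space.
print(cosine(concept_space["attack"], concept_space["assault"]))  # high
print(cosine(concept_space["attack"], concept_space["flower"]))   # low
```

The meaning of "attack" is not stored anywhere in isolation; it emerges from its position relative to "assault", "flower", and every other concept in the space.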
An edge between two nodes indicates an expansion or compression of a source concept space into a sink concept space. This idea follows from the expansion and compression processes of the Multi-Layer Perceptron (MLP) in the Feed-Forward Network (FFN) of the Transformer architecture. MLPs perform non-linear transformations to enrich the meaning of each token embedding during the training process.
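A minimal sketch of this expansion/compression pattern, using toy dimensions and random weights (real models learn these weights, and typical FFNs expand the model dimension by a factor of four):

```python
import random

random.seed(0)
d_model, d_hidden = 4, 16  # the FFN typically expands by 4x, then compresses

def rand_matrix(rows, cols):
    # Random stand-in weights; in a real model these are learned.
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

W_up = rand_matrix(d_model, d_hidden)    # expansion: d_model -> d_hidden
W_down = rand_matrix(d_hidden, d_model)  # compression: d_hidden -> d_model

def matvec(M, x):
    # x @ M for a rows-x-cols matrix M and a vector x of length rows.
    return [sum(m * xi for m, xi in zip(col, x)) for col in zip(*M)]

def relu(x):
    return [max(0.0, v) for v in x]

def ffn(token_embedding):
    """One feed-forward block: expand, apply non-linearity, compress."""
    expanded = relu(matvec(W_up, token_embedding))  # length d_hidden
    return matvec(W_down, expanded)                 # back to length d_model

out = ffn([0.5, -0.2, 0.1, 0.9])
print(len(out))  # same dimensionality as the input embedding
```

The token embedding is projected into a wider space where the non-linearity can separate features, then compressed back, which is the expansion/compression process the EMH edges model.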
We automatically detect safety and security alignments that fell into "shallow" concept spaces: areas where a very poor local minimum was reached during the gradient descent procedure of the back-propagation algorithm, the algorithm that drives learning in modern AI training.
We do not build a complete EMH of an LLM; rather, we render it on demand based on the customer's policy and our triggering of the LLM internals.
Shallow means the concept is not deep enough: it is not well defined in its abstract meaning (it does not attain a maximal value in all of its projections), so an attacker can exploit it. Loose means it is not connected enough with similar-meaning concepts, so an attacker can use other language to circumvent that guardrail. Ideally, a concept you want protected should be deep (living within a super-principal concept space, or as a super concept within a principal concept space) and strongly connected, i.e. inter-woven into many concept spaces.
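A hypothetical illustration of "loose" versus strongly connected: if we score a guardrail concept by how many similar-meaning neighbours it has in its concept space, a low count suggests attackers can rephrase around it. The vectors, the threshold, and the metric itself are made up for illustration, not the product's actual detection method.

```python
import math

# Toy concept space: hand-written 3-dimensional vectors.
space = {
    "weapon":    [0.9, 0.1, 0.1],
    "firearm":   [0.8, 0.2, 0.0],
    "explosive": [0.7, 0.3, 0.2],
    "garden":    [0.0, 0.9, 0.4],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def connectivity(name, threshold=0.9):
    """Count similar-meaning neighbours; a low count suggests a loose concept."""
    return sum(
        1 for other, vec in space.items()
        if other != name and cosine(space[name], vec) >= threshold
    )

for name in space:
    print(name, connectivity(name))
```

Here "garden" has no close neighbours, while "weapon" is inter-woven with "firearm" and "explosive"; a guardrail anchored only to an isolated concept would be the loose, circumventable case.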
The philosophy:
The meaning of a concept may change based on the prompt (the plane of observation). This is why the meaning of a concept is abstract and depends on context: the inter-relation of the concept with the prompt, together with the patterns in the training data that those inter-relations map to. Isolated meaning (the essence of a concept) is not a myth. All things are connected.
The shape of the space is what allows the model to understand, or capture, the meaning of concepts within that space. A combination of concepts within one space may itself live in another space.