Collections
Collection schemas are shared between Ragu's applications.
Chonkit is responsible for creating collections, while Kappi is responsible for assigning them to agents. Chonkit has no concept of agents; it is solely a document and collection management system.
Access control
Each collection can be assigned a list of groups. If the list is empty or absent, the collection is accessible to all Ragu users. If the list contains entries, access is restricted to users who are members of at least one of those groups.
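The access rule can be sketched as a small predicate. This is only an illustration of the rule as stated here, not code from Chonkit or Kappi; the function name `can_access` and its parameters are hypothetical.

```python
from typing import Iterable, Optional


def can_access(collection_groups: Optional[Iterable[str]], user_groups: Iterable[str]) -> bool:
    """Return True if a user may access a collection (hypothetical helper)."""
    groups = list(collection_groups or [])
    # A missing or empty group list means the collection is open to everyone.
    if not groups:
        return True
    # Otherwise membership in any one of the listed groups is sufficient.
    return not set(groups).isdisjoint(user_groups)
```

For example, `can_access(["admin", "dev"], ["dev"])` is true because the user shares one group with the collection, while `can_access(["admin"], ["dev"])` is false.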
Identity vector
Each collection in a vector database created by Chonkit contains a metadata vector, also known as an identity vector. This vector holds metadata about the collection: its ID, its name, its embedding model, the vector database implementation identifier (vector provider), the embedding implementation identifier (embedding provider), and an optional list of groups.
The following is a JSON schema representing the above description.
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "collection_id": {
      "type": "string",
      "description": "The UUID of the collection. Relevant to Chonkit."
    },
    "name": {
      "type": "string",
      "pattern": "^[A-Z]{1}[a-zA-Z0-9_]*$",
      "minLength": 1,
      "description": "Collection name. Cannot contain special characters. Must begin with a capital ASCII letter and contain only alphanumeric characters and underscores."
    },
    "model": {
      "type": "string",
      "description": "Collection embedding model."
    },
    "vectorProvider": {
      "type": "string",
      "description": "Vector database provider."
    },
    "embeddingProvider": {
      "type": "string",
      "description": "Embeddings provider."
    },
    "groups": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "minItems": 1,
      "description": "Optional collection groups that indicate which user groups can use it. If this is not defined, the collection is visible to everyone."
    }
  },
  "required": [
    "collection_id",
    "name",
    "model",
    "vectorProvider",
    "embeddingProvider"
  ],
  "additionalProperties": false
}
All Ragu applications must follow this schema strictly so that the collections they create are compatible. How each application interprets these parameters is up to that application.
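To make "strictly follow" concrete, here is a minimal sketch of a hand-written check that mirrors the identity vector schema using only the Python standard library. The function name `identity_errors` and the error messages are hypothetical; an application could equally run the schema through a full JSON Schema validator.

```python
import re

# Field names and the name pattern are taken from the identity vector schema.
REQUIRED = ("collection_id", "name", "model", "vectorProvider", "embeddingProvider")
ALLOWED = set(REQUIRED) | {"groups"}
NAME_RE = re.compile(r"[A-Z][a-zA-Z0-9_]*")


def identity_errors(payload: dict) -> list:
    """Return a list of schema violations; an empty list means the payload conforms."""
    errors = []
    # Every required property must be present and must be a string.
    for key in REQUIRED:
        if not isinstance(payload.get(key), str):
            errors.append(f"{key}: required string property")
    # additionalProperties is false, so unknown keys are rejected.
    for key in sorted(set(payload) - ALLOWED):
        errors.append(f"{key}: unexpected property")
    # fullmatch anchors both ends, mirroring ^...$ in the schema pattern.
    name = payload.get("name")
    if isinstance(name, str) and not NAME_RE.fullmatch(name):
        errors.append("name: must start with a capital ASCII letter, then letters, digits or underscores")
    # groups is optional, but when present it needs at least one string entry.
    if "groups" in payload:
        groups = payload["groups"]
        if not isinstance(groups, list) or not groups or not all(isinstance(g, str) for g in groups):
            errors.append("groups: must be a non-empty array of strings")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation at once. Note that the sketch checks only that `collection_id` is a string, as the schema does; it does not verify that it is a well-formed UUID.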
Each vector stored in the collection will have an associated payload, depending on the vector payload type (i.e. whether it is text, an image, etc.).
Vector payloads
Because collections use an identity vector, each subsequent vector inserted into the collection also contains the identity vector's properties. It is therefore important to pass property selectors when querying, so that only the anticipated properties are returned.
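The effect of a property selector can be illustrated client-side. In practice the selection would normally be passed to the vector database through whatever selector mechanism it offers; the sketch below only shows the intended result. The helper `select_properties` and the shape of the example hit (identity properties present but unset) are assumptions made for illustration.

```python
# Properties anticipated for a text vector, per the text payload schema.
TEXT_PAYLOAD_PROPERTIES = ("content", "document_id")


def select_properties(payload: dict, properties=TEXT_PAYLOAD_PROPERTIES) -> dict:
    """Keep only the requested properties that are present in a returned payload."""
    return {key: payload[key] for key in properties if key in payload}


# Hypothetical raw hit that also carries the identity vector's properties.
raw_hit = {
    "content": "hello",
    "document_id": "123e4567-e89b-12d3-a456-426614174000",
    "name": None,
    "model": None,
}
text_only = select_properties(raw_hit)
```

Here `text_only` keeps just `content` and `document_id`, which is what a consumer of text vectors expects to see.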
Text vector payload
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "content": {
      "type": "string",
      "description": "The original text of the embedding (vector)."
    },
    "document_id": {
      "type": "string",
      "description": "The UUID of the document. Relevant to Chonkit."
    }
  },
  "required": [
    "content",
    "document_id"
  ]
}
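A minimal sketch of building a payload that conforms to the text vector schema follows. The helper name `text_payload` is hypothetical, and normalizing `document_id` to canonical UUID form is an extra step chosen for this sketch; the schema itself only requires a string.

```python
import uuid


def text_payload(content: str, document_id: str) -> dict:
    """Build a text vector payload with both required properties (hypothetical helper)."""
    # uuid.UUID raises ValueError if document_id cannot be parsed as a UUID;
    # str() then yields the canonical lowercase, hyphenated form.
    canonical_id = str(uuid.UUID(document_id))
    return {"content": content, "document_id": canonical_id}
```

Rejecting malformed IDs at construction time keeps bad references out of the collection, at the cost of being stricter than the schema.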