The Vectros data model

Concept and mental model: what records, schemas, documents, folders, lookups, and version history are — and why the platform is shaped this way.

Vectros gives an application one coherent place to keep its structured data, its documents, and the relationships between them — schema-validated, isolated per customer, searchable in a single query stream, and carrying a full version history. This page explains the pieces and how they fit together. For runnable guides see how-to.md; for the exhaustive field/option/limit list see reference.md.

The shape of the problem

Most applications built on top of an AI or knowledge layer need more than a search index but less than a full relational database. A session note has a date, a provider, a client, and a status. A lead has an email, a source, and a qualification score. A support ticket has a priority, an assignee, and a body. You want that data typed and validated, you want to look an entity up by a known key, you want it to come back in natural-language search alongside your uploaded files, and — in regulated settings — you want every change recorded.

Vectros models this with four resource families that share one isolation model, one indexing pipeline, and one audit mechanism:

  • Schemas declare the shape of a type — its fields, validation rules, which fields are searchable, which support indexed lookups, and which are sensitive.
  • Records are structured JSON entities written against a schema (or schemaless).
  • Documents are text or files you ingest — chunked, embedded, and indexed for retrieval, optionally carrying a structured payload of their own.
  • Folders give documents and records a shared organizational hierarchy.

Around those four sit lookups and references (find an entity by a known value; link one record to another) and version history (an immutable trail of every write to an audited type).

Schemas: define the shape once

A schema declares one type — a typeName, a displayName, and a set of field definitions. Each field has a type (string, number, boolean, date, enum, array, object, or reference), optional validation rules, and flags that decide how it participates in the platform:

  • searchable: true routes the field's text into the full-text search lane.
  • filterable: true makes the field available as a search filter without it influencing relevance ranking.
  • sensitive: true marks the field as PHI/PII: its value is redacted from audit snapshots at write, excluded from the search index, blind-indexed for lookups, and masked on read unless the caller's token carries the reveal scope (see Sensitivity).

A schema also declares its index mode (HYBRID, SEMANTIC, TEXT, or NONE), its lookup fields, the surfaces it may bind to (record, document, user, org, client), and per-type capabilities such as audit history.
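
As a rough illustration, a schema for the session note mentioned earlier might look like the body below. The typeName and displayName keys, the searchable, filterable, and sensitive flags, the index mode values, and the {fieldName, unique} lookup form come from this page. Every other key name (fields, required, values, indexMode, lookupFields, surfaces, capabilities, auditHistory) and the nesting are assumptions made for this sketch, not the documented wire format; reference.md has the authoritative shape.

```python
# Illustrative schema body. Key names other than typeName, displayName,
# searchable, filterable, and sensitive are assumptions for this sketch.
session_note_schema = {
    "typeName": "session_note",
    "displayName": "Session note",
    "indexMode": "HYBRID",                     # HYBRID, SEMANTIC, TEXT, or NONE
    "surfaces": ["record"],                    # surfaces the type may bind to
    "capabilities": {"auditHistory": True},    # hypothetical capability key
    "lookupFields": [
        "sessionDate",                              # bare name: non-unique lookup
        {"fieldName": "noteNumber", "unique": True},  # uniqueness enforced on write
    ],
    "fields": {
        "noteNumber": {"type": "string", "required": True},
        "sessionDate": {"type": "date", "required": True},
        "summary": {"type": "string", "searchable": True},
        "status": {"type": "enum", "values": ["draft", "signed"], "filterable": True},
        "clientEmail": {"type": "string", "sensitive": True},
    },
}

# Fields whose text would feed the full-text search lane.
searchable_fields = [
    name for name, spec in session_note_schema["fields"].items() if spec.get("searchable")
]
print(searchable_fields)
```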

Schemas are deliberately optional in spirit: a bare schema with nothing but a typeName and displayName is valid. Records written against a bare schema are stored as-is with no payload validation. This removes the usual prototype-vs-production fork — you get the same API surface whether you are sketching with free-form JSON or running a fully-validated production type, and you can add fields incrementally as the data model stabilizes (adding a field is non-breaking).

Schemas are versioned. Every edit increments a public revision counter, and each record or document is stamped at write with the schema version that governed it — so a record keeps the version it was written under even after the schema evolves.

Note: schemas are replaced in full on update (PUT). There is no partial-update (PATCH) path for schemas — supply the complete intended schema body on every update.

Records: structured entities

A record is a JSON payload written against a schema's typeName. On create, the payload is validated against the schema's field definitions; required fields, types, enum membership, and validation rules (length, range, pattern) are enforced before the record is persisted. The record carries optional ownership fields (userId, orgId, clientId) and an optional folderId, plus a system-assigned id, timestamps, and a monotonically-increasing version.
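
To make the validation step concrete, here is a minimal local sketch of the kinds of checks described above (required fields, types, enum membership, and length, range, and pattern rules). It is not the platform's validator. The rule names (required, values, maxLength, min, max, pattern) and the error messages are hypothetical, chosen only for illustration.

```python
import re

def validation_errors(fields, payload):
    """Return a list of problems; an empty list means the payload passes."""
    python_types = {"string": str, "number": (int, float), "boolean": bool, "enum": str}
    errors = []
    for name, spec in fields.items():
        if name not in payload:
            if spec.get("required"):
                errors.append(f"{name}: required")
            continue
        value = payload[name]
        expected = python_types.get(spec["type"])
        # bool is a subclass of int in Python, so reject it for number fields.
        wrong_type = expected is not None and (
            not isinstance(value, expected)
            or (spec["type"] == "number" and isinstance(value, bool))
        )
        if wrong_type:
            errors.append(f"{name}: expected {spec['type']}")
            continue
        if spec["type"] == "enum" and value not in spec["values"]:
            errors.append(f"{name}: not an allowed value")
        if "maxLength" in spec and len(value) > spec["maxLength"]:
            errors.append(f"{name}: longer than {spec['maxLength']}")
        if "min" in spec and value < spec["min"]:
            errors.append(f"{name}: below min {spec['min']}")
        if "max" in spec and value > spec["max"]:
            errors.append(f"{name}: above max {spec['max']}")
        if "pattern" in spec and not re.fullmatch(spec["pattern"], value):
            errors.append(f"{name}: does not match pattern")
    return errors

# A lead type with an email, a source, and a qualification score.
lead_fields = {
    "email": {"type": "string", "required": True,
              "pattern": r"[^@\s]+@[^@\s]+\.[^@\s]+", "maxLength": 254},
    "source": {"type": "enum", "values": ["web", "referral", "event"]},
    "score": {"type": "number", "min": 0, "max": 100},
}

good = {"email": "ada@example.com", "source": "web", "score": 72}
bad = {"source": "billboard", "score": 140}

print(validation_errors(lead_fields, good))
print(validation_errors(lead_fields, bad))
```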

Records are first-class search content. A record's searchable fields flow into the same retrieval pipeline as ingested documents, so a single natural-language query can surface a structured intake record and a clinical note in the same ranked result set.

Two update modes exist:

  • PUT is a whole-object replace of the mutable top-level fields. A top-level field omitted from the request is preserved; the payload, when supplied, replaces the stored payload in full (it is not deep-merged). To change one payload field with PUT you must resend the entire payload.
  • PATCH (SDK 0.26+) is a true partial update following RFC 7386 (JSON Merge Patch), applied to the record's payload. Inside payload, keys present in the patch overwrite, nested objects recurse, a key set to null deletes that key, and absent keys are preserved. This is the natural way to change a single field without a read-modify-write cycle. Note that a top-level mutable field (such as status or folderId) set to null is not a delete: it is rejected with 400. Clearing a top-level field is not supported in this release.
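
The difference between the two modes is easiest to see on a small payload. The sketch below implements the JSON Merge Patch algorithm locally, purely to show how a payload patch behaves; it does not call the platform, and the payload contents are made up.

```python
import copy

def merge_patch(target, patch):
    """Apply a JSON Merge Patch to target and return a new value."""
    if not isinstance(patch, dict):
        # A non-object patch (string, number, list, ...) replaces the target wholesale.
        return copy.deepcopy(patch)
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)          # null deletes the key
        else:
            result[key] = merge_patch(result.get(key), value)  # nested objects recurse
    return result

stored = {
    "status": "draft",
    "provider": {"name": "Dr. Lee", "npi": "1234567890"},
    "tags": ["intake"],
}
patch = {
    "status": "signed",                 # overwrite
    "provider": {"npi": None},          # delete one nested key, keep the rest
    "tags": ["intake", "reviewed"],     # arrays are replaced, not merged
}

after_patch = merge_patch(stored, patch)
after_put = copy.deepcopy(patch)        # PUT: the supplied payload replaces the stored one

print(after_patch)
print(after_put)
```

Note that arrays are replaced as a whole under merge patch, and that sending the same body through PUT would leave provider as {"npi": None} and drop provider.name entirely.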

Records support optimistic concurrency: pass the version you last read back as expectedVersion, and the write is rejected with a 409 VERSION_CONFLICT if the record moved on in the meantime. Omit it for last-write-wins.
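
The expectedVersion check can be pictured with a small in-memory stand-in. RecordStore, VersionConflict, and the choice to start versions at 1 are hypothetical; only the behavior (reject a stale expectedVersion, accept when it is omitted) follows the description above.

```python
class VersionConflict(Exception):
    """Stands in for the 409 VERSION_CONFLICT response."""

class RecordStore:
    """Toy store: id -> (version, payload). Not the platform's implementation."""

    def __init__(self):
        self._rows = {}

    def create(self, record_id, payload):
        self._rows[record_id] = (1, payload)   # starting at 1 is an assumption
        return 1

    def get(self, record_id):
        return self._rows[record_id]

    def update(self, record_id, payload, expected_version=None):
        current_version, _ = self._rows[record_id]
        # Only enforce the check when the caller opted in.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(f"expected {expected_version}, found {current_version}")
        new_version = current_version + 1
        self._rows[record_id] = (new_version, payload)
        return new_version

store = RecordStore()
store.create("rec_1", {"status": "draft"})
version_read, _ = store.get("rec_1")

# First writer succeeds and moves the record to version 2.
store.update("rec_1", {"status": "review"}, expected_version=version_read)

# Second writer still holds the old version and is rejected.
try:
    store.update("rec_1", {"status": "signed"}, expected_version=version_read)
    stale_outcome = "accepted"
except VersionConflict:
    stale_outcome = "conflict"

# Omitting expected_version gives last-write-wins.
last_write_version = store.update("rec_1", {"status": "signed"})
print(stale_outcome, last_write_version)
```

On a conflict, a typical client re-reads the record, reapplies its change to the fresh state, and retries with the new version.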

Deleting a record is a hard delete: the row and all of its lookup rows are removed atomically, a tombstone is recorded for audit, and the record is dropped from the search index on the next cycle. There is no soft-delete status that lingers in the index — workflow states are expressed through the status field instead.

Documents: text and files

A document is content you ingest for retrieval. Two ingest paths exist:

  • Inline text ingest — send the raw text directly. The platform chunks it, embeds the chunks, and indexes them. Optionally store the raw text for later retrieval.
  • File upload — request a presigned URL, PUT the file bytes to it directly (no Authorization header on that PUT), then poll until indexing completes. Text is extracted from the file, then chunked, embedded, and indexed.

Documents carry the same ownership and folder fields as records. When bound to a schema via schemaId, a document also carries a validated, lookup-indexed structured payload of its own, just as a record does. Undeclared payload keys pass through as free-form and remain available as search filters. Like records, documents support PUT (full replace) and PATCH (RFC 7386, SDK 0.26+), optimistic concurrency, version history, and hard delete.

Documents move through a processing lifecycle — uploaded, text extracted, queued for indexing, then INDEXED (searchable) or STORED (store-only, when index mode is NONE). An update re-runs the pipeline; the old content is removed from the index as the new content is written.
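
The file-upload path and the lifecycle above suggest a client flow along these lines. This is a sketch using only the Python standard library: the example URL, the get_status callable, and the intermediate status name QUEUED are hypothetical, and the source does not describe a failure status, so none is handled here. How the presigned URL is requested and how status is fetched are covered in how-to.md.

```python
import time
import urllib.request

# Terminal states named on this page: searchable, or store-only.
TERMINAL_STATUSES = {"INDEXED", "STORED"}

def build_upload_request(presigned_url, file_bytes):
    """Build the PUT that sends file bytes straight to the presigned URL."""
    # Deliberately no Authorization header: the presigned URL carries the grant.
    return urllib.request.Request(presigned_url, data=file_bytes, method="PUT")

def wait_until_processed(get_status, interval_seconds=2.0, max_polls=30, sleep=time.sleep):
    """Poll get_status() until the document reaches a terminal status."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
    raise TimeoutError("document did not finish processing in time")

# Building the request does not send anything over the network.
request = build_upload_request("https://uploads.example.com/object?sig=abc", b"%PDF-1.7 ...")
print(request.get_method(), request.has_header("Authorization"))

# To actually upload: urllib.request.urlopen(request)

# Simulated status sequence in place of a real status call.
fake_statuses = iter(["QUEUED", "QUEUED", "INDEXED"])
final_status = wait_until_processed(lambda: next(fake_statuses), sleep=lambda _s: None)
print(final_status)
```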

Folders: shared organization

Folders give documents and records a common hierarchy. A folder has a name, an optional description, optional ownership fields, and a stable, path-derived slug that makes folder creation idempotent — re-creating a folder at the same path converges on the same folder rather than duplicating it. Each context has a protected root folder; unparented folders are placed under it.
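
Idempotent creation through a path-derived slug can be sketched as follows. The slug rules here (lowercase, non-alphanumeric runs collapsed to a hyphen) and the FolderStore class are assumptions for illustration; the platform's actual normalization is not described on this page.

```python
import re

def path_slug(path):
    """Derive a stable slug from a folder path (normalization rules are assumed)."""
    parts = [p.strip() for p in path.split("/") if p.strip()]
    return "/".join(re.sub(r"[^a-z0-9]+", "-", p.lower()).strip("-") for p in parts)

class FolderStore:
    """Toy store keyed by slug, so the same path always maps to one folder."""

    def __init__(self):
        self._by_slug = {}

    def create(self, path, description=None):
        slug = path_slug(path)
        if slug not in self._by_slug:
            self._by_slug[slug] = {
                "id": f"fld_{len(self._by_slug) + 1}",   # hypothetical id format
                "slug": slug,
                "name": path.rstrip("/").split("/")[-1],
                "description": description,
            }
        # Re-creating at the same path converges on the existing folder.
        return self._by_slug[slug]

    def count(self):
        return len(self._by_slug)

folders = FolderStore()
first = folders.create("Clients/Acme Corp")
second = folders.create("Clients/Acme Corp")
print(first["slug"], first["id"] == second["id"], folders.count())
```

The practical benefit is that a setup script or retrying client can call create unconditionally without first checking whether the folder exists.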

A folder's parent is set at creation only. There is no move or reparent operation — a folder cannot currently be relocated in the hierarchy through the API. Deleting a folder that still contains children is rejected; empty the folder first.

Lookups and references

Lookups are direct, indexed retrievals by a declared field value — no scan. You declare which fields are lookup fields on the schema (bare field names, or {fieldName, unique} to enforce uniqueness). At write time the platform maintains a small index row per lookup field; a lookup reads that index directly. Three lookup modes are supported, one mode per call:

  • exact — match a single value.
  • range — an inclusive from/to range, returned in ascending order (non-sensitive fields only).
  • prefix — a string-prefix match in ascending order (string, non-sensitive fields only).

Lookups come in unique and non-unique flavors. A unique lookup returns at most one record (and the platform enforces uniqueness on write). A non-unique lookup is an enumeration — it returns every record sharing that value, paginated.
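
The three modes and the unique versus non-unique distinction can be modeled with a sorted index per lookup field. The sketch below is an in-memory analogy, not the platform's storage layout; LookupIndex and its method names are hypothetical.

```python
from bisect import bisect_left, bisect_right

class LookupIndex:
    """Sorted values with parallel record ids for a single lookup field."""

    def __init__(self, unique=False):
        self.unique = unique
        self._values = []
        self._ids = []

    def put(self, value, record_id):
        if self.unique and self.exact(value):
            raise ValueError(f"duplicate value for unique lookup: {value!r}")
        pos = bisect_right(self._values, value)   # keep both lists sorted by value
        self._values.insert(pos, value)
        self._ids.insert(pos, record_id)

    def exact(self, value):
        lo, hi = bisect_left(self._values, value), bisect_right(self._values, value)
        return self._ids[lo:hi]

    def between(self, start, end):
        """Inclusive range, ascending."""
        lo, hi = bisect_left(self._values, start), bisect_right(self._values, end)
        return self._ids[lo:hi]

    def prefix(self, text):
        """String-prefix match, ascending."""
        pos = bisect_left(self._values, text)
        matches = []
        while pos < len(self._values) and self._values[pos].startswith(text):
            matches.append(self._ids[pos])
            pos += 1
        return matches

names = LookupIndex()  # non-unique: several records may share a value
for value, rid in [("delta", "r4"), ("alpha", "r1"), ("alpine", "r2"),
                   ("beta", "r3"), ("alpha", "r5")]:
    names.put(value, rid)

print(names.exact("alpha"))
print(names.between("alpine", "beta"))
print(names.prefix("alp"))

emails = LookupIndex(unique=True)
emails.put("ada@example.com", "r1")
try:
    emails.put("ada@example.com", "r2")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```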

Every record additionally gets three automatic ownership lookups (userId, orgId, clientId) without declaring them and without counting against the lookup-field cap, so you can list records owned by a user, org, or client out of the box.

For a sensitive lookup field, the value must not appear in a URL (where it could be captured by access logs or proxies). The platform offers a body-based lookup variant so the sensitive value travels in the request body and is blind-indexed server-side.
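
This page does not say how the blind index is computed. One common technique for blind indexing is a keyed hash such as HMAC-SHA256, and the sketch below uses it purely as an assumed stand-in: the index stores a deterministic token instead of the plaintext, and an exact lookup recomputes the token from the supplied value. The key, field name, and function names are hypothetical.

```python
import hashlib
import hmac

def blind_index(secret_key: bytes, field_name: str, value: str) -> str:
    """Keyed, deterministic token for a sensitive value (assumed HMAC-SHA256)."""
    # Binding the field name keeps equal values in different fields from colliding.
    message = f"{field_name}\x00{value}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def lookup(index, secret_key, field_name, value):
    """Exact-match lookup against tokens; the plaintext is never stored in the index."""
    return index.get(blind_index(secret_key, field_name, value), [])

key = b"demo-key-not-for-production"
token = blind_index(key, "ssn", "555-12-3456")

index = {token: ["rec_1"]}          # token -> record ids
print(lookup(index, key, "ssn", "555-12-3456"))
print(lookup(index, key, "ssn", "000-00-0000"))
```

Because a keyed hash preserves equality but not ordering or prefixes, a scheme like this supports exact matches only, which is consistent with range and prefix lookups being limited to non-sensitive fields.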

References are typed links between records: a reference field declares the targetTypeName it points at, the target field to resolve against, and a cardinality (one or many). The platform can additionally maintain per-field reverse-reference rows (opt-in) so the inverse direction is indexed.

Note: the reverse-reference list endpoint is not yet available — do not depend on querying back-references through the API today. Batch record write/lookup/get operations are likewise reserved and not yet implemented.

Version history: an immutable change trail

Every write to an audited type emits an immutable version row. Each row records the change type (CREATE, UPDATE, DELETE), who made the change, when, the full snapshot of the state prior to the change, and a field-level diff of what changed. The current state always lives on the entity itself; version history captures the "before" side of every transition, so you can reconstruct exactly how an entity reached its present state.
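
A version row as described above pairs the prior snapshot with a field-level diff. The sketch below shows one plausible shape; the key names (changeType, changedBy, changedAt, snapshot, diff) and the from/to diff format are assumptions, and the diff is top-level only for brevity.

```python
from datetime import datetime, timezone

def field_diff(before, after):
    """Top-level diff of changed keys; a key absent on one side is shown as None."""
    diff = {}
    for key in sorted(set(before) | set(after)):
        if before.get(key) != after.get(key):
            diff[key] = {"from": before.get(key), "to": after.get(key)}
    return diff

def version_row(change_type, actor, before, after):
    """Build an immutable-style version row for one write."""
    return {
        "changeType": change_type,                      # CREATE, UPDATE, or DELETE
        "changedBy": actor,
        "changedAt": datetime.now(timezone.utc).isoformat(),
        "snapshot": before,                             # state prior to the change
        "diff": field_diff(before or {}, after or {}),
    }

before = {"status": "draft", "priority": 2}
after = {"status": "signed", "priority": 2, "signedBy": "user_7"}
row = version_row("UPDATE", "user_7", before, after)
print(row["snapshot"])
print(row["diff"])
```

Replaying the rows in order, each snapshot followed by the current entity state, reconstructs how the entity reached its present form.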

Audit history is on by default and controlled per schema through a capability flag. Turning it off for a high-volume, low-value type saves the write cost; tombstones on delete are still recorded regardless. The audit trail is part of the data layer itself, not a separate, bypassable logging path, and heavy historical content is externalized to a write-once, retention-governed store. From the data model's point of view, you get full version history for free on every audited type. The deeper compliance posture (retention, the tamper-evident continuity chain, and how sensitive data is handled across all of this) lives in the operations & trust documentation.

Isolation: built in, not bolted on

Every record, schema, document, and folder is partitioned by an auth-derived context key that the caller cannot forge. Reads and lookups are confined to the caller's own context, and cross-context access is closed at the data layer by construction rather than by a runtime check that a later change could regress. A request that supplies a known id belonging to another tenant or context gets the same uniform "not found" response as a request for an id that does not exist — the error message is never a probe channel.
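
The idea of isolation by construction can be shown with a store whose key always includes the context. ContextStore, NotFound, and the context and record ids are hypothetical; the point is that a foreign id and a nonexistent id take exactly the same code path and produce the same error.

```python
class NotFound(Exception):
    """Uniform error for both missing ids and ids owned by another context."""

class ContextStore:
    """Toy store keyed by (context_key, record_id)."""

    def __init__(self):
        self._rows = {}

    def put(self, context_key, record_id, payload):
        self._rows[(context_key, record_id)] = payload

    def get(self, context_key, record_id):
        # The context key comes from the caller's credentials, never from the request.
        try:
            return self._rows[(context_key, record_id)]
        except KeyError:
            raise NotFound("record not found") from None

def error_for(store, context_key, record_id):
    """Return the error message a caller would see, or None on success."""
    try:
        store.get(context_key, record_id)
        return None
    except NotFound as exc:
        return str(exc)

store = ContextStore()
store.put("ctx_a", "rec_1", {"status": "draft"})

own = error_for(store, "ctx_a", "rec_1")              # caller's own record
foreign = error_for(store, "ctx_b", "rec_1")          # exists, but in another context
missing = error_for(store, "ctx_b", "rec_does_not_exist")
print(own, foreign, missing)
```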

Scoped credentials narrow access further. A token whose data scope names a specific owner can only reach entities owned by that owner; tenant-level (owner-less) entities are reached only by explicitly opting in. This is what makes a single Vectros tenant safe to partition across many of your customers. The full access model is covered in the identity & access documentation.

Sensitivity: three distinct mechanisms

When a schema field is marked sensitive, three independent protections apply — it is worth understanding that they are separate:

  1. Redacted at write. The sensitive value is removed from audit snapshots and change diffs before they are persisted. This is not reversible masking: the value cannot be recovered from the audit trail, regardless of any later grant.
  2. Excluded from the search index. A sensitive field's value never enters the search index, so it can never surface through search.
  3. Masked on read. In normal reads the field is returned masked ([redacted]) unless the caller's token explicitly carries the reveal scope for that type. Lookups on the field still work — it is blind-indexed — without exposing the value.
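
The three protections act at different points, which a small sketch can make visible. The function names are hypothetical, and whether a redacted field is dropped from an audit snapshot or replaced with a placeholder is an assumption here; only the [redacted] read mask is taken from this page.

```python
MASK = "[redacted]"

def redact_for_audit(snapshot, sensitive_fields):
    """Write path: drop sensitive values before the snapshot is persisted."""
    return {k: v for k, v in snapshot.items() if k not in sensitive_fields}

def search_projection(payload, searchable_fields, sensitive_fields):
    """Index path: only searchable, non-sensitive fields reach the search index."""
    return {
        k: v for k, v in payload.items()
        if k in searchable_fields and k not in sensitive_fields
    }

def mask_on_read(payload, sensitive_fields, can_reveal):
    """Read path: mask sensitive fields unless the token carries the reveal scope."""
    if can_reveal:
        return dict(payload)
    return {k: (MASK if k in sensitive_fields else v) for k, v in payload.items()}

payload = {"name": "Ada", "ssn": "555-12-3456", "note": "follow up in May"}
sensitive = {"ssn"}
searchable = {"name", "note"}

print(redact_for_audit(payload, sensitive))
print(search_projection(payload, searchable, sensitive))
print(mask_on_read(payload, sensitive, can_reveal=False))
print(mask_on_read(payload, sensitive, can_reveal=True))
```

Only the read mask depends on the caller's scope. The audit redaction and the index exclusion happen at write time and are unaffected by any later grant.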

These combine so that a sensitive field is usable as a key (you can look a record up by it) while its plaintext never lands in logs, the search index, or an unprivileged response. The full treatment is in the operations & trust documentation.

Where to go next

  • how-to.md — runnable guides: define a schema, write and update records, PATCH, lookup, paginate, ingest a document, create a folder, read version history.
  • reference.md — every method, field, validation rule, limit, envelope shape, and error, plus an honest "what each feature does not do."
  • ../operations-trust/compliance.md — version history retention, the tamper-evident chain, and the full sensitive-data posture.