Create a Knowledge Base
POST /v2/gen-ai/knowledge_bases

To create a knowledge base, send a POST request to /v2/gen-ai/knowledge_bases.

Body Parameters
database_id: optional string

Identifier of the DigitalOcean OpenSearch database this knowledge base will use. If omitted, a new database is created for the knowledge base in the same region as the knowledge base.

datasources: optional array of object { aws_data_source, bucket_name, bucket_region, 6 more }

The data sources to use for this knowledge base. See Organize Data Sources for more information on data source best practices.

aws_data_source: optional AwsDataSource { bucket_name, item_path, key_id, 2 more }

AWS S3 Data Source

bucket_name: optional string

Spaces bucket name

item_path: optional string
key_id: optional string

The AWS Key ID

region: optional string

Region of bucket

secret_key: optional string

The AWS Secret Key

bucket_name: optional string

Deprecated, moved to data_source_details

bucket_region: optional string

Deprecated, moved to data_source_details

dropbox_data_source: optional object { folder, refresh_token }

Dropbox Data Source

folder: optional string
refresh_token: optional string

Refresh token. You can obtain a refresh token by following the OAuth2 flow. See /v2/gen-ai/oauth2/dropbox/tokens for reference.

file_upload_data_source: optional APIFileUploadDataSource { original_file_name, size_in_bytes, stored_object_key }

File to upload as data source for knowledge base.

original_file_name: optional string

The original file name

size_in_bytes: optional string

The size of the file in bytes

format: uint64
stored_object_key: optional string

The object key the file was stored as

google_drive_data_source: optional object { folder_id, refresh_token }

Google Drive Data Source

folder_id: optional string
refresh_token: optional string

Refresh token. You can obtain a refresh token by following the OAuth2 flow. See /v2/gen-ai/oauth2/google/tokens for reference.

item_path: optional string
spaces_data_source: optional APISpacesDataSource { bucket_name, item_path, region }

Spaces Bucket Data Source

bucket_name: optional string

Spaces bucket name

item_path: optional string
region: optional string

Region of bucket

web_crawler_data_source: optional APIWebCrawlerDataSource { base_url, crawling_option, embed_media, exclude_tags }

WebCrawlerDataSource

base_url: optional string

The base url to crawl.

crawling_option: optional "UNKNOWN" or "SCOPED" or "PATH" or 2 more

Options for specifying how URLs found on pages should be handled.

  • UNKNOWN: Default unknown value
  • SCOPED: Only include the base URL.
  • PATH: Crawl the base URL and linked pages within the URL path.
  • DOMAIN: Crawl the base URL and linked pages within the same domain.
  • SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
Accepts one of the following:
"UNKNOWN"
"SCOPED"
"PATH"
"DOMAIN"
"SUBDOMAINS"
embed_media: optional boolean

Whether to ingest and index media (images, etc.) on web pages.

exclude_tags: optional array of string

HTML tags to exclude from web pages while crawling.

embedding_model_uuid: optional string

Identifier for the embedding model.

name: optional string

Name of the knowledge base.

project_id: optional string

Identifier of the DigitalOcean project this knowledge base will belong to.

region: optional string

The datacenter region to deploy the knowledge base in.

tags: optional array of string

Tags to organize your knowledge base.

vpc_uuid: optional string

The VPC to deploy the knowledge base database in
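
As an illustration of how the body parameters above fit together, here is a minimal Python sketch that assembles a request body. The helper names (`web_crawler_source`, `build_create_payload`) are hypothetical and not part of any DigitalOcean SDK, and the choice to require `name`, `region`, `project_id`, and `embedding_model_uuid` is an assumption of this sketch: the reference lists every field as optional. The example values are placeholders.

```python
import json

# Values listed for crawling_option in the parameter reference above.
CRAWLING_OPTIONS = {"UNKNOWN", "SCOPED", "PATH", "DOMAIN", "SUBDOMAINS"}

# Optional top-level fields this sketch passes through when provided.
OPTIONAL_FIELDS = ("database_id", "tags", "vpc_uuid")


def web_crawler_source(base_url, crawling_option="SCOPED", embed_media=False, exclude_tags=None):
    """Build one datasources entry wrapping a web_crawler_data_source object."""
    if crawling_option not in CRAWLING_OPTIONS:
        raise ValueError(f"unsupported crawling_option: {crawling_option!r}")
    return {
        "web_crawler_data_source": {
            "base_url": base_url,
            "crawling_option": crawling_option,
            "embed_media": embed_media,
            "exclude_tags": list(exclude_tags or []),
        }
    }


def build_create_payload(name, region, project_id, embedding_model_uuid, datasources, **optional):
    """Assemble the JSON body, leaving out optional fields that were not supplied."""
    unknown = set(optional) - set(OPTIONAL_FIELDS)
    if unknown:
        raise TypeError(f"unexpected fields: {sorted(unknown)}")
    payload = {
        "name": name,
        "region": region,
        "project_id": project_id,
        "embedding_model_uuid": embedding_model_uuid,
        "datasources": list(datasources),
    }
    for key in OPTIONAL_FIELDS:
        if optional.get(key) is not None:
            payload[key] = optional[key]
    return payload


# Placeholder values only; substitute identifiers from your own account.
example_payload = build_create_payload(
    name="example name",
    region="example string",
    project_id="123e4567-e89b-12d3-a456-426614174000",
    embedding_model_uuid="123e4567-e89b-12d3-a456-426614174000",
    datasources=[web_crawler_source("https://example.com/docs", "PATH", exclude_tags=["nav"])],
    tags=["example string"],
)
example_body = json.dumps(example_payload, indent=2)
```

Leaving `database_id` out of the payload, as this sketch does by default, corresponds to the documented behavior in which a new database is created in the same region as the knowledge base.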

Returns
knowledge_base: optional APIKnowledgeBase { added_to_agent_at, created_at, database_id, 10 more }

Knowledge base description

added_to_agent_at: optional string

Time when the knowledge base was added to the agent

format: date-time
created_at: optional string

Creation date / time

format: date-time
database_id: optional string
embedding_model_uuid: optional string
is_public: optional boolean

Whether the knowledge base is public or not

last_indexing_job: optional APIIndexingJob { completed_datasources, created_at, data_source_jobs, 16 more }

IndexingJob description

completed_datasources: optional number

Number of data sources that have completed indexing

format: int64
created_at: optional string

Creation date / time

format: date-time
data_source_jobs: optional array of APIIndexedDataSource { completed_at, data_source_uuid, error_details, 11 more }

Details on Data Sources included in the Indexing Job

completed_at: optional string

Timestamp when data source completed indexing

format: date-time
data_source_uuid: optional string

UUID of the indexed data source

error_details: optional string

A detailed error description

error_msg: optional string

A string code hinting at which part of the system experienced an error

failed_item_count: optional string

Total count of files that have failed

format: uint64
indexed_file_count: optional string

Total count of files that have been indexed

format: uint64
indexed_item_count: optional string

Total count of files that have been indexed

format: uint64
removed_item_count: optional string

Total count of files that have been removed

format: uint64
skipped_item_count: optional string

Total count of files that have been skipped

format: uint64
started_at: optional string

Timestamp when data source started indexing

format: date-time
status: optional "DATA_SOURCE_STATUS_UNKNOWN" or "DATA_SOURCE_STATUS_IN_PROGRESS" or "DATA_SOURCE_STATUS_UPDATED" or 4 more
Accepts one of the following:
"DATA_SOURCE_STATUS_UNKNOWN"
"DATA_SOURCE_STATUS_IN_PROGRESS"
"DATA_SOURCE_STATUS_UPDATED"
"DATA_SOURCE_STATUS_PARTIALLY_UPDATED"
"DATA_SOURCE_STATUS_NOT_UPDATED"
"DATA_SOURCE_STATUS_FAILED"
"DATA_SOURCE_STATUS_CANCELLED"
total_bytes: optional string

Total size of files in data source in bytes

format: uint64
total_bytes_indexed: optional string

Total size of files in data source in bytes that have been indexed

format: uint64
total_file_count: optional string

Total file count in the data source

format: uint64
data_source_uuids: optional array of string
finished_at: optional string
format: date-time
is_report_available: optional boolean

Boolean value to determine if the indexing job details are available

knowledge_base_uuid: optional string

Knowledge base id

phase: optional "BATCH_JOB_PHASE_UNKNOWN" or "BATCH_JOB_PHASE_PENDING" or "BATCH_JOB_PHASE_RUNNING" or 4 more
Accepts one of the following:
"BATCH_JOB_PHASE_UNKNOWN"
"BATCH_JOB_PHASE_PENDING"
"BATCH_JOB_PHASE_RUNNING"
"BATCH_JOB_PHASE_SUCCEEDED"
"BATCH_JOB_PHASE_FAILED"
"BATCH_JOB_PHASE_ERROR"
"BATCH_JOB_PHASE_CANCELLED"
started_at: optional string
format: date-time
status: optional "INDEX_JOB_STATUS_UNKNOWN" or "INDEX_JOB_STATUS_PARTIAL" or "INDEX_JOB_STATUS_IN_PROGRESS" or 4 more
Accepts one of the following:
"INDEX_JOB_STATUS_UNKNOWN"
"INDEX_JOB_STATUS_PARTIAL"
"INDEX_JOB_STATUS_IN_PROGRESS"
"INDEX_JOB_STATUS_COMPLETED"
"INDEX_JOB_STATUS_FAILED"
"INDEX_JOB_STATUS_NO_CHANGES"
"INDEX_JOB_STATUS_PENDING"
tokens: optional number

Number of tokens [This field is deprecated]

format: int64
total_datasources: optional number

Number of datasources being indexed

format: int64
total_items_failed: optional string

Total Items Failed

format: uint64
total_items_indexed: optional string

Total Items Indexed

format: uint64
total_items_removed: optional string

Total Items Removed

format: uint64
total_items_skipped: optional string

Total Items Skipped

format: uint64
total_tokens: optional string

Total Tokens Consumed By the Indexing Job

format: uint64
updated_at: optional string

Last modified

format: date-time
uuid: optional string

Unique id

name: optional string

Name of knowledge base

project_id: optional string
region: optional string

Region code

tags: optional array of string

Tags to organize related resources

updated_at: optional string

Last modified

format: date-time
user_id: optional string

Id of user that created the knowledge base

format: int64
uuid: optional string

Unique id for knowledge base
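
Several counters in the returned object are documented as strings in uint64 format rather than JSON numbers, so a client typically converts them before doing arithmetic. The sketch below shows one way to read a response under that assumption. `summarize_indexing_job` is a hypothetical helper, and treating the statuses in `FINISHED_STATUSES` as final is an assumption of this sketch: the reference lists the status values but does not say which ones are terminal.

```python
import json

# Job-level fields documented as strings in uint64 format.
UINT64_JOB_FIELDS = (
    "total_items_failed",
    "total_items_indexed",
    "total_items_removed",
    "total_items_skipped",
    "total_tokens",
)

# Assumption: a job in one of these statuses will not change further.
FINISHED_STATUSES = {
    "INDEX_JOB_STATUS_COMPLETED",
    "INDEX_JOB_STATUS_FAILED",
    "INDEX_JOB_STATUS_NO_CHANGES",
    "INDEX_JOB_STATUS_PARTIAL",
}


def summarize_indexing_job(response_text):
    """Extract the knowledge base UUID, job status, and integer counters from a response body."""
    body = json.loads(response_text)
    # Every field is optional, so fall back to empty objects when one is absent.
    kb = body.get("knowledge_base") or {}
    job = kb.get("last_indexing_job") or {}
    counts = {
        field: int(job[field])
        for field in UINT64_JOB_FIELDS
        if job.get(field) not in (None, "")
    }
    status = job.get("status", "INDEX_JOB_STATUS_UNKNOWN")
    return {
        "uuid": kb.get("uuid"),
        "status": status,
        "finished": status in FINISHED_STATUSES,
        "counts": counts,
    }
```

Because every field in the response is optional, the helper returns an empty `counts` mapping and the `INDEX_JOB_STATUS_UNKNOWN` status when no indexing job is present, instead of raising.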

Create a Knowledge Base
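curl -X POST https://api.digitalocean.com/v2/gen-ai/knowledge_bases \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $DIGITALOCEAN_ACCESS_TOKEN" \
    -d '{"name": "example name", "region": "example string", "project_id": "123e4567-e89b-12d3-a456-426614174000"}'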
Returns Examples
{
  "knowledge_base": {
    "added_to_agent_at": "2023-01-01T00:00:00Z",
    "created_at": "2023-01-01T00:00:00Z",
    "database_id": "123e4567-e89b-12d3-a456-426614174000",
    "embedding_model_uuid": "123e4567-e89b-12d3-a456-426614174000",
    "is_public": true,
    "last_indexing_job": {
      "completed_datasources": 123,
      "created_at": "2023-01-01T00:00:00Z",
      "data_source_jobs": [
        {
          "completed_at": "2023-01-01T00:00:00Z",
          "data_source_uuid": "123e4567-e89b-12d3-a456-426614174000",
          "error_details": "example string",
          "error_msg": "example string",
          "failed_item_count": "12345",
          "indexed_file_count": "12345",
          "indexed_item_count": "12345",
          "removed_item_count": "12345",
          "skipped_item_count": "12345",
          "started_at": "2023-01-01T00:00:00Z",
          "status": "DATA_SOURCE_STATUS_UNKNOWN",
          "total_bytes": "12345",
          "total_bytes_indexed": "12345",
          "total_file_count": "12345"
        }
      ],
      "data_source_uuids": [
        "example string"
      ],
      "finished_at": "2023-01-01T00:00:00Z",
      "is_report_available": true,
      "knowledge_base_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "phase": "BATCH_JOB_PHASE_UNKNOWN",
      "started_at": "2023-01-01T00:00:00Z",
      "status": "INDEX_JOB_STATUS_UNKNOWN",
      "tokens": 123,
      "total_datasources": 123,
      "total_items_failed": "12345",
      "total_items_indexed": "12345",
      "total_items_removed": "12345",
      "total_items_skipped": "12345",
      "total_tokens": "12345",
      "updated_at": "2023-01-01T00:00:00Z",
      "uuid": "123e4567-e89b-12d3-a456-426614174000"
    },
    "name": "example name",
    "project_id": "123e4567-e89b-12d3-a456-426614174000",
    "region": "example string",
    "tags": [
      "example string"
    ],
    "updated_at": "2023-01-01T00:00:00Z",
    "user_id": "user_id",
    "uuid": "123e4567-e89b-12d3-a456-426614174000"
  }
}
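
For callers who prefer Python to curl, the same POST request can be sketched with the standard library alone. The function names `build_request` and `create_knowledge_base` are hypothetical, and the 30-second timeout is an arbitrary choice of this sketch. The URL and headers mirror the curl example; error handling (for instance catching `urllib.error.HTTPError`) is left out for brevity.

```python
import json
import urllib.request

API_URL = "https://api.digitalocean.com/v2/gen-ai/knowledge_bases"


def build_request(payload, token):
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def create_knowledge_base(payload, token, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(build_request(payload, token), timeout=timeout) as resp:
        return json.load(resp)
```

A typical call would read the token from the `DIGITALOCEAN_ACCESS_TOKEN` environment variable, pass a body containing fields such as `name`, `region`, and `project_id`, and then read `knowledge_base` from the returned dictionary.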