Add Data Source to a Knowledge Base
POST /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources

To add a data source to a knowledge base, send a POST request to /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources.

Path Parameters
knowledge_base_uuid: string
Body Parameters
aws_data_source: optional AwsDataSource { bucket_name, item_path, key_id, 2 more }

AWS S3 Data Source

bucket_name: optional string

Spaces bucket name

item_path: optional string
key_id: optional string

The AWS Key ID

region: optional string

Region of bucket

secret_key: optional string

The AWS Secret Key
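
As a minimal sketch, the aws_data_source body could be assembled in Python with the credentials read from the environment instead of hard-coded. The helper name build_aws_body and the environment variable names AWS_KEY_ID and AWS_SECRET_KEY are assumptions made for this illustration; the API itself only defines the body fields listed above.

```python
import os


def build_aws_body(bucket_name, item_path, region, env=os.environ):
    """Return a request body containing an aws_data_source object.

    AWS_KEY_ID and AWS_SECRET_KEY are hypothetical environment variable
    names chosen for this sketch; they are not defined by the API.
    """
    return {
        "aws_data_source": {
            "bucket_name": bucket_name,
            "item_path": item_path,
            "region": region,
            "key_id": env["AWS_KEY_ID"],
            "secret_key": env["AWS_SECRET_KEY"],
        }
    }
```

Passing a plain dictionary as env makes the helper easy to exercise without touching real credentials.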

knowledge_base_uuid: optional string

Knowledge base id

spaces_data_source: optional APISpacesDataSource { bucket_name, item_path, region }

Spaces Bucket Data Source

bucket_name: optional string

Spaces bucket name

item_path: optional string
region: optional string

Region of bucket

web_crawler_data_source: optional APIWebCrawlerDataSource { base_url, crawling_option, embed_media }

WebCrawlerDataSource

base_url: optional string

The base URL to crawl.

crawling_option: optional "UNKNOWN" or "SCOPED" or "PATH" or 2 more

Options for specifying how URLs found on pages should be handled.

  • UNKNOWN: Default unknown value
  • SCOPED: Only include the base URL.
  • PATH: Crawl the base URL and linked pages within the URL path.
  • DOMAIN: Crawl the base URL and linked pages within the same domain.
  • SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
Accepts one of the following:
"UNKNOWN"
"SCOPED"
"PATH"
"DOMAIN"
"SUBDOMAINS"
embed_media: optional boolean

Whether to ingest and index media (images, etc.) on web pages.
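
The body parameters above can be combined into a request along these lines. This is a sketch using only the Python standard library: it builds the POST request for a web crawler data source without sending it. The function name and the default values for crawling_option and embed_media are assumptions of the sketch, not documented API defaults; the host is the one used in the curl example on this page.

```python
import json
import urllib.request

API_BASE = "https://api.digitalocean.com"
CRAWLING_OPTIONS = {"UNKNOWN", "SCOPED", "PATH", "DOMAIN", "SUBDOMAINS"}


def build_web_crawler_request(kb_uuid, token, base_url,
                              crawling_option="SCOPED", embed_media=False):
    """Build (but do not send) the POST request for a web crawler source."""
    # Reject values outside the documented enum before any network call.
    if crawling_option not in CRAWLING_OPTIONS:
        raise ValueError(f"unsupported crawling_option: {crawling_option!r}")
    body = {
        "web_crawler_data_source": {
            "base_url": base_url,
            "crawling_option": crawling_option,
            "embed_media": embed_media,
        }
    }
    return urllib.request.Request(
        f"{API_BASE}/v2/gen-ai/knowledge_bases/{kb_uuid}/data_sources",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Sending the request would then be a matter of passing the returned object to urllib.request.urlopen.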

Returns
knowledge_base_data_source: optional APIKnowledgeBaseDataSource { aws_data_source, bucket_name, created_at, 10 more }

Data Source configuration for Knowledge Bases

aws_data_source: optional object { bucket_name, item_path, region }

AWS S3 Data Source for Display

bucket_name: optional string

Spaces bucket name

item_path: optional string
region: optional string

Region of bucket

bucket_name: optional string

Name of storage bucket - Deprecated, moved to data_source_details

created_at: optional string

Creation date / time

format: date-time
dropbox_data_source: optional object { folder }

Dropbox Data Source for Display

folder: optional string
file_upload_data_source: optional APIFileUploadDataSource { original_file_name, size_in_bytes, stored_object_key }

File to upload as data source for knowledge base.

original_file_name: optional string

The original file name

size_in_bytes: optional string

The size of the file in bytes

format: uint64
stored_object_key: optional string

The object key the file was stored as

item_path: optional string

Path of folder or object in bucket - Deprecated, moved to data_source_details

last_datasource_indexing_job: optional APIIndexedDataSource { completed_at, data_source_uuid, error_details, 11 more }
completed_at: optional string

Timestamp when data source completed indexing

format: date-time
data_source_uuid: optional string

UUID of the indexed data source

error_details: optional string

A detailed error description

error_msg: optional string

A string code providing a hint as to which part of the system experienced an error

failed_item_count: optional string

Total count of files that have failed

format: uint64
indexed_file_count: optional string

Total count of files that have been indexed

format: uint64
indexed_item_count: optional string

Total count of files that have been indexed

format: uint64
removed_item_count: optional string

Total count of files that have been removed

format: uint64
skipped_item_count: optional string

Total count of files that have been skipped

format: uint64
started_at: optional string

Timestamp when data source started indexing

format: date-time
status: optional "DATA_SOURCE_STATUS_UNKNOWN" or "DATA_SOURCE_STATUS_IN_PROGRESS" or "DATA_SOURCE_STATUS_UPDATED" or 3 more
Accepts one of the following:
"DATA_SOURCE_STATUS_UNKNOWN"
"DATA_SOURCE_STATUS_IN_PROGRESS"
"DATA_SOURCE_STATUS_UPDATED"
"DATA_SOURCE_STATUS_PARTIALLY_UPDATED"
"DATA_SOURCE_STATUS_NOT_UPDATED"
"DATA_SOURCE_STATUS_FAILED"
total_bytes: optional string

Total size of files in data source in bytes

format: uint64
total_bytes_indexed: optional string

Total size of files in data source in bytes that have been indexed

format: uint64
total_file_count: optional string

Total file count in the data source

format: uint64
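
Because the uint64 counters of last_datasource_indexing_job are serialized as strings, a client usually converts them before doing arithmetic. The following is a sketch of one way to do that; the helper names parse_indexing_job and bytes_indexed_ratio are assumptions introduced here and not part of the API.

```python
from datetime import datetime

UINT64_FIELDS = (
    "failed_item_count", "indexed_file_count", "indexed_item_count",
    "removed_item_count", "skipped_item_count",
    "total_bytes", "total_bytes_indexed", "total_file_count",
)


def parse_indexing_job(job):
    """Convert string-encoded uint64 counters and timestamps to Python types."""
    parsed = dict(job)
    for name in UINT64_FIELDS:
        if job.get(name):  # skip missing or empty values
            parsed[name] = int(job[name])
    for name in ("started_at", "completed_at"):
        if job.get(name):
            # fromisoformat() before Python 3.11 rejects a trailing "Z".
            parsed[name] = datetime.fromisoformat(job[name].replace("Z", "+00:00"))
    return parsed


def bytes_indexed_ratio(parsed):
    """Fraction of bytes indexed, or None when total_bytes is missing or zero."""
    total = parsed.get("total_bytes")
    if not total:
        return None
    return parsed.get("total_bytes_indexed", 0) / total
```

Fields that are absent from the response are left out of the result rather than defaulted, since every field is documented as optional.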
last_indexing_job: optional APIIndexingJob { completed_datasources, created_at, data_source_uuids, 12 more }

IndexingJob description

completed_datasources: optional number

Number of data sources that have completed indexing

format: int64
created_at: optional string

Creation date / time

format: date-time
data_source_uuids: optional array of string
finished_at: optional string
format: date-time
knowledge_base_uuid: optional string

Knowledge base id

phase: optional "BATCH_JOB_PHASE_UNKNOWN" or "BATCH_JOB_PHASE_PENDING" or "BATCH_JOB_PHASE_RUNNING" or 4 more
Accepts one of the following:
"BATCH_JOB_PHASE_UNKNOWN"
"BATCH_JOB_PHASE_PENDING"
"BATCH_JOB_PHASE_RUNNING"
"BATCH_JOB_PHASE_SUCCEEDED"
"BATCH_JOB_PHASE_FAILED"
"BATCH_JOB_PHASE_ERROR"
"BATCH_JOB_PHASE_CANCELLED"
started_at: optional string
format: date-time
status: optional "INDEX_JOB_STATUS_UNKNOWN" or "INDEX_JOB_STATUS_PARTIAL" or "INDEX_JOB_STATUS_IN_PROGRESS" or 4 more
Accepts one of the following:
"INDEX_JOB_STATUS_UNKNOWN"
"INDEX_JOB_STATUS_PARTIAL"
"INDEX_JOB_STATUS_IN_PROGRESS"
"INDEX_JOB_STATUS_COMPLETED"
"INDEX_JOB_STATUS_FAILED"
"INDEX_JOB_STATUS_NO_CHANGES"
"INDEX_JOB_STATUS_PENDING"
tokens: optional number

Number of tokens

format: int64
total_datasources: optional number

Number of datasources being indexed

format: int64
total_items_failed: optional string

Total Items Failed

format: uint64
total_items_indexed: optional string

Total Items Indexed

format: uint64
total_items_skipped: optional string

Total Items Skipped

format: uint64
updated_at: optional string

Last modified

format: date-time
uuid: optional string

Unique id
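
A client that inspects last_indexing_job often wants to know whether the job is still running and how many items it processed. The sketch below shows one possible summary. Which status values count as final is an assumption of this example: the reference lists the values but does not state which ones are terminal. The function name is likewise hypothetical.

```python
# Assumption for this sketch: a job in one of these statuses will not
# change again. The reference does not say which statuses are final.
ASSUMED_TERMINAL = {
    "INDEX_JOB_STATUS_COMPLETED",
    "INDEX_JOB_STATUS_PARTIAL",
    "INDEX_JOB_STATUS_FAILED",
    "INDEX_JOB_STATUS_NO_CHANGES",
}


def summarize_indexing_job(job):
    """Return the status, an assumed 'finished' flag, and integer item totals."""
    status = job.get("status", "INDEX_JOB_STATUS_UNKNOWN")
    summary = {"status": status, "finished": status in ASSUMED_TERMINAL}
    for kind in ("indexed", "skipped", "failed"):
        # total_items_* are uint64 values serialized as strings.
        summary[kind] = int(job.get(f"total_items_{kind}") or 0)
    return summary
```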

region: optional string

Region code - Deprecated, moved to data_source_details

spaces_data_source: optional APISpacesDataSource { bucket_name, item_path, region }

Spaces Bucket Data Source

bucket_name: optional string

Spaces bucket name

item_path: optional string
region: optional string

Region of bucket

updated_at: optional string

Last modified

format: date-time
uuid: optional string

Unique id of knowledge base

web_crawler_data_source: optional APIWebCrawlerDataSource { base_url, crawling_option, embed_media }

WebCrawlerDataSource

base_url: optional string

The base URL to crawl.

crawling_option: optional "UNKNOWN" or "SCOPED" or "PATH" or 2 more

Options for specifying how URLs found on pages should be handled.

  • UNKNOWN: Default unknown value
  • SCOPED: Only include the base URL.
  • PATH: Crawl the base URL and linked pages within the URL path.
  • DOMAIN: Crawl the base URL and linked pages within the same domain.
  • SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
Accepts one of the following:
"UNKNOWN"
"SCOPED"
"PATH"
"DOMAIN"
"SUBDOMAINS"
embed_media: optional boolean

Whether to ingest and index media (images, etc.) on web pages.

Add Data Source to a Knowledge Base
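curl -X POST "https://api.digitalocean.com/v2/gen-ai/knowledge_bases/$KNOWLEDGE_BASE_UUID/data_sources" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $DIGITALOCEAN_ACCESS_TOKEN" \
    -d '{"web_crawler_data_source": {"base_url": "https://example.com", "crawling_option": "SCOPED"}}'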
Returns Examples
{
  "knowledge_base_data_source": {
    "aws_data_source": {
      "bucket_name": "example name",
      "item_path": "example string",
      "region": "example string"
    },
    "bucket_name": "example name",
    "created_at": "2023-01-01T00:00:00Z",
    "dropbox_data_source": {
      "folder": "example string"
    },
    "file_upload_data_source": {
      "original_file_name": "example name",
      "size_in_bytes": "12345",
      "stored_object_key": "example string"
    },
    "item_path": "example string",
    "last_datasource_indexing_job": {
      "completed_at": "2023-01-01T00:00:00Z",
      "data_source_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "error_details": "example string",
      "error_msg": "example string",
      "failed_item_count": "12345",
      "indexed_file_count": "12345",
      "indexed_item_count": "12345",
      "removed_item_count": "12345",
      "skipped_item_count": "12345",
      "started_at": "2023-01-01T00:00:00Z",
      "status": "DATA_SOURCE_STATUS_UNKNOWN",
      "total_bytes": "12345",
      "total_bytes_indexed": "12345",
      "total_file_count": "12345"
    },
    "last_indexing_job": {
      "completed_datasources": 123,
      "created_at": "2023-01-01T00:00:00Z",
      "data_source_uuids": [
        "example string"
      ],
      "finished_at": "2023-01-01T00:00:00Z",
      "knowledge_base_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "phase": "BATCH_JOB_PHASE_UNKNOWN",
      "started_at": "2023-01-01T00:00:00Z",
      "status": "INDEX_JOB_STATUS_UNKNOWN",
      "tokens": 123,
      "total_datasources": 123,
      "total_items_failed": "12345",
      "total_items_indexed": "12345",
      "total_items_skipped": "12345",
      "updated_at": "2023-01-01T00:00:00Z",
      "uuid": "123e4567-e89b-12d3-a456-426614174000"
    },
    "region": "example string",
    "spaces_data_source": {
      "bucket_name": "example name",
      "item_path": "example string",
      "region": "example string"
    },
    "updated_at": "2023-01-01T00:00:00Z",
    "uuid": "123e4567-e89b-12d3-a456-426614174000",
    "web_crawler_data_source": {
      "base_url": "example string",
      "crawling_option": "UNKNOWN",
      "embed_media": true
    }
  }
}
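
A response shaped like the example above can be reduced to the few fields a caller typically needs. This is a sketch only: describe_data_source is a hypothetical helper, and treating any non-empty *_data_source object as "present" is an assumption about how a real response distinguishes source types.

```python
import json

SOURCE_KEYS = (
    "aws_data_source",
    "dropbox_data_source",
    "file_upload_data_source",
    "spaces_data_source",
    "web_crawler_data_source",
)


def describe_data_source(response_text):
    """Extract the UUID, creation time, and populated source types."""
    payload = json.loads(response_text)
    source = payload.get("knowledge_base_data_source") or {}
    return {
        "uuid": source.get("uuid"),
        "created_at": source.get("created_at"),
        # Keep only the source objects that are present and non-empty.
        "kinds": [key for key in SOURCE_KEYS if source.get(key)],
    }
```

Since every field in the response is optional, the helper returns None for a missing uuid or created_at instead of raising.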