Handling errors

How to handle errors

When the library is unable to connect to the API (for example, due to network connection problems or a timeout), a subclass of gradient.APIConnectionError is raised.

When the API returns a non-success status code (that is, a 4xx or 5xx response), a subclass of gradient.APIStatusError is raised. It exposes status_code and response properties.

All errors inherit from gradient.APIError.

import gradient
from gradient import Gradient

client = Gradient()

try:
    client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "What is the capital of France?",
            }
        ],
        model="anthropic-claude-4-sonnet",
    )
except gradient.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # an underlying Exception, likely raised within httpx.
except gradient.RateLimitError as e:
    print("A 429 status code was received; we should back off a bit.")
except gradient.APIStatusError as e:
    print("Another non-200-range status code was received")
    print(e.status_code)
    print(e.response)
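
The order of the except clauses above matters: the more specific classes are listed before the more general ones. The sketch below reproduces that behaviour with stand-in classes so it runs without the library. The class bodies and the `describe` helper are hypothetical, and the assumption that RateLimitError derives from APIStatusError is inferred from this section rather than taken from the library's source.

```python
# Stand-in classes (hypothetical) that mirror the hierarchy described above.
class APIError(Exception):
    pass


class APIConnectionError(APIError):
    pass


class APIStatusError(APIError):
    def __init__(self, message, status_code):
        super().__init__(message)
        self.status_code = status_code


# Assumption: the rate-limit error is a more specific status error.
class RateLimitError(APIStatusError):
    pass


def describe(exc):
    """Classify an exception using the same clause order as the example above."""
    try:
        raise exc
    except APIConnectionError:
        return "connection"
    except RateLimitError:
        return "rate_limit"
    except APIStatusError as e:
        return f"status_{e.status_code}"
    except APIError:
        return "other_api_error"
```

If the APIStatusError clause were listed before the RateLimitError clause, it would also catch 429 responses and the rate-limit branch would never run.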

Error codes are as follows:

Status Code	Error Type
400	`InvalidRequestError`
401	`UnauthorizedError`
404	`NotFoundError`
429	`RateLimitError`
>=500	`InternalError`
N/A	`APIConnectionError`

Retries

Certain errors are automatically retried 2 times by default, with a short exponential backoff. Connection errors (for example, due to a network connectivity problem), 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors are all retried by default.
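
To make the retry rule concrete, here is a minimal sketch of the status codes listed above, plus one possible exponential backoff schedule. The helper names and the `initial` and `cap` values are assumptions chosen for illustration; this section does not state the delays the library actually uses.

```python
# Status codes this section lists as retried by default (besides >=500).
RETRYABLE_STATUS_CODES = {408, 409, 429}


def is_retryable(status_code):
    """Return True for the statuses the text above says are retried by default."""
    return status_code in RETRYABLE_STATUS_CODES or status_code >= 500


def backoff_delays(max_retries=2, initial=0.5, cap=8.0):
    """Illustrative schedule: the delay doubles per retry, up to `cap` seconds.

    `initial` and `cap` are assumed values, not the library's settings.
    """
    return [min(cap, initial * (2 ** attempt)) for attempt in range(max_retries)]
```

Connection errors carry no status code, so they fall outside `is_retryable`; according to the text above they are retried as well.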

You can use the max_retries option to configure or disable retry settings:

from gradient import Gradient

# Configure the default for all requests:
client = Gradient(
    # default is 2
    max_retries=0,
)

# Or, configure per-request:
client.with_options(max_retries=5).chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    model="anthropic-claude-4-sonnet",
)

Timeouts

By default, requests time out after 1 minute. You can configure this with the timeout option, which accepts a float (in seconds) or an httpx.Timeout object:

import httpx
from gradient import Gradient

# Configure the default for all requests:
client = Gradient(
    # 20 seconds (default is 1 minute)
    timeout=20.0,
)

# More granular control:
client = Gradient(
    timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)

# Override per-request:
client.with_options(timeout=5.0).chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    model="llama3.3-70b-instruct",
)

On timeout, an APITimeoutError is raised.

Note that requests that time out are retried twice by default.
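
Because timed-out requests are retried, a single call can take noticeably longer than the configured timeout. The sketch below gives a rough estimate. The helper names are hypothetical, backoff delays are ignored, and treating the timeout as one limit per attempt is a simplification, since httpx applies its limits to individual phases such as connect and read.

```python
def total_attempts(max_retries=2):
    """One initial request plus the configured number of retries."""
    return 1 + max_retries


def worst_case_seconds(timeout=60.0, max_retries=2):
    """Rough estimate if every attempt runs into the timeout (backoff ignored)."""
    return timeout * total_attempts(max_retries)
```

With the defaults described above (a 1 minute timeout and 2 retries), this comes to 3 attempts and roughly 180 seconds. Setting max_retries=0 keeps the wait close to a single timeout.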