Handling errors
When the library is unable to connect to the API (for example, due to network connection problems or a timeout), a subclass of gradient.APIConnectionError is raised.
When the API returns a non-success status code (that is, a 4xx or 5xx response), a subclass of gradient.APIStatusError is raised, containing status_code and response properties.
All errors inherit from gradient.APIError.
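The error raised for a 429 response, RateLimitError, is one of these APIStatusError subclasses. Because of that, the order of except clauses matters. Python runs the first handler that matches, so the more specific class has to come before its parent. The sketch below shows the ordering using hypothetical stand-in classes that only mirror the relationships described here. They are not the real gradient exception classes.

```python
# Hypothetical stand-ins mirroring the documented hierarchy.
# These are NOT the real gradient exception classes.
class APIError(Exception):
    """Base class: every error derives from this."""


class APIConnectionError(APIError):
    """No HTTP response was received (network problem, timeout, ...)."""


class APIStatusError(APIError):
    """The API answered with a 4xx or 5xx status code."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class RateLimitError(APIStatusError):
    """A 429 response; a subclass of APIStatusError."""


def classify(exc: APIError) -> str:
    """Raise exc and report which handler catches it."""
    try:
        raise exc
    except APIConnectionError:
        return "connection problem"
    except RateLimitError:  # must precede APIStatusError to be reachable
        return "rate limited"
    except APIStatusError as e:
        return f"status {e.status_code}"
    except APIError:
        return "other API error"


for err in (
    APIConnectionError("unreachable"),
    RateLimitError("slow down", 429),
    APIStatusError("missing", 404),
):
    print(type(err).__name__, "->", classify(err))
```

If you swapped the RateLimitError and APIStatusError clauses, the 429 branch would never run, because the broader handler would catch every RateLimitError first.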
```python
import gradient
from gradient import Gradient

client = Gradient()

try:
    client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "What is the capital of France?",
            }
        ],
        model="llama3.3-70b-instruct",
    )
except gradient.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # an underlying Exception, likely raised within httpx.
except gradient.RateLimitError as e:
    print("A 429 status code was received; we should back off a bit.")
except gradient.APIStatusError as e:
    print("Another non-200-range status code was received")
    print(e.status_code)
    print(e.response)
```

Error codes are as follows:
| Status Code | Error Type |
|---|---|
| 400 | BadRequestError |
| 401 | AuthenticationError |
| 403 | PermissionDeniedError |
| 404 | NotFoundError |
| 422 | UnprocessableEntityError |
| 429 | RateLimitError |
| >=500 | InternalServerError |
| N/A | APIConnectionError |
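The table can be read as a lookup from status code to error class name. The helper below is a hypothetical illustration of that table, not the library's own dispatch code. Two details are assumptions. First, unlisted 4xx codes (for example 418) fall back to the generic APIStatusError. Second, None stands for "no HTTP response at all".

```python
from typing import Optional

# Hypothetical lookup mirroring the status-code table (class names only).
_STATUS_TO_ERROR = {
    400: "BadRequestError",
    401: "AuthenticationError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    422: "UnprocessableEntityError",
    429: "RateLimitError",
}


def error_type_for(status_code: Optional[int]) -> str:
    """Return the error class name for a non-success status code.

    None means no HTTP response was received (a connection error).
    """
    if status_code is None:
        return "APIConnectionError"
    if status_code < 400:
        raise ValueError(f"{status_code} is not an error status")
    if status_code >= 500:
        return "InternalServerError"
    # Assumption: other 4xx codes surface as the generic APIStatusError.
    return _STATUS_TO_ERROR.get(status_code, "APIStatusError")


print(error_type_for(404))   # NotFoundError
print(error_type_for(503))   # InternalServerError
print(error_type_for(None))  # APIConnectionError
```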
Retries
Certain errors are automatically retried 2 times by default, with a short exponential backoff. Connection errors (for example, due to a network connectivity problem), 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors are all retried by default.
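The retry rule above can be written as a small predicate and paired with an exponential backoff schedule. This is a hypothetical sketch, not the SDK's internal code. The text only says the backoff is "short", so the 0.5-second initial delay and 8-second cap are illustrative assumptions. Real implementations usually also add random jitter, which this sketch leaves out.

```python
from typing import List, Optional

# Status codes listed as retried by default (every 5xx is retried too).
RETRYABLE_STATUS_CODES = {408, 409, 429}


def should_retry(status_code: Optional[int]) -> bool:
    """None stands for a connection error, where no response was received."""
    if status_code is None:
        return True
    return status_code in RETRYABLE_STATUS_CODES or status_code >= 500


def backoff_delays(
    max_retries: int = 2, initial: float = 0.5, cap: float = 8.0
) -> List[float]:
    """Seconds to wait before each retry: the delay doubles and is capped.

    initial and cap are illustrative assumptions, not documented values.
    """
    return [min(initial * 2 ** attempt, cap) for attempt in range(max_retries)]


print(should_retry(429), should_retry(404))  # True False
print(backoff_delays())                      # [0.5, 1.0]
```

With the default of 2 retries, a request that keeps failing is attempted three times in total: one initial attempt plus two retries.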
You can use the max_retries option to configure or disable retry settings:
```python
from gradient import Gradient

# Configure the default for all requests:
client = Gradient(
    # default is 2
    max_retries=0,
)

# Or, configure per-request:
client.with_options(max_retries=5).chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    model="llama3.3-70b-instruct",
)
```

Timeouts
By default, requests time out after 1 minute. You can configure this with a timeout option, which accepts a float or an httpx.Timeout object:
```python
import httpx
from gradient import Gradient

# Configure the default for all requests:
client = Gradient(
    # 20 seconds (default is 1 minute)
    timeout=20.0,
)

# More granular control:
client = Gradient(
    timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)

# Override per-request:
client.with_options(timeout=5.0).chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    model="llama3.3-70b-instruct",
)
```

On timeout, an APITimeoutError is raised.
Note that requests that time out are retried twice by default.
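Because timeouts are retried, a single call can block for longer than the configured timeout. The sketch below is a hypothetical retry loop, not the SDK's implementation. The built-in TimeoutError stands in for APITimeoutError, the 0.5-second initial delay is an assumption, and the sleep function can be swapped out so the example runs instantly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 2,
    initial_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TimeoutError up to max_retries times.

    Hypothetical sketch: TimeoutError stands in for APITimeoutError,
    and initial_delay is an illustrative assumption.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except TimeoutError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the timeout to the caller
            sleep(initial_delay * 2 ** attempt)  # exponential backoff
            attempt += 1


# Usage: a fake request that times out twice, then succeeds.
attempts = []


def flaky_request() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("request timed out")
    return "Paris"


print(call_with_retries(flaky_request, sleep=lambda s: None))  # Paris
print(len(attempts))                                           # 3
```

With max_retries=0, the first TimeoutError propagates immediately. Here is a rough estimate of the worst case with the defaults (1-minute timeout, 2 retries). If every attempt ran for the full timeout, a call could block for about 3 × 60 = 180 seconds, plus the backoff sleeps. Treat this only as an estimate: an httpx timeout limits the individual connect, read, write, and pool phases, not the total wall-clock time of a request.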