hbllmutils.model.remote

This module provides a remote LLM (Large Language Model) client implementation.

It offers a unified interface for interacting with OpenAI-compatible API endpoints, supporting both synchronous and asynchronous operations, streaming responses, and customizable parameters.

Classes:

  • RemoteLLMModel – Main class for managing remote LLM API interactions.

RemoteLLMModel

class hbllmutils.model.remote.RemoteLLMModel(base_url: str, api_token: str, model_name: str, organization_id: str | None = None, timeout: int = 30, max_retries: int = 3, headers: Dict[str, str] | None = None, **default_params)

A client for interacting with remote Large Language Model APIs.

This class provides a unified interface for communicating with OpenAI-compatible API endpoints. It supports both synchronous and asynchronous operations, streaming responses, and allows customization of request parameters.

Variables:
  • base_url (str) – API base URL (e.g., “https://api.openai.com/v1”)

  • api_token (str) – API access token for authentication

  • model_name (str) – Name of the model to use (e.g., “gpt-3.5-turbo”, “claude-3-opus”)

  • organization_id (Optional[str]) – Organization ID (required by some APIs)

  • timeout (int) – Request timeout in seconds

  • max_retries (int) – Maximum number of retry attempts

  • headers (Dict[str, str]) – Custom request headers

  • default_params (Dict[str, Any]) – Default parameters for API requests

__init__(base_url: str, api_token: str, model_name: str, organization_id: str | None = None, timeout: int = 30, max_retries: int = 3, headers: Dict[str, str] | None = None, **default_params)

Initialize the RemoteLLMModel instance.

Parameters:
  • base_url (str) – API base URL (e.g., “https://api.openai.com/v1”)

  • api_token (str) – API access token for authentication

  • model_name (str) – Name of the model to use (e.g., “gpt-3.5-turbo”)

  • organization_id (Optional[str]) – Organization ID (optional, required by some APIs)

  • timeout (int) – Request timeout in seconds (default: 30)

  • max_retries (int) – Maximum number of retry attempts (default: 3)

  • headers (Optional[Dict[str, str]]) – Custom request headers (optional)

  • default_params (Any) – Default keyword parameters for API requests (optional)

Raises:
  • ValueError – If base_url format is invalid

  • ValueError – If api_token is empty

  • ValueError – If model_name is empty

  • ValueError – If timeout is not positive

  • ValueError – If max_retries is negative
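
The conditions listed above can be sketched as a standalone check. This is only an illustration of the documented ValueError cases, not the library's own code: the helper name validate_config is hypothetical, and the rule used for a "valid" base_url (an http or https scheme plus a host) is an assumption, since the source does not say how the format is checked.

```python
from urllib.parse import urlparse


def validate_config(base_url: str, api_token: str, model_name: str,
                    timeout: int = 30, max_retries: int = 3) -> None:
    """Hypothetical helper mirroring the documented ValueError conditions."""
    parsed = urlparse(base_url)
    # Assumption: a valid base URL has an http(s) scheme and a host part.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"invalid base_url: {base_url!r}")
    if not api_token:
        raise ValueError("api_token must not be empty")
    if not model_name:
        raise ValueError("model_name must not be empty")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    if max_retries < 0:
        raise ValueError("max_retries must not be negative")


# A well-formed configuration passes silently.
validate_config("https://api.openai.com/v1", "sk-xxx", "gpt-3.5-turbo")
```

Note that max_retries=0 is accepted under these rules, which matches the documented wording ("negative" rather than "not positive").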

Example::
>>> model = RemoteLLMModel(
...     base_url="https://api.openai.com/v1",
...     api_token="sk-xxx",
...     model_name="gpt-3.5-turbo"
... )
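
Extra keyword arguments given to the constructor are stored as default_params, while ask(), ask_stream() and create_message() each accept their own **params. The source does not state how the two are combined. The sketch below shows one plausible policy, in which call-time values override the defaults; both the helper name merge_params and the precedence rule are assumptions.

```python
from typing import Any, Dict


def merge_params(default_params: Dict[str, Any], **params: Any) -> Dict[str, Any]:
    """Hypothetical helper: combine constructor defaults with per-call params."""
    merged = dict(default_params)  # copy so the stored defaults are never mutated
    merged.update(params)          # assumption: call-time values take precedence
    return merged


defaults = {"max_tokens": 1000, "temperature": 0.7}
request = merge_params(defaults, temperature=0.0)
print(request)  # {'max_tokens': 1000, 'temperature': 0.0}
```

Copying before updating keeps one call's overrides from leaking into later calls.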

__repr__() -> str

Return a string representation of the RemoteLLMModel instance.

All constructor parameters are displayed at the same level; the entries of default_params appear as ordinary keyword arguments rather than as a nested dictionary. The API token is masked so that the secret is not exposed.

Returns:

String representation of the instance

Return type:

str

Example::
>>> model = RemoteLLMModel(
...     base_url="https://api.openai.com/v1",
...     api_token="sk-xxx",
...     model_name="gpt-3.5-turbo",
...     max_tokens=1000
... )
>>> repr(model)
'RemoteLLMModel(base_url=..., api_token=..., max_tokens=1000, ...)'
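
The exact masking format is not shown in the source (the example output elides it). As a rough illustration of the idea, the sketch below keeps a short prefix and hides the rest. The function mask_token and its visible parameter are hypothetical and may differ from what the library does.

```python
def mask_token(token: str, visible: int = 3) -> str:
    """Hypothetical masking rule: keep a short prefix, hide the remainder."""
    if len(token) <= visible:
        # Very short tokens are hidden completely.
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)


print(mask_token("sk-abcdef"))  # sk-******
```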

ask(messages: List[dict], with_reasoning: bool = False, **params) -> str | Tuple[str | None, str]

Send a chat request and get the text response.

Parameters:
  • messages (List[dict]) – List of message dictionaries for the conversation

  • with_reasoning (bool) – Whether to return reasoning content along with the response (default: False)

  • params (Any) – Additional parameters to pass to the API

Returns:

If with_reasoning is False, returns the content string. If with_reasoning is True, returns a tuple of (reasoning_content, content), where reasoning_content may be None.

Return type:

Union[str, Tuple[Optional[str], str]]

Example::
>>> model = RemoteLLMModel(base_url="...", api_token="...", model_name="...")
>>> messages = [{"role": "user", "content": "Explain quantum computing"}]
>>> # Get only the response content
>>> response = model.ask(messages)
>>> print(response)
>>> # Get both reasoning and response content
>>> reasoning, response = model.ask(messages, with_reasoning=True)
>>> print(f"Reasoning: {reasoning}")
>>> print(f"Response: {response}")
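
Because the return shape depends on with_reasoning, calling code that handles both cases may find it convenient to normalize the result. The helper below is a small sketch of that idea; normalize_reply is a hypothetical name and is not part of the library.

```python
from typing import Optional, Tuple, Union

# The two shapes documented for ask(): plain content, or (reasoning, content).
Reply = Union[str, Tuple[Optional[str], str]]


def normalize_reply(reply: Reply) -> Tuple[Optional[str], str]:
    """Return (reasoning, content) regardless of which shape was produced."""
    if isinstance(reply, tuple):
        reasoning, content = reply
        return reasoning, content
    # Plain string: there is no reasoning part.
    return None, reply


reasoning, content = normalize_reply("Qubits can be in superposition.")
print(reasoning is None, content)
```

With this in place, downstream code can always unpack two values and test the first for None.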

ask_stream(messages: List[dict], with_reasoning: bool = False, **params) -> ResponseStream

Send a chat request and get a streaming response.

Parameters:
  • messages (List[dict]) – List of message dictionaries for the conversation

  • with_reasoning (bool) – Whether to include reasoning content in the stream (default: False)

  • params (Any) – Additional parameters to pass to the API

Returns:

A ResponseStream object for iterating over the streaming response

Return type:

ResponseStream

Example::
>>> model = RemoteLLMModel(base_url="...", api_token="...", model_name="...")
>>> messages = [{"role": "user", "content": "Write a story"}]
>>> stream = model.ask_stream(messages)
>>> for chunk in stream:
...     print(chunk, end='', flush=True)
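
When the full text is needed after streaming, the chunks can be accumulated while they are printed. The sketch below assumes that iterating the stream yields text chunks as str, which the example above suggests but the source does not state outright. collect_stream is a hypothetical helper that works with any iterable of strings.

```python
from typing import Iterable, List


def collect_stream(stream: Iterable[str]) -> str:
    """Echo each chunk as it arrives and return the concatenated text."""
    parts: List[str] = []
    for chunk in stream:
        print(chunk, end="", flush=True)  # show partial output immediately
        parts.append(chunk)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


# Stand-in for model.ask_stream(messages): any iterable of strings works here.
text = collect_stream(iter(["Once ", "upon ", "a time."]))
```

Joining a list once at the end avoids repeated string concatenation inside the loop.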

create_message(messages: List[dict], **params) -> ChatCompletionMessage

Send a chat request and get the complete message response.

Parameters:
  • messages (List[dict]) – List of message dictionaries for the conversation

  • params (Any) – Additional parameters to pass to the API

Returns:

The message object from the first choice in the response

Return type:

ChatCompletionMessage

Example::
>>> model = RemoteLLMModel(base_url="...", api_token="...", model_name="...")
>>> messages = [{"role": "user", "content": "What is AI?"}]
>>> response = model.create_message(messages)
>>> print(response.content)
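
For multi-turn conversations, the returned message can be appended to the history before the next request. The following sketch uses a SimpleNamespace as a stand-in for the returned message object and relies only on its content attribute, which is assumed here to be a string. append_reply is a hypothetical helper, not a library function.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def append_reply(messages: List[Dict[str, str]], reply: Any) -> List[Dict[str, str]]:
    """Return a new history with the assistant's reply appended."""
    return messages + [{"role": "assistant", "content": reply.content}]


# Stand-in for the object returned by create_message(); only .content is used.
fake_reply = SimpleNamespace(content="AI is the study of intelligent agents.")

history = [{"role": "user", "content": "What is AI?"}]
history = append_reply(history, fake_reply)
history.append({"role": "user", "content": "Give one example."})
print(len(history))  # 3
```

The updated history can then be passed as the messages argument of the next call.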