hbllmutils.model.remote

Remote LLM client implementation for OpenAI-compatible APIs.

This module implements a concrete LLMModel that talks to OpenAI-compatible chat completion endpoints. It supports synchronous requests, streaming responses, and convenient helpers for returning either the full openai.types.chat.ChatCompletionMessage or only the text content. The implementation is designed to be endpoint-agnostic as long as the server implements the OpenAI API protocol.

The main public component is:

  • RemoteLLMModel - A concrete client for OpenAI-compatible API endpoints

Typical usage focuses on building a conversation history list and then calling RemoteLLMModel.ask(), RemoteLLMModel.ask_stream(), or RemoteLLMModel.create_message().
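The conversation history is a plain list of OpenAI-style message dictionaries, each with a "role" and a "content" key. The helper below is a hypothetical sketch and is not part of hbllmutils. It shows one way to assemble such a list from an optional system prompt and a series of user/assistant turns.

```python
from typing import Dict, List, Optional, Tuple


def build_messages(
    turns: List[Tuple[str, str]],
    system_prompt: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Hypothetical helper: build an OpenAI-style chat history.

    ``turns`` is a sequence of (role, content) pairs, where role is
    "user" or "assistant".
    """
    messages: List[Dict[str, str]] = []
    if system_prompt is not None:
        # By convention, the system message comes first.
        messages.append({"role": "system", "content": system_prompt})
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        messages.append({"role": role, "content": content})
    return messages


history = build_messages(
    [("user", "Hello, world!"),
     ("assistant", "Hello! How can I help you today?"),
     ("user", "Tell me a joke.")],
    system_prompt="You are a concise assistant.",
)
```

The resulting list has the List[dict] shape expected by the messages argument of ask(), ask_stream(), and create_message().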

Example:

>>> from hbllmutils.model.remote import RemoteLLMModel
>>> model = RemoteLLMModel(
...     base_url="https://api.openai.com/v1",
...     api_token="sk-xxx",
...     model_name="gpt-3.5-turbo",
...     max_tokens=128
... )
>>> messages = [{"role": "user", "content": "Hello, world!"}]
>>> print(model.ask(messages))
Hello! How can I help you today?

>>> # Streaming usage
>>> stream = model.ask_stream(messages)
>>> for chunk in stream:
...     print(chunk, end="", flush=True)

Note

The class relies on the official openai package and expects OpenAI-style responses, including streaming deltas. If your endpoint diverges from this structure, consider implementing a custom response stream handler.
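OpenAI-style streaming responses arrive as a sequence of chunks, and each chunk carries a partial delta in choices[0].delta. The sketch below joins those deltas into the final text. So that it runs without a network connection, it models chunks as plain dictionaries rather than openai objects. The "reasoning_content" delta field is an assumption based on some OpenAI-compatible endpoints that stream reasoning separately. This is an illustration of the expected chunk structure, not the library's ResponseStream implementation.

```python
from typing import Iterable, Optional, Tuple


def accumulate_deltas(chunks: Iterable[dict]) -> Tuple[Optional[str], str]:
    """Join streamed deltas into (reasoning_content, content).

    Each chunk is assumed to look like an OpenAI streaming chunk:
    {"choices": [{"delta": {"content": "..."}}]}.
    "reasoning_content" is an assumed, endpoint-specific delta field.
    """
    reasoning_parts = []
    content_parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            # Some chunks carry no choices (e.g. a final usage-only chunk).
            continue
        delta = choices[0].get("delta") or {}
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            content_parts.append(delta["content"])
    reasoning = "".join(reasoning_parts) if reasoning_parts else None
    return reasoning, "".join(content_parts)


fake_stream = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"reasoning_content": "Greet back."}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": "!"}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
    {"choices": []},
]
reasoning, text = accumulate_deltas(fake_stream)
```

An endpoint that places text somewhere other than choices[0].delta.content would break this kind of accumulation. The note above warns about exactly this case.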

RemoteLLMModel

class hbllmutils.model.remote.RemoteLLMModel(base_url: str, api_token: str, model_name: str, organization_id: str | None = None, timeout: int = 30, max_retries: int = 3, headers: Dict[str, str] | None = None, **default_params: Any)

A client for interacting with remote Large Language Model APIs.

This class provides a unified interface for communicating with OpenAI-compatible API endpoints. It supports synchronous requests and streaming responses, and accepts additional request parameters both at construction time (default_params) and per call (**params).

Parameters:
  • base_url (str) – API base URL (e.g., "https://api.openai.com/v1")

  • api_token (str) – API access token for authentication

  • model_name (str) – Name of the model to use (e.g., "gpt-3.5-turbo")

  • organization_id (Optional[str]) – Organization ID (optional, required by some APIs)

  • timeout (int) – Request timeout in seconds, defaults to 30

  • max_retries (int) – Maximum number of retry attempts, defaults to 3

  • headers (Optional[Dict[str, str]]) – Custom request headers

  • default_params – Additional keyword arguments (e.g., temperature, max_tokens) used as default parameters for API requests

Variables:
  • base_url (str) – API base URL (e.g., "https://api.openai.com/v1")

  • api_token (str) – API access token for authentication

  • model_name (str) – Name of the model to use (e.g., "gpt-3.5-turbo", "claude-3-opus")

  • organization_id (Optional[str]) – Organization ID (required by some APIs)

  • timeout (int) – Request timeout in seconds

  • max_retries (int) – Maximum number of retry attempts

  • headers (Dict[str, str]) – Custom request headers

  • default_params (Dict[str, Any]) – Default parameters for API requests

Raises:
  • ValueError – If base_url format is invalid

  • ValueError – If api_token is empty

  • ValueError – If model_name is empty

  • ValueError – If timeout is not positive

  • ValueError – If max_retries is negative
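The ValueError conditions listed above can be sketched as a standalone check. This is a hypothetical reconstruction, not the library's code. In particular, the documentation does not define what makes a base_url "invalid", so the sketch assumes an http or https URL with a host.

```python
from urllib.parse import urlparse


def validate_config(base_url: str, api_token: str, model_name: str,
                    timeout: int = 30, max_retries: int = 3) -> None:
    """Hypothetical check mirroring the documented ValueError cases."""
    parsed = urlparse(base_url)
    # Assumption: a valid base_url has an http/https scheme and a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid base_url: {base_url!r}")
    if not api_token:
        raise ValueError("api_token must not be empty")
    if not model_name:
        raise ValueError("model_name must not be empty")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    if max_retries < 0:
        raise ValueError("max_retries must not be negative")


# Passes silently: every documented condition is satisfied.
validate_config("https://api.openai.com/v1", "sk-xxx", "gpt-3.5-turbo")
```

Note that max_retries=0 is accepted because only negative values are rejected. This is consistent with "If max_retries is negative".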

Example:

>>> model = RemoteLLMModel(
...     base_url="https://api.openai.com/v1",
...     api_token="sk-xxx",
...     model_name="gpt-3.5-turbo",
...     temperature=0.2
... )
>>> messages = [{"role": "user", "content": "Summarize LLMs in one sentence."}]
>>> print(model.ask(messages))
Large language models generate text by predicting tokens from vast training data.
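In the example above, temperature=0.2 is stored in default_params. This page does not specify how those defaults combine with the **params passed to ask() and related methods. The hypothetical sketch below assumes a common convention: per-call values override the defaults, and the model name and messages are always filled in.

```python
from typing import Any, Dict, List


def build_request(model_name: str, messages: List[dict],
                  default_params: Dict[str, Any],
                  **params: Any) -> Dict[str, Any]:
    """Hypothetical: merge defaults and per-call params into a request body."""
    request: Dict[str, Any] = dict(default_params)  # copy; never mutate defaults
    request.update(params)                           # assumed: per-call values win
    request["model"] = model_name
    request["messages"] = messages
    return request


defaults = {"temperature": 0.2, "max_tokens": 128}
body = build_request("gpt-3.5-turbo",
                     [{"role": "user", "content": "Hi"}],
                     defaults, temperature=0.7)
```

Copying default_params before updating ensures that one call's overrides cannot leak into later calls.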

__init__(base_url: str, api_token: str, model_name: str, organization_id: str | None = None, timeout: int = 30, max_retries: int = 3, headers: Dict[str, str] | None = None, **default_params: Any)

Initialize the RemoteLLMModel instance.

Parameters:
  • base_url (str) – API base URL (e.g., "https://api.openai.com/v1")

  • api_token (str) – API access token for authentication

  • model_name (str) – Name of the model to use (e.g., "gpt-3.5-turbo")

  • organization_id (Optional[str]) – Organization ID (optional, required by some APIs)

  • timeout (int) – Request timeout in seconds (default: 30)

  • max_retries (int) – Maximum number of retry attempts (default: 3)

  • headers (Optional[Dict[str, str]]) – Custom request headers (optional)

  • default_params (Any) – Default parameters for API requests (optional)

Raises:
  • ValueError – If base_url format is invalid

  • ValueError – If api_token is empty

  • ValueError – If model_name is empty

  • ValueError – If timeout is not positive

  • ValueError – If max_retries is negative

Example:

>>> model = RemoteLLMModel(
...     base_url="https://api.openai.com/v1",
...     api_token="sk-xxx",
...     model_name="gpt-3.5-turbo"
... )

__repr__() → str

Return a string representation of the RemoteLLMModel instance.

All constructor parameters are shown, with the entries of default_params listed at the same level as the named parameters rather than nested under a default_params key. The API token is masked for security.

Returns:

String representation of the instance

Return type:

str

Example:

>>> model = RemoteLLMModel(
...     base_url="https://api.openai.com/v1",
...     api_token="sk-xxx",
...     model_name="gpt-3.5-turbo",
...     max_tokens=1000
... )
>>> repr(model)
'RemoteLLMModel(base_url=..., api_token=..., max_tokens=1000, ...)'
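The exact masking scheme and field order are elided in the output above. The sketch below shows one hypothetical way to build such a string. It keeps a short prefix of the token, masks the rest, and flattens default_params into the same argument list as the named parameters. The helper names mask_token and format_repr are illustrative only.

```python
from typing import Any, Dict


def mask_token(token: str, visible: int = 3) -> str:
    """Hypothetical masking: keep the first `visible` chars, hide the rest."""
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)


def format_repr(cls_name: str, named: Dict[str, Any],
                default_params: Dict[str, Any]) -> str:
    fields = dict(named)
    fields["api_token"] = mask_token(fields["api_token"])
    fields.update(default_params)  # flattened: no nested default_params=...
    body = ", ".join(f"{key}={value!r}" for key, value in fields.items())
    return f"{cls_name}({body})"


text = format_repr(
    "RemoteLLMModel",
    {"base_url": "https://api.openai.com/v1", "api_token": "sk-xxx",
     "model_name": "gpt-3.5-turbo"},
    {"max_tokens": 1000},
)
```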

ask(messages: List[dict], with_reasoning: bool = False, **params: Any) → str | Tuple[str | None, str]

Send a chat request and get the text response.

Parameters:
  • messages (List[dict]) – List of message dictionaries for the conversation

  • with_reasoning (bool) – Whether to return reasoning content along with the response (default: False)

  • params (Any) – Additional parameters to pass to the API

Returns:

If with_reasoning is False, returns the content string. If with_reasoning is True, returns a tuple (reasoning_content, content), where reasoning_content may be None (for example, when the endpoint returns no reasoning).

Return type:

Union[str, Tuple[Optional[str], str]]

Example:

>>> model = RemoteLLMModel(base_url="...", api_token="...", model_name="...")
>>> messages = [{"role": "user", "content": "Explain quantum computing"}]
>>> # Get only the response content
>>> response = model.ask(messages)
>>> print(response)
>>> # Get both reasoning and response content
>>> reasoning, response = model.ask(messages, with_reasoning=True)
>>> print(f"Reasoning: {reasoning}")
>>> print(f"Response: {response}")
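Because the reasoning part is typed Optional[str], the caller should handle a missing value. The sketch below shows one plausible way to derive ask()'s return shape from a message object. The reasoning_content attribute is an assumption: it is not a field of the standard OpenAI ChatCompletionMessage, but some compatible endpoints return it. The sketch therefore reads it with getattr and falls back to None. SimpleNamespace stands in for a real message so that the code runs offline.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple, Union


def split_reasoning(message: Any, with_reasoning: bool = False
                    ) -> Union[str, Tuple[Optional[str], str]]:
    """Hypothetical: mimic ask()'s return shape from a message object."""
    # Assumption: treat missing content as an empty string.
    content = message.content or ""
    if not with_reasoning:
        return content
    # Assumed, endpoint-specific field; absent on standard OpenAI responses.
    reasoning = getattr(message, "reasoning_content", None)
    return reasoning, content


plain = SimpleNamespace(role="assistant", content="42")
thinking = SimpleNamespace(role="assistant", content="42",
                           reasoning_content="6 * 7 = 42")
```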

ask_stream(messages: List[dict], with_reasoning: bool = False, **params: Any) → ResponseStream

Send a chat request and get a streaming response.

Parameters:
  • messages (List[dict]) – List of message dictionaries for the conversation

  • with_reasoning (bool) – Whether to include reasoning content in the stream (default: False)

  • params (Any) – Additional parameters to pass to the API

Returns:

A ResponseStream object for iterating over the streaming response

Return type:

ResponseStream

Example:

>>> model = RemoteLLMModel(base_url="...", api_token="...", model_name="...")
>>> messages = [{"role": "user", "content": "Write a story"}]
>>> stream = model.ask_stream(messages)
>>> for chunk in stream:
...     print(chunk, end='', flush=True)
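If you need the full text after streaming, you can echo the chunks and collect them in a single pass. The helper below is a hypothetical sketch. It assumes only that the stream yields text chunks, as the print loop above implies (that is, with_reasoning=False), and a plain list stands in for the ResponseStream so the code runs offline.

```python
import sys
from typing import Iterable, TextIO


def echo_and_collect(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show partial output immediately
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)


# With a real model this would be: echo_and_collect(model.ask_stream(messages))
story = echo_and_collect(["Once", " upon", " a", " time."])
```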

create_message(messages: List[dict], **params: Any) → ChatCompletionMessage

Send a chat request and get the complete message response.

Parameters:
  • messages (List[dict]) – List of message dictionaries for the conversation

  • params (Any) – Additional parameters to pass to the API

Returns:

The message object from the first choice in the response

Return type:

ChatCompletionMessage

Example:

>>> model = RemoteLLMModel(base_url="...", api_token="...", model_name="...")
>>> messages = [{"role": "user", "content": "What is AI?"}]
>>> response = model.create_message(messages)
>>> print(response.content)
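create_message() returns the full ChatCompletionMessage, so you can append its role and content back into the conversation history to continue a multi-turn exchange. In the sketch below, message_to_dict is a hypothetical helper, and SimpleNamespace stands in for the returned message so the code runs without a network call. The helper keeps only role and content and drops any tool_calls, so it suits plain text replies only.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def message_to_dict(message: Any) -> Dict[str, str]:
    """Hypothetical: turn a returned message into a history entry."""
    return {"role": message.role, "content": message.content or ""}


messages: List[Dict[str, str]] = [{"role": "user", "content": "What is AI?"}]
# Stand-in for: response = model.create_message(messages)
response = SimpleNamespace(role="assistant",
                           content="AI is the study of intelligent machines.")
messages.append(message_to_dict(response))
messages.append({"role": "user", "content": "Give one example."})
# The extended list could now be passed to model.create_message(messages) again.
```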