hbllmutils.model.remote
Remote LLM client implementation for OpenAI-compatible APIs.
This module implements a concrete LLMModel that
talks to OpenAI-compatible chat completion endpoints. It supports synchronous
requests, streaming responses, and convenient helpers for returning either the
full openai.types.chat.ChatCompletionMessage or only the text content.
The implementation is designed to be endpoint-agnostic as long as the server
implements the OpenAI API protocol.
The main public component is:
RemoteLLMModel - A concrete client for OpenAI-compatible API endpoints
Typical usage focuses on building a conversation history list and then calling
RemoteLLMModel.ask(), RemoteLLMModel.ask_stream(), or
RemoteLLMModel.create_message().
Example:
>>> from hbllmutils.model.remote import RemoteLLMModel
>>> model = RemoteLLMModel(
...     base_url="https://api.openai.com/v1",
...     api_token="sk-xxx",
...     model_name="gpt-3.5-turbo",
...     max_tokens=128
... )
>>> messages = [{"role": "user", "content": "Hello, world!"}]
>>> print(model.ask(messages))
Hello! How can I help you today?
>>> # Streaming usage
>>> stream = model.ask_stream(messages)
>>> for chunk in stream:
...     print(chunk, end="", flush=True)
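Building the conversation history is plain list manipulation. The sketch below assumes the usual OpenAI-style role/content dictionaries; add_turn is a hypothetical helper for illustration and is not part of hbllmutils.

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message appended (hypothetical helper)."""
    # Roles are limited to the three most common ones for this sketch.
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unsupported role: {role!r}")
    return history + [{"role": role, "content": content}]


history: List[Message] = []
history = add_turn(history, "system", "You are a concise assistant.")
history = add_turn(history, "user", "Hello, world!")
# After a call such as model.ask(history), the reply would be appended
# as an assistant turn before the next user message is added.
history = add_turn(history, "assistant", "Hello! How can I help you today?")
history = add_turn(history, "user", "Name one use of LLMs.")
```

Returning a new list instead of mutating the argument keeps earlier snapshots of the conversation intact, which can help when retrying a request.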
Note
The class relies on the official openai package and expects OpenAI-style
responses, including streaming deltas. If your endpoint diverges from this
structure, consider implementing a custom response stream handler.
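For reference, OpenAI-style streaming chunks carry incremental text in choices[0].delta.content. The following is a minimal sketch of reading that shape, using stand-in objects rather than a live connection; iter_text and fake_chunk are hypothetical names, and this is not the handler hbllmutils uses internally.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield text pieces from OpenAI-style streaming chunks (hypothetical helper)."""
    for chunk in chunks:
        if not chunk.choices:  # some servers send chunks without choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # role-only or final chunks carry no text
            yield piece


def fake_chunk(text):
    """Build a stand-in chunk shaped like chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [
    fake_chunk(None),
    fake_chunk("Hel"),
    fake_chunk("lo"),
    SimpleNamespace(choices=[]),
]
text = "".join(iter_text(chunks))
```

An endpoint that places text elsewhere in the chunk would need a different accessor in place of the delta.content lookup.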
RemoteLLMModel
- class hbllmutils.model.remote.RemoteLLMModel(base_url: str, api_token: str, model_name: str, organization_id: str | None = None, timeout: int = 30, max_retries: int = 3, headers: Dict[str, str] | None = None, **default_params: Any)
A client for interacting with remote Large Language Model APIs.
This class provides a unified interface for communicating with OpenAI-compatible API endpoints. It supports both synchronous and asynchronous operations, streaming responses, and allows customization of request parameters.
- Parameters:
base_url (str) – API base URL (e.g., "https://api.openai.com/v1")
api_token (str) – API access token for authentication
model_name (str) – Name of the model to use (e.g., "gpt-3.5-turbo")
organization_id (Optional[str]) – Organization ID (optional, required by some APIs)
timeout (int) – Request timeout in seconds, defaults to 30
max_retries (int) – Maximum number of retry attempts, defaults to 3
headers (Optional[Dict[str, str]]) – Custom request headers
default_params – Default parameters for API requests
- Variables:
base_url (str) – API base URL (e.g., "https://api.openai.com/v1")
api_token (str) – API access token for authentication
model_name (str) – Name of the model to use (e.g., "gpt-3.5-turbo", "claude-3-opus")
organization_id (Optional[str]) – Organization ID (required by some APIs)
timeout (int) – Request timeout in seconds
max_retries (int) – Maximum number of retry attempts
headers (Dict[str, str]) – Custom request headers
default_params (Dict[str, Any]) – Default parameters for API requests
- Raises:
ValueError – If base_url format is invalid
ValueError – If api_token is empty
ValueError – If model_name is empty
ValueError – If timeout is not positive
ValueError – If max_retries is negative
Example:
>>> model = RemoteLLMModel(
...     base_url="https://api.openai.com/v1",
...     api_token="sk-xxx",
...     model_name="gpt-3.5-turbo",
...     temperature=0.2
... )
>>> messages = [{"role": "user", "content": "Summarize LLMs in one sentence."}]
>>> print(model.ask(messages))
Large language models generate text by predicting tokens from vast training data.
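Extra keyword arguments such as temperature are stored as default_params, while each request method also accepts its own params. How the two are combined is not stated here; the sketch below assumes per-call values override the defaults. merge_params is a hypothetical helper, not a documented function.

```python
from typing import Any, Dict


def merge_params(default_params: Dict[str, Any], **params: Any) -> Dict[str, Any]:
    """Combine constructor-level defaults with per-call overrides."""
    merged = dict(default_params)  # copy so the stored defaults stay unchanged
    merged.update(params)          # assumption: per-call values win
    return merged


defaults = {"temperature": 0.2, "max_tokens": 128}
request_params = merge_params(defaults, temperature=0.7)
```

Under this assumption, a call like model.ask(messages, temperature=0.7) would use 0.7 for that request only and leave the stored default at 0.2.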
- __init__(base_url: str, api_token: str, model_name: str, organization_id: str | None = None, timeout: int = 30, max_retries: int = 3, headers: Dict[str, str] | None = None, **default_params: Any)
Initialize the RemoteLLMModel instance.
- Parameters:
base_url (str) – API base URL (e.g., "https://api.openai.com/v1")
api_token (str) – API access token for authentication
model_name (str) – Name of the model to use (e.g., "gpt-3.5-turbo")
organization_id (Optional[str]) – Organization ID (optional, required by some APIs)
timeout (int) – Request timeout in seconds (default: 30)
max_retries (int) – Maximum number of retry attempts (default: 3)
headers (Optional[Dict[str, str]]) – Custom request headers (optional)
default_params (Any) – Default parameters for API requests (optional)
- Raises:
ValueError – If base_url format is invalid
ValueError – If api_token is empty
ValueError – If model_name is empty
ValueError – If timeout is not positive
ValueError – If max_retries is negative
- Example:
>>> model = RemoteLLMModel(
...     base_url="https://api.openai.com/v1",
...     api_token="sk-xxx",
...     model_name="gpt-3.5-turbo"
... )
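The ValueError conditions listed for the constructor can be sketched as a standalone check. This is an illustration under stated assumptions, not the library's validation code: validate_config and is_valid are hypothetical names, and "valid format" for base_url is assumed to mean an http(s) URL with a host.

```python
from urllib.parse import urlparse


def validate_config(base_url: str, api_token: str, model_name: str,
                    timeout: int = 30, max_retries: int = 3) -> None:
    """Raise ValueError for the documented invalid inputs (hypothetical helper)."""
    parsed = urlparse(base_url)
    # Assumption: a valid base_url is an http(s) URL that includes a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"invalid base_url: {base_url!r}")
    if not api_token:
        raise ValueError("api_token must not be empty")
    if not model_name:
        raise ValueError("model_name must not be empty")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    if max_retries < 0:
        raise ValueError("max_retries must not be negative")


def is_valid(**kwargs) -> bool:
    """Return True when validate_config accepts the arguments."""
    try:
        validate_config(**kwargs)
        return True
    except ValueError:
        return False


good = dict(base_url="https://api.openai.com/v1", api_token="sk-xxx",
            model_name="gpt-3.5-turbo")
ok = is_valid(**good)
bad_timeout = is_valid(**good, timeout=0)
zero_retries = is_valid(**good, max_retries=0)
```

Note that max_retries=0 passes, since only negative values are documented as errors, whereas timeout must be strictly positive.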
- __repr__() -> str
Return a string representation of the RemoteLLMModel instance.
All constructor parameters, including default_params, are displayed at the same level. The API token is masked for security purposes.
- Returns:
String representation of the instance
- Return type:
str
- Example:
>>> model = RemoteLLMModel(
...     base_url="https://api.openai.com/v1",
...     api_token="sk-xxx",
...     model_name="gpt-3.5-turbo",
...     max_tokens=1000
... )
>>> repr(model)
'RemoteLLMModel(base_url=..., api_token=..., max_tokens=1000, ...)'
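The exact masking format is not shown above. As one possible approach, a token can be reduced to a short prefix followed by asterisks; mask_token is a hypothetical helper, and the three-character prefix is an arbitrary choice for this sketch.

```python
def mask_token(token: str, visible: int = 3) -> str:
    """Hide all but the first few characters of a secret (hypothetical helper)."""
    if len(token) <= visible:
        # Too short to reveal anything safely: mask every character.
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)


masked = mask_token("sk-abcdef123456")
```

Keeping a short prefix makes it easier to tell which credential is configured without exposing it in logs or tracebacks.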
- ask(messages: List[dict], with_reasoning: bool = False, **params: Any) -> str | Tuple[str | None, str]
Send a chat request and get the text response.
- Parameters:
messages (List[dict]) – List of message dictionaries for the conversation
with_reasoning (bool) – Whether to return reasoning content along with the response (default: False)
params (Any) – Additional parameters to pass to the API
- Returns:
If with_reasoning is False, returns the content string. If with_reasoning is True, returns a tuple of (reasoning_content, content).
- Return type:
Union[str, Tuple[Optional[str], str]]
- Example:
>>> model = RemoteLLMModel(base_url="...", api_token="...", model_name="...")
>>> messages = [{"role": "user", "content": "Explain quantum computing"}]
>>> # Get only the response content
>>> response = model.ask(messages)
>>> print(response)
>>> # Get both reasoning and response content
>>> reasoning, response = model.ask(messages, with_reasoning=True)
>>> print(f"Reasoning: {reasoning}")
>>> print(f"Response: {response}")
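Because the return type depends on with_reasoning, calling code that handles both shapes may want to normalize them first. A minimal sketch, based only on the documented return type; split_result is a hypothetical helper.

```python
from typing import Optional, Tuple, Union

AskResult = Union[str, Tuple[Optional[str], str]]


def split_result(result: AskResult) -> Tuple[Optional[str], str]:
    """Normalize either documented return shape to (reasoning, content)."""
    if isinstance(result, tuple):
        reasoning, content = result
        return reasoning, content
    # A plain string means with_reasoning was False.
    return None, result


answer = "Qubits can be in superposition."
plain = split_result(answer)
with_reasoning = split_result(("The user wants a short answer.", answer))
no_reasoning = split_result((None, answer))
```

The reasoning element is typed Optional[str], so a None value is possible even when with_reasoning=True.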
- ask_stream(messages: List[dict], with_reasoning: bool = False, **params: Any) -> ResponseStream
Send a chat request and get a streaming response.
- Parameters:
messages (List[dict]) – List of message dictionaries for the conversation
with_reasoning (bool) – Whether to include reasoning content in the stream (default: False)
params (Any) – Additional parameters to pass to the API
- Returns:
A ResponseStream object for iterating over the streaming response
- Return type:
ResponseStream
- Example:
>>> model = RemoteLLMModel(base_url="...", api_token="...", model_name="...")
>>> messages = [{"role": "user", "content": "Write a story"}]
>>> stream = model.ask_stream(messages)
>>> for chunk in stream:
...     print(chunk, end='', flush=True)
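When the full text is needed after streaming, the chunks can be accumulated while they are displayed. The sketch below assumes that iterating the stream yields text chunks as strings, as the example above suggests; collect_stream is a hypothetical helper, and a plain iterator stands in for a real ResponseStream.

```python
from typing import Callable, Iterable, List, Optional


def collect_stream(stream: Iterable[str],
                   on_chunk: Optional[Callable[[str], None]] = None) -> str:
    """Consume a stream of text chunks and return the concatenated text."""
    parts: List[str] = []
    for chunk in stream:
        if on_chunk is not None:
            on_chunk(chunk)  # e.g. print the chunk as it arrives
        parts.append(chunk)
    return "".join(parts)


# Stand-in for model.ask_stream(messages); no network access is involved.
fake_stream = iter(["Once ", "upon ", "a time."])
seen: List[str] = []
story = collect_stream(fake_stream, on_chunk=seen.append)
```

Joining a list once at the end avoids repeated string concatenation inside the loop.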
- create_message(messages: List[dict], **params: Any) -> ChatCompletionMessage
Send a chat request and get the complete message response.
- Parameters:
messages (List[dict]) – List of message dictionaries for the conversation
params (Any) – Additional parameters to pass to the API
- Returns:
The message object from the first choice in the response
- Return type:
ChatCompletionMessage
- Example:
>>> model = RemoteLLMModel(base_url="...", api_token="...", model_name="...")
>>> messages = [{"role": "user", "content": "What is AI?"}]
>>> response = model.create_message(messages)
>>> print(response.content)
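The returned message's content field is optional in the OpenAI schema (for example, when the model replies with tool calls), so defensive access can be useful. In the sketch below, message_text is a hypothetical helper, SimpleNamespace objects stand in for ChatCompletionMessage, and the reasoning_content attribute is an assumption about some compatible servers rather than a standard field.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def message_text(message) -> Tuple[Optional[str], str]:
    """Return (reasoning, content) from a ChatCompletionMessage-like object."""
    # content may be None, e.g. for tool-call replies.
    content = message.content or ""
    # Assumption: some servers attach a non-standard reasoning_content
    # attribute; it is treated as absent when missing.
    reasoning = getattr(message, "reasoning_content", None)
    return reasoning, content


standard = SimpleNamespace(role="assistant", content="AI is the study of ...")
tool_only = SimpleNamespace(role="assistant", content=None)
extended = SimpleNamespace(role="assistant", content="42",
                           reasoning_content="Simple arithmetic.")

results = [message_text(m) for m in (standard, tool_only, extended)]
```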