hbllmutils.model.fake

Fake LLM Model Module.

This module provides a fake implementation of an LLM (Large Language Model) for testing and development purposes. It simulates LLM behavior by returning predefined responses based on configurable rules, supporting both synchronous and streaming response modes with a customizable words-per-second rate.

The module contains the following main components:

FakeResponseSequence - Immutable sequence handler for ordered responses
FakeResponseStream - Streaming response wrapper with reasoning/content separation
FakeLLMModel - Immutable mock LLM model with rule-driven responses

Note

The streaming implementation uses jieba for word segmentation, which is optimized for Chinese text. English text will still be tokenized, but the granularity may differ from character-level streaming.

Example:

>>> model = FakeLLMModel(stream_wps=20)
>>> model = model.response_when_keyword_in_last_message(
...     ["weather", "temperature"],
...     ("thinking...", "It's sunny today!")
... )
>>> model.ask([{"role": "user", "content": "What's the weather?"}])
"It's sunny today!"

>>> seq_model = model.response_sequence(["First", "Second"])
>>> seq_model.ask([{"role": "user", "content": "next"}])
'First'

FakeResponseTyping

hbllmutils.model.fake.FakeResponseTyping

Type alias for fake response types: string, (reasoning, content) tuple, or callable response factory.

alias of str | Tuple[str, str] | Callable[[…], str | Tuple[str, str]]
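
The three forms accepted by this alias can all be reduced to a single (reasoning, content) pair. The sketch below is an illustration only, not the library's code: the helper name `normalize_response` is hypothetical, the callable is assumed to receive `(messages, **params)`, and a bare string is assumed to map to an empty reasoning string, as in the `FakeResponseSequence` example where `"A"` yields `('', 'A')`.

```python
from typing import Any, Callable, List, Tuple, Union

# Local restatement of the alias, for illustration only.
ResponseLike = Union[str, Tuple[str, str], Callable[..., Union[str, Tuple[str, str]]]]


def normalize_response(response: ResponseLike, messages: List[dict], **params: Any) -> Tuple[str, str]:
    """Hypothetical helper: reduce any accepted form to (reasoning, content)."""
    # A callable acts as a response factory: call it, then normalize its result.
    if callable(response):
        response = response(messages, **params)
    # A bare string carries no reasoning (assumed to become an empty string).
    if isinstance(response, str):
        return "", response
    reasoning, content = response
    return reasoning, content
```

With this helper, `"A"`, `("thinking", "B")`, and `lambda messages, **params: "C"` all end up in the same two-element shape.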

FakeResponseSequence

class hbllmutils.model.fake.FakeResponseSequence(responses: List[str | Tuple[str, str]], index: int = 0)

A sequence-based response handler that returns responses in order.

This class maintains immutability by creating new instances when the index changes, ensuring thread safety and compatibility with FakeLLMModel’s immutable design.

Parameters:

responses (List[Union[str, Tuple[str, str]]]) – List of responses to return in order.
index (int) – Current index in the sequence, defaults to 0.

Variables:

_response_contents (Tuple[Union[str, Tuple[str, str]], ...]) – Immutable tuple of response items.
_index (int) – Current position in the sequence.

Example:

>>> sequence = FakeResponseSequence(["A", ("thinking", "B")])
>>> sequence.response([{"role": "user", "content": "hi"}])
('', 'A')
>>> sequence.advance().response([{"role": "user", "content": "hi"}])
('thinking', 'B')
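
The copy-on-advance pattern described above can be sketched without the library. `SequenceSketch` is a hypothetical stand-in, not the real `FakeResponseSequence`; it omits the `messages` argument and only shows how `advance()` and `reset()` can return new objects while the original stays untouched.

```python
from typing import List, Tuple, Union

Item = Union[str, Tuple[str, str]]


class SequenceSketch:
    """Illustrative stand-in, not the real FakeResponseSequence."""

    def __init__(self, responses: List[Item], index: int = 0) -> None:
        # Copy into a tuple so later changes to the caller's list have no effect.
        self._responses = tuple(responses)
        self._index = index

    @property
    def has_more_responses(self) -> bool:
        return self._index < len(self._responses)

    def advance(self) -> "SequenceSketch":
        # Return a new object instead of mutating self.
        return SequenceSketch(list(self._responses), self._index + 1)

    def reset(self) -> "SequenceSketch":
        return SequenceSketch(list(self._responses), 0)

    def response(self) -> Tuple[str, str]:
        if not self.has_more_responses:
            raise IndexError("no more responses available")
        item = self._responses[self._index]
        # A bare string is paired with an empty reasoning string.
        return ("", item) if isinstance(item, str) else item


first = SequenceSketch(["A", ("thinking", "B")])
second = first.advance()  # first still points at "A"
```

Because `first` is never modified, it can be shared between tests or threads without locking.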

__eq__(other: object) → bool

Check equality with another FakeResponseSequence instance.

Parameters: other (object) – The other instance to compare with.
Returns: True if instances are equal, False otherwise.
Return type: bool

__hash__() → int

Return hash for use in sets and as dict keys.

Returns: Hash value of the instance.
Return type: int

__init__(responses: List[str | Tuple[str, str]], index: int = 0) → None

Initialize the response sequence.

Parameters:

responses (List[Union[str, Tuple[str, str]]]) – List of responses to return in order.
index (int) – Current index in the sequence (default: 0).

__repr__() → str

Return string representation of the sequence.

Returns: String representation showing responses and current index.
Return type: str

advance() → FakeResponseSequence

Create a new instance with the index advanced by 1.

Returns: A new FakeResponseSequence instance with incremented index.
Return type: FakeResponseSequence

property current_index: int

Get the current index in the sequence.

Returns: The current index position.
Return type: int

property has_more_responses: bool

Check if there are more responses available.

Returns: True if more responses are available, False otherwise.
Return type: bool

reset() → FakeResponseSequence

Create a new instance with the index reset to 0.

Returns: A new FakeResponseSequence instance with index reset to 0.
Return type: FakeResponseSequence

response(messages: List[dict], **params: Any) → Tuple[str, str]

Get the next response in the sequence.

Parameters:

messages (List[dict]) – The list of message dictionaries.
params (dict) – Additional parameters (unused).

Returns:

A tuple of (reasoning_content, content).

Return type:

Tuple[str, str]

Raises:

IndexError – If no more responses are available.

rule_check(messages: List[dict], **params: Any) → bool

Check if this sequence can provide a response.

Parameters:

messages (List[dict]) – The list of message dictionaries.
params (dict) – Additional parameters (unused).

Returns:

True if there are more responses available, False otherwise.

Return type:

bool

property total_responses: int

Get the total number of responses in the sequence.

Returns: The total count of responses.
Return type: int

FakeResponseStream

class hbllmutils.model.fake.FakeResponseStream(session: Any, with_reasoning: bool = False, reasoning_splitter: str = '---------------------------reasoning---------------------------', content_splitter: str = '---------------------------content---------------------------')

A fake response stream that handles streaming responses with reasoning and content.

This class extends ResponseStream to provide a simple implementation for testing purposes, where chunks are tuples of (reasoning_content, content).
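
To make the chunk shape concrete, the sketch below accumulates a series of (reasoning_content, content) tuples into two strings. It does not use `FakeResponseStream` itself; `collect_chunks` is a hypothetical helper, and the assumption that an empty or `None` element means "nothing for this part in this chunk" is made here for illustration only.

```python
from typing import Iterable, List, Optional, Tuple

Chunk = Tuple[Optional[str], Optional[str]]


def collect_chunks(chunks: Iterable[Chunk]) -> Tuple[str, str]:
    """Hypothetical helper: join (reasoning, content) chunks into two strings."""
    reasoning_parts: List[str] = []
    content_parts: List[str] = []
    for reasoning, content in chunks:
        # Skip empty or missing pieces so they do not affect the joined text.
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


reasoning, content = collect_chunks([("think", ""), ("ing", None), ("", "Hel"), (None, "lo")])
```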

FakeLLMModel

class hbllmutils.model.fake.FakeLLMModel(stream_wps: float = 50, rules: List[Tuple[Callable, str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]]] | None = None)

An immutable fake LLM model implementation for testing and development.

This class simulates an LLM by returning predefined responses based on configurable rules. It supports both synchronous and streaming response modes, with customizable streaming speed. Responses can be configured to match specific conditions or keywords in messages.

All modification operations return new instances, ensuring immutability and thread safety.

Parameters:

stream_wps (float) – Words per second for streaming responses, defaults to 50.
rules (Optional[List[Tuple[Callable, FakeResponseTyping]]]) – List of (rule_function, response) tuples. Internal parameter.

Variables:

_stream_wps (float) – Words-per-second rate for streaming.
_rules (Tuple[Tuple[Callable, FakeResponseTyping], ...]) – Immutable tuple of response rules.
_frozen (bool) – Immutability flag, set after initialization.

Example:

>>> model = FakeLLMModel(stream_wps=50)
>>> model_with_rule = model.response_when_keyword_in_last_message("weather", "It's sunny today!")
>>> response = model_with_rule.ask([{"role": "user", "content": "What's the weather?"}])
>>> print(response)
It's sunny today!

>>> final_model = model_with_rule.response_always("Hello, I'm a fake LLM!")
>>> response = final_model.ask([{"role": "user", "content": "Hi"}])
>>> print(response)
Hello, I'm a fake LLM!

__delattr__(name: str) → None

Prevent attribute deletion to ensure immutability.

Parameters: name (str) – The attribute name.
Raises: AttributeError – Always, as deletion is not allowed.

__init__(stream_wps: float = 50, rules: List[Tuple[Callable, str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]]] | None = None) → None

Initialize the fake LLM model.

Parameters:

stream_wps (float) – Words per second for streaming responses (default: 50).
rules (Optional[List[Tuple[Callable, FakeResponseTyping]]]) – List of (rule_function, response) tuples. Internal parameter, not intended for direct use.

__repr__() → str

Return a string representation of the FakeLLMModel instance.

Shows the stream_wps parameter and the number of configured rules.

Returns: String representation of the instance.
Return type: str

Example:

>>> model = FakeLLMModel(stream_wps=100).response_always("Hello")
>>> repr(model)
'FakeLLMModel(stream_wps=100, rules_count=1)'

__setattr__(name: str, value: Any) → None

Prevent attribute modification after initialization to ensure immutability.

Parameters:

name (str) – The attribute name.
value (Any) – The attribute value.

Raises:

AttributeError – If attempting to modify attributes after initialization.
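
A freeze-after-init pattern of this kind can be sketched as follows. `FrozenSketch` is a hypothetical stand-in and not the actual `FakeLLMModel` implementation; it only shows how a `_frozen` flag, `__setattr__`, and `__delattr__` can work together, with changes expressed as new instances.

```python
from typing import Any


class FrozenSketch:
    """Illustrative stand-in showing a freeze-after-init pattern."""

    def __init__(self, stream_wps: float = 50) -> None:
        self._stream_wps = stream_wps
        # Once this flag is set, every further assignment is rejected.
        self._frozen = True

    def __setattr__(self, name: str, value: Any) -> None:
        # During __init__ the flag does not exist yet, so getattr falls back to False.
        if getattr(self, "_frozen", False):
            raise AttributeError(f"cannot modify attribute {name!r} of an immutable instance")
        super().__setattr__(name, value)

    def __delattr__(self, name: str) -> None:
        raise AttributeError(f"cannot delete attribute {name!r} of an immutable instance")

    def with_stream_wps(self, stream_wps: float) -> "FrozenSketch":
        # A change produces a new instance and leaves this one as it was.
        return FrozenSketch(stream_wps)


model = FrozenSketch(50)
faster = model.with_stream_wps(100)
```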

ask(messages: List[dict], with_reasoning: bool = False, **params: Any) → str | Tuple[str | None, str]

Send messages and get a synchronous response.

Parameters:

messages (List[dict]) – The list of message dictionaries containing conversation history.
with_reasoning (bool) – If True, return both reasoning and content as a tuple (default: False).
params (dict) – Additional parameters to pass to response functions.

Returns:

The response content string, or tuple of (reasoning_content, content) if with_reasoning is True.

Return type:

Union[str, Tuple[Optional[str], str]]

Example:

>>> model = FakeLLMModel().response_always(("thinking...", "final answer"))
>>> model.ask([{"role": "user", "content": "test"}])
'final answer'
>>> model.ask([{"role": "user", "content": "test"}], with_reasoning=True)
('thinking...', 'final answer')

ask_stream(messages: List[dict], with_reasoning: bool = False, **params: Any) → ResponseStream

Send messages and get a streaming response.

This method returns a ResponseStream that yields the response word-by-word, simulating the streaming behavior of a real LLM. The streaming speed is controlled by the stream_wps parameter set during initialization.

Parameters:

messages (List[dict]) – The list of message dictionaries containing conversation history.
with_reasoning (bool) – If True, include reasoning content in the stream (default: False).
params (dict) – Additional parameters to pass to response functions.

Returns:

A ResponseStream object that yields word-by-word chunks.

Return type:

ResponseStream

Example:

>>> model = FakeLLMModel(stream_wps=10).response_always("Hello world")
>>> stream = model.ask_stream([{"role": "user", "content": "Hi"}])
>>> for chunk in stream:
...     print(chunk, end='', flush=True)
Hello world
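
How a words-per-second rate can translate into pacing is sketched below. This is not the library's streaming code: `paced_words` is a hypothetical helper, the words are passed in already split (the module itself uses jieba for segmentation), and the delay of `1 / wps` seconds per word is an assumption about what "words per second" means here.

```python
import time
from typing import Callable, Iterator, List


def paced_words(words: List[str], wps: float, sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Hypothetical helper: yield words one at a time, pausing 1/wps seconds after each."""
    if wps <= 0:
        raise ValueError("wps must be positive")
    delay = 1.0 / wps
    for word in words:
        yield word
        # The sleep function is injectable so tests can run without waiting.
        sleep(delay)


# Record the requested delays instead of actually sleeping.
recorded_delays: List[float] = []
chunks = list(paced_words(["Hello", " ", "world"], wps=10, sleep=recorded_delays.append))
```

At `wps=10` each word is followed by a 0.1 second pause, so a three-token response takes roughly 0.3 seconds to stream.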

clear_rules() → FakeLLMModel

Create a new instance with all rules removed.

Returns: A new FakeLLMModel instance with no rules.
Return type: FakeLLMModel

Example:

>>> model = FakeLLMModel().response_always("Hello")
>>> model.rules_count
1
>>> clean_model = model.clear_rules()
>>> clean_model.rules_count
0

response_always(response: str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]) → FakeLLMModel

Create a new instance with a rule that always returns the specified response.

Parameters: response (FakeResponseTyping) – The response to return, can be a string, tuple of (reasoning, content), or callable.
Returns: A new FakeLLMModel instance with the added rule.
Return type: FakeLLMModel

Example:

>>> model = FakeLLMModel()
>>> new_model = model.response_always("Default response")
>>> new_model.ask([{"role": "user", "content": "anything"}])
'Default response'
>>> model.rules_count  # Original unchanged
0
>>> new_model.rules_count
1

response_sequence(responses: List[str | Tuple[str, str]]) → FakeLLMModel

Create a new instance with a rule that returns responses in sequence.

Each call to ask() or ask_stream() will return the next response in the sequence. Once all responses are exhausted, the rule will no longer match.

Parameters: responses (List[Union[str, Tuple[str, str]]]) – List of responses to return in order. Each can be a string or tuple of (reasoning, content).
Returns: A new FakeLLMModel instance with the sequence rule added.
Return type: FakeLLMModel
Raises: ValueError – If the response list is empty.

Example:

>>> model = FakeLLMModel()
>>> seq_model = model.response_sequence([
...     "First response",
...     ("thinking about second", "Second response"),
...     "Third response"
... ])
>>> seq_model.ask([{"role": "user", "content": "test1"}])
'First response'
>>> seq_model.ask([{"role": "user", "content": "test2"}])
'Second response'
>>> seq_model.ask([{"role": "user", "content": "test3"}])
'Third response'

response_when(fn_when: Callable[[...], bool], response: str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]) → FakeLLMModel

Create a new instance with a conditional rule that returns the specified response when the condition is met.

Parameters:

fn_when (Callable[..., bool]) – A callable that takes (messages, **params) and returns bool.
response (FakeResponseTyping) – The response to return when condition is True.

Returns:

A new FakeLLMModel instance with the added rule.

Return type:

FakeLLMModel

Example:

>>> model = FakeLLMModel()
>>> new_model = model.response_when(
...     lambda messages, **params: len(messages) > 2,
...     "Long conversation response"
... )
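
Named predicates are often easier to test than inline lambdas. The functions below are hypothetical examples that follow the `(messages, **params)` shape described for `fn_when`; the `temperature` parameter is an invented illustration of reading from `params`, not a parameter the library is known to pass.

```python
from typing import Any, List


def is_long_conversation(messages: List[dict], **params: Any) -> bool:
    """True once the conversation has more than two messages."""
    return len(messages) > 2


def last_message_from_user(messages: List[dict], **params: Any) -> bool:
    """True when the most recent message has role 'user'."""
    return bool(messages) and messages[-1].get("role") == "user"


def wants_low_temperature(messages: List[dict], **params: Any) -> bool:
    """True when a (hypothetical) temperature parameter below 0.5 is supplied."""
    return params.get("temperature", 1.0) < 0.5
```

Any of these could then be passed as `fn_when`, for example `model.response_when(is_long_conversation, "Long conversation response")`.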

response_when_keyword_in_last_message(keywords: str | List[str], response: str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]) → FakeLLMModel

Create a new instance with a rule that returns the specified response when any keyword is found in the last message.

Parameters:

keywords (Union[str, List[str]]) – A keyword or list of keywords to match in the last message content.
response (FakeResponseTyping) – The response to return when keyword is found.

Returns:

A new FakeLLMModel instance with the added rule.

Return type:

FakeLLMModel

Example:

>>> model = FakeLLMModel()
>>> new_model = model.response_when_keyword_in_last_message(
...     ["weather", "temperature"],
...     "It's 25 degrees and sunny!"
... )
>>> new_model.ask([{"role": "user", "content": "What's the weather?"}])
"It's 25 degrees and sunny!"
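
The matching behavior can be pictured as a predicate built from the keyword list. The sketch below is an assumption-laden illustration, not the library's rule: `keyword_in_last_message` is a hypothetical function, and it assumes a case-sensitive substring test against the `content` field of the last message.

```python
from typing import Any, Callable, List, Union


def keyword_in_last_message(keywords: Union[str, List[str]]) -> Callable[..., bool]:
    """Hypothetical factory: build a rule that checks the last message for any keyword."""
    # Accept a single keyword as shorthand for a one-element list.
    keyword_list = [keywords] if isinstance(keywords, str) else list(keywords)

    def rule(messages: List[dict], **params: Any) -> bool:
        if not messages:
            return False
        # Only the final message is inspected; earlier history is ignored.
        content = messages[-1].get("content") or ""
        return any(keyword in content for keyword in keyword_list)

    return rule


weather_rule = keyword_in_last_message(["weather", "temperature"])
```

Under these assumptions the rule matches `"What's the weather?"` but not a conversation whose last message is an unrelated reply, even if an earlier message mentioned the weather.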

property rules_count: int

Get the number of configured response rules.

Returns: The count of rules currently configured.
Return type: int

property stream_wps: float

Get the streaming words per second rate.

Returns: The words per second rate for streaming.
Return type: float

with_stream_wps(stream_wps: float) → FakeLLMModel

Create a new instance with a different streaming words per second rate.

Parameters: stream_wps (float) – The new words per second rate for streaming responses.
Returns: A new FakeLLMModel instance with the updated stream rate.
Return type: FakeLLMModel

Example:

>>> model = FakeLLMModel(stream_wps=50)
>>> fast_model = model.with_stream_wps(100)
>>> fast_model.stream_wps
100
>>> model.stream_wps  # Original unchanged
50