hbllmutils.model.fake
Fake LLM Model Module.
This module provides a fake implementation of an LLM (Large Language Model) for testing and development purposes. It simulates LLM behavior by returning predefined responses based on configurable rules, supporting both synchronous and streaming response modes with a customizable words-per-second rate.
The module contains the following main components:
- FakeResponseSequence - Immutable sequence handler for ordered responses
- FakeResponseStream - Streaming response wrapper with reasoning/content separation
- FakeLLMModel - Immutable mock LLM model with rule-driven responses
Note
The streaming implementation uses jieba for word segmentation, which
is optimized for Chinese text. English text will still be tokenized, but the
granularity may differ from character-level streaming.
Example:
>>> model = FakeLLMModel(stream_wps=20)
>>> model = model.response_when_keyword_in_last_message(
... ["weather", "temperature"],
... ("thinking...", "It's sunny today!")
... )
>>> model.ask([{"role": "user", "content": "What's the weather?"}])
"It's sunny today!"
>>> seq_model = model.response_sequence(["First", "Second"])
>>> seq_model.ask([{"role": "user", "content": "next"}])
'First'
FakeResponseTyping
- hbllmutils.model.fake.FakeResponseTyping
Type alias for fake response types: string, (reasoning, content) tuple, or callable response factory.
alias of
str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]
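The three accepted forms can be reduced to a single `(reasoning, content)` shape before use. The sketch below is illustrative only: `normalize_response`, `PlainResponse` and `FakeResponse` are hypothetical names, and the assumption that a callable receives `(messages, **params)` is inferred from the method signatures on this page rather than taken from the library source.

```python
from typing import Any, Callable, List, Tuple, Union

# Illustrative aliases mirroring the description above (hypothetical names).
PlainResponse = Union[str, Tuple[str, str]]
FakeResponse = Union[PlainResponse, Callable[..., PlainResponse]]


def normalize_response(
    response: FakeResponse, messages: List[dict], **params: Any
) -> Tuple[str, str]:
    """Reduce any of the three accepted forms to (reasoning, content)."""
    if callable(response):
        # Assumption: factories receive the messages plus any extra params.
        response = response(messages, **params)
    if isinstance(response, str):
        # A bare string carries no reasoning text.
        return "", response
    reasoning, content = response
    return reasoning, content
```

Normalizing once keeps the rest of a fake model free of type checks on the response value.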
FakeResponseSequence
- class hbllmutils.model.fake.FakeResponseSequence(responses: List[str | Tuple[str, str]], index: int = 0)[source]
A sequence-based response handler that returns responses in order.
This class maintains immutability by creating new instances when the index changes, ensuring thread safety and compatibility with FakeLLMModel’s immutable design.
- Parameters:
responses (List[Union[str, Tuple[str, str]]]) – List of responses to return in order.
index (int) – Current index in the sequence, defaults to 0.
- Variables:
_response_contents (Tuple[Union[str, Tuple[str, str]], ...]) – Immutable tuple of response items.
_index (int) – Current position in the sequence.
Example:
>>> sequence = FakeResponseSequence(["A", ("thinking", "B")])
>>> sequence.response([{"role": "user", "content": "hi"}])
('', 'A')
>>> sequence.advance().response([{"role": "user", "content": "hi"}])
('thinking', 'B')
- __eq__(other: object) bool[source]
Check equality with another FakeResponseSequence instance.
- Parameters:
other (object) – The other instance to compare with.
- Returns:
True if instances are equal, False otherwise.
- Return type:
bool
- __hash__() int[source]
Return hash for use in sets and as dict keys.
- Returns:
Hash value of the instance.
- Return type:
int
- __init__(responses: List[str | Tuple[str, str]], index: int = 0) None[source]
Initialize the response sequence.
- Parameters:
responses (List[Union[str, Tuple[str, str]]]) – List of responses to return in order.
index (int) – Current index in the sequence (default: 0).
- __repr__() str[source]
Return string representation of the sequence.
- Returns:
String representation showing responses and current index.
- Return type:
str
- advance() FakeResponseSequence[source]
Create a new instance with the index advanced by 1.
- Returns:
A new FakeResponseSequence instance with incremented index.
- Return type:
FakeResponseSequence
- property current_index: int
Get the current index in the sequence.
- Returns:
The current index position.
- Return type:
int
- property has_more_responses: bool
Check if there are more responses available.
- Returns:
True if more responses are available, False otherwise.
- Return type:
bool
- reset() FakeResponseSequence[source]
Create a new instance with the index reset to 0.
- Returns:
A new FakeResponseSequence instance with index reset to 0.
- Return type:
FakeResponseSequence
- response(messages: List[dict], **params: Any) Tuple[str, str][source]
Get the next response in the sequence.
- Parameters:
messages (List[dict]) – The list of message dictionaries.
params (dict) – Additional parameters (unused).
- Returns:
A tuple of (reasoning_content, content).
- Return type:
Tuple[str, str]
- Raises:
IndexError – If no more responses are available.
- rule_check(messages: List[dict], **params: Any) bool[source]
Check if this sequence can provide a response.
- Parameters:
messages (List[dict]) – The list of message dictionaries.
params (dict) – Additional parameters (unused).
- Returns:
True if there are more responses available, False otherwise.
- Return type:
bool
- property total_responses: int
Get the total number of responses in the sequence.
- Returns:
The total count of responses.
- Return type:
int
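The copy-on-advance behavior described for this class can be sketched in a few lines. This is a minimal stand-in written for illustration, not the library's implementation: `SequenceSketch` is a hypothetical name, and its `response()` takes no arguments because the documented `messages` and `params` are unused.

```python
from typing import List, Tuple, Union

Item = Union[str, Tuple[str, str]]


class SequenceSketch:
    """Hypothetical stand-in; not the library's FakeResponseSequence."""

    def __init__(self, responses: List[Item], index: int = 0) -> None:
        self._responses = tuple(responses)  # frozen copy of the caller's list
        self._index = index

    @property
    def has_more_responses(self) -> bool:
        return self._index < len(self._responses)

    def response(self) -> Tuple[str, str]:
        if not self.has_more_responses:
            raise IndexError("no more responses")
        item = self._responses[self._index]
        # Bare strings are paired with empty reasoning.
        return ("", item) if isinstance(item, str) else item

    def advance(self) -> "SequenceSketch":
        # Never mutate: build a new object one position further along.
        return SequenceSketch(list(self._responses), self._index + 1)

    def reset(self) -> "SequenceSketch":
        return SequenceSketch(list(self._responses), 0)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SequenceSketch):
            return NotImplemented
        return (self._responses, self._index) == (other._responses, other._index)

    def __hash__(self) -> int:
        # Hash the same fields that __eq__ compares.
        return hash((self._responses, self._index))
```

Because `advance()` and `reset()` return new objects, a caller holding the original sequence still sees the original index.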
FakeResponseStream
- class hbllmutils.model.fake.FakeResponseStream(session: Any, with_reasoning: bool = False, reasoning_splitter: str = '---------------------------reasoning---------------------------', content_splitter: str = '---------------------------content---------------------------')[source]
A fake response stream that handles streaming responses with reasoning and content.
This class extends ResponseStream to provide a simple implementation for testing purposes, where chunks are tuples of (reasoning_content, content).
FakeLLMModel
- class hbllmutils.model.fake.FakeLLMModel(stream_wps: float = 50, rules: List[Tuple[Callable, str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]]] | None = None)[source]
An immutable fake LLM model implementation for testing and development.
This class simulates an LLM by returning predefined responses based on configurable rules. It supports both synchronous and streaming response modes, with customizable streaming speed. Responses can be configured to match specific conditions or keywords in messages.
All modification operations return new instances, ensuring immutability and thread safety.
- Parameters:
stream_wps (float) – Words per second for streaming responses, defaults to 50.
rules (Optional[List[Tuple[Callable, FakeResponseTyping]]]) – List of (rule_function, response) tuples. Internal parameter.
- Variables:
_stream_wps (float) – Words-per-second rate for streaming.
_rules (Tuple[Tuple[Callable, FakeResponseTyping], ...]) – Immutable tuple of response rules.
_frozen (bool) – Immutability flag, set after initialization.
Example:
>>> model = FakeLLMModel(stream_wps=50)
>>> model_with_rule = model.response_when_keyword_in_last_message("weather", "It's sunny today!")
>>> response = model_with_rule.ask([{"role": "user", "content": "What's the weather?"}])
>>> print(response)
It's sunny today!
>>> final_model = model_with_rule.response_always("Hello, I'm a fake LLM!")
>>> response = final_model.ask([{"role": "user", "content": "Hi"}])
>>> print(response)
Hello, I'm a fake LLM!
- __delattr__(name: str) None[source]
Prevent attribute deletion to ensure immutability.
- Parameters:
name (str) – The attribute name.
- Raises:
AttributeError – Always, as deletion is not allowed.
- __init__(stream_wps: float = 50, rules: List[Tuple[Callable, str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]]] | None = None) None[source]
Initialize the fake LLM model.
- Parameters:
stream_wps (float) – Words per second for streaming responses (default: 50).
rules (Optional[List[Tuple[Callable, FakeResponseTyping]]]) – List of (rule_function, response) tuples. Internal parameter, not intended for direct use.
- __repr__() str[source]
Return a string representation of the FakeLLMModel instance.
Shows the stream_wps parameter and the number of configured rules.
- Returns:
String representation of the instance.
- Return type:
str
- Example:
>>> model = FakeLLMModel(stream_wps=100).response_always("Hello")
>>> repr(model)
'FakeLLMModel(stream_wps=100, rules_count=1)'
- __setattr__(name: str, value: Any) None[source]
Prevent attribute modification after initialization to ensure immutability.
- Parameters:
name (str) – The attribute name.
value (Any) – The attribute value.
- Raises:
AttributeError – If attempting to modify attributes after initialization.
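One common way to get the behavior described for `__setattr__` and `__delattr__` is a flag that is set at the end of `__init__`. The sketch below shows that pattern under stated assumptions: `FrozenSketch` is a hypothetical class, and the real FakeLLMModel may implement its `_frozen` flag differently.

```python
from typing import Any


class FrozenSketch:
    """Hypothetical illustration of a freeze-after-init flag."""

    def __init__(self, stream_wps: float = 50) -> None:
        self._stream_wps = stream_wps
        self._frozen = True  # every later write is rejected

    def __setattr__(self, name: str, value: Any) -> None:
        # getattr with a default: _frozen does not exist yet during __init__.
        if getattr(self, "_frozen", False):
            raise AttributeError(f"cannot modify {name!r}: instance is immutable")
        super().__setattr__(name, value)

    def __delattr__(self, name: str) -> None:
        raise AttributeError(f"cannot delete {name!r}: instance is immutable")

    def with_stream_wps(self, stream_wps: float) -> "FrozenSketch":
        # Changes are expressed as a new instance rather than a mutation.
        return FrozenSketch(stream_wps)
```

With this pattern, any "modification" method has to construct a fresh object, which is what keeps the original instance safe to share.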
- ask(messages: List[dict], with_reasoning: bool = False, **params: Any) str | Tuple[str | None, str][source]
Send messages and get a synchronous response.
- Parameters:
messages (List[dict]) – The list of message dictionaries containing conversation history.
with_reasoning (bool) – If True, return both reasoning and content as a tuple (default: False).
params (dict) – Additional parameters to pass to response functions.
- Returns:
The response content string, or tuple of (reasoning_content, content) if with_reasoning is True.
- Return type:
Union[str, Tuple[Optional[str], str]]
- Example:
>>> model = FakeLLMModel().response_always(("thinking...", "final answer"))
>>> model.ask([{"role": "user", "content": "test"}])
'final answer'
>>> model.ask([{"role": "user", "content": "test"}], with_reasoning=True)
('thinking...', 'final answer')
- ask_stream(messages: List[dict], with_reasoning: bool = False, **params: Any) ResponseStream[source]
Send messages and get a streaming response.
This method returns a ResponseStream that yields the response word-by-word, simulating the streaming behavior of a real LLM. The streaming speed is controlled by the stream_wps parameter set during initialization.
- Parameters:
messages (List[dict]) – The list of message dictionaries containing conversation history.
with_reasoning (bool) – If True, include reasoning content in the stream (default: False).
params (dict) – Additional parameters to pass to response functions.
- Returns:
A ResponseStream object that yields word-by-word chunks.
- Return type:
ResponseStream
- Example:
>>> model = FakeLLMModel(stream_wps=10).response_always("Hello world")
>>> stream = model.ask_stream([{"role": "user", "content": "Hi"}])
>>> for chunk in stream:
...     print(chunk, end='', flush=True)
Hello world
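The pacing implied by a words-per-second rate amounts to a delay of `1 / wps` seconds per emitted piece. The generator below is a rough sketch of that idea, not the library's code: `paced_words` is a hypothetical helper, and a regular-expression split stands in for the jieba segmentation that the module is documented to use, so chunk boundaries may differ from the real stream.

```python
import re
import time
from typing import Iterator


def paced_words(text: str, wps: float) -> Iterator[str]:
    """Yield text piece by piece, sleeping 1/wps seconds after each piece."""
    delay = 1.0 / wps
    # Assumption: a regex split replaces the jieba segmentation used by the module.
    # Pieces are whitespace runs, word runs, or single punctuation characters.
    for piece in re.findall(r"\s+|\w+|[^\w\s]", text):
        yield piece
        time.sleep(delay)
```

Joining the yielded pieces reproduces the input text exactly, so a consumer that prints each piece with `end=''` sees the full response appear gradually.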
- clear_rules() FakeLLMModel[source]
Create a new instance with all rules removed.
- Returns:
A new FakeLLMModel instance with no rules.
- Return type:
FakeLLMModel
- Example:
>>> model = FakeLLMModel().response_always("Hello")
>>> model.rules_count
1
>>> clean_model = model.clear_rules()
>>> clean_model.rules_count
0
- response_always(response: str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]) FakeLLMModel[source]
Create a new instance with a rule that always returns the specified response.
- Parameters:
response (FakeResponseTyping) – The response to return, can be a string, tuple of (reasoning, content), or callable.
- Returns:
A new FakeLLMModel instance with the added rule.
- Return type:
FakeLLMModel
- Example:
>>> model = FakeLLMModel()
>>> new_model = model.response_always("Default response")
>>> new_model.ask([{"role": "user", "content": "anything"}])
'Default response'
>>> model.rules_count  # Original unchanged
0
>>> new_model.rules_count
1
- response_sequence(responses: List[str | Tuple[str, str]]) FakeLLMModel[source]
Create a new instance with a rule that returns responses in sequence.
Each call to ask() or ask_stream() will return the next response in the sequence. Once all responses are exhausted, the rule will no longer match.
- Parameters:
responses (List[Union[str, Tuple[str, str]]]) – List of responses to return in order. Each can be a string or tuple of (reasoning, content).
- Returns:
A new FakeLLMModel instance with the sequence rule added.
- Return type:
FakeLLMModel
- Raises:
ValueError – If the response list is empty.
- Example:
>>> model = FakeLLMModel()
>>> seq_model = model.response_sequence([
...     "First response",
...     ("thinking about second", "Second response"),
...     "Third response"
... ])
>>> seq_model.ask([{"role": "user", "content": "test1"}])
'First response'
>>> seq_model.ask([{"role": "user", "content": "test2"}])
'Second response'
>>> seq_model.ask([{"role": "user", "content": "test3"}])
'Third response'
- response_when(fn_when: Callable[[...], bool], response: str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]) FakeLLMModel[source]
Create a new instance with a conditional rule that returns the specified response when the condition is met.
- Parameters:
fn_when (Callable[..., bool]) – A callable that takes (messages, **params) and returns bool.
response (FakeResponseTyping) – The response to return when condition is True.
- Returns:
A new FakeLLMModel instance with the added rule.
- Return type:
FakeLLMModel
- Example:
>>> model = FakeLLMModel()
>>> new_model = model.response_when(
...     lambda messages, **params: len(messages) > 2,
...     "Long conversation response"
... )
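A list of `(rule_function, response)` pairs can be evaluated with a simple loop. The sketch below assumes that rules are tried in insertion order and that the first match wins; this page does not state the evaluation order or what happens when no rule matches, so both choices, along with the name `first_match`, are assumptions made for illustration.

```python
from typing import Any, Callable, List, Tuple

Rule = Tuple[Callable[..., bool], str]


def first_match(rules: List[Rule], messages: List[dict], **params: Any) -> str:
    """Return the response of the first rule whose predicate accepts the call."""
    for fn_when, response in rules:
        # Predicates use the documented (messages, **params) calling convention.
        if fn_when(messages, **params):
            return response
    # Assumption: the no-match behavior is not documented; this sketch raises.
    raise LookupError("no rule matched")


rules = [
    (lambda messages, **params: len(messages) > 2, "Long conversation response"),
    (lambda messages, **params: True, "Default response"),
]
```

Under the first-match assumption, a catch-all predicate placed last behaves like a default response.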
- response_when_keyword_in_last_message(keywords: str | List[str], response: str | Tuple[str, str] | Callable[[...], str | Tuple[str, str]]) FakeLLMModel[source]
Create a new instance with a rule that returns the specified response when any keyword is found in the last message.
- Parameters:
keywords (Union[str, List[str]]) – A keyword or list of keywords to match in the last message content.
response (FakeResponseTyping) – The response to return when keyword is found.
- Returns:
A new FakeLLMModel instance with the added rule.
- Return type:
FakeLLMModel
- Example:
>>> model = FakeLLMModel()
>>> new_model = model.response_when_keyword_in_last_message(
...     ["weather", "temperature"],
...     "It's 25 degrees and sunny!"
... )
>>> new_model.ask([{"role": "user", "content": "What's the weather?"}])
"It's 25 degrees and sunny!"
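The keyword rule can be thought of as a predicate factory that fits the `(messages, **params)` convention of response_when. The sketch below assumes case-sensitive substring matching on the `content` field of the last message; the page does not specify the matching semantics, and `keyword_in_last_message` is a hypothetical helper rather than part of the library.

```python
from typing import Any, Callable, List, Union


def keyword_in_last_message(keywords: Union[str, List[str]]) -> Callable[..., bool]:
    """Build a predicate that is true when any keyword occurs in the last message."""
    # Accept a single keyword or a list, as the documented parameter does.
    words = [keywords] if isinstance(keywords, str) else list(keywords)

    def check(messages: List[dict], **params: Any) -> bool:
        if not messages:
            return False
        content = messages[-1].get("content", "")
        # Assumption: plain case-sensitive substring match.
        return any(word in content for word in words)

    return check
```

A predicate built this way could be passed to a conditional rule in the same position as the lambda shown for response_when.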
- property rules_count: int
Get the number of configured response rules.
- Returns:
The count of rules currently configured.
- Return type:
int
- property stream_wps: float
Get the streaming words per second rate.
- Returns:
The words per second rate for streaming.
- Return type:
float
- with_stream_wps(stream_wps: float) FakeLLMModel[source]
Create a new instance with a different streaming words per second rate.
- Parameters:
stream_wps (float) – The new words per second rate for streaming responses.
- Returns:
A new FakeLLMModel instance with the updated stream rate.
- Return type:
FakeLLMModel
- Example:
>>> model = FakeLLMModel(stream_wps=50)
>>> fast_model = model.with_stream_wps(100)
>>> fast_model.stream_wps
100
>>> model.stream_wps  # Original unchanged
50