hbllmutils.testing.base

This module provides a framework for conducting binary tests on language models.

It defines data structures for storing test results and a base class for implementing binary tests. Binary tests are tests that have a pass/fail outcome, and can be run multiple times to gather statistics.

Classes:

BinaryTestResult: Stores the result of a single binary test MultiBinaryTestResult: Stores and analyzes results from multiple binary tests BinaryTest: Base class for implementing binary tests on language models

BinaryTestResult

class hbllmutils.testing.base.BinaryTestResult(passed: bool, content: str)[source]

Data class representing the result of a single binary test.

Parameters:
  • passed (bool) – Whether the test passed or failed.

  • content (str) – The content or output from the test.

MultiBinaryTestResult

class hbllmutils.testing.base.MultiBinaryTestResult(tests: List[BinaryTestResult], total_count: int = 0, passed_count: int = 0, passed_ratio: float = 0, failed_count: int = 0, failed_ratio: float = 0)[source]

Data class representing aggregated results from multiple binary tests.

This class automatically calculates statistics about the test results, including total count, passed/failed counts, and their ratios.

Parameters:
  • tests (List[BinaryTestResult]) – List of individual binary test results.

  • total_count (int) – Total number of tests (automatically calculated).

  • passed_count (int) – Number of tests that passed (automatically calculated).

  • passed_ratio (float) – Ratio of tests that passed (automatically calculated).

  • failed_count (int) – Number of tests that failed (automatically calculated).

  • failed_ratio (float) – Ratio of tests that failed (automatically calculated).

Example::
>>> results = [BinaryTestResult(passed=True, content="test1"), 
...            BinaryTestResult(passed=False, content="test2")]
>>> multi_result = MultiBinaryTestResult(tests=results)
>>> multi_result.passed_ratio
0.5
__post_init__()[source]

Post-initialization method that calculates test statistics.

This method is automatically called after the dataclass is initialized. It computes the total count, passed/failed counts, and their ratios based on the provided test results.

BinaryTest

class hbllmutils.testing.base.BinaryTest[source]

Base class for implementing binary tests on language models.

This class provides a framework for running tests that have a pass/fail outcome. Tests can be run once or multiple times to gather statistics. Subclasses should implement the _single_test method to define the specific test logic.

Variables:

__desc_name__ – Optional descriptive name for the test, used in progress bars.

Methods:

_single_test: Abstract method to implement the test logic (must be overridden) test: Run the test one or multiple times and return results

test(model: str | LLMModel, n: int = 1, silent: bool = False, **params) BinaryTestResult | MultiBinaryTestResult[source]

Run the binary test one or multiple times on the given model.

If n=1, runs a single test and returns a BinaryTestResult. If n>1, runs multiple tests and returns a MultiBinaryTestResult with aggregated statistics.

Parameters:
  • model (LLMModelTyping) – The language model to test. Can be a model instance or a model identifier.

  • n (int) – Number of times to run the test, defaults to 1.

  • silent (bool) – If True, suppresses the progress bar, defaults to False.

  • params (dict) – Additional parameters to pass to the test.

Returns:

Single test result if n=1, otherwise aggregated results.

Return type:

Union[BinaryTestResult, MultiBinaryTestResult]

Example::
>>> test = MyBinaryTest()  # Assuming MyBinaryTest is a subclass
>>> result = test.test(model, n=10)
>>> print(f"Pass rate: {result.passed_ratio}")
Pass rate: 0.8