hbllmutils.meta.code.tree

File path pattern management and directory tree building utilities for Python projects.

This module manages file path patterns and decides which files to ignore, based on Python project conventions and custom patterns. It builds filtered directory tree structures that respect the gitignore-style patterns commonly used in Python projects.

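The module contains the following main components:

  • is_file_should_ignore – checks a single path against the ignore patterns

  • build_python_project_tree – builds the filtered directory tree structure

  • get_python_project_tree_text – renders the filtered tree as formatted text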

Key features include:

  • Pattern matching using gitignore-style patterns via pathspec library

  • Comprehensive default Python gitignore patterns covering common artifacts

  • Support for custom additional ignore patterns

  • Directory tree building with optional focus item highlighting

  • Text-based tree visualization with box-drawing characters

  • LRU caching of pattern matchers for performance optimization

Note

The module uses pathspec library for gitignore-style pattern matching, which provides robust and standards-compliant pattern evaluation.

Warning

Large directory structures may require significant time to traverse. Consider using extra_patterns to filter out unnecessary directories early.

Example:

>>> from hbllmutils.meta.code.tree import build_python_project_tree, get_python_project_tree_text
>>> 
>>> # Build a directory tree structure
>>> root, tree = build_python_project_tree('/path/to/project')
>>> print(root)
project
>>> 
>>> # Generate formatted text representation
>>> print(get_python_project_tree_text('/path/to/project'))
project
├── src
│   ├── main.py
│   └── utils.py
└── tests
    └── test_main.py
>>> 
>>> # Highlight specific files with focus labels
>>> print(get_python_project_tree_text(
...     '/path/to/project',
...     focus_items={'entry': 'src/main.py', 'config': 'config.yaml'}
... ))
project
├── src
│   ├── main.py <-- (entry)
│   └── utils.py
├── tests
│   └── test_main.py
└── config.yaml <-- (config)

is_file_should_ignore

hbllmutils.meta.code.tree.is_file_should_ignore(path: str | Path, extra_patterns: List[str] | None = None) -> bool

Determine whether a file should be ignored based on Python gitignore patterns.

This function checks if the given file path matches any of the default Python gitignore patterns or any additional custom patterns provided. It uses a cached PathSpec matcher for efficient pattern matching. The function handles both string paths and pathlib.Path objects, converting them to POSIX-style paths for consistent pattern matching across platforms.

Parameters:
  • path (Union[str, pathlib.Path]) – The file path to check against ignore patterns. Can be absolute or relative.

  • extra_patterns (Optional[List[str]]) – Optional list of additional patterns to check beyond the default Python gitignore patterns. Patterns follow gitignore syntax.

Returns:

True if the file should be ignored (matches any pattern), False otherwise.

Return type:

bool

Note

The extra_patterns list is sorted and converted to a tuple for caching purposes. This ensures consistent cache keys regardless of the original list order.

Example:

>>> is_file_should_ignore('__pycache__/test.pyc')
True
>>> is_file_should_ignore('main.py')
False
>>> is_file_should_ignore('test.txt', extra_patterns=['*.txt'])
True
>>> 
>>> # Works with pathlib.Path objects
>>> from pathlib import Path
>>> is_file_should_ignore(Path('build/output.so'))
True
>>> 
>>> # Custom patterns can be added
>>> is_file_should_ignore('data.csv', extra_patterns=['*.csv', '*.json'])
True
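
The description above mentions converting paths to POSIX style before matching. A small sketch of what that conversion looks like with pathlib follows; to_posix is a hypothetical helper, not part of this module, and how the module performs the conversion internally is an assumption.

```python
from pathlib import PurePath, PureWindowsPath


def to_posix(path) -> str:
    """Render a str or path object with forward slashes (hypothetical helper)."""
    # PurePath uses the path flavour of the current operating system;
    # as_posix() then joins the parts with "/".
    return PurePath(path).as_posix()


# A Windows-flavoured path shows the effect on any platform:
# backslash separators become forward slashes.
windows_style = PureWindowsPath("build\\lib\\output.so")
posix_text = windows_style.as_posix()
```

Gitignore-style patterns are written with forward slashes, which is why a pattern such as `build/` can only match reliably once the path has been normalized this way.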

build_python_project_tree

hbllmutils.meta.code.tree.build_python_project_tree(root_path: str, extra_patterns: List[str] | None = None, focus_items: dict | None = None) -> Tuple[str, List]

Build a directory tree structure for a Python project while respecting ignore patterns.

This function recursively traverses the directory structure starting from the root path, filtering out files and directories that match the Python gitignore patterns or any additional custom patterns provided. It returns a tree structure representation of the project suitable for visualization or further processing. Optionally, specific files or directories can be highlighted with focus labels to draw attention to important items.

Parameters:
  • root_path (str) – The root directory path to start building the tree from. Can be absolute or relative to the current working directory.

  • extra_patterns (Optional[List[str]]) – Optional list of additional patterns to ignore beyond the default Python gitignore patterns. Patterns follow gitignore syntax.

  • focus_items (Optional[dict]) – Optional dictionary mapping focus labels to file/directory paths that should be highlighted. The paths must be within the root_path or its subdirectories. Paths can be either absolute or relative to root_path. Focus items are marked with a " <-- (label)" suffix in their names.

Returns:

A tuple containing:

  • The name of the root directory (str)

  • A list of tree nodes representing the directory structure

Each tree node is a tuple of (name, children) where:

  • name (str) is the file/directory name, optionally with a focus suffix

  • children (list) is a list of child nodes (empty for files)

Return type:

Tuple[str, List]

Raises:
  • ValueError – If a focus item path is not within the root path or its subdirectories.

  • PermissionError – Not propagated to the caller. If a directory cannot be accessed due to permissions, the error is caught and the directory is marked in the tree as "(Permission Denied)".
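
The ValueError condition for focus items can be sketched as follows. This is one plausible way to perform the containment check, assuming both paths are resolved before comparison; resolve_focus_items is a hypothetical name and the module's actual logic and error message may differ.

```python
from pathlib import Path
from typing import Dict


def resolve_focus_items(root_path: str, focus_items: Dict[str, str]) -> Dict[str, str]:
    """Map each root-relative POSIX path to its focus label (hypothetical helper)."""
    root = Path(root_path).resolve()
    resolved: Dict[str, str] = {}
    for label, item in focus_items.items():
        candidate = Path(item)
        if not candidate.is_absolute():
            # Relative focus paths are interpreted relative to the root.
            candidate = root / candidate
        candidate = candidate.resolve()
        try:
            relative = candidate.relative_to(root)
        except ValueError:
            # relative_to() raises ValueError when candidate is outside root.
            raise ValueError(
                f"Focus item {item!r} is not within {str(root)!r}"
            ) from None
        resolved[relative.as_posix()] = label
    return resolved
```

Resolving both sides first means that a path such as `../outside.txt` is rejected even though it is written relative to the root.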

Note

Empty directories (after filtering) are excluded from the tree structure. Only directories containing at least one non-ignored file are included.
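
A minimal sketch of the pruning rule described in this note, under stated assumptions: IGNORED is a hypothetical, much-reduced pattern list, fnmatch replaces pathspec, directories are listed before files, and permission errors are not handled. It produces the same (name, children) node shape as documented above.

```python
import os
from fnmatch import fnmatch
from typing import List, Tuple

Node = Tuple[str, list]

# Hypothetical, much-reduced ignore list; the module's defaults are broader.
IGNORED = ("__pycache__", "*.pyc", ".git")


def _ignored(name: str) -> bool:
    return any(fnmatch(name, pattern) for pattern in IGNORED)


def build_tree(directory: str) -> List[Node]:
    """Return (name, children) nodes, dropping directories left empty by filtering."""
    dirs: List[Node] = []
    files: List[Node] = []
    for name in sorted(os.listdir(directory)):
        if _ignored(name):
            continue
        full = os.path.join(directory, name)
        if os.path.isdir(full):
            children = build_tree(full)
            if children:  # keep a directory only if something survives inside it
                dirs.append((name, children))
        else:
            files.append((name, []))
    return dirs + files
```

Because pruning happens after the recursive call returns, a directory that contains only ignored artifacts (for example just a `__pycache__` folder) disappears from the result along with genuinely empty directories.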

Warning

Large directory structures may take significant time to traverse. Consider using extra_patterns to filter out large directories early in the traversal.

Example:

>>> root, tree = build_python_project_tree('/path/to/project')
>>> print(root)
project
>>> print(tree)
[('src', [('main.py', []), ('utils.py', [])]), ('tests', [('test_main.py', [])])]
>>> 
>>> # With focus items to highlight specific files
>>> root, tree = build_python_project_tree(
...     '/path/to/project',
...     focus_items={'entry': 'src/main.py', 'config': 'config.yaml'}
... )
>>> print(tree)
[('src', [('main.py <-- (entry)', []), ('utils.py', [])]),
 ('tests', [('test_main.py', [])]),
 ('config.yaml <-- (config)', [])]
>>> 
>>> # With extra ignore patterns
>>> root, tree = build_python_project_tree(
...     '/path/to/project',
...     extra_patterns=['*.md', 'docs/']
... )
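
For further processing, the returned node list can be walked recursively. The sketch below flattens it into root-relative file paths; iter_file_paths is a hypothetical helper, and it relies on the documented behaviour that empty directories are excluded, so a node without children is treated as a file.

```python
from typing import Iterator, List, Tuple


def iter_file_paths(nodes: List[Tuple[str, list]], prefix: str = "") -> Iterator[str]:
    """Yield 'dir/sub/file' style paths from (name, children) nodes."""
    for name, children in nodes:
        path = f"{prefix}{name}"
        if children:
            # A node with children is a directory: descend into it.
            yield from iter_file_paths(children, prefix=f"{path}/")
        else:
            yield path


# Sample data copied from the example output above.
tree = [('src', [('main.py', []), ('utils.py', [])]), ('tests', [('test_main.py', [])])]
paths = list(iter_file_paths(tree))
# paths == ['src/main.py', 'src/utils.py', 'tests/test_main.py']
```

Note that when focus_items is used, the focus suffix is part of each node name, so it would need to be stripped before treating these strings as real paths.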

get_python_project_tree_text

hbllmutils.meta.code.tree.get_python_project_tree_text(root_path: str, extra_patterns: List[str] | None = None, focus_items: dict | None = None, encoding: str | None = None) -> str

Generate a formatted text representation of a Python project’s directory tree.

This function builds a directory tree structure for a Python project and formats it as a text string with tree-like visual formatting using box-drawing characters (UTF-8) or ASCII characters depending on the encoding. It respects Python gitignore patterns and can optionally highlight specific files or directories with focus labels.

Parameters:
  • root_path (str) – The root directory path to start building the tree from. Can be absolute or relative to the current working directory.

  • extra_patterns (Optional[List[str]]) – Optional list of additional patterns to ignore beyond the default Python gitignore patterns. Patterns follow gitignore syntax.

  • focus_items (Optional[dict]) – Optional dictionary mapping focus labels to file/directory paths that should be highlighted. The paths must be within the root_path or its subdirectories. Focus items are marked with a " <-- (label)" suffix.

  • encoding (Optional[str]) – Encoding used for tree formatting. Defaults to None, which means the system encoding. With an ASCII encoding, plain ASCII characters replace the UTF-8 box-drawing characters for wider compatibility.

Returns:

A formatted string representation of the directory tree with visual tree structure using box-drawing characters (├──, └──, │) or ASCII equivalents.

Return type:

str

Raises:

ValueError – If a focus item path is not within the root path or its subdirectories.

Note

The function automatically selects appropriate characters based on encoding:

  • UTF-8 encoding uses Unicode box-drawing characters for better visual appearance

  • ASCII encoding uses simple ASCII characters (+, |, -) for maximum compatibility
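
The character selection described in this note can be sketched as follows. This is an illustration only: pick_style and render are hypothetical names, the ASCII glyphs are chosen to mirror the sample output in this section, and testing whether the encoding can represent the box-drawing characters is an assumed selection rule, not a confirmed detail of the function.

```python
from typing import Dict, List, Tuple

UNICODE_STYLE = {"branch": "├── ", "last": "└── ", "pipe": "│   ", "blank": "    "}
# ASCII glyphs mirror the sample output shown in this section.
ASCII_STYLE = {"branch": "+-- ", "last": "+-- ", "pipe": "|   ", "blank": "    "}


def pick_style(encoding: str) -> Dict[str, str]:
    """Use box-drawing characters only if the encoding can represent them."""
    try:
        "├└│─".encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return ASCII_STYLE
    return UNICODE_STYLE


def render(root: str, nodes: List[Tuple[str, list]], encoding: str = "utf-8") -> str:
    """Format (name, children) nodes as an indented text tree."""
    style = pick_style(encoding)
    lines = [root]

    def walk(children: List[Tuple[str, list]], indent: str) -> None:
        for index, (name, grandchildren) in enumerate(children):
            is_last = index == len(children) - 1
            lines.append(indent + style["last" if is_last else "branch"] + name)
            # Below a last child the vertical guide line stops.
            walk(grandchildren, indent + style["blank" if is_last else "pipe"])

    walk(nodes, "")
    return "\n".join(lines)
```

Calling `render('project', tree, encoding='ascii')` on the node list from the build_python_project_tree example yields the same `+--` layout as the ASCII sample in this section.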

Example:

>>> print(get_python_project_tree_text('/path/to/project'))
project
├── src
│   ├── main.py
│   └── utils.py
└── tests
    └── test_main.py
>>> 
>>> # With focus items to highlight specific files
>>> print(get_python_project_tree_text(
...     '/path/to/project',
...     focus_items={'entry': 'src/main.py', 'test': 'tests/test_main.py'}
... ))
project
├── src
│   ├── main.py <-- (entry)
│   └── utils.py
└── tests
    └── test_main.py <-- (test)
>>> 
>>> # With ASCII encoding for compatibility
>>> print(get_python_project_tree_text('/path/to/project', encoding='ASCII'))
project
+-- src
|   +-- main.py
|   +-- utils.py
+-- tests
    +-- test_main.py
>>> 
>>> # With extra ignore patterns
>>> print(get_python_project_tree_text(
...     '/path/to/project',
...     extra_patterns=['*.md', 'docs/', 'examples/']
... ))