hbllmutils.meta.code.pypi_downloads

PyPI package download statistics and popularity analysis utilities.

This module provides functionality for analyzing PyPI package popularity based on download statistics. It includes utilities for loading download data, querying package popularity, and determining if packages meet specific download thresholds.

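The module contains the following main components:

  • get_pypi_downloads – Load the bundled download statistics as a DataFrame

  • is_hot_pypi_project – Check whether a package meets a monthly download threshold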

Note

Download statistics are cached using LRU cache for performance optimization. The data is loaded from a bundled CSV file containing package download counts.

Warning

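The download statistics are static: they reflect the snapshot bundled with the installed release of this package, not live data. For current figures, query a live source such as pypistats.org or the public PyPI dataset on BigQuery; the PyPI JSON API itself no longer reports meaningful download counts.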

Example:

>>> from hbllmutils.meta.code.pypi_downloads import get_pypi_downloads, is_hot_pypi_project
>>> 
>>> # Get all download statistics
>>> df = get_pypi_downloads()
>>> print(df.head())
>>> 
>>> # Check if a package is popular
>>> is_popular = is_hot_pypi_project('numpy', min_last_month_downloads=1000000)
>>> print(f"Is numpy popular? {is_popular}")
>>> 
>>> # Check with custom threshold
>>> is_very_popular = is_hot_pypi_project('requests', min_last_month_downloads=5000000)

get_pypi_downloads

hbllmutils.meta.code.pypi_downloads.get_pypi_downloads() → DataFrame

Load PyPI package download statistics from bundled CSV file.

This function reads download statistics for PyPI packages from a CSV file bundled with the module. The data includes package names and their download counts for the last month. Results are cached for improved performance on subsequent calls.

Returns:

DataFrame containing package download statistics with columns:

  • ‘name’ – Package name (str)

  • ‘last_month’ – Download count for the last month (int)

Return type:

pd.DataFrame

Raises:
  • FileNotFoundError – If the pypi_downloads.csv file is not found

  • pd.errors.EmptyDataError – If the CSV file is empty

  • pd.errors.ParserError – If the CSV file format is invalid

Note

This function uses LRU cache with unlimited size. The data is loaded only once per Python session and reused for all subsequent calls.

Warning

The returned DataFrame should not be modified directly as it is cached. Create a copy if modifications are needed.
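Why the copy matters can be seen in a minimal sketch of the hazard. It assumes only that the cached loader hands back the same mutable object on every call; load_downloads is a hypothetical stand-in, and a list of dicts replaces the DataFrame.

```python
import copy
from functools import lru_cache


@lru_cache(maxsize=None)
def load_downloads():
    """Hypothetical cached loader returning a mutable object."""
    return [{"name": "numpy", "last_month": 300}]


# Mutating a deep copy leaves the cached data untouched.
safe = copy.deepcopy(load_downloads())
safe[0]["last_month"] = 0
after_copy_edit = load_downloads()[0]["last_month"]
print(after_copy_edit)  # 300

# Mutating the returned object directly changes what later callers see.
leaked = load_downloads()
leaked[0]["last_month"] = -1
after_direct_edit = load_downloads()[0]["last_month"]
print(after_direct_edit)  # -1
```

With the real function, the equivalent precaution would be get_pypi_downloads().copy(), since pandas DataFrame.copy() makes a deep copy by default.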

Example:

>>> df = get_pypi_downloads()
>>> print(df.columns)
Index(['name', 'last_month'], dtype='object')
>>> 
>>> # Get top 5 most downloaded packages
>>> top_packages = df.nlargest(5, 'last_month')
>>> print(top_packages)
>>> 
>>> # Get specific package statistics
>>> numpy_stats = df[df['name'] == 'numpy']
>>> print(numpy_stats['last_month'].values[0])

is_hot_pypi_project

hbllmutils.meta.code.pypi_downloads.is_hot_pypi_project(pypi_name: str, min_last_month_downloads: int = 1000000) → bool

Check if a PyPI package meets the specified popularity threshold.

This function determines whether a given PyPI package is considered “hot” or popular based on its download count from the last month. A package is considered hot if its download count meets or exceeds the specified minimum threshold.

Parameters:
  • pypi_name (str) – Name of the PyPI package to check

  • min_last_month_downloads (int, optional) – Minimum download count threshold for considering a package as hot, defaults to 1000000 (1 million)

Returns:

True if the package exists and meets the download threshold, False otherwise

Return type:

bool

Note

The function returns False whenever the package name is not found in the statistics, even if the package exists on PyPI. A False result therefore does not prove that the package is unpopular or nonexistent; it may simply be missing from the bundled dataset.
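The decision rule documented above (missing name gives False, otherwise compare against the threshold with "meets or exceeds") can be sketched as follows. This is an illustration under stated assumptions, not the library's code: is_hot and SAMPLE_DOWNLOADS are hypothetical names, the counts are invented, and a plain dict replaces the DataFrame lookup.

```python
from typing import Mapping

# Invented download counts standing in for the bundled dataset.
SAMPLE_DOWNLOADS = {"numpy": 300_000_000, "tiny-lib": 42}


def is_hot(
    name: str,
    downloads: Mapping[str, int],
    min_last_month_downloads: int = 1_000_000,
) -> bool:
    """Return True only if `name` is known and meets the threshold."""
    count = downloads.get(name)
    if count is None:
        # Unknown name: absent from the dataset, so not considered hot.
        return False
    # "Meets or exceeds" means an inclusive comparison.
    return count >= min_last_month_downloads


print(is_hot("numpy", SAMPLE_DOWNLOADS))         # True
print(is_hot("tiny-lib", SAMPLE_DOWNLOADS))      # False
print(is_hot("tiny-lib", SAMPLE_DOWNLOADS, 42))  # True (boundary case)
print(is_hot("missing", SAMPLE_DOWNLOADS))       # False
```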

Warning

Package name matching is case-sensitive. Ensure the package name matches exactly as it appears on PyPI.
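One way to reduce mismatches is to normalize the name before the lookup. The sketch below applies the PEP 503 normalization rule that PyPI uses to treat names as equivalent. Whether the bundled dataset stores names in this normalized form is an assumption to verify against the data, and normalize_name is a hypothetical helper, not part of this module.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name following the PEP 503 rule.

    Runs of '-', '_' and '.' collapse to a single '-', and the
    result is lowercased.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_name("NumPy"))              # numpy
print(normalize_name("Typing_Extensions"))  # typing-extensions
print(normalize_name("zope.interface"))     # zope-interface
```

If the dataset does use normalized names, a call such as is_hot_pypi_project(normalize_name('NumPy')) would avoid the case-sensitivity pitfall.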

Example:

>>> # Check if numpy is a hot project (default 1M threshold)
>>> is_hot_pypi_project('numpy')
True
>>> 
>>> # Check with custom threshold
>>> is_hot_pypi_project('requests', min_last_month_downloads=5000000)
True
>>> 
>>> # Check a less popular package
>>> is_hot_pypi_project('obscure-package', min_last_month_downloads=100)
False
>>> 
>>> # Check non-existent package
>>> is_hot_pypi_project('this-package-does-not-exist')
False