hbllmutils.meta.code.pypi_downloads
PyPI package download statistics and popularity analysis utilities.
This module provides functionality for analyzing PyPI package popularity based on download statistics. It includes utilities for loading download data, querying package popularity, and determining if packages meet specific download thresholds.
The module contains the following main components:
- get_pypi_downloads() - Load PyPI download statistics from CSV data
- is_hot_pypi_project() - Check if a package meets popularity threshold
Note
Download statistics are cached with an LRU cache, so the bundled CSV file of package download counts is read only once and reused on later calls.
Warning
The download statistics are static: they reflect the snapshot bundled with the installed package and are not refreshed at runtime. For real-time statistics, consider using the PyPI API directly.
Example:
>>> from hbllmutils.meta.code.pypi_downloads import get_pypi_downloads, is_hot_pypi_project
>>>
>>> # Get all download statistics
>>> df = get_pypi_downloads()
>>> print(df.head())
>>>
>>> # Check if a package is popular
>>> is_popular = is_hot_pypi_project('numpy', min_last_month_downloads=1000000)
>>> print(f"Is numpy popular? {is_popular}")
>>>
>>> # Check with custom threshold
>>> is_very_popular = is_hot_pypi_project('requests', min_last_month_downloads=5000000)
get_pypi_downloads
- hbllmutils.meta.code.pypi_downloads.get_pypi_downloads() -> DataFrame
Load PyPI package download statistics from bundled CSV file.
This function reads download statistics for PyPI packages from a CSV file bundled with the module. The data includes package names and their download counts for the last month. Results are cached for improved performance on subsequent calls.
- Returns:
DataFrame containing package download statistics with columns:
- 'name': Package name (str)
- 'last_month': Download count for the last month (int)
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – If the pypi_downloads.csv file is not found
pd.errors.EmptyDataError – If the CSV file is empty
pd.errors.ParserError – If the CSV file format is invalid
Note
This function uses an LRU cache with unlimited size. The data is loaded only once per Python session and reused for all subsequent calls.
Warning
The returned DataFrame should not be modified directly as it is cached. Create a copy if modifications are needed.
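One way to follow this advice is to call DataFrame.copy() before making changes. The sketch below uses a hypothetical cached loader, load_cached_frame, with made-up values in place of get_pypi_downloads(), so that it runs without the bundled data.

```python
from functools import lru_cache

import pandas as pd


@lru_cache(maxsize=None)
def load_cached_frame() -> pd.DataFrame:
    # Hypothetical stand-in for get_pypi_downloads(); the values are made up.
    return pd.DataFrame(
        {"name": ["alpha", "beta"], "last_month": [1500000, 250]}
    )


# Work on a copy so the cached object stays untouched.
working = load_cached_frame().copy()
working["last_month_millions"] = working["last_month"] / 1_000_000

print(list(working.columns))              # includes the new column
print(list(load_cached_frame().columns))  # cached frame is unchanged
```

Adding the column directly to the cached frame would instead leak it into every later call in the same session.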
Example:
>>> df = get_pypi_downloads()
>>> print(df.columns)
Index(['name', 'last_month'], dtype='object')
>>>
>>> # Get top 5 most downloaded packages
>>> top_packages = df.nlargest(5, 'last_month')
>>> print(top_packages)
>>>
>>> # Get specific package statistics
>>> numpy_stats = df[df['name'] == 'numpy']
>>> print(numpy_stats['last_month'].values[0])
is_hot_pypi_project
- hbllmutils.meta.code.pypi_downloads.is_hot_pypi_project(pypi_name: str, min_last_month_downloads: int = 1000000) -> bool
Check if a PyPI package meets the specified popularity threshold.
This function determines whether a given PyPI package is considered “hot” or popular based on its download count from the last month. A package is considered hot if its download count meets or exceeds the specified minimum threshold.
- Parameters:
pypi_name (str) – Name of the PyPI package to check
min_last_month_downloads (int, optional) – Minimum download count threshold for considering a package as hot, defaults to 1000000 (1 million)
- Returns:
True if the package exists and meets the download threshold, False otherwise
- Return type:
bool
Note
The function returns False when the package name is not found in the statistics dataset, even if the package exists on PyPI. A False result for an unknown name therefore means only that the name is absent from the dataset: the package may not exist, or it may simply not have been included.
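The documented behaviour (exact name match, a "meets or exceeds" comparison, and False for unknown names) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: is_hot_sketch and the sample DataFrame are hypothetical, and the frame is passed in explicitly rather than loaded from the bundled CSV.

```python
import pandas as pd


def is_hot_sketch(
    df: pd.DataFrame,
    pypi_name: str,
    min_last_month_downloads: int = 1000000,
) -> bool:
    """Return True only if the name is present and meets the threshold."""
    rows = df[df["name"] == pypi_name]  # exact, case-sensitive match
    if rows.empty:
        # Unknown names are reported as not hot.
        return False
    return bool(rows["last_month"].iloc[0] >= min_last_month_downloads)


# Made-up sample data with the documented column names.
sample = pd.DataFrame(
    {"name": ["alpha", "beta"], "last_month": [1500000, 250]}
)

print(is_hot_sketch(sample, "alpha"))  # True: 1500000 >= 1000000
print(is_hot_sketch(sample, "beta"))   # False: below the threshold
print(is_hot_sketch(sample, "gamma"))  # False: not in the dataset
```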
Warning
Package name matching is case-sensitive. Ensure the package name matches exactly as it appears on PyPI.
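If names come from user input, one option is to normalize them before the lookup. The helper below, normalize_name, is hypothetical and applies the PyPA name normalization rule (lowercase, with runs of "-", "_" and "." collapsed to a single "-"). Whether the bundled dataset stores names in this normalized form is an assumption that should be checked against the data before relying on it.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name following the PyPA normalization rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_name("NumPy"))           # numpy
print(normalize_name("Scikit_Learn"))    # scikit-learn
print(normalize_name("zope.interface"))  # zope-interface
```

If the dataset keeps names in their original spelling instead, the same normalization would have to be applied to the 'name' column as well before comparing.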
Example:
>>> # Check if numpy is a hot project (default 1M threshold)
>>> is_hot_pypi_project('numpy')
True
>>>
>>> # Check with custom threshold
>>> is_hot_pypi_project('requests', min_last_month_downloads=5000000)
True
>>>
>>> # Check a less popular package
>>> is_hot_pypi_project('obscure-package', min_last_month_downloads=100)
False
>>>
>>> # Check non-existent package
>>> is_hot_pypi_project('this-package-does-not-exist')
False