HAlphaAnomalyzer.anomalyzer
Module Contents
Classes
- class HAlphaAnomalyzer.anomalyzer.Anomalyzer(grid_size: int = 8)
A class for detecting anomalies in full-disk H-Alpha solar observations by superimposing a grid, computing average pixel values per grid cell, and performing statistical analysis to determine the best normal range values per grid cell.
Attributes
- grid_size : int
The number of rows and columns to divide each image into.
- best_ranges : pd.DataFrame
The DataFrame containing the best range values for each grid cell.
- images_data : pd.DataFrame
The DataFrame containing the training images data.
- mean : pd.DataFrame
The DataFrame containing the mean S statistic values for each grid cell.
- std : pd.DataFrame
The DataFrame containing the standard deviation of S statistic values for each grid cell.
- _process_image(image_path: str) -> List[Tuple[int, int, float, float]]
Processes an image to calculate the average pixel and S statistic values for each cell in a grid for a given image.
This function reads the image from the specified path in grayscale, divides it into a grid of cells, and computes the average pixel value for each cell. It calculates the S statistic as the sum of absolute deviations between best range values and the average pixel values for each cell.
Parameters
- image_path : str
The path to the image file.
Returns
- image_data : List[Tuple[int, int, float, float]]
A list containing the calculated average pixel and S statistic values for each grid cell in the image.
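The cell-averaging step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `grid_cell_means` is hypothetical, the image is assumed to be already loaded as a 2-D grayscale NumPy array, and only the per-cell average is computed (the S statistic is left out because it depends on the fitted best ranges).

```python
import numpy as np


def grid_cell_means(image: np.ndarray, grid_size: int = 8):
    """Return (row, col, mean) for each cell of a grid_size x grid_size grid.

    Hypothetical helper for illustration only.
    """
    height, width = image.shape
    # Cell boundaries; linspace spreads any remainder pixels across the cells.
    row_edges = np.linspace(0, height, grid_size + 1, dtype=int)
    col_edges = np.linspace(0, width, grid_size + 1, dtype=int)
    cells = []
    for r in range(grid_size):
        for c in range(grid_size):
            cell = image[row_edges[r]:row_edges[r + 1],
                         col_edges[c]:col_edges[c + 1]]
            cells.append((r, c, float(cell.mean())))
    return cells


# Synthetic 16x16 array standing in for a grayscale H-alpha frame.
demo = np.arange(256, dtype=float).reshape(16, 16)
demo_cells = grid_cell_means(demo, grid_size=2)
```

With `grid_size=2` the synthetic frame yields four cells, one tuple per cell in row-major order.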
- _compute_stats() -> None
Computes mean and standard deviation of S statistic values for each grid cell of the training images data.
- _compute_anomaly_likelihoods(image_paths: List[str]) -> pandas.DataFrame
Computes the anomaly likelihood (standardized sigmoid of S statistic) of each grid cell of the test images.
This function processes a list of image paths to calculate the S statistic for each grid cell. The S statistic is then standardized and transformed using a sigmoid function to produce an anomaly likelihood for each cell.
Parameters
- image_paths : List[str]
The list of paths to the image files.
Returns
- df_anomaly_likelihoods : pd.DataFrame
A DataFrame containing the calculated anomaly likelihoods of each grid cell of the test images.
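The standardize-then-sigmoid transformation can be sketched for a single cell as below. The function name `anomaly_likelihood` and its arguments are hypothetical; the sketch assumes the per-cell training mean and standard deviation are already known and that the standard deviation is positive.

```python
import math


def anomaly_likelihood(s_value: float, s_mean: float, s_std: float) -> float:
    """Standardize an S statistic, then map it into (0, 1) with a sigmoid.

    Hypothetical helper; assumes s_std > 0.
    """
    z = (s_value - s_mean) / s_std  # z-score against the training statistics
    return 1.0 / (1.0 + math.exp(-z))


at_mean = anomaly_likelihood(10.0, 10.0, 2.0)      # S equal to the training mean
two_sigma = anomaly_likelihood(14.0, 10.0, 2.0)    # S two deviations above it
```

Under this reading, a cell whose S statistic equals the training mean maps to a likelihood of 0.5, and larger S values move the likelihood toward 1.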
- compute_best_ranges(non_anomalous_paths: List[str] = None, anomalous_paths: List[str] = None, lower_range_end: int = 20, upper_range_start: int = 80, step_size: int = 2) -> None
Compute the best range values for each grid cell based on the highest One-way ANOVA F-test statistic.
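For readers unfamiliar with the test, the one-way ANOVA F statistic is the ratio of between-group to within-group mean squares. The sketch below computes it by hand for two groups of values; the function name `one_way_f` and the sample numbers are illustrative, and how the library feeds candidate ranges into the test is not shown here.

```python
from statistics import mean
from typing import List, Sequence


def one_way_f(groups: List[Sequence[float]]) -> float:
    """One-way ANOVA F statistic: between-group MS over within-group MS."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Illustrative values standing in for a per-cell statistic in each class.
normal_values = [1.0, 2.0, 3.0]
anomalous_values = [4.0, 5.0, 6.0]
f_stat = one_way_f([normal_values, anomalous_values])
```

A larger F statistic means the two groups are better separated relative to their internal spread, which is why the highest value is used to pick the best range.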
Parameters
- non_anomalous_paths : List[str], optional
A list of paths to non-anomalous image files.
- anomalous_paths : List[str], optional
A list of paths to anomalous image files.
- lower_range_end : int, optional
The end of candidate lower ranges, by default 20.
- upper_range_start : int, optional
The start of candidate upper ranges, by default 80.
- step_size : int, optional
The step size for candidate ranges, by default 2.
Notes
- Users must provide both lists of paths to the anomalous and non-anomalous image files for training; an error is raised if the lists are not provided.
- The images should be H-alpha solar observations in JPG, JPEG, or PNG format.
- Users can optionally set the lower_range_end, upper_range_start, and step_size parameters:
  - These parameters are used by the One-way ANOVA F-test to rank the best range that differentiates between normal and anomalous images for each cell.
  - The lower range candidates start at 0 and end at lower_range_end minus step_size. For example, if lower_range_end is 20 and step_size is 2, the lower range percentage candidates are 0, 2, 4, 6, …, 18.
  - The upper range candidates start at upper_range_start and end at 100 minus step_size. For example, if upper_range_start is 80 and step_size is 2, the upper range percentage candidates are 80, 82, 84, 86, …, 98.
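The candidate lists described in these notes can be reproduced with Python's built-in `range`. This is a sketch that matches the documented examples, assuming lower_range_end and (100 - upper_range_start) are multiples of step_size; the helper name `candidate_ranges` is hypothetical.

```python
from typing import List, Tuple


def candidate_ranges(lower_range_end: int = 20,
                     upper_range_start: int = 80,
                     step_size: int = 2) -> Tuple[List[int], List[int]]:
    """Return the lower and upper percentage candidates (hypothetical helper)."""
    # range() excludes its stop value, so the last lower candidate is
    # lower_range_end - step_size and the last upper candidate is
    # 100 - step_size.
    lower = list(range(0, lower_range_end, step_size))
    upper = list(range(upper_range_start, 100, step_size))
    return lower, upper


lower_candidates, upper_candidates = candidate_ranges()
```

With the defaults, each list holds ten candidates.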
- find_corrupt_images(image_paths: List[str] = None, likelihood_threshold: float = 0.5, min_corrupt_cells: int = 0, verbose: bool = False) -> List[int]
Identifies corrupt images based on the anomaly likelihood of grid cells.
This function evaluates each image based on the anomaly likelihoods of its grid cells. An image is marked as corrupt if the number of grid cells exceeding the likelihood threshold is greater than the specified minimum number of corrupt cells.
Parameters
- image_paths : List[str]
The list of paths to the image files.
- likelihood_threshold : float, optional
The threshold for the anomaly likelihood to consider a cell as corrupt, by default 0.5.
- min_corrupt_cells : int, optional
The number of corrupt cells an image must exceed to be classified as corrupt, by default 0.
- verbose : bool, optional
If True, prints the number of corrupt images detected, by default False.
Returns
- anomaly_labels : List[int]
List of binary labels where 0 indicates a non-corrupt image and 1 indicates a corrupt image.
Notes
Users must provide a list of paths to the image files for testing.
The images should be H-alpha solar observations in JPG, JPEG,
or PNG format.
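The labeling rule described for this method can be sketched as below, assuming the per-cell anomaly likelihoods for each image are already available as plain lists. The helper name `label_images` is hypothetical; it follows the description above, where an image is corrupt when strictly more than min_corrupt_cells cells exceed the threshold.

```python
from typing import List


def label_images(likelihoods_per_image: List[List[float]],
                 likelihood_threshold: float = 0.5,
                 min_corrupt_cells: int = 0) -> List[int]:
    """Return 1 for corrupt images and 0 otherwise (hypothetical helper)."""
    labels = []
    for cell_likelihoods in likelihoods_per_image:
        # Count the cells whose likelihood exceeds the threshold.
        corrupt_cells = sum(p > likelihood_threshold for p in cell_likelihoods)
        labels.append(1 if corrupt_cells > min_corrupt_cells else 0)
    return labels


# Three toy images with two cells each.
toy_likelihoods = [[0.1, 0.2], [0.9, 0.1], [0.9, 0.8]]
default_labels = label_images(toy_likelihoods)
strict_labels = label_images(toy_likelihoods, min_corrupt_cells=1)
```

With the default `min_corrupt_cells=0`, a single cell above the threshold is enough to flag an image; raising the value makes the classification more tolerant of isolated cells.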
- plot_image_likelihoods(image_path: str = None, likelihood_threshold: float = None) -> None
Plots the original image alongside the processed image with grid cell anomaly likelihoods indicated by a colormap. Optionally outlines corrupt cells based on a specified likelihood threshold.
Parameters
- image_path : str
The path to the image file.
- likelihood_threshold : float, optional
The likelihood threshold for identifying corrupt cells, by default None.
Notes
Users must provide a path to the image file for plotting.
The image should be an H-alpha solar observation in JPG, JPEG,
or PNG format.