compare_kepler.py

load_Kepler_planets_cleaned()

Load a table of the Kepler objects of interest (KOIs) from a CSV file.

Returns:

planets_cleaned – A table with the properties of the KOIs.

Return type:

structured array

The table has the following columns:

  • kepid: The Kepler ID.

  • KOI: The KOI number.

  • koi_disposition: The disposition of the KOI.

  • koi_pdisposition: The disposition of the KOI based on Kepler data alone.

  • koi_score: The disposition score (between 0 and 1).

  • P: The orbital period (days).

  • t_D: The transit duration (hrs).

  • depth: The transit depth (ppm).

  • Rp: The planet radius (Earth radii).

  • teff: The stellar effective temperature (K).

  • logg: The log surface gravity of the star.

  • Rstar: The stellar radius (solar radii).

  • Mstar: The stellar mass (solar masses).
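As a rough illustration of working with a table like this (not the package's own loader), the sketch below reads a CSV with a header row into a NumPy structured array via numpy.genfromtxt and then selects KOIs by orbital period. The CSV text and its values are made up for illustration, and the helper name select_by_period is hypothetical.

```python
import io

import numpy as np

# Made-up CSV text using the column names listed above (values are illustrative only).
csv_text = """kepid,KOI,koi_disposition,koi_pdisposition,koi_score,P,t_D,depth,Rp,teff,logg,Rstar,Mstar
1000001,1.01,CANDIDATE,CANDIDATE,0.9,10.0,3.0,500.0,2.0,5700,4.4,1.0,1.0
1000001,1.02,CANDIDATE,CANDIDATE,0.8,25.0,4.0,300.0,1.5,5700,4.4,1.0,1.0
1000002,2.01,FALSE POSITIVE,FALSE POSITIVE,0.0,400.0,9.0,800.0,3.0,5200,4.5,0.8,0.9
"""

# names=True takes the field names from the header row; dtype=None infers a type per column.
planets = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True, dtype=None, encoding=None)


def select_by_period(table, P_min, P_max):
    """Hypothetical helper: return the rows with P_min <= P <= P_max."""
    mask = (table["P"] >= P_min) & (table["P"] <= P_max)
    return table[mask]


short_period = select_by_period(planets, 3.0, 300.0)
print(short_period["KOI"])
```

Columns of a structured array are accessed by name (planets["P"], planets["Rp"], ...), and boolean masks built from them select whole rows.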

load_Kepler_stars_cleaned()

Load a table of Kepler target stars from a CSV file.

Returns:

stars_cleaned – A table with the properties of the Kepler target stars.

Return type:

structured array

The table has the following columns:

  • kepid: The Kepler ID.

  • mass: The stellar mass (solar masses).

  • radius: The stellar radius (solar radii).

  • teff: The stellar effective temperature (K).

  • bp_rp: The Gaia DR2 bp-rp color (mag).

  • lum_val: The luminosity (solar luminosities).

  • e_bp_rp_interp: The extinction in bp-rp color (mag) interpolated from a model/binning.

  • e_bp_rp_true: The extinction in bp-rp color (mag) as given in the Gaia DR2 catalog.

  • rrmscdpp04p5: The root-mean-square combined differential photometric precision (CDPP) for 4.5 hr durations (ppm).
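Stellar cuts such as the Rstar, Mstar, teff, and bp_rp ranges accepted by compute_summary_stats_from_Kepler_catalog() can be expressed as a boolean mask over this table. The sketch below builds such a mask from the radius, mass, teff, and bp_rp columns and returns row indices, similar in spirit to i_stars_custom. It assumes inclusive bounds and that the cuts apply to these columns; the package may define either differently, and select_star_indices is a hypothetical name.

```python
import numpy as np


def select_star_indices(stars, Rstar_min=0.0, Rstar_max=10.0, Mstar_min=0.0, Mstar_max=10.0,
                        teff_min=0.0, teff_max=10000.0, bp_rp_min=-5.0, bp_rp_max=5.0):
    """Hypothetical helper: indices of the stars passing all (inclusive) cuts."""
    mask = ((stars["radius"] >= Rstar_min) & (stars["radius"] <= Rstar_max)
            & (stars["mass"] >= Mstar_min) & (stars["mass"] <= Mstar_max)
            & (stars["teff"] >= teff_min) & (stars["teff"] <= teff_max)
            & (stars["bp_rp"] >= bp_rp_min) & (stars["bp_rp"] <= bp_rp_max))
    return np.flatnonzero(mask)


# Toy stellar table (made-up values) with a subset of the documented columns.
stars = np.array([(1.0, 1.0, 5800.0, 0.8), (0.5, 0.5, 3800.0, 2.0), (1.2, 12.0, 6000.0, 0.7)],
                 dtype=[("mass", float), ("radius", float), ("teff", float), ("bp_rp", float)])
print(select_star_indices(stars))                  # the third star fails Rstar_max=10
print(select_star_indices(stars, teff_min=4000.0))  # the second star also fails teff_min
```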

compute_summary_stats_from_Kepler_catalog(P_min, P_max, radii_min, radii_max, Rstar_min=0.0, Rstar_max=10.0, Mstar_min=0.0, Mstar_max=10.0, teff_min=0.0, teff_max=10000.0, bp_rp_min=-5.0, bp_rp_max=5.0, i_stars_custom=None, compute_ratios=<function compute_ratios_adjacent>)

Compute detailed summary statistics per system in the Kepler catalog.

Parameters:
  • P_min (float) – The minimum orbital period (days) to be included.

  • P_max (float) – The maximum orbital period (days) to be included.

  • radii_min (float) – The minimum planet radius (Earth radii) to be included.

  • radii_max (float) – The maximum planet radius (Earth radii) to be included.

  • Rstar_min (float, default=0.) – The minimum stellar radius (solar radii) to be included.

  • Rstar_max (float, default=10.) – The maximum stellar radius (solar radii) to be included.

  • Mstar_min (float, default=0.) – The minimum stellar mass (solar masses) to be included.

  • Mstar_max (float, default=10.) – The maximum stellar mass (solar masses) to be included.

  • teff_min (float, default=0.) – The minimum stellar effective temperature (K) to be included.

  • teff_max (float, default=10000.) – The maximum stellar effective temperature (K) to be included.

  • bp_rp_min (float, default=-5.) – The minimum Gaia DR2 bp-rp color (mag) to be included.

  • bp_rp_max (float, default=5.) – The maximum Gaia DR2 bp-rp color (mag) to be included.

  • i_stars_custom (array[int], default=None) – An array of indices for the stars in the Kepler stellar catalog to be included.

  • compute_ratios (func, default=compute_ratios_adjacent) – The function to use for computing ratios; can be either syssimpyplots.general.compute_ratios_adjacent() or syssimpyplots.general.compute_ratios_all().

Returns:

  • ssk_per_sys (dict) – A dictionary containing the planetary and stellar properties for each observed system (2-d and 1-d arrays).

  • ssk (dict) – A dictionary containing the planetary and stellar properties of all observed planets (1-d arrays).

The fields of ssk_per_sys and ssk are the same as those returned by syssimpyplots.load_sims.compute_summary_stats_from_cat_obs().
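The per-system arrays group observed planets by host star, and the period ratios depend on the compute_ratios choice. The sketch below illustrates both ideas under stated assumptions: rows are padded with zeros (the package's padding convention may differ), and ratios_adjacent and ratios_all are hypothetical stand-ins showing what "adjacent" (consecutive pairs) and "all" (every pair) period ratios mean, not the package's compute_ratios_adjacent() or compute_ratios_all().

```python
import numpy as np


def periods_per_system(kepids, periods, pad=0.0):
    """Hypothetical helper: one row per system, periods sorted, short rows padded with `pad`."""
    kepids = np.asarray(kepids)
    periods = np.asarray(periods, dtype=float)
    ids, counts = np.unique(kepids, return_counts=True)
    P_per_sys = np.full((len(ids), counts.max()), pad)
    for row, kid in enumerate(ids):
        P_per_sys[row, :counts[row]] = np.sort(periods[kepids == kid])
    return ids, P_per_sys


def ratios_adjacent(P_sorted):
    """Ratios of consecutive periods, P[i+1]/P[i]."""
    P_sorted = np.asarray(P_sorted, dtype=float)
    return P_sorted[1:] / P_sorted[:-1]


def ratios_all(P_sorted):
    """Ratios of every pair of periods, P[j]/P[i] for i < j."""
    P_sorted = np.asarray(P_sorted, dtype=float)
    i, j = np.triu_indices(len(P_sorted), k=1)
    return P_sorted[j] / P_sorted[i]


ids, P_per_sys = periods_per_system([5, 3, 5, 5], [10.0, 4.0, 2.5, 5.0])
print(ids, P_per_sys, sep="\n")
print(ratios_adjacent(P_per_sys[1]), ratios_all(P_per_sys[1]))
```

Padding entries must be stripped from a row before computing ratios, otherwise the zeros produce divisions by zero.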

CRPD_dist(En, On)

Compute the Cressie-Read Power Divergence (CRPD) statistic for observed planet multiplicity distributions.

Warning

May return negative values in extreme or edge cases.

Parameters:
  • En (array[int]) – The ‘expected’ (i.e. simulated) numbers of systems with 1, 2, 3, … observed planets.

  • On (array[int]) – The ‘observed’ (i.e. actual Kepler) numbers of systems with 1, 2, 3, … observed planets.

Returns:

rho – The CRPD statistic.

Return type:

float
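For reference, the Cressie-Read power divergence with λ = 2/3 is (9/5) Σ_i O_i [(O_i/E_i)^(2/3) − 1], where 9/5 = 2/(λ(λ+1)). The sketch below assumes both histograms are normalized to unit sum and that bins with zero expected counts are skipped; dropping such bins is one way the statistic can turn negative, consistent with the warning above. The function name crpd_sketch is hypothetical and the package's implementation details may differ.

```python
import numpy as np


def crpd_sketch(En, On):
    """Cressie-Read power divergence (lambda = 2/3) between two multiplicity histograms.

    Assumes both histograms are normalized to unit sum and that bins with zero
    expected counts are skipped; with such bins the result can be negative.
    """
    E = np.asarray(En, dtype=float)
    O = np.asarray(On, dtype=float)
    E = E / E.sum()
    O = O / O.sum()
    keep = E > 0  # skip bins where the expected fraction is zero
    rho = np.sum(O[keep] * ((O[keep] / E[keep]) ** (2.0 / 3.0) - 1.0))
    return (9.0 / 5.0) * rho  # 9/5 = 2 / (lambda * (lambda + 1)) for lambda = 2/3


print(crpd_sketch([100, 20, 5], [100, 20, 5]))  # identical histograms
print(crpd_sketch([100, 20, 5], [90, 25, 10]))  # different histograms
print(crpd_sketch([1, 0], [1, 1]))              # edge case: observed systems in a bin with zero expected
```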

KS_dist_mult(x1, x2)

Compute the two-sample Kolmogorov-Smirnov (KS) distance between two discrete distributions taking on integer values.

Parameters:
  • x1 (array[int]) – A sample of integers.

  • x2 (array[int]) – A sample of integers.

Returns:

  • KS_dist (float) – The KS distance between the two distributions (i.e. the greatest distance between the cumulative distributions).

  • KS_x (float) – The x-value that corresponds to the greatest distance between the two cumulative distributions.
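For integer-valued samples such as planet multiplicities, both cumulative distributions can be evaluated on every integer between the smallest and largest value, and the KS distance is the largest absolute difference. A minimal sketch of that idea; ks_dist_mult_sketch is a hypothetical name, not the package's code.

```python
import numpy as np


def ks_dist_mult_sketch(x1, x2):
    """Two-sample KS distance for integer samples; returns (distance, x at the maximum)."""
    x1 = np.asarray(x1, dtype=int)
    x2 = np.asarray(x2, dtype=int)
    lo = min(x1.min(), x2.min())
    hi = max(x1.max(), x2.max())
    size = hi - lo + 1
    # Empirical CDFs on the integers lo, lo+1, ..., hi (shifted so bincount sees values >= 0).
    cdf1 = np.cumsum(np.bincount(x1 - lo, minlength=size)) / len(x1)
    cdf2 = np.cumsum(np.bincount(x2 - lo, minlength=size)) / len(x2)
    diff = np.abs(cdf1 - cdf2)
    k = int(np.argmax(diff))
    return float(diff[k]), int(lo + k)


print(ks_dist_mult_sketch([1, 1, 2], [1, 2, 2]))
```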

KS_dist(x1, x2)

Compute the two-sample Kolmogorov-Smirnov (KS) distance between two continuous distributions (i.e. no repeat values).

Parameters:
  • x1 (array[float]) – A sample of real values.

  • x2 (array[float]) – A sample of real values.

Returns:

  • KS_dist (float) – The KS distance between the two distributions (i.e. the greatest distance between the cumulative distributions).

  • KS_x (float) – The x-value that corresponds to the greatest distance between the two cumulative distributions.
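For continuous samples, the maximum gap between the two empirical CDFs occurs at one of the data points, so it suffices to evaluate both CDFs at every point of the pooled sample. A minimal sketch; ks_dist_sketch is a hypothetical name. If SciPy is available, scipy.stats.ks_2samp(x1, x2).statistic should give the same distance.

```python
import numpy as np


def ks_dist_sketch(x1, x2):
    """Two-sample KS distance; returns (distance, x at the maximum)."""
    x1 = np.sort(np.asarray(x1, dtype=float))
    x2 = np.sort(np.asarray(x2, dtype=float))
    z = np.concatenate([x1, x2])  # evaluate both empirical CDFs at every data point
    cdf1 = np.searchsorted(x1, z, side="right") / len(x1)
    cdf2 = np.searchsorted(x2, z, side="right") / len(x2)
    diff = np.abs(cdf1 - cdf2)
    k = int(np.argmax(diff))
    return float(diff[k]), float(z[k])


print(ks_dist_sketch([0.1, 0.2, 0.3], [0.4, 0.5, 0.6]))
```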

AD_dist(x1, x2)

Compute the two-sample Anderson-Darling (AD) distance between two continuous distributions.

Implements Equation 1.2 of A. N. Pettitt (1976).

Note

Returns np.inf if either x1 or x2 has fewer than two points, which is not enough to compute the AD distance.

Parameters:
  • x1 (array[float]) – A sample of real values.

  • x2 (array[float]) – A sample of real values.

Returns:

AD_dist – The AD distance between the two distributions.

Return type:

float
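One common way to write the two-sample AD statistic uses ranks: with n = len(x1), m = len(x2), N = n + m, and M_i the number of x1 values among the i smallest points of the pooled sample, A² = (1/(n m)) Σ_{i=1}^{N−1} (N M_i − n i)² / (i (N − i)). The sketch below evaluates that formula assuming no tied values; ad_dist_pettitt is a hypothetical name, and the package's handling of ties may differ.

```python
import numpy as np


def ad_dist_pettitt(x1, x2):
    """Two-sample AD distance from the rank formula (no ties assumed)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n, m = len(x1), len(x2)
    if n < 2 or m < 2:
        return np.inf  # not enough points in one of the samples
    N = n + m
    z = np.concatenate([x1, x2])
    from_x1 = np.concatenate([np.ones(n, dtype=bool), np.zeros(m, dtype=bool)])
    order = np.argsort(z, kind="mergesort")
    # M_i = number of x1 points among the i smallest pooled points, for i = 1, ..., N-1.
    M = np.cumsum(from_x1[order])[:-1]
    i = np.arange(1, N)
    return float(np.sum((N * M - n * i) ** 2 / (i * (N - i))) / (n * m))


print(ad_dist_pettitt([1.0, 3.0], [2.0, 4.0]))
```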

AD_dist2(x1, x2)

Compute the two-sample Anderson-Darling (AD) distance between two continuous distributions.

Implements Equation 3 of Scholz & Stephens (1987). Tested to be equivalent to syssimpyplots.compare_kepler.AD_dist().

Note

Returns np.inf if either x1 or x2 has fewer than two points, which is not enough to compute the AD distance.

Parameters:
  • x1 (array[float]) – A sample of real values.

  • x2 (array[float]) – A sample of real values.

Returns:

AD_dist – The AD distance between the two distributions.

Return type:

float
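For continuous data without ties, Scholz & Stephens write the k-sample statistic as A²_kN = (1/N) Σ_i (1/n_i) Σ_{j=1}^{N−1} (N M_ij − j n_i)² / (j (N − j)), where M_ij counts the points of sample i that are ≤ the j-th smallest pooled value. For two samples M_2j = j − M_1j, so both inner sums are equal and the prefactor (1/N)(1/n_1 + 1/n_2) reduces to 1/(n_1 n_2), which is why this agrees with the rank form of Pettitt's statistic. A sketch of the two-sample case, assuming no ties; ad_dist_scholz_stephens is a hypothetical name.

```python
import numpy as np


def ad_dist_scholz_stephens(x1, x2):
    """Two-sample AD distance via the k-sample rank formula (no ties assumed)."""
    samples = [np.sort(np.asarray(x1, dtype=float)), np.sort(np.asarray(x2, dtype=float))]
    if any(len(s) < 2 for s in samples):
        return np.inf  # not enough points in one of the samples
    z = np.sort(np.concatenate(samples))
    N = len(z)
    j = np.arange(1, N)  # j = 1, ..., N-1
    total = 0.0
    for s in samples:
        n_i = len(s)
        # M_ij: number of points in sample i that are <= the j-th smallest pooled value.
        M = np.searchsorted(s, z[:-1], side="right")
        total += np.sum((N * M - j * n_i) ** 2 / (j * (N - j))) / n_i
    return float(total / N)


print(ad_dist_scholz_stephens([1.0, 2.0, 4.0], [3.0, 5.0]))
```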

AD_mod_dist(x1, x2)

Compute a modified version of the two-sample Anderson-Darling (AD) distance between two continuous distributions.

Equivalent to the AD distance (implemented by syssimpyplots.compare_kepler.AD_dist() and syssimpyplots.compare_kepler.AD_dist2()) without the factor of ‘n*m/N’ in front of the integral, where ‘n’ and ‘m’ are the sample sizes (and ‘N=n+m’ is the combined sample size).

Note

Returns np.inf if either x1 or x2 has fewer than two points, which is not enough to compute the AD distance.

Parameters:
  • x1 (array[float]) – A sample of real values.

  • x2 (array[float]) – A sample of real values.

Returns:

AD_dist – The (modified) AD distance between the two distributions.

Return type:

float
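In integral form the AD distance is (n m/N) ∫ (F_n − G_m)² / (H_N (1 − H_N)) dH_N, where F_n, G_m, and H_N are the empirical CDFs of x1, x2, and the pooled sample; the modified version keeps only the integral. The sketch below evaluates that integral as a sum over the first N − 1 pooled points (H_N = 1 at the last point), assuming no ties; ad_mod_dist_sketch is a hypothetical name.

```python
import numpy as np


def ad_mod_dist_sketch(x1, x2):
    """Modified two-sample AD distance: the AD integral without the n*m/N prefactor."""
    x1 = np.sort(np.asarray(x1, dtype=float))
    x2 = np.sort(np.asarray(x2, dtype=float))
    n, m = len(x1), len(x2)
    if n < 2 or m < 2:
        return np.inf  # not enough points in one of the samples
    N = n + m
    zj = np.sort(np.concatenate([x1, x2]))[:-1]  # drop the largest point, where H_N = 1
    F = np.searchsorted(x1, zj, side="right") / n  # empirical CDF of x1
    G = np.searchsorted(x2, zj, side="right") / m  # empirical CDF of x2
    H = np.arange(1, N) / N                        # pooled empirical CDF (no ties)
    return float(np.sum((F - G) ** 2 / (H * (1.0 - H))) / N)  # dH_N = 1/N per point


print(ad_mod_dist_sketch([1.0, 2.0, 4.0], [3.0, 5.0]))
```

Multiplying the result by n*m/N recovers the standard AD distance.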

load_split_stars_model_evaluations_and_weights(file_name)

Load a file containing the distances from many evaluations of the same model, and compute the weights for each distance term.

Parameters:

file_name (str) – The path/name of the file containing the distances of many model evaluations.

Returns:

  • Nmult_evals (dict) – A dictionary containing an array of observed planet multiplicity distributions for each model evaluation, for each stellar sample (all, bluer, and redder fields).

  • d_all_keys_evals (dict) – A dictionary containing an array of distance term names (strings) for each model evaluation, for each stellar sample.

  • d_all_vals_evals (dict) – A dictionary containing an array of distances (corresponding to the distance term names) for each model evaluation, for each stellar sample.

  • weights_all (dict) – A dictionary containing a dictionary for the weights corresponding to each distance term, for each stellar sample.

Note

The bluer and redder samples split the stellar sample into two equal-sized samples of stars below and above the median Gaia DR2 bp-rp color, respectively.

Warning

Currently returns empty arrays in the Nmult_evals dictionary.
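The median split described in the note above can be sketched as follows. This is an illustration only: split_by_median_color is a hypothetical name, and whether the package splits on the raw bp-rp color or an extinction-corrected one is not specified here.

```python
import numpy as np


def split_by_median_color(bp_rp):
    """Hypothetical helper: indices of stars bluer / redder than the median bp-rp color."""
    bp_rp = np.asarray(bp_rp, dtype=float)
    median = np.median(bp_rp)
    i_bluer = np.flatnonzero(bp_rp < median)    # smaller bp-rp means bluer
    i_redder = np.flatnonzero(bp_rp >= median)
    return i_bluer, i_redder


i_bluer, i_redder = split_by_median_color([0.5, 1.2, 0.8, 1.5])
print(i_bluer, i_redder)
```

With an odd number of stars, or with ties at the median, the two halves cannot be exactly equal, and which side receives the median star is a choice.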

load_split_stars_weights_only()

Compute the weights for each distance term.

Wrapper to return just the weights from the function syssimpyplots.compare_kepler.load_split_stars_model_evaluations_and_weights().

compute_total_weighted_dist(weights, dists, dists_w, dists_include=[])

Compute the total weighted distance including a number of distance terms.

Also prints out the individual distance terms, their weights, and their unweighted and weighted distances.

Parameters:
  • weights (dict) – The dictionary containing the weights for each distance term.

  • dists (dict) – The dictionary containing the individual distance terms.

  • dists_w (dict) – The dictionary containing the individual weighted distance terms.

  • dists_include (list[str], default=[]) – The list of distance terms (strings) to include in the sum.

Returns:

tot_dist_w – The total weighted distance of the included distance terms.

Return type:

float
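A minimal sketch of the summation and printing described above, assuming the weighted terms are precomputed as weight × distance. The helper name total_weighted_dist_sketch and the term names "delta_f" and "periods_KS" are hypothetical.

```python
def total_weighted_dist_sketch(weights, dists, dists_w, dists_include=()):
    """Hypothetical helper: sum dists_w over dists_include and print each term."""
    tot_dist_w = 0.0
    for key in dists_include:
        print(f"{key:<16} weight={weights[key]:<8.3g} dist={dists[key]:<8.3g} weighted={dists_w[key]:.3g}")
        tot_dist_w += dists_w[key]
    print(f"Total weighted distance: {tot_dist_w:.3g}")
    return tot_dist_w


weights = {"delta_f": 2.0, "periods_KS": 0.5}
dists = {"delta_f": 1.0, "periods_KS": 4.0}
dists_w = {key: weights[key] * dists[key] for key in dists}  # assumption: weighted = weight * distance
total_weighted_dist_sketch(weights, dists, dists_w, ["delta_f", "periods_KS"])
```

The sketch uses an empty tuple as the default for dists_include, which avoids Python's shared-mutable-default pitfall of a list default.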

compute_distances_sim_Kepler(sss_per_sys, sss, ssk_per_sys, ssk, weights, dists_include, N_sim, cos_factor=1.0, AD_mod=True, print_dists=True)

Compute weighted and unweighted distances for a large collection of distance terms.

Parameters:
  • sss_per_sys (dict) – A dictionary of summary statistics per observed system in a simulated observed catalog.

  • sss (dict) – A dictionary of summary statistics for all observed planets in a simulated observed catalog.

  • ssk_per_sys (dict) – A dictionary of summary statistics per observed system in the Kepler catalog.

  • ssk (dict) – A dictionary of summary statistics for all observed planets in the Kepler catalog.

  • weights (dict) – A dictionary of the weights corresponding to each distance term.

  • dists_include (list[str]) – A list of distance terms (strings) to be printed.

  • N_sim (int) – The number of target stars (i.e. simulated systems) in the simulated catalog.

  • cos_factor (float, default=1.) – The cosine of the maximum inclination angle (relative to the sky plane) drawn for the reference planes of the simulated systems (between 0 and 1).

  • AD_mod (bool, default=True) – Whether to compute the modified AD distance (syssimpyplots.compare_kepler.AD_mod_dist(), if True) or the standard AD distance (syssimpyplots.compare_kepler.AD_dist(), if False).

  • print_dists (bool, default=True) – Whether to print the distances corresponding to the terms in dists_include. If True, also prints the total numbers of observed planets and planet pairs in the simulated and Kepler catalogs.

Returns:

  • dists (dict) – A dictionary containing all the various (unweighted) distance terms.

  • dists_w (dict) – A dictionary containing all the various (weighted) distance terms.

Note

The distance terms computed and included in dists and dists_w are not limited to the terms in dists_include.
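The note above means every term is computed, while dists_include only controls what is printed. A generic sketch of that structure, in which each hypothetical term name maps to a summary-statistic key and a distance function. The toy mean/median distances stand in for the KS, AD, and CRPD terms, and the assumption that a weighted term equals weight × distance is labeled in the code. None of these names belong to the package.

```python
import numpy as np


def compute_dists_sketch(stats_sim, stats_kep, dist_terms, weights, dists_include, print_dists=True):
    """Hypothetical helper: compute every term in dist_terms; print only those in dists_include."""
    dists, dists_w = {}, {}
    for term, (stat_key, func) in dist_terms.items():
        dists[term] = float(func(stats_sim[stat_key], stats_kep[stat_key]))
        dists_w[term] = weights[term] * dists[term]  # assumption: weighted = weight * distance
    if print_dists:
        for term in dists_include:
            print(f"{term}: dist={dists[term]:.4g}, weighted={dists_w[term]:.4g}")
    return dists, dists_w


def mean_abs_diff(a, b):
    """Toy distance: absolute difference of the sample means."""
    return abs(np.mean(a) - np.mean(b))


def median_abs_diff(a, b):
    """Toy distance: absolute difference of the sample medians."""
    return abs(np.median(a) - np.median(b))


stats_sim = {"P": [1.0, 2.0, 3.0]}
stats_kep = {"P": [1.0, 2.0, 4.0]}
dist_terms = {"periods_mean": ("P", mean_abs_diff), "periods_median": ("P", median_abs_diff)}
weights = {"periods_mean": 3.0, "periods_median": 1.0}
dists, dists_w = compute_dists_sketch(stats_sim, stats_kep, dist_terms, weights, ["periods_mean"])
```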