Functional Programming: Maximizing the Efficacy of Python Functions#

In this workbook, we’ll take our knowledge of python functions to the next level. In particular, we’ll talk about

  • Type hinting function definitions and properly documenting

  • Unit testing

Documentation and Type Hinting#

A typical Python function we are used to writing may look something like this:

import numpy as np 
import matplotlib.pyplot as plt
def centroid_2D(pixels,x_loc,y_loc,x_win,y_win):
    x_window = np.arange(x_loc-int(np.floor(x_win/2)),x_loc+int(np.floor(x_win/2))) 
    y_window = np.arange(y_loc-int(np.floor(y_win/2)),y_loc+int(np.floor(y_win/2)))
    x,y = np.meshgrid(x_window,y_window)
    crop=pixels[y_loc-int(np.floor(y_win/2)):y_loc+int(np.floor(y_win/2)),x_loc-int(np.floor(x_win/2)):x_loc+int(np.floor(x_win/2))]
    x_cen = np.sum(crop*x)/np.sum(crop) 
    y_cen = np.sum(crop*y)/np.sum(crop) 
    return x_cen,y_cen 

We can see what our function does by creating some fake data:

from astropy.modeling.functional_models import Gaussian2D
def eval_gauss(x_arr,y_arr,sigma_x,sigma_y,mu_x,mu_y):
    
    g = Gaussian2D.evaluate(x=x_arr,y=y_arr,amplitude=1,theta=0,x_mean=mu_x,
                   y_mean=mu_y,
                   x_stddev=sigma_x,
                   y_stddev=sigma_y)
    g/=np.sum(g)
    return g
xx, yy = np.meshgrid(np.arange(20),
                     np.arange(20))
model = eval_gauss(x_arr=xx,y_arr=yy,sigma_x=3,sigma_y=3,mu_x=9.75,mu_y=10.75)
fig, ax = plt.subplots(figsize=(8,8))
ax.imshow(model,origin='lower')
<matplotlib.image.AxesImage at 0x7f7e68b81f00>
../../_images/c14d56680e7f2072bb99e19b25bc5738ae74106320e9eda517f28fc75bfa3004.png

Above I have created a 2D gaussian, which might be a reasonable approximation of some star or source we wish to centroid. Let’s run our function on it:

x_cen, y_cen = centroid_2D(model,x_loc=10,y_loc=11,x_win=10,y_win=10)
print(x_cen,y_cen)
9.658314583634262 10.658314583634263

Plotting over our data (and comparing the values to what we entered in), this seems to have done a good job locating the center!

fig,ax=plt.subplots(figsize=(8,8)) 
ax.imshow(model,origin='lower')
ax.axhline(y_cen,color='r')
ax.axvline(x_cen,color='r')
<matplotlib.lines.Line2D at 0x7f7e699a4be0>
../../_images/9c3b258cbc7d4729c4f0ea87070c64562ef9b79abcbb220ff384a69e1541b535.png

Okay, all of that was an aside, because the focus here is on type hinting.

Looking at our function again, it might not be clear to us exactly what the functional arguments are asking for, or how they are being used. We can help add clarity here via some docstrings and type hints:

def centroid_2D(pixels:np.array,
                x_loc:int,
                y_loc:int,
                x_win:int = None,
                y_win:int = None):
    """
    Function to calculate the 2D centroid of a pixel distribution. 
    
    Parameters
    ----------
    pixels: np.array_like
        the image to centroid, containing some peaked source.
    x_loc: int
        Approximate x-location of the peak (i.e., a peak pixel). 
    y_loc: int
        Approximate y-location of the peak. 
    x_win: int, optional
        Window size to use when centroiding x. If None, will use the full input image width, default None.
    y_win: int, optional
        Window size to use when centroiding y. If None, will use the full input image height, default None.
    
    Returns
    -------
    x_cen,y_cen -> np.float, np.float
        the x and y centroid
    """
    if x_win is None:
        x_win = pixels.shape[1]
    if y_win is None:
        y_win = pixels.shape[0]
    x_window = np.arange(x_loc-int(np.floor(x_win/2)),x_loc+int(np.floor(x_win/2))) 
    y_window = np.arange(y_loc-int(np.floor(y_win/2)),y_loc+int(np.floor(y_win/2)))
    x,y = np.meshgrid(x_window,y_window)
    crop=pixels[y_loc-int(np.floor(y_win/2)):y_loc+int(np.floor(y_win/2)),x_loc-int(np.floor(x_win/2)):x_loc+int(np.floor(x_win/2))]
    x_cen = np.sum(crop*x)/np.sum(crop) 
    y_cen = np.sum(crop*y)/np.sum(crop) 
    return x_cen,y_cen 

Several things about our function have improved in this iteration. First, we have added type hinting to the function arguments. By adding colons and the expexted data type of that variable, we clue the user (including ourselves, later) in to what types of input this function needs. For example, it is useful to know that x_loc and y_loc must be integer inputs.

Quick Exercise

Why will the above function fail if a non integer is input for x_loc?

Additionally, we have utilized the use of optional keyword arguments, which both have type hints and default values. If you look inside the function, the first few lines now handle what happens when these values are None. And finally, we have added properly formatted documentation, which defines the parameters of the function, descriptions of them, and describes what the function outputs. This lets us now run, e.g., the help() function on it:

help(centroid_2D)
Help on function centroid_2D in module __main__:

centroid_2D(pixels: <built-in function array>, x_loc: int, y_loc: int, x_win: int = None, y_win: int = None)
    Function to calculate the 2D centroid of a pixel distribution. 
    
    Parameters
    ----------
    pixels: np.array_like
        the image to centroid, containing some peaked source.
    x_loc: int
        Approximate x-location of the peak (i.e., a peak pixel). 
    y_loc: int
        Approximate y-location of the peak. 
    x_win: int, optional
        Window size to use when centroiding x. If None, will use the full input image width, default None.
    y_win: int, optional
        Window size to use when centroiding y. If None, will use the full input image height, default None.
    
    Returns
    -------
    x_cen,y_cen -> np.float, np.float
        the x and y centroid

Additionally, if you are using a fully featured editor like VSCode, when you type out this function, or hover over its name, the type hints for the inputs will pop up on screen, making it easy to ensure you are using a function as intended.

Multiple Allowed Types#

What if our function can handle both, e.g., a python list or a numpy array? For that, we can import the typing library and utilize the Union operator:

from typing import Union 

def my_func(input_var:Union[list,np.array]):
    pass

The Union operator, attached to the allowed types, says “the input must be a list OR a numpy array”.

Warning

Note that Python, by default, does not check or confirm that inputs to a function match the types listed in the type hinting. However, there are external packages, like mypy, that can carry out such an assertion.

If we have a common combination of inputs allowed, we can also define a custom type for them:

import pandas as pd 
import astropy
import numpy

keyed_data = Union[pd.DataFrame,dict,astropy.table.Table,numpy.recarray] 

def my_func(input_var: keyed_data):
    pass

In this case, we have defined keyed_data as the union of these several different data storage types which allow access-by-attribute (dict-like access). We then use this in our function definition.