In this tutorial we will load some SDSS data (mass, metallicity, and star formation rate) for a large sample of galaxies. We will then plot the mass-metallicity relation and explore it in a bit more depth.

You can work on this tutorial in any way you like: you can copy and paste from this HTML page into your own IPython notebook (or just a Python script in a plaintext editor) and then make changes, you can follow along working from scratch, or you can download the .ipynb version of this document from the Tutorials page and work directly in it.

Let's go ahead and get started with our import statements:

In [1]:

```
import numpy as np
import matplotlib.pyplot as plt
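# pyfits is deprecated; astropy.io.fits is its successor and is used the same way here
from astropy.io import fits as pf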
from matplotlib.colors import LogNorm
%matplotlib inline
```

In [3]:

```
def load_fits(fname):
    hdu = pf.open(fname)[1]  # open the FITS file and select extension 1
    data = hdu.data          # access the data table
    return data

sfr_full = load_fits('gal_totsfr_dr7_v5_2.fits')
mass_full = load_fits('totlgm_dr7_v5_2b.fit')
z_full = load_fits('gal_fiboh_dr7_v5_2.fits')
```

In [11]:

```
sfr_full.columns
```

Out[11]:

As we can see, we have columns like "Avg", "Entropy", "Median", "Flag", etc. For the purposes of this tutorial, we are interested in the "Avg" column. As in most real datasets, not all of the galaxies or entries in these files are usable. Generally the pipeline that creates the dataset will "flag" bad data in a way that is easy to remove programmatically. Sometimes this is done with a "flag" column, and other times it is done by entering an arbitrary, non-physical number as the value (this is how the SDSS data is handled). The values of the flags can usually be found in a readme file.

So, we need to restrict our data to just those entries that don't have any warning flags. In the cell below (or in your code), find the indices (locations) for which all of the following conditions are satisfied, and save them to a variable called "restrictions":

- In the SFR array, the value is > -99
- In the Mass array, the value is not equal to -1
- In the metallicity array, the value is > -99.9

Hint: Do not use a for loop to iterate over the arrays and check the conditions. There is a much faster and more efficient method.

In [12]:

```
restrictions = np.where((sfr_full['AVG'] > -99) & (mass_full['AVG'] != -1) & (z_full['AVG'] > -99.9))[0]
```
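If the one-liner above feels dense, here is a small sketch of the same idea on toy arrays. The sentinel values (-99, -1, -99.9) are the ones listed above; the other numbers and the `*_demo` names are made up purely for illustration and are not SDSS data.

```python
import numpy as np

# Toy stand-ins for the three 'AVG' columns (hypothetical values, not SDSS data).
sfr_demo = np.array([0.3, -99.0, 1.2, -0.5, 0.8])
mass_demo = np.array([10.1, 9.7, -1.0, 10.8, 11.0])
z_demo = np.array([8.9, 8.7, 9.0, -99.9, 9.1])

# Each comparison returns a boolean array; & combines them element by element.
good = (sfr_demo > -99) & (mass_demo != -1) & (z_demo > -99.9)

# np.where(mask)[0] converts the boolean mask into integer indices.
indices = np.where(good)[0]
print(indices)  # [0 4]

# Indexing with the mask or with the indices selects the same entries.
print(sfr_demo[good])
print(sfr_demo[indices])
```

Only the first and last entries survive, because each of the middle three trips exactly one of the conditions.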

In [14]:

```
print(len(sfr_full['AVG']) - len(restrictions))  # number of flagged entries removed
```

In [15]:

```
sfr = np.array(sfr_full[restrictions])
mass = np.array(mass_full[restrictions])
z = np.array(z_full[restrictions])
```

In [19]:

```
sfr
```

Out[19]:

In [24]:

```
sfrs = sfr['AVG']
masses = mass['AVG']
metallicities = z['AVG']
```

In [18]:

```
print(sfrs)
```

Out[18]:

In [22]:

```
def plot_mass_vs_metal(masses, metallicities):
    # Plot mass against metallicity as a 2D histogram with log-scaled counts.
    plt.hist2d(masses, metallicities, bins=300, norm=LogNorm())
    plt.colorbar()
    plt.title('Mass/Metallicity relation for SDSS Galaxies')
    plt.xlabel(r'log Mass [$M_\odot$]')
    plt.ylabel(r'log Gas Phase Metallicities')
    plt.show()
```

In [25]:

```
plot_mass_vs_metal(masses,metallicities)
```

In [26]:

```
def plot_sfr_metal(sfrs, metallicities):
    # Plot SFR against metallicity as a 2D histogram with log-scaled counts.
    plt.hist2d(sfrs, metallicities, bins=300, norm=LogNorm())
    plt.title('SFR/Metallicity relation for SDSS Galaxies')
    plt.colorbar()
    plt.xlabel(r'log SFR')
    plt.ylabel(r'log Gas Phase Metallicities')
    plt.show()

def plot_mass_sfr(masses=masses, sfrs=sfrs):
    # Plot mass against SFR as a 2D histogram with log-scaled counts.
    plt.hist2d(masses, sfrs, bins=300, norm=LogNorm())
    plt.title('Mass/SFR relation for SDSS Galaxies')
    plt.colorbar()
    plt.xlabel(r'log Mass')
    plt.ylabel(r'log SFR')
    plt.show()
```

In [27]:

```
plot_sfr_metal(sfrs,metallicities)
plot_mass_sfr()
```

Oh dear. It seems like SFR correlates positively with mass, and metallicity correlates positively with SFR. We need to tease out which of these things are actually correlated, and which only look correlated because they depend on something else which is correlated.

One way we can do this is to take slices of one variable. For example, if metallicity and SFR each depend only on mass, then among galaxies of a single mass the correlation between metallicity and SFR should disappear. We can run this check for all three of our variables, taking slices at a single SFR, metallicity, and mass, and see for which one the positive correlation between the other two disappears.
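To see why slicing works, here is a minimal sketch on synthetic data (not SDSS data). I am assuming a hidden driver `x` and two quantities `y1` and `y2` that each depend on `x` plus independent noise; the slopes, noise levels, and names are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Synthetic data: y1 and y2 both depend on x, but not directly on each other.
x = rng.uniform(8.0, 12.0, n)
y1 = 0.8 * x + rng.normal(0.0, 0.3, n)
y2 = 0.5 * x + rng.normal(0.0, 0.3, n)

# Over the full sample, y1 and y2 look strongly correlated.
r_all = np.corrcoef(y1, y2)[0, 1]

# Inside a narrow slice of x, that apparent correlation mostly goes away.
in_slice = (x > 10.0) & (x < 10.1)
r_slice = np.corrcoef(y1[in_slice], y2[in_slice])[0, 1]

print("Pearson r, all points:  ", r_all)
print("Pearson r, narrow slice:", r_slice)
```

The full-sample coefficient comes out large while the in-slice coefficient sits near zero, which is the signature we will look for in the real data.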

In the space below, write three functions that bin your data into mass, metallicity, and SFR slices. In theory we would select multiple single slices (choosing a specific value), but in practice our bins have to have a certain width. To make things easier, I have looked at the data and created a bins array for each function. See if you can figure out how it works (look at the bounds on the graphs above and the behavior of linspace).
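As a quick sketch of what a call like `np.linspace(7, 12, 10)` produces: it returns 10 evenly spaced bin edges, which bound 9 bins. The `demo_masses` values below are made up, and `np.digitize` is shown only as one alternative way to assign bins. It uses half-open intervals, so values lying exactly on an edge are treated slightly differently than with strict `>` and `<` comparisons.

```python
import numpy as np

edges = np.linspace(7, 12, 10)  # 10 evenly spaced edges from 7 to 12 inclusive
n_bins = len(edges) - 1         # 10 edges bound 9 bins
width = edges[1] - edges[0]     # (12 - 7) / 9, roughly 0.556

print(edges)
print(n_bins, width)

# Hypothetical log-mass values, just to show which bin each one falls in.
demo_masses = np.array([7.2, 9.9, 10.0, 11.9])

# np.digitize returns i such that edges[i-1] <= value < edges[i];
# subtracting 1 gives a zero-based bin number.
bin_number = np.digitize(demo_masses, edges) - 1
print(bin_number)  # [0 5 5 8]
```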

In [34]:

```
# Note: each function also reads the global arrays (masses, sfrs, metallicities)
# defined earlier, so those cells must be run first.
def mass_bins(masses):
    bins = np.linspace(7, 12, 10)
    binned_masses = []
    binned_sfrs = []
    binned_z = []
    for i in range(len(bins) - 1):
        # check "where" masses are > left edge of bin and < right edge of bin
        mass_indices = np.where((masses > bins[i]) & (masses < bins[i+1]))[0]
        mass_indices = np.array(mass_indices)
        masses_needed = masses[mass_indices]  # index masses for the indices found above
        sfr_needed = sfrs[mass_indices]
        z_needed = metallicities[mass_indices]
        binned_masses.append(masses_needed)
        binned_sfrs.append(sfr_needed)
        binned_z.append(z_needed)
    return binned_masses, binned_sfrs, binned_z

def sfr_bins(sfrs):
    bins = np.linspace(-2, 2, 10)
    binned_masses = []
    binned_sfrs = []
    binned_z = []
    for i in range(len(bins) - 1):
        to_choose = np.where((sfrs > bins[i]) & (sfrs < bins[i+1]))[0]
        to_choose = np.array(to_choose)
        masses_needed = masses[to_choose]
        sfr_needed = sfrs[to_choose]
        z_needed = metallicities[to_choose]
        binned_masses.append(masses_needed)
        binned_sfrs.append(sfr_needed)
        binned_z.append(z_needed)
    return binned_masses, binned_sfrs, binned_z

def z_bins(metallicities):
    bins = np.linspace(8, 9.5, 10)
    binned_masses = []
    binned_sfrs = []
    binned_z = []
    for i in range(len(bins) - 1):
        to_choose = np.where((metallicities > bins[i]) & (metallicities < bins[i+1]))[0]
        to_choose = np.array(to_choose)
        masses_needed = masses[to_choose]
        sfr_needed = sfrs[to_choose]
        z_needed = metallicities[to_choose]
        binned_masses.append(masses_needed)
        binned_sfrs.append(sfr_needed)
        binned_z.append(z_needed)
    return binned_masses, binned_sfrs, binned_z
```

In [35]:

```
def plot_mbins(masses=masses):
    # One SFR vs. metallicity histogram per mass bin.
    m, s, z = mass_bins(masses)
    for i in range(len(m)):
        plt.figure()  # start a new figure for each bin (avoids a trailing empty figure)
        plt.hist2d(s[i], z[i], bins=100, norm=LogNorm())
        plt.xlabel('Log SFR')
        plt.ylabel('Log Metallicity')
        plt.colorbar()
    plt.show()

def plot_sfrbins(sfrs=sfrs):
    # One mass vs. metallicity histogram per SFR bin.
    m, s, z = sfr_bins(sfrs)
    for i in range(len(m)):
        plt.figure()
        plt.hist2d(m[i], z[i], bins=100, norm=LogNorm())
        plt.xlabel('Log Mass')
        plt.ylabel('Log Metallicity')
        plt.colorbar()
    plt.show()

def plot_zbins(metallicities=metallicities):
    # One mass vs. SFR histogram per metallicity bin.
    m, s, z = z_bins(metallicities)
    for i in range(len(m)):
        plt.figure()
        plt.hist2d(m[i], s[i], bins=100, norm=LogNorm())
        plt.xlabel('Log Mass')
        plt.ylabel('Log SFR')
        plt.colorbar()
    plt.show()
```

As a final step, let's run our functions below and see what we get:

In [37]:

```
plot_mbins()
```

In [38]:

```
plot_sfrbins()
```

In [39]:

```
plot_zbins()
```

As we can see, the relationship between SFR and metallicity disappears within a single mass slice (the first set of plots, one per mass bin). Thus, SFR and metallicity are not truly correlated; they only appear so when all masses are included, because each depends on mass. This can be seen in the second and third sets of plots: within single slices in SFR, the mass/metallicity relation still exists, and within single slices in metallicity, the SFR/mass relation still exists.
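If you would rather put a number on "disappears" than judge it by eye, one option is to compute the Pearson correlation coefficient inside each bin. The helper below, `binned_pearson`, is a hypothetical function of my own and not part of the tutorial's code. It is exercised here on synthetic data (invented slopes and noise), so treat it as a sketch rather than a finished analysis.

```python
import numpy as np

def binned_pearson(control, a, b, edges, min_count=50):
    """Pearson r between a and b inside each bin of control.

    Bins holding fewer than min_count points are left as NaN.
    """
    r_values = np.full(len(edges) - 1, np.nan)
    for i in range(len(edges) - 1):
        in_bin = (control > edges[i]) & (control < edges[i + 1])
        if in_bin.sum() >= min_count:
            r_values[i] = np.corrcoef(a[in_bin], b[in_bin])[0, 1]
    return r_values

# Synthetic check: a and b both depend on control, but not on each other.
rng = np.random.default_rng(1)
control = rng.uniform(7.0, 12.0, 50000)
a = 0.8 * control + rng.normal(0.0, 0.3, control.size)
b = 0.5 * control + rng.normal(0.0, 0.3, control.size)

r_full = np.corrcoef(a, b)[0, 1]
r_per_bin = binned_pearson(control, a, b, np.linspace(7, 12, 10))

print("full sample:", r_full)
print("per bin:    ", r_per_bin)
```

With the arrays from this tutorial, a call along the lines of `binned_pearson(masses, sfrs, metallicities, np.linspace(7, 12, 10))` would be the analogous check. Because the bins have finite width, expect a small residual correlation inside each bin rather than exactly zero.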

Congrats! You made it to the end of the tutorial. I hope you enjoyed it, practiced a little Python, and learned something about galaxy properties. As always, feel free to contact me (post an issue on the GitHub repository, http://github.com/prappleizer/prappleizer.github.io ) if anything was messed up, confusing, or poorly explained.
