Skip to content

Writing Functions in Python

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

# Add your code snippets heredef count_letter(content, letter):
  """Count the number of times `letter` appears in `content`.

  Args:
    content (str): The string to search.
    letter (str): The letter to search for.

  Returns:
    int

  # Add a section detailing what errors might be raised
  Raises:
    ValueError: If `letter` is not a one-character string.
  """
  if (not isinstance(letter, str)) or len(letter) != 1:
    raise ValueError('`letter` must be a single character string.')
  return len([char for char in content if char == letter])

his IDE is going to be an incredibly delightful experience for your users now! Notice how the count_letter.doc version of the docstring had strange whitespace at the beginning of all but the first line. That's because the docstring is indented to line up visually when reading the code. But when we want to print the docstring, removing those leading spaces with inspect.getdoc() will look much better.

import inspect

def build_tooltip(function):
  """Create a tooltip for any function that shows the
  function's docstring.

  Args:
    function (callable): The function we want a tooltip for.

  Returns:
    str
  """
  # Get the docstring for the "function" argument by using inspect
  docstring = inspect.getdoc(function)
  border = '#' * 28
  return '{}\n{}\n{}'.format(border, docstring, border)

print(build_tooltip(count_letter))
print(build_tooltip(range))
print(build_tooltip(print))

While you were developing a model to predict the likelihood of a student graduating from college, you wrote this bit of code to get the z-scores of students' yearly GPAs. Now you're ready to turn it into a production-quality system, so you need to do something about the repetition. Writing a function to calculate the z-scores would improve this code.

Standardize the GPAs for each year

df['y1_z'] = (df.y1_gpa - df.y1_gpa.mean()) / df.y1_gpa.std() df['y2_z'] = (df.y2_gpa - df.y2_gpa.mean()) / df.y2_gpa.std() df['y3_z'] = (df.y3_gpa - df.y3_gpa.mean()) / df.y3_gpa.std() df['y4_z'] = (df.y4_gpa - df.y4_gpa.mean()) / df.y4_gpa.std() Note: df is a pandas DataFrame where each row is a student with 4 columns of yearly student GPAs: y1_gpa, y2_gpa, y3_gpa, y4_gpa

def standardize(column):
  """Standardize the values in a column.

  Args:
    column (pandas Series): The data to standardize.

  Returns:
    pandas Series: the values as z-scores
  """
  # Finish the function so that it returns the z-scores
  z_score = (column - column.mean()) / column.std()
  return z_score

# Use the standardize() function to calculate the z-scores
df['y1_z'] = standardize(df.y1_gpa)
df['y2_z'] = standardize(df.y2_gpa)
df['y3_z'] = standardize(df.y3_gpa)
df['y4_z'] = standardize(df.y4_gpa)

Another engineer on your team has written this function to calculate the mean and median of a sorted list. You want to show them how to split it into two simpler functions: mean() and median()

def mean_and_median(values): """Get the mean and median of a sorted list of values

Args: values (iterable of float): A list of numbers

Returns: tuple (float, float): The mean and median """ mean = sum(values) / len(values) midpoint = int(len(values) / 2) if len(values) % 2 == 0: median = (values[midpoint - 1] + values[midpoint]) / 2 else: median = values[midpoint]

return mean, median

ImmutableorMutable?

Immutable int float bool string bytes tuple frozensetNone

Mutable list dict set byte array objects functions almosteverythingelse!

Dictionaries are mutable objects in Python, so the function can directly change it in the _dict[_orig_string] = _string statement. Strings, on the other hand, are immutable. When the function creates the lowercase version, it has to assign it to the _string variable. This disconnects what happens to _string from the external s variable.

Beautiful and best practice! When you need to set a mutable variable as a default argument, always use None and then set the value in the body of the function. This prevents unexpected behavior like adding multiple columns if you call the function more than once.

# Use an immutable variable for the default argument
def better_add_column(values, df=None):
  """Add a column of `values` to a DataFrame `df`.
  The column will be named "col_<n>" where "n" is
  the numerical index of the column.

  Args:
    values (iterable): The values of the new column
    df (DataFrame, optional): The DataFrame to update.
      If no DataFrame is passed, one is created by default.

  Returns:
    DataFrame
  """
  # Update the function to create a default DataFrame
  if df is None:
    df = pandas.DataFrame()
  df['col_{}'.format(len(df.columns))] = values
  return df

Context Manager