wrangle_in_py.column_name_standardizer
======================================

.. py:module:: wrangle_in_py.column_name_standardizer


Functions
---------

.. autoapisummary::

   wrangle_in_py.column_name_standardizer.string_standardizer
   wrangle_in_py.column_name_standardizer.resulting_duplicates
   wrangle_in_py.column_name_standardizer.column_name_standardizer


Module Contents
---------------

.. py:function:: string_standardizer(messy_string)

   Converts ``messy_string`` to lowercase and replaces every non-alphanumeric
   character (including spaces and punctuation) with an underscore.

   :param messy_string: The input string to be standardized.
   :type messy_string: str

   :raises TypeError: If ``messy_string`` is not a string.

   :returns: The input string in lowercase, with each non-alphanumeric
             character replaced by an underscore.
   :rtype: str

   .. rubric:: Examples

   >>> string_standardizer('Jack Fruit 88')
   'jack_fruit_88'

   >>> string_standardizer('PINEAPPLES')
   'pineapples'

   >>> string_standardizer('Dragon (Fruit)')
   'dragon__fruit_'
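
   The behavior shown above can be approximated with a single regular expression.
   The following is a minimal sketch, not the package's actual implementation:
   ``sketch_string_standardizer`` is a hypothetical name, and treating only ASCII
   letters and digits as alphanumeric is an assumption.

   .. code-block:: python

      import re


      def sketch_string_standardizer(messy_string):
          """Hypothetical re-creation of the documented behavior."""
          if not isinstance(messy_string, str):
              raise TypeError("messy_string must be a string")
          # Assumption: only ASCII letters and digits count as alphanumeric.
          # Each other character becomes one underscore; runs are not collapsed.
          return re.sub(r"[^a-z0-9]", "_", messy_string.lower())


      print(sketch_string_standardizer("Dragon (Fruit)"))  # dragon__fruit_

   Because runs are not collapsed, the space and the opening parenthesis in
   ``'Dragon (Fruit)'`` each produce their own underscore.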


.. py:function:: resulting_duplicates(original_strings, standardized_strings)

   Identifies which strings became duplicates after standardization.

   :param original_strings: List of strings before standardization.
   :type original_strings: list of str
   :param standardized_strings: List of strings after standardization.
   :type standardized_strings: list of str

   :raises ValueError: If ``original_strings`` and ``standardized_strings`` differ in length.
   :raises TypeError: If either ``original_strings`` or ``standardized_strings``
       is not a list of strings.

   :returns: A dictionary whose keys are the standardized strings that occur more than once,
             and whose values are lists of the original strings that map to them.
   :rtype: dict

   .. rubric:: Examples

   >>> strings_before = ['Jack Fruit 88.', "Jack!Fruit!88!", "PINEAPPLES"]
   >>> strings_after = ["jack_fruit_88_", "jack_fruit_88_", "pineapples"]
   >>> resulting_duplicates(strings_before, strings_after)
   {'jack_fruit_88_': ['Jack Fruit 88.', 'Jack!Fruit!88!']}
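
   The grouping described above can be sketched with a plain dictionary. This is
   an illustration under stated assumptions rather than the package's code:
   ``sketch_resulting_duplicates`` is a hypothetical name, and the type validation
   described above is omitted for brevity.

   .. code-block:: python

      def sketch_resulting_duplicates(original_strings, standardized_strings):
          """Hypothetical re-creation of the documented grouping logic."""
          if len(original_strings) != len(standardized_strings):
              raise ValueError("inputs must have the same length")
          groups = {}
          # Pair each original string with its standardized form, in order.
          for original, standardized in zip(original_strings, standardized_strings):
              groups.setdefault(standardized, []).append(original)
          # Keep only the standardized strings that more than one original maps to.
          return {key: value for key, value in groups.items() if len(value) > 1}


      before = ["Jack Fruit 88.", "Jack!Fruit!88!", "PINEAPPLES"]
      after = ["jack_fruit_88_", "jack_fruit_88_", "pineapples"]
      print(sketch_resulting_duplicates(before, after))
      # {'jack_fruit_88_': ['Jack Fruit 88.', 'Jack!Fruit!88!']}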


.. py:function:: column_name_standardizer(df)

   Returns a copy of the input DataFrame with standardized column names.
   Column names are converted to lowercase, and every non-alphanumeric character
   (including spaces and punctuation) is replaced with an underscore.

   If standardization produces duplicate column names, a ``UserWarning`` is issued.

   :param df: The input pandas DataFrame whose column names need standardization.
   :type df: pandas.DataFrame

   .. warning::

      A ``UserWarning`` is issued if any of the standardized column names are the same.

   :raises TypeError: If ``df`` is not a pandas DataFrame.

   :returns: A new DataFrame with standardized column names.
   :rtype: pandas.DataFrame

   .. rubric:: Examples

   >>> import pandas as pd
   >>> data = {'Jack Fruit 88': [1, 2], 'PINEAPPLES': [3, 4], 'Dragon (Fruit)': [25, 30]}
   >>> df = pd.DataFrame(data)
   >>> column_name_standardizer(df)
      jack_fruit_88  pineapples  dragon__fruit_
   0              1           3              25
   1              2           4              30
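
   Since duplicate names are reported through a warning rather than an exception,
   a caller may want to capture it explicitly. The sketch below shows one way to
   do that with the standard ``warnings`` module. It uses a hypothetical stand-in,
   ``sketch_standardize_names``, that works on a plain list of names so the example
   runs without pandas; the warning message text is also an assumption.

   .. code-block:: python

      import re
      import warnings


      def sketch_standardize_names(names):
          """Hypothetical stand-in operating on a list of column names."""
          # Assumption: only ASCII letters and digits count as alphanumeric.
          new_names = [re.sub(r"[^a-z0-9]", "_", name.lower()) for name in names]
          if len(set(new_names)) != len(new_names):
              # The message wording here is illustrative only.
              warnings.warn("Standardization produced duplicate names.", UserWarning)
          return new_names


      with warnings.catch_warnings(record=True) as caught:
          warnings.simplefilter("always")  # ensure the warning is not filtered out
          result = sketch_standardize_names(["Jack Fruit", "jack_fruit", "PINEAPPLES"])

      print(result)  # ['jack_fruit', 'jack_fruit', 'pineapples']
      print([w.category.__name__ for w in caught])  # ['UserWarning']

   The same ``warnings.catch_warnings`` pattern should apply when calling
   ``column_name_standardizer`` on a DataFrame, assuming it issues its warning
   through ``warnings.warn``.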