wrangle_in_py.column_name_standardizer
======================================

.. py:module:: wrangle_in_py.column_name_standardizer


Functions
---------

.. autoapisummary::

   wrangle_in_py.column_name_standardizer.string_standardizer
   wrangle_in_py.column_name_standardizer.resulting_duplicates
   wrangle_in_py.column_name_standardizer.column_name_standardizer


Module Contents
---------------

.. py:function:: string_standardizer(messy_string)

   Converts the input ``messy_string`` to lowercase and replaces each non-alphanumeric character (including spaces and punctuation) with an underscore.

   :param messy_string: The input string to be standardized.
   :type messy_string: str

   :raises TypeError: If ``messy_string`` is not a string.

   :returns: A standardized version of the input string, in lowercase and with non-alphanumeric characters replaced by underscores.
   :rtype: str

   .. rubric:: Examples

   >>> string_standardizer('Jack Fruit 88')
   'jack_fruit_88'
   >>> string_standardizer('PINEAPPLES')
   'pineapples'
   >>> string_standardizer('Dragon (Fruit)')
   'dragon__fruit_'


.. py:function:: resulting_duplicates(original_strings, standardized_strings)

   Identifies which strings became duplicates after standardization.

   :param original_strings: List of strings before standardization.
   :type original_strings: list of str
   :param standardized_strings: List of strings after standardization.
   :type standardized_strings: list of str

   :raises ValueError: If ``original_strings`` and ``standardized_strings`` are not the same length.
   :raises TypeError: If either ``original_strings`` or ``standardized_strings`` is not a list of strings.

   :returns: A dictionary whose keys are the standardized strings that have duplicates and whose values are lists of the original strings that map to them.
   :rtype: dict

   .. rubric:: Examples

   >>> strings_before = ['Jack Fruit 88.', "Jack!Fruit!88!", "PINEAPPLES"]
   >>> strings_after = ["jack_fruit_88_", "jack_fruit_88_", "pineapples"]
   >>> resulting_duplicates(strings_before, strings_after)
   {'jack_fruit_88_': ['Jack Fruit 88.', 'Jack!Fruit!88!']}


.. py:function:: column_name_standardizer(df)

   Returns a copy of the input DataFrame with standardized column names.

   Column names are converted to lowercase, and each non-alphanumeric character (including spaces and punctuation) is replaced with an underscore. If standardization produces duplicate column names, a warning is raised.

   :param df: The input pandas DataFrame whose column names need standardization.
   :type df: pandas.DataFrame

   .. warning::

      UserWarning: If any of the standardized column names are the same.

   :raises TypeError: If ``df`` is not a pandas DataFrame.

   :returns: A new DataFrame with standardized column names.
   :rtype: pandas.DataFrame

   .. rubric:: Examples

   >>> import pandas as pd
   >>> data = {'Jack Fruit 88': [1, 2], 'PINEAPPLES': [3, 4], 'Dragon (Fruit)': [25, 30]}
   >>> df = pd.DataFrame(data)
   >>> column_name_standardizer(df)
      jack_fruit_88  pineapples  dragon__fruit_
   0              1           3              25
   1              2           4              30
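
The behavior documented above can be illustrated with a minimal standard-library sketch. This is not the package's implementation: the ``sketch_``-prefixed names are hypothetical, and the sketch assumes that "alphanumeric" means ASCII letters and digits only, which the documentation does not specify.

```python
import re
from collections import defaultdict


def sketch_string_standardizer(messy_string):
    """Lowercase the input and replace each non-alphanumeric character with '_'."""
    if not isinstance(messy_string, str):
        raise TypeError("messy_string must be a string")
    # Assumption: only ASCII letters and digits count as alphanumeric.
    return re.sub(r"[^a-z0-9]", "_", messy_string.lower())


def sketch_resulting_duplicates(original_strings, standardized_strings):
    """Map each duplicated standardized string to the originals that produced it."""
    for value in (original_strings, standardized_strings):
        if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
            raise TypeError("both inputs must be lists of strings")
    if len(original_strings) != len(standardized_strings):
        raise ValueError("both inputs must have the same length")

    # Group the original strings by the standardized string they map to.
    groups = defaultdict(list)
    for original, standardized in zip(original_strings, standardized_strings):
        groups[standardized].append(original)

    # Keep only the standardized strings shared by more than one original.
    return {key: names for key, names in groups.items() if len(names) > 1}


before = ["Jack Fruit 88.", "Jack!Fruit!88!", "PINEAPPLES"]
after = [sketch_string_standardizer(name) for name in before]
print(after)
# ['jack_fruit_88_', 'jack_fruit_88_', 'pineapples']
print(sketch_resulting_duplicates(before, after))
# {'jack_fruit_88_': ['Jack Fruit 88.', 'Jack!Fruit!88!']}
```

Each character is replaced one-for-one, so runs of punctuation produce runs of underscores, as in ``'dragon__fruit_'``. A DataFrame-level wrapper would presumably apply the same two steps to ``df.columns`` and emit a ``UserWarning`` when the duplicates dictionary is non-empty.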