wrangle_in_py.remove_duplicates
===============================

.. py:module:: wrangle_in_py.remove_duplicates


Functions
---------

.. autoapisummary::

   wrangle_in_py.remove_duplicates.remove_duplicates


Module Contents
---------------

.. py:function:: remove_duplicates(df, subset_columns=None, keep='first')

   Remove duplicate rows from a DataFrame based on specified columns.

   :param df: The DataFrame to process.
   :type df: pd.DataFrame
   :param subset_columns: List of column names to consider when identifying duplicates.
                          If None (default), all columns are considered.
   :type subset_columns: list or None
   :param keep: Determines which duplicates to keep:

                - ``'first'``: Keep the first occurrence (default).
                - ``'last'``: Keep the last occurrence.
                - ``False``: Drop every row that has a duplicate.
   :type keep: str or bool

   :raises ValueError: If ``df`` is not a pandas DataFrame, if any column in
                       ``subset_columns`` is not a column of ``df``, or if ``keep``
                       is not ``'first'``, ``'last'``, or ``False``.

   :returns: A DataFrame with duplicates removed.
   :rtype: pd.DataFrame

   .. rubric:: Example

   >>> import pandas as pd
   >>> data = {'A': [1, 2, 2, 4], 'B': [5, 6, 6, 8]}
   >>> df = pd.DataFrame(data)
   >>> remove_duplicates(df, subset_columns=['A'])
      A  B
   0  1  5
   1  2  6
   3  4  8
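
The documented parameters and errors closely resemble pandas' own ``DataFrame.drop_duplicates``. The following is a minimal sketch of how the three ``keep`` modes behave, assuming the function is a thin validating wrapper around that method. The package's actual implementation is not shown on this page and may differ; the name ``remove_duplicates_sketch`` and the error messages are hypothetical and used only for illustration.

```python
import pandas as pd


def remove_duplicates_sketch(df, subset_columns=None, keep="first"):
    """Hypothetical stand-in for remove_duplicates (assumed behaviour only)."""
    # Validation mirrors the ValueError conditions listed in the docstring.
    if not isinstance(df, pd.DataFrame):
        raise ValueError("df must be a pandas DataFrame")
    if subset_columns is not None:
        missing = [col for col in subset_columns if col not in df.columns]
        if missing:
            raise ValueError(f"columns not found in df: {missing}")
    if keep not in ("first", "last") and keep is not False:
        raise ValueError("keep must be 'first', 'last', or False")
    # drop_duplicates keeps the original index labels of the surviving rows.
    return df.drop_duplicates(subset=subset_columns, keep=keep)


df = pd.DataFrame({"A": [1, 2, 2, 4], "B": [5, 6, 6, 8]})

first = remove_duplicates_sketch(df, subset_columns=["A"])               # default keep='first'
last = remove_duplicates_sketch(df, subset_columns=["A"], keep="last")   # keep the later duplicate
none_kept = remove_duplicates_sketch(df, subset_columns=["A"], keep=False)  # drop both duplicates
```

In this sketch the surviving rows keep their original index labels: ``first`` retains labels 0, 1, 3, ``last`` retains 0, 2, 3, and ``none_kept`` retains only 0 and 3, because both rows with ``A == 2`` are dropped.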