wrangle_in_py.remove_duplicates
===============================

.. py:module:: wrangle_in_py.remove_duplicates


Functions
---------

.. autoapisummary::

   wrangle_in_py.remove_duplicates.remove_duplicates


Module Contents
---------------

.. py:function:: remove_duplicates(df, subset_columns=None, keep='first')

   Remove duplicate rows from a DataFrame based on specified columns.

   :param df: The dataframe to process.
   :type df: pd.DataFrame
   :param subset_columns: List of column names to consider for identifying duplicates.
                          If None (default), consider all columns.
   :type subset_columns: list or None
   :param keep: Determines which duplicates to keep:

                - ``'first'``: Keep the first occurrence (default).
                - ``'last'``: Keep the last occurrence.
                - ``False``: Drop all duplicates.
   :type keep: str or bool

   :raises ValueError: If ``df`` is not a pandas DataFrame, if any column in
       ``subset_columns`` is not a column of ``df``, or if ``keep`` is not
       ``'first'``, ``'last'``, or ``False``.

   :returns: A DataFrame with duplicates removed.
   :rtype: pd.DataFrame

   .. rubric:: Example

   >>> import pandas as pd
   >>> from wrangle_in_py.remove_duplicates import remove_duplicates
   >>> data = {'A': [1, 2, 2, 4], 'B': [5, 6, 6, 8]}
   >>> df = pd.DataFrame(data)
   >>> remove_duplicates(df, subset_columns=['A'])
      A  B
   0  1  5
   1  2  6
   3  4  8
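
   The three ``keep`` options differ only in which row of each duplicate group
   survives. The sketch below illustrates this with pandas' own
   ``DataFrame.drop_duplicates``, on the assumption (not confirmed by this page)
   that ``remove_duplicates`` treats ``subset_columns`` and ``keep`` the same way.
   The variable names are illustrative only.

   .. code-block:: python

      import pandas as pd

      # Rows 1 and 2 share the same value in 'A' but differ in 'B',
      # so the choice of ``keep`` is visible in the result.
      df = pd.DataFrame({'A': [1, 2, 2, 4], 'B': [5, 6, 7, 8]})

      # Assumption: drop_duplicates stands in for remove_duplicates here.
      kept_first = df.drop_duplicates(subset=['A'], keep='first')  # keeps row 1
      kept_last = df.drop_duplicates(subset=['A'], keep='last')    # keeps row 2
      kept_none = df.drop_duplicates(subset=['A'], keep=False)     # drops rows 1 and 2

      print(kept_first.index.tolist())
      print(kept_last.index.tolist())
      print(kept_none.index.tolist())

   The original row labels are preserved in every case, which is why the example
   output above shows index ``3`` directly after ``1``.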