red_panda.pandas package

Submodules

red_panda.pandas.utils module

red_panda.pandas.utils.groupby_mutate(df: pandas.core.frame.DataFrame, group_by: Union[List[str], str], func_dict: Dict[str, Callable], inplace: bool = False) → pandas.core.frame.DataFrame

Add new columns computed within each group, similar to R’s dplyr::mutate applied to a grouped data frame.

Example

>>> def func(x):
...     return x["x"] / sum(x["x"])
>>> func_dict = {
...     'ratio': func
... }
>>> groupby_mutate(df, "b", func_dict)
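
To make the per-group behaviour concrete, the following is a minimal sketch of what a grouped mutate does in plain pandas. It is not red_panda's implementation: the helper name groupby_mutate_sketch and the sample data are hypothetical, and it assumes each callable in func_dict receives one group's sub-DataFrame and that the input index is unique.

```python
import pandas as pd

# Illustrative data; the column names "b" and "x" mirror the example above.
df = pd.DataFrame({"b": ["u", "u", "v", "v"], "x": [1.0, 3.0, 2.0, 2.0]})


def func(x):
    # x is the sub-DataFrame for a single group.
    return x["x"] / sum(x["x"])


func_dict = {"ratio": func}


def groupby_mutate_sketch(df, group_by, func_dict):
    """Hypothetical stand-in: add one column per func_dict entry, computed per group."""
    pieces = []
    for _, group in df.groupby(group_by):
        # DataFrame.assign calls each callable with the group's DataFrame.
        pieces.append(group.assign(**func_dict))
    # Restore the original row order (assumes a unique index).
    return pd.concat(pieces).loc[df.index]


result = groupby_mutate_sketch(df, "b", func_dict)
print(result)
```

Each value in the new ratio column is the row's share of its own group's total of x, so the ratios within a group sum to 1.
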
red_panda.pandas.utils.merge_dfs(dfs: List[pandas.core.frame.DataFrame], **kwargs) → pandas.core.frame.DataFrame

Merge a list of DataFrames on common columns.

Parameters:
  • dfs – A list of `pandas.DataFrame`s.
  • **kwargs – Keyword arguments for pandas.merge.
Returns:

Merged DataFrame.
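
As an illustration of merging a list on common columns, the sketch below folds pandas.merge over the list from left to right. This is an assumption about the behaviour, not the library's confirmed implementation; merge_dfs_sketch and the sample frames are hypothetical names used only here.

```python
from functools import reduce

import pandas as pd


def merge_dfs_sketch(dfs, **kwargs):
    """Hypothetical stand-in: merge the frames pairwise, left to right."""
    # With no "on" argument, pandas.merge joins on the columns both frames share.
    return reduce(lambda left, right: pd.merge(left, right, **kwargs), dfs)


a = pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]})
b = pd.DataFrame({"id": [1, 2, 4], "y": ["p", "q", "r"]})
c = pd.DataFrame({"id": [2, 1], "z": [0.5, 1.5]})

# Default inner join: only ids present in every frame survive.
inner = merge_dfs_sketch([a, b, c])

# Keyword arguments are forwarded to pandas.merge, e.g. an outer join.
outer = merge_dfs_sketch([a, b, c], how="outer")

print(inner)
print(outer)
```

Because the same keyword arguments are applied to every pairwise merge, options such as how or on need to be valid for all frames in the list.
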

red_panda.pandas.utils.row_number(df: pandas.core.frame.DataFrame, group_by: List[str], sort_by: List[str], ascending: bool = True) → pandas.core.series.Series

Create a row number Series for a DataFrame, given lists of columns to group by and sort by.

Parameters:
  • df – Input DataFrame.
  • group_by – List of group by columns.
  • sort_by – List of sort by columns.
  • ascending (optional) – Whether to sort in ascending order.
Returns:

The row number Series.

Example

>>> df['rn'] = row_number(df, ['group'], ['sort'])
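
The idea resembles SQL's ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...). The sketch below shows one way to get that behaviour in plain pandas, matching the signature above. It is an illustration, not the library's confirmed implementation: row_number_sketch and the sample data are hypothetical, and it assumes numbering starts at 1.

```python
import pandas as pd


def row_number_sketch(df, group_by, sort_by, ascending=True):
    """Hypothetical stand-in: rank rows within each group by sort order."""
    # cumcount numbers rows 0, 1, 2, ... within each group in the sorted order;
    # adding 1 makes the numbering start at 1 (an assumption of this sketch).
    return df.sort_values(sort_by, ascending=ascending).groupby(group_by).cumcount() + 1


df = pd.DataFrame(
    {"group": ["a", "a", "b", "b", "a"], "sort": [3, 1, 2, 5, 2]}
)

# The returned Series keeps the original index labels, so assignment aligns by index.
df["rn"] = row_number_sketch(df, ["group"], ["sort"])

# Descending order numbers the largest "sort" value in each group first.
rn_desc = row_number_sketch(df, ["group"], ["sort"], ascending=False).sort_index()

print(df)
```

Because the result is aligned on the index rather than on position, assigning it back as a column works even though the rows were sorted internally.
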

Module contents