red_panda.pandas package
Submodules
red_panda.pandas.utils module
red_panda.pandas.utils.groupby_mutate(df: pandas.core.frame.DataFrame, group_by: Union[List[str], str], func_dict: Dict[str, Callable], inplace: bool = False) → pandas.core.frame.DataFrame
Similar to R’s dplyr::mutate.
Example
>>> def func(x): return x["x"] / sum(x["x"])
>>> func_dict = {'ratio': func}
>>> groupby_mutate(df, "b", func_dict)
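To make the per-group semantics concrete, the following is a minimal sketch of comparable behavior in plain pandas. It is not red_panda's implementation: `groupby_mutate_sketch` is a hypothetical name, the `inplace` option is left out, and the sketch assumes the DataFrame index is unique and that each callable returns a Series indexed like the group it receives.

```python
from typing import Callable, Dict, List, Union

import pandas as pd


def groupby_mutate_sketch(
    df: pd.DataFrame,
    group_by: Union[List[str], str],
    func_dict: Dict[str, Callable],
) -> pd.DataFrame:
    """Hypothetical stand-in for groupby_mutate; always returns a copy."""
    out = df.copy()
    for col_name, func in func_dict.items():
        # Apply the callable to each group's sub-DataFrame separately.
        pieces = [func(group) for _, group in df.groupby(group_by)]
        # Each piece keeps its original row labels, so the concatenated
        # Series aligns with `out` on the (assumed unique) index.
        out[col_name] = pd.concat(pieces)
    return out


df = pd.DataFrame({"b": ["u", "u", "v"], "x": [1, 3, 4]})
result = groupby_mutate_sketch(
    df, "b", {"ratio": lambda g: g["x"] / g["x"].sum()}
)
print(result)
```

Each value of `ratio` is the row's share of `x` within its own `b` group, which is the dplyr-style "grouped mutate" pattern the example above relies on.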
red_panda.pandas.utils.merge_dfs(dfs: List[pandas.core.frame.DataFrame], **kwargs) → pandas.core.frame.DataFrame
Merge a list of DataFrames on common columns.
Parameters: - dfs – A list of `pandas.DataFrame`s.
- **kwargs – Keyword arguments for pandas.merge.
Returns: Merged DataFrame.
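One plausible way to merge a list of DataFrames on their shared columns is to fold `pandas.merge` over the list. The sketch below assumes that approach; `merge_dfs_sketch` is a hypothetical name and the code is not taken from red_panda. When `on` is not given, `pandas.merge` joins on the columns the two frames have in common.

```python
from functools import reduce
from typing import List

import pandas as pd


def merge_dfs_sketch(dfs: List[pd.DataFrame], **kwargs) -> pd.DataFrame:
    """Hypothetical stand-in for merge_dfs: merge left to right, pairwise."""
    # kwargs (for example how="inner") are forwarded to every pairwise merge.
    return reduce(lambda left, right: pd.merge(left, right, **kwargs), dfs)


a = pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]})
b = pd.DataFrame({"id": [1, 2], "y": ["p", "q"]})
c = pd.DataFrame({"id": [2, 3], "z": [0.5, 1.5]})

# Only id == 2 appears in all three frames, so an inner merge keeps one row.
merged = merge_dfs_sketch([a, b, c], how="inner")
print(merged)
```

Because the same keyword arguments apply to every step, a pairwise fold like this suits lists of frames that share one key layout; frames that need different join keys would have to be merged individually.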
red_panda.pandas.utils.row_number(df: pandas.core.frame.DataFrame, group_by: List[str], sort_by: List[str], ascending: bool = True) → pandas.core.series.Series
Create a row number series given a DataFrame and lists of columns to group by and sort by.
Parameters: - df – Input DataFrame.
- group_by – List of group by columns.
- sort_by – List of sort by columns.
- col_name (optional) – The output column name.
- ascending (optional) – Whether to sort in ascending order.
- as_series (optional) – Whether to return a Series instead of a DataFrame.
Returns: A DataFrame with the row number column, or the row number Series, depending on as_series.
Example
>>> df = row_number(df, ['group'], ['sort'], as_series=False)
>>> df['rn'] = row_number(df, ['group'], ['sort'])
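A grouped row number of this kind can be approximated in plain pandas by sorting and then counting rows within each group. The sketch below is an illustration only, not red_panda's code: `row_number_sketch` is a hypothetical name, it covers only the Series-returning case (no `col_name` or `as_series`), and it assumes numbering starts at 1, as with SQL's ROW_NUMBER().

```python
from typing import List

import pandas as pd


def row_number_sketch(
    df: pd.DataFrame,
    group_by: List[str],
    sort_by: List[str],
    ascending: bool = True,
) -> pd.Series:
    """Hypothetical stand-in for row_number; returns a Series only."""
    # Put rows in the requested order, then count them within each group.
    ordered = df.sort_values(sort_by, ascending=ascending)
    rn = ordered.groupby(group_by).cumcount() + 1  # assumed 1-based
    # Restore the caller's row order so the result can be assigned directly.
    return rn.reindex(df.index)


df = pd.DataFrame({"group": ["a", "a", "b", "a"], "sort": [3, 1, 9, 2]})
df["rn"] = row_number_sketch(df, ["group"], ["sort"])
print(df)
```

Reindexing back to the original index is what makes the `df['rn'] = ...` assignment pattern from the example safe even though the numbering is computed on a sorted copy.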