ðŸ“Ž

# Make full use of pandas value_counts()

`value_counts()` is a very handy function, that groups together values of a Series or DataFrame, and returns their count. With a few arguments, you can expand its capabilities.

``````# Import libraries
import numpy as np
import pandas as pd

# Generate random data
df = pd.DataFrame({
'value': np.concatenate([[i]*np.random.randint(1,20) for i in range(10)])
})
df.iloc[[3]] = np.nan``````
``````# Default behaviour
df.value_counts()``````
``````value
3.0      17
5.0      17
2.0      16
9.0      15
6.0      13
1.0      10
7.0      10
0.0       2
4.0       2
8.0       2
dtype: int64``````

# Relative frequencies

To get the relative frequencies instead of the absolue count, use `normalize=True`. This is equivalent to dividing each row by the total:

``````# Normalized count, i.e. relative frequency of values
df.value_counts(normalize=True)``````
``````value
3.0      0.163462
5.0      0.163462
2.0      0.153846
9.0      0.144231
6.0      0.125000
1.0      0.096154
7.0      0.096154
0.0      0.019231
4.0      0.019231
8.0      0.019231
dtype: float64``````

# Group into buckets

It is also possible to group values into buckets with `bins=True`, which is a quicker than calling `pd.cut`:

``````# Group values into buckets -- only works with Series
df['value'].value_counts(bins=5)``````
``````(1.8, 3.6]                      33
(5.4, 7.2]                      23
(3.6, 5.4]                      19
(7.2, 9.0]                      17
(-0.009999999999999998, 1.8]    12
Name: value, dtype: int64``````

# Include NA values

By default, NA values are not counted. To include them, specify `dropna=False`:

``````# Without NA
df.value_counts()``````
``````value
3.0      17
5.0      17
2.0      16
9.0      15
6.0      13
1.0      10
7.0      10
0.0       2
4.0       2
8.0       2
dtype: int64``````
``````# Including NA -- only works for Series
df['value'].value_counts(dropna=False)``````
``````3.0    17
5.0    17
2.0    16
9.0    15
6.0    13
7.0    10
1.0    10
0.0     2
4.0     2
8.0     2
NaN     1
Name: value, dtype: int64``````