Data tools

The package comes with utility functions that work directly on Dataset objects. This section covers each function in the datatools module.

Average

The average function calculates the arithmetic mean of the numbers in a column.

import pyreports
import tablib

# Build a dataset
mydata = tablib.Dataset([('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])

# Calculate average
print(pyreports.average(mydata, 'salary'))  # Column by name
print(pyreports.average(mydata, 2))         # Column by index

Attention

All values in the column must be float or int, otherwise a ReportDataError exception will be raised.
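
The type check described above can be pictured with a small standalone sketch. It is not the pyreports implementation: column_average is a hypothetical helper, ValueError stands in for ReportDataError, and rejecting an empty column is a choice made only for this example.

```python
def column_average(values):
    """Return the arithmetic mean of a column of int/float values."""
    values = list(values)
    # Assumption for this sketch: an empty column is rejected too.
    if not values:
        raise ValueError("column is empty")
    # Mirror the documented rule: every value must be int or float.
    if not all(isinstance(v, (int, float)) for v in values):
        raise ValueError("column values must be int or float")
    return sum(values) / len(values)


salaries = [55000, 65000]
mean_salary = column_average(salaries)
print(mean_salary)  # 60000.0
```

A column holding names instead of numbers, such as ['Arthur', 'Ford'], would raise the error in this sketch instead of returning a value.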

Most common

The most_common function returns the value that occurs most often in a given column.

import pyreports
import tablib

# Build a dataset
mydata = tablib.Dataset([('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))

# Get most common
print(pyreports.most_common(mydata, 'name'))  # Ford

Percentage

The percentage function calculates the percentage of rows in the whole Dataset that match a filter. The filter can be a value of any type.

import pyreports
import tablib

# Build a dataset
mydata = tablib.Dataset([('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))

# Calculate percentage
print(pyreports.percentage(mydata, 65000))  # 66.66666666666666 (percent)
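
To make the arithmetic behind that result explicit, here is a minimal sketch in plain Python. It assumes a row matches when it contains the filter value, which is consistent with the example above but is not confirmed as the library's exact rule. row_percentage is a hypothetical helper, and returning 0.0 for empty input is a choice of this sketch.

```python
def row_percentage(rows, value):
    """Percentage of rows that contain the given value."""
    rows = list(rows)
    if not rows:
        # Assumption for this sketch: no rows means 0 percent.
        return 0.0
    matching = [row for row in rows if value in row]
    return len(matching) / len(rows) * 100


rows = [
    ('Arthur', 'Dent', 55000),
    ('Ford', 'Prefect', 65000),
    ('Ford', 'Prefect', 65000),
]
share = row_percentage(rows, 65000)
print(round(share, 2))  # 66.67
```

Two of the three rows contain 65000, so the result is 2 / 3 * 100.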

Counter

The counter function returns a Counter object containing the number of occurrences of each value in a given column.

import pyreports
import tablib

# Build a dataset
mydata = tablib.Dataset([('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))

# Create Counter object
print(pyreports.counter(mydata, 'name'))  # Counter({'Ford': 2, 'Arthur': 1})
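
The returned object is a standard collections.Counter, so its usual methods apply. The sketch below uses only the standard library on a plain list of names (a stand-in for the 'name' column) to show how the counts relate to the most common value.

```python
from collections import Counter

# Stand-in for the 'name' column of the dataset above.
names = ['Arthur', 'Ford', 'Ford']

counts = Counter(names)
print(counts)  # Counter({'Ford': 2, 'Arthur': 1})

# most_common(1) returns a list with the single (value, count) pair
# that has the highest count.
top_value, top_count = counts.most_common(1)[0]
print(top_value, top_count)  # Ford 2
```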

Aggregate

The aggregate function combines columns taken from one or more Dataset objects into a single new Dataset.

Warning

The number of elements in the columns must be the same. If you want to aggregate columns with a different number of elements, you need to specify the argument fill_empty=True. Otherwise, an InvalidDimension exception will be raised.

import pyreports
import tablib

# Build the datasets
employee = tablib.Dataset([('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
places = tablib.Dataset([('London', 'Green palace', 1), ('Helsinki', 'Red palace', 2)], headers=['city', 'place', 'floor'])

# Aggregate columns to create a new Dataset
new_data = pyreports.aggregate(employee['name'], employee['surname'], employee['salary'], places['city'], places['place'])
new_data.headers = ['name', 'surname', 'salary', 'city', 'place']
print(new_data.headers)     # ['name', 'surname', 'salary', 'city', 'place']
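
The length rule from the warning above can be illustrated with a standalone sketch. It is not the pyreports implementation: aggregate_columns is a hypothetical helper, ValueError stands in for InvalidDimension, and padding short columns with None is an assumption, since the value used by fill_empty=True is not stated here.

```python
from itertools import zip_longest


def aggregate_columns(*columns, fill_empty=False, fill_value=None):
    """Combine columns into rows, optionally padding short columns."""
    lengths = {len(column) for column in columns}
    # Columns of different lengths are only accepted with fill_empty=True.
    if len(lengths) > 1 and not fill_empty:
        raise ValueError("columns have different lengths")
    # zip_longest pads the shorter columns with fill_value.
    return list(zip_longest(*columns, fillvalue=fill_value))


names = ['Arthur', 'Ford']
cities = ['London']

padded_rows = aggregate_columns(names, cities, fill_empty=True)
print(padded_rows)  # [('Arthur', 'London'), ('Ford', None)]
```

Calling aggregate_columns(names, cities) without fill_empty=True raises the error in this sketch, which mirrors the behavior the warning describes.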

Merge

The merge function combines multiple Dataset objects into one.

Warning

The datasets must have the same number of columns; otherwise, an InvalidDimension exception will be raised.

import pyreports
import tablib

# Build the datasets
employee1 = tablib.Dataset([('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
employee2 = tablib.Dataset([('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)], headers=['name', 'surname', 'salary'])

# Merge two Dataset objects into one
employee = pyreports.merge(employee1, employee2)
print(len(employee))     # 4
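
Conceptually, merging appends the rows of each dataset one after another once the column counts agree. The sketch below shows that idea on plain lists of tuples. merge_rows is a hypothetical helper, not the library function, and ValueError stands in for InvalidDimension.

```python
def merge_rows(*tables):
    """Concatenate row lists after checking that all rows have the same width."""
    widths = {len(row) for table in tables for row in table}
    if len(widths) > 1:
        raise ValueError("tables have a different number of columns")
    merged = []
    for table in tables:
        merged.extend(table)
    return merged


team1 = [('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)]
team2 = [('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)]

everyone = merge_rows(team1, team2)
print(len(everyone))  # 4
```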

Chunks

The chunks function divides a Dataset into pieces of N rows each, where N is an int. This function returns a generator object.

import pyreports
import tablib

# Build a dataset
mydata = tablib.Dataset([('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.extend([('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)])

# Divide data into chunks of 2 rows each
new_data = pyreports.chunks(mydata, 2)      # Generator object
print(list(new_data))     # [[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], [('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)]]

Note

If the number of rows is not evenly divisible by N, the last chunk will contain fewer rows.
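
The behavior of the last chunk is easy to see in a small generator written with plain slicing. This is a sketch of the general technique on a list, not the pyreports source, and row_chunks is a hypothetical name.

```python
def row_chunks(rows, size):
    """Yield successive slices of at most `size` items from rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = ['Arthur', 'Ford', 'Tricia', 'Zaphod', 'Marvin']

# Five rows in chunks of two: the last chunk holds the single leftover row.
pieces = list(row_chunks(rows, 2))
print(pieces)  # [['Arthur', 'Ford'], ['Tricia', 'Zaphod'], ['Marvin']]
```

Because the function is a generator, chunks are produced one at a time as they are consumed; wrapping the call in list() is only needed to see them all at once.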