Data tools#

The package ships with utility functions that work directly on Dataset objects. This section covers the functions and classes contained in the datatools module.

DataObject#

The DataObject class represents a plain Dataset.

class pyreports.DataObject(input_data: Dataset)#

Data object class

clone()#

Clone itself

Returns:

Dataset

import pyreports, tablib

# Headers are set so that a column can also be selected by name
data = pyreports.DataObject(tablib.Dataset(*[("Arthur", "Dent", 42)], headers=["name", "surname", "age"]))
assert isinstance(data.data, tablib.Dataset)

# Clone data
new_data = data.clone()
assert isinstance(new_data.data, tablib.Dataset)

# Select a column by name or by index
new_data.column("name")
new_data.column(0)

DataAdapters#

The DataAdapters class provides methods that modify a Dataset.

import pyreports, tablib

data = pyreports.DataAdapters(tablib.Dataset(*[("Arthur", "Dent", 42)]))
assert isinstance(data.data, tablib.Dataset) == True


# Aggregate
planets = tablib.Dataset(*[("Heart",)])
data.aggregate(planets)

# Merge
others = tablib.Dataset(*[("Betelgeuse", "Ford", "Prefect", 42)])
data.merge(others)

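# Counter
data = pyreports.DataAdapters(tablib.Dataset(*[("Heart", "Arthur", "Dent", 42)]))
# Merge a second "Arthur" row plus the `others` dataset defined above
data.merge(tablib.Dataset(*[("Heart", "Arthur", "Dent", 42)]), others)
counter = data.counter()
assert counter["Arthur"] == 2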

# Chunks
data.data.headers = ["planet", "name", "surname", "age"]
assert list(data.chunks(4))[0][0] == ("Heart", "Arthur", "Dent", 42)

# Deduplicate
data.deduplicate()
assert len(data.data) == 2

# Subsets
new_data = data.subset("planet", "age")
# The new Dataset keeps only the two selected columns
assert len(new_data[0]) == 2

# Sort
new_data = data.sort("age")
reverse_data = data.sort("age", reverse=True)

# Get items
assert data[1] == ("Betelgeuse", "Ford", "Prefect", 42)

# Iter items
for item in data:
    print(item)

class pyreports.DataAdapters(input_data: Dataset)#

Data adapters class

aggregate(*columns, fill_value=None)#

Add other columns to the current Dataset

Parameters:
  • columns – columns added

  • fill_value – fill value for empty field

Returns:

None

chunks(length)#

Yield successive n-sized chunks from Dataset

Parameters:

length – n-sized chunks

Returns:

generator

counter()#

Count the values in the rows

Returns:

Counter

deduplicate()#

Remove duplicated rows

Returns:

None

merge(*datasets)#

Merge other Dataset objects into the current Dataset

Parameters:

datasets – datasets to merge

Returns:

None

sort(column, reverse=False)#

Sort a Dataset by a specific column

Parameters:
  • column – column to sort

  • reverse – reversed order

Returns:

Dataset

subset(*columns)#

New Dataset containing only the selected columns

Parameters:

columns – select columns of new Dataset

Returns:

Dataset

DataPrinters#

The DataPrinters class provides methods that print information about a Dataset.

import pyreports, tablib

data = pyreports.DataPrinters(tablib.Dataset(*[("Arthur", "Dent", 42), ("Ford", "Prefect", 42)], headers=["name", "surname", "age"]))
assert isinstance(data.data, tablib.Dataset) == True

# Print
data.print()

# Average
assert data.average(2) == 42
assert data.average("age") == 42

# Most common
data.data.append(("Ford", "Prefect", 42))
assert data.most_common(0) == "Ford"
assert data.most_common("name") == "Ford"

# Percentage
assert data.percentage("Ford") == 66.66666666666666

# Representation
assert repr(data) == "<DataObject, headers=['name', 'surname', 'age'], rows=3>"

# String
assert str(data) == 'name  |surname|age\n------|-------|---\nArthur|Dent   |42 \nFord  |Prefect|42 \nFord  |Prefect|42 '

# Length
assert len(data) == 3

class pyreports.DataPrinters(input_data: Dataset)#

Data printers class

average(column)#

Average of list of integers or floats

Parameters:

column – column name or index

Returns:

float

most_common(column)#

The most common element in a column

Parameters:

column – column name or index

Returns:

Any

percentage(filter_)#

Calculate the percentage according to a filter

Parameters:

filter_ – equality filter

Returns:

float

print()#

Print data

Returns:

None

Average#

The average function calculates the average of the numbers in a column.

import pyreports, tablib

# Build a dataset (rows are passed as separate positional arguments)
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])

# Calculate average
print(pyreports.average(mydata, 'salary'))  # Column by name
print(pyreports.average(mydata, 2))         # Column by index

Attention

All values in the column must be float or int, otherwise a ReportDataError exception will be raised.
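The type requirement above can be pictured with a small plain-Python sketch. It does not use pyreports: `column_average` is a hypothetical helper written for illustration, and `TypeError` stands in for the library's ReportDataError.

```python
def column_average(values):
    """Average a column, rejecting any value that is not an int or float."""
    values = list(values)
    if not values:
        raise ValueError("cannot average an empty column")
    for value in values:
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"non-numeric value in column: {value!r}")
    return sum(values) / len(values)


salaries = [55000, 65000]
print(column_average(salaries))  # 60000.0
```

A column that mixes numbers with strings, such as `[55000, "n/a"]`, fails the check before any arithmetic is attempted.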

Most common#

The most_common function returns the value that occurs most often in a specific column.

import pyreports, tablib

# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))

# Get most common
print(pyreports.most_common(mydata, 'name'))  # Ford

Percentage#

The percentage function calculates a percentage over the whole Dataset, based on an equality filter that can be a value of any type.

import pyreports, tablib

# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))

# Calculate percentage
print(pyreports.percentage(mydata, 65000))  # 66.66666666666666 (percent)
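The arithmetic behind the result above can be sketched in plain Python. This is an illustration, not the library's code: `row_percentage` is a hypothetical helper, and it assumes the filter counts every row that contains the given value.

```python
def row_percentage(rows, value):
    """Percentage of rows that contain `value` (assumed filter semantics)."""
    rows = list(rows)
    if not rows:
        return 0.0
    matching = sum(1 for row in rows if value in row)
    return matching / len(rows) * 100


rows = [
    ("Arthur", "Dent", 55000),
    ("Ford", "Prefect", 65000),
    ("Ford", "Prefect", 65000),
]
result = row_percentage(rows, 65000)
print(round(result, 2))  # 66.67
```

Two of the three rows match, so the result is two thirds expressed as a percentage.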

Counter#

The counter function returns a Counter object holding the number of occurrences of each element in a specific column.

import pyreports, tablib

# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))

# Create Counter object
print(pyreports.counter(mydata, 'name'))  # Counter({'Ford': 2, 'Arthur': 1})
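For readers unfamiliar with the return type, the following sketch uses only the standard library `collections.Counter` on a plain list of names. The list is assumed to be the content of the `name` column from the example above.

```python
from collections import Counter

# Values of the 'name' column, written out by hand for illustration
names = ["Arthur", "Ford", "Ford"]

name_counts = Counter(names)
print(name_counts["Ford"])         # 2
print(name_counts["Zaphod"])       # 0 (missing keys count as zero)
print(name_counts.most_common(1))  # [('Ford', 2)]
```

A Counter behaves like a dictionary, so individual counts can be read by key, and `most_common` returns the entries ordered from most to least frequent.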

Aggregate#

The aggregate function combines columns taken from one or more Dataset objects into a single new Dataset.

Warning

All columns must contain the same number of elements. To aggregate columns of different lengths, pass the argument fill_empty=True; otherwise an InvalidDimensions exception is raised.

import pyreports, tablib

# Build the datasets
employee = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
places = tablib.Dataset(*[('London', 'Green palace', 1), ('Helsinki', 'Red palace', 2)], headers=['city', 'place', 'floor'])

# Aggregate columns to create a new Dataset
new_data = pyreports.aggregate(employee['name'], employee['surname'], employee['salary'], places['city'], places['place'])
new_data.headers = ['name', 'surname', 'salary', 'city', 'place']
print(new_data.headers)     # ['name', 'surname', 'salary', 'city', 'place']
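The length rule and the effect of a fill value can be illustrated with a plain-Python sketch built on `itertools.zip_longest`. It is not the library's implementation: `aggregate_columns` is a hypothetical helper, and `ValueError` stands in for the exception raised by the real function.

```python
from itertools import zip_longest


def aggregate_columns(*columns, fill_empty=False, fill_value=None):
    """Zip columns into rows, padding short columns only when allowed."""
    lengths = {len(column) for column in columns}
    if len(lengths) > 1 and not fill_empty:
        raise ValueError("columns have different lengths")
    return list(zip_longest(*columns, fillvalue=fill_value))


names = ["Arthur", "Ford"]
cities = ["London"]  # one element short
rows = aggregate_columns(names, cities, fill_empty=True, fill_value="n/a")
print(rows)  # [('Arthur', 'London'), ('Ford', 'n/a')]
```

Without `fill_empty=True`, the same call raises an error because the two columns differ in length.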

Merge#

The merge function combines multiple Dataset objects into one.

Warning

The datasets must have the same number of columns; otherwise an InvalidDimensions exception is raised.

import pyreports, tablib

# Build the datasets
employee1 = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
employee2 = tablib.Dataset(*[('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)], headers=['name', 'surname', 'salary'])

# Merge two Dataset objects into one
employee = pyreports.merge(employee1, employee2)
print(len(employee))     # 4
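The column-count requirement can be shown with lists of tuples instead of Dataset objects. In this sketch `merge_rows` is a hypothetical helper and `ValueError` stands in for the library's exception; it only illustrates the rule, not how pyreports implements it.

```python
def merge_rows(*tables):
    """Concatenate tables row by row after checking that widths agree."""
    widths = {len(row) for table in tables for row in table}
    if len(widths) > 1:
        raise ValueError("tables have different numbers of columns")
    return [row for table in tables for row in table]


employee1 = [("Arthur", "Dent", 55000), ("Ford", "Prefect", 65000)]
employee2 = [("Tricia", "McMillian", 55000), ("Zaphod", "Beeblebrox", 65000)]

merged = merge_rows(employee1, employee2)
print(len(merged))  # 4
```

Row order is preserved: all rows of the first table come before the rows of the second.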

Chunks#

The chunks function splits a Dataset into chunks of N rows each, where N is an int. It returns a generator object.

import pyreports, tablib

# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
# extend() adds several rows at once; append() accepts a single row
mydata.extend([('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)])

# Divide data into chunks of 2 rows
new_data = pyreports.chunks(mydata, 2)      # Generator object
print(list(new_data))     # [[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], [('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)]]

Note

If the number of rows is not evenly divisible by N, the last chunk contains fewer rows.
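The behaviour of the last chunk can be seen in a plain-Python generator that slices a list of rows. `chunk_rows` is a hypothetical helper written for this illustration, and the five sample rows are made up so that the row count is not a multiple of the chunk size.

```python
def chunk_rows(rows, length):
    """Yield successive slices of at most `length` rows."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    for start in range(0, len(rows), length):
        yield rows[start:start + length]


rows = [("Arthur",), ("Ford",), ("Tricia",), ("Zaphod",), ("Marvin",)]
sizes = [len(chunk) for chunk in chunk_rows(rows, 2)]
print(sizes)  # [2, 2, 1]
```

Because the function is a generator, no chunk is built until it is requested, which keeps memory use low for large inputs.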

Deduplicate#

The deduplicate function removes duplicated rows from a Dataset object.

import pyreports, tablib

# Build a dataset
employee1 = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])

# Remove duplicated rows (removes the last ('Ford', 'Prefect', 65000))
print(len(pyreports.deduplicate(employee1)))     # 2
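One common way to drop duplicates while keeping the first occurrence of each row is `dict.fromkeys`, shown here on a list of tuples. This is a sketch of the idea only; `deduplicate_rows` is a hypothetical helper, and whether pyreports uses the same approach internally is not stated in this document.

```python
def deduplicate_rows(rows):
    """Remove repeated rows, keeping the first occurrence and the order."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(rows))


rows = [
    ("Arthur", "Dent", 55000),
    ("Ford", "Prefect", 65000),
    ("Ford", "Prefect", 65000),
]
unique_rows = deduplicate_rows(rows)
print(unique_rows)  # [('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)]
```

The approach requires hashable rows, which is why tuples are used rather than lists.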

Subset#

The subset function creates a new Dataset containing only the selected columns.

import pyreports, tablib

# Build a dataset
employee1 = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])

# Select only two columns
print(len(pyreports.subset(employee1, 'name', 'surname')[0]))     # 2
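Column selection by name amounts to mapping header names to positions and picking those positions from every row. The sketch below shows this on plain tuples; `subset_rows` is a hypothetical helper and is not part of pyreports.

```python
def subset_rows(rows, headers, *columns):
    """Return new rows that keep only the named columns, in the given order."""
    indexes = [headers.index(column) for column in columns]
    return [tuple(row[i] for i in indexes) for row in rows]


headers = ["name", "surname", "salary"]
rows = [("Arthur", "Dent", 55000), ("Ford", "Prefect", 65000)]

selected = subset_rows(rows, headers, "name", "surname")
print(selected)  # [('Arthur', 'Dent'), ('Ford', 'Prefect')]
```

The original rows are left untouched, which mirrors the fact that subset produces a new Dataset.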

Sort#

The sort function sorts a Dataset by a column, optionally in reverse order.

import pyreports, tablib

# Build a dataset
employee1 = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])

# Sort and sort reversed
print(pyreports.sort(employee1, 'salary'))
print(pyreports.sort(employee1, 'salary', reverse=True))
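The effect of the column and reverse arguments can be reproduced on plain tuples with the built-in `sorted`. In this sketch `sort_rows` is a hypothetical helper that accepts a column name or index, in the spirit of the examples above; it is not the library's implementation.

```python
from operator import itemgetter


def sort_rows(rows, headers, column, reverse=False):
    """Return a new list of rows ordered by one column (name or index)."""
    index = headers.index(column) if isinstance(column, str) else column
    return sorted(rows, key=itemgetter(index), reverse=reverse)


headers = ["name", "surname", "salary"]
rows = [("Ford", "Prefect", 65000), ("Arthur", "Dent", 55000)]

ascending = sort_rows(rows, headers, "salary")
descending = sort_rows(rows, headers, 2, reverse=True)
print(ascending[0][0])   # Arthur
print(descending[0][0])  # Ford
```

`sorted` is stable, so rows with equal values in the chosen column keep their original relative order.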