Data tools
The package comes with utility functions to work directly with Datasets. In this section we will see all these functions contained in the datatools module.
DataObject
The DataObject class represents a plain Dataset, which is exposed through its data attribute.
- class pyreports.DataObject(input_data: Dataset)
Data object class
import pyreports, tablib
data = pyreports.DataObject(tablib.Dataset(*[("Arthur", "Dent", 42)]))
assert isinstance(data.data, tablib.Dataset)
DataAdapters
The DataAdapters class contains methods that modify a Dataset.
import pyreports, tablib
data = pyreports.DataAdapters(tablib.Dataset(*[("Arthur", "Dent", 42)]))
assert isinstance(data.data, tablib.Dataset)
# Aggregate
planets = tablib.Dataset(*[("Heart",)])
data.aggregate(planets)
# Merge
others = tablib.Dataset(*[("Betelgeuse", "Ford", "Prefect", 42)])
data.merge(others)
# Counter
data = pyreports.DataAdapters(tablib.Dataset(*[("Heart", "Arthur", "Dent", 42)]))
data.merge(tablib.Dataset(*[("Heart", "Arthur", "Dent", 42), ("Betelgeuse", "Ford", "Prefect", 42)]))
counter = data.counter()
assert counter["Arthur"] == 2
# Chunks
data.data.headers = ["planet", "name", "surname", "age"]
assert list(data.chunks(4))[0][0] == ("Heart", "Arthur", "Dent", 42)
# Deduplicate
data.deduplicate()
assert len(data.data) == 2
# Get items
assert data[1] == ("Betelgeuse", "Ford", "Prefect", 42)
# Iter items
for item in data:
    print(item)
- class pyreports.DataAdapters(input_data: Dataset)
Data adapters class
- aggregate(*columns, fill_value=None)
Aggregate other columns into the current Dataset
- Parameters:
columns – columns added
fill_value – fill value for empty field if “fill_empty” argument is specified
- Returns:
None
- chunks(length)
Yield successive n-sized chunks from Dataset
- Parameters:
length – n-sized chunks
- Returns:
generator
- counter()
Count the values in the rows
- Returns:
Counter
- deduplicate()
Remove duplicated rows
- Returns:
None
- merge(*datasets)
Merge other Dataset objects into the current Dataset
- Parameters:
datasets – datasets to merge
- Returns:
None
DataPrinters
The DataPrinters class contains methods that print information about a Dataset.
import pyreports, tablib
data = pyreports.DataPrinters(tablib.Dataset(*[("Arthur", "Dent", 42), ("Ford", "Prefect", 42)], headers=["name", "surname", "age"]))
assert isinstance(data.data, tablib.Dataset)
# Print
data.print()
# Average
assert data.average(2) == 42
assert data.average("age") == 42
# Most common
data.data.append(("Ford", "Prefect", 42))
assert data.most_common(0) == "Ford"
assert data.most_common("name") == "Ford"
# Percentage
assert data.percentage("Ford") == 66.66666666666666
# Representation
assert repr(data) == "<DataObject, headers=['name', 'surname', 'age'], rows=3>"
# String
assert str(data) == 'name  |surname|age\n------|-------|---\nArthur|Dent   |42 \nFord  |Prefect|42 \nFord  |Prefect|42 '
# Length
assert len(data) == 3
- class pyreports.DataPrinters(input_data: Dataset)
Data printers class
- average(column)
Average of list of integers or floats
- Parameters:
column – column name or index
- Returns:
float
- most_common(column)
The most common element in a column
- Parameters:
column – column name or index
- Returns:
Any
- percentage(filter_)
Calculating the percentage according to filter
- Parameters:
filter_ – equality filter
- Returns:
float
- print()
Print data
- Returns:
None
Average
The average function calculates the average of the numbers within a column.
import pyreports, tablib
# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
# Calculate average
print(pyreports.average(mydata, 'salary')) # Column by name
print(pyreports.average(mydata, 2)) # Column by index
Attention
All values in the column must be float or int, otherwise a ReportDataError exception will be raised.
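The type requirement above can be illustrated without pyreports. The following is a minimal sketch on plain tuples, not the library's implementation: `column_average` is a hypothetical helper, and the built-in `TypeError` stands in for ReportDataError.

```python
def column_average(rows, index):
    """Average the values found at position `index` of every row."""
    values = [row[index] for row in rows]
    for value in values:
        # Only int and float are accepted, mirroring the rule stated above
        if not isinstance(value, (int, float)):
            raise TypeError(f"non-numeric value in column: {value!r}")
    return sum(values) / len(values)


rows = [("Arthur", "Dent", 55000), ("Ford", "Prefect", 65000)]
print(column_average(rows, 2))  # 60000.0
```

Calling `column_average(rows, 0)` on the name column raises the error, because its values are strings.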
Most common
The most_common function returns the most frequent value in a specific column.
import pyreports, tablib
# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))
# Get most common
print(pyreports.most_common(mydata, 'name')) # Ford
Percentage
The percentage function calculates the percentage of matches for an equality filter (a value of any type) across the whole Dataset.
import pyreports, tablib
# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))
# Calculate percentage
print(pyreports.percentage(mydata, 65000)) # 66.66666666666666 (percent)
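To make the arithmetic concrete, here is a minimal sketch on plain tuples. It assumes the percentage is the share of rows that contain the filter value, which matches the result shown above but is not confirmed as the library's exact rule; `match_percentage` is a hypothetical helper.

```python
def match_percentage(rows, value):
    """Percentage of rows containing `value` in any field (assumed semantics)."""
    if not rows:
        return 0.0
    # Tuple membership uses equality, so this acts as an equality filter
    matching = sum(1 for row in rows if value in row)
    return matching / len(rows) * 100


rows = [
    ("Arthur", "Dent", 55000),
    ("Ford", "Prefect", 65000),
    ("Ford", "Prefect", 65000),
]
print(round(match_percentage(rows, 65000), 2))  # 66.67
```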
Counter
The counter function returns a Counter object holding the count of each element in a specific column.
import pyreports, tablib
# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))
# Create Counter object
print(pyreports.counter(mydata, 'name')) # Counter({'Ford': 2, 'Arthur': 1})
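Assuming the returned object is the standard library's `collections.Counter`, its usual methods apply to the result. The sketch below builds an equivalent Counter by hand from the values of the name column, so it runs without pyreports.

```python
from collections import Counter

# Values of the 'name' column from the example above
names = ["Arthur", "Ford", "Ford"]
name_count = Counter(names)

print(name_count["Ford"])         # 2
print(name_count["Zaphod"])       # 0, missing keys count as zero
print(name_count.most_common(1))  # [('Ford', 2)]
```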
Aggregate
The aggregate function combines columns taken from one or more Dataset objects into a single new Dataset.
Warning
The columns must all contain the same number of elements. To aggregate columns with a different number of elements, specify the argument fill_empty=True; otherwise, an InvalidDimension exception will be raised.
import pyreports, tablib
# Build the datasets
employee = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
places = tablib.Dataset(*[('London', 'Green palace', 1), ('Helsinki', 'Red palace', 2)], headers=['city', 'place', 'floor'])
# Aggregate columns to create a new Dataset
new_data = pyreports.aggregate(employee['name'], employee['surname'], employee['salary'], places['city'], places['place'])
new_data.headers = ['name', 'surname', 'salary', 'city', 'place']
print(new_data.headers) # ['name', 'surname', 'salary', 'city', 'place']
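The effect of filling shorter columns can be sketched with plain lists. This is an illustration of the idea only, not the pyreports code: `aggregate_columns` is a hypothetical helper, and `ValueError` stands in for InvalidDimension.

```python
from itertools import zip_longest


def aggregate_columns(*columns, fill_empty=False, fill_value=None):
    """Combine column lists into a list of row tuples."""
    lengths = {len(column) for column in columns}
    if len(lengths) > 1 and not fill_empty:
        # Columns of different length are rejected unless filling is requested
        raise ValueError("the columns are not the same length")
    # zip_longest pads the shorter columns with fill_value
    return list(zip_longest(*columns, fillvalue=fill_value))


names = ["Arthur", "Ford"]
cities = ["London"]
print(aggregate_columns(names, cities, fill_empty=True, fill_value="unknown"))
# [('Arthur', 'London'), ('Ford', 'unknown')]
```

Without `fill_empty=True`, the same call raises the error because the two columns differ in length.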
Merge
The merge function combines multiple Dataset objects into one.
Warning
The datasets must have the same number of columns, otherwise an InvalidDimension exception will be raised.
import pyreports, tablib
# Build the datasets
employee1 = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
employee2 = tablib.Dataset(*[('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)], headers=['name', 'surname', 'salary'])
# Merge two Dataset objects into one
employee = pyreports.merge(employee1, employee2)
print(len(employee)) # 4
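The column-count rule can be illustrated on plain lists of tuples. The sketch below is not the library's implementation: `merge_rows` is a hypothetical helper, and `ValueError` stands in for InvalidDimension.

```python
def merge_rows(*datasets):
    """Concatenate row lists that all share the same number of columns."""
    widths = {len(row) for rows in datasets for row in rows}
    if len(widths) > 1:
        # Rows of different width cannot live in the same table
        raise ValueError("the datasets do not have the same number of columns")
    merged = []
    for rows in datasets:
        merged.extend(rows)
    return merged


team_a = [("Arthur", "Dent", 55000), ("Ford", "Prefect", 65000)]
team_b = [("Tricia", "McMillian", 55000), ("Zaphod", "Beeblebrox", 65000)]
print(len(merge_rows(team_a, team_b)))  # 4
```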
Chunks
The chunks function divides a Dataset into pieces of N (int) rows each. This function returns a generator object.
import pyreports, tablib
# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Tricia', 'McMillian', 55000))
mydata.append(('Zaphod', 'Beeblebrox', 65000))
# Divide data into chunks of 2 rows
new_data = pyreports.chunks(mydata, 2) # Generator object
print(list(new_data)) # [[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], [('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)]]
Note
If the number of rows is not evenly divisible by N, the last chunk will contain fewer rows.
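The shorter last chunk is easy to see with a small generator over a plain list. This is a minimal sketch of the slicing technique, with `row_chunks` as a hypothetical stand-in rather than the pyreports function itself.

```python
def row_chunks(rows, length):
    """Yield successive slices of at most `length` rows."""
    for start in range(0, len(rows), length):
        yield rows[start:start + length]


rows = [
    ("Arthur", "Dent", 55000),
    ("Ford", "Prefect", 65000),
    ("Tricia", "McMillian", 55000),
]
# Three rows in chunks of two: the last chunk holds a single row
print([len(chunk) for chunk in row_chunks(rows, 2)])  # [2, 1]
```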
Deduplicate
The deduplicate function removes duplicated rows from a Dataset object.
import pyreports, tablib
# Build a dataset
employee1 = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
# Remove duplicated rows (removed the last ('Ford', 'Prefect', 65000))
pyreports.deduplicate(employee1)
print(len(employee1)) # 2
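One common way to drop repeated rows while keeping the original order is shown below on plain tuples. This is only a sketch of the technique: `unique_rows` is a hypothetical helper that returns a new list, whereas the example above shows pyreports.deduplicate changing the Dataset in place.

```python
def unique_rows(rows):
    """Drop repeated rows, keeping the first occurrence of each."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(rows))


rows = [
    ("Arthur", "Dent", 55000),
    ("Ford", "Prefect", 65000),
    ("Ford", "Prefect", 65000),
]
print(unique_rows(rows))
# [('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)]
```

The rows must be hashable (tuples of hashable values) for this approach to work.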