Data tools#
The package comes with utility functions that work directly on Dataset objects. This section covers every function and class contained in the datatools module.
DataObject#
The DataObject class represents a plain Dataset.
- class pyreports.DataObject(input_data: Dataset)#
Data object class
- clone()#
Clone itself
- Returns:
Dataset
import pyreports, tablib
# Headers are set so that a column can be selected by name
data = pyreports.DataObject(tablib.Dataset(*[("Arthur", "Dent", 42)], headers=["name", "surname", "age"]))
assert isinstance(data.data, tablib.Dataset)
# Clone data
new_data = data.clone()
assert isinstance(new_data.data, tablib.Dataset)
# Select column by name or by index
new_data.column("name")
new_data.column(0)
DataAdapters#
The DataAdapters class is an object whose methods modify a Dataset.
import pyreports, tablib
data = pyreports.DataAdapters(tablib.Dataset(*[("Arthur", "Dent", 42)]))
assert isinstance(data.data, tablib.Dataset)
# Aggregate
planets = tablib.Dataset(*[("Heart",)])
data.aggregate(planets)
# Merge
others = tablib.Dataset(*[("Betelgeuse", "Ford", "Prefect", 42)])
data.merge(others)
# Counter
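data = pyreports.DataAdapters(tablib.Dataset(*[("Heart", "Arthur", "Dent", 42)]))
# Merge a second Dataset that repeats the Arthur row and adds the Ford row
data.merge(tablib.Dataset(*[("Heart", "Arthur", "Dent", 42), ("Betelgeuse", "Ford", "Prefect", 42)]))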
counter = data.counter()
assert counter["Arthur"] == 2
# Chunks
data.data.headers = ["planet", "name", "surname", "age"]
assert list(data.chunks(4))[0][0] == ("Heart", "Arthur", "Dent", 42)
# Deduplicate
data.deduplicate()
assert len(data.data) == 2
# Subsets
new_data = data.subset("planet", "age")
# subset returns a new Dataset, so check the returned object
assert len(new_data[0]) == 2
# Sort
new_data = data.sort("age")
reverse_data = data.sort("age", reverse=True)
# Get items
assert data[1] == ("Betelgeuse", "Ford", "Prefect", 42)
# Iter items
for item in data:
    print(item)
- class pyreports.DataAdapters(input_data: Dataset)#
Data adapters class
- aggregate(*columns, fill_value=None)#
Aggregate other columns into the current Dataset
- Parameters:
columns – columns to add
fill_value – fill value for empty fields
- Returns:
None
- chunks(length)#
Yield successive n-sized chunks from Dataset
- Parameters:
length – n-sized chunks
- Returns:
generator
- counter()#
Count the values in the rows
- Returns:
Counter
- deduplicate()#
Remove duplicated rows
- Returns:
None
- merge(*datasets)#
Merge other Dataset objects into the current Dataset
- Parameters:
datasets – datasets to merge
- Returns:
None
- sort(column, reverse=False)#
Sort a Dataset by a specific column
- Parameters:
column – column to sort
reverse – reversed order
- Returns:
Dataset
- subset(*columns)#
New Dataset with only the selected columns
- Parameters:
columns – select columns of new Dataset
- Returns:
Dataset
DataPrinters#
The DataPrinters class is an object whose methods print information about a Dataset.
import pyreports, tablib
data = pyreports.DataPrinters(tablib.Dataset(*[("Arthur", "Dent", 42), ("Ford", "Prefect", 42)], headers=["name", "surname", "age"]))
assert isinstance(data.data, tablib.Dataset)
# Print
data.print()
# Average
assert data.average(2) == 42
assert data.average("age") == 42
# Most common
data.data.append(("Ford", "Prefect", 42))
assert data.most_common(0) == "Ford"
assert data.most_common("name") == "Ford"
# Percentage
assert data.percentage("Ford") == 66.66666666666666
# Representation
assert repr(data) == "<DataObject, headers=['name', 'surname', 'age'], rows=3>"
# String
assert str(data) == 'name |surname|age\n------|-------|---\nArthur|Dent |42 \nFord |Prefect|42 \nFord |Prefect|42 '
# Length
assert len(data) == 3
- class pyreports.DataPrinters(input_data: Dataset)#
Data printers class
- average(column)#
Average of a list of integers or floats
- Parameters:
column – column name or index
- Returns:
float
- most_common(column)#
The most common element in a column
- Parameters:
column – column name or index
- Returns:
Any
- percentage(filter_)#
Calculate the percentage according to a filter
- Parameters:
filter_ – equality filter
- Returns:
float
- print()#
Print data
- Returns:
None
Average#
The average function calculates the average of the numbers within a column.
import pyreports, tablib
# Build a dataset (rows are unpacked so that each tuple becomes one row)
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
# Calculate average
print(pyreports.average(mydata, 'salary')) # Column by name
print(pyreports.average(mydata, 2)) # Column by index
Attention
All values in the column must be float or int, otherwise a ReportDataError exception will be raised.
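The type requirement can be pictured with a plain-Python sketch. It is not the pyreports implementation: `average_column` is a hypothetical helper, rows are ordinary tuples rather than a Dataset, and `TypeError` stands in for `ReportDataError`.

```python
def average_column(rows, index):
    """Average the values found at position `index` of every row."""
    values = [row[index] for row in rows]
    # Reject anything that is not an int or a float (bool is excluded on purpose).
    for value in values:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"non-numeric value in column: {value!r}")
    return sum(values) / len(values)


rows = [("Arthur", "Dent", 55000), ("Ford", "Prefect", 65000)]
salary_average = average_column(rows, 2)
print(salary_average)  # 60000.0
```

Calling `average_column(rows, 0)` on the name column raises the stand-in error, because the values are strings.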
Most common#
The most_common function returns the value that occurs most often in a specific column.
import pyreports, tablib
# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))
# Get most common
print(pyreports.most_common(mydata, 'name')) # Ford
Percentage#
The percentage function calculates how often a value (the filter, of any type) occurs across the whole Dataset, expressed as a percentage.
import pyreports, tablib
# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))
# Calculate percentage
print(pyreports.percentage(mydata, 65000)) # 66.66666666666666 (percent)
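To make the arithmetic behind that figure explicit, here is a minimal sketch under one assumption: the percentage is the share of rows that contain the filter value. `percentage_of_rows` is a hypothetical name, and the way pyreports counts matches internally may differ.

```python
def percentage_of_rows(rows, wanted):
    """Share of rows, in percent, that contain `wanted` in any field."""
    matching = sum(1 for row in rows if wanted in row)
    return matching / len(rows) * 100


rows = [
    ("Arthur", "Dent", 55000),
    ("Ford", "Prefect", 65000),
    ("Ford", "Prefect", 65000),
]
share = percentage_of_rows(rows, 65000)
print(round(share, 2))  # 66.67
```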
Counter#
The counter function returns a Counter object containing the count of each element in a specific column.
import pyreports, tablib
# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
mydata.append(('Ford', 'Prefect', 65000))
# Create Counter object
print(pyreports.counter(mydata, 'name')) # Counter({'Ford': 2, 'Arthur': 1})
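The returned object behaves like a standard `collections.Counter`. The sketch below uses only the standard library, with plain tuples standing in for Dataset rows, to show what can be done with such a result.

```python
from collections import Counter

rows = [
    ("Arthur", "Dent", 55000),
    ("Ford", "Prefect", 65000),
    ("Ford", "Prefect", 65000),
]
# Count the first field (the name) of every row.
name_counts = Counter(row[0] for row in rows)
print(name_counts["Ford"])         # 2
print(name_counts.most_common(1))  # [('Ford', 2)]
```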
Aggregate#
The aggregate function combines columns taken from one or more Dataset objects into a single new Dataset.
Warning
The number of elements in the columns must be the same. If you want to aggregate columns with a different number of elements,
you need to specify the argument fill_empty=True. Otherwise, an InvalidDimension exception will be raised.
import pyreports, tablib
# Build the datasets
employee = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
places = tablib.Dataset(*[('London', 'Green palace', 1), ('Helsinki', 'Red palace', 2)], headers=['city', 'place', 'floor'])
# Aggregate columns to create a new Dataset
new_data = pyreports.aggregate(employee['name'], employee['surname'], employee['salary'], places['city'], places['place'])
new_data.headers = ['name', 'surname', 'salary', 'city', 'place']
print(new_data.headers) # ['name', 'surname', 'salary', 'city', 'place']
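The length rule from the warning can be sketched in plain Python with `itertools.zip_longest`. This is an illustration only: `aggregate_columns` is a hypothetical helper, its `fill_empty` and `fill_value` arguments merely mirror the names used in this section, and `ValueError` stands in for the dimension error.

```python
from itertools import zip_longest


def aggregate_columns(*columns, fill_empty=False, fill_value=None):
    """Turn several columns (lists) into a list of row tuples."""
    lengths = {len(column) for column in columns}
    # Columns of different lengths are only accepted when padding is requested.
    if len(lengths) > 1 and not fill_empty:
        raise ValueError("the columns are not the same length")
    return list(zip_longest(*columns, fillvalue=fill_value))


names = ["Arthur", "Ford"]
cities = ["London"]
padded_rows = aggregate_columns(names, cities, fill_empty=True, fill_value="unknown")
print(padded_rows)  # [('Arthur', 'London'), ('Ford', 'unknown')]
```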
Merge#
The merge function combines multiple Dataset objects into one.
Warning
The datasets must have the same number of columns, otherwise an InvalidDimension exception will be raised.
import pyreports, tablib
# Build the datasets
employee1 = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
employee2 = tablib.Dataset(*[('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)], headers=['name', 'surname', 'salary'])
# Merge two Dataset objects into one
employee = pyreports.merge(employee1, employee2)
print(len(employee)) # 4
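The column-count check described in the warning might look like the following plain-Python sketch. `merge_rows` is a hypothetical helper that works on lists of tuples, and `ValueError` is a stand-in for the exception named above.

```python
def merge_rows(*datasets):
    """Concatenate several lists of rows after checking their width."""
    widths = {len(row) for rows in datasets for row in rows}
    if len(widths) > 1:
        raise ValueError("the rows are not the same length")
    merged = []
    for rows in datasets:
        merged.extend(rows)
    return merged


first = [("Arthur", "Dent", 55000), ("Ford", "Prefect", 65000)]
second = [("Tricia", "McMillian", 55000), ("Zaphod", "Beeblebrox", 65000)]
merged = merge_rows(first, second)
print(len(merged))  # 4
```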
Chunks#
The chunks function divides a Dataset into pieces of N rows (int). This function returns a generator object.
import pyreports, tablib
# Build a dataset
mydata = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
# extend adds several rows at once; append accepts a single row
mydata.extend([('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)])
# Divide data into chunks of 2 rows
new_data = pyreports.chunks(mydata, 2) # Generator object
print(list(new_data)) # [[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000)], [('Tricia', 'McMillian', 55000), ('Zaphod', 'Beeblebrox', 65000)]]
Note
If the number of rows is not evenly divisible by N, the last chunk will contain fewer rows.
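The behaviour of the last chunk is easy to see in a plain-Python sketch based on slicing. `chunk_rows` is a hypothetical generator written for this illustration, operating on a list of tuples rather than a Dataset.

```python
def chunk_rows(rows, length):
    """Yield successive slices of at most `length` rows."""
    for start in range(0, len(rows), length):
        yield rows[start:start + length]


rows = [
    ("Arthur", "Dent", 55000),
    ("Ford", "Prefect", 65000),
    ("Tricia", "McMillian", 55000),
]
pieces = list(chunk_rows(rows, 2))
print([len(piece) for piece in pieces])  # [2, 1]
```

Three rows split by two leave a final chunk with a single row.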
Deduplicate#
The deduplicate function removes duplicate rows from a Dataset object.
import pyreports, tablib
# Build a dataset
employee1 = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
# Remove duplicated rows (the last ('Ford', 'Prefect', 65000) is removed)
print(len(pyreports.deduplicate(employee1))) # 2
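For comparison, the same result can be obtained on a plain list of tuples with `dict.fromkeys`, which keeps the first occurrence of each row and preserves order. This is a standard-library sketch, not the code pyreports runs, and it assumes that every row is hashable.

```python
rows = [
    ("Arthur", "Dent", 55000),
    ("Ford", "Prefect", 65000),
    ("Ford", "Prefect", 65000),
]
# Dictionary keys are unique and keep insertion order.
unique_rows = list(dict.fromkeys(rows))
print(len(unique_rows))  # 2
```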
Subset#
The subset function creates a new Dataset containing only the selected columns.
import pyreports, tablib
# Build a dataset
employee1 = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
# Select only two columns
print(len(pyreports.subset(employee1, 'name', 'surname')[0])) # 2
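Column selection by name amounts to mapping each requested header to its position and rebuilding every row. The sketch below assumes rows stored as tuples next to a separate list of headers; `subset_rows` is a hypothetical helper.

```python
def subset_rows(rows, headers, *columns):
    """Keep only the named columns, in the order they were requested."""
    indexes = [headers.index(name) for name in columns]
    return [tuple(row[i] for i in indexes) for row in rows]


headers = ["name", "surname", "salary"]
rows = [("Arthur", "Dent", 55000), ("Ford", "Prefect", 65000)]
selected = subset_rows(rows, headers, "name", "salary")
print(selected)  # [('Arthur', 55000), ('Ford', 65000)]
```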
Sort#
The sort function sorts the Dataset by a column, optionally in reverse order.
import pyreports, tablib
# Build a dataset
employee1 = tablib.Dataset(*[('Arthur', 'Dent', 55000), ('Ford', 'Prefect', 65000), ('Ford', 'Prefect', 65000)], headers=['name', 'surname', 'salary'])
# Sort in ascending order, then in reverse order
print(pyreports.sort(employee1, 'salary'))
print(pyreports.sort(employee1, 'salary', reverse=True))
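Sorting rows by one column corresponds to the built-in `sorted` with a key function. A minimal sketch on plain tuples follows; the variable names are illustrative and it does not show how pyreports sorts internally.

```python
from operator import itemgetter

rows = [
    ("Ford", "Prefect", 65000),
    ("Arthur", "Dent", 55000),
    ("Tricia", "McMillian", 60000),
]
# Index 2 is the salary column; sorted returns a new list.
ascending = sorted(rows, key=itemgetter(2))
descending = sorted(rows, key=itemgetter(2), reverse=True)
print(ascending[0][0])   # Arthur
print(descending[0][0])  # Ford
```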