pyreports example#

Example scripts using pyreports module.

Basic usage#

In this section you will find examples that represent the entire reporting workflow, relying on the *Manager objects as input and output, and the Executor object for the process part.

Database to file#

In this example, we extract the data from a mysql database, filter it by error code and finally export it to a csv.

import pyreports

# INPUT

# Select source: this is a DatabaseManager object
mydb = pyreports.manager('mysql', host='mysql1.local', database='login_users', user='dba', password='dba0000')

# Get data
mydb.execute('SELECT * FROM site_login')
site_login = mydb.fetchall()                    # return Dataset object

# PROCESS

# Filter data
error_login = pyreports.Executor(site_login)    # accept Dataset object
error_login.filter([400, 401, 403, 404, 500])

# OUTPUT

# Save report: this is a FileManager object
output = pyreports.manager('csv', '/home/report/error_login.csv')
output.write(error_login.get_data())

Note

A reflection on this example could be: “Why don’t I apply the filter directly in the SQL syntax?” The answer is simple. The advantage of using an Executor object is that from general data I can filter or modify (map function or with my custom function) without affecting the original Dataset. So much so that I could do several different Executors, process them and then re-merge them into a single Executor, which would be difficult to do with SQL syntax.

File to Database#

In this example I have a json file as input, received from a web server, I process it and write to the database.

import pyreports

# INPUT

# Return json from GET request on web server: this is a FileManager object
web_server_result = pyreports.manager('json', '/home/report/users.json')
# Get data
users = web_server_result.read()                            # return Dataset object

# PROCESS

# Filter data
user_int = pyreports.Executor(users)                        # accept Dataset object
user_int.filter(key=lambda record: if record == 'INTERNAL') # My filter is a function
user_ext = pyreports.Executor(users)
user_ext.filter(key=lambda record: if record == 'EXTERNAL')

# OUTPUT

# Save report: this is a DatabaseManager object
mydb = pyreports.manager('mysql', host='mysql1.local', database='users', user='dba', password='dba0000')

# Write to database
mydb.executemany("INSERT INTO internal_users(name, surname, employeeType) VALUES(%s, %s, %s)", list(user_int))
mydb.executemany("INSERT INTO external_users(name, surname, employeeType) VALUES(%s, %s, %s)", list(user_ext))
mydb.commit()

Combine inputs#

In this example, we will take two different inputs, and combine them to export an excel file containing the data processing of the two sources.

import pyreports

# INPUT

# Config Unix application file: this is a FileManager object
config_file = pyreports.manager('yaml', '/home/myapp.yml')
# Console admin: this is a DatabaseManager object
mydb = pyreports.manager('mysql', server='mysql1.local', database='admins', user='sa', password='sa0000')
# Get data
admin_app = config_file.read()                  # return Dataset object: three column (name, shell, login)
mydb.execute('SELECT * FROM console_admins')
admins = mydb.fetchall()                        # return Dataset object: three column (name, shell, login)

# PROCESS

# Filter data
all_console_admins = pyreports.Executor(admins) # accept Dataset object
all_console_admins.filter(config_file['shell']) # filter by shells

# OUTPUT

# Save report: this is a FileManager object
output = pyreports.manager('xlsx', '/home/report/all_admins.xlsx')
output.write(all_console_admins.get_data())

Simple report#

In this example, we use a Report type object to create and filter the data through a function and save it in a csv file, printing the number of lines in total.

import pyreports

OFFICE_FILTER = 'Customer'

# Function: filter by office
def filter_by_office(value):
    if value == OFFICE_FILTER:
        return True

# Connect to database
mydb = pyreports.manager('postgresql', host='pssql1.local', database='users', user='admin', password='pwd0000')
mydb.execute('SELECT * FROM employees')
all_employees = mydb.fetchall()
# Output to csv
output = pyreports.manager('csv', f'/home/report/office_{OFFICE_FILTER}.csv')
# All customer employees: Report object
one_office = pyreports.Report(all_employees,
                             filters=filter_by_office,
                             title=f'All employees in {OFFICE_FILTER}',
                             count=True,
                             output=output)
# Run and save report
one_office.export()
print(one_office.count)     # Row count

Advanced usage#

From here on, the examples will be a bit more complex; we will process the data in order to modify it, filter it, combine it and merge it before exporting or parsing it in another object.

Report apache log#

In this example we will analyze and capture parts of a web server log. For each error code present in the log, we will create a report that will be inserted in a book, where each sheet will contain the details of the error code. In the last sheet, there will be an element counter for every single error present in the report.

import pyreports
import tablib
import re

# Get apache log data: this is a FileManager object
apache_log = pyreports.manager('file', '/var/log/httpd/error.log').read()
# apache log format: regex
regex = '([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'

# Function than receive Dataset and return a new Dataset
def format_dataset_log(data_input):
    data = tablib.Dataset(headers=['ip', 'date', 'operation', 'code', 'client'])
    for row in data_input:
        log_parts = re.match(regex, row[0]).groups()
        new_row = list(log_parts[:4])
        new_row.append(log_parts[5])
        data.append(new_row)
    return data

# Create a collection of Report objects
all_apache_error = pyreports.ReportBook(title='Apache error on my site')

# Create a Report object based on error code
apache_error_log = format_dataset_log(apache_log)
all_error = set(apache_error_log['code'])
for code in all_error:
    all_apache_error.add(pyreports.Report(apache_error_log, filters=[code], title=f'Error {code}'))

# Count all error code
counter = pyreports.counter(apache_error_log, 'code')
# Append new Report on ReportBook with error code counters
error_counter = tablib.Dataset(counter.values(), headers=counter)
all_apache_error.add(pyreports.Report(error_counter))

# Save ReportBook on Excel
all_apache_error.export('/home/report/apache_log_error_code.xlsx')

We now have a script that parses and breaks an apache httpd log file by error code.

Report e-commerce data#

In this example, we combine data from different e-commerce databases. In addition, we will create two reports: one for the sales, the other for the warehouse. Then once saved, we will create an additional report that combines both of the previous ones.

import pyreports

# Get data from database: a DatabaseManager object
mydb = pyreports.manager('postgresql', host='pssql1.local', database='ecommerce', user='reader', password='pwd0000')
mydb.execute('SELECT * FROM sales')
sales = mydb.fetchall()
mydb.execute('SELECT * FROM warehouse')
warehouse = mydb.fetchall()

# filters
household = ['plates', 'glass', 'fork']
clothes = ['shorts', 'tshirt', 'socks']

# Create sales Report objects
sales_by_household= pyreports.Report(sales, filter=household, title='household sold items')
sales_by_clothes = pyreports.Report(sales, filter=clothes, title='clothes sold items')

# Create warehouse Report objects
warehouse_by_household= pyreports.Report(warehouse, filter=household, title='household items in warehouse')
warehouse_by_clothes = pyreports.Report(warehouse, filter=clothes, title='clothes items in warehouse')

# Create a ReportBook objects
sales_book = pyreports.ReportBook([sales_by_household, sales_by_clothes], filter='Total sold')
warehouse_book = pyreports.ReportBook([warehouse_by_household, warehouse_by_clothes], filter='Total remained')

# Save reports
sales_book.export('/home/report/sales.xlsx')
warehouse_book.export('/home/report/warehouse.xlsx')

# Other report: combine two book
all = sales_book + warehouse_book
all.export('/home/report/all.xlsx')

# Now print to stdout all data
all.export()

Command line report#

In this example, we’re going to create a script that doesn’t save any files. We will read from a database, modify the data so that it is more readable and print it in standard output. We will also see how to use our script with other command line tools.

import pyreports

# Get data from database: a DatabaseManager object
mydb = pyreports.manager('sqllite', database='/var/myapp/myapp.db')
mydb.execute('SELECT * FROM performance')
performance = mydb.fetchall()

# Transform data for command line reader
cmd = pyreports.Executor(performance)

def number_to_second(seconds):
    if isinstance(seconds, int):
        ret = float(int)
        return f'{ret:.2f} s'
    else:
        return seconds

cmd.map(number_to_second)

# Print data
print(cmd.get_data())

Now we can read the db directly from the command line.

$ python performance.py
$ python performance.py | grep -G "12.*"

Note

The examples we can give are almost endless. This library has such flexible python objects that we can adapt them to any use case. You can also use it as a simple database data reader.

Use cases#

As you may have noticed, there are many use cases for this library. The manager objects are so flexible that you can read and write data from any source. Furthermore, thanks to the Executor objects you can filter and modify the data on-demand when you want and restore it at a later time, and then channel it into the Report objects and then into the ReportBook collection objects.

Below, I’ll list other use cases common to both package users and developers:

Export LDAP users and insert them into a database
Read a log file and write it into a database
Find out which LDAP users are present in a web server log file
Backup configuration files by exporting them in yaml format (passwd, httpd.conf, etc)
Calculate access rates of a database
Count how many times an ip address is present in a log file

I could go on indefinitely; anything you can think of about a file, a database and an LDAP server and you need to manipulate or verify the data, this is the library for you.