pyreports example ################# Example scripts using ``pyreports`` module. .. toctree:: Basic usage *********** In this section you will find examples that represent the entire reporting workflow, relying on the *\*Manager* objects as input and output, and the *Executor* object for the process part. Database to file ---------------- In this example, we extract the data from a mysql database, filter it by error code and finally export it to a csv. .. code-block:: python import pyreports # INPUT # Select source: this is a DatabaseManager object mydb = pyreports.manager('mysql', host='mysql1.local', database='login_users', user='dba', password='dba0000') # Get data mydb.execute('SELECT * FROM site_login') site_login = mydb.fetchall() # return Dataset object # PROCESS # Filter data error_login = pyreports.Executor(site_login) # accept Dataset object error_login.filter([400, 401, 403, 404, 500]) # OUTPUT # Save report: this is a FileManager object output = pyreports.manager('csv', '/home/report/error_login.csv') output.write(error_login.get_data()) .. note:: A reflection on this example could be: "Why don't I apply the filter directly in the SQL syntax?" The answer is simple. The advantage of using an *Executor* object is that from general data I can filter or modify (*map* function or with my custom function) without affecting the original Dataset. So much so that I could do several different Executors, process them and then re-merge them into a single Executor, which would be difficult to do with SQL syntax. File to Database ---------------- In this example I have a json file as input, received from a web server, I process it and write to the database. .. code-block:: python import pyreports # INPUT # Return json from GET request on web server: this is a FileManager object web_server_result = pyreports.manager('json', '/home/report/users.json') # Get data users = web_server_result.read() # return Dataset object # PROCESS # Filter data user_int = pyreports.Executor(users) # accept Dataset object user_int.filter(key=lambda record: if record == 'INTERNAL') # My filter is a function user_ext = pyreports.Executor(users) user_ext.filter(key=lambda record: if record == 'EXTERNAL') # OUTPUT # Save report: this is a DatabaseManager object mydb = pyreports.manager('mysql', host='mysql1.local', database='users', user='dba', password='dba0000') # Write to database mydb.executemany("INSERT INTO internal_users(name, surname, employeeType) VALUES(%s, %s, %s)", list(user_int)) mydb.executemany("INSERT INTO external_users(name, surname, employeeType) VALUES(%s, %s, %s)", list(user_ext)) mydb.commit() Combine inputs -------------- In this example, we will take two different inputs, and combine them to export an excel file containing the data processing of the two sources. .. code-block:: python import pyreports # INPUT # Config Unix application file: this is a FileManager object config_file = pyreports.manager('yaml', '/home/myapp.yml') # Console admin: this is a DatabaseManager object mydb = pyreports.manager('mysql', server='mysql1.local', database='admins', user='sa', password='sa0000') # Get data admin_app = config_file.read() # return Dataset object: three column (name, shell, login) mydb.execute('SELECT * FROM console_admins') admins = mydb.fetchall() # return Dataset object: three column (name, shell, login) # PROCESS # Filter data all_console_admins = pyreports.Executor(admins) # accept Dataset object all_console_admins.filter(config_file['shell']) # filter by shells # OUTPUT # Save report: this is a FileManager object output = pyreports.manager('xlsx', '/home/report/all_admins.xlsx') output.write(all_console_admins.get_data()) Simple report ------------- In this example, we use a Report type object to create and filter the data through a function and save it in a csv file, printing the number of lines in total. .. code-block:: python import pyreports OFFICE_FILTER = 'Customer' # Function: filter by office def filter_by_office(value): if value == OFFICE_FILTER: return True # Connect to database mydb = pyreports.manager('postgresql', host='pssql1.local', database='users', user='admin', password='pwd0000') mydb.execute('SELECT * FROM employees') all_employees = mydb.fetchall() # Output to csv output = pyreports.manager('csv', f'/home/report/office_{OFFICE_FILTER}.csv') # All customer employees: Report object one_office = pyreports.Report(all_employees, filters=filter_by_office, title=f'All employees in {OFFICE_FILTER}', count=True, output=output) # Run and save report one_office.export() print(one_office.count) # Row count Advanced usage ************** From here on, the examples will be a bit more complex; we will process the data in order to modify it, filter it, combine it and merge it before exporting or parsing it in another object. Report apache log ----------------- In this example we will analyze and capture parts of a web server log. For each error code present in the log, we will create a report that will be inserted in a book, where each sheet will contain the details of the error code. In the last sheet, there will be an element counter for every single error present in the report. .. code-block:: python import pyreports import tablib import re # Get apache log data: this is a FileManager object apache_log = pyreports.manager('file', '/var/log/httpd/error.log').read() # apache log format: regex regex = '([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"' # Function than receive Dataset and return a new Dataset def format_dataset_log(data_input): data = tablib.Dataset(headers=['ip', 'date', 'operation', 'code', 'client']) for row in data_input: log_parts = re.match(regex, row[0]).groups() new_row = list(log_parts[:4]) new_row.append(log_parts[5]) data.append(new_row) return data # Create a collection of Report objects all_apache_error = pyreports.ReportBook(title='Apache error on my site') # Create a Report object based on error code apache_error_log = format_dataset_log(apache_log) all_error = set(apache_error_log['code']) for code in all_error: all_apache_error.add(pyreports.Report(apache_error_log, filters=[code], title=f'Error {code}')) # Count all error code counter = pyreports.counter(apache_error_log, 'code') # Append new Report on ReportBook with error code counters error_counter = tablib.Dataset(counter.values(), headers=counter) all_apache_error.add(pyreports.Report(error_counter)) # Save ReportBook on Excel all_apache_error.export('/home/report/apache_log_error_code.xlsx') We now have a script that parses and breaks an apache httpd log file by error code. Report e-commerce data ---------------------- In this example, we combine data from different e-commerce databases. In addition, we will create two reports: one for the sales, the other for the warehouse. Then once saved, we will create an additional report that combines both of the previous ones. .. code-block:: python import pyreports # Get data from database: a DatabaseManager object mydb = pyreports.manager('postgresql', host='pssql1.local', database='ecommerce', user='reader', password='pwd0000') mydb.execute('SELECT * FROM sales') sales = mydb.fetchall() mydb.execute('SELECT * FROM warehouse') warehouse = mydb.fetchall() # filters household = ['plates', 'glass', 'fork'] clothes = ['shorts', 'tshirt', 'socks'] # Create sales Report objects sales_by_household= pyreports.Report(sales, filter=household, title='household sold items') sales_by_clothes = pyreports.Report(sales, filter=clothes, title='clothes sold items') # Create warehouse Report objects warehouse_by_household= pyreports.Report(warehouse, filter=household, title='household items in warehouse') warehouse_by_clothes = pyreports.Report(warehouse, filter=clothes, title='clothes items in warehouse') # Create a ReportBook objects sales_book = pyreports.ReportBook([sales_by_household, sales_by_clothes], filter='Total sold') warehouse_book = pyreports.ReportBook([warehouse_by_household, warehouse_by_clothes], filter='Total remained') # Save reports sales_book.export('/home/report/sales.xlsx') warehouse_book.export('/home/report/warehouse.xlsx') # Other report: combine two book all = sales_book + warehouse_book all.export('/home/report/all.xlsx') # Now print to stdout all data all.export() Command line report ------------------- In this example, we're going to create a script that doesn't save any files. We will read from a database, modify the data so that it is more readable and print it in standard output. We will also see how to use our script with other command line tools. .. code-block:: python import pyreports # Get data from database: a DatabaseManager object mydb = pyreports.manager('sqllite', database='/var/myapp/myapp.db') mydb.execute('SELECT * FROM performance') performance = mydb.fetchall() # Transform data for command line reader cmd = pyreports.Executor(performance) def number_to_second(seconds): if isinstance(seconds, int): ret = float(int) return f'{ret:.2f} s' else: return seconds cmd.map(number_to_second) # Print data print(cmd.get_data()) Now we can read the db directly from the command line. .. code-block:: console $ python performance.py $ python performance.py | grep -G "12.*" .. note:: The examples we can give are almost endless. This library has such flexible python objects that we can adapt them to any use case. You can also use it as a simple database data reader. Use cases ********* As you may have noticed, there are many use cases for this library. The ``manager`` objects are so flexible that you can read and write data from any source. Furthermore, thanks to the ``Executor`` objects you can filter and modify the data on-demand when you want and restore it at a later time, and then channel it into the ``Report`` objects and then into the ``ReportBook`` collection objects. Below, I'll list other use cases common to both package users and developers: - Export LDAP users and insert them into a database - Read a log file and write it into a database - Find out which LDAP users are present in a web server log file - Backup configuration files by exporting them in yaml format (passwd, httpd.conf, etc) - Calculate access rates of a database - Count how many times an ip address is present in a log file I could go on indefinitely; anything you can think of about a file, a database and an LDAP server and you need to manipulate or verify the data, this is the library for you.