Database Management System

The io module was written to contain functions and classes that can simplify and standardize the way a user interacts with Relational Database Management Systems(RDBMS). This module contains classes that a user can use and expand upong to interact with Microsoft SQL-Server, MySQL, PostGreSQL, and SQLite. In addition, that class contains a comman interface for all four RDBM systems. NOTE: Future versions of this library may also contain interfaces for OracleDB.

Database Commonality

This library impelments several classes that enable a user to use and interact with different database. Each class is written with the Protocol Class class definition so they will have a common interface. Each class is further described in the SQL Classes section; however, it should be noticed, that with exception of the class instantiation, each method maintains the same interface.

Protocol Class

The Protocol class RelationalDB provides an implicit interface for the development of all database classes. This class provides a contract to all future developers that guides the implementation of each class.

class cobralib.db.RelationalDB(*args, **kwargs)[source]

A protocol class to handle structural sub-typing of classes used to read and interact with relational databases

Variables:
  • database – The name of the database currently being used

  • conn – The connection attribute of the database management system

  • cur – The cursor attribute of the database management system.

  • db_engine – A string representing the type of database engine

Raises:

ConnectionError – If a connection can not be established

More to be added later

change_database(database: str) None[source]

Method to change the connection from one database to another.

Parameters:

database – The new database or database file to be used. If a database file, this must include the path length.

Raises:

ConnectionError – if query fails.

close_connection() None[source]

Close the connection to the database managment system

Raises:

ConnectionError – If the connection does not exist.

csv_to_table(csv_file: str, table_name: str, csv_headers: dict[str, type], table_headers: list = None, delimiter: str = ',', skip: int = 0) None[source]

Read data from a CSV or TXT file and insert it into the specified table.

Parameters:
  • csv_file – The path to the CSV file or TXT file.

  • table_name – The name of the table.

  • csv_headers – The names of the columns in the TXT file and datatypes as a dictionary.

  • table_headers – The names of the columns in the table (default is None, assumes CSV column names and table column names are the same).

  • delimiter – The seperating delimeter in the text file. Defaulted to ‘,’ for a CSV file, but can work with other delimeters

  • skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0

Raises:
  • ValueError – If the CSV file or table name is not provided, or if the number of CSV columns and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

db_engine() str[source]

Protection for the _db_engine attribute

excel_to_table(excel_file: str, table_name: str, excel_headers: dict[str, type], table_headers: list = None, sheet_name: str = 'Sheet1', skip: int = 0) None[source]

Read data from an Excel file and insert it into the specified table.

Parameters:
  • excel_file – The path to the Excel file.

  • table_name – The name of the table.

  • excel_headers – The names of the columns in the Excel file and their data types as a dictionary.

  • table_headers – The names of the columns in the table (default is None, assumes Excel column names and table column names are the same).

  • sheet_name – The name of the sheet in the Excel file (default is ‘Sheet1’).

  • skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0.

Raises:
  • ValueError – If the Excel file, table name, or sheet name is not provided, or if the number of Excel columns and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

execute_query(query: str, params: tuple = ()) DataFrame[source]

Execute a query with placeholders and return the result as a Pandas DataFrame. The user of this class should ensure that when applicable they parameteratize the inputs to this method to minimize the potential for an injection attack

Parameters:
  • query – The query with placeholders.

  • params – The values to be substituted into the placeholders (default is an empty tuple).

Returns:

A Pandas DataFrame with the query result.

Raises:
  • ValueError – If the database name is not provided.

  • ConnectionError – If the query execution fails.

get_database_tables(database: str = None) DataFrame[source]

Method to retrieve a dataframe containing a list of all tables wtihin the SQL database or database file.

Parameters:

database – The name of the database or database file that the tables will be retrieved from.

Return df:

A dataframe containing all information relating to the tables within the database or database file.

Raises:

ConnectionError – If program is not able to get tables

get_databases() DataFrame[source]

Retrieve the names of all databases available to the user.

Returns:

A pandas dataframe of database names with a header of Databases

get_table_columns(table_name: str, database: str = None) DataFrame[source]

Retrieve the names and data types of the columns within the specified table.

Parameters:
  • table_name – The name of the table.

  • database – The database name, defaulted to currently selected database or None

Returns:

A Pandas Dataframe with the table information

Raises:

ValueError – If the database is not selected at the class level :raises ConnectionError: If the columns cannot be retrieved.

pdf_to_table(pdf_file: str, table_name: str, pdf_headers: dict[str, type], table_columns: list = None, table_idx: int = 0, page_num: int = 0, skip: int = 0) None[source]

Read a table from a PDF file and insert it into the specified SQLite table.

Parameters:
  • pdf_file – The path to the PDF file.

  • table_name – The name of the SQLite table.

  • pdf_headers – A dictionary of column names in the PDF and their data types.

  • table_columns – The names of the columns in the SQLite table (default is None, assumes PDF column names and SQLite column names are the same).

  • table_idx – Index of the table in the PDF (default: 0).

  • page_num – Page number from which to extract the table (default: 0).

  • skip – The number of rows to skip in the PDF table.

Raises:
  • ValueError – If the PDF file, table name, or sheet name is not provided, or if the number of PDF headers and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

property conn: Any

Protection for the _conn attribute

property cur: Any

Protection for the _cur attribute

property database: Any

Protection for the _database attribute

SQL Classes

This section outlines the classes that enable a user to interface with MySQL, PostGreSQL, SQLite, and Microsoft SQLServer. With exception of the class instantiation, the interfaces for the methods of each class are identical, which enables a user to easily switch between the use of the different databases.

class cobralib.db.MySQLDB(username: str, password: str, database: str, port: int = 3306, hostname: str = 'localhost')[source]

A class for connecting to MySQL databases using mysql-connector-python. The user can access the conn and cur variables, where conn is the connection variable and cur is the connection.cursor() method to expand the capability of this class beyond its methods.

Parameters:
  • username – The username for the database connection.

  • password – The password for the database connection.

  • port – The port number for the database connection. Defaulted to 3306

  • hostname – The hostname for the database connection (default is ‘localhost’).

  • database – The database you wish to connect to, defaulted to None

Raises:

ConnectionError – If a connection can not be established

Variables:
  • conn – The connection attribute of the mysql-connector-python module.

  • cur – The cursor method for the mysql-connector-python module.

  • db_engine – A string describing the database engine

  • database – The name of the database currently being used.

change_database(database: str) None[source]

Change to the specified database within the server.

Parameters:

database – The name of the database to change to.

Raises:

ConnectionError – if query fails.

close_connection() None[source]

Close the connection to the server.

Raises:

ConnectionError – If the connection does not exist.

csv_to_table(csv_file: str, table_name: str, csv_headers: dict[str, type], table_headers: list = None, delimiter: str = ',', skip: int = 0) None[source]

Read data from a CSV or TXT file and insert it into the specified table.

Parameters:
  • csv_file – The path to the CSV file or TXT file.

  • table_name – The name of the table.

  • csv_headers – The names of the columns in the TXT file and datatypes as a dictionary.

  • table_headers – The names of the columns in the table (default is None, assumes CSV column names and table column names are the same).

  • delimiter – The seperating delimeter in the text file. Defaulted to ‘,’ for a CSV file, but can work with other delimeters

  • skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0

Raises:
  • ValueError – If the CSV file or table name is not provided, or if the number of CSV columns and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

Assune we have a csv table with the following Columns, FirstName, MiddleName, LastName. Within the Names database we have a table with no entries that has columns for First and Last.

from cobralib.io import MySQLDB

db = MySQLDB('username', 'password', port=3306, hostname='localhost')
db.change_db('Names')
db.csv_to_table('csv_file.csv', 'FirstLastName',
                ['FirstName': str, 'LastName': str],
                ['First', 'Last'])
query = "SELDCT * FROM Names;"
result = db.query_db(query)
print(result)
>> index  name_id First   Last
   0      1       Jon     Webb
   1      2       Fred    Smith
   2      3       Jillian Webb

If instead of a csv file, you have a text file that uses spaces as a delimeter, and the first two rows are consumed by file metadata before reaching the header, the following code will work

from cobralib.io import MySQLDB

db = MySQLDB('username', 'password', port=3306, hostname='localhost')
db.change_db('Names')
db.csv_to_table('txt_file.txt', 'FirstLastName',
                ['FirstName': str, 'LastName': str],
                ['First', 'Last'], delemeter=r"\s+", skip=2)
query = "SELDCT * FROM Names;"
result = db.query_db(query)
print(result)
>> index  name_id First   Last
   0      1       Jon     Webb
   1      2       Fred    Smith
   2      3       Jillian Webb
excel_to_table(excel_file: str, table_name: str, excel_headers: dict[str, type], table_headers: list = None, sheet_name: str = 'Sheet1', skip: int = 0) None[source]

Read data from an Excel file and insert it into the specified table.

Parameters:
  • excel_file – The path to the Excel file.

  • table_name – The name of the table.

  • excel_headers – The names of the columns in the Excel file and their data types as a dictionary

  • table_headers – The names of the columns in the table (default is None, assumes Excel column names and table column names are the same).

  • sheet_name – The name of the sheet in the Excel file (default is ‘Sheet1’).

  • skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0

Raises:
  • ValueError – If the Excel file, table name, or sheet name is not provided, or if the number of Excel columns and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

Assune we have an excel table with the following Columns, FirstName, MiddleName, LastName. Within the Names database we have a table with no entries that has columns for First and Last.

from cobralib.io import MySQLDB

db = MySQLDB('username', 'password', port=3306, hostname='localhost')
db.change_db('Names')
db.csv_to_table('excel_file.xlsx', 'FirstLastName',
                {'FirstName': str, 'LastName': str},
                ['First', 'Last'])
query = "SELDCT * FROM Names;"
result = db.query_db(query)
print(result)
>> index  name_id First   Last
   0      1       Jon     Webb
   1      2       Fred    Smith
   2      3       Jillian Webb
execute_query(query: str, params: tuple = ()) DataFrame[source]

Execute a query with placeholders and return the result as a Pandas DataFrame. The user of this class should ensure that when applicable they parameteratize the inputs to this method to minimize the potential for an injection attack

Parameters:
  • query – The query with placeholders.

  • params – The values to be substituted into the placeholders (default is an empty tuple).

Returns:

A Pandas DataFrame with the query result.

Raises:
  • ValueError – If the database name is not provided.

  • ConnectionError – If the query execution fails.

Example usage when parameters are provided:

from cobralib.io import MySQLDB

db = MySQLDB('username', 'password', port=3306, hostname='localhost')
query = "SELECT * FROM names WHERE name_id = %s"
params = (2,)
result = db.execute_query(query, params)
print(result)
>> index  name_id  FirstName  LastName
   0      2        Fred       Smith

Example usage when no parameters are provided:

from cobralib.io import MySQLDB

db = MySQLDB('username', 'password', port=3306, hostname='localhost')
query = "SELECT * FROM names"
result = db.execute_query(query)
print(result)
>> index  name_id  FirstName  LastName
 0        1        Jon        Webb
 1        2        Fred       Smith
 2        3        Jillian    Webb
get_database_tables(database: str = None) DataFrame[source]

Retrieve the names of all tables within the current database.

Parameters:

database – Database name, defaulted to currently selected database or None

Returns:

A pandas dataframe of table names with a header of Tables

Raises:
  • ValueError – If no database is currently selected.

  • ConnectionError – If program is not able to get tables

Assuming the user has a database titled Inventory which had the tables Names, Product, Sales.

from cobralib.io import MySQLDB

db = MySQLDB('username', 'password', port=3306, hostname='localhost')
dbs = db.get_database_tables("Inventory")
db.close_conn()
print(dbs)
>> index  Tables
   0      Names
   1      Product
   2      Sales
get_databases() DataFrame[source]

Retrieve the names of all databases available to the user.

Returns:

A pandas dataframe of database names with a header of Databases

Raises:

ConnectionError – If program fails to retrive database

If you assume the server has three databases available to the username, and those databases were Inventory, Address, project_data, you could use this class with the following commands.

from cobralib.io import MySQLDB

db = MySQLDB('username', 'password', port=3306, hostname='localhost')
dbs = db.get_databases()
db.close_conn()
print(dbs)
>> index  Databases
   0      Address
   1      Inventory
   2      project_data
get_table_columns(table_name: str, database: str = None) DataFrame[source]

Retrieve the names and data types of the columns within the specified table.

param table_name:

The name of the table.

param database:

The database name, defaulted to currently selected database or None

return:

A pandas dataframe with headers ot Field, Type, Null, Key, Default, and Extra

raises ValueError:

If the database is not selected at the class level

raises ConnectionError:

If the columns cannot be retrieved.

This example shows a scenario where the database analyst has navigated into a database

from cobralib.io import MySQLDB

db = MySQLDB('username', 'password', port=3306, hostname='localhost')
db.change_database('Address')
query = '''CREATE TABLE IF NOT EXIST Names (
    name_id INTEGER AUTO_INCREMENT,
    FirstName VARCHAR(20) NOT NULL,
    MiddleName VARCHAR(20),
    LastName VARCHAR(20) NOT NULL,
    PRIMARY KEY (name_id)
);
'''
db.execute_query(query)
cols = db.get_table_columns('Names')
db.close_conn()
print(cols)
>> index Field      Type        Null   Key     Default  Extra
   0     name_id    Integer     True   Primary  False   auto_increment
   1     FirstName  Varchar(20) False  NA       False   None
   2     MiddleName Varchar(20) True   NA       False   None
   3     LastName   Varchar(20) False  NA       False   None

However, this code can also be executed when not in the database

from cobralib.io import MySQLDB

db = MySQLDB('username', 'password', port=3306, hostname='localhost')
cols = db.get_table_columns('Names', 'Address')
db.close_conn()
print(cols)
>> index Field      Type        Null   Key     Default  Extra
   0     name_id    Integer     True   Primary  False   auto_increment
   1     FirstName  Varchar(20) False  NA       False   None
   2     MiddleName Varchar(20) True   NA       False   None
   3     LastName   Varchar(20) False  NA       False   None
pdf_to_table(pdf_file: str, table_name: str, pdf_headers: dict[str, type], table_columns: list = None, table_idx: int = 0, page_num: int = 0, skip: int = 0) None[source]

Read a table from a PDF file and insert it into the specified MySQL table.

Parameters:
  • pdf_file – The path to the PDF file.

  • table_name – The name of the MySQL table.

  • pdf_headers – A dictionary of column names in the PDF and their data types.

  • table_columns – The names of the columns in the MySQL table (default is None, assumes PDF column names and MySQL column names are the same).

  • table_idx – Index of the table in the PDF (default: 0).

  • page_num – Page number from which to extract the table (default: 0).

  • skip – The number of rows to skip in the PDF table.

Raises:
  • ValueError – If the PDF file, table name, or sheet name is not provided, or if the number of PDF headers and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

property conn: Any

Protection for the _conn attribute

property cur: Any

Protection for the _cur attribute

property database: Any

Protection for the _database attribute

property db_engine: str

Protection for the _db_engine attribute

class cobralib.db.PostGreSQLDB(username: str, password: str, database: str, port: int = 5432, hostname: str = 'localhost')[source]

Initialize the database connection to a PostgreSQL server.

Parameters:
  • username – The PostgreSQL username.

  • password – The PostgreSQL password.

  • database – The name of the database to connect to.

  • port – The port number for the PostgreSQL server (default is 5432).

  • hostname – The server’s hostname (default is ‘localhost’).

Raises:

ConnectionError – If a connection can not be established.

Variables:
  • conn – The connection attribute of the sqlite3 module.

  • cur – The cursor method for the sqlite3 module.

  • database – The name of the database currently being used.

  • db_engine – A string describing the database engine

change_database(database: str) None[source]

Change to a different PostgreSQL database.

Parameters:

database – The name of the database to switch to.

Raises:

ConnectionError – If there’s an issue establishing a connection to the new database.

close_connection()[source]

Close the connection to the PostgreSQL server and database.

Raises:

ConnectionError – If there’s an issue closing the connection or cursor.

csv_to_table(csv_file: str, table_name: str, csv_headers: dict[str, type], table_headers: list = None, delimiter: str = ',', skip: int = 0) None[source]

Read data from a CSV or TXT file and insert it into the specified table.

Parameters:
  • csv_file – The path to the CSV file or TXT file.

  • table_name – The name of the table.

  • csv_headers – The names of the columns in the TXT file and datatypes as a dictionary.

  • table_headers – The names of the columns in the table (default is None, assumes CSV column names and table column names are the same).

  • delimiter – The seperating delimeter in the text file. Defaulted to ‘,’ for a CSV file, but can work with other delimeters

  • skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0

Raises:
  • ValueError – If the CSV file or table name is not provided, or if the number of CSV columns and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

Assune we have a csv table with the following Columns, FirstName, MiddleName, LastName. Within the Names database we have a table with no entries that has columns for First and Last.

from cobralib.io import PostGreSQLDB

db = PostGreSQL('username', 'password', 'Names', port=3306,
                hostname='localhost')
db.csv_to_table('csv_file.csv', 'FirstLastName',
                ['FirstName': str, 'LastName': str],
                ['First', 'Last'])
query = "SELDCT * FROM Names;"
result = db.query_db(query)
print(result)
>> index  name_id First   Last
   0      1       Jon     Webb
   1      2       Fred    Smith
   2      3       Jillian Webb

If instead of a csv file, you have a text file that uses spaces as a delimeter, and the first two rows are consumed by file metadata before reaching the header, the following code will work

from cobralib.io import PostGreSQLDB

db = PostGreSQLDB('username', 'password', 'Names',
                   port=3306, hostname='localhost')
db.csv_to_table('txt_file.txt', 'FirstLastName',
                ['FirstName': str, 'LastName': str],
                ['First', 'Last'], delemeter=r"\s+", skip=2)
query = "SELDCT * FROM Names;"
result = db.query_db(query)
print(result)
>> index  name_id First   Last
   0      1       Jon     Webb
   1      2       Fred    Smith
   2      3       Jillian Webb

… [rest of the docstring remains unchanged] …

excel_to_table(excel_file: str, table_name: str, excel_headers: dict[str, type], table_headers: list = None, sheet_name: str = 'Sheet1', skip: int = 0) None[source]

Read data from an Excel file and insert it into the specified table.

Parameters:
  • excel_file – The path to the Excel file.

  • table_name – The name of the table.

  • excel_headers – The names of the columns in the Excel file and their data types as a dictionary

  • table_headers – The names of the columns in the table (default is None, assumes Excel column names and table column names are the same).

  • sheet_name – The name of the sheet in the Excel file (default is ‘Sheet1’).

  • skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0

Raises:
  • ValueError – If the Excel file, table name, or sheet name is not provided, or if the number of Excel columns and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

Assune we have an excel table with the following Columns, FirstName, MiddleName, LastName. Within the Names database we have a table with no entries that has columns for First and Last.

from cobralib.io import PostGreSQLDB

db = PostGreSQL('username', 'password', 'Names',
                port=3306, hostname='localhost')
db.csv_to_table('excel_file.xlsx', 'FirstLastName',
                {'FirstName': str, 'LastName': str},
                ['First', 'Last'])
query = "SELDCT * FROM Names;"
result = db.query_db(query)
print(result)
>> index  name_id First   Last
   0      1       Jon     Webb
   1      2       Fred    Smith
   2      3       Jillian Webb
execute_query(query: str, params: tuple = None) DataFrame[source]

Executes the provided SQL query and returns the results as a pandas DataFrame.

Parameters:
  • query – The SQL query to execute.

  • params – A tuple of parameters to bind to the query.

Returns:

A pandas DataFrame containing the query results.

get_database_tables(database: str = None) DataFrame[source]

Fetch a list of tables from the specified or current PostgreSQL database.

Parameters:

database – The name of the database to fetch tables from. If not provided, uses the current database.

Returns:

A pandas DataFrame containing the list of tables with the column header “Tables”.

get_databases() DataFrame[source]

Fetch a list of databases from the PostgreSQL server.

Returns:

A pandas DataFrame containing the list of databases with the column header “Databases”.

get_table_columns(table_name: str, database: str = None) DataFrame[source]

Fetch column details for the given table in the specified or current PostgreSQL database.

Parameters:
  • table_name – The name of the table to fetch column details for.

  • database – The name of the database the table resides in. If not provided, uses the current database.

Returns:

A pandas DataFrame containing column details.

pdf_to_table(pdf_file: str, table_name: str, pdf_headers: dict[str, type], table_columns: list = None, table_idx: int = 0, page_num: int = 0, skip: int = 0) None[source]

Read a table from a PDF file and insert it into the specified MySQL table.

Parameters:
  • pdf_file – The path to the PDF file.

  • table_name – The name of the MySQL table.

  • pdf_headers – A dictionary of column names in the PDF and their data types.

  • table_columns – The names of the columns in the MySQL table (default is None, assumes PDF column names and MySQL column names are the same).

  • table_idx – Index of the table in the PDF (default: 0).

  • page_num – Page number from which to extract the table (default: 0).

  • skip – The number of rows to skip in the PDF table.

Raises:
  • ValueError – If the PDF file, table name, or sheet name is not provided, or if the number of PDF headers and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

property conn: Any

Protection for the _conn attribute

property cur: Any

Protection for the _cur attribute

property database: Any

Protection for the _database attribute

property db_engine: str

Protection for the _db_engine attribute

class cobralib.db.SQLiteDB(database: str)[source]

A class for connection to a SQLite database file using the sqlite3 python package. The usser can access the conn and cur variables, where conn is the connection variable and cur is the connection.cursor() method to expand the capability of this class beyond its methods. NOTE: If the user passes an incorrect database name to the constructor, the class will assume that the user wants to create a database of that name, and will create a new database file.

Parameters:

database – The name of the database file to include its path length.

Raises:

ConnectionError – If a connection can not be established.

Variables:
  • conn – The connection attribute of the sqlite3 module.

  • cur – The cursor method for the sqlite3 module.

  • database – The name of the database currently being used.

  • db_engine – A string describing the database engine

change_database(database: str) None[source]

Method to change the connection from one database file to another

Paramn database:

The new database file to be used to include the path length

close_connection() None[source]

Close the connection to tjhe SQLite database

csv_to_table(csv_file: str, table_name: str, csv_headers: dict[str, type], table_headers: list = None, delimiter: str = ',', skip: int = 0) None[source]

Read data from a CSV or TXT file and insert it into the specified table.

Parameters:
  • csv_file – The path to the CSV file or TXT file.

  • table_name – The name of the table.

  • csv_headers – The names of the columns in the TXT file and datatypes as a dictionary.

  • table_headers – The names of the columns in the table (default is None, assumes CSV column names and table column names are the same).

  • delimiter – The seperating delimeter in the text file. Defaulted to ‘,’ for a CSV file, but can work with other delimeters

  • skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0

Raises:
  • ValueError – If the CSV file or table name is not provided, or if the number of CSV columns and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

excel_to_table(excel_file: str, table_name: str, excel_headers: dict[str, type], table_headers: list = None, sheet_name: str = 'Sheet1', skip: int = 0) None[source]

Read data from an Excel file and insert it into the specified table.

Parameters:
  • excel_file – The path to the Excel file.

  • table_name – The name of the table.

  • excel_headers – The names of the columns in the Excel file and their data types as a dictionary.

  • table_headers – The names of the columns in the table (default is None, assumes Excel column names and table column names are the same).

  • sheet_name – The name of the sheet in the Excel file (default is ‘Sheet1’).

  • skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0.

Raises:
  • ValueError – If the Excel file, table name, or sheet name is not provided, or if the number of Excel columns and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

execute_query(query: str, params: tuple = ()) DataFrame[source]

Execute a query with placeholders and return the result as a Pandas DataFrame. The user of this class should ensure that when applicable they parameteratize the inputs to this method to minimize the potential for an injection attack

Parameters:
  • query – The query with placeholders.

  • params – The values to be substituted into the placeholders (default is an empty tuple).

Returns:

A Pandas DataFrame with the query result.

Raises:
  • ValueError – If the database name is not provided.

  • Error – If the query execution fails.

Example usage when parameters are provided:

from mylib.io import SQLiteDB

db = SQLiteDB('example.db')
query = "SELECT * FROM names WHERE name_id = ?"
params = (2,)
result = db.execute_query(query, params)
print(result)
>> index  name_id  FirstName  LastName
   0      2        Fred       Smith

Example usage when no parameters are provided:

from mylib.io import SQLiteDB

db = SQLiteDB('example.db')
query = "SELECT * FROM names"
result = db.execute_query(query)
print(result)
>> index  name_id  FirstName  LastName
 0        1        Jon        Webb
 1        2        Fred       Smith
 2        3        Jillian    Webb
get_database_tables(database: str = None) DataFrame[source]

Method the retrieve a dataframe containing a list of all tables within a SQLite database file. If the user does not pass a database name, the method will return the list of tables in the current database. However, the user can also pass this method the name of another database file, and this will return a list of tables in that database file/

Parameters:

database – The name of the database or database file that the tables will be retrieved from.

Return df:

A dataframe containing all information relating to the tables within the database or database file.

Assuming the user has a database titled Inventory which had the tables Names, Product, Sales.

from cobralib.io import SQLiteDB

db = SQLiteDB('test.db')
dbs = db.get_database_tables("Inventory")
db.close_connection()
print(dbs)
>> index  Tables
   0      Names
   1      Product
   2      Sales
get_databases() DataFrame[source]

Method included for compatibility with RelationalDB Protocol class. This method returns an empty dataframe since SQLite does not support true databases.

:return : An empty pandas dataframe

get_table_columns(table_name: str, database: str = None) DataFrame[source]

Retrieve the names and data types of the columns within the specified table.

param table_name:

The name of the table.

param database:

The database name, defaulted to currently selected database or None

return:

A pandas dataframe with headers ot Field, Type, Null, Key, Default, and Extra

raises ValueError:

If the database is not selected at the class level

raises ConnectionError:

If the columns cannot be retrieved.

This example shows a scenario where the database analyst has navigated into a database

from cobralib.io import SQLiteDB

db = SQLiteDB('test_db.db')
query = '''CREATE TABLE IF NOT EXIST Names (
    name_id INTEGER AUTO_INCREMENT,
    FirstName VARCHAR(20) NOT NULL,
    MiddleName VARCHAR(20),
    LastName VARCHAR(20) NOT NULL,
    PRIMARY KEY (name_id)
);
'''
db.execute_query(query)
cols = db.get_table_columns('Names')
db.close_conn()
print(cols)
>> index Field      Type        Null   Key     Default  Extra
   0     name_id    Integer     True   Primary  False   autoincrement
   1     FirstName  Varchar(20) False  NA       False   None
   2     MiddleName Varchar(20) True   NA       False   None
   3     LastName   Varchar(20) False  NA       False   None

However, this code can also be executed when not in the database

from cobralib.io import MySQLDB

db = MySQLDB('username', 'password', port=3306, hostname='localhost')
cols = db.get_table_columns('Names', 'Address')
db.close_conn()
print(cols)
>> index Field      Type        Null   Key     Default  Extra
   0     name_id    Integer     True   Primary  False   autoincrement
   1     FirstName  Varchar(20) False  NA       False   None
   2     MiddleName Varchar(20) True   NA       False   None
   3     LastName   Varchar(20) False  NA       False   None
pdf_to_table(pdf_file: str, table_name: str, pdf_headers: dict[str, type], table_columns: list = None, table_idx: int = 0, page_num: int = 0, skip: int = 0) None[source]

Read a table from a PDF file and insert it into the specified SQLite table.

Parameters:
  • pdf_file – The path to the PDF file.

  • table_name – The name of the SQLite table.

  • pdf_headers – A dictionary of column names in the PDF and their data types.

  • table_columns – The names of the columns in the SQLite table (default is None, assumes PDF column names and SQLite column names are the same).

  • table_idx – Index of the table in the PDF (default: 0).

  • page_num – Page number from which to extract the table (default: 0).

  • skip – The number of rows to skip in the PDF table.

Raises:
  • ValueError – If the PDF file, table name, or sheet name is not provided, or if the number of PDF headers and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

property conn: Any

Protection for the _conn attribute

property cur: Any

Protection for the _cur attribute

property database: Any

Protection for the _database attribute

property db_engine: str

Protection for the _db_engine attribute

class cobralib.db.SQLServerDB(username: str, password: str, database: str, port: int = 1433, hostname: str = 'localhost', cert: str = 'yes', driver: str = '{ODBC Driver 18 for SQL Server}')[source]

Initialize the SQLServerDB object with connection parameters.

Parameters:
  • username – The username to connect to the database.

  • password – The password to connect to the database.

  • port – The port number to use for the connection. Default is 1433.

  • hostname – The database server name or IP address. Default is ‘localhost’.

  • database – The initial database to connect to (can be changed later).

  • cert – yes to trust certificat and no to not trust certificate without authentication. Defaulted to yes

  • driver – The ODBC driver to use for connection. Defaulted to “{ODBC Driver 18 for SQL Server}”

Raises:

ConnectionError – If a connection can not be established.

Variables:
  • conn – The connection attribute of the sqlite3 module.

  • cur – The cursor method for the sqlite3 module.

  • database – The name of the database currently being used.

  • db_engine – A string describing the database engine

change_database(database: str)[source]

Change the active database for the current connection.

Parameters:

database – The name of the database to switch to.

Raises:

ConnectionError – if query fails.

close_connection()[source]

Close the database connection.

csv_to_table(csv_file: str, table_name: str, csv_headers: dict[str, type], table_headers: list = None, delimiter: str = ',', skip: int = 0) None[source]

Read data from a CSV or TXT file and insert it into the specified table.

Parameters:
  • csv_file – The path to the CSV file or TXT file.

  • table_name – The name of the table.

  • csv_headers – The names of the columns in the TXT file and datatypes as a dictionary.

  • table_headers – The names of the columns in the table (default is None, assumes CSV column names and table column names are the same).

  • delimiter – The seperating delimeter in the text file. Defaulted to ‘,’ for a CSV file, but can work with other delimeters

  • skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0

Raises:
  • ValueError – If the CSV file or table name is not provided, or if the number of CSV columns and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

Assune we have a csv table with the following Columns, FirstName, MiddleName, LastName. Within the Names database we have a table with no entries that has columns for First and Last.

from cobralib.io import SQLServerDB

db = SQLServerDB('username', 'password', port=3306, hostname='localhost')
db.change_db('Names')
db.csv_to_table('csv_file.csv', 'FirstLastName',
                ['FirstName': str, 'LastName': str],
                ['First', 'Last'])
query = "SELDCT * FROM Names;"
result = db.query_db(query)
print(result)
>> index  name_id First   Last
   0      1       Jon     Webb
   1      2       Fred    Smith
   2      3       Jillian Webb

If instead of a csv file, you have a text file that uses spaces as a delimeter, and the first two rows are consumed by file metadata before reaching the header, the following code will work

from cobralib.io import SQLServerDB

db = SQLServerDB('username', 'password', port=3306, hostname='localhost')
db.change_db('Names')
db.csv_to_table('txt_file.txt', 'FirstLastName',
                ['FirstName': str, 'LastName': str],
                ['First', 'Last'], delemeter=r"\s+", skip=2)
query = "SELDCT * FROM Names;"
result = db.query_db(query)
print(result)
>> index  name_id First   Last
   0      1       Jon     Webb
   1      2       Fred    Smith
   2      3       Jillian Webb
excel_to_table(excel_file: str, table_name: str, excel_headers: dict[str, type], table_headers: list = None, sheet_name: str = 'Sheet1', skip: int = 0) None[source]

Read data from an Excel file and insert it into the specified table.

Parameters:
  • excel_file – The path to the Excel file.

  • table_name – The name of the table.

  • excel_headers – The names of the columns in the Excel file and their data types as a dictionary

  • table_headers – The names of the columns in the table (default is None, assumes Excel column names and table column names are the same).

  • sheet_name – The name of the sheet in the Excel file (default is ‘Sheet1’).

  • skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0

Raises:
  • ValueError – If the Excel file, table name, or sheet name is not provided, or if the number of Excel columns and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

Assune we have an excel table with the following Columns, FirstName, MiddleName, LastName. Within the Names database we have a table with no entries that has columns for First and Last.

from cobralib.io import SQLServerDB

db = SQLServerDB('username', 'password', port=3306, hostname='localhost')
db.change_db('Names')
db.csv_to_table('excel_file.xlsx', 'FirstLastName',
                {'FirstName': str, 'LastName': str},
                ['First', 'Last'])
query = "SELDCT * FROM Names;"
result = db.query_db(query)
print(result)
>> index  name_id First   Last
   0      1       Jon     Webb
   1      2       Fred    Smith
   2      3       Jillian Webb
execute_query(query: str, params: tuple = ()) DataFrame[source]

Execute a given query on the SQL Server database.

Parameters:
  • query – The SQL query string to execute.

  • params – Optional tuple containing parameters for the query.

Returns:

DataFrame containing the query results if any, otherwise an empty DataFrame.

Raises:
  • ValueError – If the database name is not provided.

  • ConnectionError – If the query execution fails.

Example usage when parameters are provided:

from cobralib.io import SQLServerDB

db = SQLServerDB('username', 'password', port=3306, hostname='localhost')
query = "SELECT * FROM names WHERE name_id = %s"
params = (2,)
result = db.execute_query(query, params)
print(result)
>> index  name_id  FirstName  LastName
   0      2        Fred       Smith

Example usage when no parameters are provided:

from cobralib.io import SQLServerDB

db = SQLServerDB('username', 'password', port=3306, hostname='localhost')
query = "SELECT * FROM names"
result = db.execute_query(query)
print(result)
>> index  name_id  FirstName  LastName
 0        1        Jon        Webb
 1        2        Fred       Smith
 2        3        Jillian    Webb
get_database_tables(database: str = None) DataFrame[source]

Retrieve a list of tables from the given or current SQL Server database.

Parameters:

database – Optional name of the database to fetch tables from.

Returns:

DataFrame containing the table names.

Raises:
  • ValueError – If no database is currently selected.

  • ConnectionError – If program is not able to get tables

Assuming the user has a database titled Inventory which had the tables Names, Product, Sales.

from cobralib.io import SQLServerDB

db = SQLServerDB('username', 'password', port=3306, hostname='localhost')
dbs = db.get_database_tables("Inventory")
db.close_conn()
print(dbs)
>> index  Tables
   0      Names
   1      Product
   2      Sales
get_databases() DataFrame[source]

Retrieve a list of databases from the SQL Server.

Returns:

DataFrame containing the database names.

Raises:

ConnectionError – If program fails to retrive database

If you assume the server has three databases available to the username, and those databases were Inventory, Address, project_data, you could use this class with the following commands.

from cobralib.io import SQLServerDB

db = SQServerDB('username', 'password', port=3306, hostname='localhost')
dbs = db.get_databases()
db.close_conn()
print(dbs)
>> index  Databases
   0      Address
   1      Inventory
   2      project_data
get_table_columns(table_name: str, database: str = None) DataFrame[source]

Retrieve column details of the specified table from the given or current SQL Server database.

Parameters:
  • table_name – Name of the table to fetch column details from.

  • database – Optional name of the database the table is in.

Returns:

DataFrame containing the column details.

Raises:

ValueError

If the database is not selected at the class level :raises ConnectionError: If the columns cannot be retrieved.

This example shows a scenario where the database analyst has navigated into a database

from cobralib.io import SQLServerDB

db = SQLServerDB('username', 'password', port=3306, hostname='localhost')
db.change_database('Address')
query = '''CREATE TABLE IF NOT EXIST Names (
    name_id INTEGER AUTO_INCREMENT,
    FirstName VARCHAR(20) NOT NULL,
    MiddleName VARCHAR(20),
    LastName VARCHAR(20) NOT NULL,
    PRIMARY KEY (name_id)
);
'''
db.execute_query(query)
cols = db.get_table_columns('Names')
db.close_conn()
print(cols)
>> index Field      Type        Null   Key     Default  Extra
   0     name_id    Integer     True   Primary  False   auto_increment
   1     FirstName  Varchar(20) False  NA       False   None
   2     MiddleName Varchar(20) True   NA       False   None
   3     LastName   Varchar(20) False  NA       False   None

However, this code can also be executed when not in the database

from cobralib.io import SQLServerDB

db = SQLServerDB('username', 'password', port=3306, hostname='localhost')
cols = db.get_table_columns('Names', 'Address')
db.close_conn()
print(cols)
>> index Field      Type        Null   Key     Default  Extra
    0     name_id    Integer     True   Primary  False   auto_increment
    1     FirstName  Varchar(20) False  NA       False   None
    2     MiddleName Varchar(20) True   NA       False   None
    3     LastName   Varchar(20) False  NA       False   None
pdf_to_table(pdf_file: str, table_name: str, pdf_headers: dict[str, type], table_columns: list = None, table_idx: int = 0, page_num: int = 0, skip: int = 0) None[source]

Read a table from a PDF file and insert it into the specified MySQL table.

Parameters:
  • pdf_file – The path to the PDF file.

  • table_name – The name of the MySQL table.

  • pdf_headers – A dictionary of column names in the PDF and their data types.

  • table_columns – The names of the columns in the MySQL table (default is None, assumes PDF column names and MySQL column names are the same).

  • table_idx – Index of the table in the PDF (default: 0).

  • page_num – Page number from which to extract the table (default: 0).

  • skip – The number of rows to skip in the PDF table.

Raises:
  • ValueError – If the PDF file, table name, or sheet name is not provided, or if the number of PDF headers and table columns mismatch.

  • Error – If the data insertion fails or the data types are incompatible.

property conn: Any

Protection for the _conn attribute

property cur: Any

Protection for the _cur attribute

property database: Any

Protection for the _database attribute

property db_engine: str

Protection for the _db_engine attribute