Database Management System
The io module was written to contain functions and classes that can simplify and standardize the way a user interacts with Relational Database Management Systems(RDBMS). This module contains classes that a user can use and expand upong to interact with Microsoft SQL-Server, MySQL, PostGreSQL, and SQLite. In addition, that class contains a comman interface for all four RDBM systems. NOTE: Future versions of this library may also contain interfaces for OracleDB.
Database Commonality
This library impelments several classes that enable a user to use and interact with different database. Each class is written with the Protocol Class class definition so they will have a common interface. Each class is further described in the SQL Classes section; however, it should be noticed, that with exception of the class instantiation, each method maintains the same interface.
Protocol Class
The Protocol class RelationalDB provides an implicit interface for the development of all database classes.
This class provides a contract to all future developers that guides the implementation of each class.
- class cobralib.db.RelationalDB(*args, **kwargs)[source]
A protocol class to handle structural sub-typing of classes used to read and interact with relational databases
- Variables:
database – The name of the database currently being used
conn – The connection attribute of the database management system
cur – The cursor attribute of the database management system.
db_engine – A string representing the type of database engine
- Raises:
ConnectionError – If a connection can not be established
More to be added later
- change_database(database: str) None[source]
Method to change the connection from one database to another.
- Parameters:
database – The new database or database file to be used. If a database file, this must include the path length.
- Raises:
ConnectionError – if query fails.
- close_connection() None[source]
Close the connection to the database managment system
- Raises:
ConnectionError – If the connection does not exist.
- csv_to_table(csv_file: str, table_name: str, csv_headers: dict[str, type], table_headers: list = None, delimiter: str = ',', skip: int = 0) None[source]
Read data from a CSV or TXT file and insert it into the specified table.
- Parameters:
csv_file – The path to the CSV file or TXT file.
table_name – The name of the table.
csv_headers – The names of the columns in the TXT file and datatypes as a dictionary.
table_headers – The names of the columns in the table (default is None, assumes CSV column names and table column names are the same).
delimiter – The seperating delimeter in the text file. Defaulted to ‘,’ for a CSV file, but can work with other delimeters
skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0
- Raises:
ValueError – If the CSV file or table name is not provided, or if the number of CSV columns and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
- excel_to_table(excel_file: str, table_name: str, excel_headers: dict[str, type], table_headers: list = None, sheet_name: str = 'Sheet1', skip: int = 0) None[source]
Read data from an Excel file and insert it into the specified table.
- Parameters:
excel_file – The path to the Excel file.
table_name – The name of the table.
excel_headers – The names of the columns in the Excel file and their data types as a dictionary.
table_headers – The names of the columns in the table (default is None, assumes Excel column names and table column names are the same).
sheet_name – The name of the sheet in the Excel file (default is ‘Sheet1’).
skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0.
- Raises:
ValueError – If the Excel file, table name, or sheet name is not provided, or if the number of Excel columns and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
- execute_query(query: str, params: tuple = ()) DataFrame[source]
Execute a query with placeholders and return the result as a Pandas DataFrame. The user of this class should ensure that when applicable they parameteratize the inputs to this method to minimize the potential for an injection attack
- Parameters:
query – The query with placeholders.
params – The values to be substituted into the placeholders (default is an empty tuple).
- Returns:
A Pandas DataFrame with the query result.
- Raises:
ValueError – If the database name is not provided.
ConnectionError – If the query execution fails.
- get_database_tables(database: str = None) DataFrame[source]
Method to retrieve a dataframe containing a list of all tables wtihin the SQL database or database file.
- Parameters:
database – The name of the database or database file that the tables will be retrieved from.
- Return df:
A dataframe containing all information relating to the tables within the database or database file.
- Raises:
ConnectionError – If program is not able to get tables
- get_databases() DataFrame[source]
Retrieve the names of all databases available to the user.
- Returns:
A pandas dataframe of database names with a header of Databases
- get_table_columns(table_name: str, database: str = None) DataFrame[source]
Retrieve the names and data types of the columns within the specified table.
- Parameters:
table_name – The name of the table.
database – The database name, defaulted to currently selected database or None
- Returns:
A Pandas Dataframe with the table information
- Raises:
ValueError – If the database is not selected at the class level :raises ConnectionError: If the columns cannot be retrieved.
- pdf_to_table(pdf_file: str, table_name: str, pdf_headers: dict[str, type], table_columns: list = None, table_idx: int = 0, page_num: int = 0, skip: int = 0) None[source]
Read a table from a PDF file and insert it into the specified SQLite table.
- Parameters:
pdf_file – The path to the PDF file.
table_name – The name of the SQLite table.
pdf_headers – A dictionary of column names in the PDF and their data types.
table_columns – The names of the columns in the SQLite table (default is None, assumes PDF column names and SQLite column names are the same).
table_idx – Index of the table in the PDF (default: 0).
page_num – Page number from which to extract the table (default: 0).
skip – The number of rows to skip in the PDF table.
- Raises:
ValueError – If the PDF file, table name, or sheet name is not provided, or if the number of PDF headers and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
- property conn: Any
Protection for the _conn attribute
- property cur: Any
Protection for the _cur attribute
- property database: Any
Protection for the _database attribute
SQL Classes
This section outlines the classes that enable a user to interface with MySQL, PostGreSQL, SQLite, and Microsoft SQLServer. With exception of the class instantiation, the interfaces for the methods of each class are identical, which enables a user to easily switch between the use of the different databases.
- class cobralib.db.MySQLDB(username: str, password: str, database: str, port: int = 3306, hostname: str = 'localhost')[source]
A class for connecting to MySQL databases using mysql-connector-python. The user can access the conn and cur variables, where conn is the connection variable and cur is the connection.cursor() method to expand the capability of this class beyond its methods.
- Parameters:
username – The username for the database connection.
password – The password for the database connection.
port – The port number for the database connection. Defaulted to 3306
hostname – The hostname for the database connection (default is ‘localhost’).
database – The database you wish to connect to, defaulted to None
- Raises:
ConnectionError – If a connection can not be established
- Variables:
conn – The connection attribute of the mysql-connector-python module.
cur – The cursor method for the mysql-connector-python module.
db_engine – A string describing the database engine
database – The name of the database currently being used.
- change_database(database: str) None[source]
Change to the specified database within the server.
- Parameters:
database – The name of the database to change to.
- Raises:
ConnectionError – if query fails.
- close_connection() None[source]
Close the connection to the server.
- Raises:
ConnectionError – If the connection does not exist.
- csv_to_table(csv_file: str, table_name: str, csv_headers: dict[str, type], table_headers: list = None, delimiter: str = ',', skip: int = 0) None[source]
Read data from a CSV or TXT file and insert it into the specified table.
- Parameters:
csv_file – The path to the CSV file or TXT file.
table_name – The name of the table.
csv_headers – The names of the columns in the TXT file and datatypes as a dictionary.
table_headers – The names of the columns in the table (default is None, assumes CSV column names and table column names are the same).
delimiter – The seperating delimeter in the text file. Defaulted to ‘,’ for a CSV file, but can work with other delimeters
skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0
- Raises:
ValueError – If the CSV file or table name is not provided, or if the number of CSV columns and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
Assune we have a csv table with the following Columns,
FirstName,MiddleName,LastName. Within theNamesdatabase we have a table with no entries that has columns forFirstandLast.from cobralib.io import MySQLDB db = MySQLDB('username', 'password', port=3306, hostname='localhost') db.change_db('Names') db.csv_to_table('csv_file.csv', 'FirstLastName', ['FirstName': str, 'LastName': str], ['First', 'Last']) query = "SELDCT * FROM Names;" result = db.query_db(query) print(result) >> index name_id First Last 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
If instead of a csv file, you have a text file that uses spaces as a delimeter, and the first two rows are consumed by file metadata before reaching the header, the following code will work
from cobralib.io import MySQLDB db = MySQLDB('username', 'password', port=3306, hostname='localhost') db.change_db('Names') db.csv_to_table('txt_file.txt', 'FirstLastName', ['FirstName': str, 'LastName': str], ['First', 'Last'], delemeter=r"\s+", skip=2) query = "SELDCT * FROM Names;" result = db.query_db(query) print(result) >> index name_id First Last 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
- excel_to_table(excel_file: str, table_name: str, excel_headers: dict[str, type], table_headers: list = None, sheet_name: str = 'Sheet1', skip: int = 0) None[source]
Read data from an Excel file and insert it into the specified table.
- Parameters:
excel_file – The path to the Excel file.
table_name – The name of the table.
excel_headers – The names of the columns in the Excel file and their data types as a dictionary
table_headers – The names of the columns in the table (default is None, assumes Excel column names and table column names are the same).
sheet_name – The name of the sheet in the Excel file (default is ‘Sheet1’).
skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0
- Raises:
ValueError – If the Excel file, table name, or sheet name is not provided, or if the number of Excel columns and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
Assune we have an excel table with the following Columns,
FirstName,MiddleName,LastName. Within theNamesdatabase we have a table with no entries that has columns forFirstandLast.from cobralib.io import MySQLDB db = MySQLDB('username', 'password', port=3306, hostname='localhost') db.change_db('Names') db.csv_to_table('excel_file.xlsx', 'FirstLastName', {'FirstName': str, 'LastName': str}, ['First', 'Last']) query = "SELDCT * FROM Names;" result = db.query_db(query) print(result) >> index name_id First Last 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
- execute_query(query: str, params: tuple = ()) DataFrame[source]
Execute a query with placeholders and return the result as a Pandas DataFrame. The user of this class should ensure that when applicable they parameteratize the inputs to this method to minimize the potential for an injection attack
- Parameters:
query – The query with placeholders.
params – The values to be substituted into the placeholders (default is an empty tuple).
- Returns:
A Pandas DataFrame with the query result.
- Raises:
ValueError – If the database name is not provided.
ConnectionError – If the query execution fails.
Example usage when parameters are provided:
from cobralib.io import MySQLDB db = MySQLDB('username', 'password', port=3306, hostname='localhost') query = "SELECT * FROM names WHERE name_id = %s" params = (2,) result = db.execute_query(query, params) print(result) >> index name_id FirstName LastName 0 2 Fred Smith
Example usage when no parameters are provided:
from cobralib.io import MySQLDB db = MySQLDB('username', 'password', port=3306, hostname='localhost') query = "SELECT * FROM names" result = db.execute_query(query) print(result) >> index name_id FirstName LastName 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
- get_database_tables(database: str = None) DataFrame[source]
Retrieve the names of all tables within the current database.
- Parameters:
database – Database name, defaulted to currently selected database or None
- Returns:
A pandas dataframe of table names with a header of Tables
- Raises:
ValueError – If no database is currently selected.
ConnectionError – If program is not able to get tables
Assuming the user has a database titled
Inventorywhich had the tablesNames,Product,Sales.from cobralib.io import MySQLDB db = MySQLDB('username', 'password', port=3306, hostname='localhost') dbs = db.get_database_tables("Inventory") db.close_conn() print(dbs) >> index Tables 0 Names 1 Product 2 Sales
- get_databases() DataFrame[source]
Retrieve the names of all databases available to the user.
- Returns:
A pandas dataframe of database names with a header of Databases
- Raises:
ConnectionError – If program fails to retrive database
If you assume the server has three databases available to the username, and those databases were
Inventory,Address,project_data, you could use this class with the following commands.from cobralib.io import MySQLDB db = MySQLDB('username', 'password', port=3306, hostname='localhost') dbs = db.get_databases() db.close_conn() print(dbs) >> index Databases 0 Address 1 Inventory 2 project_data
- get_table_columns(table_name: str, database: str = None) DataFrame[source]
Retrieve the names and data types of the columns within the specified table.
- param table_name:
The name of the table.
- param database:
The database name, defaulted to currently selected database or None
- return:
A pandas dataframe with headers ot Field, Type, Null, Key, Default, and Extra
- raises ValueError:
If the database is not selected at the class level
- raises ConnectionError:
If the columns cannot be retrieved.
This example shows a scenario where the database analyst has navigated into a database
from cobralib.io import MySQLDB db = MySQLDB('username', 'password', port=3306, hostname='localhost') db.change_database('Address') query = '''CREATE TABLE IF NOT EXIST Names ( name_id INTEGER AUTO_INCREMENT, FirstName VARCHAR(20) NOT NULL, MiddleName VARCHAR(20), LastName VARCHAR(20) NOT NULL, PRIMARY KEY (name_id) ); ''' db.execute_query(query) cols = db.get_table_columns('Names') db.close_conn() print(cols) >> index Field Type Null Key Default Extra 0 name_id Integer True Primary False auto_increment 1 FirstName Varchar(20) False NA False None 2 MiddleName Varchar(20) True NA False None 3 LastName Varchar(20) False NA False None
However, this code can also be executed when not in the database
from cobralib.io import MySQLDB db = MySQLDB('username', 'password', port=3306, hostname='localhost') cols = db.get_table_columns('Names', 'Address') db.close_conn() print(cols) >> index Field Type Null Key Default Extra 0 name_id Integer True Primary False auto_increment 1 FirstName Varchar(20) False NA False None 2 MiddleName Varchar(20) True NA False None 3 LastName Varchar(20) False NA False None
- pdf_to_table(pdf_file: str, table_name: str, pdf_headers: dict[str, type], table_columns: list = None, table_idx: int = 0, page_num: int = 0, skip: int = 0) None[source]
Read a table from a PDF file and insert it into the specified MySQL table.
- Parameters:
pdf_file – The path to the PDF file.
table_name – The name of the MySQL table.
pdf_headers – A dictionary of column names in the PDF and their data types.
table_columns – The names of the columns in the MySQL table (default is None, assumes PDF column names and MySQL column names are the same).
table_idx – Index of the table in the PDF (default: 0).
page_num – Page number from which to extract the table (default: 0).
skip – The number of rows to skip in the PDF table.
- Raises:
ValueError – If the PDF file, table name, or sheet name is not provided, or if the number of PDF headers and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
- property conn: Any
Protection for the _conn attribute
- property cur: Any
Protection for the _cur attribute
- property database: Any
Protection for the _database attribute
- property db_engine: str
Protection for the _db_engine attribute
- class cobralib.db.PostGreSQLDB(username: str, password: str, database: str, port: int = 5432, hostname: str = 'localhost')[source]
Initialize the database connection to a PostgreSQL server.
- Parameters:
username – The PostgreSQL username.
password – The PostgreSQL password.
database – The name of the database to connect to.
port – The port number for the PostgreSQL server (default is 5432).
hostname – The server’s hostname (default is ‘localhost’).
- Raises:
ConnectionError – If a connection can not be established.
- Variables:
conn – The connection attribute of the sqlite3 module.
cur – The cursor method for the sqlite3 module.
database – The name of the database currently being used.
db_engine – A string describing the database engine
- change_database(database: str) None[source]
Change to a different PostgreSQL database.
- Parameters:
database – The name of the database to switch to.
- Raises:
ConnectionError – If there’s an issue establishing a connection to the new database.
- close_connection()[source]
Close the connection to the PostgreSQL server and database.
- Raises:
ConnectionError – If there’s an issue closing the connection or cursor.
- csv_to_table(csv_file: str, table_name: str, csv_headers: dict[str, type], table_headers: list = None, delimiter: str = ',', skip: int = 0) None[source]
Read data from a CSV or TXT file and insert it into the specified table.
- Parameters:
csv_file – The path to the CSV file or TXT file.
table_name – The name of the table.
csv_headers – The names of the columns in the TXT file and datatypes as a dictionary.
table_headers – The names of the columns in the table (default is None, assumes CSV column names and table column names are the same).
delimiter – The seperating delimeter in the text file. Defaulted to ‘,’ for a CSV file, but can work with other delimeters
skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0
- Raises:
ValueError – If the CSV file or table name is not provided, or if the number of CSV columns and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
Assune we have a csv table with the following Columns,
FirstName,MiddleName,LastName. Within theNamesdatabase we have a table with no entries that has columns forFirstandLast.from cobralib.io import PostGreSQLDB db = PostGreSQL('username', 'password', 'Names', port=3306, hostname='localhost') db.csv_to_table('csv_file.csv', 'FirstLastName', ['FirstName': str, 'LastName': str], ['First', 'Last']) query = "SELDCT * FROM Names;" result = db.query_db(query) print(result) >> index name_id First Last 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
If instead of a csv file, you have a text file that uses spaces as a delimeter, and the first two rows are consumed by file metadata before reaching the header, the following code will work
from cobralib.io import PostGreSQLDB db = PostGreSQLDB('username', 'password', 'Names', port=3306, hostname='localhost') db.csv_to_table('txt_file.txt', 'FirstLastName', ['FirstName': str, 'LastName': str], ['First', 'Last'], delemeter=r"\s+", skip=2) query = "SELDCT * FROM Names;" result = db.query_db(query) print(result) >> index name_id First Last 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
… [rest of the docstring remains unchanged] …
- excel_to_table(excel_file: str, table_name: str, excel_headers: dict[str, type], table_headers: list = None, sheet_name: str = 'Sheet1', skip: int = 0) None[source]
Read data from an Excel file and insert it into the specified table.
- Parameters:
excel_file – The path to the Excel file.
table_name – The name of the table.
excel_headers – The names of the columns in the Excel file and their data types as a dictionary
table_headers – The names of the columns in the table (default is None, assumes Excel column names and table column names are the same).
sheet_name – The name of the sheet in the Excel file (default is ‘Sheet1’).
skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0
- Raises:
ValueError – If the Excel file, table name, or sheet name is not provided, or if the number of Excel columns and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
Assune we have an excel table with the following Columns,
FirstName,MiddleName,LastName. Within theNamesdatabase we have a table with no entries that has columns forFirstandLast.from cobralib.io import PostGreSQLDB db = PostGreSQL('username', 'password', 'Names', port=3306, hostname='localhost') db.csv_to_table('excel_file.xlsx', 'FirstLastName', {'FirstName': str, 'LastName': str}, ['First', 'Last']) query = "SELDCT * FROM Names;" result = db.query_db(query) print(result) >> index name_id First Last 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
- execute_query(query: str, params: tuple = None) DataFrame[source]
Executes the provided SQL query and returns the results as a pandas DataFrame.
- Parameters:
query – The SQL query to execute.
params – A tuple of parameters to bind to the query.
- Returns:
A pandas DataFrame containing the query results.
- get_database_tables(database: str = None) DataFrame[source]
Fetch a list of tables from the specified or current PostgreSQL database.
- Parameters:
database – The name of the database to fetch tables from. If not provided, uses the current database.
- Returns:
A pandas DataFrame containing the list of tables with the column header “Tables”.
- get_databases() DataFrame[source]
Fetch a list of databases from the PostgreSQL server.
- Returns:
A pandas DataFrame containing the list of databases with the column header “Databases”.
- get_table_columns(table_name: str, database: str = None) DataFrame[source]
Fetch column details for the given table in the specified or current PostgreSQL database.
- Parameters:
table_name – The name of the table to fetch column details for.
database – The name of the database the table resides in. If not provided, uses the current database.
- Returns:
A pandas DataFrame containing column details.
- pdf_to_table(pdf_file: str, table_name: str, pdf_headers: dict[str, type], table_columns: list = None, table_idx: int = 0, page_num: int = 0, skip: int = 0) None[source]
Read a table from a PDF file and insert it into the specified MySQL table.
- Parameters:
pdf_file – The path to the PDF file.
table_name – The name of the MySQL table.
pdf_headers – A dictionary of column names in the PDF and their data types.
table_columns – The names of the columns in the MySQL table (default is None, assumes PDF column names and MySQL column names are the same).
table_idx – Index of the table in the PDF (default: 0).
page_num – Page number from which to extract the table (default: 0).
skip – The number of rows to skip in the PDF table.
- Raises:
ValueError – If the PDF file, table name, or sheet name is not provided, or if the number of PDF headers and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
- property conn: Any
Protection for the _conn attribute
- property cur: Any
Protection for the _cur attribute
- property database: Any
Protection for the _database attribute
- property db_engine: str
Protection for the _db_engine attribute
- class cobralib.db.SQLiteDB(database: str)[source]
A class for connection to a SQLite database file using the sqlite3 python package. The usser can access the conn and cur variables, where conn is the connection variable and cur is the connection.cursor() method to expand the capability of this class beyond its methods. NOTE: If the user passes an incorrect database name to the constructor, the class will assume that the user wants to create a database of that name, and will create a new database file.
- Parameters:
database – The name of the database file to include its path length.
- Raises:
ConnectionError – If a connection can not be established.
- Variables:
conn – The connection attribute of the sqlite3 module.
cur – The cursor method for the sqlite3 module.
database – The name of the database currently being used.
db_engine – A string describing the database engine
- change_database(database: str) None[source]
Method to change the connection from one database file to another
- Paramn database:
The new database file to be used to include the path length
- csv_to_table(csv_file: str, table_name: str, csv_headers: dict[str, type], table_headers: list = None, delimiter: str = ',', skip: int = 0) None[source]
Read data from a CSV or TXT file and insert it into the specified table.
- Parameters:
csv_file – The path to the CSV file or TXT file.
table_name – The name of the table.
csv_headers – The names of the columns in the TXT file and datatypes as a dictionary.
table_headers – The names of the columns in the table (default is None, assumes CSV column names and table column names are the same).
delimiter – The seperating delimeter in the text file. Defaulted to ‘,’ for a CSV file, but can work with other delimeters
skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0
- Raises:
ValueError – If the CSV file or table name is not provided, or if the number of CSV columns and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
- excel_to_table(excel_file: str, table_name: str, excel_headers: dict[str, type], table_headers: list = None, sheet_name: str = 'Sheet1', skip: int = 0) None[source]
Read data from an Excel file and insert it into the specified table.
- Parameters:
excel_file – The path to the Excel file.
table_name – The name of the table.
excel_headers – The names of the columns in the Excel file and their data types as a dictionary.
table_headers – The names of the columns in the table (default is None, assumes Excel column names and table column names are the same).
sheet_name – The name of the sheet in the Excel file (default is ‘Sheet1’).
skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0.
- Raises:
ValueError – If the Excel file, table name, or sheet name is not provided, or if the number of Excel columns and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
- execute_query(query: str, params: tuple = ()) DataFrame[source]
Execute a query with placeholders and return the result as a Pandas DataFrame. The user of this class should ensure that when applicable they parameteratize the inputs to this method to minimize the potential for an injection attack
- Parameters:
query – The query with placeholders.
params – The values to be substituted into the placeholders (default is an empty tuple).
- Returns:
A Pandas DataFrame with the query result.
- Raises:
ValueError – If the database name is not provided.
Error – If the query execution fails.
Example usage when parameters are provided:
from mylib.io import SQLiteDB db = SQLiteDB('example.db') query = "SELECT * FROM names WHERE name_id = ?" params = (2,) result = db.execute_query(query, params) print(result) >> index name_id FirstName LastName 0 2 Fred Smith
Example usage when no parameters are provided:
from mylib.io import SQLiteDB db = SQLiteDB('example.db') query = "SELECT * FROM names" result = db.execute_query(query) print(result) >> index name_id FirstName LastName 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
- get_database_tables(database: str = None) DataFrame[source]
Method the retrieve a dataframe containing a list of all tables within a SQLite database file. If the user does not pass a database name, the method will return the list of tables in the current database. However, the user can also pass this method the name of another database file, and this will return a list of tables in that database file/
- Parameters:
database – The name of the database or database file that the tables will be retrieved from.
- Return df:
A dataframe containing all information relating to the tables within the database or database file.
Assuming the user has a database titled
Inventorywhich had the tablesNames,Product,Sales.from cobralib.io import SQLiteDB db = SQLiteDB('test.db') dbs = db.get_database_tables("Inventory") db.close_connection() print(dbs) >> index Tables 0 Names 1 Product 2 Sales
- get_databases() DataFrame[source]
Method included for compatibility with RelationalDB Protocol class. This method returns an empty dataframe since SQLite does not support true databases.
:return : An empty pandas dataframe
- get_table_columns(table_name: str, database: str = None) DataFrame[source]
Retrieve the names and data types of the columns within the specified table.
- param table_name:
The name of the table.
- param database:
The database name, defaulted to currently selected database or None
- return:
A pandas dataframe with headers ot Field, Type, Null, Key, Default, and Extra
- raises ValueError:
If the database is not selected at the class level
- raises ConnectionError:
If the columns cannot be retrieved.
This example shows a scenario where the database analyst has navigated into a database
from cobralib.io import SQLiteDB db = SQLiteDB('test_db.db') query = '''CREATE TABLE IF NOT EXIST Names ( name_id INTEGER AUTO_INCREMENT, FirstName VARCHAR(20) NOT NULL, MiddleName VARCHAR(20), LastName VARCHAR(20) NOT NULL, PRIMARY KEY (name_id) ); ''' db.execute_query(query) cols = db.get_table_columns('Names') db.close_conn() print(cols) >> index Field Type Null Key Default Extra 0 name_id Integer True Primary False autoincrement 1 FirstName Varchar(20) False NA False None 2 MiddleName Varchar(20) True NA False None 3 LastName Varchar(20) False NA False None
However, this code can also be executed when not in the database
from cobralib.io import MySQLDB db = MySQLDB('username', 'password', port=3306, hostname='localhost') cols = db.get_table_columns('Names', 'Address') db.close_conn() print(cols) >> index Field Type Null Key Default Extra 0 name_id Integer True Primary False autoincrement 1 FirstName Varchar(20) False NA False None 2 MiddleName Varchar(20) True NA False None 3 LastName Varchar(20) False NA False None
- pdf_to_table(pdf_file: str, table_name: str, pdf_headers: dict[str, type], table_columns: list = None, table_idx: int = 0, page_num: int = 0, skip: int = 0) None[source]
Read a table from a PDF file and insert it into the specified SQLite table.
- Parameters:
pdf_file – The path to the PDF file.
table_name – The name of the SQLite table.
pdf_headers – A dictionary of column names in the PDF and their data types.
table_columns – The names of the columns in the SQLite table (default is None, assumes PDF column names and SQLite column names are the same).
table_idx – Index of the table in the PDF (default: 0).
page_num – Page number from which to extract the table (default: 0).
skip – The number of rows to skip in the PDF table.
- Raises:
ValueError – If the PDF file, table name, or sheet name is not provided, or if the number of PDF headers and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
- property conn: Any
Protection for the _conn attribute
- property cur: Any
Protection for the _cur attribute
- property database: Any
Protection for the _database attribute
- property db_engine: str
Protection for the _db_engine attribute
- class cobralib.db.SQLServerDB(username: str, password: str, database: str, port: int = 1433, hostname: str = 'localhost', cert: str = 'yes', driver: str = '{ODBC Driver 18 for SQL Server}')[source]
Initialize the SQLServerDB object with connection parameters.
- Parameters:
username – The username to connect to the database.
password – The password to connect to the database.
port – The port number to use for the connection. Default is 1433.
hostname – The database server name or IP address. Default is ‘localhost’.
database – The initial database to connect to (can be changed later).
cert – yes to trust certificat and no to not trust certificate without authentication. Defaulted to yes
driver – The ODBC driver to use for connection. Defaulted to “{ODBC Driver 18 for SQL Server}”
- Raises:
ConnectionError – If a connection can not be established.
- Variables:
conn – The connection attribute of the sqlite3 module.
cur – The cursor method for the sqlite3 module.
database – The name of the database currently being used.
db_engine – A string describing the database engine
- change_database(database: str)[source]
Change the active database for the current connection.
- Parameters:
database – The name of the database to switch to.
- Raises:
ConnectionError – if query fails.
- csv_to_table(csv_file: str, table_name: str, csv_headers: dict[str, type], table_headers: list = None, delimiter: str = ',', skip: int = 0) None[source]
Read data from a CSV or TXT file and insert it into the specified table.
- Parameters:
csv_file – The path to the CSV file or TXT file.
table_name – The name of the table.
csv_headers – The names of the columns in the TXT file and datatypes as a dictionary.
table_headers – The names of the columns in the table (default is None, assumes CSV column names and table column names are the same).
delimiter – The seperating delimeter in the text file. Defaulted to ‘,’ for a CSV file, but can work with other delimeters
skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0
- Raises:
ValueError – If the CSV file or table name is not provided, or if the number of CSV columns and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
Assune we have a csv table with the following Columns,
FirstName,MiddleName,LastName. Within theNamesdatabase we have a table with no entries that has columns forFirstandLast.from cobralib.io import SQLServerDB db = SQLServerDB('username', 'password', port=3306, hostname='localhost') db.change_db('Names') db.csv_to_table('csv_file.csv', 'FirstLastName', ['FirstName': str, 'LastName': str], ['First', 'Last']) query = "SELDCT * FROM Names;" result = db.query_db(query) print(result) >> index name_id First Last 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
If instead of a csv file, you have a text file that uses spaces as a delimeter, and the first two rows are consumed by file metadata before reaching the header, the following code will work
from cobralib.io import SQLServerDB db = SQLServerDB('username', 'password', port=3306, hostname='localhost') db.change_db('Names') db.csv_to_table('txt_file.txt', 'FirstLastName', ['FirstName': str, 'LastName': str], ['First', 'Last'], delemeter=r"\s+", skip=2) query = "SELDCT * FROM Names;" result = db.query_db(query) print(result) >> index name_id First Last 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
- excel_to_table(excel_file: str, table_name: str, excel_headers: dict[str, type], table_headers: list = None, sheet_name: str = 'Sheet1', skip: int = 0) None[source]
Read data from an Excel file and insert it into the specified table.
- Parameters:
excel_file – The path to the Excel file.
table_name – The name of the table.
excel_headers – The names of the columns in the Excel file and their data types as a dictionary
table_headers – The names of the columns in the table (default is None, assumes Excel column names and table column names are the same).
sheet_name – The name of the sheet in the Excel file (default is ‘Sheet1’).
skip – The number of rows to be skipped if metadata exists before the header definition. Defaulted to 0
- Raises:
ValueError – If the Excel file, table name, or sheet name is not provided, or if the number of Excel columns and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
Assune we have an excel table with the following Columns,
FirstName,MiddleName,LastName. Within theNamesdatabase we have a table with no entries that has columns forFirstandLast.from cobralib.io import SQLServerDB db = SQLServerDB('username', 'password', port=3306, hostname='localhost') db.change_db('Names') db.csv_to_table('excel_file.xlsx', 'FirstLastName', {'FirstName': str, 'LastName': str}, ['First', 'Last']) query = "SELDCT * FROM Names;" result = db.query_db(query) print(result) >> index name_id First Last 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
- execute_query(query: str, params: tuple = ()) DataFrame[source]
Execute a given query on the SQL Server database.
- Parameters:
query – The SQL query string to execute.
params – Optional tuple containing parameters for the query.
- Returns:
DataFrame containing the query results if any, otherwise an empty DataFrame.
- Raises:
ValueError – If the database name is not provided.
ConnectionError – If the query execution fails.
Example usage when parameters are provided:
from cobralib.io import SQLServerDB db = SQLServerDB('username', 'password', port=3306, hostname='localhost') query = "SELECT * FROM names WHERE name_id = %s" params = (2,) result = db.execute_query(query, params) print(result) >> index name_id FirstName LastName 0 2 Fred Smith
Example usage when no parameters are provided:
from cobralib.io import SQLServerDB db = SQLServerDB('username', 'password', port=3306, hostname='localhost') query = "SELECT * FROM names" result = db.execute_query(query) print(result) >> index name_id FirstName LastName 0 1 Jon Webb 1 2 Fred Smith 2 3 Jillian Webb
- get_database_tables(database: str = None) DataFrame[source]
Retrieve a list of tables from the given or current SQL Server database.
- Parameters:
database – Optional name of the database to fetch tables from.
- Returns:
DataFrame containing the table names.
- Raises:
ValueError – If no database is currently selected.
ConnectionError – If program is not able to get tables
Assuming the user has a database titled
Inventorywhich had the tablesNames,Product,Sales.from cobralib.io import SQLServerDB db = SQLServerDB('username', 'password', port=3306, hostname='localhost') dbs = db.get_database_tables("Inventory") db.close_conn() print(dbs) >> index Tables 0 Names 1 Product 2 Sales
- get_databases() DataFrame[source]
Retrieve a list of databases from the SQL Server.
- Returns:
DataFrame containing the database names.
- Raises:
ConnectionError – If program fails to retrive database
If you assume the server has three databases available to the username, and those databases were
Inventory,Address,project_data, you could use this class with the following commands.from cobralib.io import SQLServerDB db = SQServerDB('username', 'password', port=3306, hostname='localhost') dbs = db.get_databases() db.close_conn() print(dbs) >> index Databases 0 Address 1 Inventory 2 project_data
- get_table_columns(table_name: str, database: str = None) DataFrame[source]
Retrieve column details of the specified table from the given or current SQL Server database.
- Parameters:
table_name – Name of the table to fetch column details from.
database – Optional name of the database the table is in.
- Returns:
DataFrame containing the column details.
- Raises:
ValueError –
If the database is not selected at the class level :raises ConnectionError: If the columns cannot be retrieved.
This example shows a scenario where the database analyst has navigated into a database
from cobralib.io import SQLServerDB db = SQLServerDB('username', 'password', port=3306, hostname='localhost') db.change_database('Address') query = '''CREATE TABLE IF NOT EXIST Names ( name_id INTEGER AUTO_INCREMENT, FirstName VARCHAR(20) NOT NULL, MiddleName VARCHAR(20), LastName VARCHAR(20) NOT NULL, PRIMARY KEY (name_id) ); ''' db.execute_query(query) cols = db.get_table_columns('Names') db.close_conn() print(cols) >> index Field Type Null Key Default Extra 0 name_id Integer True Primary False auto_increment 1 FirstName Varchar(20) False NA False None 2 MiddleName Varchar(20) True NA False None 3 LastName Varchar(20) False NA False None
However, this code can also be executed when not in the database
from cobralib.io import SQLServerDB db = SQLServerDB('username', 'password', port=3306, hostname='localhost') cols = db.get_table_columns('Names', 'Address') db.close_conn() print(cols) >> index Field Type Null Key Default Extra 0 name_id Integer True Primary False auto_increment 1 FirstName Varchar(20) False NA False None 2 MiddleName Varchar(20) True NA False None 3 LastName Varchar(20) False NA False None
- pdf_to_table(pdf_file: str, table_name: str, pdf_headers: dict[str, type], table_columns: list = None, table_idx: int = 0, page_num: int = 0, skip: int = 0) None[source]
Read a table from a PDF file and insert it into the specified MySQL table.
- Parameters:
pdf_file – The path to the PDF file.
table_name – The name of the MySQL table.
pdf_headers – A dictionary of column names in the PDF and their data types.
table_columns – The names of the columns in the MySQL table (default is None, assumes PDF column names and MySQL column names are the same).
table_idx – Index of the table in the PDF (default: 0).
page_num – Page number from which to extract the table (default: 0).
skip – The number of rows to skip in the PDF table.
- Raises:
ValueError – If the PDF file, table name, or sheet name is not provided, or if the number of PDF headers and table columns mismatch.
Error – If the data insertion fails or the data types are incompatible.
- property conn: Any
Protection for the _conn attribute
- property cur: Any
Protection for the _cur attribute
- property database: Any
Protection for the _database attribute
- property db_engine: str
Protection for the _db_engine attribute