Input / Output (io)
The io module contains several functions and classes that can be used to read various ASCII-based files, as well as .xls spreadsheets and tables stored in PDF documents.
ReadKeyWords
The ReadKeyWords class is a multi-purpose class that enables a user to read
several different configuration file formats. It also allows a user to create
a configuration file that mixes YAML, JSON, and XML data. While this class can
read data from any ASCII-based text format, the .jwc extension is recommended
for files that mix configuration formats. This class inherits from the ReadYAML,
ReadJSON, and ReadXML classes, so further information on the ReadKeyWords
class can be found in the documentation for those parent classes.
- class cobralib.io.ReadKeyWords(file_name: str, print_lines: int = 50)[source]
This class is a container for the ReadYAML, ReadJSON, and ReadXML classes. It was developed specifically to read .jwc files, which can mix JSON, XML, and YAML formats. It can also be used to read a pure XML, JSON, or YAML file.
- Parameters:
file_name – The name of the file to be read, including its path
print_lines – The number of lines printed to the screen when the user prints an instance of the class. Defaults to 50
- Raises:
FileNotFoundError – If the file does not exist
Example File
    --- # First document in file
    Float Value: 4.387
    Double Value: 1.11111187
    integer: 6
    String: Hello
    Float List: [1.1 2.2 3.3 4.4]
    Yaml Block List:
      - 1
      - 2
      - 3
      - 4
    Yaml Dict:
      First Key: 3.3
      Second Key: 4.4
      Third Key: 5.5
      Fourth Key: 6.6
    String List Hello World How are you
    JSON Data: {"book": "History of the World", "Year": 1976}
    XML Data: <root> <book>"History of the World"</book> <Year>1976</Year> </root>
    --- # Second document in file
    # Notice that a : character is not required
    Another Int 3
Instantiation Example
    # Instantiate the class
    from cobralib.io import ReadKeyWords

    reader = ReadKeyWords("test_key_words.jwc", print_lines=2)
    # Print the instance, displaying 2 lines
    print(reader)

    >> Float Value: 4.387 # Comment line not to be read
    >> Double Value: 1.11111187 # Comment line not to be read
The print_lines attribute can also be adjusted after instantiation to change the number of printed lines.
Read Scalar Values
This class can be used to read scalar values stored as key-value pairs.
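The following is a minimal sketch of how a keyword-value reader could work on YAML-like text. It assumes documents are separated by lines that begin with `---`. It takes the first line in the requested document that starts with the keyword, drops any trailing `#` comment, and casts the remainder with the requested type. The names `split_documents` and `find_key_value` are hypothetical. This is not cobralib's implementation, and it ignores multi-line strings and values that themselves contain `#`.

```python
def split_documents(text: str) -> list[list[str]]:
    """Split YAML-like text into documents at lines that start with '---'."""
    docs: list[list[str]] = [[]]
    for line in text.splitlines():
        if line.strip().startswith("---"):
            # Open a new document unless the current one has no content yet
            if any(existing.strip() for existing in docs[-1]):
                docs.append([])
            continue
        docs[-1].append(line)
    return docs


def find_key_value(text: str, keyword: str, data_type: type, document_index: int = 0):
    """Hypothetical sketch: return the value after `keyword`, cast to `data_type`."""
    for line in split_documents(text)[document_index]:
        stripped = line.strip()
        if stripped.startswith(keyword):
            # Drop a trailing '# comment' and surrounding whitespace before casting
            raw = stripped[len(keyword):].split("#", 1)[0].strip()
            return data_type(raw)
    raise ValueError(f"{keyword!r} not found in document {document_index}")


sample = """--- # First document in file
integer: 6
Double Value: 1.11111187
--- # Second document in file
Another Int 3
"""
print(find_key_value(sample, "integer:", int))        # 6
print(find_key_value(sample, "Another Int", int, 1))  # 3
```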
    # Instantiate the class
    import numpy as np
    from cobralib.io import ReadKeyWords

    reader = ReadKeyWords("test_key_words.jwc")
    int_value = reader.read_key_value("integer:", int)
    double_value = reader.read_key_value("Double Value:", np.float64)
    # Read from the second document in the file
    second_doc = reader.read_key_value("Another Int", int, 1)
    print("Integer Value: ", int_value)
    print(type(int_value))
    print("Double Value: ", double_value)
    print(type(double_value))
    print(second_doc)

    >> Integer Value: 6
    >> <class 'int'>
    >> Double Value: 1.11111187
    >> <class 'numpy.float64'>
    >> 3
Read List Values
This class can be used to read lists stored in either inline or block format.
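The two list layouts can be told apart by whether the text after the keyword opens with `[`. The sketch below relies on that assumption. Inline lists are split on commas, so the sketch expects comma-separated items. Block lists are collected from the following lines that begin with `-`. `read_list` is a hypothetical name and is not part of cobralib.

```python
def read_list(lines: list[str], keyword: str, data_type: type) -> list:
    """Hypothetical sketch: read an inline ([a, b]) or block (- a) list after `keyword`."""
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith(keyword):
            continue
        rest = stripped[len(keyword):].strip()
        if rest.startswith("["):
            # Inline list: remove the brackets, then split on commas
            items = rest.strip("[]").split(",")
            return [data_type(item.strip()) for item in items if item.strip()]
        # Block list: take consecutive following lines that start with '-'
        values = []
        for follow in lines[i + 1:]:
            item = follow.strip()
            if not item.startswith("-"):
                break
            values.append(data_type(item[1:].strip()))
        return values
    raise ValueError(f"{keyword!r} not found")


lines = [
    "Float List: [1.1, 2.2, 3.3, 4.4]",
    "Yaml Block List:",
    "  - 1",
    "  - 2",
    "  - 3",
    "  - 4",
]
print(read_list(lines, "Float List:", float))     # [1.1, 2.2, 3.3, 4.4]
print(read_list(lines, "Yaml Block List:", int))  # [1, 2, 3, 4]
```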
    # Instantiate the class
    from cobralib.io import ReadKeyWords

    reader = ReadKeyWords("test_key_words.jwc")
    inline_list = reader.read_yaml_list("Float List:", float)
    block_list = reader.read_yaml_list("Yaml Block List:", int)
    print("Inline List: ", inline_list)
    print("Block List: ", block_list)

    >> Inline List: [1.1, 2.2, 3.3, 4.4]
    >> Block List: [1, 2, 3, 4]
Read JSON and XML
This class can be used to read JSON and XML data associated with keywords.
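XML data can span several lines after its keyword, so a reader has to decide where the fragment ends. The sketch below shows one simple approach. It assumes the fragment has a single root element with flat children. It reads the root tag name, keeps appending lines until the matching closing tag appears, and passes the result to Python's `xml.etree.ElementTree`. The function name is hypothetical, and values are returned as strings.

```python
import re
import xml.etree.ElementTree as ET


def read_xml_after_keyword(lines: list[str], keyword: str) -> dict[str, str]:
    """Hypothetical sketch: parse the XML element after `keyword` into a flat dict."""
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith(keyword):
            continue
        fragment = stripped[len(keyword):].strip()
        match = re.match(r"<([A-Za-z_][\w.-]*)", fragment)
        if match is None:
            raise ValueError(f"No XML element follows {keyword!r}")
        closing = f"</{match.group(1)}>"
        # Append following lines until the root element's closing tag has been seen
        for follow in lines[i + 1:]:
            if closing in fragment:
                break
            fragment += " " + follow.strip()
        root = ET.fromstring(fragment)
        # Flat structure assumed: each child tag maps to its text
        return {child.tag: (child.text or "").strip() for child in root}
    raise ValueError(f"{keyword!r} not found")


lines = [
    "XML Data: <root>",
    "  <book>History of the World</book>",
    "  <Year>1976</Year>",
    "</root>",
    "Another Int 3",
]
print(read_xml_after_keyword(lines, "XML Data:"))
# {'book': 'History of the World', 'Year': '1976'}
```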
    # Instantiate the class
    from cobralib.io import ReadKeyWords

    reader = ReadKeyWords("test_key_words.jwc")
    json_data = reader.read_json("JSON Data:")
    xml_data = reader.read_xml("XML Data:")
    print("JSON Data: ", json_data)
    print("XML Data: ", xml_data)

    >> JSON Data: {"book": "History of the World", "Year": 1976}
    >> XML Data: {"book": "History of the World", "Year": 1976}
Read YAML Dictionaries
This class can be used to read dictionaries encoded in YAML formats. Unlike JSON and XML, dictionaries read in from a YAML format must be flat (i.e. no nested dictionaries) and of a uniform data type.
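A flat, uniformly typed dictionary can be read line by line. Each indented `key: value` line under the keyword is split at its first colon, and both halves are cast with the caller's types. The sketch below illustrates that idea under those assumptions. `read_flat_dict` is a hypothetical name, and this is not the library's implementation.

```python
def read_flat_dict(lines: list[str], keyword: str,
                   key_type: type, value_type: type) -> dict:
    """Hypothetical sketch: read indented 'key: value' lines under `keyword`."""
    start = next((i for i, line in enumerate(lines)
                  if line.strip().startswith(keyword)), None)
    if start is None:
        raise ValueError(f"{keyword!r} not found")
    result = {}
    for line in lines[start + 1:]:
        # The dictionary ends at the first blank or unindented line
        if not line.strip() or not line[0].isspace():
            break
        key, sep, value = line.strip().partition(":")
        if not sep:
            raise ValueError(f"Expected 'key: value', got {line.strip()!r}")
        # Every pair uses the same two types, so the dictionary is uniform
        result[key_type(key.strip())] = value_type(value.strip())
    return result


lines = [
    "Yaml Dict:",
    "  First Key: 3.3",
    "  Second Key: 4.4",
    "  Third Key: 5.5",
    "  Fourth Key: 6.6",
    'JSON Data: {"book": "History of the World", "Year": 1976}',
]
print(read_flat_dict(lines, "Yaml Dict:", str, float))
# {'First Key': 3.3, 'Second Key': 4.4, 'Third Key': 5.5, 'Fourth Key': 6.6}
```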
    # Instantiate the class
    from cobralib.io import ReadKeyWords

    reader = ReadKeyWords("test_key_words.jwc")
    yaml_dict = reader.read_yaml_dict("Yaml Dict:", str, float)
    print("YAML Dictionary: ", yaml_dict)

    >> YAML Dictionary: {"First Key": 3.3, "Second Key": 4.4, "Third Key": 5.5, "Fourth Key": 6.6}
Note: To read a dictionary of lists, use the read_yaml_dict_of_list method.
YAML, JSON, and XML Files
If you wish to read a .yaml, .json, or .xml file that does not contain mixed data, you can use one of the following three methods.
    # Instantiate the class
    from cobralib.io import ReadKeyWords

    yaml_reader = ReadKeyWords("test_key_words.yaml")
    yaml_data = yaml_reader.read_full_yaml()

    json_reader = ReadKeyWords("test_key_words.json")
    json_data = json_reader.read_full_json()

    xml_reader = ReadKeyWords("test_key_words.xml")
    xml_data = xml_reader.read_full_xml()
Read Columnar Data
The following functions can be used to read columnar data from .txt, .csv,
.xls, and .pdf files.
- cobralib.io.read_csv_columns_by_headers(file_name: str, headers: dict[str, type], skip: int = 0) DataFrame[source]
- Parameters:
file_name – The file name, including the path
headers – A dictionary of column names and their data types. Types are limited to numpy.int64, numpy.float64, and str
skip – The number of lines to be skipped before reading data
- Return df:
A pandas dataframe containing all relevant information
- Raises:
FileNotFoundError – If the file does not exist
This function assumes the file is comma (,) delimited. If it is not, then it is not a true .csv file and should instead be read as a text file with the read_text_columns_by_headers function. Assume we have a .csv file titled test.csv with the following format.
test.csv
    ID,Inventory,Weight_per,Number
    1,Shoes,1.5,5
    2,t-shirt,1.8,3
    3,coffee,2.1,15
    4,books,3.2,48
This file can be read via the following command
    from cobralib.io import read_csv_columns_by_headers

    file_name = 'test.csv'
    headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
    df = read_csv_columns_by_headers(file_name, headers)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
The skip argument can also be used to read data when the headers are not on the first line. For instance, assume the following .csv file:
test1.csv
    This line is used to provide metadata for the csv file
    This line is as well
    ID,Inventory,Weight_per,Number
    1,Shoes,1.5,5
    2,t-shirt,1.8,3
    3,coffee,2.1,15
    4,books,3.2,48
This file can be read via the following command
    from cobralib.io import read_csv_columns_by_headers

    file_name = 'test1.csv'
    headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
    df = read_csv_columns_by_headers(file_name, headers, skip=2)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
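The column selection, type casting, and `skip` behaviour described above can be approximated with the standard `csv` module, which is useful for readers who want to see the mechanics without pandas. This is only a sketch under stated assumptions. The function name `select_csv_columns` is hypothetical, and it returns a dictionary of lists rather than a pandas DataFrame. It is not how cobralib implements `read_csv_columns_by_headers`.

```python
import csv


def select_csv_columns(lines: list[str], headers: dict[str, type],
                       skip: int = 0) -> dict[str, list]:
    """Hypothetical sketch: keep the named columns and cast each with its type."""
    # Lines above the header row (metadata) are dropped, like the skip argument
    reader = csv.DictReader(lines[skip:], skipinitialspace=True)
    missing = set(headers) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Columns not found: {sorted(missing)}")
    columns: dict[str, list] = {name: [] for name in headers}
    for row in reader:
        for name, cast in headers.items():
            columns[name].append(cast(row[name]))
    return columns


text = """This line is used to provide metadata for the csv file
This line is as well
ID,Inventory,Weight_per,Number
1,Shoes,1.5,5
2,t-shirt,1.8,3
"""
print(select_csv_columns(text.splitlines(), {"ID": int, "Number": int}, skip=2))
# {'ID': [1, 2], 'Number': [5, 3]}
```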
- cobralib.io.read_csv_columns_by_index(file_name: str, headers: dict[int, type], col_names: list[str], skip: int = 0) DataFrame[source]
- Parameters:
file_name – The file name, including the path
headers – A dictionary of column indices and their data types. Types are limited to numpy.int64, numpy.float64, and str
col_names – A list containing the names to be given to each column
skip – The number of lines to be skipped before reading data
- Return df:
A pandas dataframe containing all relevant information
- Raises:
FileNotFoundError – If the file does not exist
This function assumes the file is comma (,) delimited. If it is not, then it is not a true .csv file and should instead be read as a text file with the xx function. Assume we have a .csv file titled test.csv with the following format, which has no header row.
test.csv
    1,Shoes,1.5,5
    2,t-shirt,1.8,3
    3,coffee,2.1,15
    4,books,3.2,48
This file can be read via the following command
    from cobralib.io import read_csv_columns_by_index

    file_name = 'test.csv'
    headers = {0: int, 1: str, 2: float, 3: int}
    names = ['ID', 'Inventory', 'Weight_per', 'Number']
    df = read_csv_columns_by_index(file_name, headers, names)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
The skip argument can also be used to read data when the data does not start on the first line. For instance, assume the following .csv file:
test1.csv
    This line is used to provide metadata for the csv file
    This line is as well
    1,Shoes,1.5,5
    2,t-shirt,1.8,3
    3,coffee,2.1,15
    4,books,3.2,48
This file can be read via the following command
    from cobralib.io import read_csv_columns_by_index

    file_name = 'test1.csv'
    headers = {0: int, 1: str, 2: float, 3: int}
    names = ['ID', 'Inventory', 'Weight_per', 'Number']
    df = read_csv_columns_by_index(file_name, headers, names, skip=2)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
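Selecting columns by position instead of by name changes only how each field is located. The sketch below pairs the n-th entry of `headers` with the n-th entry of `col_names`, which relies on Python dictionaries preserving insertion order. The function name is hypothetical, and a dictionary of lists stands in for the DataFrame.

```python
import csv


def select_csv_columns_by_index(lines: list[str], headers: dict[int, type],
                                col_names: list[str], skip: int = 0) -> dict[str, list]:
    """Hypothetical sketch: pick columns by position and give them new names."""
    if len(headers) != len(col_names):
        raise ValueError("headers and col_names must have the same length")
    columns: dict[str, list] = {name: [] for name in col_names}
    for row in csv.reader(lines[skip:], skipinitialspace=True):
        if not row:
            continue  # csv.reader yields [] for blank lines
        # The n-th (index, type) pair in headers is stored under the n-th name
        for (index, cast), name in zip(headers.items(), col_names):
            columns[name].append(cast(row[index]))
    return columns


lines = [
    "This line is used to provide metadata for the csv file",
    "This line is as well",
    "1,Shoes,1.5,5",
    "2,t-shirt,1.8,3",
]
print(select_csv_columns_by_index(lines, {0: int, 2: float}, ["ID", "Weight_per"], skip=2))
# {'ID': [1, 2], 'Weight_per': [1.5, 1.8]}
```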
- cobralib.io.read_text_columns_by_headers(file_name: str, headers: dict[str, type], skip: int = 0, delimiter='\\s+') DataFrame[source]
- Parameters:
file_name – The file name, including the path
headers – A dictionary of column names and their data types. Types are limited to numpy.int64, numpy.float64, and str
skip – The number of lines to be skipped before reading data
delimiter – The delimiter separating data in the text file. Defaults to whitespace, where one or more consecutive whitespace characters count as a single delimiter. Any delimiter can be used, including a comma; however, a comma-delimited file should normally have a .csv extension.
- Return df:
A pandas dataframe containing all relevant information
- Raises:
FileNotFoundError – If the file does not exist
By default, this function assumes the file is whitespace delimited. Assume we have a .txt file titled test.txt with the following format.
test.txt
    ID  Inventory  Weight_per  Number
    1   Shoes      1.5         5
    2   t-shirt    1.8         3
    3   coffee     2.1         15
    4   books      3.2         48
This file can be read via the following command
    from cobralib.io import read_text_columns_by_headers

    file_name = 'test.txt'
    headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
    df = read_text_columns_by_headers(file_name, headers)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
The skip argument can also be used to read data when the headers are not on the first line. For instance, assume the following text file:
test.txt
    This line is used to provide metadata for the csv file
    This line is as well
    ID  Inventory  Weight_per  Number
    1   Shoes      1.5         5
    2   t-shirt    1.8         3
    3   coffee     2.1         15
    4   books      3.2         48
This file can be read via the following command
    from cobralib.io import read_text_columns_by_headers

    file_name = 'test.txt'
    headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
    df = read_text_columns_by_headers(file_name, headers, skip=2)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
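The default delimiter in the signature appears to be the regular expression `\s+`, which matches one or more whitespace characters. Under that pattern, runs of spaces or tabs count as a single separator. The sketch below imitates that behaviour with `re.split`. The function name is hypothetical, and a dictionary of lists stands in for the DataFrame.

```python
import re


def select_text_columns(lines: list[str], headers: dict[str, type],
                        skip: int = 0, delimiter: str = r"\s+") -> dict[str, list]:
    """Hypothetical sketch: split each line on a regex and pick columns by header."""
    body = [line.strip() for line in lines[skip:] if line.strip()]
    names = re.split(delimiter, body[0])
    # list.index raises ValueError if a requested header is absent
    positions = {name: names.index(name) for name in headers}
    columns: dict[str, list] = {name: [] for name in headers}
    for line in body[1:]:
        fields = re.split(delimiter, line)
        for name, cast in headers.items():
            columns[name].append(cast(fields[positions[name]]))
    return columns


text = "meta line\nmeta line\nID Inventory  Weight_per\tNumber\n1 Shoes 1.5 5\n2   t-shirt 1.8\t3\n"
print(select_text_columns(text.splitlines(), {"Inventory": str, "Number": int}, skip=2))
# {'Inventory': ['Shoes', 't-shirt'], 'Number': [5, 3]}
```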
- cobralib.io.read_text_columns_by_index(file_name: str, headers: dict[int, type], col_names: list[str], skip: int = 0, delimiter='\\s+') DataFrame[source]
- Parameters:
file_name – The file name, including the path
headers – A dictionary of column indices and their data types. Types are limited to numpy.int64, numpy.float64, and str
col_names – A list containing the names to be given to each column
skip – The number of lines to be skipped before reading data
delimiter – The delimiter separating data in the text file. Defaults to whitespace, where one or more consecutive whitespace characters count as a single delimiter. Any delimiter can be used, including a comma; however, a comma-delimited file should normally have a .csv extension.
- Return df:
A pandas dataframe containing all relevant information
- Raises:
FileNotFoundError – If the file does not exist
Assume we have a .txt file titled test.txt with the following format.
test.txt
    1  Shoes    1.5  5
    2  t-shirt  1.8  3
    3  coffee   2.1  15
    4  books    3.2  48
This file can be read via the following command
    from cobralib.io import read_text_columns_by_index

    file_name = 'test.txt'
    headers = {0: int, 1: str, 2: float, 3: int}
    names = ['ID', 'Inventory', 'Weight_per', 'Number']
    df = read_text_columns_by_index(file_name, headers, names)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
The skip argument can also be used to read data when the headers are not on the first line. For instance, assume the following text file:
test.txt
    This line is used to provide metadata for the csv file
    This line is as well
    ID  Inventory  Weight_per  Number
    1   Shoes      1.5         5
    2   t-shirt    1.8         3
    3   coffee     2.1         15
    4   books      3.2         48
This file can be read via the following command
    from cobralib.io import read_text_columns_by_index

    file_name = 'test.txt'
    headers = {0: int, 1: str, 2: float, 3: int}
    names = ['ID', 'Inventory', 'Weight_per', 'Number']
    df = read_text_columns_by_index(file_name, headers, names, skip=2)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
- cobralib.io.read_pdf_columns_by_headers(file_name: str, headers: dict[str, type], table_idx: int = 0, page_num: int = 0, skip: int = 0) DataFrame[source]
Read a table from a PDF document and save user-specified columns into a pandas DataFrame. This function can read a table that spans multiple pages. NOTE: For this function to work, the PDF must be a vector (text-based) document, not a scanned image of another document.
- Parameters:
file_name – The name of the PDF file, including its path.
headers – A dictionary of column names and their data types. Data types are limited to int, float, and str.
table_idx – Index of the table to extract from the page (default: 0).
page_num – Page number from which to extract the table (default: 0).
skip – The number of lines to be skipped before reading data
- Return df:
A pandas DataFrame containing the specified columns from the table.
- Raises:
FileNotFoundError – If the PDF file does not exist.
Example usage:
    from cobralib.io import read_pdf_columns_by_headers

    file_name = 'test.pdf'
    headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
    df = read_pdf_columns_by_headers(file_name, headers, table_idx=0, page_num=1)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      40
- cobralib.io.read_pdf_columns_by_index(file_name: str, headers: dict[int, type], col_names: list[str], table_idx: int = 0, skip_rows: int = 0, page_num: int = 0) DataFrame[source]
Read a table from a PDF document and save user-specified columns into a pandas DataFrame based on their column index. This function can read a table that spans multiple pages. NOTE: For this function to work, the PDF must be a vector (text-based) document, not a scanned image of another document.
- Parameters:
file_name – The name of the PDF file, including its path.
headers – A dictionary of column indices and their data types. Data types are limited to int, float, and str.
col_names – A list containing the names to be given to each column.
table_idx – Index of the table to extract from the page (default: 0).
skip_rows – Number of rows to skip before reading the header row (default: 0).
page_num – Page number from which to extract the table (default: 0).
- Return df:
A pandas DataFrame containing the specified columns from the table.
- Raises:
FileNotFoundError – If the PDF file does not exist.
Example usage:
    from cobralib.io import read_pdf_columns_by_index

    file_name = 'test.pdf'
    headers = {0: int, 1: str, 2: float, 3: int}
    col_names = ['ID', 'Inventory', 'Weight_per', 'Number']  # Column names
    df = read_pdf_columns_by_index(file_name, headers, col_names, table_idx=0, skip_rows=2, page_num=1)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      40
- cobralib.io.read_excel_columns_by_headers(file_name: str, tab: str, headers: dict[str, type], skip: int = 0) DataFrame[source]
- Parameters:
file_name – The file name, including the path. Must be in .xls format; this function will not read .xlsx files
tab – The tab or sheet name that data will be read from
headers – A dictionary of column names and their data types. Types are limited to numpy.int64, numpy.float64, and str
skip – The number of lines to be skipped before reading data
- Return df:
A pandas dataframe containing all relevant information
- Raises:
FileNotFoundError – If the file does not exist
Assume we have a .xls file titled test.xls with the following format in a tab titled primary.
test.xls
    ID  Inventory  Weight_per  Number
    1   Shoes      1.5         5
    2   t-shirt    1.8         3
    3   coffee     2.1         15
    4   books      3.2         48
This file can be read via the following command
    from cobralib.io import read_excel_columns_by_headers

    file_name = 'test.xls'
    tab = "primary"
    headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
    df = read_excel_columns_by_headers(file_name, tab, headers)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
The skip argument can also be used to read data when the headers are not on the first line. For instance, assume the following .xls file:
test.xls
    This line is used to provide metadata for the csv file
    This line is as well
    ID  Inventory  Weight_per  Number
    1   Shoes      1.5         5
    2   t-shirt    1.8         3
    3   coffee     2.1         15
    4   books      3.2         48
This file can be read via the following command
    from cobralib.io import read_excel_columns_by_headers

    file_name = 'test.xls'
    tab = "primary"
    headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
    df = read_excel_columns_by_headers(file_name, tab, headers, skip=2)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
- cobralib.io.read_excel_columns_by_index(file_name: str, tab: str, col_index: dict[int, str], col_names: list[str], skip: int = 0) DataFrame[source]
- Parameters:
file_name – The file name, including the path. Must be in .xls format; this function will not read .xlsx files
tab – The tab or sheet name that data will be read from
col_index – A dictionary of column indices and their data types. Types are limited to numpy.int64, numpy.float64, and str
col_names – A list containing the names to be given to each column
skip – The number of lines to be skipped before reading data
- Return df:
A pandas dataframe containing all relevant information
- Raises:
FileNotFoundError – If the file does not exist
Assume we have a .xls file titled test.xls with the following format in a tab titled primary.
test.xls
    1  Shoes    1.5  5
    2  t-shirt  1.8  3
    3  coffee   2.1  15
    4  books    3.2  48
This file can be read via the following command
    from cobralib.io import read_excel_columns_by_index

    file_name = 'test.xls'
    tab = 'primary'
    headers = {0: int, 1: str, 2: float, 3: int}
    names = ['ID', 'Inventory', 'Weight_per', 'Number']
    df = read_excel_columns_by_index(file_name, tab, headers, names)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
The skip argument can also be used to read data when the headers are not on the first line. For instance, assume the following .xls file:
test.xls
    This line is used to provide metadata for the csv file
    This line is as well
    ID  Inventory  Weight_per  Number
    1   Shoes      1.5         5
    2   t-shirt    1.8         3
    3   coffee     2.1         15
    4   books      3.2         48
This file can be read via the following command
    from cobralib.io import read_excel_columns_by_index

    file_name = 'test.xls'
    tab = "primary"
    headers = {0: int, 1: str, 2: float, 3: int}
    names = ['ID', 'Inventory', 'Weight_per', 'Number']
    df = read_excel_columns_by_index(file_name, tab, headers, names, skip=2)
    print(df)

       ID  Inventory  Weight_per  Number
    0   1      Shoes         1.5       5
    1   2    t-shirt         1.8       3
    2   3     coffee         2.1      15
    3   4      books         3.2      48
Read YAML
This class is inherited by the ReadKeyWords class; however, it can also be used independently.
- class cobralib.io.ReadYAML(file_name: str)[source]
- Parameters:
file_name – The name and path of the file with the YAML-like format
- Raises:
FileNotFoundError – If the file does not exist.
This class can be used to read a file with a YAML-like format. It is tailored to read basic YAML files, but with looser requirements on how keywords are formatted and stricter requirements on data typing. The methods within this class can be used to read scalar variables from key-value pairs, lists, and flat dictionaries, and every value read into memory is cast to a user-specified type. This class is more memory efficient than PyYAML, since it only reads the requested lines into memory.
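The memory advantage comes from scanning the file one line at a time and stopping as soon as the requested keyword is found, instead of building a fully parsed structure first. The sketch below illustrates that pattern. `first_value` is a hypothetical name that returns the raw text after the keyword, and it is not the class's actual code. An open file object can be passed in directly, because iterating a file yields its lines lazily.

```python
from collections.abc import Iterable
from typing import Optional


def first_value(lines: Iterable[str], keyword: str, document_index: int = 0) -> Optional[str]:
    """Hypothetical sketch: lazily scan lines and stop at the first match."""
    document = 0
    started = False  # becomes True once content or a '---' separator is seen
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("---"):
            if started:
                document += 1
                if document > document_index:
                    return None  # past the requested document: stop reading
            started = True
            continue
        if stripped:
            started = True
        if document == document_index and stripped.startswith(keyword):
            return stripped[len(keyword):].strip()
    return None


# With a real file: with open("read_yaml.yaml") as fh: first_value(fh, "key:")
remaining = iter(["---", "key: 4.387", "---", "name: John Doe"])
print(first_value(remaining, "key:"))  # 4.387
print(next(remaining))                 # --- (lines after the match were never read)
```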
All code examples described in the documentation for this class reference the read_yaml.yaml file shown below.
    ---
    key: 4.387
    list values:
      - 1
      - 2
      - 3
      - 4
    Inline List: [ 1.1, 2.2, 3.3, 4.4 ]
    Dict List:
      One:
        - 1
        - 2
        - 3
      Two: [3, 4, 5]
      Three: [6, 7, 8]
    Str Dict List:
      One: [One, Two, Three]
      Two:
        - |
          Multi Line list
        - Two
        - ^
          Hello
    ---
    name: John Doe
    age: 30
    First List:
      - 1.1
      - 2.2
      - 3.3
      - 4.4
    Numbers:
      - |
        Hello World This is Jon
      - ^
        This
      - ^
        Is
      - Correct
    Sentence: ^
      Hello world
    Multi Sentence: |
      This is a multiline sentence, there is no reason to worry!
    Second Mult Sentence: >
      This is a multiline sentence, there is no reason to worry!
    String Value: Hello Again World!
    bool test1: TRUE
    bool test2: False
    bool test3: no
    bool test4: yes
    bool test5: on
    bool test6: Off
    Ages:
      Jon: 44
      Jill: 32
      Bob: 12
    String Test:
      0: String One
      1: ^
        Another String
      2: >
        This is multiline one
      3: |
        This is multiline two
    Dict to List:
      List One:
        - 1
        - 2
        - 3
      List Two:
        - 4
        - 5
        - 6
- read_full_yaml(safe_read: bool = True) Any[source]
Reads the full YAML file and returns it as a PyYAML object.
- Params safe_read:
Whether to read the file in safe mode. Defaults to True
- Return Any:
The full content of the YAML file as a PyYAML object. Because a file may contain multiple documents, the result is returned as a list.
Unlike other methods in this class, this method reads an entire YAML file into memory and returns a PyYAML object. This is less memory efficient than the other methods, but it makes accessing data faster for larger files. In addition, the user must adhere to the strict rules of YAML when using this method. The rules enforced by PyYAML can be found at PyYAML.
Example 1
An example of Python code that reads the Ages dictionary from the second YAML document.
    from cobralib.io import ReadYAML

    reader = ReadYAML('read_yaml.yaml')
    data = reader.read_full_yaml()  # Read in safe mode
    print(data[1]['Ages'])

    >> {'Jon': 44, 'Jill': 32, 'Bob': 12}
- read_key_value(keyword: str, data_type: type, document_index: int = 0) Any[source]
- Parameters:
keyword – The keyword associated with the value to be read in. Unlike a pure YAML file this value does not have to end with a : symbol
data_type – The data type of the value to be read in
document_index – The zero-based index of the YAML document within the file.
- Return value:
The value associated with a keyword
- Raises:
ValueError – If the value can not be cast to the user defined type
This method can be used to read a key-value pair from a YAML or YAML-like file. It recognizes the >, ^, and | symbols, which mark strings that either start on the next line or span multiple lines.
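The three indicators can be handled once the indented lines that follow the keyword have been gathered. The sketch below assumes the meanings given in the read_yaml_list description. `^` means the value starts on the next line, `>` folds the lines into one line, and `|` keeps the line breaks. `read_block_string` is a hypothetical name. Blank lines simply end the block, which is a simplification compared to real YAML.

```python
def read_block_string(lines: list[str], keyword: str) -> str:
    """Hypothetical sketch of the ^, > and | string indicators."""
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith(keyword):
            continue
        value = stripped[len(keyword):].strip()
        if value not in ("^", ">", "|"):
            return value  # ordinary inline string
        # Gather the indented lines that follow; a blank or unindented line ends the block
        block = []
        for follow in lines[i + 1:]:
            if not follow.strip() or not follow[0].isspace():
                break
            block.append(follow.strip())
        if value == "^":
            # Assumption: a '^' value occupies the single next line
            return block[0] if block else ""
        if value == ">":
            return " ".join(block)  # fold the lines into one line
        return "\n".join(block)  # '|' keeps the line breaks
    raise ValueError(f"{keyword!r} not found")


lines = [
    "Sentence: ^",
    "  Hello world",
    "Folded: >",
    "  This is a multiline sentence,",
    "  there is no reason to worry!",
    "Literal: |",
    "  line one",
    "  line two",
]
print(read_block_string(lines, "Sentence:"))  # Hello world
print(read_block_string(lines, "Folded:"))    # This is a multiline sentence, there is no reason to worry!
print(read_block_string(lines, "Literal:").splitlines())  # ['line one', 'line two']
```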
Example 1
An example of Python code that reads a float value from the first YAML document.
    from cobralib.io import ReadYAML

    reader = ReadYAML('read_yaml.yaml')
    value = reader.read_key_value('key:', float, 0)
    print(value)

    >> 4.387
Example 2
An example to read a multiline string value from the second yaml document in the file
    from cobralib.io import ReadYAML

    reader = ReadYAML('read_yaml.yaml')
    value = reader.read_key_value('Multi Sentence:', str, 1)
    new_value = reader.read_key_value('Second Mult Sentence:', str, 1)
    print(value)
    print(new_value)

    >> This is a multiline sentence, there is no reason to worry!
    >> This is a multiline sentence, there is no reason to worry!
Example 3
An example that shows the different way boolean values can be read into memory. A value of True, on, or yes will equate to True and values of False, off, no will equate to False. The values in the yaml-like file are case insensitive.
    from cobralib.io import ReadYAML

    reader = ReadYAML('read_yaml.yaml')
    true_value = reader.read_key_value('bool test1:', bool, 1)
    yes_value = reader.read_key_value('bool test4:', bool, 1)
    on_value = reader.read_key_value('bool test5:', bool, 1)
    false_value = reader.read_key_value('bool test2:', bool, 1)
    no_value = reader.read_key_value('bool test3:', bool, 1)
    off_value = reader.read_key_value('bool test6:', bool, 1)
    for value in (true_value, yes_value, on_value, false_value, no_value, off_value):
        print(value)

    >> True
    >> True
    >> True
    >> False
    >> False
    >> False
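A lookup table is needed here because Python's built-in `bool()` treats any non-empty string as true, so `bool("False")` is `True`. The sketch below shows case-insensitive casting using the word lists given above. The function name `to_bool` is hypothetical.

```python
TRUE_WORDS = {"true", "yes", "on"}
FALSE_WORDS = {"false", "no", "off"}


def to_bool(raw: str) -> bool:
    """Hypothetical sketch: case-insensitive YAML-style boolean casting."""
    word = raw.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"Cannot interpret {raw!r} as a boolean")


print(to_bool("TRUE"), to_bool("Off"), bool("False"))  # True False True
```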
- read_yaml_dict(keyword: str, key_data_type: type, value_data_type: type, document_index: int = 0) dict[source]
- Parameters:
keyword – The keyword associated with the value to be read in. Unlike a pure YAML file, this value does not have to end with a : symbol.
key_data_type – The data type of the key value.
value_data_type – The data type of the value to be read in
document_index – The zero-based index of the YAML document within the file.
- Return value:
The dictionary associated with a keyword
- Raises:
ValueError – If the value can not be cast to the user defined type
This method can be used to read a key-value pair from a yaml or yaml-like file where the value is a dictionary of values. This method will recognize the >, ^, and | symbols that symbolize strings that either start on the next line, or multiline strings. NOTE: This method assumes a flat (i.e. not nested) dictionary structure.
Example 1
An example of Python code that reads a dictionary of integer values from the second YAML document.
    from cobralib.io import ReadYAML

    reader = ReadYAML('read_yaml.yaml')
    value = reader.read_yaml_dict('Ages:', str, int, 1)
    print(value)

    >> {'Jon': 44, 'Jill': 32, 'Bob': 12}
- read_yaml_dict_of_list(keyword: str, key_data_type: type, list_data_type: type, document_index: int = 0) dict[source]
- Parameters:
keyword – The keyword associated with the value to be read in. Unlike a pure YAML file, this value does not have to end with a : symbol.
key_data_type – The data type of the key value.
list_data_type – The data type of the value to be read in
document_index – The zero-based index of the YAML document within the file.
- Return value:
The dictionary associated with a keyword
- Raises:
ValueError – If the value can not be cast to the user defined type
This method can be used to read a key-value pair from a yaml or yaml-like file where the value is a dictionary of lists. This method will recognize the >, ^, and | symbols that symbolize strings that either start on the next line, or multiline strings. NOTE: This method assumes a flat (i.e. not nested) dictionary structure.
Example 1
An example of a python code to read a dictionary of integer list values from the 1st yaml document.
    from cobralib.io import ReadYAML

    reader = ReadYAML('read_yaml.yaml')
    value = reader.read_yaml_dict_of_list('Dict List:', str, int, 0)
    print(value)
>> {'One': [1, 2, 3], 'Two': [3, 4, 5], 'Three': [6, 7, 8]}
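A dictionary of lists adds one level of nesting. Each key may carry an inline list or be followed by more deeply indented `-` items. The sketch below tracks indentation to decide where the dictionary ends and which key a block item belongs to. `read_dict_of_lists` is a hypothetical name and is not cobralib's code.

```python
def _indent(line: str) -> int:
    """Number of leading whitespace characters."""
    return len(line) - len(line.lstrip())


def read_dict_of_lists(lines: list[str], keyword: str,
                       key_type: type, item_type: type) -> dict:
    """Hypothetical sketch: read inline or block lists stored under each key."""
    starts = [i for i, line in enumerate(lines) if line.strip().startswith(keyword)]
    if not starts:
        raise ValueError(f"{keyword!r} not found")
    base = _indent(lines[starts[0]])
    result: dict = {}
    current = None
    for line in lines[starts[0] + 1:]:
        # Stop at a blank line or at anything not indented past the keyword
        if not line.strip() or _indent(line) <= base:
            break
        text = line.strip()
        if text.startswith("-"):
            if current is None:
                raise ValueError("List item found before any key")
            result[current].append(item_type(text[1:].strip()))
            continue
        key, _, rest = text.partition(":")
        current = key_type(key.strip())
        rest = rest.strip()
        if rest.startswith("["):
            # Inline list: parse it immediately
            items = rest.strip("[]").split(",")
            result[current] = [item_type(x.strip()) for x in items if x.strip()]
        else:
            result[current] = []  # block items follow on later lines
    return result


lines = [
    "Dict List:",
    "  One:",
    "    - 1",
    "    - 2",
    "    - 3",
    "  Two: [3, 4, 5]",
    "  Three: [6, 7, 8]",
    "Str Dict List:",
]
print(read_dict_of_lists(lines, "Dict List:", str, int))
# {'One': [1, 2, 3], 'Two': [3, 4, 5], 'Three': [6, 7, 8]}
```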
- read_yaml_list(keyword: str, data_type: type, document_index: int = 0) list[Any][source]
- Parameters:
keyword – The keyword associated with the value to be read in. Unlike a pure YAML file, this value does not have to end with a : symbol.
data_type – The data type of the value to be read in
document_index – The zero-based index of the YAML document within the file.
- Return value:
The list associated with a keyword
- Raises:
ValueError – If the value can not be cast to the user defined type
This method can be used to read a key-value pair from a YAML or YAML-like file where the value is a list of values. It recognizes the >, ^, and | symbols, which mark strings that either start on the next line or span multiple lines.
Example 1
An example of Python code that reads a list of float values from the second YAML document.
    from cobralib.io import ReadYAML

    reader = ReadYAML('read_yaml.yaml')
    list_values = reader.read_yaml_list('First List:', float, 1)
    print(list_values)

    >> [1.1, 2.2, 3.3, 4.4]
Example 2
This method will also read string values from the list that may use the ^, > or | symbols that signify the string as starting on the next line, a multi-line string that should be read into one line, or a multiline string that should be read as a multiline string.
    from cobralib.io import ReadYAML

    reader = ReadYAML('read_yaml.yaml')
    list_values = reader.read_yaml_list('Numbers:', str, 1)
    print(list_values)
>> ['Hello World This is Jon', 'This', 'Is', 'Correct']
Read JSON
This class is inherited by the ReadKeyWords class; however, it can also be used independently.
- class cobralib.io.ReadJSON(file_name: str)[source]
- Parameters:
file_name – The name and path of the file with the JSON-like format. While not required, it is recommended that this file use a .jwc extension.
- Raises:
FileNotFoundError – If the file does not exist.
This class can be used to read a file with a JSON-like format. It is tailored to read basic JSON files, but with looser requirements on how keywords are formatted and stricter requirements on data typing. The methods within this class can be used to read scalar variables from key-value pairs, lists, and flat dictionaries. The file containing JSON data can be a pure .json file, or it can mix JSON with YAML-like key-value pairs. If the file is mixed, it is recommended that it use a .jwc extension.
- read_full_json(keyword: str = None) dict | list[source]
Read the entire contents of the file as JSON data. If a keyword is provided, search for that keyword and return the nested dictionaries beneath it.
- Parameters:
keyword – The keyword to search for in the file. If None, returns the entire JSON data.
- Returns:
The JSON data as a dictionary or list.
- Raises:
ValueError – If the keyword is specified but not found in the file.
Unlike the read_json method, this method assumes the entire file is formatted as JSON. It reads either the entire contents of the JSON file as a dictionary or the dictionaries nested under a specific keyword. Assume the input file titled example.json has the following format:
Example 1
    {
      "key1": "value1",
      "key2": {
        "subkey1": "subvalue1",
        "subkey2": {
          "subsubkey1": "subsubvalue1",
          "subsubkey2": "subsubvalue2"
        }
      }
    }
The code to extract data would look like:
    from cobralib.io import ReadJSON

    reader = ReadJSON("example.json")
    value = reader.read_full_json()
    print(value)
    new_value = reader.read_full_json("subkey2")
    print(new_value)

    >> {"key1": "value1", "key2": {"subkey1": "subvalue1", "subkey2": {"subsubkey1": "subsubvalue1", "subsubkey2": "subsubvalue2"}}}
    >> {"subsubkey1": "subsubvalue1", "subsubkey2": "subsubvalue2"}
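Returning the data beneath a keyword amounts to a recursive search of the parsed structure. The sketch below assumes the whole input is valid JSON. It searches each level before descending into its children and returns the first match. It uses only the standard `json` module. `read_json_under` is a hypothetical name rather than the library's method, and it raises `ValueError` for a missing keyword to mirror the behaviour documented above.

```python
import json
from typing import Any, Optional

_MISSING = object()  # sentinel so that a JSON null is not mistaken for "not found"


def _search(node: Any, keyword: str) -> Any:
    """Check the current level first, then descend into each child in order."""
    if isinstance(node, dict):
        if keyword in node:
            return node[keyword]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return _MISSING
    for child in children:
        found = _search(child, keyword)
        if found is not _MISSING:
            return found
    return _MISSING


def read_json_under(text: str, keyword: Optional[str] = None) -> Any:
    """Hypothetical sketch: all JSON data, or the value stored under `keyword`."""
    data = json.loads(text)
    if keyword is None:
        return data
    found = _search(data, keyword)
    if found is _MISSING:
        raise ValueError(f"{keyword!r} not found")
    return found


text = ('{"key1": "value1", "key2": {"subkey1": "subvalue1", '
        '"subkey2": {"subsubkey1": "subsubvalue1", "subsubkey2": "subsubvalue2"}}}')
print(read_json_under(text, "subkey2"))
# {'subsubkey1': 'subsubvalue1', 'subsubkey2': 'subsubvalue2'}
```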
- read_json(keyword: str) dict[source]
Search each line for the specified keyword and read the JSON data to the right of the keyword until the termination of brackets.
- Parameters:
keyword – The keyword to search for in each line.
- Returns:
The JSON data as a dictionary.
- Raises:
ValueError – If the keyword is not found or if the JSON data is not valid.
Example 1
This example shows a file that mixes YAML and JSON data. To mark a file as containing mixed data, it is recommended that the .jwc extension be used; however, it is not required.
    Yaml Dict:
      - 1
      - 2
      - 3
    Yaml Key: Test String
    Yaml Dict:
      One: 1.1
      Two: 2.2
      Three: 3.3
    Json Book Data: {"book": "History of the World", "year": 1976}
The JSON data can then be read with:
    from cobralib.io import ReadJSON

    # Instantiate the class
    reader = ReadJSON("test_key_words.jwc")
    value = reader.read_json("Json Book Data:")
    print(value)
>> {"book": "History of the World", "year": 1976}
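The hard part of reading JSON embedded in a mixed file is finding the termination of brackets. Python's `json.JSONDecoder.raw_decode` already does this. It parses one JSON value starting at a given index and reports where that value ends, ignoring whatever text follows. The sketch below uses it as a stand-in for the bracket matching described above. `read_json_after_keyword` is a hypothetical name, and the code is not cobralib's implementation.

```python
import json


def read_json_after_keyword(lines: list[str], keyword: str) -> dict:
    """Hypothetical sketch: decode the JSON object that follows `keyword`."""
    text = "\n".join(lines)
    position = text.find(keyword)
    if position == -1:
        raise ValueError(f"{keyword!r} not found")
    after = position + len(keyword)
    start = text.find("{", after)
    # Only whitespace may sit between the keyword and the opening brace
    if start == -1 or text[after:start].strip():
        raise ValueError(f"No JSON object follows {keyword!r}")
    # raw_decode stops at the matching closing brace and returns (value, end index);
    # a malformed object raises json.JSONDecodeError, a subclass of ValueError
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value


lines = [
    "Yaml Key: Test String",
    'Json Book Data: {"book": "History of the World",',
    '                 "year": 1976}',
    "Another Key: 5",
]
print(read_json_after_keyword(lines, "Json Book Data:"))
# {'book': 'History of the World', 'year': 1976}
```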
Read XML
This class is inherited by the ReadKeyWords class; however, it can also be used independently.
- class cobralib.io.ReadXML(file_name: str)[source]
- Parameters:
file_name – The name and path of the file with the XML-like format. While not required, it is recommended that this file be either an .xml or a .jwc file.
- Raises:
FileNotFoundError – If the file does not exist.
This class can be used to read a file with an XML-like format. It is tailored to read basic XML files, but with looser requirements on how keywords are formatted and stricter requirements on data typing. The methods within this class can be used to read scalar variables from key-value pairs, lists, and flat dictionaries. The file containing the XML data can contain traditional XML data or YAML-like key-value pairs. If the file is mixed, it is recommended that the file be given a .jwc extension.
- read_full_xml(keyword: str = None)
Read the XML data. If a keyword is provided, search for the specified keyword in the XML data and return the nested elements beneath it. If no keyword is provided, return the full XML data.
- Parameters:
keyword – The keyword to search for in the XML data.
- Returns:
The XML data as a dictionary object or the nested elements as an ElementTree object if a keyword is provided.
- Raises:
ValueError – If the keyword is specified but not found in the XML data.
Assume the input file example.xml has the following format:
Example 1
<root>
    <key1>value1</key1>
    <key2>
        <subkey1>subvalue1</subkey1>
        <subkey2>
            <subsubkey1>subsubvalue1</subsubkey1>
            <subsubkey2>subsubvalue2</subsubkey2>
        </subkey2>
    </key2>
</root>
The code to extract data would look like:
from cobralib.io import ReadXML

reader = ReadXML("example.xml")
value = reader.read_full_xml()
print(value)

>> { "root": { "key1": "value1", "key2": { "subkey1": "subvalue1", "subkey2": { "subsubkey1": "subsubvalue1", "subsubkey2": "subsubvalue2" } } } }

new_value = reader.read_full_xml("key2")
print(new_value)

>> { "subkey1": "subvalue1", "subkey2": { "subsubkey1": "subsubvalue1", "subsubkey2": "subsubvalue2" } }
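The conversion from an XML tree to the nested dictionaries shown above can be sketched with the standard library's xml.etree.ElementTree. This is a hypothetical illustration, not cobralib's code: element_to_dict and read_full_xml_sketch are assumed names, leaf elements become stripped strings, and a keyword returns the children of the first matching element.

```python
import xml.etree.ElementTree as ET
from typing import Any, Optional, Union


def element_to_dict(element: ET.Element) -> Union[str, dict]:
    """Convert an element to a dict of its children, or to its text if it has none."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    # Note: repeated sibling tags overwrite each other in this simple sketch
    return {child.tag: element_to_dict(child) for child in children}


def read_full_xml_sketch(xml_text: str, keyword: Optional[str] = None) -> Any:
    """Return the whole document as a dict, or only the part under `keyword`."""
    root = ET.fromstring(xml_text)
    if keyword is None:
        return {root.tag: element_to_dict(root)}
    # iter() walks the entire tree, so the keyword can sit at any depth
    match = next(root.iter(keyword), None)
    if match is None:
        raise ValueError(f"Keyword '{keyword}' not found in XML data")
    return element_to_dict(match)
```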
- read_xml(keyword: str) -> dict
Search each line for the specified keyword and read the XML data to the right of the keyword until the matching closing tag.
- Parameters:
keyword – The keyword to search for in each line.
- Returns:
The XML data as a dictionary.
- Raises:
ValueError – If the keyword is not found or if the XML data is not valid.
Example 1
Assume the file example.jwc contains the following data:

Yaml List:
  - 1
  - 2
  - 3
Yaml Key: Test String
Yaml Dict:
  One: 1.1
  Two: 2.2
  Three: 3.3
XML Book Data: <root>
    <book>"History of the World"</book>
    <Year>1976</Year>
</root>

The XML data can then be read with:

from cobralib.io import ReadXML

reader = ReadXML("example.jwc")
value = reader.read_xml("XML Book Data")
print(value)
>> {"book": "History of the World", "Year": 1976}
Logger
This class is a wrapper around the logging module that adds the ability to truncate a log file to a user-specified number of lines.
- class cobralib.io.Logger(filename, console_level, file_level, max_lines)
Custom logging class that writes messages to both console and log file.
- Parameters:
filename – The name of the file to write logs to.
console_level – The minimum logging level for the console. Should be one of: ‘NOTSET’, ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’.
file_level – The minimum logging level for the log file. Should be one of: ‘NOTSET’, ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’.
max_lines – The maximum number of lines in the log file. When exceeded, the oldest entries are deleted.
- Raises:
ValueError – If console_level or file_level are not valid logging levels.
IOError – If an I/O error occurs when opening the file.
Example usage:
from cobralib.io import Logger

# Create a logger with filename='my_log.log', console_level='INFO',
# file_level='DEBUG', and max_lines=100
logger = Logger('my_log.log', 'INFO', 'DEBUG', 100)

# Log a DEBUG message
logger.log('DEBUG', 'This is a debug message')

# Log an INFO message
logger.log('INFO', 'This is an info message')
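The max_lines behavior (dropping the oldest entries once the file grows too long) can be approximated with the standard logging module alone. The sketch below is a hypothetical TruncatingFileHandler, not cobralib's Logger; it assumes one line per log record and rereads the whole file after every record, which is simple but becomes slow for large files.

```python
import logging


class TruncatingFileHandler(logging.FileHandler):
    """FileHandler that keeps only the newest `max_lines` lines in its file."""

    def __init__(self, filename: str, max_lines: int, level: int = logging.DEBUG):
        super().__init__(filename, mode="a", encoding="utf-8")
        self.max_lines = max_lines
        self.setLevel(level)

    def emit(self, record: logging.LogRecord) -> None:
        # Write and flush the new record through the normal FileHandler path
        super().emit(record)
        with open(self.baseFilename, "r", encoding="utf-8") as handle:
            lines = handle.readlines()
        if len(lines) > self.max_lines:
            # Drop the oldest entries and rewrite the file with the newest ones
            with open(self.baseFilename, "w", encoding="utf-8") as handle:
                handle.writelines(lines[-self.max_lines:])
```

A handler like this would be attached with logger.addHandler(...) next to a console StreamHandler, each with its own level, mirroring the separate console_level and file_level parameters.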
- cobralib.io.write_yaml_file(file_path: str, data: dict, append: bool = False) -> None
Write or append data to a YAML file.
- Parameters:
file_path – The path of the YAML file
data – The data to be written or appended as a dictionary
append – True to append data to the file, False to overwrite the file or create a new one (default: False)
- Raises:
FileNotFoundError – If the file does not exist in append mode
from cobralib.io import write_yaml_file

# A list of dictionaries, one per top-level entry in the YAML file
dict_file = [
    {'sports': ['soccer', 'football', 'basketball', 'cricket', 'hockey', 'table tennis']},
    {'countries': ['Pakistan', 'USA', 'India', 'China', 'Germany', 'France', 'Spain']},
]

# Create a new YAML file (or overwrite an existing one)
write_yaml_file('new_file.yaml', dict_file, append=False)
This will create a file titled new_file.yaml with the following contents:

- sports:
  - soccer
  - football
  - basketball
  - cricket
  - hockey
  - table tennis
- countries:
  - Pakistan
  - USA
  - India
  - China
  - Germany
  - France
  - Spain