Input / Output (io)

The io module contains several functions and classes that can be used to read various ASCII based files.

ReadKeyWords

The ReadKeyWords class is a multi-purpose class that reads several different configuration file formats and also allows a user to create a configuration file that mixes YAML, JSON, and XML data. While this class can read any ASCII-based text format, it is recommended that the .jwc format be used when mixing configuration formats. This class inherits from the ReadYAML, ReadJSON, and ReadXML classes, so further information on its methods can be found in the documentation for those classes.

class cobralib.io.ReadKeyWords(file_name: str, print_lines: int = 50)[source]
This class is a container for the ReadYAML, ReadJSON, and ReadXML classes. It is developed specifically to read .jwc file types, which can mix JSON, XML, and YAML formats. It can also be used to read a plain XML, JSON, or YAML file.

Parameters:

file_name – The name of the file to be read, including its path

print_lines – The number of lines to be printed to the screen if the user prints an instance of the class. Defaults to 50

Raises:

FileNotFoundError – If the file does not exist
Example File
---
# First document in file

Float Value: 4.387
Double Value: 1.11111187
integer: 6
String: Hello
Float List: [1.1 2.2 3.3 4.4]
Yaml Block List:
    - 1
    - 2
    - 3
    - 4
Yaml Dict:
    First Key: 3.3
    Second Key: 4.4
    Third Key: 5.5
    Fourth Key: 6.6
String List Hello World How are you
JSON Data: {"book": "History of the World", "Year": 1976}
XML Data: <root>
             <book>"History of the World"</book>
             <Year>1976</Year>
          </root>

---
# Second document in file

# Notice that a : character is not required
Another Int 3
Instantiation Example
# Instantiate the class
from cobralib.io import ReadKeyWords
reader = ReadKeyWords("test_key_words.jwc", print_lines=2)

# Print the instance, displaying 2 lines
print(reader)
>> Float Value: 4.387 # Comment line not to be read
>> Double Value: 1.11111187 # Comment line not to be read
The user can also adjust the print_lines attribute after instantiation to change the number of printed lines.
Read Scalar Values

This class can be used to read in key value pairs.
# Instantiate the class
import numpy as np
from cobralib.io import ReadKeyWords
reader = ReadKeyWords("test_key_words.jwc")
int_value = reader.read_key_value("integer:", int)
double_value = reader.read_key_value("Double Value:", np.float64)
# Read from second document in file
second_doc = reader.read_key_value("Another Int", int, 1)
print("Integer Value: ", int_value)
print(type(int_value))
print("Double Value: ", double_value)
print(type(double_value))
print(second_doc)
>> Integer Value: 6
>> <class 'int'>
>> Double Value: 1.11111187
>> <class 'numpy.float64'>
>> 3
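
The keyword lookup shown above can be pictured with a short standard-library sketch. This is not cobralib's implementation: the function name find_key_value, the way documents are split on --- lines, and the rule that the value is everything to the right of the keyword are all assumptions made for illustration.

```python
import re

SAMPLE = """---
# First document in file
Float Value: 4.387
integer: 6
---
# Second document in file
Another Int 3
"""


def find_key_value(text, keyword, data_type, document_index=0):
    """Hypothetical helper: return the value that follows keyword, cast to data_type."""
    # Split the text into documents on lines that contain only '---'
    pieces = re.split(r"^---[ \t]*$", text, flags=re.MULTILINE)
    documents = [piece for piece in pieces if piece.strip()]
    for line in documents[document_index].splitlines():
        if line.startswith(keyword):
            # Everything to the right of the keyword is treated as the raw value
            return data_type(line[len(keyword):].strip())
    raise ValueError(f"Keyword {keyword!r} not found")


print(find_key_value(SAMPLE, "integer:", int))
print(find_key_value(SAMPLE, "Another Int", int, 1))
```

Because the cast is applied with the caller's type, a value that cannot be converted raises ValueError, which mirrors the behaviour documented for read_key_value.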
Read List Values

This class can be used to read in lists stored in inline or block formats.
# Instantiate the class
from cobralib.io import ReadKeyWords
reader = ReadKeyWords("test_key_words.jwc")
inline_list = reader.read_yaml_list("Float List:", float)
block_list = reader.read_yaml_list("Yaml Block List:", int)
print("Inline List: ", inline_list)
print("Block List: ", block_list)
>> Inline List: [1.1, 2.2, 3.3, 4.4]
>> Block List: [1, 2, 3, 4]
Read JSON and XML

This class can be used to read JSON and XML data associated with key words.
# Instantiate the class
from cobralib.io import ReadKeyWords
reader = ReadKeyWords("test_key_words.jwc")
json_data = reader.read_json("JSON Data:")
xml_data = reader.read_xml("XML Data:")
print("JSON Data: ", json_data)
print("XML Data: ", xml_data)
>> JSON Data: {"book": "History of the World", "Year": 1976}
>> XML Data: {"book": "History of the World", "Year": 1976}
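
As a rough illustration of how such payloads map onto Python dictionaries, the sketch below uses only the standard json and xml.etree.ElementTree modules. It is an assumption about the general technique, not a description of how cobralib converts the data; in particular, ElementTree returns element text as strings, so any numeric casting is left out here.

```python
import json
import xml.etree.ElementTree as ET

json_text = '{"book": "History of the World", "Year": 1976}'
xml_text = "<root><book>History of the World</book><Year>1976</Year></root>"

# JSON maps directly onto Python dictionaries, lists, and scalars
json_data = json.loads(json_text)

# Flatten the direct children of the root element into a dictionary.
# Element text is always a str, so Year is '1976' in this sketch.
root = ET.fromstring(xml_text)
xml_data = {child.tag: child.text for child in root}

print(json_data)
print(xml_data)
```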
Read YAML Dictionaries

This class can be used to read dictionaries encoded in YAML formats. Unlike JSON and XML, dictionaries read in from a YAML format must be flat (i.e. no nested dictionaries) and of a uniform data type.
# Instantiate the class
from cobralib.io import ReadKeyWords
reader = ReadKeyWords("test_key_words.jwc")
yaml_dict = reader.read_yaml_dict("Yaml Dict:", str, float)
print("YAML Dictionary: ", yaml_dict)
>> YAML Dictionary: {"First Key": 3.3, "Second Key": 4.4,
                     "Third Key": 5.5, "Fourth Key": 6.6}
Note: In order to read in a dictionary of lists, use the read_yaml_dict_of_list method.
YAML, JSON, and XML Files

If you wish to read a .yaml, .json, or .xml file that does not contain mixed data, you can use one of the following three methods.
# Instantiate the class
from cobralib.io import ReadKeyWords
yaml_reader = ReadKeyWords("test_key_words.yaml")
yaml_data = yaml_reader.read_full_yaml()

json_reader = ReadKeyWords("test_key_words.json")
json_data = json_reader.read_full_json()

xml_reader = ReadKeyWords("test_key_words.xml")
xml_data = xml_reader.read_full_xml()

Read Columnar Data

The following functions can be used to read columnar data from .txt, .csv, .xls, .xlsx, and .pdf files.

cobralib.io.read_csv_columns_by_headers(file_name: str, headers: dict[str, type], skip: int = 0) → DataFrame[source]

Parameters:

file_name – The file name, including the path
headers – A dictionary of column names and their data types. Types are limited to numpy.int64, numpy.float64, and str
skip – The number of lines to be skipped before reading data

Return df:

A pandas dataframe containing all relevant information

Raises:

FileNotFoundError – If the file is found to not exist

This function assumes the file uses a comma (i.e. ,) delimiter. If it does not, then it is not a true .csv file and should be treated as a text file and read by the read_text_columns_by_headers function. Assume we have a .csv file titled test.csv with the following format.

test.csv
ID, Inventory, Weight_per, Number
1, Shoes, 1.5, 5
2, t-shirt, 1.8, 3
3, coffee, 2.1, 15
4, books, 3.2, 48

This file can be read via the following command

from cobralib.io import read_csv_columns_by_headers

> file_name = 'test.csv'
> headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
> df = read_csv_columns_by_headers(file_name, headers)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48

This function can also use the skip argument to read data when the headers are not on the first line. For instance, assume the following .csv file:

test1.csv
This line is used to provide metadata for the csv file
This line is as well
ID, Inventory, Weight_per, Number
1, Shoes, 1.5, 5
2, t-shirt, 1.8, 3
3, coffee, 2.1, 15
4, books, 3.2, 48

This file can be read via the following command

from cobralib.io import read_csv_columns_by_headers

> file_name = 'test1.csv'
> headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
> df = read_csv_columns_by_headers(file_name, headers, skip=2)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48
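
To make the roles of headers and skip concrete, here is a minimal standard-library sketch of the same idea. It is not the cobralib implementation and it returns a plain dictionary of lists instead of a pandas DataFrame; the helper name read_columns_by_headers is hypothetical.

```python
import csv

CSV_TEXT = """This line is used to provide metadata for the csv file
This line is as well
ID, Inventory, Weight_per, Number
1, Shoes, 1.5, 5
2, t-shirt, 1.8, 3
"""


def read_columns_by_headers(text, headers, skip=0):
    """Hypothetical helper: return {column name: list of typed values}."""
    # Drop the leading metadata lines so the header row comes first
    lines = text.splitlines()[skip:]
    # skipinitialspace ignores the blank that follows each comma
    reader = csv.DictReader(lines, skipinitialspace=True)
    columns = {name: [] for name in headers}
    for row in reader:
        for name, cast in headers.items():
            # Only the requested columns are kept, each cast to its type
            columns[name].append(cast(row[name]))
    return columns


cols = read_columns_by_headers(CSV_TEXT, {"ID": int, "Weight_per": float}, skip=2)
print(cols)
```

Only the columns named in headers are returned, which is the same selection behaviour the headers dictionary provides in the functions documented here.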

cobralib.io.read_csv_columns_by_index(file_name: str, headers: dict[int, type], col_names: list[str], skip: int = 0) → DataFrame[source]

Parameters:

file_name – The file name, including the path
headers – A dictionary of column indices and their data types. Types are limited to numpy.int64, numpy.float64, and str
col_names – A list containing the names to be given to each column
skip – The number of lines to be skipped before reading data

Return df:

A pandas dataframe containing all relevant information

Raises:

FileNotFoundError – If the file is found to not exist

This function assumes the file uses a comma (i.e. ,) delimiter. If it does not, then it is not a true .csv file and should be treated as a text file and read by the read_text_columns_by_index function. Assume we have a .csv file titled test.csv with the following format.

test.csv
1, Shoes, 1.5, 5
2, t-shirt, 1.8, 3
3, coffee, 2.1, 15
4, books, 3.2, 48

This file can be read via the following command

from cobralib.io import read_csv_columns_by_index

> file_name = 'test.csv'
> headers = {0: int, 1: str, 2: float, 3: int}
> names = ['ID', 'Inventory', 'Weight_per', 'Number']
> df = read_csv_columns_by_index(file_name, headers, names)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48

This function can also use the skip argument to read data when the data does not begin on the first line. For instance, assume the following .csv file:

test1.csv
This line is used to provide metadata for the csv file
This line is as well
1, Shoes, 1.5, 5
2, t-shirt, 1.8, 3
3, coffee, 2.1, 15
4, books, 3.2, 48

This file can be read via the following command

from cobralib.io import read_csv_columns_by_index

> file_name = 'test1.csv'
> headers = {0: int, 1: str, 2: float, 3: int}
> names = ['ID', 'Inventory', 'Weight_per', 'Number']
> df = read_csv_columns_by_index(file_name, headers,
                                 names, skip=2)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48

cobralib.io.read_text_columns_by_headers(file_name: str, headers: dict[str, type], skip: int = 0, delimiter='\\s+') → DataFrame[source]

Parameters:

file_name – The file name, including the path
headers – A dictionary of column names and their data types. Types are limited to numpy.int64, numpy.float64, and str
skip – The number of lines to be skipped before reading data
delimiter – The delimiter separating data in the text file. Defaults to whitespace delimited, where the delimiter is one or more whitespace characters. Any delimiter can be used, including a comma; however, a comma-delimited file should use a .csv extension.

Return df:

A pandas dataframe containing all relevant information

Raises:

FileNotFoundError – If the file is found to not exist

This function assumes the file is whitespace delimited unless a different delimiter is specified. Assume we have a .txt file titled test.txt with the following format.

test.txt
ID	Inventory	Weight_per	Number
1	Shoes	1.5	5
2	t-shirt	1.8	3
3	coffee	2.1	15
4	books	3.2	48

This file can be read via the following command

from cobralib.io import read_text_columns_by_headers

> file_name = 'test.txt'
> headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
> df = read_text_columns_by_headers(file_name, headers)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48

This function can also use the skip argument to read data when the headers are not on the first line. For instance, assume the following text file:

test.txt
This line is used to provide metadata for the csv file
This line is as well
ID	Inventory	Weight_per	Number
1	Shoes	1.5	5
2	t-shirt	1.8	3
3	coffee	2.1	15
4	books	3.2	48

This file can be read via the following command

from cobralib.io import read_text_columns_by_headers

> file_name = 'test.txt'
> headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
> df = read_text_columns_by_headers(file_name, headers, skip=2)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48

cobralib.io.read_text_columns_by_index(file_name: str, headers: dict[int, type], col_names: list[str], skip: int = 0, delimiter='\\s+') → DataFrame[source]

Parameters:

file_name – The file name, including the path
headers – A dictionary of column indices and their data types. Types are limited to numpy.int64, numpy.float64, and str
col_names – A list containing the names to be given to each column
skip – The number of lines to be skipped before reading data
delimiter – The delimiter separating data in the text file. Defaults to whitespace delimited, where the delimiter is one or more whitespace characters. Any delimiter can be used, including a comma; however, a comma-delimited file should use a .csv extension.

Return df:

A pandas dataframe containing all relevant information

Raises:

FileNotFoundError – If the file is found to not exist

Assume we have a .txt file titled test.txt with the following format.

test.txt
1	Shoes	1.5	5
2	t-shirt	1.8	3
3	coffee	2.1	15
4	books	3.2	48

This file can be read via the following command

from cobralib.io import read_text_columns_by_index

> file_name = 'test.txt'
> headers = {0: int, 1: str, 2: float, 3: int}
> names = ['ID', 'Inventory', 'Weight_per', 'Number']
> df = read_text_columns_by_index(file_name, headers, names)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48

This function can also use the skip argument to read data when the data does not begin on the first line. For instance, assume the following text file:

test.txt
This line is used to provide metadata for the csv file
This line is as well
1	Shoes	1.5	5
2	t-shirt	1.8	3
3	coffee	2.1	15
4	books	3.2	48

This file can be read via the following command

from cobralib.io import read_text_columns_by_index

> file_name = 'test.txt'
> headers = {0: int, 1: str, 2: float, 3: int}
> names = ['ID', 'Inventory', 'Weight_per', 'Number']
> df = read_text_columns_by_index(file_name, headers,
                                  names, skip=2)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48
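
The combination of an index-keyed headers dictionary, a col_names list, and the default '\s+' delimiter can be sketched with the standard library as follows. This is an illustrative assumption about the technique, not cobralib's code; the helper name read_columns_by_index is hypothetical and the result is a dictionary of lists, not a DataFrame.

```python
import re

TEXT = """1   Shoes    1.5  5
2   t-shirt  1.8  3
3   coffee   2.1  15
"""


def read_columns_by_index(text, headers, col_names, skip=0, delimiter=r"\s+"):
    """Hypothetical helper: select columns by position and cast each value."""
    columns = {name: [] for name in col_names}
    for line in text.splitlines()[skip:]:
        # With the default pattern, any run of whitespace separates two fields
        fields = re.split(delimiter, line.strip())
        # Pair each requested index with the name it should receive
        for name, (index, cast) in zip(col_names, headers.items()):
            columns[name].append(cast(fields[index]))
    return columns


cols = read_columns_by_index(TEXT, {0: int, 2: float}, ["ID", "Weight_per"])
print(cols)
```

Treating the delimiter as a regular expression is what lets a single default handle both tab-separated and space-aligned files.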

cobralib.io.read_pdf_columns_by_headers(file_name: str, headers: dict[str, type], table_idx: int = 0, page_num: int = 0, skip: int = 0) → DataFrame[source]

Read a table from a PDF document and save user-specified columns into a pandas DataFrame. This function will read a pdf table that spans multiple pages. NOTE: The pdf document must be a vectorized pdf document and not a scan of another document for this function to work.

Parameters:

file_name – The file name, including the path to the PDF file.
headers – A dictionary of column names and their data types. Data types are limited to int, float, and str.
table_idx – Index of the table to extract from the page (default: 0).
page_num – Page number from which to extract the table (default: 0).
skip – The number of lines to be skipped before reading data

Return df:

A pandas DataFrame containing the specified columns from the table.

Raises:

FileNotFoundError – If the PDF file is found to not exist.

Example usage:

from cobralib.io import read_pdf_columns_by_headers

> file_name = 'test.pdf'
> headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
> df = read_pdf_columns_by_headers(file_name, headers, table_idx=0, page_num=1)
> print(df)
    ID Inventory Weight_per Number
 0  1  shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        40

cobralib.io.read_pdf_columns_by_index(file_name: str, headers: dict[int, type], col_names: list[str], table_idx: int = 0, skip_rows: int = 0, page_num: int = 0) → DataFrame[source]

Read a table from a PDF document and save user-specified columns into a pandas DataFrame based on their column index. This function will read a pdf table that spans multiple pages. NOTE: The pdf document must be a vectorized pdf document and not a scan of another document for this function to work.

Parameters:

file_name – The file name, including the path to the PDF file.
headers – A dictionary of column index and their data types. Data types are limited to int, float, and str.
col_names – A list containing the names to be given to each column.
table_idx – Index of the table to extract from the page (default: 0).
skip_rows – Number of rows to skip before reading the header row (default: 0).
page_num – Page number from which to extract the table (default: 0).

Return df:

A pandas DataFrame containing the specified columns from the table.

Raises:

FileNotFoundError – If the PDF file is found to not exist.

Example usage:

from cobralib.io import read_pdf_columns_by_index

> file_name = 'test.pdf'
> headers = {0: int, 1: str, 2: float, 3: int}
> col_names = ['ID', 'Inventory', 'Weight_per', 'Number']  # Column names
> df = read_pdf_columns_by_index(file_name, headers, col_names,
                                 table_idx=0, skip_rows=2, page_num=1)
> print(df)
    ID Inventory Weight_per Number
 0  1  shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        40

cobralib.io.read_excel_columns_by_headers(file_name: str, tab: str, headers: dict[str, type], skip: int = 0) → DataFrame[source]

Parameters:

file_name – The file name, including the path. The file must be in the .xls format. This code will not read .xlsx files
tab – The tab or sheet name that data will be read from
headers – A dictionary of column names and their data types. Types are limited to numpy.int64, numpy.float64, and str
skip – The number of lines to be skipped before reading data

Return df:

A pandas dataframe containing all relevant information

Raises:

FileNotFoundError – If the file is found to not exist

Assume we have a .xls file titled test.xls with the following format in a tab titled primary.

test.xls
ID	Inventory	Weight_per	Number
1	Shoes	1.5	5
2	t-shirt	1.8	3
3	coffee	2.1	15
4	books	3.2	48

This file can be read via the following command

from cobralib.io import read_excel_columns_by_headers

> file_name = 'test.xls'
> tab = "primary"
> headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
> df = read_excel_columns_by_headers(file_name, tab, headers)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48

This function can also use the skip argument to read data when the headers are not on the first line. For instance, assume the following .xls file:

test.xls
This line is used to provide metadata for the csv file
This line is as well
ID	Inventory	Weight_per	Number
1	Shoes	1.5	5
2	t-shirt	1.8	3
3	coffee	2.1	15
4	books	3.2	48

This file can be read via the following command

from cobralib.io import read_excel_columns_by_headers

> file_name = 'test.xls'
> tab = "primary"
> headers = {'ID': int, 'Inventory': str, 'Weight_per': float, 'Number': int}
> df = read_excel_columns_by_headers(file_name, tab,
                                     headers, skip=2)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48

cobralib.io.read_excel_columns_by_index(file_name: str, tab: str, col_index: dict[int, str], col_names: list[str], skip: int = 0) → DataFrame[source]

Parameters:

file_name – The file name, including the path. The file must be in the .xls format. This code will not read .xlsx files
tab – The tab or sheet name that data will be read from
col_index – A dictionary of column indices and their data types. Types are limited to numpy.int64, numpy.float64, and str
col_names – A list containing the names to be given to each column
skip – The number of lines to be skipped before reading data

Return df:

A pandas dataframe containing all relevant information

Raises:

FileNotFoundError – If the file is found to not exist

Assume we have a .xls file titled test.xls with the following format in a tab titled primary.

test.xls
1	Shoes	1.5	5
2	t-shirt	1.8	3
3	coffee	2.1	15
4	books	3.2	48

This file can be read via the following command

from cobralib.io import read_excel_columns_by_index

> file_name = 'test.xls'
> tab = 'primary'
> headers = {0: int, 1: str, 2: float, 3: int}
> names = ['ID', 'Inventory', 'Weight_per', 'Number']
> df = read_excel_columns_by_index(file_name, tab, headers, names)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48

This function can also use the skip argument to read data when the data does not begin on the first line. For instance, assume the following .xls file:

test.xls
This line is used to provide metadata for the csv file
This line is as well
1	Shoes	1.5	5
2	t-shirt	1.8	3
3	coffee	2.1	15
4	books	3.2	48

This file can be read via the following command

from cobralib.io import read_excel_columns_by_index

> file_name = 'test.xls'
> tab = "primary"
> headers = {0: int, 1: str, 2: float, 3: int}
> names = ['ID', 'Inventory', 'Weight_per', 'Number']
> df = read_excel_columns_by_index(file_name, tab, headers,
                                   names, skip=2)
> print(df)
    ID Inventory Weight_per Number
 0  1  Shoes     1.5        5
 1  2  t-shirt   1.8        3
 2  3  coffee    2.1        15
 3  4  books     3.2        48

Read YAML

This class is inherited by the ReadKeyWords class; however, it can also be used independently.

class cobralib.io.ReadYAML(file_name: str)[source]

Parameters: file_name – The name and path of the file with the yaml-like format
Raises: FileNotFoundError – If the file does not exist.

This class can be used to read a file with a YAML-like format. This class is tailored to read basic YAML files, but with looser requirements on how key words are formatted and stricter requirements on data typing. The methods within this class can be used to read scalar variables from key-variable pairs, lists, and flat dictionaries. This class also enforces type casting for all variables read into memory. This class is more memory efficient than using PyYAML, since it only reads the requested lines into memory.

All code examples described in the documentation for this class reference the read_yaml.yaml file shown below.

---
key: 4.387
list values:
  - 1
  - 2
  - 3
  - 4
Inline List: [ 1.1, 2.2, 3.3, 4.4 ]
Dict List:
    One:
      - 1
      - 2
      - 3
    Two: [3, 4, 5]
    Three: [6, 7, 8]
Str Dict List:
  One: [One, Two, Three]
  Two:
    - |
      Multi Line
      list
    - Two
    - ^
      Hello
---
name: John Doe
age: 30
First List:
  - 1.1
  - 2.2
  - 3.3
  - 4.4
Numbers:
  - |
    Hello World
    This is Jon
  - ^
    This
  - ^
    Is
  - Correct
Sentence: ^
  Hello world
Multi Sentence: |
  This is a multiline sentence,
  there is no reason to worry!
Second Mult Sentence: >
  This is a multiline sentence,
    there is no reason to worry!
String Value: Hello Again World!

bool test1: TRUE

bool test2: False

bool test3: no

bool test4: yes

bool test5: on

bool test6: Off

Ages:
  Jon: 44
  Jill: 32
  Bob: 12

String Test:
  0: String One
  1: ^
    Another String
  2: >
    This is multiline
     one
  3: |
    This is multiline
    two

  Dict to List:
    List One:
      - 1
      - 2
      - 3
    List Two:
      - 4
      - 5
      - 6

read_full_yaml(safe_read: bool = True) → Any[source]

Reads the full YAML file and returns it as a PyYAML object.

Parameters: safe_read – Whether to read the file in safe mode or not. Defaults to True
Returns: The full content of the YAML file as a PyYAML object. This method assumes the possibility of multiple documents in one file, so the result is returned as a list

Unlike other methods in this class, this method reads an entire YAML file into memory and returns a PyYAML object. This is not as memory efficient as the other methods, but it makes accessing data quicker for larger files. In addition, the user must adhere to the strict rules of YAML when using this method. These rules can be found in the PyYAML documentation.

Example 1

An example of Python code that reads the Ages dictionary from the second YAML document.

from cobralib.io import ReadYAML

reader = ReadYAML('read_yaml.yaml')
data = reader.read_full_yaml()  # Read in safe mode
print(data[1]['Ages'])

>> {'Jon': 44, 'Jill': 32, 'Bob': 12}

read_key_value(keyword: str, data_type: type, document_index: int = 0) → Any[source]

Parameters:

keyword – The keyword associated with the value to be read in. Unlike a pure YAML file this value does not have to end with a : symbol
data_type – The data type of the value to be read in
document_index – The number of the yaml document in the yaml file.

Return value:

The value associated with a keyword

Raises:

ValueError – If the value can not be cast to the user defined type

This method can be used to read a key-value pair from a yaml or yaml-like file. This method will recognize the >, ^, and | symbols, which mark strings that either start on the next line or span multiple lines.

Example 1

An example of Python code that reads a float value from the first YAML document.

from cobralib.io import ReadYAML

reader = ReadYAML('read_yaml.yaml')
value = reader.read_key_value('key:', float, 0)
print(value)
>> 4.387

Example 2

An example to read a multiline string value from the second yaml document in the file

from cobralib.io import ReadYAML

reader = ReadYAML('read_yaml.yaml')
value = reader.read_key_value('Multi Sentence:', str, 1)
new_value = reader.read_key_value('Second Mult Sentence:', str, 1)
print(value)
print(new_value)

>> This is a multiline sentence,
   there is no reason to worry!
>> This is a multiline sentence, there is no reason to worry!
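
The difference between the three string markers can be summarised in a few lines of Python. The sketch below is an assumption about the behaviour described in this documentation (| keeps line breaks, > folds the lines into one, and ^ means the value simply starts on the next line); the helper name join_block is hypothetical and is not part of cobralib.

```python
def join_block(lines, style):
    """Hypothetical helper: combine the indented lines that follow a block marker."""
    stripped = [line.strip() for line in lines]
    if style == "|":
        return "\n".join(stripped)  # literal: keep the line breaks
    if style == ">":
        return " ".join(stripped)   # folded: collapse into a single line
    if style == "^":
        return stripped[0]          # the value starts on the next line
    raise ValueError(f"Unknown block style {style!r}")


BLOCK_LINES = [
    "  This is a multiline sentence,",
    "    there is no reason to worry!",
]

print(join_block(BLOCK_LINES, "|"))
print(join_block(BLOCK_LINES, ">"))
print(join_block(["  Hello world"], "^"))
```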

Example 3

An example that shows the different ways boolean values can be read into memory. A value of True, on, or yes will equate to True, and a value of False, off, or no will equate to False. The values in the yaml-like file are case insensitive.

from cobralib.io import ReadYAML

reader = ReadYAML('read_yaml.yaml')
true_value = reader.read_key_value('bool test1:', bool, 1)
yes_value = reader.read_key_value('bool test4:', bool, 1)
on_value = reader.read_key_value('bool test5:', bool, 1)
false_value = reader.read_key_value('bool test2:', bool, 1)
no_value = reader.read_key_value('bool test3:', bool, 1)
off_value = reader.read_key_value('bool test6:', bool, 1)
print(true_value)
print(yes_value)
print(on_value)
print(false_value)
print(no_value)
print(off_value)

>> True
>> True
>> True
>> False
>> False
>> False
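
A dedicated mapping is needed for booleans because Python's built-in bool("False") evaluates to True (any non-empty string is truthy). The following sketch shows one way such a mapping could look; the helper name parse_bool and the exact word lists are assumptions based on the values listed above, not cobralib's source.

```python
TRUE_WORDS = {"true", "yes", "on"}
FALSE_WORDS = {"false", "no", "off"}


def parse_bool(raw):
    """Hypothetical helper: map a yaml-like boolean word to a Python bool."""
    word = raw.strip().lower()  # case-insensitive comparison
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"Cannot cast {raw!r} to bool")


results = [parse_bool(value) for value in ["TRUE", "False", "no", "yes", "on", "Off"]]
print(results)
```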

read_yaml_dict(keyword: str, key_data_type: type, value_data_type: type, document_index: int = 0) → dict[source]

Parameters:

keyword – The keyword associated with the value to be read in. Unlike a pure YAML file, this value does not have to end with a : symbol.
key_data_type – The data type of the key value.
value_data_type – The data type of the value to be read in
document_index – The number of the yaml document in the yaml file.

Return value:

The dictionary associated with a keyword

Raises:

ValueError – If the value can not be cast to the user defined type

This method can be used to read a key-value pair from a yaml or yaml-like file where the value is a dictionary of values. This method will recognize the >, ^, and | symbols that symbolize strings that either start on the next line, or multiline strings. NOTE: This method assumes a flat (i.e. not nested) dictionary structure.

Example 1

An example of Python code that reads a flat dictionary of integer values from the second YAML document.

from cobralib.io import ReadYAML

reader = ReadYAML('read_yaml.yaml')
value = reader.read_yaml_dict('Ages:', str, int, 1)
print(value)

>> {'Jon': 44, 'Jill': 32, 'Bob': 12}
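
The flat-dictionary restriction can be understood from a simple line-based reader: every indented line under the keyword is split once on the colon, and both halves are cast to the requested types. The sketch below illustrates that idea under stated assumptions (a non-indented line ends the block, and the helper name read_flat_dict is hypothetical); it is not the cobralib implementation.

```python
BLOCK = """Ages:
  Jon: 44
  Jill: 32
  Bob: 12
String Value: Hello Again World!
"""


def read_flat_dict(text, keyword, key_type, value_type):
    """Hypothetical helper: read the indented key-value lines under keyword."""
    lines = text.splitlines()
    # Locate the line that carries the keyword
    start = next(i for i, line in enumerate(lines) if line.startswith(keyword))
    result = {}
    for line in lines[start + 1:]:
        if not line.startswith((" ", "\t")):
            break  # a non-indented line ends the dictionary block
        key, _, value = line.strip().partition(":")
        result[key_type(key.strip())] = value_type(value.strip())
    return result


ages = read_flat_dict(BLOCK, "Ages:", str, int)
print(ages)
```

Because a single value_type is applied to every entry, all values must share one data type, which matches the uniform-type requirement stated for read_yaml_dict.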

read_yaml_dict_of_list(keyword: str, key_data_type: type, list_data_type: type, document_index: int = 0) → dict[source]

Parameters:

keyword – The keyword associated with the value to be read in. Unlike a pure YAML file, this value does not have to end with a : symbol.
key_data_type – The data type of the key value.
list_data_type – The data type of the values in each list
document_index – The number of the yaml document in the yaml file.

Return value:

The dictionary associated with a keyword

Raises:

ValueError – If the value can not be cast to the user defined type

This method can be used to read a key-value pair from a yaml or yaml-like file where the value is a dictionary of lists. This method will recognize the >, ^, and | symbols that symbolize strings that either start on the next line, or multiline strings. NOTE: This method assumes a flat (i.e. not nested) dictionary structure.

Example 1

An example of a python code to read a dictionary of integer list values from the 1st yaml document.

from cobralib.io import ReadYAML

reader = ReadYAML('read_yaml.yaml')
value = reader.read_yaml_dict_of_list('Dict List:', str, int, 0)
print(value)

>> {'One': [1, 2, 3], 'Two': [3, 4, 5], 'Three': [6, 7, 8]}

read_yaml_list(keyword: str, data_type: type, document_index: int = 0) → list[Any][source]

Parameters:

keyword – The keyword associated with the value to be read in. Unlike a pure YAML file, this value does not have to end with a : symbol.
data_type – The data type of the value to be read in
document_index – The number of the yaml document in the yaml file.

Return value:

The list associated with a keyword

Raises:

ValueError – If the value can not be cast to the user defined type

This method can be used to read a key-value pair from a yaml or yaml-like file where the value is a list of values. This method will recognize the >, ^, and | symbols, which mark strings that either start on the next line or span multiple lines.

Example 1

An example of Python code that reads a list of float values from the second YAML document.

from cobralib.io import ReadYAML

reader = ReadYAML('read_yaml.yaml')
list_values = reader.read_yaml_list('First List:', float, 1)
print(list_values)

>> [1.1, 2.2, 3.3, 4.4]

Example 2

This method will also read string values from the list that may use the ^, > or | symbols that signify the string as starting on the next line, a multi-line string that should be read into one line, or a multiline string that should be read as a multiline string.

from cobralib.io import ReadYAML

reader = ReadYAML('read_yaml.yaml')
list_values = reader.read_yaml_list('Numbers:', str, 1)
print(list_values)

>> ['Hello World
     This is Jon',
    'This',
    'Is',
    'Correct']

Read JSON

This class is inherited by the ReadKeyWords class; however, it can also be used independently.

class cobralib.io.ReadJSON(file_name: str)[source]

Parameters: file_name – The name and path of the file with the json-like format. While not required, it is recommended that this file use a .jwc extension.
Raises: FileNotFoundError – If the file does not exist.

This class can be used to read a file with a JSON-like format. This class is tailored to read basic JSON files, but with looser requirements on how key words are formatted and stricter requirements on data typing. The methods within this class can be used to read scalar variables from key-variable pairs, lists, and flat dictionaries. The file containing JSON data can be a pure .json file, or it can be mixed with yaml-like key-value pairs. If the file is mixed, it is recommended that the file be defined with a .jwc extension.

read_full_json(keyword: str = None) → dict | list[source]

Read the entire contents of the file as JSON data. If a keyword is provided, search for that keyword and return the nested dictionaries beneath it.

Parameters: keyword – The keyword to search for in the file. If None, returns the entire JSON data.
Returns: The JSON data as a dictionary or list.
Raises: ValueError – If the keyword is specified but not found in the file.

Unlike the read_json method, this method assumes the entire file is formatted as a .json file. It allows a user to read the entire contents of the JSON file as a dictionary, or to read the dictionaries nested under a specific keyword. Assume the input file titled example.json has the following format.

Example 1

{
 "key1": "value1",
 "key2": {
     "subkey1": "subvalue1",
     "subkey2": {
         "subsubkey1": "subsubvalue1",
         "subsubkey2": "subsubvalue2"
      }
   }
}

The code to extract data would look like:

from cobralib.io import ReadJSON
reader = ReadJSON("example.json")
value = reader.read_full_json()
print(value)
new_value = reader.read_full_json("subkey2")
print(new_value)

>> {
    "key1": "value1",
    "key2": {
        "subkey1": "subvalue1",
        "subkey2": {
            "subsubkey1": "subsubvalue1",
            "subsubkey2": "subsubvalue2"
        }
    }
}
>> {"subsubkey1": "subsubvalue1", "subsubkey2": "subsubvalue2"}
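
The keyword lookup described above can be approximated with the standard library alone. The following is a minimal sketch, not the cobralib implementation: the helper name find_key is hypothetical, and it assumes that the first match found in a depth-first search is the one returned.

```python
import json


def find_key(data, keyword):
    """Depth-first search for keyword in nested dicts and lists.

    Hypothetical helper for illustration; raises KeyError when absent.
    """
    if isinstance(data, dict):
        if keyword in data:
            return data[keyword]
        children = data.values()
    elif isinstance(data, list):
        children = data
    else:
        # Scalars cannot contain the keyword
        raise KeyError(keyword)
    for child in children:
        try:
            return find_key(child, keyword)
        except KeyError:
            continue
    raise KeyError(keyword)


text = """
{
 "key1": "value1",
 "key2": {
     "subkey1": "subvalue1",
     "subkey2": {
         "subsubkey1": "subsubvalue1",
         "subsubkey2": "subsubvalue2"
      }
   }
}
"""
data = json.loads(text)
result = find_key(data, "subkey2")
print(result)
```

The sketch raises KeyError for a missing keyword, whereas read_full_json is documented to raise ValueError.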

read_json(keyword: str) → dict[source]

Search each line for the specified keyword and read the JSON data to the right of the keyword until the termination of brackets.

Parameters:

keyword – The keyword to search for in each line.

Returns:

The JSON data as a dictionary.

Raises:

ValueError – If the keyword is not found or if the JSON data is not valid.

Example 1

This example shows a file that mixes YAML and JSON data. To mark a file as containing mixed data, it is recommended that the .jwc file extension be used; however, it is not required.

Yaml List:
    - 1
    - 2
    - 3
Yaml Key: Test String
Yaml Dict:
    One: 1.1
    Two: 2.2
    Three: 3.3
JSON Book Data: {"book": "History of the World", "year": 1976}

from cobralib.io import ReadJSON
# Instantiate the class
reader = ReadJSON("test_key_words.jwc")
value = reader.read_json("JSON Book Data:")
print(value)

>> {"book": "History of the World", "year": 1976}
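
Reading "until the termination of brackets" can be illustrated with json.JSONDecoder.raw_decode, which parses the first complete JSON value in a string and ignores whatever follows it. This is a minimal sketch under that assumption, not the method cobralib uses; the helper name read_json_after_keyword is hypothetical.

```python
import json


def read_json_after_keyword(lines, keyword):
    """Parse the JSON value that follows keyword (hypothetical helper)."""
    text = "\n".join(lines)
    start = text.find(keyword)
    if start == -1:
        raise ValueError(f"Keyword '{keyword}' not found")
    # Skip the keyword and any whitespace before the JSON value
    rest = text[start + len(keyword):].lstrip()
    try:
        # raw_decode stops at the end of the first complete JSON value
        value, _ = json.JSONDecoder().raw_decode(rest)
    except json.JSONDecodeError as err:
        raise ValueError(f"Invalid JSON after '{keyword}'") from err
    return value


lines = [
    "Yaml Key: Test String",
    'JSON Book Data: {"book": "History of the World", "year": 1976}',
]
book = read_json_after_keyword(lines, "JSON Book Data:")
print(book)
```

Because the lines are joined before the search, the sketch also handles a JSON value that spans several lines.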

Read XML

This class is inherited by the ReadKeyWords class; however, it can also be used independently.

class cobralib.io.ReadXML(file_name: str)[source]

Parameters:

file_name – The name and path of the file with the XML-like format. While not required, it is recommended that this file be either an .xml or a .jwc file.

Raises:

FileNotFoundError – If the file does not exist.

This class can be used to read a file with an XML-like format. It is tailored to read basic XML files, but with looser requirements on how key words are formatted and stricter requirements on data typing. The methods within this class can be used to read scalar variables from key-variable pairs, lists, and flat dictionaries. The file containing the XML data can contain traditional XML data or YAML-like key-value pairs. If the file is mixed, it is recommended that the file be defined with a .jwc extension.

read_full_xml(keyword: str = None)[source]

Read the XML data. If a keyword is provided, search for the specified keyword in the XML data and return the nested elements beneath it. If no keyword is provided, return the full XML data.

Parameters:

keyword – The keyword to search for in the XML data.

Returns:

The XML data as a dictionary object or the nested elements as an ElementTree object if a keyword is provided.

Raises:

ValueError – If the keyword is specified but not found in the XML data.

If you assume the input file titled example.xml has the following format:

Example 1

<root>
    <key1>value1</key1>
    <key2>
        <subkey1>subvalue1</subkey1>
        <subkey2>
            <subsubkey1>subsubvalue1</subsubkey1>
            <subsubkey2>subsubvalue2</subsubkey2>
        </subkey2>
    </key2>
</root>

The code to extract data would look like:

from cobralib.io import ReadXML
reader = ReadXML("example.xml")
value = reader.read_full_xml()
print(value)

>> {
    "root": {
        "key1": "value1",
        "key2": {
            "subkey1": "subvalue1",
            "subkey2": {
                "subsubkey1": "subsubvalue1",
                "subsubkey2": "subsubvalue2"
            }
        }
    }
}

new_value = reader.read_full_xml("key2")
print(new_value)

>> {
    "subkey1": "subvalue1",
    "subkey2": {
        "subsubkey1": "subsubvalue1",
        "subsubkey2": "subsubvalue2"
    }
}
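
The conversion from XML elements to nested dictionaries shown above can be sketched with xml.etree.ElementTree from the standard library. This is an illustration only, not the cobralib implementation: the helper name element_to_dict is hypothetical, and it assumes that sibling tags are unique (a repeated tag would overwrite the earlier entry).

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element's children to a nested dict (hypothetical helper)."""
    children = list(element)
    if not children:
        # Leaf element: return its text content
        return (element.text or "").strip()
    return {child.tag: element_to_dict(child) for child in children}


xml_text = """
<root>
    <key1>value1</key1>
    <key2>
        <subkey1>subvalue1</subkey1>
        <subkey2>
            <subsubkey1>subsubvalue1</subsubkey1>
            <subsubkey2>subsubvalue2</subsubkey2>
        </subkey2>
    </key2>
</root>
"""
root = ET.fromstring(xml_text.strip())
full = {root.tag: element_to_dict(root)}
# A ".//tag" path returns the first matching descendant element
nested = element_to_dict(root.find(".//key2"))
print(full)
print(nested)
```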

read_xml(keyword: str) → dict[source]

Search each line for the specified keyword and read the XML data to the right of the keyword until the termination of tags.

Parameters:

keyword – The keyword to search for in each line.

Returns:

The XML data as a dictionary.

Raises:

ValueError – If the keyword is not found or if the XML data is not valid.

Example 1

Yaml List:
    - 1
    - 2
    - 3
Yaml Key: Test String
Yaml Dict:
    One: 1.1
    Two: 2.2
    Three: 3.3
XML Book Data: <root>
                  <book>"History of the World"</book>
                  <Year>1976</Year>
               </root>

from cobralib.io import ReadXML
reader = ReadXML("example.jwc")
value = reader.read_xml("XML Book Data")
print(value)

>> {"book": "History of the World", "Year": 1976}
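
Reading "until the termination of tags" can be sketched by locating the first opening tag after the keyword and then its matching closing tag. The helper below is hypothetical and is not how cobralib is implemented; it assumes the outer tag is not nested inside itself, and it leaves every value as a string, so any numeric conversion such as the 1976 shown above would be a separate step.

```python
import re
import xml.etree.ElementTree as ET


def read_xml_after_keyword(text, keyword):
    """Parse the XML element that follows keyword (hypothetical helper)."""
    start = text.find(keyword)
    if start == -1:
        raise ValueError(f"Keyword '{keyword}' not found")
    rest = text[start + len(keyword):]
    # Find the name of the first opening tag after the keyword
    opening = re.search(r"<([A-Za-z_][\w.-]*)", rest)
    if opening is None:
        raise ValueError(f"No XML found after '{keyword}'")
    closing = f"</{opening.group(1)}>"
    end = rest.find(closing, opening.end())
    if end == -1:
        raise ValueError(f"Unterminated XML after '{keyword}'")
    try:
        element = ET.fromstring(rest[opening.start():end + len(closing)])
    except ET.ParseError as err:
        raise ValueError(f"Invalid XML after '{keyword}'") from err
    # Flat conversion: one level of child tags mapped to their text
    return {child.tag: (child.text or "").strip() for child in element}


text = """Yaml Key: Test String
XML Book Data: <root>
                  <book>History of the World</book>
                  <Year>1976</Year>
               </root>
"""
book = read_xml_after_keyword(text, "XML Book Data")
print(book)
```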

Logger

This class is a wrapper around the logging module that adds the ability to truncate a log file to a user-specified number of lines.

class cobralib.io.Logger(filename, console_level, file_level, max_lines)[source]

Custom logging class that writes messages to both console and log file.

Parameters:

filename – The name of the file to write logs to.
console_level – The minimum logging level for the console. Should be one of: ‘NOTSET’, ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’.
file_level – The minimum logging level for the log file. Should be one of: ‘NOTSET’, ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’.
max_lines – The maximum number of lines in the log file. When exceeded, the oldest entries are deleted.

Raises:

ValueError – If console_level or file_level are not valid logging levels.
IOError – If an I/O error occurs when opening the file.

Example usage:

from cobralib.io import Logger

# create logger with filename='my_log.log', console_level='INFO',
# file_level='DEBUG', and max_lines=100
logger = Logger('my_log.log', 'INFO', 'DEBUG', 100)

# log a DEBUG message
logger.log('DEBUG', 'This is a debug message')

# log an INFO message
logger.log('INFO', 'This is an info message')

log(level, msg)[source]

Write a log entry.

Parameters:

level – The level of the log entry. Should be one of: ‘NOTSET’, ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’.
msg – The message to be logged.

Raises:

ValueError – If level is not a valid logging level.
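
The truncation behaviour described for max_lines can be sketched with the standard logging module. The class below, TruncatingLogger, is a hypothetical stand-in written for illustration and is not the cobralib source; it assumes the file is trimmed after every entry and that the default message-only log format is used.

```python
import logging
import os
import tempfile

LEVELS = ("NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


class TruncatingLogger:
    """Hypothetical stand-in for Logger; not the cobralib implementation."""

    def __init__(self, filename, console_level, file_level, max_lines):
        for level in (console_level, file_level):
            if level not in LEVELS:
                raise ValueError(f"Invalid logging level: {level}")
        self.filename = filename
        self.max_lines = max_lines
        self.logger = logging.getLogger(f"sketch.{id(self)}")
        self.logger.setLevel(logging.DEBUG)
        self.logger.propagate = False
        console = logging.StreamHandler()
        console.setLevel(getattr(logging, console_level))
        self.file_handler = logging.FileHandler(filename)
        self.file_handler.setLevel(getattr(logging, file_level))
        self.logger.addHandler(console)
        self.logger.addHandler(self.file_handler)

    def log(self, level, msg):
        if level not in LEVELS:
            raise ValueError(f"Invalid logging level: {level}")
        self.logger.log(getattr(logging, level), msg)
        self._truncate()

    def _truncate(self):
        # Close the handler so the file can be rewritten safely;
        # FileHandler reopens the file in append mode on the next entry
        self.file_handler.close()
        with open(self.filename) as handle:
            lines = handle.readlines()
        if len(lines) > self.max_lines:
            # Keep only the newest max_lines lines
            with open(self.filename, "w") as handle:
                handle.writelines(lines[-self.max_lines:])


log_path = os.path.join(tempfile.mkdtemp(), "my_log.log")
logger = TruncatingLogger(log_path, "INFO", "DEBUG", 3)
for i in range(5):
    logger.log("DEBUG", f"message {i}")
with open(log_path) as handle:
    kept = handle.read().splitlines()
print(kept)
```

Rewriting the file after each entry is simple but costs a full read per message; a real implementation might trim less often.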

cobralib.io.write_yaml_file(file_path: str, data: dict, append: bool = False) → None[source]

Write or append data to a YAML file.

Parameters:

file_path – The path of the YAML file
data – The data to be written or appended as a dictionary
append – True to append data to the file, False to overwrite the file or create a new one (default: False)

Raises:

FileNotFoundError – If the file does not exist in append mode

from cobralib.io import write_yaml_file

# A list of two mappings, matching the file contents shown below
dict_file = [{'sports': ['soccer', 'football', 'basketball',
                         'cricket', 'hockey', 'table tennis']},
             {'countries': ['Pakistan', 'USA', 'India',
                            'China', 'Germany', 'France', 'Spain']}]
# Create new yaml file
write_yaml_file('new_file.yaml', dict_file, append=False)

This will create a file titled new_file.yaml with the following contents

- sports:
  - soccer
  - football
  - basketball
  - cricket
  - hockey
  - table tennis
- countries:
  - Pakistan
  - USA
  - India
  - China
  - Germany
  - France
  - Spain