API Reference¶
This document describes the API of the collectionbatchtool module. The
following sections are included:
Module-level functions¶
-
apply_specify_context(collection_name, specify_user, quiet=True)¶ Set up the Specify context.
Parameters:
-
apply_user_settings(filepath, quiet=True)¶ Read and apply user settings in a configuration file.
Parameters: - filepath (str) – Path to the configuration file.
- quiet (bool, default True) – If True, no output will be written to standard output.
-
initiate_database(database, host, user, passwd, quiet=True)¶ Initiate the database.
Parameters:
-
query_to_dataframe(database, query)¶ Return result from a peewee SelectQuery as a pandas.DataFrame.
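The following is a minimal sketch of how the module-level functions might be combined when no configuration file is used. It relies only on the signatures listed above. The call order (database first, then context) is an assumption, and `connect_to_specify` is a hypothetical helper that is not part of the module.

```python
def connect_to_specify(database, host, user, passwd,
                       collection_name, specify_user):
    """Open the database connection, then set up the Specify context."""
    # Imported inside the function so that the sketch can be defined
    # even where the package is not installed.
    import collectionbatchtool as cbt

    # Assumption: the database has to be initiated before the context
    # is applied, since the context presumably is looked up in it.
    cbt.initiate_database(database, host, user, passwd, quiet=False)
    cbt.apply_specify_context(collection_name, specify_user, quiet=False)
```

All argument values are supplied by the caller. When a configuration file is available, apply_user_settings(filepath) is documented to read and apply the settings in one call.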
The TableDataset class¶
-
class TableDataset(model, key_columns, static_content, where_clause, frame)¶
Bases: object
Store a dataset corresponding to a database table.
-
model¶ peewee.BaseModel
A Specify data model corresponding to a table.
-
key_columns¶ dict
Key-fields and SourceID-columns for the model.
-
static_content¶ dict
Data to be automatically inserted for the model.
-
where_clause¶ peewee.Expression
Condition for getting relevant data from the database.
-
describe_columns()¶ Return a pandas.DataFrame describing the columns in the current model.
-
from_csv(filepath, quiet=True, **kwargs)¶ Read dataset from a CSV file.
Parameters: - filepath (str) – File path or object.
- quiet (bool, default True) – If True, no output will be written to standard output.
- **kwargs – Arbitrary keyword arguments available in
pandas.read_csv().
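As an illustration of the keyword pass-through described above, the sketch below reads a semicolon-delimited, Latin-1 encoded file. `sep` and `encoding` are existing pandas.read_csv() parameters. `read_semicolon_csv` is a hypothetical helper, and `dataset` is assumed to be any object with the from_csv() signature documented here.

```python
def read_semicolon_csv(dataset, filepath):
    """Load a semicolon-delimited, Latin-1 encoded CSV file into `dataset`."""
    # `sep` and `encoding` are not parameters of from_csv() itself; they
    # are forwarded to pandas.read_csv() through **kwargs.
    dataset.from_csv(filepath, quiet=False, sep=";", encoding="latin-1")
    return dataset
```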
-
from_database(quiet=True)¶ Read table data from the database.
Parameters: quiet (bool, default True) – If True, no output will be written to standard output.
-
get_match_count(target_column, match_columns)¶ Return counts for matches and possible matches.
Parameters: - target_column (str) – Column that should have a value if any value in match_columns is not null.
- match_columns (str or List[str]) – Column or columns used for updating values in target_column.
Returns: matches, possible matches
Return type:
-
get_mismatches(target_column, match_columns)¶ Return a pandas.Series or a pandas.DataFrame with non-matching values.
Parameters: - target_column (str) – Column that should have a value if any value in match_columns is not null.
- match_columns (str or List[str]) – Column or columns used for updating values in target_column.
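The rule shared by get_match_count() and get_mismatches() can be sketched in plain Python. This is one interpretation of the descriptions above, not the module's implementation: a row is a possible match when any match column has a value, and a match when the target column also has a value. The function names and the column names `agentid` and `agent_sourceid` are illustrative only.

```python
def is_null(value):
    """Return True for None and NaN (NaN is the only value unequal to itself)."""
    return value is None or value != value


def as_list(columns):
    """Accept a single column name or a list of names."""
    return [columns] if isinstance(columns, str) else list(columns)


def count_matches(rows, target_column, match_columns):
    """Return (matches, possible_matches) for a list of dict rows."""
    match_columns = as_list(match_columns)
    matches = possible = 0
    for row in rows:
        if any(not is_null(row.get(col)) for col in match_columns):
            possible += 1
            if not is_null(row.get(target_column)):
                matches += 1
    return matches, possible


def find_mismatches(rows, target_column, match_columns):
    """Return the match-column values of rows whose target is still null."""
    match_columns = as_list(match_columns)
    return [
        tuple(row.get(col) for col in match_columns)
        for row in rows
        if is_null(row.get(target_column))
        and any(not is_null(row.get(col)) for col in match_columns)
    ]


rows = [
    {"agentid": 7, "agent_sourceid": "a1"},
    {"agentid": None, "agent_sourceid": "a2"},
    {"agentid": None, "agent_sourceid": None},
]
print(count_matches(rows, "agentid", "agent_sourceid"))    # (1, 2)
print(find_mismatches(rows, "agentid", "agent_sourceid"))  # [('a2',)]
```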
-
match_database_records(match_columns, quiet=True)¶ Update primary key values for records that match database.
Parameters: - match_columns (str or List[str]) – Columns to be matched against the database.
- quiet (bool, default True) – If True, no output will be written to standard output.
-
to_csv(filepath, update_sourceid=False, drop_empty_columns=False, quiet=True, encoding='utf-8', float_format='%g', index=False, **kwargs)¶ Write dataset to a comma-separated values (CSV) file.
Parameters: - filepath (str) – File path or object.
- update_sourceid (bool, default False) – If True, copy ID-columns to SourceID-columns before writing to the CSV file.
- drop_empty_columns (bool, default False) – Drop columns that do not contain any data.
- quiet (bool, default True) – If True, no output will be written to standard output.
- encoding (str, default 'utf-8') – A string representing the encoding to use in the output file.
- float_format (str or None, default '%g') – Format string for floating point numbers.
- index (bool, default False) – Write row names (index).
- **kwargs – Arbitrary keyword arguments available in
pandas.DataFrame.to_csv().
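The default float_format='%g' is worth a closer look, because printf-style %g keeps at most six significant digits. The snippet below uses only Python's built-in string formatting and assumes the format string is applied to each floating point value in the same way. The sample values are made up.

```python
# Assumption: the format string is applied with printf-style "%" formatting.
values = [2.0, 59.334591, 123456.0, 1234567.0]

default_format = ["%g" % v for v in values]
wider_format = ["%.10g" % v for v in values]

print(default_format)  # ['2', '59.3346', '123456', '1.23457e+06']
print(wider_format)    # ['2', '59.334591', '123456', '1234567']
```

If coordinates or large numbers stored as floats need to be written without rounding, passing a wider format such as float_format='%.10g' may be preferable to the default.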
-
to_database(defaults=None, update_record_metadata=True, chunksize=10000, quiet=True)¶ Load a dataset into the corresponding table and update the dataset’s primary key column from the database.
Parameters: - defaults (dict) – Mapping of column names to values to insert instead of nulls.
- update_record_metadata (bool, default True) – If True, record metadata will be generated during import, otherwise the metadata will be loaded from the dataset.
- chunksize (int, default 10000) – Size of chunks being uploaded.
- quiet (bool, default True) – If True, no output will be written to standard output.
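The effect of the defaults parameter can be sketched as follows. This is an illustration of "insert instead of nulls" on a single record held as a dict, not the module's own code. `fill_defaults` is a hypothetical helper and the column names are examples.

```python
def fill_defaults(record, defaults=None):
    """Return a copy of `record` with null values replaced by defaults."""
    filled = dict(record)
    for column, value in (defaults or {}).items():
        # Only nulls are replaced; existing values are left untouched.
        # In this sketch a column missing from the record counts as null.
        if filled.get(column) is None:
            filled[column] = value
    return filled


record = {"lastname": "Linnaeus", "agenttype": None}
print(fill_defaults(record, {"agenttype": 1}))
# {'lastname': 'Linnaeus', 'agenttype': 1}
```

With a dataset object, the corresponding call would look like dataset.to_database(defaults={'agenttype': 1}), where the column name and value again are only examples.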
-
update_database_records(columns, update_record_metadata=True, chunksize=10000, quiet=True)¶ Update records in database with matching primary key values.
Parameters: - columns (str or List[str]) – Column or columns with new values.
- update_record_metadata (bool, default True) – If True, record metadata will be generated during import, otherwise the metadata will be updated from the dataset.
- chunksize (int, default 10000) – Size of chunks being updated.
- quiet (bool, default True) – If True, no output will be written to standard output.
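To illustrate what the chunksize parameter controls, the sketch below splits a list of records into consecutive chunks. It shows chunking in general and is not taken from the module. `iter_chunks` is a hypothetical name.

```python
def iter_chunks(records, chunksize=10000):
    """Yield consecutive slices of `records` with at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, len(records), chunksize):
        yield records[start:start + chunksize]


sizes = [len(chunk) for chunk in iter_chunks(list(range(25)), chunksize=10)]
print(sizes)  # [10, 10, 5]
```

A smaller chunk size means more round trips to the database, while a larger one means more rows are handled per statement.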
-
update_foreign_keys(from_datasets, quiet=False)¶ Update foreign key values from a related dataset based on sourceid values.
Parameters: - from_datasets (TableDataset or List[TableDataset]) – Dataset(s) from which foreign key values will be updated.
- quiet (bool, default False) – If True, no output will be written to standard output.
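The idea behind updating foreign keys from SourceID values can be sketched with plain dicts. This is an assumed reading of the description above, not the module's implementation: the SourceID of each parent record is mapped to its ID, and that ID is copied to child records that carry the same SourceID. `copy_foreign_keys` and all column names are hypothetical.

```python
def copy_foreign_keys(child_rows, parent_rows, parent_id, parent_sourceid,
                      foreign_key, foreign_sourceid):
    """Fill `foreign_key` in child rows by looking up parent SourceIDs."""
    # Build a lookup table: parent SourceID -> parent ID.
    lookup = {
        row[parent_sourceid]: row[parent_id]
        for row in parent_rows
        if row.get(parent_sourceid) is not None
    }
    for row in child_rows:
        source = row.get(foreign_sourceid)
        if source in lookup:
            row[foreign_key] = lookup[source]
    return child_rows


localities = [
    {"localityid": 101, "locality_sourceid": "L1"},
    {"localityid": 102, "locality_sourceid": "L2"},
]
events = [
    {"locality_sourceid": "L2", "localityid": None},
    {"locality_sourceid": "L9", "localityid": None},
]
copy_foreign_keys(events, localities, "localityid", "locality_sourceid",
                  "localityid", "locality_sourceid")
print([row["localityid"] for row in events])  # [102, None]
```

Rows whose SourceID has no counterpart in the parent data keep a null foreign key in this sketch.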
-
update_sourceid(quiet=True)¶ Copy values from ID-columns to SourceID-columns.
Parameters: quiet (bool, default True) – If True, no output will be written to standard output.
-
write_mapping_to_csv(filepath, quiet=True, float_format='%g', index=False, **kwargs)¶ Write ID-column mapping to a comma-separated values (CSV) file.
Parameters: - filepath (str) – File path or object.
- quiet (bool, default True) – If True, no output will be written to standard output.
- float_format (str or None, default '%g') – Format string for floating point numbers.
- index (bool, default False) – Write row names (index).
- **kwargs – Arbitrary keyword arguments available in
pandas.DataFrame.to_csv().
-
all_columns¶ List containing all columns in the dataset.
-
database_columns¶ List with available database columns.
-
database_query¶ Database query for reading the data from the database.
-
file_columns¶ List containing only the columns that can be written to or read from a file.
-
frame¶ A pandas.DataFrame to hold the data.
-
primary_key_column¶ Name of the primary key column.
The TreeDataset class¶
-
class TreeDataset¶
Bases: object
A dataset corresponding to a tree table in Specify.
-
update_rankid_column(dataset, quiet=True)¶ Update RankID based on SourceID-column.
Parameters: - dataset (TableDataset) – A treedefitem-dataset from which RankID should be updated.
- quiet (bool, default True) – If True, no output will be written to standard output.
Notes
This method exists in order to update the redundant RankID-columns in TreeDataset dataframes.
TableDataset subclasses¶
-
class AgentDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the agent-table.
-
class CollectingeventattributeDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the collectingeventattribute-table.
-
class CollectingeventDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the collectingevent-table.
-
class CollectionobjectattributeDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the collectionobjectattribute-table.
-
class CollectionobjectDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the collectionobject-table.
-
class CollectorDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the collector-table.
-
class DeterminationDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the determination-table.
-
class GeographyDataset¶
Bases: collectionbatchtool.TableDataset, collectionbatchtool.TreeDataset
Dataset corresponding to the geography-table.
-
class GeographytreedefitemDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the geographytreedefitem-table.
-
class LocalityDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the locality-table.
-
class StorageDataset¶
Bases: collectionbatchtool.TableDataset, collectionbatchtool.TreeDataset
Dataset corresponding to the storage-table.
-
class StoragetreedefitemDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the storagetreedefitem-table.
-
class PreparationDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the preparation-table.
-
class PreptypeDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the preptype-table.
-
class TaxonDataset¶
Bases: collectionbatchtool.TableDataset, collectionbatchtool.TreeDataset
Dataset corresponding to the taxon-table.
-
class TaxontreedefitemDataset¶
Bases: collectionbatchtool.TableDataset
Dataset corresponding to the taxontreedefitem-table.
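A typical use of the subclasses might look like the sketch below, which loads two related tables in dependency order. It uses only methods documented in this reference. The assumptions are that the subclasses can be created without arguments (as their signatures suggest), that user settings have already been applied, and that collecting events refer to localities through a foreign key. `import_localities_and_events` and the file paths are hypothetical.

```python
def import_localities_and_events(locality_csv, event_csv):
    """Load localities first, then collecting events that refer to them."""
    # Imported inside the function so that the sketch can be defined
    # even where the package is not installed.
    from collectionbatchtool import CollectingeventDataset, LocalityDataset

    localities = LocalityDataset()
    localities.from_csv(locality_csv, quiet=False)
    # Uploading is documented to update the dataset's primary key column.
    localities.to_database(quiet=False)

    events = CollectingeventDataset()
    events.from_csv(event_csv, quiet=False)
    # Copy the new locality IDs to the events, based on SourceID values.
    events.update_foreign_keys(localities, quiet=False)
    events.to_database(quiet=False)
    return localities, events
```

Depending on the table, to_database() may also need a defaults dict for columns that must not be null.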