Metadata Dictionary
The CancerModels.org Metadata Dictionary describes the data model, which adheres to specific formats and restrictions to ensure a standard of data quality. The following describes the attributes and permissible values for all fields in the clinical TSV files for the CancerModels.org platform.
6 files with 94 fields
Patient (patient)
12 Fields
The collection of descriptors associated with patient information for data submission to CancerModels.org.
Sheet Name Example: patient
Field & Description
Attributes
Type
Permissible Values
Notes
patient_id
Unique anonymous/de-identified ID for the patient from the provider.
Required
TEXT
Values must meet the regular expression
#/fields/format/ALPHANUMERIC
sex
Sex of the patient.
Required
TEXT
Any of the following:
female
male
other
Not provided
Not collected
Any of the following values: ['male', 'female', 'other', 'not collected', 'not provided']
history
Cancer-relevant comorbidity or environmental exposure.
TEXT
Values must meet the regular expression
#/fields/format/Free
ethnicity
Patient ethnic group. Can be derived from self-assessment or genetic analysis.
TEXT
Values must meet the regular expression
#/fields/format/NCIT
ethnicity_assessment_method
Patient ethnic group assessment method.
TEXT
Any of the following:
self-assessed
genetic
Not provided
Not collected
Any of the following: ['self-assessed', 'genetic', 'not provided', 'not collected']
initial_diagnosis
Diagnosis of the patient when first diagnosed, at age_at_initial_diagnosis. This can differ from the diagnosis at the time of collection, which is recorded in the sample section.
TEXT
Values must meet the regular expression
#/fields/format/NCIT
age_at_initial_diagnosis
Age at first diagnosis. Can be earlier than the age at which the tissue sample was collected for implant.
TEXT
Values must meet the regular expression
#/fields/format/AGE
age_category
Age category at time of sampling.
TEXT
Any of the following:
adult
pediatric
fetus
Not provided
Not collected
Any of the following values: ['adult', 'pediatric', 'fetus', 'not collected', 'not provided']
smoking_status
Patient's smoking history.
TEXT
Values must meet the regular expression
Any of the following values: ['current smoker', 'ex-smoker', 'non-smoker', or 'never smoked'; followed by free text]
alcohol_status
Alcohol intake of the patient, self-reported.
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
alcohol_frequency
The average number of days per week on which the patient consumes alcohol, self-reported.
TEXT
Values must meet the regular expression
#/fields/format/Free
family_history_of_cancer
Whether a first-degree relative of the patient has been diagnosed with a cancer of the same or a different type.
TEXT
Values must meet the regular expression
#/fields/format/Free
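The enumerated patient fields above lend themselves to a mechanical check. The following is a minimal sketch, not the platform's own validator: the helper name `validate_patient_row` is hypothetical, and case-insensitive matching is an assumption (the listing capitalises "Not provided" while the notes use lower case). Fields governed by a regular expression are not checked because the patterns themselves are not given here.

```python
# Permissible values copied from the patient sheet listing above.
PATIENT_ENUMS = {
    "sex": {"female", "male", "other", "not provided", "not collected"},
    "ethnicity_assessment_method": {
        "self-assessed", "genetic", "not provided", "not collected",
    },
    "age_category": {"adult", "pediatric", "fetus", "not provided", "not collected"},
    "alcohol_status": {"yes", "no", "not provided", "not collected"},
}

# Fields marked Required in the patient sheet.
PATIENT_REQUIRED = ("patient_id", "sex")


def validate_patient_row(row):
    """Return a list of problems found in one patient row (hypothetical helper)."""
    problems = []
    for field in PATIENT_REQUIRED:
        if not row.get(field, "").strip():
            problems.append(f"{field}: required value is missing")
    for field, allowed in PATIENT_ENUMS.items():
        value = row.get(field, "").strip()
        # Assumption: comparison ignores case; empty optional fields are accepted.
        if value and value.lower() not in allowed:
            problems.append(f"{field}: {value!r} is not a permissible value")
    return problems
```

A row is passed as a plain dictionary keyed by field name, for example one row produced by `csv.DictReader`.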
Patient Sample (patient_sample)
23 Fields
The collection of descriptors associated with clinical information of the tumor sample for data submission to CancerModels.org.
Sheet Name Example: patient_sample
Field & Description
Attributes
Type
Permissible Values
Notes
patient_id
Unique anonymous/de-identified ID for the patient from the provider.
Required
TEXT
Values must meet the regular expression
#/fields/format/ALPHANUMERIC
sample_id
Unique ID of the patient tumour sample used to generate cancer models.
Required
TEXT
Values must meet the regular expression
#/fields/format/ALPHANUMERIC
collection_date
Date of collection. Important for understanding the time relationship between models generated for the same patient.
TEXT
Values must meet the regular expression
MMM YYYY
collection_event
Collection event corresponding to each time a patient was sampled to generate a cancer model. Subsequent collection events are incremented by 1.
TEXT
Values must meet the regular expression
collection event + 'event number'
months_since_collection_1
The time difference between the 1st collection event and the current one (in months).
TEXT
Values must meet the regular expression
Numeric.
Collection event 1 should be 0; collection event 2 should be 6 if 6 months have elapsed between collection 1 and collection 2; and collection event 3 should be 9 if 9 months have elapsed between collection 1 and collection 3.
age_in_years_at_collection
Patient age at collection.
Required
TEXT
Values must meet the regular expression
#/fields/format/AGE
diagnosis
Diagnosis at time of collection of the patient tumour used in the cancer model.
Required
TEXT
Values must meet the regular expression
#/fields/format/NCIT
tumour_type
Collected tumour type.
Required
TEXT
Any of the following:
primary
metastatic
recurrent
refractory
pre-malignant
2 more
#/fields/format/NCIT
primary_site
Site of the primary tumour, where the primary cancer originates (may not correspond to the site of the current tissue sample).
Required
TEXT
Values must meet the regular expression
#/fields/format/NCIT
collection_site
Site of collection of the tissue sample (can be different from the primary site if tumour type is metastatic).
Required
TEXT
Values must meet the regular expression
#/fields/format/NCIT
collection_method
Method of collection of the tissue sample.
TEXT
Values must meet the regular expression
Any of the following: Biopsy, surgical resection, blood draw, free text
staging_system
Stage classification system used to describe the stage; add the version if available.
TEXT
Values must meet the regular expression
#/fields/format/Free
grade
The implanted tumour grade value.
TEXT
Values must meet the regular expression
#/fields/format/Free
grading_system
Grade classification system used to describe the grade; add the version if available.
TEXT
Values must meet the regular expression
#/fields/format/Free
virology_status
Positive virology status at the time of collection. Any relevant virology information that can influence cancer, such as EBV, HIV, or HPV status.
TEXT
Values must meet the regular expression
#/fields/format/NCIT
gene_mutation_status
Outcome of mutational status tests for the following genes: BRAF, PIK3CA, PTEN, KRAS.
TEXT
Values must meet the regular expression
HGNC Symbols
sharable
Is patient treatment information available and sharable? If yes, fill out the following treatment columns: treatment_naive_at_collection, treated_at_collection, treated_prior_to_collection.
Required
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
treatment_naive_at_collection
Was the patient treatment naive at the time of collection? This takes into account whether the patient was being treated at the time of tumour sample collection and whether the patient was treated prior to the tumour sample collection.
The value will be 'yes' if either treated_at_collection or treated_prior_to_collection is 'yes'.
Required
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
treated_at_collection
Was the patient being treated for cancer at the time of tumour sample collection? This includes any of the following: radiotherapy, chemotherapy, targeted therapy, hormone therapy.
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
treated_prior_to_collection
Was the patient treated for cancer prior to the time of tumour sample collection? This includes any of the following: radiotherapy, chemotherapy, targeted therapy, hormone therapy.
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
response_to_treatment
Patient’s response to treatment.
TEXT
Values must meet the regular expression
RECIST criteria: CR, PR, PD, SD
model_id
Unique identifier for all the cancer models derived from the same tissue sample. Needs to be unique.
Required
TEXT
Values must meet the regular expression
#/fields/format/ALPHANUMERIC
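The relationship between collection_date and months_since_collection_1 described above can be sketched as follows. This is an illustration under stated assumptions rather than the platform's logic: the function names are hypothetical, 'MMM' is assumed to be an English three-letter month abbreviation (for example 'Jan'), and `%b` parsing depends on the active locale.

```python
from datetime import datetime


def parse_collection_date(text):
    """Parse a 'MMM YYYY' collection_date such as 'Jan 2020'.

    Assumption: English month abbreviations under the default locale.
    """
    return datetime.strptime(text.strip(), "%b %Y")


def months_since_first(collection_dates):
    """Return whole months elapsed since the first collection event.

    `collection_dates` holds collection_date strings ordered by collection event.
    """
    parsed = [parse_collection_date(d) for d in collection_dates]
    first = parsed[0]
    return [(d.year - first.year) * 12 + (d.month - first.month) for d in parsed]


# Three collection events for one patient, 6 and 9 months after the first.
print(months_since_first(["Jan 2020", "Jul 2020", "Oct 2020"]))  # [0, 6, 9]
```

The printed values correspond to the note for months_since_collection_1: 0 for the first event, then the months elapsed since that event.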
Sharing (sharing)
9 Fields
The collection of descriptors associated with model data sharability for data submission to CancerModels.org.
Sheet Name Example: sharing
Field & Description
Attributes
Type
Permissible Values
Notes
model_id
Unique identifier for all the cancer models derived from the same tissue sample.
Required
TEXT
Values must meet the regular expression
#/fields/format/ALPHANUMERIC
accessibility
Define any limitation on access to the model by type of user (academia, industry, or academia and industry), or any national limitation if needed (e.g. no specific consent for sequencing).
Required
TEXT
Any of the following:
academia
industry
academia and industry
Any of the following: ['academia', 'industry', 'academia and industry']
europdx_access_modality
If part of the EUROPDX consortium, fill this in. Designates whether a model is accessible for transnational access through the EDIReX infrastructure, or only on a collaborative basis (i.e. upon approval of the proposed project by the owner of the model).
TEXT
Any of the following:
transnational access
collaboration only
Not applicable
Not provided
Not collected
Any of the following: ['transnational access', 'collaboration only', 'not provided']
date_submitted
Please add the date of submission to the resource.
TEXT
Values must meet the regular expression
DD/MM/YYYY
model_availability
Model availability status, i.e. whether the model is still available to purchase.
TEXT
Any of the following:
available
NA
discontinued
Any of the following: ['available','NA', 'discontinued']
email
Contact email for any requests from users about models. If multiple, include as a comma-separated list.
Required
TEXT
Values must meet the regular expression
Email address
name
Contact person (should match that included in email column).
Required
TEXT
Values must meet the regular expression
#/fields/format/Free
form_url
If the center has a contact form rather than a contact email, include the link here.
TEXT
Values must meet the regular expression
URL
database_url
If the institution has a public database and wants that link to be shared, include it here.
TEXT
Values must meet the regular expression
URL
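Several sharing fields have simple formats (DD/MM/YYYY dates, comma-separated emails, URLs) that can be handled with the Python standard library. The sketch below is illustrative only: the function names are hypothetical, and the URL check (an http or https scheme plus a host) is an assumption, since the dictionary does not state how URLs are validated.

```python
from datetime import datetime
from urllib.parse import urlparse


def parse_date_submitted(text):
    """Parse a DD/MM/YYYY date_submitted value; raises ValueError if malformed."""
    return datetime.strptime(text.strip(), "%d/%m/%Y").date()


def split_emails(text):
    """Split the comma-separated email column into individual addresses."""
    return [part.strip() for part in text.split(",") if part.strip()]


def looks_like_url(text):
    """Loose check (an assumption, not the dictionary's rule): http(s) scheme and a host."""
    parsed = urlparse(text.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

For example, `parse_date_submitted("05/03/2021")` is read as 5 March 2021, not 3 May, because the day comes first in this format.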
PDX Model (pdx_model)
14 Fields
The collection of descriptors associated with PDX models for data submission to CancerModels.org.
Sheet Name Example: pdx_model
Field & Description
Attributes
Type
Permissible Values
Notes
model_id
Unique identifier for all the cancer models derived from the same tissue sample.
Required
TEXT
Values must meet the regular expression
#/fields/format/ALPHANUMERIC
host_strain_name
Host mouse strain name (e.g. NOD-SCID, NSG).
Required
TEXT
Values must meet the regular expression
Mouse strain name
host_strain_nomenclature
The full nomenclature form of the host mouse strain name.
Required
TEXT
Values must meet the regular expression
Use most precise nomenclature where possible.
engraftment_site
Organ or anatomical site used for the PDX tumour engraftment (e.g. mammary fat pad, right flank).
Required
TEXT
Values must meet the regular expression
Use anatomical graft site, not collected, or not provided.
engraftment_type
PDX engraftment type: orthotopic if the tumour was engrafted at a corresponding anatomical site (e.g. a patient tumour with primary site breast was grafted in the mouse mammary fat pad). If grafted subcutaneously, use heterotopic.
Required
TEXT
Any of the following:
heterotopic
orthotopic
Not provided
Not collected
Any of the following: ['heterotopic', 'orthotopic', 'not collected', 'not provided'].
sample_type
Description of the type of material grafted into the mouse (e.g. tissue fragments, cell suspension).
Required
TEXT
Values must meet the regular expression
#/fields/format/Free
sample_state
PDX engraftment material state (e.g. fresh or frozen). If other, please describe.
TEXT
Values must meet the regular expression
#/fields/format/Free
passage_number
Passage number. When different host strains, PDX engraftment sites, PDX engraftment types, or PDX engraftment materials were used during PDX line generation, add the passage number, using additional rows per model as needed.
We assume that passage 0 corresponds to the first engraftment; if this is not the case, please indicate what passage 0 corresponds to. Add rows if columns D, E, F, or G change. If they never change, enter 'all' as the passage value to specify that the conditions are the same in all passages.
Required
TEXT
Values must meet the regular expression
Numeric or all.
publications
If the model has been part of a published study, include PubMed IDs separated by commas.
TEXT
Values must meet the regular expression
#/fields/format/PMID
supplier
Please provide the supplier's brief acronym or name, followed by a colon and the number or name used to reference the model.
TEXT
Values must meet the regular expression
#/fields/format/Free
supplier_type
Model supplier type: commercial, academic, or other.
Required
TEXT
Any of the following:
commercial
academic
other
Any of the following: ['commercial', 'academic', 'other']
catalog_number
Catalogue number of model, if commercial.
TEXT
Values must meet the regular expression
#/fields/format/Free
vendor_link
Link to purchasable cell model, if commercial.
TEXT
Values must meet the regular expression
URL
external_ids
DepMap accession, Cellosaurus accession, or another ID. Please provide as a comma-separated list.
TEXT
Values must meet the regular expression
#/fields/format/Free
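Two pdx_model fields have a small internal structure: passage_number is "Numeric or all", and supplier is a name followed by a colon and a reference. A hedged sketch of parsing both is shown below. The function names are hypothetical, and treating passage numbers as non-negative integers and 'all' as case-insensitive are assumptions; the identifier `ACME:12345` is a made-up placeholder.

```python
def parse_passage_number(text):
    """Return an int passage number, or 'all' when conditions apply to every passage.

    Assumptions: numeric values are non-negative integers; 'all' ignores case.
    """
    value = text.strip()
    if value.lower() == "all":
        return "all"
    if value.isascii() and value.isdigit():
        return int(value)
    raise ValueError(f"passage_number must be numeric or 'all', got {text!r}")


def split_supplier(text):
    """Split 'NAME:reference' into (supplier, reference) at the first colon."""
    supplier, sep, reference = text.partition(":")
    if not sep:
        raise ValueError(f"supplier should contain a colon, got {text!r}")
    return supplier.strip(), reference.strip()


print(parse_passage_number("0"))     # 0
print(split_supplier("ACME:12345"))  # ('ACME', '12345')
```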
Model Validation (model_validation)
11 Fields
The collection of descriptors associated with cancer model validation techniques for data submission to CancerModels.org.
Sheet Name Example: model_validation
Field & Description
Attributes
Type
Permissible Values
Notes
model_id
Unique identifier for all the cancer models derived from the same tissue sample.
Required
TEXT
Values must meet the regular expression
#/fields/format/ALPHANUMERIC
validation_technique
Any technique used to validate the PDX against its original patient tumour, including fingerprinting, histology, immunohistochemistry.
Required
TEXT
Values must meet the regular expression
#/fields/format/Free
description
Short description of what was compared and what the result was (e.g. high, good, moderate concordance between xenograft, 'model validated against histological features of same diagnosis' or 'not determined'). It needs to be clear whether the model is validated or not.
Required
TEXT
Values must meet the regular expression
#/fields/format/Free
validation_host_strain_nomenclature
Validation host mouse strain, following mouse strain nomenclature from MGI JAX.
TEXT
Values must meet the regular expression
Full host strain name, not collected, or not provided
morphological_features
Morphological features of the model, if the model type is 3D: organoid or cell line.
TEXT
Values must meet the regular expression
#/fields/format/Free
SNP_analysis
Was SNP analysis done on the model?
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
STR_analysis
Was STR analysis done on the model?
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
tumour_status
Gene expression validation of established model.
TEXT
Values must meet the regular expression
#/fields/format/Free
model_purity
Presence of tumour vs stroma or normal cells.
TEXT
Values must meet the regular expression
#/fields/format/Free
comments
Comments about the model that cannot be expressed by other fields.
TEXT
Values must meet the regular expression
#/fields/format/Free
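Because the sheets are submitted as TSV files, a whole model_validation sheet can be checked row by row. The sketch below is one possible approach, not the platform's validator. It assumes the first line of the file is a header of field names (real submission templates may carry extra header rows), and the function name is hypothetical.

```python
import csv
import io

# Fields marked Required in the model_validation sheet.
REQUIRED = ("model_id", "validation_technique", "description")
# Permissible values for SNP_analysis and STR_analysis (compared in lower case).
YES_NO = {"yes", "no", "not provided", "not collected"}


def check_model_validation_tsv(tsv_text):
    """Yield (row_number, message) for each problem in a model_validation TSV.

    Assumption: the first line is a header naming the fields.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for number, row in enumerate(reader, start=1):
        for field in REQUIRED:
            if not (row.get(field) or "").strip():
                yield number, f"{field} is required"
        for field in ("SNP_analysis", "STR_analysis"):
            value = (row.get(field) or "").strip().lower()
            if value and value not in YES_NO:
                yield number, f"{field} has unexpected value {value!r}"
```

Row numbers count data rows, so the first row after the header is row 1.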
Cell Model (cell_model)
25 Fields
The collection of descriptors associated with cell and organoid models for data submission to CancerModels.org.
Sheet Name Example: cell_model
Field & Description
Attributes
Type
Permissible Values
Notes
model_id
Unique identifier for all the cancer models derived from the same tissue sample.
Required
TEXT
Values must meet the regular expression
#/fields/format/ALPHANUMERIC
model_name
Most common name associated with the model. Please use the CCLE name if available.
Required
TEXT
Values must meet the regular expression
#/fields/format/Free
model_name_aliases
Additional model names, if known.
TEXT
Values must meet the regular expression
#/fields/format/Free
type
Type of organoid or cell model.
Required
TEXT
Any of the following:
organoid
CRC
3-D: other
2D: other
cell line
2 more
One of the following: ['organoid', 'CRC', '3-D: other', '2D: other', 'cell line'].
parent_id
Please add the model ID of the model used to generate this model. If that model is not in this set, please refer to it by its external ID.
TEXT
Values must meet the regular expression
#/fields/format/ALPHANUMERIC
origin_patient_sample_id
Unique ID of the patient tumour sample used to generate the model.
TEXT
Values must meet the regular expression
#/fields/format/ALPHANUMERIC
growth_properties
Observed growth properties of the related model.
Required
TEXT
Values must meet the regular expression
One of the following: ['embedded 3d culture', 'adherent', 'mix of adherent and suspension', 'suspension'].
media_id
Unique identifier for each media formulation (catalogue number).
Required
TEXT
Values must meet the regular expression
#/fields/format/Free
growth_media
Base media formulation the model was grown in.
Required
TEXT
Values must meet the regular expression
#/fields/format/Free
plate_coating
Coating on the plate the model was grown in.
Required
TEXT
Any of the following:
laminin
matrigel
collagen
none
Any of the following values: ['laminin', 'matrigel', 'collagen', 'none']
other_plate_coating
Other coating on the plate the model was grown in (not mentioned above).
Required
TEXT
Any of the following:
peg-based hydrogel
none
Any of the following values: ['peg-based hydrogel', 'none']
passage_number
Passage number at time of sequencing/screening.
Required
TEXT
Values must meet the regular expression
#/fields/format/Free
contaminated
Is contamination present in the model?
Required
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
contamination_details
Details of the contamination.
TEXT
Values must meet the regular expression
#/fields/format/Free
supplements
Additional supplements the model was grown with.
TEXT
Values must meet the regular expression
#/fields/format/Free
drug
Additional drug/compounds the model was grown with.
TEXT
Values must meet the regular expression
#/fields/format/Free
drug_concentration
Concentration of additional drug/compounds the model was grown with.
TEXT
Values must meet the regular expression
#/fields/format/Free
publications
If the model has been part of a published study, include PubMed IDs separated by commas.
TEXT
Values must meet the regular expression
#/fields/format/PMID
supplier
Please provide the supplier's brief acronym or name, followed by a colon and the number or name used to reference the model.
TEXT
Values must meet the regular expression
#/fields/format/Free
supplier_type
Model supplier type: commercial, academic, or other.
Required
TEXT
Any of the following:
commercial
academic
other
Any of the following: ['commercial', 'academic', 'other']
catalog_number
Catalogue number of model, if commercial.
TEXT
Values must meet the regular expression
#/fields/format/Free
vendor_link
Link to purchasable cell model, if commercial.
TEXT
Values must meet the regular expression
URL
rrid
Cellosaurus ID.
TEXT
Values must meet the regular expression
#/fields/format/Free
external_ids
DepMap accession, Cellosaurus accession, or another ID. Please provide as a comma-separated list.
TEXT
Values must meet the regular expression
#/fields/format/Free
comments
Add crucial comments about the model that cannot be expressed by other fields.
Required
TEXT
Values must meet the regular expression
#/fields/format/Free
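The identifier fields link the sheets together, so a submission can also be checked across sheets. The sketch below rests on two readings of the dictionary that should be treated as assumptions: that model_id in patient_sample must not repeat (the field says it "needs to be unique"), and that origin_patient_sample_id in cell_model is expected to match a sample_id in patient_sample. Function names are hypothetical.

```python
from collections import Counter


def duplicate_model_ids(patient_sample_rows):
    """Return model_id values that occur more than once in patient_sample."""
    counts = Counter(row["model_id"] for row in patient_sample_rows)
    return sorted(model_id for model_id, n in counts.items() if n > 1)


def unknown_sample_refs(cell_model_rows, patient_sample_rows):
    """Return cell_model model_ids whose origin_patient_sample_id matches no sample_id.

    Rows with an empty origin_patient_sample_id are skipped (the field is optional).
    """
    known = {row["sample_id"] for row in patient_sample_rows}
    return [
        row["model_id"]
        for row in cell_model_rows
        if row.get("origin_patient_sample_id")
        and row["origin_patient_sample_id"] not in known
    ]
```

Both functions take lists of dictionaries keyed by field name, such as the rows produced by `csv.DictReader` with a tab delimiter.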