Endpoint Chaining

The CDA provides a custom Python tool for searching CDA data. Q (short for Query) offers several ways to search and filter data, and several input modes:

Q() builds a query that can be used by run() or count()
Q.run() returns data for the specified search
Q.count() returns summary information (counts) for data that fit the specified search
columns() returns entity field names
unique_terms() returns entity field contents

Before we do any work, we need to import several functions from cdapython:

Q and query, which power the search
columns, which lets us view entity field names
unique_terms, which lets us view entity field contents

We're also importing functions from several other packages to make viewing and manipulating tables easier. The opt settings pre-configure how itables displays our tables, with scrolling and paging enabled. Finally, we're telling cdapython to report its version so we can be sure we're using the one we mean to:

In [1]:

            
from cdapython import Q, columns, unique_terms, query
import numpy as np
import pandas as pd
from itables import init_notebook_mode, show
init_notebook_mode(all_interactive=True)
import itables.options as opt
opt.maxBytes=0
opt.scrollX="200px"
opt.scrollCollapse=True
opt.paging=True
opt.maxColumns=0
print(Q.get_version())

2022.12.21

Endpoint Chaining

We're going to build on our previous basic search to see what information exists about cancers that were first diagnosed in the brain.

In [2]:

            
myquery = Q('primary_diagnosis_site = "brain"')

Previously we looked at subject, research_subject, specimen and file results separately, but we can also combine these.

Let's say what we're really interested in is finding analysis done on specimens, so we're looking for files that belong to specimens that match our search. To do this, we can chain our query to the specimen endpoint and then to the files endpoint and get the combined result:

In [3]:

            
myqueryspecimenfiles = myquery.specimen.file.run()
myqueryspecimenfiles

Getting results from database

Total execution time: 0 min 3.352 sec 3352 ms

Out[3]:

            
Offset: 0
Count: 100
Total Row Count: 771260
More pages: True
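
The paging fields in this summary are related by simple arithmetic. As a rough sketch in plain Python (independent of cdapython; the variable names are illustrative and only mirror the labels printed above), the number of pages needed to cover every row at this page size can be worked out like this:

```python
import math

# Values copied from the Out[3] summary above.
offset = 0                # index of the first row in this page
count = 100               # rows returned in this page
total_row_count = 771260  # rows matching the chained query

# Pages of `count` rows needed to cover every row.
total_pages = math.ceil(total_row_count / count)

# Rows that are still unread after the current page.
remaining_rows = total_row_count - (offset + count)

# More pages exist whenever rows remain beyond this page.
more_pages = remaining_rows > 0

print(total_pages, remaining_rows, more_pages)
```

With these values, 7713 pages would be needed and 771160 rows remain after the first page, which is consistent with "More pages: True".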

The Total Row Count above shows that 771260 rows came back for files that belong to specimens that meet our search criteria; this first page holds 100 of them. As before, we can preview the results by using the .to_dataframe() method:

In [4]:

            
myqueryspecimenfiles.to_dataframe()

Out[4]:

file_id	file_identifier	label	data_category	data_type	file_format	file_associated_project	drs_uri	byte_size	checksum	data_modality	imaging_modality	dbgap_accession_number	imaging_series	specimen_id	subject_id	researchsubject_id

Valid Endpoint Chains

Not all endpoints can be chained together. This restriction comes from the data itself: `diagnosis` and `treatment` information does not have files directly attached to it. Instead, those files are associated with the `researchsubject`. As a result, both `myquery.treatment.file.run()` and `myquery.diagnosis.file.run()` will fail, as there are no files to retrieve. The mutation table will have a .file version in the future, but it is still being harmonized. Current valid chains are:

myquery.subject.file.run(): This will return all the files that meet the query and that are directly tied to subject
myquery.researchsubject.file.run(): This will return all the files that meet the query and that are directly tied to researchsubject
myquery.specimen.file.run(): This will return all the files that meet the query and that are directly tied to specimen
myquery.subject.file.count.run(): This will return the count of files that meet the query and that are directly tied to subject
myquery.researchsubject.file.count.run(): This will return the count of files that meet the query and that are directly tied to researchsubject
myquery.specimen.file.count.run(): This will return the count of files that meet the query and that are directly tied to specimen
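
The list above can be captured in a small lookup so that an unsupported chain is caught before a query is sent. This is only a sketch: `FILE_CHAINABLE_ENDPOINTS` and `can_chain_to_file` are hypothetical helper names, not part of cdapython, and the set simply mirrors the chains listed in this section.

```python
# Endpoints that this section lists as chainable to .file.
# Hypothetical helper for illustration; not part of cdapython.
FILE_CHAINABLE_ENDPOINTS = frozenset({"subject", "researchsubject", "specimen"})


def can_chain_to_file(endpoint: str) -> bool:
    """Return True if `endpoint` is listed above as chainable to .file."""
    return endpoint in FILE_CHAINABLE_ENDPOINTS


# Check a few endpoints, including the two the text says will fail.
for name in ("specimen", "researchsubject", "diagnosis", "treatment"):
    status = "valid" if can_chain_to_file(name) else "not valid"
    print(f"myquery.{name}.file is {status}")
```

If the mutation table gains a .file version later, the set would need to be updated by hand, since it is not derived from the CDA service.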

Last update: 2022-11-03